From: sagi@grimberg.me (Sagi Grimberg)
Subject: [PATCH 0/3] Introduce fabrics controller loss timeout
Date: Tue, 28 Mar 2017 14:37:38 +0300
Message-ID: <97faaf24-c95f-fa94-9ea7-7b91f3fc1291@grimberg.me>
In-Reply-To: <859829333.6134255.1490575296451.JavaMail.zimbra@redhat.com>


> Hello Sagi
> With these three patches, reconnecting stopped after 60 attempts.

Progress..

> I started another test that runs fio on nvme0n1[1] on the client before executing "nvmetcli clear" on the target side.
> After that, I found another issue: the fio jobs cannot be stopped even with "Ctrl + C", and the device node cannot be released either[2].
> Here is the kernel log[3].

Thanks for the new test case ;)

> [3]
> [  356.812399] nvme nvme0: Reconnecting in 10 seconds...
> [  366.965161] nvme nvme0: Connect rejected: status 8 (invalid service ID).
> [  367.002048] nvme nvme0: rdma_resolve_addr wait failed (-104).
> [  367.029926] nvme nvme0: Failed reconnect attempt 21
> [  367.051905] nvme nvme0: Reconnecting in 10 seconds...
> [  371.444001] INFO: task kworker/u130:1:155 blocked for more than 120 seconds.
> [  371.480773]       Not tainted 4.11.0-rc3.ctrl_tmo+ #1
> [  371.505608] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  371.540918] kworker/u130:1  D    0   155      2 0x00000000
> [  371.565584] Workqueue: writeback wb_workfn (flush-259:0)
> [  371.590031] Call Trace:
> [  371.600981]  __schedule+0x289/0x8f0
> [  371.616644]  schedule+0x36/0x80
> [  371.630693]  io_schedule+0x16/0x40
> [  371.645565]  blk_mq_get_tag+0x16c/0x280
> [  371.662929]  ? remove_wait_queue+0x60/0x60
> [  371.680942]  __blk_mq_alloc_request+0x1b/0xe0
> [  371.700508]  blk_mq_sched_get_request+0x1a0/0x240
> [  371.721616]  blk_mq_make_request+0x113/0x620
> [  371.741215]  generic_make_request+0x110/0x2c0
> [  371.760755]  submit_bio+0x75/0x150

Looks like we have I/O waiting for a tag, but the
controller teardown couldn't interrupt and fail it...
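
One way to read the trace, as a toy user-space model rather than kernel
code: every tag is held by a request that never completes, so the next
submitter finds no tag (the kernel sleeps in blk_mq_get_tag() at this
point) and only makes progress once teardown fails the in-flight requests.
ToyTagPool, fail_all() and the "EIO" status are hypothetical names made up
for illustration.

```python
class ToyTagPool:
    """Hypothetical model of a fixed-depth tag set; not the blk-mq code."""

    def __init__(self, depth):
        self.free = list(range(depth))
        self.inflight = {}  # tag -> request label

    def get_tag(self, req):
        # Returns None when no tag is free. The real kernel would put the
        # submitter to sleep here instead of returning.
        if not self.free:
            return None
        tag = self.free.pop()
        self.inflight[tag] = req
        return tag

    def complete(self, tag, status):
        # Completing a request is the only thing that frees its tag.
        req = self.inflight.pop(tag)
        self.free.append(tag)
        return req, status

    def fail_all(self, status="EIO"):
        # What teardown would have to do so that waiters can make progress.
        return [self.complete(tag, status) for tag in list(self.inflight)]


if __name__ == "__main__":
    pool = ToyTagPool(depth=2)
    pool.get_tag("read-0")
    pool.get_tag("read-1")
    print("writeback got tag:", pool.get_tag("writeback"))
    print("failed by teardown:", pool.fail_all())
    print("writeback got tag:", pool.get_tag("writeback"))
```

In this model the writeback request stays stuck for as long as nothing
calls fail_all(), which is roughly the situation the hung task report
describes.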

In this specific case, it's a writeback; udevd is also
stuck in the same location below...

I'm thinking we might need something similar to Keith's
nvme_start_freeze/nvme_wait_freeze/nvme_unfreeze calls
for fabrics too.. :/
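
A rough sketch of the freeze idea, again as a toy user-space model and
not the kernel implementation: start-freeze stops new requests from
entering the queue, wait-freeze waits until the requests already inside
have drained, and unfreeze lets submitters in again. ToyQueue and its
methods are hypothetical; they only mirror the names of the nvme_* calls
and assume nothing about how a fabrics version would actually look.

```python
class ToyQueue:
    """Hypothetical model of a freezable request queue."""

    def __init__(self):
        self.frozen = False
        self.usage = 0  # requests currently inside the queue

    def enter(self):
        # A submitter may only enter while the queue is not frozen.
        # The real kernel would block the submitter instead of refusing.
        if self.frozen:
            return False
        self.usage += 1
        return True

    def exit(self):
        # Called when a request completes or is failed.
        self.usage -= 1

    def start_freeze(self):
        # Analogue of nvme_start_freeze: stop admitting new requests.
        self.frozen = True

    def drained(self):
        # Analogue of the condition nvme_wait_freeze waits for.
        return self.usage == 0

    def unfreeze(self):
        # Analogue of nvme_unfreeze: admit requests again.
        self.frozen = False


if __name__ == "__main__":
    q = ToyQueue()
    q.enter()                       # one request in flight
    q.start_freeze()                # teardown begins
    print("new I/O admitted:", q.enter())
    q.exit()                        # in-flight request is failed/completed
    print("drained:", q.drained())
    q.unfreeze()                    # reconnect succeeded
    print("new I/O admitted:", q.enter())
```

The point of the model is the ordering: no new request can start waiting
for a tag once the freeze has begun, so teardown only has to deal with
the requests that are already in flight.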

Thread overview: 13+ messages
2017-03-18 22:42 [PATCH 0/3] Introduce fabrics controller loss timeout Sagi Grimberg
2017-03-18 22:42 ` [PATCH 1/3] nvme-rdma: get rid of local reconnect_delay Sagi Grimberg
2017-03-27  9:50   ` Christoph Hellwig
2017-03-18 22:42 ` [PATCH 2/3] nvme-fabrics: Allow ctrl loss timeout configuration Sagi Grimberg
2017-03-27  9:50   ` Christoph Hellwig
2017-04-17 22:29   ` James Smart
2017-04-20 10:20     ` Sagi Grimberg
2017-03-18 22:42 ` [PATCH 3/3] nvme-rdma: Support ctrl_loss_tmo Sagi Grimberg
2017-03-27  9:50   ` Christoph Hellwig
2017-04-25  0:46   ` James Smart
2017-05-03  8:05     ` Sagi Grimberg
2017-03-27  0:41 ` [PATCH 0/3] Introduce fabrics controller loss timeout Yi Zhang
2017-03-28 11:37   ` Sagi Grimberg [this message]
