From: Sagi Grimberg <sagi@grimberg.me>
To: linux-nvme@lists.infradead.org, Christoph Hellwig <hch@lst.de>,
	Keith Busch <kbusch@kernel.org>,
	Chaitanya Kulkarni <Chaitanya.Kulkarni@wdc.com>
Subject: [PATCH 3/3] nvme-rdma: fix possible hang when trying to set a live path during I/O
Date: Mon, 15 Mar 2021 15:27:14 -0700
Message-ID: <20210315222714.378417-4-sagi@grimberg.me>
In-Reply-To: <20210315222714.378417-1-sagi@grimberg.me>

When we tear down a controller we first freeze the queue to prevent
request submissions and quiesce the queue to prevent request queueing,
and we only unfreeze/unquiesce when we successfully reconnect the
controller.

In case we attempt to set a live path (optimized/non-optimized) and
update the current_path reference, we first need to wait for any
ongoing dispatches (synchronize the head->srcu).

However, bio submissions _can_ block while the underlying controller
queue is frozen, which creates the deadlock below [1]. So it is clear
that the namespaces' request queues must be unfrozen and unquiesced asap
when we tear down the controller.

However, when we are not in a multipath environment (!multipath or cmic
indicates the ns isn't shared) we don't want to fail-fast the I/O, hence
we must keep the namespaces' request queues frozen and quiesced and only
unfreeze and unquiesce them when the controller successfully reconnects
(and FAILFAST may fail the I/O sooner).

[1] (happened in nvme-tcp, but works the same in nvme-rdma):
Workqueue: nvme-wq nvme_tcp_reconnect_ctrl_work [nvme_tcp]
Call Trace:
 __schedule+0x293/0x730
 schedule+0x33/0xa0
 schedule_timeout+0x1d3/0x2f0
 wait_for_completion+0xba/0x140
 __synchronize_srcu.part.21+0x91/0xc0
 synchronize_srcu_expedited+0x27/0x30
 synchronize_srcu+0xce/0xe0
 nvme_mpath_set_live+0x64/0x130 [nvme_core]
 nvme_update_ns_ana_state+0x2c/0x30 [nvme_core]
 nvme_update_ana_state+0xcd/0xe0 [nvme_core]
 nvme_parse_ana_log+0xa1/0x180 [nvme_core]
 nvme_read_ana_log+0x76/0x100 [nvme_core]
 nvme_mpath_init+0x122/0x180 [nvme_core]
 nvme_init_identify+0x80e/0xe20 [nvme_core]
 nvme_tcp_setup_ctrl+0x359/0x660 [nvme_tcp]
 nvme_tcp_reconnect_ctrl_work+0x24/0x70 [nvme_tcp]

Fix this by checking the newly introduced nvme_ctrl_is_mpath and
unquiescing/unfreezing the namespaces' request queues accordingly (in
the teardown for mpath and after a successful reconnect for non-mpath).
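
For illustration only, the policy described above can be sketched as a
small user-space model. This is not kernel code and not part of the
patch: struct model_ctrl and the model_* functions are hypothetical
names, the two booleans stand in for the real freeze/quiesce state of
the namespaces' request queues, and the mpath flag stands in for the
result of nvme_ctrl_is_mpath.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, assumption only: all names here are hypothetical. */
struct model_ctrl {
	bool mpath;	/* stands in for nvme_ctrl_is_mpath() */
	bool frozen;	/* namespaces' request queues are frozen */
	bool quiesced;	/* namespaces' request queues are quiesced */
};

/*
 * Teardown always freezes and quiesces first. For a multipath
 * controller the queues are released again right away, so that bio
 * submissions on this path cannot block and can fail over instead.
 */
static void model_teardown_io_queues(struct model_ctrl *c)
{
	c->frozen = true;
	c->quiesced = true;
	if (c->mpath) {
		c->quiesced = false;	/* models nvme_start_queues() */
		c->frozen = false;	/* models nvme_wait_freeze() + nvme_unfreeze() */
	}
}

/*
 * A successful reconnect releases the queues for the non-multipath
 * case; the multipath case was already released in teardown.
 */
static void model_reconnect_io_queues(struct model_ctrl *c)
{
	if (!c->mpath) {
		c->quiesced = false;
		c->frozen = false;
	}
	/* Either way, nothing is held back after a successful reconnect. */
	assert(!c->frozen && !c->quiesced);
}

/* In this model a submission blocks exactly while the queue is frozen. */
static bool model_submit_would_block(const struct model_ctrl *c)
{
	return c->frozen;
}
```

In this model, a submission between teardown and reconnect would block
only for a non-multipath controller, which is the case where we do not
want to fail-fast the I/O. For a multipath controller nothing blocks, so
a waiter such as the synchronize_srcu in trace [1] would not be stuck
behind a frozen queue.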

Also, we no longer need the explicit nvme_start_queues in the error
recovery work.

Fixes: 9f98772ba307 ("nvme-rdma: fix controller reset hang during traffic")
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/nvme/host/rdma.c | 29 +++++++++++++++++------------
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index be905d4fdb47..43e8608ad5b7 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -989,19 +989,22 @@ static int nvme_rdma_configure_io_queues(struct nvme_rdma_ctrl *ctrl, bool new)
 		goto out_cleanup_connect_q;
 
 	if (!new) {
-		nvme_start_queues(&ctrl->ctrl);
-		if (!nvme_wait_freeze_timeout(&ctrl->ctrl, NVME_IO_TIMEOUT)) {
-			/*
-			 * If we timed out waiting for freeze we are likely to
-			 * be stuck.  Fail the controller initialization just
-			 * to be safe.
-			 */
-			ret = -ENODEV;
-			goto out_wait_freeze_timed_out;
+		if (!nvme_ctrl_is_mpath(&ctrl->ctrl)) {
+			nvme_start_queues(&ctrl->ctrl);
+			if (!nvme_wait_freeze_timeout(&ctrl->ctrl, NVME_IO_TIMEOUT)) {
+				/*
+				 * If we timed out waiting for freeze we are likely to
+				 * be stuck.  Fail the controller initialization just
+				 * to be safe.
+				 */
+				ret = -ENODEV;
+				goto out_wait_freeze_timed_out;
+			}
 		}
 		blk_mq_update_nr_hw_queues(ctrl->ctrl.tagset,
 			ctrl->ctrl.queue_count - 1);
-		nvme_unfreeze(&ctrl->ctrl);
+		if (!nvme_ctrl_is_mpath(&ctrl->ctrl))
+			nvme_unfreeze(&ctrl->ctrl);
 	}
 
 	return 0;
@@ -1043,8 +1046,11 @@ static void nvme_rdma_teardown_io_queues(struct nvme_rdma_ctrl *ctrl,
 		nvme_sync_io_queues(&ctrl->ctrl);
 		nvme_rdma_stop_io_queues(ctrl);
 		nvme_cancel_tagset(&ctrl->ctrl);
-		if (remove)
+		if (nvme_ctrl_is_mpath(&ctrl->ctrl)) {
 			nvme_start_queues(&ctrl->ctrl);
+			nvme_wait_freeze(&ctrl->ctrl);
+			nvme_unfreeze(&ctrl->ctrl);
+		}
 		nvme_rdma_destroy_io_queues(ctrl, remove);
 	}
 }
@@ -1191,7 +1197,6 @@ static void nvme_rdma_error_recovery_work(struct work_struct *work)
 
 	nvme_stop_keep_alive(&ctrl->ctrl);
 	nvme_rdma_teardown_io_queues(ctrl, false);
-	nvme_start_queues(&ctrl->ctrl);
 	nvme_rdma_teardown_admin_queue(ctrl, false);
 	blk_mq_unquiesce_queue(ctrl->ctrl.admin_q);
 
-- 
2.27.0


