From: Chao Leng <lengchao@huawei.com>
To: <linux-nvme@lists.infradead.org>
Cc: kbusch@kernel.org, axboe@fb.com, hch@lst.de, lengchao@huawei.com,
	sagi@grimberg.me
Subject: [PATCH] nvme-rdma: fix deadlock when delete ctrl due to reconnect fail
Date: Mon, 27 Jul 2020 16:09:26 +0800	[thread overview]
Message-ID: <20200727080926.30776-1-lengchao@huawei.com> (raw)

A deadlock occurs during a link blink test on NVMe over RoCE. If a
timeout occurs while reconnecting, nvme_rdma_timeout calls
nvme_rdma_teardown_io_queues, which quiesces the I/O queues without
restarting them; the ctrl is then deleted once the number of reconnect
attempts exceeds max_reconnects. If fdisk is run between the moment the
queues are quiesced and the moment the ctrl is deleted, deleting the
ctrl deadlocks in the following call chain:

  nvme_do_delete_ctrl
    -> nvme_remove_namespaces
      -> nvme_ns_remove
        -> blk_cleanup_queue
          -> blk_freeze_queue
            -> blk_mq_freeze_queue_wait

blk_mq_freeze_queue_wait waits until the queue's q_usage_counter drops
to 0, but because the queue is still quiesced, the requests issued by
fdisk are never dispatched, so they never complete and the counter
never reaches 0.
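
The stall can be illustrated with a toy user-space model in C. This is
a sketch under stated assumptions, not kernel code: struct model_queue,
model_submit, model_run_queue and model_freeze_wait are hypothetical
names, usage stands in for q_usage_counter, and a request dispatched
while the ctrl is being deleted is assumed to be failed back at once.
The real blk_mq_freeze_queue_wait waits with no timeout; the model gives
up after a bounded number of rounds so the hang becomes observable.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of one blk-mq queue; not kernel code. */
struct model_queue {
	bool quiesced;  /* true: nothing is dispatched */
	int usage;      /* stands in for q_usage_counter */
	int pending;    /* requests that entered but were not dispatched */
};

/* A request (e.g. from fdisk) enters the queue and takes a reference. */
static void model_submit(struct model_queue *q)
{
	q->usage++;
	q->pending++;
}

/*
 * Run the queue. Assumption: the ctrl is being deleted, so every
 * dispatched request is failed back at once and drops its reference.
 * A quiesced queue dispatches nothing.
 */
static void model_run_queue(struct model_queue *q)
{
	assert(q->usage >= q->pending); /* each pending request holds a ref */
	if (q->quiesced)
		return;
	q->usage -= q->pending;
	q->pending = 0;
}

/*
 * Stand-in for blk_mq_freeze_queue_wait. The real function has no
 * timeout; the model stops after max_rounds so the hang is observable.
 * Returns true if the usage count drained to zero.
 */
static bool model_freeze_wait(struct model_queue *q, int max_rounds)
{
	for (int i = 0; i < max_rounds && q->usage != 0; i++)
		model_run_queue(q);
	return q->usage == 0;
}
```

Starting from a quiesced queue with one submitted request,
model_freeze_wait returns false; clearing quiesced (the effect of
nvme_start_queues in this model) lets the same call return true.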

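Solution: restart the I/O queues in nvme_rdma_teardown_io_queues once
the outstanding requests have been cancelled, regardless of whether the
remove flag is set. This way the queues are also restarted after
nvme_rdma_timeout calls nvme_rdma_teardown_io_queues, and the separate
nvme_start_queues call in nvme_rdma_error_recovery_work becomes
redundant and is removed.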
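
A similarly hedged model contrasts the teardown behaviour before and
after this patch on the timeout path, which passes remove=false.
teardown_before, teardown_after, submit and delete_ctrl_drains are
hypothetical helpers; the assignments to quiesced stand in for
nvme_stop_queues and nvme_start_queues, zeroing inflight stands in for
cancelling the outstanding requests, and a request dispatched during
ctrl deletion is assumed to be failed back immediately.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of one I/O queue; not kernel code. */
struct model_queue {
	bool quiesced;  /* true: the queue dispatches nothing */
	int inflight;   /* requests already sent to the lost controller */
	int waiting;    /* requests that entered but were not dispatched */
};

/* Pre-patch teardown: restart the queue only when remove is true. */
static void teardown_before(struct model_queue *q, bool remove)
{
	q->quiesced = true;           /* stands in for nvme_stop_queues() */
	q->inflight = 0;              /* stands in for cancelling busy requests */
	if (remove)
		q->quiesced = false;  /* stands in for nvme_start_queues() */
}

/* Post-patch teardown: always restart once the requests are cancelled. */
static void teardown_after(struct model_queue *q, bool remove)
{
	(void)remove;                 /* real helper still uses it for queue destruction */
	q->quiesced = true;
	q->inflight = 0;
	q->quiesced = false;
}

/* A request (e.g. from fdisk) enters the queue while reconnecting. */
static void submit(struct model_queue *q)
{
	q->waiting++;
}

/*
 * Controller deletion: the freeze waits for every request to finish.
 * Assumption: once deletion starts, a dispatched request is failed back
 * at once, so an unquiesced queue drains; a quiesced queue never
 * dispatches, so the freeze would wait forever.
 */
static bool delete_ctrl_drains(struct model_queue *q)
{
	assert(q->inflight >= 0 && q->waiting >= 0);
	if (!q->quiesced)
		q->waiting = 0;       /* dispatched and failed back */
	return q->inflight == 0 && q->waiting == 0;
}
```

With teardown_before(q, false), a request submitted during reconnection
stays queued and delete_ctrl_drains returns false; with
teardown_after(q, false) it returns true.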

Signed-off-by: Chao Leng <lengchao@huawei.com>
---
 drivers/nvme/host/rdma.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index f8f856dc0c67..b381e2cde50a 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -989,8 +989,7 @@ static void nvme_rdma_teardown_io_queues(struct nvme_rdma_ctrl *ctrl,
 				nvme_cancel_request, &ctrl->ctrl);
 			blk_mq_tagset_wait_completed_request(ctrl->ctrl.tagset);
 		}
-		if (remove)
-			nvme_start_queues(&ctrl->ctrl);
+		nvme_start_queues(&ctrl->ctrl);
 		nvme_rdma_destroy_io_queues(ctrl, remove);
 	}
 }
@@ -1128,7 +1127,6 @@ static void nvme_rdma_error_recovery_work(struct work_struct *work)
 
 	nvme_stop_keep_alive(&ctrl->ctrl);
 	nvme_rdma_teardown_io_queues(ctrl, false);
-	nvme_start_queues(&ctrl->ctrl);
 	nvme_rdma_teardown_admin_queue(ctrl, false);
 	blk_mq_unquiesce_queue(ctrl->ctrl.admin_q);
 
-- 
2.16.4

