linux-block.vger.kernel.org archive mirror
From: Ming Lei <ming.lei@redhat.com>
To: "jianchao.wang" <jianchao.w.wang@oracle.com>
Cc: Keith Busch <keith.busch@intel.com>, Jens Axboe <axboe@kernel.dk>,
	linux-block@vger.kernel.org,
	James Smart <james.smart@broadcom.com>,
	Christoph Hellwig <hch@lst.de>, Sagi Grimberg <sagi@grimberg.me>,
	linux-nvme@lists.infradead.org,
	Laurence Oberman <loberman@redhat.com>
Subject: Re: [PATCH V5 0/9] nvme: pci: fix & improve timeout handling
Date: Mon, 14 May 2018 20:22:13 +0800
Message-ID: <20180514122211.GB807@ming.t460p>
In-Reply-To: <008cb38d-aa91-6ab7-64d9-417d6c53a1eb@oracle.com>

Hi Jianchao,

On Mon, May 14, 2018 at 06:05:50PM +0800, jianchao.wang wrote:
> Hi ming
> 
> On 05/14/2018 05:38 PM, Ming Lei wrote:
> >> Here is the deadlock scenario.
> >>
> >> nvme_eh_work // EH0
> >>   -> nvme_reset_dev //hold reset_lock
> >>     -> nvme_setup_io_queues
> >>       -> nvme_create_io_queues
> >>         -> nvme_create_queue
> >>           -> set nvmeq->cq_vector
> >>           -> adapter_alloc_cq
> >>           -> adapter_alloc_sq
> >>              irq has not been requested
> >>              io timeout 
> >>                                     nvme_eh_work //EH1
> >>                                       -> nvme_dev_disable
> >>                                         -> quiesce the adminq //----> here !
> >>                                         -> nvme_suspend_queue
> >>                                           print out warning Trying to free already-free IRQ 133
> >>                                         -> nvme_cancel_request // complete the timeout admin request
> >>                                       -> require reset_lock
> >>           -> adapter_delete_cq
> > If the admin IO submitted in adapter_alloc_sq() is timed out,
> > nvme_dev_disable() in EH1 will complete it which is set as REQ_FAILFAST_DRIVER,
> > then adapter_alloc_sq() should return error, and the whole reset in EH0
> > should have been terminated immediately.
> 
> Please refer to the following segment:
> 
> static int nvme_create_queue(struct nvme_queue *nvmeq, int qid)
> {
> 	struct nvme_dev *dev = nvmeq->dev;
> 	int result;
> ...
> 	nvmeq->cq_vector = dev->num_vecs == 1 ? 0 : qid;
> 	result = adapter_alloc_cq(dev, qid, nvmeq);
> 	if (result < 0)
> 		goto release_vector;
> 
> 	result = adapter_alloc_sq(dev, qid, nvmeq);   // if timeout and failed here
> 	if (result < 0)
> 		goto release_cq;
> 
> 	nvme_init_queue(nvmeq, qid);
> 	result = queue_request_irq(nvmeq);
> 	if (result < 0)
> 		goto release_sq;
> 
> 	return result;
> 
>  release_sq:
> 	dev->online_queues--;
> 	adapter_delete_sq(dev, qid);
>  release_cq:                                        // we will be here !
> 	adapter_delete_cq(dev, qid);                // another cq delete admin command will be sent out.
>  release_vector:
> 	nvmeq->cq_vector = -1;
> 	return result;
> }
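
To make the unwind order in the quoted function concrete, here is a
minimal userspace sketch of the same goto ladder. It is not driver code:
toy_do(), toy_create_queue() and the toy_step names are hypothetical
stand-ins, and each "command" is only recorded rather than sent to a
controller. It just shows which cleanup step follows which failure.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the steps of nvme_create_queue(). */
enum toy_step { TOY_NONE, TOY_ALLOC_CQ, TOY_ALLOC_SQ, TOY_REQUEST_IRQ,
		TOY_DELETE_SQ, TOY_DELETE_CQ };

struct toy_trace {
	enum toy_step issued[8];	/* steps in the order they were issued */
	int count;
};

/* Record a step; return -EIO if it is the one chosen to fail. */
int toy_do(struct toy_trace *t, enum toy_step step, enum toy_step fail_at)
{
	t->issued[t->count++] = step;
	return step == fail_at ? -EIO : 0;
}

/* Same control flow as the quoted function, with recording stubs. */
int toy_create_queue(struct toy_trace *t, enum toy_step fail_at)
{
	int result;

	t->count = 0;
	result = toy_do(t, TOY_ALLOC_CQ, fail_at);
	if (result < 0)
		goto release_vector;

	result = toy_do(t, TOY_ALLOC_SQ, fail_at);
	if (result < 0)
		goto release_cq;

	result = toy_do(t, TOY_REQUEST_IRQ, fail_at);
	if (result < 0)
		goto release_sq;

	return result;

release_sq:
	toy_do(t, TOY_DELETE_SQ, TOY_NONE);
release_cq:
	/* A failed SQ allocation still issues one more admin command here. */
	toy_do(t, TOY_DELETE_CQ, TOY_NONE);
release_vector:
	return result;
}
```

Calling toy_create_queue(&t, TOY_ALLOC_SQ) records alloc CQ, alloc SQ,
delete CQ, in that order, which is the extra admin command under
discussion.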

Given that the admin queue has been suspended, all admin IO should have
been terminated immediately, so could you test the following patch?

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index f509d37b2fb8..44e38be259f2 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1515,9 +1515,6 @@ static int nvme_suspend_queue(struct nvme_queue *nvmeq)
 	nvmeq->cq_vector = -1;
 	spin_unlock_irq(&nvmeq->q_lock);
 
-	if (!nvmeq->qid && nvmeq->dev->ctrl.admin_q)
-		blk_mq_quiesce_queue(nvmeq->dev->ctrl.admin_q);
-
 	pci_free_irq(to_pci_dev(nvmeq->dev->dev), vector, nvmeq);
 
 	return 0;
@@ -1741,8 +1738,7 @@ static int nvme_alloc_admin_tags(struct nvme_dev *dev)
 			dev->ctrl.admin_q = NULL;
 			return -ENODEV;
 		}
-	} else
-		blk_mq_unquiesce_queue(dev->ctrl.admin_q);
+	}
 
 	return 0;
 }
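
The intent of the change can be sketched with a toy userspace model.
This is only an illustration under stated assumptions, not the block
layer: toy_admin_queue, toy_suspend() and toy_submit() are hypothetical
names, and the model assumes that a request dispatched to a suspended
queue (negative cq_vector) is rejected at once, while a request
submitted to a quiesced queue is never dispatched at all.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_status {
	TOY_PENDING,	/* held back: no completion and no timeout */
	TOY_FAILED,	/* dispatched and rejected right away */
	TOY_SENT	/* dispatched to a live queue */
};

struct toy_admin_queue {
	bool quiesced;	/* dispatch is held back while this is set */
	int cq_vector;	/* negative once the queue has been suspended */
};

/* Suspend the queue, with or without quiescing it (before/after the patch). */
void toy_suspend(struct toy_admin_queue *q, bool quiesce)
{
	q->cq_vector = -1;
	if (quiesce)
		q->quiesced = true;
}

/* Outcome of submitting one synchronous admin command. */
enum toy_status toy_submit(const struct toy_admin_queue *q)
{
	if (q->quiesced)
		return TOY_PENDING;
	if (q->cq_vector < 0)
		return TOY_FAILED;
	return TOY_SENT;
}
```

In this model, toy_suspend(&q, true) followed by toy_submit(&q) yields
TOY_PENDING, which corresponds to the waiter stuck in blk_execute_rq(),
whereas toy_suspend(&q, false) yields TOY_FAILED so the caller can
unwind.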

> 
> 
> > 
> > I guess the issue should be that nvme_create_io_queues() ignores the failure.
> > 
> > Could you dump the stack trace of EH0 reset task? So that we may see
> > where EH0 reset kthread hangs.
> 
> root@will-ThinkCentre-M910s:/home/will/Desktop# cat /proc/2273/stack 
> [<0>] blk_execute_rq+0xf7/0x150
> [<0>] __nvme_submit_sync_cmd+0x94/0x110
> [<0>] nvme_submit_sync_cmd+0x1b/0x20
> [<0>] adapter_delete_queue+0xad/0xf0
> [<0>] nvme_reset_dev+0x1b67/0x2450
> [<0>] nvme_eh_work+0x19c/0x4b0
> [<0>] process_one_work+0x3ca/0xaa0
> [<0>] worker_thread+0x89/0x6c0
> [<0>] kthread+0x18d/0x1e0
> [<0>] ret_from_fork+0x24/0x30
> [<0>] 0xffffffffffffffff

Even without this patch, the above hang can happen in reset context,
so this issue isn't related to the newly introduced 'reset_lock'.

> root@will-ThinkCentre-M910s:/home/will/Desktop# cat /proc/2275/stack 
> [<0>] nvme_eh_work+0x11a/0x4b0
> [<0>] process_one_work+0x3ca/0xaa0
> [<0>] worker_thread+0x89/0x6c0
> [<0>] kthread+0x18d/0x1e0
> [<0>] ret_from_fork+0x24/0x30
> [<0>] 0xffffffffffffffff
> 
> > 
> >>             -> adapter_delete_queue // submit to the adminq which has been quiesced.
> >>               -> nvme_submit_sync_cmd
> >>                 -> blk_execute_rq
> >>                   -> wait_for_completion_io_timeout
> >>                   hang_check is true, so there is no hung task warning for this context
> >>
> >> EH0 submit cq delete admin command, but it will never be completed or timed out, because the admin request queue has
> >> been quiesced, so the reset_lock cannot be released, and EH1 cannot get reset_lock and make things forward.
> > The nvme_dev_disable() in outer EH(EH1 in above log) will complete all
> > admin command, which won't be retried because it is set as
> > REQ_FAILFAST_DRIVER, so nvme_cancel_request() will complete it in
> > nvme_dev_disable().
> 
> This cq delete admin command is sent out after EH 1 nvme_dev_disable completed and failed the
> previous timeout sq alloc admin command. please refer to the code segment above.

Right, as I mentioned above, this admin command should have been failed
immediately.


Thanks,
Ming
