From: Ming Lei <ming.lei@redhat.com>
To: "jianchao.wang" <jianchao.w.wang@oracle.com>
Cc: Jens Axboe <axboe@kernel.dk>,
linux-block@vger.kernel.org, Sagi Grimberg <sagi@grimberg.me>,
linux-nvme@lists.infradead.org,
Keith Busch <keith.busch@intel.com>,
Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH 1/2] nvme: pci: simplify timeout handling
Date: Fri, 27 Apr 2018 22:57:13 +0800
Message-ID: <20180427145708.GA2767@ming.t460p>
In-Reply-To: <325688af-3ae2-49db-3a59-ef3903adcdf6@oracle.com>

On Fri, Apr 27, 2018 at 09:37:06AM +0800, jianchao.wang wrote:
>
>
> On 04/26/2018 11:57 PM, Ming Lei wrote:
> > Hi Jianchao,
> >
> > On Thu, Apr 26, 2018 at 11:07:56PM +0800, jianchao.wang wrote:
> >> Hi Ming
> >>
> >> Thanks for your wonderful solution. :)
> >>
> >> On 04/26/2018 08:39 PM, Ming Lei wrote:
> >>> +/*
> >>> + * This one is called after queues are quiesced, and no in-flight timeout
> >>> + * and nvme interrupt handling.
> >>> + */
> >>> +static void nvme_pci_cancel_request(struct request *req, void *data,
> >>> +		bool reserved)
> >>> +{
> >>> +	/* make sure timed-out requests are covered too */
> >>> +	if (req->rq_flags & RQF_MQ_TIMEOUT_EXPIRED) {
> >>> +		req->aborted_gstate = 0;
> >>> +		req->rq_flags &= ~RQF_MQ_TIMEOUT_EXPIRED;
> >>> +	}
> >>> +
> >>> +	nvme_cancel_request(req, data, reserved);
> >>> +}
> >>> +
> >>>  static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
> >>>  {
> >>>  	int i;
> >>> @@ -2223,10 +2316,17 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
> >>>  	for (i = dev->ctrl.queue_count - 1; i >= 0; i--)
> >>>  		nvme_suspend_queue(&dev->queues[i]);
> >>>
> >>> +	/*
> >>> +	 * safe to sync timeout after queues are quiesced, then all
> >>> +	 * requests(include the time-out ones) will be canceled.
> >>> +	 */
> >>> +	nvme_sync_queues(&dev->ctrl);
> >>> +	blk_sync_queue(dev->ctrl.admin_q);
> >>> +
> >> Looks like blk_sync_queue cannot drain all the timeout work.
> >>
> >> blk_sync_queue
> >>   -> del_timer_sync
> >>                           blk_mq_timeout_work
> >>                             -> mod_timer
> >>   -> cancel_work_sync
> >> the timeout work may come back again.
> >> we may need to force all the in-flight requests to be timed out with blk_abort_request
> >>
> >
> > blk_abort_request() seems over-kill, we could avoid this race simply by
> > returning EH_NOT_HANDLED if the controller is in-recovery.
> return EH_NOT_HANDLED maybe not enough.
> please consider the following scenario.
>
> nvme_error_handler
>   -> nvme_dev_disable
>     -> blk_sync_queue
>     //timeout comes again due to the
>     //scenario above
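
For reference, the interleaving quoted above can be modeled in user space.
This is only a toy sketch under stated assumptions: the toy_* names are
hypothetical stand-ins rather than kernel APIs, the timeout work is assumed
to be already running when the timer is deleted, and it is assumed to re-arm
the timer the way the quoted mod_timer step suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the queue's timer and timeout work state. */
struct toy_queue {
	bool timer_armed;	/* models the timeout timer being pending */
	bool work_running;	/* models the timeout work being mid-execution */
};

/* Models del_timer_sync(): the timer is not pending once this returns. */
static void toy_del_timer_sync(struct toy_queue *q)
{
	q->timer_armed = false;
}

/*
 * Models cancel_work_sync() for work that is already running: the caller
 * waits for it to finish, and the work (assumed to call mod_timer())
 * re-arms the timer before it completes.
 */
static void toy_cancel_work_sync(struct toy_queue *q)
{
	if (q->work_running) {
		q->timer_armed = true;	/* the work's mod_timer() step */
		q->work_running = false;
	}
}

/* Same ordering as the quoted call chain; returns true if the timer is still armed. */
static bool toy_sync_queue(struct toy_queue *q)
{
	assert(q != NULL);
	toy_del_timer_sync(q);
	toy_cancel_work_sync(q);
	return q->timer_armed;
}
```

In this model the timer is left armed only when the work was running during
the sync; whether the real timeout work re-arms the timer in that window is
exactly the point under discussion here.
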
I may not understand your point. Once blk_sync_queue() returns, the
timer itself is deactivated; meanwhile, the synced .nvme_timeout() only
returns EH_NOT_HANDLED before the deactivation.

That means this timer won't expire any more, so could you explain a bit
why a timeout can come again after blk_sync_queue() returns?
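
The "return EH_NOT_HANDLED if the controller is in recovery" idea can be
sketched roughly as below. This is a user-space illustration only, not the
actual nvme_timeout(): the toy_* types, the in_recovery flag and the
TOY_EH_RESET_TIMER fallback are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the block layer's timeout return codes. */
enum toy_eh_ret {
	TOY_EH_NOT_HANDLED,	/* leave the request alone, do not re-arm */
	TOY_EH_RESET_TIMER,	/* give the request more time */
};

/* Hypothetical controller state; only the recovery flag is modeled. */
struct toy_ctrl {
	bool in_recovery;
};

/*
 * Sketch of a timeout handler: while recovery is in progress it does
 * nothing and leaves the request to the recovery path, which cancels
 * all requests after the queues are quiesced.
 */
static enum toy_eh_ret toy_timeout(const struct toy_ctrl *ctrl)
{
	assert(ctrl != NULL);

	if (ctrl->in_recovery)
		return TOY_EH_NOT_HANDLED;

	/* Assumed fallback: normal handling (abort or reset) is not modeled. */
	return TOY_EH_RESET_TIMER;
}
```

With such a check, a timeout that fires during recovery would neither
complete nor re-arm the request from the handler itself.
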
Thanks,
Ming