From: Keith Busch <kbusch@kernel.org>
To: Hannes Reinecke <hare@suse.de>
Cc: Sagi Grimberg <sagi@grimberg.me>, Christoph Hellwig <hch@lst.de>,
	Jens Axboe <axboe@kernel.dk>, Chao Leng <lengchao@huawei.com>,
	linux-block@vger.kernel.org, linux-nvme@lists.infradead.org
Subject: Re: [PATCH 2/2] nvme-multipath: don't block on blk_queue_enter of the underlying device
Date: Tue, 23 Mar 2021 23:53:30 +0900
Message-ID: <20210323145330.GB21687@redsun51.ssa.fujisawa.hgst.com>
In-Reply-To: <250dc97d-8781-1655-02ca-5171b0bd6e24@suse.de>

On Tue, Mar 23, 2021 at 09:36:47AM +0100, Hannes Reinecke wrote:
> On 3/23/21 8:31 AM, Sagi Grimberg wrote:
> > 
> > > Actually, I had been playing around with marking the entire bio as
> > > 'NOWAIT'; that would avoid the tag stall, too:
> > > 
> > > @@ -313,7 +316,7 @@ blk_qc_t nvme_ns_head_submit_bio(struct bio *bio)
> > >          ns = nvme_find_path(head);
> > >          if (likely(ns)) {
> > >                  bio_set_dev(bio, ns->disk->part0);
> > > -               bio->bi_opf |= REQ_NVME_MPATH;
> > > +               bio->bi_opf |= REQ_NVME_MPATH | REQ_NOWAIT;
> > >                  trace_block_bio_remap(bio, disk_devt(ns->head->disk),
> > >                                        bio->bi_iter.bi_sector);
> > >                  ret = submit_bio_noacct(bio);
> > > 
> > > 
> > > My only worry here is that we might incur spurious failures under
> > > high load; but then this is not necessarily a bad thing.
> > 
> > What? Introducing spurious failures is not OK under any load. What
> > filesystem will take into account that you may have run out of tags?
> 
> Well, it's not actually a spurious failure but rather a spurious failover,
> as we're still in a multipath scenario: bios will still be re-routed to
> other paths, or queued if all paths are out of tags.
> Hence the OS would not see any difference in behaviour.
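
(For reference, the re-routing and queueing Hannes describes is the existing
path-selection/requeue machinery in nvme_ns_head_submit_bio(); a rough sketch
from memory, so the details may not match the current tree exactly:)

        ns = nvme_find_path(head);
        if (likely(ns)) {
                /* remap the bio to the chosen path and submit it there */
                ...
        } else if (nvme_available_path(head)) {
                /* a path may come back: park the bio for nvme_requeue_work() */
                spin_lock_irq(&head->requeue_lock);
                bio_list_add(&head->requeue_list, bio);
                spin_unlock_irq(&head->requeue_lock);
        } else {
                /* no usable path at all: fail the I/O back to the submitter */
                bio->bi_status = BLK_STS_IOERR;
                bio_endio(bio);
        }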

Failover might be overkill. We can run out of tags in a perfectly normal
situation, and simply waiting may be the best option; even scheduling on a
different CPU may be sufficient to get a viable tag, rather than selecting
a different path.
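
As far as I recall, REQ_NOWAIT turns a tag shortage into an immediate
completion instead of a sleep on the tag waitqueue; roughly (sketch from
memory of blk_mq_submit_bio(), not the exact code):

        rq = __blk_mq_alloc_request(&data);
        if (unlikely(!rq)) {
                if (bio->bi_opf & REQ_NOWAIT)
                        /* completes the bio with BLK_STS_AGAIN right away */
                        bio_wouldblock_error(bio);
                goto queue_exit;
        }

So unless something catches that BLK_STS_AGAIN and requeues the bio, a
transient tag shortage ends up looking like an I/O error to the submitter.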

Does it make sense to just abort all allocated tags during a reset and
let the original bio requeue for multipath IO?
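
Something along the lines of what the cancel helpers already do during
teardown, if I remember them correctly (sketch, not a patch):

        /*
         * Abort every outstanding request on the I/O tagset.  Requests whose
         * bios carry REQ_NVME_MPATH should then bounce through
         * nvme_failover_req() and land back on the head's requeue list.
         */
        blk_mq_tagset_busy_iter(ctrl->tagset, nvme_cancel_request, ctrl);
        blk_mq_tagset_wait_completed_request(ctrl->tagset);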

 
> But in the end, we abandoned this attempt, as the crash we've been seeing
> was in bio_endio (due to bi_bdev still pointing to the removed path device):
> 
> [ 6552.155251]  bio_endio+0x74/0x120
> [ 6552.155260]  nvme_ns_head_submit_bio+0x36f/0x3e0 [nvme_core]
> [ 6552.155271]  submit_bio_noacct+0x175/0x490
> [ 6552.155284]  ? nvme_requeue_work+0x5a/0x70 [nvme_core]
> [ 6552.155290]  nvme_requeue_work+0x5a/0x70 [nvme_core]
> [ 6552.155296]  process_one_work+0x1f4/0x3e0
> [ 6552.155299]  worker_thread+0x2d/0x3e0
> [ 6552.155302]  ? process_one_work+0x3e0/0x3e0
> [ 6552.155305]  kthread+0x10d/0x130
> [ 6552.155307]  ? kthread_park+0xa0/0xa0
> [ 6552.155311]  ret_from_fork+0x35/0x40
> 
> So we're not blocked on blk_queue_enter(), and it's a crash, not a deadlock.
> Blocking on blk_queue_enter() certainly plays a part here,
> but it seems not to be the full picture.
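
That trace does look like bio_endio() ran while bi_bdev still referenced the
torn-down path device. If the NOWAIT experiment gets revisited, the failing
bio presumably needs to be pointed back at the head node before anything
completes or requeues it; purely as an illustration (not a proposed fix):

        /*
         * Illustrative only: re-point the bio at the multipath node so a
         * later bio_endio() does not dereference the (possibly already
         * released) path device.
         */
        bio_set_dev(bio, head->disk->part0);
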
> 
> Cheers,
> 
> Hannes
> -- 
> Dr. Hannes Reinecke                Kernel Storage Architect
> hare@suse.de                              +49 911 74053 688
> SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
> HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
