linux-nvme.lists.infradead.org archive mirror
From: Keith Busch <kbusch@kernel.org>
To: Sagi Grimberg <sagi@grimberg.me>
Cc: linux-nvme@lists.infradead.org, Christoph Hellwig <hch@lst.de>,
	Chaitanya Kulkarni <Chaitanya.Kulkarni@wdc.com>
Subject: Re: [PATCH 0/3 rfc] Fix nvme-tcp and nvme-rdma controller reset hangs
Date: Wed, 17 Mar 2021 05:42:04 +0900
Message-ID: <20210316204204.GA23332@redsun51.ssa.fujisawa.hgst.com>
In-Reply-To: <1b2ccda9-5789-e73a-f0c9-2dd40f320203@grimberg.me>

On Tue, Mar 16, 2021 at 01:07:12PM -0700, Sagi Grimberg wrote:
> > The below patches caused a regression in a multipath setup:
> > Fixes: 9f98772ba307 ("nvme-rdma: fix controller reset hang during traffic")
> > Fixes: 2875b0aecabe ("nvme-tcp: fix controller reset hang during traffic")
> > 
> > These patches on their own are correct because they fixed a controller reset
> > regression.
> > 
> > When we reset or tear down a controller, we must freeze and quiesce the namespace
> > request queues to make sure that we safely stop inflight I/O submissions.
> > Freeze is mandatory because if our hctx map changed between reconnects,
> > blk_mq_update_nr_hw_queues will immediately attempt to freeze the queue, and
> > if it still has pending submissions (that are still quiesced) it will hang.
> > This is what the above patches fixed.
> > 
> > However, because we freeze the namespace request queues and only unfreeze them
> > after a successful reconnect, inflight submissions running concurrently can now
> > block while holding the nshead srcu until either we successfully reconnect or
> > ctrl_loss_tmo expires (or the user explicitly disconnects).
> > 
> > This caused a deadlock [1] when a different controller (different path on the
> > same subsystem) became live (i.e. optimized/non-optimized). This is because
> > nvme_mpath_set_live needs to synchronize the nshead srcu before requeueing I/O
> > in order to make sure that current_path is visible to future (re)submissions.
> > However, the srcu read lock is held by a submission that is blocked on a frozen
> > request queue, so we deadlock.
> > 
> > For multipath, we obviously cannot allow that, as we want to fail over I/O as
> > soon as possible. For non-mpath, however, we do not want to fail I/O (at least
> > not until the controller FASTFAIL timeout expires, which is disabled by default).
> > 
> > This means the driver must behave asymmetrically depending on whether
> > multipath is present or not.
> > 
> > [1]:
> > Workqueue: nvme-wq nvme_tcp_reconnect_ctrl_work [nvme_tcp]
> > Call Trace:
> >   __schedule+0x293/0x730
> >   schedule+0x33/0xa0
> >   schedule_timeout+0x1d3/0x2f0
> >   wait_for_completion+0xba/0x140
> >   __synchronize_srcu.part.21+0x91/0xc0
> >   synchronize_srcu_expedited+0x27/0x30
> >   synchronize_srcu+0xce/0xe0
> >   nvme_mpath_set_live+0x64/0x130 [nvme_core]
> >   nvme_update_ns_ana_state+0x2c/0x30 [nvme_core]
> >   nvme_update_ana_state+0xcd/0xe0 [nvme_core]
> >   nvme_parse_ana_log+0xa1/0x180 [nvme_core]
> >   nvme_read_ana_log+0x76/0x100 [nvme_core]
> >   nvme_mpath_init+0x122/0x180 [nvme_core]
> >   nvme_init_identify+0x80e/0xe20 [nvme_core]
> >   nvme_tcp_setup_ctrl+0x359/0x660 [nvme_tcp]
> >   nvme_tcp_reconnect_ctrl_work+0x24/0x70 [nvme_tcp]
> > 
> > 
> > To fix this, we acknowledge that the driver needs different error recovery
> > behavior for mpath and non-mpath controllers, expose that distinction through a
> > new helper, nvme_ctrl_is_mpath, and use it to decide what needs to be done.
> 
> Christoph, Keith,
> 
> Any thoughts on this? The RFC part is getting the transport driver to
> behave differently for mpath vs. non-mpath.
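The dependency cycle described in the cover letter can be made concrete as a wait-for graph. The following is a toy Python model, not kernel code: the node names are hypothetical labels for the reconnect work (which calls nvme_mpath_set_live and therefore synchronize_srcu) and for a submitter that holds the nshead srcu read lock while sleeping on a frozen request queue. A cycle in the graph means neither side can make progress.

```python
from typing import Dict, Set

# Toy model (hypothetical labels, not kernel structures).
# An edge A -> B means "A cannot make progress until B does".
WaitGraph = Dict[str, Set[str]]


def has_cycle(graph: WaitGraph) -> bool:
    """Return True if the wait-for graph contains a cycle, i.e. a deadlock."""
    visiting: Set[str] = set()  # nodes on the current DFS path
    done: Set[str] = set()      # nodes fully explored and known to be cycle-free

    def dfs(node: str) -> bool:
        visiting.add(node)
        for nxt in graph.get(node, set()):
            if nxt in visiting:
                return True  # back edge: we came back to a node on the path
            if nxt not in done and dfs(nxt):
                return True
        visiting.discard(node)
        done.add(node)
        return False

    return any(dfs(n) for n in list(graph) if n not in done)


# reconnect_work: synchronize_srcu() waits for every srcu reader, including
# the submitter. submitter: sleeps on the frozen queue while inside the srcu
# read-side section; the queue is only unfrozen once reconnect_work completes.
frozen_and_blocking: WaitGraph = {
    "reconnect_work": {"submitter"},
    "submitter": {"reconnect_work"},
}

# If the submitter never sleeps on a frozen queue while holding the srcu
# read lock (for example it fails over or requeues instead), that edge disappears.
nonblocking_submit: WaitGraph = {
    "reconnect_work": {"submitter"},
    "submitter": set(),
}

print(has_cycle(frozen_and_blocking))  # True
print(has_cycle(nonblocking_submit))   # False
```

The model only captures who waits on whom; it says nothing about which fix is preferable. It does show why any approach that removes the "submitter waits on the frozen queue while inside the srcu section" edge breaks the cycle.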

Would it work if nvme mpath used the REQ_NOWAIT flag for its submit_bio()
call, and added the bio to the requeue_list if blk_queue_enter() fails? I
think that is another way to resolve the deadlock, but we would need the
block layer to return a failed status to the original caller.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme


Thread overview: 30+ messages
2021-03-15 22:27 [PATCH 0/3 rfc] Fix nvme-tcp and nvme-rdma controller reset hangs Sagi Grimberg
2021-03-15 22:27 ` [PATCH 1/3] nvme: introduce nvme_ctrl_is_mpath helper Sagi Grimberg
2021-03-15 22:27 ` [PATCH 2/3] nvme-tcp: fix possible hang when trying to set a live path during I/O Sagi Grimberg
2021-03-15 22:27 ` [PATCH 3/3] nvme-rdma: " Sagi Grimberg
2021-03-16  3:24 ` [PATCH 0/3 rfc] Fix nvme-tcp and nvme-rdma controller reset hangs Chao Leng
2021-03-16  5:04   ` Sagi Grimberg
2021-03-16  6:18     ` Chao Leng
2021-03-16  6:25       ` Sagi Grimberg
2021-03-16 20:07 ` Sagi Grimberg
2021-03-16 20:42   ` Keith Busch [this message]
2021-03-16 23:51     ` Sagi Grimberg
2021-03-17  2:55       ` Chao Leng
2021-03-17  6:59         ` Christoph Hellwig
2021-03-17  7:59           ` Chao Leng
2021-03-17 18:43             ` Sagi Grimberg
2021-03-18  1:51               ` Chao Leng
2021-03-18  4:45                 ` Christoph Hellwig
2021-03-18 18:46                 ` Sagi Grimberg
2021-03-18 19:16                   ` Keith Busch
2021-03-18 19:31                     ` Sagi Grimberg
2021-03-18 21:52                       ` Keith Busch
2021-03-18 22:45                         ` Sagi Grimberg
2021-03-19 14:05                         ` Christoph Hellwig
2021-03-19 17:28                           ` Christoph Hellwig
2021-03-19 19:07                             ` Keith Busch
2021-03-19 19:34                             ` Sagi Grimberg
2021-03-20  6:11                               ` Christoph Hellwig
2021-03-21  6:49                                 ` Sagi Grimberg
2021-03-22  6:34                                   ` Christoph Hellwig
2021-03-17  8:16           ` Sagi Grimberg
