linux-nvme.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/2] nvme/host/pci: Fix a race in controller removal
@ 2019-09-13  2:44 Balbir Singh
  2019-09-13  2:44 ` [PATCH 2/2] nvme/host/core: Allow overriding of wait_ready timeout Balbir Singh
  2019-09-13 15:01 ` [PATCH 1/2] nvme/host/pci: Fix a race in controller removal Keith Busch
  0 siblings, 2 replies; 9+ messages in thread
From: Balbir Singh @ 2019-09-13  2:44 UTC (permalink / raw)
  To: linux-nvme; +Cc: kbusch, axboe, Balbir Singh, hch, sagi

This race is hard to hit in general, now that we
have the shutdown_lock in both nvme_reset_work() and
nvme_dev_disable()

The real issue is that after doing all the setup work
in nvme_reset_work(), when get another timeout (nvme_timeout()),
then we proceed to disable the controller. This causes
the reset work to only partially progress and then fail.

Depending on the progress made, we call into
nvme_remove_dead_ctrl(), which does another
nvme_dev_disable() freezing the block-mq queues.

I've noticed a race with udevd with udevd trying to re-read
the partition table, it ends up with the bd_mutex held and
it ends up waiting in blk_queue_enter(), since we froze
the queues in nvme_dev_disable(). nvme_kill_queues() calls
revalidate_disk() and ends up waiting on the bd_mutex
resulting in a deadlock.

Allow the hung tasks a chance by unfreezing the queues after
setting dying bits on the queue, then call revalidate_disk()
to update the disk size.

NOTE: I've seen this race when the controller does not
respond to IOs or abort requests, but responds to other
commands and even signals it's ready after its reset,
but still drops IO. I've tested this by emulating the
behaviour in the driver.

Signed-off-by: Balbir Singh <sblbir@amzn.com>
---
 drivers/nvme/host/core.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index b45f82d58be8..45b96c6ac2d5 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -103,10 +103,15 @@ static void nvme_set_queue_dying(struct nvme_ns *ns)
 	 */
 	if (!ns->disk || test_and_set_bit(NVME_NS_DEAD, &ns->flags))
 		return;
-	revalidate_disk(ns->disk);
 	blk_set_queue_dying(ns->queue);
+	/*
+	 * Allow any pending udevd commands to be unblocked
+	 * so that revalidate_disk can then get bd_mutex
+	 */
+	blk_mq_unfreeze_queue(ns->queue);
 	/* Forcibly unquiesce queues to avoid blocking dispatch */
 	blk_mq_unquiesce_queue(ns->queue);
+	revalidate_disk(ns->disk);
 }
 
 static void nvme_queue_scan(struct nvme_ctrl *ctrl)
-- 
2.16.5


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/2] nvme/host/core: Allow overriding of wait_ready timeout
  2019-09-13  2:44 [PATCH 1/2] nvme/host/pci: Fix a race in controller removal Balbir Singh
@ 2019-09-13  2:44 ` Balbir Singh
  2019-09-13 14:55   ` Keith Busch
  2019-09-13 15:01 ` [PATCH 1/2] nvme/host/pci: Fix a race in controller removal Keith Busch
  1 sibling, 1 reply; 9+ messages in thread
From: Balbir Singh @ 2019-09-13  2:44 UTC (permalink / raw)
  To: linux-nvme; +Cc: kbusch, axboe, Balbir Singh, hch, sagi

Largely for debugging purposes where controllers with large
timeouts get stuck during reset.

Signed-off-by: Balbir Singh <sblbir@amzn.com>
---
 drivers/nvme/host/core.c | 8 ++++++++
 drivers/nvme/host/nvme.h | 3 +++
 2 files changed, 11 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 45b96c6ac2d5..fa7982dfe551 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -40,6 +40,11 @@ module_param_named(io_timeout, nvme_io_timeout, uint, 0644);
 MODULE_PARM_DESC(io_timeout, "timeout in seconds for I/O");
 EXPORT_SYMBOL_GPL(nvme_io_timeout);
 
+unsigned int nvme_wait_ready_timeout = 0;
+module_param_named(wait_ready_timeout, nvme_wait_ready_timeout, uint, 0644);
+MODULE_PARM_DESC(wait_ready_timeout, "timeout in seconds for wait ready on reset");
+EXPORT_SYMBOL_GPL(nvme_wait_ready_timeout);
+
 static unsigned char shutdown_timeout = 5;
 module_param(shutdown_timeout, byte, 0644);
 MODULE_PARM_DESC(shutdown_timeout, "timeout in seconds for controller shutdown");
@@ -1937,6 +1942,9 @@ static int nvme_wait_ready(struct nvme_ctrl *ctrl, u64 cap, bool enabled)
 	u32 csts, bit = enabled ? NVME_CSTS_RDY : 0;
 	int ret;
 
+	if (nvme_wait_ready_timeout)
+		timeout = NVME_WAIT_READY_TIMEOUT + jiffies;
+
 	while ((ret = ctrl->ops->reg_read32(ctrl, NVME_REG_CSTS, &csts)) == 0) {
 		if (csts == ~0)
 			return -ENODEV;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index b5013c101b35..c3caabc1f149 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -21,6 +21,9 @@
 extern unsigned int nvme_io_timeout;
 #define NVME_IO_TIMEOUT	(nvme_io_timeout * HZ)
 
+extern unsigned int nvme_wait_ready_timeout;
+#define NVME_WAIT_READY_TIMEOUT	(nvme_wait_ready_timeout * HZ)
+
 extern unsigned int admin_timeout;
 #define ADMIN_TIMEOUT	(admin_timeout * HZ)
 
-- 
2.16.5


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] nvme/host/core: Allow overriding of wait_ready timeout
  2019-09-13  2:44 ` [PATCH 2/2] nvme/host/core: Allow overriding of wait_ready timeout Balbir Singh
@ 2019-09-13 14:55   ` Keith Busch
  2019-09-13 20:47     ` Singh, Balbir
  0 siblings, 1 reply; 9+ messages in thread
From: Keith Busch @ 2019-09-13 14:55 UTC (permalink / raw)
  To: Balbir Singh; +Cc: axboe, hch, linux-nvme, sagi

On Fri, Sep 13, 2019 at 02:44:32AM +0000, Balbir Singh wrote:
> Largely for debugging purposes where controllers with large
> timeouts get stuck during reset.
> 
> Signed-off-by: Balbir Singh <sblbir@amzn.com>
> ---
>  drivers/nvme/host/core.c | 8 ++++++++
>  drivers/nvme/host/nvme.h | 3 +++
>  2 files changed, 11 insertions(+)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 45b96c6ac2d5..fa7982dfe551 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -40,6 +40,11 @@ module_param_named(io_timeout, nvme_io_timeout, uint, 0644);
>  MODULE_PARM_DESC(io_timeout, "timeout in seconds for I/O");
>  EXPORT_SYMBOL_GPL(nvme_io_timeout);
>  
> +unsigned int nvme_wait_ready_timeout = 0;
> +module_param_named(wait_ready_timeout, nvme_wait_ready_timeout, uint, 0644);
> +MODULE_PARM_DESC(wait_ready_timeout, "timeout in seconds for wait ready on reset");
> +EXPORT_SYMBOL_GPL(nvme_wait_ready_timeout);

There's no need to export this symbol.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] nvme/host/pci: Fix a race in controller removal
  2019-09-13  2:44 [PATCH 1/2] nvme/host/pci: Fix a race in controller removal Balbir Singh
  2019-09-13  2:44 ` [PATCH 2/2] nvme/host/core: Allow overriding of wait_ready timeout Balbir Singh
@ 2019-09-13 15:01 ` Keith Busch
  2019-09-13 20:58   ` Singh, Balbir
  1 sibling, 1 reply; 9+ messages in thread
From: Keith Busch @ 2019-09-13 15:01 UTC (permalink / raw)
  To: Balbir Singh; +Cc: axboe, hch, linux-nvme, sagi

On Fri, Sep 13, 2019 at 02:44:31AM +0000, Balbir Singh wrote:
> Allow the hung tasks a chance by unfreezing the queues after
> setting dying bits on the queue, then call revalidate_disk()
> to update the disk size.

Isn't it enough to just set the queue to dying? That should unblock
everything calling blk_queue_enter().

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] nvme/host/core: Allow overriding of wait_ready timeout
  2019-09-13 14:55   ` Keith Busch
@ 2019-09-13 20:47     ` Singh, Balbir
  0 siblings, 0 replies; 9+ messages in thread
From: Singh, Balbir @ 2019-09-13 20:47 UTC (permalink / raw)
  To: kbusch, sblbir; +Cc: axboe, hch, linux-nvme, sagi

On Fri, 2019-09-13 at 08:55 -0600, Keith Busch wrote:
> On Fri, Sep 13, 2019 at 02:44:32AM +0000, Balbir Singh wrote:
> > Largely for debugging purposes where controllers with large
> > timeouts get stuck during reset.
> > 
> > Signed-off-by: Balbir Singh <sblbir@amzn.com>
> > ---
> >  drivers/nvme/host/core.c | 8 ++++++++
> >  drivers/nvme/host/nvme.h | 3 +++
> >  2 files changed, 11 insertions(+)
> > 
> > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > index 45b96c6ac2d5..fa7982dfe551 100644
> > --- a/drivers/nvme/host/core.c
> > +++ b/drivers/nvme/host/core.c
> > @@ -40,6 +40,11 @@ module_param_named(io_timeout, nvme_io_timeout,
> > uint, 0644);
> >  MODULE_PARM_DESC(io_timeout, "timeout in seconds for I/O");
> >  EXPORT_SYMBOL_GPL(nvme_io_timeout);
> >  
> > +unsigned int nvme_wait_ready_timeout = 0;
> > +module_param_named(wait_ready_timeout, nvme_wait_ready_timeout,
> > uint, 0644);
> > +MODULE_PARM_DESC(wait_ready_timeout, "timeout in seconds for wait
> > ready on reset");
> > +EXPORT_SYMBOL_GPL(nvme_wait_ready_timeout);
> 
> There's no need to export this symbol.

Good point, I'll repost without the export

Balbir Singh.
_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] nvme/host/pci: Fix a race in controller removal
  2019-09-13 15:01 ` [PATCH 1/2] nvme/host/pci: Fix a race in controller removal Keith Busch
@ 2019-09-13 20:58   ` Singh, Balbir
  2019-09-13 21:40     ` Bart Van Assche
  0 siblings, 1 reply; 9+ messages in thread
From: Singh, Balbir @ 2019-09-13 20:58 UTC (permalink / raw)
  To: kbusch, sblbir; +Cc: axboe, hch, linux-nvme, sagi

On Fri, 2019-09-13 at 09:01 -0600, Keith Busch wrote:
> On Fri, Sep 13, 2019 at 02:44:31AM +0000, Balbir Singh wrote:
> > Allow the hung tasks a chance by unfreezing the queues after
> > setting dying bits on the queue, then call revalidate_disk()
> > to update the disk size.
> 
> Isn't it enough to just set the queue to dying? That should unblock
> everything calling blk_queue_enter().

The real issue is that by then udevd has already called into blk_queue_enter()
and is waiting on wait_event(q->mq_freeze_wq,...), so we do need to unfreeze
and wake_all waiting on the event.

Balbir Singh.
_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] nvme/host/pci: Fix a race in controller removal
  2019-09-13 20:58   ` Singh, Balbir
@ 2019-09-13 21:40     ` Bart Van Assche
  2019-09-13 22:01       ` Singh, Balbir
  0 siblings, 1 reply; 9+ messages in thread
From: Bart Van Assche @ 2019-09-13 21:40 UTC (permalink / raw)
  To: Singh, Balbir, kbusch, sblbir; +Cc: axboe, hch, linux-nvme, sagi

On 9/13/19 1:58 PM, Singh, Balbir wrote:
> The real issue is that by then udevd has already called into blk_queue_enter()
> and is waiting on wait_event(q->mq_freeze_wq,...), so we do need to unfreeze
> and wake_all waiting on the event.

I don't think that's correct. blk_set_queue_dying() wakes up 
blk_queue_enter() and causes it to return -ENODEV.

Bart.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] nvme/host/pci: Fix a race in controller removal
  2019-09-13 21:40     ` Bart Van Assche
@ 2019-09-13 22:01       ` Singh, Balbir
  2019-09-13 22:46         ` Singh, Balbir
  0 siblings, 1 reply; 9+ messages in thread
From: Singh, Balbir @ 2019-09-13 22:01 UTC (permalink / raw)
  To: kbusch, bvanassche, sblbir; +Cc: axboe, hch, linux-nvme, sagi

On Fri, 2019-09-13 at 14:40 -0700, Bart Van Assche wrote:
> On 9/13/19 1:58 PM, Singh, Balbir wrote:
> > The real issue is that by then udevd has already called into blk_queue_enter()
> > and is waiting on wait_event(q->mq_freeze_wq,...), so we do need to unfreeze
> > and wake_all waiting on the event.
> 
> I don't think that's correct. blk_set_queue_dying() wakes up 
> blk_queue_enter() and causes it to return -ENODEV.
> 

Fair enough.. I presume looking at the name, it was twiddling
bits. I am going to test with just the re-ordering and check.

> Bart.
_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] nvme/host/pci: Fix a race in controller removal
  2019-09-13 22:01       ` Singh, Balbir
@ 2019-09-13 22:46         ` Singh, Balbir
  0 siblings, 0 replies; 9+ messages in thread
From: Singh, Balbir @ 2019-09-13 22:46 UTC (permalink / raw)
  To: kbusch, bvanassche, sblbir; +Cc: axboe, hch, linux-nvme, sagi

On Sat, 2019-09-14 at 08:01 +1000, Balbir Singh wrote:
> On Fri, 2019-09-13 at 14:40 -0700, Bart Van Assche wrote:
> > On 9/13/19 1:58 PM, Singh, Balbir wrote:
> > > The real issue is that by then udevd has already called into
> > > blk_queue_enter()
> > > and is waiting on wait_event(q->mq_freeze_wq,...), so we do need
> > > to unfreeze
> > > and wake_all waiting on the event.
> > 
> > I don't think that's correct. blk_set_queue_dying() wakes up 
> > blk_queue_enter() and causes it to return -ENODEV.
> > 
> 
> Fair enough.. I presume looking at the name, it was twiddling
> bits. I am going to test with just the re-ordering and check.
> 

Upon further check, blk_set_queue_dying() tries to freeze the
queue again, but then it does do an unconditional wakeup.
I can confirm that just re-ordering seems to help and
blk_set_queue_dying() seems to have the right bits.

Respinning for comments, thanks for the reviews Bart
and Keith

Balbir Singh.

> > Bart.
_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2019-09-13 22:46 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-13  2:44 [PATCH 1/2] nvme/host/pci: Fix a race in controller removal Balbir Singh
2019-09-13  2:44 ` [PATCH 2/2] nvme/host/core: Allow overriding of wait_ready timeout Balbir Singh
2019-09-13 14:55   ` Keith Busch
2019-09-13 20:47     ` Singh, Balbir
2019-09-13 15:01 ` [PATCH 1/2] nvme/host/pci: Fix a race in controller removal Keith Busch
2019-09-13 20:58   ` Singh, Balbir
2019-09-13 21:40     ` Bart Van Assche
2019-09-13 22:01       ` Singh, Balbir
2019-09-13 22:46         ` Singh, Balbir

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).