Linux-NVME Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH] nvme-pci: Shutdown when removing dead controller
@ 2019-10-03 19:13 Tyler Ramer
  2019-10-04 15:36 ` Tyler Ramer
  2019-10-07 22:11 ` [PATCH] " Singh, Balbir
  0 siblings, 2 replies; 12+ messages in thread
From: Tyler Ramer @ 2019-10-03 19:13 UTC (permalink / raw)
  To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
	linux-nvme, linux-kernel

Always shutdown the controller when nvme_remove_dead_controller is
reached.

It's possible for nvme_remove_dead_controller to be called as part of a
failed reset, when there is a bad NVME_CSTS. The controller won't
be comming back online, so we should shut it down rather than just
disabling.

Signed-off-by: Tyler Ramer <tyaramer@gmail.com>
---
 drivers/nvme/host/pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index c0808f9eb8ab..c3f5ba22c625 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2509,7 +2509,7 @@ static void nvme_pci_free_ctrl(struct nvme_ctrl *ctrl)
 static void nvme_remove_dead_ctrl(struct nvme_dev *dev)
 {
 	nvme_get_ctrl(&dev->ctrl);
-	nvme_dev_disable(dev, false);
+	nvme_dev_disable(dev, true);
 	nvme_kill_queues(&dev->ctrl);
 	if (!queue_work(nvme_wq, &dev->remove_work))
 		nvme_put_ctrl(&dev->ctrl);
-- 
2.23.0


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] nvme-pci: Shutdown when removing dead controller
  2019-10-03 19:13 [PATCH] nvme-pci: Shutdown when removing dead controller Tyler Ramer
@ 2019-10-04 15:36 ` Tyler Ramer
  2019-10-05  2:07   ` Singh, Balbir
  2019-10-06 19:21   ` Keith Busch
  2019-10-07 22:11 ` [PATCH] " Singh, Balbir
  1 sibling, 2 replies; 12+ messages in thread
From: Tyler Ramer @ 2019-10-04 15:36 UTC (permalink / raw)
  To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
	linux-nvme, linux-kernel

Here's a failure we had which represents the issue the patch is
intended to solve:

Aug 26 15:00:56 testhost kernel: nvme nvme4: async event result 00010300
Aug 26 15:01:27 testhost kernel: nvme nvme4: controller is down; will
reset: CSTS=0x3, PCI_STATUS=0x10
Aug 26 15:02:10 testhost kernel: nvme nvme4: Device not ready; aborting reset
Aug 26 15:02:10 testhost kernel: nvme nvme4: Removing after probe
failure status: -19

The CSTS warnings comes from nvme_timeout, and is printed by
nvme_warn_reset. A reset then occurs
Controller state should be NVME_CTRL_RESETTING

Now, in nvme_reset_work, controller is never marked "CONNECTING"  at:

     if (!nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_CONNECTING))

because several lines above, we can determine that
nvme_pci_configure_admin_queues returns
a bad result, which triggers a goto out_unlock and prints "removing
after probe failure status: -19"

Because state is never changed to NVME_CTRL_CONNECTING or
NVME_CTRL_DELETING, the
logic added in https://github.com/torvalds/linux/commit/2036f7263d70e67d70a67899a468588cb7356bc9
should not apply. We can further validate that dev->ctrl.state ==
NVME_CTRL_RESETTING thanks to
the WARN_ON in nvme_reset_work.






On Thu, Oct 3, 2019 at 3:13 PM Tyler Ramer <tyaramer@gmail.com> wrote:
>
> Always shutdown the controller when nvme_remove_dead_controller is
> reached.
>
> It's possible for nvme_remove_dead_controller to be called as part of a
> failed reset, when there is a bad NVME_CSTS. The controller won't
> be comming back online, so we should shut it down rather than just
> disabling.
>
> Signed-off-by: Tyler Ramer <tyaramer@gmail.com>
> ---
>  drivers/nvme/host/pci.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index c0808f9eb8ab..c3f5ba22c625 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -2509,7 +2509,7 @@ static void nvme_pci_free_ctrl(struct nvme_ctrl *ctrl)
>  static void nvme_remove_dead_ctrl(struct nvme_dev *dev)
>  {
>         nvme_get_ctrl(&dev->ctrl);
> -       nvme_dev_disable(dev, false);
> +       nvme_dev_disable(dev, true);
>         nvme_kill_queues(&dev->ctrl);
>         if (!queue_work(nvme_wq, &dev->remove_work))
>                 nvme_put_ctrl(&dev->ctrl);
> --
> 2.23.0
>

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] nvme-pci: Shutdown when removing dead controller
  2019-10-04 15:36 ` Tyler Ramer
@ 2019-10-05  2:07   ` Singh, Balbir
  2019-10-05 21:58     ` Tyler Ramer
  2019-10-06 19:21   ` Keith Busch
  1 sibling, 1 reply; 12+ messages in thread
From: Singh, Balbir @ 2019-10-05  2:07 UTC (permalink / raw)
  To: kbusch, hch, linux-kernel, linux-nvme, tyaramer, axboe, sagi

On Fri, 2019-10-04 at 11:36 -0400, Tyler Ramer wrote:
> Here's a failure we had which represents the issue the patch is
> intended to solve:
> 
> Aug 26 15:00:56 testhost kernel: nvme nvme4: async event result 00010300
> Aug 26 15:01:27 testhost kernel: nvme nvme4: controller is down; will
> reset: CSTS=0x3, PCI_STATUS=0x10
> Aug 26 15:02:10 testhost kernel: nvme nvme4: Device not ready; aborting
> reset
> Aug 26 15:02:10 testhost kernel: nvme nvme4: Removing after probe
> failure status: -19
> 
> The CSTS warnings comes from nvme_timeout, and is printed by
> nvme_warn_reset. A reset then occurs
> Controller state should be NVME_CTRL_RESETTING
> 
> Now, in nvme_reset_work, controller is never marked "CONNECTING"  at:
> 
>      if (!nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_CONNECTING))
> 
> because several lines above, we can determine that
> nvme_pci_configure_admin_queues returns
> a bad result, which triggers a goto out_unlock and prints "removing
> after probe failure status: -19"
> 
> Because state is never changed to NVME_CTRL_CONNECTING or
> NVME_CTRL_DELETING, the
> logic added in 
> https://github.com/torvalds/linux/commit/2036f7263d70e67d70a67899a468588cb7356bc9
> should not apply. We can further validate that dev->ctrl.state ==
> NVME_CTRL_RESETTING thanks to
> the WARN_ON in nvme_reset_work.
> 
> 
> 
> 
> 
> 
> On Thu, Oct 3, 2019 at 3:13 PM Tyler Ramer <tyaramer@gmail.com> wrote:
> > 
> > Always shutdown the controller when nvme_remove_dead_controller is
> > reached.
> > 
> > It's possible for nvme_remove_dead_controller to be called as part of a
> > failed reset, when there is a bad NVME_CSTS. The controller won't
> > be comming back online, so we should shut it down rather than just
> > disabling.
> > 

What is the bad CSTS bit? CSTS.RDY? The entire reset/disable race is quite
tricky in general, it was made better with the shutdown_lock, but if the
timeout value is small, several of them can occur in the middle of a reset.

For this patch

Acked-by: Balbir Singh <sblbir@amzn.com>

> > Signed-off-by: Tyler Ramer <tyaramer@gmail.com>
> > ---
> >  drivers/nvme/host/pci.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> > index c0808f9eb8ab..c3f5ba22c625 100644
> > --- a/drivers/nvme/host/pci.c
> > +++ b/drivers/nvme/host/pci.c
> > @@ -2509,7 +2509,7 @@ static void nvme_pci_free_ctrl(struct nvme_ctrl
> > *ctrl)
> >  static void nvme_remove_dead_ctrl(struct nvme_dev *dev)
> >  {
> >         nvme_get_ctrl(&dev->ctrl);
> > -       nvme_dev_disable(dev, false);
> > +       nvme_dev_disable(dev, true);




> >         nvme_kill_queues(&dev->ctrl);
> >         if (!queue_work(nvme_wq, &dev->remove_work))
> >                 nvme_put_ctrl(&dev->ctrl);
> > --
> > 2.23.0
> > 
> 
> _______________________________________________
> Linux-nvme mailing list
> Linux-nvme@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-nvme
_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] nvme-pci: Shutdown when removing dead controller
  2019-10-05  2:07   ` Singh, Balbir
@ 2019-10-05 21:58     ` Tyler Ramer
  0 siblings, 0 replies; 12+ messages in thread
From: Tyler Ramer @ 2019-10-05 21:58 UTC (permalink / raw)
  To: Singh, Balbir; +Cc: sagi, linux-kernel, linux-nvme, axboe, kbusch, hch

> What is the bad CSTS bit? CSTS.RDY?

The reset will be triggered by the result of nvme_should_reset():

1196 static bool nvme_should_reset(struct nvme_dev *dev, u32 csts)
1197 {
1198
1199 ⇥       /* If true, indicates loss of adapter communication, possibly by a
1200 ⇥        * NVMe Subsystem reset.
1201 ⇥        */
1202 ⇥       bool nssro = dev->subsystem && (csts & NVME_CSTS_NSSRO);

This csts value is set in nvme_timeout:

1240 static enum blk_eh_timer_return nvme_timeout(struct request *req,
bool reserved)
1241 {
...
1247 ⇥       u32 csts = readl(dev->bar + NVME_REG_CSTS);
...
1256 ⇥       /*
1257 ⇥        * Reset immediately if the controller is failed
1258 ⇥        */
1259 ⇥       if (nvme_should_reset(dev, csts)) {
1260 ⇥       ⇥       nvme_warn_reset(dev, csts);
1261 ⇥       ⇥       nvme_dev_disable(dev, false);
1262 ⇥       ⇥       nvme_reset_ctrl(&dev->ctrl);


Again, here's the message printed by nvme_warn_reset:

Aug 26 15:01:27 testhost kernel: nvme nvme4: controller is down; will
reset: CSTS=0x3, PCI_STATUS=0x10

From  include/linux/nvme.h:
 105 ⇥       NVME_REG_CSTS⇥  = 0x001c,⇥      /* Controller Status */

- Tyler

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] nvme-pci: Shutdown when removing dead controller
  2019-10-04 15:36 ` Tyler Ramer
  2019-10-05  2:07   ` Singh, Balbir
@ 2019-10-06 19:21   ` Keith Busch
  2019-10-07 15:13     ` Tyler Ramer
  1 sibling, 1 reply; 12+ messages in thread
From: Keith Busch @ 2019-10-06 19:21 UTC (permalink / raw)
  To: Tyler Ramer
  Cc: Jens Axboe, linux-kernel, Christoph Hellwig, linux-nvme, Sagi Grimberg

On Fri, Oct 04, 2019 at 11:36:42AM -0400, Tyler Ramer wrote:
> Here's a failure we had which represents the issue the patch is
> intended to solve:
> 
> Aug 26 15:00:56 testhost kernel: nvme nvme4: async event result 00010300
> Aug 26 15:01:27 testhost kernel: nvme nvme4: controller is down; will
> reset: CSTS=0x3, PCI_STATUS=0x10
> Aug 26 15:02:10 testhost kernel: nvme nvme4: Device not ready; aborting reset
> Aug 26 15:02:10 testhost kernel: nvme nvme4: Removing after probe
> failure status: -19
> 
> The CSTS warnings comes from nvme_timeout, and is printed by
> nvme_warn_reset. A reset then occurs
> Controller state should be NVME_CTRL_RESETTING
> 
> Now, in nvme_reset_work, controller is never marked "CONNECTING"  at:
> 
>      if (!nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_CONNECTING))
> 
> because several lines above, we can determine that
> nvme_pci_configure_admin_queues returns
> a bad result, which triggers a goto out_unlock and prints "removing
> after probe failure status: -19"
> 
> Because state is never changed to NVME_CTRL_CONNECTING or
> NVME_CTRL_DELETING, the
> logic added in https://github.com/torvalds/linux/commit/2036f7263d70e67d70a67899a468588cb7356bc9
> should not apply. 

Nor should it, because there are no IO in flight at this point, there
can't be any timeout work to check the state.

> We can further validate that dev->ctrl.state ==
> NVME_CTRL_RESETTING thanks to
> the WARN_ON in nvme_reset_work.

I'm not sure I see what this is fixing. Setting the shutdown to true is
usually just to get the queues flushed, but the nvme_kill_queues() that
we call accomplishes the same thing.
 
> On Thu, Oct 3, 2019 at 3:13 PM Tyler Ramer <tyaramer@gmail.com> wrote:
> >
> > Always shutdown the controller when nvme_remove_dead_controller is
> > reached.
> >
> > It's possible for nvme_remove_dead_controller to be called as part of a
> > failed reset, when there is a bad NVME_CSTS. The controller won't
> > be comming back online, so we should shut it down rather than just
> > disabling.
> >
> > Signed-off-by: Tyler Ramer <tyaramer@gmail.com>
> > ---
> >  drivers/nvme/host/pci.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> > index c0808f9eb8ab..c3f5ba22c625 100644
> > --- a/drivers/nvme/host/pci.c
> > +++ b/drivers/nvme/host/pci.c
> > @@ -2509,7 +2509,7 @@ static void nvme_pci_free_ctrl(struct nvme_ctrl *ctrl)
> >  static void nvme_remove_dead_ctrl(struct nvme_dev *dev)
> >  {
> >         nvme_get_ctrl(&dev->ctrl);
> > -       nvme_dev_disable(dev, false);
> > +       nvme_dev_disable(dev, true);
> >         nvme_kill_queues(&dev->ctrl);
> >         if (!queue_work(nvme_wq, &dev->remove_work))
> >                 nvme_put_ctrl(&dev->ctrl);
> > --
> > 2.23.0
> >

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] nvme-pci: Shutdown when removing dead controller
  2019-10-06 19:21   ` Keith Busch
@ 2019-10-07 15:13     ` Tyler Ramer
  2019-10-07 15:44       ` Keith Busch
  0 siblings, 1 reply; 12+ messages in thread
From: Tyler Ramer @ 2019-10-07 15:13 UTC (permalink / raw)
  To: Keith Busch
  Cc: Jens Axboe, linux-kernel, Christoph Hellwig, linux-nvme, Sagi Grimberg

> Setting the shutdown to true is
> usually just to get the queues flushed, but the nvme_kill_queues() that
> we call accomplishes the same thing.

The intention of this patch was to clean up another location where
nvme_dev_disable()
is called with shutdown == false, but the device is being removed due
to a failure
condition, so it should be shutdown.

Perhaps though, given nvme_kill_queues() provides a subset of the
functionality of
nvme_dev_disable() with shutdown == true, we can just use
nvme_dev_disable() and
remove nvme_kill_queues()?

This will make nvme_remove_dead_ctrl() more in line with nvme_remove(),
nvme_shutdown(), etc.

- Tyler

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] nvme-pci: Shutdown when removing dead controller
  2019-10-07 15:13     ` Tyler Ramer
@ 2019-10-07 15:44       ` Keith Busch
  2019-10-07 17:50         ` [PATCH v2] " Tyler Ramer
  0 siblings, 1 reply; 12+ messages in thread
From: Keith Busch @ 2019-10-07 15:44 UTC (permalink / raw)
  To: Tyler Ramer
  Cc: Jens Axboe, linux-kernel, Christoph Hellwig, linux-nvme, Sagi Grimberg

On Mon, Oct 07, 2019 at 11:13:12AM -0400, Tyler Ramer wrote:
> > Setting the shutdown to true is
> > usually just to get the queues flushed, but the nvme_kill_queues() that
> > we call accomplishes the same thing.
> 
> The intention of this patch was to clean up another location where
> nvme_dev_disable()
> is called with shutdown == false, but the device is being removed due
> to a failure
> condition, so it should be shutdown.
> 
> Perhaps though, given nvme_kill_queues() provides a subset of the
> functionality of
> nvme_dev_disable() with shutdown == true, we can just use
> nvme_dev_disable() and
> remove nvme_kill_queues()?
> 
> This will make nvme_remove_dead_ctrl() more in line with nvme_remove(),
> nvme_shutdown(), etc.

It's fine to use the shutdown == true in this path as well, but I just wanted
to understand what problem it was fixing. It doesn't sound like your scenario
is going to end up setting CC.SHN, so the only thing the shutdown should
accomplish is flushing pending IO, but we already call kill_queues() right
after the nvme_dev_disable(), so that part should be okay.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v2] nvme-pci: Shutdown when removing dead controller
  2019-10-07 15:44       ` Keith Busch
@ 2019-10-07 17:50         ` " Tyler Ramer
  2019-10-07 18:28           ` Keith Busch
  0 siblings, 1 reply; 12+ messages in thread
From: Tyler Ramer @ 2019-10-07 17:50 UTC (permalink / raw)
  To: tyaramer
  Cc: Sagi Grimberg, linux-kernel, linux-nvme, Jens Axboe, Keith Busch,
	Christoph Hellwig

Shutdown the controller when nvme_remove_dead_controller is
reached.

If nvme_remove_dead_controller() is called, the controller won't
be comming back online, so we should shut it down rather than just
disabling.

Remove nvme_kill_queues() as nvme_dev_remove() will take care of
unquiescing queues.

Signed-off-by: Tyler Ramer <tyaramer@gmail.com>

---

Changes since v1:
    * Clean up commit message
    * Remove nvme_kill_queues()
---
 drivers/nvme/host/pci.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index c0808f9eb8ab..68d5fb880d80 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2509,8 +2509,7 @@ static void nvme_pci_free_ctrl(struct nvme_ctrl *ctrl)
 static void nvme_remove_dead_ctrl(struct nvme_dev *dev)
 {
 	nvme_get_ctrl(&dev->ctrl);
-	nvme_dev_disable(dev, false);
-	nvme_kill_queues(&dev->ctrl);
+	nvme_dev_disable(dev, true);
 	if (!queue_work(nvme_wq, &dev->remove_work))
 		nvme_put_ctrl(&dev->ctrl);
 }
-- 
2.23.0


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] nvme-pci: Shutdown when removing dead controller
  2019-10-07 17:50         ` [PATCH v2] " Tyler Ramer
@ 2019-10-07 18:28           ` Keith Busch
  2019-10-07 19:32             ` Tyler Ramer
  0 siblings, 1 reply; 12+ messages in thread
From: Keith Busch @ 2019-10-07 18:28 UTC (permalink / raw)
  To: Tyler Ramer
  Cc: Jens Axboe, linux-kernel, Christoph Hellwig, linux-nvme, Sagi Grimberg

On Mon, Oct 07, 2019 at 01:50:11PM -0400, Tyler Ramer wrote:
> Shutdown the controller when nvme_remove_dead_controller is
> reached.
> 
> If nvme_remove_dead_controller() is called, the controller won't
> be comming back online, so we should shut it down rather than just
> disabling.
> 
> Remove nvme_kill_queues() as nvme_dev_remove() will take care of
> unquiescing queues.


We do still need to kill the queues, though. The shutdown == true just flushes
all pending requests. Killing queues does that too, but it also sets the
request_queue to dying, which will terminate syncing any dirty pages.
 
> ---
> 
> Changes since v1:
>     * Clean up commit message
>     * Remove nvme_kill_queues()
> ---
>  drivers/nvme/host/pci.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index c0808f9eb8ab..68d5fb880d80 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -2509,8 +2509,7 @@ static void nvme_pci_free_ctrl(struct nvme_ctrl *ctrl)
>  static void nvme_remove_dead_ctrl(struct nvme_dev *dev)
>  {
>  	nvme_get_ctrl(&dev->ctrl);
> -	nvme_dev_disable(dev, false);
> -	nvme_kill_queues(&dev->ctrl);
> +	nvme_dev_disable(dev, true);
>  	if (!queue_work(nvme_wq, &dev->remove_work))
>  		nvme_put_ctrl(&dev->ctrl);
>  }
> -- 
> 2.23.0
> 

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] nvme-pci: Shutdown when removing dead controller
  2019-10-07 18:28           ` Keith Busch
@ 2019-10-07 19:32             ` Tyler Ramer
  0 siblings, 0 replies; 12+ messages in thread
From: Tyler Ramer @ 2019-10-07 19:32 UTC (permalink / raw)
  To: Keith Busch
  Cc: Jens Axboe, linux-kernel, Christoph Hellwig, linux-nvme, Sagi Grimberg

Keith,

Thanks for clarifying. I appreciate the comments.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] nvme-pci: Shutdown when removing dead controller
  2019-10-03 19:13 [PATCH] nvme-pci: Shutdown when removing dead controller Tyler Ramer
  2019-10-04 15:36 ` Tyler Ramer
@ 2019-10-07 22:11 ` " Singh, Balbir
  2019-10-11 14:28   ` [PATCH v3] Always shutdown the controller when nvme_remove_dead_ctrl is reached Tyler Ramer
  1 sibling, 1 reply; 12+ messages in thread
From: Singh, Balbir @ 2019-10-07 22:11 UTC (permalink / raw)
  To: kbusch, hch, linux-kernel, linux-nvme, tyaramer, axboe, sagi

On Thu, 2019-10-03 at 15:13 -0400, Tyler Ramer wrote:
> Always shutdown the controller when nvme_remove_dead_controller is
> reached.
> 
> It's possible for nvme_remove_dead_controller to be called as part of a
> failed reset, when there is a bad NVME_CSTS. The controller won't
> be comming back online, so we should shut it down rather than just
> disabling.
> 

I would add that nvme_timeout() would go through the nvme_should_reset() path
where we don't shutdown the device during nvme_dev_disable, it makes sense to
do a full shutdown during nvme_remove_deal_ctrl work() when reset fails.



> Signed-off-by: Tyler Ramer <tyaramer@gmail.com>
> ---

Reviewed-by: Balbir Singh <sblbir@amazon.com>

>  drivers/nvme/host/pci.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index c0808f9eb8ab..c3f5ba22c625 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -2509,7 +2509,7 @@ static void nvme_pci_free_ctrl(struct nvme_ctrl *ctrl)
>  static void nvme_remove_dead_ctrl(struct nvme_dev *dev)
>  {
>  	nvme_get_ctrl(&dev->ctrl);
> -	nvme_dev_disable(dev, false);
> +	nvme_dev_disable(dev, true);
>  	nvme_kill_queues(&dev->ctrl);
>  	if (!queue_work(nvme_wq, &dev->remove_work))
>  		nvme_put_ctrl(&dev->ctrl);
_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v3] Always shutdown the controller when nvme_remove_dead_ctrl is reached.
  2019-10-07 22:11 ` [PATCH] " Singh, Balbir
@ 2019-10-11 14:28   ` Tyler Ramer
  0 siblings, 0 replies; 12+ messages in thread
From: Tyler Ramer @ 2019-10-11 14:28 UTC (permalink / raw)
  Cc: Sagi Grimberg, linux-kernel, linux-nvme, Jens Axboe, tyaramer,
	Keith Busch, Balbir Singh, Christoph Hellwig

nvme_timeout() will go through nvme_should_reset() path which won't
shutdown a device during nvme_dev_disable(). If the reset fails, then
the controller is bad and won't be coming back online, so it makes sense
to explicitly call a full shutdown during nvme_remove_dead_ctrl().

Signed-off-by: Tyler Ramer <tyaramer@gmail.com>
Reviewed-by: Balbir Singh <sblbir@amazon.com>

---
Changes since v2:
  * Clean up commit message with comment from Balbir
  * Still call nvme_kill_queues()
---
 drivers/nvme/host/pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index c0808f9eb8ab..c3f5ba22c625 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2509,7 +2509,7 @@ static void nvme_pci_free_ctrl(struct nvme_ctrl *ctrl)
 static void nvme_remove_dead_ctrl(struct nvme_dev *dev)
 {
 	nvme_get_ctrl(&dev->ctrl);
-	nvme_dev_disable(dev, false);
+	nvme_dev_disable(dev, true);
 	nvme_kill_queues(&dev->ctrl);
 	if (!queue_work(nvme_wq, &dev->remove_work))
 		nvme_put_ctrl(&dev->ctrl);
-- 
2.23.0


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, back to index

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-03 19:13 [PATCH] nvme-pci: Shutdown when removing dead controller Tyler Ramer
2019-10-04 15:36 ` Tyler Ramer
2019-10-05  2:07   ` Singh, Balbir
2019-10-05 21:58     ` Tyler Ramer
2019-10-06 19:21   ` Keith Busch
2019-10-07 15:13     ` Tyler Ramer
2019-10-07 15:44       ` Keith Busch
2019-10-07 17:50         ` [PATCH v2] " Tyler Ramer
2019-10-07 18:28           ` Keith Busch
2019-10-07 19:32             ` Tyler Ramer
2019-10-07 22:11 ` [PATCH] " Singh, Balbir
2019-10-11 14:28   ` [PATCH v3] Always shutdown the controller when nvme_remove_dead_ctrl is reached Tyler Ramer

Linux-NVME Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-nvme/0 linux-nvme/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-nvme linux-nvme/ https://lore.kernel.org/linux-nvme \
		linux-nvme@lists.infradead.org
	public-inbox-index linux-nvme

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.infradead.lists.linux-nvme


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git