All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCHv2] nvme/pci: Fix hot removal during error handling
@ 2018-10-05 15:09 Keith Busch
  2018-10-05 18:14 ` Sagi Grimberg
  0 siblings, 1 reply; 5+ messages in thread
From: Keith Busch @ 2018-10-05 15:09 UTC (permalink / raw)


A removal waits for the reset_work to complete. If a surprise removal
occurs around the same time as an error triggered controller reset,
and reset work happened to dispatch a command to the removed controller,
the command won't be recovered since the timeout work doesn't do
anything during error recovery.

This patch fixes this by removing the admin queue prior to syncing reset.

Signed-off-by: Keith Busch <keith.busch at intel.com>
---

v1 -> v2:

  This is simpler than the previous version. We only need to move
  syncing with the work after the admin queue has been successfully
  removed.

 drivers/nvme/host/pci.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index d668682f91df..9d8b0c49f8f6 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2564,16 +2564,15 @@ static void nvme_remove(struct pci_dev *pdev)
 	struct nvme_dev *dev = pci_get_drvdata(pdev);
 
 	nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_DELETING);
-
-	cancel_work_sync(&dev->ctrl.reset_work);
 	pci_set_drvdata(pdev, NULL);
 
 	if (!pci_device_is_present(pdev)) {
 		nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_DEAD);
 		nvme_dev_disable(dev, true);
+		nvme_dev_remove_admin(dev);
 	}
 
-	flush_work(&dev->ctrl.reset_work);
+	cancel_work_sync(&dev->ctrl.reset_work);
 	nvme_stop_ctrl(&dev->ctrl);
 	nvme_remove_namespaces(&dev->ctrl);
 	nvme_dev_disable(dev, true);
-- 
2.14.4

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCHv2] nvme/pci: Fix hot removal during error handling
  2018-10-05 15:09 [PATCHv2] nvme/pci: Fix hot removal during error handling Keith Busch
@ 2018-10-05 18:14 ` Sagi Grimberg
  2018-10-05 19:51   ` Keith Busch
  0 siblings, 1 reply; 5+ messages in thread
From: Sagi Grimberg @ 2018-10-05 18:14 UTC (permalink / raw)



> -	flush_work(&dev->ctrl.reset_work);
> +	cancel_work_sync(&dev->ctrl.reset_work);

Does reset_work requeue itself? if not I don't fully understand
this change.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCHv2] nvme/pci: Fix hot removal during error handling
  2018-10-05 18:14 ` Sagi Grimberg
@ 2018-10-05 19:51   ` Keith Busch
  2018-10-05 20:24     ` Sagi Grimberg
  0 siblings, 1 reply; 5+ messages in thread
From: Keith Busch @ 2018-10-05 19:51 UTC (permalink / raw)


On Fri, Oct 05, 2018@11:14:50AM -0700, Sagi Grimberg wrote:
> 
> > -	flush_work(&dev->ctrl.reset_work);
> > +	cancel_work_sync(&dev->ctrl.reset_work);
> 
> Does reset_work requeue itself? if not I don't fully understand
> this change.

You'd have to abuse the state machine if reset_work wanted to requeue
itself, so that shouldn't happen.

We could just leave it as a flush work, and that should be okay too. The
cancel_work_sync should accomplish the same, but also prevent the work
from even starting if it just so happens to have been pending in a
work queue, but that seems very unlikely.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCHv2] nvme/pci: Fix hot removal during error handling
  2018-10-05 19:51   ` Keith Busch
@ 2018-10-05 20:24     ` Sagi Grimberg
  2018-10-15  9:44       ` Christoph Hellwig
  0 siblings, 1 reply; 5+ messages in thread
From: Sagi Grimberg @ 2018-10-05 20:24 UTC (permalink / raw)



>>> -	flush_work(&dev->ctrl.reset_work);
>>> +	cancel_work_sync(&dev->ctrl.reset_work);
>>
>> Does reset_work requeue itself? if not I don't fully understand
>> this change.
> 
> You'd have to abuse the state machine if reset_work wanted to requeue
> itself, so that shouldn't happen.
> 
> We could just leave it as a flush work, and that should be okay too. The
> cancel_work_sync should accomplish the same, but also prevent the work
> from even starting if it just so happens to have been pending in a
> work queue, but that seems very unlikely.

I don't mind, but was looking to understand what it is trying to
achieve. If you do change that, would be a good idea to include it in
the change log

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCHv2] nvme/pci: Fix hot removal during error handling
  2018-10-05 20:24     ` Sagi Grimberg
@ 2018-10-15  9:44       ` Christoph Hellwig
  0 siblings, 0 replies; 5+ messages in thread
From: Christoph Hellwig @ 2018-10-15  9:44 UTC (permalink / raw)


On Fri, Oct 05, 2018@01:24:24PM -0700, Sagi Grimberg wrote:
>
>>>> -	flush_work(&dev->ctrl.reset_work);
>>>> +	cancel_work_sync(&dev->ctrl.reset_work);
>>>
>>> Does reset_work requeue itself? if not I don't fully understand
>>> this change.
>>
>> You'd have to abuse the state machine if reset_work wanted to requeue
>> itself, so that shouldn't happen.
>>
>> We could just leave it as a flush work, and that should be okay too. The
>> cancel_work_sync should accomplish the same, but also prevent the work
>> from even starting if it just so happens to have been pending in a
>> work queue, but that seems very unlikely.
>
> I don't mind, but was looking to understand what it is trying to
> achieve. If you do change that, would be a good idea to include it in
> the change log

Keith, any plans to resend?  Or should I take this one after all?

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2018-10-15  9:44 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-05 15:09 [PATCHv2] nvme/pci: Fix hot removal during error handling Keith Busch
2018-10-05 18:14 ` Sagi Grimberg
2018-10-05 19:51   ` Keith Busch
2018-10-05 20:24     ` Sagi Grimberg
2018-10-15  9:44       ` Christoph Hellwig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.