* [PATCH] nvme: remove pci device if no longer present
From: Wei Zhang @ 2017-06-30 23:56 UTC
  To: linux-block; +Cc: kernel-team, keith.busch

This patch removes the PCI device from the kernel's topology tree
if the device is no longer present.

Commit ddf097ec1d44c9648c4738d7cf2819411b44253a (NVMe: Unbind driver on
failure) left the PCI device in the kernel's topology upon device failure.
However, this does not work well for the slot power off/on test cases.
After a slot power off, we need to manually remove the PCI device
before triggering the rescan, in order for the SSD to be rediscovered.

Fixes: ddf097ec1d44c9648c4738d7cf2819411b44253a
Signed-off-by: Wei Zhang <wzhang@fb.com>
Reviewed-by: Jens Axboe <axboe@fb.com>
---
 drivers/nvme/host/pci.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 32a98e2..094b22f 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2174,8 +2174,19 @@ static void nvme_remove_dead_ctrl_work(struct work_struct *work)
 	struct pci_dev *pdev = to_pci_dev(dev->dev);
 
 	nvme_kill_queues(&dev->ctrl);
-	if (pci_get_drvdata(pdev))
-		device_release_driver(&pdev->dev);
+
+	/*
+	 * Remove the PCI device from the topology tree if the device is no longer
+	 * present.  Without removing, slot power off/on test cannot re-discover
+	 * the SSD.
+	 */
+	if (pci_get_drvdata(pdev)) {
+		if (!pci_device_is_present(pdev)) {
+			pci_stop_and_remove_bus_device_locked(pdev);
+		} else {
+			device_release_driver(&pdev->dev);
+		}
+	}
 	nvme_put_ctrl(&dev->ctrl);
 }
 
-- 
2.9.3
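
The branch structure this patch adds can be modeled outside the kernel. What follows is a minimal userspace sketch, not kernel code: `struct fake_pci_dev`, `enum dead_ctrl_action`, and `pick_dead_ctrl_action()` are hypothetical names, and the two boolean fields are assumptions standing in for the results of `pci_get_drvdata()` and `pci_device_is_present()`.

```c
#include <stdbool.h>

/* Hypothetical stand-in for kernel state; not the real struct pci_dev. */
struct fake_pci_dev {
	bool has_drvdata;	/* models pci_get_drvdata(pdev) != NULL */
	bool is_present;	/* models pci_device_is_present(pdev) */
};

enum dead_ctrl_action {
	ACTION_NONE,		/* no driver data: nothing to do */
	ACTION_RELEASE_DRIVER,	/* models device_release_driver() */
	ACTION_REMOVE_DEVICE,	/* models pci_stop_and_remove_bus_device_locked() */
};

/* Mirrors the if/else nesting the patch adds to nvme_remove_dead_ctrl_work(). */
static enum dead_ctrl_action
pick_dead_ctrl_action(const struct fake_pci_dev *pdev)
{
	if (!pdev->has_drvdata)
		return ACTION_NONE;
	if (!pdev->is_present)
		return ACTION_REMOVE_DEVICE;
	return ACTION_RELEASE_DRIVER;
}
```

In this model, a device that has been powered off in its slot (driver data set, device absent) maps to `ACTION_REMOVE_DEVICE`, while a failed but still present device keeps the pre-patch behavior of `ACTION_RELEASE_DRIVER`.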


* Re: [PATCH] nvme: remove pci device if no longer present
From: Christoph Hellwig @ 2017-07-02 15:31 UTC
  To: Wei Zhang; +Cc: linux-block, kernel-team, keith.busch, linux-nvme, linux-pci

Please CC the linux-nvme list on any nvme issues.  Also, this
code is getting a little too fancy to live in nvme.  I think we
need to move it into the PCI core, ensure we properly take drv->lock
to synchronize it, and check for dev->drv instead of the private data,
which is a guesstimate.
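
The suggestion in this reply, deciding on the bound driver under a lock rather than on the private data, can be illustrated with a rough userspace model. None of these names exist in the kernel: `struct model_device`, `enum model_action`, and `model_handle_dead_device()` are hypothetical, and the `pthread` mutex is only an assumption standing in for whatever lock the PCI core would take.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a device; not a kernel structure. */
struct model_device {
	pthread_mutex_t lock;	/* stands in for the lock guarding bind/unbind */
	const char *driver;	/* name of the bound driver, NULL when unbound */
	void *drvdata;		/* driver-private pointer; only a hint of bind state */
	bool present;		/* whether the hardware still responds */
};

enum model_action { MODEL_NONE, MODEL_UNBIND, MODEL_REMOVE };

/*
 * Decide what to do with a dead device. The decision reads dev->driver
 * while holding the lock and deliberately ignores dev->drvdata.
 */
static enum model_action model_handle_dead_device(struct model_device *dev)
{
	enum model_action action = MODEL_NONE;

	pthread_mutex_lock(&dev->lock);
	if (dev->driver != NULL)
		action = dev->present ? MODEL_UNBIND : MODEL_REMOVE;
	pthread_mutex_unlock(&dev->lock);

	return action;
}
```

In this model, a device whose `drvdata` is still set but whose `driver` is `NULL` yields `MODEL_NONE`; a check based on the private data alone would treat that device as still bound.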

On Fri, Jun 30, 2017 at 04:56:04PM -0700, Wei Zhang wrote:
> This patch removes the PCI device from the kernel's topology tree
> if the device is no longer present.
> 
> Commit ddf097ec1d44c9648c4738d7cf2819411b44253a (NVMe: Unbind driver on
> failure) left the PCI device in the kernel's topology upon device failure.
> However, this does not work well for the slot power off/on test cases.
> After a slot power off, we need to manually remove the PCI device
> before triggering the rescan, in order for the SSD to be rediscovered.
> 
> Fixes: ddf097ec1d44c9648c4738d7cf2819411b44253a
> Signed-off-by: Wei Zhang <wzhang@fb.com>
> Reviewed-by: Jens Axboe <axboe@fb.com>
> ---
>  drivers/nvme/host/pci.c | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 32a98e2..094b22f 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -2174,8 +2174,19 @@ static void nvme_remove_dead_ctrl_work(struct work_struct *work)
>  	struct pci_dev *pdev = to_pci_dev(dev->dev);
>  
>  	nvme_kill_queues(&dev->ctrl);
> -	if (pci_get_drvdata(pdev))
> -		device_release_driver(&pdev->dev);
> +
> +	/*
> +	 * Remove the PCI device from the topology tree if the device is no longer
> +	 * present.  Without removing, slot power off/on test cannot re-discover
> +	 * the SSD.
> +	 */
> +	if (pci_get_drvdata(pdev)) {
> +		if (!pci_device_is_present(pdev)) {
> +			pci_stop_and_remove_bus_device_locked(pdev);
> +		} else {
> +			device_release_driver(&pdev->dev);
> +		}
> +	}
>  	nvme_put_ctrl(&dev->ctrl);
>  }
>  
> -- 
> 2.9.3
> 
---end quoted text---


* Re: [PATCH] nvme: remove pci device if no longer present
From: Keith Busch @ 2017-07-05 16:03 UTC
  To: Christoph Hellwig
  Cc: Wei Zhang, linux-block, kernel-team, linux-nvme, linux-pci

On Sun, Jul 02, 2017 at 08:31:51AM -0700, Christoph Hellwig wrote:
> Please CC the linux-nvme list on any nvme issues.  Also, this
> code is getting a little too fancy to live in nvme.  I think we
> need to move it into the PCI core, ensure we properly take drv->lock
> to synchronize it, and check for dev->drv instead of the private data,
> which is a guesstimate.

I agree this sort of thing needs to go in the PCI layer as a common
solution for all devices. The NVMe driver shouldn't be responsible for bus
enumeration events. When we did that before, races with pciehp were a
problem.

Also, we don't have a once-per-second health check event that would have
been needed to even catch this event in the first place. To get here now,
you'll have to issue an nvme reset or wait 60 seconds after sending an
admin or IO command.
 
> On Fri, Jun 30, 2017 at 04:56:04PM -0700, Wei Zhang wrote:
> > This patch removes the PCI device from the kernel's topology tree
> > if the device is no longer present.
> > 
> > Commit ddf097ec1d44c9648c4738d7cf2819411b44253a (NVMe: Unbind driver on
> > failure) left the PCI device in the kernel's topology upon device failure.
> > However, this does not work well for the slot power off/on test cases.
> > After a slot power off, we need to manually remove the PCI device
> > before triggering the rescan, in order for the SSD to be rediscovered.
> > 
> > Fixes: ddf097ec1d44c9648c4738d7cf2819411b44253a
> > Signed-off-by: Wei Zhang <wzhang@fb.com>
> > Reviewed-by: Jens Axboe <axboe@fb.com>
> > ---
> >  drivers/nvme/host/pci.c | 15 +++++++++++++--
> >  1 file changed, 13 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> > index 32a98e2..094b22f 100644
> > --- a/drivers/nvme/host/pci.c
> > +++ b/drivers/nvme/host/pci.c
> > @@ -2174,8 +2174,19 @@ static void nvme_remove_dead_ctrl_work(struct work_struct *work)
> >  	struct pci_dev *pdev = to_pci_dev(dev->dev);
> >  
> >  	nvme_kill_queues(&dev->ctrl);
> > -	if (pci_get_drvdata(pdev))
> > -		device_release_driver(&pdev->dev);
> > +
> > +	/*
> > +	 * Remove the PCI device from the topology tree if the device is no longer
> > +	 * present.  Without removing, slot power off/on test cannot re-discover
> > +	 * the SSD.
> > +	 */
> > +	if (pci_get_drvdata(pdev)) {
> > +		if (!pci_device_is_present(pdev)) {
> > +			pci_stop_and_remove_bus_device_locked(pdev);
> > +		} else {
> > +			device_release_driver(&pdev->dev);
> > +		}
> > +	}
> >  	nvme_put_ctrl(&dev->ctrl);
> >  }


* Re: [PATCH] nvme: remove pci device if no longer present
From: Keith Busch @ 2017-07-05 16:05 UTC
  To: Christoph Hellwig
  Cc: Wei Zhang, linux-block, kernel-team, linux-nvme, linux-pci

[correcting linux-nvme in the CC]

On Wed, Jul 05, 2017 at 12:03:35PM -0400, Keith Busch wrote:
> On Sun, Jul 02, 2017 at 08:31:51AM -0700, Christoph Hellwig wrote:
> > Please CC the linux-nvme list on any nvme issues.  Also, this
> > code is getting a little too fancy to live in nvme.  I think we
> > need to move it into the PCI core, ensure we properly take drv->lock
> > to synchronize it, and check for dev->drv instead of the private data,
> > which is a guesstimate.
> 
> I agree this sort of thing needs to go in the PCI layer as a common
> solution for all devices. The NVMe driver shouldn't be responsible for bus
> enumeration events. When we did that before, races with pciehp were a
> problem.
> 
> Also, we don't have a once-per-second health check event that would have
> been needed to even catch this event in the first place. To get here now,
> you'll have to issue an nvme reset or wait 60 seconds after sending an
> admin or IO command.
>  
> > On Fri, Jun 30, 2017 at 04:56:04PM -0700, Wei Zhang wrote:
> > > This patch removes the PCI device from the kernel's topology tree
> > > if the device is no longer present.
> > > 
> > > Commit ddf097ec1d44c9648c4738d7cf2819411b44253a (NVMe: Unbind driver on
> > > failure) left the PCI device in the kernel's topology upon device failure.
> > > However, this does not work well for the slot power off/on test cases.
> > > After a slot power off, we need to manually remove the PCI device
> > > before triggering the rescan, in order for the SSD to be rediscovered.
> > > 
> > > Fixes: ddf097ec1d44c9648c4738d7cf2819411b44253a
> > > Signed-off-by: Wei Zhang <wzhang@fb.com>
> > > Reviewed-by: Jens Axboe <axboe@fb.com>
> > > ---
> > >  drivers/nvme/host/pci.c | 15 +++++++++++++--
> > >  1 file changed, 13 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> > > index 32a98e2..094b22f 100644
> > > --- a/drivers/nvme/host/pci.c
> > > +++ b/drivers/nvme/host/pci.c
> > > @@ -2174,8 +2174,19 @@ static void nvme_remove_dead_ctrl_work(struct work_struct *work)
> > >  	struct pci_dev *pdev = to_pci_dev(dev->dev);
> > >  
> > >  	nvme_kill_queues(&dev->ctrl);
> > > -	if (pci_get_drvdata(pdev))
> > > -		device_release_driver(&pdev->dev);
> > > +
> > > +	/*
> > > +	 * Remove the PCI device from the topology tree if the device is no longer
> > > +	 * present.  Without removing, slot power off/on test cannot re-discover
> > > +	 * the SSD.
> > > +	 */
> > > +	if (pci_get_drvdata(pdev)) {
> > > +		if (!pci_device_is_present(pdev)) {
> > > +			pci_stop_and_remove_bus_device_locked(pdev);
> > > +		} else {
> > > +			device_release_driver(&pdev->dev);
> > > +		}
> > > +	}
> > >  	nvme_put_ctrl(&dev->ctrl);
> > >  }



