* [PATCH] nvme/pci: Use async_schedule for initial reset work
@ 2018-04-27 21:17 Keith Busch
  2018-04-28  9:11 ` Ming Lei
  0 siblings, 1 reply; 13+ messages in thread
From: Keith Busch @ 2018-04-27 21:17 UTC (permalink / raw)


This patch schedules the initial controller reset in an async_domain so
that it can be synchronized from wait_for_device_probe(). This way the
kernel waits for the first nvme controller initialization to complete
for all devices before proceeding with the boot sequence, which may have
nvme dependencies.
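
For context, wait_for_device_probe() in drivers/base/dd.c finishes by draining
the default async domain, which is what makes an async_schedule()'d reset
visible to early boot. Roughly (paraphrased from that era's driver core, not a
verbatim copy):

	void wait_for_device_probe(void)
	{
		/* wait for the known devices to complete their probing */
		wait_event(probe_waitqueue, atomic_read(&probe_count) == 0);
		/* drain async_schedule() callbacks, including nvme_async_probe() below */
		async_synchronize_full();
	}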

Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 drivers/nvme/host/pci.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index a2f3ad105620..9e04c7aba946 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -13,6 +13,7 @@
  */
 
 #include <linux/aer.h>
+#include <linux/async.h>
 #include <linux/blkdev.h>
 #include <linux/blk-mq.h>
 #include <linux/blk-mq-pci.h>
@@ -2489,6 +2490,12 @@ static unsigned long check_vendor_combination_bug(struct pci_dev *pdev)
 	return 0;
 }
 
+static void nvme_async_probe(void *data, async_cookie_t cookie)
+{
+	struct nvme_dev *dev = data;
+	nvme_reset_ctrl_sync(&dev->ctrl);
+}
+
 static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 {
 	int node, result = -ENOMEM;
@@ -2533,7 +2540,7 @@ static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 
 	dev_info(dev->ctrl.device, "pci function %s\n", dev_name(&pdev->dev));
 
-	nvme_reset_ctrl(&dev->ctrl);
+	async_schedule(nvme_async_probe, dev);
 
 	return 0;
 
-- 
2.14.3


* [PATCH] nvme/pci: Use async_schedule for initial reset work
  2018-04-27 21:17 [PATCH] nvme/pci: Use async_schedule for initial reset work Keith Busch
@ 2018-04-28  9:11 ` Ming Lei
  2018-04-30 19:45   ` Keith Busch
  0 siblings, 1 reply; 13+ messages in thread
From: Ming Lei @ 2018-04-28  9:11 UTC (permalink / raw)


On Sat, Apr 28, 2018 at 5:17 AM, Keith Busch <keith.busch@intel.com> wrote:
> This patch schedules the initial controller reset in an async_domain so
> that it can be synchronized from wait_for_device_probe(). This way the
> kernel waits for the first nvme controller initialization to complete
> for all devices before proceeding with the boot sequence, which may have
> nvme dependencies.
>
> Reported-by: Mikulas Patocka <mpatocka@redhat.com>
> Signed-off-by: Keith Busch <keith.busch@intel.com>
> ---
>  drivers/nvme/host/pci.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index a2f3ad105620..9e04c7aba946 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -13,6 +13,7 @@
>   */
>
>  #include <linux/aer.h>
> +#include <linux/async.h>
>  #include <linux/blkdev.h>
>  #include <linux/blk-mq.h>
>  #include <linux/blk-mq-pci.h>
> @@ -2489,6 +2490,12 @@ static unsigned long check_vendor_combination_bug(struct pci_dev *pdev)
>         return 0;
>  }
>
> +static void nvme_async_probe(void *data, async_cookie_t cookie)
> +{
> +       struct nvme_dev *dev = data;
> +       nvme_reset_ctrl_sync(&dev->ctrl);
> +}
> +
>  static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>  {
>         int node, result = -ENOMEM;
> @@ -2533,7 +2540,7 @@ static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>
>         dev_info(dev->ctrl.device, "pci function %s\n", dev_name(&pdev->dev));
>
> -       nvme_reset_ctrl(&dev->ctrl);
> +       async_schedule(nvme_async_probe, dev);
>
>         return 0;

Looks fine,

Reviewed-by: Ming Lei <ming.lei@redhat.com>

-- 
Ming Lei


* [PATCH] nvme/pci: Use async_schedule for initial reset work
  2018-04-28  9:11 ` Ming Lei
@ 2018-04-30 19:45   ` Keith Busch
  2018-05-01 23:33     ` Mikulas Patocka
  0 siblings, 1 reply; 13+ messages in thread
From: Keith Busch @ 2018-04-30 19:45 UTC (permalink / raw)


On Sat, Apr 28, 2018 at 05:11:18PM +0800, Ming Lei wrote:
> Looks fine,
> 
> Reviewed-by: Ming Lei <ming.lei@redhat.com>

Thanks, Ming.

Mikulas, would you be able to test this and confirm it works for you?
This appears successful in my testing, but I want to hear from the source
if possible.

Thanks,
Keith


* [PATCH] nvme/pci: Use async_schedule for initial reset work
  2018-04-30 19:45   ` Keith Busch
@ 2018-05-01 23:33     ` Mikulas Patocka
  2018-05-02 15:29       ` Keith Busch
  0 siblings, 1 reply; 13+ messages in thread
From: Mikulas Patocka @ 2018-05-01 23:33 UTC (permalink / raw)




On Mon, 30 Apr 2018, Keith Busch wrote:

> On Sat, Apr 28, 2018 at 05:11:18PM +0800, Ming Lei wrote:
> > Looks fine,
> > 
> > Reviewed-by: Ming Lei <ming.lei@redhat.com>
> 
> Thanks, Ming.
> 
> Mikulas, would you be able to test this and confirm it works for you?
> This appears successful in my testing, but want to hear from the source
> if possible.
> 
> Thanks,
> Keith

The patch is not correct - scan_work is still called from a workqueue and 
if it's too slow, the nvme device is not found when mounting root.

You can add msleep(10000) at the beginning of nvme_scan_work to test for 
the race condition on your system.
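
(For illustration only - a sketch of that hack, assuming the 4.16-era
nvme_scan_work() in drivers/nvme/host/core.c; msleep() needs <linux/delay.h>
if that file does not already pull it in:)

	static void nvme_scan_work(struct work_struct *work)
	{
		struct nvme_ctrl *ctrl =
			container_of(work, struct nvme_ctrl, scan_work);

		/* debug only: delay namespace scanning so a missing
		 * flush_work(&ctrl->scan_work) shows up as an unmountable root */
		msleep(10000);

		/* ... rest of the scan work unchanged ... */
	}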

Here I submit the corrected patch - I added 
flush_work(&dev->ctrl.scan_work) to nvme_async_probe

Mikulas



This patch schedules the initial controller reset in an async_domain so
that it can be synchronized from wait_for_device_probe(). This way the
kernel waits for the first nvme controller initialization to complete
for all devices before proceeding with the boot sequence, which may have
nvme dependencies.

Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Tested-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 drivers/nvme/host/pci.c |   10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

Index: linux-4.16.6/drivers/nvme/host/pci.c
===================================================================
--- linux-4.16.6.orig/drivers/nvme/host/pci.c	2018-05-02 01:18:01.000000000 +0200
+++ linux-4.16.6/drivers/nvme/host/pci.c	2018-05-02 01:18:01.000000000 +0200
@@ -13,6 +13,7 @@
  */
 
 #include <linux/aer.h>
+#include <linux/async.h>
 #include <linux/blkdev.h>
 #include <linux/blk-mq.h>
 #include <linux/blk-mq-pci.h>
@@ -2471,6 +2472,13 @@ static unsigned long check_vendor_combin
 	return 0;
 }
 
+static void nvme_async_probe(void *data, async_cookie_t cookie)
+{
+	struct nvme_dev *dev = data;
+	nvme_reset_ctrl_sync(&dev->ctrl);
+	flush_work(&dev->ctrl.scan_work);
+}
+
 static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 {
 	int node, result = -ENOMEM;
@@ -2515,7 +2523,7 @@ static int nvme_probe(struct pci_dev *pd
 
 	dev_info(dev->ctrl.device, "pci function %s\n", dev_name(&pdev->dev));
 
-	nvme_reset_ctrl(&dev->ctrl);
+	async_schedule(nvme_async_probe, dev);
 
 	return 0;
 


* [PATCH] nvme/pci: Use async_schedule for initial reset work
  2018-05-01 23:33     ` Mikulas Patocka
@ 2018-05-02 15:29       ` Keith Busch
  2018-05-03 14:55           ` Mikulas Patocka
  0 siblings, 1 reply; 13+ messages in thread
From: Keith Busch @ 2018-05-02 15:29 UTC (permalink / raw)


On Tue, May 01, 2018 at 07:33:10PM -0400, Mikulas Patocka wrote:
> On Mon, 30 Apr 2018, Keith Busch wrote:
> 
> > On Sat, Apr 28, 2018 at 05:11:18PM +0800, Ming Lei wrote:
> > > Looks fine,
> > > 
> > > Reviewed-by: Ming Lei <ming.lei@redhat.com>
> > 
> > Thanks, Ming.
> > 
> > Mikulas, would you be able to test this and confirm it works for you?
> > This appears successful in my testing, but want to hear from the source
> > if possible.
> > 
> > Thanks,
> > Keith
> 
> The patch is not correct - scan_work is still called from a workqueue and 
> if it's too slow, the nvme device is not found when mounting root.
> 
> You can add msleep(10000) at the beginning of nvme_scan_work to test for 
> the race condition on your system.
> 
> Here I submit the corrected patch - I added 
> flush_work(&dev->ctrl.scan_work) to nvme_async_probe

Roger that. Will incorporate your adjustment and add your
Tested-by. Thanks for the confirmation.


* Re: [PATCH] nvme/pci: Use async_schedule for initial reset work
  2018-05-02 15:29       ` Keith Busch
@ 2018-05-03 14:55           ` Mikulas Patocka
  0 siblings, 0 replies; 13+ messages in thread
From: Mikulas Patocka @ 2018-05-03 14:55 UTC (permalink / raw)
  To: Keith Busch
  Cc: Keith Busch, Ming Lei, Christoph Hellwig, linux-nvme,
	Sagi Grimberg, linux-pci, Bjorn Helgaas



On Wed, 2 May 2018, Keith Busch wrote:

> On Tue, May 01, 2018 at 07:33:10PM -0400, Mikulas Patocka wrote:
> > On Mon, 30 Apr 2018, Keith Busch wrote:
> > 
> > > On Sat, Apr 28, 2018 at 05:11:18PM +0800, Ming Lei wrote:
> > > > Looks fine,
> > > > 
> > > > Reviewed-by: Ming Lei <ming.lei@redhat.com>
> > > 
> > > Thanks, Ming.
> > > 
> > > Mikulas, would you be able to test this and confirm it works for you?
> > > This appears successful in my testing, but want to hear from the source
> > > if possible.
> > > 
> > > Thanks,
> > > Keith
> > 
> > The patch is not correct - scan_work is still called from a workqueue and 
> > if it's too slow, the nvme device is not found when mounting root.
> > 
> > You can add msleep(10000) at the beginning of nvme_scan_work to test for 
> > the race condition on your system.
> > 
> > Here I submit the corrected patch - I added 
> > flush_work(&dev->ctrl.scan_work) to nvme_async_probe
> 
> Roger that. Will incorporate your adjustment and add your
> Tested-by. Thanks for the confirmation.

I think there is still one more bug:

If nvme_probe is called, it schedules the asynchronous work using 
async_schedule - now suppose that the pci system calls the "remove", 
"shutdown" or "suspend" method - this method will race with 
nvme_async_probe running in the async domain - that will cause 
misbehavior.

Or - does the PCI subsystem flush the async queues before calling these 
methods? I'm not sure, but it doesn't seem so.

I think, you need to save the cookie returned by async_schedule and wait 
for this cookie with async_synchronize_cookie in the other methods.

Mikulas


This patch schedules the initial controller reset in an async_domain so
that it can be synchronized from wait_for_device_probe(). This way the
kernel waits for the first nvme controller initialization to complete
for all devices before proceeding with the boot sequence, which may have
nvme dependencies.

Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Tested-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

---
 drivers/nvme/host/pci.c |   21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

Index: linux-4.16.6/drivers/nvme/host/pci.c
===================================================================
--- linux-4.16.6.orig/drivers/nvme/host/pci.c	2018-05-03 16:43:01.000000000 +0200
+++ linux-4.16.6/drivers/nvme/host/pci.c	2018-05-03 16:46:35.000000000 +0200
@@ -13,6 +13,7 @@
  */
 
 #include <linux/aer.h>
+#include <linux/async.h>
 #include <linux/blkdev.h>
 #include <linux/blk-mq.h>
 #include <linux/blk-mq-pci.h>
@@ -111,6 +112,8 @@ struct nvme_dev {
 	dma_addr_t host_mem_descs_dma;
 	struct nvme_host_mem_buf_desc *host_mem_descs;
 	void **host_mem_desc_bufs;
+
+	async_cookie_t probe_cookie;
 };
 
 static int io_queue_depth_set(const char *val, const struct kernel_param *kp)
@@ -2471,6 +2474,13 @@ static unsigned long check_vendor_combin
 	return 0;
 }
 
+static void nvme_async_probe(void *data, async_cookie_t cookie)
+{
+	struct nvme_dev *dev = data;
+	nvme_reset_ctrl_sync(&dev->ctrl);
+	flush_work(&dev->ctrl.scan_work);
+}
+
 static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 {
 	int node, result = -ENOMEM;
@@ -2515,7 +2525,7 @@ static int nvme_probe(struct pci_dev *pd
 
 	dev_info(dev->ctrl.device, "pci function %s\n", dev_name(&pdev->dev));
 
-	nvme_reset_ctrl(&dev->ctrl);
+	dev->probe_cookie = async_schedule(nvme_async_probe, dev);
 
 	return 0;
 
@@ -2546,6 +2556,9 @@ static void nvme_reset_done(struct pci_d
 static void nvme_shutdown(struct pci_dev *pdev)
 {
 	struct nvme_dev *dev = pci_get_drvdata(pdev);
+
+	async_synchronize_cookie(dev->probe_cookie);
+
 	nvme_dev_disable(dev, true);
 }
 
@@ -2558,6 +2571,8 @@ static void nvme_remove(struct pci_dev *
 {
 	struct nvme_dev *dev = pci_get_drvdata(pdev);
 
+	async_synchronize_cookie(dev->probe_cookie);
+
 	nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_DELETING);
 
 	cancel_work_sync(&dev->ctrl.reset_work);
@@ -2605,6 +2620,8 @@ static int nvme_suspend(struct device *d
 	struct pci_dev *pdev = to_pci_dev(dev);
 	struct nvme_dev *ndev = pci_get_drvdata(pdev);
 
+	async_synchronize_cookie(ndev->probe_cookie);
+
 	nvme_dev_disable(ndev, true);
 	return 0;
 }
@@ -2614,6 +2631,8 @@ static int nvme_resume(struct device *de
 	struct pci_dev *pdev = to_pci_dev(dev);
 	struct nvme_dev *ndev = pci_get_drvdata(pdev);
 
+	async_synchronize_cookie(ndev->probe_cookie);
+
 	nvme_reset_ctrl(&ndev->ctrl);
 	return 0;
 }


* Re: [PATCH] nvme/pci: Use async_schedule for initial reset work
  2018-05-03 14:55           ` Mikulas Patocka
@ 2018-05-03 20:15             ` Keith Busch
  -1 siblings, 0 replies; 13+ messages in thread
From: Keith Busch @ 2018-05-03 20:15 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Sagi Grimberg, Ming Lei, linux-nvme, Keith Busch, linux-pci,
	Bjorn Helgaas, Christoph Hellwig

On Thu, May 03, 2018 at 10:55:09AM -0400, Mikulas Patocka wrote:
> I think there is still one more bug:
> 
> If nvme_probe is called, it schedules the asynchronous work using 
> async_schedule - now suppose that the pci system calls the "remove", 
> "shutdown" or "suspend" method - this method will race with 
> nvme_async_probe running in the async domain - that will cause 
> misbehavior.
> 
> Or - does the PCI subsystem flush the async queues before calling these 
> methods? I'm not sure, but it doesn't seem so.
> 
> I think, you need to save the cookie returned by async_schedule and wait 
> for this cookie with async_synchronize_cookie in the other methods.

I think we're fine as-is without syncing the cookie. 

The remove path should be fine since we already sync with the necessary
work queues.

The shutdown, suspend and reset paths will just cause the initial reset
work to end early, the same result as what previously would happen.


* Re: [PATCH] nvme/pci: Use async_schedule for initial reset work
  2018-05-03 20:15             ` Keith Busch
@ 2018-05-03 20:45               ` Mikulas Patocka
  -1 siblings, 0 replies; 13+ messages in thread
From: Mikulas Patocka @ 2018-05-03 20:45 UTC (permalink / raw)
  To: Keith Busch
  Cc: Sagi Grimberg, Ming Lei, linux-nvme, Keith Busch, linux-pci,
	Bjorn Helgaas, Christoph Hellwig



On Thu, 3 May 2018, Keith Busch wrote:

> On Thu, May 03, 2018 at 10:55:09AM -0400, Mikulas Patocka wrote:
> > I think there is still one more bug:
> > 
> > If nvme_probe is called, it schedules the asynchronous work using 
> > async_schedule - now suppose that the pci system calls the "remove", 
> > "shutdown" or "suspend" method - this method will race with 
> > nvme_async_probe running in the async domain - that will cause 
> > misbehavior.
> > 
> > Or - does the PCI subsystem flush the async queues before calling these 
> > methods? I'm not sure, but it doesn't seem so.
> > 
> > I think, you need to save the cookie returned by async_schedule and wait 
> > for this cookie with async_synchronize_cookie in the other methods.
> 
> I think we're fine as-is without syncing the cookie. 
> 
> The remove path should be fine since we already sync with the necessary
> work queues.
> 
> The shutdown, suspend and reset paths will just cause the initial reset
> work to end early, the same result as what previously would happen.

Suppose this:
task 1: nvme_probe
task 1: calls async_schedule(nvme_async_probe), that queues the work for 
	task 2
task 1: exits (so the device is active from pci subsystem's point of view)
task 3: the pci subsystem calls nvme_remove
task 3: calls nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_DELETING);
task 3: cancel_work_sync(&dev->ctrl.reset_work); (does nothing because the 
	work item hasn't started yet)
task 3: nvme_remove does all the remaining work
task 3: frees the device
task 3: exits nvme_remove
task 2: (in the async domain) runs nvme_async_probe
task 2: calls nvme_reset_ctrl_sync
task 2: nvme_reset_ctrl
task 2: calls nvme_change_ctrl_state and queue_work - on a structure that 
	was already freed by nvme_remove

This bug is rare - but it may happen if the user activates and deactivates
the device too quickly by writing to sysfs.

Mikulas


* Re: [PATCH] nvme/pci: Use async_schedule for initial reset work
  2018-05-03 20:45               ` Mikulas Patocka
@ 2018-05-03 21:05                 ` Keith Busch
  -1 siblings, 0 replies; 13+ messages in thread
From: Keith Busch @ 2018-05-03 21:05 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Sagi Grimberg, Ming Lei, linux-nvme, Keith Busch, linux-pci,
	Bjorn Helgaas, Christoph Hellwig

On Thu, May 03, 2018 at 04:45:22PM -0400, Mikulas Patocka wrote:
> Suppose this:
> task 1: nvme_probe
> task 1: calls async_schedule(nvme_async_probe), that queues the work for 
> 	task 2
> task 1: exits (so the device is active from pci subsystem's point of view)
> task 3: the pci subsystem calls nvme_remove
> task 3: calls nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_DELETING);
> task 3: cancel_work_sync(&dev->ctrl.reset_work); (does nothing because the 
> 	work item hasn't started yet)
> task 3: nvme_remove does all the remaining work
> task 3: frees the device
> task 3: exits nvme_remove
> task 2: (in the async domain) runs nvme_async_probe
> task 2: calls nvme_reset_ctrl_sync
> task 2: nvme_reset_ctrl
> task 2: calls nvme_change_ctrl_state and queue_work - on a structure that 
> 	was already freed by nvme_remove
> 
> This bug is rare - but it may happen if the user activates and deactivates
> the device too quickly by writing to sysfs.

Okay, I think I see your point. Pairing a nvme_get_ctrl with a
nvme_put_ctrl should fix that.
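
A minimal sketch of that pairing (not necessarily the final patch; it assumes
the existing nvme_get_ctrl()/nvme_put_ctrl() refcount helpers and the
flush_work() added earlier in this thread):

	static void nvme_async_probe(void *data, async_cookie_t cookie)
	{
		struct nvme_dev *dev = data;

		nvme_reset_ctrl_sync(&dev->ctrl);
		flush_work(&dev->ctrl.scan_work);
		nvme_put_ctrl(&dev->ctrl);	/* drop the reference taken in nvme_probe() */
	}

and, in nvme_probe():

	nvme_get_ctrl(&dev->ctrl);	/* keep the controller alive for nvme_async_probe() */
	async_schedule(nvme_async_probe, dev);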


Thread overview: 13+ messages
2018-04-27 21:17 [PATCH] nvme/pci: Use async_schedule for initial reset work Keith Busch
2018-04-28  9:11 ` Ming Lei
2018-04-30 19:45   ` Keith Busch
2018-05-01 23:33     ` Mikulas Patocka
2018-05-02 15:29       ` Keith Busch
2018-05-03 14:55         ` Mikulas Patocka
2018-05-03 20:15           ` Keith Busch
2018-05-03 20:45             ` Mikulas Patocka
2018-05-03 21:05               ` Keith Busch