* [PATCH] nvme-pci: fix probe and remove race
@ 2019-07-19 19:42 Sagi Grimberg
  2019-07-20  7:52 ` Minwoo Im
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Sagi Grimberg @ 2019-07-19 19:42 UTC (permalink / raw)


It is possible for nvme_remove() to run concurrently with
nvme_reset_work(), with the following sequence:

nvme_probe()
  nvme_init_ctrl()
    //set to NEW
  nvme_async_probe()
                                                      nvme_remove()
                                                        //can not change to
                                                        //DELETING from NEW
    nvme_reset_ctrl_sync()
        nvme_reset_ctrl()
          //change from NEW
          //to RESETTING
                                                       flush reset_work()
                                                       //not yet queued
          queue reset_work
            nvme_reset_work()
              ....                                     ....

With the above running concurrently, strange issues can result, such as
a kernel crash from an illegal memory access, or messages like:
kernel: pci 0000:00:1f.0: can't enable device: BAR 0
 [mem 0xc0000000-0xc0003fff] not claimed

Fix this by waiting for the async probe to complete before allowing
remove to make forward progress.

Reported-by: Li Zhong <lizhongfs at gmail.com>
Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
---
 drivers/nvme/host/pci.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 4b508d5e45cf..50061abe49c6 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -127,6 +127,7 @@ struct nvme_dev {
 	dma_addr_t host_mem_descs_dma;
 	struct nvme_host_mem_buf_desc *host_mem_descs;
 	void **host_mem_desc_bufs;
+	async_cookie_t async_probe;
 };
 
 static int io_queue_depth_set(const char *val, const struct kernel_param *kp)
@@ -2765,7 +2766,7 @@ static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	dev_info(dev->ctrl.device, "pci function %s\n", dev_name(&pdev->dev));
 
 	nvme_get_ctrl(&dev->ctrl);
-	async_schedule(nvme_async_probe, dev);
+	dev->async_probe = async_schedule(nvme_async_probe, dev);
 
 	return 0;
 
@@ -2810,6 +2811,8 @@ static void nvme_remove(struct pci_dev *pdev)
 {
 	struct nvme_dev *dev = pci_get_drvdata(pdev);
 
+	/* wait for async probe to complete */
+	async_synchronize_cookie(dev->async_probe + 1);
 	nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_DELETING);
 	pci_set_drvdata(pdev, NULL);
 
-- 
2.17.1
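The fix hinges on the async cookie: async_schedule() returns a cookie, and async_synchronize_cookie(cookie + 1) blocks until the work scheduled under that cookie has completed. Below is a user-space sketch of that pattern, assuming a single job and in-order completion; toy_async_schedule() and toy_async_synchronize_cookie() are invented stand-ins for the kernel API, not kernel code.

```c
/* User-space model of the cookie-wait pattern in the patch above. */
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

typedef unsigned long cookie_t;

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t done_cv = PTHREAD_COND_INITIALIZER;
static cookie_t next_cookie = 1;	/* next cookie to hand out */
static cookie_t done_cookie = 1;	/* all work below this cookie has finished */
static int probe_finished;

struct job {
	cookie_t cookie;
	void (*fn)(void *);
	void *data;
};

static void *worker(void *arg)
{
	struct job *j = arg;

	j->fn(j->data);
	pthread_mutex_lock(&lock);
	done_cookie = j->cookie + 1;	/* single job, so completion is in order */
	pthread_cond_broadcast(&done_cv);
	pthread_mutex_unlock(&lock);
	free(j);
	return NULL;
}

static cookie_t toy_async_schedule(void (*fn)(void *), void *data)
{
	struct job *j = malloc(sizeof(*j));
	cookie_t c;
	pthread_t t;

	pthread_mutex_lock(&lock);
	c = next_cookie++;
	pthread_mutex_unlock(&lock);
	j->cookie = c;
	j->fn = fn;
	j->data = data;
	pthread_create(&t, NULL, worker, j);
	pthread_detach(t);
	return c;
}

/* Like async_synchronize_cookie(): wait for all work before @cookie. */
static void toy_async_synchronize_cookie(cookie_t cookie)
{
	pthread_mutex_lock(&lock);
	while (done_cookie < cookie)
		pthread_cond_wait(&done_cv, &lock);
	pthread_mutex_unlock(&lock);
}

static void fake_probe(void *unused)
{
	(void)unused;
	usleep(50000);		/* emulate slow controller bring-up */
	probe_finished = 1;
}

int run_demo(void)
{
	cookie_t c = toy_async_schedule(fake_probe, NULL);

	/* The remove path: wait for the async probe before tearing down,
	 * mirroring async_synchronize_cookie(dev->async_probe + 1). */
	toy_async_synchronize_cookie(c + 1);
	return probe_finished;	/* 1 iff probe ran to completion first */
}
```

With the cookie wait in place, "remove" cannot observe a half-finished "probe"; the two are strictly ordered, which is what the patch relies on.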

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH] nvme-pci: fix probe and remove race
  2019-07-19 19:42 [PATCH] nvme-pci: fix probe and remove race Sagi Grimberg
@ 2019-07-20  7:52 ` Minwoo Im
       [not found] ` <CAOSXXT7z4+pScQ+Kf0VauTCvPdRDEXX=H7jQN-Dkk=M2hkTFsA@mail.gmail.com>
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: Minwoo Im @ 2019-07-20  7:52 UTC (permalink / raw)


On 19-07-19 12:42:56, Sagi Grimberg wrote:
> It is possible for nvme_remove() to run concurrently with
> nvme_reset_work(), with the following sequence:
> 
> nvme_probe()
>   nvme_init_ctrl()
>     //set to NEW
>   nvme_async_probe()
>                                                       nvme_remove()
>                                                         //can not change to
>                                                         //DELETING from NEW
>     nvme_reset_ctrl_sync()
>         nvme_reset_ctrl()
>           //change from NEW
>           //to RESETTING
>                                                        flush reset_work()
>                                                        //not yet queued
>           queue reset_work
>             nvme_reset_work()
>               ....                                     ....
> 
> With the above running concurrently, strange issues can result, such as
> a kernel crash from an illegal memory access, or messages like:
> kernel: pci 0000:00:1f.0: can't enable device: BAR 0
>  [mem 0xc0000000-0xc0003fff] not claimed
> 
> Fix this by waiting for the async probe to complete before allowing
> remove to make forward progress.
> 
> Reported-by: Li Zhong <lizhongfs at gmail.com>
> Signed-off-by: Sagi Grimberg <sagi at grimberg.me>

This looks good to me.

Reviewed-by: Minwoo Im <minwoo.im.dev at gmail.com>

Thanks, Sagi.


* [PATCH] nvme-pci: fix probe and remove race
       [not found] ` <CAOSXXT7z4+pScQ+Kf0VauTCvPdRDEXX=H7jQN-Dkk=M2hkTFsA@mail.gmail.com>
@ 2019-07-22 18:18   ` Sagi Grimberg
  0 siblings, 0 replies; 9+ messages in thread
From: Sagi Grimberg @ 2019-07-22 18:18 UTC (permalink / raw)




On 7/20/19 7:46 PM, Keith Busch wrote:
> On Fri, Jul 19, 2019 at 1:43 PM Sagi Grimberg <sagi at grimberg.me> wrote:
>  > It is possible for nvme_remove() to run concurrently with
>  > nvme_reset_work(), with the following sequence:
>  >
>  > nvme_probe()
>  >   nvme_init_ctrl()
>  >     //set to NEW
>  >   nvme_async_probe()
>  >                                                       nvme_remove()
>  >                                                         //can not change to
>  >                                                         //DELETING from NEW
>  >     nvme_reset_ctrl_sync()
>  >         nvme_reset_ctrl()
>  >           //change from NEW
>  >           //to RESETTING
>  >                                                        flush reset_work()
>  >                                                        //not yet queued
>  >           queue reset_work
>  >             nvme_reset_work()
>  >               ....                                     ....
>  >
>  > With the above running concurrently, strange issues can result, such
>  > as a kernel crash from an illegal memory access, or messages like:
>  > kernel: pci 0000:00:1f.0: can't enable device: BAR 0
>  >  [mem 0xc0000000-0xc0003fff] not claimed
>  >
>  > Fix this by waiting for the async probe to complete before allowing
>  > remove to make forward progress.
> 
> Hi Sagi,
> 
The only problem is that the async probe may be stuck if its device was
hot-removed, and it needs nvme_remove to unstick it immediately. Otherwise
it waits for the timeout work to fix it up, which can be a very long time.

Given that this is an error case, I don't think that the timeout is a
big issue.

> We only really need to wait for the state to not be new? And if so, is 
> the attached ok?

I don't have a firm reproducer; I was hoping that Li would, since he
reported the problem.

This would work because nvme_dev_disable is serialized with the
shutdown_lock...


* [PATCH] nvme-pci: fix probe and remove race
  2019-07-19 19:42 [PATCH] nvme-pci: fix probe and remove race Sagi Grimberg
  2019-07-20  7:52 ` Minwoo Im
       [not found] ` <CAOSXXT7z4+pScQ+Kf0VauTCvPdRDEXX=H7jQN-Dkk=M2hkTFsA@mail.gmail.com>
@ 2019-07-22 18:26 ` Bart Van Assche
  2019-07-22 22:09   ` Sagi Grimberg
  2019-07-23 20:46 ` Keith Busch
  3 siblings, 1 reply; 9+ messages in thread
From: Bart Van Assche @ 2019-07-22 18:26 UTC (permalink / raw)


On 7/19/19 12:42 PM, Sagi Grimberg wrote:
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 4b508d5e45cf..50061abe49c6 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -127,6 +127,7 @@ struct nvme_dev {
>   	dma_addr_t host_mem_descs_dma;
>   	struct nvme_host_mem_buf_desc *host_mem_descs;
>   	void **host_mem_desc_bufs;
> +	async_cookie_t async_probe;
>   };
>   
>   static int io_queue_depth_set(const char *val, const struct kernel_param *kp)
> @@ -2765,7 +2766,7 @@ static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>   	dev_info(dev->ctrl.device, "pci function %s\n", dev_name(&pdev->dev));
>   
>   	nvme_get_ctrl(&dev->ctrl);
> -	async_schedule(nvme_async_probe, dev);
> +	dev->async_probe = async_schedule(nvme_async_probe, dev);
>   
>   	return 0;
>   
> @@ -2810,6 +2811,8 @@ static void nvme_remove(struct pci_dev *pdev)
>   {
>   	struct nvme_dev *dev = pci_get_drvdata(pdev);
>   
> +	/* wait for async probe to complete */
> +	async_synchronize_cookie(dev->async_probe + 1);
>   	nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_DELETING);
>   	pci_set_drvdata(pdev, NULL);

Hi Sagi,

Does the async_synchronize_cookie() call wait until all previously 
started probes have finished? In other words, does the change in 
nvme_remove() introduce a dependency between probe and remove calls of 
different NVMe devices? Is that dependency important? If not, can
introducing that dependency be avoided?

Thanks,

Bart.


* [PATCH] nvme-pci: fix probe and remove race
  2019-07-22 18:26 ` Bart Van Assche
@ 2019-07-22 22:09   ` Sagi Grimberg
  0 siblings, 0 replies; 9+ messages in thread
From: Sagi Grimberg @ 2019-07-22 22:09 UTC (permalink / raw)



>> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
>> index 4b508d5e45cf..50061abe49c6 100644
>> --- a/drivers/nvme/host/pci.c
>> +++ b/drivers/nvme/host/pci.c
>> @@ -127,6 +127,7 @@ struct nvme_dev {
>>  	dma_addr_t host_mem_descs_dma;
>>  	struct nvme_host_mem_buf_desc *host_mem_descs;
>>  	void **host_mem_desc_bufs;
>> +	async_cookie_t async_probe;
>>  };
>>  
>>  static int io_queue_depth_set(const char *val, const struct kernel_param *kp)
>> @@ -2765,7 +2766,7 @@ static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>>  	dev_info(dev->ctrl.device, "pci function %s\n", dev_name(&pdev->dev));
>>  
>>  	nvme_get_ctrl(&dev->ctrl);
>> -	async_schedule(nvme_async_probe, dev);
>> +	dev->async_probe = async_schedule(nvme_async_probe, dev);
>>  
>>  	return 0;
>>  
>> @@ -2810,6 +2811,8 @@ static void nvme_remove(struct pci_dev *pdev)
>>  {
>>  	struct nvme_dev *dev = pci_get_drvdata(pdev);
>>  
>> +	/* wait for async probe to complete */
>> +	async_synchronize_cookie(dev->async_probe + 1);
>>  	nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_DELETING);
>>  	pci_set_drvdata(pdev, NULL);
> 
> Hi Sagi,
> 
> Does the async_synchronize_cookie() call wait until all previously 
> started probes have finished? In other words, does the change in 
> nvme_remove() introduce a dependency between probe and remove calls of 
> different NVMe devices? Is that dependency important? If not, can
> introducing that dependency be avoided?

It does create a dependency, as they are in the same async domain, I
assume. Given that nvme_remove() is really a hot-unplug (or shutdown)
operation, I am not sure this dependency is a limitation in any real
use case, though..


* [PATCH] nvme-pci: fix probe and remove race
  2019-07-19 19:42 [PATCH] nvme-pci: fix probe and remove race Sagi Grimberg
                   ` (2 preceding siblings ...)
  2019-07-22 18:26 ` Bart Van Assche
@ 2019-07-23 20:46 ` Keith Busch
  2019-07-23 22:21   ` Sagi Grimberg
  3 siblings, 1 reply; 9+ messages in thread
From: Keith Busch @ 2019-07-23 20:46 UTC (permalink / raw)


On Fri, Jul 19, 2019 at 12:42:56PM -0700, Sagi Grimberg wrote:
> It is possible for nvme_remove() to run concurrently with
> nvme_reset_work(), with the following sequence:
> 
> nvme_probe()
>   nvme_init_ctrl()
>     //set to NEW
>   nvme_async_probe()
>                                                       nvme_remove()
>                                                         //can not change to
>                                                         //DELETING from NEW
>     nvme_reset_ctrl_sync()
>         nvme_reset_ctrl()
>           //change from NEW
>           //to RESETTING
>                                                        flush reset_work()
>                                                        //not yet queued
>           queue reset_work
>             nvme_reset_work()
>               ....                                     ....
> 
> With the above running concurrently, strange issues can result, such as
> a kernel crash from an illegal memory access, or messages like:
> kernel: pci 0000:00:1f.0: can't enable device: BAR 0
>  [mem 0xc0000000-0xc0003fff] not claimed
> 
> Fix this by waiting for the async probe to complete before allowing
> remove to make forward progress.
> 
> Reported-by: Li Zhong <lizhongfs at gmail.com>
> Signed-off-by: Sagi Grimberg <sagi at grimberg.me>

I still think we'd prefer not to add that async domain dependency and
rely on a timeout to unstick a hot-removal. So how about we schedule
the reset work in probe and have the async part just flush the reset
and scan work?

---
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index db160cee42ad..0c2c4b0c6655 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2695,7 +2695,7 @@ static void nvme_async_probe(void *data, async_cookie_t cookie)
 {
 	struct nvme_dev *dev = data;
 
-	nvme_reset_ctrl_sync(&dev->ctrl);
+	flush_work(&dev->ctrl.reset_work);
 	flush_work(&dev->ctrl.scan_work);
 	nvme_put_ctrl(&dev->ctrl);
 }
@@ -2761,6 +2761,7 @@ static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 
 	dev_info(dev->ctrl.device, "pci function %s\n", dev_name(&pdev->dev));
 
+	nvme_reset_ctrl(&dev->ctrl);
 	nvme_get_ctrl(&dev->ctrl);
 	async_schedule(nvme_async_probe, dev);
 
--
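The intuition behind this reordering can be sketched with a toy three-state model, where change_state() is a compare-and-swap transition in the spirit of nvme_change_ctrl_state(). The real controller state machine has more states; this is a simplification, not kernel code.

```c
/* Toy model of the ctrl-state race that the reordering above closes. */
#include <stdatomic.h>

enum toy_state { TOY_NEW, TOY_RESETTING, TOY_DELETING };

static _Atomic int ctrl_state = TOY_NEW;

/* Succeeds only if the controller is currently in @from. */
static int change_state(int from, int to)
{
	return atomic_compare_exchange_strong(&ctrl_state, &from, to);
}

int run_demo(void)
{
	/* Keith's ordering: probe moves NEW -> RESETTING synchronously and
	 * queues the reset work before async_schedule() ever runs... */
	int reset_queued = change_state(TOY_NEW, TOY_RESETTING);

	/* ...so a concurrent remove no longer sees NEW: its DELETING
	 * transition happens from RESETTING, and the flush in remove
	 * waits on work that is genuinely queued. */
	int remove_from_new = change_state(TOY_NEW, TOY_DELETING);
	int remove_from_resetting = change_state(TOY_RESETTING, TOY_DELETING);

	return reset_queued && !remove_from_new && remove_from_resetting;
}
```

Because probe performs the NEW -> RESETTING transition before scheduling the async part, remove can never find the controller in NEW with the reset work not yet queued, which is the window the original race diagram shows.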


* [PATCH] nvme-pci: fix probe and remove race
  2019-07-23 20:46 ` Keith Busch
@ 2019-07-23 22:21   ` Sagi Grimberg
  2019-07-23 22:31     ` Keith Busch
  0 siblings, 1 reply; 9+ messages in thread
From: Sagi Grimberg @ 2019-07-23 22:21 UTC (permalink / raw)



> I still think we'd prefer not to add that async domain dependency and
> rely on a timeout to unstick a hot-removal. So how about we schedule
> the reset work in probe and have the async part just flush the reset
> and scan work?
> 
> ---
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index db160cee42ad..0c2c4b0c6655 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -2695,7 +2695,7 @@ static void nvme_async_probe(void *data, async_cookie_t cookie)
>   {
>   	struct nvme_dev *dev = data;
>   
> -	nvme_reset_ctrl_sync(&dev->ctrl);
> +	flush_work(&dev->ctrl.reset_work);
>   	flush_work(&dev->ctrl.scan_work);
>   	nvme_put_ctrl(&dev->ctrl);
>   }
> @@ -2761,6 +2761,7 @@ static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>   
>   	dev_info(dev->ctrl.device, "pci function %s\n", dev_name(&pdev->dev));
>   
> +	nvme_reset_ctrl(&dev->ctrl);
>   	nvme_get_ctrl(&dev->ctrl);

I think you need to get the ref first and then fire the work, right?


* [PATCH] nvme-pci: fix probe and remove race
  2019-07-23 22:21   ` Sagi Grimberg
@ 2019-07-23 22:31     ` Keith Busch
  2019-07-29 22:17       ` Sagi Grimberg
  0 siblings, 1 reply; 9+ messages in thread
From: Keith Busch @ 2019-07-23 22:31 UTC (permalink / raw)


On Tue, Jul 23, 2019 at 03:21:49PM -0700, Sagi Grimberg wrote:
> 
> > I still think we'd prefer not to add that async domain dependency and
> > rely on a timeout to unstick a hot-removal. So how about we schedule
> > the reset work in probe and have the async part just flush the reset
> > and scan work?
> > 
> > ---
> > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> > index db160cee42ad..0c2c4b0c6655 100644
> > --- a/drivers/nvme/host/pci.c
> > +++ b/drivers/nvme/host/pci.c
> > @@ -2695,7 +2695,7 @@ static void nvme_async_probe(void *data, async_cookie_t cookie)
> >   {
> >   	struct nvme_dev *dev = data;
> > -	nvme_reset_ctrl_sync(&dev->ctrl);
> > +	flush_work(&dev->ctrl.reset_work);
> >   	flush_work(&dev->ctrl.scan_work);
> >   	nvme_put_ctrl(&dev->ctrl);
> >   }
> > @@ -2761,6 +2761,7 @@ static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> >   	dev_info(dev->ctrl.device, "pci function %s\n", dev_name(&pdev->dev));
> > +	nvme_reset_ctrl(&dev->ctrl);
> >   	nvme_get_ctrl(&dev->ctrl);
> 
> I think you need to get the ref first and then fire the work, right?

That ref order doesn't actually matter here. We can't call nvme_remove
during the synchronous part of probe, and the extra ref is just for the
async_schedule callback.


* [PATCH] nvme-pci: fix probe and remove race
  2019-07-23 22:31     ` Keith Busch
@ 2019-07-29 22:17       ` Sagi Grimberg
  0 siblings, 0 replies; 9+ messages in thread
From: Sagi Grimberg @ 2019-07-29 22:17 UTC (permalink / raw)



>> I think you need to get the ref first and then fire the work, right?
> 
> That ref order doesn't actually matter here. We can't call nvme_remove
> during the synchronous part of probe, and the extra ref is just for the
> async_schedule callback.

Makes sense.

Can you send a proper patch?

