* [PATCH] NVMe: do not touch sq door bell if nvmeq has been suspended
@ 2016-02-01 15:42 ` Wenbo Wang
  0 siblings, 0 replies; 36+ messages in thread
From: Wenbo Wang @ 2016-02-01 15:42 UTC (permalink / raw)
  To: axboe, keith.busch; +Cc: linux-kernel, Wenbo Wang, Wenbo Wang, linux-nvme

If __nvme_submit_cmd races with nvme_dev_disable, nvmeq
could have been suspended and dev->bar could have been
unmapped. Do not touch sq door bell in this case.

Signed-off-by: Wenbo Wang <wenbo.wang@memblaze.com>
Reviewed-by: Wenwei Tao <wenwei.tao@memblaze.com>
CC: linux-nvme@lists.infradead.org
---
 drivers/nvme/host/pci.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 8b1a725..2288712 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -325,7 +325,8 @@ static void __nvme_submit_cmd(struct nvme_queue *nvmeq,
 
 	if (++tail == nvmeq->q_depth)
 		tail = 0;
-	writel(tail, nvmeq->q_db);
+	if (likely(nvmeq->cq_vector >= 0))
+		writel(tail, nvmeq->q_db);
 	nvmeq->sq_tail = tail;
 }
 
-- 
1.8.3.1

* RE: [PATCH] NVMe: do not touch sq door bell if nvmeq has been suspended
  2016-02-01 15:42 ` Wenbo Wang
@ 2016-02-01 15:59   ` Busch, Keith
  -1 siblings, 0 replies; 36+ messages in thread
From: Busch, Keith @ 2016-02-01 15:59 UTC (permalink / raw)
  To: Wenbo Wang, axboe; +Cc: linux-kernel, Wenbo Wang, linux-nvme

Does this ever happen? The queue should be stopped before the bar is unmapped. If that's insufficient to guard against this, we've another problem this patch does not cover. That command will just timeout since it was accepted by the driver, but not actually submitted to anything. The request needs to be requeued or return BLK_MQ_RQ_QUEUE_BUSY.
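
A minimal sketch of that alternative, assuming the 4.5-era nvme_queue fields (q_lock, cq_vector) and the blk-mq BLK_MQ_RQ_QUEUE_* return codes; the check below is an illustration of the idea, not the driver's actual code:

	static int nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
				 const struct blk_mq_queue_data *bd)
	{
		struct nvme_queue *nvmeq = hctx->driver_data;

		spin_lock_irq(&nvmeq->q_lock);
		if (unlikely(nvmeq->cq_vector < 0)) {
			/* Queue suspended: hand the request back to blk-mq. */
			spin_unlock_irq(&nvmeq->q_lock);
			return BLK_MQ_RQ_QUEUE_BUSY;
		}
		/* ... map data and submit the command as before ... */
		spin_unlock_irq(&nvmeq->q_lock);
		return BLK_MQ_RQ_QUEUE_OK;
	}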

> -----Original Message-----
> From: Wenbo Wang [mailto:mail_weber_wang@163.com]
> Sent: Monday, February 01, 2016 8:42 AM
> To: axboe@fb.com; Busch, Keith
> Cc: linux-kernel@vger.kernel.org; Wenbo Wang; Wenbo Wang; linux-nvme@lists.infradead.org
> Subject: [PATCH] NVMe: do not touch sq door bell if nvmeq has been suspended
> 
> If __nvme_submit_cmd races with nvme_dev_disable, nvmeq
> could have been suspended and dev->bar could have been
> unmapped. Do not touch sq door bell in this case.
> 
> Signed-off-by: Wenbo Wang <wenbo.wang@memblaze.com>
> Reviewed-by: Wenwei Tao <wenwei.tao@memblaze.com>
> CC: linux-nvme@lists.infradead.org
> ---
>  drivers/nvme/host/pci.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 8b1a725..2288712 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -325,7 +325,8 @@ static void __nvme_submit_cmd(struct nvme_queue *nvmeq,
> 
>  	if (++tail == nvmeq->q_depth)
>  		tail = 0;
> -	writel(tail, nvmeq->q_db);
> +	if (likely(nvmeq->cq_vector >= 0))
> +		writel(tail, nvmeq->q_db);
>  	nvmeq->sq_tail = tail;
>  }
> 
> --
> 1.8.3.1

* RE: [PATCH] NVMe: do not touch sq door bell if nvmeq has been suspended
  2016-02-01 15:59   ` Busch, Keith
@ 2016-02-01 16:17     ` Wenbo Wang
  -1 siblings, 0 replies; 36+ messages in thread
From: Wenbo Wang @ 2016-02-01 16:17 UTC (permalink / raw)
  To: Busch, Keith, Wenbo Wang, axboe; +Cc: linux-kernel, linux-nvme

No, this issue has not been seen yet. 

Since blk_mq_run_hw_queue only tests the stopped flag at the very beginning, it should still be possible for it to race with nvme_dev_disable.

I agree that the request should be requeued.

-----Original Message-----
From: Busch, Keith [mailto:keith.busch@intel.com] 
Sent: Tuesday, February 2, 2016 12:00 AM
To: Wenbo Wang; axboe@fb.com
Cc: linux-kernel@vger.kernel.org; Wenbo Wang; linux-nvme@lists.infradead.org
Subject: RE: [PATCH] NVMe: do not touch sq door bell if nvmeq has been suspended

Does this ever happen? The queue should be stopped before the bar is unmapped. If that's insufficient to guard against this, we've another problem this patch does not cover. That command will just timeout since it was accepted by the driver, but not actually submitted to anything. The request needs to be requeued or return BLK_MQ_RQ_QUEUE_BUSY.

> -----Original Message-----
> From: Wenbo Wang [mailto:mail_weber_wang@163.com]
> Sent: Monday, February 01, 2016 8:42 AM
> To: axboe@fb.com; Busch, Keith
> Cc: linux-kernel@vger.kernel.org; Wenbo Wang; Wenbo Wang; 
> linux-nvme@lists.infradead.org
> Subject: [PATCH] NVMe: do not touch sq door bell if nvmeq has been 
> suspended
> 
> If __nvme_submit_cmd races with nvme_dev_disable, nvmeq could have 
> been suspended and dev->bar could have been unmapped. Do not touch sq 
> door bell in this case.
> 
> Signed-off-by: Wenbo Wang <wenbo.wang@memblaze.com>
> Reviewed-by: Wenwei Tao <wenwei.tao@memblaze.com>
> CC: linux-nvme@lists.infradead.org
> ---
>  drivers/nvme/host/pci.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index 
> 8b1a725..2288712 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -325,7 +325,8 @@ static void __nvme_submit_cmd(struct nvme_queue 
> *nvmeq,
> 
>  	if (++tail == nvmeq->q_depth)
>  		tail = 0;
> -	writel(tail, nvmeq->q_db);
> +	if (likely(nvmeq->cq_vector >= 0))
> +		writel(tail, nvmeq->q_db);
>  	nvmeq->sq_tail = tail;
>  }
> 
> --
> 1.8.3.1

* Re: [PATCH] NVMe: do not touch sq door bell if nvmeq has been suspended
  2016-02-01 15:42 ` Wenbo Wang
@ 2016-02-01 16:54   ` Jens Axboe
  -1 siblings, 0 replies; 36+ messages in thread
From: Jens Axboe @ 2016-02-01 16:54 UTC (permalink / raw)
  To: Wenbo Wang, keith.busch; +Cc: linux-kernel, Wenbo Wang, linux-nvme

On 02/01/2016 08:42 AM, Wenbo Wang wrote:
> If __nvme_submit_cmd races with nvme_dev_disable, nvmeq
> could have been suspended and dev->bar could have been
> unmapped. Do not touch sq door bell in this case.
>
> Signed-off-by: Wenbo Wang <wenbo.wang@memblaze.com>
> Reviewed-by: Wenwei Tao <wenwei.tao@memblaze.com>
> CC: linux-nvme@lists.infradead.org
> ---
>   drivers/nvme/host/pci.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 8b1a725..2288712 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -325,7 +325,8 @@ static void __nvme_submit_cmd(struct nvme_queue *nvmeq,
>
>   	if (++tail == nvmeq->q_depth)
>   		tail = 0;
> -	writel(tail, nvmeq->q_db);
> +	if (likely(nvmeq->cq_vector >= 0))
> +		writel(tail, nvmeq->q_db);
>   	nvmeq->sq_tail = tail;

What Keith said (this should not happen), and additionally, this won't 
work for a polled CQ without a vector.

-- 
Jens Axboe

* [PATCH] NVMe: do not touch sq door bell if nvmeq has been suspended
  2016-02-01 16:54   ` Jens Axboe
  (?)
@ 2016-02-02  7:15   ` Wenbo Wang
  2016-02-02 12:41       ` Sagi Grimberg
                       ` (2 more replies)
  -1 siblings, 3 replies; 36+ messages in thread
From: Wenbo Wang @ 2016-02-02  7:15 UTC (permalink / raw)


Jens,

I did the following test to validate the issue.

1. Modify code as below to increase the chance of races.
	Add 10s delay after nvme_dev_unmap() in nvme_dev_disable()
	Add 10s delay before __nvme_submit_cmd()
2. Run dd and, at the same time, echo 1 to reset_controller to trigger a device reset. Eventually the kernel crashes while accessing the unmapped doorbell register.

Following is the execution order of the two code paths:
__blk_mq_run_hw_queue
  Test BLK_MQ_S_STOPPED
					nvme_dev_disable()
					     nvme_stop_queues()  <-- set BLK_MQ_S_STOPPED
					     nvme_dev_unmap(dev)  <-- unmap door bell
  nvme_queue_rq()
      Touch door bell	<-- panic here
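
The window sits between the one-time BLK_MQ_S_STOPPED test at the top of blk_mq_run_hw_queue() and the later dispatch; roughly (abridged from the 4.5-era blk-mq code, with comments marking the race):

	void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
	{
		/* The stopped flag is only sampled once, on entry. */
		if (unlikely(test_bit(BLK_MQ_S_STOPPED, &hctx->state)))
			return;

		/*
		 * nvme_dev_disable() may set BLK_MQ_S_STOPPED and unmap the
		 * BAR anywhere from here on ...
		 */
		if (!async) {
			int cpu = get_cpu();
			if (cpumask_test_cpu(cpu, hctx->cpumask)) {
				/* ... and queue_rq still writes the doorbell. */
				__blk_mq_run_hw_queue(hctx);
				put_cpu();
				return;
			}
			put_cpu();
		}

		/* Otherwise the dispatch is deferred to kblockd work. */
		kblockd_schedule_delayed_work_on(blk_mq_hctx_next_cpu(hctx),
				&hctx->run_work, 0);
	}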

-----Original Message-----
From: Jens Axboe [mailto:axboe@fb.com] 
Sent: Tuesday, February 2, 2016 12:54 AM
To: Wenbo Wang; keith.busch at intel.com
Cc: linux-kernel at vger.kernel.org; Wenbo Wang; linux-nvme at lists.infradead.org
Subject: Re: [PATCH] NVMe: do not touch sq door bell if nvmeq has been suspended

On 02/01/2016 08:42 AM, Wenbo Wang wrote:
> If __nvme_submit_cmd races with nvme_dev_disable, nvmeq could have 
> been suspended and dev->bar could have been unmapped. Do not touch sq 
> door bell in this case.
>
> Signed-off-by: Wenbo Wang <wenbo.wang at memblaze.com>
> Reviewed-by: Wenwei Tao <wenwei.tao at memblaze.com>
> CC: linux-nvme at lists.infradead.org
> ---
>   drivers/nvme/host/pci.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index 
> 8b1a725..2288712 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -325,7 +325,8 @@ static void __nvme_submit_cmd(struct nvme_queue 
> *nvmeq,
>
>   	if (++tail == nvmeq->q_depth)
>   		tail = 0;
> -	writel(tail, nvmeq->q_db);
> +	if (likely(nvmeq->cq_vector >= 0))
> +		writel(tail, nvmeq->q_db);
>   	nvmeq->sq_tail = tail;

What Keith said (this should not happen), and additionally, this won't work for a polled CQ without a vector.

--
Jens Axboe

* Re: [PATCH] NVMe: do not touch sq door bell if nvmeq has been suspended
  2016-02-02  7:15   ` Wenbo Wang
@ 2016-02-02 12:41       ` Sagi Grimberg
  2016-02-02 17:25       ` Keith Busch
  2016-02-03 14:41       ` Keith Busch
  2 siblings, 0 replies; 36+ messages in thread
From: Sagi Grimberg @ 2016-02-02 12:41 UTC (permalink / raw)
  To: Wenbo Wang, Jens Axboe, Wenbo Wang, keith.busch
  Cc: Wenwei.Tao, linux-kernel, linux-nvme


> Jens,
>
> I did the following test to validate the issue.
>
> 1. Modify code as below to increase the chance of races.
> 	Add 10s delay after nvme_dev_unmap() in nvme_dev_disable()
> 	Add 10s delay before __nvme_submit_cmd()
> 2. Run dd and at the same time, echo 1 to reset_controller to trigger device reset. Finally kernel crashes due to accessing unmapped door bell register.
>
> Following is the execution order of the two code paths:
> __blk_mq_run_hw_queue
>    Test BLK_MQ_S_STOPPED
> 					nvme_dev_disable()
> 					     nvme_stop_queues()  <-- set BLK_MQ_S_STOPPED
> 					     nvme_dev_unmap(dev)  <-- unmap door bell
>    nvme_queue_rq()
>        Touch door bell	<-- panic here

First of all, I think we need to cancel all
inflight requests before nvme_dev_unmap.

With my patches that move I/O termination to the
nvme core ([PATCH v1 0/3] Move active IO termination to the core)
the change needed is:
--
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index e921165..2288bdb 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1890,10 +1890,11 @@ static void nvme_dev_shutdown(struct nvme_dev *dev)
                 nvme_shutdown_ctrl(&dev->ctrl);
                 nvme_disable_queue(dev, 0);
         }
-       nvme_dev_unmap(dev);

         blk_mq_tagset_busy_iter(&dev->tagset, nvme_cancel_io, dev);
         blk_mq_tagset_busy_iter(&dev->admin_tagset, nvme_cancel_io, dev);
+
+       nvme_dev_unmap(dev);
  }

  static int nvme_setup_prp_pools(struct nvme_dev *dev)
--

But we still need a way to wait for all active
queue_rq calls to end. It seems like we need to maintain
a request_fn_active counter for blk-mq and provide an
API (blk_mq_wait_for_active_requests?) that waits for
it to drop to zero.

Thoughts?
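
One purely hypothetical shape for such an API; the run_active counter and run_wait waitqueue below do not exist, they only illustrate mirroring request_fn_active for blk-mq:

	static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
	{
		if (unlikely(test_bit(BLK_MQ_S_STOPPED, &hctx->state)))
			return;

		atomic_inc(&hctx->run_active);		/* hypothetical field */
		/* ... flush the ctx rq_lists and call q->mq_ops->queue_rq() ... */
		atomic_dec(&hctx->run_active);
		wake_up(&hctx->run_wait);		/* hypothetical waitqueue */
	}

	void blk_mq_wait_for_active_requests(struct request_queue *q)
	{
		struct blk_mq_hw_ctx *hctx;
		int i;

		queue_for_each_hw_ctx(q, hctx, i)
			wait_event(hctx->run_wait,
				   atomic_read(&hctx->run_active) == 0);
	}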

* Re: [PATCH] NVMe: do not touch sq door bell if nvmeq has been suspended
  2016-02-02 12:41       ` Sagi Grimberg
@ 2016-02-02 14:27         ` Keith Busch
  -1 siblings, 0 replies; 36+ messages in thread
From: Keith Busch @ 2016-02-02 14:27 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Wenbo Wang, Jens Axboe, Wenbo Wang, Wenwei.Tao, linux-kernel, linux-nvme

On Tue, Feb 02, 2016 at 02:41:37PM +0200, Sagi Grimberg wrote:
> First of all, I think we need to cancel all
> inflight requests before nvme_dev_unmap.

IO cancelling is where it is because it protects against host memory
corruption. If you're going to mess with the ordering, just make sure
the PCI device is disabled from bus mastering first.

* Re: [PATCH] NVMe: do not touch sq door bell if nvmeq has been suspended
  2016-02-02 14:27         ` Keith Busch
@ 2016-02-02 14:33           ` Sagi Grimberg
  -1 siblings, 0 replies; 36+ messages in thread
From: Sagi Grimberg @ 2016-02-02 14:33 UTC (permalink / raw)
  To: Keith Busch
  Cc: Wenbo Wang, Jens Axboe, Wenbo Wang, Wenwei.Tao, linux-kernel, linux-nvme

Hey Keith,

>> First of all, I think we need to cancel all
>> inflight requests before nvme_dev_unmap.
>
> IO cancelling is where it is because it protects against host memory
> corruption. If you're going to mess with the ordering, just make sure
> the PCI device is disabled from bus mastering first.

Little help? :)

What corruption is the ordering protecting against?

* Re: [PATCH] NVMe: do not touch sq door bell if nvmeq has been suspended
  2016-02-02 14:33           ` Sagi Grimberg
@ 2016-02-02 14:46             ` Keith Busch
  -1 siblings, 0 replies; 36+ messages in thread
From: Keith Busch @ 2016-02-02 14:46 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Wenbo Wang, Jens Axboe, Wenbo Wang, Wenwei.Tao, linux-kernel, linux-nvme

On Tue, Feb 02, 2016 at 04:33:10PM +0200, Sagi Grimberg wrote:
> Hey Keith,
> 
> >>First of all, I think we need to cancel all
> >>inflight requests before nvme_dev_unmap.
> >
> >IO cancelling is where it is because it protects against host memory
> >corruption. If you're going to mess with the ordering, just make sure
> >the PCI device is disabled from bus mastering first.
> 
> Little help? :)
> 
> What corruption is the ordering protecting against?

Sure thing. :)

We free the transfer buffers when a command is cancelled. The controller,
however, may still own the command and may try to write to them. We
have to fence the controller off from being able to do that, so we can't
cancel inflight commands while the PCI device is still bus master enabled.

In a perfect world, we could trust in disabling with NVMe registers,
but sometimes we can't rely on that.

This was commit 07836e659c81ec6b0d683dfbf7958339a22a7b69, which might
explain the scenario a little better; it was reported by an end user.
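
A sketch of the resulting ordering rule, with the fence made explicit; the helper name is invented and the cancel calls follow Sagi's diff above, so this only illustrates "no cancelling while the device can still DMA":

	static void nvme_fence_and_cancel(struct nvme_dev *dev)
	{
		struct pci_dev *pdev = to_pci_dev(dev->dev);

		/* Stop the controller from writing into host buffers first. */
		pci_clear_master(pdev);

		/* Only now is it safe to cancel commands and free their buffers. */
		blk_mq_tagset_busy_iter(&dev->tagset, nvme_cancel_io, dev);
		blk_mq_tagset_busy_iter(&dev->admin_tagset, nvme_cancel_io, dev);
	}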

* Re: [PATCH] NVMe: do not touch sq door bell if nvmeq has been suspended
  2016-02-02 14:46             ` Keith Busch
@ 2016-02-02 17:20               ` Sagi Grimberg
  -1 siblings, 0 replies; 36+ messages in thread
From: Sagi Grimberg @ 2016-02-02 17:20 UTC (permalink / raw)
  To: Keith Busch
  Cc: Wenbo Wang, Jens Axboe, Wenbo Wang, Wenwei.Tao, linux-kernel, linux-nvme


> We free the transfer buffers when a command is cancelled. The controller,
> however, may still own the command and may try to write to them. We
> have to fence the controller off from being able to do that, so we can't
> cancel inflight commands while the PCI device is still bus master enabled.
>
> In a perfect world, we could trust in disabling with NVMe registers,
> but sometimes we can't rely on that.

OK, I wasn't aware that we cannot rely on that.

So it looks like we cannot change the ordering, which leaves us with
the need to guarantee that no queue_rq is in flight before we unmap.

* Re: [PATCH] NVMe: do not touch sq door bell if nvmeq has been suspended
  2016-02-02  7:15   ` Wenbo Wang
@ 2016-02-02 17:25       ` Keith Busch
  2016-02-02 17:25       ` Keith Busch
  2016-02-03 14:41       ` Keith Busch
  2 siblings, 0 replies; 36+ messages in thread
From: Keith Busch @ 2016-02-02 17:25 UTC (permalink / raw)
  To: Wenbo Wang; +Cc: Jens Axboe, Wenbo Wang, linux-kernel, linux-nvme, Wenwei.Tao

On Tue, Feb 02, 2016 at 07:15:57AM +0000, Wenbo Wang wrote:
> Jens,
> 
> I did the following test to validate the issue.
> 
> 1. Modify code as below to increase the chance of races.
> 	Add 10s delay after nvme_dev_unmap() in nvme_dev_disable()
> 	Add 10s delay before __nvme_submit_cmd()

If running sync IO, preempt is disabled. You can't just put a 10 second
delay there. Wouldn't you hit a "scheduling while atomic" bug instead?

If blk-mq is running the h/w context from its work queue, that might
be a different issue. Maybe we can change the "cancel_delayed_work" to
"cancel_delayed_work_sync" in blk_mq_stop_hw_queues.

If there's still a window where blk-mq can insert a request after the
driver requested to stop queues, I think we should try to close it with
the block layer.
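
The suggested change would look roughly like this against the 4.5-era per-hctx helper; cancel_delayed_work() does not wait for a work item that is already running, while the _sync variant does:

	void blk_mq_stop_hw_queue(struct blk_mq_hw_ctx *hctx)
	{
		/* _sync waits for an already-executing run_work/delay_work. */
		cancel_delayed_work_sync(&hctx->run_work);
		cancel_delayed_work_sync(&hctx->delay_work);
		set_bit(BLK_MQ_S_STOPPED, &hctx->state);
	}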

* [PATCH] NVMe: do not touch sq door bell if nvmeq has been suspended
  2016-02-02 17:25       ` Keith Busch
  (?)
@ 2016-02-03  0:19       ` Wenbo Wang
  -1 siblings, 0 replies; 36+ messages in thread
From: Wenbo Wang @ 2016-02-03  0:19 UTC (permalink / raw)


I used mdelay, not sleep.



-----Original Message-----
From: Keith Busch [mailto:keith.busch@intel.com] 
Sent: Wednesday, February 3, 2016 1:25 AM
To: Wenbo Wang
Cc: Jens Axboe; Wenbo Wang; linux-kernel at vger.kernel.org; linux-nvme at lists.infradead.org; Wenwei.Tao
Subject: Re: [PATCH] NVMe: do not touch sq door bell if nvmeq has been suspended

On Tue, Feb 02, 2016@07:15:57AM +0000, Wenbo Wang wrote:
> Jens,
> 
> I did the following test to validate the issue.
> 
> 1. Modify code as below to increase the chance of races.
> 	Add 10s delay after nvme_dev_unmap() in nvme_dev_disable()
> 	Add 10s delay before __nvme_submit_cmd()

If running sync IO, preempt is disabled. You can't just put a 10 second delay there. Wouldn't you hit a "scheduling while atomic" bug instead?

If blk-mq is running the h/w context from its work queue, that might be a different issue. Maybe we can change the "cancel_delayed_work" to "cancel_delayed_work_sync" in blk_mq_stop_hw_queues.

If there's still a window where blk-mq can insert a request after the driver requested to stop queues, I think we should try to close it with the block layer.

* [PATCH] NVMe: do not touch sq door bell if nvmeq has been suspended
  2016-02-02 17:20               ` Sagi Grimberg
  (?)
@ 2016-02-03  0:49               ` Wenbo Wang
  -1 siblings, 0 replies; 36+ messages in thread
From: Wenbo Wang @ 2016-02-03  0:49 UTC (permalink / raw)


The nvme interrupt handler also touches the cq doorbell, so it must be taken care of before the unmap as well. Currently it checks cq_vector >= 0 to avoid touching the unmapped doorbell.

I think it is better to let queue_rq and the interrupt handler use the same mechanism to detect the unmap, either in the driver itself (e.g. cq_vector) or in the block layer (e.g. a new API that guarantees no in-flight IO on a hw queue).


-----Original Message-----
From: Sagi Grimberg [mailto:sagig@dev.mellanox.co.il] 
Sent: Wednesday, February 3, 2016 1:20 AM
To: Keith Busch
Cc: Wenbo Wang; Jens Axboe; Wenbo Wang; Wenwei.Tao; linux-kernel at vger.kernel.org; linux-nvme at lists.infradead.org
Subject: Re: [PATCH] NVMe: do not touch sq door bell if nvmeq has been suspended


> We free the transfer buffers when a command is cancelled. The 
> controller, however, may still own the command and may try to write to 
> them. We have to fence the controller off from being able to do that, 
> so we can't cancel inflight commands while the PCI device is still bus master enabled.
>
> In a perfect world, we could trust in disabling with NVMe registers, 
> but sometimes we can't rely on that.

OK, I wasn't aware that we cannot rely on that.

So it looks like we cannot change the ordering. So this leaves us with the need to guarantee that no queue_rq is inflight before we unmap.

* Re: [PATCH] NVMe: do not touch sq door bell if nvmeq has been suspended
  2016-02-02  7:15   ` Wenbo Wang
@ 2016-02-03 14:41       ` Keith Busch
  2016-02-02 17:25       ` Keith Busch
  2016-02-03 14:41       ` Keith Busch
  2 siblings, 0 replies; 36+ messages in thread
From: Keith Busch @ 2016-02-03 14:41 UTC (permalink / raw)
  To: Wenbo Wang; +Cc: Jens Axboe, Wenbo Wang, linux-kernel, linux-nvme, Wenwei.Tao

On Tue, Feb 02, 2016 at 07:15:57AM +0000, Wenbo Wang wrote:
> I did the following test to validate the issue.
> 
> 1. Modify code as below to increase the chance of races.
> 	Add 10s delay after nvme_dev_unmap() in nvme_dev_disable()
> 	Add 10s delay before __nvme_submit_cmd()
> 2. Run dd and at the same time, echo 1 to reset_controller to trigger device reset. Finally kernel crashes due to accessing unmapped door bell register.
> 
> Following is the execution order of the two code paths:
> __blk_mq_run_hw_queue
>   Test BLK_MQ_S_STOPPED
> 					nvme_dev_disable()
> 					     nvme_stop_queues()  <-- set BLK_MQ_S_STOPPED
> 					     nvme_dev_unmap(dev)  <-- unmap door bell
>   nvme_queue_rq()
>       Touch door bell	<-- panic here

Does the following force the first to complete before the unmap?

---
@@ -1415,10 +1421,21 @@ void nvme_stop_queues(struct nvme_ctrl *ctrl)
 
 		blk_mq_cancel_requeue_work(ns->queue);
 		blk_mq_stop_hw_queues(ns->queue);
+		blk_sync_queue(ns->queue);
 	}
 	mutex_unlock(&ctrl->namespaces_mutex);
 }
--

* [PATCH] NVMe: do not touch sq door bell if nvmeq has been suspended
  2016-02-03 14:41       ` Keith Busch
  (?)
@ 2016-02-03 16:35       ` Wenbo Wang
  2016-02-03 16:38           ` Keith Busch
  -1 siblings, 1 reply; 36+ messages in thread
From: Wenbo Wang @ 2016-02-03 16:35 UTC (permalink / raw)


For async IO (executed by the run_work worker) it should work. However, for sync IO in blk_mq_run_hw_queue this does not seem to help; there is still a window.

-----Original Message-----
From: Keith Busch [mailto:keith.busch@intel.com] 
Sent: Wednesday, February 3, 2016 10:41 PM
To: Wenbo Wang
Cc: Jens Axboe; Wenbo Wang; linux-kernel at vger.kernel.org; linux-nvme at lists.infradead.org; Wenwei.Tao
Subject: Re: [PATCH] NVMe: do not touch sq door bell if nvmeq has been suspended

On Tue, Feb 02, 2016@07:15:57AM +0000, Wenbo Wang wrote:
> I did the following test to validate the issue.
> 
> 1. Modify code as below to increase the chance of races.
> 	Add 10s delay after nvme_dev_unmap() in nvme_dev_disable()
> 	Add 10s delay before __nvme_submit_cmd() 2. Run dd and at the same 
> time, echo 1 to reset_controller to trigger device reset. Finally kernel crashes due to accessing unmapped door bell register.
> 
> Following is the execution order of the two code paths:
> __blk_mq_run_hw_queue
>   Test BLK_MQ_S_STOPPED
> 					nvme_dev_disable()
> 					     nvme_stop_queues()  <-- set BLK_MQ_S_STOPPED
> 					     nvme_dev_unmap(dev)  <-- unmap door bell
>   nvme_queue_rq()
>       Touch door bell	<-- panic here

Does the following force the first to complete before the unmap?

---
@@ -1415,10 +1421,21 @@ void nvme_stop_queues(struct nvme_ctrl *ctrl)
 
 		blk_mq_cancel_requeue_work(ns->queue);
 		blk_mq_stop_hw_queues(ns->queue);
+		blk_sync_queue(ns->queue);
 	}
 	mutex_unlock(&ctrl->namespaces_mutex);
 }
--

* Re: [PATCH] NVMe: do not touch sq door bell if nvmeq has been suspended
  2016-02-03 16:35       ` Wenbo Wang
@ 2016-02-03 16:38           ` Keith Busch
  0 siblings, 0 replies; 36+ messages in thread
From: Keith Busch @ 2016-02-03 16:38 UTC (permalink / raw)
  To: Wenbo Wang; +Cc: Jens Axboe, Wenbo Wang, linux-kernel, linux-nvme, Wenwei.Tao

On Wed, Feb 03, 2016 at 04:35:03PM +0000, Wenbo Wang wrote:
> For async io (executed by run_work worker) it should work. However for sync io in blk_mq_run_hw_queue, this seems not help, there is still a window.

Alright, for lack of a better way to sync a stopped queue, the driver's
indication looks like the best we can do at this point.

* [PATCH] NVMe: do not touch sq door bell if nvmeq has been suspended
  2016-02-03 16:38           ` Keith Busch
  (?)
@ 2016-02-06 14:32           ` Wenbo Wang
  2016-02-07 13:41               ` Sagi Grimberg
  2016-02-08 15:01               ` Keith Busch
  -1 siblings, 2 replies; 36+ messages in thread
From: Wenbo Wang @ 2016-02-06 14:32 UTC (permalink / raw)


Keith,

Is the following solution OK?
synchronize_rcu guarantees that no queue_rq is running concurrently with the device disable code. Together with your other patch (adding blk_sync_queue), both the sync and async paths should be handled correctly.

Do you think synchronize_rcu shall be added to blk_sync_queue?

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 4c0622f..bfe9132 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -865,7 +865,9 @@ void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
        if (!async) {
                int cpu = get_cpu();
                if (cpumask_test_cpu(cpu, hctx->cpumask)) {
+                       rcu_read_lock();
                        __blk_mq_run_hw_queue(hctx);
+                       rcu_read_unlock();
                        put_cpu();
                        return;
                }

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 8b1a725..d1e5e63 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1861,6 +1861,7 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
                nvme_disable_io_queues(dev);
                nvme_disable_admin_queue(dev, shutdown);
        }
+       synchronize_rcu();
        nvme_dev_unmap(dev);

        for (i = dev->queue_count - 1; i >= 0; i--)

-----Original Message-----
From: Keith Busch [mailto:keith.busch@intel.com] 
Sent: Thursday, February 4, 2016 12:38 AM
To: Wenbo Wang
Cc: Jens Axboe; Wenbo Wang; linux-kernel at vger.kernel.org; linux-nvme at lists.infradead.org; Wenwei.Tao
Subject: Re: [PATCH] NVMe: do not touch sq door bell if nvmeq has been suspended

On Wed, Feb 03, 2016@04:35:03PM +0000, Wenbo Wang wrote:
> For async io (executed by run_work worker) it should work. However for sync io in blk_mq_run_hw_queue, this seems not help, there is still a window.

Alright, for lack of a better way to sync a stopped queue, the driver's indication looks like the best we can do at this point.

* Re: [PATCH] NVMe: do not touch sq door bell if nvmeq has been suspended
  2016-02-06 14:32           ` Wenbo Wang
@ 2016-02-07 13:41               ` Sagi Grimberg
  2016-02-08 15:01               ` Keith Busch
  1 sibling, 0 replies; 36+ messages in thread
From: Sagi Grimberg @ 2016-02-07 13:41 UTC (permalink / raw)
  To: Wenbo Wang, Keith Busch
  Cc: Jens Axboe, Wenwei.Tao, Wenbo Wang, linux-nvme, linux-kernel


> Keith,
>
> Is the following solution OK?
> synchronize_rcu guarantee that no queue_rq is running concurrently with device disable code.
> Together with your another patch (adding blk_sync_queue), both sync/async path shall be handled correctly.

This can be acceptable I think.

> Do you think synchronize_rcu shall be added to blk_sync_queue?

Or, we'll add it to blk_mq_stop_hw_queues() and then scsi
will enjoy it as well.

> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 4c0622f..bfe9132 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -865,7 +865,9 @@ void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
>          if (!async) {
>                  int cpu = get_cpu();
>                  if (cpumask_test_cpu(cpu, hctx->cpumask)) {
> +                       rcu_read_lock();
>                          __blk_mq_run_hw_queue(hctx);
> +                       rcu_read_unlock();

I think the rcu is better folded into __blk_mq_run_hw_queue
to cover all the call sites.
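
Folded in, it would look roughly like this (sketch only; the dispatch loop is elided), so the kblockd and delay_work paths are covered as well as the get_cpu() path:

	static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
	{
		if (unlikely(test_bit(BLK_MQ_S_STOPPED, &hctx->state)))
			return;

		rcu_read_lock();
		/* ... flush ctx rq_lists and call q->mq_ops->queue_rq() ... */
		rcu_read_unlock();
	}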

* Re: [PATCH] NVMe: do not touch sq door bell if nvmeq has been suspended
  2016-02-06 14:32           ` Wenbo Wang
@ 2016-02-08 15:01               ` Keith Busch
  2016-02-08 15:01               ` Keith Busch
  1 sibling, 0 replies; 36+ messages in thread
From: Keith Busch @ 2016-02-08 15:01 UTC (permalink / raw)
  To: Wenbo Wang; +Cc: Jens Axboe, Wenbo Wang, linux-kernel, linux-nvme, Wenwei.Tao

On Sat, Feb 06, 2016 at 02:32:24PM +0000, Wenbo Wang wrote:
> Keith,
> 
> Is the following solution OK?
> synchronize_rcu guarantee that no queue_rq is running concurrently with device disable code. Together with your another patch (adding blk_sync_queue), both sync/async path shall be handled correctly.
> 
> Do you think synchronize_rcu shall be added to blk_sync_queue?

I was nearly going to suggest the same last week, but it feels wrong since
no one takes rcu_read_lock in the path we're trying to synchronize. Is
this safe if the task is interrupted?

* [PATCH] NVMe: do not touch sq door bell if nvmeq has been suspended
  2016-02-08 15:01               ` Keith Busch
  (?)
@ 2016-02-09 11:22               ` Wenbo Wang
  2016-02-09 22:55                   ` Keith Busch
  -1 siblings, 1 reply; 36+ messages in thread
From: Wenbo Wang @ 2016-02-09 11:22 UTC (permalink / raw)


In most cases, rcu_read_lock is just a preempt_disable, which is what get_cpu does. I don't see any risk.
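
For reference, on a kernel built without CONFIG_PREEMPT_RCU the two primitives reduce to essentially the same operation; simplified from the generic definitions:

	/* include/linux/rcupdate.h, !CONFIG_PREEMPT_RCU (simplified) */
	static inline void __rcu_read_lock(void)
	{
		preempt_disable();
	}

	/* include/linux/smp.h */
	#define get_cpu()	({ preempt_disable(); smp_processor_id(); })
	#define put_cpu()	preempt_enable()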

-----Original Message-----
From: Keith Busch [mailto:keith.busch@intel.com] 
Sent: Monday, February 8, 2016 11:02 PM
To: Wenbo Wang
Cc: Jens Axboe; Wenbo Wang; linux-kernel at vger.kernel.org; linux-nvme at lists.infradead.org; Wenwei.Tao
Subject: Re: [PATCH] NVMe: do not touch sq door bell if nvmeq has been suspended

On Sat, Feb 06, 2016@02:32:24PM +0000, Wenbo Wang wrote:
> Keith,
> 
> Is the following solution OK?
> synchronize_rcu guarantee that no queue_rq is running concurrently with device disable code. Together with your another patch (adding blk_sync_queue), both sync/async path shall be handled correctly.
> 
> Do you think synchronize_rcu shall be added to blk_sync_queue?

I was nearly going to suggest the same last week, but it feels wrong since no one takes rcu_read_lock in the path we're trying to sychronoize. Is this safe if the task is interrupted?

* Re: [PATCH] NVMe: do not touch sq door bell if nvmeq has been suspended
  2016-02-09 11:22               ` Wenbo Wang
@ 2016-02-09 22:55                   ` Keith Busch
  0 siblings, 0 replies; 36+ messages in thread
From: Keith Busch @ 2016-02-09 22:55 UTC (permalink / raw)
  To: Wenbo Wang; +Cc: Jens Axboe, Wenbo Wang, linux-kernel, linux-nvme, Wenwei.Tao

On Tue, Feb 09, 2016 at 11:22:04AM +0000, Wenbo Wang wrote:
> In most cases, rcu read lock is just a preempt_disable, which is what get_cpu does. I don't see any risk.

Yes, many rcu_read_lock cases expand similarly to get_cpu. What about
the other cases?

FWIW, I don't think we'll hit the problem with the proposed rcu sync.
Heck, we don't even have a synchronize today and we don't hit the
theoretical problem.
