* [PATCH] nvme-core: flush namespace scanning work just before removing namespaces
@ 2018-11-21 23:17 Sagi Grimberg
  2018-11-27 22:11 ` Ewan D. Milne
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Sagi Grimberg @ 2018-11-21 23:17 UTC


nvme_stop_ctrl() can also be called from the reset flow, where there is no
need to flush scan_work because namespaces are not being removed. Flushing
it there can deadlock the rdma, fc and loop drivers, since nvme_stop_ctrl()
acts as a barrier before controller teardown (and specifically the
cancellation of I/O issued by scan_work itself) takes place. scan_work is
blocked at that point anyway, so there is no need to flush it.

Instead, move the scan_work flush to nvme_remove_namespaces(), where it is
actually needed.

Reported-by: Ming Lei <ming.lei at redhat.com>
Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
---
This is a stable candidate...

 drivers/nvme/host/core.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index c418b7a347e0..2e0571584e7a 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3327,6 +3327,9 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
 	struct nvme_ns *ns, *next;
 	LIST_HEAD(ns_list);
 
+	/* prevent racing with ns scanning */
+	flush_work(&ctrl->scan_work);
+
 	/*
 	 * The dead states indicates the controller was not gracefully
 	 * disconnected. In that case, we won't be able to flush any data while
@@ -3489,7 +3492,6 @@ void nvme_stop_ctrl(struct nvme_ctrl *ctrl)
 	nvme_mpath_stop(ctrl);
 	nvme_stop_keep_alive(ctrl);
 	flush_work(&ctrl->async_event_work);
-	flush_work(&ctrl->scan_work);
 	cancel_work_sync(&ctrl->fw_act_work);
 	if (ctrl->ops->stop_ctrl)
 		ctrl->ops->stop_ctrl(ctrl);
-- 
2.17.1

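The ordering problem described in the commit message can be illustrated with a small userspace sketch. This is not kernel code: struct model_ctrl, model_scan_work(), model_teardown(), model_flush_scan() and run_order() are hypothetical names invented for the illustration, a pthread stands in for the workqueue item, and the flush has a timeout only so that the demonstration terminates (the kernel's flush_work() has no timeout and would simply hang).

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical userspace stand-in for a controller, not struct nvme_ctrl. */
struct model_ctrl {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool io_cancelled;	/* set by teardown: outstanding scan I/O is failed */
	bool scan_done;		/* set when the scan work function returns */
};

/* Models scan_work: it has issued I/O and sleeps until that I/O is cancelled. */
static void *model_scan_work(void *arg)
{
	struct model_ctrl *c = arg;

	pthread_mutex_lock(&c->lock);
	while (!c->io_cancelled)
		pthread_cond_wait(&c->cond, &c->lock);
	c->scan_done = true;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
	return NULL;
}

/* Models controller teardown: cancels the I/O the scan is waiting on. */
static void model_teardown(struct model_ctrl *c)
{
	pthread_mutex_lock(&c->lock);
	c->io_cancelled = true;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

/*
 * Models flushing the scan work. Returns true if the scan finished before
 * the timeout, false if the flush would still be waiting.
 */
static bool model_flush_scan(struct model_ctrl *c, long timeout_ms)
{
	struct timespec deadline;
	bool done;
	int rc = 0;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&c->lock);
	while (!c->scan_done && rc == 0)
		rc = pthread_cond_timedwait(&c->cond, &c->lock, &deadline);
	done = c->scan_done;
	pthread_mutex_unlock(&c->lock);
	return done;
}

/* Runs one ordering and reports whether the flush completed. */
static bool run_order(bool flush_before_teardown)
{
	struct model_ctrl c = { .io_cancelled = false, .scan_done = false };
	pthread_t scan_thread;
	bool flushed;

	pthread_mutex_init(&c.lock, NULL);
	pthread_cond_init(&c.cond, NULL);
	pthread_create(&scan_thread, NULL, model_scan_work, &c);

	if (flush_before_teardown) {
		/* Old ordering: flush in the stop path, teardown afterwards. */
		flushed = model_flush_scan(&c, 500);
		model_teardown(&c);
	} else {
		/* New ordering: teardown first, flush when removing namespaces. */
		model_teardown(&c);
		flushed = model_flush_scan(&c, 500);
	}

	pthread_join(scan_thread, NULL);
	pthread_cond_destroy(&c.cond);
	pthread_mutex_destroy(&c.lock);
	return flushed;
}
```

In this model, run_order(true) corresponds to flushing from the stop path before teardown and should report that the flush did not complete, while run_order(false) corresponds to the ordering after this patch and should report that it did. The sketch only captures the dependency between the flush and the I/O cancellation. It does not model controller states or the transport drivers.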

* [PATCH] nvme-core: flush namespace scanning work just before removing namespaces
  2018-11-21 23:17 [PATCH] nvme-core: flush namespace scanning work just before removing namespaces Sagi Grimberg
@ 2018-11-27 22:11 ` Ewan D. Milne
  2018-11-27 22:21 ` Keith Busch
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Ewan D. Milne @ 2018-11-27 22:11 UTC


On Wed, 2018-11-21 at 15:17 -0800, Sagi Grimberg wrote:
> nvme_stop_ctrl can be called also for reset flow and there is no need to
> flush the scan_work as namespaces are not being removed. This can cause
> deadlock in rdma, fc and loop drivers since nvme_stop_ctrl barriers
> before controller teardown (and specifically I/O cancellation of the
> scan_work itself) takes place, but the scan_work will be blocked anyways
> so there is no need to flush it.
> 
> Instead, move scan_work flush to nvme_remove_namespaces() where it really
> needs to flush.
> 
> Reported-by: Ming Lei <ming.lei at redhat.com>
> Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
> ---
> This is a stable candidate...
> 
>  drivers/nvme/host/core.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index c418b7a347e0..2e0571584e7a 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -3327,6 +3327,9 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
>  	struct nvme_ns *ns, *next;
>  	LIST_HEAD(ns_list);
>  
> +	/* prevent racing with ns scanning */
> +	flush_work(&ctrl->scan_work);
> +
>  	/*
>  	 * The dead states indicates the controller was not gracefully
>  	 * disconnected. In that case, we won't be able to flush any data while
> @@ -3489,7 +3492,6 @@ void nvme_stop_ctrl(struct nvme_ctrl *ctrl)
>  	nvme_mpath_stop(ctrl);
>  	nvme_stop_keep_alive(ctrl);
>  	flush_work(&ctrl->async_event_work);
> -	flush_work(&ctrl->scan_work);
>  	cancel_work_sync(&ctrl->fw_act_work);
>  	if (ctrl->ops->stop_ctrl)
>  		ctrl->ops->stop_ctrl(ctrl);

With this change, I can no longer reproduce the NVMe fabrics reset hangs
I was seeing (e.g. after a few hundred resets).  A longer-duration fabric
connection bounce test survived as well.  This change looks good to me.

Thanks for fixing this, Sagi.

Tested-by: Ewan D. Milne <emilne at redhat.com>


* [PATCH] nvme-core: flush namespace scanning work just before removing namespaces
  2018-11-21 23:17 [PATCH] nvme-core: flush namespace scanning work just before removing namespaces Sagi Grimberg
  2018-11-27 22:11 ` Ewan D. Milne
@ 2018-11-27 22:21 ` Keith Busch
  2018-11-27 22:56 ` James Smart
  2018-11-28  6:57 ` Christoph Hellwig
  3 siblings, 0 replies; 5+ messages in thread
From: Keith Busch @ 2018-11-27 22:21 UTC


On Wed, Nov 21, 2018 at 03:17:37PM -0800, Sagi Grimberg wrote:
> nvme_stop_ctrl can be called also for reset flow and there is no need to
> flush the scan_work as namespaces are not being removed. This can cause
> deadlock in rdma, fc and loop drivers since nvme_stop_ctrl barriers
> before controller teardown (and specifically I/O cancellation of the
> scan_work itself) takes place, but the scan_work will be blocked anyways
> so there is no need to flush it.
> 
> Instead, move scan_work flush to nvme_remove_namespaces() where it really
> needs to flush.
> 
> Reported-by: Ming Lei <ming.lei at redhat.com>
> Signed-off-by: Sagi Grimberg <sagi at grimberg.me>

Looks good to me.

Reviewed-by: Keith Busch <keith.busch at intel.com>

> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -3327,6 +3327,9 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
>  	struct nvme_ns *ns, *next;
>  	LIST_HEAD(ns_list);
>  
> +	/* prevent racing with ns scanning */
> +	flush_work(&ctrl->scan_work);
> +
>  	/*
>  	 * The dead states indicates the controller was not gracefully
>  	 * disconnected. In that case, we won't be able to flush any data while
> @@ -3489,7 +3492,6 @@ void nvme_stop_ctrl(struct nvme_ctrl *ctrl)
>  	nvme_mpath_stop(ctrl);
>  	nvme_stop_keep_alive(ctrl);
>  	flush_work(&ctrl->async_event_work);
> -	flush_work(&ctrl->scan_work);
>  	cancel_work_sync(&ctrl->fw_act_work);
>  	if (ctrl->ops->stop_ctrl)
>  		ctrl->ops->stop_ctrl(ctrl);
> -- 


* [PATCH] nvme-core: flush namespace scanning work just before removing namespaces
  2018-11-21 23:17 [PATCH] nvme-core: flush namespace scanning work just before removing namespaces Sagi Grimberg
  2018-11-27 22:11 ` Ewan D. Milne
  2018-11-27 22:21 ` Keith Busch
@ 2018-11-27 22:56 ` James Smart
  2018-11-28  6:57 ` Christoph Hellwig
  3 siblings, 0 replies; 5+ messages in thread
From: James Smart @ 2018-11-27 22:56 UTC



On 11/21/2018 3:17 PM, Sagi Grimberg wrote:
> nvme_stop_ctrl can be called also for reset flow and there is no need to
> flush the scan_work as namespaces are not being removed. This can cause
> deadlock in rdma, fc and loop drivers since nvme_stop_ctrl barriers
> before controller teardown (and specifically I/O cancellation of the
> scan_work itself) takes place, but the scan_work will be blocked anyways
> so there is no need to flush it.
>
> Instead, move scan_work flush to nvme_remove_namespaces() where it really
> needs to flush.
>
> Reported-by: Ming Lei <ming.lei at redhat.com>
> Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
> ---
> This is a stable candidate...
>
>   drivers/nvme/host/core.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
>
Looks good to me.

Reviewed-by: James Smart <jsmart2021 at gmail.com>

-- james


* [PATCH] nvme-core: flush namespace scanning work just before removing namespaces
  2018-11-21 23:17 [PATCH] nvme-core: flush namespace scanning work just before removing namespaces Sagi Grimberg
                   ` (2 preceding siblings ...)
  2018-11-27 22:56 ` James Smart
@ 2018-11-28  6:57 ` Christoph Hellwig
  3 siblings, 0 replies; 5+ messages in thread
From: Christoph Hellwig @ 2018-11-28  6:57 UTC


Thanks,

applied to nvme-4.20.
