* [PATCH] nvme-core: flush namespace scanning work just before removing namespaces
@ 2018-11-21 23:17 Sagi Grimberg
2018-11-27 22:11 ` Ewan D. Milne
` (3 more replies)
0 siblings, 4 replies; 5+ messages in thread
From: Sagi Grimberg @ 2018-11-21 23:17 UTC (permalink / raw)
nvme_stop_ctrl() can also be called from the reset flow, where there is
no need to flush scan_work because namespaces are not being removed.
The flush can cause a deadlock in the rdma, fc and loop drivers, since
nvme_stop_ctrl() barriers before controller teardown (and specifically
I/O cancellation of the scan_work itself) takes place. The scan_work
will be blocked anyway, so there is no need to flush it.
Instead, move the scan_work flush to nvme_remove_namespaces(), where it
really needs to flush.
Reported-by: Ming Lei <ming.lei at redhat.com>
Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
---
This is a stable candidate...
drivers/nvme/host/core.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index c418b7a347e0..2e0571584e7a 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3327,6 +3327,9 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
 	struct nvme_ns *ns, *next;
 	LIST_HEAD(ns_list);
 
+	/* prevent racing with ns scanning */
+	flush_work(&ctrl->scan_work);
+
 	/*
 	 * The dead states indicates the controller was not gracefully
 	 * disconnected. In that case, we won't be able to flush any data while
@@ -3489,7 +3492,6 @@ void nvme_stop_ctrl(struct nvme_ctrl *ctrl)
 	nvme_mpath_stop(ctrl);
 	nvme_stop_keep_alive(ctrl);
 	flush_work(&ctrl->async_event_work);
-	flush_work(&ctrl->scan_work);
 	cancel_work_sync(&ctrl->fw_act_work);
 	if (ctrl->ops->stop_ctrl)
 		ctrl->ops->stop_ctrl(ctrl);
--
2.17.1
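
[Editorial sketch, not part of the patch] The ordering problem described in
the commit message can be illustrated with a small userspace model. This is
only a sketch under stated assumptions: every name below (model_ctrl,
model_flush_scan, model_teardown, model_reset) is hypothetical and is not a
kernel API, and the "flush" is reduced to a predicate that says whether a
real flush_work() call would ever return. It assumes the scan work is stuck
on an I/O that only controller teardown can cancel.

```c
#include <stdbool.h>

/* Hypothetical model of a controller; these are not kernel structures. */
struct model_ctrl {
	bool scan_blocked_on_io; /* scan work waits on I/O that cannot complete */
	bool io_cancelled;       /* teardown has cancelled outstanding I/O */
};

/*
 * Stand-in for flush_work(&ctrl->scan_work): returns true if the flush
 * would return, false if the caller would wait forever.
 */
static bool model_flush_scan(const struct model_ctrl *c)
{
	return !c->scan_blocked_on_io || c->io_cancelled;
}

/* Stand-in for controller teardown: cancelling I/O unblocks the scan work. */
static void model_teardown(struct model_ctrl *c)
{
	c->io_cancelled = true;
	c->scan_blocked_on_io = false;
}

/*
 * Reset flow: stop the controller, then tear it down.
 * flush_in_stop = true models the old ordering (flush inside the stop step),
 * flush_in_stop = false models the ordering after this patch.
 * Returns true if the reset runs to completion.
 */
static bool model_reset(struct model_ctrl *c, bool flush_in_stop)
{
	if (flush_in_stop && !model_flush_scan(c))
		return false; /* stuck before teardown is ever reached */
	model_teardown(c);
	return true;
}
```

With a scan work that is blocked on I/O, model_reset(&c, true) reports that
the reset never completes, while model_reset(&c, false) reaches teardown and
cancels the I/O. The model deliberately says nothing about the removal path,
where the patch adds the flush to nvme_remove_namespaces().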
* [PATCH] nvme-core: flush namespace scanning work just before removing namespaces
From: Ewan D. Milne @ 2018-11-27 22:11 UTC (permalink / raw)
On Wed, 2018-11-21 at 15:17 -0800, Sagi Grimberg wrote:
> nvme_stop_ctrl can be called also for reset flow and there is no need to
> flush the scan_work as namespaces are not being removed. This can cause
> deadlock in rdma, fc and loop drivers since nvme_stop_ctrl barriers
> before controller teardown (and specifically I/O cancellation of the
> scan_work itself) takes place, but the scan_work will be blocked anyways
> so there is no need to flush it.
>
> Instead, move scan_work flush to nvme_remove_namespaces() where it really
> needs to flush.
>
> Reported-by: Ming Lei <ming.lei at redhat.com>
> Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
> ---
> This is a stable candidate...
>
> drivers/nvme/host/core.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index c418b7a347e0..2e0571584e7a 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -3327,6 +3327,9 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
> struct nvme_ns *ns, *next;
> LIST_HEAD(ns_list);
>
> + /* prevent racing with ns scanning */
> + flush_work(&ctrl->scan_work);
> +
> /*
> * The dead states indicates the controller was not gracefully
> * disconnected. In that case, we won't be able to flush any data while
> @@ -3489,7 +3492,6 @@ void nvme_stop_ctrl(struct nvme_ctrl *ctrl)
> nvme_mpath_stop(ctrl);
> nvme_stop_keep_alive(ctrl);
> flush_work(&ctrl->async_event_work);
> - flush_work(&ctrl->scan_work);
> cancel_work_sync(&ctrl->fw_act_work);
> if (ctrl->ops->stop_ctrl)
> ctrl->ops->stop_ctrl(ctrl);
With this change, I can no longer reproduce the NVMe fabrics reset hangs
I was seeing (e.g. after a few hundred resets). A longer-duration fabric
connection bounce test survived as well. This change looks good to me.
Thanks for fixing this, Sagi.
Tested-by: Ewan D. Milne <emilne at redhat.com>
* [PATCH] nvme-core: flush namespace scanning work just before removing namespaces
From: Keith Busch @ 2018-11-27 22:21 UTC (permalink / raw)
On Wed, Nov 21, 2018 at 03:17:37PM -0800, Sagi Grimberg wrote:
> nvme_stop_ctrl can be called also for reset flow and there is no need to
> flush the scan_work as namespaces are not being removed. This can cause
> deadlock in rdma, fc and loop drivers since nvme_stop_ctrl barriers
> before controller teardown (and specifically I/O cancellation of the
> scan_work itself) takes place, but the scan_work will be blocked anyways
> so there is no need to flush it.
>
> Instead, move scan_work flush to nvme_remove_namespaces() where it really
> needs to flush.
>
> Reported-by: Ming Lei <ming.lei at redhat.com>
> Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
Looks good to me.
Reviewed-by: Keith Busch <keith.busch at intel.com>
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -3327,6 +3327,9 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
> struct nvme_ns *ns, *next;
> LIST_HEAD(ns_list);
>
> + /* prevent racing with ns scanning */
> + flush_work(&ctrl->scan_work);
> +
> /*
> * The dead states indicates the controller was not gracefully
> * disconnected. In that case, we won't be able to flush any data while
> @@ -3489,7 +3492,6 @@ void nvme_stop_ctrl(struct nvme_ctrl *ctrl)
> nvme_mpath_stop(ctrl);
> nvme_stop_keep_alive(ctrl);
> flush_work(&ctrl->async_event_work);
> - flush_work(&ctrl->scan_work);
> cancel_work_sync(&ctrl->fw_act_work);
> if (ctrl->ops->stop_ctrl)
> ctrl->ops->stop_ctrl(ctrl);
> --
* [PATCH] nvme-core: flush namespace scanning work just before removing namespaces
From: James Smart @ 2018-11-27 22:56 UTC (permalink / raw)
On 11/21/2018 3:17 PM, Sagi Grimberg wrote:
> nvme_stop_ctrl can be called also for reset flow and there is no need to
> flush the scan_work as namespaces are not being removed. This can cause
> deadlock in rdma, fc and loop drivers since nvme_stop_ctrl barriers
> before controller teardown (and specifically I/O cancellation of the
> scan_work itself) takes place, but the scan_work will be blocked anyways
> so there is no need to flush it.
>
> Instead, move scan_work flush to nvme_remove_namespaces() where it really
> needs to flush.
>
> Reported-by: Ming Lei <ming.lei at redhat.com>
> Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
> ---
> This is a stable candidate...
>
> drivers/nvme/host/core.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
Looks good to me.
Reviewed-by: James Smart <jsmart2021 at gmail.com>
-- james
* [PATCH] nvme-core: flush namespace scanning work just before removing namespaces
From: Christoph Hellwig @ 2018-11-28 6:57 UTC (permalink / raw)
Thanks,
applied to nvme-4.20.