linux-nvme.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] nvme-multipath: fix possible io hang after ctrl reconnect
@ 2019-10-15 17:08 Anton Eidelman
  2019-10-17  7:03 ` Hannes Reinecke
  2019-10-17 15:22 ` Christoph Hellwig
  0 siblings, 2 replies; 10+ messages in thread
From: Anton Eidelman @ 2019-10-15 17:08 UTC (permalink / raw)
  To: linux-nvme, hch, keith.busch, sagi, hare; +Cc: Anton Eidelman

The following scenario results in an IO hang:
1) ctrl completes a request with NVME_SC_ANA_TRANSITION.
   NVME_NS_ANA_PENDING bit in ns->flags is set and ana_work is triggered.
2) ana_work: nvme_read_ana_log() tries to get the ANA log page from the ctrl.
   This fails because ctrl disconnects.
   Therefore nvme_update_ns_ana_state() is not called
   and NVME_NS_ANA_PENDING bit in ns->flags is not cleared.
3) ctrl reconnects: nvme_mpath_init(ctrl,...) calls
   nvme_read_ana_log(ctrl, groups_only=true).
   However, nvme_update_ana_state() does not update namespaces
   because nr_nsids = 0 (due to groups_only mode).
4) scan_work calls nvme_validate_ns() finds the ns and re-validates OK.

Result:
The ctrl is now live but NVME_NS_ANA_PENDING bit in ns->flags is still set.
Consequently ctrl will never be considered a viable path by __nvme_find_path().
IO will hang if ctrl is the only or the last path to the namespace.

More generally, while ctrl is reconnecting, its ANA state may change.
And because nvme_mpath_init() requests ANA log in groups_only mode,
these changes are not propagated to the existing ctrl namespaces.
This may result in a mal-function or an IO hang.

Solution:
nvme_mpath_init() will nvme_read_ana_log() with groups_only set to false.
This will not harm the new ctrl case (no namespaces present),
and will make sure the ANA state of namespaces gets updated after reconnect.
Leaving the groups_only capability in nvme_read_ana_log() in place,
although it is unused for now (NVME_ANA_LOG_RGO is defined in spec).

Another option would be for nvme_mpath_init() to invoke
nvme_parse_ana_log(..., nvme_set_ns_ana_state) for each existing
namespace.

Signed-off-by: Anton Eidelman <anton@lightbitslabs.com>
---
 drivers/nvme/host/multipath.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 30de7efef003..d320684d25b2 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -715,7 +715,7 @@ int nvme_mpath_init(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
 		goto out;
 	}
 
-	error = nvme_read_ana_log(ctrl, true);
+	error = nvme_read_ana_log(ctrl, false);
 	if (error)
 		goto out_free_ana_log_buf;
 	return 0;
-- 
2.14.1


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] nvme-multipath: fix possible io hang after ctrl reconnect
  2019-10-15 17:08 [PATCH] nvme-multipath: fix possible io hang after ctrl reconnect Anton Eidelman
@ 2019-10-17  7:03 ` Hannes Reinecke
  2019-10-17 15:22 ` Christoph Hellwig
  1 sibling, 0 replies; 10+ messages in thread
From: Hannes Reinecke @ 2019-10-17  7:03 UTC (permalink / raw)
  To: Anton Eidelman, linux-nvme, hch, keith.busch, sagi

On 10/15/19 7:08 PM, Anton Eidelman wrote:
> The following scenario results in an IO hang:
> 1) ctrl completes a request with NVME_SC_ANA_TRANSITION.
>    NVME_NS_ANA_PENDING bit in ns->flags is set and ana_work is triggered.
> 2) ana_work: nvme_read_ana_log() tries to get the ANA log page from the ctrl.
>    This fails because ctrl disconnects.
>    Therefore nvme_update_ns_ana_state() is not called
>    and NVME_NS_ANA_PENDING bit in ns->flags is not cleared.
> 3) ctrl reconnects: nvme_mpath_init(ctrl,...) calls
>    nvme_read_ana_log(ctrl, groups_only=true).
>    However, nvme_update_ana_state() does not update namespaces
>    because nr_nsids = 0 (due to groups_only mode).
> 4) scan_work calls nvme_validate_ns() finds the ns and re-validates OK.
> 
> Result:
> The ctrl is now live but NVME_NS_ANA_PENDING bit in ns->flags is still set.
> Consequently ctrl will never be considered a viable path by __nvme_find_path().
> IO will hang if ctrl is the only or the last path to the namespace.
> 
> More generally, while ctrl is reconnecting, its ANA state may change.
> And because nvme_mpath_init() requests ANA log in groups_only mode,
> these changes are not propagated to the existing ctrl namespaces.
> This may result in a mal-function or an IO hang.
> 
> Solution:
> nvme_mpath_init() will nvme_read_ana_log() with groups_only set to false.
> This will not harm the new ctrl case (no namespaces present),
> and will make sure the ANA state of namespaces gets updated after reconnect.
> Leaving the groups_only capability in nvme_read_ana_log() in place,
> although it is unused for now (NVME_ANA_LOG_RGO is defined in spec).
> 
> Another option would be for nvme_mpath_init() to invoke
> nvme_parse_ana_log(..., nvme_set_ns_ana_state) for each existing
> namespace.
> 
> Signed-off-by: Anton Eidelman <anton@lightbitslabs.com>
> ---
>  drivers/nvme/host/multipath.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
> index 30de7efef003..d320684d25b2 100644
> --- a/drivers/nvme/host/multipath.c
> +++ b/drivers/nvme/host/multipath.c
> @@ -715,7 +715,7 @@ int nvme_mpath_init(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
>  		goto out;
>  	}
>  
> -	error = nvme_read_ana_log(ctrl, true);
> +	error = nvme_read_ana_log(ctrl, false);
>  	if (error)
>  		goto out_free_ana_log_buf;
>  	return 0;
> 
Actually I think we should drop the RGO handling in nvme_read_ana_log(),
seeing the issues it has caused already.
But this can be done in a new patch.

Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      Teamlead Storage & Networking
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 247165 (AG München), GF: Felix Imendörffer

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] nvme-multipath: fix possible io hang after ctrl reconnect
  2019-10-15 17:08 [PATCH] nvme-multipath: fix possible io hang after ctrl reconnect Anton Eidelman
  2019-10-17  7:03 ` Hannes Reinecke
@ 2019-10-17 15:22 ` Christoph Hellwig
  2019-10-17 18:11   ` Anton Eidelman
  1 sibling, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2019-10-17 15:22 UTC (permalink / raw)
  To: Anton Eidelman; +Cc: keith.busch, hare, hch, linux-nvme, sagi

On Tue, Oct 15, 2019 at 10:08:02AM -0700, Anton Eidelman wrote:
> The following scenario results in an IO hang:
> 1) ctrl completes a request with NVME_SC_ANA_TRANSITION.
>    NVME_NS_ANA_PENDING bit in ns->flags is set and ana_work is triggered.
> 2) ana_work: nvme_read_ana_log() tries to get the ANA log page from the ctrl.
>    This fails because ctrl disconnects.
>    Therefore nvme_update_ns_ana_state() is not called
>    and NVME_NS_ANA_PENDING bit in ns->flags is not cleared.
> 3) ctrl reconnects: nvme_mpath_init(ctrl,...) calls
>    nvme_read_ana_log(ctrl, groups_only=true).
>    However, nvme_update_ana_state() does not update namespaces
>    because nr_nsids = 0 (due to groups_only mode).

How do you end up calling into nvme_init_identify for a reconnect?
That should always hit the !new path in nvme_tcp_configure_admin_queue.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] nvme-multipath: fix possible io hang after ctrl reconnect
  2019-10-17 15:22 ` Christoph Hellwig
@ 2019-10-17 18:11   ` Anton Eidelman
  2019-10-18  9:10     ` Christoph Hellwig
  0 siblings, 1 reply; 10+ messages in thread
From: Anton Eidelman @ 2019-10-17 18:11 UTC (permalink / raw)
  To: hch; +Cc: keith.busch, Anton Eidelman, sagi, linux-nvme, hare

As far as I can see nvme_init_identify() is called from
nvme_xxx_configure_admin_queue() in tcp.c as well as in rdma.c
regardless of the value of "new".
This makes sense as over reconnect things in the identify might change
for a dynamic controller, e.g. the cntlid.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] nvme-multipath: fix possible io hang after ctrl reconnect
  2019-10-17 18:11   ` Anton Eidelman
@ 2019-10-18  9:10     ` Christoph Hellwig
  2019-10-18 18:32       ` [PATCH v2 1/2] " Anton Eidelman
  0 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2019-10-18  9:10 UTC (permalink / raw)
  To: Anton Eidelman; +Cc: keith.busch, hare, hch, linux-nvme, sagi

On Thu, Oct 17, 2019 at 11:11:42AM -0700, Anton Eidelman wrote:
> As far as I can see nvme_init_identify() is called from
> nvme_xxx_configure_admin_queue() in tcp.c as well as in rdma.c
> regardless of the value of "new".
> This makes sense as over reconnect things in the identify might change
> for a dynamic controller, e.g. the cntlid.

True.  I guess the patch is fine, but please remove the now dead
groups_only code.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2 1/2] nvme-multipath: fix possible io hang after ctrl reconnect
  2019-10-18  9:10     ` Christoph Hellwig
@ 2019-10-18 18:32       ` Anton Eidelman
  2019-10-18 18:32         ` [PATCH v2 2/2] nvme-multipath: remove unused groups_only mode in ana log Anton Eidelman
  2019-10-22 19:48         ` [PATCH v2 1/2] nvme-multipath: fix possible io hang after ctrl reconnect Sagi Grimberg
  0 siblings, 2 replies; 10+ messages in thread
From: Anton Eidelman @ 2019-10-18 18:32 UTC (permalink / raw)
  To: linux-nvme, hch, keith.busch, sagi, hare; +Cc: Anton Eidelman

The following scenario results in an IO hang:
1) ctrl completes a request with NVME_SC_ANA_TRANSITION.
   NVME_NS_ANA_PENDING bit in ns->flags is set and ana_work is triggered.
2) ana_work: nvme_read_ana_log() tries to get the ANA log page from the ctrl.
   This fails because ctrl disconnects.
   Therefore nvme_update_ns_ana_state() is not called
   and NVME_NS_ANA_PENDING bit in ns->flags is not cleared.
3) ctrl reconnects: nvme_mpath_init(ctrl,...) calls
   nvme_read_ana_log(ctrl, groups_only=true).
   However, nvme_update_ana_state() does not update namespaces
   because nr_nsids = 0 (due to groups_only mode).
4) scan_work calls nvme_validate_ns() finds the ns and re-validates OK.

Result:
The ctrl is now live but NVME_NS_ANA_PENDING bit in ns->flags is still set.
Consequently ctrl will never be considered a viable path by __nvme_find_path().
IO will hang if ctrl is the only or the last path to the namespace.

More generally, while ctrl is reconnecting, its ANA state may change.
And because nvme_mpath_init() requests ANA log in groups_only mode,
these changes are not propagated to the existing ctrl namespaces.
This may result in a mal-function or an IO hang.

Solution:
nvme_mpath_init() will nvme_read_ana_log() with groups_only set to false.
This will not harm the new ctrl case (no namespaces present),
and will make sure the ANA state of namespaces gets updated after reconnect.

Note: Another option would be for nvme_mpath_init() to invoke
nvme_parse_ana_log(..., nvme_set_ns_ana_state) for each existing namespace.

Signed-off-by: Anton Eidelman <anton@lightbitslabs.com>
---
 drivers/nvme/host/multipath.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 30de7efef003..d320684d25b2 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -715,7 +715,7 @@ int nvme_mpath_init(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
 		goto out;
 	}
 
-	error = nvme_read_ana_log(ctrl, true);
+	error = nvme_read_ana_log(ctrl, false);
 	if (error)
 		goto out_free_ana_log_buf;
 	return 0;
-- 
2.14.1


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 2/2] nvme-multipath: remove unused groups_only mode in ana log
  2019-10-18 18:32       ` [PATCH v2 1/2] " Anton Eidelman
@ 2019-10-18 18:32         ` Anton Eidelman
  2019-10-22 19:48           ` Sagi Grimberg
  2019-10-22 19:48         ` [PATCH v2 1/2] nvme-multipath: fix possible io hang after ctrl reconnect Sagi Grimberg
  1 sibling, 1 reply; 10+ messages in thread
From: Anton Eidelman @ 2019-10-18 18:32 UTC (permalink / raw)
  To: linux-nvme, hch, keith.busch, sagi, hare; +Cc: Anton Eidelman

groups_only mode in nvme_read_ana_log() is no longer used: remove it.

Signed-off-by: Anton Eidelman <anton@lightbitslabs.com>
---
 drivers/nvme/host/multipath.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index d320684d25b2..fc99a40c1ec4 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -522,14 +522,13 @@ static int nvme_update_ana_state(struct nvme_ctrl *ctrl,
 	return 0;
 }
 
-static int nvme_read_ana_log(struct nvme_ctrl *ctrl, bool groups_only)
+static int nvme_read_ana_log(struct nvme_ctrl *ctrl)
 {
 	u32 nr_change_groups = 0;
 	int error;
 
 	mutex_lock(&ctrl->ana_lock);
-	error = nvme_get_log(ctrl, NVME_NSID_ALL, NVME_LOG_ANA,
-			groups_only ? NVME_ANA_LOG_RGO : 0,
+	error = nvme_get_log(ctrl, NVME_NSID_ALL, NVME_LOG_ANA, 0,
 			ctrl->ana_log_buf, ctrl->ana_log_size, 0);
 	if (error) {
 		dev_warn(ctrl->device, "Failed to get ANA log: %d\n", error);
@@ -565,7 +564,7 @@ static void nvme_ana_work(struct work_struct *work)
 {
 	struct nvme_ctrl *ctrl = container_of(work, struct nvme_ctrl, ana_work);
 
-	nvme_read_ana_log(ctrl, false);
+	nvme_read_ana_log(ctrl);
 }
 
 static void nvme_anatt_timeout(struct timer_list *t)
@@ -715,7 +714,7 @@ int nvme_mpath_init(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
 		goto out;
 	}
 
-	error = nvme_read_ana_log(ctrl, false);
+	error = nvme_read_ana_log(ctrl);
 	if (error)
 		goto out_free_ana_log_buf;
 	return 0;
-- 
2.14.1


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 1/2] nvme-multipath: fix possible io hang after ctrl reconnect
  2019-10-18 18:32       ` [PATCH v2 1/2] " Anton Eidelman
  2019-10-18 18:32         ` [PATCH v2 2/2] nvme-multipath: remove unused groups_only mode in ana log Anton Eidelman
@ 2019-10-22 19:48         ` Sagi Grimberg
  2019-10-23  1:18           ` Keith Busch
  1 sibling, 1 reply; 10+ messages in thread
From: Sagi Grimberg @ 2019-10-22 19:48 UTC (permalink / raw)
  To: Anton Eidelman, linux-nvme, hch, keith.busch, hare

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

Keith can you apply these to nvme-5.4-rc?

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 2/2] nvme-multipath: remove unused groups_only mode in ana log
  2019-10-18 18:32         ` [PATCH v2 2/2] nvme-multipath: remove unused groups_only mode in ana log Anton Eidelman
@ 2019-10-22 19:48           ` Sagi Grimberg
  0 siblings, 0 replies; 10+ messages in thread
From: Sagi Grimberg @ 2019-10-22 19:48 UTC (permalink / raw)
  To: Anton Eidelman, linux-nvme, hch, keith.busch, hare

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 1/2] nvme-multipath: fix possible io hang after ctrl reconnect
  2019-10-22 19:48         ` [PATCH v2 1/2] nvme-multipath: fix possible io hang after ctrl reconnect Sagi Grimberg
@ 2019-10-23  1:18           ` Keith Busch
  0 siblings, 0 replies; 10+ messages in thread
From: Keith Busch @ 2019-10-23  1:18 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: keith.busch, Anton Eidelman, hch, linux-nvme, hare

On Tue, Oct 22, 2019 at 12:48:35PM -0700, Sagi Grimberg wrote:
> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
> 
> Keith can you apply these to nvme-5.4-rc?

Yep, applied to nvme-5.4. Thanks!

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2019-10-23  1:18 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-15 17:08 [PATCH] nvme-multipath: fix possible io hang after ctrl reconnect Anton Eidelman
2019-10-17  7:03 ` Hannes Reinecke
2019-10-17 15:22 ` Christoph Hellwig
2019-10-17 18:11   ` Anton Eidelman
2019-10-18  9:10     ` Christoph Hellwig
2019-10-18 18:32       ` [PATCH v2 1/2] " Anton Eidelman
2019-10-18 18:32         ` [PATCH v2 2/2] nvme-multipath: remove unused groups_only mode in ana log Anton Eidelman
2019-10-22 19:48           ` Sagi Grimberg
2019-10-22 19:48         ` [PATCH v2 1/2] nvme-multipath: fix possible io hang after ctrl reconnect Sagi Grimberg
2019-10-23  1:18           ` Keith Busch

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).