linux-nvme.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] nvme-multipath: fix possible hang in live ns resize with ANA access
@ 2022-09-28 13:58 Sagi Grimberg
  2022-09-28 17:02 ` Daniel Wagner
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Sagi Grimberg @ 2022-09-28 13:58 UTC (permalink / raw)
  To: linux-nvme
  Cc: Christoph Hellwig, Keith Busch, Chaitanya Kulkarni,
	Hannes Reinecke, Yogev Cohen

When we revalidate paths as part of ns size change (as of commit e7d65803e2bb),
it is possible that during the path revalidation, the only paths that is IO
capable (i.e. optimized/non-optimized) are the ones that ns resize was not yet
informed to the host, which will cause inflight requests to be requeued (as we
have available paths but none are IO capable). These requests on the requeue
list are waiting for someone to resubmit them at some point.

The IO capable paths will eventually notify the ns resize change to the
host, but there is nothing that will kick the requeue list to resubmit the
queued requests.

Fix this by always kicking the requeue list, and if no IO capable path exists,
these requests will be queued again.

A typical log that indicates that IOs are requeued:
--
nvme nvme1: creating 4 I/O queues.
nvme nvme1: new ctrl: "testnqn1"
nvme nvme2: creating 4 I/O queues.
nvme nvme2: mapped 4/0/0 default/read/poll queues.
nvme nvme2: new ctrl: NQN "testnqn1", addr 127.0.0.1:8009
nvme nvme1: rescanning namespaces.
nvme1n1: detected capacity change from 2097152 to 4194304
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
block nvme1n1: no usable path - requeuing I/O
nvme nvme2: rescanning namespaces.
--

Reported-by: Yogev Cohen <yogev@lightbitslabs.com>
Fixes: e7d65803e2bb ("nvme-multipath: revalidate paths during rescan")
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
I was easily capable to reproduce this regression with a small debug patch to nvmet
to have a 1 second delay between controller AENs:
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
 #include "trace.h"
 
 #include "nvmet.h"
+#include <linux/delay.h>
 
 struct workqueue_struct *buffered_io_wq;
 struct workqueue_struct *zbd_wq;
@@ -248,6 +249,7 @@ void nvmet_ns_changed(struct nvmet_subsys *subsys, u32 nsid)
                nvmet_add_async_event(ctrl, NVME_AER_TYPE_NOTICE,
                                NVME_AER_NOTICE_NS_CHANGED,
                                NVME_LOG_CHANGED_NS);
+               msleep(1000);
        }
 }

And expose a subsystem via two ports, one ANA 'optimized', and one ANA 'inaccessible'.

Then test file-backed ns resize duing I/O:
truncate --size=2G /tmp/f && echo 1 > /sys/kernel/config/nvmet/subsystems/testnqn1/namespaces/1/revalidate_size

 drivers/nvme/host/multipath.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 6ef497c75a16..d532a78f24a2 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -172,16 +172,18 @@ void nvme_mpath_clear_ctrl_paths(struct nvme_ctrl *ctrl)
 void nvme_mpath_revalidate_paths(struct nvme_ns *ns)
 {
 	struct nvme_ns_head *head = ns->head;
+	struct nvme_ns *n;
 	sector_t capacity = get_capacity(head->disk);
 	int node;
 
-	list_for_each_entry_rcu(ns, &head->list, siblings) {
-		if (capacity != get_capacity(ns->disk))
-			clear_bit(NVME_NS_READY, &ns->flags);
+	list_for_each_entry_rcu(n, &head->list, siblings) {
+		if (capacity != get_capacity(n->disk))
+			clear_bit(NVME_NS_READY, &n->flags);
 	}
 
 	for_each_node(node)
 		rcu_assign_pointer(head->current_path[node], NULL);
+	nvme_kick_requeue_lists(ns->ctrl);
 }
 
 static bool nvme_path_is_disabled(struct nvme_ns *ns)
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] nvme-multipath: fix possible hang in live ns resize with ANA access
  2022-09-28 13:58 [PATCH] nvme-multipath: fix possible hang in live ns resize with ANA access Sagi Grimberg
@ 2022-09-28 17:02 ` Daniel Wagner
  2022-09-28 17:14 ` Christoph Hellwig
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: Daniel Wagner @ 2022-09-28 17:02 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: linux-nvme, Christoph Hellwig, Keith Busch, Chaitanya Kulkarni,
	Hannes Reinecke, Yogev Cohen

On Wed, Sep 28, 2022 at 04:58:16PM +0300, Sagi Grimberg wrote:
> When we revalidate paths as part of ns size change (as of commit e7d65803e2bb),
> it is possible that during the path revalidation, the only paths that is IO
> capable (i.e. optimized/non-optimized) are the ones that ns resize was not yet
> informed to the host, which will cause inflight requests to be requeued (as we
> have available paths but none are IO capable). These requests on the requeue
> list are waiting for someone to resubmit them at some point.
> 
> The IO capable paths will eventually notify the ns resize change to the
> host, but there is nothing that will kick the requeue list to resubmit the
> queued requests.
> 
> Fix this by always kicking the requeue list, and if no IO capable path exists,
> these requests will be queued again.
> 
> A typical log that indicates that IOs are requeued:
> --
> nvme nvme1: creating 4 I/O queues.
> nvme nvme1: new ctrl: "testnqn1"
> nvme nvme2: creating 4 I/O queues.
> nvme nvme2: mapped 4/0/0 default/read/poll queues.
> nvme nvme2: new ctrl: NQN "testnqn1", addr 127.0.0.1:8009
> nvme nvme1: rescanning namespaces.
> nvme1n1: detected capacity change from 2097152 to 4194304
> block nvme1n1: no usable path - requeuing I/O
> block nvme1n1: no usable path - requeuing I/O
> block nvme1n1: no usable path - requeuing I/O
> block nvme1n1: no usable path - requeuing I/O
> block nvme1n1: no usable path - requeuing I/O
> block nvme1n1: no usable path - requeuing I/O
> block nvme1n1: no usable path - requeuing I/O
> block nvme1n1: no usable path - requeuing I/O
> block nvme1n1: no usable path - requeuing I/O
> block nvme1n1: no usable path - requeuing I/O
> nvme nvme2: rescanning namespaces.
> --
> 
> Reported-by: Yogev Cohen <yogev@lightbitslabs.com>
> Fixes: e7d65803e2bb ("nvme-multipath: revalidate paths during rescan")
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>

Reviewed-by: Daniel Wagner <dwagner@suse.de>


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] nvme-multipath: fix possible hang in live ns resize with ANA access
  2022-09-28 13:58 [PATCH] nvme-multipath: fix possible hang in live ns resize with ANA access Sagi Grimberg
  2022-09-28 17:02 ` Daniel Wagner
@ 2022-09-28 17:14 ` Christoph Hellwig
  2022-09-28 20:07   ` Sagi Grimberg
  2022-09-28 17:21 ` Keith Busch
  2022-09-30  6:29 ` Hannes Reinecke
  3 siblings, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2022-09-28 17:14 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: linux-nvme, Christoph Hellwig, Keith Busch, Chaitanya Kulkarni,
	Hannes Reinecke, Yogev Cohen

On Wed, Sep 28, 2022 at 04:58:16PM +0300, Sagi Grimberg wrote:
> When we revalidate paths as part of ns size change (as of commit e7d65803e2bb),

Nit: a bunch of overly long lines in this patch description.

> I was easily capable to reproduce this regression with a small debug patch to nvmet

Having this patch in the mail won't make git-am very happy..

Otherwise this change looks good.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] nvme-multipath: fix possible hang in live ns resize with ANA access
  2022-09-28 13:58 [PATCH] nvme-multipath: fix possible hang in live ns resize with ANA access Sagi Grimberg
  2022-09-28 17:02 ` Daniel Wagner
  2022-09-28 17:14 ` Christoph Hellwig
@ 2022-09-28 17:21 ` Keith Busch
  2022-09-28 20:06   ` Sagi Grimberg
  2022-09-30  6:29 ` Hannes Reinecke
  3 siblings, 1 reply; 8+ messages in thread
From: Keith Busch @ 2022-09-28 17:21 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: linux-nvme, Christoph Hellwig, Chaitanya Kulkarni,
	Hannes Reinecke, Yogev Cohen

On Wed, Sep 28, 2022 at 04:58:16PM +0300, Sagi Grimberg wrote:
> @@ -172,16 +172,18 @@ void nvme_mpath_clear_ctrl_paths(struct nvme_ctrl *ctrl)
>  void nvme_mpath_revalidate_paths(struct nvme_ns *ns)
>  {
>  	struct nvme_ns_head *head = ns->head;
> +	struct nvme_ns *n;
>  	sector_t capacity = get_capacity(head->disk);
>  	int node;

Cosmetic nit, the new declaration ought to be below the capacity line for
reverse-xmas tree goodness.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] nvme-multipath: fix possible hang in live ns resize with ANA access
  2022-09-28 17:21 ` Keith Busch
@ 2022-09-28 20:06   ` Sagi Grimberg
  0 siblings, 0 replies; 8+ messages in thread
From: Sagi Grimberg @ 2022-09-28 20:06 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-nvme, Christoph Hellwig, Chaitanya Kulkarni,
	Hannes Reinecke, Yogev Cohen



On 9/28/22 20:21, Keith Busch wrote:
> On Wed, Sep 28, 2022 at 04:58:16PM +0300, Sagi Grimberg wrote:
>> @@ -172,16 +172,18 @@ void nvme_mpath_clear_ctrl_paths(struct nvme_ctrl *ctrl)
>>   void nvme_mpath_revalidate_paths(struct nvme_ns *ns)
>>   {
>>   	struct nvme_ns_head *head = ns->head;
>> +	struct nvme_ns *n;
>>   	sector_t capacity = get_capacity(head->disk);
>>   	int node;
> 
> Cosmetic nit, the new declaration ought to be below the capacity line for
> reverse-xmas tree goodness.

Will do.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] nvme-multipath: fix possible hang in live ns resize with ANA access
  2022-09-28 17:14 ` Christoph Hellwig
@ 2022-09-28 20:07   ` Sagi Grimberg
  0 siblings, 0 replies; 8+ messages in thread
From: Sagi Grimberg @ 2022-09-28 20:07 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-nvme, Keith Busch, Chaitanya Kulkarni, Hannes Reinecke,
	Yogev Cohen



On 9/28/22 20:14, Christoph Hellwig wrote:
> On Wed, Sep 28, 2022 at 04:58:16PM +0300, Sagi Grimberg wrote:
>> When we revalidate paths as part of ns size change (as of commit e7d65803e2bb),
> 
> Nit: a bunch of overly long lines in this patch description.

I'll clean it up.

>> I was easily capable to reproduce this regression with a small debug patch to nvmet
> 
> Having this patch in the mail won't make git-am very happy..

I'll remove it, was trying to avoid writing a cover-letter because
I'm lazy...


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] nvme-multipath: fix possible hang in live ns resize with ANA access
  2022-09-28 13:58 [PATCH] nvme-multipath: fix possible hang in live ns resize with ANA access Sagi Grimberg
                   ` (2 preceding siblings ...)
  2022-09-28 17:21 ` Keith Busch
@ 2022-09-30  6:29 ` Hannes Reinecke
  2022-09-30  6:40   ` Christoph Hellwig
  3 siblings, 1 reply; 8+ messages in thread
From: Hannes Reinecke @ 2022-09-30  6:29 UTC (permalink / raw)
  To: Sagi Grimberg, linux-nvme
  Cc: Christoph Hellwig, Keith Busch, Chaitanya Kulkarni, Yogev Cohen

On 9/28/22 15:58, Sagi Grimberg wrote:
> When we revalidate paths as part of ns size change (as of commit e7d65803e2bb),
> it is possible that during the path revalidation, the only paths that is IO
> capable (i.e. optimized/non-optimized) are the ones that ns resize was not yet
> informed to the host, which will cause inflight requests to be requeued (as we
> have available paths but none are IO capable). These requests on the requeue
> list are waiting for someone to resubmit them at some point.
> 
> The IO capable paths will eventually notify the ns resize change to the
> host, but there is nothing that will kick the requeue list to resubmit the
> queued requests.
> 
> Fix this by always kicking the requeue list, and if no IO capable path exists,
> these requests will be queued again.
> 
> A typical log that indicates that IOs are requeued:
> --
> nvme nvme1: creating 4 I/O queues.
> nvme nvme1: new ctrl: "testnqn1"
> nvme nvme2: creating 4 I/O queues.
> nvme nvme2: mapped 4/0/0 default/read/poll queues.
> nvme nvme2: new ctrl: NQN "testnqn1", addr 127.0.0.1:8009
> nvme nvme1: rescanning namespaces.
> nvme1n1: detected capacity change from 2097152 to 4194304
> block nvme1n1: no usable path - requeuing I/O
> block nvme1n1: no usable path - requeuing I/O
> block nvme1n1: no usable path - requeuing I/O
> block nvme1n1: no usable path - requeuing I/O
> block nvme1n1: no usable path - requeuing I/O
> block nvme1n1: no usable path - requeuing I/O
> block nvme1n1: no usable path - requeuing I/O
> block nvme1n1: no usable path - requeuing I/O
> block nvme1n1: no usable path - requeuing I/O
> block nvme1n1: no usable path - requeuing I/O
> nvme nvme2: rescanning namespaces.
> --
> 
> Reported-by: Yogev Cohen <yogev@lightbitslabs.com>
> Fixes: e7d65803e2bb ("nvme-multipath: revalidate paths during rescan")
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
> ---
> I was easily capable to reproduce this regression with a small debug patch to nvmet
> to have a 1 second delay between controller AENs:
> --- a/drivers/nvme/target/core.c
> +++ b/drivers/nvme/target/core.c
>   #include "trace.h"
>   
>   #include "nvmet.h"
> +#include <linux/delay.h>
>   
>   struct workqueue_struct *buffered_io_wq;
>   struct workqueue_struct *zbd_wq;
> @@ -248,6 +249,7 @@ void nvmet_ns_changed(struct nvmet_subsys *subsys, u32 nsid)
>                  nvmet_add_async_event(ctrl, NVME_AER_TYPE_NOTICE,
>                                  NVME_AER_NOTICE_NS_CHANGED,
>                                  NVME_LOG_CHANGED_NS);
> +               msleep(1000);
>          }
>   }
> 
> And expose a subsystem via two ports, one ANA 'optimized', and one ANA 'inaccessible'.
> 
> Then test file-backed ns resize duing I/O:
> truncate --size=2G /tmp/f && echo 1 > /sys/kernel/config/nvmet/subsystems/testnqn1/namespaces/1/revalidate_size
> 
>   drivers/nvme/host/multipath.c | 8 +++++---
>   1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
> index 6ef497c75a16..d532a78f24a2 100644
> --- a/drivers/nvme/host/multipath.c
> +++ b/drivers/nvme/host/multipath.c
> @@ -172,16 +172,18 @@ void nvme_mpath_clear_ctrl_paths(struct nvme_ctrl *ctrl)
>   void nvme_mpath_revalidate_paths(struct nvme_ns *ns)
>   {
>   	struct nvme_ns_head *head = ns->head;
> +	struct nvme_ns *n;
>   	sector_t capacity = get_capacity(head->disk);
>   	int node;
>   
> -	list_for_each_entry_rcu(ns, &head->list, siblings) {
> -		if (capacity != get_capacity(ns->disk))
> -			clear_bit(NVME_NS_READY, &ns->flags);
> +	list_for_each_entry_rcu(n, &head->list, siblings) {
> +		if (capacity != get_capacity(n->disk))
> +			clear_bit(NVME_NS_READY, &n->flags);
>   	}
>   
>   	for_each_node(node)
>   		rcu_assign_pointer(head->current_path[node], NULL);
> +	nvme_kick_requeue_lists(ns->ctrl);
>   }
>   
>   static bool nvme_path_is_disabled(struct nvme_ns *ns)

Hmm. I guess the real issue is that we use 'ns' as both function 
argument _and_ iterator, so that after 'list_for_each_rcu()' 'ns' is 
pointing to a _different_ namespace, such that we end up kicking that 
namespace and not the one used as argument.
But in either case, patch is fine.

Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] nvme-multipath: fix possible hang in live ns resize with ANA access
  2022-09-30  6:29 ` Hannes Reinecke
@ 2022-09-30  6:40   ` Christoph Hellwig
  0 siblings, 0 replies; 8+ messages in thread
From: Christoph Hellwig @ 2022-09-30  6:40 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Sagi Grimberg, linux-nvme, Christoph Hellwig, Keith Busch,
	Chaitanya Kulkarni, Yogev Cohen

On Fri, Sep 30, 2022 at 08:29:54AM +0200, Hannes Reinecke wrote:
>>   void nvme_mpath_revalidate_paths(struct nvme_ns *ns)
>>   {
>>   	struct nvme_ns_head *head = ns->head;
>> +	struct nvme_ns *n;
>>   	sector_t capacity = get_capacity(head->disk);
>>   	int node;
>>   -	list_for_each_entry_rcu(ns, &head->list, siblings) {
>> -		if (capacity != get_capacity(ns->disk))
>> -			clear_bit(NVME_NS_READY, &ns->flags);
>> +	list_for_each_entry_rcu(n, &head->list, siblings) {
>> +		if (capacity != get_capacity(n->disk))
>> +			clear_bit(NVME_NS_READY, &n->flags);
>>   	}
>>     	for_each_node(node)
>>   		rcu_assign_pointer(head->current_path[node], NULL);
>> +	nvme_kick_requeue_lists(ns->ctrl);
>>   }
>>     static bool nvme_path_is_disabled(struct nvme_ns *ns)
>
> Hmm. I guess the real issue is that we use 'ns' as both function argument 
> _and_ iterator, so that after 'list_for_each_rcu()' 'ns' is pointing to a 
> _different_ namespace, such that we end up kicking that namespace and not 
> the one used as argument.
> But in either case, patch is fine.

BEfore this patch that wasn't an actual problem as the ns pass as
argument was only used to derived the ns_head but not used after
the loop.  It was a bit of a landmine, though.


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2022-09-30  6:40 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-28 13:58 [PATCH] nvme-multipath: fix possible hang in live ns resize with ANA access Sagi Grimberg
2022-09-28 17:02 ` Daniel Wagner
2022-09-28 17:14 ` Christoph Hellwig
2022-09-28 20:07   ` Sagi Grimberg
2022-09-28 17:21 ` Keith Busch
2022-09-28 20:06   ` Sagi Grimberg
2022-09-30  6:29 ` Hannes Reinecke
2022-09-30  6:40   ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).