All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/2] nvme: multipath: round-robin: fix logic for non-optimized paths
@ 2020-07-16 19:59 mwilck
  2020-07-16 19:59 ` [PATCH 2/2] nvme: multipath: round-robin: don't fall back to numa mwilck
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: mwilck @ 2020-07-16 19:59 UTC (permalink / raw)
  To: Christoph Hellwig, Keith Busch, Sagi Grimberg
  Cc: marting, Hannes Reinecke, linux-nvme, Martin Wilck

From: Martin Wilck <mwilck@suse.com>

Handle the special case where we have exactly one optimized path,
which we should keep using in this case. Also, use the next
non-optimized path, not the last one.

Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 drivers/nvme/host/multipath.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 74bad4e3d377..2c575b783d3e 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -224,13 +224,8 @@ static struct nvme_ns *nvme_next_ns(struct nvme_ns_head *head,
 static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head,
 		int node, struct nvme_ns *old)
 {
-	struct nvme_ns *ns, *found, *fallback = NULL;
+	struct nvme_ns *ns, *found = NULL;
 
-	if (list_is_singular(&head->list)) {
-		if (nvme_path_is_disabled(old))
-			return NULL;
-		return old;
-	}
 
 	for (ns = nvme_next_ns(head, old);
 	     ns != old;
@@ -242,13 +237,19 @@ static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head,
 			found = ns;
 			goto out;
 		}
-		if (ns->ana_state == NVME_ANA_NONOPTIMIZED)
-			fallback = ns;
+		if (!found && ns->ana_state == NVME_ANA_NONOPTIMIZED)
+			found = ns;
 	}
 
-	if (!fallback)
+	/* Fall back to old if it's better than the others */
+	if (!nvme_path_is_disabled(old) &&
+	    (old->ana_state == NVME_ANA_OPTIMIZED ||
+	     (!found && old->ana_state == NVME_ANA_NONOPTIMIZED)))
+		found = old;
+
+	if (!found)
 		return NULL;
-	found = fallback;
+
 out:
 	rcu_assign_pointer(head->current_path[node], found);
 	return found;
-- 
2.26.2


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/2] nvme: multipath: round-robin: don't fall back to numa
  2020-07-16 19:59 [PATCH 1/2] nvme: multipath: round-robin: fix logic for non-optimized paths mwilck
@ 2020-07-16 19:59 ` mwilck
  2020-07-17  6:10   ` Hannes Reinecke
  2020-07-17  6:08 ` [PATCH 1/2] nvme: multipath: round-robin: fix logic for non-optimized paths Hannes Reinecke
  2020-07-20 19:51 ` Sagi Grimberg
  2 siblings, 1 reply; 8+ messages in thread
From: mwilck @ 2020-07-16 19:59 UTC (permalink / raw)
  To: Christoph Hellwig, Keith Busch, Sagi Grimberg
  Cc: marting, Hannes Reinecke, linux-nvme, Martin Wilck

From: Martin Wilck <mwilck@suse.com>

Currently, if the RR path selector returns a non-optimized path,
we fall back to __nvme_find_path(), which uses the logic of the
numa path selector. For a given numa node, this always chooses
the same path, thus preventing round-robin logic on non-optimized
paths.

By handling the situation where the current ns is NULL in
nvme_round_robin_path(), we can avoid falling back from round-robin
to NUMA, fixing the issue. The iopolicy case distinction in
__nvme_find_path() can be skipped now.

Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 drivers/nvme/host/multipath.c | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 2c575b783d3e..ff93bab0d549 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -181,10 +181,7 @@ static struct nvme_ns *__nvme_find_path(struct nvme_ns_head *head, int node)
 		if (nvme_path_is_disabled(ns))
 			continue;
 
-		if (READ_ONCE(head->subsys->iopolicy) == NVME_IOPOLICY_NUMA)
-			distance = node_distance(node, ns->ctrl->numa_node);
-		else
-			distance = LOCAL_DISTANCE;
+		distance = node_distance(node, ns->ctrl->numa_node);
 
 		switch (ns->ana_state) {
 		case NVME_ANA_OPTIMIZED:
@@ -225,7 +222,13 @@ static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head,
 		int node, struct nvme_ns *old)
 {
 	struct nvme_ns *ns, *found = NULL;
+	bool was_null = (old == NULL);
 
+	if (unlikely(was_null))
+		old = list_first_or_null_rcu(&head->list,
+					     struct nvme_ns, siblings);
+	if (unlikely(!old))
+		return NULL;
 
 	for (ns = nvme_next_ns(head, old);
 	     ns != old;
@@ -244,9 +247,12 @@ static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head,
 	/* Fall back to old if it's better than the others */
 	if (!nvme_path_is_disabled(old) &&
 	    (old->ana_state == NVME_ANA_OPTIMIZED ||
-	     (!found && old->ana_state == NVME_ANA_NONOPTIMIZED)))
+	     (!found && old->ana_state == NVME_ANA_NONOPTIMIZED))) {
 		found = old;
-
+		if (!was_null)
+			/* No need to switch */
+			return found;
+	}
 	if (!found)
 		return NULL;
 
@@ -267,8 +273,9 @@ inline struct nvme_ns *nvme_find_path(struct nvme_ns_head *head)
 	struct nvme_ns *ns;
 
 	ns = srcu_dereference(head->current_path[node], &head->srcu);
-	if (READ_ONCE(head->subsys->iopolicy) == NVME_IOPOLICY_RR && ns)
-		ns = nvme_round_robin_path(head, node, ns);
+	if (READ_ONCE(head->subsys->iopolicy) == NVME_IOPOLICY_RR)
+		return nvme_round_robin_path(head, node, ns);
+
 	if (unlikely(!ns || !nvme_path_is_optimized(ns)))
 		ns = __nvme_find_path(head, node);
 	return ns;
-- 
2.26.2


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] nvme: multipath: round-robin: fix logic for non-optimized paths
  2020-07-16 19:59 [PATCH 1/2] nvme: multipath: round-robin: fix logic for non-optimized paths mwilck
  2020-07-16 19:59 ` [PATCH 2/2] nvme: multipath: round-robin: don't fall back to numa mwilck
@ 2020-07-17  6:08 ` Hannes Reinecke
  2020-07-20 19:51 ` Sagi Grimberg
  2 siblings, 0 replies; 8+ messages in thread
From: Hannes Reinecke @ 2020-07-17  6:08 UTC (permalink / raw)
  To: mwilck, Christoph Hellwig, Keith Busch, Sagi Grimberg; +Cc: marting, linux-nvme

On 7/16/20 9:59 PM, mwilck@suse.com wrote:
> From: Martin Wilck <mwilck@suse.com>
> 
> Handle the special case where we have exactly one optimized path,
> which we should keep using in this case. Also, use the next
> non-optimized path, not the last one.
> 
> Signed-off-by: Martin Wilck <mwilck@suse.com>
> ---
>   drivers/nvme/host/multipath.c | 21 +++++++++++----------
>   1 file changed, 11 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
> index 74bad4e3d377..2c575b783d3e 100644
> --- a/drivers/nvme/host/multipath.c
> +++ b/drivers/nvme/host/multipath.c
> @@ -224,13 +224,8 @@ static struct nvme_ns *nvme_next_ns(struct nvme_ns_head *head,
>   static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head,
>   		int node, struct nvme_ns *old)
>   {
> -	struct nvme_ns *ns, *found, *fallback = NULL;
> +	struct nvme_ns *ns, *found = NULL;
>   
> -	if (list_is_singular(&head->list)) {
> -		if (nvme_path_is_disabled(old))
> -			return NULL;
> -		return old;
> -	}
>   
>   	for (ns = nvme_next_ns(head, old);
>   	     ns != old;

Why do you remove this?
This is an optimisation for single paths, and should stay.

> @@ -242,13 +237,19 @@ static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head,
>   			found = ns;
>   			goto out;
>   		}
> -		if (ns->ana_state == NVME_ANA_NONOPTIMIZED)
> -			fallback = ns;
> +		if (!found && ns->ana_state == NVME_ANA_NONOPTIMIZED)
> +			found = ns;
>   	}
>   
> -	if (!fallback)
> +	/* Fall back to old if it's better than the others */
> +	if (!nvme_path_is_disabled(old) &&
> +	    (old->ana_state == NVME_ANA_OPTIMIZED ||
> +	     (!found && old->ana_state == NVME_ANA_NONOPTIMIZED)))
> +		found = old;
> +
> +	if (!found)
>   		return NULL;
> -	found = fallback;
> +
>   out:
>   	rcu_assign_pointer(head->current_path[node], found);
>   	return found;
> 
The problem is that we should have tested all paths from (old + 1)
up to and including (old); currently we're only testing paths from
(old + 1) up to, but excluding, (old).

I would rather use this explanation instead of referring to 'better' 
paths; at the very least please name it 'optimal'.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke            Teamlead Storage & Networking
hare@suse.de                               +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/2] nvme: multipath: round-robin: don't fall back to numa
  2020-07-16 19:59 ` [PATCH 2/2] nvme: multipath: round-robin: don't fall back to numa mwilck
@ 2020-07-17  6:10   ` Hannes Reinecke
  0 siblings, 0 replies; 8+ messages in thread
From: Hannes Reinecke @ 2020-07-17  6:10 UTC (permalink / raw)
  To: mwilck, Christoph Hellwig, Keith Busch, Sagi Grimberg; +Cc: marting, linux-nvme

On 7/16/20 9:59 PM, mwilck@suse.com wrote:
> From: Martin Wilck <mwilck@suse.com>
> 
> Currently, if the RR path selector returns a non-optimized path,
> we fall back to __nvme_find_path(), which uses the logic of the
> numa path selector. For a given numa node, this always chooses
> the same path, thus preventing round-robin logic on non-optimized
> paths.
> 
> By handling the situation where the current ns is NULL in
> nvme_round_robin_path(), we can avoid falling back from round-robin
> to NUMA, fixing the issue. The iopolicy case distinction in
> __nvme_find_path() can be skipped now.
> 
> Signed-off-by: Martin Wilck <mwilck@suse.com>
> ---
>   drivers/nvme/host/multipath.c | 23 +++++++++++++++--------
>   1 file changed, 15 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
> index 2c575b783d3e..ff93bab0d549 100644
> --- a/drivers/nvme/host/multipath.c
> +++ b/drivers/nvme/host/multipath.c
> @@ -181,10 +181,7 @@ static struct nvme_ns *__nvme_find_path(struct nvme_ns_head *head, int node)
>   		if (nvme_path_is_disabled(ns))
>   			continue;
>   
> -		if (READ_ONCE(head->subsys->iopolicy) == NVME_IOPOLICY_NUMA)
> -			distance = node_distance(node, ns->ctrl->numa_node);
> -		else
> -			distance = LOCAL_DISTANCE;
> +		distance = node_distance(node, ns->ctrl->numa_node);
>   
>   		switch (ns->ana_state) {
>   		case NVME_ANA_OPTIMIZED:
> @@ -225,7 +222,13 @@ static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head,
>   		int node, struct nvme_ns *old)
>   {
>   	struct nvme_ns *ns, *found = NULL;
> +	bool was_null = (old == NULL);
>   
> +	if (unlikely(was_null))
> +		old = list_first_or_null_rcu(&head->list,
> +					     struct nvme_ns, siblings);
> +	if (unlikely(!old))
> +		return NULL;
>   
>   	for (ns = nvme_next_ns(head, old);
>   	     ns != old;
> @@ -244,9 +247,12 @@ static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head,
>   	/* Fall back to old if it's better than the others */
>   	if (!nvme_path_is_disabled(old) &&
>   	    (old->ana_state == NVME_ANA_OPTIMIZED ||
> -	     (!found && old->ana_state == NVME_ANA_NONOPTIMIZED)))
> +	     (!found && old->ana_state == NVME_ANA_NONOPTIMIZED))) {
>   		found = old;
> -
> +		if (!was_null)
> +			/* No need to switch */
> +			return found;
> +	}
>   	if (!found)
>   		return NULL;
>   
> @@ -267,8 +273,9 @@ inline struct nvme_ns *nvme_find_path(struct nvme_ns_head *head)
>   	struct nvme_ns *ns;
>   
>   	ns = srcu_dereference(head->current_path[node], &head->srcu);
> -	if (READ_ONCE(head->subsys->iopolicy) == NVME_IOPOLICY_RR && ns)
> -		ns = nvme_round_robin_path(head, node, ns);
> +	if (READ_ONCE(head->subsys->iopolicy) == NVME_IOPOLICY_RR)
> +		return nvme_round_robin_path(head, node, ns);
> +
>   	if (unlikely(!ns || !nvme_path_is_optimized(ns)))
>   		ns = __nvme_find_path(head, node);
>   	return ns;
> 
Why not modify the last if clause to just

if (unlikely(!ns))

?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke            Teamlead Storage & Networking
hare@suse.de                               +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] nvme: multipath: round-robin: fix logic for non-optimized paths
  2020-07-16 19:59 [PATCH 1/2] nvme: multipath: round-robin: fix logic for non-optimized paths mwilck
  2020-07-16 19:59 ` [PATCH 2/2] nvme: multipath: round-robin: don't fall back to numa mwilck
  2020-07-17  6:08 ` [PATCH 1/2] nvme: multipath: round-robin: fix logic for non-optimized paths Hannes Reinecke
@ 2020-07-20 19:51 ` Sagi Grimberg
  2020-07-21  6:39   ` Hannes Reinecke
  2 siblings, 1 reply; 8+ messages in thread
From: Sagi Grimberg @ 2020-07-20 19:51 UTC (permalink / raw)
  To: mwilck, Christoph Hellwig, Keith Busch
  Cc: marting, Hannes Reinecke, linux-nvme


> From: Martin Wilck <mwilck@suse.com>
> 
> Handle the special case where we have exactly one optimized path,
> which we should keep using in this case. Also, use the next
> non-optimized path, not the last one.

Not sure I understand, does this patch also use nonopt paths if we
have a single opt path?

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] nvme: multipath: round-robin: fix logic for non-optimized paths
  2020-07-20 19:51 ` Sagi Grimberg
@ 2020-07-21  6:39   ` Hannes Reinecke
  2020-07-21 17:19     ` Sagi Grimberg
  0 siblings, 1 reply; 8+ messages in thread
From: Hannes Reinecke @ 2020-07-21  6:39 UTC (permalink / raw)
  To: Sagi Grimberg, mwilck, Christoph Hellwig, Keith Busch; +Cc: marting, linux-nvme

On 7/20/20 9:51 PM, Sagi Grimberg wrote:
> 
>> From: Martin Wilck <mwilck@suse.com>
>>
>> Handle the special case where we have exactly one optimized path,
>> which we should keep using in this case. Also, use the next
>> non-optimized path, not the last one.
> 
> Not sure I understand, does this patch also use nonopt paths if we
> have a single opt path?

Indeed, the latter change should be removed from this patch.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke            Teamlead Storage & Networking
hare@suse.de                               +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] nvme: multipath: round-robin: fix logic for non-optimized paths
  2020-07-21  6:39   ` Hannes Reinecke
@ 2020-07-21 17:19     ` Sagi Grimberg
  2020-07-22  5:35       ` Hannes Reinecke
  0 siblings, 1 reply; 8+ messages in thread
From: Sagi Grimberg @ 2020-07-21 17:19 UTC (permalink / raw)
  To: Hannes Reinecke, mwilck, Christoph Hellwig, Keith Busch
  Cc: marting, linux-nvme


>>> Handle the special case where we have exactly one optimized path,
>>> which we should keep using in this case. Also, use the next
>>> non-optimized path, not the last one.
>>
>> Not sure I understand, does this patch also use nonopt paths if we
>> have a single opt path?
> 
> Indeed, the latter change should be removed from this patch.

Why should we even do this? we have a clear indication from the
controller on ns access. If we have a optimized path we should never
select any non-optimized path.

What if a non-optimized path has a 50x higher access latency? it's just
wrong to use this path...

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] nvme: multipath: round-robin: fix logic for non-optimized paths
  2020-07-21 17:19     ` Sagi Grimberg
@ 2020-07-22  5:35       ` Hannes Reinecke
  0 siblings, 0 replies; 8+ messages in thread
From: Hannes Reinecke @ 2020-07-22  5:35 UTC (permalink / raw)
  To: Sagi Grimberg, mwilck, Christoph Hellwig, Keith Busch; +Cc: marting, linux-nvme

On 7/21/20 7:19 PM, Sagi Grimberg wrote:
> 
>>>> Handle the special case where we have exactly one optimized path,
>>>> which we should keep using in this case. Also, use the next
>>>> non-optimized path, not the last one.
>>>
>>> Not sure I understand, does this patch also use nonopt paths if we
>>> have a single opt path?
>>
>> Indeed, the latter change should be removed from this patch.
> 
> Why should we even do this? we have a clear indication from the
> controller on ns access. If we have a optimized path we should never
> select any non-optimized path.
> 
> What if a non-optimized path has a 50x higher access latency? it's just
> wrong to use this path...

Errm. I think Martin didn't state clearly what the problem was.

The problem is when the _current_ path is the only optimized path, and 
all other paths are non-optimized:

	for (ns = nvme_next_ns(head, old);
	     ns != old;
	     ns = nvme_next_ns(head, ns)) {

We start with the _next_ path, and stop _before_ we check the 'old' 
path. Consequently the 'old' path is never checked, and never selected.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke            Teamlead Storage & Networking
hare@suse.de                               +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2020-07-22  5:35 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-16 19:59 [PATCH 1/2] nvme: multipath: round-robin: fix logic for non-optimized paths mwilck
2020-07-16 19:59 ` [PATCH 2/2] nvme: multipath: round-robin: don't fall back to numa mwilck
2020-07-17  6:10   ` Hannes Reinecke
2020-07-17  6:08 ` [PATCH 1/2] nvme: multipath: round-robin: fix logic for non-optimized paths Hannes Reinecke
2020-07-20 19:51 ` Sagi Grimberg
2020-07-21  6:39   ` Hannes Reinecke
2020-07-21 17:19     ` Sagi Grimberg
2020-07-22  5:35       ` Hannes Reinecke

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.