* [PATCH 1/2] nvme: multipath: round-robin: fix logic for non-optimized paths
@ 2020-07-16 19:59 mwilck
2020-07-16 19:59 ` [PATCH 2/2] nvme: multipath: round-robin: don't fall back to numa mwilck
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: mwilck @ 2020-07-16 19:59 UTC (permalink / raw)
To: Christoph Hellwig, Keith Busch, Sagi Grimberg
Cc: marting, Hannes Reinecke, linux-nvme, Martin Wilck
From: Martin Wilck <mwilck@suse.com>
Handle the special case where we have exactly one optimized path,
which we should keep using in this case. Also, use the next
non-optimized path, not the last one.
Signed-off-by: Martin Wilck <mwilck@suse.com>
---
drivers/nvme/host/multipath.c | 21 +++++++++++----------
1 file changed, 11 insertions(+), 10 deletions(-)
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 74bad4e3d377..2c575b783d3e 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -224,13 +224,8 @@ static struct nvme_ns *nvme_next_ns(struct nvme_ns_head *head,
static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head,
int node, struct nvme_ns *old)
{
- struct nvme_ns *ns, *found, *fallback = NULL;
+ struct nvme_ns *ns, *found = NULL;
- if (list_is_singular(&head->list)) {
- if (nvme_path_is_disabled(old))
- return NULL;
- return old;
- }
for (ns = nvme_next_ns(head, old);
ns != old;
@@ -242,13 +237,19 @@ static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head,
found = ns;
goto out;
}
- if (ns->ana_state == NVME_ANA_NONOPTIMIZED)
- fallback = ns;
+ if (!found && ns->ana_state == NVME_ANA_NONOPTIMIZED)
+ found = ns;
}
- if (!fallback)
+ /* Fall back to old if it's better than the others */
+ if (!nvme_path_is_disabled(old) &&
+ (old->ana_state == NVME_ANA_OPTIMIZED ||
+ (!found && old->ana_state == NVME_ANA_NONOPTIMIZED)))
+ found = old;
+
+ if (!found)
return NULL;
- found = fallback;
+
out:
rcu_assign_pointer(head->current_path[node], found);
return found;
--
2.26.2
_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 2/2] nvme: multipath: round-robin: don't fall back to numa
2020-07-16 19:59 [PATCH 1/2] nvme: multipath: round-robin: fix logic for non-optimized paths mwilck
@ 2020-07-16 19:59 ` mwilck
2020-07-17 6:10 ` Hannes Reinecke
2020-07-17 6:08 ` [PATCH 1/2] nvme: multipath: round-robin: fix logic for non-optimized paths Hannes Reinecke
2020-07-20 19:51 ` Sagi Grimberg
2 siblings, 1 reply; 8+ messages in thread
From: mwilck @ 2020-07-16 19:59 UTC (permalink / raw)
To: Christoph Hellwig, Keith Busch, Sagi Grimberg
Cc: marting, Hannes Reinecke, linux-nvme, Martin Wilck
From: Martin Wilck <mwilck@suse.com>
Currently, if the RR path selector returns a non-optimized path,
we fall back to __nvme_find_path(), which uses the logic of the
numa path selector. For a given numa node, this always chooses
the same path, thus preventing round-robin logic on non-optimized
paths.
By handling the situation where the current ns is NULL in
nvme_round_robin_path(), we can avoid falling back from round-robin
to NUMA, fixing the issue. The iopolicy case distinction in
__nvme_find_path() can be skipped now.
Signed-off-by: Martin Wilck <mwilck@suse.com>
---
drivers/nvme/host/multipath.c | 23 +++++++++++++++--------
1 file changed, 15 insertions(+), 8 deletions(-)
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 2c575b783d3e..ff93bab0d549 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -181,10 +181,7 @@ static struct nvme_ns *__nvme_find_path(struct nvme_ns_head *head, int node)
if (nvme_path_is_disabled(ns))
continue;
- if (READ_ONCE(head->subsys->iopolicy) == NVME_IOPOLICY_NUMA)
- distance = node_distance(node, ns->ctrl->numa_node);
- else
- distance = LOCAL_DISTANCE;
+ distance = node_distance(node, ns->ctrl->numa_node);
switch (ns->ana_state) {
case NVME_ANA_OPTIMIZED:
@@ -225,7 +222,13 @@ static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head,
int node, struct nvme_ns *old)
{
struct nvme_ns *ns, *found = NULL;
+ bool was_null = (old == NULL);
+ if (unlikely(was_null))
+ old = list_first_or_null_rcu(&head->list,
+ struct nvme_ns, siblings);
+ if (unlikely(!old))
+ return NULL;
for (ns = nvme_next_ns(head, old);
ns != old;
@@ -244,9 +247,12 @@ static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head,
/* Fall back to old if it's better than the others */
if (!nvme_path_is_disabled(old) &&
(old->ana_state == NVME_ANA_OPTIMIZED ||
- (!found && old->ana_state == NVME_ANA_NONOPTIMIZED)))
+ (!found && old->ana_state == NVME_ANA_NONOPTIMIZED))) {
found = old;
-
+ if (!was_null)
+ /* No need to switch */
+ return found;
+ }
if (!found)
return NULL;
@@ -267,8 +273,9 @@ inline struct nvme_ns *nvme_find_path(struct nvme_ns_head *head)
struct nvme_ns *ns;
ns = srcu_dereference(head->current_path[node], &head->srcu);
- if (READ_ONCE(head->subsys->iopolicy) == NVME_IOPOLICY_RR && ns)
- ns = nvme_round_robin_path(head, node, ns);
+ if (READ_ONCE(head->subsys->iopolicy) == NVME_IOPOLICY_RR)
+ return nvme_round_robin_path(head, node, ns);
+
if (unlikely(!ns || !nvme_path_is_optimized(ns)))
ns = __nvme_find_path(head, node);
return ns;
--
2.26.2
_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] nvme: multipath: round-robin: fix logic for non-optimized paths
2020-07-16 19:59 [PATCH 1/2] nvme: multipath: round-robin: fix logic for non-optimized paths mwilck
2020-07-16 19:59 ` [PATCH 2/2] nvme: multipath: round-robin: don't fall back to numa mwilck
@ 2020-07-17 6:08 ` Hannes Reinecke
2020-07-20 19:51 ` Sagi Grimberg
2 siblings, 0 replies; 8+ messages in thread
From: Hannes Reinecke @ 2020-07-17 6:08 UTC (permalink / raw)
To: mwilck, Christoph Hellwig, Keith Busch, Sagi Grimberg; +Cc: marting, linux-nvme
On 7/16/20 9:59 PM, mwilck@suse.com wrote:
> From: Martin Wilck <mwilck@suse.com>
>
> Handle the special case where we have exactly one optimized path,
> which we should keep using in this case. Also, use the next
> non-optimized path, not the last one.
>
> Signed-off-by: Martin Wilck <mwilck@suse.com>
> ---
> drivers/nvme/host/multipath.c | 21 +++++++++++----------
> 1 file changed, 11 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
> index 74bad4e3d377..2c575b783d3e 100644
> --- a/drivers/nvme/host/multipath.c
> +++ b/drivers/nvme/host/multipath.c
> @@ -224,13 +224,8 @@ static struct nvme_ns *nvme_next_ns(struct nvme_ns_head *head,
> static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head,
> int node, struct nvme_ns *old)
> {
> - struct nvme_ns *ns, *found, *fallback = NULL;
> + struct nvme_ns *ns, *found = NULL;
>
> - if (list_is_singular(&head->list)) {
> - if (nvme_path_is_disabled(old))
> - return NULL;
> - return old;
> - }
>
> for (ns = nvme_next_ns(head, old);
> ns != old;
Why do you remove this?
This is an optimisation for single paths, and should stay.
> @@ -242,13 +237,19 @@ static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head,
> found = ns;
> goto out;
> }
> - if (ns->ana_state == NVME_ANA_NONOPTIMIZED)
> - fallback = ns;
> + if (!found && ns->ana_state == NVME_ANA_NONOPTIMIZED)
> + found = ns;
> }
>
> - if (!fallback)
> + /* Fall back to old if it's better than the others */
> + if (!nvme_path_is_disabled(old) &&
> + (old->ana_state == NVME_ANA_OPTIMIZED ||
> + (!found && old->ana_state == NVME_ANA_NONOPTIMIZED)))
> + found = old;
> +
> + if (!found)
> return NULL;
> - found = fallback;
> +
> out:
> rcu_assign_pointer(head->current_path[node], found);
> return found;
>
The problem is that we should have tested all paths from (old + 1)
up to and including (old); currently we're only testing paths from
(old + 1) up to, but excluding, (old).
I would rather use this explanation instead of referring to 'better'
paths; at the very least please name it 'optimal'.
Cheers,
Hannes
--
Dr. Hannes Reinecke Teamlead Storage & Networking
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 2/2] nvme: multipath: round-robin: don't fall back to numa
2020-07-16 19:59 ` [PATCH 2/2] nvme: multipath: round-robin: don't fall back to numa mwilck
@ 2020-07-17 6:10 ` Hannes Reinecke
0 siblings, 0 replies; 8+ messages in thread
From: Hannes Reinecke @ 2020-07-17 6:10 UTC (permalink / raw)
To: mwilck, Christoph Hellwig, Keith Busch, Sagi Grimberg; +Cc: marting, linux-nvme
On 7/16/20 9:59 PM, mwilck@suse.com wrote:
> From: Martin Wilck <mwilck@suse.com>
>
> Currently, if the RR path selector returns a non-optimized path,
> we fall back to __nvme_find_path(), which uses the logic of the
> numa path selector. For a given numa node, this always chooses
> the same path, thus preventing round-robin logic on non-optimized
> paths.
>
> By handling the situation where the current ns is NULL in
> nvme_round_robin_path(), we can avoid falling back from round-robin
> to NUMA, fixing the issue. The iopolicy case distinction in
> __nvme_find_path() can be skipped now.
>
> Signed-off-by: Martin Wilck <mwilck@suse.com>
> ---
> drivers/nvme/host/multipath.c | 23 +++++++++++++++--------
> 1 file changed, 15 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
> index 2c575b783d3e..ff93bab0d549 100644
> --- a/drivers/nvme/host/multipath.c
> +++ b/drivers/nvme/host/multipath.c
> @@ -181,10 +181,7 @@ static struct nvme_ns *__nvme_find_path(struct nvme_ns_head *head, int node)
> if (nvme_path_is_disabled(ns))
> continue;
>
> - if (READ_ONCE(head->subsys->iopolicy) == NVME_IOPOLICY_NUMA)
> - distance = node_distance(node, ns->ctrl->numa_node);
> - else
> - distance = LOCAL_DISTANCE;
> + distance = node_distance(node, ns->ctrl->numa_node);
>
> switch (ns->ana_state) {
> case NVME_ANA_OPTIMIZED:
> @@ -225,7 +222,13 @@ static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head,
> int node, struct nvme_ns *old)
> {
> struct nvme_ns *ns, *found = NULL;
> + bool was_null = (old == NULL);
>
> + if (unlikely(was_null))
> + old = list_first_or_null_rcu(&head->list,
> + struct nvme_ns, siblings);
> + if (unlikely(!old))
> + return NULL;
>
> for (ns = nvme_next_ns(head, old);
> ns != old;
> @@ -244,9 +247,12 @@ static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head,
> /* Fall back to old if it's better than the others */
> if (!nvme_path_is_disabled(old) &&
> (old->ana_state == NVME_ANA_OPTIMIZED ||
> - (!found && old->ana_state == NVME_ANA_NONOPTIMIZED)))
> + (!found && old->ana_state == NVME_ANA_NONOPTIMIZED))) {
> found = old;
> -
> + if (!was_null)
> + /* No need to switch */
> + return found;
> + }
> if (!found)
> return NULL;
>
> @@ -267,8 +273,9 @@ inline struct nvme_ns *nvme_find_path(struct nvme_ns_head *head)
> struct nvme_ns *ns;
>
> ns = srcu_dereference(head->current_path[node], &head->srcu);
> - if (READ_ONCE(head->subsys->iopolicy) == NVME_IOPOLICY_RR && ns)
> - ns = nvme_round_robin_path(head, node, ns);
> + if (READ_ONCE(head->subsys->iopolicy) == NVME_IOPOLICY_RR)
> + return nvme_round_robin_path(head, node, ns);
> +
> if (unlikely(!ns || !nvme_path_is_optimized(ns)))
> ns = __nvme_find_path(head, node);
> return ns;
>
Why not modify the last if clause to just
if (unlikely(!ns))
?
Cheers,
Hannes
--
Dr. Hannes Reinecke Teamlead Storage & Networking
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] nvme: multipath: round-robin: fix logic for non-optimized paths
2020-07-16 19:59 [PATCH 1/2] nvme: multipath: round-robin: fix logic for non-optimized paths mwilck
2020-07-16 19:59 ` [PATCH 2/2] nvme: multipath: round-robin: don't fall back to numa mwilck
2020-07-17 6:08 ` [PATCH 1/2] nvme: multipath: round-robin: fix logic for non-optimized paths Hannes Reinecke
@ 2020-07-20 19:51 ` Sagi Grimberg
2020-07-21 6:39 ` Hannes Reinecke
2 siblings, 1 reply; 8+ messages in thread
From: Sagi Grimberg @ 2020-07-20 19:51 UTC (permalink / raw)
To: mwilck, Christoph Hellwig, Keith Busch
Cc: marting, Hannes Reinecke, linux-nvme
> From: Martin Wilck <mwilck@suse.com>
>
> Handle the special case where we have exactly one optimized path,
> which we should keep using in this case. Also, use the next
> non-optimized path, not the last one.
Not sure I understand, does this patch also use nonopt paths if we
have a single opt path?
_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] nvme: multipath: round-robin: fix logic for non-optimized paths
2020-07-20 19:51 ` Sagi Grimberg
@ 2020-07-21 6:39 ` Hannes Reinecke
2020-07-21 17:19 ` Sagi Grimberg
0 siblings, 1 reply; 8+ messages in thread
From: Hannes Reinecke @ 2020-07-21 6:39 UTC (permalink / raw)
To: Sagi Grimberg, mwilck, Christoph Hellwig, Keith Busch; +Cc: marting, linux-nvme
On 7/20/20 9:51 PM, Sagi Grimberg wrote:
>
>> From: Martin Wilck <mwilck@suse.com>
>>
>> Handle the special case where we have exactly one optimized path,
>> which we should keep using in this case. Also, use the next
>> non-optimized path, not the last one.
>
> Not sure I understand, does this patch also use nonopt paths if we
> have a single opt path?
Indeed, the latter change should be removed from this patch.
Cheers,
Hannes
--
Dr. Hannes Reinecke Teamlead Storage & Networking
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] nvme: multipath: round-robin: fix logic for non-optimized paths
2020-07-21 6:39 ` Hannes Reinecke
@ 2020-07-21 17:19 ` Sagi Grimberg
2020-07-22 5:35 ` Hannes Reinecke
0 siblings, 1 reply; 8+ messages in thread
From: Sagi Grimberg @ 2020-07-21 17:19 UTC (permalink / raw)
To: Hannes Reinecke, mwilck, Christoph Hellwig, Keith Busch
Cc: marting, linux-nvme
>>> Handle the special case where we have exactly one optimized path,
>>> which we should keep using in this case. Also, use the next
>>> non-optimized path, not the last one.
>>
>> Not sure I understand, does this patch also use nonopt paths if we
>> have a single opt path?
>
> Indeed, the latter change should be removed from this patch.
Why should we even do this? we have a clear indication from the
controller on ns access. If we have a optimized path we should never
select any non-optimized path.
What if a non-optimized path has a 50x higher access latency? it's just
wrong to use this path...
_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] nvme: multipath: round-robin: fix logic for non-optimized paths
2020-07-21 17:19 ` Sagi Grimberg
@ 2020-07-22 5:35 ` Hannes Reinecke
0 siblings, 0 replies; 8+ messages in thread
From: Hannes Reinecke @ 2020-07-22 5:35 UTC (permalink / raw)
To: Sagi Grimberg, mwilck, Christoph Hellwig, Keith Busch; +Cc: marting, linux-nvme
On 7/21/20 7:19 PM, Sagi Grimberg wrote:
>
>>>> Handle the special case where we have exactly one optimized path,
>>>> which we should keep using in this case. Also, use the next
>>>> non-optimized path, not the last one.
>>>
>>> Not sure I understand, does this patch also use nonopt paths if we
>>> have a single opt path?
>>
>> Indeed, the latter change should be removed from this patch.
>
> Why should we even do this? we have a clear indication from the
> controller on ns access. If we have a optimized path we should never
> select any non-optimized path.
>
> What if a non-optimized path has a 50x higher access latency? it's just
> wrong to use this path...
Errm. I think Martin didn't state clearly what the problem was.
The problem is when the _current_ path is the only optimized path, and
all other paths are non-optimized:
for (ns = nvme_next_ns(head, old);
ns != old;
ns = nvme_next_ns(head, ns)) {
We start with the _next_ path, and stop _before_ we check the 'old'
path. Consequently the 'old' path is never checked, and never selected.
Cheers,
Hannes
--
Dr. Hannes Reinecke Teamlead Storage & Networking
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2020-07-22 5:35 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-16 19:59 [PATCH 1/2] nvme: multipath: round-robin: fix logic for non-optimized paths mwilck
2020-07-16 19:59 ` [PATCH 2/2] nvme: multipath: round-robin: don't fall back to numa mwilck
2020-07-17 6:10 ` Hannes Reinecke
2020-07-17 6:08 ` [PATCH 1/2] nvme: multipath: round-robin: fix logic for non-optimized paths Hannes Reinecke
2020-07-20 19:51 ` Sagi Grimberg
2020-07-21 6:39 ` Hannes Reinecke
2020-07-21 17:19 ` Sagi Grimberg
2020-07-22 5:35 ` Hannes Reinecke
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.