* [PATCH 1/2] nvme: multipath: round-robin: fix logic for non-optimized paths @ 2020-07-16 19:59 mwilck 2020-07-16 19:59 ` [PATCH 2/2] nvme: multipath: round-robin: don't fall back to numa mwilck ` (2 more replies) 0 siblings, 3 replies; 8+ messages in thread From: mwilck @ 2020-07-16 19:59 UTC (permalink / raw) To: Christoph Hellwig, Keith Busch, Sagi Grimberg Cc: marting, Hannes Reinecke, linux-nvme, Martin Wilck From: Martin Wilck <mwilck@suse.com> Handle the special case where we have exactly one optimized path, which we should keep using in this case. Also, use the next non-optimized path, not the last one. Signed-off-by: Martin Wilck <mwilck@suse.com> --- drivers/nvme/host/multipath.c | 21 +++++++++++---------- 1 file changed, 11 insertions(+), 10 deletions(-) diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c index 74bad4e3d377..2c575b783d3e 100644 --- a/drivers/nvme/host/multipath.c +++ b/drivers/nvme/host/multipath.c @@ -224,13 +224,8 @@ static struct nvme_ns *nvme_next_ns(struct nvme_ns_head *head, static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head, int node, struct nvme_ns *old) { - struct nvme_ns *ns, *found, *fallback = NULL; + struct nvme_ns *ns, *found = NULL; - if (list_is_singular(&head->list)) { - if (nvme_path_is_disabled(old)) - return NULL; - return old; - } for (ns = nvme_next_ns(head, old); ns != old; @@ -242,13 +237,19 @@ static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head, found = ns; goto out; } - if (ns->ana_state == NVME_ANA_NONOPTIMIZED) - fallback = ns; + if (!found && ns->ana_state == NVME_ANA_NONOPTIMIZED) + found = ns; } - if (!fallback) + /* Fall back to old if it's better than the others */ + if (!nvme_path_is_disabled(old) && + (old->ana_state == NVME_ANA_OPTIMIZED || + (!found && old->ana_state == NVME_ANA_NONOPTIMIZED))) + found = old; + + if (!found) return NULL; - found = fallback; + out: rcu_assign_pointer(head->current_path[node], found); return found; -- 2.26.2 _______________________________________________ Linux-nvme mailing list Linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme ^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 2/2] nvme: multipath: round-robin: don't fall back to numa 2020-07-16 19:59 [PATCH 1/2] nvme: multipath: round-robin: fix logic for non-optimized paths mwilck @ 2020-07-16 19:59 ` mwilck 2020-07-17 6:10 ` Hannes Reinecke 2020-07-17 6:08 ` [PATCH 1/2] nvme: multipath: round-robin: fix logic for non-optimized paths Hannes Reinecke 2020-07-20 19:51 ` Sagi Grimberg 2 siblings, 1 reply; 8+ messages in thread From: mwilck @ 2020-07-16 19:59 UTC (permalink / raw) To: Christoph Hellwig, Keith Busch, Sagi Grimberg Cc: marting, Hannes Reinecke, linux-nvme, Martin Wilck From: Martin Wilck <mwilck@suse.com> Currently, if the RR path selector returns a non-optimized path, we fall back to __nvme_find_path(), which uses the logic of the numa path selector. For a given numa node, this always chooses the same path, thus preventing round-robin logic on non-optimized paths. By handling the situation where the current ns is NULL in nvme_round_robin_path(), we can avoid falling back from round-robin to NUMA, fixing the issue. The iopolicy case distinction in __nvme_find_path() can be skipped now. Signed-off-by: Martin Wilck <mwilck@suse.com> --- drivers/nvme/host/multipath.c | 23 +++++++++++++++-------- 1 file changed, 15 insertions(+), 8 deletions(-) diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c index 2c575b783d3e..ff93bab0d549 100644 --- a/drivers/nvme/host/multipath.c +++ b/drivers/nvme/host/multipath.c @@ -181,10 +181,7 @@ static struct nvme_ns *__nvme_find_path(struct nvme_ns_head *head, int node) if (nvme_path_is_disabled(ns)) continue; - if (READ_ONCE(head->subsys->iopolicy) == NVME_IOPOLICY_NUMA) - distance = node_distance(node, ns->ctrl->numa_node); - else - distance = LOCAL_DISTANCE; + distance = node_distance(node, ns->ctrl->numa_node); switch (ns->ana_state) { case NVME_ANA_OPTIMIZED: @@ -225,7 +222,13 @@ static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head, int node, struct nvme_ns *old) { struct nvme_ns *ns, *found = NULL; + bool was_null = (old == NULL); + if (unlikely(was_null)) + old = list_first_or_null_rcu(&head->list, + struct nvme_ns, siblings); + if (unlikely(!old)) + return NULL; for (ns = nvme_next_ns(head, old); ns != old; @@ -244,9 +247,12 @@ static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head, /* Fall back to old if it's better than the others */ if (!nvme_path_is_disabled(old) && (old->ana_state == NVME_ANA_OPTIMIZED || - (!found && old->ana_state == NVME_ANA_NONOPTIMIZED))) + (!found && old->ana_state == NVME_ANA_NONOPTIMIZED))) { found = old; - + if (!was_null) + /* No need to switch */ + return found; + } if (!found) return NULL; @@ -267,8 +273,9 @@ inline struct nvme_ns *nvme_find_path(struct nvme_ns_head *head) struct nvme_ns *ns; ns = srcu_dereference(head->current_path[node], &head->srcu); - if (READ_ONCE(head->subsys->iopolicy) == NVME_IOPOLICY_RR && ns) - ns = nvme_round_robin_path(head, node, ns); + if (READ_ONCE(head->subsys->iopolicy) == NVME_IOPOLICY_RR) + return nvme_round_robin_path(head, node, ns); + if (unlikely(!ns || !nvme_path_is_optimized(ns))) ns = __nvme_find_path(head, node); return ns; -- 2.26.2 _______________________________________________ Linux-nvme mailing list Linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 2/2] nvme: multipath: round-robin: don't fall back to numa 2020-07-16 19:59 ` [PATCH 2/2] nvme: multipath: round-robin: don't fall back to numa mwilck @ 2020-07-17 6:10 ` Hannes Reinecke 0 siblings, 0 replies; 8+ messages in thread From: Hannes Reinecke @ 2020-07-17 6:10 UTC (permalink / raw) To: mwilck, Christoph Hellwig, Keith Busch, Sagi Grimberg; +Cc: marting, linux-nvme On 7/16/20 9:59 PM, mwilck@suse.com wrote: > From: Martin Wilck <mwilck@suse.com> > > Currently, if the RR path selector returns a non-optimized path, > we fall back to __nvme_find_path(), which uses the logic of the > numa path selector. For a given numa node, this always chooses > the same path, thus preventing round-robin logic on non-optimized > paths. > > By handling the situation where the current ns is NULL in > nvme_round_robin_path(), we can avoid falling back from round-robin > to NUMA, fixing the issue. The iopolicy case distinction in > __nvme_find_path() can be skipped now. > > Signed-off-by: Martin Wilck <mwilck@suse.com> > --- > drivers/nvme/host/multipath.c | 23 +++++++++++++++-------- > 1 file changed, 15 insertions(+), 8 deletions(-) > > diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c > index 2c575b783d3e..ff93bab0d549 100644 > --- a/drivers/nvme/host/multipath.c > +++ b/drivers/nvme/host/multipath.c > @@ -181,10 +181,7 @@ static struct nvme_ns *__nvme_find_path(struct nvme_ns_head *head, int node) > if (nvme_path_is_disabled(ns)) > continue; > > - if (READ_ONCE(head->subsys->iopolicy) == NVME_IOPOLICY_NUMA) > - distance = node_distance(node, ns->ctrl->numa_node); > - else > - distance = LOCAL_DISTANCE; > + distance = node_distance(node, ns->ctrl->numa_node); > > switch (ns->ana_state) { > case NVME_ANA_OPTIMIZED: > @@ -225,7 +222,13 @@ static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head, > int node, struct nvme_ns *old) > { > struct nvme_ns *ns, *found = NULL; > + bool was_null = (old == NULL); > > + if (unlikely(was_null)) > + old = list_first_or_null_rcu(&head->list, > + struct nvme_ns, siblings); > + if (unlikely(!old)) > + return NULL; > > for (ns = nvme_next_ns(head, old); > ns != old; > @@ -244,9 +247,12 @@ static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head, > /* Fall back to old if it's better than the others */ > if (!nvme_path_is_disabled(old) && > (old->ana_state == NVME_ANA_OPTIMIZED || > - (!found && old->ana_state == NVME_ANA_NONOPTIMIZED))) > + (!found && old->ana_state == NVME_ANA_NONOPTIMIZED))) { > found = old; > - > + if (!was_null) > + /* No need to switch */ > + return found; > + } > if (!found) > return NULL; > > @@ -267,8 +273,9 @@ inline struct nvme_ns *nvme_find_path(struct nvme_ns_head *head) > struct nvme_ns *ns; > > ns = srcu_dereference(head->current_path[node], &head->srcu); > - if (READ_ONCE(head->subsys->iopolicy) == NVME_IOPOLICY_RR && ns) > - ns = nvme_round_robin_path(head, node, ns); > + if (READ_ONCE(head->subsys->iopolicy) == NVME_IOPOLICY_RR) > + return nvme_round_robin_path(head, node, ns); > + > if (unlikely(!ns || !nvme_path_is_optimized(ns))) > ns = __nvme_find_path(head, node); > return ns; > Why not modify the last if clause to just if (unlikely(!ns)) ? Cheers, Hannes -- Dr. Hannes Reinecke Teamlead Storage & Networking hare@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer _______________________________________________ Linux-nvme mailing list Linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] nvme: multipath: round-robin: fix logic for non-optimized paths 2020-07-16 19:59 [PATCH 1/2] nvme: multipath: round-robin: fix logic for non-optimized paths mwilck 2020-07-16 19:59 ` [PATCH 2/2] nvme: multipath: round-robin: don't fall back to numa mwilck @ 2020-07-17 6:08 ` Hannes Reinecke 2020-07-20 19:51 ` Sagi Grimberg 2 siblings, 0 replies; 8+ messages in thread From: Hannes Reinecke @ 2020-07-17 6:08 UTC (permalink / raw) To: mwilck, Christoph Hellwig, Keith Busch, Sagi Grimberg; +Cc: marting, linux-nvme On 7/16/20 9:59 PM, mwilck@suse.com wrote: > From: Martin Wilck <mwilck@suse.com> > > Handle the special case where we have exactly one optimized path, > which we should keep using in this case. Also, use the next > non-optimized path, not the last one. > > Signed-off-by: Martin Wilck <mwilck@suse.com> > --- > drivers/nvme/host/multipath.c | 21 +++++++++++---------- > 1 file changed, 11 insertions(+), 10 deletions(-) > > diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c > index 74bad4e3d377..2c575b783d3e 100644 > --- a/drivers/nvme/host/multipath.c > +++ b/drivers/nvme/host/multipath.c > @@ -224,13 +224,8 @@ static struct nvme_ns *nvme_next_ns(struct nvme_ns_head *head, > static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head, > int node, struct nvme_ns *old) > { > - struct nvme_ns *ns, *found, *fallback = NULL; > + struct nvme_ns *ns, *found = NULL; > > - if (list_is_singular(&head->list)) { > - if (nvme_path_is_disabled(old)) > - return NULL; > - return old; > - } > > for (ns = nvme_next_ns(head, old); > ns != old; Why do you remove this? This is an optimisation for single paths, and should stay. > @@ -242,13 +237,19 @@ static struct nvme_ns *nvme_round_robin_path(struct nvme_ns_head *head, > found = ns; > goto out; > } > - if (ns->ana_state == NVME_ANA_NONOPTIMIZED) > - fallback = ns; > + if (!found && ns->ana_state == NVME_ANA_NONOPTIMIZED) > + found = ns; > } > > - if (!fallback) > + /* Fall back to old if it's better than the others */ > + if (!nvme_path_is_disabled(old) && > + (old->ana_state == NVME_ANA_OPTIMIZED || > + (!found && old->ana_state == NVME_ANA_NONOPTIMIZED))) > + found = old; > + > + if (!found) > return NULL; > - found = fallback; > + > out: > rcu_assign_pointer(head->current_path[node], found); > return found; > The problem is that we should have tested all paths from (old + 1) up to and including (old); currently we're only testing paths from (old + 1) up to, but excluding, (old). I would rather use this explanation instead of referring to 'better' paths; at the very least please name it 'optimal'. Cheers, Hannes -- Dr. Hannes Reinecke Teamlead Storage & Networking hare@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer _______________________________________________ Linux-nvme mailing list Linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] nvme: multipath: round-robin: fix logic for non-optimized paths 2020-07-16 19:59 [PATCH 1/2] nvme: multipath: round-robin: fix logic for non-optimized paths mwilck 2020-07-16 19:59 ` [PATCH 2/2] nvme: multipath: round-robin: don't fall back to numa mwilck 2020-07-17 6:08 ` [PATCH 1/2] nvme: multipath: round-robin: fix logic for non-optimized paths Hannes Reinecke @ 2020-07-20 19:51 ` Sagi Grimberg 2020-07-21 6:39 ` Hannes Reinecke 2 siblings, 1 reply; 8+ messages in thread From: Sagi Grimberg @ 2020-07-20 19:51 UTC (permalink / raw) To: mwilck, Christoph Hellwig, Keith Busch Cc: marting, Hannes Reinecke, linux-nvme > From: Martin Wilck <mwilck@suse.com> > > Handle the special case where we have exactly one optimized path, > which we should keep using in this case. Also, use the next > non-optimized path, not the last one. Not sure I understand, does this patch also use nonopt paths if we have a single opt path? _______________________________________________ Linux-nvme mailing list Linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] nvme: multipath: round-robin: fix logic for non-optimized paths 2020-07-20 19:51 ` Sagi Grimberg @ 2020-07-21 6:39 ` Hannes Reinecke 2020-07-21 17:19 ` Sagi Grimberg 0 siblings, 1 reply; 8+ messages in thread From: Hannes Reinecke @ 2020-07-21 6:39 UTC (permalink / raw) To: Sagi Grimberg, mwilck, Christoph Hellwig, Keith Busch; +Cc: marting, linux-nvme On 7/20/20 9:51 PM, Sagi Grimberg wrote: > >> From: Martin Wilck <mwilck@suse.com> >> >> Handle the special case where we have exactly one optimized path, >> which we should keep using in this case. Also, use the next >> non-optimized path, not the last one. > > Not sure I understand, does this patch also use nonopt paths if we > have a single opt path? Indeed, the latter change should be removed from this patch. Cheers, Hannes -- Dr. Hannes Reinecke Teamlead Storage & Networking hare@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer _______________________________________________ Linux-nvme mailing list Linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] nvme: multipath: round-robin: fix logic for non-optimized paths 2020-07-21 6:39 ` Hannes Reinecke @ 2020-07-21 17:19 ` Sagi Grimberg 2020-07-22 5:35 ` Hannes Reinecke 0 siblings, 1 reply; 8+ messages in thread From: Sagi Grimberg @ 2020-07-21 17:19 UTC (permalink / raw) To: Hannes Reinecke, mwilck, Christoph Hellwig, Keith Busch Cc: marting, linux-nvme >>> Handle the special case where we have exactly one optimized path, >>> which we should keep using in this case. Also, use the next >>> non-optimized path, not the last one. >> >> Not sure I understand, does this patch also use nonopt paths if we >> have a single opt path? > > Indeed, the latter change should be removed from this patch. Why should we even do this? we have a clear indication from the controller on ns access. If we have a optimized path we should never select any non-optimized path. What if a non-optimized path has a 50x higher access latency? it's just wrong to use this path... _______________________________________________ Linux-nvme mailing list Linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] nvme: multipath: round-robin: fix logic for non-optimized paths 2020-07-21 17:19 ` Sagi Grimberg @ 2020-07-22 5:35 ` Hannes Reinecke 0 siblings, 0 replies; 8+ messages in thread From: Hannes Reinecke @ 2020-07-22 5:35 UTC (permalink / raw) To: Sagi Grimberg, mwilck, Christoph Hellwig, Keith Busch; +Cc: marting, linux-nvme On 7/21/20 7:19 PM, Sagi Grimberg wrote: > >>>> Handle the special case where we have exactly one optimized path, >>>> which we should keep using in this case. Also, use the next >>>> non-optimized path, not the last one. >>> >>> Not sure I understand, does this patch also use nonopt paths if we >>> have a single opt path? >> >> Indeed, the latter change should be removed from this patch. > > Why should we even do this? we have a clear indication from the > controller on ns access. If we have a optimized path we should never > select any non-optimized path. > > What if a non-optimized path has a 50x higher access latency? it's just > wrong to use this path... Errm. I think Martin didn't state clearly what the problem was. The problem is when the _current_ path is the only optimized path, and all other paths are non-optimized: for (ns = nvme_next_ns(head, old); ns != old; ns = nvme_next_ns(head, ns)) { We start with the _next_ path, and stop _before_ we check the 'old' path. Consequently the 'old' path is never checked, and never selected. Cheers, Hannes -- Dr. Hannes Reinecke Teamlead Storage & Networking hare@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer _______________________________________________ Linux-nvme mailing list Linux-nvme@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-nvme ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2020-07-22 5:35 UTC | newest] Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-07-16 19:59 [PATCH 1/2] nvme: multipath: round-robin: fix logic for non-optimized paths mwilck 2020-07-16 19:59 ` [PATCH 2/2] nvme: multipath: round-robin: don't fall back to numa mwilck 2020-07-17 6:10 ` Hannes Reinecke 2020-07-17 6:08 ` [PATCH 1/2] nvme: multipath: round-robin: fix logic for non-optimized paths Hannes Reinecke 2020-07-20 19:51 ` Sagi Grimberg 2020-07-21 6:39 ` Hannes Reinecke 2020-07-21 17:19 ` Sagi Grimberg 2020-07-22 5:35 ` Hannes Reinecke
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.