All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] multipath-tools: prio handling for offline paths
@ 2019-04-11 10:49 Martin Wilck
  2019-04-11 10:49 ` [PATCH 1/4] libmultipath: group_by_prio: fix signedness bug Martin Wilck
                   ` (4 more replies)
  0 siblings, 5 replies; 7+ messages in thread
From: Martin Wilck @ 2019-04-11 10:49 UTC (permalink / raw)
  To: Christophe Varoqui, Benjamin Marzinski; +Cc: dm-devel, Martin Wilck

A recent bug report revealed that the handling of path priorities is
inconsistent in multipathd when paths fail. In the current code, depending
on timing, the prio of a faild path may be reset to 0, reset to -1
(PRIO_UNDEF), or not reset at all. The inconsistency is most obvious
with "group_by_prio", as it results in strange artefacts how failed
paths are regrouped, and how path groups are sorted.

This series tries to address these issues.

The general mind set that I applied is that, in case of doubt, it's
best to retain the priority that was retrieved while the path was
last accessible.

Martin Wilck (4):
  libmultipath: group_by_prio: fix signedness bug
  Revert "Set priority to '0' for PATH_BLOCKED or PATH_DOWN"
  libmultipath: ana prioritizer: decrease log level
  libmultipath: get_prio(): don't reset prio for inaccessible paths

 libmultipath/discovery.c        | 16 +++++++++++-----
 libmultipath/pgpolicies.c       |  2 +-
 libmultipath/prioritizers/ana.c |  2 +-
 3 files changed, 13 insertions(+), 7 deletions(-)

-- 
2.21.0

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/4] libmultipath: group_by_prio: fix signedness bug
  2019-04-11 10:49 [PATCH 0/4] multipath-tools: prio handling for offline paths Martin Wilck
@ 2019-04-11 10:49 ` Martin Wilck
  2019-04-11 10:49 ` [PATCH 2/4] Revert "Set priority to '0' for PATH_BLOCKED or PATH_DOWN" Martin Wilck
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Martin Wilck @ 2019-04-11 10:49 UTC (permalink / raw)
  To: Christophe Varoqui, Benjamin Marzinski; +Cc: dm-devel, Martin Wilck

pp->priority can be negative, so we we shouldn't compare
it with an unsigned int.

Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 libmultipath/pgpolicies.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libmultipath/pgpolicies.c b/libmultipath/pgpolicies.c
index ac2596ad..660768a4 100644
--- a/libmultipath/pgpolicies.c
+++ b/libmultipath/pgpolicies.c
@@ -312,7 +312,7 @@ out:
 int group_by_prio(struct multipath *mp)
 {
 	int i;
-	unsigned int prio;
+	int prio;
 	struct path * pp;
 	struct pathgroup * pgp;
 	vector pathvec = NULL;
-- 
2.21.0

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/4] Revert "Set priority to '0' for PATH_BLOCKED or PATH_DOWN"
  2019-04-11 10:49 [PATCH 0/4] multipath-tools: prio handling for offline paths Martin Wilck
  2019-04-11 10:49 ` [PATCH 1/4] libmultipath: group_by_prio: fix signedness bug Martin Wilck
@ 2019-04-11 10:49 ` Martin Wilck
  2019-04-11 10:49 ` [PATCH 3/4] libmultipath: ana prioritizer: decrease log level Martin Wilck
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Martin Wilck @ 2019-04-11 10:49 UTC (permalink / raw)
  To: Christophe Varoqui, Benjamin Marzinski; +Cc: dm-devel, Martin Wilck

This reverts commit ce8d707c4235860373238dea6491a77a931d4c9f.

In check_path(), we don't touch path priority if a path is down.
But when pathinfo(DI_CHECKER) is called in down state, we reset
the priority to 0. This is inconsistent.

Commit ce8d707 was about maps being rejected during multipath startup
because of undefined priorities. Since commit 94036e3 "libmultipath:
don't reject maps with undefined prio", such maps aren't rejected
any more, thus we can skip resetting the priority to 0.

Note that when we calculate path group priorities, the prio of
paths which are not UP or GHOST are ignored anyway, so this
change will not cause changes wrt PG priorities or PG ordering.

Signed-off-by: Martin Wilck <mwilck@suse.com>
Cc: Hannes Reinecke <hare@suse.de>
---
 libmultipath/discovery.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/libmultipath/discovery.c b/libmultipath/discovery.c
index 744cf2cc..7a17911d 100644
--- a/libmultipath/discovery.c
+++ b/libmultipath/discovery.c
@@ -1982,9 +1982,6 @@ int pathinfo(struct path *pp, struct config *conf, int mask)
 		} else {
 			condlog(3, "%s: path inaccessible", pp->dev);
 			pp->chkrstate = pp->state = path_state;
-			if (path_state == PATH_PENDING ||
-			    path_state == PATH_DOWN)
-				pp->priority = 0;
 		}
 	}
 
-- 
2.21.0

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 3/4] libmultipath: ana prioritizer: decrease log level
  2019-04-11 10:49 [PATCH 0/4] multipath-tools: prio handling for offline paths Martin Wilck
  2019-04-11 10:49 ` [PATCH 1/4] libmultipath: group_by_prio: fix signedness bug Martin Wilck
  2019-04-11 10:49 ` [PATCH 2/4] Revert "Set priority to '0' for PATH_BLOCKED or PATH_DOWN" Martin Wilck
@ 2019-04-11 10:49 ` Martin Wilck
  2019-04-11 10:49 ` [PATCH 4/4] libmultipath: get_prio(): don't reset prio for inaccessible paths Martin Wilck
  2019-04-18 16:27 ` [PATCH 0/4] multipath-tools: prio handling for offline paths Benjamin Marzinski
  4 siblings, 0 replies; 7+ messages in thread
From: Martin Wilck @ 2019-04-11 10:49 UTC (permalink / raw)
  To: Christophe Varoqui, Benjamin Marzinski; +Cc: dm-devel, Martin Wilck

The "ana state = ..." printed in the get_prio() call path
is irritating, because by it's similarity to the checker
messages it suggests to the reader that an "ana checker",
which doesn't exist, was in place. Moreover, the "ana prio ="
message is also only printed at level 4.

Decrease the verbosity of this message.

Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 libmultipath/prioritizers/ana.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libmultipath/prioritizers/ana.c b/libmultipath/prioritizers/ana.c
index 990d935f..2673d9d9 100644
--- a/libmultipath/prioritizers/ana.c
+++ b/libmultipath/prioritizers/ana.c
@@ -164,7 +164,7 @@ int get_ana_info(struct path * pp, unsigned int timeout)
 				   ana_log, ana_log_len);
 	pthread_cleanup_pop(1);
 	if (rc >= 0)
-		condlog(3, "%s: ana state = %02x [%s]", pp->dev, rc,
+		condlog(4, "%s: ana state = %02x [%s]", pp->dev, rc,
 			aas_print_string(rc));
 	return rc;
 }
-- 
2.21.0

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 4/4] libmultipath: get_prio(): don't reset prio for inaccessible paths
  2019-04-11 10:49 [PATCH 0/4] multipath-tools: prio handling for offline paths Martin Wilck
                   ` (2 preceding siblings ...)
  2019-04-11 10:49 ` [PATCH 3/4] libmultipath: ana prioritizer: decrease log level Martin Wilck
@ 2019-04-11 10:49 ` Martin Wilck
  2019-05-01 20:31   ` Benjamin Marzinski
  2019-04-18 16:27 ` [PATCH 0/4] multipath-tools: prio handling for offline paths Benjamin Marzinski
  4 siblings, 1 reply; 7+ messages in thread
From: Martin Wilck @ 2019-04-11 10:49 UTC (permalink / raw)
  To: Christophe Varoqui, Benjamin Marzinski; +Cc: dm-devel, Martin Wilck

pathinfo() doesn't call get_prio() if a path is down, now keeping the old
priority value. But if a path becomes inaccessible between the state check
and the get_prio() call, retrieving the priority will likely fail, and the
path priority will be reset to PRIO_UNDEF. This makes it timing-dependent
how the priority of a failed path is treated. Fix that by checking the path
state in get_prio() if an error occurs, and not touching pp->priority if
the path is in inaccessible state. A checker_check() call would be too
much here, but a quick path_offline() check seems appropriate.

Keep the previous behavior for the case where getting the priority fails
although the path is apparently healthy. This is presumably a very rare
condition, in which it seems actually wrong to preserve the old prio value.

Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 libmultipath/discovery.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/libmultipath/discovery.c b/libmultipath/discovery.c
index 7a17911d..7d638c20 100644
--- a/libmultipath/discovery.c
+++ b/libmultipath/discovery.c
@@ -1625,8 +1625,17 @@ get_prio (struct path * pp)
 	old_prio = pp->priority;
 	pp->priority = prio_getprio(p, pp, checker_timeout);
 	if (pp->priority < 0) {
-		condlog(3, "%s: %s prio error", pp->dev, prio_name(p));
-		pp->priority = PRIO_UNDEF;
+		/* this changes pp->offline, but why not */
+		int state = path_offline(pp);
+
+		if (state == PATH_DOWN || state == PATH_PENDING)
+			condlog(3, "%s: %s prio error in state %d, keeping prio = %d",
+				pp->dev, prio_name(p), state, pp->priority);
+		else {
+			condlog(3, "%s: %s prio error in state %d",
+				pp->dev, prio_name(p), state);
+			pp->priority = PRIO_UNDEF;
+		}
 		return 1;
 	}
 	condlog((old_prio == pp->priority ? 4 : 3), "%s: %s prio = %u",
-- 
2.21.0

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 0/4] multipath-tools: prio handling for offline paths
  2019-04-11 10:49 [PATCH 0/4] multipath-tools: prio handling for offline paths Martin Wilck
                   ` (3 preceding siblings ...)
  2019-04-11 10:49 ` [PATCH 4/4] libmultipath: get_prio(): don't reset prio for inaccessible paths Martin Wilck
@ 2019-04-18 16:27 ` Benjamin Marzinski
  4 siblings, 0 replies; 7+ messages in thread
From: Benjamin Marzinski @ 2019-04-18 16:27 UTC (permalink / raw)
  To: Martin Wilck; +Cc: dm-devel

On Thu, Apr 11, 2019 at 12:49:19PM +0200, Martin Wilck wrote:
> A recent bug report revealed that the handling of path priorities is
> inconsistent in multipathd when paths fail. In the current code, depending
> on timing, the prio of a faild path may be reset to 0, reset to -1
> (PRIO_UNDEF), or not reset at all. The inconsistency is most obvious
> with "group_by_prio", as it results in strange artefacts how failed
> paths are regrouped, and how path groups are sorted.
> 
> This series tries to address these issues.
> 
> The general mind set that I applied is that, in case of doubt, it's
> best to retain the priority that was retrieved while the path was
> last accessible.

ACK for the set.

-Ben

> 
> Martin Wilck (4):
>   libmultipath: group_by_prio: fix signedness bug
>   Revert "Set priority to '0' for PATH_BLOCKED or PATH_DOWN"
>   libmultipath: ana prioritizer: decrease log level
>   libmultipath: get_prio(): don't reset prio for inaccessible paths
> 
>  libmultipath/discovery.c        | 16 +++++++++++-----
>  libmultipath/pgpolicies.c       |  2 +-
>  libmultipath/prioritizers/ana.c |  2 +-
>  3 files changed, 13 insertions(+), 7 deletions(-)
> 
> -- 
> 2.21.0

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 4/4] libmultipath: get_prio(): don't reset prio for inaccessible paths
  2019-04-11 10:49 ` [PATCH 4/4] libmultipath: get_prio(): don't reset prio for inaccessible paths Martin Wilck
@ 2019-05-01 20:31   ` Benjamin Marzinski
  0 siblings, 0 replies; 7+ messages in thread
From: Benjamin Marzinski @ 2019-05-01 20:31 UTC (permalink / raw)
  To: Martin Wilck; +Cc: dm-devel

On Thu, Apr 11, 2019 at 12:49:23PM +0200, Martin Wilck wrote:
> pathinfo() doesn't call get_prio() if a path is down, now keeping the old
> priority value. But if a path becomes inaccessible between the state check
> and the get_prio() call, retrieving the priority will likely fail, and the
> path priority will be reset to PRIO_UNDEF. This makes it timing-dependent
> how the priority of a failed path is treated. Fix that by checking the path
> state in get_prio() if an error occurs, and not touching pp->priority if
> the path is in inaccessible state. A checker_check() call would be too
> much here, but a quick path_offline() check seems appropriate.
> 
> Keep the previous behavior for the case where getting the priority fails
> although the path is apparently healthy. This is presumably a very rare
> condition, in which it seems actually wrong to preserve the old prio value.
> 

I think I was to hasty in ACKing this one.

> Signed-off-by: Martin Wilck <mwilck@suse.com>
> ---
>  libmultipath/discovery.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/libmultipath/discovery.c b/libmultipath/discovery.c
> index 7a17911d..7d638c20 100644
> --- a/libmultipath/discovery.c
> +++ b/libmultipath/discovery.c
> @@ -1625,8 +1625,17 @@ get_prio (struct path * pp)
>  	old_prio = pp->priority;
>  	pp->priority = prio_getprio(p, pp, checker_timeout);
>  	if (pp->priority < 0) {
> -		condlog(3, "%s: %s prio error", pp->dev, prio_name(p));
> -		pp->priority = PRIO_UNDEF;
> +		/* this changes pp->offline, but why not */
> +		int state = path_offline(pp);
> +
> +		if (state == PATH_DOWN || state == PATH_PENDING)

Since pp->priority has already been set to something less then 0, most
likely -1 (PRIO_UNDEF), this patch isn't really keeping the old
priority. Don't you want

			pp->priority = old_prio;

here?

-Ben

> +			condlog(3, "%s: %s prio error in state %d, keeping prio = %d",
> +				pp->dev, prio_name(p), state, pp->priority);
> +		else {
> +			condlog(3, "%s: %s prio error in state %d",
> +				pp->dev, prio_name(p), state);
> +			pp->priority = PRIO_UNDEF;
> +		}
>  		return 1;
>  	}
>  	condlog((old_prio == pp->priority ? 4 : 3), "%s: %s prio = %u",
> -- 
> 2.21.0

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2019-05-01 20:31 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-11 10:49 [PATCH 0/4] multipath-tools: prio handling for offline paths Martin Wilck
2019-04-11 10:49 ` [PATCH 1/4] libmultipath: group_by_prio: fix signedness bug Martin Wilck
2019-04-11 10:49 ` [PATCH 2/4] Revert "Set priority to '0' for PATH_BLOCKED or PATH_DOWN" Martin Wilck
2019-04-11 10:49 ` [PATCH 3/4] libmultipath: ana prioritizer: decrease log level Martin Wilck
2019-04-11 10:49 ` [PATCH 4/4] libmultipath: get_prio(): don't reset prio for inaccessible paths Martin Wilck
2019-05-01 20:31   ` Benjamin Marzinski
2019-04-18 16:27 ` [PATCH 0/4] multipath-tools: prio handling for offline paths Benjamin Marzinski

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.