All of lore.kernel.org
 help / color / mirror / Atom feed
* [dm-devel] [PATCH 0/5] priority and pathgroup switching changes
@ 2023-05-24 23:21 Benjamin Marzinski
  2023-05-24 23:21 ` [dm-devel] [PATCH 1/5] libmultipath: don't count PRIO_UNDEF paths for pathgroup priority Benjamin Marzinski
                   ` (4 more replies)
  0 siblings, 5 replies; 22+ messages in thread
From: Benjamin Marzinski @ 2023-05-24 23:21 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

These are a set of changes that mostly effect how multipathd handles
switching or reordering pathgroups for devices where group_by_prio
isn't set. They are meant to apply on top of my "multipath: Add a
group_by_tgp path grouping policy" patchset.

Benjamin Marzinski (5):
  libmultipath: don't count PRIO_UNDEF paths for pathgroup priority
  multipath-tools tests: add tests to verify PRIO_UDEF changes
  multipathd: refresh all priorities if one has changed
  multipathd: reload map if the path groups are out of order
  multipathd: don't assume mpp->paths will exist in
    need_switch_pathgroup

 libmultipath/libmultipath.version |  5 +++
 libmultipath/switchgroup.c        | 39 +++++++++++++++--
 libmultipath/switchgroup.h        |  1 +
 multipathd/main.c                 | 69 ++++++++++++++++++++-----------
 tests/pgpolicy.c                  | 42 +++++++++++++++++++
 5 files changed, 129 insertions(+), 27 deletions(-)

-- 
2.17.2

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [dm-devel] [PATCH 1/5] libmultipath: don't count PRIO_UNDEF paths for pathgroup priority
  2023-05-24 23:21 [dm-devel] [PATCH 0/5] priority and pathgroup switching changes Benjamin Marzinski
@ 2023-05-24 23:21 ` Benjamin Marzinski
  2023-05-31 16:01   ` Martin Wilck
  2023-05-24 23:21 ` [dm-devel] [PATCH 2/5] multipath-tools tests: add tests to verify PRIO_UDEF changes Benjamin Marzinski
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 22+ messages in thread
From: Benjamin Marzinski @ 2023-05-24 23:21 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

When multipath is not set to group_by_prio, different paths in a
pathgroup can have different priorities. If there is a problem getting
the priority of an active path, its priority will be set to PRIO_UNDEF.
This will change the priority of the whole pathgroup, even though it's
likely that this is simply a temporary error. Instead, do not count
PRIO_UNDEF paths towards to priority of the path group, unless there are
no paths that have an actual priority. This will not effect the priority
of multipath devices with group_by_prio, since all paths in a pathgroup
will have the same priority.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/switchgroup.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/libmultipath/switchgroup.c b/libmultipath/switchgroup.c
index 6fdfcfa7..b1e1f39b 100644
--- a/libmultipath/switchgroup.c
+++ b/libmultipath/switchgroup.c
@@ -12,6 +12,7 @@ void path_group_prio_update(struct pathgroup *pgp)
 	int i;
 	int priority = 0;
 	int marginal = 0;
+	int defined_prios = 0;
 	struct path * pp;
 
 	pgp->enabled_paths = 0;
@@ -24,12 +25,17 @@ void path_group_prio_update(struct pathgroup *pgp)
 			marginal++;
 		if (pp->state == PATH_UP ||
 		    pp->state == PATH_GHOST) {
-			priority += pp->priority;
+			if (pp->priority != PRIO_UNDEF) {
+				defined_prios++;
+				priority += pp->priority;
+			}
 			pgp->enabled_paths++;
 		}
 	}
-	if (pgp->enabled_paths)
-		pgp->priority = priority / pgp->enabled_paths;
+	if (defined_prios)
+		pgp->priority = priority / defined_prios;
+	else if (pgp->enabled_paths)
+		pgp->priority = PRIO_UNDEF;
 	else
 		pgp->priority = 0;
 	if (marginal && marginal == i)
-- 
2.17.2

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [dm-devel] [PATCH 2/5] multipath-tools tests: add tests to verify PRIO_UDEF changes
  2023-05-24 23:21 [dm-devel] [PATCH 0/5] priority and pathgroup switching changes Benjamin Marzinski
  2023-05-24 23:21 ` [dm-devel] [PATCH 1/5] libmultipath: don't count PRIO_UNDEF paths for pathgroup priority Benjamin Marzinski
@ 2023-05-24 23:21 ` Benjamin Marzinski
  2023-05-31 16:04   ` Martin Wilck
  2023-05-24 23:21 ` [dm-devel] [PATCH 3/5] multipathd: refresh all priorities if one has changed Benjamin Marzinski
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 22+ messages in thread
From: Benjamin Marzinski @ 2023-05-24 23:21 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

Make sure that pathgroups that include paths with a prio_UNDEF priority
are properly sorted.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 tests/pgpolicy.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/tests/pgpolicy.c b/tests/pgpolicy.c
index 85fa30ce..ccf29bc9 100644
--- a/tests/pgpolicy.c
+++ b/tests/pgpolicy.c
@@ -648,6 +648,26 @@ static void test_group_by_prio_mixed_one_marginal8(void **state)
 	verify_pathgroups(&mp8, p8, groups, group_size, group_marginal, 7);
 }
 
+static void test_group_by_prio_mixed_undef8(void **state)
+{
+	int prio[] = {7,1,3,-1,5,2,8,2};
+	int group0[] = {6};
+	int group1[] = {0};
+	int group2[] = {4};
+	int group3[] = {2};
+	int group4[] = {5,7};
+	int group5[] = {1};
+	int group6[] = {3};
+	int *groups[] = {group0, group1, group2, group3,
+			  group4, group5, group6};
+	int group_size[] = {1,1,1,1,2,1,1};
+
+	set_priority(p8, prio, 8);
+	mp8.pgpolicyfn = group_by_prio;
+	assert_int_equal(group_paths(&mp8, 0), 0);
+	verify_pathgroups(&mp8, p8, groups, group_size, NULL, 7);
+}
+
 static void test_group_by_tpg_same8(void **state)
 {
 	int paths[] = {0,1,2,3,4,5,6,7};
@@ -828,6 +848,26 @@ static void test_group_by_tpg_mixed_one_marginal8(void **state)
 	verify_pathgroups(&mp8, p8, groups, group_size, group_marginal, 7);
 }
 
+static void test_group_by_tpg_mixed_undef8(void **state)
+{
+	int prio[] = {-1,2,3,-1,5,2,8,2};
+	int tpg[] = {1,2,3,3,4,2,5,6};
+	int group0[] = {6};
+	int group1[] = {4};
+	int group2[] = {2,3};
+	int group3[] = {1,5};
+	int group4[] = {7};
+	int group5[] = {0};
+	int *groups[] = {group0, group1, group2, group3,
+			  group4, group5};
+	int group_size[] = {1,1,2,2,1,1};
+
+	set_priority(p8, prio, 8);
+	set_tpg(p8, tpg, 8);
+	mp8.pgpolicyfn = group_by_tpg;
+	assert_int_equal(group_paths(&mp8, 0), 0);
+	verify_pathgroups(&mp8, p8, groups, group_size, NULL, 6);
+}
 
 static void test_group_by_node_name_same8(void **state)
 {
@@ -1192,6 +1232,7 @@ int test_pgpolicies(void)
 		setup_test(test_group_by_prio_mixed_all_marginal, 8),
 		setup_test(test_group_by_prio_mixed_half_marginal, 8),
 		setup_test(test_group_by_prio_mixed_one_marginal, 8),
+		setup_test(test_group_by_prio_mixed_undef, 8),
 		setup_test(test_group_by_tpg_same, 8),
 		setup_test(test_group_by_tpg_different, 8),
 		setup_test(test_group_by_tpg_mixed, 8),
@@ -1203,6 +1244,7 @@ int test_pgpolicies(void)
 		setup_test(test_group_by_tpg_mixed_all_marginal, 8),
 		setup_test(test_group_by_tpg_mixed_half_marginal, 8),
 		setup_test(test_group_by_tpg_mixed_one_marginal, 8),
+		setup_test(test_group_by_tpg_mixed_undef, 8),
 		setup_test(test_group_by_node_name_same, 8),
 		setup_test(test_group_by_node_name_increasing, 8),
 		setup_test(test_group_by_node_name_3_groups, 8),
-- 
2.17.2

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [dm-devel] [PATCH 3/5] multipathd: refresh all priorities if one has changed
  2023-05-24 23:21 [dm-devel] [PATCH 0/5] priority and pathgroup switching changes Benjamin Marzinski
  2023-05-24 23:21 ` [dm-devel] [PATCH 1/5] libmultipath: don't count PRIO_UNDEF paths for pathgroup priority Benjamin Marzinski
  2023-05-24 23:21 ` [dm-devel] [PATCH 2/5] multipath-tools tests: add tests to verify PRIO_UDEF changes Benjamin Marzinski
@ 2023-05-24 23:21 ` Benjamin Marzinski
  2023-05-31 16:27   ` Martin Wilck
  2023-05-24 23:21 ` [dm-devel] [PATCH 4/5] multipathd: reload map if the path groups are out of order Benjamin Marzinski
  2023-05-24 23:21 ` [dm-devel] [PATCH 5/5] multipathd: don't assume mpp->paths will exist in need_switch_pathgroup Benjamin Marzinski
  4 siblings, 1 reply; 22+ messages in thread
From: Benjamin Marzinski @ 2023-05-24 23:21 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

For multipath devices with path group policies other than group_by_prio,
multipathd wasn't updating all the paths' priorities when calling
need_switch_pathgroup(), even in cases where it likely was necessary.
When a path just becomes usable again, all paths' priorities get updated
by update_prio().  But if the priority changes on a path that is already
up, the other paths' priorities only get updated if the multipath device
uses the group_by_prio path_grouping_policy, otherwise
need_switch_pathgroup() is called with refresh set to 0. But if the
priority of the checked path has changed, then likely so have the
priorities of other paths. Since the pathgroup's priority is the average
of its paths' priorities, changing the priority of just one path may not
change the average enough to reorder the pathgroups.

Instead, set refresh in need_switch_pathgroup() if the priorty has
changed to something other than PRIO_UNDEF (which usually means an error
has occured) and the priorities of the other paths haven't already been
updated by update_prio().

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 multipathd/main.c | 35 +++++++++++++++++++++--------------
 1 file changed, 21 insertions(+), 14 deletions(-)

diff --git a/multipathd/main.c b/multipathd/main.c
index bdeffe76..e7c272ad 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -2575,20 +2575,27 @@ check_path (struct vectors * vecs, struct path * pp, unsigned int ticks)
 
 	if (marginal_changed)
 		reload_and_sync_map(pp->mpp, vecs, 1);
-	else if (update_prio(pp, new_path_up) &&
-	    (pp->mpp->pgpolicyfn == (pgpolicyfn *)group_by_prio) &&
-	     pp->mpp->pgfailback == -FAILBACK_IMMEDIATE) {
-		condlog(2, "%s: path priorities changed. reloading",
-			pp->mpp->alias);
-		reload_and_sync_map(pp->mpp, vecs, !new_path_up);
-	} else if (need_switch_pathgroup(pp->mpp, 0)) {
-		if (pp->mpp->pgfailback > 0 &&
-		    (new_path_up || pp->mpp->failback_tick <= 0))
-			pp->mpp->failback_tick =
-				pp->mpp->pgfailback + 1;
-		else if (pp->mpp->pgfailback == -FAILBACK_IMMEDIATE ||
-			 (chkr_new_path_up && followover_should_failback(pp)))
-			switch_pathgroup(pp->mpp);
+	else {
+		int prio_changed = update_prio(pp, new_path_up);
+		bool need_refresh = (!new_path_up && prio_changed &&
+				     pp->priority != PRIO_UNDEF);
+
+		if (prio_changed &&
+		    pp->mpp->pgpolicyfn == (pgpolicyfn *)group_by_prio &&
+		    pp->mpp->pgfailback == -FAILBACK_IMMEDIATE) {
+			condlog(2, "%s: path priorities changed. reloading",
+				pp->mpp->alias);
+			reload_and_sync_map(pp->mpp, vecs, !new_path_up);
+		} else if (need_switch_pathgroup(pp->mpp, need_refresh)) {
+			if (pp->mpp->pgfailback > 0 &&
+			    (new_path_up || pp->mpp->failback_tick <= 0))
+				pp->mpp->failback_tick =
+					pp->mpp->pgfailback + 1;
+			else if (pp->mpp->pgfailback == -FAILBACK_IMMEDIATE ||
+				 (chkr_new_path_up &&
+				  followover_should_failback(pp)))
+				switch_pathgroup(pp->mpp);
+		}
 	}
 	return 1;
 }
-- 
2.17.2

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [dm-devel] [PATCH 4/5] multipathd: reload map if the path groups are out of order
  2023-05-24 23:21 [dm-devel] [PATCH 0/5] priority and pathgroup switching changes Benjamin Marzinski
                   ` (2 preceding siblings ...)
  2023-05-24 23:21 ` [dm-devel] [PATCH 3/5] multipathd: refresh all priorities if one has changed Benjamin Marzinski
@ 2023-05-24 23:21 ` Benjamin Marzinski
  2023-05-31 16:27   ` Martin Wilck
  2023-06-06 16:39   ` Martin Wilck
  2023-05-24 23:21 ` [dm-devel] [PATCH 5/5] multipathd: don't assume mpp->paths will exist in need_switch_pathgroup Benjamin Marzinski
  4 siblings, 2 replies; 22+ messages in thread
From: Benjamin Marzinski @ 2023-05-24 23:21 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

need_switch_pathgroup() only checks if the currently used pathgroup is
not the highest priority pathgroup. If it isn't, all multipathd does is
instruct the kernel to switch to the correct pathgroup.  However, the
kernel treats the pathgroups as if they were ordered by priority. When
the kernel runs out of paths to use in the currently selected pathgroup,
it will start checking the pathgroups in order until it finds one with
usable paths.

need_switch_pathgroup() should also check if the pathgroups are out of
order, and if so, multipathd should reload the map to reorder them
correctly.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/libmultipath.version |  5 ++++
 libmultipath/switchgroup.c        | 27 ++++++++++++++++++++++
 libmultipath/switchgroup.h        |  1 +
 multipathd/main.c                 | 38 +++++++++++++++++++++----------
 4 files changed, 59 insertions(+), 12 deletions(-)

diff --git a/libmultipath/libmultipath.version b/libmultipath/libmultipath.version
index 8f72c452..38074699 100644
--- a/libmultipath/libmultipath.version
+++ b/libmultipath/libmultipath.version
@@ -237,3 +237,8 @@ global:
 local:
 	*;
 };
+
+LIBMULTIPATH_19.1.0 {
+global:
+	path_groups_in_order;
+} LIBMULTIPATH_19.0.0;
diff --git a/libmultipath/switchgroup.c b/libmultipath/switchgroup.c
index b1e1f39b..b1180839 100644
--- a/libmultipath/switchgroup.c
+++ b/libmultipath/switchgroup.c
@@ -7,6 +7,33 @@
 #include "structs.h"
 #include "switchgroup.h"
 
+bool path_groups_in_order(struct multipath *mpp)
+{
+	int i;
+	struct pathgroup *pgp;
+	bool seen_marginal_pg = false;
+	int last_prio = INT_MAX;
+
+	if (VECTOR_SIZE(mpp->pg) < 2)
+		return true;
+
+	vector_foreach_slot(mpp->pg, pgp, i) {
+		/* skip pgs with PRIO_UNDEF, since this is likely temporary */
+		if (!pgp->paths || pgp->priority == PRIO_UNDEF)
+			continue;
+		if (pgp->marginal && !seen_marginal_pg) {
+			last_prio = INT_MAX;
+			continue;
+		}
+		if (seen_marginal_pg && !pgp->marginal)
+			return false;
+		if (pgp->priority > last_prio)
+			return false;
+		last_prio = pgp->priority;
+	}
+	return true;
+}
+
 void path_group_prio_update(struct pathgroup *pgp)
 {
 	int i;
diff --git a/libmultipath/switchgroup.h b/libmultipath/switchgroup.h
index 9365e2e2..43dbb6c9 100644
--- a/libmultipath/switchgroup.h
+++ b/libmultipath/switchgroup.h
@@ -1,2 +1,3 @@
 void path_group_prio_update (struct pathgroup * pgp);
 int select_path_group (struct multipath * mpp);
+bool path_groups_in_order(struct multipath *mpp);
diff --git a/multipathd/main.c b/multipathd/main.c
index e7c272ad..2ea7c76b 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -396,7 +396,7 @@ void put_multipath_config(__attribute__((unused)) void *arg)
 }
 
 static int
-need_switch_pathgroup (struct multipath * mpp, int refresh)
+need_switch_pathgroup (struct multipath * mpp, int refresh, bool *need_reload)
 {
 	struct pathgroup * pgp;
 	struct path * pp;
@@ -404,6 +404,7 @@ need_switch_pathgroup (struct multipath * mpp, int refresh)
 	struct config *conf;
 	int bestpg;
 
+	*need_reload = false;
 	if (!mpp)
 		return 0;
 
@@ -430,10 +431,9 @@ need_switch_pathgroup (struct multipath * mpp, int refresh)
 		return 0;
 
 	mpp->bestpg = bestpg;
-	if (mpp->bestpg != mpp->nextpg)
-		return 1;
+	*need_reload = !path_groups_in_order(mpp);
 
-	return 0;
+	return (*need_reload || mpp->bestpg != mpp->nextpg);
 }
 
 static void
@@ -1982,20 +1982,26 @@ ghost_delay_tick(struct vectors *vecs)
 }
 
 static void
-deferred_failback_tick (vector mpvec)
+deferred_failback_tick (struct vectors *vecs)
 {
 	struct multipath * mpp;
 	unsigned int i;
+	bool need_reload;
 
-	vector_foreach_slot (mpvec, mpp, i) {
+	vector_foreach_slot (vecs->mpvec, mpp, i) {
 		/*
 		 * deferred failback getting sooner
 		 */
 		if (mpp->pgfailback > 0 && mpp->failback_tick > 0) {
 			mpp->failback_tick--;
 
-			if (!mpp->failback_tick && need_switch_pathgroup(mpp, 1))
-				switch_pathgroup(mpp);
+			if (!mpp->failback_tick &&
+			    need_switch_pathgroup(mpp, 1, &need_reload)) {
+				if (need_reload)
+					reload_and_sync_map(mpp, vecs, 0);
+				else
+					switch_pathgroup(mpp);
+			}
 		}
 	}
 }
@@ -2579,6 +2585,7 @@ check_path (struct vectors * vecs, struct path * pp, unsigned int ticks)
 		int prio_changed = update_prio(pp, new_path_up);
 		bool need_refresh = (!new_path_up && prio_changed &&
 				     pp->priority != PRIO_UNDEF);
+		bool need_reload;
 
 		if (prio_changed &&
 		    pp->mpp->pgpolicyfn == (pgpolicyfn *)group_by_prio &&
@@ -2586,15 +2593,22 @@ check_path (struct vectors * vecs, struct path * pp, unsigned int ticks)
 			condlog(2, "%s: path priorities changed. reloading",
 				pp->mpp->alias);
 			reload_and_sync_map(pp->mpp, vecs, !new_path_up);
-		} else if (need_switch_pathgroup(pp->mpp, need_refresh)) {
+		} else if (need_switch_pathgroup(pp->mpp, need_refresh,
+			                         &need_reload)) {
 			if (pp->mpp->pgfailback > 0 &&
 			    (new_path_up || pp->mpp->failback_tick <= 0))
 				pp->mpp->failback_tick =
 					pp->mpp->pgfailback + 1;
 			else if (pp->mpp->pgfailback == -FAILBACK_IMMEDIATE ||
 				 (chkr_new_path_up &&
-				  followover_should_failback(pp)))
-				switch_pathgroup(pp->mpp);
+				  followover_should_failback(pp))) {
+				if (need_reload)
+					reload_and_sync_map(pp->mpp, vecs,
+							    !need_refresh &&
+							    !new_path_up);
+				else
+					switch_pathgroup(pp->mpp);
+			}
 		}
 	}
 	return 1;
@@ -2720,7 +2734,7 @@ unlock:
 		pthread_cleanup_push(cleanup_lock, &vecs->lock);
 		lock(&vecs->lock);
 		pthread_testcancel();
-		deferred_failback_tick(vecs->mpvec);
+		deferred_failback_tick(vecs);
 		retry_count_tick(vecs->mpvec);
 		missing_uev_wait_tick(vecs);
 		ghost_delay_tick(vecs);
-- 
2.17.2

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [dm-devel] [PATCH 5/5] multipathd: don't assume mpp->paths will exist in need_switch_pathgroup
  2023-05-24 23:21 [dm-devel] [PATCH 0/5] priority and pathgroup switching changes Benjamin Marzinski
                   ` (3 preceding siblings ...)
  2023-05-24 23:21 ` [dm-devel] [PATCH 4/5] multipathd: reload map if the path groups are out of order Benjamin Marzinski
@ 2023-05-24 23:21 ` Benjamin Marzinski
  2023-05-31 16:28   ` Martin Wilck
  4 siblings, 1 reply; 22+ messages in thread
From: Benjamin Marzinski @ 2023-05-24 23:21 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

When need_switch_pathgroup() is called by deferred_failback_tick(),
there is a chance that mpp->paths will be NULL, even if there are paths
in the multipath device's pathgroups. Instead check if there are
multiple pathgroups, since multipath can't be using the wrong pathgroup
if there is one or none.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 multipathd/main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/multipathd/main.c b/multipathd/main.c
index 2ea7c76b..39ba6d27 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -423,7 +423,7 @@ need_switch_pathgroup (struct multipath * mpp, int refresh, bool *need_reload)
 		}
 	}
 
-	if (!mpp->pg || VECTOR_SIZE(mpp->paths) == 0)
+	if (VECTOR_SIZE(mpp->pg) < 2)
 		return 0;
 
 	bestpg = select_path_group(mpp);
-- 
2.17.2

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [dm-devel] [PATCH 1/5] libmultipath: don't count PRIO_UNDEF paths for pathgroup priority
  2023-05-24 23:21 ` [dm-devel] [PATCH 1/5] libmultipath: don't count PRIO_UNDEF paths for pathgroup priority Benjamin Marzinski
@ 2023-05-31 16:01   ` Martin Wilck
  0 siblings, 0 replies; 22+ messages in thread
From: Martin Wilck @ 2023-05-31 16:01 UTC (permalink / raw)
  To: bmarzins, christophe.varoqui; +Cc: dm-devel

On Wed, 2023-05-24 at 18:21 -0500, Benjamin Marzinski wrote:
> When multipath is not set to group_by_prio, different paths in a
> pathgroup can have different priorities. If there is a problem
> getting
> the priority of an active path, its priority will be set to
> PRIO_UNDEF.
> This will change the priority of the whole pathgroup, even though
> it's
> likely that this is simply a temporary error. Instead, do not count
> PRIO_UNDEF paths towards to priority of the path group, unless there
> are
> no paths that have an actual priority. This will not effect the
> priority
> of multipath devices with group_by_prio, since all paths in a
> pathgroup
> will have the same priority.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>

Reviewed-by: Martin Wilck <mwilck@suse.com>

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [dm-devel] [PATCH 2/5] multipath-tools tests: add tests to verify PRIO_UDEF changes
  2023-05-24 23:21 ` [dm-devel] [PATCH 2/5] multipath-tools tests: add tests to verify PRIO_UDEF changes Benjamin Marzinski
@ 2023-05-31 16:04   ` Martin Wilck
  0 siblings, 0 replies; 22+ messages in thread
From: Martin Wilck @ 2023-05-31 16:04 UTC (permalink / raw)
  To: bmarzins, christophe.varoqui; +Cc: dm-devel

On Wed, 2023-05-24 at 18:21 -0500, Benjamin Marzinski wrote:
> Make sure that pathgroups that include paths with a prio_UNDEF
> priority
> are properly sorted.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>

Reviewed-by: Martin Wilck <mwilck@suse.com>

> ---
>  tests/pgpolicy.c | 42 ++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 42 insertions(+)
> 
> diff --git a/tests/pgpolicy.c b/tests/pgpolicy.c
> index 85fa30ce..ccf29bc9 100644
> --- a/tests/pgpolicy.c
> +++ b/tests/pgpolicy.c
> @@ -648,6 +648,26 @@ static void
> test_group_by_prio_mixed_one_marginal8(void **state)
>         verify_pathgroups(&mp8, p8, groups, group_size,
> group_marginal, 7);
>  }
>  
> +static void test_group_by_prio_mixed_undef8(void **state)
> +{
> +       int prio[] = {7,1,3,-1,5,2,8,2};
> +       int group0[] = {6};
> +       int group1[] = {0};
> +       int group2[] = {4};
> +       int group3[] = {2};
> +       int group4[] = {5,7};
> +       int group5[] = {1};
> +       int group6[] = {3};
> +       int *groups[] = {group0, group1, group2, group3,
> +                         group4, group5, group6};
> +       int group_size[] = {1,1,1,1,2,1,1};
> +
> +       set_priority(p8, prio, 8);
> +       mp8.pgpolicyfn = group_by_prio;
> +       assert_int_equal(group_paths(&mp8, 0), 0);
> +       verify_pathgroups(&mp8, p8, groups, group_size, NULL, 7);
> +}
> +
>  static void test_group_by_tpg_same8(void **state)
>  {
>         int paths[] = {0,1,2,3,4,5,6,7};
> @@ -828,6 +848,26 @@ static void
> test_group_by_tpg_mixed_one_marginal8(void **state)
>         verify_pathgroups(&mp8, p8, groups, group_size,
> group_marginal, 7);
>  }
>  
> +static void test_group_by_tpg_mixed_undef8(void **state)
> +{
> +       int prio[] = {-1,2,3,-1,5,2,8,2};
> +       int tpg[] = {1,2,3,3,4,2,5,6};
> +       int group0[] = {6};
> +       int group1[] = {4};
> +       int group2[] = {2,3};
> +       int group3[] = {1,5};
> +       int group4[] = {7};
> +       int group5[] = {0};
> +       int *groups[] = {group0, group1, group2, group3,
> +                         group4, group5};
> +       int group_size[] = {1,1,2,2,1,1};
> +
> +       set_priority(p8, prio, 8);
> +       set_tpg(p8, tpg, 8);
> +       mp8.pgpolicyfn = group_by_tpg;
> +       assert_int_equal(group_paths(&mp8, 0), 0);
> +       verify_pathgroups(&mp8, p8, groups, group_size, NULL, 6);
> +}
>  
>  static void test_group_by_node_name_same8(void **state)
>  {
> @@ -1192,6 +1232,7 @@ int test_pgpolicies(void)
>                 setup_test(test_group_by_prio_mixed_all_marginal, 8),
>                 setup_test(test_group_by_prio_mixed_half_marginal,
> 8),
>                 setup_test(test_group_by_prio_mixed_one_marginal, 8),
> +               setup_test(test_group_by_prio_mixed_undef, 8),
>                 setup_test(test_group_by_tpg_same, 8),
>                 setup_test(test_group_by_tpg_different, 8),
>                 setup_test(test_group_by_tpg_mixed, 8),
> @@ -1203,6 +1244,7 @@ int test_pgpolicies(void)
>                 setup_test(test_group_by_tpg_mixed_all_marginal, 8),
>                 setup_test(test_group_by_tpg_mixed_half_marginal, 8),
>                 setup_test(test_group_by_tpg_mixed_one_marginal, 8),
> +               setup_test(test_group_by_tpg_mixed_undef, 8),
>                 setup_test(test_group_by_node_name_same, 8),
>                 setup_test(test_group_by_node_name_increasing, 8),
>                 setup_test(test_group_by_node_name_3_groups, 8),

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [dm-devel] [PATCH 3/5] multipathd: refresh all priorities if one has changed
  2023-05-24 23:21 ` [dm-devel] [PATCH 3/5] multipathd: refresh all priorities if one has changed Benjamin Marzinski
@ 2023-05-31 16:27   ` Martin Wilck
  2023-06-05 18:22     ` Benjamin Marzinski
  0 siblings, 1 reply; 22+ messages in thread
From: Martin Wilck @ 2023-05-31 16:27 UTC (permalink / raw)
  To: bmarzins, christophe.varoqui; +Cc: dm-devel

On Wed, 2023-05-24 at 18:21 -0500, Benjamin Marzinski wrote:
> For multipath devices with path group policies other than
> group_by_prio,
> multipathd wasn't updating all the paths' priorities when calling
> need_switch_pathgroup(), even in cases where it likely was necessary.
> When a path just becomes usable again, all paths' priorities get
> updated
> by update_prio().  But if the priority changes on a path that is
> already
> up, the other paths' priorities only get updated if the multipath
> device
> uses the group_by_prio path_grouping_policy, otherwise
> need_switch_pathgroup() is called with refresh set to 0. But if the
> priority of the checked path has changed, then likely so have the
> priorities of other paths. Since the pathgroup's priority is the
> average
> of its paths' priorities, changing the priority of just one path may
> not
> change the average enough to reorder the pathgroups.
> 
> Instead, set refresh in need_switch_pathgroup() if the priorty has
> changed to something other than PRIO_UNDEF (which usually means an
> error
> has occured) and the priorities of the other paths haven't already
> been
> updated by update_prio().
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
>  multipathd/main.c | 35 +++++++++++++++++++++--------------
>  1 file changed, 21 insertions(+), 14 deletions(-)
> 
> diff --git a/multipathd/main.c b/multipathd/main.c
> index bdeffe76..e7c272ad 100644
> --- a/multipathd/main.c
> +++ b/multipathd/main.c
> @@ -2575,20 +2575,27 @@ check_path (struct vectors * vecs, struct
> path * pp, unsigned int ticks)
>  
>         if (marginal_changed)
>                 reload_and_sync_map(pp->mpp, vecs, 1);
> -       else if (update_prio(pp, new_path_up) &&
> -           (pp->mpp->pgpolicyfn == (pgpolicyfn *)group_by_prio) &&
> -            pp->mpp->pgfailback == -FAILBACK_IMMEDIATE) {
> -               condlog(2, "%s: path priorities changed. reloading",
> -                       pp->mpp->alias);
> -               reload_and_sync_map(pp->mpp, vecs, !new_path_up);
> -       } else if (need_switch_pathgroup(pp->mpp, 0)) {
> -               if (pp->mpp->pgfailback > 0 &&
> -                   (new_path_up || pp->mpp->failback_tick <= 0))
> -                       pp->mpp->failback_tick =
> -                               pp->mpp->pgfailback + 1;
> -               else if (pp->mpp->pgfailback == -FAILBACK_IMMEDIATE
> ||
> -                        (chkr_new_path_up &&
> followover_should_failback(pp)))
> -                       switch_pathgroup(pp->mpp);
> +       else {
> +               int prio_changed = update_prio(pp, new_path_up);
> +               bool need_refresh = (!new_path_up && prio_changed &&
> +                                    pp->priority != PRIO_UNDEF);
> +

I have always found it confusing that we recalculate the priorities in
two functions (update_prio() and need_switch_pathgroup()), passing
boolean flags back and forth. IMO we should move this logic to
update_prio(), so that we don't need to refresh any priorities in
need_switch_pathgroup() any more. after determining the prio of the
"primary" path device, update_prio() has all the information
it needs to figure out whether priorities of other paths must be
refreshed.

That would even make the code easier to understand, IMO.

Regards
Martin



> +               if (prio_changed &&
> +                   pp->mpp->pgpolicyfn == (pgpolicyfn
> *)group_by_prio &&
> +                   pp->mpp->pgfailback == -FAILBACK_IMMEDIATE) {
> +                       condlog(2, "%s: path priorities changed.
> reloading",
> +                               pp->mpp->alias);
> +                       reload_and_sync_map(pp->mpp, vecs,
> !new_path_up);
> +               } else if (need_switch_pathgroup(pp->mpp,
> need_refresh)) {
> +                       if (pp->mpp->pgfailback > 0 &&
> +                           (new_path_up || pp->mpp->failback_tick <=
> 0))
> +                               pp->mpp->failback_tick =
> +                                       pp->mpp->pgfailback + 1;
> +                       else if (pp->mpp->pgfailback == -
> FAILBACK_IMMEDIATE ||
> +                                (chkr_new_path_up &&
> +                                 followover_should_failback(pp)))
> +                               switch_pathgroup(pp->mpp);
> +               }
>         }
>         return 1;
>  }

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [dm-devel] [PATCH 4/5] multipathd: reload map if the path groups are out of order
  2023-05-24 23:21 ` [dm-devel] [PATCH 4/5] multipathd: reload map if the path groups are out of order Benjamin Marzinski
@ 2023-05-31 16:27   ` Martin Wilck
  2023-06-05 19:08     ` Benjamin Marzinski
  2023-06-06 16:39   ` Martin Wilck
  1 sibling, 1 reply; 22+ messages in thread
From: Martin Wilck @ 2023-05-31 16:27 UTC (permalink / raw)
  To: bmarzins, christophe.varoqui; +Cc: dm-devel

On Wed, 2023-05-24 at 18:21 -0500, Benjamin Marzinski wrote:
> need_switch_pathgroup() only checks if the currently used pathgroup
> is
> not the highest priority pathgroup. If it isn't, all multipathd does
> is
> instruct the kernel to switch to the correct pathgroup.  However, the
> kernel treats the pathgroups as if they were ordered by priority.
> When
> the kernel runs out of paths to use in the currently selected
> pathgroup,
> it will start checking the pathgroups in order until it finds one
> with
> usable paths.
> 
> need_switch_pathgroup() should also check if the pathgroups are out
> of
> order, and if so, multipathd should reload the map to reorder them
> correctly.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
>  libmultipath/libmultipath.version |  5 ++++
>  libmultipath/switchgroup.c        | 27 ++++++++++++++++++++++
>  libmultipath/switchgroup.h        |  1 +
>  multipathd/main.c                 | 38 +++++++++++++++++++++--------
> --
>  4 files changed, 59 insertions(+), 12 deletions(-)
> 
> diff --git a/libmultipath/libmultipath.version
> b/libmultipath/libmultipath.version
> index 8f72c452..38074699 100644
> --- a/libmultipath/libmultipath.version
> +++ b/libmultipath/libmultipath.version
> @@ -237,3 +237,8 @@ global:
>  local:
>         *;
>  };
> +
> +LIBMULTIPATH_19.1.0 {
> +global:
> +       path_groups_in_order;
> +} LIBMULTIPATH_19.0.0;
> diff --git a/libmultipath/switchgroup.c b/libmultipath/switchgroup.c
> index b1e1f39b..b1180839 100644
> --- a/libmultipath/switchgroup.c
> +++ b/libmultipath/switchgroup.c
> @@ -7,6 +7,33 @@
>  #include "structs.h"
>  #include "switchgroup.h"
>  
> +bool path_groups_in_order(struct multipath *mpp)
> +{
> +       int i;
> +       struct pathgroup *pgp;
> +       bool seen_marginal_pg = false;
> +       int last_prio = INT_MAX;
> +
> +       if (VECTOR_SIZE(mpp->pg) < 2)
> +               return true;
> +
> +       vector_foreach_slot(mpp->pg, pgp, i) {
> +               /* skip pgs with PRIO_UNDEF, since this is likely
> temporary */
> +               if (!pgp->paths || pgp->priority == PRIO_UNDEF)
> +                       continue;
> +               if (pgp->marginal && !seen_marginal_pg) {
> +                       last_prio = INT_MAX;
> +                       continue;
> +               }
> +               if (seen_marginal_pg && !pgp->marginal)
> +                       return false;
> +               if (pgp->priority > last_prio)
> +                       return false;
> +               last_prio = pgp->priority;
> +       }
> +       return true;
> +}
> +
>  void path_group_prio_update(struct pathgroup *pgp)
>  {
>         int i;
> diff --git a/libmultipath/switchgroup.h b/libmultipath/switchgroup.h
> index 9365e2e2..43dbb6c9 100644
> --- a/libmultipath/switchgroup.h
> +++ b/libmultipath/switchgroup.h
> @@ -1,2 +1,3 @@
>  void path_group_prio_update (struct pathgroup * pgp);
>  int select_path_group (struct multipath * mpp);
> +bool path_groups_in_order(struct multipath *mpp);
> diff --git a/multipathd/main.c b/multipathd/main.c
> index e7c272ad..2ea7c76b 100644
> --- a/multipathd/main.c
> +++ b/multipathd/main.c
> @@ -396,7 +396,7 @@ void put_multipath_config(__attribute__((unused))
> void *arg)
>  }
>  
>  static int
> -need_switch_pathgroup (struct multipath * mpp, int refresh)
> +need_switch_pathgroup (struct multipath * mpp, int refresh, bool
> *need_reload)
>  {
>         struct pathgroup * pgp;
>         struct path * pp;
> @@ -404,6 +404,7 @@ need_switch_pathgroup (struct multipath * mpp,
> int refresh)
>         struct config *conf;
>         int bestpg;
>  
> +       *need_reload = false;
>         if (!mpp)
>                 return 0;
>  
> @@ -430,10 +431,9 @@ need_switch_pathgroup (struct multipath * mpp,
> int refresh)
>                 return 0;
>  
>         mpp->bestpg = bestpg;
> -       if (mpp->bestpg != mpp->nextpg)
> -               return 1;
> +       *need_reload = !path_groups_in_order(mpp);

This will start another loop over the path groups. Can we just
integrate the path_groups_in_order() logic into the loop right here?



>  
> -       return 0;
> +       return (*need_reload || mpp->bestpg != mpp->nextpg);
>  }
>  
>  static void
> @@ -1982,20 +1982,26 @@ ghost_delay_tick(struct vectors *vecs)
>  }
>  
>  static void
> -deferred_failback_tick (vector mpvec)
> +deferred_failback_tick (struct vectors *vecs)
>  {
>         struct multipath * mpp;
>         unsigned int i;
> +       bool need_reload;
>  
> -       vector_foreach_slot (mpvec, mpp, i) {
> +       vector_foreach_slot (vecs->mpvec, mpp, i) {
>                 /*
>                  * deferred failback getting sooner
>                  */
>                 if (mpp->pgfailback > 0 && mpp->failback_tick > 0) {
>                         mpp->failback_tick--;
>  
> -                       if (!mpp->failback_tick &&
> need_switch_pathgroup(mpp, 1))
> -                               switch_pathgroup(mpp);
> +                       if (!mpp->failback_tick &&
> +                           need_switch_pathgroup(mpp, 1,
> &need_reload)) {
> +                               if (need_reload)
> +                                       reload_and_sync_map(mpp,
> vecs, 0);
> +                               else
> +                                       switch_pathgroup(mpp);
> +                       }
>                 }
>         }
>  }
> @@ -2579,6 +2585,7 @@ check_path (struct vectors * vecs, struct path
> * pp, unsigned int ticks)
>                 int prio_changed = update_prio(pp, new_path_up);
>                 bool need_refresh = (!new_path_up && prio_changed &&
>                                      pp->priority != PRIO_UNDEF);
> +               bool need_reload;
>  
>                 if (prio_changed &&
>                     pp->mpp->pgpolicyfn == (pgpolicyfn
> *)group_by_prio &&
> @@ -2586,15 +2593,22 @@ check_path (struct vectors * vecs, struct
> path * pp, unsigned int ticks)
>                         condlog(2, "%s: path priorities changed.
> reloading",
>                                 pp->mpp->alias);
>                         reload_and_sync_map(pp->mpp, vecs,
> !new_path_up);
> -               } else if (need_switch_pathgroup(pp->mpp,
> need_refresh)) {
> +               } else if (need_switch_pathgroup(pp->mpp,
> need_refresh,
> +                                                &need_reload)) {
>                         if (pp->mpp->pgfailback > 0 &&
>                             (new_path_up || pp->mpp->failback_tick <=
> 0))
>                                 pp->mpp->failback_tick =
>                                         pp->mpp->pgfailback + 1;
>                         else if (pp->mpp->pgfailback == -
> FAILBACK_IMMEDIATE ||
>                                  (chkr_new_path_up &&
> -                                 followover_should_failback(pp)))
> -                               switch_pathgroup(pp->mpp);
> +                                 followover_should_failback(pp))) {
> +                               if (need_reload)
> +                                       reload_and_sync_map(pp->mpp,
> vecs,
> +                                                          
> !need_refresh &&
> +                                                          
> !new_path_up);
> +                               else
> +                                       switch_pathgroup(pp->mpp);
> +                       }
>                 }
>         }
>         return 1;
> @@ -2720,7 +2734,7 @@ unlock:
>                 pthread_cleanup_push(cleanup_lock, &vecs->lock);
>                 lock(&vecs->lock);
>                 pthread_testcancel();
> -               deferred_failback_tick(vecs->mpvec);
> +               deferred_failback_tick(vecs);
>                 retry_count_tick(vecs->mpvec);
>                 missing_uev_wait_tick(vecs);
>                 ghost_delay_tick(vecs);



--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [dm-devel] [PATCH 5/5] multipathd: don't assume mpp->paths will exist in need_switch_pathgroup
  2023-05-24 23:21 ` [dm-devel] [PATCH 5/5] multipathd: don't assume mpp->paths will exist in need_switch_pathgroup Benjamin Marzinski
@ 2023-05-31 16:28   ` Martin Wilck
  0 siblings, 0 replies; 22+ messages in thread
From: Martin Wilck @ 2023-05-31 16:28 UTC (permalink / raw)
  To: bmarzins, christophe.varoqui; +Cc: dm-devel

On Wed, 2023-05-24 at 18:21 -0500, Benjamin Marzinski wrote:
> When need_switch_pathgroup() is called by deferred_failback_tick(),
> there is a chance that mpp->paths will be NULL, even if there are
> paths
> in the multipath device's pathgroups. Instead check if there are
> multiple pathgroups, since multipath can't be using the wrong
> pathgroup
> if there is one or none.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>

Reviewed-by: Martin Wilck <mwilck@suse.com>

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [dm-devel] [PATCH 3/5] multipathd: refresh all priorities if one has changed
  2023-05-31 16:27   ` Martin Wilck
@ 2023-06-05 18:22     ` Benjamin Marzinski
  2023-06-06 14:08       ` Martin Wilck
  0 siblings, 1 reply; 22+ messages in thread
From: Benjamin Marzinski @ 2023-06-05 18:22 UTC (permalink / raw)
  To: Martin Wilck; +Cc: dm-devel

On Wed, May 31, 2023 at 04:27:25PM +0000, Martin Wilck wrote:
> On Wed, 2023-05-24 at 18:21 -0500, Benjamin Marzinski wrote:
> > For multipath devices with path group policies other than
> > group_by_prio,
> > multipathd wasn't updating all the paths' priorities when calling
> > need_switch_pathgroup(), even in cases where it likely was necessary.
> > When a path just becomes usable again, all paths' priorities get
> > updated
> > by update_prio().  But if the priority changes on a path that is
> > already
> > up, the other paths' priorities only get updated if the multipath
> > device
> > uses the group_by_prio path_grouping_policy, otherwise
> > need_switch_pathgroup() is called with refresh set to 0. But if the
> > priority of the checked path has changed, then likely so have the
> > priorities of other paths. Since the pathgroup's priority is the
> > average
> > of its paths' priorities, changing the priority of just one path may
> > not
> > change the average enough to reorder the pathgroups.
> > 
> > Instead, set refresh in need_switch_pathgroup() if the priorty has
> > changed to something other than PRIO_UNDEF (which usually means an
> > error
> > has occured) and the priorities of the other paths haven't already
> > been
> > updated by update_prio().
> > 
> > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > ---
> >  multipathd/main.c | 35 +++++++++++++++++++++--------------
> >  1 file changed, 21 insertions(+), 14 deletions(-)
> > 
> > diff --git a/multipathd/main.c b/multipathd/main.c
> > index bdeffe76..e7c272ad 100644
> > --- a/multipathd/main.c
> > +++ b/multipathd/main.c
> > @@ -2575,20 +2575,27 @@ check_path (struct vectors * vecs, struct
> > path * pp, unsigned int ticks)
> >  
> >         if (marginal_changed)
> >                 reload_and_sync_map(pp->mpp, vecs, 1);
> > -       else if (update_prio(pp, new_path_up) &&
> > -           (pp->mpp->pgpolicyfn == (pgpolicyfn *)group_by_prio) &&
> > -            pp->mpp->pgfailback == -FAILBACK_IMMEDIATE) {
> > -               condlog(2, "%s: path priorities changed. reloading",
> > -                       pp->mpp->alias);
> > -               reload_and_sync_map(pp->mpp, vecs, !new_path_up);
> > -       } else if (need_switch_pathgroup(pp->mpp, 0)) {
> > -               if (pp->mpp->pgfailback > 0 &&
> > -                   (new_path_up || pp->mpp->failback_tick <= 0))
> > -                       pp->mpp->failback_tick =
> > -                               pp->mpp->pgfailback + 1;
> > -               else if (pp->mpp->pgfailback == -FAILBACK_IMMEDIATE
> > ||
> > -                        (chkr_new_path_up &&
> > followover_should_failback(pp)))
> > -                       switch_pathgroup(pp->mpp);
> > +       else {
> > +               int prio_changed = update_prio(pp, new_path_up);
> > +               bool need_refresh = (!new_path_up && prio_changed &&
> > +                                    pp->priority != PRIO_UNDEF);
> > +
> 
> I have always found it confusing that we recalculate the priorities in
> two functions (update_prio() and need_switch_pathgroup()), passing
> boolean flags back and forth. IMO we should move this logic to
> update_prio(), so that we don't need to refresh any priorities in
> need_switch_pathgroup() any more. after determining the prio of the
> "primary" path device, update_prio() has all the information
> it needs to figure out whether priorities of other paths must be
> refreshed.
> 
> That would even make the code easier to understand, IMO.
> 
> Regards
> Martin

So the difference in this code between when we choose to update all the
paths' prios for the group_by_prio case, and when we choose to update
all the paths' prios for the other pgpolicies comes down to how we treat
PRIO_UNDEF. I didn't change the group_by_prio behavior. So right now,
for group_by_prio devices, we will update all the paths' priorities if
the checked path switches priorities to PRIO_UNDEF. My question is, "Is
this the right thing to do?"

In the best case, if the prioritizer fails on one path, it will fail on
all the other paths in the pathgroup as well, so that they stay
together. In the worst case it will fail on paths in other pathgroups,
so that incorrect paths get grouped together. Granted, I'm not sure how
much of a difference it makes in the worst case, since the other
priorities would get checked eventually, and would get placed in the
wrong group then.

Perhaps it would be better to treat PRIO_UNDEF like PATH_PENDING, where
we will continue to use the old priority if we get a PRIO_UNDEF result.

The choices are:
1. make both the group_by_prio and the non-group_by_prio cases recheck
   all paths on PRIO_UNDEF (this keeps the group_by_prio behavior the
   same).
2. make both cases NOT recheck all paths on PRIO_UNDEF.
3. keep the destinction between the two (make update_prio() check the
   pgplicy, and act accordingly)
4. Make paths keep their previous priority when they would have returned
   PRIO_UNDEF, so we never switch to PRIO_UNDEF.

All the choices except 3 seem reasonable. 1 keeps things how they are
for group_by_prio. 2 leans towards moving PRIO_UNDEF paths out of their
current pathgroup.  4 leans towards keeping PRIO_UNDEF paths in their
current pathgroup.

The other question is, what do we do for the delayed case. Right now,
once we finish waiting for our delay in deferred_failback_tick(), we
automatically refresh the priorities of all the devices in our
need_switch_pathgroup() call.  We could add an update_prio() call before
it to keep this behavior, but if we are already refreshing all the
paths' priorities when we need to, I'm not sure that it's necessary to
do it again here.

Thoughts?
-Ben

> 
> 
> 
> > +               if (prio_changed &&
> > +                   pp->mpp->pgpolicyfn == (pgpolicyfn
> > *)group_by_prio &&
> > +                   pp->mpp->pgfailback == -FAILBACK_IMMEDIATE) {
> > +                       condlog(2, "%s: path priorities changed.
> > reloading",
> > +                               pp->mpp->alias);
> > +                       reload_and_sync_map(pp->mpp, vecs,
> > !new_path_up);
> > +               } else if (need_switch_pathgroup(pp->mpp,
> > need_refresh)) {
> > +                       if (pp->mpp->pgfailback > 0 &&
> > +                           (new_path_up || pp->mpp->failback_tick <=
> > 0))
> > +                               pp->mpp->failback_tick =
> > +                                       pp->mpp->pgfailback + 1;
> > +                       else if (pp->mpp->pgfailback == -
> > FAILBACK_IMMEDIATE ||
> > +                                (chkr_new_path_up &&
> > +                                 followover_should_failback(pp)))
> > +                               switch_pathgroup(pp->mpp);
> > +               }
> >         }
> >         return 1;
> >  }
--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [dm-devel] [PATCH 4/5] multipathd: reload map if the path groups are out of order
  2023-05-31 16:27   ` Martin Wilck
@ 2023-06-05 19:08     ` Benjamin Marzinski
  2023-06-06  4:42       ` Benjamin Marzinski
  0 siblings, 1 reply; 22+ messages in thread
From: Benjamin Marzinski @ 2023-06-05 19:08 UTC (permalink / raw)
  To: Martin Wilck; +Cc: dm-devel

On Wed, May 31, 2023 at 04:27:30PM +0000, Martin Wilck wrote:
> On Wed, 2023-05-24 at 18:21 -0500, Benjamin Marzinski wrote:
> > need_switch_pathgroup() only checks if the currently used pathgroup
> > is
> > not the highest priority pathgroup. If it isn't, all multipathd does
> > is
> > instruct the kernel to switch to the correct pathgroup.  However, the
> > kernel treats the pathgroups as if they were ordered by priority.
> > When
> > the kernel runs out of paths to use in the currently selected
> > pathgroup,
> > it will start checking the pathgroups in order until it finds one
> > with
> > usable paths.
> > 
> > need_switch_pathgroup() should also check if the pathgroups are out
> > of
> > order, and if so, multipathd should reload the map to reorder them
> > correctly.
> > 
> > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > ---
> >  libmultipath/libmultipath.version |  5 ++++
> >  libmultipath/switchgroup.c        | 27 ++++++++++++++++++++++
> >  libmultipath/switchgroup.h        |  1 +
> >  multipathd/main.c                 | 38 +++++++++++++++++++++--------
> > --
> >  4 files changed, 59 insertions(+), 12 deletions(-)
> > 
> > diff --git a/libmultipath/libmultipath.version
> > b/libmultipath/libmultipath.version
> > index 8f72c452..38074699 100644
> > --- a/libmultipath/libmultipath.version
> > +++ b/libmultipath/libmultipath.version
> > @@ -237,3 +237,8 @@ global:
> >  local:
> >         *;
> >  };
> > +
> > +LIBMULTIPATH_19.1.0 {
> > +global:
> > +       path_groups_in_order;
> > +} LIBMULTIPATH_19.0.0;
> > diff --git a/libmultipath/switchgroup.c b/libmultipath/switchgroup.c
> > index b1e1f39b..b1180839 100644
> > --- a/libmultipath/switchgroup.c
> > +++ b/libmultipath/switchgroup.c
> > @@ -7,6 +7,33 @@
> >  #include "structs.h"
> >  #include "switchgroup.h"
> >  
> > +bool path_groups_in_order(struct multipath *mpp)
> > +{
> > +       int i;
> > +       struct pathgroup *pgp;
> > +       bool seen_marginal_pg = false;
> > +       int last_prio = INT_MAX;
> > +
> > +       if (VECTOR_SIZE(mpp->pg) < 2)
> > +               return true;
> > +
> > +       vector_foreach_slot(mpp->pg, pgp, i) {
> > +               /* skip pgs with PRIO_UNDEF, since this is likely
> > temporary */
> > +               if (!pgp->paths || pgp->priority == PRIO_UNDEF)
> > +                       continue;
> > +               if (pgp->marginal && !seen_marginal_pg) {
> > +                       last_prio = INT_MAX;
> > +                       continue;
> > +               }
> > +               if (seen_marginal_pg && !pgp->marginal)
> > +                       return false;
> > +               if (pgp->priority > last_prio)
> > +                       return false;
> > +               last_prio = pgp->priority;
> > +       }
> > +       return true;
> > +}
> > +
> >  void path_group_prio_update(struct pathgroup *pgp)
> >  {
> >         int i;
> > diff --git a/libmultipath/switchgroup.h b/libmultipath/switchgroup.h
> > index 9365e2e2..43dbb6c9 100644
> > --- a/libmultipath/switchgroup.h
> > +++ b/libmultipath/switchgroup.h
> > @@ -1,2 +1,3 @@
> >  void path_group_prio_update (struct pathgroup * pgp);
> >  int select_path_group (struct multipath * mpp);
> > +bool path_groups_in_order(struct multipath *mpp);
> > diff --git a/multipathd/main.c b/multipathd/main.c
> > index e7c272ad..2ea7c76b 100644
> > --- a/multipathd/main.c
> > +++ b/multipathd/main.c
> > @@ -396,7 +396,7 @@ void put_multipath_config(__attribute__((unused))
> > void *arg)
> >  }
> >  
> >  static int
> > -need_switch_pathgroup (struct multipath * mpp, int refresh)
> > +need_switch_pathgroup (struct multipath * mpp, int refresh, bool
> > *need_reload)
> >  {
> >         struct pathgroup * pgp;
> >         struct path * pp;
> > @@ -404,6 +404,7 @@ need_switch_pathgroup (struct multipath * mpp,
> > int refresh)
> >         struct config *conf;
> >         int bestpg;
> >  
> > +       *need_reload = false;
> >         if (!mpp)
> >                 return 0;
> >  
> > @@ -430,10 +431,9 @@ need_switch_pathgroup (struct multipath * mpp,
> > int refresh)
> >                 return 0;
> >  
> >         mpp->bestpg = bestpg;
> > -       if (mpp->bestpg != mpp->nextpg)
> > -               return 1;
> > +       *need_reload = !path_groups_in_order(mpp);
> 
> This will start another loop over the path groups. Can we just
> integrate the path_groups_in_order() logic into the loop right here?
> 

Sure
-Ben

> 
> 
> >  
> > -       return 0;
> > +       return (*need_reload || mpp->bestpg != mpp->nextpg);
> >  }
> >  
> >  static void
> > @@ -1982,20 +1982,26 @@ ghost_delay_tick(struct vectors *vecs)
> >  }
> >  
> >  static void
> > -deferred_failback_tick (vector mpvec)
> > +deferred_failback_tick (struct vectors *vecs)
> >  {
> >         struct multipath * mpp;
> >         unsigned int i;
> > +       bool need_reload;
> >  
> > -       vector_foreach_slot (mpvec, mpp, i) {
> > +       vector_foreach_slot (vecs->mpvec, mpp, i) {
> >                 /*
> >                  * deferred failback getting sooner
> >                  */
> >                 if (mpp->pgfailback > 0 && mpp->failback_tick > 0) {
> >                         mpp->failback_tick--;
> >  
> > -                       if (!mpp->failback_tick &&
> > need_switch_pathgroup(mpp, 1))
> > -                               switch_pathgroup(mpp);
> > +                       if (!mpp->failback_tick &&
> > +                           need_switch_pathgroup(mpp, 1,
> > &need_reload)) {
> > +                               if (need_reload)
> > +                                       reload_and_sync_map(mpp,
> > vecs, 0);
> > +                               else
> > +                                       switch_pathgroup(mpp);
> > +                       }
> >                 }
> >         }
> >  }
> > @@ -2579,6 +2585,7 @@ check_path (struct vectors * vecs, struct path
> > * pp, unsigned int ticks)
> >                 int prio_changed = update_prio(pp, new_path_up);
> >                 bool need_refresh = (!new_path_up && prio_changed &&
> >                                      pp->priority != PRIO_UNDEF);
> > +               bool need_reload;
> >  
> >                 if (prio_changed &&
> >                     pp->mpp->pgpolicyfn == (pgpolicyfn
> > *)group_by_prio &&
> > @@ -2586,15 +2593,22 @@ check_path (struct vectors * vecs, struct
> > path * pp, unsigned int ticks)
> >                         condlog(2, "%s: path priorities changed.
> > reloading",
> >                                 pp->mpp->alias);
> >                         reload_and_sync_map(pp->mpp, vecs,
> > !new_path_up);
> > -               } else if (need_switch_pathgroup(pp->mpp,
> > need_refresh)) {
> > +               } else if (need_switch_pathgroup(pp->mpp,
> > need_refresh,
> > +                                                &need_reload)) {
> >                         if (pp->mpp->pgfailback > 0 &&
> >                             (new_path_up || pp->mpp->failback_tick <=
> > 0))
> >                                 pp->mpp->failback_tick =
> >                                         pp->mpp->pgfailback + 1;
> >                         else if (pp->mpp->pgfailback == -
> > FAILBACK_IMMEDIATE ||
> >                                  (chkr_new_path_up &&
> > -                                 followover_should_failback(pp)))
> > -                               switch_pathgroup(pp->mpp);
> > +                                 followover_should_failback(pp))) {
> > +                               if (need_reload)
> > +                                       reload_and_sync_map(pp->mpp,
> > vecs,
> > +                                                          
> > !need_refresh &&
> > +                                                          
> > !new_path_up);
> > +                               else
> > +                                       switch_pathgroup(pp->mpp);
> > +                       }
> >                 }
> >         }
> >         return 1;
> > @@ -2720,7 +2734,7 @@ unlock:
> >                 pthread_cleanup_push(cleanup_lock, &vecs->lock);
> >                 lock(&vecs->lock);
> >                 pthread_testcancel();
> > -               deferred_failback_tick(vecs->mpvec);
> > +               deferred_failback_tick(vecs);
> >                 retry_count_tick(vecs->mpvec);
> >                 missing_uev_wait_tick(vecs);
> >                 ghost_delay_tick(vecs);
> 
> 
--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [dm-devel] [PATCH 4/5] multipathd: reload map if the path groups are out of order
  2023-06-05 19:08     ` Benjamin Marzinski
@ 2023-06-06  4:42       ` Benjamin Marzinski
  2023-06-06 14:55         ` Martin Wilck
  0 siblings, 1 reply; 22+ messages in thread
From: Benjamin Marzinski @ 2023-06-06  4:42 UTC (permalink / raw)
  To: Martin Wilck; +Cc: dm-devel

On Mon, Jun 05, 2023 at 02:08:07PM -0500, Benjamin Marzinski wrote:
> On Wed, May 31, 2023 at 04:27:30PM +0000, Martin Wilck wrote:
> > On Wed, 2023-05-24 at 18:21 -0500, Benjamin Marzinski wrote:
> > > need_switch_pathgroup() only checks if the currently used pathgroup
> > > is
> > > not the highest priority pathgroup. If it isn't, all multipathd does
> > > is
> > > instruct the kernel to switch to the correct pathgroup.  However, the
> > > kernel treats the pathgroups as if they were ordered by priority.
> > > When
> > > the kernel runs out of paths to use in the currently selected
> > > pathgroup,
> > > it will start checking the pathgroups in order until it finds one
> > > with
> > > usable paths.
> > > 
> > > need_switch_pathgroup() should also check if the pathgroups are out
> > > of
> > > order, and if so, multipathd should reload the map to reorder them
> > > correctly.
> > > 
> > > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > > ---
> > >  libmultipath/libmultipath.version |  5 ++++
> > >  libmultipath/switchgroup.c        | 27 ++++++++++++++++++++++
> > >  libmultipath/switchgroup.h        |  1 +
> > >  multipathd/main.c                 | 38 +++++++++++++++++++++--------
> > > --
> > >  4 files changed, 59 insertions(+), 12 deletions(-)
> > > 
> > > diff --git a/libmultipath/libmultipath.version
> > > b/libmultipath/libmultipath.version
> > > index 8f72c452..38074699 100644
> > > --- a/libmultipath/libmultipath.version
> > > +++ b/libmultipath/libmultipath.version
> > > @@ -237,3 +237,8 @@ global:
> > >  local:
> > >         *;
> > >  };
> > > +
> > > +LIBMULTIPATH_19.1.0 {
> > > +global:
> > > +       path_groups_in_order;
> > > +} LIBMULTIPATH_19.0.0;
> > > diff --git a/libmultipath/switchgroup.c b/libmultipath/switchgroup.c
> > > index b1e1f39b..b1180839 100644
> > > --- a/libmultipath/switchgroup.c
> > > +++ b/libmultipath/switchgroup.c
> > > @@ -7,6 +7,33 @@
> > >  #include "structs.h"
> > >  #include "switchgroup.h"
> > >  
> > > +bool path_groups_in_order(struct multipath *mpp)
> > > +{
> > > +       int i;
> > > +       struct pathgroup *pgp;
> > > +       bool seen_marginal_pg = false;
> > > +       int last_prio = INT_MAX;
> > > +
> > > +       if (VECTOR_SIZE(mpp->pg) < 2)
> > > +               return true;
> > > +
> > > +       vector_foreach_slot(mpp->pg, pgp, i) {
> > > +               /* skip pgs with PRIO_UNDEF, since this is likely
> > > temporary */
> > > +               if (!pgp->paths || pgp->priority == PRIO_UNDEF)
> > > +                       continue;
> > > +               if (pgp->marginal && !seen_marginal_pg) {
> > > +                       last_prio = INT_MAX;
> > > +                       continue;
> > > +               }
> > > +               if (seen_marginal_pg && !pgp->marginal)
> > > +                       return false;
> > > +               if (pgp->priority > last_prio)
> > > +                       return false;
> > > +               last_prio = pgp->priority;
> > > +       }
> > > +       return true;
> > > +}
> > > +
> > >  void path_group_prio_update(struct pathgroup *pgp)
> > >  {
> > >         int i;
> > > diff --git a/libmultipath/switchgroup.h b/libmultipath/switchgroup.h
> > > index 9365e2e2..43dbb6c9 100644
> > > --- a/libmultipath/switchgroup.h
> > > +++ b/libmultipath/switchgroup.h
> > > @@ -1,2 +1,3 @@
> > >  void path_group_prio_update (struct pathgroup * pgp);
> > >  int select_path_group (struct multipath * mpp);
> > > +bool path_groups_in_order(struct multipath *mpp);
> > > diff --git a/multipathd/main.c b/multipathd/main.c
> > > index e7c272ad..2ea7c76b 100644
> > > --- a/multipathd/main.c
> > > +++ b/multipathd/main.c
> > > @@ -396,7 +396,7 @@ void put_multipath_config(__attribute__((unused))
> > > void *arg)
> > >  }
> > >  
> > >  static int
> > > -need_switch_pathgroup (struct multipath * mpp, int refresh)
> > > +need_switch_pathgroup (struct multipath * mpp, int refresh, bool
> > > *need_reload)
> > >  {
> > >         struct pathgroup * pgp;
> > >         struct path * pp;
> > > @@ -404,6 +404,7 @@ need_switch_pathgroup (struct multipath * mpp,
> > > int refresh)
> > >         struct config *conf;
> > >         int bestpg;
> > >  
> > > +       *need_reload = false;
> > >         if (!mpp)
> > >                 return 0;
> > >  
> > > @@ -430,10 +431,9 @@ need_switch_pathgroup (struct multipath * mpp,
> > > int refresh)
> > >                 return 0;
> > >  
> > >         mpp->bestpg = bestpg;
> > > -       if (mpp->bestpg != mpp->nextpg)
> > > -               return 1;
> > > +       *need_reload = !path_groups_in_order(mpp);
> > 
> > This will start another loop over the path groups. Can we just
> > integrate the path_groups_in_order() logic into the loop right here?
> > 
> 
> Sure

Actually, after looking into this more, pushing those two functions
together makes the logic more confusing. Plus select_path_group() is
used by multiple other functions that don't need to check if the path
groups are out of order.

There's no reason for path_groups_in_order to be in libmultipath, since
it's only needed by multipathd. I'll fix that. But I would rather not
join it with select_path_group(). Since we just loop over the pathgroups
and not the paths within them, we will likely go through the loop a
couple of times, and we don't actually perform any costly actions during
the loops that would make combining them look more attractive. The
performance gains for need_switch_pathgroup() aren't worth making the
logic harder to follow (and the minor performance hits when we don't
need to check the order), IMHO. 

-Ben

> -Ben
> 
> > 
> > 
> > >  
> > > -       return 0;
> > > +       return (*need_reload || mpp->bestpg != mpp->nextpg);
> > >  }
> > >  
> > >  static void
> > > @@ -1982,20 +1982,26 @@ ghost_delay_tick(struct vectors *vecs)
> > >  }
> > >  
> > >  static void
> > > -deferred_failback_tick (vector mpvec)
> > > +deferred_failback_tick (struct vectors *vecs)
> > >  {
> > >         struct multipath * mpp;
> > >         unsigned int i;
> > > +       bool need_reload;
> > >  
> > > -       vector_foreach_slot (mpvec, mpp, i) {
> > > +       vector_foreach_slot (vecs->mpvec, mpp, i) {
> > >                 /*
> > >                  * deferred failback getting sooner
> > >                  */
> > >                 if (mpp->pgfailback > 0 && mpp->failback_tick > 0) {
> > >                         mpp->failback_tick--;
> > >  
> > > -                       if (!mpp->failback_tick &&
> > > need_switch_pathgroup(mpp, 1))
> > > -                               switch_pathgroup(mpp);
> > > +                       if (!mpp->failback_tick &&
> > > +                           need_switch_pathgroup(mpp, 1,
> > > &need_reload)) {
> > > +                               if (need_reload)
> > > +                                       reload_and_sync_map(mpp,
> > > vecs, 0);
> > > +                               else
> > > +                                       switch_pathgroup(mpp);
> > > +                       }
> > >                 }
> > >         }
> > >  }
> > > @@ -2579,6 +2585,7 @@ check_path (struct vectors * vecs, struct path
> > > * pp, unsigned int ticks)
> > >                 int prio_changed = update_prio(pp, new_path_up);
> > >                 bool need_refresh = (!new_path_up && prio_changed &&
> > >                                      pp->priority != PRIO_UNDEF);
> > > +               bool need_reload;
> > >  
> > >                 if (prio_changed &&
> > >                     pp->mpp->pgpolicyfn == (pgpolicyfn
> > > *)group_by_prio &&
> > > @@ -2586,15 +2593,22 @@ check_path (struct vectors * vecs, struct
> > > path * pp, unsigned int ticks)
> > >                         condlog(2, "%s: path priorities changed.
> > > reloading",
> > >                                 pp->mpp->alias);
> > >                         reload_and_sync_map(pp->mpp, vecs,
> > > !new_path_up);
> > > -               } else if (need_switch_pathgroup(pp->mpp,
> > > need_refresh)) {
> > > +               } else if (need_switch_pathgroup(pp->mpp,
> > > need_refresh,
> > > +                                                &need_reload)) {
> > >                         if (pp->mpp->pgfailback > 0 &&
> > >                             (new_path_up || pp->mpp->failback_tick <=
> > > 0))
> > >                                 pp->mpp->failback_tick =
> > >                                         pp->mpp->pgfailback + 1;
> > >                         else if (pp->mpp->pgfailback == -
> > > FAILBACK_IMMEDIATE ||
> > >                                  (chkr_new_path_up &&
> > > -                                 followover_should_failback(pp)))
> > > -                               switch_pathgroup(pp->mpp);
> > > +                                 followover_should_failback(pp))) {
> > > +                               if (need_reload)
> > > +                                       reload_and_sync_map(pp->mpp,
> > > vecs,
> > > +                                                          
> > > !need_refresh &&
> > > +                                                          
> > > !new_path_up);
> > > +                               else
> > > +                                       switch_pathgroup(pp->mpp);
> > > +                       }
> > >                 }
> > >         }
> > >         return 1;
> > > @@ -2720,7 +2734,7 @@ unlock:
> > >                 pthread_cleanup_push(cleanup_lock, &vecs->lock);
> > >                 lock(&vecs->lock);
> > >                 pthread_testcancel();
> > > -               deferred_failback_tick(vecs->mpvec);
> > > +               deferred_failback_tick(vecs);
> > >                 retry_count_tick(vecs->mpvec);
> > >                 missing_uev_wait_tick(vecs);
> > >                 ghost_delay_tick(vecs);
> > 
> > 
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://listman.redhat.com/mailman/listinfo/dm-devel
--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [dm-devel] [PATCH 3/5] multipathd: refresh all priorities if one has changed
  2023-06-05 18:22     ` Benjamin Marzinski
@ 2023-06-06 14:08       ` Martin Wilck
  2023-06-06 15:33         ` Benjamin Marzinski
  0 siblings, 1 reply; 22+ messages in thread
From: Martin Wilck @ 2023-06-06 14:08 UTC (permalink / raw)
  To: bmarzins; +Cc: dm-devel

On Mon, 2023-06-05 at 13:22 -0500, Benjamin Marzinski wrote:
> On Wed, May 31, 2023 at 04:27:25PM +0000, Martin Wilck wrote:
> > On Wed, 2023-05-24 at 18:21 -0500, Benjamin Marzinski wrote:
> > > For multipath devices with path group policies other than
> > > group_by_prio,
> > > multipathd wasn't updating all the paths' priorities when calling
> > > need_switch_pathgroup(), even in cases where it likely was
> > > necessary.
> > > When a path just becomes usable again, all paths' priorities get
> > > updated
> > > by update_prio().  But if the priority changes on a path that is
> > > already
> > > up, the other paths' priorities only get updated if the multipath
> > > device
> > > uses the group_by_prio path_grouping_policy, otherwise
> > > need_switch_pathgroup() is called with refresh set to 0. But if
> > > the
> > > priority of the checked path has changed, then likely so have the
> > > priorities of other paths. Since the pathgroup's priority is the
> > > average
> > > of its paths' priorities, changing the priority of just one path
> > > may
> > > not
> > > change the average enough to reorder the pathgroups.
> > > 
> > > Instead, set refresh in need_switch_pathgroup() if the priorty
> > > has
> > > changed to something other than PRIO_UNDEF (which usually means
> > > an
> > > error
> > > has occured) and the priorities of the other paths haven't
> > > already
> > > been
> > > updated by update_prio().
> > > 
> > > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > > ---
> > >  multipathd/main.c | 35 +++++++++++++++++++++--------------
> > >  1 file changed, 21 insertions(+), 14 deletions(-)
> > > 
> > > diff --git a/multipathd/main.c b/multipathd/main.c
> > > index bdeffe76..e7c272ad 100644
> > > --- a/multipathd/main.c
> > > +++ b/multipathd/main.c
> > > @@ -2575,20 +2575,27 @@ check_path (struct vectors * vecs, struct
> > > path * pp, unsigned int ticks)
> > >  
> > >         if (marginal_changed)
> > >                 reload_and_sync_map(pp->mpp, vecs, 1);
> > > -       else if (update_prio(pp, new_path_up) &&
> > > -           (pp->mpp->pgpolicyfn == (pgpolicyfn *)group_by_prio)
> > > &&
> > > -            pp->mpp->pgfailback == -FAILBACK_IMMEDIATE) {
> > > -               condlog(2, "%s: path priorities changed.
> > > reloading",
> > > -                       pp->mpp->alias);
> > > -               reload_and_sync_map(pp->mpp, vecs, !new_path_up);
> > > -       } else if (need_switch_pathgroup(pp->mpp, 0)) {
> > > -               if (pp->mpp->pgfailback > 0 &&
> > > -                   (new_path_up || pp->mpp->failback_tick <= 0))
> > > -                       pp->mpp->failback_tick =
> > > -                               pp->mpp->pgfailback + 1;
> > > -               else if (pp->mpp->pgfailback == -
> > > FAILBACK_IMMEDIATE
> > > > > 
> > > -                        (chkr_new_path_up &&
> > > followover_should_failback(pp)))
> > > -                       switch_pathgroup(pp->mpp);
> > > +       else {
> > > +               int prio_changed = update_prio(pp, new_path_up);
> > > +               bool need_refresh = (!new_path_up && prio_changed
> > > &&
> > > +                                    pp->priority != PRIO_UNDEF);
> > > +
> > 
> > I have always found it confusing that we recalculate the priorities
> > in
> > two functions (update_prio() and need_switch_pathgroup()), passing
> > boolean flags back and forth. IMO we should move this logic to
> > update_prio(), so that we don't need to refresh any priorities in
> > need_switch_pathgroup() any more. after determining the prio of the
> > "primary" path device, update_prio() has all the information
> > it needs to figure out whether priorities of other paths must be
> > refreshed.
> > 
> > That would even make the code easier to understand, IMO.
> > 
> > Regards
> > Martin
> 
> So the difference in this code between when we choose to update all
> the
> paths' prios for the group_by_prio case, and when we choose to update
> all the paths' prios for the other pgpolicies comes down to how we
> treat
> PRIO_UNDEF. I didn't change the group_by_prio behavior.

My comment may have caused confusion, sorry. I just wanted to point out
that we could make the logic clearer by moving it into update_prio(),
on top of what you did, as in "while we're at it". 

>  So right now,
> for group_by_prio devices, we will update all the paths' priorities
> if
> the checked path switches priorities to PRIO_UNDEF. My question is,
> "Is
> this the right thing to do?"
> 
> In the best case, if the prioritizer fails on one path, it will fail
> on
> all the other paths in the pathgroup as well, so that they stay
> together. In the worst case it will fail on paths in other
> pathgroups,
> so that incorrect paths get grouped together. Granted, I'm not sure
> how
> much of a difference it makes in the worst case, since the other
> priorities would get checked eventually, and would get placed in the
> wrong group then.

No matter what we do, it's always just the state at some point in time.
If we update all priorities, we are as close to the "real" state of the
hardware as possible, at this given instant. We don't know what's going
to happen next. Paths could quickly recover and provide useful prio
values, but they might as well not. Or their prio could change, and the
value we just obtained would be obsolete. It makes no sense to reason
about the "future".

> Perhaps it would be better to treat PRIO_UNDEF like PATH_PENDING,
> where
> we will continue to use the old priority if we get a PRIO_UNDEF
> result.

Let's have a look where PRIO_UNDEF occurs. Unless I'm overlooking
something, get_prio() returns PRIO_UNDEF if no prio algorithm is
selected, or if the prio algo returns an error *and* the path state as
returned by path_offline() is neither DOWN nor PENDING. From
path_offline(), this means the state must be either PATH_REMOVED (no
point in trying a assign a prio, UNDEF is ok) or PATH_UP, i.e.
"running". The last case is strange. It can mean a very short-lived
failure, in which case we could consider retrying prio_getprio() from
get_prio(), or a general problem with the prio algorithm for the path,
in which case UNDEF would again be ok (but then, how did the same
prioritizer assign a valid priority previously?).

I think that none of these cases really justifies treating UNDEF like
PENDING, except _maybe_ the "running" case. If that's agreed, we should
probably just change the way this case is handled in get_prio().

> The choices are:
> 1. make both the group_by_prio and the non-group_by_prio cases
> recheck
>    all paths on PRIO_UNDEF (this keeps the group_by_prio behavior the
>    same).
> 2. make both cases NOT recheck all paths on PRIO_UNDEF.
> 3. keep the destinction between the two (make update_prio() check the
>    pgplicy, and act accordingly)
> 4. Make paths keep their previous priority when they would have
> returned
>    PRIO_UNDEF, so we never switch to PRIO_UNDEF.
> 
> All the choices except 3 seem reasonable. 1 keeps things how they are
> for group_by_prio. 2 leans towards moving PRIO_UNDEF paths out of
> their
> current pathgroup.  4 leans towards keeping PRIO_UNDEF paths in their
> current pathgroup.

I agree that 3) makes no sense. I argued above that I don't think 4)
does, either. Wrt 1) vs. 2), we should realize that the checker loop
will eventually run over all paths anyway.  With 1) we are as close as
possible to the kernel state at any instant, but we may recalculate
priorities (and possibly regroup) repeatedly during a single checker
loop iteration, which is suboptimal [1]. With 2), we might never have
the map in a correct state, not even after the checker loop has
finished.

I think we should go with 1), and consider a later change where we just
set a marker in the checker loop, and do prio updates / path regrouping
once per map after the checker loop has finished. This requires more
changes for the checker loop though, and is out of scope for your
current patch set.

I wouldn't worry too much about group_by_prio. Regrouping is by design
with this grouping policy, and it's expected that this results in
incorrect grouping, at least temporarily. Where this is undesirable, 
your new group_by_tpg will come to the rescue.

> The other question is, what do we do for the delayed case. Right now,
> once we finish waiting for our delay in deferred_failback_tick(), we
> automatically refresh the priorities of all the devices in our
> need_switch_pathgroup() call.  We could add an update_prio() call
> before
> it to keep this behavior, but if we are already refreshing all the
> paths' priorities when we need to, I'm not sure that it's necessary
> to
> do it again here.

Well if we calculated priorities in update_prio() only as I suggested
in my previous post, we'd call update_prio() in this code path and
change the way need_switch_pathgroup() works. But I admit I haven't
thought this through, and it can be done in a separate set, anyway.

Regards
Martin

[1] it seems dumb to reason about "performance" here, but we both know
that execution time in the checker loop can become critical if there
are lots of path devices.

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [dm-devel] [PATCH 4/5] multipathd: reload map if the path groups are out of order
  2023-06-06  4:42       ` Benjamin Marzinski
@ 2023-06-06 14:55         ` Martin Wilck
  2023-06-06 15:54           ` Benjamin Marzinski
  0 siblings, 1 reply; 22+ messages in thread
From: Martin Wilck @ 2023-06-06 14:55 UTC (permalink / raw)
  To: bmarzins; +Cc: dm-devel

On Mon, 2023-06-05 at 23:42 -0500, Benjamin Marzinski wrote:
> On Mon, Jun 05, 2023 at 02:08:07PM -0500, Benjamin Marzinski wrote:
> > On Wed, May 31, 2023 at 04:27:30PM +0000, Martin Wilck wrote:
> > > On Wed, 2023-05-24 at 18:21 -0500, Benjamin Marzinski wrote:
> > > > need_switch_pathgroup() only checks if the currently used
> > > > pathgroup
> > > > is
> > > > not the highest priority pathgroup. If it isn't, all multipathd
> > > > does
> > > > is
> > > > instruct the kernel to switch to the correct pathgroup. 
> > > > However, the
> > > > kernel treats the pathgroups as if they were ordered by
> > > > priority.
> > > > When
> > > > the kernel runs out of paths to use in the currently selected
> > > > pathgroup,
> > > > it will start checking the pathgroups in order until it finds
> > > > one
> > > > with
> > > > usable paths.
> > > > 
> > > > need_switch_pathgroup() should also check if the pathgroups are
> > > > out
> > > > of
> > > > order, and if so, multipathd should reload the map to reorder
> > > > them
> > > > correctly.
> > > > 
> > > > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > > > ---
> > > >  libmultipath/libmultipath.version |  5 ++++
> > > >  libmultipath/switchgroup.c        | 27 ++++++++++++++++++++++
> > > >  libmultipath/switchgroup.h        |  1 +
> > > >  multipathd/main.c                 | 38 +++++++++++++++++++++--
> > > > ------
> > > > --
> > > >  4 files changed, 59 insertions(+), 12 deletions(-)
> > > > 
> > > > diff --git a/libmultipath/libmultipath.version
> > > > b/libmultipath/libmultipath.version
> > > > index 8f72c452..38074699 100644
> > > > --- a/libmultipath/libmultipath.version
> > > > +++ b/libmultipath/libmultipath.version
> > > > @@ -237,3 +237,8 @@ global:
> > > >  local:
> > > >         *;
> > > >  };
> > > > +
> > > > +LIBMULTIPATH_19.1.0 {
> > > > +global:
> > > > +       path_groups_in_order;
> > > > +} LIBMULTIPATH_19.0.0;
> > > > diff --git a/libmultipath/switchgroup.c
> > > > b/libmultipath/switchgroup.c
> > > > index b1e1f39b..b1180839 100644
> > > > --- a/libmultipath/switchgroup.c
> > > > +++ b/libmultipath/switchgroup.c
> > > > @@ -7,6 +7,33 @@
> > > >  #include "structs.h"
> > > >  #include "switchgroup.h"
> > > >  
> > > > +bool path_groups_in_order(struct multipath *mpp)
> > > > +{
> > > > +       int i;
> > > > +       struct pathgroup *pgp;
> > > > +       bool seen_marginal_pg = false;
> > > > +       int last_prio = INT_MAX;
> > > > +
> > > > +       if (VECTOR_SIZE(mpp->pg) < 2)
> > > > +               return true;
> > > > +
> > > > +       vector_foreach_slot(mpp->pg, pgp, i) {
> > > > +               /* skip pgs with PRIO_UNDEF, since this is
> > > > likely
> > > > temporary */
> > > > +               if (!pgp->paths || pgp->priority == PRIO_UNDEF)
> > > > +                       continue;
> > > > +               if (pgp->marginal && !seen_marginal_pg) {
> > > > +                       last_prio = INT_MAX;
> > > > +                       continue;
> > > > +               }
> > > > +               if (seen_marginal_pg && !pgp->marginal)
> > > > +                       return false;
> > > > +               if (pgp->priority > last_prio)
> > > > +                       return false;
> > > > +               last_prio = pgp->priority;
> > > > +       }
> > > > +       return true;
> > > > +}
> > > > +
> > > >  void path_group_prio_update(struct pathgroup *pgp)
> > > >  {
> > > >         int i;
> > > > diff --git a/libmultipath/switchgroup.h
> > > > b/libmultipath/switchgroup.h
> > > > index 9365e2e2..43dbb6c9 100644
> > > > --- a/libmultipath/switchgroup.h
> > > > +++ b/libmultipath/switchgroup.h
> > > > @@ -1,2 +1,3 @@
> > > >  void path_group_prio_update (struct pathgroup * pgp);
> > > >  int select_path_group (struct multipath * mpp);
> > > > +bool path_groups_in_order(struct multipath *mpp);
> > > > diff --git a/multipathd/main.c b/multipathd/main.c
> > > > index e7c272ad..2ea7c76b 100644
> > > > --- a/multipathd/main.c
> > > > +++ b/multipathd/main.c
> > > > @@ -396,7 +396,7 @@ void
> > > > put_multipath_config(__attribute__((unused))
> > > > void *arg)
> > > >  }
> > > >  
> > > >  static int
> > > > -need_switch_pathgroup (struct multipath * mpp, int refresh)
> > > > +need_switch_pathgroup (struct multipath * mpp, int refresh,
> > > > bool
> > > > *need_reload)
> > > >  {
> > > >         struct pathgroup * pgp;
> > > >         struct path * pp;
> > > > @@ -404,6 +404,7 @@ need_switch_pathgroup (struct multipath *
> > > > mpp,
> > > > int refresh)
> > > >         struct config *conf;
> > > >         int bestpg;
> > > >  
> > > > +       *need_reload = false;
> > > >         if (!mpp)
> > > >                 return 0;
> > > >  
> > > > @@ -430,10 +431,9 @@ need_switch_pathgroup (struct multipath *
> > > > mpp,
> > > > int refresh)
> > > >                 return 0;
> > > >  
> > > >         mpp->bestpg = bestpg;
> > > > -       if (mpp->bestpg != mpp->nextpg)
> > > > -               return 1;
> > > > +       *need_reload = !path_groups_in_order(mpp);
> > > 
> > > This will start another loop over the path groups. Can we just
> > > integrate the path_groups_in_order() logic into the loop right
> > > here?
> > > 
> > 
> > Sure
> 
> Actually, after looking into this more, pushing those two functions
> together makes the logic more confusing. Plus select_path_group() is
> used by multiple other functions that don't need to check if the path
> groups are out of order.

Hm. Can it happen at all that select_path_group() returns something
other than 1 but path_groups_in_order() returns true? 

If we follow the mindset you layed out in your patch ("the kernel
treats the pathgroups as if they were ordered by priority")
consequently, just determining bestpg is not enough; we'd need to sort
the PGs every time (except for a user-triggered switchgroup command).
IMO whenever we reload the map anyway (e.g. in setup_map()) we should
make sure that the PGs are properly sorted. Using "switch_group"
instead of a full reload is just an optimization because the kernel
operation is more light-weight than a full reload. But as soon as we
have e.g. a marginal path group, reordering is probably a better idea
most of the time.

I agree that that would be another patch set, but I think that
determining the best path group and checking whether the groups are
correctly ordered are very closely related tasks.

But it's not a religious matter to me, so proceed with what you
consider best at this time.

> There's no reason for path_groups_in_order to be in libmultipath,
> since
> it's only needed by multipathd. I'll fix that. But I would rather not
> join it with select_path_group(). Since we just loop over the
> pathgroups
> and not the paths within them, we will likely go through the loop a
> couple of times, and we don't actually perform any costly actions
> during
> the loops that would make combining them look more attractive. The
> performance gains for need_switch_pathgroup() aren't worth making the
> logic harder to follow (and the minor performance hits when we don't
> need to check the order), IMHO. 
> 

Regards,
Martin

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [dm-devel] [PATCH 3/5] multipathd: refresh all priorities if one has changed
  2023-06-06 14:08       ` Martin Wilck
@ 2023-06-06 15:33         ` Benjamin Marzinski
  0 siblings, 0 replies; 22+ messages in thread
From: Benjamin Marzinski @ 2023-06-06 15:33 UTC (permalink / raw)
  To: Martin Wilck; +Cc: dm-devel

On Tue, Jun 06, 2023 at 02:08:04PM +0000, Martin Wilck wrote:
> On Mon, 2023-06-05 at 13:22 -0500, Benjamin Marzinski wrote:
> > On Wed, May 31, 2023 at 04:27:25PM +0000, Martin Wilck wrote:
> > > On Wed, 2023-05-24 at 18:21 -0500, Benjamin Marzinski wrote:
> > > > For multipath devices with path group policies other than
> > > > group_by_prio,
> > > > multipathd wasn't updating all the paths' priorities when calling
> > > > need_switch_pathgroup(), even in cases where it likely was
> > > > necessary.
> > > > When a path just becomes usable again, all paths' priorities get
> > > > updated
> > > > by update_prio().  But if the priority changes on a path that is
> > > > already
> > > > up, the other paths' priorities only get updated if the multipath
> > > > device
> > > > uses the group_by_prio path_grouping_policy, otherwise
> > > > need_switch_pathgroup() is called with refresh set to 0. But if
> > > > the
> > > > priority of the checked path has changed, then likely so have the
> > > > priorities of other paths. Since the pathgroup's priority is the
> > > > average
> > > > of its paths' priorities, changing the priority of just one path
> > > > may
> > > > not
> > > > change the average enough to reorder the pathgroups.
> > > > 
> > > > Instead, set refresh in need_switch_pathgroup() if the priorty
> > > > has
> > > > changed to something other than PRIO_UNDEF (which usually means
> > > > an
> > > > error
> > > > has occured) and the priorities of the other paths haven't
> > > > already
> > > > been
> > > > updated by update_prio().
> > > > 
> > > > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > > > ---
> > > >  multipathd/main.c | 35 +++++++++++++++++++++--------------
> > > >  1 file changed, 21 insertions(+), 14 deletions(-)
> > > > 
> > > > diff --git a/multipathd/main.c b/multipathd/main.c
> > > > index bdeffe76..e7c272ad 100644
> > > > --- a/multipathd/main.c
> > > > +++ b/multipathd/main.c
> > > > @@ -2575,20 +2575,27 @@ check_path (struct vectors * vecs, struct
> > > > path * pp, unsigned int ticks)
> > > >  
> > > >         if (marginal_changed)
> > > >                 reload_and_sync_map(pp->mpp, vecs, 1);
> > > > -       else if (update_prio(pp, new_path_up) &&
> > > > -           (pp->mpp->pgpolicyfn == (pgpolicyfn *)group_by_prio)
> > > > &&
> > > > -            pp->mpp->pgfailback == -FAILBACK_IMMEDIATE) {
> > > > -               condlog(2, "%s: path priorities changed.
> > > > reloading",
> > > > -                       pp->mpp->alias);
> > > > -               reload_and_sync_map(pp->mpp, vecs, !new_path_up);
> > > > -       } else if (need_switch_pathgroup(pp->mpp, 0)) {
> > > > -               if (pp->mpp->pgfailback > 0 &&
> > > > -                   (new_path_up || pp->mpp->failback_tick <= 0))
> > > > -                       pp->mpp->failback_tick =
> > > > -                               pp->mpp->pgfailback + 1;
> > > > -               else if (pp->mpp->pgfailback == -
> > > > FAILBACK_IMMEDIATE
> > > > > > 
> > > > -                        (chkr_new_path_up &&
> > > > followover_should_failback(pp)))
> > > > -                       switch_pathgroup(pp->mpp);
> > > > +       else {
> > > > +               int prio_changed = update_prio(pp, new_path_up);
> > > > +               bool need_refresh = (!new_path_up && prio_changed
> > > > &&
> > > > +                                    pp->priority != PRIO_UNDEF);
> > > > +
> > > 
> > > I have always found it confusing that we recalculate the priorities
> > > in
> > > two functions (update_prio() and need_switch_pathgroup()), passing
> > > boolean flags back and forth. IMO we should move this logic to
> > > update_prio(), so that we don't need to refresh any priorities in
> > > need_switch_pathgroup() any more. after determining the prio of the
> > > "primary" path device, update_prio() has all the information
> > > it needs to figure out whether priorities of other paths must be
> > > refreshed.
> > > 
> > > That would even make the code easier to understand, IMO.
> > > 
> > > Regards
> > > Martin
> > 
> > So the difference in this code between when we choose to update all
> > the
> > paths' prios for the group_by_prio case, and when we choose to update
> > all the paths' prios for the other pgpolicies comes down to how we
> > treat
> > PRIO_UNDEF. I didn't change the group_by_prio behavior.
> 
> My comment may have caused confusion, sorry. I just wanted to point out
> that we could make the logic clearer by moving it into update_prio(),
> on top of what you did, as in "while we're at it". 
> 
> >  So right now,
> > for group_by_prio devices, we will update all the paths' priorities
> > if
> > the checked path switches priorities to PRIO_UNDEF. My question is,
> > "Is
> > this the right thing to do?"
> > 
> > In the best case, if the prioritizer fails on one path, it will fail
> > on
> > all the other paths in the pathgroup as well, so that they stay
> > together. In the worst case it will fail on paths in other
> > pathgroups,
> > so that incorrect paths get grouped together. Granted, I'm not sure
> > how
> > much of a difference it makes in the worst case, since the other
> > priorities would get checked eventually, and would get placed in the
> > wrong group then.
> 
> No matter what we do, it's always just the state at some point in time.
> If we update all priorities, we are as close to the "real" state of the
> hardware as possible, at this given instant. We don't know what's going
> to happen next. Paths could quickly recover and provide useful prio
> values, but they might as well not. Or their prio could change, and the
> value we just obtained would be obsolete. It makes no sense to reason
> about the "future".
> 
> > Perhaps it would be better to treat PRIO_UNDEF like PATH_PENDING,
> > where
> > we will continue to use the old priority if we get a PRIO_UNDEF
> > result.
> 
> Let's have a look where PRIO_UNDEF occurs. Unless I'm overlooking
> something, get_prio() returns PRIO_UNDEF if no prio algorithm is
> selected, or if the prio algo returns an error *and* the path state as
> returned by path_offline() is neither DOWN nor PENDING. From
> path_offline(), this means the state must be either PATH_REMOVED (no
> point in trying a assign a prio, UNDEF is ok) or PATH_UP, i.e.
> "running". The last case is strange. It can mean a very short-lived
> failure, in which case we could consider retrying prio_getprio() from
> get_prio(), or a general problem with the prio algorithm for the path,
> in which case UNDEF would again be ok (but then, how did the same
> prioritizer assign a valid priority previously?).
> 
> I think that none of these cases really justifies treating UNDEF like
> PENDING, except _maybe_ the "running" case. If that's agreed, we should
> probably just change the way this case is handled in get_prio().

I actually think that the only case where it might NOT make sense to
treat UNDEF like PENDING is the "running" case. If the device no longer
exists, I would argue that keeping the old priority is the best option.
Why do the extra work associated with a path changing priority, when in
reality the path is gone? In all the other cases, I don't see how a path
ever gets out of PRIO_UNDEF to start with. So keeping the old priority
is the same as just returning PRIO_UNDEF in those cases. The only case
were the PRIO_UNDEF is really interesting is the "running" case. If we
think that this is a short lived failure, then doing a bunch of work and
moving the paths around might just be wasted effort. When the failure
resolves, the path might still have the priority it previously had. If
the path wasn't previously in PRIO_UNDEF, then we can be pretty sure
that the existing priority is a temporary error, but we don't know how
long it error will last, and whether the priority after the error will
be the same or different than it was before.

I'm also not sure that retrying the priority immediately is the right
answer, since that might just waste more time and give us the same
result. Waiting till the next path check makes more sense to me. If this
is a very short lived error, and the path did change priority then we
may well not need to wait till the next path_check() for it to get
updated. Assuming this path changed its priority, then likely other
paths will be changing priorities too. When we check them, if the
failure has resolved, we will end up rechecking this path while
refreshing all the priorities.

I don't want to make it seem like I feel like this (option 4 below)
is the obviously right answer. But I do feel like it is just as valid an
option as 1) or 2).

 
> > The choices are:
> > 1. make both the group_by_prio and the non-group_by_prio cases
> > recheck
> >    all paths on PRIO_UNDEF (this keeps the group_by_prio behavior the
> >    same).
> > 2. make both cases NOT recheck all paths on PRIO_UNDEF.
> > 3. keep the destinction between the two (make update_prio() check the
> >    pgplicy, and act accordingly)
> > 4. Make paths keep their previous priority when they would have
> > returned
> >    PRIO_UNDEF, so we never switch to PRIO_UNDEF.
> > 
> > All the choices except 3 seem reasonable. 1 keeps things how they are
> > for group_by_prio. 2 leans towards moving PRIO_UNDEF paths out of
> > their
> > current pathgroup.  4 leans towards keeping PRIO_UNDEF paths in their
> > current pathgroup.
> 
> I agree that 3) makes no sense. I argued above that I don't think 4)
> does, either. Wrt 1) vs. 2), we should realize that the checker loop
> will eventually run over all paths anyway.  With 1) we are as close as
> possible to the kernel state at any instant, but we may recalculate
> priorities (and possibly regroup) repeatedly during a single checker
> loop iteration, which is suboptimal [1]. With 2), we might never have
> the map in a correct state, not even after the checker loop has
> finished.
> 
> I think we should go with 1), and consider a later change where we just
> set a marker in the checker loop, and do prio updates / path regrouping
> once per map after the checker loop has finished. This requires more
> changes for the checker loop though, and is out of scope for your
> current patch set.
> 
> I wouldn't worry too much about group_by_prio. Regrouping is by design
> with this grouping policy, and it's expected that this results in
> incorrect grouping, at least temporarily. Where this is undesirable, 
> your new group_by_tpg will come to the rescue.

I ended up going with 1) when in my next version, on the groups that
this keeps our behavior the same, and nobody has been complaining about
it.
 
> > The other question is, what do we do for the delayed case. Right now,
> > once we finish waiting for our delay in deferred_failback_tick(), we
> > automatically refresh the priorities of all the devices in our
> > need_switch_pathgroup() call.  We could add an update_prio() call
> > before
> > it to keep this behavior, but if we are already refreshing all the
> > paths' priorities when we need to, I'm not sure that it's necessary
> > to
> > do it again here.
> 
> Well if we calculated priorities in update_prio() only as I suggested
> in my previous post, we'd call update_prio() in this code path and
> change the way need_switch_pathgroup() works. But I admit I haven't
> thought this through, and it can be done in a separate set, anyway.

I ended up not refreshing the priorities in deferred_failback_tick(). We
will have already done it in check_path() if necessary, either when we
started the deferred timer, or if we notice it's necessary while waiting
for the timeout.

-Ben

> Regards
> Martin
> 
> [1] it seems dumb to reason about "performance" here, but we both know
> that execution time in the checker loop can become critical if there
> are lots of path devices.
--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [dm-devel] [PATCH 4/5] multipathd: reload map if the path groups are out of order
  2023-06-06 14:55         ` Martin Wilck
@ 2023-06-06 15:54           ` Benjamin Marzinski
  2023-06-06 16:32             ` Martin Wilck
  0 siblings, 1 reply; 22+ messages in thread
From: Benjamin Marzinski @ 2023-06-06 15:54 UTC (permalink / raw)
  To: Martin Wilck; +Cc: dm-devel

On Tue, Jun 06, 2023 at 02:55:27PM +0000, Martin Wilck wrote:
> On Mon, 2023-06-05 at 23:42 -0500, Benjamin Marzinski wrote:
> > On Mon, Jun 05, 2023 at 02:08:07PM -0500, Benjamin Marzinski wrote:
> > > On Wed, May 31, 2023 at 04:27:30PM +0000, Martin Wilck wrote:
> > > > On Wed, 2023-05-24 at 18:21 -0500, Benjamin Marzinski wrote:
> > > > > need_switch_pathgroup() only checks if the currently used
> > > > > pathgroup
> > > > > is
> > > > > not the highest priority pathgroup. If it isn't, all multipathd
> > > > > does
> > > > > is
> > > > > instruct the kernel to switch to the correct pathgroup. 
> > > > > However, the
> > > > > kernel treats the pathgroups as if they were ordered by
> > > > > priority.
> > > > > When
> > > > > the kernel runs out of paths to use in the currently selected
> > > > > pathgroup,
> > > > > it will start checking the pathgroups in order until it finds
> > > > > one
> > > > > with
> > > > > usable paths.
> > > > > 
> > > > > need_switch_pathgroup() should also check if the pathgroups are
> > > > > out
> > > > > of
> > > > > order, and if so, multipathd should reload the map to reorder
> > > > > them
> > > > > correctly.
> > > > > 
> > > > > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > > > > ---
> > > > >  libmultipath/libmultipath.version |  5 ++++
> > > > >  libmultipath/switchgroup.c        | 27 ++++++++++++++++++++++
> > > > >  libmultipath/switchgroup.h        |  1 +
> > > > >  multipathd/main.c                 | 38 +++++++++++++++++++++--
> > > > > ------
> > > > > --
> > > > >  4 files changed, 59 insertions(+), 12 deletions(-)
> > > > > 
> > > > > diff --git a/libmultipath/libmultipath.version
> > > > > b/libmultipath/libmultipath.version
> > > > > index 8f72c452..38074699 100644
> > > > > --- a/libmultipath/libmultipath.version
> > > > > +++ b/libmultipath/libmultipath.version
> > > > > @@ -237,3 +237,8 @@ global:
> > > > >  local:
> > > > >         *;
> > > > >  };
> > > > > +
> > > > > +LIBMULTIPATH_19.1.0 {
> > > > > +global:
> > > > > +       path_groups_in_order;
> > > > > +} LIBMULTIPATH_19.0.0;
> > > > > diff --git a/libmultipath/switchgroup.c
> > > > > b/libmultipath/switchgroup.c
> > > > > index b1e1f39b..b1180839 100644
> > > > > --- a/libmultipath/switchgroup.c
> > > > > +++ b/libmultipath/switchgroup.c
> > > > > @@ -7,6 +7,33 @@
> > > > >  #include "structs.h"
> > > > >  #include "switchgroup.h"
> > > > >  
> > > > > +bool path_groups_in_order(struct multipath *mpp)
> > > > > +{
> > > > > +       int i;
> > > > > +       struct pathgroup *pgp;
> > > > > +       bool seen_marginal_pg = false;
> > > > > +       int last_prio = INT_MAX;
> > > > > +
> > > > > +       if (VECTOR_SIZE(mpp->pg) < 2)
> > > > > +               return true;
> > > > > +
> > > > > +       vector_foreach_slot(mpp->pg, pgp, i) {
> > > > > +               /* skip pgs with PRIO_UNDEF, since this is
> > > > > likely
> > > > > temporary */
> > > > > +               if (!pgp->paths || pgp->priority == PRIO_UNDEF)
> > > > > +                       continue;
> > > > > +               if (pgp->marginal && !seen_marginal_pg) {
> > > > > +                       last_prio = INT_MAX;
> > > > > +                       continue;
> > > > > +               }
> > > > > +               if (seen_marginal_pg && !pgp->marginal)
> > > > > +                       return false;
> > > > > +               if (pgp->priority > last_prio)
> > > > > +                       return false;
> > > > > +               last_prio = pgp->priority;
> > > > > +       }
> > > > > +       return true;
> > > > > +}
> > > > > +
> > > > >  void path_group_prio_update(struct pathgroup *pgp)
> > > > >  {
> > > > >         int i;
> > > > > diff --git a/libmultipath/switchgroup.h
> > > > > b/libmultipath/switchgroup.h
> > > > > index 9365e2e2..43dbb6c9 100644
> > > > > --- a/libmultipath/switchgroup.h
> > > > > +++ b/libmultipath/switchgroup.h
> > > > > @@ -1,2 +1,3 @@
> > > > >  void path_group_prio_update (struct pathgroup * pgp);
> > > > >  int select_path_group (struct multipath * mpp);
> > > > > +bool path_groups_in_order(struct multipath *mpp);
> > > > > diff --git a/multipathd/main.c b/multipathd/main.c
> > > > > index e7c272ad..2ea7c76b 100644
> > > > > --- a/multipathd/main.c
> > > > > +++ b/multipathd/main.c
> > > > > @@ -396,7 +396,7 @@ void
> > > > > put_multipath_config(__attribute__((unused))
> > > > > void *arg)
> > > > >  }
> > > > >  
> > > > >  static int
> > > > > -need_switch_pathgroup (struct multipath * mpp, int refresh)
> > > > > +need_switch_pathgroup (struct multipath * mpp, int refresh,
> > > > > bool
> > > > > *need_reload)
> > > > >  {
> > > > >         struct pathgroup * pgp;
> > > > >         struct path * pp;
> > > > > @@ -404,6 +404,7 @@ need_switch_pathgroup (struct multipath *
> > > > > mpp,
> > > > > int refresh)
> > > > >         struct config *conf;
> > > > >         int bestpg;
> > > > >  
> > > > > +       *need_reload = false;
> > > > >         if (!mpp)
> > > > >                 return 0;
> > > > >  
> > > > > @@ -430,10 +431,9 @@ need_switch_pathgroup (struct multipath *
> > > > > mpp,
> > > > > int refresh)
> > > > >                 return 0;
> > > > >  
> > > > >         mpp->bestpg = bestpg;
> > > > > -       if (mpp->bestpg != mpp->nextpg)
> > > > > -               return 1;
> > > > > +       *need_reload = !path_groups_in_order(mpp);
> > > > 
> > > > This will start another loop over the path groups. Can we just
> > > > integrate the path_groups_in_order() logic into the loop right
> > > > here?
> > > > 
> > > 
> > > Sure
> > 
> > Actually, after looking into this more, pushing those two functions
> > together makes the logic more confusing. Plus select_path_group() is
> > used by multiple other functions that don't need to check if the path
> > groups are out of order.
> 
> Hm. Can it happen at all that select_path_group() returns something
> other than 1 but path_groups_in_order() returns true? 

Yes. It might even be the common case. Say a switch goes down and all
the paths in the high priority pathgroup fail. The kernel will switch
over to a lower priority pathgroup. As long as those paths work, it
won't automatically switch back to the high priority pathgroup when we
tell it that those failed paths have recovered. It's multipath's job to
tell it when to proactively switch pathgroups. Since multipath doesn't
update the priority of failed paths, the pathgroups should still look
the same (unless you use group_by_prio and the path fails between
checking the state and running the prioritizer, in which case you will
likely get a PRIO_UNDEF and reconfigure the pathgroups, but that's the
thing group_by_tpg is trying to resolve). 
 
> If we follow the mindset you layed out in your patch ("the kernel
> treats the pathgroups as if they were ordered by priority")
> consequently, just determining bestpg is not enough; we'd need to sort
> the PGs every time (except for a user-triggered switchgroup command).
> IMO whenever we reload the map anyway (e.g. in setup_map()) we should
> make sure that the PGs are properly sorted. Using "switch_group"
> instead of a full reload is just an optimization because the kernel
> operation is more light-weight than a full reload. But as soon as we
> have e.g. a marginal path group, reordering is probably a better idea
> most of the time.

We already do correctly order the paths in setup_map().
setup_map() -> group_paths() -> sort_pathgroup().  Actually, looking at
this, I don't see why we even bother to call select_path_group() in
setup_map(). The answer will always be 1, since we just sorted them.

-Ben
 
> I agree that that would be another patch set, but I think that
> determining the best path group and checking whether the groups are
> correctly ordered are very closely related tasks.
> 
> But it's not a religious matter to me, so proceed with what you
> consider best at this time.
> 
> > There's no reason for path_groups_in_order to be in libmultipath,
> > since
> > it's only needed by multipathd. I'll fix that. But I would rather not
> > join it with select_path_group(). Since we just loop over the
> > pathgroups
> > and not the paths within them, we will likely go through the loop a
> > couple of times, and we don't actually perform any costly actions
> > during
> > the loops that would make combining them look more attractive. The
> > performance gains for need_switch_pathgroup() aren't worth making the
> > logic harder to follow (and the minor performance hits when we don't
> > need to check the order), IMHO. 
> > 
> 
> Regards,
> Martin
--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [dm-devel] [PATCH 4/5] multipathd: reload map if the path groups are out of order
  2023-06-06 15:54           ` Benjamin Marzinski
@ 2023-06-06 16:32             ` Martin Wilck
  2023-06-06 17:38               ` Benjamin Marzinski
  0 siblings, 1 reply; 22+ messages in thread
From: Martin Wilck @ 2023-06-06 16:32 UTC (permalink / raw)
  To: bmarzins; +Cc: dm-devel

On Tue, 2023-06-06 at 10:54 -0500, Benjamin Marzinski wrote:
> On Tue, Jun 06, 2023 at 02:55:27PM +0000, Martin Wilck wrote:
> > On Mon, 2023-06-05 at 23:42 -0500, Benjamin Marzinski wrote:
> > > On Mon, Jun 05, 2023 at 02:08:07PM -0500, Benjamin Marzinski

> > >  
> > > Actually, after looking into this more, pushing those two
> > > functions
> > > together makes the logic more confusing. Plus select_path_group()
> > > is
> > > used by multiple other functions that don't need to check if the
> > > path
> > > groups are out of order.
> > 
> > Hm. Can it happen at all that select_path_group() returns something
> > other than 1 but path_groups_in_order() returns true? 
> 
> Yes. It might even be the common case. Say a switch goes down and all
> the paths in the high priority pathgroup fail. The kernel will switch
> over to a lower priority pathgroup. As long as those paths work, it
> won't automatically switch back to the high priority pathgroup when
> we
> tell it that those failed paths have recovered. It's multipath's job
> to
> tell it when to proactively switch pathgroups. Since multipath
> doesn't
> update the priority of failed paths, the pathgroups should still look
> the same (unless you use group_by_prio and the path fails between
> checking the state and running the prioritizer, in which case you
> will
> likely get a PRIO_UNDEF and reconfigure the pathgroups, but that's
> the
> thing group_by_tpg is trying to resolve). 

Ok, this is subtle; it's caused by the fact that path_groups_in_order()
ignores the ordering of PGs with pgp->prio = PRIO_UNDEF (which will be
the prio of a PG with only failed paths), whereas select_path_group()
will ignore such PGs it in a different way - by never selecting them.
I hope I understand correctly now.

I have to say this is confusing. We have different concepts of how path
priority and path state together affect the PG priority, and we apply
these different concepts in different parts of the code. I'm not saying
it's wrong, but at the moment I'm too confused to tell if it's right.

>  
> > If we follow the mindset you layed out in your patch ("the kernel
> > treats the pathgroups as if they were ordered by priority")
> > consequently, just determining bestpg is not enough; we'd need to
> > sort
> > the PGs every time (except for a user-triggered switchgroup
> > command).
> > IMO whenever we reload the map anyway (e.g. in setup_map()) we
> > should
> > make sure that the PGs are properly sorted. Using "switch_group"
> > instead of a full reload is just an optimization because the kernel
> > operation is more light-weight than a full reload. But as soon as
> > we
> > have e.g. a marginal path group, reordering is probably a better
> > idea
> > most of the time.
> 
> We already do correctly order the paths in setup_map().
> setup_map() -> group_paths() -> sort_pathgroup().  Actually, looking
> at
> this, I don't see why we even bother to call select_path_group() in
> setup_map(). The answer will always be 1, since we just sorted them.
> 

Right. I suppose the call to select_path_groups() predates the one to
sort_pathgroups().

Martin

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [dm-devel] [PATCH 4/5] multipathd: reload map if the path groups are out of order
  2023-05-24 23:21 ` [dm-devel] [PATCH 4/5] multipathd: reload map if the path groups are out of order Benjamin Marzinski
  2023-05-31 16:27   ` Martin Wilck
@ 2023-06-06 16:39   ` Martin Wilck
  1 sibling, 0 replies; 22+ messages in thread
From: Martin Wilck @ 2023-06-06 16:39 UTC (permalink / raw)
  To: bmarzins, christophe.varoqui; +Cc: dm-devel

On Wed, 2023-05-24 at 18:21 -0500, Benjamin Marzinski wrote:
> need_switch_pathgroup() only checks if the currently used pathgroup
> is
> not the highest priority pathgroup. If it isn't, all multipathd does
> is
> instruct the kernel to switch to the correct pathgroup.  However, the
> kernel treats the pathgroups as if they were ordered by priority.
> When
> the kernel runs out of paths to use in the currently selected
> pathgroup,
> it will start checking the pathgroups in order until it finds one
> with
> usable paths.
> 
> need_switch_pathgroup() should also check if the pathgroups are out
> of
> order, and if so, multipathd should reload the map to reorder them
> correctly.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
>  libmultipath/libmultipath.version |  5 ++++
>  libmultipath/switchgroup.c        | 27 ++++++++++++++++++++++
>  libmultipath/switchgroup.h        |  1 +
>  multipathd/main.c                 | 38 +++++++++++++++++++++--------
> --
>  4 files changed, 59 insertions(+), 12 deletions(-)
> 
> diff --git a/libmultipath/libmultipath.version
> b/libmultipath/libmultipath.version
> index 8f72c452..38074699 100644
> --- a/libmultipath/libmultipath.version
> +++ b/libmultipath/libmultipath.version
> @@ -237,3 +237,8 @@ global:
>  local:
>         *;
>  };
> +
> +LIBMULTIPATH_19.1.0 {
> +global:
> +       path_groups_in_order;
> +} LIBMULTIPATH_19.0.0;
> diff --git a/libmultipath/switchgroup.c b/libmultipath/switchgroup.c
> index b1e1f39b..b1180839 100644
> --- a/libmultipath/switchgroup.c
> +++ b/libmultipath/switchgroup.c
> @@ -7,6 +7,33 @@
>  #include "structs.h"
>  #include "switchgroup.h"
>  
> +bool path_groups_in_order(struct multipath *mpp)
> +{
> +       int i;
> +       struct pathgroup *pgp;
> +       bool seen_marginal_pg = false;
> +       int last_prio = INT_MAX;
> +
> +       if (VECTOR_SIZE(mpp->pg) < 2)
> +               return true;
> +
> +       vector_foreach_slot(mpp->pg, pgp, i) {
> +               /* skip pgs with PRIO_UNDEF, since this is likely
> temporary */
> +               if (!pgp->paths || pgp->priority == PRIO_UNDEF)
> +                       continue;
> +               if (pgp->marginal && !seen_marginal_pg) {
> +                       last_prio = INT_MAX;
> +                       continue;
> +               }
> +               if (seen_marginal_pg && !pgp->marginal)
> +                       return false;
> +               if (pgp->priority > last_prio)
> +                       return false;
> +               last_prio = pgp->priority;
> +       }
> +       return true;
> +}
> +

I stared at this code again while we were discussing, and I figured I
don't quite understand it. First of all, you never update
seen_marginal_pg. I suppose you want to set it in the if (pgp->marginal
&& !seen_marginal_pg) code block. But then, why set last_prio =
INT_MAX? If any non-marginal GP would follow, you would return false
anyway.

Martin

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [dm-devel] [PATCH 4/5] multipathd: reload map if the path groups are out of order
  2023-06-06 16:32             ` Martin Wilck
@ 2023-06-06 17:38               ` Benjamin Marzinski
  2023-06-06 18:53                 ` Martin Wilck
  0 siblings, 1 reply; 22+ messages in thread
From: Benjamin Marzinski @ 2023-06-06 17:38 UTC (permalink / raw)
  To: Martin Wilck; +Cc: dm-devel

On Tue, Jun 06, 2023 at 04:32:13PM +0000, Martin Wilck wrote:
> On Tue, 2023-06-06 at 10:54 -0500, Benjamin Marzinski wrote:
> > On Tue, Jun 06, 2023 at 02:55:27PM +0000, Martin Wilck wrote:
> > > On Mon, 2023-06-05 at 23:42 -0500, Benjamin Marzinski wrote:
> > > > On Mon, Jun 05, 2023 at 02:08:07PM -0500, Benjamin Marzinski
> 
> > > >  
> > > > Actually, after looking into this more, pushing those two
> > > > functions
> > > > together makes the logic more confusing. Plus select_path_group()
> > > > is
> > > > used by multiple other functions that don't need to check if the
> > > > path
> > > > groups are out of order.
> > > 
> > > Hm. Can it happen at all that select_path_group() returns something
> > > other than 1 but path_groups_in_order() returns true? 
> > 
> > Yes. It might even be the common case. Say a switch goes down and all
> > the paths in the high priority pathgroup fail. The kernel will switch
> > over to a lower priority pathgroup. As long as those paths work, it
> > won't automatically switch back to the high priority pathgroup when
> > we
> > tell it that those failed paths have recovered. It's multipath's job
> > to
> > tell it when to proactively switch pathgroups. Since multipath
> > doesn't
> > update the priority of failed paths, the pathgroups should still look
> > the same (unless you use group_by_prio and the path fails between
> > checking the state and running the prioritizer, in which case you
> > will
> > likely get a PRIO_UNDEF and reconfigure the pathgroups, but that's
> > the
> > thing group_by_tpg is trying to resolve). 
> 
> Ok, this is subtle; it's caused by the fact that path_groups_in_order()
> ignores the ordering of PGs with pgp->prio = PRIO_UNDEF (which will be
> the prio of a PG with only failed paths), whereas select_path_group()
> will ignore such PGs it in a different way - by never selecting them.
> I hope I understand correctly now.
> 
> I have to say this is confusing. We have different concepts of how path
> priority and path state together affect the PG priority, and we apply
> these different concepts in different parts of the code. I'm not saying
> it's wrong, but at the moment I'm too confused to tell if it's right.

It might make sense to have these be even more different.  Perhaps
select_path_group should stay the same but sort_pathgroups() and
path_groups_in_order() should just look at the priorities of the paths
that have a non-PRIO_UNDEF priority and use the total number of paths
for tie-breakers. This would mean that they would order the pathgroups
how they should be when everything is working correctly. That way if
something forces the kernel to pick a new pathgroup (which is likely the
failure of all the paths in the current pathgroup), it will switch to
the pathgroup that, all things being equal, should be the best.
Obviously, if there is another pathgroup of equal prioity with more
usable paths, multipath can tell it to switch to that one, but I assume
that even without group_by_prio, multiple pathgroups with the same
priority will be uncommon. 

In this situation, it would make sense to call select_path_group() after
calling group_paths(), since the first pathgroup might not currently be
the best pathgroup, if its paths are down. At any rate, it's not
something to worry about in this patchset.

-Ben

> >  
> > > If we follow the mindset you layed out in your patch ("the kernel
> > > treats the pathgroups as if they were ordered by priority")
> > > consequently, just determining bestpg is not enough; we'd need to
> > > sort
> > > the PGs every time (except for a user-triggered switchgroup
> > > command).
> > > IMO whenever we reload the map anyway (e.g. in setup_map()) we
> > > should
> > > make sure that the PGs are properly sorted. Using "switch_group"
> > > instead of a full reload is just an optimization because the kernel
> > > operation is more light-weight than a full reload. But as soon as
> > > we
> > > have e.g. a marginal path group, reordering is probably a better
> > > idea
> > > most of the time.
> > 
> > We already do correctly order the paths in setup_map().
> > setup_map() -> group_paths() -> sort_pathgroup().  Actually, looking
> > at
> > this, I don't see why we even bother to call select_path_group() in
> > setup_map(). The answer will always be 1, since we just sorted them.
> > 
> 
> Right. I suppose the call to select_path_groups() predates the one to
> sort_pathgroups().
> 
> Martin
--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [dm-devel] [PATCH 4/5] multipathd: reload map if the path groups are out of order
  2023-06-06 17:38               ` Benjamin Marzinski
@ 2023-06-06 18:53                 ` Martin Wilck
  0 siblings, 0 replies; 22+ messages in thread
From: Martin Wilck @ 2023-06-06 18:53 UTC (permalink / raw)
  To: bmarzins; +Cc: dm-devel

On Tue, 2023-06-06 at 12:38 -0500, Benjamin Marzinski wrote:
> On Tue, Jun 06, 2023 at 04:32:13PM +0000, Martin Wilck wrote:
> > On Tue, 2023-06-06 at 10:54 -0500, Benjamin Marzinski wrote:
> > > On Tue, Jun 06, 2023 at 02:55:27PM +0000, Martin Wilck wrote:
> > > > On Mon, 2023-06-05 at 23:42 -0500, Benjamin Marzinski wrote:
> > > > > On Mon, Jun 05, 2023 at 02:08:07PM -0500, Benjamin Marzinski
> > 
> > > > >  
> > > > > Actually, after looking into this more, pushing those two
> > > > > functions
> > > > > together makes the logic more confusing. Plus
> > > > > select_path_group()
> > > > > is
> > > > > used by multiple other functions that don't need to check if
> > > > > the
> > > > > path
> > > > > groups are out of order.
> > > > 
> > > > Hm. Can it happen at all that select_path_group() returns
> > > > something
> > > > other than 1 but path_groups_in_order() returns true? 
> > > 
> > > Yes. It might even be the common case. Say a switch goes down and
> > > all
> > > the paths in the high priority pathgroup fail. The kernel will
> > > switch
> > > over to a lower priority pathgroup. As long as those paths work,
> > > it
> > > won't automatically switch back to the high priority pathgroup
> > > when
> > > we
> > > tell it that those failed paths have recovered. It's multipath's
> > > job
> > > to
> > > tell it when to proactively switch pathgroups. Since multipath
> > > doesn't
> > > update the priority of failed paths, the pathgroups should still
> > > look
> > > the same (unless you use group_by_prio and the path fails between
> > > checking the state and running the prioritizer, in which case you
> > > will
> > > likely get a PRIO_UNDEF and reconfigure the pathgroups, but
> > > that's
> > > the
> > > thing group_by_tpg is trying to resolve). 
> > 
> > Ok, this is subtle; it's caused by the fact that
> > path_groups_in_order()
> > ignores the ordering of PGs with pgp->prio = PRIO_UNDEF (which will
> > be
> > the prio of a PG with only failed paths), whereas
> > select_path_group()
> > will ignore such PGs it in a different way - by never selecting
> > them.
> > I hope I understand correctly now.
> > 
> > I have to say this is confusing. We have different concepts of how
> > path
> > priority and path state together affect the PG priority, and we
> > apply
> > these different concepts in different parts of the code. I'm not
> > saying
> > it's wrong, but at the moment I'm too confused to tell if it's
> > right.
> 
> It might make sense to have these be even more different.  Perhaps
> select_path_group should stay the same but sort_pathgroups() and
> path_groups_in_order() should just look at the priorities of the
> paths
> that have a non-PRIO_UNDEF priority and use the total number of paths
> for tie-breakers. This would mean that they would order the
> pathgroups
> how they should be when everything is working correctly. 

That makes sense, yes.

> That way if
> something forces the kernel to pick a new pathgroup (which is likely
> the
> failure of all the paths in the current pathgroup), it will switch to
> the pathgroup that, all things being equal, should be the best.
> Obviously, if there is another pathgroup of equal prioity with more
> usable paths, multipath can tell it to switch to that one, but I
> assume
> that even without group_by_prio, multiple pathgroups with the same
> priority will be uncommon. 

I suppose it can happen with with group_by_tpg. There may be one
optimized and multiple non-optimized PGs, for example.

> 
> In this situation, it would make sense to call select_path_group()
> after
> calling group_paths(), since the first pathgroup might not currently
> be
> the best pathgroup, if its paths are down. At any rate, it's not
> something to worry about in this patchset.

Right. What I'm worrying about is mostly whether I (or someone else)
will understand the rationale of all this a few years from now. It's ok
to use different logic when we sort the PGs and when we select the best
one, but it should be explained why, and how, we do it. So please add
some comments explaining it.

Thanks
Martin

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2023-06-06 18:54 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-24 23:21 [dm-devel] [PATCH 0/5] priority and pathgroup switching changes Benjamin Marzinski
2023-05-24 23:21 ` [dm-devel] [PATCH 1/5] libmultipath: don't count PRIO_UNDEF paths for pathgroup priority Benjamin Marzinski
2023-05-31 16:01   ` Martin Wilck
2023-05-24 23:21 ` [dm-devel] [PATCH 2/5] multipath-tools tests: add tests to verify PRIO_UDEF changes Benjamin Marzinski
2023-05-31 16:04   ` Martin Wilck
2023-05-24 23:21 ` [dm-devel] [PATCH 3/5] multipathd: refresh all priorities if one has changed Benjamin Marzinski
2023-05-31 16:27   ` Martin Wilck
2023-06-05 18:22     ` Benjamin Marzinski
2023-06-06 14:08       ` Martin Wilck
2023-06-06 15:33         ` Benjamin Marzinski
2023-05-24 23:21 ` [dm-devel] [PATCH 4/5] multipathd: reload map if the path groups are out of order Benjamin Marzinski
2023-05-31 16:27   ` Martin Wilck
2023-06-05 19:08     ` Benjamin Marzinski
2023-06-06  4:42       ` Benjamin Marzinski
2023-06-06 14:55         ` Martin Wilck
2023-06-06 15:54           ` Benjamin Marzinski
2023-06-06 16:32             ` Martin Wilck
2023-06-06 17:38               ` Benjamin Marzinski
2023-06-06 18:53                 ` Martin Wilck
2023-06-06 16:39   ` Martin Wilck
2023-05-24 23:21 ` [dm-devel] [PATCH 5/5] multipathd: don't assume mpp->paths will exist in need_switch_pathgroup Benjamin Marzinski
2023-05-31 16:28   ` Martin Wilck

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.