All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/9] Misc Multipath patches
@ 2019-02-22 16:58 Benjamin Marzinski
  2019-02-22 16:58 ` [PATCH 1/9] libmultipath: disable user_friendly_names for NetApp Benjamin Marzinski
                   ` (9 more replies)
  0 siblings, 10 replies; 22+ messages in thread
From: Benjamin Marzinski @ 2019-02-22 16:58 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

The series is a mix of resends an new patches.

Patches 0001-0005 are simply resends of patches I've submitted earlier,
with no changes other that adding Reviewed-by's where appropriate.

Patches 0006-0009 are the result of running into some bugs during
firmare updates on an array. 

Benjamin Marzinski (9):
  libmultipath: disable user_friendly_names for NetApp
  libmultipath: handle existing paths in marginal_path enqueue
  multipathd: cleanup marginal paths checking timers
  libmultipath: fix marginal paths queueing errors
  libmultipath: fix marginal_paths nr_active check
  multipathd: Fix miscounting active paths
  multipathd: ignore failed wwid recheck
  libmutipath: continue to use old state on PATH_PENDING
  multipathd: use update_path_groups instead of reload_map

 libmultipath/discovery.c   | 18 ++++++----
 libmultipath/hwtable.c     |  1 +
 libmultipath/io_err_stat.c | 71 +++++++++++++++-----------------------
 libmultipath/io_err_stat.h |  2 +-
 multipath/main.c           |  2 +-
 multipathd/cli_handlers.c  |  2 +-
 multipathd/main.c          | 27 +++++++++------
 multipathd/main.h          |  2 ++
 8 files changed, 62 insertions(+), 63 deletions(-)

-- 
2.17.2

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 1/9] libmultipath: disable user_friendly_names for NetApp
  2019-02-22 16:58 [PATCH 0/9] Misc Multipath patches Benjamin Marzinski
@ 2019-02-22 16:58 ` Benjamin Marzinski
  2019-02-26  8:43   ` Martin Wilck
  2019-02-22 16:58 ` [PATCH 2/9] libmultipath: handle existing paths in marginal_path enqueue Benjamin Marzinski
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 22+ messages in thread
From: Benjamin Marzinski @ 2019-02-22 16:58 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

NetApp has tools that rely on devices using WWID names. To avoid
breaking these, NetApp devices should continue to use WWID names, even
if the default config is set to enable user_friendly_names. If users
want to use user_friendly_names on NetApp devices, the must specifically
override the device config.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/hwtable.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libmultipath/hwtable.c b/libmultipath/hwtable.c
index d3a8d9b..8776411 100644
--- a/libmultipath/hwtable.c
+++ b/libmultipath/hwtable.c
@@ -719,6 +719,7 @@ static struct hwentry default_hw[] = {
 		.flush_on_last_del = FLUSH_ENABLED,
 		.dev_loss      = MAX_DEV_LOSS_TMO,
 		.prio_name     = PRIO_ONTAP,
+		.user_friendly_names = USER_FRIENDLY_NAMES_OFF,
 	},
 	{
 		/*
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 2/9] libmultipath: handle existing paths in marginal_path enqueue
  2019-02-22 16:58 [PATCH 0/9] Misc Multipath patches Benjamin Marzinski
  2019-02-22 16:58 ` [PATCH 1/9] libmultipath: disable user_friendly_names for NetApp Benjamin Marzinski
@ 2019-02-22 16:58 ` Benjamin Marzinski
  2019-02-22 16:58 ` [PATCH 3/9] multipathd: cleanup marginal paths checking timers Benjamin Marzinski
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: Benjamin Marzinski @ 2019-02-22 16:58 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

If the path that enqueue_io_err_stat_by_path() is trying to add
is already on the list, just return success. There's no reason
to fail in this case.

Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/io_err_stat.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/libmultipath/io_err_stat.c b/libmultipath/io_err_stat.c
index 02b1453..1cb3ffe 100644
--- a/libmultipath/io_err_stat.c
+++ b/libmultipath/io_err_stat.c
@@ -254,7 +254,6 @@ static void free_io_err_pathvec(struct io_err_stat_pathvec *p)
  * return value
  * 0: enqueue OK
  * 1: fails because of internal error
- * 2: fails because of existing already
  */
 static int enqueue_io_err_stat_by_path(struct path *path)
 {
@@ -264,7 +263,7 @@ static int enqueue_io_err_stat_by_path(struct path *path)
 	p = find_err_path_by_dev(paths->pathvec, path->dev);
 	if (p) {
 		pthread_mutex_unlock(&paths->mutex);
-		return 2;
+		return 0;
 	}
 	pthread_mutex_unlock(&paths->mutex);
 
@@ -418,9 +417,8 @@ int hit_io_err_recheck_time(struct path *pp)
 			io_err_stat_log(3, "%s: enqueue fails, to recover",
 					pp->dev);
 			goto recover;
-		} else if (!r) {
+		} else
 			pp->io_err_pathfail_cnt = PATH_IO_ERR_IN_CHECKING;
-		}
 	}
 
 	return 1;
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 3/9] multipathd: cleanup marginal paths checking timers
  2019-02-22 16:58 [PATCH 0/9] Misc Multipath patches Benjamin Marzinski
  2019-02-22 16:58 ` [PATCH 1/9] libmultipath: disable user_friendly_names for NetApp Benjamin Marzinski
  2019-02-22 16:58 ` [PATCH 2/9] libmultipath: handle existing paths in marginal_path enqueue Benjamin Marzinski
@ 2019-02-22 16:58 ` Benjamin Marzinski
  2019-02-22 16:58 ` [PATCH 4/9] libmultipath: fix marginal paths queueing errors Benjamin Marzinski
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: Benjamin Marzinski @ 2019-02-22 16:58 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

When a path gets recovered in hit_io_err_recheck_time(), it will
continue running in check_path(), so there is no reason to schedule
another path check as soon as possible (since one is currently
happening).

Also, there isn't much point in restarting the io err stat checking when
the path is down, so hit_io_err_recheck_time() should only be run when
the path is up. Downed marginal paths can be treated just like any other
downed path.

Finally, there is no reason to set reset pp->io_err_dis_reinstate_time
when we decide to enqueue a path. Either th enqueue will fail and the
path will get recovered, or it will succeed, and we won't check the
reinstate time again until poll_io_err_stat() marks the path as needing
a requeue.

Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/io_err_stat.c | 8 --------
 multipathd/main.c          | 3 ++-
 2 files changed, 2 insertions(+), 9 deletions(-)

diff --git a/libmultipath/io_err_stat.c b/libmultipath/io_err_stat.c
index 1cb3ffe..416e13a 100644
--- a/libmultipath/io_err_stat.c
+++ b/libmultipath/io_err_stat.c
@@ -400,13 +400,6 @@ int hit_io_err_recheck_time(struct path *pp)
 		io_err_stat_log(4, "%s: reschedule checking after %d seconds",
 				pp->dev,
 				pp->mpp->marginal_path_err_recheck_gap_time);
-		/*
-		 * to reschedule io error checking again
-		 * if the path is good enough, we claim it is good
-		 * and can be reinsated as soon as possible in the
-		 * check_path routine.
-		 */
-		pp->io_err_dis_reinstate_time = curr_time.tv_sec;
 		r = enqueue_io_err_stat_by_path(pp);
 		/*
 		 * Enqueue fails because of internal error.
@@ -426,7 +419,6 @@ int hit_io_err_recheck_time(struct path *pp)
 recover:
 	pp->io_err_pathfail_cnt = 0;
 	pp->io_err_disable_reinstate = 0;
-	pp->tick = 1;
 	return 0;
 }
 
diff --git a/multipathd/main.c b/multipathd/main.c
index fb520b6..fe6d8ef 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -2079,7 +2079,8 @@ check_path (struct vectors * vecs, struct path * pp, int ticks)
 		return 1;
 	}
 
-	if (pp->io_err_disable_reinstate && hit_io_err_recheck_time(pp)) {
+	if ((newstate == PATH_UP || newstate == PATH_GHOST) &&
+	    pp->io_err_disable_reinstate && hit_io_err_recheck_time(pp)) {
 		pp->state = PATH_SHAKY;
 		/*
 		 * to reschedule as soon as possible,so that this path can
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 4/9] libmultipath: fix marginal paths queueing errors
  2019-02-22 16:58 [PATCH 0/9] Misc Multipath patches Benjamin Marzinski
                   ` (2 preceding siblings ...)
  2019-02-22 16:58 ` [PATCH 3/9] multipathd: cleanup marginal paths checking timers Benjamin Marzinski
@ 2019-02-22 16:58 ` Benjamin Marzinski
  2019-02-22 16:58 ` [PATCH 5/9] libmultipath: fix marginal_paths nr_active check Benjamin Marzinski
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: Benjamin Marzinski @ 2019-02-22 16:58 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

The current marginal paths code tries to enqueue paths for io error
checking when multipathd receives a uevent on path failure. This can run
into a couple of problems. First, this uevent could happen before or
after multipathd actually fails the path, so simply checking nr_active
doesn't tell us if this is the last active path. Also, The code to fail
the path in enqueue_io_err_stat_by_path() doesn't ever update the path
state. This can cause the path to get failed twice, temporarily leading
to incorrect nr_active counts. Further, The point when multipathd should
care if this is the last active path is when the path has come up again,
not when it goes down. Lastly, if the path is down, it is often
impossible to open the path device, causing setup_directio_ctx() to
fail, which causes multipathd to skip io error checking and mark the
path as not marginal.

Instead, multipathd should just make sure that if the path is marginal,
it gets failed in the uevent, so as not to race with the checkerloop
thread. Then, when the path comes back up, check_path() can enqueue it,
just like it does for paths that need to get rechecked. To make it
obvious that the state PATH_IO_ERR_IN_POLLING_RECHECK and the function
hit_io_err_recheck_time() now apply to paths waiting to be enqueued for
the first time as well, I've also changed their names to
PATH_IO_ERR_WAITING_TO_CHECK and need_io_err_check(), respectively.

Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/io_err_stat.c | 55 +++++++++++++++++---------------------
 libmultipath/io_err_stat.h |  2 +-
 multipathd/main.c          |  2 +-
 3 files changed, 27 insertions(+), 32 deletions(-)

diff --git a/libmultipath/io_err_stat.c b/libmultipath/io_err_stat.c
index 416e13a..72aacf3 100644
--- a/libmultipath/io_err_stat.c
+++ b/libmultipath/io_err_stat.c
@@ -41,7 +41,7 @@
 #define CONCUR_NR_EVENT			32
 
 #define PATH_IO_ERR_IN_CHECKING		-1
-#define PATH_IO_ERR_IN_POLLING_RECHECK	-2
+#define PATH_IO_ERR_WAITING_TO_CHECK	-2
 
 #define io_err_stat_log(prio, fmt, args...) \
 	condlog(prio, "io error statistic: " fmt, ##args)
@@ -283,24 +283,6 @@ static int enqueue_io_err_stat_by_path(struct path *path)
 	vector_set_slot(paths->pathvec, p);
 	pthread_mutex_unlock(&paths->mutex);
 
-	if (!path->io_err_disable_reinstate) {
-		/*
-		 *fail the path in the kernel for the time of the to make
-		 *the test more reliable
-		 */
-		io_err_stat_log(3, "%s: fail dm path %s before checking",
-				path->mpp->alias, path->dev);
-		path->io_err_disable_reinstate = 1;
-		dm_fail_path(path->mpp->alias, path->dev_t);
-		update_queue_mode_del_path(path->mpp);
-
-		/*
-		 * schedule path check as soon as possible to
-		 * update path state to delayed state
-		 */
-		path->tick = 1;
-
-	}
 	io_err_stat_log(2, "%s: enqueue path %s to check",
 			path->mpp->alias, path->dev);
 	return 0;
@@ -317,7 +299,6 @@ free_ioerr_path:
 int io_err_stat_handle_pathfail(struct path *path)
 {
 	struct timespec curr_time;
-	int res;
 
 	if (uatomic_read(&io_err_thread_running) == 0)
 		return 1;
@@ -332,8 +313,6 @@ int io_err_stat_handle_pathfail(struct path *path)
 
 	if (!path->mpp)
 		return 1;
-	if (path->mpp->nr_active <= 1)
-		return 1;
 	if (path->mpp->marginal_path_double_failed_time <= 0 ||
 		path->mpp->marginal_path_err_sample_time <= 0 ||
 		path->mpp->marginal_path_err_recheck_gap_time <= 0 ||
@@ -371,17 +350,33 @@ int io_err_stat_handle_pathfail(struct path *path)
 	}
 	path->io_err_pathfail_cnt++;
 	if (path->io_err_pathfail_cnt >= FLAKY_PATHFAIL_THRESHOLD) {
-		res = enqueue_io_err_stat_by_path(path);
-		if (!res)
-			path->io_err_pathfail_cnt = PATH_IO_ERR_IN_CHECKING;
-		else
-			path->io_err_pathfail_cnt = 0;
+		path->io_err_disable_reinstate = 1;
+		path->io_err_pathfail_cnt = PATH_IO_ERR_WAITING_TO_CHECK;
+		/* enqueue path as soon as it comes up */
+		path->io_err_dis_reinstate_time = 0;
+		if (path->state != PATH_DOWN) {
+			struct config *conf;
+			int oldstate = path->state;
+			int checkint;
+
+			conf = get_multipath_config();
+			checkint = conf->checkint;
+			put_multipath_config(conf);
+			io_err_stat_log(2, "%s: mark as failed", path->dev);
+			path->mpp->stat_path_failures++;
+			path->state = PATH_DOWN;
+			path->dmstate = PSTATE_FAILED;
+			if (oldstate == PATH_UP || oldstate == PATH_GHOST)
+				update_queue_mode_del_path(path->mpp);
+			if (path->tick > checkint)
+				path->tick = checkint;
+		}
 	}
 
 	return 0;
 }
 
-int hit_io_err_recheck_time(struct path *pp)
+int need_io_err_check(struct path *pp)
 {
 	struct timespec curr_time;
 	int r;
@@ -392,7 +387,7 @@ int hit_io_err_recheck_time(struct path *pp)
 		io_err_stat_log(2, "%s: recover path early", pp->dev);
 		goto recover;
 	}
-	if (pp->io_err_pathfail_cnt != PATH_IO_ERR_IN_POLLING_RECHECK)
+	if (pp->io_err_pathfail_cnt != PATH_IO_ERR_WAITING_TO_CHECK)
 		return 1;
 	if (clock_gettime(CLOCK_MONOTONIC, &curr_time) != 0 ||
 	    (curr_time.tv_sec - pp->io_err_dis_reinstate_time) >
@@ -489,7 +484,7 @@ static int poll_io_err_stat(struct vectors *vecs, struct io_err_stat_path *pp)
 	} else if (path->mpp && path->mpp->nr_active > 1) {
 		io_err_stat_log(3, "%s: keep failing the dm path %s",
 				path->mpp->alias, path->dev);
-		path->io_err_pathfail_cnt = PATH_IO_ERR_IN_POLLING_RECHECK;
+		path->io_err_pathfail_cnt = PATH_IO_ERR_WAITING_TO_CHECK;
 		path->io_err_disable_reinstate = 1;
 		path->io_err_dis_reinstate_time = currtime.tv_sec;
 		io_err_stat_log(3, "%s: disable reinstating of %s",
diff --git a/libmultipath/io_err_stat.h b/libmultipath/io_err_stat.h
index bbf31b4..53d6d7d 100644
--- a/libmultipath/io_err_stat.h
+++ b/libmultipath/io_err_stat.h
@@ -10,6 +10,6 @@ extern pthread_attr_t io_err_stat_attr;
 int start_io_err_stat_thread(void *data);
 void stop_io_err_stat_thread(void);
 int io_err_stat_handle_pathfail(struct path *path);
-int hit_io_err_recheck_time(struct path *pp);
+int need_io_err_check(struct path *pp);
 
 #endif /* _IO_ERR_STAT_H */
diff --git a/multipathd/main.c b/multipathd/main.c
index fe6d8ef..43830e8 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -2080,7 +2080,7 @@ check_path (struct vectors * vecs, struct path * pp, int ticks)
 	}
 
 	if ((newstate == PATH_UP || newstate == PATH_GHOST) &&
-	    pp->io_err_disable_reinstate && hit_io_err_recheck_time(pp)) {
+	    pp->io_err_disable_reinstate && need_io_err_check(pp)) {
 		pp->state = PATH_SHAKY;
 		/*
 		 * to reschedule as soon as possible,so that this path can
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 5/9] libmultipath: fix marginal_paths nr_active check
  2019-02-22 16:58 [PATCH 0/9] Misc Multipath patches Benjamin Marzinski
                   ` (3 preceding siblings ...)
  2019-02-22 16:58 ` [PATCH 4/9] libmultipath: fix marginal paths queueing errors Benjamin Marzinski
@ 2019-02-22 16:58 ` Benjamin Marzinski
  2019-02-22 16:58 ` [PATCH 6/9] multipathd: Fix miscounting active paths Benjamin Marzinski
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: Benjamin Marzinski @ 2019-02-22 16:58 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

Marginal paths are SHAKY, so they don't count towards the number of
active paths. poll_io_err_stat() shouldn't automatically reinstate a
marginal path if there already is an active path.

Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/io_err_stat.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libmultipath/io_err_stat.c b/libmultipath/io_err_stat.c
index 72aacf3..554b777 100644
--- a/libmultipath/io_err_stat.c
+++ b/libmultipath/io_err_stat.c
@@ -481,7 +481,7 @@ static int poll_io_err_stat(struct vectors *vecs, struct io_err_stat_path *pp)
 		 */
 		path->tick = 1;
 
-	} else if (path->mpp && path->mpp->nr_active > 1) {
+	} else if (path->mpp && path->mpp->nr_active > 0) {
 		io_err_stat_log(3, "%s: keep failing the dm path %s",
 				path->mpp->alias, path->dev);
 		path->io_err_pathfail_cnt = PATH_IO_ERR_WAITING_TO_CHECK;
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 6/9] multipathd: Fix miscounting active paths
  2019-02-22 16:58 [PATCH 0/9] Misc Multipath patches Benjamin Marzinski
                   ` (4 preceding siblings ...)
  2019-02-22 16:58 ` [PATCH 5/9] libmultipath: fix marginal_paths nr_active check Benjamin Marzinski
@ 2019-02-22 16:58 ` Benjamin Marzinski
  2019-02-26  9:15   ` Martin Wilck
  2019-02-22 16:58 ` [PATCH 7/9] multipathd: ignore failed wwid recheck Benjamin Marzinski
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 22+ messages in thread
From: Benjamin Marzinski @ 2019-02-22 16:58 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

When multipathd gets a change uevent, it calls pathinfo with DI_NOIO.
This sets the path state to the return value of path_offline(). If a
path is in the PATH_DOWN state but path_offline() returns PATH_UP, when
that path gets a change event, its state will get moved to PATH_UP
without either reinstating the path, or reloading the map.  The next
call to check_path() will move the path back to PATH_DOWN. Since
check_path() simply increments and decrements nr_active instead of
calculating it based on the actual number of active paths, nr_active
will get decremented a second time for this failed path, potentially
putting the multipath device into recovery mode.

This commit does two things to avoid this situation. It makes the
DI_NOIO flag only set pp->state in pathinfo() if DI_CHECKER is also set.
This isn't set in uev_update_path() to avoid changing the path state in
this case.  Also, to guard against pp->state getting changed in some
other code path without properly updating the map state, check_path()
now calls set_no_path_retry, which recalculates nr_active based on the
actual number of active paths, and makes sure that the queue_if_no_path
value in the features line is correct.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/discovery.c | 11 ++++++-----
 multipath/main.c         |  2 +-
 multipathd/main.c        |  4 +++-
 3 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/libmultipath/discovery.c b/libmultipath/discovery.c
index 10bd8cd..729bcb9 100644
--- a/libmultipath/discovery.c
+++ b/libmultipath/discovery.c
@@ -1914,11 +1914,12 @@ int pathinfo(struct path *pp, struct config *conf, int mask)
 	if (path_state == PATH_REMOVED)
 		goto blank;
 	else if (mask & DI_NOIO) {
-		/*
-		 * Avoid any IO on the device itself.
-		 * Behave like DI_CHECKER in the "path unavailable" case.
-		 */
-		pp->chkrstate = pp->state = path_state;
+		if (mask & DI_CHECKER)
+			/*
+			 * Avoid any IO on the device itself.
+			 * simply use the path_offline() return as its state
+			 */
+			pp->chkrstate = pp->state = path_state;
 		return PATHINFO_OK;
 	}
 
diff --git a/multipath/main.c b/multipath/main.c
index 5abb118..69141db 100644
--- a/multipath/main.c
+++ b/multipath/main.c
@@ -356,7 +356,7 @@ static int check_usable_paths(struct config *conf,
 			pp->udev = get_udev_device(pp->dev_t, DEV_DEVT);
 			if (pp->udev == NULL)
 				continue;
-			if (pathinfo(pp, conf, DI_SYSFS|DI_NOIO) != PATHINFO_OK)
+			if (pathinfo(pp, conf, DI_SYSFS|DI_NOIO|DI_CHECKER) != PATHINFO_OK)
 				continue;
 
 			if (pp->state == PATH_UP &&
diff --git a/multipathd/main.c b/multipathd/main.c
index 43830e8..678ecf8 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -392,7 +392,8 @@ static void set_no_path_retry(struct multipath *mpp)
 	default:
 		if (mpp->nr_active > 0) {
 			mpp->retry_tick = 0;
-			dm_queue_if_no_path(mpp->alias, 1);
+			if (!is_queueing)
+				dm_queue_if_no_path(mpp->alias, 1);
 		} else if (is_queueing && mpp->retry_tick == 0)
 			enter_recovery_mode(mpp);
 		break;
@@ -2072,6 +2073,7 @@ check_path (struct vectors * vecs, struct path * pp, int ticks)
 	/* if update_multipath_strings orphaned the path, quit early */
 	if (!pp->mpp)
 		return 0;
+	set_no_path_retry(pp->mpp);
 
 	if ((newstate == PATH_UP || newstate == PATH_GHOST) &&
 			check_path_reinstate_state(pp)) {
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 7/9] multipathd: ignore failed wwid recheck
  2019-02-22 16:58 [PATCH 0/9] Misc Multipath patches Benjamin Marzinski
                   ` (5 preceding siblings ...)
  2019-02-22 16:58 ` [PATCH 6/9] multipathd: Fix miscounting active paths Benjamin Marzinski
@ 2019-02-22 16:58 ` Benjamin Marzinski
  2019-02-26  9:42   ` Martin Wilck
  2019-02-22 16:58 ` [PATCH 8/9] libmutipath: continue to use old state on PATH_PENDING Benjamin Marzinski
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 22+ messages in thread
From: Benjamin Marzinski @ 2019-02-22 16:58 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

If disable_changed_wwids is set, when multipathd gets a change event on
a path, it verifies that the wwid hasn't changed in uev_update_path().
If get_uid() failed, uev_update_path treated this as a wwid change to 0.
This could cause paths to suddenly be dropped due to an issue with
getting the wwid.  Even if get_uid() failed because the path was down,
it no change uevent happend when it later became active, multipathd
would continue to ignore the path.

Instead, multipathd should neither set nor clear wwid_changed if
get_uid() returned failure.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 multipathd/main.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/multipathd/main.c b/multipathd/main.c
index 678ecf8..81ad6c0 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -1234,9 +1234,9 @@ uev_update_path (struct uevent *uev, struct vectors * vecs)
 			goto out;
 
 		strcpy(wwid, pp->wwid);
-		get_uid(pp, pp->state, uev->udev);
+		rc = get_uid(pp, pp->state, uev->udev);
 
-		if (strncmp(wwid, pp->wwid, WWID_SIZE) != 0) {
+		if (rc == 0 && strncmp(wwid, pp->wwid, WWID_SIZE) != 0) {
 			condlog(0, "%s: path wwid changed from '%s' to '%s'. %s",
 				uev->kernel, wwid, pp->wwid,
 				(disable_changed_wwids ? "disallowing" :
@@ -1252,7 +1252,8 @@ uev_update_path (struct uevent *uev, struct vectors * vecs)
 				goto out;
 			}
 		} else {
-			pp->wwid_changed = 0;
+			if (rc == 0)
+				pp->wwid_changed = 0;
 			udev_device_unref(pp->udev);
 			pp->udev = udev_device_ref(uev->udev);
 			conf = get_multipath_config();
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 8/9] libmutipath: continue to use old state on PATH_PENDING
  2019-02-22 16:58 [PATCH 0/9] Misc Multipath patches Benjamin Marzinski
                   ` (6 preceding siblings ...)
  2019-02-22 16:58 ` [PATCH 7/9] multipathd: ignore failed wwid recheck Benjamin Marzinski
@ 2019-02-22 16:58 ` Benjamin Marzinski
  2019-02-26 10:12   ` Martin Wilck
  2019-02-22 16:58 ` [PATCH 9/9] multipathd: use update_path_groups instead of reload_map Benjamin Marzinski
  2019-02-26 10:50 ` [PATCH 0/9] Misc Multipath patches Martin Wilck
  9 siblings, 1 reply; 22+ messages in thread
From: Benjamin Marzinski @ 2019-02-22 16:58 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

When pathinfo() sets pp->state to PATH_PENDING, it can cause problems
with path checking.  It should act more like check_path(). When
check_path() sees a new state of PATH_PENDING, it doesn't update the
path state at all, so a path's old state is normally never PATH_PENDING.

As and example of the problems of setting a path to PATH_PENDING, If
check_path() sets a path's state to PATH_UP, then a call to pathinfo()
sets the state to PATH_PENDING, and then another call the check_path()
sets the state to PATH_DOWN, multipathd won't fail the path in the
kernel. Also, if a path's state is PATH_PENDING, and nr_active is
recalculated, that path will count as down, even if the state was
previously PATH_UP. If a path already has a state of PATH_WILD or
PATH_UNCHECKED, changing it to PATH_PENDING won't hurt anything, and it
will help anyone who sees it know what's actually happening. But
otherwise, pathinfo() should leave the previous state alone.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/discovery.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/libmultipath/discovery.c b/libmultipath/discovery.c
index 729bcb9..d3585f9 100644
--- a/libmultipath/discovery.c
+++ b/libmultipath/discovery.c
@@ -1946,8 +1946,11 @@ int pathinfo(struct path *pp, struct config *conf, int mask)
 
 	if (mask & DI_CHECKER) {
 		if (path_state == PATH_UP) {
-			pp->chkrstate = pp->state = get_state(pp, conf, 0,
-							      path_state);
+			int newstate = get_state(pp, conf, 0, path_state);
+			if (newstate != PATH_PENDING ||
+			    pp->state == PATH_UNCHECKED ||
+			    pp->state == PATH_WILD)
+				pp->chkrstate = pp->state = newstate;
 			if (pp->state == PATH_TIMEOUT)
 				pp->state = PATH_DOWN;
 			if (pp->state == PATH_UP && !pp->size) {
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 9/9] multipathd: use update_path_groups instead of reload_map
  2019-02-22 16:58 [PATCH 0/9] Misc Multipath patches Benjamin Marzinski
                   ` (7 preceding siblings ...)
  2019-02-22 16:58 ` [PATCH 8/9] libmutipath: continue to use old state on PATH_PENDING Benjamin Marzinski
@ 2019-02-22 16:58 ` Benjamin Marzinski
  2019-02-26 10:47   ` Martin Wilck
  2019-02-26 10:50 ` [PATCH 0/9] Misc Multipath patches Martin Wilck
  9 siblings, 1 reply; 22+ messages in thread
From: Benjamin Marzinski @ 2019-02-22 16:58 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

reload_map() doesn't do the work to sync the state after reloading the
map.  Instead of calling it directly, cli_reload() and uev_update_path()
should call update_path_groups(), which calls reload_map() with all the
necessary syncing.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 multipathd/cli_handlers.c |  2 +-
 multipathd/main.c         | 13 ++++++++-----
 multipathd/main.h         |  2 ++
 3 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/multipathd/cli_handlers.c b/multipathd/cli_handlers.c
index f95813e..60e17d6 100644
--- a/multipathd/cli_handlers.c
+++ b/multipathd/cli_handlers.c
@@ -877,7 +877,7 @@ cli_reload(void *v, char **reply, int *len, void *data)
 		return 1;
 	}
 
-	return reload_map(vecs, mpp, 0, 1);
+	return update_path_groups(mpp, vecs, 0);
 }
 
 int resize_map(struct multipath *mpp, unsigned long long size,
diff --git a/multipathd/main.c b/multipathd/main.c
index 81ad6c0..27ff186 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -1272,10 +1272,13 @@ uev_update_path (struct uevent *uev, struct vectors * vecs)
 			else {
 				if (ro == 1)
 					pp->mpp->force_readonly = 1;
-				retval = reload_map(vecs, mpp, 0, 1);
-				pp->mpp->force_readonly = 0;
-				condlog(2, "%s: map %s reloaded (retval %d)",
-					uev->kernel, mpp->alias, retval);
+				retval = update_path_groups(mpp, vecs, 0);
+				if (retval == 2)
+					condlog(2, "%s: map removed during reload", pp->dev);
+				else {
+					pp->mpp->force_readonly = 0;
+					condlog(2, "%s: map %s reloaded (retval %d)", uev->kernel, mpp->alias, retval);
+				}
 			}
 		}
 	}
@@ -1831,7 +1834,7 @@ int update_path_groups(struct multipath *mpp, struct vectors *vecs, int refresh)
 
 	dm_lib_release();
 	if (setup_multipath(vecs, mpp) != 0)
-		return 1;
+		return 2;
 	sync_map_state(mpp);
 
 	return 0;
diff --git a/multipathd/main.h b/multipathd/main.h
index 8fd426b..e5c1398 100644
--- a/multipathd/main.h
+++ b/multipathd/main.h
@@ -43,5 +43,7 @@ int __setup_multipath (struct vectors * vecs, struct multipath * mpp,
 		       int reset);
 #define setup_multipath(vecs, mpp) __setup_multipath(vecs, mpp, 1)
 int update_multipath (struct vectors *vecs, char *mapname, int reset);
+int update_path_groups(struct multipath *mpp, struct vectors *vecs,
+		       int refresh);
 
 #endif /* MAIN_H */
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/9] libmultipath: disable user_friendly_names for NetApp
  2019-02-22 16:58 ` [PATCH 1/9] libmultipath: disable user_friendly_names for NetApp Benjamin Marzinski
@ 2019-02-26  8:43   ` Martin Wilck
  0 siblings, 0 replies; 22+ messages in thread
From: Martin Wilck @ 2019-02-26  8:43 UTC (permalink / raw)
  To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development

On Fri, 2019-02-22 at 10:58 -0600, Benjamin Marzinski wrote:
> NetApp has tools that rely on devices using WWID names. To avoid
> breaking these, NetApp devices should continue to use WWID names,
> even
> if the default config is set to enable user_friendly_names. If users
> want to use user_friendly_names on NetApp devices, the must
> specifically
> override the device config.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
>  libmultipath/hwtable.c | 1 +
>  1 file changed, 1 insertion(+)

Ok. 

While looking at this, I realized that the multipath.conf(5) man page
doesn't mention the fact that user_friendly_names could be used in the
devices section of multipath.conf (which would be important for users
who want to override the NetApp default behavior).

Martin




> 
> diff --git a/libmultipath/hwtable.c b/libmultipath/hwtable.c
> index d3a8d9b..8776411 100644
> --- a/libmultipath/hwtable.c
> +++ b/libmultipath/hwtable.c
> @@ -719,6 +719,7 @@ static struct hwentry default_hw[] = {
>  		.flush_on_last_del = FLUSH_ENABLED,
>  		.dev_loss      = MAX_DEV_LOSS_TMO,
>  		.prio_name     = PRIO_ONTAP,
> +		.user_friendly_names = USER_FRIENDLY_NAMES_OFF,
>  	},
>  	{
>  		/*

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 6/9] multipathd: Fix miscounting active paths
  2019-02-22 16:58 ` [PATCH 6/9] multipathd: Fix miscounting active paths Benjamin Marzinski
@ 2019-02-26  9:15   ` Martin Wilck
  2019-02-26 10:16     ` Martin Wilck
  0 siblings, 1 reply; 22+ messages in thread
From: Martin Wilck @ 2019-02-26  9:15 UTC (permalink / raw)
  To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development

On Fri, 2019-02-22 at 10:58 -0600, Benjamin Marzinski wrote:
> When multipathd gets a change uevent, it calls pathinfo with DI_NOIO.
> This sets the path state to the return value of path_offline(). If a
> path is in the PATH_DOWN state but path_offline() returns PATH_UP,
> when
> that path gets a change event, its state will get moved to PATH_UP
> without either reinstating the path, or reloading the map.  The next
> call to check_path() will move the path back to PATH_DOWN. Since
> check_path() simply increments and decrements nr_active instead of
> calculating it based on the actual number of active paths, nr_active
> will get decremented a second time for this failed path, potentially
> putting the multipath device into recovery mode.
> 
> This commit does two things to avoid this situation. It makes the
> DI_NOIO flag only set pp->state in pathinfo() if DI_CHECKER is also
> set.
> This isn't set in uev_update_path() to avoid changing the path state
> in
> this case.  Also, to guard against pp->state getting changed in some
> other code path without properly updating the map state, check_path()
> now calls set_no_path_retry, which recalculates nr_active based on
> the
> actual number of active paths, and makes sure that the
> queue_if_no_path
> value in the features line is correct.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
>  libmultipath/discovery.c | 11 ++++++-----
>  multipath/main.c         |  2 +-
>  multipathd/main.c        |  4 +++-
>  3 files changed, 10 insertions(+), 7 deletions(-)

Thanks a lot for catching this! I was just hunting down a similar
problem. You may have saved me lots of hair-pulling.

While I'm excited abouot the first hunk, I'm a bit unsure about the
second one. I like the general idea (I'd be happy to do away with the
"blind" incrementing and decrementing of nr_active), but I see a
problem with PATH_PENDING paths, along the lines of thought layed out
in commit adf551f "setup_map: wait for pending path checkers to
finish". By counting only PATH_UP and PATH_GHOST in check_path(), we
are very likely to falsely regard some pending paths as "inactive" just
because their checker hasn't completed within a millisecond.
Consequently, we may falsely set a map to queuing (or worse, failing)
mode just because a few path checkers are still pending.

So for the time being, I'd rather apply the first hunk only. If we
still see multipathd's internal nr_active deviating from the real
situation, we need to put in some debugging code to find out where it
diverges.

In the long run, IMO we should separate the "queueing mode" logic
(which is map-related) from check_path() (which is path-related). We
should check all paths of a given map, and when we're done, decide upon
the queuing mode. PATH_PENDING will still need some extra thought.

Thanks
Martin



> 
> diff --git a/libmultipath/discovery.c b/libmultipath/discovery.c
> index 10bd8cd..729bcb9 100644
> --- a/libmultipath/discovery.c
> +++ b/libmultipath/discovery.c
> @@ -1914,11 +1914,12 @@ int pathinfo(struct path *pp, struct config
> *conf, int mask)
>  	if (path_state == PATH_REMOVED)
>  		goto blank;
>  	else if (mask & DI_NOIO) {
> -		/*
> -		 * Avoid any IO on the device itself.
> -		 * Behave like DI_CHECKER in the "path unavailable"
> case.
> -		 */
> -		pp->chkrstate = pp->state = path_state;
> +		if (mask & DI_CHECKER)
> +			/*
> +			 * Avoid any IO on the device itself.
> +			 * simply use the path_offline() return as its
> state
> +			 */
> +			pp->chkrstate = pp->state = path_state;
>  		return PATHINFO_OK;
>  	}
>  
> diff --git a/multipath/main.c b/multipath/main.c
> index 5abb118..69141db 100644
> --- a/multipath/main.c
> +++ b/multipath/main.c
> @@ -356,7 +356,7 @@ static int check_usable_paths(struct config
> *conf,
>  			pp->udev = get_udev_device(pp->dev_t,
> DEV_DEVT);
>  			if (pp->udev == NULL)
>  				continue;
> -			if (pathinfo(pp, conf, DI_SYSFS|DI_NOIO) !=
> PATHINFO_OK)
> +			if (pathinfo(pp, conf,
> DI_SYSFS|DI_NOIO|DI_CHECKER) != PATHINFO_OK)
>  				continue;
>  
>  			if (pp->state == PATH_UP &&
> diff --git a/multipathd/main.c b/multipathd/main.c
> index 43830e8..678ecf8 100644
> --- a/multipathd/main.c
> +++ b/multipathd/main.c
> @@ -392,7 +392,8 @@ static void set_no_path_retry(struct multipath
> *mpp)
>  	default:
>  		if (mpp->nr_active > 0) {
>  			mpp->retry_tick = 0;
> -			dm_queue_if_no_path(mpp->alias, 1);
> +			if (!is_queueing)
> +				dm_queue_if_no_path(mpp->alias, 1);
>  		} else if (is_queueing && mpp->retry_tick == 0)
>  			enter_recovery_mode(mpp);
>  		break;
> @@ -2072,6 +2073,7 @@ check_path (struct vectors * vecs, struct path
> * pp, int ticks)
>  	/* if update_multipath_strings orphaned the path, quit early */
>  	if (!pp->mpp)
>  		return 0;
> +	set_no_path_retry(pp->mpp);
>  
>  	if ((newstate == PATH_UP || newstate == PATH_GHOST) &&
>  			check_path_reinstate_state(pp)) {

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 7/9] multipathd: ignore failed wwid recheck
  2019-02-22 16:58 ` [PATCH 7/9] multipathd: ignore failed wwid recheck Benjamin Marzinski
@ 2019-02-26  9:42   ` Martin Wilck
  2019-02-26 20:29     ` Benjamin Marzinski
  0 siblings, 1 reply; 22+ messages in thread
From: Martin Wilck @ 2019-02-26  9:42 UTC (permalink / raw)
  To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development

On Fri, 2019-02-22 at 10:58 -0600, Benjamin Marzinski wrote:
> If disable_changed_wwids is set, when multipathd gets a change event
> on
> a path, it verifies that the wwid hasn't changed in
> uev_update_path().
> If get_uid() failed, uev_update_path treated this as a wwid change to
> 0.
> This could cause paths to suddenly be dropped due to an issue with
> getting the wwid.  Even if get_uid() failed because the path was
> down,
> it no change uevent happend when it later became active, multipathd
> would continue to ignore the path.
> 
> Instead, multipathd should neither set nor clear wwid_changed if
> get_uid() returned failure.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
>  multipathd/main.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)

Hm. Failure of get_uid() is a rather severe problem. With this change,
if get_uid() fails, pp->wwid won't be reset to the original value, and
dm_fail_path() won't be called. So the path will continue to be
considered "good", but it will have a zero WWID from this point on. Or
am I overlooking someting?

Maybe we should rather change the treatment of this case in
check_path(). In uev_update_path(), we could leave the WWID at 0 and
fail the path. Then in check_path(), when the path is up to be checked
next time, we'd detect the strlen(pp->wwid) == 0 case (indicating
previously failed get_uid()) and try to call get_uid() again to see if
this was just a temporary WWID lookup failure or if the WWID actually
changed (IOW: pp->wwid differs from pp->mpp->wwid). Would this make
sense?

Martin


> diff --git a/multipathd/main.c b/multipathd/main.c
> index 678ecf8..81ad6c0 100644
> --- a/multipathd/main.c
> +++ b/multipathd/main.c
> @@ -1234,9 +1234,9 @@ uev_update_path (struct uevent *uev, struct
> vectors * vecs)
>  			goto out;
>  
>  		strcpy(wwid, pp->wwid);
> -		get_uid(pp, pp->state, uev->udev);
> +		rc = get_uid(pp, pp->state, uev->udev);
>  
> -		if (strncmp(wwid, pp->wwid, WWID_SIZE) != 0) {
> +		if (rc == 0 && strncmp(wwid, pp->wwid, WWID_SIZE) != 0)
> {
>  			condlog(0, "%s: path wwid changed from '%s' to
> '%s'. %s",
>  				uev->kernel, wwid, pp->wwid,
>  				(disable_changed_wwids ? "disallowing"
> :
> @@ -1252,7 +1252,8 @@ uev_update_path (struct uevent *uev, struct
> vectors * vecs)
>  				goto out;
>  			}
>  		} else {
> -			pp->wwid_changed = 0;
> +			if (rc == 0)
> +				pp->wwid_changed = 0;
>  			udev_device_unref(pp->udev);
>  			pp->udev = udev_device_ref(uev->udev);
>  			conf = get_multipath_config();

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 8/9] libmutipath: continue to use old state on PATH_PENDING
  2019-02-22 16:58 ` [PATCH 8/9] libmutipath: continue to use old state on PATH_PENDING Benjamin Marzinski
@ 2019-02-26 10:12   ` Martin Wilck
  2019-02-26 20:37     ` Benjamin Marzinski
  0 siblings, 1 reply; 22+ messages in thread
From: Martin Wilck @ 2019-02-26 10:12 UTC (permalink / raw)
  To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development

On Fri, 2019-02-22 at 10:58 -0600, Benjamin Marzinski wrote:
> When pathinfo() sets pp->state to PATH_PENDING, it can cause problems
> with path checking.  It should act more like check_path(). When
> check_path() sees a new state of PATH_PENDING, it doesn't update the
> path state at all, so a path's old state is normally never
> PATH_PENDING.
> 
> As and example of the problems of setting a path to PATH_PENDING, If
> check_path() sets a path's state to PATH_UP, then a call to
> pathinfo()
> sets the state to PATH_PENDING, and then another call the
> check_path()
> sets the state to PATH_DOWN, multipathd won't fail the path in the
> kernel. 

I can see what you mean, but I'm unsure how this example would come to
pass in practice (AFAICS, pathinfo(DI_CHECKER) is only called during
path discovery / reconfigure(), or when new paths are detected /
added).

> Also, if a path's state is PATH_PENDING, and nr_active is
> recalculated, that path will count as down, even if the state was
> previously PATH_UP.

Ah, this is how you deal with what I remarked on the second hunk of
patch 6/9. This is smart.

>  If a path already has a state of PATH_WILD or
> PATH_UNCHECKED, changing it to PATH_PENDING won't hurt anything, and
> it
> will help anyone who sees it know what's actually happening. But
> otherwise, pathinfo() should leave the previous state alone.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
>  libmultipath/discovery.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)

My gut feeling is that we should distinguish between the case where
PATH_PENDING is set in path_offline() (sysfs state "quiesce",
"blocked", "pending" etc.), and the case where the async checker is
simply not finished yet. The first case does tell us something about
the path (being temporarily inaccessible), whereas the second basically
just says that we're ignorant.

Anyway, that's future work, so:

Reviewed-by: Martin Wilck <mwilck@suse.com>



> 
> diff --git a/libmultipath/discovery.c b/libmultipath/discovery.c
> index 729bcb9..d3585f9 100644
> --- a/libmultipath/discovery.c
> +++ b/libmultipath/discovery.c
> @@ -1946,8 +1946,11 @@ int pathinfo(struct path *pp, struct config
> *conf, int mask)
>  
>  	if (mask & DI_CHECKER) {
>  		if (path_state == PATH_UP) {
> -			pp->chkrstate = pp->state = get_state(pp, conf,
> 0,
> -							      path_stat
> e);
> +			int newstate = get_state(pp, conf, 0,
> path_state);
> +			if (newstate != PATH_PENDING ||
> +			    pp->state == PATH_UNCHECKED ||
> +			    pp->state == PATH_WILD)
> +				pp->chkrstate = pp->state = newstate;
>  			if (pp->state == PATH_TIMEOUT)
>  				pp->state = PATH_DOWN;
>  			if (pp->state == PATH_UP && !pp->size) {

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 6/9] multipathd: Fix miscounting active paths
  2019-02-26  9:15   ` Martin Wilck
@ 2019-02-26 10:16     ` Martin Wilck
  0 siblings, 0 replies; 22+ messages in thread
From: Martin Wilck @ 2019-02-26 10:16 UTC (permalink / raw)
  To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development

On Tue, 2019-02-26 at 10:15 +0100, Martin Wilck wrote:
> On Fri, 2019-02-22 at 10:58 -0600, Benjamin Marzinski wrote:
> > When multipathd gets a change uevent, it calls pathinfo with
> > DI_NOIO.
> > This sets the path state to the return value of path_offline(). If
> > a
> > path is in the PATH_DOWN state but path_offline() returns PATH_UP,
> > when
> > that path gets a change event, its state will get moved to PATH_UP
> > without either reinstating the path, or reloading the map.  The
> > next
> > call to check_path() will move the path back to PATH_DOWN. Since
> > check_path() simply increments and decrements nr_active instead of
> > calculating it based on the actual number of active paths,
> > nr_active
> > will get decremented a second time for this failed path,
> > potentially
> > putting the multipath device into recovery mode.
> > 
> > This commit does two things to avoid this situation. It makes the
> > DI_NOIO flag only set pp->state in pathinfo() if DI_CHECKER is also
> > set.
> > This isn't set in uev_update_path() to avoid changing the path
> > state
> > in
> > this case.  Also, to guard against pp->state getting changed in
> > some
> > other code path without properly updating the map state,
> > check_path()
> > now calls set_no_path_retry, which recalculates nr_active based on
> > the
> > actual number of active paths, and makes sure that the
> > queue_if_no_path
> > value in the features line is correct.
> > 
> > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > ---
> >  libmultipath/discovery.c | 11 ++++++-----
> >  multipath/main.c         |  2 +-
> >  multipathd/main.c        |  4 +++-
> >  3 files changed, 10 insertions(+), 7 deletions(-)
> 
> Thanks a lot for catching this! I was just hunting down a similar
> problem. You may have saved me lots of hair-pulling.
> 
> While I'm excited abouot the first hunk, I'm a bit unsure about the
> second one. I like the general idea (I'd be happy to do away with the
> "blind" incrementing and decrementing of nr_active), but I see a
> problem with PATH_PENDING paths, along the lines of thought layed out
> in commit adf551f "setup_map: wait for pending path checkers to
> finish". By counting only PATH_UP and PATH_GHOST in check_path(), we
> are very likely to falsely regard some pending paths as "inactive"
> just
> because their checker hasn't completed within a millisecond.
> Consequently, we may falsely set a map to queuing (or worse, failing)
> mode just because a few path checkers are still pending.
> 
> So for the time being, I'd rather apply the first hunk only. If we
> still see multipathd's internal nr_active deviating from the real
> situation, we need to put in some debugging code to find out where it
> diverges.
> 
> In the long run, IMO we should separate the "queueing mode" logic
> (which is map-related) from check_path() (which is path-related). We
> should check all paths of a given map, and when we're done, decide
> upon
> the queuing mode. PATH_PENDING will still need some extra thought.

After having reviewed patch 8/9 of this series ("libmutipath: continue
to use old state on PATH_PENDING"), I see this differently now. The
last paragraph of my first assessment is still worth consideration, but
only as future work.

Reviewed-by: Martin Wilck <mwilck@suse.com>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 9/9] multipathd: use update_path_groups instead of reload_map
  2019-02-22 16:58 ` [PATCH 9/9] multipathd: use update_path_groups instead of reload_map Benjamin Marzinski
@ 2019-02-26 10:47   ` Martin Wilck
  2019-02-26 22:32     ` Benjamin Marzinski
  0 siblings, 1 reply; 22+ messages in thread
From: Martin Wilck @ 2019-02-26 10:47 UTC (permalink / raw)
  To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development

On Fri, 2019-02-22 at 10:58 -0600, Benjamin Marzinski wrote:
> reload_map() doesn't do the work to sync the state after reloading
> the
> map.  Instead of calling it directly, cli_reload() and
> uev_update_path()
> should call update_path_groups(), which calls reload_map() with all
> the
> necessary syncing.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
>  multipathd/cli_handlers.c |  2 +-
>  multipathd/main.c         | 13 ++++++++-----
>  multipathd/main.h         |  2 ++
>  3 files changed, 11 insertions(+), 6 deletions(-)

I can see that this makes some sense for the cli_reload() path (see
below). But for uev_update_path(), I'm not sure. This is the code path
where a paths's "ro" attribute changes. With this change, you'll call
update_multipath_strings() for every uevent that arrives, causing a lot
of extra work on every uevent. This may hurt, in particular if lots of
uevents arrive at the same time. I fail to see what you gain by doing
the extra work. If path states change, either dm events or uevents for
the other paths in the map will likely arrive soon and cause the other
path's states to be fixed, or check_path() will detect the state
changes and fix the DM states eventually. No?

For cli_reload() it'd actually be the question if we should re-
calculate priorities and path groups in multipathd before calling
update_path_groups() / reload_map().

Regards,
Martin



> 
> diff --git a/multipathd/cli_handlers.c b/multipathd/cli_handlers.c
> index f95813e..60e17d6 100644
> --- a/multipathd/cli_handlers.c
> +++ b/multipathd/cli_handlers.c
> @@ -877,7 +877,7 @@ cli_reload(void *v, char **reply, int *len, void
> *data)
>  		return 1;
>  	}
>  
> -	return reload_map(vecs, mpp, 0, 1);
> +	return update_path_groups(mpp, vecs, 0);
>  }
>  
>  int resize_map(struct multipath *mpp, unsigned long long size,
> diff --git a/multipathd/main.c b/multipathd/main.c
> index 81ad6c0..27ff186 100644
> --- a/multipathd/main.c
> +++ b/multipathd/main.c
> @@ -1272,10 +1272,13 @@ uev_update_path (struct uevent *uev, struct
> vectors * vecs)
>  			else {
>  				if (ro == 1)
>  					pp->mpp->force_readonly = 1;
> -				retval = reload_map(vecs, mpp, 0, 1);
> -				pp->mpp->force_readonly = 0;
> -				condlog(2, "%s: map %s reloaded (retval
> %d)",
> -					uev->kernel, mpp->alias,
> retval);
> +				retval = update_path_groups(mpp, vecs,
> 0);
> +				if (retval == 2)
> +					condlog(2, "%s: map removed
> during reload", pp->dev);
> +				else {
> +					pp->mpp->force_readonly = 0;
> +					condlog(2, "%s: map %s reloaded
> (retval %d)", uev->kernel, mpp->alias, retval);
> +				}
>  			}
>  		}
>  	}
> @@ -1831,7 +1834,7 @@ int update_path_groups(struct multipath *mpp,
> struct vectors *vecs, int refresh)
>  
>  	dm_lib_release();
>  	if (setup_multipath(vecs, mpp) != 0)
> -		return 1;
> +		return 2;
>  	sync_map_state(mpp);
>  
>  	return 0;
> diff --git a/multipathd/main.h b/multipathd/main.h
> index 8fd426b..e5c1398 100644
> --- a/multipathd/main.h
> +++ b/multipathd/main.h
> @@ -43,5 +43,7 @@ int __setup_multipath (struct vectors * vecs,
> struct multipath * mpp,
>  		       int reset);
>  #define setup_multipath(vecs, mpp) __setup_multipath(vecs, mpp, 1)
>  int update_multipath (struct vectors *vecs, char *mapname, int
> reset);
> +int update_path_groups(struct multipath *mpp, struct vectors *vecs,
> +		       int refresh);
>  
>  #endif /* MAIN_H */

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/9] Misc Multipath patches
  2019-02-22 16:58 [PATCH 0/9] Misc Multipath patches Benjamin Marzinski
                   ` (8 preceding siblings ...)
  2019-02-22 16:58 ` [PATCH 9/9] multipathd: use update_path_groups instead of reload_map Benjamin Marzinski
@ 2019-02-26 10:50 ` Martin Wilck
  2019-02-26 15:10   ` Martin Wilck
  9 siblings, 1 reply; 22+ messages in thread
From: Martin Wilck @ 2019-02-26 10:50 UTC (permalink / raw)
  To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development

On Fri, 2019-02-22 at 10:58 -0600, Benjamin Marzinski wrote:
> The series is a mix of resends an new patches.
> 
> Patches 0001-0005 are simply resends of patches I've submitted
> earlier,
> with no changes other that adding Reviewed-by's where appropriate.
> 
> Patches 0006-0009 are the result of running into some bugs during
> firmare updates on an array. 
> 
> Benjamin Marzinski (9):
>   libmultipath: disable user_friendly_names for NetApp
>   libmultipath: handle existing paths in marginal_path enqueue
>   multipathd: cleanup marginal paths checking timers
>   libmultipath: fix marginal paths queueing errors
>   libmultipath: fix marginal_paths nr_active check
>   multipathd: Fix miscounting active paths
>   multipathd: ignore failed wwid recheck
>   libmutipath: continue to use old state on PATH_PENDING
>   multipathd: use update_path_groups instead of reload_map
> 
>  libmultipath/discovery.c   | 18 ++++++----
>  libmultipath/hwtable.c     |  1 +
>  libmultipath/io_err_stat.c | 71 +++++++++++++++---------------------
> --
>  libmultipath/io_err_stat.h |  2 +-
>  multipath/main.c           |  2 +-
>  multipathd/cli_handlers.c  |  2 +-
>  multipathd/main.c          | 27 +++++++++------
>  multipathd/main.h          |  2 ++
>  8 files changed, 62 insertions(+), 63 deletions(-)
> 

In case I've caused confusion:

ACK on the series except 9/9 for which I'd appreciate some extra
information.

Cheers,
Martin

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/9] Misc Multipath patches
  2019-02-26 10:50 ` [PATCH 0/9] Misc Multipath patches Martin Wilck
@ 2019-02-26 15:10   ` Martin Wilck
  0 siblings, 0 replies; 22+ messages in thread
From: Martin Wilck @ 2019-02-26 15:10 UTC (permalink / raw)
  To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development

On Tue, 2019-02-26 at 11:50 +0100, Martin Wilck wrote:
> 
> In case I've caused confusion:
> 
> ACK on the series except 9/9 for which I'd appreciate some extra
> information.

Argh. I meant to say: all except 7/9 and 9/9. Details in my related
emails.

Next time I better not try to avoid confusion :-/

Martin

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 7/9] multipathd: ignore failed wwid recheck
  2019-02-26  9:42   ` Martin Wilck
@ 2019-02-26 20:29     ` Benjamin Marzinski
  0 siblings, 0 replies; 22+ messages in thread
From: Benjamin Marzinski @ 2019-02-26 20:29 UTC (permalink / raw)
  To: Martin Wilck; +Cc: device-mapper development

On Tue, Feb 26, 2019 at 10:42:49AM +0100, Martin Wilck wrote:
> On Fri, 2019-02-22 at 10:58 -0600, Benjamin Marzinski wrote:
> > If disable_changed_wwids is set, when multipathd gets a change event
> > on
> > a path, it verifies that the wwid hasn't changed in
> > uev_update_path().
> > If get_uid() failed, uev_update_path treated this as a wwid change to
> > 0.
> > This could cause paths to suddenly be dropped due to an issue with
> > getting the wwid.  Even if get_uid() failed because the path was
> > down,
> > it no change uevent happend when it later became active, multipathd
> > would continue to ignore the path.
> > 
> > Instead, multipathd should neither set nor clear wwid_changed if
> > get_uid() returned failure.
> > 
> > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > ---
> >  multipathd/main.c | 7 ++++---
> >  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> Hm. Failure of get_uid() is a rather severe problem. With this change,
> if get_uid() fails, pp->wwid won't be reset to the original value, and
> dm_fail_path() won't be called. So the path will continue to be
> considered "good", but it will have a zero WWID from this point on. Or
> am I overlooking someting?

Nope. This patch is buggy. Well, actually, check_path() will call
update_multipath_strings() which will eventually call disassemble_map(),
which will reset the path's WWID to match the multipath device wwid
(since it's empty), which is why I didn't notice the issue right away.

This could easily be solved by simply resetting the WWID when get_uid
fails.  But there are bigger issues with the patch. Sometimes get_uid()
doesn't return failure, it just returns 0 and an empty wwid. This still
allows the issue I am seeing to occur.

The problem I'm seeing is that sometimes (like when firmware gets
updated on a controller) multipath is temporarily unable to get the
wwid. If a change event happens during this period, get_uid() will set
the wwid to "", possibly without returning an error.  After that,
multipathd will treat the path as if it was down, until a change event
occurs and multipath can successfully get the wwid.
 
> Maybe we should rather change the treatment of this case in
> check_path(). In uev_update_path(), we could leave the WWID at 0 and
> fail the path. Then in check_path(), when the path is up to be checked
> next time, we'd detect the strlen(pp->wwid) == 0 case (indicating
> previously failed get_uid()) and try to call get_uid() again to see if
> this was just a temporary WWID lookup failure or if the WWID actually
> changed (IOW: pp->wwid differs from pp->mpp->wwid). Would this make
> sense?

Sure. I'll try something along those lines (although I'll probably keep
the old wwid, and just set a flag, so the I don't have to make sure that
all of the code that deals with paths with blank wwids does the correct
thing in this case).

-Ben
 
> Martin
> 
> 
> > diff --git a/multipathd/main.c b/multipathd/main.c
> > index 678ecf8..81ad6c0 100644
> > --- a/multipathd/main.c
> > +++ b/multipathd/main.c
> > @@ -1234,9 +1234,9 @@ uev_update_path (struct uevent *uev, struct
> > vectors * vecs)
> >  			goto out;
> >  
> >  		strcpy(wwid, pp->wwid);
> > -		get_uid(pp, pp->state, uev->udev);
> > +		rc = get_uid(pp, pp->state, uev->udev);
> >  
> > -		if (strncmp(wwid, pp->wwid, WWID_SIZE) != 0) {
> > +		if (rc == 0 && strncmp(wwid, pp->wwid, WWID_SIZE) != 0)
> > {
> >  			condlog(0, "%s: path wwid changed from '%s' to
> > '%s'. %s",
> >  				uev->kernel, wwid, pp->wwid,
> >  				(disable_changed_wwids ? "disallowing"
> > :
> > @@ -1252,7 +1252,8 @@ uev_update_path (struct uevent *uev, struct
> > vectors * vecs)
> >  				goto out;
> >  			}
> >  		} else {
> > -			pp->wwid_changed = 0;
> > +			if (rc == 0)
> > +				pp->wwid_changed = 0;
> >  			udev_device_unref(pp->udev);
> >  			pp->udev = udev_device_ref(uev->udev);
> >  			conf = get_multipath_config();
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 8/9] libmutipath: continue to use old state on PATH_PENDING
  2019-02-26 10:12   ` Martin Wilck
@ 2019-02-26 20:37     ` Benjamin Marzinski
  0 siblings, 0 replies; 22+ messages in thread
From: Benjamin Marzinski @ 2019-02-26 20:37 UTC (permalink / raw)
  To: Martin Wilck; +Cc: device-mapper development

On Tue, Feb 26, 2019 at 11:12:47AM +0100, Martin Wilck wrote:
> On Fri, 2019-02-22 at 10:58 -0600, Benjamin Marzinski wrote:
> > When pathinfo() sets pp->state to PATH_PENDING, it can cause problems
> > with path checking.  It should act more like check_path(). When
> > check_path() sees a new state of PATH_PENDING, it doesn't update the
> > path state at all, so a path's old state is normally never
> > PATH_PENDING.
> > 
> > As and example of the problems of setting a path to PATH_PENDING, If
> > check_path() sets a path's state to PATH_UP, then a call to
> > pathinfo()
> > sets the state to PATH_PENDING, and then another call the
> > check_path()
> > sets the state to PATH_DOWN, multipathd won't fail the path in the
> > kernel. 
> 
> I can see what you mean, but I'm unsure how this example would come to
> pass in practice (AFAICS, pathinfo(DI_CHECKER) is only called during
> path discovery / reconfigure(), or when new paths are detected /
> added).
> 

This can happen when you add or remove paths, if something goes wrong
and you don't actually do the reload. For instance, if you call
ev_add_path(), it calls adopt_paths(). If adopt_paths() fails, some of
the paths may have had their state updated by a call to pathinfo().

> > Also, if a path's state is PATH_PENDING, and nr_active is
> > recalculated, that path will count as down, even if the state was
> > previously PATH_UP.
> 
> Ah, this is how you deal with what I remarked on the second hunk of
> patch 6/9. This is smart.
> 
> >  If a path already has a state of PATH_WILD or
> > PATH_UNCHECKED, changing it to PATH_PENDING won't hurt anything, and
> > it
> > will help anyone who sees it know what's actually happening. But
> > otherwise, pathinfo() should leave the previous state alone.
> > 
> > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > ---
> >  libmultipath/discovery.c | 7 +++++--
> >  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> My gut feeling is that we should distinguish between the case where
> PATH_PENDING is set in path_offline() (sysfs state "quiesce",
> "blocked", "pending" etc.), and the case where the async checker is
> simply not finished yet. The first case does tell us something about
> the path (being temporarily inaccessible), whereas the second basically
> just says that we're ignorant.
> 
> Anyway, that's future work, so:
> 
> Reviewed-by: Martin Wilck <mwilck@suse.com>
> 
> 
> 
> > 
> > diff --git a/libmultipath/discovery.c b/libmultipath/discovery.c
> > index 729bcb9..d3585f9 100644
> > --- a/libmultipath/discovery.c
> > +++ b/libmultipath/discovery.c
> > @@ -1946,8 +1946,11 @@ int pathinfo(struct path *pp, struct config
> > *conf, int mask)
> >  
> >  	if (mask & DI_CHECKER) {
> >  		if (path_state == PATH_UP) {
> > -			pp->chkrstate = pp->state = get_state(pp, conf,
> > 0,
> > -							      path_stat
> > e);
> > +			int newstate = get_state(pp, conf, 0,
> > path_state);
> > +			if (newstate != PATH_PENDING ||
> > +			    pp->state == PATH_UNCHECKED ||
> > +			    pp->state == PATH_WILD)
> > +				pp->chkrstate = pp->state = newstate;
> >  			if (pp->state == PATH_TIMEOUT)
> >  				pp->state = PATH_DOWN;
> >  			if (pp->state == PATH_UP && !pp->size) {
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 9/9] multipathd: use update_path_groups instead of reload_map
  2019-02-26 10:47   ` Martin Wilck
@ 2019-02-26 22:32     ` Benjamin Marzinski
  2019-02-27 10:22       ` Martin Wilck
  0 siblings, 1 reply; 22+ messages in thread
From: Benjamin Marzinski @ 2019-02-26 22:32 UTC (permalink / raw)
  To: Martin Wilck; +Cc: device-mapper development

On Tue, Feb 26, 2019 at 11:47:29AM +0100, Martin Wilck wrote:
> On Fri, 2019-02-22 at 10:58 -0600, Benjamin Marzinski wrote:
> > reload_map() doesn't do the work to sync the state after reloading
> > the
> > map.  Instead of calling it directly, cli_reload() and
> > uev_update_path()
> > should call update_path_groups(), which calls reload_map() with all
> > the
> > necessary syncing.
> > 
> > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > ---
> >  multipathd/cli_handlers.c |  2 +-
> >  multipathd/main.c         | 13 ++++++++-----
> >  multipathd/main.h         |  2 ++
> >  3 files changed, 11 insertions(+), 6 deletions(-)
> 
> I can see that this makes some sense for the cli_reload() path (see
> below). But for uev_update_path(), I'm not sure. This is the code path
> where a paths's "ro" attribute changes. With this change, you'll call
> update_multipath_strings() for every uevent that arrives, causing a lot
> of extra work on every uevent. This may hurt, in particular if lots of
> uevents arrive at the same time. I fail to see what you gain by doing
> the extra work. If path states change, either dm events or uevents for
> the other paths in the map will likely arrive soon and cause the other
> path's states to be fixed, or check_path() will detect the state
> changes and fix the DM states eventually. No?

Fair enough. The one issue is that whenever we reload a map, all path
states in the kernel are reset to active. We need to call
sync_map_state() to reset them to the correct state.  Obviously, an
dm event will come in soonish, but if there is a backlog of stuff for
multipathd to do, those down paths might be reactivated for a while
before we get that. And if the kernel uses them, it will fail them and
generate even more events for multipathd to deal with.

Also, we wouldn't call update_path_groups() in uev_update_path() very
often. Every time that we do, you will see this message

condlog(2, "%s: update path write_protect to '%d' (uevent)",
uev->kernel, ro);

That certainly doesn't happen on every change event for a path. right?

 
> For cli_reload() it'd actually be the question if we should re-
> calculate priorities and path groups in multipathd before calling
> update_path_groups() / reload_map().

Sure.  This isn't a horribly important patch. It isn't fixing any issue
that I've seen. I just noticed that these are the only code paths I
could see were we reload the map, but don't call setup_multipath() and
sync_map_state(), to make sure all the states are in sync.

> Regards,
> Martin
> 
> 
> 
> > 
> > diff --git a/multipathd/cli_handlers.c b/multipathd/cli_handlers.c
> > index f95813e..60e17d6 100644
> > --- a/multipathd/cli_handlers.c
> > +++ b/multipathd/cli_handlers.c
> > @@ -877,7 +877,7 @@ cli_reload(void *v, char **reply, int *len, void
> > *data)
> >  		return 1;
> >  	}
> >  
> > -	return reload_map(vecs, mpp, 0, 1);
> > +	return update_path_groups(mpp, vecs, 0);
> >  }
> >  
> >  int resize_map(struct multipath *mpp, unsigned long long size,
> > diff --git a/multipathd/main.c b/multipathd/main.c
> > index 81ad6c0..27ff186 100644
> > --- a/multipathd/main.c
> > +++ b/multipathd/main.c
> > @@ -1272,10 +1272,13 @@ uev_update_path (struct uevent *uev, struct
> > vectors * vecs)
> >  			else {
> >  				if (ro == 1)
> >  					pp->mpp->force_readonly = 1;
> > -				retval = reload_map(vecs, mpp, 0, 1);
> > -				pp->mpp->force_readonly = 0;
> > -				condlog(2, "%s: map %s reloaded (retval
> > %d)",
> > -					uev->kernel, mpp->alias,
> > retval);
> > +				retval = update_path_groups(mpp, vecs,
> > 0);
> > +				if (retval == 2)
> > +					condlog(2, "%s: map removed
> > during reload", pp->dev);
> > +				else {
> > +					pp->mpp->force_readonly = 0;
> > +					condlog(2, "%s: map %s reloaded
> > (retval %d)", uev->kernel, mpp->alias, retval);
> > +				}
> >  			}
> >  		}
> >  	}
> > @@ -1831,7 +1834,7 @@ int update_path_groups(struct multipath *mpp,
> > struct vectors *vecs, int refresh)
> >  
> >  	dm_lib_release();
> >  	if (setup_multipath(vecs, mpp) != 0)
> > -		return 1;
> > +		return 2;
> >  	sync_map_state(mpp);
> >  
> >  	return 0;
> > diff --git a/multipathd/main.h b/multipathd/main.h
> > index 8fd426b..e5c1398 100644
> > --- a/multipathd/main.h
> > +++ b/multipathd/main.h
> > @@ -43,5 +43,7 @@ int __setup_multipath (struct vectors * vecs,
> > struct multipath * mpp,
> >  		       int reset);
> >  #define setup_multipath(vecs, mpp) __setup_multipath(vecs, mpp, 1)
> >  int update_multipath (struct vectors *vecs, char *mapname, int
> > reset);
> > +int update_path_groups(struct multipath *mpp, struct vectors *vecs,
> > +		       int refresh);
> >  
> >  #endif /* MAIN_H */
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 9/9] multipathd: use update_path_groups instead of reload_map
  2019-02-26 22:32     ` Benjamin Marzinski
@ 2019-02-27 10:22       ` Martin Wilck
  0 siblings, 0 replies; 22+ messages in thread
From: Martin Wilck @ 2019-02-27 10:22 UTC (permalink / raw)
  To: Benjamin Marzinski; +Cc: development, device-mapper

On Tue, 2019-02-26 at 16:32 -0600, Benjamin Marzinski wrote:
> On Tue, Feb 26, 2019 at 11:47:29AM +0100, Martin Wilck wrote:
> > On Fri, 2019-02-22 at 10:58 -0600, Benjamin Marzinski wrote:
> > > reload_map() doesn't do the work to sync the state after
> > > reloading
> > > the
> > > map.  Instead of calling it directly, cli_reload() and
> > > uev_update_path()
> > > should call update_path_groups(), which calls reload_map() with
> > > all
> > > the
> > > necessary syncing.
> > > 
> > > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > > ---
> > >  multipathd/cli_handlers.c |  2 +-
> > >  multipathd/main.c         | 13 ++++++++-----
> > >  multipathd/main.h         |  2 ++
> > >  3 files changed, 11 insertions(+), 6 deletions(-)
> > 
> > I can see that this makes some sense for the cli_reload() path (see
> > below). But for uev_update_path(), I'm not sure. This is the code
> > path
> > where a paths's "ro" attribute changes. With this change, you'll
> > call
> > update_multipath_strings() for every uevent that arrives, causing a
> > lot
> > of extra work on every uevent. This may hurt, in particular if lots
> > of
> > uevents arrive at the same time. I fail to see what you gain by
> > doing
> > the extra work. If path states change, either dm events or uevents
> > for
> > the other paths in the map will likely arrive soon and cause the
> > other
> > path's states to be fixed, or check_path() will detect the state
> > changes and fix the DM states eventually. No?
> 
> Fair enough. The one issue is that whenever we reload a map, all path
> states in the kernel are reset to active. We need to call
> sync_map_state() to reset them to the correct state.

Right. That was the point I missed, then. You convinced me. So:

Reviewed-by: Martin Wilck <mwilck@suse.com>

I assume you'll resend the series because of 7/9.

>   Obviously, an
> dm event will come in soonish, but if there is a backlog of stuff for
> multipathd to do, those down paths might be reactivated for a while
> before we get that. And if the kernel uses them, it will fail them
> and
> generate even more events for multipathd to deal with.
> 
> Also, we wouldn't call update_path_groups() in uev_update_path() very
> often. Every time that we do, you will see this message
> 
> condlog(2, "%s: update path write_protect to '%d' (uevent)",
> uev->kernel, ro);
> 
> That certainly doesn't happen on every change event for a path.
> right?

Sure, this is only the "DISK_RO=1" path which is rarely executed.

I have a general issue with the fact that the handling of path uevents
may cause lots of redundant actions in multipathd, but it's unrelated
to your patch.

> > For cli_reload() it'd actually be the question if we should re-
> > calculate priorities and path groups in multipathd before calling
> > update_path_groups() / reload_map().
> 

Correcting myself: Reviewing code history, cli_reload() has never done
this ("refresh" parameter to reload_map() was always 0), so changing
that behavior would require a broader discussion.

> Sure.  This isn't a horribly important patch. It isn't fixing any
> issue
> that I've seen. I just noticed that these are the only code paths I
> could see were we reload the map, but don't call setup_multipath()
> and
> sync_map_state(), to make sure all the states are in sync.

Here's a nitpick: if we replace all calls of reload_map() by
update_path_groups(), we might as well pull the functionality of
update_path_groups() into reload_map() altogether. I'd prefer that,
because the function name reload_map() is more descriptive of what the
code actually does.

And while we're at it: the "refresh" parameter of reload_map() (telling
wherer path priorities should be updated) is only set when called via
check_path() / update_path_groups(), in the (!new_path_up) case
(because if new_path_up is set, priorities have already been updated).
Code readability would be improved by pulling the prio update code into
check_path(), and removing the refresh parameter from reload_map().

But both of these nits are future work. No need to add this into the
current patch set.

Best,
Martin

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2019-02-27 10:22 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-22 16:58 [PATCH 0/9] Misc Multipath patches Benjamin Marzinski
2019-02-22 16:58 ` [PATCH 1/9] libmultipath: disable user_friendly_names for NetApp Benjamin Marzinski
2019-02-26  8:43   ` Martin Wilck
2019-02-22 16:58 ` [PATCH 2/9] libmultipath: handle existing paths in marginal_path enqueue Benjamin Marzinski
2019-02-22 16:58 ` [PATCH 3/9] multipathd: cleanup marginal paths checking timers Benjamin Marzinski
2019-02-22 16:58 ` [PATCH 4/9] libmultipath: fix marginal paths queueing errors Benjamin Marzinski
2019-02-22 16:58 ` [PATCH 5/9] libmultipath: fix marginal_paths nr_active check Benjamin Marzinski
2019-02-22 16:58 ` [PATCH 6/9] multipathd: Fix miscounting active paths Benjamin Marzinski
2019-02-26  9:15   ` Martin Wilck
2019-02-26 10:16     ` Martin Wilck
2019-02-22 16:58 ` [PATCH 7/9] multipathd: ignore failed wwid recheck Benjamin Marzinski
2019-02-26  9:42   ` Martin Wilck
2019-02-26 20:29     ` Benjamin Marzinski
2019-02-22 16:58 ` [PATCH 8/9] libmutipath: continue to use old state on PATH_PENDING Benjamin Marzinski
2019-02-26 10:12   ` Martin Wilck
2019-02-26 20:37     ` Benjamin Marzinski
2019-02-22 16:58 ` [PATCH 9/9] multipathd: use update_path_groups instead of reload_map Benjamin Marzinski
2019-02-26 10:47   ` Martin Wilck
2019-02-26 22:32     ` Benjamin Marzinski
2019-02-27 10:22       ` Martin Wilck
2019-02-26 10:50 ` [PATCH 0/9] Misc Multipath patches Martin Wilck
2019-02-26 15:10   ` Martin Wilck

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.