All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 00/12] Misc Multipath patches
@ 2019-03-08 23:11 Benjamin Marzinski
  2019-03-08 23:11 ` [PATCH v2 01/12] libmultipath: disable user_friendly_names for NetApp Benjamin Marzinski
                   ` (11 more replies)
  0 siblings, 12 replies; 25+ messages in thread
From: Benjamin Marzinski @ 2019-03-08 23:11 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

The series is a mix of resends an new patches.

Patches 0001-0005 are simply resends of patches I've submitted earlier,
with no changes other that adding Reviewed-by's where appropriate.

Patches 0006-0009 are the result of running into some bugs during
firmare updates on an array.

Changes in v2:

0007:	simply fixes the issue with multipathd not resetting the wwid
	when get_uid() fails, and makes scsi_uid_fallback return the
	correct value, when it is still waiting for the retriggers
	to max out. The Changes that Martin suggested are dealt with
	in patch 0012

0010:	This patch adds some missing options to the multipath.conf(5)
	man page, including "user_friendly_names" like Martin metioned
	in his earlier review of patch 0001.

0011:	This patch adds a fallback method for nvme devices, to grab the
	wwid directly from sysfs.

0012:	I kept this patch seperate from 0007, because I'm not sure that
	it is the right thing to do. First, it's not very hard to have
	get_uid() fail. For a scsi device, all that needs to happen is
	for a change event to occur while the path is down and cannot be
	opened. scsi_id will fail, and ID_SERIAL will not be set. This
	seems far more likely than a device changing wwids because it
	switched LUNs.

	If we fail the device for getting a blank wwid, we then have to
	get the new wwid before we can decide what needs to be done to
	the path.  There are two options. One is to trigger more
	uevents, so that multipathd gets a uevent with the updated
	uid_attribute. This assumes that the original problem wasn't
	udev timing out because it was overloaded. If that was the
	problem, adding more uevents just makes things work. Even
	still, this means that once the new wwid is available, we have
	to run check_path, then wait for the uevent, and then run
	check_path again before we can restore the path. For these
	reasons, I didn't go down this route.

	The other way to get the new wwid is to run the fallback
	methods. This doesn't require multipathd to get a new uevent
	with the updated wwid, but it does do something that worries
	me.  When we run the fallback methods, we clear uid_attribute
	for the path. We do this because, presumably, we can't be
	certain that the fallback method and the uid_attribute method
	will return the same wwid. If this is really a possiblity
	then the fallback methods aren't any help at all. I don't
	know of any cases where this is true, which is why I'm
	submitting the patch. But I still not sure that its not
	better simply to assume that if get_uid() fails, that it
	did so for some mundane reason, and simply keep the old
	wwid.

Benjamin Marzinski (12):
  libmultipath: disable user_friendly_names for NetApp
  libmultipath: handle existing paths in marginal_path enqueue
  multipathd: cleanup marginal paths checking timers
  libmultipath: fix marginal paths queueing errors
  libmultipath: fix marginal_paths nr_active check
  multipathd: Fix miscounting active paths
  multipathd: ignore failed wwid recheck
  libmutipath: continue to use old state on PATH_PENDING
  multipathd: use update_path_groups instead of reload_map
  multipath.conf: add missing options to man page
  libmultipath: add get_uid fallback code for NVMe devices
  multipathd: change failed get_uid handling

 libmultipath/discovery.c   | 73 +++++++++++++++++++++++++-------------
 libmultipath/hwtable.c     |  1 +
 libmultipath/io_err_stat.c | 71 +++++++++++++++---------------------
 libmultipath/io_err_stat.h |  2 +-
 libmultipath/structs.h     |  6 ++++
 multipath/main.c           |  2 +-
 multipath/multipath.conf.5 |  8 +++++
 multipathd/cli_handlers.c  |  2 +-
 multipathd/main.c          | 50 ++++++++++++++++++--------
 multipathd/main.h          |  2 ++
 10 files changed, 133 insertions(+), 84 deletions(-)

-- 
2.17.2

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v2 01/12] libmultipath: disable user_friendly_names for NetApp
  2019-03-08 23:11 [PATCH v2 00/12] Misc Multipath patches Benjamin Marzinski
@ 2019-03-08 23:11 ` Benjamin Marzinski
  2019-03-17 15:04   ` Xose Vazquez Perez
  2019-03-08 23:11 ` [PATCH v2 02/12] libmultipath: handle existing paths in marginal_path enqueue Benjamin Marzinski
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 25+ messages in thread
From: Benjamin Marzinski @ 2019-03-08 23:11 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

NetApp has tools that rely on devices using WWID names. To avoid
breaking these, NetApp devices should continue to use WWID names, even
if the default config is set to enable user_friendly_names. If users
want to use user_friendly_names on NetApp devices, the must specifically
override the device config.

Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/hwtable.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libmultipath/hwtable.c b/libmultipath/hwtable.c
index d3a8d9b..8776411 100644
--- a/libmultipath/hwtable.c
+++ b/libmultipath/hwtable.c
@@ -719,6 +719,7 @@ static struct hwentry default_hw[] = {
 		.flush_on_last_del = FLUSH_ENABLED,
 		.dev_loss      = MAX_DEV_LOSS_TMO,
 		.prio_name     = PRIO_ONTAP,
+		.user_friendly_names = USER_FRIENDLY_NAMES_OFF,
 	},
 	{
 		/*
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 02/12] libmultipath: handle existing paths in marginal_path enqueue
  2019-03-08 23:11 [PATCH v2 00/12] Misc Multipath patches Benjamin Marzinski
  2019-03-08 23:11 ` [PATCH v2 01/12] libmultipath: disable user_friendly_names for NetApp Benjamin Marzinski
@ 2019-03-08 23:11 ` Benjamin Marzinski
  2019-03-08 23:11 ` [PATCH v2 03/12] multipathd: cleanup marginal paths checking timers Benjamin Marzinski
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 25+ messages in thread
From: Benjamin Marzinski @ 2019-03-08 23:11 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

If the path that enqueue_io_err_stat_by_path() is trying to add
is already on the list, just return success. There's no reason
to fail in this case.

Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/io_err_stat.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/libmultipath/io_err_stat.c b/libmultipath/io_err_stat.c
index 02b1453..1cb3ffe 100644
--- a/libmultipath/io_err_stat.c
+++ b/libmultipath/io_err_stat.c
@@ -254,7 +254,6 @@ static void free_io_err_pathvec(struct io_err_stat_pathvec *p)
  * return value
  * 0: enqueue OK
  * 1: fails because of internal error
- * 2: fails because of existing already
  */
 static int enqueue_io_err_stat_by_path(struct path *path)
 {
@@ -264,7 +263,7 @@ static int enqueue_io_err_stat_by_path(struct path *path)
 	p = find_err_path_by_dev(paths->pathvec, path->dev);
 	if (p) {
 		pthread_mutex_unlock(&paths->mutex);
-		return 2;
+		return 0;
 	}
 	pthread_mutex_unlock(&paths->mutex);
 
@@ -418,9 +417,8 @@ int hit_io_err_recheck_time(struct path *pp)
 			io_err_stat_log(3, "%s: enqueue fails, to recover",
 					pp->dev);
 			goto recover;
-		} else if (!r) {
+		} else
 			pp->io_err_pathfail_cnt = PATH_IO_ERR_IN_CHECKING;
-		}
 	}
 
 	return 1;
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 03/12] multipathd: cleanup marginal paths checking timers
  2019-03-08 23:11 [PATCH v2 00/12] Misc Multipath patches Benjamin Marzinski
  2019-03-08 23:11 ` [PATCH v2 01/12] libmultipath: disable user_friendly_names for NetApp Benjamin Marzinski
  2019-03-08 23:11 ` [PATCH v2 02/12] libmultipath: handle existing paths in marginal_path enqueue Benjamin Marzinski
@ 2019-03-08 23:11 ` Benjamin Marzinski
  2019-03-08 23:11 ` [PATCH v2 04/12] libmultipath: fix marginal paths queueing errors Benjamin Marzinski
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 25+ messages in thread
From: Benjamin Marzinski @ 2019-03-08 23:11 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

When a path gets recovered in hit_io_err_recheck_time(), it will
continue running in check_path(), so there is no reason to schedule
another path check as soon as possible (since one is currently
happening).

Also, there isn't much point in restarting the io err stat checking when
the path is down, so hit_io_err_recheck_time() should only be run when
the path is up. Downed marginal paths can be treated just like any other
downed path.

Finally, there is no reason to set reset pp->io_err_dis_reinstate_time
when we decide to enqueue a path. Either th enqueue will fail and the
path will get recovered, or it will succeed, and we won't check the
reinstate time again until poll_io_err_stat() marks the path as needing
a requeue.

Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/io_err_stat.c | 8 --------
 multipathd/main.c          | 3 ++-
 2 files changed, 2 insertions(+), 9 deletions(-)

diff --git a/libmultipath/io_err_stat.c b/libmultipath/io_err_stat.c
index 1cb3ffe..416e13a 100644
--- a/libmultipath/io_err_stat.c
+++ b/libmultipath/io_err_stat.c
@@ -400,13 +400,6 @@ int hit_io_err_recheck_time(struct path *pp)
 		io_err_stat_log(4, "%s: reschedule checking after %d seconds",
 				pp->dev,
 				pp->mpp->marginal_path_err_recheck_gap_time);
-		/*
-		 * to reschedule io error checking again
-		 * if the path is good enough, we claim it is good
-		 * and can be reinsated as soon as possible in the
-		 * check_path routine.
-		 */
-		pp->io_err_dis_reinstate_time = curr_time.tv_sec;
 		r = enqueue_io_err_stat_by_path(pp);
 		/*
 		 * Enqueue fails because of internal error.
@@ -426,7 +419,6 @@ int hit_io_err_recheck_time(struct path *pp)
 recover:
 	pp->io_err_pathfail_cnt = 0;
 	pp->io_err_disable_reinstate = 0;
-	pp->tick = 1;
 	return 0;
 }
 
diff --git a/multipathd/main.c b/multipathd/main.c
index fb520b6..fe6d8ef 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -2079,7 +2079,8 @@ check_path (struct vectors * vecs, struct path * pp, int ticks)
 		return 1;
 	}
 
-	if (pp->io_err_disable_reinstate && hit_io_err_recheck_time(pp)) {
+	if ((newstate == PATH_UP || newstate == PATH_GHOST) &&
+	    pp->io_err_disable_reinstate && hit_io_err_recheck_time(pp)) {
 		pp->state = PATH_SHAKY;
 		/*
 		 * to reschedule as soon as possible,so that this path can
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 04/12] libmultipath: fix marginal paths queueing errors
  2019-03-08 23:11 [PATCH v2 00/12] Misc Multipath patches Benjamin Marzinski
                   ` (2 preceding siblings ...)
  2019-03-08 23:11 ` [PATCH v2 03/12] multipathd: cleanup marginal paths checking timers Benjamin Marzinski
@ 2019-03-08 23:11 ` Benjamin Marzinski
  2019-03-08 23:11 ` [PATCH v2 05/12] libmultipath: fix marginal_paths nr_active check Benjamin Marzinski
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 25+ messages in thread
From: Benjamin Marzinski @ 2019-03-08 23:11 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

The current marginal paths code tries to enqueue paths for io error
checking when multipathd receives a uevent on path failure. This can run
into a couple of problems. First, this uevent could happen before or
after multipathd actually fails the path, so simply checking nr_active
doesn't tell us if this is the last active path. Also, The code to fail
the path in enqueue_io_err_stat_by_path() doesn't ever update the path
state. This can cause the path to get failed twice, temporarily leading
to incorrect nr_active counts. Further, The point when multipathd should
care if this is the last active path is when the path has come up again,
not when it goes down. Lastly, if the path is down, it is often
impossible to open the path device, causing setup_directio_ctx() to
fail, which causes multipathd to skip io error checking and mark the
path as not marginal.

Instead, multipathd should just make sure that if the path is marginal,
it gets failed in the uevent, so as not to race with the checkerloop
thread. Then, when the path comes back up, check_path() can enqueue it,
just like it does for paths that need to get rechecked. To make it
obvious that the state PATH_IO_ERR_IN_POLLING_RECHECK and the function
hit_io_err_recheck_time() now apply to paths waiting to be enqueued for
the first time as well, I've also changed their names to
PATH_IO_ERR_WAITING_TO_CHECK and need_io_err_check(), respectively.

Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/io_err_stat.c | 55 +++++++++++++++++---------------------
 libmultipath/io_err_stat.h |  2 +-
 multipathd/main.c          |  2 +-
 3 files changed, 27 insertions(+), 32 deletions(-)

diff --git a/libmultipath/io_err_stat.c b/libmultipath/io_err_stat.c
index 416e13a..72aacf3 100644
--- a/libmultipath/io_err_stat.c
+++ b/libmultipath/io_err_stat.c
@@ -41,7 +41,7 @@
 #define CONCUR_NR_EVENT			32
 
 #define PATH_IO_ERR_IN_CHECKING		-1
-#define PATH_IO_ERR_IN_POLLING_RECHECK	-2
+#define PATH_IO_ERR_WAITING_TO_CHECK	-2
 
 #define io_err_stat_log(prio, fmt, args...) \
 	condlog(prio, "io error statistic: " fmt, ##args)
@@ -283,24 +283,6 @@ static int enqueue_io_err_stat_by_path(struct path *path)
 	vector_set_slot(paths->pathvec, p);
 	pthread_mutex_unlock(&paths->mutex);
 
-	if (!path->io_err_disable_reinstate) {
-		/*
-		 *fail the path in the kernel for the time of the to make
-		 *the test more reliable
-		 */
-		io_err_stat_log(3, "%s: fail dm path %s before checking",
-				path->mpp->alias, path->dev);
-		path->io_err_disable_reinstate = 1;
-		dm_fail_path(path->mpp->alias, path->dev_t);
-		update_queue_mode_del_path(path->mpp);
-
-		/*
-		 * schedule path check as soon as possible to
-		 * update path state to delayed state
-		 */
-		path->tick = 1;
-
-	}
 	io_err_stat_log(2, "%s: enqueue path %s to check",
 			path->mpp->alias, path->dev);
 	return 0;
@@ -317,7 +299,6 @@ free_ioerr_path:
 int io_err_stat_handle_pathfail(struct path *path)
 {
 	struct timespec curr_time;
-	int res;
 
 	if (uatomic_read(&io_err_thread_running) == 0)
 		return 1;
@@ -332,8 +313,6 @@ int io_err_stat_handle_pathfail(struct path *path)
 
 	if (!path->mpp)
 		return 1;
-	if (path->mpp->nr_active <= 1)
-		return 1;
 	if (path->mpp->marginal_path_double_failed_time <= 0 ||
 		path->mpp->marginal_path_err_sample_time <= 0 ||
 		path->mpp->marginal_path_err_recheck_gap_time <= 0 ||
@@ -371,17 +350,33 @@ int io_err_stat_handle_pathfail(struct path *path)
 	}
 	path->io_err_pathfail_cnt++;
 	if (path->io_err_pathfail_cnt >= FLAKY_PATHFAIL_THRESHOLD) {
-		res = enqueue_io_err_stat_by_path(path);
-		if (!res)
-			path->io_err_pathfail_cnt = PATH_IO_ERR_IN_CHECKING;
-		else
-			path->io_err_pathfail_cnt = 0;
+		path->io_err_disable_reinstate = 1;
+		path->io_err_pathfail_cnt = PATH_IO_ERR_WAITING_TO_CHECK;
+		/* enqueue path as soon as it comes up */
+		path->io_err_dis_reinstate_time = 0;
+		if (path->state != PATH_DOWN) {
+			struct config *conf;
+			int oldstate = path->state;
+			int checkint;
+
+			conf = get_multipath_config();
+			checkint = conf->checkint;
+			put_multipath_config(conf);
+			io_err_stat_log(2, "%s: mark as failed", path->dev);
+			path->mpp->stat_path_failures++;
+			path->state = PATH_DOWN;
+			path->dmstate = PSTATE_FAILED;
+			if (oldstate == PATH_UP || oldstate == PATH_GHOST)
+				update_queue_mode_del_path(path->mpp);
+			if (path->tick > checkint)
+				path->tick = checkint;
+		}
 	}
 
 	return 0;
 }
 
-int hit_io_err_recheck_time(struct path *pp)
+int need_io_err_check(struct path *pp)
 {
 	struct timespec curr_time;
 	int r;
@@ -392,7 +387,7 @@ int hit_io_err_recheck_time(struct path *pp)
 		io_err_stat_log(2, "%s: recover path early", pp->dev);
 		goto recover;
 	}
-	if (pp->io_err_pathfail_cnt != PATH_IO_ERR_IN_POLLING_RECHECK)
+	if (pp->io_err_pathfail_cnt != PATH_IO_ERR_WAITING_TO_CHECK)
 		return 1;
 	if (clock_gettime(CLOCK_MONOTONIC, &curr_time) != 0 ||
 	    (curr_time.tv_sec - pp->io_err_dis_reinstate_time) >
@@ -489,7 +484,7 @@ static int poll_io_err_stat(struct vectors *vecs, struct io_err_stat_path *pp)
 	} else if (path->mpp && path->mpp->nr_active > 1) {
 		io_err_stat_log(3, "%s: keep failing the dm path %s",
 				path->mpp->alias, path->dev);
-		path->io_err_pathfail_cnt = PATH_IO_ERR_IN_POLLING_RECHECK;
+		path->io_err_pathfail_cnt = PATH_IO_ERR_WAITING_TO_CHECK;
 		path->io_err_disable_reinstate = 1;
 		path->io_err_dis_reinstate_time = currtime.tv_sec;
 		io_err_stat_log(3, "%s: disable reinstating of %s",
diff --git a/libmultipath/io_err_stat.h b/libmultipath/io_err_stat.h
index bbf31b4..53d6d7d 100644
--- a/libmultipath/io_err_stat.h
+++ b/libmultipath/io_err_stat.h
@@ -10,6 +10,6 @@ extern pthread_attr_t io_err_stat_attr;
 int start_io_err_stat_thread(void *data);
 void stop_io_err_stat_thread(void);
 int io_err_stat_handle_pathfail(struct path *path);
-int hit_io_err_recheck_time(struct path *pp);
+int need_io_err_check(struct path *pp);
 
 #endif /* _IO_ERR_STAT_H */
diff --git a/multipathd/main.c b/multipathd/main.c
index fe6d8ef..43830e8 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -2080,7 +2080,7 @@ check_path (struct vectors * vecs, struct path * pp, int ticks)
 	}
 
 	if ((newstate == PATH_UP || newstate == PATH_GHOST) &&
-	    pp->io_err_disable_reinstate && hit_io_err_recheck_time(pp)) {
+	    pp->io_err_disable_reinstate && need_io_err_check(pp)) {
 		pp->state = PATH_SHAKY;
 		/*
 		 * to reschedule as soon as possible,so that this path can
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 05/12] libmultipath: fix marginal_paths nr_active check
  2019-03-08 23:11 [PATCH v2 00/12] Misc Multipath patches Benjamin Marzinski
                   ` (3 preceding siblings ...)
  2019-03-08 23:11 ` [PATCH v2 04/12] libmultipath: fix marginal paths queueing errors Benjamin Marzinski
@ 2019-03-08 23:11 ` Benjamin Marzinski
  2019-03-08 23:11 ` [PATCH v2 06/12] multipathd: Fix miscounting active paths Benjamin Marzinski
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 25+ messages in thread
From: Benjamin Marzinski @ 2019-03-08 23:11 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

Marginal paths are SHAKY, so they don't count towards the number of
active paths. poll_io_err_stat() shouldn't automatically reinstate a
marginal path if there already is an active path.

Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/io_err_stat.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libmultipath/io_err_stat.c b/libmultipath/io_err_stat.c
index 72aacf3..554b777 100644
--- a/libmultipath/io_err_stat.c
+++ b/libmultipath/io_err_stat.c
@@ -481,7 +481,7 @@ static int poll_io_err_stat(struct vectors *vecs, struct io_err_stat_path *pp)
 		 */
 		path->tick = 1;
 
-	} else if (path->mpp && path->mpp->nr_active > 1) {
+	} else if (path->mpp && path->mpp->nr_active > 0) {
 		io_err_stat_log(3, "%s: keep failing the dm path %s",
 				path->mpp->alias, path->dev);
 		path->io_err_pathfail_cnt = PATH_IO_ERR_WAITING_TO_CHECK;
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 06/12] multipathd: Fix miscounting active paths
  2019-03-08 23:11 [PATCH v2 00/12] Misc Multipath patches Benjamin Marzinski
                   ` (4 preceding siblings ...)
  2019-03-08 23:11 ` [PATCH v2 05/12] libmultipath: fix marginal_paths nr_active check Benjamin Marzinski
@ 2019-03-08 23:11 ` Benjamin Marzinski
  2019-03-08 23:12 ` [PATCH v2 07/12] multipathd: ignore failed wwid recheck Benjamin Marzinski
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 25+ messages in thread
From: Benjamin Marzinski @ 2019-03-08 23:11 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

When multipathd gets a change uevent, it calls pathinfo with DI_NOIO.
This sets the path state to the return value of path_offline(). If a
path is in the PATH_DOWN state but path_offline() returns PATH_UP, when
that path gets a change event, its state will get moved to PATH_UP
without either reinstating the path, or reloading the map.  The next
call to check_path() will move the path back to PATH_DOWN. Since
check_path() simply increments and decrements nr_active instead of
calculating it based on the actual number of active paths, nr_active
will get decremented a second time for this failed path, potentially
putting the multipath device into recovery mode.

This commit does two things to avoid this situation. It makes the
DI_NOIO flag only set pp->state in pathinfo() if DI_CHECKER is also set.
This isn't set in uev_update_path() to avoid changing the path state in
this case.  Also, to guard against pp->state getting changed in some
other code path without properly updating the map state, check_path()
now calls set_no_path_retry, which recalculates nr_active based on the
actual number of active paths, and makes sure that the queue_if_no_path
value in the features line is correct.

Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/discovery.c | 11 ++++++-----
 multipath/main.c         |  2 +-
 multipathd/main.c        |  4 +++-
 3 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/libmultipath/discovery.c b/libmultipath/discovery.c
index 10bd8cd..729bcb9 100644
--- a/libmultipath/discovery.c
+++ b/libmultipath/discovery.c
@@ -1914,11 +1914,12 @@ int pathinfo(struct path *pp, struct config *conf, int mask)
 	if (path_state == PATH_REMOVED)
 		goto blank;
 	else if (mask & DI_NOIO) {
-		/*
-		 * Avoid any IO on the device itself.
-		 * Behave like DI_CHECKER in the "path unavailable" case.
-		 */
-		pp->chkrstate = pp->state = path_state;
+		if (mask & DI_CHECKER)
+			/*
+			 * Avoid any IO on the device itself.
+			 * simply use the path_offline() return as its state
+			 */
+			pp->chkrstate = pp->state = path_state;
 		return PATHINFO_OK;
 	}
 
diff --git a/multipath/main.c b/multipath/main.c
index 5abb118..69141db 100644
--- a/multipath/main.c
+++ b/multipath/main.c
@@ -356,7 +356,7 @@ static int check_usable_paths(struct config *conf,
 			pp->udev = get_udev_device(pp->dev_t, DEV_DEVT);
 			if (pp->udev == NULL)
 				continue;
-			if (pathinfo(pp, conf, DI_SYSFS|DI_NOIO) != PATHINFO_OK)
+			if (pathinfo(pp, conf, DI_SYSFS|DI_NOIO|DI_CHECKER) != PATHINFO_OK)
 				continue;
 
 			if (pp->state == PATH_UP &&
diff --git a/multipathd/main.c b/multipathd/main.c
index 43830e8..678ecf8 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -392,7 +392,8 @@ static void set_no_path_retry(struct multipath *mpp)
 	default:
 		if (mpp->nr_active > 0) {
 			mpp->retry_tick = 0;
-			dm_queue_if_no_path(mpp->alias, 1);
+			if (!is_queueing)
+				dm_queue_if_no_path(mpp->alias, 1);
 		} else if (is_queueing && mpp->retry_tick == 0)
 			enter_recovery_mode(mpp);
 		break;
@@ -2072,6 +2073,7 @@ check_path (struct vectors * vecs, struct path * pp, int ticks)
 	/* if update_multipath_strings orphaned the path, quit early */
 	if (!pp->mpp)
 		return 0;
+	set_no_path_retry(pp->mpp);
 
 	if ((newstate == PATH_UP || newstate == PATH_GHOST) &&
 			check_path_reinstate_state(pp)) {
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 07/12] multipathd: ignore failed wwid recheck
  2019-03-08 23:11 [PATCH v2 00/12] Misc Multipath patches Benjamin Marzinski
                   ` (5 preceding siblings ...)
  2019-03-08 23:11 ` [PATCH v2 06/12] multipathd: Fix miscounting active paths Benjamin Marzinski
@ 2019-03-08 23:12 ` Benjamin Marzinski
  2019-03-15 11:48   ` Martin Wilck
  2019-03-08 23:12 ` [PATCH v2 08/12] libmutipath: continue to use old state on PATH_PENDING Benjamin Marzinski
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 25+ messages in thread
From: Benjamin Marzinski @ 2019-03-08 23:12 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

If disable_changed_wwids is set, when multipathd gets a change event on
a path, it verifies that the wwid hasn't changed in uev_update_path().
If get_uid() failed, uev_update_path treated this as a wwid change to 0.
This could cause paths to suddenly be dropped due to an issue with
getting the wwid.  Even if get_uid() failed because the path was down,
it no change uevent happend when it later became active, multipathd
would continue to ignore the path. Also, scsi_uid_fallback() clears the
failure return if it doesn't attempt to fallback, causing get_uid()
to return success, when it actually failed.

Multipathd should neither set nor clear wwid_changed if get_uid()
returned failure. Also, scsi_uid_fallback() should retain the old return
value if it doesn't attempt to fallback.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/discovery.c | 6 +++---
 multipathd/main.c        | 6 ++++--
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/libmultipath/discovery.c b/libmultipath/discovery.c
index 729bcb9..b08cb2d 100644
--- a/libmultipath/discovery.c
+++ b/libmultipath/discovery.c
@@ -1755,9 +1755,9 @@ get_vpd_uid(struct path * pp)
 }
 
 static ssize_t scsi_uid_fallback(struct path *pp, int path_state,
-			     const char **origin)
+			     const char **origin, ssize_t old_len)
 {
-	ssize_t len = 0;
+	ssize_t len = old_len;
 	int retrigger;
 	struct config *conf;
 
@@ -1828,7 +1828,7 @@ get_uid (struct path * pp, int path_state, struct udev_device *udev)
 			origin = "sysfs";
 		}
 		if (len <= 0 && pp->bus == SYSFS_BUS_SCSI)
-			len = scsi_uid_fallback(pp, path_state, &origin);
+			len = scsi_uid_fallback(pp, path_state, &origin, len);
 	}
 	if ( len < 0 ) {
 		condlog(1, "%s: failed to get %s uid: %s",
diff --git a/multipathd/main.c b/multipathd/main.c
index 678ecf8..fd83a6a 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -1234,9 +1234,11 @@ uev_update_path (struct uevent *uev, struct vectors * vecs)
 			goto out;
 
 		strcpy(wwid, pp->wwid);
-		get_uid(pp, pp->state, uev->udev);
+		rc = get_uid(pp, pp->state, uev->udev);
 
-		if (strncmp(wwid, pp->wwid, WWID_SIZE) != 0) {
+		if (rc != 0)
+			strcpy(pp->wwid, wwid);
+		else if (strncmp(wwid, pp->wwid, WWID_SIZE) != 0) {
 			condlog(0, "%s: path wwid changed from '%s' to '%s'. %s",
 				uev->kernel, wwid, pp->wwid,
 				(disable_changed_wwids ? "disallowing" :
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 08/12] libmutipath: continue to use old state on PATH_PENDING
  2019-03-08 23:11 [PATCH v2 00/12] Misc Multipath patches Benjamin Marzinski
                   ` (6 preceding siblings ...)
  2019-03-08 23:12 ` [PATCH v2 07/12] multipathd: ignore failed wwid recheck Benjamin Marzinski
@ 2019-03-08 23:12 ` Benjamin Marzinski
  2019-03-08 23:12 ` [PATCH v2 09/12] multipathd: use update_path_groups instead of reload_map Benjamin Marzinski
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 25+ messages in thread
From: Benjamin Marzinski @ 2019-03-08 23:12 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

When pathinfo() sets pp->state to PATH_PENDING, it can cause problems
with path checking.  It should act more like check_path(). When
check_path() sees a new state of PATH_PENDING, it doesn't update the
path state at all, so a path's old state is normally never PATH_PENDING.

As and example of the problems of setting a path to PATH_PENDING, If
check_path() sets a path's state to PATH_UP, then a call to pathinfo()
sets the state to PATH_PENDING, and then another call the check_path()
sets the state to PATH_DOWN, multipathd won't fail the path in the
kernel. Also, if a path's state is PATH_PENDING, and nr_active is
recalculated, that path will count as down, even if the state was
previously PATH_UP. If a path already has a state of PATH_WILD or
PATH_UNCHECKED, changing it to PATH_PENDING won't hurt anything, and it
will help anyone who sees it know what's actually happening. But
otherwise, pathinfo() should leave the previous state alone.

Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/discovery.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/libmultipath/discovery.c b/libmultipath/discovery.c
index b08cb2d..28c00e5 100644
--- a/libmultipath/discovery.c
+++ b/libmultipath/discovery.c
@@ -1946,8 +1946,11 @@ int pathinfo(struct path *pp, struct config *conf, int mask)
 
 	if (mask & DI_CHECKER) {
 		if (path_state == PATH_UP) {
-			pp->chkrstate = pp->state = get_state(pp, conf, 0,
-							      path_state);
+			int newstate = get_state(pp, conf, 0, path_state);
+			if (newstate != PATH_PENDING ||
+			    pp->state == PATH_UNCHECKED ||
+			    pp->state == PATH_WILD)
+				pp->chkrstate = pp->state = newstate;
 			if (pp->state == PATH_TIMEOUT)
 				pp->state = PATH_DOWN;
 			if (pp->state == PATH_UP && !pp->size) {
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 09/12] multipathd: use update_path_groups instead of reload_map
  2019-03-08 23:11 [PATCH v2 00/12] Misc Multipath patches Benjamin Marzinski
                   ` (7 preceding siblings ...)
  2019-03-08 23:12 ` [PATCH v2 08/12] libmutipath: continue to use old state on PATH_PENDING Benjamin Marzinski
@ 2019-03-08 23:12 ` Benjamin Marzinski
  2019-03-15 11:49   ` Martin Wilck
  2019-03-08 23:12 ` [PATCH v2 10/12] multipath.conf: add missing options to man page Benjamin Marzinski
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 25+ messages in thread
From: Benjamin Marzinski @ 2019-03-08 23:12 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

reload_map() doesn't do the work to sync the state after reloading the
map.  Instead of calling it directly, cli_reload() and uev_update_path()
should call update_path_groups(), which calls reload_map() with all the
necessary syncing.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 multipathd/cli_handlers.c |  2 +-
 multipathd/main.c         | 13 ++++++++-----
 multipathd/main.h         |  2 ++
 3 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/multipathd/cli_handlers.c b/multipathd/cli_handlers.c
index f95813e..60e17d6 100644
--- a/multipathd/cli_handlers.c
+++ b/multipathd/cli_handlers.c
@@ -877,7 +877,7 @@ cli_reload(void *v, char **reply, int *len, void *data)
 		return 1;
 	}
 
-	return reload_map(vecs, mpp, 0, 1);
+	return update_path_groups(mpp, vecs, 0);
 }
 
 int resize_map(struct multipath *mpp, unsigned long long size,
diff --git a/multipathd/main.c b/multipathd/main.c
index fd83a6a..7a317d9 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -1273,10 +1273,13 @@ uev_update_path (struct uevent *uev, struct vectors * vecs)
 			else {
 				if (ro == 1)
 					pp->mpp->force_readonly = 1;
-				retval = reload_map(vecs, mpp, 0, 1);
-				pp->mpp->force_readonly = 0;
-				condlog(2, "%s: map %s reloaded (retval %d)",
-					uev->kernel, mpp->alias, retval);
+				retval = update_path_groups(mpp, vecs, 0);
+				if (retval == 2)
+					condlog(2, "%s: map removed during reload", pp->dev);
+				else {
+					pp->mpp->force_readonly = 0;
+					condlog(2, "%s: map %s reloaded (retval %d)", uev->kernel, mpp->alias, retval);
+				}
 			}
 		}
 	}
@@ -1832,7 +1835,7 @@ int update_path_groups(struct multipath *mpp, struct vectors *vecs, int refresh)
 
 	dm_lib_release();
 	if (setup_multipath(vecs, mpp) != 0)
-		return 1;
+		return 2;
 	sync_map_state(mpp);
 
 	return 0;
diff --git a/multipathd/main.h b/multipathd/main.h
index 8fd426b..e5c1398 100644
--- a/multipathd/main.h
+++ b/multipathd/main.h
@@ -43,5 +43,7 @@ int __setup_multipath (struct vectors * vecs, struct multipath * mpp,
 		       int reset);
 #define setup_multipath(vecs, mpp) __setup_multipath(vecs, mpp, 1)
 int update_multipath (struct vectors *vecs, char *mapname, int reset);
+int update_path_groups(struct multipath *mpp, struct vectors *vecs,
+		       int refresh);
 
 #endif /* MAIN_H */
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 10/12] multipath.conf: add missing options to man page
  2019-03-08 23:11 [PATCH v2 00/12] Misc Multipath patches Benjamin Marzinski
                   ` (8 preceding siblings ...)
  2019-03-08 23:12 ` [PATCH v2 09/12] multipathd: use update_path_groups instead of reload_map Benjamin Marzinski
@ 2019-03-08 23:12 ` Benjamin Marzinski
  2019-03-15 11:49   ` Martin Wilck
  2019-03-08 23:12 ` [PATCH v2 11/12] libmultipath: add get_uid fallback code for NVMe devices Benjamin Marzinski
  2019-03-08 23:12 ` [PATCH v2 12/12] multipathd: change failed get_uid handling Benjamin Marzinski
  11 siblings, 1 reply; 25+ messages in thread
From: Benjamin Marzinski @ 2019-03-08 23:12 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 multipath/multipath.conf.5 | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/multipath/multipath.conf.5 b/multipath/multipath.conf.5
index 0fe8461..1e0714f 100644
--- a/multipath/multipath.conf.5
+++ b/multipath/multipath.conf.5
@@ -1494,6 +1494,8 @@ section:
 .TP
 .B flush_on_last_del
 .TP
+.B user_friendly_names
+.TP
 .B retain_attached_hw_handler
 .TP
 .B detect_prio
@@ -1525,6 +1527,8 @@ section:
 .B max_sectors_kb
 .TP
 .B ghost_delay
+.TP
+.B all_tg_pt
 .RE
 .PD
 .LP
@@ -1604,7 +1608,11 @@ the values are taken from the \fIdevices\fR or \fIdefaults\fR sections:
 .TP
 .B skip_kpartx
 .TP
+.B max_sectors_kb
+.TP
 .B ghost_delay
+.TP
+.B all_tg_pt
 .RE
 .PD
 .LP
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 11/12] libmultipath: add get_uid fallback code for NVMe devices
  2019-03-08 23:11 [PATCH v2 00/12] Misc Multipath patches Benjamin Marzinski
                   ` (9 preceding siblings ...)
  2019-03-08 23:12 ` [PATCH v2 10/12] multipath.conf: add missing options to man page Benjamin Marzinski
@ 2019-03-08 23:12 ` Benjamin Marzinski
  2019-03-15 11:49   ` Martin Wilck
  2019-03-08 23:12 ` [PATCH v2 12/12] multipathd: change failed get_uid handling Benjamin Marzinski
  11 siblings, 1 reply; 25+ messages in thread
From: Benjamin Marzinski @ 2019-03-08 23:12 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

If multipath can't get the uid for NVMe devices from udev, it can get it
directly from sysfs.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/discovery.c | 49 ++++++++++++++++++++++++++++------------
 1 file changed, 34 insertions(+), 15 deletions(-)

diff --git a/libmultipath/discovery.c b/libmultipath/discovery.c
index 28c00e5..bece67c 100644
--- a/libmultipath/discovery.c
+++ b/libmultipath/discovery.c
@@ -1754,8 +1754,8 @@ get_vpd_uid(struct path * pp)
 	return get_vpd_sysfs(parent, 0x83, pp->wwid, WWID_SIZE);
 }
 
-static ssize_t scsi_uid_fallback(struct path *pp, int path_state,
-			     const char **origin, ssize_t old_len)
+static ssize_t uid_fallback(struct path *pp, int path_state,
+			    const char **origin, ssize_t old_len)
 {
 	ssize_t len = old_len;
 	int retrigger;
@@ -1764,17 +1764,36 @@ static ssize_t scsi_uid_fallback(struct path *pp, int path_state,
 	conf = get_multipath_config();
 	retrigger = conf->retrigger_tries;
 	put_multipath_config(conf);
-	if (pp->retriggers >= retrigger &&
-	    !strcmp(pp->uid_attribute, DEFAULT_UID_ATTRIBUTE)) {
-		len = get_vpd_uid(pp);
-		*origin = "sysfs";
-		pp->uid_attribute = NULL;
-		if (len < 0 && path_state == PATH_UP) {
-			condlog(1, "%s: failed to get sysfs uid: %s",
-				pp->dev, strerror(-len));
-			len = get_vpd_sgio(pp->fd, 0x83, pp->wwid,
-					   WWID_SIZE);
-			*origin = "sgio";
+	if (pp->retriggers >= retrigger) {
+		if (pp->bus == SYSFS_BUS_SCSI &&
+		    !strcmp(pp->uid_attribute, DEFAULT_UID_ATTRIBUTE)) {
+			len = get_vpd_uid(pp);
+			*origin = "sysfs";
+			pp->uid_attribute = NULL;
+			if (len < 0 && path_state == PATH_UP) {
+				condlog(1, "%s: failed to get sysfs uid: %s",
+					pp->dev, strerror(-len));
+				len = get_vpd_sgio(pp->fd, 0x83, pp->wwid,
+						   WWID_SIZE);
+				*origin = "sgio";
+			}
+		} else if (pp->bus == SYSFS_BUS_NVME) {
+			char value[256];
+			len = sysfs_attr_get_value(pp->udev, "wwid", value,
+						   sizeof(value));
+			if (len <= 0)
+				return -1;
+			len = strlcpy(pp->wwid, value, WWID_SIZE);
+			if (len >= WWID_SIZE) {
+				len = fix_broken_nvme_wwid(pp, value,
+							   WWID_SIZE);
+				if (len > 0)
+					return len;
+				condlog(0, "%s: wwid overflow", pp->dev);
+				len = WWID_SIZE;
+			}
+			*origin = "sysfs";
+			pp->uid_attribute = NULL;
 		}
 	}
 	return len;
@@ -1827,8 +1846,8 @@ get_uid (struct path * pp, int path_state, struct udev_device *udev)
 			len = get_vpd_uid(pp);
 			origin = "sysfs";
 		}
-		if (len <= 0 && pp->bus == SYSFS_BUS_SCSI)
-			len = scsi_uid_fallback(pp, path_state, &origin, len);
+		if (len <= 0)
+			len = uid_fallback(pp, path_state, &origin, len);
 	}
 	if ( len < 0 ) {
 		condlog(1, "%s: failed to get %s uid: %s",
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 12/12] multipathd: change failed get_uid handling
  2019-03-08 23:11 [PATCH v2 00/12] Misc Multipath patches Benjamin Marzinski
                   ` (10 preceding siblings ...)
  2019-03-08 23:12 ` [PATCH v2 11/12] libmultipath: add get_uid fallback code for NVMe devices Benjamin Marzinski
@ 2019-03-08 23:12 ` Benjamin Marzinski
  2019-03-15 11:50   ` Martin Wilck
  11 siblings, 1 reply; 25+ messages in thread
From: Benjamin Marzinski @ 2019-03-08 23:12 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

Instead of ignoring failed get_uid() calls, multipathd now fails the
path as it originally did. However, if the result of calling get_uid()
is a blank wwid (which is always the case when it fails), multipathd now
tries to get the wwid in check_path() as well, using the uid_fallback()
methods to attempt to get a valid wwid.

Multipathd can't use the uid_attribute methods, since pp->udev still has
the old uevent information with the uid_attribute of the original wwid.
This means that the uid_attribute methods would always return the
original wwid, even if it had changed.

To make the get_uid() use the fallback methods, pathinfo now sets
pp->retriggers to the retrigger_tries once a WWID has be successfully
obtained, so that it uid_fallback() doesn't need to be called
retrigger_tries times before trying the fallback methods.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/discovery.c |  4 +++-
 libmultipath/structs.h   |  6 ++++++
 multipathd/main.c        | 36 +++++++++++++++++++++++++-----------
 3 files changed, 34 insertions(+), 12 deletions(-)

diff --git a/libmultipath/discovery.c b/libmultipath/discovery.c
index bece67c..15568ca 100644
--- a/libmultipath/discovery.c
+++ b/libmultipath/discovery.c
@@ -2018,8 +2018,10 @@ int pathinfo(struct path *pp, struct config *conf, int mask)
 		}
 	}
 
-	if ((mask & DI_ALL) == DI_ALL)
+	if ((mask & DI_ALL) == DI_ALL) {
+		pp->retriggers = conf->retrigger_tries;
 		pp->initialized = INIT_OK;
+	}
 	return PATHINFO_OK;
 
 blank:
diff --git a/libmultipath/structs.h b/libmultipath/structs.h
index b794b0d..6e4b871 100644
--- a/libmultipath/structs.h
+++ b/libmultipath/structs.h
@@ -221,6 +221,12 @@ enum all_tg_pt_states {
 	ALL_TG_PT_ON = YNU_YES,
 };
 
+enum wwid_changed_states {
+	WWID_SAME = 0,
+	WWID_ZEROED,
+	WWID_CHANGED,
+};
+
 struct sg_id {
 	int host_no;
 	int channel;
diff --git a/multipathd/main.c b/multipathd/main.c
index 7a317d9..2331c41 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -1234,27 +1234,27 @@ uev_update_path (struct uevent *uev, struct vectors * vecs)
 			goto out;
 
 		strcpy(wwid, pp->wwid);
-		rc = get_uid(pp, pp->state, uev->udev);
+		get_uid(pp, pp->state, uev->udev);
 
-		if (rc != 0)
-			strcpy(pp->wwid, wwid);
-		else if (strncmp(wwid, pp->wwid, WWID_SIZE) != 0) {
+		if (strncmp(wwid, pp->wwid, WWID_SIZE) != 0) {
+			int wwid_changed = (strlen(pp->wwid))? WWID_CHANGED :
+							       WWID_ZEROED;
 			condlog(0, "%s: path wwid changed from '%s' to '%s'. %s",
 				uev->kernel, wwid, pp->wwid,
 				(disable_changed_wwids ? "disallowing" :
 				 "continuing"));
 			strcpy(pp->wwid, wwid);
 			if (disable_changed_wwids) {
-				if (!pp->wwid_changed) {
-					pp->wwid_changed = 1;
+				if (pp->wwid_changed == WWID_SAME) {
 					pp->tick = 1;
 					if (pp->mpp)
 						dm_fail_path(pp->mpp->alias, pp->dev_t);
 				}
+				pp->wwid_changed = wwid_changed;
 				goto out;
 			}
 		} else {
-			pp->wwid_changed = 0;
+			pp->wwid_changed = WWID_SAME;
 			udev_device_unref(pp->udev);
 			pp->udev = udev_device_ref(uev->udev);
 			conf = get_multipath_config();
@@ -2017,10 +2017,24 @@ check_path (struct vectors * vecs, struct path * pp, int ticks)
 	if (newstate == PATH_REMOVED)
 		newstate = PATH_DOWN;
 
-	if (pp->wwid_changed) {
-		condlog(2, "%s: path wwid has changed. Refusing to use",
-			pp->dev);
-		newstate = PATH_DOWN;
+	if (pp->wwid_changed != WWID_SAME) {
+		if (pp->wwid_changed == WWID_ZEROED) {
+			char wwid[WWID_SIZE];
+
+			strcpy(wwid, pp->wwid);
+			get_uid(pp, newstate, NULL);
+			if (strncmp(wwid, pp->wwid, WWID_SIZE) == 0)
+				pp->wwid_changed = WWID_SAME;
+			else {
+				pp->wwid_changed = (strlen(pp->wwid))? WWID_CHANGED : WWID_ZEROED;
+				strcpy(pp->wwid, wwid);
+			}
+		}
+		if (pp->wwid_changed != WWID_SAME) {
+			condlog(2, "%s: path wwid has changed. Refusing to use",
+				pp->dev);
+			newstate = PATH_DOWN;
+		}
 	}
 
 	if (newstate == PATH_WILD || newstate == PATH_UNCHECKED) {
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 07/12] multipathd: ignore failed wwid recheck
  2019-03-08 23:12 ` [PATCH v2 07/12] multipathd: ignore failed wwid recheck Benjamin Marzinski
@ 2019-03-15 11:48   ` Martin Wilck
  2019-03-19 17:13     ` Benjamin Marzinski
  0 siblings, 1 reply; 25+ messages in thread
From: Martin Wilck @ 2019-03-15 11:48 UTC (permalink / raw)
  To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development

On Fri, 2019-03-08 at 17:12 -0600, Benjamin Marzinski wrote:
> If disable_changed_wwids is set, when multipathd gets a change event
> on
> a path, it verifies that the wwid hasn't changed in
> uev_update_path().
> If get_uid() failed, uev_update_path treated this as a wwid change to
> 0.
> This could cause paths to suddenly be dropped due to an issue with
> getting the wwid.  Even if get_uid() failed because the path was
> down,
> it no change uevent happend when it later became active, multipathd
> would continue to ignore the path. Also, scsi_uid_fallback() clears
> the
> failure return if it doesn't attempt to fallback, causing get_uid()
> to return success, when it actually failed.
> 
> Multipathd should neither set nor clear wwid_changed if get_uid()
> returned failure. Also, scsi_uid_fallback() should retain the old
> return
> value if it doesn't attempt to fallback.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
>  libmultipath/discovery.c | 6 +++---
>  multipathd/main.c        | 6 ++++--
>  2 files changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/libmultipath/discovery.c b/libmultipath/discovery.c
> index 729bcb9..b08cb2d 100644
> --- a/libmultipath/discovery.c
> +++ b/libmultipath/discovery.c
> @@ -1755,9 +1755,9 @@ get_vpd_uid(struct path * pp)
>  }
>  
>  static ssize_t scsi_uid_fallback(struct path *pp, int path_state,
> -			     const char **origin)
> +			     const char **origin, ssize_t old_len)
>  {
> -	ssize_t len = 0;
> +	ssize_t len = old_len;
>  	int retrigger;
>  	struct config *conf;

Please don't call this variable "old_len" but "errcode" or the like. If
this is called, "old_len" is always negative (indicating a previous
error) or 0 (indicating previous attempts returned an empty WWID, which
is also likely an error).

Otherwise, ACK. But you revert most of this later in 12/12; I think the
two should be merged (except for the scsi_uid_fallback part, maybe).

Martin

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 09/12] multipathd: use update_path_groups instead of reload_map
  2019-03-08 23:12 ` [PATCH v2 09/12] multipathd: use update_path_groups instead of reload_map Benjamin Marzinski
@ 2019-03-15 11:49   ` Martin Wilck
  0 siblings, 0 replies; 25+ messages in thread
From: Martin Wilck @ 2019-03-15 11:49 UTC (permalink / raw)
  To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development

On Fri, 2019-03-08 at 17:12 -0600, Benjamin Marzinski wrote:
> reload_map() doesn't do the work to sync the state after reloading
> the
> map.  Instead of calling it directly, cli_reload() and
> uev_update_path()
> should call update_path_groups(), which calls reload_map() with all
> the
> necessary syncing.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
>  multipathd/cli_handlers.c |  2 +-
>  multipathd/main.c         | 13 ++++++++-----
>  multipathd/main.h         |  2 ++
>  3 files changed, 11 insertions(+), 6 deletions(-)

Reviewed-by: Martin Wilck <mwilck@suse.com>

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 10/12] multipath.conf: add missing options to man page
  2019-03-08 23:12 ` [PATCH v2 10/12] multipath.conf: add missing options to man page Benjamin Marzinski
@ 2019-03-15 11:49   ` Martin Wilck
  2019-03-19 17:14     ` Benjamin Marzinski
  0 siblings, 1 reply; 25+ messages in thread
From: Martin Wilck @ 2019-03-15 11:49 UTC (permalink / raw)
  To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development

On Fri, 2019-03-08 at 17:12 -0600, Benjamin Marzinski wrote:
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
>  multipath/multipath.conf.5 | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 

"getuid_callout" is still missing in the devices section.
(yes, I actually checked every option :-)

Regards,
Martin

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 11/12] libmultipath: add get_uid fallback code for NVMe devices
  2019-03-08 23:12 ` [PATCH v2 11/12] libmultipath: add get_uid fallback code for NVMe devices Benjamin Marzinski
@ 2019-03-15 11:49   ` Martin Wilck
  2019-03-19 17:15     ` Benjamin Marzinski
  0 siblings, 1 reply; 25+ messages in thread
From: Martin Wilck @ 2019-03-15 11:49 UTC (permalink / raw)
  To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development

On Fri, 2019-03-08 at 17:12 -0600, Benjamin Marzinski wrote:
> If multipath can't get the uid for NVMe devices from udev, it can get
> it
> directly from sysfs.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
>  libmultipath/discovery.c | 49 ++++++++++++++++++++++++++++--------
> ----
>  1 file changed, 34 insertions(+), 15 deletions(-)

Aside the nit over the "old_len" variable name that I remarked on
07/12:

Reviewed-by: Martin Wilck <mwilck@suse.com>

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 12/12] multipathd: change failed get_uid handling
  2019-03-08 23:12 ` [PATCH v2 12/12] multipathd: change failed get_uid handling Benjamin Marzinski
@ 2019-03-15 11:50   ` Martin Wilck
  2019-03-19 17:20     ` Benjamin Marzinski
  0 siblings, 1 reply; 25+ messages in thread
From: Martin Wilck @ 2019-03-15 11:50 UTC (permalink / raw)
  To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development

On Fri, 2019-03-08 at 17:12 -0600, Benjamin Marzinski wrote:
> Instead of ignoring failed get_uid() calls, multipathd now fails the
> path as it originally did.

This patch reverts much of 07/12; I'd appreciate if you could merge the
two into one, that would make the review easier.

> However, if the result of calling get_uid()
> is a blank wwid (which is always the case when it fails), multipathd
> now
> tries to get the wwid in check_path() as well, using the
> uid_fallback()
> methods to attempt to get a valid wwid.
> 
> Multipathd can't use the uid_attribute methods, since pp->udev still
> has
> the old uevent information with the uid_attribute of the original
> wwid.

Which is actually wrong, IMO. It means that udev's (and the kernel's)
view of the device are now different from multipathd's, and will remain
so unless a new change event arrives. What bad can happen if we update
pp->udev?

Perhaps we should rather update pp->udev always, and set the
wwid_changed flag if necessary. The case wwid_changed==WWID_ZEROED
could be treated like INIT_MISSING_UDEV (it _is_ almost the same case,
actually), by retriggering uevents.

> This means that the uid_attribute methods would always return the
> original wwid, even if it had changed.
> 
> To make the get_uid() use the fallback methods, pathinfo now sets
> pp->retriggers to the retrigger_tries once a WWID has be successfully
> obtained, so that it uid_fallback() doesn't need to be called
> retrigger_tries times before trying the fallback methods.

I don't like this much, see below.


> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
>  libmultipath/discovery.c |  4 +++-
>  libmultipath/structs.h   |  6 ++++++
>  multipathd/main.c        | 36 +++++++++++++++++++++++++-----------
>  3 files changed, 34 insertions(+), 12 deletions(-)
> 
> diff --git a/libmultipath/discovery.c b/libmultipath/discovery.c
> index bece67c..15568ca 100644
> --- a/libmultipath/discovery.c
> +++ b/libmultipath/discovery.c
> @@ -2018,8 +2018,10 @@ int pathinfo(struct path *pp, struct config
> *conf, int mask)
>  		}
>  	}
>  
> -	if ((mask & DI_ALL) == DI_ALL)
> +	if ((mask & DI_ALL) == DI_ALL) {
> +		pp->retriggers = conf->retrigger_tries;
>  		pp->initialized = INIT_OK;
> +	}
>  	return PATHINFO_OK;

This abuse of retrigger_tries looks like a hack to me. You want
uid_fallback() to try and get a uid, even if retrigger_tries isn't
exhausted. So you you should check another condition besides 
(retriggers > retriggers_tries) in uid_callback().

At the very least, this would call for comments explaining the logic
in both pathinfo() and uid_fallback(). I think you'd also have to set 
retriggers = 0 when you set initialized = INIT_MISSING_UDEV
(be it only for code clarity). But I'd prefer not to use the retrigger
counter for this different condition.

> [...]
> @@ -2017,10 +2017,24 @@ check_path (struct vectors * vecs, struct
> path * pp, int ticks)
>  	if (newstate == PATH_REMOVED)
>  		newstate = PATH_DOWN;
>  
> -	if (pp->wwid_changed) {
> -		condlog(2, "%s: path wwid has changed. Refusing to
> use",
> -			pp->dev);
> -		newstate = PATH_DOWN;
> +	if (pp->wwid_changed != WWID_SAME) {
> +		if (pp->wwid_changed == WWID_ZEROED) {
> +			char wwid[WWID_SIZE];
> +
> +			strcpy(wwid, pp->wwid);
> +			get_uid(pp, newstate, NULL);
> +			if (strncmp(wwid, pp->wwid, WWID_SIZE) == 0)
> +				pp->wwid_changed = WWID_SAME;
> +			else {
> +				pp->wwid_changed = (strlen(pp->wwid))?
> WWID_CHANGED : WWID_ZEROED;
> +				strcpy(pp->wwid, wwid);
> +			}
> +		}
> +		if (pp->wwid_changed != WWID_SAME) {
> +			condlog(2, "%s: path wwid has changed. Refusing
> to use",
> +				pp->dev);

Please don't log the same condition at -v2 in every check_path()
iteration. We log the actual change at -v0 already in
uev_update_path().

Regards
Martin

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 01/12] libmultipath: disable user_friendly_names for NetApp
  2019-03-08 23:11 ` [PATCH v2 01/12] libmultipath: disable user_friendly_names for NetApp Benjamin Marzinski
@ 2019-03-17 15:04   ` Xose Vazquez Perez
  2019-03-18  9:45     ` Martin Wilck
  0 siblings, 1 reply; 25+ messages in thread
From: Xose Vazquez Perez @ 2019-03-17 15:04 UTC (permalink / raw)
  To: Benjamin Marzinski; +Cc: device-mapper development, Martin Wilck

On 3/9/19 12:11 AM, Benjamin Marzinski wrote:

> NetApp has tools that rely on devices using WWID names. To avoid
> breaking these, NetApp devices should continue to use WWID names, even
> if the default config is set to enable user_friendly_names. If users
> want to use user_friendly_names on NetApp devices, the must specifically
> override the device config.
> 
> Reviewed-by: Martin Wilck <mwilck@suse.com>
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
>  libmultipath/hwtable.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/libmultipath/hwtable.c b/libmultipath/hwtable.c
> index d3a8d9b..8776411 100644
> --- a/libmultipath/hwtable.c
> +++ b/libmultipath/hwtable.c
> @@ -719,6 +719,7 @@ static struct hwentry default_hw[] = {
>  		.flush_on_last_del = FLUSH_ENABLED,
>  		.dev_loss      = MAX_DEV_LOSS_TMO,
>  		.prio_name     = PRIO_ONTAP,
                /* WWID names needed by NetApp tool ... */
> +		.user_friendly_names = USER_FRIENDLY_NAMES_OFF,
>  	},
>  	{
>  		/*
> 

A minimal explanation should be added for a quick reference.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 01/12] libmultipath: disable user_friendly_names for NetApp
  2019-03-17 15:04   ` Xose Vazquez Perez
@ 2019-03-18  9:45     ` Martin Wilck
  0 siblings, 0 replies; 25+ messages in thread
From: Martin Wilck @ 2019-03-18  9:45 UTC (permalink / raw)
  To: Xose Vazquez Perez, Benjamin Marzinski; +Cc: device-mapper development

On Sun, 2019-03-17 at 16:04 +0100, Xose Vazquez Perez wrote:
> On 3/9/19 12:11 AM, Benjamin Marzinski wrote:
> 
> > NetApp has tools that rely on devices using WWID names. To avoid
> > breaking these, NetApp devices should continue to use WWID names,
> > even
> > if the default config is set to enable user_friendly_names. If
> > users
> > want to use user_friendly_names on NetApp devices, the must
> > specifically
> > override the device config.
> > 
> > Reviewed-by: Martin Wilck <mwilck@suse.com>
> > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > ---
> >  libmultipath/hwtable.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/libmultipath/hwtable.c b/libmultipath/hwtable.c
> > index d3a8d9b..8776411 100644
> > --- a/libmultipath/hwtable.c
> > +++ b/libmultipath/hwtable.c
> > @@ -719,6 +719,7 @@ static struct hwentry default_hw[] = {
> >  		.flush_on_last_del = FLUSH_ENABLED,
> >  		.dev_loss      = MAX_DEV_LOSS_TMO,
> >  		.prio_name     = PRIO_ONTAP,
>                 /* WWID names needed by NetApp tool ... */
> > +		.user_friendly_names = USER_FRIENDLY_NAMES_OFF,
> >  	},
> >  	{
> >  		/*
> > 
> 
> A minimal explanation should be added for a quick reference.

IMO that's not necessary. People who want to know can quickly run "git
blame" to figure this out, and get more information than your single
line comment provides. 

That works as long as we don't mess up the code with "reorder by
comment" patches, that is :-)

Martin

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 07/12] multipathd: ignore failed wwid recheck
  2019-03-15 11:48   ` Martin Wilck
@ 2019-03-19 17:13     ` Benjamin Marzinski
  0 siblings, 0 replies; 25+ messages in thread
From: Benjamin Marzinski @ 2019-03-19 17:13 UTC (permalink / raw)
  To: Martin Wilck; +Cc: device-mapper development

On Fri, Mar 15, 2019 at 12:48:49PM +0100, Martin Wilck wrote:
> On Fri, 2019-03-08 at 17:12 -0600, Benjamin Marzinski wrote:
> > If disable_changed_wwids is set, when multipathd gets a change event
> > on
> > a path, it verifies that the wwid hasn't changed in
> > uev_update_path().
> > If get_uid() failed, uev_update_path treated this as a wwid change to
> > 0.
> > This could cause paths to suddenly be dropped due to an issue with
> > getting the wwid.  Even if get_uid() failed because the path was
> > down,
> > it no change uevent happend when it later became active, multipathd
> > would continue to ignore the path. Also, scsi_uid_fallback() clears
> > the
> > failure return if it doesn't attempt to fallback, causing get_uid()
> > to return success, when it actually failed.
> > 
> > Multipathd should neither set nor clear wwid_changed if get_uid()
> > returned failure. Also, scsi_uid_fallback() should retain the old
> > return
> > value if it doesn't attempt to fallback.
> > 
> > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > ---
> >  libmultipath/discovery.c | 6 +++---
> >  multipathd/main.c        | 6 ++++--
> >  2 files changed, 7 insertions(+), 5 deletions(-)
> > 
> > diff --git a/libmultipath/discovery.c b/libmultipath/discovery.c
> > index 729bcb9..b08cb2d 100644
> > --- a/libmultipath/discovery.c
> > +++ b/libmultipath/discovery.c
> > @@ -1755,9 +1755,9 @@ get_vpd_uid(struct path * pp)
> >  }
> >  
> >  static ssize_t scsi_uid_fallback(struct path *pp, int path_state,
> > -			     const char **origin)
> > +			     const char **origin, ssize_t old_len)
> >  {
> > -	ssize_t len = 0;
> > +	ssize_t len = old_len;
> >  	int retrigger;
> >  	struct config *conf;
> 
> Please don't call this variable "old_len" but "errcode" or the like. If
> this is called, "old_len" is always negative (indicating a previous
> error) or 0 (indicating previous attempts returned an empty WWID, which
> is also likely an error).

Sure.
 
> Otherwise, ACK. But you revert most of this later in 12/12; I think the
> two should be merged (except for the scsi_uid_fallback part, maybe).

See my reply to your "New approach at handling changed WWIDs" patches.
Something like this patch is still my preferred solution, for reasons I
explain there.

-Ben

> Martin
> 
> -- 
> Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
> HRB 21284 (AG Nürnberg)
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 10/12] multipath.conf: add missing options to man page
  2019-03-15 11:49   ` Martin Wilck
@ 2019-03-19 17:14     ` Benjamin Marzinski
  0 siblings, 0 replies; 25+ messages in thread
From: Benjamin Marzinski @ 2019-03-19 17:14 UTC (permalink / raw)
  To: Martin Wilck; +Cc: device-mapper development

On Fri, Mar 15, 2019 at 12:49:26PM +0100, Martin Wilck wrote:
> On Fri, 2019-03-08 at 17:12 -0600, Benjamin Marzinski wrote:
> > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > ---
> >  multipath/multipath.conf.5 | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> > 
> 
> "getuid_callout" is still missing in the devices section.
> (yes, I actually checked every option :-)

Oops. I can add that one too.

-Ben
 
> Regards,
> Martin
> 
> -- 
> Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
> HRB 21284 (AG Nürnberg)
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 11/12] libmultipath: add get_uid fallback code for NVMe devices
  2019-03-15 11:49   ` Martin Wilck
@ 2019-03-19 17:15     ` Benjamin Marzinski
  2019-03-20  7:55       ` Martin Wilck
  0 siblings, 1 reply; 25+ messages in thread
From: Benjamin Marzinski @ 2019-03-19 17:15 UTC (permalink / raw)
  To: Martin Wilck; +Cc: device-mapper development

On Fri, Mar 15, 2019 at 12:49:41PM +0100, Martin Wilck wrote:
> On Fri, 2019-03-08 at 17:12 -0600, Benjamin Marzinski wrote:
> > If multipath can't get the uid for NVMe devices from udev, it can get
> > it
> > directly from sysfs.
> > 
> > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > ---
> >  libmultipath/discovery.c | 49 ++++++++++++++++++++++++++++--------
> > ----
> >  1 file changed, 34 insertions(+), 15 deletions(-)
> 
> Aside the nit over the "old_len" variable name that I remarked on
> 07/12:

Sure, assuming that we come to some agreement that keeps the fallback
code.

-Ben

> Reviewed-by: Martin Wilck <mwilck@suse.com>
> 
> -- 
> Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
> HRB 21284 (AG Nürnberg)
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 12/12] multipathd: change failed get_uid handling
  2019-03-15 11:50   ` Martin Wilck
@ 2019-03-19 17:20     ` Benjamin Marzinski
  0 siblings, 0 replies; 25+ messages in thread
From: Benjamin Marzinski @ 2019-03-19 17:20 UTC (permalink / raw)
  To: Martin Wilck; +Cc: device-mapper development

On Fri, Mar 15, 2019 at 12:50:14PM +0100, Martin Wilck wrote:
> On Fri, 2019-03-08 at 17:12 -0600, Benjamin Marzinski wrote:
> > Instead of ignoring failed get_uid() calls, multipathd now fails the
> > path as it originally did.
> 
> This patch reverts much of 07/12; I'd appreciate if you could merge the
> two into one, that would make the review easier.
> 
> > However, if the result of calling get_uid()
> > is a blank wwid (which is always the case when it fails), multipathd
> > now
> > tries to get the wwid in check_path() as well, using the
> > uid_fallback()
> > methods to attempt to get a valid wwid.
> > 
> > Multipathd can't use the uid_attribute methods, since pp->udev still
> > has
> > the old uevent information with the uid_attribute of the original
> > wwid.
> 
> Which is actually wrong, IMO. It means that udev's (and the kernel's)
> view of the device are now different from multipathd's, and will remain
> so unless a new change event arrives. What bad can happen if we update
> pp->udev?

The thing is, this is almost definitely udev getting it wrong. There are
many types of path failure that keep udev from getting the path
information it is supposed to have, and udev timesout can do the same.
If something like that happens, udev will simply no longer have this
information. For instanace, ID_WWN won't be set, so if pathinfo ever
checked if that path should be blacklisted before it updates the udev
device again, it would blacklist the device. I think it's reasonable to
assume that the udev information that we already have is better than no
information. 
 
> Perhaps we should rather update pp->udev always, and set the
> wwid_changed flag if necessary. The case wwid_changed==WWID_ZEROED
> could be treated like INIT_MISSING_UDEV (it _is_ almost the same case,
> actually), by retriggering uevents.

If the issue was caused by a udev timeout because of a uevent storm,
firing off more uevents is counterproductive.

> > This means that the uid_attribute methods would always return the
> > original wwid, even if it had changed.
> > 
> > To make the get_uid() use the fallback methods, pathinfo now sets
> > pp->retriggers to the retrigger_tries once a WWID has be successfully
> > obtained, so that it uid_fallback() doesn't need to be called
> > retrigger_tries times before trying the fallback methods.
> 
> I don't like this much, see below.
> 
> 
> > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > ---
> >  libmultipath/discovery.c |  4 +++-
> >  libmultipath/structs.h   |  6 ++++++
> >  multipathd/main.c        | 36 +++++++++++++++++++++++++-----------
> >  3 files changed, 34 insertions(+), 12 deletions(-)
> > 
> > diff --git a/libmultipath/discovery.c b/libmultipath/discovery.c
> > index bece67c..15568ca 100644
> > --- a/libmultipath/discovery.c
> > +++ b/libmultipath/discovery.c
> > @@ -2018,8 +2018,10 @@ int pathinfo(struct path *pp, struct config
> > *conf, int mask)
> >  		}
> >  	}
> >  
> > -	if ((mask & DI_ALL) == DI_ALL)
> > +	if ((mask & DI_ALL) == DI_ALL) {
> > +		pp->retriggers = conf->retrigger_tries;
> >  		pp->initialized = INIT_OK;
> > +	}
> >  	return PATHINFO_OK;
> 
> This abuse of retrigger_tries looks like a hack to me. You want
> uid_fallback() to try and get a uid, even if retrigger_tries isn't
> exhausted. So you you should check another condition besides 
> (retriggers > retriggers_tries) in uid_callback().
> 
> At the very least, this would call for comments explaining the logic
> in both pathinfo() and uid_fallback(). I think you'd also have to set 
> retriggers = 0 when you set initialized = INIT_MISSING_UDEV
> (be it only for code clarity). But I'd prefer not to use the retrigger
> counter for this different condition.

Sure.
 
> > [...]
> > @@ -2017,10 +2017,24 @@ check_path (struct vectors * vecs, struct
> > path * pp, int ticks)
> >  	if (newstate == PATH_REMOVED)
> >  		newstate = PATH_DOWN;
> >  
> > -	if (pp->wwid_changed) {
> > -		condlog(2, "%s: path wwid has changed. Refusing to
> > use",
> > -			pp->dev);
> > -		newstate = PATH_DOWN;
> > +	if (pp->wwid_changed != WWID_SAME) {
> > +		if (pp->wwid_changed == WWID_ZEROED) {
> > +			char wwid[WWID_SIZE];
> > +
> > +			strcpy(wwid, pp->wwid);
> > +			get_uid(pp, newstate, NULL);
> > +			if (strncmp(wwid, pp->wwid, WWID_SIZE) == 0)
> > +				pp->wwid_changed = WWID_SAME;
> > +			else {
> > +				pp->wwid_changed = (strlen(pp->wwid))?
> > WWID_CHANGED : WWID_ZEROED;
> > +				strcpy(pp->wwid, wwid);
> > +			}
> > +		}
> > +		if (pp->wwid_changed != WWID_SAME) {
> > +			condlog(2, "%s: path wwid has changed. Refusing
> > to use",
> > +				pp->dev);
> 
> Please don't log the same condition at -v2 in every check_path()
> iteration. We log the actual change at -v0 already in
> uev_update_path().

Sure.

-Ben

> Regards
> Martin
> 
> -- 
> Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
> HRB 21284 (AG Nürnberg)
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 11/12] libmultipath: add get_uid fallback code for NVMe devices
  2019-03-19 17:15     ` Benjamin Marzinski
@ 2019-03-20  7:55       ` Martin Wilck
  0 siblings, 0 replies; 25+ messages in thread
From: Martin Wilck @ 2019-03-20  7:55 UTC (permalink / raw)
  To: Benjamin Marzinski; +Cc: development, device-mapper

On Tue, 2019-03-19 at 12:15 -0500, Benjamin Marzinski wrote:
> On Fri, Mar 15, 2019 at 12:49:41PM +0100, Martin Wilck wrote:
> > On Fri, 2019-03-08 at 17:12 -0600, Benjamin Marzinski wrote:
> > > If multipath can't get the uid for NVMe devices from udev, it can
> > > get
> > > it
> > > directly from sysfs.
> > > 
> > > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > > ---
> > >  libmultipath/discovery.c | 49 ++++++++++++++++++++++++++++----
> > > ----
> > > ----
> > >  1 file changed, 34 insertions(+), 15 deletions(-)
> > 
> > Aside the nit over the "old_len" variable name that I remarked on
> > 07/12:
> 
> Sure, assuming that we come to some agreement that keeps the fallback
> code.

One more thing as you're working on this stuff anyway: I find it wrong
that the fallback function itself checks (using retriggers or whatever
other criterion) whether or not it should do something. uid_fallback()
should simply retrieve and return the UID, and the caller should decide
what to do with that, or whether to call uid_fallback() at all. I guess
that would make it necessary to change the API such that uid_fallback
doesn't write to pp->wwid directly, but takes (or allocates) a separate
buffer.

This isn't a problem of your patch, but I think it should be fixed
while we're on it.

Regards
Martin

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2019-03-20  7:55 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-08 23:11 [PATCH v2 00/12] Misc Multipath patches Benjamin Marzinski
2019-03-08 23:11 ` [PATCH v2 01/12] libmultipath: disable user_friendly_names for NetApp Benjamin Marzinski
2019-03-17 15:04   ` Xose Vazquez Perez
2019-03-18  9:45     ` Martin Wilck
2019-03-08 23:11 ` [PATCH v2 02/12] libmultipath: handle existing paths in marginal_path enqueue Benjamin Marzinski
2019-03-08 23:11 ` [PATCH v2 03/12] multipathd: cleanup marginal paths checking timers Benjamin Marzinski
2019-03-08 23:11 ` [PATCH v2 04/12] libmultipath: fix marginal paths queueing errors Benjamin Marzinski
2019-03-08 23:11 ` [PATCH v2 05/12] libmultipath: fix marginal_paths nr_active check Benjamin Marzinski
2019-03-08 23:11 ` [PATCH v2 06/12] multipathd: Fix miscounting active paths Benjamin Marzinski
2019-03-08 23:12 ` [PATCH v2 07/12] multipathd: ignore failed wwid recheck Benjamin Marzinski
2019-03-15 11:48   ` Martin Wilck
2019-03-19 17:13     ` Benjamin Marzinski
2019-03-08 23:12 ` [PATCH v2 08/12] libmutipath: continue to use old state on PATH_PENDING Benjamin Marzinski
2019-03-08 23:12 ` [PATCH v2 09/12] multipathd: use update_path_groups instead of reload_map Benjamin Marzinski
2019-03-15 11:49   ` Martin Wilck
2019-03-08 23:12 ` [PATCH v2 10/12] multipath.conf: add missing options to man page Benjamin Marzinski
2019-03-15 11:49   ` Martin Wilck
2019-03-19 17:14     ` Benjamin Marzinski
2019-03-08 23:12 ` [PATCH v2 11/12] libmultipath: add get_uid fallback code for NVMe devices Benjamin Marzinski
2019-03-15 11:49   ` Martin Wilck
2019-03-19 17:15     ` Benjamin Marzinski
2019-03-20  7:55       ` Martin Wilck
2019-03-08 23:12 ` [PATCH v2 12/12] multipathd: change failed get_uid handling Benjamin Marzinski
2019-03-15 11:50   ` Martin Wilck
2019-03-19 17:20     ` Benjamin Marzinski

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.