All of lore.kernel.org
 help / color / mirror / Atom feed
* [dm-devel] [PATCH 0/7] multipathd: remove udev settle dependency
@ 2021-10-20 19:15 Benjamin Marzinski
  2021-10-20 19:15 ` [dm-devel] [PATCH 1/7] multipathd: remove missing paths on startup Benjamin Marzinski
                   ` (7 more replies)
  0 siblings, 8 replies; 25+ messages in thread
From: Benjamin Marzinski @ 2021-10-20 19:15 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

So, it turns out that commit 4ef67017 "libmultipath: add
update_pathvec_from_dm()" already does most of the hard work of making
multipath handle the uninitialized paths that exist during boot, after
the switch root, but before the all the coldplug uevents have been
processed. The only problem is that this code leaves the paths in a
partially initialized state, which doesn't completely get corrected until
a reconfigure happens. 

This patchset makes multipath track these partially initailized paths,
and makes sure that they get fully initialized, or removed, as
necessary.

The first patch is a tangentially related bug I found when trying
(unsuccessfully) to recreate multipathd's problem with dropping
uninitialized paths. Multipathd was not removing completely missing
paths from maps that it found while starting up. The solution I chose
was just to make sure that a full reload will happen on these maps, even
if a weak reconfigure was requested. However, this means that multipath
still completely ignores these missing paths. A side effect of this is
that "multipath -l" does not show these paths, even though they exist as
part of the multipath table. The full reloads are necessary, to take
care of issues that update_pathvec_from_dm() can run into, but we might
also want to think about marking these missing paths as INIT_REMOVED,
instead of not adding them at all. This would make "multipath -l" still
show them, until they actually get removed from the table.

Patch 0005 makes sure to fully initialize the paths when their coldplug
event occurs, but if the path is already fully initialized in udev, but
multipathd finds out about it from update_pathvec_from_dm(), multipath
doesn't do anything to finish initializing the path and moving it to
INIT_OK, besides waiting for a uevent or a reconfigure. This is probably
fine, since the only way I can see for a path to be in this state is for
someone to manually remove the path from multipathd monitoring. But if
I'm missing some other way that paths could end up in this state, we
could try proactively finishing the initalization of INIT_PARTIAL paths
that have all their udev information.

I'm also kind of on the fence about patch 0006. There is no reason why
we have to remove INIT_PARTIAL paths if the multipath device goes away.
But if a path is in INIT_PARTIAL for long enough that the multipath
device gets removed, then it's likely not a path we want to be
monitoring, and if a uevent occurs, we'll add it back, anyway. Also,
knowing that INIT_PARTIAL paths are always part of multipath devices
does make the code simpler.

I've tested these patches both by rebooting with necessary and
unnecessary multipath devices in the initramfs and multipathd.service
set to make multipathd start up at various points after the switch root,
and by manually sending remove uevents to unintialize some paths, and
then starting multipathd to test specific combinations of path states.
But more testing is always welcome.

Benjamin Marzinski (7):
  multipathd: remove missing paths on startup
  libmultipath: skip unneeded steps to get path name
  libmultipath: don't use fallback wwid in update_pathvec_from_dm
  libmultipath: always set INIT_REMOVED in set_path_removed
  multipathd: fully initialize paths added by update_pathvec_from_dm
  multipathd: remove INIT_PARTIAL paths that aren't in a multipath
    device
  multipathd: Remove dependency on systemd-udev-settle.service

 libmultipath/configure.c      |  2 ++
 libmultipath/devmapper.c      |  2 ++
 libmultipath/discovery.c      |  7 ++---
 libmultipath/discovery.h      |  2 ++
 libmultipath/structs.h        |  7 +++++
 libmultipath/structs_vec.c    | 40 ++++++++++++++++-------------
 multipathd/cli_handlers.c     |  4 +++
 multipathd/main.c             | 48 ++++++++++++++++++++++++++++++++---
 multipathd/main.h             |  1 +
 multipathd/multipathd.service |  3 +--
 10 files changed, 90 insertions(+), 26 deletions(-)

-- 
2.17.2

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [dm-devel] [PATCH 1/7] multipathd: remove missing paths on startup
  2021-10-20 19:15 [dm-devel] [PATCH 0/7] multipathd: remove udev settle dependency Benjamin Marzinski
@ 2021-10-20 19:15 ` Benjamin Marzinski
  2021-11-04 22:12   ` Martin Wilck
  2021-10-20 19:15 ` [dm-devel] [PATCH 2/7] libmultipath: skip unneeded steps to get path name Benjamin Marzinski
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: Benjamin Marzinski @ 2021-10-20 19:15 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

If a path device was removed from the system while multipathd was not
running, multipathd would not remove the path from the multipath table
on start-up, or on a weak reconfigure. update_pathvec_from_dm() would
return that a reload was necessary, but that information wasn't
propigated back to where it could be used to reload the device.

Multipath devices now remember if they need to be reloaded, and if so,
force_reload is set in select_action().  This means that even when
configure is called with FORCE_RELOAD_WEAK, these devices will still be
reloaded.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/configure.c   | 2 ++
 libmultipath/devmapper.c   | 2 ++
 libmultipath/structs.h     | 1 +
 libmultipath/structs_vec.c | 1 +
 4 files changed, 6 insertions(+)

diff --git a/libmultipath/configure.c b/libmultipath/configure.c
index eb8ec1bd..f1a890af 100644
--- a/libmultipath/configure.c
+++ b/libmultipath/configure.c
@@ -715,6 +715,8 @@ void select_action (struct multipath *mpp, const struct _vector *curmp,
 
 	cmpp = find_mp_by_wwid(curmp, mpp->wwid);
 	cmpp_by_name = find_mp_by_alias(curmp, mpp->alias);
+	if (mpp->need_reload || (cmpp && cmpp->need_reload))
+		force_reload = 1;
 
 	if (!cmpp_by_name) {
 		if (cmpp) {
diff --git a/libmultipath/devmapper.c b/libmultipath/devmapper.c
index c05dc201..3e1a7260 100644
--- a/libmultipath/devmapper.c
+++ b/libmultipath/devmapper.c
@@ -522,6 +522,8 @@ freeout:
 addout:
 	dm_task_destroy (dmt);
 
+	if (r)
+		mpp->need_reload = false;
 	return r;
 }
 
diff --git a/libmultipath/structs.h b/libmultipath/structs.h
index 399540e7..d0b266b7 100644
--- a/libmultipath/structs.h
+++ b/libmultipath/structs.h
@@ -355,6 +355,7 @@ struct multipath {
 	int retain_hwhandler;
 	int deferred_remove;
 	bool in_recovery;
+	bool need_reload;
 	int san_path_err_threshold;
 	int san_path_err_forget_rate;
 	int san_path_err_recovery_time;
diff --git a/libmultipath/structs_vec.c b/libmultipath/structs_vec.c
index 85d97ac1..e52db0c4 100644
--- a/libmultipath/structs_vec.c
+++ b/libmultipath/structs_vec.c
@@ -237,6 +237,7 @@ bool update_pathvec_from_dm(vector pathvec, struct multipath *mpp,
 		free_pathgroup(pgp, KEEP_PATHS);
 		must_reload = true;
 	}
+	mpp->need_reload = mpp->need_reload || must_reload;
 	return must_reload;
 }
 
-- 
2.17.2

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [dm-devel] [PATCH 2/7] libmultipath: skip unneeded steps to get path name
  2021-10-20 19:15 [dm-devel] [PATCH 0/7] multipathd: remove udev settle dependency Benjamin Marzinski
  2021-10-20 19:15 ` [dm-devel] [PATCH 1/7] multipathd: remove missing paths on startup Benjamin Marzinski
@ 2021-10-20 19:15 ` Benjamin Marzinski
  2021-11-04 21:20   ` Martin Wilck
  2021-10-20 19:15 ` [dm-devel] [PATCH 3/7] libmultipath: don't use fallback wwid in update_pathvec_from_dm Benjamin Marzinski
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: Benjamin Marzinski @ 2021-10-20 19:15 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

The path already must have a udev device at this point, so it just
needs to copy the sysname from it.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/structs_vec.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/libmultipath/structs_vec.c b/libmultipath/structs_vec.c
index e52db0c4..4d56107a 100644
--- a/libmultipath/structs_vec.c
+++ b/libmultipath/structs_vec.c
@@ -1,6 +1,7 @@
 #include <stdio.h>
 #include <string.h>
 #include <unistd.h>
+#include <libudev.h>
 
 #include "util.h"
 #include "checkers.h"
@@ -174,8 +175,8 @@ bool update_pathvec_from_dm(vector pathvec, struct multipath *mpp,
 				} else {
 					int rc;
 
-					devt2devname(pp->dev, sizeof(pp->dev),
-						     pp->dev_t);
+					strlcpy(pp->dev, udev_device_get_sysname(pp->udev),
+						sizeof(pp->dev));
 					conf = get_multipath_config();
 					pthread_cleanup_push(put_multipath_config,
 							     conf);
-- 
2.17.2

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [dm-devel] [PATCH 3/7] libmultipath: don't use fallback wwid in update_pathvec_from_dm
  2021-10-20 19:15 [dm-devel] [PATCH 0/7] multipathd: remove udev settle dependency Benjamin Marzinski
  2021-10-20 19:15 ` [dm-devel] [PATCH 1/7] multipathd: remove missing paths on startup Benjamin Marzinski
  2021-10-20 19:15 ` [dm-devel] [PATCH 2/7] libmultipath: skip unneeded steps to get path name Benjamin Marzinski
@ 2021-10-20 19:15 ` Benjamin Marzinski
  2021-11-04 21:27   ` Martin Wilck
  2021-10-20 19:15 ` [dm-devel] [PATCH 4/7] libmultipath: always set INIT_REMOVED in set_path_removed Benjamin Marzinski
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: Benjamin Marzinski @ 2021-10-20 19:15 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

When new paths are added in update_pathvec_from_dm(). If they can't get
their regular wwid, they shouldn't try the getting the fallback wwid,
and should just copy the wwid of the multipath device.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/discovery.c   | 7 ++++---
 libmultipath/discovery.h   | 2 ++
 libmultipath/structs_vec.c | 3 +--
 3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/libmultipath/discovery.c b/libmultipath/discovery.c
index f25fe9e3..5edf9593 100644
--- a/libmultipath/discovery.c
+++ b/libmultipath/discovery.c
@@ -2363,15 +2363,16 @@ int pathinfo(struct path *pp, struct config *conf, int mask)
 	}
 
 	if ((mask & DI_WWID) && !strlen(pp->wwid)) {
-		get_uid(pp, path_state, pp->udev,
-			(pp->retriggers >= conf->retrigger_tries));
+		int allow_fallback = ((mask & DI_NOFALLBACK) == 0 &&
+				      pp->retriggers >= conf->retrigger_tries);
+		get_uid(pp, path_state, pp->udev, allow_fallback);
 		if (!strlen(pp->wwid)) {
 			if (pp->bus == SYSFS_BUS_UNDEF)
 				return PATHINFO_SKIPPED;
 			if (pp->initialized != INIT_FAILED) {
 				pp->initialized = INIT_MISSING_UDEV;
 				pp->tick = conf->retrigger_delay;
-			} else if (pp->retriggers >= conf->retrigger_tries &&
+			} else if (allow_fallback &&
 				   (pp->state == PATH_UP || pp->state == PATH_GHOST)) {
 				/*
 				 * We have failed to read udev info for this path
diff --git a/libmultipath/discovery.h b/libmultipath/discovery.h
index a5446b4d..095657bb 100644
--- a/libmultipath/discovery.h
+++ b/libmultipath/discovery.h
@@ -70,6 +70,7 @@ enum discovery_mode {
 	__DI_WWID,
 	__DI_BLACKLIST,
 	__DI_NOIO,
+	__DI_NOFALLBACK,
 };
 
 #define DI_SYSFS	(1 << __DI_SYSFS)
@@ -79,6 +80,7 @@ enum discovery_mode {
 #define DI_WWID		(1 << __DI_WWID)
 #define DI_BLACKLIST	(1 << __DI_BLACKLIST)
 #define DI_NOIO		(1 << __DI_NOIO) /* Avoid IO on the device */
+#define DI_NOFALLBACK	(1 << __DI_NOFALLBACK) /* do not allow wwid fallback */
 
 #define DI_ALL		(DI_SYSFS  | DI_SERIAL | DI_CHECKER | DI_PRIO | \
 			 DI_WWID)
diff --git a/libmultipath/structs_vec.c b/libmultipath/structs_vec.c
index 4d56107a..d363e7f6 100644
--- a/libmultipath/structs_vec.c
+++ b/libmultipath/structs_vec.c
@@ -182,8 +182,7 @@ bool update_pathvec_from_dm(vector pathvec, struct multipath *mpp,
 							     conf);
 					pp->checkint = conf->checkint;
 					rc = pathinfo(pp, conf,
-						      DI_SYSFS|DI_WWID|DI_BLACKLIST|
-						      pathinfo_flags);
+						      DI_SYSFS|DI_WWID|DI_BLACKLIST|DI_NOFALLBACK|pathinfo_flags);
 					pthread_cleanup_pop(1);
 					if (rc != PATHINFO_OK) {
 						condlog(1, "%s: error %d in pathinfo, discarding path",
-- 
2.17.2

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [dm-devel] [PATCH 4/7] libmultipath: always set INIT_REMOVED in set_path_removed
  2021-10-20 19:15 [dm-devel] [PATCH 0/7] multipathd: remove udev settle dependency Benjamin Marzinski
                   ` (2 preceding siblings ...)
  2021-10-20 19:15 ` [dm-devel] [PATCH 3/7] libmultipath: don't use fallback wwid in update_pathvec_from_dm Benjamin Marzinski
@ 2021-10-20 19:15 ` Benjamin Marzinski
  2021-11-04 21:28   ` Martin Wilck
  2021-10-20 19:15 ` [dm-devel] [PATCH 5/7] multipathd: fully initialize paths added by update_pathvec_from_dm Benjamin Marzinski
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: Benjamin Marzinski @ 2021-10-20 19:15 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

Avoiding this corner case simplifies a future patch

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/structs_vec.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/libmultipath/structs_vec.c b/libmultipath/structs_vec.c
index d363e7f6..fb26437a 100644
--- a/libmultipath/structs_vec.c
+++ b/libmultipath/structs_vec.c
@@ -326,10 +326,8 @@ void set_path_removed(struct path *pp)
 	 * Keep link to mpp. It will be removed when the path
 	 * is successfully removed from the map.
 	 */
-	if (!mpp) {
+	if (!mpp)
 		condlog(0, "%s: internal error: mpp == NULL", pp->dev);
-		return;
-	}
 	pp->mpp = mpp;
 	pp->initialized = INIT_REMOVED;
 }
-- 
2.17.2

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [dm-devel] [PATCH 5/7] multipathd: fully initialize paths added by update_pathvec_from_dm
  2021-10-20 19:15 [dm-devel] [PATCH 0/7] multipathd: remove udev settle dependency Benjamin Marzinski
                   ` (3 preceding siblings ...)
  2021-10-20 19:15 ` [dm-devel] [PATCH 4/7] libmultipath: always set INIT_REMOVED in set_path_removed Benjamin Marzinski
@ 2021-10-20 19:15 ` Benjamin Marzinski
  2021-11-04 22:10   ` Martin Wilck
  2021-10-20 19:15 ` [dm-devel] [PATCH 6/7] multipathd: remove INIT_PARTIAL paths that aren't in a multipath device Benjamin Marzinski
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: Benjamin Marzinski @ 2021-10-20 19:15 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

When paths are added by update_pathvec_from_dm(), udev may not have
initialized them. This means that it's possible that they are supposed
to be blacklisted by udev properties, but weren't.  Also, in order to
avoid doing potentially stalling IO, update_pathvec_from_dm() doesn't
get all the path information in pathinfo(). These paths end up in the
unexpected state of INIT_MISSING_UDEV or INIT_NEW, but with their mpp
and wwid set.

If udev has already initialized the path, but multipathed wasn't
monitoring it, the blacklist checks and wwid determination in
update_pathvec_from_dm() will work correctly, so paths added by it are
safe, but not completely initialized. The most likely reason why this
would happen is if the path was manually removed from multipathd
monitoring with "multipathd del path". The most common time when
uninitialized paths would in be part of multipath devices is during
boot, after the pivot root, but before the udev coldplug happens. These
paths are not necessarily safe. It's possible that /etc/multipath.conf
in the initramfs and regular filesystem differ, and they should now be
either blacklisted by udev_property, or have a different wwid. However
an "add" event should appear for them shortly.

Multipath now has a new state to deal with these devices, INIT_PARTIAL.
Devices in this state are treated mostly like INIT_OK devices, but
when "multipathd add path" is called or an add/change uevent happens
on these devices, multipathd will finish initializing them, and remove
them if necessary.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/structs.h     |  6 +++++
 libmultipath/structs_vec.c |  5 ++--
 multipathd/cli_handlers.c  |  4 ++++
 multipathd/main.c          | 48 ++++++++++++++++++++++++++++++++++----
 multipathd/main.h          |  1 +
 5 files changed, 58 insertions(+), 6 deletions(-)

diff --git a/libmultipath/structs.h b/libmultipath/structs.h
index d0b266b7..69409fd4 100644
--- a/libmultipath/structs.h
+++ b/libmultipath/structs.h
@@ -200,6 +200,12 @@ enum initialized_states {
 	 * mapped by some multipath map because of map reload failure.
 	 */
 	INIT_REMOVED,
+	/*
+	 * INIT_PARTIAL: paths added by update_pathvec_from_dm() will not
+	 * be fully initialized. This will be handled when an add or
+	 * change uevent is received.
+	 */
+	INIT_PARTIAL,
 };
 
 enum prkey_sources {
diff --git a/libmultipath/structs_vec.c b/libmultipath/structs_vec.c
index fb26437a..1de9175e 100644
--- a/libmultipath/structs_vec.c
+++ b/libmultipath/structs_vec.c
@@ -194,6 +194,7 @@ bool update_pathvec_from_dm(vector pathvec, struct multipath *mpp,
 					}
 					condlog(2, "%s: adding new path %s",
 						mpp->alias, pp->dev);
+					pp->initialized = INIT_PARTIAL;
 					store_path(pathvec, pp);
 					pp->tick = 1;
 				}
@@ -392,12 +393,12 @@ extract_hwe_from_path(struct multipath * mpp)
 	condlog(4, "%s: searching paths for valid hwe", mpp->alias);
 	/* doing this in two passes seems like paranoia to me */
 	vector_foreach_slot(mpp->paths, pp, i) {
-		if (pp->state == PATH_UP &&
+		if (pp->state == PATH_UP && pp->initialized != INIT_PARTIAL &&
 		    pp->initialized != INIT_REMOVED && pp->hwe)
 			goto done;
 	}
 	vector_foreach_slot(mpp->paths, pp, i) {
-		if (pp->state != PATH_UP &&
+		if ((pp->state != PATH_UP || pp->initialized == INIT_PARTIAL) &&
 		    pp->initialized != INIT_REMOVED && pp->hwe)
 			goto done;
 	}
diff --git a/multipathd/cli_handlers.c b/multipathd/cli_handlers.c
index 58b9916c..8d37431e 100644
--- a/multipathd/cli_handlers.c
+++ b/multipathd/cli_handlers.c
@@ -526,6 +526,10 @@ cli_add_path (void *v, struct strbuf *reply, void *data)
 			return 1;
 		}
 
+		if (pp->initialized == INIT_PARTIAL &&
+		    finish_path_init(pp, vecs) < 0)
+			return 1;
+
 		if (pp->mpp)
 			return 0;
 	} else if (pp) {
diff --git a/multipathd/main.c b/multipathd/main.c
index cc4a4a5d..4b8e2cde 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -976,12 +976,19 @@ check_path_wwid_change(struct path *pp)
 	return false;
 }
 
+/*
+ * uev_add_path can call uev_update_path, and uev_update_path can call
+ * uev_add_path
+ */
+static int uev_update_path (struct uevent *uev, struct vectors * vecs);
+
 static int
 uev_add_path (struct uevent *uev, struct vectors * vecs, int need_do_map)
 {
 	struct path *pp;
 	int ret = 0, i;
 	struct config *conf;
+	bool partial_init = false;
 
 	condlog(3, "%s: add path (uevent)", uev->kernel);
 	if (strstr(uev->kernel, "..") != NULL) {
@@ -1000,7 +1007,10 @@ uev_add_path (struct uevent *uev, struct vectors * vecs, int need_do_map)
 		int r;
 		struct multipath *prev_mpp = NULL;
 
-		if (pp->initialized == INIT_REMOVED) {
+		if (pp->initialized == INIT_PARTIAL) {
+			partial_init = true;
+			goto out;
+		} else if (pp->initialized == INIT_REMOVED) {
 			condlog(3, "%s: re-adding removed path", pp->dev);
 			pp->initialized = INIT_NEW;
 			prev_mpp = pp->mpp;
@@ -1110,6 +1120,8 @@ uev_add_path (struct uevent *uev, struct vectors * vecs, int need_do_map)
 	}
 out:
 	lock_cleanup_pop(vecs->lock);
+	if (partial_init)
+		return uev_update_path(uev, vecs);
 	return ret;
 }
 
@@ -1405,6 +1417,25 @@ fail:
 	return REMOVE_PATH_MAP_ERROR;
 }
 
+int
+finish_path_init(struct path *pp, struct vectors * vecs)
+{
+	int r;
+	struct config *conf;
+
+	conf = get_multipath_config();
+	pthread_cleanup_push(put_multipath_config, conf);
+	r = pathinfo(pp, conf, DI_ALL|DI_BLACKLIST);
+	pthread_cleanup_pop(1);
+
+	if (r == PATHINFO_OK)
+		return 0;
+
+	condlog(0, "%s: error fully initializing path, removing", pp->dev);
+	ev_remove_path(pp, vecs, 1);
+	return -1;
+}
+
 static int
 uev_update_path (struct uevent *uev, struct vectors * vecs)
 {
@@ -1443,7 +1474,7 @@ uev_update_path (struct uevent *uev, struct vectors * vecs)
 		}
 		/* Don't deal with other types of failed initialization
 		 * now. check_path will handle it */
-		if (!strlen(pp->wwid))
+		if (!strlen(pp->wwid) && pp->initialized != INIT_PARTIAL)
 			goto out;
 
 		strcpy(wwid, pp->wwid);
@@ -1451,12 +1482,20 @@ uev_update_path (struct uevent *uev, struct vectors * vecs)
 
 		if (rc != 0)
 			strcpy(pp->wwid, wwid);
-		else if (strncmp(wwid, pp->wwid, WWID_SIZE) != 0) {
+		else if (strlen(wwid) &&
+			 strncmp(wwid, pp->wwid, WWID_SIZE) != 0) {
 			condlog(0, "%s: path wwid changed from '%s' to '%s'",
 				uev->kernel, wwid, pp->wwid);
 			ev_remove_path(pp, vecs, 1);
 			needs_reinit = 1;
 			goto out;
+		} else if (pp->initialized == INIT_PARTIAL) {
+			udev_device_unref(pp->udev);
+			pp->udev = udev_device_ref(uev->udev);
+			if (finish_path_init(pp, vecs) < 0) {
+				retval = 1;
+				goto out;
+			}
 		} else {
 			udev_device_unref(pp->udev);
 			pp->udev = udev_device_ref(uev->udev);
@@ -1507,6 +1546,7 @@ out:
 
 		condlog(0, "%s: spurious uevent, path not found", uev->kernel);
 	}
+	/* pp->initalized must not be INIT_PARTIAL if needs_reinit is set */
 	if (needs_reinit)
 		retval = uev_add_path(uev, vecs, 1);
 	return retval;
@@ -2116,7 +2156,7 @@ check_path (struct vectors * vecs, struct path * pp, unsigned int ticks)
 	int marginal_pathgroups, marginal_changed = 0;
 	int ret;
 
-	if (((pp->initialized == INIT_OK ||
+	if (((pp->initialized == INIT_OK || pp->initialized == INIT_PARTIAL ||
 	      pp->initialized == INIT_REQUESTED_UDEV) && !pp->mpp) ||
 	    pp->initialized == INIT_REMOVED)
 		return 0;
diff --git a/multipathd/main.h b/multipathd/main.h
index c8a1ce92..4acd1b8c 100644
--- a/multipathd/main.h
+++ b/multipathd/main.h
@@ -66,4 +66,5 @@ int reload_and_sync_map(struct multipath *mpp, struct vectors *vecs,
 
 void handle_path_wwid_change(struct path *pp, struct vectors *vecs);
 bool check_path_wwid_change(struct path *pp);
+int finish_path_init(struct path *pp, struct vectors * vecs);
 #endif /* MAIN_H */
-- 
2.17.2

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [dm-devel] [PATCH 6/7] multipathd: remove INIT_PARTIAL paths that aren't in a multipath device
  2021-10-20 19:15 [dm-devel] [PATCH 0/7] multipathd: remove udev settle dependency Benjamin Marzinski
                   ` (4 preceding siblings ...)
  2021-10-20 19:15 ` [dm-devel] [PATCH 5/7] multipathd: fully initialize paths added by update_pathvec_from_dm Benjamin Marzinski
@ 2021-10-20 19:15 ` Benjamin Marzinski
  2021-11-04 22:01   ` Martin Wilck
  2021-10-20 19:15 ` [dm-devel] [PATCH 7/7] multipathd: Remove dependency on systemd-udev-settle.service Benjamin Marzinski
  2021-11-04 22:00 ` [dm-devel] [PATCH 0/7] multipathd: remove udev settle dependency Martin Wilck
  7 siblings, 1 reply; 25+ messages in thread
From: Benjamin Marzinski @ 2021-10-20 19:15 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

The only reason multipath is monitoring an INIT_PARTIAL path is because
it was discovered in a multipath device table. If it stops being part
of a multipath device before it gets fully initialized, drop it.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/structs_vec.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/libmultipath/structs_vec.c b/libmultipath/structs_vec.c
index 1de9175e..3f60f3d5 100644
--- a/libmultipath/structs_vec.c
+++ b/libmultipath/structs_vec.c
@@ -307,9 +307,12 @@ void orphan_paths(vector pathvec, struct multipath *mpp, const char *reason)
 
 	vector_foreach_slot (pathvec, pp, i) {
 		if (pp->mpp == mpp) {
-			if (pp->initialized == INIT_REMOVED) {
-				condlog(3, "%s: freeing path in removed state",
-					pp->dev);
+			if (pp->initialized == INIT_REMOVED ||
+			    pp->initialized == INIT_PARTIAL) {
+				condlog(3, "%s: freeing path in %s state",
+					pp->dev,
+					pp->initialized == INIT_REMOVED ?
+					"removed" : "partial");
 				vector_del_slot(pathvec, i--);
 				free_path(pp);
 			} else
@@ -468,11 +471,14 @@ static void check_removed_paths(const struct multipath *mpp, vector pathvec)
 	int i;
 
 	vector_foreach_slot(pathvec, pp, i) {
-		if (pp->initialized != INIT_REMOVED || pp->mpp != mpp)
-			continue;
-		if (!find_devt_in_pathgroups(mpp, pp->dev_t)) {
-			condlog(2, "%s: %s: freeing path in removed state",
-				__func__, pp->dev);
+		if (pp->mpp == mpp &&
+		    (pp->initialized == INIT_REMOVED ||
+		     pp->initialized == INIT_PARTIAL) &&
+		    !find_devt_in_pathgroups(mpp, pp->dev_t)) {
+			condlog(2, "%s: %s: freeing path in %s state",
+				__func__, pp->dev,
+				pp->initialized == INIT_REMOVED ?
+				"removed" : "partial");
 			vector_del_slot(pathvec, i--);
 			free_path(pp);
 		}
-- 
2.17.2

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [dm-devel] [PATCH 7/7] multipathd: Remove dependency on systemd-udev-settle.service
  2021-10-20 19:15 [dm-devel] [PATCH 0/7] multipathd: remove udev settle dependency Benjamin Marzinski
                   ` (5 preceding siblings ...)
  2021-10-20 19:15 ` [dm-devel] [PATCH 6/7] multipathd: remove INIT_PARTIAL paths that aren't in a multipath device Benjamin Marzinski
@ 2021-10-20 19:15 ` Benjamin Marzinski
  2021-11-04 22:17   ` Martin Wilck
  2021-11-04 22:00 ` [dm-devel] [PATCH 0/7] multipathd: remove udev settle dependency Martin Wilck
  7 siblings, 1 reply; 25+ messages in thread
From: Benjamin Marzinski @ 2021-10-20 19:15 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

multipathd can now handle starting up with incompletely initialized
paths, so it no longer needs to wait for the device Coldplug to
complete. However multipathd may need to write to /etc (for the wwids
and bindings files), so in needs to wait for the root filesystem to
be remounted read/write before starting.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 multipathd/multipathd.service | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/multipathd/multipathd.service b/multipathd/multipathd.service
index 0b2ac814..87cb5349 100644
--- a/multipathd/multipathd.service
+++ b/multipathd/multipathd.service
@@ -1,9 +1,8 @@
 [Unit]
 Description=Device-Mapper Multipath Device Controller
-Wants=systemd-udev-trigger.service systemd-udev-settle.service
 Before=iscsi.service iscsid.service lvm2-activation-early.service
 Before=local-fs-pre.target blk-availability.service shutdown.target
-After=multipathd.socket systemd-udev-trigger.service systemd-udev-settle.service
+After=multipathd.socket systemd-remount-fs.service
 DefaultDependencies=no
 Conflicts=shutdown.target
 ConditionKernelCommandLine=!nompath
-- 
2.17.2

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [dm-devel] [PATCH 2/7] libmultipath: skip unneeded steps to get path name
  2021-10-20 19:15 ` [dm-devel] [PATCH 2/7] libmultipath: skip unneeded steps to get path name Benjamin Marzinski
@ 2021-11-04 21:20   ` Martin Wilck
  0 siblings, 0 replies; 25+ messages in thread
From: Martin Wilck @ 2021-11-04 21:20 UTC (permalink / raw)
  To: bmarzins, christophe.varoqui; +Cc: dm-devel

On Wed, 2021-10-20 at 14:15 -0500, Benjamin Marzinski wrote:
> The path already must have a udev device at this point, so it just
> needs to copy the sysname from it.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>

Reviewed-by: Martin Wilck <mwilck@suse.com>


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [dm-devel] [PATCH 3/7] libmultipath: don't use fallback wwid in update_pathvec_from_dm
  2021-10-20 19:15 ` [dm-devel] [PATCH 3/7] libmultipath: don't use fallback wwid in update_pathvec_from_dm Benjamin Marzinski
@ 2021-11-04 21:27   ` Martin Wilck
  0 siblings, 0 replies; 25+ messages in thread
From: Martin Wilck @ 2021-11-04 21:27 UTC (permalink / raw)
  To: bmarzins, christophe.varoqui; +Cc: dm-devel

On Wed, 2021-10-20 at 14:15 -0500, Benjamin Marzinski wrote:
> When new paths are added in update_pathvec_from_dm(). If they can't
> get
> their regular wwid, they shouldn't try the getting the fallback wwid,
> and should just copy the wwid of the multipath device.
> 
> Signed-off-by: Benjamin Marzinski <marzins@redhat.com>

Reviewed-by: Martin Wilck <mwilck@suse.com>


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [dm-devel] [PATCH 4/7] libmultipath: always set INIT_REMOVED in set_path_removed
  2021-10-20 19:15 ` [dm-devel] [PATCH 4/7] libmultipath: always set INIT_REMOVED in set_path_removed Benjamin Marzinski
@ 2021-11-04 21:28   ` Martin Wilck
  0 siblings, 0 replies; 25+ messages in thread
From: Martin Wilck @ 2021-11-04 21:28 UTC (permalink / raw)
  To: bmarzins, christophe.varoqui; +Cc: dm-devel

On Wed, 2021-10-20 at 14:15 -0500, Benjamin Marzinski wrote:
> Avoiding this corner case simplifies a future patch
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>

Reviewed-by: Martin Wilck <mwilck@suse.com>


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [dm-devel] [PATCH 0/7] multipathd: remove udev settle dependency
  2021-10-20 19:15 [dm-devel] [PATCH 0/7] multipathd: remove udev settle dependency Benjamin Marzinski
                   ` (6 preceding siblings ...)
  2021-10-20 19:15 ` [dm-devel] [PATCH 7/7] multipathd: Remove dependency on systemd-udev-settle.service Benjamin Marzinski
@ 2021-11-04 22:00 ` Martin Wilck
  2021-11-05 20:49   ` Benjamin Marzinski
  7 siblings, 1 reply; 25+ messages in thread
From: Martin Wilck @ 2021-11-04 22:00 UTC (permalink / raw)
  To: bmarzins, christophe.varoqui; +Cc: dm-devel

On Wed, 2021-10-20 at 14:15 -0500, Benjamin Marzinski wrote:
> So, it turns out that commit 4ef67017 "libmultipath: add
> update_pathvec_from_dm()" already does most of the hard work of making
> multipath handle the uninitialized paths that exist during boot, after
> the switch root, but before the all the coldplug uevents have been
> processed. The only problem is that this code leaves the paths in a
> partially initialized state, which doesn't completely get corrected
> until
> a reconfigure happens. 
> 
> This patchset makes multipath track these partially initailized paths,
> and makes sure that they get fully initialized, or removed, as
> necessary.
> 
> The first patch is a tangentially related bug I found when trying
> (unsuccessfully) to recreate multipathd's problem with dropping
> uninitialized paths. Multipathd was not removing completely missing
> paths from maps that it found while starting up. The solution I chose
> was just to make sure that a full reload will happen on these maps,
> even
> if a weak reconfigure was requested. However, this means that multipath
> still completely ignores these missing paths. A side effect of this is
> that "multipath -l" does not show these paths, even though they exist
> as
> part of the multipath table. The full reloads are necessary, to take
> care of issues that update_pathvec_from_dm() can run into, but we might
> also want to think about marking these missing paths as INIT_REMOVED,
> instead of not adding them at all. This would make "multipath -l" still
> show them, until they actually get removed from the table.

Yes, that might be a good thing. Completely missing paths (not existing
in sysfs) in a map represent a dangerous condition; it would be good if
multipath -l was able to tell the user about it.

> 
> Patch 0005 makes sure to fully initialize the paths when their coldplug
> event occurs, but if the path is already fully initialized in udev, but
> multipathd finds out about it from update_pathvec_from_dm(), multipath
> doesn't do anything to finish initializing the path and moving it to
> INIT_OK, besides waiting for a uevent or a reconfigure. This is
> probably
> fine, since the only way I can see for a path to be in this state is
> for
> someone to manually remove the path from multipathd monitoring. But if
> I'm missing some other way that paths could end up in this state, we
> could try proactively finishing the initalization of INIT_PARTIAL paths
> that have all their udev information.

One option would be to try finishing the initialization in the path
checker.

What about the checker, in general?  Should we take some care that
partially initialized paths aren't mistakenly set as failed? I'm not
sure if libudev is able to return a proper fd from
udev_device_get_devnode() if the device isn't yet initialized in the
udev db. Without a proper fd, the checker is doomed to fail. And other
failure modes are possible too without proper udev initialization, I
suppose?


> 
> I'm also kind of on the fence about patch 0006. There is no reason
> why
> we have to remove INIT_PARTIAL paths if the multipath device goes
> away.
> But if a path is in INIT_PARTIAL for long enough that the multipath
> device gets removed, then it's likely not a path we want to be
> monitoring, and if a uevent occurs, we'll add it back, anyway. Also,
> knowing that INIT_PARTIAL paths are always part of multipath devices
> does make the code simpler.
> 
> I've tested these patches both by rebooting with necessary and
> unnecessary multipath devices in the initramfs and multipathd.service
> set to make multipathd start up at various points after the switch
> root,
> and by manually sending remove uevents to unintialize some paths, and
> then starting multipathd to test specific combinations of path
> states.
> But more testing is always welcome.

I'll try to give this code a test shortly.

Cheers,
Martin


> 
> Benjamin Marzinski (7):
>   multipathd: remove missing paths on startup
>   libmultipath: skip unneeded steps to get path name
>   libmultipath: don't use fallback wwid in update_pathvec_from_dm
>   libmultipath: always set INIT_REMOVED in set_path_removed
>   multipathd: fully initialize paths added by update_pathvec_from_dm
>   multipathd: remove INIT_PARTIAL paths that aren't in a multipath
>     device
>   multipathd: Remove dependency on systemd-udev-settle.service
> 
>  libmultipath/configure.c      |  2 ++
>  libmultipath/devmapper.c      |  2 ++
>  libmultipath/discovery.c      |  7 ++---
>  libmultipath/discovery.h      |  2 ++
>  libmultipath/structs.h        |  7 +++++
>  libmultipath/structs_vec.c    | 40 ++++++++++++++++-------------
>  multipathd/cli_handlers.c     |  4 +++
>  multipathd/main.c             | 48 ++++++++++++++++++++++++++++++++-
> --
>  multipathd/main.h             |  1 +
>  multipathd/multipathd.service |  3 +--
>  10 files changed, 90 insertions(+), 26 deletions(-)
> 


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [dm-devel] [PATCH 6/7] multipathd: remove INIT_PARTIAL paths that aren't in a multipath device
  2021-10-20 19:15 ` [dm-devel] [PATCH 6/7] multipathd: remove INIT_PARTIAL paths that aren't in a multipath device Benjamin Marzinski
@ 2021-11-04 22:01   ` Martin Wilck
  0 siblings, 0 replies; 25+ messages in thread
From: Martin Wilck @ 2021-11-04 22:01 UTC (permalink / raw)
  To: bmarzins, christophe.varoqui; +Cc: dm-devel

On Wed, 2021-10-20 at 14:15 -0500, Benjamin Marzinski wrote:
> The only reason multipath is monitoring an INIT_PARTIAL path is
> because
> it was discovered in a multipath device table. If it stops being part
> of a multipath device before it gets fully initialized, drop it.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>

Reviewed-by: Martin Wilck <mwilck@suse.com>



--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [dm-devel] [PATCH 5/7] multipathd: fully initialize paths added by update_pathvec_from_dm
  2021-10-20 19:15 ` [dm-devel] [PATCH 5/7] multipathd: fully initialize paths added by update_pathvec_from_dm Benjamin Marzinski
@ 2021-11-04 22:10   ` Martin Wilck
  2021-11-05 10:55     ` Martin Wilck
  2021-11-05 21:03     ` Benjamin Marzinski
  0 siblings, 2 replies; 25+ messages in thread
From: Martin Wilck @ 2021-11-04 22:10 UTC (permalink / raw)
  To: bmarzins, christophe.varoqui; +Cc: dm-devel

On Wed, 2021-10-20 at 14:15 -0500, Benjamin Marzinski wrote:
> When paths are added by update_pathvec_from_dm(), udev may not have
> initialized them. This means that it's possible that they are
> supposed
> to be blacklisted by udev properties, but weren't.  Also, in order to
> avoid doing potentially stalling IO, update_pathvec_from_dm() doesn't
> get all the path information in pathinfo(). These paths end up in the
> unexpected state of INIT_MISSING_UDEV or INIT_NEW, but with their mpp
> and wwid set.
> 
> If udev has already initialized the path, but multipathed wasn't
> monitoring it, the blacklist checks and wwid determination in
> update_pathvec_from_dm() will work correctly, so paths added by it
> are
> safe, but not completely initialized. The most likely reason why this
> would happen is if the path was manually removed from multipathd
> monitoring with "multipathd del path".

Not quite getting this - normally the path would be removed from maps,
too. Are you referring to the REMOVE_PATH_DELAY case?


>  The most common time when
> uninitialized paths would in be part of multipath devices is during
> boot, after the pivot root, but before the udev coldplug happens.
> These
> paths are not necessarily safe. It's possible that
> /etc/multipath.conf
> in the initramfs and regular filesystem differ, and they should now
> be
> either blacklisted by udev_property, or have a different wwid.
> However
> an "add" event should appear for them shortly.
> 
> Multipath now has a new state to deal with these devices,
> INIT_PARTIAL.
> Devices in this state are treated mostly like INIT_OK devices, but
> when "multipathd add path" is called or an add/change uevent happens
> on these devices, multipathd will finish initializing them, and
> remove
> them if necessary.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>

Nice. Somehow in my mind the issue always look much more complex.
I like this, but I want to give it some testing before I finally ack
it.

Regards
Martin




> ---
>  libmultipath/structs.h     |  6 +++++
>  libmultipath/structs_vec.c |  5 ++--
>  multipathd/cli_handlers.c  |  4 ++++
>  multipathd/main.c          | 48 ++++++++++++++++++++++++++++++++++--
> --
>  multipathd/main.h          |  1 +
>  5 files changed, 58 insertions(+), 6 deletions(-)
> 
> diff --git a/libmultipath/structs.h b/libmultipath/structs.h
> index d0b266b7..69409fd4 100644
> --- a/libmultipath/structs.h
> +++ b/libmultipath/structs.h
> @@ -200,6 +200,12 @@ enum initialized_states {
>          * mapped by some multipath map because of map reload
> failure.
>          */
>         INIT_REMOVED,
> +       /*
> +        * INIT_PARTIAL: paths added by update_pathvec_from_dm() will
> not
> +        * be fully initialized. This will be handled when an add or
> +        * change uevent is received.
> +        */
> +       INIT_PARTIAL,
>  };
>  
>  enum prkey_sources {
> diff --git a/libmultipath/structs_vec.c b/libmultipath/structs_vec.c
> index fb26437a..1de9175e 100644
> --- a/libmultipath/structs_vec.c
> +++ b/libmultipath/structs_vec.c
> @@ -194,6 +194,7 @@ bool update_pathvec_from_dm(vector pathvec,
> struct multipath *mpp,
>                                         }
>                                         condlog(2, "%s: adding new
> path %s",
>                                                 mpp->alias, pp->dev);
> +                                       pp->initialized =
> INIT_PARTIAL;
>                                         store_path(pathvec, pp);
>                                         pp->tick = 1;
>                                 }
> @@ -392,12 +393,12 @@ extract_hwe_from_path(struct multipath * mpp)
>         condlog(4, "%s: searching paths for valid hwe", mpp->alias);
>         /* doing this in two passes seems like paranoia to me */
>         vector_foreach_slot(mpp->paths, pp, i) {
> -               if (pp->state == PATH_UP &&
> +               if (pp->state == PATH_UP && pp->initialized !=
> INIT_PARTIAL &&
>                     pp->initialized != INIT_REMOVED && pp->hwe)
>                         goto done;
>         }
>         vector_foreach_slot(mpp->paths, pp, i) {
> -               if (pp->state != PATH_UP &&
> +               if ((pp->state != PATH_UP || pp->initialized ==
> INIT_PARTIAL) &&
>                     pp->initialized != INIT_REMOVED && pp->hwe)
>                         goto done;
>         }
> diff --git a/multipathd/cli_handlers.c b/multipathd/cli_handlers.c
> index 58b9916c..8d37431e 100644
> --- a/multipathd/cli_handlers.c
> +++ b/multipathd/cli_handlers.c
> @@ -526,6 +526,10 @@ cli_add_path (void *v, struct strbuf *reply,
> void *data)
>                         return 1;
>                 }
>  
> +               if (pp->initialized == INIT_PARTIAL &&
> +                   finish_path_init(pp, vecs) < 0)
> +                       return 1;
> +
>                 if (pp->mpp)
>                         return 0;
>         } else if (pp) {
> diff --git a/multipathd/main.c b/multipathd/main.c
> index cc4a4a5d..4b8e2cde 100644
> --- a/multipathd/main.c
> +++ b/multipathd/main.c
> @@ -976,12 +976,19 @@ check_path_wwid_change(struct path *pp)
>         return false;
>  }
>  
> +/*
> + * uev_add_path can call uev_update_path, and uev_update_path can
> call
> + * uev_add_path
> + */
> +static int uev_update_path (struct uevent *uev, struct vectors *
> vecs);
> +
>  static int
>  uev_add_path (struct uevent *uev, struct vectors * vecs, int
> need_do_map)
>  {
>         struct path *pp;
>         int ret = 0, i;
>         struct config *conf;
> +       bool partial_init = false;
>  
>         condlog(3, "%s: add path (uevent)", uev->kernel);
>         if (strstr(uev->kernel, "..") != NULL) {
> @@ -1000,7 +1007,10 @@ uev_add_path (struct uevent *uev, struct
> vectors * vecs, int need_do_map)
>                 int r;
>                 struct multipath *prev_mpp = NULL;
>  
> -               if (pp->initialized == INIT_REMOVED) {
> +               if (pp->initialized == INIT_PARTIAL) {
> +                       partial_init = true;
> +                       goto out;
> +               } else if (pp->initialized == INIT_REMOVED) {
>                         condlog(3, "%s: re-adding removed path", pp-
> >dev);
>                         pp->initialized = INIT_NEW;
>                         prev_mpp = pp->mpp;
> @@ -1110,6 +1120,8 @@ uev_add_path (struct uevent *uev, struct
> vectors * vecs, int need_do_map)
>         }
>  out:
>         lock_cleanup_pop(vecs->lock);
> +       if (partial_init)
> +               return uev_update_path(uev, vecs);
>         return ret;
>  }
>  
> @@ -1405,6 +1417,25 @@ fail:
>         return REMOVE_PATH_MAP_ERROR;
>  }
>  
> +int
> +finish_path_init(struct path *pp, struct vectors * vecs)
> +{
> +       int r;
> +       struct config *conf;
> +
> +       conf = get_multipath_config();
> +       pthread_cleanup_push(put_multipath_config, conf);
> +       r = pathinfo(pp, conf, DI_ALL|DI_BLACKLIST);
> +       pthread_cleanup_pop(1);
> +
> +       if (r == PATHINFO_OK)
> +               return 0;
> +
> +       condlog(0, "%s: error fully initializing path, removing", pp-
> >dev);
> +       ev_remove_path(pp, vecs, 1);
> +       return -1;
> +}
> +
>  static int
>  uev_update_path (struct uevent *uev, struct vectors * vecs)
>  {
> @@ -1443,7 +1474,7 @@ uev_update_path (struct uevent *uev, struct
> vectors * vecs)
>                 }
>                 /* Don't deal with other types of failed
> initialization
>                  * now. check_path will handle it */
> -               if (!strlen(pp->wwid))
> +               if (!strlen(pp->wwid) && pp->initialized !=
> INIT_PARTIAL)
>                         goto out;
>  
>                 strcpy(wwid, pp->wwid);
> @@ -1451,12 +1482,20 @@ uev_update_path (struct uevent *uev, struct
> vectors * vecs)
>  
>                 if (rc != 0)
>                         strcpy(pp->wwid, wwid);
> -               else if (strncmp(wwid, pp->wwid, WWID_SIZE) != 0) {
> +               else if (strlen(wwid) &&
> +                        strncmp(wwid, pp->wwid, WWID_SIZE) != 0) {
>                         condlog(0, "%s: path wwid changed from '%s'
> to '%s'",
>                                 uev->kernel, wwid, pp->wwid);
>                         ev_remove_path(pp, vecs, 1);
>                         needs_reinit = 1;
>                         goto out;
> +               } else if (pp->initialized == INIT_PARTIAL) {
> +                       udev_device_unref(pp->udev);
> +                       pp->udev = udev_device_ref(uev->udev);
> +                       if (finish_path_init(pp, vecs) < 0) {
> +                               retval = 1;
> +                               goto out;
> +                       }
>                 } else {
>                         udev_device_unref(pp->udev);
>                         pp->udev = udev_device_ref(uev->udev);
> @@ -1507,6 +1546,7 @@ out:
>  
>                 condlog(0, "%s: spurious uevent, path not found",
> uev->kernel);
>         }
> +       /* pp->initalized must not be INIT_PARTIAL if needs_reinit is
> set */

Did you mean "cannot" here? At least for the "wwid changed" case this
is subtle, as it's set to INIT_REMOVED in ev_remove_path().

>         if (needs_reinit)
>                 retval = uev_add_path(uev, vecs, 1);
>         return retval;
> @@ -2116,7 +2156,7 @@ check_path (struct vectors * vecs, struct path
> * pp, unsigned int ticks)
>         int marginal_pathgroups, marginal_changed = 0;
>         int ret;
>  
> -       if (((pp->initialized == INIT_OK ||
> +       if (((pp->initialized == INIT_OK || pp->initialized ==
> INIT_PARTIAL ||
>               pp->initialized == INIT_REQUESTED_UDEV) && !pp->mpp) ||
>             pp->initialized == INIT_REMOVED)
>                 return 0;
> diff --git a/multipathd/main.h b/multipathd/main.h
> index c8a1ce92..4acd1b8c 100644
> --- a/multipathd/main.h
> +++ b/multipathd/main.h
> @@ -66,4 +66,5 @@ int reload_and_sync_map(struct multipath *mpp,
> struct vectors *vecs,
>  
>  void handle_path_wwid_change(struct path *pp, struct vectors *vecs);
>  bool check_path_wwid_change(struct path *pp);
> +int finish_path_init(struct path *pp, struct vectors * vecs);
>  #endif /* MAIN_H */


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [dm-devel] [PATCH 1/7] multipathd: remove missing paths on startup
  2021-10-20 19:15 ` [dm-devel] [PATCH 1/7] multipathd: remove missing paths on startup Benjamin Marzinski
@ 2021-11-04 22:12   ` Martin Wilck
  0 siblings, 0 replies; 25+ messages in thread
From: Martin Wilck @ 2021-11-04 22:12 UTC (permalink / raw)
  To: bmarzins, christophe.varoqui; +Cc: dm-devel

On Wed, 2021-10-20 at 14:15 -0500, Benjamin Marzinski wrote:
> If a path device was removed from the system while multipathd was not
> running, multipathd would not remove the path from the multipath
> table
> on start-up, or on a weak reconfigure. update_pathvec_from_dm() would
> return that a reload was necessary, but that information wasn't
> propigated back to where it could be used to reload the device.
> 
> Multipath devices now remember if they need to be reloaded, and if
> so,
> force_reload is set in select_action().  This means that even when
> configure is called with FORCE_RELOAD_WEAK, these devices will still
> be
> reloaded.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>

Reviewed-by: Martin Wilck <mwilck@suse.com>

(I found the subject of this patch somewhat disconcerting, because it
describes what your patch set is supposed to prevent - mistakenly
deleting paths. But I understand you meant *completely* missing paths,
i.e. paths that aren't present in sysfs).

> ---
>  libmultipath/configure.c   | 2 ++
>  libmultipath/devmapper.c   | 2 ++
>  libmultipath/structs.h     | 1 +
>  libmultipath/structs_vec.c | 1 +
>  4 files changed, 6 insertions(+)
> 
> diff --git a/libmultipath/configure.c b/libmultipath/configure.c
> index eb8ec1bd..f1a890af 100644
> --- a/libmultipath/configure.c
> +++ b/libmultipath/configure.c
> @@ -715,6 +715,8 @@ void select_action (struct multipath *mpp, const
> struct _vector *curmp,
>  
>         cmpp = find_mp_by_wwid(curmp, mpp->wwid);
>         cmpp_by_name = find_mp_by_alias(curmp, mpp->alias);
> +       if (mpp->need_reload || (cmpp && cmpp->need_reload))
> +               force_reload = 1;
>  
>         if (!cmpp_by_name) {
>                 if (cmpp) {
> diff --git a/libmultipath/devmapper.c b/libmultipath/devmapper.c
> index c05dc201..3e1a7260 100644
> --- a/libmultipath/devmapper.c
> +++ b/libmultipath/devmapper.c
> @@ -522,6 +522,8 @@ freeout:
>  addout:
>         dm_task_destroy (dmt);
>  
> +       if (r)
> +               mpp->need_reload = false;
>         return r;
>  }
>  
> diff --git a/libmultipath/structs.h b/libmultipath/structs.h
> index 399540e7..d0b266b7 100644
> --- a/libmultipath/structs.h
> +++ b/libmultipath/structs.h
> @@ -355,6 +355,7 @@ struct multipath {
>         int retain_hwhandler;
>         int deferred_remove;
>         bool in_recovery;
> +       bool need_reload;
>         int san_path_err_threshold;
>         int san_path_err_forget_rate;
>         int san_path_err_recovery_time;
> diff --git a/libmultipath/structs_vec.c b/libmultipath/structs_vec.c
> index 85d97ac1..e52db0c4 100644
> --- a/libmultipath/structs_vec.c
> +++ b/libmultipath/structs_vec.c
> @@ -237,6 +237,7 @@ bool update_pathvec_from_dm(vector pathvec,
> struct multipath *mpp,
>                 free_pathgroup(pgp, KEEP_PATHS);
>                 must_reload = true;
>         }
> +       mpp->need_reload = mpp->need_reload || must_reload;
>         return must_reload;
>  }
>  


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [dm-devel] [PATCH 7/7] multipathd: Remove dependency on systemd-udev-settle.service
  2021-10-20 19:15 ` [dm-devel] [PATCH 7/7] multipathd: Remove dependency on systemd-udev-settle.service Benjamin Marzinski
@ 2021-11-04 22:17   ` Martin Wilck
  0 siblings, 0 replies; 25+ messages in thread
From: Martin Wilck @ 2021-11-04 22:17 UTC (permalink / raw)
  To: bmarzins, christophe.varoqui; +Cc: dm-devel

On Wed, 2021-10-20 at 14:15 -0500, Benjamin Marzinski wrote:
> multipathd can now handle starting up with incompletely initialized
> paths, so it no longer needs to wait for the device Coldplug to
> complete. However multipathd may need to write to /etc (for the wwids
> and bindings files), so in needs to wait for the root filesystem to
> be remounted read/write before starting.

Side note: This reminds me of the fact that daemons writing to /etc are
deprecated these days. One day we should move the default location of
the bindings file to /var/lib/multipath or the like.

> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>

Reviewed-by: Martin Wilck <mwilck@suse.com>



--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [dm-devel] [PATCH 5/7] multipathd: fully initialize paths added by update_pathvec_from_dm
  2021-11-04 22:10   ` Martin Wilck
@ 2021-11-05 10:55     ` Martin Wilck
  2021-11-05 21:49       ` Benjamin Marzinski
  2021-11-05 21:03     ` Benjamin Marzinski
  1 sibling, 1 reply; 25+ messages in thread
From: Martin Wilck @ 2021-11-05 10:55 UTC (permalink / raw)
  To: bmarzins, christophe.varoqui; +Cc: dm-devel

Hi Ben,

On Thu, 2021-11-04 at 23:10 +0100, Martin Wilck wrote:
> On Wed, 2021-10-20 at 14:15 -0500, Benjamin Marzinski wrote:
> > ...
> 
> > 
> > Multipath now has a new state to deal with these devices,
> > INIT_PARTIAL.
> > Devices in this state are treated mostly like INIT_OK devices, but
> > when "multipathd add path" is called or an add/change uevent
> > happens
> > on these devices, multipathd will finish initializing them, and
> > remove
> > them if necessary.
> > 
> > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> 
> Nice. Somehow in my mind the issue always look much more complex.
> I like this, but I want to give it some testing before I finally ack
> it.

I noted that running "multipathd add path $path" for a path in
INIT_PARTIAL state changes this paths's init state to INIT_OK, even if
the udev still has no information about it (*).

The reason is that pp->wwid is set, and thus pathinfo() won't even try
to retrieve the WWID, although for INIT_PARTIAL paths the origin of the
WWID is not 100% trustworthy (being just copied from pp->mpp-
>wwid). pathinfo() will return PATHINFO_OK without having retrieved the
WWID. 

I suppose we could apply a similar logic as in uev_update_path() here,
clearing pp->wwid before calling pathinfo(). If udev was still unaware
of the path, this would mean a transition from INIT_PARTIAL to
INIT_MISSING_UDEV. OTOH, we'd now need to be prepared to have to remove
the path from the map if the WWID doesn't match after the call to
pathinfo(). This means we'd basically need to reimplement the "changed
WWID" logic from uev_update_path(), and thus we'd need to unify both
code paths.

In general, the difference between INIT_MISSING_UDEV and INIT_PARTIAL
is subtle. Correct me if I'm wrong, but INIT_PARTIAL means "udev has no
information about this device" and INIT_MISSING_UDEV currently means
"we weren't able to figure out a WWID for the path" (meaning that
INIT_MISSING_UDEV is a misnomer - when fallback are allowed,
INIT_MISSING_UDEV is actually independent of the device's state in the
udev db. We should rename this to INIT_MISSING_WWID, perhaps). The
semantics of the two states are very different though:
INIT_MISSING_UDEV will trigger an attempt to regenerate an uevent,
whereas INIT_PARTIAL will just stick passively. IMO it'd make sense to
trigger an uevent in the INIT_PARTIAL case, too, albeit perhaps not
immediately (after the default uevent timeout - 180s?). 

Also, we currently don't track the udev state of devices cleanly. We
set INIT_MISSING_UDEV if we can't obtain uid_attribute, which doesn't
necessarily mean that udev is unaware of the device. I believe we
should e.g. check the USEC_INITIALIZED property - it is non-NULL if and
only if the device is present in the udev db. If uid_attribute isn't
set regardless, we could conclude that the udev rules just don't set
it. It might make sense to retrigger *one* uevent in that case, but no
more.

IMO we need a clear definition of the INIT_xyz states and their
transitions. We won't get along with intuitive logic any more.

Cheers,
Martin

(*) my test procedure is simply to delete the paths' files from
/run/udev/data and to restart multipathd.


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [dm-devel] [PATCH 0/7] multipathd: remove udev settle dependency
  2021-11-04 22:00 ` [dm-devel] [PATCH 0/7] multipathd: remove udev settle dependency Martin Wilck
@ 2021-11-05 20:49   ` Benjamin Marzinski
  0 siblings, 0 replies; 25+ messages in thread
From: Benjamin Marzinski @ 2021-11-05 20:49 UTC (permalink / raw)
  To: Martin Wilck; +Cc: dm-devel

On Thu, Nov 04, 2021 at 10:00:11PM +0000, Martin Wilck wrote:
> On Wed, 2021-10-20 at 14:15 -0500, Benjamin Marzinski wrote:
> > So, it turns out that commit 4ef67017 "libmultipath: add
> > update_pathvec_from_dm()" already does most of the hard work of making
> > multipath handle the uninitialized paths that exist during boot, after
> > the switch root, but before the all the coldplug uevents have been
> > processed. The only problem is that this code leaves the paths in a
> > partially initialized state, which doesn't completely get corrected
> > until
> > a reconfigure happens. 
> > 
> > This patchset makes multipath track these partially initailized paths,
> > and makes sure that they get fully initialized, or removed, as
> > necessary.
> > 
> > The first patch is a tangentially related bug I found when trying
> > (unsuccessfully) to recreate multipathd's problem with dropping
> > uninitialized paths. Multipathd was not removing completely missing
> > paths from maps that it found while starting up. The solution I chose
> > was just to make sure that a full reload will happen on these maps,
> > even
> > if a weak reconfigure was requested. However, this means that multipath
> > still completely ignores these missing paths. A side effect of this is
> > that "multipath -l" does not show these paths, even though they exist
> > as
> > part of the multipath table. The full reloads are necessary, to take
> > care of issues that update_pathvec_from_dm() can run into, but we might
> > also want to think about marking these missing paths as INIT_REMOVED,
> > instead of not adding them at all. This would make "multipath -l" still
> > show them, until they actually get removed from the table.
> 
> Yes, that might be a good thing. Completely missing paths (not existing
> in sysfs) in a map represent a dangerous condition; it would be good if
> multipath -l was able to tell the user about it.

Sure, but I can do that as a separate patch?
 
> > 
> > Patch 0005 makes sure to fully initialize the paths when their coldplug
> > event occurs, but if the path is already fully initialized in udev, but
> > multipathd finds out about it from update_pathvec_from_dm(), multipath
> > doesn't do anything to finish initializing the path and moving it to
> > INIT_OK, besides waiting for a uevent or a reconfigure. This is
> > probably
> > fine, since the only way I can see for a path to be in this state is
> > for
> > someone to manually remove the path from multipathd monitoring. But if
> > I'm missing some other way that paths could end up in this state, we
> > could try proactively finishing the initalization of INIT_PARTIAL paths
> > that have all their udev information.
> 
> One option would be to try finishing the initialization in the path
> checker.

Yep. Something like that is what I was thinking.  But I'm still trying
to come up with a way that paths could get into this state without
someone running something like:

# multipathd del path sda
# mutipath

And I'm not sure how much code we want to add just to handle something
contrived like this.

> What about the checker, in general?  Should we take some care that
> partially initialized paths aren't mistakenly set as failed? I'm not
> sure if libudev is able to return a proper fd from
> udev_device_get_devnode() if the device isn't yet initialized in the
> udev db. Without a proper fd, the checker is doomed to fail. And other
> failure modes are possible too without proper udev initialization, I
> suppose?
> 

Since this is a case where the device nodes do exist, since they were
already made into multipath devices, I didn't run into any problems with
checking INIT_PARTIAL paths.

-Ben

> > 
> > I'm also kind of on the fence about patch 0006. There is no reason
> > why
> > we have to remove INIT_PARTIAL paths if the multipath device goes
> > away.
> > But if a path is in INIT_PARTIAL for long enough that the multipath
> > device gets removed, then it's likely not a path we want to be
> > monitoring, and if a uevent occurs, we'll add it back, anyway. Also,
> > knowing that INIT_PARTIAL paths are always part of multipath devices
> > does make the code simpler.
> > 
> > I've tested these patches both by rebooting with necessary and
> > unnecessary multipath devices in the initramfs and multipathd.service
> > set to make multipathd start up at various points after the switch
> > root,
> > and by manually sending remove uevents to unintialize some paths, and
> > then starting multipathd to test specific combinations of path
> > states.
> > But more testing is always welcome.
> 
> I'll try to give this code a test shortly.
> 
> Cheers,
> Martin
> 
> 
> > 
> > Benjamin Marzinski (7):
> >   multipathd: remove missing paths on startup
> >   libmultipath: skip unneeded steps to get path name
> >   libmultipath: don't use fallback wwid in update_pathvec_from_dm
> >   libmultipath: always set INIT_REMOVED in set_path_removed
> >   multipathd: fully initialize paths added by update_pathvec_from_dm
> >   multipathd: remove INIT_PARTIAL paths that aren't in a multipath
> >     device
> >   multipathd: Remove dependency on systemd-udev-settle.service
> > 
> >  libmultipath/configure.c      |  2 ++
> >  libmultipath/devmapper.c      |  2 ++
> >  libmultipath/discovery.c      |  7 ++---
> >  libmultipath/discovery.h      |  2 ++
> >  libmultipath/structs.h        |  7 +++++
> >  libmultipath/structs_vec.c    | 40 ++++++++++++++++-------------
> >  multipathd/cli_handlers.c     |  4 +++
> >  multipathd/main.c             | 48 ++++++++++++++++++++++++++++++++-
> > --
> >  multipathd/main.h             |  1 +
> >  multipathd/multipathd.service |  3 +--
> >  10 files changed, 90 insertions(+), 26 deletions(-)
> > 

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [dm-devel] [PATCH 5/7] multipathd: fully initialize paths added by update_pathvec_from_dm
  2021-11-04 22:10   ` Martin Wilck
  2021-11-05 10:55     ` Martin Wilck
@ 2021-11-05 21:03     ` Benjamin Marzinski
  1 sibling, 0 replies; 25+ messages in thread
From: Benjamin Marzinski @ 2021-11-05 21:03 UTC (permalink / raw)
  To: Martin Wilck; +Cc: dm-devel

On Thu, Nov 04, 2021 at 10:10:18PM +0000, Martin Wilck wrote:
> On Wed, 2021-10-20 at 14:15 -0500, Benjamin Marzinski wrote:
> > When paths are added by update_pathvec_from_dm(), udev may not have
> > initialized them. This means that it's possible that they are
> > supposed
> > to be blacklisted by udev properties, but weren't.  Also, in order to
> > avoid doing potentially stalling IO, update_pathvec_from_dm() doesn't
> > get all the path information in pathinfo(). These paths end up in the
> > unexpected state of INIT_MISSING_UDEV or INIT_NEW, but with their mpp
> > and wwid set.
> > 
> > If udev has already initialized the path, but multipathed wasn't
> > monitoring it, the blacklist checks and wwid determination in
> > update_pathvec_from_dm() will work correctly, so paths added by it
> > are
> > safe, but not completely initialized. The most likely reason why this
> > would happen is if the path was manually removed from multipathd
> > monitoring with "multipathd del path".
> 
> Not quite getting this - normally the path would be removed from maps,
> too. Are you referring to the REMOVE_PATH_DELAY case?

Yeah, but if the user then runs "multipath", it will add that no longer
tracked path back to the multipath device. Like I said in 0/7, I'm not
sure that how much we need to proatively try to initalize these paths,
or if we can just ignore them on the grounds that sometimes when people
shoot themselves in the foot, it hurts.

-Ben

> 
> 
> >  The most common time when
> > uninitialized paths would in be part of multipath devices is during
> > boot, after the pivot root, but before the udev coldplug happens.
> > These
> > paths are not necessarily safe. It's possible that
> > /etc/multipath.conf
> > in the initramfs and regular filesystem differ, and they should now
> > be
> > either blacklisted by udev_property, or have a different wwid.
> > However
> > an "add" event should appear for them shortly.
> > 
> > Multipath now has a new state to deal with these devices,
> > INIT_PARTIAL.
> > Devices in this state are treated mostly like INIT_OK devices, but
> > when "multipathd add path" is called or an add/change uevent happens
> > on these devices, multipathd will finish initializing them, and
> > remove
> > them if necessary.
> > 
> > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> 
> Nice. Somehow in my mind the issue always look much more complex.
> I like this, but I want to give it some testing before I finally ack
> it.
> 
> Regards
> Martin
> 
> 
> 
> 
> > ---
> >  libmultipath/structs.h     |  6 +++++
> >  libmultipath/structs_vec.c |  5 ++--
> >  multipathd/cli_handlers.c  |  4 ++++
> >  multipathd/main.c          | 48 ++++++++++++++++++++++++++++++++++--
> > --
> >  multipathd/main.h          |  1 +
> >  5 files changed, 58 insertions(+), 6 deletions(-)
> > 
> > diff --git a/libmultipath/structs.h b/libmultipath/structs.h
> > index d0b266b7..69409fd4 100644
> > --- a/libmultipath/structs.h
> > +++ b/libmultipath/structs.h
> > @@ -200,6 +200,12 @@ enum initialized_states {
> >          * mapped by some multipath map because of map reload
> > failure.
> >          */
> >         INIT_REMOVED,
> > +       /*
> > +        * INIT_PARTIAL: paths added by update_pathvec_from_dm() will
> > not
> > +        * be fully initialized. This will be handled when an add or
> > +        * change uevent is received.
> > +        */
> > +       INIT_PARTIAL,
> >  };
> >  
> >  enum prkey_sources {
> > diff --git a/libmultipath/structs_vec.c b/libmultipath/structs_vec.c
> > index fb26437a..1de9175e 100644
> > --- a/libmultipath/structs_vec.c
> > +++ b/libmultipath/structs_vec.c
> > @@ -194,6 +194,7 @@ bool update_pathvec_from_dm(vector pathvec,
> > struct multipath *mpp,
> >                                         }
> >                                         condlog(2, "%s: adding new
> > path %s",
> >                                                 mpp->alias, pp->dev);
> > +                                       pp->initialized =
> > INIT_PARTIAL;
> >                                         store_path(pathvec, pp);
> >                                         pp->tick = 1;
> >                                 }
> > @@ -392,12 +393,12 @@ extract_hwe_from_path(struct multipath * mpp)
> >         condlog(4, "%s: searching paths for valid hwe", mpp->alias);
> >         /* doing this in two passes seems like paranoia to me */
> >         vector_foreach_slot(mpp->paths, pp, i) {
> > -               if (pp->state == PATH_UP &&
> > +               if (pp->state == PATH_UP && pp->initialized !=
> > INIT_PARTIAL &&
> >                     pp->initialized != INIT_REMOVED && pp->hwe)
> >                         goto done;
> >         }
> >         vector_foreach_slot(mpp->paths, pp, i) {
> > -               if (pp->state != PATH_UP &&
> > +               if ((pp->state != PATH_UP || pp->initialized ==
> > INIT_PARTIAL) &&
> >                     pp->initialized != INIT_REMOVED && pp->hwe)
> >                         goto done;
> >         }
> > diff --git a/multipathd/cli_handlers.c b/multipathd/cli_handlers.c
> > index 58b9916c..8d37431e 100644
> > --- a/multipathd/cli_handlers.c
> > +++ b/multipathd/cli_handlers.c
> > @@ -526,6 +526,10 @@ cli_add_path (void *v, struct strbuf *reply,
> > void *data)
> >                         return 1;
> >                 }
> >  
> > +               if (pp->initialized == INIT_PARTIAL &&
> > +                   finish_path_init(pp, vecs) < 0)
> > +                       return 1;
> > +
> >                 if (pp->mpp)
> >                         return 0;
> >         } else if (pp) {
> > diff --git a/multipathd/main.c b/multipathd/main.c
> > index cc4a4a5d..4b8e2cde 100644
> > --- a/multipathd/main.c
> > +++ b/multipathd/main.c
> > @@ -976,12 +976,19 @@ check_path_wwid_change(struct path *pp)
> >         return false;
> >  }
> >  
> > +/*
> > + * uev_add_path can call uev_update_path, and uev_update_path can
> > call
> > + * uev_add_path
> > + */
> > +static int uev_update_path (struct uevent *uev, struct vectors *
> > vecs);
> > +
> >  static int
> >  uev_add_path (struct uevent *uev, struct vectors * vecs, int
> > need_do_map)
> >  {
> >         struct path *pp;
> >         int ret = 0, i;
> >         struct config *conf;
> > +       bool partial_init = false;
> >  
> >         condlog(3, "%s: add path (uevent)", uev->kernel);
> >         if (strstr(uev->kernel, "..") != NULL) {
> > @@ -1000,7 +1007,10 @@ uev_add_path (struct uevent *uev, struct
> > vectors * vecs, int need_do_map)
> >                 int r;
> >                 struct multipath *prev_mpp = NULL;
> >  
> > -               if (pp->initialized == INIT_REMOVED) {
> > +               if (pp->initialized == INIT_PARTIAL) {
> > +                       partial_init = true;
> > +                       goto out;
> > +               } else if (pp->initialized == INIT_REMOVED) {
> >                         condlog(3, "%s: re-adding removed path", pp-
> > >dev);
> >                         pp->initialized = INIT_NEW;
> >                         prev_mpp = pp->mpp;
> > @@ -1110,6 +1120,8 @@ uev_add_path (struct uevent *uev, struct
> > vectors * vecs, int need_do_map)
> >         }
> >  out:
> >         lock_cleanup_pop(vecs->lock);
> > +       if (partial_init)
> > +               return uev_update_path(uev, vecs);
> >         return ret;
> >  }
> >  
> > @@ -1405,6 +1417,25 @@ fail:
> >         return REMOVE_PATH_MAP_ERROR;
> >  }
> >  
> > +int
> > +finish_path_init(struct path *pp, struct vectors * vecs)
> > +{
> > +       int r;
> > +       struct config *conf;
> > +
> > +       conf = get_multipath_config();
> > +       pthread_cleanup_push(put_multipath_config, conf);
> > +       r = pathinfo(pp, conf, DI_ALL|DI_BLACKLIST);
> > +       pthread_cleanup_pop(1);
> > +
> > +       if (r == PATHINFO_OK)
> > +               return 0;
> > +
> > +       condlog(0, "%s: error fully initializing path, removing", pp-
> > >dev);
> > +       ev_remove_path(pp, vecs, 1);
> > +       return -1;
> > +}
> > +
> >  static int
> >  uev_update_path (struct uevent *uev, struct vectors * vecs)
> >  {
> > @@ -1443,7 +1474,7 @@ uev_update_path (struct uevent *uev, struct
> > vectors * vecs)
> >                 }
> >                 /* Don't deal with other types of failed
> > initialization
> >                  * now. check_path will handle it */
> > -               if (!strlen(pp->wwid))
> > +               if (!strlen(pp->wwid) && pp->initialized !=
> > INIT_PARTIAL)
> >                         goto out;
> >  
> >                 strcpy(wwid, pp->wwid);
> > @@ -1451,12 +1482,20 @@ uev_update_path (struct uevent *uev, struct
> > vectors * vecs)
> >  
> >                 if (rc != 0)
> >                         strcpy(pp->wwid, wwid);
> > -               else if (strncmp(wwid, pp->wwid, WWID_SIZE) != 0) {
> > +               else if (strlen(wwid) &&
> > +                        strncmp(wwid, pp->wwid, WWID_SIZE) != 0) {
> >                         condlog(0, "%s: path wwid changed from '%s'
> > to '%s'",
> >                                 uev->kernel, wwid, pp->wwid);
> >                         ev_remove_path(pp, vecs, 1);
> >                         needs_reinit = 1;
> >                         goto out;
> > +               } else if (pp->initialized == INIT_PARTIAL) {
> > +                       udev_device_unref(pp->udev);
> > +                       pp->udev = udev_device_ref(uev->udev);
> > +                       if (finish_path_init(pp, vecs) < 0) {
> > +                               retval = 1;
> > +                               goto out;
> > +                       }
> >                 } else {
> >                         udev_device_unref(pp->udev);
> >                         pp->udev = udev_device_ref(uev->udev);
> > @@ -1507,6 +1546,7 @@ out:
> >  
> >                 condlog(0, "%s: spurious uevent, path not found",
> > uev->kernel);
> >         }
> > +       /* pp->initalized must not be INIT_PARTIAL if needs_reinit is
> > set */
> 
> Did you mean "cannot" here? At least for the "wwid changed" case this
> is subtle, as it's set to INIT_REMOVED in ev_remove_path().
> 
> >         if (needs_reinit)
> >                 retval = uev_add_path(uev, vecs, 1);
> >         return retval;
> > @@ -2116,7 +2156,7 @@ check_path (struct vectors * vecs, struct path
> > * pp, unsigned int ticks)
> >         int marginal_pathgroups, marginal_changed = 0;
> >         int ret;
> >  
> > -       if (((pp->initialized == INIT_OK ||
> > +       if (((pp->initialized == INIT_OK || pp->initialized ==
> > INIT_PARTIAL ||
> >               pp->initialized == INIT_REQUESTED_UDEV) && !pp->mpp) ||
> >             pp->initialized == INIT_REMOVED)
> >                 return 0;
> > diff --git a/multipathd/main.h b/multipathd/main.h
> > index c8a1ce92..4acd1b8c 100644
> > --- a/multipathd/main.h
> > +++ b/multipathd/main.h
> > @@ -66,4 +66,5 @@ int reload_and_sync_map(struct multipath *mpp,
> > struct vectors *vecs,
> >  
> >  void handle_path_wwid_change(struct path *pp, struct vectors *vecs);
> >  bool check_path_wwid_change(struct path *pp);
> > +int finish_path_init(struct path *pp, struct vectors * vecs);
> >  #endif /* MAIN_H */

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [dm-devel] [PATCH 5/7] multipathd: fully initialize paths added by update_pathvec_from_dm
  2021-11-05 10:55     ` Martin Wilck
@ 2021-11-05 21:49       ` Benjamin Marzinski
  2021-11-05 23:20         ` Martin Wilck
  0 siblings, 1 reply; 25+ messages in thread
From: Benjamin Marzinski @ 2021-11-05 21:49 UTC (permalink / raw)
  To: Martin Wilck; +Cc: dm-devel

On Fri, Nov 05, 2021 at 10:55:11AM +0000, Martin Wilck wrote:
> Hi Ben,
> 
> On Thu, 2021-11-04 at 23:10 +0100, Martin Wilck wrote:
> > On Wed, 2021-10-20 at 14:15 -0500, Benjamin Marzinski wrote:
> > > ...
> > 
> > > 
> > > Multipath now has a new state to deal with these devices,
> > > INIT_PARTIAL.
> > > Devices in this state are treated mostly like INIT_OK devices, but
> > > when "multipathd add path" is called or an add/change uevent
> > > happens
> > > on these devices, multipathd will finish initializing them, and
> > > remove
> > > them if necessary.
> > > 
> > > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > 
> > Nice. Somehow in my mind the issue always look much more complex.
> > I like this, but I want to give it some testing before I finally ack
> > it.
> 
> I noted that running "multipathd add path $path" for a path in
> INIT_PARTIAL state changes this paths's init state to INIT_OK, even if
> the udev still has no information about it (*).

Ah. Didn't think about that.
 
> The reason is that pp->wwid is set, and thus pathinfo() won't even try
> to retrieve the WWID, although for INIT_PARTIAL paths the origin of the
> WWID is not 100% trustworthy (being just copied from pp->mpp-
> >wwid). pathinfo() will return PATHINFO_OK without having retrieved the
> WWID. 
> 
> I suppose we could apply a similar logic as in uev_update_path() here,
> clearing pp->wwid before calling pathinfo(). If udev was still unaware
> of the path, this would mean a transition from INIT_PARTIAL to
> INIT_MISSING_UDEV. OTOH, we'd now need to be prepared to have to remove
> the path from the map if the WWID doesn't match after the call to
> pathinfo(). This means we'd basically need to reimplement the "changed
> WWID" logic from uev_update_path(), and thus we'd need to unify both
> code paths.
> 
> In general, the difference between INIT_MISSING_UDEV and INIT_PARTIAL
> is subtle. Correct me if I'm wrong, but INIT_PARTIAL means "udev has no
> information about this device" and INIT_MISSING_UDEV currently means
> "we weren't able to figure out a WWID for the path" (meaning that
> INIT_MISSING_UDEV is a misnomer - when fallback are allowed,
> INIT_MISSING_UDEV is actually independent of the device's state in the
> udev db. We should rename this to INIT_MISSING_WWID, perhaps).

Yeah. makes sense.

> The
> semantics of the two states are very different though:
> INIT_MISSING_UDEV will trigger an attempt to regenerate an uevent,
> whereas INIT_PARTIAL will just stick passively. IMO it'd make sense to
> trigger an uevent in the INIT_PARTIAL case, too, albeit perhaps not
> immediately (after the default uevent timeout - 180s?). 

Sure. We do want to wait long enough to be fairly sure that udev isn't
just being a little bit slow.

This would also handle the case where multipathd simply wasn't
tracking the path for some reason. If the device doesn't exist in the
udev database at all, do with send an "add" event instead of a "change"
event?

> Also, we currently don't track the udev state of devices cleanly. We
> set INIT_MISSING_UDEV if we can't obtain uid_attribute, which doesn't
> necessarily mean that udev is unaware of the device. I believe we
> should e.g. check the USEC_INITIALIZED property - it is non-NULL if and
> only if the device is present in the udev db. If uid_attribute isn't
> set regardless, we could conclude that the udev rules just don't set
> it. It might make sense to retrigger *one* uevent in that case, but no
> more.

IIRC, the initial reason why INIT_MISSING_UDEV was added was because
sometimes udev workers timed out, causing them to not run all the rules.
Do you know offhand if USEC_INITIALIZED is set in this case? If we could
differentiate between the following states:

1: udev hasn't gotten an event for a device
2: udev got an event but timed out or failed while processing it
3: udev successfully processed the event for the device, but multipath
   isn't seeing the attributes it was expecting.

We could react more sensibly.

-Ben

> IMO we need a clear definition of the INIT_xyz states and their
> transitions. We won't get along with intuitive logic any more.
> 
> Cheers,
> Martin
> 
> (*) my test procedure is simply to delete the paths' files from
> /run/udev/data and to restart multipathd.

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [dm-devel] [PATCH 5/7] multipathd: fully initialize paths added by update_pathvec_from_dm
  2021-11-05 21:49       ` Benjamin Marzinski
@ 2021-11-05 23:20         ` Martin Wilck
  2021-11-08 16:29           ` Benjamin Marzinski
  0 siblings, 1 reply; 25+ messages in thread
From: Martin Wilck @ 2021-11-05 23:20 UTC (permalink / raw)
  To: bmarzins; +Cc: dm-devel

On Fri, 2021-11-05 at 16:49 -0500, Benjamin Marzinski wrote:
> On Fri, Nov 05, 2021 at 10:55:11AM +0000, Martin Wilck wrote:
> > Hi Ben,
> > 
> > On Thu, 2021-11-04 at 23:10 +0100, Martin Wilck wrote:
> > > On Wed, 2021-10-20 at 14:15 -0500, Benjamin Marzinski wrote:
> > > > ...
> > > 
> > > > 
> > > > Multipath now has a new state to deal with these devices,
> > > > INIT_PARTIAL.
> > > > Devices in this state are treated mostly like INIT_OK devices,
> > > > but
> > > > when "multipathd add path" is called or an add/change uevent
> > > > happens
> > > > on these devices, multipathd will finish initializing them, and
> > > > remove
> > > > them if necessary.
> > > > 
> > > > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > > 
> > > Nice. Somehow in my mind the issue always look much more complex.
> > > I like this, but I want to give it some testing before I finally
> > > ack
> > > it.
> > 
> > I noted that running "multipathd add path $path" for a path in
> > INIT_PARTIAL state changes this paths's init state to INIT_OK, even
> > if
> > the udev still has no information about it (*).
> 
> Ah. Didn't think about that.
>  
> > The reason is that pp->wwid is set, and thus pathinfo() won't even
> > try
> > to retrieve the WWID, although for INIT_PARTIAL paths the origin of
> > the
> > WWID is not 100% trustworthy (being just copied from pp->mpp-
> > > wwid). pathinfo() will return PATHINFO_OK without having retrieved
> > > the
> > WWID. 
> > 
> > I suppose we could apply a similar logic as in uev_update_path()
> > here,
> > clearing pp->wwid before calling pathinfo(). If udev was still
> > unaware
> > of the path, this would mean a transition from INIT_PARTIAL to
> > INIT_MISSING_UDEV. OTOH, we'd now need to be prepared to have to
> > remove
> > the path from the map if the WWID doesn't match after the call to
> > pathinfo(). This means we'd basically need to reimplement the
> > "changed
> > WWID" logic from uev_update_path(), and thus we'd need to unify both
> > code paths.
> > 
> > In general, the difference between INIT_MISSING_UDEV and INIT_PARTIAL
> > is subtle. Correct me if I'm wrong, but INIT_PARTIAL means "udev has
> > no
> > information about this device" and INIT_MISSING_UDEV currently means
> > "we weren't able to figure out a WWID for the path" (meaning that
> > INIT_MISSING_UDEV is a misnomer - when fallback are allowed,
> > INIT_MISSING_UDEV is actually independent of the device's state in
> > the
> > udev db. We should rename this to INIT_MISSING_WWID, perhaps).
> 
> Yeah. makes sense.
> 
> > The
> > semantics of the two states are very different though:
> > INIT_MISSING_UDEV will trigger an attempt to regenerate an uevent,
> > whereas INIT_PARTIAL will just stick passively. IMO it'd make sense
> > to
> > trigger an uevent in the INIT_PARTIAL case, too, albeit perhaps not
> > immediately (after the default uevent timeout - 180s?). 
> 
> Sure. We do want to wait long enough to be fairly sure that udev isn't
> just being a little bit slow.
> 
> This would also handle the case where multipathd simply wasn't
> tracking the path for some reason. If the device doesn't exist in the
> udev database at all, do with send an "add" event instead of a "change"
> event?

I don't think it matters. If no database file exists, udev can still
deliver some information:

# udevadm info /dev/sde
P: /devices/pci0000:00/0000:00:01.0/0000:01:00.7/host3/rport-3:0-
0/target3:0:0/3:0:0:4/block/sde
N: sde
L: 0
E: DEVPATH=/devices/pci0000:00/0000:00:01.0/0000:01:00.7/host3/rport-
3:0-0/target3:0:0/3:0:0:4/block/sde
E: DEVNAME=/dev/sde
E: DEVTYPE=disk
E: MAJOR=8
E: MINOR=64
E: SUBSYSTEM=block

(note USEC_INITIALIZED is not in the set)

If I run "udevadm trigger -c change /dev/sde" in this situation, I'll
get the full info, as if I had run "add" before (some rules may differ
between "add" and "change", but that's quite unusual).

> 
> > Also, we currently don't track the udev state of devices cleanly.
> > We
> > set INIT_MISSING_UDEV if we can't obtain uid_attribute, which
> > doesn't
> > necessarily mean that udev is unaware of the device. I believe we
> > should e.g. check the USEC_INITIALIZED property - it is non-NULL if
> > and
> > only if the device is present in the udev db. If uid_attribute
> > isn't
> > set regardless, we could conclude that the udev rules just don't
> > set
> > it. It might make sense to retrigger *one* uevent in that case, but
> > no
> > more.
> 
> IIRC, the initial reason why INIT_MISSING_UDEV was added was because
> sometimes udev workers timed out, causing them to not run all the
> rules.
> Do you know offhand if USEC_INITIALIZED is set in this case? If we
> could
> differentiate between the following states:

udevd sets this property very early, when it first receives an uevent.
But libudev calls won't return it until the database file is written,
which happens last, after all rules and RUN statements have been
processed, _in the worker_. Thus if the worker is killed, the file
won't be written.


0. the device doesn't exist in sysfs 

> 1: udev hasn't gotten an event for a device

I don't think we can detect this without listening for kernel uevents.
And even if we listen to them, we could have missed some. udevd doesn't
have an API for it, either, AFAIK.

> 2: udev got an event but timed out or failed while processing it

This would be the USEC_INITIALIZED case, IMO

> 3: udev successfully processed the event for the device, but
> multipath
>    isn't seeing the attributes it was expecting.
> 
> We could react more sensibly.

Yes.

Regards
Martin


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [dm-devel] [PATCH 5/7] multipathd: fully initialize paths added by update_pathvec_from_dm
  2021-11-05 23:20         ` Martin Wilck
@ 2021-11-08 16:29           ` Benjamin Marzinski
  2021-11-08 16:55             ` Martin Wilck
  0 siblings, 1 reply; 25+ messages in thread
From: Benjamin Marzinski @ 2021-11-08 16:29 UTC (permalink / raw)
  To: Martin Wilck; +Cc: dm-devel

On Fri, Nov 05, 2021 at 11:20:01PM +0000, Martin Wilck wrote:
> On Fri, 2021-11-05 at 16:49 -0500, Benjamin Marzinski wrote:
> > On Fri, Nov 05, 2021 at 10:55:11AM +0000, Martin Wilck wrote:
> > > Hi Ben,
> > > 
> > > On Thu, 2021-11-04 at 23:10 +0100, Martin Wilck wrote:
> > > > On Wed, 2021-10-20 at 14:15 -0500, Benjamin Marzinski wrote:
> > > > > ...
> > > > 
> > > > > 
> > > > > Multipath now has a new state to deal with these devices,
> > > > > INIT_PARTIAL.
> > > > > Devices in this state are treated mostly like INIT_OK devices,
> > > > > but
> > > > > when "multipathd add path" is called or an add/change uevent
> > > > > happens
> > > > > on these devices, multipathd will finish initializing them, and
> > > > > remove
> > > > > them if necessary.
> > > > > 
> > > > > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > > > 
> > > > Nice. Somehow in my mind the issue always look much more complex.
> > > > I like this, but I want to give it some testing before I finally
> > > > ack
> > > > it.
> > > 
> > > I noted that running "multipathd add path $path" for a path in
> > > INIT_PARTIAL state changes this paths's init state to INIT_OK, even
> > > if
> > > the udev still has no information about it (*).
> > 
> > Ah. Didn't think about that.
> >  
> > > The reason is that pp->wwid is set, and thus pathinfo() won't even
> > > try
> > > to retrieve the WWID, although for INIT_PARTIAL paths the origin of
> > > the
> > > WWID is not 100% trustworthy (being just copied from pp->mpp-
> > > > wwid). pathinfo() will return PATHINFO_OK without having retrieved
> > > > the
> > > WWID. 
> > > 
> > > I suppose we could apply a similar logic as in uev_update_path()
> > > here,
> > > clearing pp->wwid before calling pathinfo(). If udev was still
> > > unaware
> > > of the path, this would mean a transition from INIT_PARTIAL to
> > > INIT_MISSING_UDEV. OTOH, we'd now need to be prepared to have to
> > > remove
> > > the path from the map if the WWID doesn't match after the call to
> > > pathinfo(). This means we'd basically need to reimplement the
> > > "changed
> > > WWID" logic from uev_update_path(), and thus we'd need to unify both
> > > code paths.
> > > 
> > > In general, the difference between INIT_MISSING_UDEV and INIT_PARTIAL
> > > is subtle. Correct me if I'm wrong, but INIT_PARTIAL means "udev has
> > > no
> > > information about this device" and INIT_MISSING_UDEV currently means
> > > "we weren't able to figure out a WWID for the path" (meaning that
> > > INIT_MISSING_UDEV is a misnomer - when fallback are allowed,
> > > INIT_MISSING_UDEV is actually independent of the device's state in
> > > the
> > > udev db. We should rename this to INIT_MISSING_WWID, perhaps).
> > 
> > Yeah. makes sense.
> > 
> > > The
> > > semantics of the two states are very different though:
> > > INIT_MISSING_UDEV will trigger an attempt to regenerate an uevent,
> > > whereas INIT_PARTIAL will just stick passively. IMO it'd make sense
> > > to
> > > trigger an uevent in the INIT_PARTIAL case, too, albeit perhaps not
> > > immediately (after the default uevent timeout - 180s?). 
> > 
> > Sure. We do want to wait long enough to be fairly sure that udev isn't
> > just being a little bit slow.
> > 
> > This would also handle the case where multipathd simply wasn't
> > tracking the path for some reason. If the device doesn't exist in the
> > udev database at all, do with send an "add" event instead of a "change"
> > event?
> 
> I don't think it matters. If no database file exists, udev can still
> deliver some information:
> 
> # udevadm info /dev/sde
> P: /devices/pci0000:00/0000:00:01.0/0000:01:00.7/host3/rport-3:0-
> 0/target3:0:0/3:0:0:4/block/sde
> N: sde
> L: 0
> E: DEVPATH=/devices/pci0000:00/0000:00:01.0/0000:01:00.7/host3/rport-
> 3:0-0/target3:0:0/3:0:0:4/block/sde
> E: DEVNAME=/dev/sde
> E: DEVTYPE=disk
> E: MAJOR=8
> E: MINOR=64
> E: SUBSYSTEM=block
> 
> (note USEC_INITIALIZED is not in the set)
> 
> If I run "udevadm trigger -c change /dev/sde" in this situation, I'll
> get the full info, as if I had run "add" before (some rules may differ
> between "add" and "change", but that's quite unusual).
>

udev rules may not change much, but, for instance, multipathd reacts
differently to add and change events.  So there may be other consumers
of uevents that care about the difference between add and change events.
 
> > 
> > > Also, we currently don't track the udev state of devices cleanly.
> > > We
> > > set INIT_MISSING_UDEV if we can't obtain uid_attribute, which
> > > doesn't
> > > necessarily mean that udev is unaware of the device. I believe we
> > > should e.g. check the USEC_INITIALIZED property - it is non-NULL if
> > > and
> > > only if the device is present in the udev db. If uid_attribute
> > > isn't
> > > set regardless, we could conclude that the udev rules just don't
> > > set
> > > it. It might make sense to retrigger *one* uevent in that case, but
> > > no
> > > more.
> > 
> > IIRC, the initial reason why INIT_MISSING_UDEV was added was because
> > sometimes udev workers timed out, causing them to not run all the
> > rules.
> > Do you know offhand if USEC_INITIALIZED is set in this case? If we
> > could
> > differentiate between the following states:
> 
> udevd sets this property very early, when it first receives an uevent.
> But libudev calls won't return it until the database file is written,
> which happens last, after all rules and RUN statements have been
> processed, _in the worker_. Thus if the worker is killed, the file
> won't be written.
> 
> 
> 0. the device doesn't exist in sysfs 

We simply delete devices that don't exist in sysfs, right? If we get a
non-remove uevent for a device, but it doesn't exist in sysfs, then I
would assume that a remove uevent will be shortly incomming.
 
> > 1: udev hasn't gotten an event for a device
> 
> I don't think we can detect this without listening for kernel uevents.
> And even if we listen to them, we could have missed some. udevd doesn't
> have an API for it, either, AFAIK.

Isn't this the most common INIT_PARTIAL case, where we are waiting for
the coldplug uevent? If there is no database file when we are processing
the device, we are in this state. Correct? 

> > 2: udev got an event but timed out or failed while processing it
> 
> This would be the USEC_INITIALIZED case, IMO

If udev has, in the past, successfully processed an event for a device,
but fails while processing a later uevent, it doesn't keep th data from
the previous event. So it could lose the uid_attribute. But the
database file should still exist. Correct?


-Ben

> > 3: udev successfully processed the event for the device, but
> > multipath
> >    isn't seeing the attributes it was expecting.
> > 
> > We could react more sensibly.
> 
> Yes.
> 
> Regards
> Martin

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [dm-devel] [PATCH 5/7] multipathd: fully initialize paths added by update_pathvec_from_dm
  2021-11-08 16:29           ` Benjamin Marzinski
@ 2021-11-08 16:55             ` Martin Wilck
  2021-11-08 17:30               ` Benjamin Marzinski
  0 siblings, 1 reply; 25+ messages in thread
From: Martin Wilck @ 2021-11-08 16:55 UTC (permalink / raw)
  To: bmarzins; +Cc: dm-devel

On Mon, 2021-11-08 at 10:29 -0600, Benjamin Marzinski wrote:
> On Fri, Nov 05, 2021 at 11:20:01PM +0000, Martin Wilck wrote:
> > 
> > If I run "udevadm trigger -c change /dev/sde" in this situation,
> > I'll
> > get the full info, as if I had run "add" before (some rules may
> > differ
> > between "add" and "change", but that's quite unusual).
> > 
> 
> udev rules may not change much, but, for instance, multipathd reacts
> differently to add and change events.  So there may be other
> consumers
> of uevents that care about the difference between add and change
> events.

Let's send "add" then. It makes sense if the device didn't exist
before, after all. When we trigger, we don't know if an "add" event is
already under way, but that's just how it is.

>  
> > > 
> > > > Also, we currently don't track the udev state of devices
> > > > cleanly.
> > > > We
> > > > set INIT_MISSING_UDEV if we can't obtain uid_attribute, which
> > > > doesn't
> > > > necessarily mean that udev is unaware of the device. I believe
> > > > we
> > > > should e.g. check the USEC_INITIALIZED property - it is non-
> > > > NULL if
> > > > and
> > > > only if the device is present in the udev db. If uid_attribute
> > > > isn't
> > > > set regardless, we could conclude that the udev rules just
> > > > don't
> > > > set
> > > > it. It might make sense to retrigger *one* uevent in that case,
> > > > but
> > > > no
> > > > more.
> > > 
> > > IIRC, the initial reason why INIT_MISSING_UDEV was added was
> > > because
> > > sometimes udev workers timed out, causing them to not run all the
> > > rules.
> > > Do you know offhand if USEC_INITIALIZED is set in this case? If
> > > we
> > > could
> > > differentiate between the following states:
> > 
> > udevd sets this property very early, when it first receives an
> > uevent.
> > But libudev calls won't return it until the database file is
> > written,
> > which happens last, after all rules and RUN statements have been
> > processed, _in the worker_. Thus if the worker is killed, the file
> > won't be written.
> > 
> > 
> > 0. the device doesn't exist in sysfs 
> 
> We simply delete devices that don't exist in sysfs, right? If we get
> a
> non-remove uevent for a device, but it doesn't exist in sysfs, then I
> would assume that a remove uevent will be shortly incomming.

True, but still it's one possible "state" the device may be in. I just
wanted to mention that.

>  
> > > 1: udev hasn't gotten an event for a device
> > 
> > I don't think we can detect this without listening for kernel
> > uevents.
> > And even if we listen to them, we could have missed some. udevd
> > doesn't
> > have an API for it, either, AFAIK.
> 
> Isn't this the most common INIT_PARTIAL case, where we are waiting
> for
> the coldplug uevent? If there is no database file when we are
> processing
> the device, we are in this state. Correct? 

Not necessarily. udev may have got an event, but not have finished
processing it, or failed to process it entirely (e.g. because of a
timeout, your case 2.). When udevd sees an "add" or "change" event for
a device for the first time, creating the db entry is the last action
it takes. During coldplug, udevd will receive a lot of events almost
simultaneously, but it may take considerable time until it has
processed them all.

> > > 2: udev got an event but timed out or failed while processing it
> > 
> > This would be the USEC_INITIALIZED case, IMO
> 
> If udev has, in the past, successfully processed an event for a
> device,
> but fails while processing a later uevent, it doesn't keep th data
> from
> the previous event. So it could lose the uid_attribute. But the
> database file should still exist. Correct?

That's true. But we can't do anything about it. libudev will return
what the database currently contains, which is the content from the
last successfuly processed "add" or "change" uevent, whether or not
other uevents are in the queue or being processed. I don't think this
scenario matters in the current discussion about partially initialized
paths. This is the "changed wwid" scenario which I think we handle
quite nicely today already. Or am I misunderstanding?

Cheers,
Martin


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [dm-devel] [PATCH 5/7] multipathd: fully initialize paths added by update_pathvec_from_dm
  2021-11-08 16:55             ` Martin Wilck
@ 2021-11-08 17:30               ` Benjamin Marzinski
  2021-11-08 19:11                 ` Martin Wilck
  0 siblings, 1 reply; 25+ messages in thread
From: Benjamin Marzinski @ 2021-11-08 17:30 UTC (permalink / raw)
  To: Martin Wilck; +Cc: dm-devel

On Mon, Nov 08, 2021 at 04:55:37PM +0000, Martin Wilck wrote:
> On Mon, 2021-11-08 at 10:29 -0600, Benjamin Marzinski wrote:
> > On Fri, Nov 05, 2021 at 11:20:01PM +0000, Martin Wilck wrote:
> > > > 1: udev hasn't gotten an event for a device
> > > 
> > > I don't think we can detect this without listening for kernel
> > > uevents.
> > > And even if we listen to them, we could have missed some. udevd
> > > doesn't
> > > have an API for it, either, AFAIK.
> > 
> > Isn't this the most common INIT_PARTIAL case, where we are waiting
> > for
> > the coldplug uevent? If there is no database file when we are
> > processing
> > the device, we are in this state. Correct? 
> 
> Not necessarily. udev may have got an event, but not have finished
> processing it, or failed to process it entirely (e.g. because of a
> timeout, your case 2.). When udevd sees an "add" or "change" event for
> a device for the first time, creating the db entry is the last action
> it takes. During coldplug, udevd will receive a lot of events almost
> simultaneously, but it may take considerable time until it has
> processed them all.

Fair enough. So if the first uevent hasn't completed already
successfully, it's gonna be hard to know why.
 
> > > > 2: udev got an event but timed out or failed while processing it
> > > 
> > > This would be the USEC_INITIALIZED case, IMO
> > 
> > If udev has, in the past, successfully processed an event for a
> > device,
> > but fails while processing a later uevent, it doesn't keep th data
> > from
> > the previous event. So it could lose the uid_attribute. But the
> > database file should still exist. Correct?
> 
> That's true. But we can't do anything about it. libudev will return
> what the database currently contains, which is the content from the
> last successfuly processed "add" or "change" uevent, whether or not
> other uevents are in the queue or being processed. I don't think this
> scenario matters in the current discussion about partially initialized
> paths. This is the "changed wwid" scenario which I think we handle
> quite nicely today already. Or am I misunderstanding?

If both events occurred before multipathd started up, then this wouldn't
simply be a "changed WWID".  The hope is to be able to reliably
distinguish this from case 3, where the data from udev is fine, but the
uid_attribute still isn't there.

-Ben

> Cheers,
> Martin

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [dm-devel] [PATCH 5/7] multipathd: fully initialize paths added by update_pathvec_from_dm
  2021-11-08 17:30               ` Benjamin Marzinski
@ 2021-11-08 19:11                 ` Martin Wilck
  0 siblings, 0 replies; 25+ messages in thread
From: Martin Wilck @ 2021-11-08 19:11 UTC (permalink / raw)
  To: bmarzins; +Cc: dm-devel

On Mon, 2021-11-08 at 11:30 -0600, Benjamin Marzinski wrote:
> On Mon, Nov 08, 2021 at 04:55:37PM +0000, Martin Wilck wrote:
> > On Mon, 2021-11-08 at 10:29 -0600, Benjamin Marzinski wrote:
> > > On Fri, Nov 05, 2021 at 11:20:01PM +0000, Martin Wilck wrote:
> > > > > 1: udev hasn't gotten an event for a device
> > > > 
> > > > I don't think we can detect this without listening for kernel
> > > > uevents.
> > > > And even if we listen to them, we could have missed some. udevd
> > > > doesn't
> > > > have an API for it, either, AFAIK.
> > > 
> > > Isn't this the most common INIT_PARTIAL case, where we are
> > > waiting
> > > for
> > > the coldplug uevent? If there is no database file when we are
> > > processing
> > > the device, we are in this state. Correct? 
> > 
> > Not necessarily. udev may have got an event, but not have finished
> > processing it, or failed to process it entirely (e.g. because of a
> > timeout, your case 2.). When udevd sees an "add" or "change" event
> > for
> > a device for the first time, creating the db entry is the last
> > action
> > it takes. During coldplug, udevd will receive a lot of events
> > almost
> > simultaneously, but it may take considerable time until it has
> > processed them all.
> 
> Fair enough. So if the first uevent hasn't completed already
> successfully, it's gonna be hard to know why.

I just realized that what I said was wrong.

The worker does this (udev v246):

worker_process_device()
  udev_event_execute_rules()
    udev_rules_apply_to_event()
    update_devnode() /* symlinks etc */
    device_ensure_usec_initialized() /* set USEC_INITIALIZED */
    device_update_db() /* here the db file is (re)written */
       /* from this point on, 
          libudev will consider the device initialized */
    update_devnode() /* again */
    device_set_is_initialized() /* sets is_initialized property */
  udev_event_execute_run() /* Here RUN is executed */
  if (udev_event->inotify_watch)
     device_update_db()  /* again, this is what I was referring to */
device_monitor_send_device() /* send "udev" uevent to listeners */

If a worker is killed because of timeout, or exits with error, udevd
will receive SIGCHLD and delete the db entry for the device in
question.

>  
> > > > > 2: udev got an event but timed out or failed while processing
> > > > > it
> > > > 
> > > > This would be the USEC_INITIALIZED case, IMO
> > > 
> > > If udev has, in the past, successfully processed an event for a
> > > device,
> > > but fails while processing a later uevent, it doesn't keep th
> > > data
> > > from
> > > the previous event. So it could lose the uid_attribute. But the
> > > database file should still exist. Correct?
> > 
> > That's true. But we can't do anything about it. libudev will return
> > what the database currently contains, which is the content from the
> > last successfuly processed "add" or "change" uevent, whether or not
> > other uevents are in the queue or being processed. I don't think
> > this
> > scenario matters in the current discussion about partially
> > initialized
> > paths. This is the "changed wwid" scenario which I think we handle
> > quite nicely today already. Or am I misunderstanding?
> 
> If both events occurred before multipathd started up, then this
> wouldn't
> simply be a "changed WWID".  The hope is to be able to reliably
> distinguish this from case 3, where the data from udev is fine, but
> the
> uid_attribute still isn't there.

I'm not quite getting this. Are you talking about the case where one,
or both, uevents are lost? 

IMO we can use udev_device_get_is_initialized() as test. AFAICS it's
basically equivalent to USEC_INITIALIZED being set. If that function
returns true and uid_attribute is not set, re-triggering an uevent
makes relatively little sense to me. The likelihood that the next even
will yield a different result is pretty close to zero.

If if the first uevent is handled successfully but the second times
out, udevd will delete the db file (see above), but no uevent will be
sent to listeners. So multipathd will be ignorant of the lost / timed
out event. There's nothing we can do about that.

Regards
Martin








> 
> -Ben
> 
> > Cheers,
> > Martin
> 


--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2021-11-08 19:12 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-20 19:15 [dm-devel] [PATCH 0/7] multipathd: remove udev settle dependency Benjamin Marzinski
2021-10-20 19:15 ` [dm-devel] [PATCH 1/7] multipathd: remove missing paths on startup Benjamin Marzinski
2021-11-04 22:12   ` Martin Wilck
2021-10-20 19:15 ` [dm-devel] [PATCH 2/7] libmultipath: skip unneeded steps to get path name Benjamin Marzinski
2021-11-04 21:20   ` Martin Wilck
2021-10-20 19:15 ` [dm-devel] [PATCH 3/7] libmultipath: don't use fallback wwid in update_pathvec_from_dm Benjamin Marzinski
2021-11-04 21:27   ` Martin Wilck
2021-10-20 19:15 ` [dm-devel] [PATCH 4/7] libmultipath: always set INIT_REMOVED in set_path_removed Benjamin Marzinski
2021-11-04 21:28   ` Martin Wilck
2021-10-20 19:15 ` [dm-devel] [PATCH 5/7] multipathd: fully initialize paths added by update_pathvec_from_dm Benjamin Marzinski
2021-11-04 22:10   ` Martin Wilck
2021-11-05 10:55     ` Martin Wilck
2021-11-05 21:49       ` Benjamin Marzinski
2021-11-05 23:20         ` Martin Wilck
2021-11-08 16:29           ` Benjamin Marzinski
2021-11-08 16:55             ` Martin Wilck
2021-11-08 17:30               ` Benjamin Marzinski
2021-11-08 19:11                 ` Martin Wilck
2021-11-05 21:03     ` Benjamin Marzinski
2021-10-20 19:15 ` [dm-devel] [PATCH 6/7] multipathd: remove INIT_PARTIAL paths that aren't in a multipath device Benjamin Marzinski
2021-11-04 22:01   ` Martin Wilck
2021-10-20 19:15 ` [dm-devel] [PATCH 7/7] multipathd: Remove dependency on systemd-udev-settle.service Benjamin Marzinski
2021-11-04 22:17   ` Martin Wilck
2021-11-04 22:00 ` [dm-devel] [PATCH 0/7] multipathd: remove udev settle dependency Martin Wilck
2021-11-05 20:49   ` Benjamin Marzinski

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.