All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/7] Fix muitpath/multipathd flush issue
@ 2020-06-18  0:24 Benjamin Marzinski
  2020-06-18  0:24 ` [PATCH 1/7] libmultipath: change do_get_info returns Benjamin Marzinski
                   ` (7 more replies)
  0 siblings, 8 replies; 39+ messages in thread
From: Benjamin Marzinski @ 2020-06-18  0:24 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

If a multipath device is removed, and check_path() checks one of its
paths before multipathd processes either the uevent or the dm event from
removing it, multipathd will recreate the removed device. This happens
because check_path() will continute to check the removed device's former
paths until an event arrives removing the device.  A missing multpath
device will cause the update_multipath_strings() call to fail, setting
pp->dmstate to PSTATE_UNDEF.  If the path is up, this dmstate will cause
reinstate_path() to be called, which will also fail, because the
multipath device doesn't exist.  This will trigger a reload, restoring
the recently removed device.

This patchset handles this is two ways. The first two patches directly
fix these issues in check_path(), so that a missing multipath device
will no longer get recreated when checking one of its former paths. 

The other 5 patches add a "multipathd del maps" command, and make the
mutipath command delegate flush operations to multipathd so multipathd's
state remains in sync with the kernel's, while doing removes.

One source of complexity in these patches is that multipath suspends the
device with flushing before removing it, while multipathd doesn't.  It
does seem possible that the suspend could hang for a while, so I can
understand multipathd not using it.  However, that would take the
multipath device getting opened and IO being issued, between the time
when _dm_flush_map() verifies that the device isn't opened, and when it
suspends the device.  But more importantly, I don't understand how
suspending the device is useful.  If the device were to be opened
between when _dm_flush_map() checks the usage, and when it tries to
delete the device, device-mapper would simply fail the remove.  And if
the device isn't in use, there can't be any outstanding IO to flush.
When this code was added in commit 9a4ff93, there was no check if the
device was in use, and disabling queue_if_no_path could cause anything
that had the device open to error out and close it. But now that
multipath checks first if the device is open, I don't see the benefit to
doing this anymore. Removing it would allow multipathd and multipath to
use the same code to remove maps. Any thoughts?

Benjamin Marzinski (7):
  libmultipath: change do_get_info returns
  multipathd: fix check_path errors with removed map
  libmultipath: make dm_flush_maps only return 0 on success
  multipathd: add "del maps" multipathd command
  multipath: make flushing maps work like other commands
  multipath: delegate flushing maps to multipathd
  multipath: add option to skip multipathd delegation

 libmultipath/config.h     |  4 ++-
 libmultipath/configure.h  |  3 ---
 libmultipath/devmapper.c  | 41 ++++++++++++++++++-------------
 libmultipath/devmapper.h  |  3 ++-
 multipath/main.c          | 51 +++++++++++++++++++++++++++------------
 multipath/multipath.8     | 16 ++++++++----
 multipathd/cli.c          |  1 +
 multipathd/cli_handlers.c | 19 +++++++++++++++
 multipathd/cli_handlers.h |  1 +
 multipathd/main.c         | 39 ++++++++++++------------------
 multipathd/main.h         |  1 +
 11 files changed, 114 insertions(+), 65 deletions(-)

-- 
2.17.2

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH 1/7] libmultipath: change do_get_info returns
  2020-06-18  0:24 [PATCH 0/7] Fix muitpath/multipathd flush issue Benjamin Marzinski
@ 2020-06-18  0:24 ` Benjamin Marzinski
  2020-06-18 15:27   ` Martin Wilck
  2020-06-18  0:24 ` [PATCH 2/7] multipathd: fix check_path errors with removed map Benjamin Marzinski
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 39+ messages in thread
From: Benjamin Marzinski @ 2020-06-18  0:24 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

Make do_get_info() differentiate between dm failures and missing
devices, and update callers to retain their current behavior. Also,
rename it and make it external. These changes will be used by future
commits.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/devmapper.c | 29 ++++++++++++++++-------------
 libmultipath/devmapper.h |  1 +
 2 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/libmultipath/devmapper.c b/libmultipath/devmapper.c
index 27d52398..b44f7545 100644
--- a/libmultipath/devmapper.c
+++ b/libmultipath/devmapper.c
@@ -496,8 +496,14 @@ int dm_addmap_reload(struct multipath *mpp, char *params, int flush)
 	return 0;
 }
 
-static int
-do_get_info(const char *name, struct dm_info *info)
+/*
+ * Returns:
+ * -1: Error
+ *  0: device does not exist
+ *  1: device exists
+ */
+int
+do_dm_get_info(const char *name, struct dm_info *info)
 {
 	int r = -1;
 	struct dm_task *dmt;
@@ -516,10 +522,7 @@ do_get_info(const char *name, struct dm_info *info)
 	if (!dm_task_get_info(dmt, info))
 		goto out;
 
-	if (!info->exists)
-		goto out;
-
-	r = 0;
+	r = !!info->exists;
 out:
 	dm_task_destroy(dmt);
 	return r;
@@ -529,7 +532,7 @@ int dm_map_present(const char * str)
 {
 	struct dm_info info;
 
-	return (do_get_info(str, &info) == 0);
+	return (do_dm_get_info(str, &info) == 1);
 }
 
 int dm_get_map(const char *name, unsigned long long *size, char *outparams)
@@ -820,7 +823,7 @@ dm_dev_t (const char * mapname, char * dev_t, int len)
 {
 	struct dm_info info;
 
-	if (do_get_info(mapname, &info) != 0)
+	if (do_dm_get_info(mapname, &info) != 1)
 		return 1;
 
 	if (snprintf(dev_t, len, "%i:%i", info.major, info.minor) > len)
@@ -862,7 +865,7 @@ dm_get_major_minor(const char *name, int *major, int *minor)
 {
 	struct dm_info info;
 
-	if (do_get_info(name, &info) != 0)
+	if (do_dm_get_info(name, &info) != 1)
 		return -1;
 
 	*major = info.major;
@@ -1199,7 +1202,7 @@ dm_geteventnr (const char *name)
 {
 	struct dm_info info;
 
-	if (do_get_info(name, &info) != 0)
+	if (do_dm_get_info(name, &info) != 1)
 		return -1;
 
 	return info.event_nr;
@@ -1210,7 +1213,7 @@ dm_is_suspended(const char *name)
 {
 	struct dm_info info;
 
-	if (do_get_info(name, &info) != 0)
+	if (do_dm_get_info(name, &info) != 1)
 		return -1;
 
 	return info.suspended;
@@ -1383,7 +1386,7 @@ dm_get_deferred_remove (const char * mapname)
 {
 	struct dm_info info;
 
-	if (do_get_info(mapname, &info) != 0)
+	if (do_dm_get_info(mapname, &info) != 1)
 		return -1;
 
 	return info.deferred_remove;
@@ -1442,7 +1445,7 @@ dm_get_info (const char * mapname, struct dm_info ** dmi)
 	if (!*dmi)
 		return 1;
 
-	if (do_get_info(mapname, *dmi) != 0) {
+	if (do_dm_get_info(mapname, *dmi) != 1) {
 		memset(*dmi, 0, sizeof(struct dm_info));
 		FREE(*dmi);
 		*dmi = NULL;
diff --git a/libmultipath/devmapper.h b/libmultipath/devmapper.h
index 5ed7edc5..79c9afb2 100644
--- a/libmultipath/devmapper.h
+++ b/libmultipath/devmapper.h
@@ -66,6 +66,7 @@ char * dm_mapname(int major, int minor);
 int dm_remove_partmaps (const char * mapname, int need_sync,
 			int deferred_remove);
 int dm_get_uuid(const char *name, char *uuid, int uuid_len);
+int do_dm_get_info(const char *, struct dm_info *);
 int dm_get_info (const char * mapname, struct dm_info ** dmi);
 int dm_rename (const char * old, char * new, char * delim, int skip_kpartx);
 int dm_reassign(const char * mapname);
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 2/7] multipathd: fix check_path errors with removed map
  2020-06-18  0:24 [PATCH 0/7] Fix muitpath/multipathd flush issue Benjamin Marzinski
  2020-06-18  0:24 ` [PATCH 1/7] libmultipath: change do_get_info returns Benjamin Marzinski
@ 2020-06-18  0:24 ` Benjamin Marzinski
  2020-06-18 19:34   ` Martin Wilck
  2020-06-18  0:24 ` [PATCH 3/7] libmultipath: make dm_flush_maps only return 0 on success Benjamin Marzinski
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 39+ messages in thread
From: Benjamin Marzinski @ 2020-06-18  0:24 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

If a multipath device is removed during, or immediately before the call
to check_path(), multipathd can behave incorrectly. A missing multpath
device will cause update_multipath_strings() to fail, setting
pp->dmstate to PSTATE_UNDEF.  If the path is up, this state will cause
reinstate_path() to be called, which will also fail.  This will trigger
a reload, restoring the recently removed device.

If update_multipath_strings() fails because there is no multipath
device, check_path should just quit, since the remove dmevent and uevent
are likely already queued up. Also, I don't see any reason to reload the
multipath device if reinstate fails. This code was added by
fac68d7a99ef17d496079538a5c6836acd7911ab, which clamined that reinstate
could fail if the path was disabled.  Looking through the current kernel
code, I can't see any reason why a reinstate would fail, where a reload
would help. If the path was missing from the multipath device,
update_multipath_strings() would already catch that, and quit
check_path() early, which make more sense to me than reloading does.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 multipathd/main.c | 36 ++++++++++++++----------------------
 1 file changed, 14 insertions(+), 22 deletions(-)

diff --git a/multipathd/main.c b/multipathd/main.c
index 6b7db2c0..8fb73922 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -1611,22 +1611,18 @@ fail_path (struct path * pp, int del_active)
 /*
  * caller must have locked the path list before calling that function
  */
-static int
+static void
 reinstate_path (struct path * pp)
 {
-	int ret = 0;
-
 	if (!pp->mpp)
-		return 0;
+		return;
 
-	if (dm_reinstate_path(pp->mpp->alias, pp->dev_t)) {
+	if (dm_reinstate_path(pp->mpp->alias, pp->dev_t))
 		condlog(0, "%s: reinstate failed", pp->dev_t);
-		ret = 1;
-	} else {
+	else {
 		condlog(2, "%s: reinstated", pp->dev_t);
 		update_queue_mode_add_path(pp->mpp);
 	}
-	return ret;
 }
 
 static void
@@ -2088,8 +2084,13 @@ check_path (struct vectors * vecs, struct path * pp, unsigned int ticks)
 	 * Synchronize with kernel state
 	 */
 	if (update_multipath_strings(pp->mpp, vecs->pathvec, 1)) {
+		struct dm_info info;
 		condlog(1, "%s: Could not synchronize with kernel state",
 			pp->dev);
+		if (pp->mpp && pp->mpp->alias &&
+		    do_dm_get_info(pp->mpp->alias, &info) == 0)
+			/* multipath device missing. Likely removed */
+			return 0;
 		pp->dmstate = PSTATE_UNDEF;
 	}
 	/* if update_multipath_strings orphaned the path, quit early */
@@ -2179,12 +2180,8 @@ check_path (struct vectors * vecs, struct path * pp, unsigned int ticks)
 		/*
 		 * reinstate this path
 		 */
-		if (!disable_reinstate && reinstate_path(pp)) {
-			condlog(3, "%s: reload map", pp->dev);
-			ev_add_path(pp, vecs, 1);
-			pp->tick = 1;
-			return 0;
-		}
+		if (!disable_reinstate)
+			reinstate_path(pp);
 		new_path_up = 1;
 
 		if (oldchkrstate != PATH_UP && oldchkrstate != PATH_GHOST)
@@ -2200,15 +2197,10 @@ check_path (struct vectors * vecs, struct path * pp, unsigned int ticks)
 	else if (newstate == PATH_UP || newstate == PATH_GHOST) {
 		if ((pp->dmstate == PSTATE_FAILED ||
 		    pp->dmstate == PSTATE_UNDEF) &&
-		    !disable_reinstate) {
+		    !disable_reinstate)
 			/* Clear IO errors */
-			if (reinstate_path(pp)) {
-				condlog(3, "%s: reload map", pp->dev);
-				ev_add_path(pp, vecs, 1);
-				pp->tick = 1;
-				return 0;
-			}
-		} else {
+			reinstate_path(pp);
+		else {
 			LOG_MSG(4, verbosity, pp);
 			if (pp->checkint != max_checkint) {
 				/*
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 3/7] libmultipath: make dm_flush_maps only return 0 on success
  2020-06-18  0:24 [PATCH 0/7] Fix muitpath/multipathd flush issue Benjamin Marzinski
  2020-06-18  0:24 ` [PATCH 1/7] libmultipath: change do_get_info returns Benjamin Marzinski
  2020-06-18  0:24 ` [PATCH 2/7] multipathd: fix check_path errors with removed map Benjamin Marzinski
@ 2020-06-18  0:24 ` Benjamin Marzinski
  2020-06-18 20:29   ` Martin Wilck
  2020-06-18  0:24 ` [PATCH 4/7] multipathd: add "del maps" multipathd command Benjamin Marzinski
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 39+ messages in thread
From: Benjamin Marzinski @ 2020-06-18  0:24 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

dm_flush_maps() returned both 0 and 1 on error, depending on which part
of the function it was in, but the caller was always treating 0 as a
success. Make dm_flush_maps() always return 1 on error and 0 on success.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/devmapper.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/libmultipath/devmapper.c b/libmultipath/devmapper.c
index b44f7545..682c0038 100644
--- a/libmultipath/devmapper.c
+++ b/libmultipath/devmapper.c
@@ -993,13 +993,13 @@ dm_flush_map_nopaths(const char * mapname, int deferred_remove)
 
 int dm_flush_maps (int retries)
 {
-	int r = 0;
+	int r = 1;
 	struct dm_task *dmt;
 	struct dm_names *names;
 	unsigned next = 0;
 
 	if (!(dmt = libmp_dm_task_create (DM_DEVICE_LIST)))
-		return 0;
+		return r;
 
 	dm_task_no_open_count(dmt);
 
@@ -1012,6 +1012,7 @@ int dm_flush_maps (int retries)
 	if (!names->dev)
 		goto out;
 
+	r = 0;
 	do {
 		r |= dm_suspend_and_flush_map(names->name, retries);
 		next = names->next;
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 4/7] multipathd: add "del maps" multipathd command
  2020-06-18  0:24 [PATCH 0/7] Fix muitpath/multipathd flush issue Benjamin Marzinski
                   ` (2 preceding siblings ...)
  2020-06-18  0:24 ` [PATCH 3/7] libmultipath: make dm_flush_maps only return 0 on success Benjamin Marzinski
@ 2020-06-18  0:24 ` Benjamin Marzinski
  2020-06-18 20:37   ` Martin Wilck
  2020-06-18  0:24 ` [PATCH 5/7] multipath: make flushing maps work like other commands Benjamin Marzinski
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 39+ messages in thread
From: Benjamin Marzinski @ 2020-06-18  0:24 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

This will flush all multipath devices.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/devmapper.c  |  7 +++++--
 libmultipath/devmapper.h  |  2 +-
 multipath/main.c          |  2 +-
 multipathd/cli.c          |  1 +
 multipathd/cli_handlers.c | 19 +++++++++++++++++++
 multipathd/cli_handlers.h |  1 +
 multipathd/main.c         |  3 ++-
 multipathd/main.h         |  1 +
 8 files changed, 31 insertions(+), 5 deletions(-)

diff --git a/libmultipath/devmapper.c b/libmultipath/devmapper.c
index 682c0038..a5e0d298 100644
--- a/libmultipath/devmapper.c
+++ b/libmultipath/devmapper.c
@@ -991,7 +991,7 @@ dm_flush_map_nopaths(const char * mapname, int deferred_remove)
 
 #endif
 
-int dm_flush_maps (int retries)
+int dm_flush_maps (int need_suspend, int retries)
 {
 	int r = 1;
 	struct dm_task *dmt;
@@ -1014,7 +1014,10 @@ int dm_flush_maps (int retries)
 
 	r = 0;
 	do {
-		r |= dm_suspend_and_flush_map(names->name, retries);
+		if (need_suspend)
+			r |= dm_suspend_and_flush_map(names->name, retries);
+		else
+			r |= dm_flush_map(names->name);
 		next = names->next;
 		names = (void *) names + next;
 	} while (next);
diff --git a/libmultipath/devmapper.h b/libmultipath/devmapper.h
index 79c9afb2..adf89342 100644
--- a/libmultipath/devmapper.h
+++ b/libmultipath/devmapper.h
@@ -51,7 +51,7 @@ int dm_flush_map_nopaths(const char * mapname, int deferred_remove);
 #define dm_suspend_and_flush_map(mapname, retries) \
 	_dm_flush_map(mapname, 1, 0, 1, retries)
 int dm_cancel_deferred_remove(struct multipath *mpp);
-int dm_flush_maps (int retries);
+int dm_flush_maps (int need_suspend, int retries);
 int dm_fail_path(const char * mapname, char * path);
 int dm_reinstate_path(const char * mapname, char * path);
 int dm_queue_if_no_path(const char *mapname, int enable);
diff --git a/multipath/main.c b/multipath/main.c
index c4740fab..d89f0a91 100644
--- a/multipath/main.c
+++ b/multipath/main.c
@@ -1096,7 +1096,7 @@ main (int argc, char *argv[])
 		goto out;
 	}
 	else if (conf->remove == FLUSH_ALL) {
-		r = dm_flush_maps(retries) ? RTVL_FAIL : RTVL_OK;
+		r = dm_flush_maps(1, retries) ? RTVL_FAIL : RTVL_OK;
 		goto out;
 	}
 	while ((r = configure(conf, cmd, dev_type, dev)) == RTVL_RETRY)
diff --git a/multipathd/cli.c b/multipathd/cli.c
index 800c0fbe..bdc9fb10 100644
--- a/multipathd/cli.c
+++ b/multipathd/cli.c
@@ -568,6 +568,7 @@ cli_init (void) {
 	add_handler(DEL+PATH, NULL);
 	add_handler(ADD+MAP, NULL);
 	add_handler(DEL+MAP, NULL);
+	add_handler(DEL+MAPS, NULL);
 	add_handler(SWITCH+MAP+GROUP, NULL);
 	add_handler(RECONFIGURE, NULL);
 	add_handler(SUSPEND+MAP, NULL);
diff --git a/multipathd/cli_handlers.c b/multipathd/cli_handlers.c
index 31c3d9fd..782bb003 100644
--- a/multipathd/cli_handlers.c
+++ b/multipathd/cli_handlers.c
@@ -852,6 +852,25 @@ cli_del_map (void * v, char ** reply, int * len, void * data)
 	return rc;
 }
 
+int
+cli_del_maps (void *v, char **reply, int *len, void *data)
+{
+	struct vectors * vecs = (struct vectors *)data;
+	struct multipath *mpp;
+	int i, ret = 0;
+
+	condlog(2, "remove maps (operator)");
+	vector_foreach_slot(vecs->mpvec, mpp, i) {
+		if (flush_map(mpp, vecs, 0))
+			ret++;
+		else
+			i--;
+	}
+	/* flush any multipath maps that aren't currently known by multipathd */
+	ret |= dm_flush_maps(0, 0);
+	return ret;
+}
+
 int
 cli_reload(void *v, char **reply, int *len, void *data)
 {
diff --git a/multipathd/cli_handlers.h b/multipathd/cli_handlers.h
index 0f451064..6f57b429 100644
--- a/multipathd/cli_handlers.h
+++ b/multipathd/cli_handlers.h
@@ -26,6 +26,7 @@ int cli_add_path (void * v, char ** reply, int * len, void * data);
 int cli_del_path (void * v, char ** reply, int * len, void * data);
 int cli_add_map (void * v, char ** reply, int * len, void * data);
 int cli_del_map (void * v, char ** reply, int * len, void * data);
+int cli_del_maps (void * v, char ** reply, int * len, void * data);
 int cli_switch_group(void * v, char ** reply, int * len, void * data);
 int cli_reconfigure(void * v, char ** reply, int * len, void * data);
 int cli_resize(void * v, char ** reply, int * len, void * data);
diff --git a/multipathd/main.c b/multipathd/main.c
index 8fb73922..8f055646 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -631,7 +631,7 @@ sync_maps_state(vector mpvec)
 		sync_map_state(mpp);
 }
 
-static int
+int
 flush_map(struct multipath * mpp, struct vectors * vecs, int nopaths)
 {
 	int r;
@@ -1551,6 +1551,7 @@ uxlsnrloop (void * ap)
 	set_handler_callback(DEL+PATH, cli_del_path);
 	set_handler_callback(ADD+MAP, cli_add_map);
 	set_handler_callback(DEL+MAP, cli_del_map);
+	set_handler_callback(DEL+MAPS, cli_del_maps);
 	set_handler_callback(SWITCH+MAP+GROUP, cli_switch_group);
 	set_unlocked_handler_callback(RECONFIGURE, cli_reconfigure);
 	set_handler_callback(SUSPEND+MAP, cli_suspend);
diff --git a/multipathd/main.h b/multipathd/main.h
index 7bb8463f..5dff17e5 100644
--- a/multipathd/main.h
+++ b/multipathd/main.h
@@ -28,6 +28,7 @@ int ev_add_path (struct path *, struct vectors *, int);
 int ev_remove_path (struct path *, struct vectors *, int);
 int ev_add_map (char *, const char *, struct vectors *);
 int ev_remove_map (char *, char *, int, struct vectors *);
+int flush_map(struct multipath *, struct vectors *, int);
 int set_config_state(enum daemon_status);
 void * mpath_alloc_prin_response(int prin_sa);
 int prin_do_scsi_ioctl(char *, int rq_servact, struct prin_resp * resp,
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 5/7] multipath: make flushing maps work like other commands
  2020-06-18  0:24 [PATCH 0/7] Fix muitpath/multipathd flush issue Benjamin Marzinski
                   ` (3 preceding siblings ...)
  2020-06-18  0:24 ` [PATCH 4/7] multipathd: add "del maps" multipathd command Benjamin Marzinski
@ 2020-06-18  0:24 ` Benjamin Marzinski
  2020-06-18 20:38   ` Martin Wilck
  2020-06-18  0:24 ` [PATCH 6/7] multipath: delegate flushing maps to multipathd Benjamin Marzinski
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 39+ messages in thread
From: Benjamin Marzinski @ 2020-06-18  0:24 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

The config structure doesn't need a special variable just for removes.
Multipath can just use the cmd variable, like it does for the other
commands.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/config.h    |  3 ++-
 libmultipath/configure.h |  3 ---
 multipath/main.c         | 20 ++++++++++----------
 3 files changed, 12 insertions(+), 14 deletions(-)

diff --git a/libmultipath/config.h b/libmultipath/config.h
index ceecff2d..55569360 100644
--- a/libmultipath/config.h
+++ b/libmultipath/config.h
@@ -38,6 +38,8 @@ enum mpath_cmds {
 	CMD_ADD_WWID,
 	CMD_USABLE_PATHS,
 	CMD_DUMP_CONFIG,
+	CMD_FLUSH_ONE,
+	CMD_FLUSH_ALL,
 };
 
 enum force_reload_types {
@@ -142,7 +144,6 @@ struct config {
 	unsigned int max_checkint;
 	bool use_watchdog;
 	int pgfailback;
-	int remove;
 	int rr_weight;
 	int no_path_retry;
 	int user_friendly_names;
diff --git a/libmultipath/configure.h b/libmultipath/configure.h
index d7509000..0e33bf40 100644
--- a/libmultipath/configure.h
+++ b/libmultipath/configure.h
@@ -45,9 +45,6 @@ enum {
 	CP_RETRY,
 };
 
-#define FLUSH_ONE 1
-#define FLUSH_ALL 2
-
 struct vectors;
 
 int setup_map (struct multipath * mpp, char * params, int params_size,
diff --git a/multipath/main.c b/multipath/main.c
index d89f0a91..101fd656 100644
--- a/multipath/main.c
+++ b/multipath/main.c
@@ -909,10 +909,10 @@ main (int argc, char *argv[])
 				cmd = CMD_DRY_RUN;
 			break;
 		case 'f':
-			conf->remove = FLUSH_ONE;
+			cmd = CMD_FLUSH_ONE;
 			break;
 		case 'F':
-			conf->remove = FLUSH_ALL;
+			cmd = CMD_FLUSH_ALL;
 			break;
 		case 'l':
 			if (optarg && !strncmp(optarg, "l", 1))
@@ -1053,6 +1053,10 @@ main (int argc, char *argv[])
 		condlog(0, "the -w option requires a device");
 		goto out;
 	}
+	if (cmd == CMD_FLUSH_ONE && dev_type != DEV_DEVMAP) {
+		condlog(0, "the -f option requires a map name to remove");
+		goto out;
+	}
 
 	switch(delegate_to_multipathd(cmd, dev, dev_type, conf)) {
 	case DELEGATE_OK:
@@ -1086,16 +1090,12 @@ main (int argc, char *argv[])
 	}
 	if (retries < 0)
 		retries = conf->remove_retries;
-	if (conf->remove == FLUSH_ONE) {
-		if (dev_type == DEV_DEVMAP) {
-			r = dm_suspend_and_flush_map(dev, retries) ?
-				RTVL_FAIL : RTVL_OK;
-		} else
-			condlog(0, "must provide a map name to remove");
-
+	if (cmd == CMD_FLUSH_ONE) {
+		r = dm_suspend_and_flush_map(dev, retries) ?
+		    RTVL_FAIL : RTVL_OK;
 		goto out;
 	}
-	else if (conf->remove == FLUSH_ALL) {
+	else if (cmd == CMD_FLUSH_ALL) {
 		r = dm_flush_maps(1, retries) ? RTVL_FAIL : RTVL_OK;
 		goto out;
 	}
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 6/7] multipath: delegate flushing maps to multipathd
  2020-06-18  0:24 [PATCH 0/7] Fix muitpath/multipathd flush issue Benjamin Marzinski
                   ` (4 preceding siblings ...)
  2020-06-18  0:24 ` [PATCH 5/7] multipath: make flushing maps work like other commands Benjamin Marzinski
@ 2020-06-18  0:24 ` Benjamin Marzinski
  2020-06-18 20:40   ` Martin Wilck
  2020-06-18  0:24 ` [PATCH 7/7] multipath: add option to skip multipathd delegation Benjamin Marzinski
  2020-06-18  9:00 ` [PATCH 0/7] Fix muitpath/multipathd flush issue Martin Wilck
  7 siblings, 1 reply; 39+ messages in thread
From: Benjamin Marzinski @ 2020-06-18  0:24 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

Since there can be problems with removing maps outside of multipathd,
multipath should attempt to delegate this command to multipathd.
However, multipathd doesn't attempt to suspend the device, in order
to avoid potential hangs. If delegating to multipathd fails, multipath
should try the remove itself.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 multipath/main.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/multipath/main.c b/multipath/main.c
index 101fd656..6a24e483 100644
--- a/multipath/main.c
+++ b/multipath/main.c
@@ -820,6 +820,20 @@ int delegate_to_multipathd(enum mpath_cmds cmd,
 	if (cmd == CMD_CREATE && conf->force_reload == FORCE_RELOAD_YES) {
 		p += snprintf(p, n, "reconfigure");
 	}
+	else if (cmd == CMD_FLUSH_ONE && dev && dev_type == DEV_DEVMAP) {
+		p += snprintf(p, n, "del map %s", dev);
+		/* multipathd doesn't try as hard, to avoid potentially
+		 * hanging. If it fails, retry with the regular multipath
+		 * command */
+		r = NOT_DELEGATED;
+	}
+	else if (cmd == CMD_FLUSH_ALL) {
+		p += snprintf(p, n, "del maps");
+		/* multipathd doesn't try as hard, to avoid potentially
+		 * hanging. If it fails, retry with the regular multipath
+		 * command */
+		r = NOT_DELEGATED;
+	}
 	/* Add other translations here */
 
 	if (strlen(command) == 0)
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 7/7] multipath: add option to skip multipathd delegation
  2020-06-18  0:24 [PATCH 0/7] Fix muitpath/multipathd flush issue Benjamin Marzinski
                   ` (5 preceding siblings ...)
  2020-06-18  0:24 ` [PATCH 6/7] multipath: delegate flushing maps to multipathd Benjamin Marzinski
@ 2020-06-18  0:24 ` Benjamin Marzinski
  2020-06-18 20:44   ` Martin Wilck
  2020-06-18  9:00 ` [PATCH 0/7] Fix muitpath/multipathd flush issue Martin Wilck
  7 siblings, 1 reply; 39+ messages in thread
From: Benjamin Marzinski @ 2020-06-18  0:24 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

Add the -D option to allow users to skip delegating commands to
multipathd.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/config.h |  1 +
 multipath/main.c      | 15 +++++++++++----
 multipath/multipath.8 | 16 +++++++++++-----
 3 files changed, 23 insertions(+), 9 deletions(-)

diff --git a/libmultipath/config.h b/libmultipath/config.h
index 55569360..92c61a0d 100644
--- a/libmultipath/config.h
+++ b/libmultipath/config.h
@@ -190,6 +190,7 @@ struct config {
 	int ghost_delay;
 	int find_multipaths_timeout;
 	int marginal_pathgroups;
+	int skip_delegate;
 	unsigned int version[3];
 	unsigned int sequence_nr;
 
diff --git a/multipath/main.c b/multipath/main.c
index 6a24e483..0cf7608f 100644
--- a/multipath/main.c
+++ b/multipath/main.c
@@ -130,9 +130,9 @@ usage (char * progname)
 {
 	fprintf (stderr, VERSION_STRING);
 	fprintf (stderr, "Usage:\n");
-	fprintf (stderr, "  %s [-v level] [-B|-d|-i|-q|-r] [-b file] [-p policy] [device]\n", progname);
-	fprintf (stderr, "  %s [-v level] [-R retries] -f device\n", progname);
-	fprintf (stderr, "  %s [-v level] [-R retries] -F\n", progname);
+	fprintf (stderr, "  %s [-v level] [-B|-d|-D|-i|-q|-r] [-b file] [-p policy] [device]\n", progname);
+	fprintf (stderr, "  %s [-v level] [-D|-R retries] -f device\n", progname);
+	fprintf (stderr, "  %s [-v level] [-D|-R retries] -F\n", progname);
 	fprintf (stderr, "  %s [-v level] [-l|-ll] [device]\n", progname);
 	fprintf (stderr, "  %s [-v level] [-a|-w] device\n", progname);
 	fprintf (stderr, "  %s [-v level] -W\n", progname);
@@ -153,6 +153,7 @@ usage (char * progname)
 		"  -C      check if a multipath device has usable paths\n"
 		"  -q      allow queue_if_no_path when multipathd is not running\n"
 		"  -d      dry run, do not create or update devmaps\n"
+		"  -D      Do not delegate command to multipathd\n"
 		"  -t      display the currently used multipathd configuration\n"
 		"  -T      display the multipathd configuration without builtin defaults\n"
 		"  -r      force devmap reload\n"
@@ -817,6 +818,9 @@ int delegate_to_multipathd(enum mpath_cmds cmd,
 	*p = '\0';
 	n = sizeof(command);
 
+	if (conf->skip_delegate)
+		return NOT_DELEGATED;
+
 	if (cmd == CMD_CREATE && conf->force_reload == FORCE_RELOAD_YES) {
 		p += snprintf(p, n, "reconfigure");
 	}
@@ -890,7 +894,7 @@ main (int argc, char *argv[])
 	multipath_conf = conf;
 	conf->retrigger_tries = 0;
 	conf->force_sync = 1;
-	while ((arg = getopt(argc, argv, ":adcChl::eFfM:v:p:b:BrR:itTquUwW")) != EOF ) {
+	while ((arg = getopt(argc, argv, ":adDcChl::eFfM:v:p:b:BrR:itTquUwW")) != EOF ) {
 		switch(arg) {
 		case 1: printf("optarg : %s\n",optarg);
 			break;
@@ -922,6 +926,9 @@ main (int argc, char *argv[])
 			if (cmd == CMD_CREATE)
 				cmd = CMD_DRY_RUN;
 			break;
+		case 'D':
+			conf->skip_delegate = 1;
+			break;
 		case 'f':
 			cmd = CMD_FLUSH_ONE;
 			break;
diff --git a/multipath/multipath.8 b/multipath/multipath.8
index 6fb8645a..16a27363 100644
--- a/multipath/multipath.8
+++ b/multipath/multipath.8
@@ -22,7 +22,7 @@ multipath \- Device mapper target autoconfig.
 .B multipath
 .RB [\| \-v\ \c
 .IR level \|]
-.RB [\| \-B | \-d | \-i | \-q | \-r \|]
+.RB [\| \-B | \-d | \-D | \-i | \-q | \-r \|]
 .RB [\| \-b\ \c
 .IR file \|]
 .RB [\| \-p\ \c
@@ -33,7 +33,7 @@ multipath \- Device mapper target autoconfig.
 .B multipath
 .RB [\| \-v\ \c
 .IR level \|]
-.RB [\| \-R\ \c
+.RB [\| \-D | \-R\ \c
 .IR retries \|]
 .B \-f device
 .
@@ -41,7 +41,7 @@ multipath \- Device mapper target autoconfig.
 .B multipath
 .RB [\| \-v\ \c
 .IR level \|]
-.RB [\| \-R\ \c
+.RB [\| \-D | \-R\ \c
 .IR retries \|]
 .B \-F
 .
@@ -125,11 +125,11 @@ the system.
 Other operation modes are chosen by using one of the following command line switches:
 .TP
 .B \-f
-Flush (remove) a multipath device map specified as parameter, if unused.
+Flush (remove) a multipath device map specified as parameter, if unused. This operation is delegated to the multipathd daemon if it's running.
 .
 .TP
 .B \-F
-Flush (remove) all unused multipath device maps.
+Flush (remove) all unused multipath device maps. This operation is delegated to the multipathd daemon if it's running.
 .
 .TP
 .B \-l
@@ -223,6 +223,12 @@ The verbosity level also controls the level of log and debug messages printed to
 Dry run, do not create or update devmaps.
 .
 .TP
+.B \-D
+Do not delegate operation to multipathd. By default, multipath delegates
+forced reloads and flushes (removes) to multipathd. Setting this option will
+cause the multipath command to execute these operations itself.
+.
+.TP
 .B \-e
 Enable all foreign libraries. This overrides the
 .I enable_foreign 
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/7] Fix muitpath/multipathd flush issue
  2020-06-18  0:24 [PATCH 0/7] Fix muitpath/multipathd flush issue Benjamin Marzinski
                   ` (6 preceding siblings ...)
  2020-06-18  0:24 ` [PATCH 7/7] multipath: add option to skip multipathd delegation Benjamin Marzinski
@ 2020-06-18  9:00 ` Martin Wilck
  2020-06-18 18:04   ` Benjamin Marzinski
  7 siblings, 1 reply; 39+ messages in thread
From: Martin Wilck @ 2020-06-18  9:00 UTC (permalink / raw)
  To: bmarzins, christophe.varoqui; +Cc: dm-devel

On Wed, 2020-06-17 at 19:24 -0500, Benjamin Marzinski wrote:
> 
> One source of complexity in these patches is that multipath suspends
> the
> device with flushing before removing it, while multipathd
> doesn't.  It
> does seem possible that the suspend could hang for a while, so I can
> understand multipathd not using it.  However, that would take the
> multipath device getting opened and IO being issued, between the time
> when _dm_flush_map() verifies that the device isn't opened, and when
> it
> suspends the device.  But more importantly, I don't understand how
> suspending the device is useful.  

I've looked up the origin of 9a4ff93 in SUSE bugzilla. The problem that
the patch tried to solve was that if map without healthy paths and with
queue_if_no_path set was removed, the removal might fail with 
"map is in use".  Hannes' comment at the time was this:

 "Problem is that we're not waiting for all outstanding I/Os to
 complete. So the check for 'map in use' might trigger a false
 positive, as the outstanding I/O will keep the map busy. Once it's
 finished we can remove the map."

 "I'll add an explicit 'suspend/resume' cycle here, which will wait for
 all outstanding I/O to finish. That should resolve the situation."

Thus setting queue_if_no_paths = 0 and doing a suspend/resume was
basically a trick to force dm to flush outstanding IO.

> If the device were to be opened
> between when _dm_flush_map() checks the usage, and when it tries to
> delete the device, device-mapper would simply fail the remove.  And
> if
> the device isn't in use, there can't be any outstanding IO to flush.
> When this code was added in commit 9a4ff93, there was no check if the
> device was in use,

Hm, I see this in the code preceding 9a4ff93:

extern int
_dm_flush_map (const char * mapname, int need_sync)
{
[...]
        if (dm_get_opencount(mapname)) {
                condlog(2, "%s: map in use", mapname);
                return 1;
        }

... so it seems that there was a check, even back then.

>  and disabling queue_if_no_path could cause anything
> that had the device open to error out and close it. But now that
> multipath checks first if the device is open, I don't see the benefit
> to
> doing this anymore. Removing it would allow multipathd and multipath
> to
> use the same code to remove maps. Any thoughts?

With queue_if_no_paths, there could be outstanding IO even if the
opencount was 0. This IO must be flushed somehow before the map can be
removed. Apparently just setting queue_if_no_path = 0 was not enough,
that's why Hannes added the suspend/resume. Maybe today we have some
better way to force the flush? libdm has the _error_device() function 
(aka dmsetup wipe_table) that replaces the map's table by the error
target. But this does a map reload, which implies a suspend, too.
Perhaps simply an fsync() on the dm device, or just wait a while until
all outstanding IO has errored out? But these alternatives don't
actually look better to me than Hannes' suspend/resume, they will take
at least as much time. Do you have a better idea?

multipathd 

Regards
Martin

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE  Software Solutions Germany GmbH
HRB 36809, AG Nürnberg GF: Felix
Imendörffer

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/7] libmultipath: change do_get_info returns
  2020-06-18  0:24 ` [PATCH 1/7] libmultipath: change do_get_info returns Benjamin Marzinski
@ 2020-06-18 15:27   ` Martin Wilck
  2020-06-18 23:17     ` Benjamin Marzinski
  0 siblings, 1 reply; 39+ messages in thread
From: Martin Wilck @ 2020-06-18 15:27 UTC (permalink / raw)
  To: bmarzins, christophe.varoqui; +Cc: dm-devel

On Wed, 2020-06-17 at 19:24 -0500, Benjamin Marzinski wrote:
> Make do_get_info() differentiate between dm failures and missing
> devices, and update callers to retain their current behavior. Also,
> rename it and make it external. These changes will be used by future
> commits.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
>  libmultipath/devmapper.c | 29 ++++++++++++++++-------------
>  libmultipath/devmapper.h |  1 +
>  2 files changed, 17 insertions(+), 13 deletions(-)
> 
> diff --git a/libmultipath/devmapper.c b/libmultipath/devmapper.c
> index 27d52398..b44f7545 100644
> --- a/libmultipath/devmapper.c
> +++ b/libmultipath/devmapper.c
> @@ -496,8 +496,14 @@ int dm_addmap_reload(struct multipath *mpp, char
> *params, int flush)
>  	return 0;
>  }
>  
> -static int
> -do_get_info(const char *name, struct dm_info *info)
> +/*
> + * Returns:
> + * -1: Error
> + *  0: device does not exist
> + *  1: device exists
> + */

Can we use symbolic values here please? In particular as you have
changed the "success" return value from 0 to 1...

One day we should come up with a proper return value scheme
for libmultipath, defining specific enums for every function
doesn't scale. But do it here for now nonetheless, please.

Apart from that, ok.

Regards
Martin

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE  Software Solutions Germany GmbH
HRB 36809, AG Nürnberg GF: Felix
Imendörffer

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/7] Fix muitpath/multipathd flush issue
  2020-06-18  9:00 ` [PATCH 0/7] Fix muitpath/multipathd flush issue Martin Wilck
@ 2020-06-18 18:04   ` Benjamin Marzinski
  2020-06-18 20:06     ` Martin Wilck
  0 siblings, 1 reply; 39+ messages in thread
From: Benjamin Marzinski @ 2020-06-18 18:04 UTC (permalink / raw)
  To: Martin Wilck; +Cc: dm-devel

On Thu, Jun 18, 2020 at 09:00:49AM +0000, Martin Wilck wrote:
> On Wed, 2020-06-17 at 19:24 -0500, Benjamin Marzinski wrote:
> > 
> > One source of complexity in these patches is that multipath suspends
> > the
> > device with flushing before removing it, while multipathd
> > doesn't.  It
> > does seem possible that the suspend could hang for a while, so I can
> > understand multipathd not using it.  However, that would take the
> > multipath device getting opened and IO being issued, between the time
> > when _dm_flush_map() verifies that the device isn't opened, and when
> > it
> > suspends the device.  But more importantly, I don't understand how
> > suspending the device is useful.  
> 
> I've looked up the origin of 9a4ff93 in SUSE bugzilla. The problem that
> the patch tried to solve was that if map without healthy paths and with
> queue_if_no_path set was removed, the removal might fail with 
> "map is in use".  Hannes' comment at the time was this:
> 
>  "Problem is that we're not waiting for all outstanding I/Os to
>  complete. So the check for 'map in use' might trigger a false
>  positive, as the outstanding I/O will keep the map busy. Once it's
>  finished we can remove the map."
> 
>  "I'll add an explicit 'suspend/resume' cycle here, which will wait for
>  all outstanding I/O to finish. That should resolve the situation."
> 
> Thus setting queue_if_no_paths = 0 and doing a suspend/resume was
> basically a trick to force dm to flush outstanding IO.

I get the intention, I just don't think it currently is doing anything.

> > If the device were to be opened
> > between when _dm_flush_map() checks the usage, and when it tries to
> > delete the device, device-mapper would simply fail the remove.  And
> > if
> > the device isn't in use, there can't be any outstanding IO to flush.
> > When this code was added in commit 9a4ff93, there was no check if the
> > device was in use,
> 
> Hm, I see this in the code preceding 9a4ff93:
> 
> extern int
> _dm_flush_map (const char * mapname, int need_sync)
> {
> [...]
>         if (dm_get_opencount(mapname)) {
>                 condlog(2, "%s: map in use", mapname);
>                 return 1;
>         }
> 
> ... so it seems that there was a check, even back then.

oops. I missed that. 

> >  and disabling queue_if_no_path could cause anything
> > that had the device open to error out and close it. But now that
> > multipath checks first if the device is open, I don't see the benefit
> > to
> > doing this anymore. Removing it would allow multipathd and multipath
> > to
> > use the same code to remove maps. Any thoughts?
> 
> With queue_if_no_paths, there could be outstanding IO even if the
> opencount was 0.

This is what I don't accept at face value. I wrote a little test
program that fired off an async read, closed the fd without waiting for
a response, and then tried to exit.  running this on a queueing
multipath device causes the exit to hang like this

 cat /proc/22655/task/22655/stack
[<0>] exit_aio+0xdc/0xf0
[<0>] mmput+0x33/0x130
[<0>] do_exit+0x27f/0xb30
[<0>] do_group_exit+0x3a/0xa0
[<0>] __x64_sys_exit_group+0x14/0x20
[<0>] do_syscall_64+0x5b/0x160
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[<0>] 0xffffffffffffffff

and the multipath device to remain in use

# dmsetup info mpathb
Name:              mpathb
State:             ACTIVE
Read Ahead:        256
Tables present:    LIVE
Open count:        0
Event number:      7
Major, minor:      253, 5
Number of targets: 1
UUID: mpath-3600d0230000000000e13954ed5f89301

I've asked around, and device-mapper is certainly supposed to flush all
IO before the last close. Of course, there may be a corner case where it
doesn't. If you know of one, that would be worth sharing.

What I think that flush previously accomplished is that, just like the
flush_on_last_del code, if you stop queueing and error out the IO, then
whatever is waiting on it will get those errors, and likely close the
device soon after, hopefully in time for multipath to remove the device.

However, since multipath now checks if the device is in use before
disabling queue_if_no_path, it don't think this code is actually helpful
right now.

> This IO must be flushed somehow before the map can be
> removed. Apparently just setting queue_if_no_path = 0 was not enough,
> that's why Hannes added the suspend/resume. Maybe today we have some
> better way to force the flush? libdm has the _error_device() function 
> (aka dmsetup wipe_table) that replaces the map's table by the error
> target. But this does a map reload, which implies a suspend, too.
> Perhaps simply an fsync() on the dm device, or just wait a while until
> all outstanding IO has errored out? But these alternatives don't
> actually look better to me than Hannes' suspend/resume, they will take
> at least as much time. Do you have a better idea?

To go back to the old behavior, we would need to move the check if the
device is in use until after we suspended the device. Or we can keep the
current behavior, and simply remove the flushing and suspending.

> multipathd 
> 
> Regards
> Martin
> 
> -- 
> Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
> SUSE  Software Solutions Germany GmbH
> HRB 36809, AG Nürnberg GF: Felix
> Imendörffer
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/7] multipathd: fix check_path errors with removed map
  2020-06-18  0:24 ` [PATCH 2/7] multipathd: fix check_path errors with removed map Benjamin Marzinski
@ 2020-06-18 19:34   ` Martin Wilck
  2020-06-18 23:17     ` Benjamin Marzinski
  0 siblings, 1 reply; 39+ messages in thread
From: Martin Wilck @ 2020-06-18 19:34 UTC (permalink / raw)
  To: bmarzins, christophe.varoqui; +Cc: dm-devel

On Wed, 2020-06-17 at 19:24 -0500, Benjamin Marzinski wrote:
> If a multipath device is removed during, or immediately before the
> call
> to check_path(), multipathd can behave incorrectly. A missing
> multpath
> device will cause update_multipath_strings() to fail, setting
> pp->dmstate to PSTATE_UNDEF.  If the path is up, this state will
> cause
> reinstate_path() to be called, which will also fail.  This will
> trigger
> a reload, restoring the recently removed device.
> 
> If update_multipath_strings() fails because there is no multipath
> device, check_path should just quit, since the remove dmevent and
> uevent
> are likely already queued up. Also, I don't see any reason to reload
> the
> multipath device if reinstate fails. This code was added by
> fac68d7a99ef17d496079538a5c6836acd7911ab, which clamined that
> reinstate
> could fail if the path was disabled.  Looking through the current
> kernel
> code, I can't see any reason why a reinstate would fail, where a
> reload
> would help. If the path was missing from the multipath device,
> update_multipath_strings() would already catch that, and quit
> check_path() early, which make more sense to me than reloading does.

fac68d7 is related to the famous "dm-multipath: Accept failed paths for
multipath maps" patch (e.g. 
https://patchwork.kernel.org/patch/3368381/#7193001), which never made
it upstream. SUSE kernels have shipped this patch for a long time, but
we don't apply it any more in recent kernels.

With this patch, The reinstate_path() failure would occur if multipathd
had created a table with a "disabled" device (one which would be
present in a dm map even though the actual block device didn't exist),
and then tried to reinstate such a path. It sounds unlikely, but it
might be possible if devices are coming and going in quick succession.
In that situation, and with the "accept failed path" patch applied, a
reload makes some sense, because reload (unlike reinstate) would not
fail (at least not for this reason) and would actually add that just-
reinstated path. OTOH, it's not likely that the reload would improve
matters, either. After all, multipathd is just trying to reinstate a
non-existing path. So, I'm fine with skipping the reload.

> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
>  multipathd/main.c | 36 ++++++++++++++----------------------
>  1 file changed, 14 insertions(+), 22 deletions(-)
> 
> diff --git a/multipathd/main.c b/multipathd/main.c
> index 6b7db2c0..8fb73922 100644
> --- a/multipathd/main.c
> +++ b/multipathd/main.c
> @@ -1611,22 +1611,18 @@ fail_path (struct path * pp, int del_active)
>  /*
>   * caller must have locked the path list before calling that
> function
>   */
> -static int
> +static void
>  reinstate_path (struct path * pp)
>  {
> -	int ret = 0;
> -
>  	if (!pp->mpp)
> -		return 0;
> +		return;
>  
> -	if (dm_reinstate_path(pp->mpp->alias, pp->dev_t)) {
> +	if (dm_reinstate_path(pp->mpp->alias, pp->dev_t))
>  		condlog(0, "%s: reinstate failed", pp->dev_t);
> -		ret = 1;
> -	} else {
> +	else {
>  		condlog(2, "%s: reinstated", pp->dev_t);
>  		update_queue_mode_add_path(pp->mpp);
>  	}
> -	return ret;
>  }
>  static void
> @@ -2088,8 +2084,13 @@ check_path (struct vectors * vecs, struct path
> * pp, unsigned int ticks)
>  	 * Synchronize with kernel state
>  	 */
>  	if (update_multipath_strings(pp->mpp, vecs->pathvec, 1)) {
> +		struct dm_info info;
>  		condlog(1, "%s: Could not synchronize with kernel
> state",
>  			pp->dev);
> +		if (pp->mpp && pp->mpp->alias &&
> +		    do_dm_get_info(pp->mpp->alias, &info) == 0)
> +			/* multipath device missing. Likely removed */
> +			return 0;
>  		pp->dmstate = PSTATE_UNDEF;

It would be more elegant if we could distinguish different failure
modes from update_multipath_strings() directly, without having to call
do_dm_get_info() again.

>  	}
>  	/* if update_multipath_strings orphaned the path, quit early */
> @@ -2179,12 +2180,8 @@ check_path (struct vectors * vecs, struct path
> * pp, unsigned int ticks)
>  		/*
>  		 * reinstate this path
>  		 */
> -		if (!disable_reinstate && reinstate_path(pp)) {
> -			condlog(3, "%s: reload map", pp->dev);
> -			ev_add_path(pp, vecs, 1);
> -			pp->tick = 1;
> -			return 0;
> -		}
> +		if (!disable_reinstate)
> +			reinstate_path(pp);
>  		new_path_up = 1;
>  
>  		if (oldchkrstate != PATH_UP && oldchkrstate !=
> PATH_GHOST)
> @@ -2200,15 +2197,10 @@ check_path (struct vectors * vecs, struct
> path * pp, unsigned int ticks)
>  	else if (newstate == PATH_UP || newstate == PATH_GHOST) {
>  		if ((pp->dmstate == PSTATE_FAILED ||
>  		    pp->dmstate == PSTATE_UNDEF) &&
> -		    !disable_reinstate) {
> +		    !disable_reinstate)
>  			/* Clear IO errors */
> -			if (reinstate_path(pp)) {
> -				condlog(3, "%s: reload map", pp->dev);
> -				ev_add_path(pp, vecs, 1);
> -				pp->tick = 1;
> -				return 0;
> -			}
> -		} else {
> +			reinstate_path(pp);
> +		else {
>  			LOG_MSG(4, verbosity, pp);
>  			if (pp->checkint != max_checkint) {
>  				/*

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE  Software Solutions Germany GmbH
HRB 36809, AG Nürnberg GF: Felix
Imendörffer

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/7] Fix muitpath/multipathd flush issue
  2020-06-18 18:04   ` Benjamin Marzinski
@ 2020-06-18 20:06     ` Martin Wilck
  2020-06-18 23:06       ` Benjamin Marzinski
  0 siblings, 1 reply; 39+ messages in thread
From: Martin Wilck @ 2020-06-18 20:06 UTC (permalink / raw)
  To: bmarzins; +Cc: dm-devel

On Thu, 2020-06-18 at 13:04 -0500, Benjamin Marzinski wrote:
> On Thu, Jun 18, 2020 at 09:00:49AM +0000, Martin Wilck wrote:
> > 
> > With queue_if_no_paths, there could be outstanding IO even if the
> > opencount was 0.
> 
> This is what I don't accept at face value. I wrote a little test
> program that fired off an async read, closed the fd without waiting
> for
> a response, and then tried to exit.  running this on a queueing
> multipath device causes the exit to hang like this
> 
>  cat /proc/22655/task/22655/stack
> [<0>] exit_aio+0xdc/0xf0
> [<0>] mmput+0x33/0x130
> [<0>] do_exit+0x27f/0xb30
> [<0>] do_group_exit+0x3a/0xa0
> [<0>] __x64_sys_exit_group+0x14/0x20
> [<0>] do_syscall_64+0x5b/0x160
> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [<0>] 0xffffffffffffffff
> 
> and the multipath device to remain in use
> 
> # dmsetup info mpathb
> Name:              mpathb
> State:             ACTIVE
> Read Ahead:        256
> Tables present:    LIVE
> Open count:        0
> Event number:      7
> Major, minor:      253, 5
> Number of targets: 1
> UUID: mpath-3600d0230000000000e13954ed5f89301
> 

The open count is 0. Wouldn't this be exactly the situation that
Hannes' patch was targeting - opencount 0, but still unable to
flush/close because of outstanding IO? If multipath was trying to flush
the map in this situation, it would disable queueing. Your process
would get an IO error sooner or later, and exit. But depending on the
amount of outstanding IO and the weather, this might not happen before
multipath had attempted to flush the map, and the flush attempt would
fail. By inserting the synchronous flush operation after disabling
queueing, multipath avoids that failure. Am I misunderstanding
something?


> I've asked around, and device-mapper is certainly supposed to flush
> all
> IO before the last close. Of course, there may be a corner case where
> it
> doesn't. If you know of one, that would be worth sharing.
> 
> What I think that flush previously accomplished is that, just like
> the
> flush_on_last_del code, if you stop queueing and error out the IO,
> then
> whatever is waiting on it will get those errors, and likely close the
> device soon after, hopefully in time for multipath to remove the
> device.
> 
> However, since multipath now checks if the device is in use before
> disabling queue_if_no_path, it don't think this code is actually
> helpful
> right now.
> 
> > This IO must be flushed somehow before the map can be
> > removed. Apparently just setting queue_if_no_path = 0 was not
> > enough,
> > that's why Hannes added the suspend/resume. Maybe today we have
> > some
> > better way to force the flush? libdm has the _error_device()
> > function 
> > (aka dmsetup wipe_table) that replaces the map's table by the error
> > target. But this does a map reload, which implies a suspend, too.
> > Perhaps simply an fsync() on the dm device, or just wait a while
> > until
> > all outstanding IO has errored out? But these alternatives don't
> > actually look better to me than Hannes' suspend/resume, they will
> > take
> > at least as much time. Do you have a better idea?
> 
> To go back to the old behavior, we would need to move the check if
> the
> device is in use until after we suspended the device. Or we can keep
> the
> current behavior, and simply remove the flushing and suspending.
> 

AFAICS the "in use" check looks at the opencount, and according to your
output above, it can be 0 while IO is outstanding. I haven't checked
this myself yet. But I can confirm that there was an actual bug (map
removal failing) that was allegedly fixed by the suspend/resume. It was
long ago, I can't tell with confidence if it could still happen.

Don't get me wrong, I'm not generally opposed to the removal of the
flush/suspend. I just wanted to clarify why it was there. It has been
used in multipath only, anyway. If we delegate to multipathd, it makes
sense to handle the situation in multipathd's manner. If we want to
make sure the user experience for "multipath -f" users is unchanged, we
could handle failure accordingly (suspend, resume, and retry flushing
the map).

Best,
Martin

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE  Software Solutions Germany GmbH
HRB 36809, AG Nürnberg GF: Felix
Imendörffer

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 3/7] libmultipath: make dm_flush_maps only return 0 on success
  2020-06-18  0:24 ` [PATCH 3/7] libmultipath: make dm_flush_maps only return 0 on success Benjamin Marzinski
@ 2020-06-18 20:29   ` Martin Wilck
  0 siblings, 0 replies; 39+ messages in thread
From: Martin Wilck @ 2020-06-18 20:29 UTC (permalink / raw)
  To: bmarzins, christophe.varoqui; +Cc: dm-devel

On Wed, 2020-06-17 at 19:24 -0500, Benjamin Marzinski wrote:
> dm_flush_maps() returned both 0 and 1 on error, depending on which
> part
> of the function it was in, but the caller was always treating 0 as a
> success. Make dm_flush_maps() always return 1 on error and 0 on
> success.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
>  libmultipath/devmapper.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 

Reviewed-by: Martin Wilck <mwilck@suse.com>

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE  Software Solutions Germany GmbH
HRB 36809, AG Nürnberg GF: Felix
Imendörffer

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 4/7] multipathd: add "del maps" multipathd command
  2020-06-18  0:24 ` [PATCH 4/7] multipathd: add "del maps" multipathd command Benjamin Marzinski
@ 2020-06-18 20:37   ` Martin Wilck
  2020-06-18 23:12     ` Benjamin Marzinski
  0 siblings, 1 reply; 39+ messages in thread
From: Martin Wilck @ 2020-06-18 20:37 UTC (permalink / raw)
  To: bmarzins, christophe.varoqui; +Cc: dm-devel

On Wed, 2020-06-17 at 19:24 -0500, Benjamin Marzinski wrote:
> This will flush all multipath devices.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
>  libmultipath/devmapper.c  |  7 +++++--
>  libmultipath/devmapper.h  |  2 +-
>  multipath/main.c          |  2 +-
>  multipathd/cli.c          |  1 +
>  multipathd/cli_handlers.c | 19 +++++++++++++++++++
>  multipathd/cli_handlers.h |  1 +
>  multipathd/main.c         |  3 ++-
>  multipathd/main.h         |  1 +
>  8 files changed, 31 insertions(+), 5 deletions(-)
> 
> diff --git a/libmultipath/devmapper.c b/libmultipath/devmapper.c
> index 682c0038..a5e0d298 100644
> --- a/libmultipath/devmapper.c
> +++ b/libmultipath/devmapper.c
> @@ -991,7 +991,7 @@ dm_flush_map_nopaths(const char * mapname, int
> deferred_remove)
>  
>  #endif
>  
> -int dm_flush_maps (int retries)
> +int dm_flush_maps (int need_suspend, int retries)
>  {
>  	int r = 1;
>  	struct dm_task *dmt;
> @@ -1014,7 +1014,10 @@ int dm_flush_maps (int retries)
>  
>  	r = 0;
>  	do {
> -		r |= dm_suspend_and_flush_map(names->name, retries);
> +		if (need_suspend)
> +			r |= dm_suspend_and_flush_map(names->name,
> retries);
> +		else
> +			r |= dm_flush_map(names->name);

This is fine, but considering the previous discussion, I'd prefer to
get rid of need_suspend and dm_suspend_and_flush_map() entirely. It
would simplify the _dm_flush_map code path significantly.

As we determined that we use the suspend/resume only in multipath
anyway, we could add it there in the "non-delegated" code path.

Thanks,
Martin

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE  Software Solutions Germany GmbH
HRB 36809, AG Nürnberg GF: Felix
Imendörffer

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 5/7] multipath: make flushing maps work like other commands
  2020-06-18  0:24 ` [PATCH 5/7] multipath: make flushing maps work like other commands Benjamin Marzinski
@ 2020-06-18 20:38   ` Martin Wilck
  0 siblings, 0 replies; 39+ messages in thread
From: Martin Wilck @ 2020-06-18 20:38 UTC (permalink / raw)
  To: bmarzins, christophe.varoqui; +Cc: dm-devel

On Wed, 2020-06-17 at 19:24 -0500, Benjamin Marzinski wrote:
> The config structure doesn't need a special variable just for
> removes.
> Multipath can just use the cmd variable, like it does for the other
> commands.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>

Reviewed-by: Martin Wilck <mwilck@suse.com>

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE  Software Solutions Germany GmbH
HRB 36809, AG Nürnberg GF: Felix
Imendörffer

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 6/7] multipath: delegate flushing maps to multipathd
  2020-06-18  0:24 ` [PATCH 6/7] multipath: delegate flushing maps to multipathd Benjamin Marzinski
@ 2020-06-18 20:40   ` Martin Wilck
  0 siblings, 0 replies; 39+ messages in thread
From: Martin Wilck @ 2020-06-18 20:40 UTC (permalink / raw)
  To: bmarzins, christophe.varoqui; +Cc: dm-devel

On Wed, 2020-06-17 at 19:24 -0500, Benjamin Marzinski wrote:
> Since there can be problems with removing maps outside of multipathd,
> multipath should attempt to delegate this command to multipathd.
> However, multipathd doesn't attempt to suspend the device, in order
> to avoid potential hangs. If delegating to multipathd fails,
> multipath
> should try the remove itself.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>

Reviewed-by: Martin Wilck <mwilck@suse.com>
(but see remark on patch 4/7).

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE  Software Solutions Germany GmbH
HRB 36809, AG Nürnberg GF: Felix
Imendörffer

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 7/7] multipath: add option to skip multipathd delegation
  2020-06-18  0:24 ` [PATCH 7/7] multipath: add option to skip multipathd delegation Benjamin Marzinski
@ 2020-06-18 20:44   ` Martin Wilck
  2020-06-18 23:15     ` Benjamin Marzinski
  0 siblings, 1 reply; 39+ messages in thread
From: Martin Wilck @ 2020-06-18 20:44 UTC (permalink / raw)
  To: bmarzins, christophe.varoqui; +Cc: dm-devel

On Wed, 2020-06-17 at 19:24 -0500, Benjamin Marzinski wrote:
> Add the -D option to allow users to skip delegating commands to
> multipathd.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
>  libmultipath/config.h |  1 +
>  multipath/main.c      | 15 +++++++++++----
>  multipath/multipath.8 | 16 +++++++++++-----
>  3 files changed, 23 insertions(+), 9 deletions(-)
> 

I wonder if we really need this. We fall back to NOT_DELEGATED anyway.
If users really, really want this, they can run multipath while
multipathd is stopped.

I'm not saying it's totally useless, but the presence of this option
suggests to users that they may want to use it, which I doubt.
Perhaps we want to have it, for debugging or expert usage purpose, as a
hidden/undocumented option?

Martin

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE  Software Solutions Germany GmbH
HRB 36809, AG Nürnberg GF: Felix
Imendörffer

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/7] Fix muitpath/multipathd flush issue
  2020-06-18 20:06     ` Martin Wilck
@ 2020-06-18 23:06       ` Benjamin Marzinski
  2020-07-01 20:54         ` Martin Wilck
  0 siblings, 1 reply; 39+ messages in thread
From: Benjamin Marzinski @ 2020-06-18 23:06 UTC (permalink / raw)
  To: Martin Wilck; +Cc: dm-devel

On Thu, Jun 18, 2020 at 08:06:53PM +0000, Martin Wilck wrote:
> On Thu, 2020-06-18 at 13:04 -0500, Benjamin Marzinski wrote:
> > On Thu, Jun 18, 2020 at 09:00:49AM +0000, Martin Wilck wrote:
> > > 
> > > With queue_if_no_paths, there could be outstanding IO even if the
> > > opencount was 0.
> > 
> > This is what I don't accept at face value. I wrote a little test
> > program that fired off an async read, closed the fd without waiting
> > for
> > a response, and then tried to exit.  running this on a queueing
> > multipath device causes the exit to hang like this
> > 
> >  cat /proc/22655/task/22655/stack
> > [<0>] exit_aio+0xdc/0xf0
> > [<0>] mmput+0x33/0x130
> > [<0>] do_exit+0x27f/0xb30
> > [<0>] do_group_exit+0x3a/0xa0
> > [<0>] __x64_sys_exit_group+0x14/0x20
> > [<0>] do_syscall_64+0x5b/0x160
> > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [<0>] 0xffffffffffffffff
> > 
> > and the multipath device to remain in use
> > 
> > # dmsetup info mpathb
> > Name:              mpathb
> > State:             ACTIVE
> > Read Ahead:        256
> > Tables present:    LIVE
> > Open count:        0
> > Event number:      7
> > Major, minor:      253, 5
> > Number of targets: 1
> > UUID: mpath-3600d0230000000000e13954ed5f89301
> > 
> 
> The open count is 0.

Whoops. I copied the output from after I restored the path, and the
program exitted.  While it is hung, you see this:

# dmsetup info mpathb
Name:              mpathb
State:             ACTIVE
Read Ahead:        256
Tables present:    LIVE
Open count:        1
Event number:      8
Major, minor:      253, 5
Number of targets: 1
UUID: mpath-3600d0230000000000e13954ed5f89301

I uploaded the test program, aio_test:

https://github.com/bmarzins/test_programs.git

You just need to run in on a queueing multipath device with no active
paths and an open count of 0. It will hang with the device open. Restore
a path, and it will exit, and the open count will go to 0.

-Ben

> Wouldn't this be exactly the situation that
> Hannes' patch was targeting - opencount 0, but still unable to
> flush/close because of outstanding IO? If multipath was trying to flush
> the map in this situation, it would disable queueing. Your process
> would get an IO error sooner or later, and exit. But depending on the
> amount of outstanding IO and the weather, this might not happen before
> multipath had attempted to flush the map, and the flush attempt would
> fail. By inserting the synchronous flush operation after disabling
> queueing, multipath avoids that failure. Am I misunderstanding
> something?
> 
> 
> > I've asked around, and device-mapper is certainly supposed to flush
> > all
> > IO before the last close. Of course, there may be a corner case where
> > it
> > doesn't. If you know of one, that would be worth sharing.
> > 
> > What I think that flush previously accomplished is that, just like
> > the
> > flush_on_last_del code, if you stop queueing and error out the IO,
> > then
> > whatever is waiting on it will get those errors, and likely close the
> > device soon after, hopefully in time for multipath to remove the
> > device.
> > 
> > However, since multipath now checks if the device is in use before
> > disabling queue_if_no_path, it don't think this code is actually
> > helpful
> > right now.
> > 
> > > This IO must be flushed somehow before the map can be
> > > removed. Apparently just setting queue_if_no_path = 0 was not
> > > enough,
> > > that's why Hannes added the suspend/resume. Maybe today we have
> > > some
> > > better way to force the flush? libdm has the _error_device()
> > > function 
> > > (aka dmsetup wipe_table) that replaces the map's table by the error
> > > target. But this does a map reload, which implies a suspend, too.
> > > Perhaps simply an fsync() on the dm device, or just wait a while
> > > until
> > > all outstanding IO has errored out? But these alternatives don't
> > > actually look better to me than Hannes' suspend/resume, they will
> > > take
> > > at least as much time. Do you have a better idea?
> > 
> > To go back to the old behavior, we would need to move the check if
> > the
> > device is in use until after we suspended the device. Or we can keep
> > the
> > current behavior, and simply remove the flushing and suspending.
> > 
> 
> AFAICS the "in use" check looks at the opencount, and according to your
> output above, it can be 0 while IO is outstanding. I haven't checked
> this myself yet. But I can confirm that there was an actual bug (map
> removal failing) that was allegedly fixed by the suspend/resume. It was
> long ago, I can't tell with confidence if it could still happen.
> 
> Don't get me wrong, I'm not generally opposed to the removal of the
> flush/suspend. I just wanted to clarify why it was there. It has been
> used in multipath only, anyway. If we delegate to multipathd, it makes
> sense to handle the situation in multipathd's manner. If we want to
> make sure the user experience for "multipath -f" users is unchanged, we
> could handle failure accordingly (suspend, resume, and retry flushing
> the map).
> 
> Best,
> Martin
> 
> -- 
> Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
> SUSE  Software Solutions Germany GmbH
> HRB 36809, AG Nürnberg GF: Felix
> Imendörffer
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 4/7] multipathd: add "del maps" multipathd command
  2020-06-18 20:37   ` Martin Wilck
@ 2020-06-18 23:12     ` Benjamin Marzinski
  2020-06-19 13:35       ` Martin Wilck
  0 siblings, 1 reply; 39+ messages in thread
From: Benjamin Marzinski @ 2020-06-18 23:12 UTC (permalink / raw)
  To: Martin Wilck; +Cc: dm-devel

On Thu, Jun 18, 2020 at 08:37:20PM +0000, Martin Wilck wrote:
> On Wed, 2020-06-17 at 19:24 -0500, Benjamin Marzinski wrote:
> > This will flush all multipath devices.
> > 
> > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > ---
> >  libmultipath/devmapper.c  |  7 +++++--
> >  libmultipath/devmapper.h  |  2 +-
> >  multipath/main.c          |  2 +-
> >  multipathd/cli.c          |  1 +
> >  multipathd/cli_handlers.c | 19 +++++++++++++++++++
> >  multipathd/cli_handlers.h |  1 +
> >  multipathd/main.c         |  3 ++-
> >  multipathd/main.h         |  1 +
> >  8 files changed, 31 insertions(+), 5 deletions(-)
> > 
> > diff --git a/libmultipath/devmapper.c b/libmultipath/devmapper.c
> > index 682c0038..a5e0d298 100644
> > --- a/libmultipath/devmapper.c
> > +++ b/libmultipath/devmapper.c
> > @@ -991,7 +991,7 @@ dm_flush_map_nopaths(const char * mapname, int
> > deferred_remove)
> >  
> >  #endif
> >  
> > -int dm_flush_maps (int retries)
> > +int dm_flush_maps (int need_suspend, int retries)
> >  {
> >  	int r = 1;
> >  	struct dm_task *dmt;
> > @@ -1014,7 +1014,10 @@ int dm_flush_maps (int retries)
> >  
> >  	r = 0;
> >  	do {
> > -		r |= dm_suspend_and_flush_map(names->name, retries);
> > +		if (need_suspend)
> > +			r |= dm_suspend_and_flush_map(names->name,
> > retries);
> > +		else
> > +			r |= dm_flush_map(names->name);
> 
> This is fine, but considering the previous discussion, I'd prefer to
> get rid of need_suspend and dm_suspend_and_flush_map() entirely. It
> would simplify the _dm_flush_map code path significantly.
> 
> As we determined that we use the suspend/resume only in multipath
> anyway, we could add it there in the "non-delegated" code path.

I'm confused. dm_flush_maps() is also called in the non-delegated code
path, where it uses the dm_suspend_and_flush_map(). Are you proposing
an alternative version of dm_flush_maps() for multipath?

-Ben

> Thanks,
> Martin
> 
> -- 
> Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
> SUSE  Software Solutions Germany GmbH
> HRB 36809, AG Nürnberg GF: Felix
> Imendörffer
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 7/7] multipath: add option to skip multipathd delegation
  2020-06-18 20:44   ` Martin Wilck
@ 2020-06-18 23:15     ` Benjamin Marzinski
  0 siblings, 0 replies; 39+ messages in thread
From: Benjamin Marzinski @ 2020-06-18 23:15 UTC (permalink / raw)
  To: Martin Wilck; +Cc: dm-devel

On Thu, Jun 18, 2020 at 08:44:10PM +0000, Martin Wilck wrote:
> On Wed, 2020-06-17 at 19:24 -0500, Benjamin Marzinski wrote:
> > Add the -D option to allow users to skip delegating commands to
> > multipathd.
> > 
> > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > ---
> >  libmultipath/config.h |  1 +
> >  multipath/main.c      | 15 +++++++++++----
> >  multipath/multipath.8 | 16 +++++++++++-----
> >  3 files changed, 23 insertions(+), 9 deletions(-)
> > 
> 
> I wonder if we really need this. We fall back to NOT_DELEGATED anyway.
> If users really, really want this, they can run multipath while
> multipathd is stopped.
> 
> I'm not saying it's totally useless, but the presence of this option
> suggests to users that they may want to use it, which I doubt.
> Perhaps we want to have it, for debugging or expert usage purpose, as a
> hidden/undocumented option?

I'm fine with having it as an undocumented option.

-Ben

> 
> Martin
> 
> -- 
> Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
> SUSE  Software Solutions Germany GmbH
> HRB 36809, AG Nürnberg GF: Felix
> Imendörffer
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/7] multipathd: fix check_path errors with removed map
  2020-06-18 19:34   ` Martin Wilck
@ 2020-06-18 23:17     ` Benjamin Marzinski
  2020-06-19  6:32       ` Hannes Reinecke
  2020-06-19 13:42       ` Martin Wilck
  0 siblings, 2 replies; 39+ messages in thread
From: Benjamin Marzinski @ 2020-06-18 23:17 UTC (permalink / raw)
  To: Martin Wilck; +Cc: dm-devel

On Thu, Jun 18, 2020 at 07:34:38PM +0000, Martin Wilck wrote:
> On Wed, 2020-06-17 at 19:24 -0500, Benjamin Marzinski wrote:
> > If a multipath device is removed during, or immediately before the
> > call
> > to check_path(), multipathd can behave incorrectly. A missing
> > multpath
> > device will cause update_multipath_strings() to fail, setting
> > pp->dmstate to PSTATE_UNDEF.  If the path is up, this state will
> > cause
> > reinstate_path() to be called, which will also fail.  This will
> > trigger
> > a reload, restoring the recently removed device.
> > 
> > If update_multipath_strings() fails because there is no multipath
> > device, check_path should just quit, since the remove dmevent and
> > uevent
> > are likely already queued up. Also, I don't see any reason to reload
> > the
> > multipath device if reinstate fails. This code was added by
> > fac68d7a99ef17d496079538a5c6836acd7911ab, which clamined that
> > reinstate
> > could fail if the path was disabled.  Looking through the current
> > kernel
> > code, I can't see any reason why a reinstate would fail, where a
> > reload
> > would help. If the path was missing from the multipath device,
> > update_multipath_strings() would already catch that, and quit
> > check_path() early, which make more sense to me than reloading does.
> 
> fac68d7 is related to the famous "dm-multipath: Accept failed paths for
> multipath maps" patch (e.g. 
> https://patchwork.kernel.org/patch/3368381/#7193001), which never made
> it upstream. SUSE kernels have shipped this patch for a long time, but
> we don't apply it any more in recent kernels.
> 
> With this patch, The reinstate_path() failure would occur if multipathd
> had created a table with a "disabled" device (one which would be
> present in a dm map even though the actual block device didn't exist),
> and then tried to reinstate such a path. It sounds unlikely, but it
> might be possible if devices are coming and going in quick succession.
> In that situation, and with the "accept failed path" patch applied, a
> reload makes some sense, because reload (unlike reinstate) would not
> fail (at least not for this reason) and would actually add that just-
> reinstated path. OTOH, it's not likely that the reload would improve
> matters, either. After all, multipathd is just trying to reinstate a
> non-existing path. So, I'm fine with skipping the reload.
> 
> > 
> > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > ---
> >  multipathd/main.c | 36 ++++++++++++++----------------------
> >  1 file changed, 14 insertions(+), 22 deletions(-)
> > 
> > diff --git a/multipathd/main.c b/multipathd/main.c
> > index 6b7db2c0..8fb73922 100644
> > --- a/multipathd/main.c
> > +++ b/multipathd/main.c
> > @@ -1611,22 +1611,18 @@ fail_path (struct path * pp, int del_active)
> >  /*
> >   * caller must have locked the path list before calling that
> > function
> >   */
> > -static int
> > +static void
> >  reinstate_path (struct path * pp)
> >  {
> > -	int ret = 0;
> > -
> >  	if (!pp->mpp)
> > -		return 0;
> > +		return;
> >  
> > -	if (dm_reinstate_path(pp->mpp->alias, pp->dev_t)) {
> > +	if (dm_reinstate_path(pp->mpp->alias, pp->dev_t))
> >  		condlog(0, "%s: reinstate failed", pp->dev_t);
> > -		ret = 1;
> > -	} else {
> > +	else {
> >  		condlog(2, "%s: reinstated", pp->dev_t);
> >  		update_queue_mode_add_path(pp->mpp);
> >  	}
> > -	return ret;
> >  }
> >  static void
> > @@ -2088,8 +2084,13 @@ check_path (struct vectors * vecs, struct path
> > * pp, unsigned int ticks)
> >  	 * Synchronize with kernel state
> >  	 */
> >  	if (update_multipath_strings(pp->mpp, vecs->pathvec, 1)) {
> > +		struct dm_info info;
> >  		condlog(1, "%s: Could not synchronize with kernel
> > state",
> >  			pp->dev);
> > +		if (pp->mpp && pp->mpp->alias &&
> > +		    do_dm_get_info(pp->mpp->alias, &info) == 0)
> > +			/* multipath device missing. Likely removed */
> > +			return 0;
> >  		pp->dmstate = PSTATE_UNDEF;
> 
> It would be more elegant if we could distinguish different failure
> modes from update_multipath_strings() directly, without having to call
> do_dm_get_info() again.

Seems reasonable. I'll take a look at that.

-Ben
 
> >  	}
> >  	/* if update_multipath_strings orphaned the path, quit early */
> > @@ -2179,12 +2180,8 @@ check_path (struct vectors * vecs, struct path
> > * pp, unsigned int ticks)
> >  		/*
> >  		 * reinstate this path
> >  		 */
> > -		if (!disable_reinstate && reinstate_path(pp)) {
> > -			condlog(3, "%s: reload map", pp->dev);
> > -			ev_add_path(pp, vecs, 1);
> > -			pp->tick = 1;
> > -			return 0;
> > -		}
> > +		if (!disable_reinstate)
> > +			reinstate_path(pp);
> >  		new_path_up = 1;
> >  
> >  		if (oldchkrstate != PATH_UP && oldchkrstate !=
> > PATH_GHOST)
> > @@ -2200,15 +2197,10 @@ check_path (struct vectors * vecs, struct
> > path * pp, unsigned int ticks)
> >  	else if (newstate == PATH_UP || newstate == PATH_GHOST) {
> >  		if ((pp->dmstate == PSTATE_FAILED ||
> >  		    pp->dmstate == PSTATE_UNDEF) &&
> > -		    !disable_reinstate) {
> > +		    !disable_reinstate)
> >  			/* Clear IO errors */
> > -			if (reinstate_path(pp)) {
> > -				condlog(3, "%s: reload map", pp->dev);
> > -				ev_add_path(pp, vecs, 1);
> > -				pp->tick = 1;
> > -				return 0;
> > -			}
> > -		} else {
> > +			reinstate_path(pp);
> > +		else {
> >  			LOG_MSG(4, verbosity, pp);
> >  			if (pp->checkint != max_checkint) {
> >  				/*
> 
> -- 
> Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
> SUSE  Software Solutions Germany GmbH
> HRB 36809, AG Nürnberg GF: Felix
> Imendörffer
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/7] libmultipath: change do_get_info returns
  2020-06-18 15:27   ` Martin Wilck
@ 2020-06-18 23:17     ` Benjamin Marzinski
  0 siblings, 0 replies; 39+ messages in thread
From: Benjamin Marzinski @ 2020-06-18 23:17 UTC (permalink / raw)
  To: Martin Wilck; +Cc: dm-devel

On Thu, Jun 18, 2020 at 03:27:22PM +0000, Martin Wilck wrote:
> On Wed, 2020-06-17 at 19:24 -0500, Benjamin Marzinski wrote:
> > Make do_get_info() differentiate between dm failures and missing
> > devices, and update callers to retain their current behavior. Also,
> > rename it and make it external. These changes will be used by future
> > commits.
> > 
> > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > ---
> >  libmultipath/devmapper.c | 29 ++++++++++++++++-------------
> >  libmultipath/devmapper.h |  1 +
> >  2 files changed, 17 insertions(+), 13 deletions(-)
> > 
> > diff --git a/libmultipath/devmapper.c b/libmultipath/devmapper.c
> > index 27d52398..b44f7545 100644
> > --- a/libmultipath/devmapper.c
> > +++ b/libmultipath/devmapper.c
> > @@ -496,8 +496,14 @@ int dm_addmap_reload(struct multipath *mpp, char
> > *params, int flush)
> >  	return 0;
> >  }
> >  
> > -static int
> > -do_get_info(const char *name, struct dm_info *info)
> > +/*
> > + * Returns:
> > + * -1: Error
> > + *  0: device does not exist
> > + *  1: device exists
> > + */
> 
> Can we use symbolic values here please? In particular as you have
> changed the "success" return value from 0 to 1...
> 
> One day we should come up with a proper return value scheme
> for libmultipath, defining specific enums for every function
> doesn't scale. But do it here for now nonetheless, please.

Sure

-Ben

> 
> Apart from that, ok.
> 
> Regards
> Martin
> 
> -- 
> Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
> SUSE  Software Solutions Germany GmbH
> HRB 36809, AG Nürnberg GF: Felix
> Imendörffer
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/7] multipathd: fix check_path errors with removed map
  2020-06-18 23:17     ` Benjamin Marzinski
@ 2020-06-19  6:32       ` Hannes Reinecke
  2020-06-19 16:30         ` Benjamin Marzinski
  2020-06-19 13:42       ` Martin Wilck
  1 sibling, 1 reply; 39+ messages in thread
From: Hannes Reinecke @ 2020-06-19  6:32 UTC (permalink / raw)
  To: Benjamin Marzinski, Martin Wilck; +Cc: dm-devel

On 6/19/20 1:17 AM, Benjamin Marzinski wrote:
> On Thu, Jun 18, 2020 at 07:34:38PM +0000, Martin Wilck wrote:
>> On Wed, 2020-06-17 at 19:24 -0500, Benjamin Marzinski wrote:
>>> If a multipath device is removed during, or immediately before the
>>> call
>>> to check_path(), multipathd can behave incorrectly. A missing
>>> multpath
>>> device will cause update_multipath_strings() to fail, setting
>>> pp->dmstate to PSTATE_UNDEF.  If the path is up, this state will
>>> cause
>>> reinstate_path() to be called, which will also fail.  This will
>>> trigger
>>> a reload, restoring the recently removed device.
>>>
>>> If update_multipath_strings() fails because there is no multipath
>>> device, check_path should just quit, since the remove dmevent and
>>> uevent
>>> are likely already queued up. Also, I don't see any reason to reload
>>> the
>>> multipath device if reinstate fails. This code was added by
>>> fac68d7a99ef17d496079538a5c6836acd7911ab, which clamined that
>>> reinstate
>>> could fail if the path was disabled.  Looking through the current
>>> kernel
>>> code, I can't see any reason why a reinstate would fail, where a
>>> reload
>>> would help. If the path was missing from the multipath device,
>>> update_multipath_strings() would already catch that, and quit
>>> check_path() early, which make more sense to me than reloading does.
>>
>> fac68d7 is related to the famous "dm-multipath: Accept failed paths for
>> multipath maps" patch (e.g.
>> https://patchwork.kernel.org/patch/3368381/#7193001), which never made
>> it upstream. SUSE kernels have shipped this patch for a long time, but
>> we don't apply it any more in recent kernels.
>>
>> With this patch, The reinstate_path() failure would occur if multipathd
>> had created a table with a "disabled" device (one which would be
>> present in a dm map even though the actual block device didn't exist),
>> and then tried to reinstate such a path. It sounds unlikely, but it
>> might be possible if devices are coming and going in quick succession.
>> In that situation, and with the "accept failed path" patch applied, a
>> reload makes some sense, because reload (unlike reinstate) would not
>> fail (at least not for this reason) and would actually add that just-
>> reinstated path. OTOH, it's not likely that the reload would improve
>> matters, either. After all, multipathd is just trying to reinstate a
>> non-existing path. So, I'm fine with skipping the reload.
>>
It's actually _not_ unlikely, but a direct result of multipathd 
listening to uevents.

If you have a map with four paths, and all four of them are going down, 
you end up getting four events.
And multipath will be processing each event _individually_, causing it 
to generate a reload sequence like:

[a b c d]
[b c d]
[c d]
[d]
[]

Of which only the last is valid, as all the intermediate ones contain 
invalid paths.

And _that_ is the scenario I was referring to when creating the patch.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke            Teamlead Storage & Networking
hare@suse.de                               +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 4/7] multipathd: add "del maps" multipathd command
  2020-06-18 23:12     ` Benjamin Marzinski
@ 2020-06-19 13:35       ` Martin Wilck
  0 siblings, 0 replies; 39+ messages in thread
From: Martin Wilck @ 2020-06-19 13:35 UTC (permalink / raw)
  To: bmarzins; +Cc: dm-devel

On Thu, 2020-06-18 at 18:12 -0500, Benjamin Marzinski wrote:
> On Thu, Jun 18, 2020 at 08:37:20PM +0000, Martin Wilck wrote:
> > 
> > > -int dm_flush_maps (int retries)
> > > +int dm_flush_maps (int need_suspend, int retries)
> > >  {
> > >  	int r = 1;
> > >  	struct dm_task *dmt;
> > > @@ -1014,7 +1014,10 @@ int dm_flush_maps (int retries)
> > >  
> > >  	r = 0;
> > >  	do {
> > > -		r |= dm_suspend_and_flush_map(names->name, retries);
> > > +		if (need_suspend)
> > > +			r |= dm_suspend_and_flush_map(names->name,
> > > retries);
> > > +		else
> > > +			r |= dm_flush_map(names->name);
> > 
> > This is fine, but considering the previous discussion, I'd prefer
> > to
> > get rid of need_suspend and dm_suspend_and_flush_map() entirely. It
> > would simplify the _dm_flush_map code path significantly.
> > 
> > As we determined that we use the suspend/resume only in multipath
> > anyway, we could add it there in the "non-delegated" code path.
> 
> I'm confused. dm_flush_maps() is also called in the non-delegated
> code
> path, where it uses the dm_suspend_and_flush_map(). Are you proposing
> an alternative version of dm_flush_maps() for multipath?

No, just a different error handling in multipath. Instead of calling
just calling dm_flush_maps() again from multipath with need_suspend=1,
we could suspend / resume directly in the error path, and then call
dm_flush_map again. Which would allow us to get rid of need_suspend, as
no callers would set it any more.

IOW, move the handling of this multipath-specific error situation out
of libmultipath, into the multipath-specific code.

Just an idea, I don't insist on it.

Martin

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE  Software Solutions Germany GmbH
HRB 36809, AG Nürnberg GF: Felix
Imendörffer

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/7] multipathd: fix check_path errors with removed map
  2020-06-18 23:17     ` Benjamin Marzinski
  2020-06-19  6:32       ` Hannes Reinecke
@ 2020-06-19 13:42       ` Martin Wilck
  2020-06-19 16:52         ` Benjamin Marzinski
  1 sibling, 1 reply; 39+ messages in thread
From: Martin Wilck @ 2020-06-19 13:42 UTC (permalink / raw)
  To: bmarzins; +Cc: dm-devel

On Thu, 2020-06-18 at 18:17 -0500, Benjamin Marzinski wrote:
> On Thu, Jun 18, 2020 at 07:34:38PM +0000, Martin Wilck wrote:
> > On Wed, 2020-06-17 at 19:24 -0500, Benjamin Marzinski wrote:
> > > 
> > >  static void
> > > @@ -2088,8 +2084,13 @@ check_path (struct vectors * vecs, struct
> > > path
> > > * pp, unsigned int ticks)
> > >  	 * Synchronize with kernel state
> > >  	 */
> > >  	if (update_multipath_strings(pp->mpp, vecs->pathvec, 1)) {
> > > +		struct dm_info info;
> > >  		condlog(1, "%s: Could not synchronize with kernel
> > > state",
> > >  			pp->dev);
> > > +		if (pp->mpp && pp->mpp->alias &&
> > > +		    do_dm_get_info(pp->mpp->alias, &info) == 0)
> > > +			/* multipath device missing. Likely removed */
> > > +			return 0;
> > >  		pp->dmstate = PSTATE_UNDEF;
> > 
> > It would be more elegant if we could distinguish different failure
> > modes from update_multipath_strings() directly, without having to
> > call
> > do_dm_get_info() again.
> 
> Seems reasonable. I'll take a look at that.

Yet another idea: I just discussed this with Hannes, and he pointed out
that right below this code, we have

	/* if update_multipath_strings orphaned the path, quit early */
	if (!pp->mpp)
		return 0;

... which could have the same effect, if pp->mpp was reloaded. Probably
that doesn't happen because the pp->mpp dereference is cached in a
register, but we could simply add a READ_ONCE there.

Choose what you prefer.

Regards,
Martin

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE  Software Solutions Germany GmbH
HRB 36809, AG Nürnberg GF: Felix
Imendörffer

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/7] multipathd: fix check_path errors with removed map
  2020-06-19  6:32       ` Hannes Reinecke
@ 2020-06-19 16:30         ` Benjamin Marzinski
  2020-06-19 20:03           ` Martin Wilck
  0 siblings, 1 reply; 39+ messages in thread
From: Benjamin Marzinski @ 2020-06-19 16:30 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: dm-devel, Martin Wilck

On Fri, Jun 19, 2020 at 08:32:34AM +0200, Hannes Reinecke wrote:
> >>
> >>fac68d7 is related to the famous "dm-multipath: Accept failed paths for
> >>multipath maps" patch (e.g.
> >>https://patchwork.kernel.org/patch/3368381/#7193001), which never made
> >>it upstream. SUSE kernels have shipped this patch for a long time, but
> >>we don't apply it any more in recent kernels.
> >>
> >>With this patch, The reinstate_path() failure would occur if multipathd
> >>had created a table with a "disabled" device (one which would be
> >>present in a dm map even though the actual block device didn't exist),
> >>and then tried to reinstate such a path. It sounds unlikely, but it
> >>might be possible if devices are coming and going in quick succession.
> >>In that situation, and with the "accept failed path" patch applied, a
> >>reload makes some sense, because reload (unlike reinstate) would not
> >>fail (at least not for this reason) and would actually add that just-
> >>reinstated path. OTOH, it's not likely that the reload would improve
> >>matters, either. After all, multipathd is just trying to reinstate a
> >>non-existing path. So, I'm fine with skipping the reload.
> >>
> It's actually _not_ unlikely, but a direct result of multipathd listening to
> uevents.
> 
> If you have a map with four paths, and all four of them are going down, you
> end up getting four events.
> And multipath will be processing each event _individually_, causing it to
> generate a reload sequence like:
> 
> [a b c d]
> [b c d]
> [c d]
> [d]
> []
> 
> Of which only the last is valid, as all the intermediate ones contain
> invalid paths.
> 
> And _that_ is the scenario I was referring to when creating the patch.

But even still, I'm not in favor of calling ev_add_path() with the code
as it currently is. In the case you point out, it will presumably fail
when adopt_paths() calls pathinfo(). That will orphan the path, which is
good. But if the map got removed (intentionally) instead of the path,
ev_add_path() will recreate the map, which is bad. Also, it seems more
likely for a path to still exist and be usable while the multipath
device is gone (the bad case), than for the path to pass the checker
function and then disappear before mutipathd tries to reinstate it
immediately afterwards (the good case).  Further, the bad result, where
a deleted multipath device reappears, is actually a problem for users.
Having multipathd take a bit of time and pointless effort to catch up
after multiple changes happen at once doesn't effect users noticeably.

Since the checkerloop thread spends the vast majority of it's not not
checking any specific path, if a path goes away, it is most likely to be
caught by the path_offline() function. I we want to do something to
proactively deal with a path that has been removed, we should do it
there.

-Ben
 
> Cheers,
> 
> Hannes
> -- 
> Dr. Hannes Reinecke            Teamlead Storage & Networking
> hare@suse.de                               +49 911 74053 688
> SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
> HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/7] multipathd: fix check_path errors with removed map
  2020-06-19 13:42       ` Martin Wilck
@ 2020-06-19 16:52         ` Benjamin Marzinski
  2020-06-19 20:09           ` Martin Wilck
  0 siblings, 1 reply; 39+ messages in thread
From: Benjamin Marzinski @ 2020-06-19 16:52 UTC (permalink / raw)
  To: Martin Wilck; +Cc: dm-devel

On Fri, Jun 19, 2020 at 01:42:47PM +0000, Martin Wilck wrote:
> On Thu, 2020-06-18 at 18:17 -0500, Benjamin Marzinski wrote:
> > On Thu, Jun 18, 2020 at 07:34:38PM +0000, Martin Wilck wrote:
> > > On Wed, 2020-06-17 at 19:24 -0500, Benjamin Marzinski wrote:
> > > > 
> > > >  static void
> > > > @@ -2088,8 +2084,13 @@ check_path (struct vectors * vecs, struct
> > > > path
> > > > * pp, unsigned int ticks)
> > > >  	 * Synchronize with kernel state
> > > >  	 */
> > > >  	if (update_multipath_strings(pp->mpp, vecs->pathvec, 1)) {
> > > > +		struct dm_info info;
> > > >  		condlog(1, "%s: Could not synchronize with kernel
> > > > state",
> > > >  			pp->dev);
> > > > +		if (pp->mpp && pp->mpp->alias &&
> > > > +		    do_dm_get_info(pp->mpp->alias, &info) == 0)
> > > > +			/* multipath device missing. Likely removed */
> > > > +			return 0;
> > > >  		pp->dmstate = PSTATE_UNDEF;
> > > 
> > > It would be more elegant if we could distinguish different failure
> > > modes from update_multipath_strings() directly, without having to
> > > call
> > > do_dm_get_info() again.
> > 
> > Seems reasonable. I'll take a look at that.
> 
> Yet another idea: I just discussed this with Hannes, and he pointed out
> that right below this code, we have
> 
> 	/* if update_multipath_strings orphaned the path, quit early */
> 	if (!pp->mpp)
> 		return 0;
> 
> ... which could have the same effect, if pp->mpp was reloaded. Probably
> that doesn't happen because the pp->mpp dereference is cached in a
> register, but we could simply add a READ_ONCE there.

When update_multipath_strings() calls update_multipath_table() it will
fail because the table no longer exists.  If we differentiate between
a dm error and not finding the map, we can exit early without having to
call do_dm_get_info() again. But right now, if the map is gone, but we
haven't got the uevent removing it, then nothing will clear pp->mpp. If
we did get the uevent, then it must have grabbed the vecs lock. That
better have caused a memory barrier, which will guarantee that
check_path() sees the updated value.

-Ben
 
> Choose what you prefer.
> 
> Regards,
> Martin
> 
> -- 
> Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
> SUSE  Software Solutions Germany GmbH
> HRB 36809, AG Nürnberg GF: Felix
> Imendörffer
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/7] multipathd: fix check_path errors with removed map
  2020-06-19 16:30         ` Benjamin Marzinski
@ 2020-06-19 20:03           ` Martin Wilck
  0 siblings, 0 replies; 39+ messages in thread
From: Martin Wilck @ 2020-06-19 20:03 UTC (permalink / raw)
  To: Benjamin Marzinski, Hannes Reinecke; +Cc: dm-devel

On Fri, 2020-06-19 at 11:30 -0500, Benjamin Marzinski wrote:
> On Fri, Jun 19, 2020 at 08:32:34AM +0200, Hannes Reinecke wrote:
> > > > fac68d7 is related to the famous "dm-multipath: Accept failed
> > > > paths for
> > > > multipath maps" patch (e.g.
> > > > https://patchwork.kernel.org/patch/3368381/#7193001), which
> > > > never made
> > > > it upstream. SUSE kernels have shipped this patch for a long
> > > > time, but
> > > > we don't apply it any more in recent kernels.
> > > > 
> > > > With this patch, The reinstate_path() failure would occur if
> > > > multipathd
> > > > had created a table with a "disabled" device (one which would
> > > > be
> > > > present in a dm map even though the actual block device didn't
> > > > exist),
> > > > and then tried to reinstate such a path. It sounds unlikely,
> > > > but it
> > > > might be possible if devices are coming and going in quick
> > > > succession.
> > > > In that situation, and with the "accept failed path" patch
> > > > applied, a
> > > > reload makes some sense, because reload (unlike reinstate)
> > > > would not
> > > > fail (at least not for this reason) and would actually add that
> > > > just-
> > > > reinstated path. OTOH, it's not likely that the reload would
> > > > improve
> > > > matters, either. After all, multipathd is just trying to
> > > > reinstate a
> > > > non-existing path. So, I'm fine with skipping the reload.
> > > > 
> > It's actually _not_ unlikely, but a direct result of multipathd
> > listening to
> > uevents.
> > 
> > If you have a map with four paths, and all four of them are going
> > down, you
> > end up getting four events.
> > And multipath will be processing each event _individually_, causing
> > it to
> > generate a reload sequence like:
> > 
> > [a b c d]
> > [b c d]
> > [c d]
> > [d]
> > []
> > 
> > Of which only the last is valid, as all the intermediate ones
> > contain
> > invalid paths.
> > 
> > And _that_ is the scenario I was referring to when creating the
> > patch.
> 
> But even still, I'm not in favor of calling ev_add_path() with the
> code
> as it currently is. In the case you point out, it will presumably
> fail
> when adopt_paths() calls pathinfo(). That will orphan the path, which
> is
> good. But if the map got removed (intentionally) instead of the path,
> ev_add_path() will recreate the map, which is bad. Also, it seems
> more
> likely for a path to still exist and be usable while the multipath
> device is gone (the bad case), than for the path to pass the checker
> function and then disappear before mutipathd tries to reinstate it
> immediately afterwards (the good case).  Further, the bad result,
> where
> a deleted multipath device reappears, is actually a problem for
> users.
> Having multipathd take a bit of time and pointless effort to catch up
> after multiple changes happen at once doesn't effect users
> noticeably.
> 
> Since the checkerloop thread spends the vast majority of it's not not
> checking any specific path, if a path goes away, it is most likely to
> be
> caught by the path_offline() function. I we want to do something to
> proactively deal with a path that has been removed, we should do it

Hannes and I discussed about this, and he agreed that calling
ev_add_path() in the situation at hand wasn't helpful. Hence his
suggestion with pp->mpp that we discussed in the other sub-thread.

Regards,
Martin

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE  Software Solutions Germany GmbH
HRB 36809, AG Nürnberg GF: Felix
Imendörffer

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/7] multipathd: fix check_path errors with removed map
  2020-06-19 16:52         ` Benjamin Marzinski
@ 2020-06-19 20:09           ` Martin Wilck
  0 siblings, 0 replies; 39+ messages in thread
From: Martin Wilck @ 2020-06-19 20:09 UTC (permalink / raw)
  To: bmarzins; +Cc: dm-devel

On Fri, 2020-06-19 at 11:52 -0500, Benjamin Marzinski wrote:
> On Fri, Jun 19, 2020 at 01:42:47PM +0000, Martin Wilck wrote:
> > On Thu, 2020-06-18 at 18:17 -0500, Benjamin Marzinski wrote:
> > > On Thu, Jun 18, 2020 at 07:34:38PM +0000, Martin Wilck wrote:
> > > > 
> > > > It would be more elegant if we could distinguish different
> > > > failure
> > > > modes from update_multipath_strings() directly, without having
> > > > to
> > > > call
> > > > do_dm_get_info() again.
> > > 
> > > Seems reasonable. I'll take a look at that.
> > 
> > Yet another idea: I just discussed this with Hannes, and he pointed
> > out
> > that right below this code, we have
> > 
> > 	/* if update_multipath_strings orphaned the path, quit early */
> > 	if (!pp->mpp)
> > 		return 0;
> > 
> > ... which could have the same effect, if pp->mpp was reloaded.
> > Probably
> > that doesn't happen because the pp->mpp dereference is cached in a
> > register, but we could simply add a READ_ONCE there.
> 
> When update_multipath_strings() calls update_multipath_table() it
> will
> fail because the table no longer exists.  If we differentiate between
> a dm error and not finding the map, we can exit early without having
> to
> call do_dm_get_info() again. But right now, if the map is gone, but
> we
> haven't got the uevent removing it, then nothing will clear pp->mpp. 

You're right. Let's "differentiate", then :-)

> If
> we did get the uevent, then it must have grabbed the vecs lock. That
> better have caused a memory barrier, which will guarantee that
> check_path() sees the updated value.

It could hardly have grabbed the lock while check_path() was running.
Anyway, this wasn't the right suggestion then, and "differentiating" is
better anyway. Sorry for the confusion.

Martin

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE  Software Solutions Germany GmbH
HRB 36809, AG Nürnberg GF: Felix
Imendörffer

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/7] Fix muitpath/multipathd flush issue
  2020-06-18 23:06       ` Benjamin Marzinski
@ 2020-07-01 20:54         ` Martin Wilck
  2020-07-02  3:14           ` Benjamin Marzinski
  0 siblings, 1 reply; 39+ messages in thread
From: Martin Wilck @ 2020-07-01 20:54 UTC (permalink / raw)
  To: Benjamin Marzinski; +Cc: dm-devel

On Thu, 2020-06-18 at 18:06 -0500, Benjamin Marzinski wrote:
> 
> I uploaded the test program, aio_test:
> 
> https://github.com/bmarzins/test_programs.git
> 
> You just need to run in on a queueing multipath device with no active
> paths and an open count of 0. It will hang with the device open.
> Restore
> a path, and it will exit, and the open count will go to 0.

Tried it now, it behaves as you say. I admit I can't imagine how the
suspend/resume would improve anything here. Indeed, as you say, it opens 
up a race window. Another process might open the device while
it's suspended. Worse perhaps, once the device is resumed, an uevent will be 
generated, and stacked devices might (in principle at least) be recreated
before we get down to flush the map.

MAYBE the suspend/resume was necessary in the past because some earlier 
kernels wouldn't immediately fail all outstanding commands when 
queue_if_no_path was deactivated? (just speculating, I don't know if this
is the case).

Regards,
Martin

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE  Software Solutions Germany GmbH
HRB 36809, AG Nürnberg GF: Felix
Imendörffer

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/7] Fix muitpath/multipathd flush issue
  2020-07-01 20:54         ` Martin Wilck
@ 2020-07-02  3:14           ` Benjamin Marzinski
  2020-07-02 12:24             ` Martin Wilck
  0 siblings, 1 reply; 39+ messages in thread
From: Benjamin Marzinski @ 2020-07-02  3:14 UTC (permalink / raw)
  To: Martin Wilck; +Cc: dm-devel

On Wed, Jul 01, 2020 at 10:54:34PM +0200, Martin Wilck wrote:
> On Thu, 2020-06-18 at 18:06 -0500, Benjamin Marzinski wrote:
> > 
> > I uploaded the test program, aio_test:
> > 
> > https://github.com/bmarzins/test_programs.git
> > 
> > You just need to run in on a queueing multipath device with no active
> > paths and an open count of 0. It will hang with the device open.
> > Restore
> > a path, and it will exit, and the open count will go to 0.
> 
> Tried it now, it behaves as you say. I admit I can't imagine how the
> suspend/resume would improve anything here. Indeed, as you say, it opens 
> up a race window. Another process might open the device while
> it's suspended. Worse perhaps, once the device is resumed, an uevent will be 
> generated, and stacked devices might (in principle at least) be recreated
> before we get down to flush the map.
> 
> MAYBE the suspend/resume was necessary in the past because some earlier 
> kernels wouldn't immediately fail all outstanding commands when 
> queue_if_no_path was deactivated? (just speculating, I don't know if this
> is the case).

If you disable queue_if_no_path and then do a suspend with flushing, you
are guaranteed that the supend won't return until all the IO has
completed or failed.  This would allow anything that was waiting on
queued IO to have the IO failback, which could allow it to close the
device in time for multipath to be able to remove it (obviously this is
racey).  However, this assumes that you do your open checks after the
suspend, which multipath no longer does. I realize that multipath can't
suspend until after it tries to remove the partition devices, otherwise
those can get stuck. But there probably is some order that gets this all
right-ish.

So, for a while now, the suspending has been doing nothing for us.  We
could either try to reorder things so that we actually try to flush the
queued IOs back first (with or without suspending), or we could just
remove suspending and say that things are working fine the way they
currently are.

-Ben
 
> Regards,
> Martin
> 
> -- 
> Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
> SUSE  Software Solutions Germany GmbH
> HRB 36809, AG Nürnberg GF: Felix
> Imendörffer
> 
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/7] Fix muitpath/multipathd flush issue
  2020-07-02  3:14           ` Benjamin Marzinski
@ 2020-07-02 12:24             ` Martin Wilck
  2020-07-02 15:18               ` Benjamin Marzinski
  0 siblings, 1 reply; 39+ messages in thread
From: Martin Wilck @ 2020-07-02 12:24 UTC (permalink / raw)
  To: bmarzins; +Cc: dm-devel

On Wed, 2020-07-01 at 22:14 -0500, Benjamin Marzinski wrote:
> On Wed, Jul 01, 2020 at 10:54:34PM +0200, Martin Wilck wrote:
> > On Thu, 2020-06-18 at 18:06 -0500, Benjamin Marzinski wrote:
> > > I uploaded the test program, aio_test:
> > > 
> > > https://github.com/bmarzins/test_programs.git
> > > 
> > > You just need to run in on a queueing multipath device with no
> > > active
> > > paths and an open count of 0. It will hang with the device open.
> > > Restore
> > > a path, and it will exit, and the open count will go to 0.
> > 
> > Tried it now, it behaves as you say. I admit I can't imagine how
> > the
> > suspend/resume would improve anything here. Indeed, as you say, it
> > opens 
> > up a race window. Another process might open the device while
> > it's suspended. Worse perhaps, once the device is resumed, an
> > uevent will be 
> > generated, and stacked devices might (in principle at least) be
> > recreated
> > before we get down to flush the map.
> > 
> > MAYBE the suspend/resume was necessary in the past because some
> > earlier 
> > kernels wouldn't immediately fail all outstanding commands when 
> > queue_if_no_path was deactivated? (just speculating, I don't know
> > if this
> > is the case).
> 
> If you disable queue_if_no_path and then do a suspend with flushing,
> you
> are guaranteed that the supend won't return until all the IO has
> completed or failed.

I just realized that if we suspend, we don't even need disable
queue_if_no_path, because the kernel does that automatically if a
"suspend with flush" is issued, and has been doing so basically
forever.

>   This would allow anything that was waiting on
> queued IO to have the IO failback, which could allow it to close the
> device in time for multipath to be able to remove it (obviously this
> is
> racey).  However, this assumes that you do your open checks after the
> suspend, which multipath no longer does.
>  I realize that multipath can't
> suspend until after it tries to remove the partition devices,
> otherwise
> those can get stuck. But there probably is some order that gets this
> all
> right-ish.

Our code is currently racy. IMO we should

 0 unset queue_if_no_path
 1 remove partition mappings
 2 open(O_EXCL|O_RDONLY) the device
 3 If this fails, in multipath, try again after a suspend/resume cycle.
   In multipathd, I think we should just fail for now; perhaps later
   we could handle the explicit "remove map" command like multipath.
 4 If it fails again, bail out (in multipath, retry if asked to do so)
 5 run a "deferred remove" ioctl
 6 close the fd, the dm device will now be removed "atomically".

This cuts down the race window to the minimum possible - after (2), no
mounts / kpartx / lvm operations won't be possible any more.

When we remove the partition mappings, we could use the same technique
to avoid races on that layer. Unfortunately, a pending "deferred
remove" operation doesn't cause new open() or mount() calls to fail; if
it did, we could use a simpler approach.

> So, for a while now, the suspending has been doing nothing for
> us.  We
> could either try to reorder things so that we actually try to flush
> the
> queued IOs back first (with or without suspending), or we could just
> remove suspending and say that things are working fine the way they
> currently are.

What else can we do except suspend/resume to avoid racing with pending
close(), umount() or similar operations? Well, I guess if we open the
device anyway as I proposed, we could do an fsync() on it. That might
actually be better because it avoids the uevent being sent on resume.

We definitely can't suspend the map before we remove the partitions. We
could try a suspend/resume on the partition devices themselves (or
fsync()) if the opencount is > 0.

Regards,
Martin

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE  Software Solutions Germany GmbH
HRB 36809, AG Nürnberg GF: Felix
Imendörffer

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/7] Fix muitpath/multipathd flush issue
  2020-07-02 12:24             ` Martin Wilck
@ 2020-07-02 15:18               ` Benjamin Marzinski
  2020-07-02 16:45                 ` Martin Wilck
  0 siblings, 1 reply; 39+ messages in thread
From: Benjamin Marzinski @ 2020-07-02 15:18 UTC (permalink / raw)
  To: Martin Wilck; +Cc: dm-devel

On Thu, Jul 02, 2020 at 12:24:32PM +0000, Martin Wilck wrote:
> On Wed, 2020-07-01 at 22:14 -0500, Benjamin Marzinski wrote:
> > On Wed, Jul 01, 2020 at 10:54:34PM +0200, Martin Wilck wrote:
> > > On Thu, 2020-06-18 at 18:06 -0500, Benjamin Marzinski wrote:
> > > > I uploaded the test program, aio_test:
> > > > 
> > > > https://github.com/bmarzins/test_programs.git
> > > > 
> > > > You just need to run in on a queueing multipath device with no
> > > > active
> > > > paths and an open count of 0. It will hang with the device open.
> > > > Restore
> > > > a path, and it will exit, and the open count will go to 0.
> > > 
> > > Tried it now, it behaves as you say. I admit I can't imagine how
> > > the
> > > suspend/resume would improve anything here. Indeed, as you say, it
> > > opens 
> > > up a race window. Another process might open the device while
> > > it's suspended. Worse perhaps, once the device is resumed, an
> > > uevent will be 
> > > generated, and stacked devices might (in principle at least) be
> > > recreated
> > > before we get down to flush the map.
> > > 
> > > MAYBE the suspend/resume was necessary in the past because some
> > > earlier 
> > > kernels wouldn't immediately fail all outstanding commands when 
> > > queue_if_no_path was deactivated? (just speculating, I don't know
> > > if this
> > > is the case).
> > 
> > If you disable queue_if_no_path and then do a suspend with flushing,
> > you
> > are guaranteed that the supend won't return until all the IO has
> > completed or failed.
> 
> I just realized that if we suspend, we don't even need disable
> queue_if_no_path, because the kernel does that automatically if a
> "suspend with flush" is issued, and has been doing so basically
> forever.
> 
> >   This would allow anything that was waiting on
> > queued IO to have the IO failback, which could allow it to close the
> > device in time for multipath to be able to remove it (obviously this
> > is
> > racey).  However, this assumes that you do your open checks after the
> > suspend, which multipath no longer does.
> >  I realize that multipath can't
> > suspend until after it tries to remove the partition devices,
> > otherwise
> > those can get stuck. But there probably is some order that gets this
> > all
> > right-ish.
> 
> Our code is currently racy. IMO we should
> 
>  0 unset queue_if_no_path
>  1 remove partition mappings
>  2 open(O_EXCL|O_RDONLY) the device
>  3 If this fails, in multipath, try again after a suspend/resume cycle.
>    In multipathd, I think we should just fail for now; perhaps later
>    we could handle the explicit "remove map" command like multipath.
>  4 If it fails again, bail out (in multipath, retry if asked to do so)
>  5 run a "deferred remove" ioctl
>  6 close the fd, the dm device will now be removed "atomically".
> 
> This cuts down the race window to the minimum possible - after (2), no
> mounts / kpartx / lvm operations won't be possible any more.

I wasn't actually worried about someone wanting to use the device. In
that case the remove should fail.  I was worried about things that would
close the device, but couldn't because of queued IO.  The race I was
talking about is that after whatever has the device open gets the IO
error, it might not close the device before multipath checks the open
count.

Also, I'm not sure that resume helps us, since that will trigger
uevents, which will open the device again.
 
> When we remove the partition mappings, we could use the same technique
> to avoid races on that layer. Unfortunately, a pending "deferred
> remove" operation doesn't cause new open() or mount() calls to fail; if
> it did, we could use a simpler approach.
> 
> > So, for a while now, the suspending has been doing nothing for
> > us.  We
> > could either try to reorder things so that we actually try to flush
> > the
> > queued IOs back first (with or without suspending), or we could just
> > remove suspending and say that things are working fine the way they
> > currently are.
> 
> What else can we do except suspend/resume to avoid racing with pending
> close(), umount() or similar operations? Well, I guess if we open the
> device anyway as I proposed, we could do an fsync() on it. That might
> actually be better because it avoids the uevent being sent on resume.

yeah. I think that simply disabling queueing and doing an fsync() is
probably better than suspending. If new IO comes in after that, then
something IS using the device, and we don't want to remove it. In
multipath, maybe:

1. disable queueing
2. remove partition mappings
3. open device
4. flush
5. check if we are the only opener.
	If not, and there are retries left... goto 4? sleep and recheck?
	we don't want to wait if the answer is that they device really
	is in use.
6. close device
7. remove device

Possibly the solution to not wanting to wait when a device is in use is
that we could add an option to multipath to distinguish between the way
flushing works now, where we check early if the device is in use, and
just bail if it is, and a more aggressive attempt at remove that might
take longer if used on devices that are in use.

-Ben

> We definitely can't suspend the map before we remove the partitions. We
> could try a suspend/resume on the partition devices themselves (or
> fsync()) if the opencount is > 0.
> 
> Regards,
> Martin
> 
> -- 
> Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
> SUSE  Software Solutions Germany GmbH
> HRB 36809, AG Nürnberg GF: Felix
> Imendörffer
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/7] Fix muitpath/multipathd flush issue
  2020-07-02 15:18               ` Benjamin Marzinski
@ 2020-07-02 16:45                 ` Martin Wilck
  2020-07-02 19:41                   ` Benjamin Marzinski
  0 siblings, 1 reply; 39+ messages in thread
From: Martin Wilck @ 2020-07-02 16:45 UTC (permalink / raw)
  To: bmarzins; +Cc: dm-devel

On Thu, 2020-07-02 at 10:18 -0500, Benjamin Marzinski wrote:
> On Thu, Jul 02, 2020 at 12:24:32PM +0000, Martin Wilck wrote:
> > On Wed, 2020-07-01 at 22:14 -0500, Benjamin Marzinski wrote:
> > > On Wed, Jul 01, 2020 at 10:54:34PM +0200, Martin Wilck wrote:
> > > > On Thu, 2020-06-18 at 18:06 -0500, Benjamin Marzinski wrote:
> > > > > I uploaded the test program, aio_test:
> > > > > 
> > > > > https://github.com/bmarzins/test_programs.git
> > > > > 
> > > > > You just need to run in on a queueing multipath device with
> > > > > no
> > > > > active
> > > > > paths and an open count of 0. It will hang with the device
> > > > > open.
> > > > > Restore
> > > > > a path, and it will exit, and the open count will go to 0.
> > > > 
> > > > Tried it now, it behaves as you say. I admit I can't imagine
> > > > how
> > > > the
> > > > suspend/resume would improve anything here. Indeed, as you say,
> > > > it
> > > > opens 
> > > > up a race window. Another process might open the device while
> > > > it's suspended. Worse perhaps, once the device is resumed, an
> > > > uevent will be 
> > > > generated, and stacked devices might (in principle at least) be
> > > > recreated
> > > > before we get down to flush the map.
> > > > 
> > > > MAYBE the suspend/resume was necessary in the past because some
> > > > earlier 
> > > > kernels wouldn't immediately fail all outstanding commands
> > > > when 
> > > > queue_if_no_path was deactivated? (just speculating, I don't
> > > > know
> > > > if this
> > > > is the case).
> > > 
> > > If you disable queue_if_no_path and then do a suspend with
> > > flushing,
> > > you
> > > are guaranteed that the supend won't return until all the IO has
> > > completed or failed.
> > 
> > I just realized that if we suspend, we don't even need disable
> > queue_if_no_path, because the kernel does that automatically if a
> > "suspend with flush" is issued, and has been doing so basically
> > forever.
> > 
> > >   This would allow anything that was waiting on
> > > queued IO to have the IO failback, which could allow it to close
> > > the
> > > device in time for multipath to be able to remove it (obviously
> > > this
> > > is
> > > racey).  However, this assumes that you do your open checks after
> > > the
> > > suspend, which multipath no longer does.
> > >  I realize that multipath can't
> > > suspend until after it tries to remove the partition devices,
> > > otherwise
> > > those can get stuck. But there probably is some order that gets
> > > this
> > > all
> > > right-ish.
> > 
> > Our code is currently racy. IMO we should
> > 
> >  0 unset queue_if_no_path
> >  1 remove partition mappings
> >  2 open(O_EXCL|O_RDONLY) the device
> >  3 If this fails, in multipath, try again after a suspend/resume
> > cycle.
> >    In multipathd, I think we should just fail for now; perhaps
> > later
> >    we could handle the explicit "remove map" command like
> > multipath.
> >  4 If it fails again, bail out (in multipath, retry if asked to do
> > so)
> >  5 run a "deferred remove" ioctl
> >  6 close the fd, the dm device will now be removed "atomically".
> > 
> > This cuts down the race window to the minimum possible - after (2),
> > no
> > mounts / kpartx / lvm operations won't be possible any more.
> 
> I wasn't actually worried about someone wanting to use the device. In
> that case the remove should fail.  I was worried about things that
> would
> close the device, but couldn't because of queued IO.  
> The race I was
> talking about is that after whatever has the device open gets the IO
> error, it might not close the device before multipath checks the open
> count.

Understood.

> Also, I'm not sure that resume helps us, since that will trigger
> uevents, which will open the device again.

Not sure if I understand correctly: It's possible to just suspend/flush
and then remove the device, without resuming. But that's dangerous,
because if some process opens the device while it's resumed, even if
it's just for reading a single block (think blkid), the open will
succeed, the IO will hang, the opencount will be increased, and the
REMOVE ioctl will fail. Therefore I think *if* we suspend we should
also resume. But I concur wrt the uevent, of course.

> > When we remove the partition mappings, we could use the same
> > technique
> > to avoid races on that layer. Unfortunately, a pending "deferred
> > remove" operation doesn't cause new open() or mount() calls to
> > fail; if
> > it did, we could use a simpler approach.
> > 
> > > So, for a while now, the suspending has been doing nothing for
> > > us.  We
> > > could either try to reorder things so that we actually try to
> > > flush
> > > the
> > > queued IOs back first (with or without suspending), or we could
> > > just
> > > remove suspending and say that things are working fine the way
> > > they
> > > currently are.
> > 
> > What else can we do except suspend/resume to avoid racing with
> > pending
> > close(), umount() or similar operations? Well, I guess if we open
> > the
> > device anyway as I proposed, we could do an fsync() on it. That
> > might
> > actually be better because it avoids the uevent being sent on
> > resume.
> 
> yeah. I think that simply disabling queueing and doing an fsync() is
> probably better than suspending. If new IO comes in after that, then
> something IS using the device, and we don't want to remove it. In
> multipath, maybe:
> 
> 1. disable queueing
> 2. remove partition mappings
> 3. open device
> 4. flush
> 5. check if we are the only opener.
> 	If not, and there are retries left... goto 4? sleep and
> recheck?
> 	we don't want to wait if the answer is that they device really
> 	is in use.
> 6. close device
> 7. remove device
> 
> Possibly the solution to not wanting to wait when a device is in use
> is
> that we could add an option to multipath to distinguish between the
> way
> flushing works now, where we check early if the device is in use, and
> just bail if it is, and a more aggressive attempt at remove that
> might
> take longer if used on devices that are in use.

What's wrong with deferred remove? After all, the user explicitly asked
for a flush. As long as some other process has the device open, it
won't be removed. That's why I like the O_EXCL idea, which will allow
small programs like blkid to access the device, but will cause all
attempts to mount or add stacked devices to fail until the device is
actually removed. I see no reason no to do this, as it's a race anyway
if some other process opens the device when we're supposed to flush it
and the opencount already dropped to 0. By using O_EXCL, we just
increase multipathd's chances to win the race and do what the user
asked for.

To make sure we're on the same boat: this is a topic for a separate
patch set IMO, I'm not expecting you to fix it in a v3.

Cheers,
Martin

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/7] Fix muitpath/multipathd flush issue
  2020-07-02 16:45                 ` Martin Wilck
@ 2020-07-02 19:41                   ` Benjamin Marzinski
  2020-07-03 15:12                     ` Martin Wilck
  0 siblings, 1 reply; 39+ messages in thread
From: Benjamin Marzinski @ 2020-07-02 19:41 UTC (permalink / raw)
  To: Martin Wilck; +Cc: dm-devel

On Thu, Jul 02, 2020 at 04:45:21PM +0000, Martin Wilck wrote:
> On Thu, 2020-07-02 at 10:18 -0500, Benjamin Marzinski wrote:
> > On Thu, Jul 02, 2020 at 12:24:32PM +0000, Martin Wilck wrote:
> > > On Wed, 2020-07-01 at 22:14 -0500, Benjamin Marzinski wrote:
> > > > On Wed, Jul 01, 2020 at 10:54:34PM +0200, Martin Wilck wrote:
> > > > > On Thu, 2020-06-18 at 18:06 -0500, Benjamin Marzinski wrote:
> > > > > > I uploaded the test program, aio_test:
> > > > > > 
> > > > > > https://github.com/bmarzins/test_programs.git
> > > > > > 
> > > > > > You just need to run in on a queueing multipath device with
> > > > > > no
> > > > > > active
> > > > > > paths and an open count of 0. It will hang with the device
> > > > > > open.
> > > > > > Restore
> > > > > > a path, and it will exit, and the open count will go to 0.
> > > > > 
> > > > > Tried it now, it behaves as you say. I admit I can't imagine
> > > > > how
> > > > > the
> > > > > suspend/resume would improve anything here. Indeed, as you say,
> > > > > it
> > > > > opens 
> > > > > up a race window. Another process might open the device while
> > > > > it's suspended. Worse perhaps, once the device is resumed, an
> > > > > uevent will be 
> > > > > generated, and stacked devices might (in principle at least) be
> > > > > recreated
> > > > > before we get down to flush the map.
> > > > > 
> > > > > MAYBE the suspend/resume was necessary in the past because some
> > > > > earlier 
> > > > > kernels wouldn't immediately fail all outstanding commands
> > > > > when 
> > > > > queue_if_no_path was deactivated? (just speculating, I don't
> > > > > know
> > > > > if this
> > > > > is the case).
> > > > 
> > > > If you disable queue_if_no_path and then do a suspend with
> > > > flushing,
> > > > you
> > > > are guaranteed that the supend won't return until all the IO has
> > > > completed or failed.
> > > 
> > > I just realized that if we suspend, we don't even need disable
> > > queue_if_no_path, because the kernel does that automatically if a
> > > "suspend with flush" is issued, and has been doing so basically
> > > forever.
> > > 
> > > >   This would allow anything that was waiting on
> > > > queued IO to have the IO failback, which could allow it to close
> > > > the
> > > > device in time for multipath to be able to remove it (obviously
> > > > this
> > > > is
> > > > racey).  However, this assumes that you do your open checks after
> > > > the
> > > > suspend, which multipath no longer does.
> > > >  I realize that multipath can't
> > > > suspend until after it tries to remove the partition devices,
> > > > otherwise
> > > > those can get stuck. But there probably is some order that gets
> > > > this
> > > > all
> > > > right-ish.
> > > 
> > > Our code is currently racy. IMO we should
> > > 
> > >  0 unset queue_if_no_path
> > >  1 remove partition mappings
> > >  2 open(O_EXCL|O_RDONLY) the device
> > >  3 If this fails, in multipath, try again after a suspend/resume
> > > cycle.
> > >    In multipathd, I think we should just fail for now; perhaps
> > > later
> > >    we could handle the explicit "remove map" command like
> > > multipath.
> > >  4 If it fails again, bail out (in multipath, retry if asked to do
> > > so)
> > >  5 run a "deferred remove" ioctl
> > >  6 close the fd, the dm device will now be removed "atomically".
> > > 
> > > This cuts down the race window to the minimum possible - after (2),
> > > no
> > > mounts / kpartx / lvm operations won't be possible any more.
> > 
> > I wasn't actually worried about someone wanting to use the device. In
> > that case the remove should fail.  I was worried about things that
> > would
> > close the device, but couldn't because of queued IO.  
> > The race I was
> > talking about is that after whatever has the device open gets the IO
> > error, it might not close the device before multipath checks the open
> > count.
> 
> Understood.
> 
> > Also, I'm not sure that resume helps us, since that will trigger
> > uevents, which will open the device again.
> 
> Not sure if I understand correctly: It's possible to just suspend/flush
> and then remove the device, without resuming. But that's dangerous,
> because if some process opens the device while it's resumed, even if
> it's just for reading a single block (think blkid), the open will
> succeed, the IO will hang, the opencount will be increased, and the
> REMOVE ioctl will fail. Therefore I think *if* we suspend we should
> also resume. But I concur wrt the uevent, of course.

We obviously need to resume if the remove fails. I just an not sure we
want to resume before deciding the remove has failed, since it will just
add openers that we don't care about.
 
> > > When we remove the partition mappings, we could use the same
> > > technique
> > > to avoid races on that layer. Unfortunately, a pending "deferred
> > > remove" operation doesn't cause new open() or mount() calls to
> > > fail; if
> > > it did, we could use a simpler approach.
> > > 
> > > > So, for a while now, the suspending has been doing nothing for
> > > > us.  We
> > > > could either try to reorder things so that we actually try to
> > > > flush
> > > > the
> > > > queued IOs back first (with or without suspending), or we could
> > > > just
> > > > remove suspending and say that things are working fine the way
> > > > they
> > > > currently are.
> > > 
> > > What else can we do except suspend/resume to avoid racing with
> > > pending
> > > close(), umount() or similar operations? Well, I guess if we open
> > > the
> > > device anyway as I proposed, we could do an fsync() on it. That
> > > might
> > > actually be better because it avoids the uevent being sent on
> > > resume.
> > 
> > yeah. I think that simply disabling queueing and doing an fsync() is
> > probably better than suspending. If new IO comes in after that, then
> > something IS using the device, and we don't want to remove it. In
> > multipath, maybe:
> > 
> > 1. disable queueing
> > 2. remove partition mappings
> > 3. open device
> > 4. flush
> > 5. check if we are the only opener.
> > 	If not, and there are retries left... goto 4? sleep and
> > recheck?
> > 	we don't want to wait if the answer is that they device really
> > 	is in use.
> > 6. close device
> > 7. remove device
> > 
> > Possibly the solution to not wanting to wait when a device is in use
> > is
> > that we could add an option to multipath to distinguish between the
> > way
> > flushing works now, where we check early if the device is in use, and
> > just bail if it is, and a more aggressive attempt at remove that
> > might
> > take longer if used on devices that are in use.
> 
> What's wrong with deferred remove? After all, the user explicitly asked
> for a flush. As long as some other process has the device open, it
> won't be removed. That's why I like the O_EXCL idea, which will allow
> small programs like blkid to access the device, but will cause all
> attempts to mount or add stacked devices to fail until the device is
> actually removed. I see no reason no to do this, as it's a race anyway
> if some other process opens the device when we're supposed to flush it
> and the opencount already dropped to 0. By using O_EXCL, we just
> increase multipathd's chances to win the race and do what the user
> asked for.

I'm not actually a fan of deferred remove in general. It leaves the
device in this weird state were it is there but no longer openable. I
wish I had originally dealt with deferred removes by having multipathd
occasionally try to flush devices with no paths, or possibly listen for
notifications that the device has been closed, and flush then.

My specific objections here are that not all things that open a device
for longer than an instant do so with O_EXCL.  So it's very possible
that you run "multipath -F", it returns having removed a number of
unused devices.  But some of the devices it didn't remove were opened
without O_EXCL, and they will stick around for a while, and then
suddenly disappear.  Even if they don't say around for that long, this
is a can still effect scripts or other programs that are expecting to
check the device state immediately after calling multipath -F, and
having it not change a second or so later. So far multipath -f/-F will
not return until it has removed all the removeable devices (and waited
for them to be removed from udev). I think it should stay this way.

> To make sure we're on the same boat: this is a topic for a separate
> patch set IMO, I'm not expecting you to fix it in a v3.

Yep. It's future work.

> Cheers,
> Martin
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/7] Fix muitpath/multipathd flush issue
  2020-07-02 19:41                   ` Benjamin Marzinski
@ 2020-07-03 15:12                     ` Martin Wilck
  2020-07-03 16:39                       ` Mike Snitzer
  0 siblings, 1 reply; 39+ messages in thread
From: Martin Wilck @ 2020-07-03 15:12 UTC (permalink / raw)
  To: bmarzins; +Cc: dm-devel, snitzer, zkabelac

On Thu, 2020-07-02 at 14:41 -0500, Benjamin Marzinski wrote:
> On Thu, Jul 02, 2020 at 04:45:21PM +0000, Martin Wilck wrote:
> > 
> > What's wrong with deferred remove? After all, the user explicitly
> > asked
> > for a flush. As long as some other process has the device open, it
> > won't be removed. That's why I like the O_EXCL idea, which will
> > allow
> > small programs like blkid to access the device, but will cause all
> > attempts to mount or add stacked devices to fail until the device
> > is
> > actually removed. I see no reason no to do this, as it's a race
> > anyway
> > if some other process opens the device when we're supposed to flush
> > it
> > and the opencount already dropped to 0. By using O_EXCL, we just
> > increase multipathd's chances to win the race and do what the user
> > asked for.
> 
> I'm not actually a fan of deferred remove in general. It leaves the
> device in this weird state were it is there but no longer openable. 

Ok, I didn't expect that ;-)

AFAICS, devices in DEFERRED REMOVE state are actually still openable. I
just tested this once more on a 5.3 kernel.

As long as the device is opened by some process and thus not removed,
it can be opened by other processes, and is not deleted until the last
opener closes it. It's even possible to create new device mapper layers
like kpartx partitions on top of a DM device in DEFERRED REMOVE state.

> I
> wish I had originally dealt with deferred removes by having
> multipathd
> occasionally try to flush devices with no paths, or possibly listen
> for
> notifications that the device has been closed, and flush then.
> 
> My specific objections here are that not all things that open a
> device
> for longer than an instant do so with O_EXCL.  So it's very possible
> that you run "multipath -F", it returns having removed a number of
> unused devices.  But some of the devices it didn't remove were opened
> without O_EXCL, and they will stick around for a while, and then
> suddenly disappear.  Even if they don't say around for that long,
> this
> is a can still effect scripts or other programs that are expecting to
> check the device state immediately after calling multipath -F, and
> having it not change a second or so later. So far multipath -f/-F
> will
> not return until it has removed all the removeable devices (and
> waited
> for them to be removed from udev). I think it should stay this way.

I see. That's a valid point. IMHO it'd be better if the kernel didn't
allow any new access to devices in "deferred remove" state, and
possibly sent a REMOVE uevent and hide the device immediately after the
deferred remove ioctl. 

That would also be closer to how "lazy umount" (umount -l) behaves.
But I'm certainly overlooking some subtle semantic issues. 

@Mike, Zdenek: perhaps you can explain why "deferred remove" behaves
like this?

Martin

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE  Software Solutions Germany GmbH
HRB 36809, AG Nürnberg GF: Felix
Imendörffer

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/7] Fix muitpath/multipathd flush issue
  2020-07-03 15:12                     ` Martin Wilck
@ 2020-07-03 16:39                       ` Mike Snitzer
  2020-07-03 18:50                         ` Martin Wilck
  0 siblings, 1 reply; 39+ messages in thread
From: Mike Snitzer @ 2020-07-03 16:39 UTC (permalink / raw)
  To: Martin Wilck; +Cc: dm-devel, zkabelac

On Fri, Jul 03 2020 at 11:12am -0400,
Martin Wilck <Martin.Wilck@suse.com> wrote:

> On Thu, 2020-07-02 at 14:41 -0500, Benjamin Marzinski wrote:
> > On Thu, Jul 02, 2020 at 04:45:21PM +0000, Martin Wilck wrote:
> > > 
> > > What's wrong with deferred remove? After all, the user explicitly
> > > asked
> > > for a flush. As long as some other process has the device open, it
> > > won't be removed. That's why I like the O_EXCL idea, which will
> > > allow
> > > small programs like blkid to access the device, but will cause all
> > > attempts to mount or add stacked devices to fail until the device
> > > is
> > > actually removed. I see no reason no to do this, as it's a race
> > > anyway
> > > if some other process opens the device when we're supposed to flush
> > > it
> > > and the opencount already dropped to 0. By using O_EXCL, we just
> > > increase multipathd's chances to win the race and do what the user
> > > asked for.
> > 
> > I'm not actually a fan of deferred remove in general. It leaves the
> > device in this weird state were it is there but no longer openable. 
> 
> Ok, I didn't expect that ;-)
> 
> AFAICS, devices in DEFERRED REMOVE state are actually still openable. I
> just tested this once more on a 5.3 kernel.
> 
> As long as the device is opened by some process and thus not removed,
> it can be opened by other processes, and is not deleted until the last
> opener closes it. It's even possible to create new device mapper layers
> like kpartx partitions on top of a DM device in DEFERRED REMOVE state.
> 
> > I
> > wish I had originally dealt with deferred removes by having
> > multipathd
> > occasionally try to flush devices with no paths, or possibly listen
> > for
> > notifications that the device has been closed, and flush then.
> > 
> > My specific objections here are that not all things that open a
> > device
> > for longer than an instant do so with O_EXCL.  So it's very possible
> > that you run "multipath -F", it returns having removed a number of
> > unused devices.  But some of the devices it didn't remove were opened
> > without O_EXCL, and they will stick around for a while, and then
> > suddenly disappear.  Even if they don't say around for that long,
> > this
> > is a can still effect scripts or other programs that are expecting to
> > check the device state immediately after calling multipath -F, and
> > having it not change a second or so later. So far multipath -f/-F
> > will
> > not return until it has removed all the removeable devices (and
> > waited
> > for them to be removed from udev). I think it should stay this way.
> 
> I see. That's a valid point. IMHO it'd be better if the kernel didn't
> allow any new access to devices in "deferred remove" state, and
> possibly sent a REMOVE uevent and hide the device immediately after the
> deferred remove ioctl. 
> 
> That would also be closer to how "lazy umount" (umount -l) behaves.
> But I'm certainly overlooking some subtle semantic issues. 
> 
> @Mike, Zdenek: perhaps you can explain why "deferred remove" behaves
> like this?

"deferred remove" was introduced with commits:

2c140a246dc dm: allow remove to be deferred
acfe0ad74d2 dm: allocate a special workqueue for deferred device removal

The feature was developed to cater to the docker "devicemapper" graph
driver [1][2] that uses DM thin provisioning in the backend (Red Hat's
openshift once used a docker that used thinp in production for thinp's
snapshot capabilities. overlayfs is now used instead because it allows
page cache sharing which results in the ability to support vastly more
containers that all are layered on snapshots of the same "device").

Anyway, back to deferred remove: docker's Go-lang based implementation
and storage graph driver interface were clumsily written to require this
lazy removal of used resources.  As such, we had to adapt and the result
was "deferred device" remove that really could be used by any DM device.

Docker couldn't have later opens fail due to a pending removal -- it'd
break their app.  So if you want it to do what you'd have imagined it to
be; we'll need to introduce a new flag that alters the behavior (maybe
as a module param off of DM core's dm-mod.ko).  Patches welcome -- but
you'll need a pretty good reason (not read back far enough but maybe
you have one?).

Thanks,
Mike

 
[1] https://docs.docker.com/storage/storagedriver/device-mapper-driver/
[2] https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux_atomic_host/7/html/managing_containers/managing_storage_with_docker_formatted_containers

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/7] Fix muitpath/multipathd flush issue
  2020-07-03 16:39                       ` Mike Snitzer
@ 2020-07-03 18:50                         ` Martin Wilck
  0 siblings, 0 replies; 39+ messages in thread
From: Martin Wilck @ 2020-07-03 18:50 UTC (permalink / raw)
  To: snitzer; +Cc: dm-devel, zkabelac

On Fri, 2020-07-03 at 12:39 -0400, Mike Snitzer wrote:
> 
> Docker couldn't have later opens fail due to a pending removal --
> it'd
> break their app.  So if you want it to do what you'd have imagined it
> to
> be; we'll need to introduce a new flag that alters the behavior
> (maybe
> as a module param off of DM core's dm-mod.ko).  Patches welcome --
> but
> you'll need a pretty good reason (not read back far enough but maybe
> you have one?).

Thanks a lot for the explanation. I don't think I'm going write patches
and reason about it. We were just looking for the best way to safely
flush maps in multipath-tools, and I'd considered deferred remove as
one option, which it most likely is not. Anyway, I like to understand
why thinks are the way they are, so thanks again.

Cheers,
Martin

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE  Software Solutions Germany GmbH
HRB 36809, AG Nürnberg GF: Felix
Imendörffer

^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2020-07-03 18:50 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-18  0:24 [PATCH 0/7] Fix muitpath/multipathd flush issue Benjamin Marzinski
2020-06-18  0:24 ` [PATCH 1/7] libmultipath: change do_get_info returns Benjamin Marzinski
2020-06-18 15:27   ` Martin Wilck
2020-06-18 23:17     ` Benjamin Marzinski
2020-06-18  0:24 ` [PATCH 2/7] multipathd: fix check_path errors with removed map Benjamin Marzinski
2020-06-18 19:34   ` Martin Wilck
2020-06-18 23:17     ` Benjamin Marzinski
2020-06-19  6:32       ` Hannes Reinecke
2020-06-19 16:30         ` Benjamin Marzinski
2020-06-19 20:03           ` Martin Wilck
2020-06-19 13:42       ` Martin Wilck
2020-06-19 16:52         ` Benjamin Marzinski
2020-06-19 20:09           ` Martin Wilck
2020-06-18  0:24 ` [PATCH 3/7] libmultipath: make dm_flush_maps only return 0 on success Benjamin Marzinski
2020-06-18 20:29   ` Martin Wilck
2020-06-18  0:24 ` [PATCH 4/7] multipathd: add "del maps" multipathd command Benjamin Marzinski
2020-06-18 20:37   ` Martin Wilck
2020-06-18 23:12     ` Benjamin Marzinski
2020-06-19 13:35       ` Martin Wilck
2020-06-18  0:24 ` [PATCH 5/7] multipath: make flushing maps work like other commands Benjamin Marzinski
2020-06-18 20:38   ` Martin Wilck
2020-06-18  0:24 ` [PATCH 6/7] multipath: delegate flushing maps to multipathd Benjamin Marzinski
2020-06-18 20:40   ` Martin Wilck
2020-06-18  0:24 ` [PATCH 7/7] multipath: add option to skip multipathd delegation Benjamin Marzinski
2020-06-18 20:44   ` Martin Wilck
2020-06-18 23:15     ` Benjamin Marzinski
2020-06-18  9:00 ` [PATCH 0/7] Fix muitpath/multipathd flush issue Martin Wilck
2020-06-18 18:04   ` Benjamin Marzinski
2020-06-18 20:06     ` Martin Wilck
2020-06-18 23:06       ` Benjamin Marzinski
2020-07-01 20:54         ` Martin Wilck
2020-07-02  3:14           ` Benjamin Marzinski
2020-07-02 12:24             ` Martin Wilck
2020-07-02 15:18               ` Benjamin Marzinski
2020-07-02 16:45                 ` Martin Wilck
2020-07-02 19:41                   ` Benjamin Marzinski
2020-07-03 15:12                     ` Martin Wilck
2020-07-03 16:39                       ` Mike Snitzer
2020-07-03 18:50                         ` Martin Wilck

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.