dm-devel.redhat.com archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/5] multipath: fix hang in flush_map_nopaths
@ 2024-04-25 23:35 Benjamin Marzinski
  2024-04-25 23:35 ` [PATCH v2 1/5] libmultipath: export partmap_in_use Benjamin Marzinski
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Benjamin Marzinski @ 2024-04-25 23:35 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

Commit 9bd3482e ("multipathd: make flush_map() delete maps like the
multipath command") fixed a multipathd hang when trying to remove a
multipath device with only kpartx devices opening it but with
outstanding IO. However that commit didn't deal with autoremoving the
multipath device when the last path is deleted. It's possible for the
same hang to occur during an autoremove. Since this remove is not
initiated by the user, multipathd shouldn't just automatically disable
queueing. On the other hand, multipathd hanging is a big problem, since
that stops all of its work on all multipath devices. This patchset
handles the issue by changing the options for flush_on_last_del to give
the users more choice in how to deal with this situation. But none of
the options will allow multipath to flush a device while it is queueing,
which is the action that can cause it to hang.

Differences from v1 (from conversations with Martin Wilck):
0001: Since flush_map_nopaths() will no longer need to know what the
      deferred_remove setting is, only exported partmap_in_use.
0002: Fixed up the commit message. Removed all of the code treating
      deferred removes as special in flush_map_nopaths(), and made
      sure flush_map_nopaths() skipped the autoremove if disabling
      queueing failed.
Original 0003: Dropped. flush_map_nopaths() no longer needs to care
      about the deferred_remove value, so dm_flush_map_nopaths() does.
0003-0005: New commits based on suggestions from Martin

Benjamin Marzinski (5):
  libmultipath: export partmap_in_use
  libmultipath: change flush_on_last_del to fix a multipathd hang
  libmultipath: remove redundant config option from InfiniBox config
  libmultipath: pad dev_loss_tmo to avoid race with no_path_retry
  libmultipath: fix deferred_remove function arguments

 libmultipath/defaults.h           |  2 +-
 libmultipath/devmapper.c          | 21 +++++----
 libmultipath/devmapper.h          |  1 +
 libmultipath/dict.c               | 72 +++++++++++++++++++++++++++----
 libmultipath/dict.h               |  1 +
 libmultipath/discovery.c          |  5 +++
 libmultipath/hwtable.c            |  5 +--
 libmultipath/libmultipath.version |  3 +-
 libmultipath/propsel.c            |  4 +-
 libmultipath/structs.h            |  7 +--
 multipath/multipath.conf.5.in     | 20 ++++++---
 multipathd/main.c                 | 39 +++++++++++++----
 12 files changed, 139 insertions(+), 41 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v2 1/5] libmultipath: export partmap_in_use
  2024-04-25 23:35 [PATCH v2 0/5] multipath: fix hang in flush_map_nopaths Benjamin Marzinski
@ 2024-04-25 23:35 ` Benjamin Marzinski
  2024-04-25 23:35 ` [PATCH v2 2/5] libmultipath: change flush_on_last_del to fix a multipathd hang Benjamin Marzinski
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: Benjamin Marzinski @ 2024-04-25 23:35 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

A future commit will make use of this function

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/devmapper.c          | 2 +-
 libmultipath/devmapper.h          | 1 +
 libmultipath/libmultipath.version | 1 +
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/libmultipath/devmapper.c b/libmultipath/devmapper.c
index 4ce7e82f..a87abf7e 100644
--- a/libmultipath/devmapper.c
+++ b/libmultipath/devmapper.c
@@ -1043,7 +1043,7 @@ has_partmap(const char *name __attribute__((unused)),
 	return 1;
 }
 
-static int
+int
 partmap_in_use(const char *name, void *data)
 {
 	int part_count, *ret_count = (int *)data;
diff --git a/libmultipath/devmapper.h b/libmultipath/devmapper.h
index a7d66604..93caa2aa 100644
--- a/libmultipath/devmapper.h
+++ b/libmultipath/devmapper.h
@@ -57,6 +57,7 @@ enum {
 	DM_FLUSH_BUSY,
 };
 
+int partmap_in_use(const char *name, void *data);
 int _dm_flush_map (const char *, int, int, int, int);
 int dm_flush_map_nopaths(const char * mapname, int deferred_remove);
 #define dm_flush_map(mapname) _dm_flush_map(mapname, 1, 0, 0, 0)
diff --git a/libmultipath/libmultipath.version b/libmultipath/libmultipath.version
index 724786d5..e070f296 100644
--- a/libmultipath/libmultipath.version
+++ b/libmultipath/libmultipath.version
@@ -141,6 +141,7 @@ global:
 	need_io_err_check;
 	orphan_path;
 	parse_prkey_flags;
+	partmap_in_use;
 	pathcount;
 	path_discovery;
 	path_get_tpgs;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 2/5] libmultipath: change flush_on_last_del to fix a multipathd hang
  2024-04-25 23:35 [PATCH v2 0/5] multipath: fix hang in flush_map_nopaths Benjamin Marzinski
  2024-04-25 23:35 ` [PATCH v2 1/5] libmultipath: export partmap_in_use Benjamin Marzinski
@ 2024-04-25 23:35 ` Benjamin Marzinski
  2024-04-30 17:06   ` Martin Wilck
  2024-05-02 15:06   ` Martin Wilck
  2024-04-25 23:35 ` [PATCH v2 3/5] libmultipath: remove redundant config option from InfiniBox config Benjamin Marzinski
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 14+ messages in thread
From: Benjamin Marzinski @ 2024-04-25 23:35 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

Commit 9bd3482e ("multipathd: make flush_map() delete maps like the
multipath command") fixed an issue where deleting a queueing multipath
device through multipathd could hang because the multipath device had
outstanding IO, even though the only openers of it at the time of
deletion were the kpartx partition devices. However it is still possible
to hang multipathd, because autoremoving the device when all paths have
been deleted doesn't disable queueing. To reproduce this hang:

1. create a multipath device with a kpartx partition on top of it and
no_path_retry set to either "queue" or something long enough to run all
the commands in the reproducer before it disables queueing.
2. disable all the paths to the device with something like:
 # echo offline > /sys/block/<path_dev>/device/state
3. Write directly to the multipath device with something like:
 # dd if=/dev/zero of=/dev/mapper/<mpath_dev> bs=4K count=1
4. delete all the paths to the device with something like:
 # echo 1 > /sys/block/<path_dev>/device/delete

Multipathd will hang trying to delete the kpartx device because, as the
last opener, it must flush the multipath device before closing it.
Because it hangs holding the vecs_lock, multipathd will never disable
queueing on the device, so it will hang forever, even if no_path_retry
is set to a number.

This hang can occur, even if deferred_remove is set. Since nothing has
the kpartx device opened, device-mapper does an immediate remove, which
will still hang. This means that even if deferred_remove is set,
multipathd still cannot delete a map while queueing is enabled. It must
either disable queueing or skip the autoremove.

Mulitpath can currently be configured to avoid this hang by setting

flush_on_last_del yes

However there are good reasons why users wouldn't want to set that. They
may need to be able to survive having all paths getting temporarily
deleted.  I should note that this is a pretty rare corner case, since
multipath automatically sets dev_loss_tmo so that it should not trigger
before queueing is disabled.

This commit avoids the hang by changing the possible values for
flush_on_last_del to "never", "unused", and "always", and sets the
default to "unused".  "always" works like "yes" did, "never" will not
disable queueing, and "unused" will only disable queueing if nothing has
the kpartx devices or the multipath device open. In order to be safe, if
the device has queue_if_no_paths set (and in case of "unused", the
device is in-use) the autoremove will be skipped. Also, instead of just
trusting the lack of "queue_if_no_paths" in the current mpp->features,
multipathd will tell the kernel to disable queueing, just to be sure it
actually is.

I chose "unused" as the default because this should generally only cause
multipathd to work differently from the users perspective when nothing
has the multipath device open but it is queueing and there is
outstanding IO. Without this change, it would have hung instead of
failing the outstanding IO. However, I do understand that an argument
could be made that "never" makes more sense as default, even though it
will cause multipathd to skip autoremoves in cases where it wouldn't
before. The change to the behavior of deffered_remove will be
noticeable, but skipping an autoremove is much better than hanging.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/defaults.h       |  2 +-
 libmultipath/dict.c           | 72 +++++++++++++++++++++++++++++++----
 libmultipath/dict.h           |  1 +
 libmultipath/hwtable.c        |  6 +--
 libmultipath/propsel.c        |  4 +-
 libmultipath/structs.h        |  7 ++--
 multipath/multipath.conf.5.in | 20 +++++++---
 multipathd/main.c             | 39 +++++++++++++++----
 8 files changed, 122 insertions(+), 29 deletions(-)

diff --git a/libmultipath/defaults.h b/libmultipath/defaults.h
index 64b633f2..ed08c251 100644
--- a/libmultipath/defaults.h
+++ b/libmultipath/defaults.h
@@ -43,7 +43,7 @@
 #define DEFAULT_PRIO		PRIO_CONST
 #define DEFAULT_PRIO_ARGS	""
 #define DEFAULT_CHECKER		TUR
-#define DEFAULT_FLUSH		FLUSH_DISABLED
+#define DEFAULT_FLUSH		FLUSH_UNUSED
 #define DEFAULT_USER_FRIENDLY_NAMES USER_FRIENDLY_NAMES_OFF
 #define DEFAULT_FORCE_SYNC	0
 #define UNSET_PARTITION_DELIM "/UNSET/"
diff --git a/libmultipath/dict.c b/libmultipath/dict.c
index 5af036b7..546103f2 100644
--- a/libmultipath/dict.c
+++ b/libmultipath/dict.c
@@ -791,14 +791,70 @@ declare_def_snprint(checker_timeout, print_nonzero)
 declare_def_handler(allow_usb_devices, set_yes_no)
 declare_def_snprint(allow_usb_devices, print_yes_no)
 
-declare_def_handler(flush_on_last_del, set_yes_no_undef)
-declare_def_snprint_defint(flush_on_last_del, print_yes_no_undef, DEFAULT_FLUSH)
-declare_ovr_handler(flush_on_last_del, set_yes_no_undef)
-declare_ovr_snprint(flush_on_last_del, print_yes_no_undef)
-declare_hw_handler(flush_on_last_del, set_yes_no_undef)
-declare_hw_snprint(flush_on_last_del, print_yes_no_undef)
-declare_mp_handler(flush_on_last_del, set_yes_no_undef)
-declare_mp_snprint(flush_on_last_del, print_yes_no_undef)
+
+static const char * const flush_on_last_del_optvals[] = {
+	[FLUSH_NEVER] = "never",
+	[FLUSH_ALWAYS] = "always",
+	[FLUSH_UNUSED] = "unused",
+};
+
+static int
+set_flush_on_last_del(vector strvec, void *ptr, const char *file, int line_nr)
+{
+	int i;
+	int *flush_val_ptr = (int *)ptr;
+	char *buff;
+
+	buff = set_value(strvec);
+	if (!buff)
+		return 1;
+
+	for (i = FLUSH_NEVER; i <= FLUSH_UNUSED; i++) {
+		if (flush_on_last_del_optvals[i] != NULL &&
+		    !strcmp(buff, flush_on_last_del_optvals[i])) {
+			*flush_val_ptr = i;
+			break;
+		}
+	}
+
+	if (i > FLUSH_UNUSED) {
+		bool deprecated = true;
+		if (strcmp(buff, "no") == 0 || strcmp(buff, "0") == 0)
+			*flush_val_ptr = FLUSH_UNUSED;
+		else if (strcmp(buff, "yes") == 0 || strcmp(buff, "1") == 0)
+			*flush_val_ptr = FLUSH_ALWAYS;
+		else {
+			deprecated = false;
+			condlog(1, "%s line %d, invalid value for flush_on_last_del: \"%s\"",
+				file, line_nr, buff);
+		}
+		if (deprecated)
+			condlog(2, "%s line %d, \"%s\" is a deprecated value for flush_on_last_del and is treated as \"%s\"",
+				file, line_nr, buff,
+				flush_on_last_del_optvals[*flush_val_ptr]);
+	}
+
+	free(buff);
+	return 0;
+}
+
+int
+print_flush_on_last_del(struct strbuf *buff, long v)
+{
+	if (v == FLUSH_UNDEF)
+		return 0;
+	return append_strbuf_quoted(buff, flush_on_last_del_optvals[(int)v]);
+}
+
+declare_def_handler(flush_on_last_del, set_flush_on_last_del)
+declare_def_snprint_defint(flush_on_last_del, print_flush_on_last_del,
+			   DEFAULT_FLUSH)
+declare_ovr_handler(flush_on_last_del, set_flush_on_last_del)
+declare_ovr_snprint(flush_on_last_del, print_flush_on_last_del)
+declare_hw_handler(flush_on_last_del, set_flush_on_last_del)
+declare_hw_snprint(flush_on_last_del, print_flush_on_last_del)
+declare_mp_handler(flush_on_last_del, set_flush_on_last_del)
+declare_mp_snprint(flush_on_last_del, print_flush_on_last_del)
 
 declare_def_handler(user_friendly_names, set_yes_no_undef)
 declare_def_snprint_defint(user_friendly_names, print_yes_no_undef,
diff --git a/libmultipath/dict.h b/libmultipath/dict.h
index 7e2dfbe0..e1794537 100644
--- a/libmultipath/dict.h
+++ b/libmultipath/dict.h
@@ -18,4 +18,5 @@ int print_undef_off_zero(struct strbuf *buff, long v);
 int print_dev_loss(struct strbuf *buff, unsigned long v);
 int print_off_int_undef(struct strbuf *buff, long v);
 int print_auto_resize(struct strbuf *buff, long v);
+int print_flush_on_last_del(struct strbuf *buff, long v);
 #endif /* _DICT_H */
diff --git a/libmultipath/hwtable.c b/libmultipath/hwtable.c
index 640bf347..ce600030 100644
--- a/libmultipath/hwtable.c
+++ b/libmultipath/hwtable.c
@@ -60,7 +60,7 @@
 		.no_path_retry = NO_PATH_RETRY_UNDEF,
 		.minio         = 1000,
 		.minio_rq      = 1,
-		.flush_on_last_del = FLUSH_DISABLED,
+		.flush_on_last_del = FLUSH_UNUSED,
 		.user_friendly_names = USER_FRIENDLY_NAMES_OFF,
 		.fast_io_fail  = 5,
 		.dev_loss      = 600,
@@ -829,7 +829,7 @@ static struct hwentry default_hw[] = {
 		.no_path_retry = NO_PATH_RETRY_QUEUE,
 		.pgpolicy      = GROUP_BY_PRIO,
 		.pgfailback    = -FAILBACK_IMMEDIATE,
-		.flush_on_last_del = FLUSH_ENABLED,
+		.flush_on_last_del = FLUSH_ALWAYS,
 		.dev_loss      = MAX_DEV_LOSS_TMO,
 		.prio_name     = PRIO_ONTAP,
 		.user_friendly_names = USER_FRIENDLY_NAMES_OFF,
@@ -1160,7 +1160,7 @@ static struct hwentry default_hw[] = {
 		.no_path_retry = NO_PATH_RETRY_FAIL,
 		.minio         = 1,
 		.minio_rq      = 1,
-		.flush_on_last_del = FLUSH_ENABLED,
+		.flush_on_last_del = FLUSH_ALWAYS,
 		.fast_io_fail  = 15,
 	},
 	/*
diff --git a/libmultipath/propsel.c b/libmultipath/propsel.c
index 44241e2a..e2dcb316 100644
--- a/libmultipath/propsel.c
+++ b/libmultipath/propsel.c
@@ -963,6 +963,7 @@ out:
 int select_flush_on_last_del(struct config *conf, struct multipath *mp)
 {
 	const char *origin;
+	STRBUF_ON_STACK(buff);
 
 	mp_set_mpe(flush_on_last_del);
 	mp_set_ovr(flush_on_last_del);
@@ -970,8 +971,9 @@ int select_flush_on_last_del(struct config *conf, struct multipath *mp)
 	mp_set_conf(flush_on_last_del);
 	mp_set_default(flush_on_last_del, DEFAULT_FLUSH);
 out:
+	print_flush_on_last_del(&buff, mp->flush_on_last_del);
 	condlog(3, "%s: flush_on_last_del = %s %s", mp->alias,
-		(mp->flush_on_last_del == FLUSH_ENABLED)? "yes" : "no", origin);
+		get_strbuf_str(&buff), origin);
 	return 0;
 }
 
diff --git a/libmultipath/structs.h b/libmultipath/structs.h
index a25eb9d5..dbaf4d43 100644
--- a/libmultipath/structs.h
+++ b/libmultipath/structs.h
@@ -109,9 +109,10 @@ enum marginal_pathgroups_mode {
 };
 
 enum flush_states {
-	FLUSH_UNDEF = YNU_UNDEF,
-	FLUSH_DISABLED = YNU_NO,
-	FLUSH_ENABLED = YNU_YES,
+	FLUSH_UNDEF,
+	FLUSH_NEVER,
+	FLUSH_ALWAYS,
+	FLUSH_UNUSED,
 };
 
 enum log_checker_err_states {
diff --git a/multipath/multipath.conf.5.in b/multipath/multipath.conf.5.in
index b29a75fe..8cdb763f 100644
--- a/multipath/multipath.conf.5.in
+++ b/multipath/multipath.conf.5.in
@@ -707,12 +707,22 @@ The default is: \fBno\fR
 .TP
 .B flush_on_last_del
 If set to
-.I yes
+.I always
 , multipathd will disable queueing when the last path to a device has been
-deleted.
-.RS
-.TP
-The default is: \fBno\fR
+deleted. If set to
+.I never
+, multipathd will not disable queueing when the last path to a device has
+been deleted. Since multipath cannot safely remove a device while queueing
+is enabled, setting this to \fInever\fR means that multipathd will not
+automatically remove an unused multipath device whose paths are all deleted if
+it is currently set to queue_if_no_path. If set to
+.I unused
+, multipathd will only disable queueing when the last path is removed if
+nothing currently has the multipath device or any of the kpartx parition
+devices on top of it open.
+.RS
+.TP
+The default is: \fBunused\fR
 .RE
 .
 .
diff --git a/multipathd/main.c b/multipathd/main.c
index d8518a92..09286dd0 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -584,19 +584,42 @@ int update_multipath (struct vectors *vecs, char *mapname)
 static bool
 flush_map_nopaths(struct multipath *mpp, struct vectors *vecs) {
 	int r;
+	bool is_queueing = true;
 
+	if (mpp->features)
+		is_queueing = strstr(mpp->features, "queue_if_no_path");
+
+	/* It's not safe to do a remove of a map that has "queue_if_no_path"
+	 * set, since there could be outstanding IO which will cause
+	 * multipathd to hang while attempting the remove */
+	if (mpp->flush_on_last_del == FLUSH_NEVER && is_queueing) {
+		condlog(2, "%s: map is queueing, can't remove", mpp->alias);
+		return false;
+	}
+	if (mpp->flush_on_last_del == FLUSH_UNUSED &&
+            partmap_in_use(mpp->alias, NULL) && is_queueing) {
+		condlog(2, "%s: map in use and queueing, can't remove",
+			mpp->alias);
+		return false;
+	}
 	/*
-	 * flush_map will fail if the device is open
+	 * This will flush FLUSH_NEVER devices and FLUSH_UNUSED devices
+	 * that are in use, but only if they are already marked as not
+	 * queueing. That is just to make absolutely certain that they
+	 * really are not queueing, like they claim.
 	 */
-	if (mpp->flush_on_last_del == FLUSH_ENABLED) {
-		condlog(2, "%s Last path deleted, disabling queueing",
+	condlog(is_queueing ? 2 : 3, "%s Last path deleted, disabling queueing",
+		mpp->alias);
+	mpp->retry_tick = 0;
+	mpp->no_path_retry = NO_PATH_RETRY_FAIL;
+	mpp->disable_queueing = 1;
+	mpp->stat_map_failures++;
+	if (dm_queue_if_no_path(mpp, 0) != 0) {
+		condlog(0, "%s: failed to disable queueing. Not removing",
 			mpp->alias);
-		mpp->retry_tick = 0;
-		mpp->no_path_retry = NO_PATH_RETRY_FAIL;
-		mpp->disable_queueing = 1;
-		mpp->stat_map_failures++;
-		dm_queue_if_no_path(mpp, 0);
+		return false;
 	}
+
 	r = dm_flush_map_nopaths(mpp->alias, mpp->deferred_remove);
 	if (r != DM_FLUSH_OK) {
 		if (r == DM_FLUSH_DEFERRED) {
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 3/5] libmultipath: remove redundant config option from InfiniBox config
  2024-04-25 23:35 [PATCH v2 0/5] multipath: fix hang in flush_map_nopaths Benjamin Marzinski
  2024-04-25 23:35 ` [PATCH v2 1/5] libmultipath: export partmap_in_use Benjamin Marzinski
  2024-04-25 23:35 ` [PATCH v2 2/5] libmultipath: change flush_on_last_del to fix a multipathd hang Benjamin Marzinski
@ 2024-04-25 23:35 ` Benjamin Marzinski
  2024-04-25 23:35 ` [PATCH v2 4/5] libmultipath: pad dev_loss_tmo to avoid race with no_path_retry Benjamin Marzinski
  2024-04-25 23:35 ` [PATCH v2 5/5] libmultipath: fix deferred_remove function arguments Benjamin Marzinski
  4 siblings, 0 replies; 14+ messages in thread
From: Benjamin Marzinski @ 2024-04-25 23:35 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

The InfiniBox config already sets no_path_retry to "fail", so it won't
ever queue IO. That means setting flush_on_last_del to "always" is
redundant, since queueing is always disabled. Remove the
flush_on_last_del parameter, to make it easier for users to override the
default behavior if desired.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/hwtable.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/libmultipath/hwtable.c b/libmultipath/hwtable.c
index ce600030..51b15eab 100644
--- a/libmultipath/hwtable.c
+++ b/libmultipath/hwtable.c
@@ -1160,7 +1160,6 @@ static struct hwentry default_hw[] = {
 		.no_path_retry = NO_PATH_RETRY_FAIL,
 		.minio         = 1,
 		.minio_rq      = 1,
-		.flush_on_last_del = FLUSH_ALWAYS,
 		.fast_io_fail  = 15,
 	},
 	/*
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 4/5] libmultipath: pad dev_loss_tmo to avoid race with no_path_retry
  2024-04-25 23:35 [PATCH v2 0/5] multipath: fix hang in flush_map_nopaths Benjamin Marzinski
                   ` (2 preceding siblings ...)
  2024-04-25 23:35 ` [PATCH v2 3/5] libmultipath: remove redundant config option from InfiniBox config Benjamin Marzinski
@ 2024-04-25 23:35 ` Benjamin Marzinski
  2024-05-02 15:14   ` Martin Wilck
  2024-04-25 23:35 ` [PATCH v2 5/5] libmultipath: fix deferred_remove function arguments Benjamin Marzinski
  4 siblings, 1 reply; 14+ messages in thread
From: Benjamin Marzinski @ 2024-04-25 23:35 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

Currently multipath makes sure that dev_loss_tmo is at least as long as
the configured no path queueing time. The goal is to make sure that path
devices aren't removed while the multipath device is still queueing in
hopes that they will become usable again.

This is racy. Multipathd may take longer to check the paths than
configured. If strict_timing isn't set, it will definitely take longer.
To account for this, pad the minimum dev_loss_tmo value by five seconds
(one default checker interval) plus one second per minute of no path
queueing time.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/discovery.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/libmultipath/discovery.c b/libmultipath/discovery.c
index 6fd4dabb..e2052422 100644
--- a/libmultipath/discovery.c
+++ b/libmultipath/discovery.c
@@ -901,6 +901,11 @@ sysfs_set_scsi_tmo (struct config *conf, struct multipath *mpp)
 		uint64_t no_path_retry_tmo =
 			(uint64_t)mpp->no_path_retry * conf->checkint;
 
+		/* pad no_path_retry_tmo by one standard check interval
+		 * plus one second per minute to account for timing
+		 * issues with the rechecks */
+		no_path_retry_tmo += no_path_retry_tmo / 60 + DEFAULT_CHECKINT;
+
 		if (no_path_retry_tmo > MAX_DEV_LOSS_TMO)
 			min_dev_loss = MAX_DEV_LOSS_TMO;
 		else
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 5/5] libmultipath: fix deferred_remove function arguments
  2024-04-25 23:35 [PATCH v2 0/5] multipath: fix hang in flush_map_nopaths Benjamin Marzinski
                   ` (3 preceding siblings ...)
  2024-04-25 23:35 ` [PATCH v2 4/5] libmultipath: pad dev_loss_tmo to avoid race with no_path_retry Benjamin Marzinski
@ 2024-04-25 23:35 ` Benjamin Marzinski
  2024-05-02 15:52   ` Martin Wilck
  4 siblings, 1 reply; 14+ messages in thread
From: Benjamin Marzinski @ 2024-04-25 23:35 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

Aside from one version of dm_flush_map_nopaths(), all the callers of
_dm_flush_map() and dm_simplecmd() set deferred_remove to 0. But these
functions, as well as some helper functions they called, all treated the
deferred_remove argument as an enum deferred_remove_states, and called
do_deferred() to see if the remove should be deferred. To simplify the
code, make these functions treat deferred_remove as a boolean value
signifying whether a remove is deferred, and make dm_flush_map_nopaths()
do the work of checking if the remove should be deferred.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/devmapper.c          | 19 +++++++++----------
 libmultipath/libmultipath.version |  2 +-
 2 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/libmultipath/devmapper.c b/libmultipath/devmapper.c
index a87abf7e..2e7b2c64 100644
--- a/libmultipath/devmapper.c
+++ b/libmultipath/devmapper.c
@@ -386,8 +386,6 @@ libmp_dm_task_create(int task)
 	return dm_task_create(task);
 }
 
-#define do_deferred(x) ((x) == DEFERRED_REMOVE_ON || (x) == DEFERRED_REMOVE_IN_PROGRESS)
-
 static int
 dm_simplecmd (int task, const char *name, int no_flush, int need_sync,
 	      uint16_t udev_flags, int deferred_remove __DR_UNUSED__) {
@@ -411,7 +409,7 @@ dm_simplecmd (int task, const char *name, int no_flush, int need_sync,
 		dm_task_no_flush(dmt);		/* for DM_DEVICE_SUSPEND/RESUME */
 #endif
 #ifdef LIBDM_API_DEFERRED
-	if (do_deferred(deferred_remove))
+	if (deferred_remove)
 		dm_task_deferred_remove(dmt);
 #endif
 	if (udev_wait_flag &&
@@ -1082,7 +1080,7 @@ int _dm_flush_map (const char * mapname, int need_sync, int deferred_remove,
 
 	/* If you aren't doing a deferred remove, make sure that no
 	 * devices are in use */
-	if (!do_deferred(deferred_remove) && partmap_in_use(mapname, NULL))
+	if (!deferred_remove && partmap_in_use(mapname, NULL))
 			return DM_FLUSH_BUSY;
 
 	if (need_suspend &&
@@ -1100,7 +1098,7 @@ int _dm_flush_map (const char * mapname, int need_sync, int deferred_remove,
 	if ((r = dm_remove_partmaps(mapname, need_sync, deferred_remove)))
 		return r;
 
-	if (!do_deferred(deferred_remove) && dm_get_opencount(mapname)) {
+	if (!deferred_remove && dm_get_opencount(mapname)) {
 		condlog(2, "%s: map in use", mapname);
 		return DM_FLUSH_BUSY;
 	}
@@ -1112,8 +1110,7 @@ int _dm_flush_map (const char * mapname, int need_sync, int deferred_remove,
 		r = dm_device_remove(mapname, need_sync, deferred_remove);
 
 		if (r) {
-			if (do_deferred(deferred_remove)
-			    && dm_map_present(mapname)) {
+			if (deferred_remove && dm_map_present(mapname)) {
 				condlog(4, "multipath map %s remove deferred",
 					mapname);
 				return DM_FLUSH_DEFERRED;
@@ -1147,7 +1144,10 @@ int _dm_flush_map (const char * mapname, int need_sync, int deferred_remove,
 int
 dm_flush_map_nopaths(const char * mapname, int deferred_remove)
 {
-	return _dm_flush_map(mapname, 1, deferred_remove, 0, 0);
+	return _dm_flush_map(mapname, 1,
+			     (deferred_remove == DEFERRED_REMOVE_ON ||
+			      deferred_remove == DEFERRED_REMOVE_IN_PROGRESS),
+			     0, 0);
 }
 
 #else
@@ -1539,8 +1539,7 @@ remove_partmap(const char *name, void *data)
 
 	if (dm_get_opencount(name)) {
 		dm_remove_partmaps(name, rd->need_sync, rd->deferred_remove);
-		if (!do_deferred(rd->deferred_remove) &&
-		    dm_get_opencount(name)) {
+		if (rd->deferred_remove && dm_get_opencount(name)) {
 			condlog(2, "%s: map in use", name);
 			return DM_FLUSH_BUSY;
 		}
diff --git a/libmultipath/libmultipath.version b/libmultipath/libmultipath.version
index e070f296..eb511749 100644
--- a/libmultipath/libmultipath.version
+++ b/libmultipath/libmultipath.version
@@ -43,7 +43,7 @@ LIBMPATHCOMMON_1.0.0 {
 	put_multipath_config;
 };
 
-LIBMULTIPATH_23.0.0 {
+LIBMULTIPATH_24.0.0 {
 global:
 	/* symbols referenced by multipath and multipathd */
 	add_foreign;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/5] libmultipath: change flush_on_last_del to fix a multipathd hang
  2024-04-25 23:35 ` [PATCH v2 2/5] libmultipath: change flush_on_last_del to fix a multipathd hang Benjamin Marzinski
@ 2024-04-30 17:06   ` Martin Wilck
  2024-04-30 21:29     ` Benjamin Marzinski
  2024-05-02 15:06   ` Martin Wilck
  1 sibling, 1 reply; 14+ messages in thread
From: Martin Wilck @ 2024-04-30 17:06 UTC (permalink / raw)
  To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development

[-- Attachment #1: Type: text/plain, Size: 1495 bytes --]

On Thu, 2024-04-25 at 19:35 -0400, Benjamin Marzinski wrote:
> 
> 1. create a multipath device with a kpartx partition on top of it and
> no_path_retry set to either "queue" or something long enough to run
> all
> the commands in the reproducer before it disables queueing.
> 2. disable all the paths to the device with something like:
>  # echo offline > /sys/block/<path_dev>/device/state
> 3. Write directly to the multipath device with something like:
>  # dd if=/dev/zero of=/dev/mapper/<mpath_dev> bs=4K count=1
> 4. delete all the paths to the device with something like:
>  # echo 1 > /sys/block/<path_dev>/device/delete

I've tried to reproduce the issue with these commands. Test system was
using a LIO iSCSI target with 2 paths. I created a  test script
(attached) to try the offline / IO / delete procedure repeatedly.
I haven't been able to make multipathd hang even once.

I also played around with dd options. If I use oflag=sync or
oflag=direct, the dd command itself hangs.

Did I set up anything wrongly, or does the behavior perhaps depend on
the kernel, or something else perhaps? Mine was a 6.4 kernel. This is
not to say there's something wrong with your patch, but I'd like to
understand the error situation better, as it doesn't seem to be
trigger-able on my test system.

multipath.conf:

defaults {
	verbosity 3
	flush_on_last_del yes
}

blacklist {
	wwid QEMU
}

overrides {
	no_path_retry queue
}

Regards,
Martin




[-- Attachment #2: flush-0-paths.sh --]
[-- Type: application/x-shellscript, Size: 988 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/5] libmultipath: change flush_on_last_del to fix a multipathd hang
  2024-04-30 17:06   ` Martin Wilck
@ 2024-04-30 21:29     ` Benjamin Marzinski
  2024-05-02  8:36       ` Martin Wilck
  0 siblings, 1 reply; 14+ messages in thread
From: Benjamin Marzinski @ 2024-04-30 21:29 UTC (permalink / raw)
  To: Martin Wilck; +Cc: Christophe Varoqui, device-mapper development

On Tue, Apr 30, 2024 at 07:06:24PM +0200, Martin Wilck wrote:
> On Thu, 2024-04-25 at 19:35 -0400, Benjamin Marzinski wrote:
> > 
> > 1. create a multipath device with a kpartx partition on top of it and
> > no_path_retry set to either "queue" or something long enough to run
> > all
> > the commands in the reproducer before it disables queueing.
> > 2. disable all the paths to the device with something like:
> >  # echo offline > /sys/block/<path_dev>/device/state
> > 3. Write directly to the multipath device with something like:
> >  # dd if=/dev/zero of=/dev/mapper/<mpath_dev> bs=4K count=1
> > 4. delete all the paths to the device with something like:
> >  # echo 1 > /sys/block/<path_dev>/device/delete
> 
> I've tried to reproduce the issue with these commands. Test system was
> using a LIO iSCSI target with 2 paths. I created a  test script
> (attached) to try the offline / IO / delete procedure repeatedly.
> I haven't been able to make multipathd hang even once.
> 
> I also played around with dd options. If I use oflag=sync or
> oflag=direct, the dd command itself hangs.
> 
> Did I set up anything wrongly, or does the behavior perhaps depend on
> the kernel, or something else perhaps? Mine was a 6.4 kernel. This is
> not to say there's something wrong with your patch, but I'd like to
> understand the error situation better, as it doesn't seem to be
> trigger-able on my test system.
> 
> multipath.conf:
> 
> defaults {
> 	verbosity 3
> 	flush_on_last_del yes

If you set flush_on_last_del to "yes", then you won't be able to hit
this, because you will never be queueing when multipathd tries to
autoremove the device. The goal of my patch was to make sure multipathd
never hung on an autoremove, regardless of the no_path_retry setting and
the flush_on_last_del setting.

With "always", the device will always have queueing disabled, so the
device can be safely removed.

With "unused", if the device is unused, queuing is disabled. Otherwise,
multipathd will skip the autoremove if the device is queueing.

With "never", multipathd will skip the autoremove if the device is
queueing.

Your script looks fine, but with a system set up to hit it, the bug
should occur every time.

-Ben

> }
> 
> blacklist {
> 	wwid QEMU
> }
> 
> overrides {
> 	no_path_retry queue
> }
> 
> Regards,
> Martin
> 
> 
> 



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/5] libmultipath: change flush_on_last_del to fix a multipathd hang
  2024-04-30 21:29     ` Benjamin Marzinski
@ 2024-05-02  8:36       ` Martin Wilck
  0 siblings, 0 replies; 14+ messages in thread
From: Martin Wilck @ 2024-05-02  8:36 UTC (permalink / raw)
  To: Benjamin Marzinski; +Cc: Christophe Varoqui, device-mapper development

On Tue, 2024-04-30 at 17:29 -0400, Benjamin Marzinski wrote:
> On Tue, Apr 30, 2024 at 07:06:24PM +0200, Martin Wilck wrote:
> > On Thu, 2024-04-25 at 19:35 -0400, Benjamin Marzinski wrote:
> > > 
> > > 1. create a multipath device with a kpartx partition on top of it
> > > and
> > > no_path_retry set to either "queue" or something long enough to
> > > run
> > > all
> > > the commands in the reproducer before it disables queueing.
> > > 2. disable all the paths to the device with something like:
> > >  # echo offline > /sys/block/<path_dev>/device/state
> > > 3. Write directly to the multipath device with something like:
> > >  # dd if=/dev/zero of=/dev/mapper/<mpath_dev> bs=4K count=1
> > > 4. delete all the paths to the device with something like:
> > >  # echo 1 > /sys/block/<path_dev>/device/delete
> > 
> > I've tried to reproduce the issue with these commands. Test system
> > was
> > using a LIO iSCSI target with 2 paths. I created a  test script
> > (attached) to try the offline / IO / delete procedure repeatedly.
> > I haven't been able to make multipathd hang even once.
> > 
> > I also played around with dd options. If I use oflag=sync or
> > oflag=direct, the dd command itself hangs.
> > 
> > Did I set up anything wrongly, or does the behavior perhaps depend
> > on
> > the kernel, or something else perhaps? Mine was a 6.4 kernel. This
> > is
> > not to say there's something wrong with your patch, but I'd like to
> > understand the error situation better, as it doesn't seem to be
> > trigger-able on my test system.
> > 
> > multipath.conf:
> > 
> > defaults {
> > 	verbosity 3
> > 	flush_on_last_del yes
> 
> If you set flush_on_last_del to "yes", then you won't be able to hit
> this, because you will never be queueing when multipathd tries to
> autoremove the device. The goal of my patch was to make sure
> multipathd
> never hung on an autoremove, regardless of the no_path_retry setting
> and
> the flush_on_last_del setting

Stupid me. To my excuse, I'd set "flush_on_last_del yes" because I
previously had been unable to reproduce the multipathd hang with the
default setting "flush_on_last_del no", and thought I'd misunderstood
something about flush_on_last_del. But I'd made some other mistake  at
that point, apparently, which caused the issue not to reproduce.

I just set "flush_on_last_del no" and indeed reproduced the issue with
my script, immediately.

Thanks,
Martin


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/5] libmultipath: change flush_on_last_del to fix a multipathd hang
  2024-04-25 23:35 ` [PATCH v2 2/5] libmultipath: change flush_on_last_del to fix a multipathd hang Benjamin Marzinski
  2024-04-30 17:06   ` Martin Wilck
@ 2024-05-02 15:06   ` Martin Wilck
  1 sibling, 0 replies; 14+ messages in thread
From: Martin Wilck @ 2024-05-02 15:06 UTC (permalink / raw)
  To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development

On Thu, 2024-04-25 at 19:35 -0400, Benjamin Marzinski wrote:
> Commit 9bd3482e ("multipathd: make flush_map() delete maps like the
> multipath command") fixed an issue where deleting a queueing
> multipath
> device through multipathd could hang because the multipath device had
> outstanding IO, even though the only openers of it at the time of
> deletion were the kpartx partition devices. However it is still
> possible
> to hang multipathd, because autoremoving the device when all paths
> have
> been deleted doesn't disable queueing. To reproduce this hang:
> 
> 1. create a multipath device with a kpartx partition on top of it and
> no_path_retry set to either "queue" or something long enough to run
> all
> the commands in the reproducer before it disables queueing.
> 2. disable all the paths to the device with something like:
>  # echo offline > /sys/block/<path_dev>/device/state
> 3. Write directly to the multipath device with something like:
>  # dd if=/dev/zero of=/dev/mapper/<mpath_dev> bs=4K count=1
> 4. delete all the paths to the device with something like:
>  # echo 1 > /sys/block/<path_dev>/device/delete
> 
> Multipathd will hang trying to delete the kpartx device because, as
> the
> last opener, it must flush the multipath device before closing it.
> Because it hangs holding the vecs_lock, multipathd will never disable
> queueing on the device, so it will hang forever, even if
> no_path_retry
> is set to a number.
> 
> This hang can occur, even if deferred_remove is set. Since nothing
> has
> the kpartx device opened, device-mapper does an immediate remove,
> which
> will still hang. This means that even if deferred_remove is set,
> multipathd still cannot delete a map while queueing is enabled. It
> must
> either disable queueing or skip the autoremove.
> 
> Mulitpath can currently be configured to avoid this hang by setting
> 
> flush_on_last_del yes
> 
> However there are good reasons why users wouldn't want to set that.
> They
> may need to be able to survive having all paths getting temporarily
> deleted.  I should note that this is a pretty rare corner case, since
> multipath automatically sets dev_loss_tmo so that it should not
> trigger
> before queueing is disabled.
> 
> This commit avoids the hang by changing the possible values for
> flush_on_last_del to "never", "unused", and "always", and sets the
> default to "unused".  "always" works like "yes" did, "never" will not
> disable queueing, and "unused" will only disable queueing if nothing
> has
> the kpartx devices or the multipath device open. In order to be safe,
> if
> the device has queue_if_no_paths set (and in case of "unused", the
> device is in-use) the autoremove will be skipped. Also, instead of
> just
> trusting the lack of "queue_if_no_paths" in the current mpp-
> >features,
> multipathd will tell the kernel to disable queueing, just to be sure
> it
> actually is.
> 
> I chose "unused" as the default because this should generally only
> cause
> multipathd to work differently from the users perspective when
> nothing
> has the multipath device open but it is queueing and there is
> outstanding IO. Without this change, it would have hung instead of
> failing the outstanding IO. However, I do understand that an argument
> could be made that "never" makes more sense as default, even though
> it
> will cause multipathd to skip autoremoves in cases where it wouldn't
> before. The change to the behavior of deffered_remove will be
> noticeable, but skipping an autoremove is much better than hanging.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
>  libmultipath/defaults.h       |  2 +-
>  libmultipath/dict.c           | 72 +++++++++++++++++++++++++++++++--
> --
>  libmultipath/dict.h           |  1 +
>  libmultipath/hwtable.c        |  6 +--
>  libmultipath/propsel.c        |  4 +-
>  libmultipath/structs.h        |  7 ++--
>  multipath/multipath.conf.5.in | 20 +++++++---
>  multipathd/main.c             | 39 +++++++++++++++----
>  8 files changed, 122 insertions(+), 29 deletions(-)
> 
> diff --git a/libmultipath/defaults.h b/libmultipath/defaults.h
> index 64b633f2..ed08c251 100644

The spell checker noted a typo ("parition") which I'm going to amend.

Reviewed-by: Martin Wilck <mwilck@suse.com>



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 4/5] libmultipath: pad dev_loss_tmo to avoid race with no_path_retry
  2024-04-25 23:35 ` [PATCH v2 4/5] libmultipath: pad dev_loss_tmo to avoid race with no_path_retry Benjamin Marzinski
@ 2024-05-02 15:14   ` Martin Wilck
  2024-05-02 16:05     ` Benjamin Marzinski
  0 siblings, 1 reply; 14+ messages in thread
From: Martin Wilck @ 2024-05-02 15:14 UTC (permalink / raw)
  To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development

On Thu, 2024-04-25 at 19:35 -0400, Benjamin Marzinski wrote:
> Currently multipath makes sure that dev_loss_tmo is at least as long
> as
> the configured no path queueing time. The goal is to make sure that
> path
> devices aren't removed while the multipath device is still queueing
> in
> hopes that they will become usable again.
> 
> This is racy. Multipathd may take longer to check the paths than
> configured. If strict_timing isn't set, it will definitely take
> longer.
> To account for this, pad the minimum dev_loss_tmo value by five
> seconds
> (one default checker interval) plus one second per minute of no path
> queueing time.

What's the rationale for the "one second per minute" part?
It feels kind of arbitrary to me, and I don't find it obvious that we
need larger padding for larger timeouts.

Regards
Martin


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 5/5] libmultipath: fix deferred_remove function arguments
  2024-04-25 23:35 ` [PATCH v2 5/5] libmultipath: fix deferred_remove function arguments Benjamin Marzinski
@ 2024-05-02 15:52   ` Martin Wilck
  0 siblings, 0 replies; 14+ messages in thread
From: Martin Wilck @ 2024-05-02 15:52 UTC (permalink / raw)
  To: Benjamin Marzinski, Christophe Varoqui; +Cc: device-mapper development

On Thu, 2024-04-25 at 19:35 -0400, Benjamin Marzinski wrote:
> Aside from one version of dm_flush_map_nopaths(), all the callers of
> _dm_flush_map() and dm_simplecmd() set deferred_remove to 0. But
> these
> functions, as well as some helper functions they called, all treated
> the
> deferred_remove argument as an enum deferred_remove_states, and
> called
> do_deferred() to see if the remove should be deferred. To simplify
> the
> code, make these functions treat deferred_remove as a boolean value
> signifying whether a remove is deferred, and make
> dm_flush_map_nopaths()
> do the work of checking if the remove should be deferred.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>

While correct in general, this patch doesn't make me quite happy yet.
The argument type of _dm_flush_map() and elsewhere should at least be
changed to "bool" to express the semantics more clearly.

But I have something more intrusive in mind. I'll post a patch on top
of your series.

Thus:

Reviewed-by: Martin Wilck <mwilck@suse.com>


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 4/5] libmultipath: pad dev_loss_tmo to avoid race with no_path_retry
  2024-05-02 15:14   ` Martin Wilck
@ 2024-05-02 16:05     ` Benjamin Marzinski
  2024-05-02 18:44       ` Martin Wilck
  0 siblings, 1 reply; 14+ messages in thread
From: Benjamin Marzinski @ 2024-05-02 16:05 UTC (permalink / raw)
  To: Martin Wilck; +Cc: Christophe Varoqui, device-mapper development

On Thu, May 02, 2024 at 05:14:56PM +0200, Martin Wilck wrote:
> On Thu, 2024-04-25 at 19:35 -0400, Benjamin Marzinski wrote:
> > Currently multipath makes sure that dev_loss_tmo is at least as long
> > as
> > the configured no path queueing time. The goal is to make sure that
> > path
> > devices aren't removed while the multipath device is still queueing
> > in
> > hopes that they will become usable again.
> > 
> > This is racy. Multipathd may take longer to check the paths than
> > configured. If strict_timing isn't set, it will definitely take
> > longer.
> > To account for this, pad the minimum dev_loss_tmo value by five
> > seconds
> > (one default checker interval) plus one second per minute of no path
> > queueing time.
> 
> What's the rationale for the "one second per minute" part?
> It feels kind of arbitrary to me, and I don't find it obvious that we
> need larger padding for larger timeouts.

The way the checker works, without strict_timing, we do all the checking
work, and then we wait for a second. Obviously, the checking work takes
time, so every nominal one second checker loop will in reality take
longer than a second. The larger you have configured (no_path_retry *
checkint), the further away the actual time when we disable queueing
will be from (no_path_retry * checkint) seconds.

With strict_timing on, things work better. As long as the checking work
doesn't take more than a second, you will stay fairly close to one
second per loop, but even still, it can be longer. nanosleep() will
sleep till at least the time specified. If multipathd is not running as
a realtime process, it will usually sleep for longer.

Also, if the checker work takes longer than a second, then strict_timing
will make sure that the loop takes (close to) exactly 2 (or whatever the
next number is) seconds. However, retry_count_tick() still only ticks
down by 1 for each loop. This means that the real time before queueing
is disabled can diverge from (no_path_retry * checkint). And again, the
longer you've configured it to wait, the more it diverges.

I thought about passing ticks to retry_count_tick(), so that it would
correctly deal with these multi-tick loops, but this can cause it to
diverge from the actual number of retries attempted, in a way that would
mean we disable queueing before the desired number of retries have
occurred, which is a bigger problem, IMHO.  No matter how long the
checker work takes, you will only ever do one path check per loop. In
the worse case, say that the checker work (or anything else in
multipathd) gets stuck for a couple of minutes. You would only do one
path check, but hundreds of ticks would be passed to the
retry_count_tick(), meaning that you could disable queueing after only
one retry. Now you could limit the number of ticks you passed to
retry_count_tick(), but even still, if the a path was 1 tick away from
being checked, and the checker work took 3 seconds, you would still pass
3 ticks to retry_count_tick(), which would cause it to start to diverge
from the actual time it takes for the path to do the configured amount
of retries.

We could improve retry_count_tick(), so that it actually tracks the
number of retries you did, but that still leaves the problems when
either strict_timing isn't on or multipathd isn't a realtime process.
It seemed easier to just say, the longer you have configured multipathd
to wait to disable queueing, the more the time when it actually disables
queueing can diverge from the configured time.

The number I picked for the rate of divergence is just a guest, and if
you don't like it, I'm fine with removing it. After all, the whole point
of this patchset is to make sure that the worst thing that can happen if
paths disappear while we are still queueing is that a multipath device
isn't autoremoved.

-Ben

> 
> Regards
> Martin


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 4/5] libmultipath: pad dev_loss_tmo to avoid race with no_path_retry
  2024-05-02 16:05     ` Benjamin Marzinski
@ 2024-05-02 18:44       ` Martin Wilck
  0 siblings, 0 replies; 14+ messages in thread
From: Martin Wilck @ 2024-05-02 18:44 UTC (permalink / raw)
  To: Benjamin Marzinski; +Cc: Christophe Varoqui, device-mapper development

On Thu, 2024-05-02 at 12:05 -0400, Benjamin Marzinski wrote:
> On Thu, May 02, 2024 at 05:14:56PM +0200, Martin Wilck wrote:
> > On Thu, 2024-04-25 at 19:35 -0400, Benjamin Marzinski wrote:
> > > Currently multipath makes sure that dev_loss_tmo is at least as
> > > long
> > > as
> > > the configured no path queueing time. The goal is to make sure
> > > that
> > > path
> > > devices aren't removed while the multipath device is still
> > > queueing
> > > in
> > > hopes that they will become usable again.
> > > 
> > > This is racy. Multipathd may take longer to check the paths than
> > > configured. If strict_timing isn't set, it will definitely take
> > > longer.
> > > To account for this, pad the minimum dev_loss_tmo value by five
> > > seconds
> > > (one default checker interval) plus one second per minute of no
> > > path
> > > queueing time.
> > 
> > What's the rationale for the "one second per minute" part?
> > It feels kind of arbitrary to me, and I don't find it obvious that
> > we
> > need larger padding for larger timeouts.
> 
> The way the checker works, without strict_timing, we do all the
> checking
> work, and then we wait for a second. Obviously, the checking work
> takes
> time, so every nominal one second checker loop will in reality take
> longer than a second. The larger you have configured (no_path_retry *
> checkint), the further away the actual time when we disable queueing
> will be from (no_path_retry * checkint) seconds.
> 
> With strict_timing on, things work better. As long as the checking
> work
> doesn't take more than a second, you will stay fairly close to one
> second per loop, but even still, it can be longer. nanosleep() will
> sleep till at least the time specified. If multipathd is not running
> as
> a realtime process, it will usually sleep for longer.
> 
> Also, if the checker work takes longer than a second, then
> strict_timing
> will make sure that the loop takes (close to) exactly 2 (or whatever
> the
> next number is) seconds. However, retry_count_tick() still only ticks
> down by 1 for each loop. This means that the real time before
> queueing
> is disabled can diverge from (no_path_retry * checkint). And again,
> the
> longer you've configured it to wait, the more it diverges.
> 
> I thought about passing ticks to retry_count_tick(), so that it would
> correctly deal with these multi-tick loops, but this can cause it to
> diverge from the actual number of retries attempted, in a way that
> would
> mean we disable queueing before the desired number of retries have
> occurred, which is a bigger problem, IMHO.  No matter how long the
> checker work takes, you will only ever do one path check per loop. In
> the worse case, say that the checker work (or anything else in
> multipathd) gets stuck for a couple of minutes. You would only do one
> path check, but hundreds of ticks would be passed to the
> retry_count_tick(), meaning that you could disable queueing after
> only
> one retry. Now you could limit the number of ticks you passed to
> retry_count_tick(), but even still, if the a path was 1 tick away
> from
> being checked, and the checker work took 3 seconds, you would still
> pass
> 3 ticks to retry_count_tick(), which would cause it to start to
> diverge
> from the actual time it takes for the path to do the configured
> amount
> of retries.
> 
> We could improve retry_count_tick(), so that it actually tracks the
> number of retries you did, but that still leaves the problems when
> either strict_timing isn't on or multipathd isn't a realtime process.
> It seemed easier to just say, the longer you have configured
> multipathd
> to wait to disable queueing, the more the time when it actually
> disables
> queueing can diverge from the configured time.
> 
> The number I picked for the rate of divergence is just a guest, and
> if
> you don't like it, I'm fine with removing it. After all, the whole
> point
> of this patchset is to make sure that the worst thing that can happen
> if
> paths disappear while we are still queueing is that a multipath
> device
> isn't autoremoved.

I was just asking for a rationale. This is fine.

I think we can, and should, improve multipathd's timing. Unfortunately
this has been on my agenda for a long time already.

For 4/5, too:

Reviewed-by: Martin Wilck <mwilck@suse.com>


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2024-05-02 18:44 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-25 23:35 [PATCH v2 0/5] multipath: fix hang in flush_map_nopaths Benjamin Marzinski
2024-04-25 23:35 ` [PATCH v2 1/5] libmultipath: export partmap_in_use Benjamin Marzinski
2024-04-25 23:35 ` [PATCH v2 2/5] libmultipath: change flush_on_last_del to fix a multipathd hang Benjamin Marzinski
2024-04-30 17:06   ` Martin Wilck
2024-04-30 21:29     ` Benjamin Marzinski
2024-05-02  8:36       ` Martin Wilck
2024-05-02 15:06   ` Martin Wilck
2024-04-25 23:35 ` [PATCH v2 3/5] libmultipath: remove redundant config option from InfiniBox config Benjamin Marzinski
2024-04-25 23:35 ` [PATCH v2 4/5] libmultipath: pad dev_loss_tmo to avoid race with no_path_retry Benjamin Marzinski
2024-05-02 15:14   ` Martin Wilck
2024-05-02 16:05     ` Benjamin Marzinski
2024-05-02 18:44       ` Martin Wilck
2024-04-25 23:35 ` [PATCH v2 5/5] libmultipath: fix deferred_remove function arguments Benjamin Marzinski
2024-05-02 15:52   ` Martin Wilck

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).