All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/6] multipath: aio and systemd service improvements
@ 2023-10-24 16:49 mwilck
  2023-10-24 16:49 ` [PATCH 1/6] libmultipath: reduce log level of directio messages mwilck
                   ` (5 more replies)
  0 siblings, 6 replies; 18+ messages in thread
From: mwilck @ 2023-10-24 16:49 UTC (permalink / raw)
  To: Christophe Varoqui, Benjamin Marzinski
  Cc: dm-devel, Miao Guanqin, Li Xiao Keng, Martin Wilck

From: Martin Wilck <mwilck@suse.com>

The bulk of this patch set comes out of my attempts to solve
https://github.com/opensvc/multipath-tools/issues/73 and the insight
that aio memory can't be freed unless either io_getevents() returns
the iocb in question as finished, or io_destroy() has returned.
aio handling in multipathd needs more work, because io_destroy()
may block for a long time, and we must avoid to block, but I would
like to postpone that to a later patch set (I plan to add a generic
aio framework to libmultipath which could then be used by both
the directio checker and the io_err_stat code, but this isn't finished yet).

The last two patches are improvements to multipathd's systemd service and
kernel module loading logic. They are unrelated to the others.

Martin Wilck (6):
  libmultipath: reduce log level of directio messages
  libmultipath: directio: don't reset ct->running after io_cancel()
  libmultipath: io_err_stat: don't free aio memory before completion
  libmultipath: io_err_stat: call io_destroy() before
    free_io_err_pathvec()
  multipath-tools: systemd: require modprobe@dm_multipath.service
  libmpathutil: remove systemd_service_enabled()

 libmpathutil/libmpathutil.version | 17 +++------
 libmpathutil/util.c               | 58 -------------------------------
 libmpathutil/util.h               |  1 -
 libmultipath/checkers/directio.c  | 21 ++++-------
 libmultipath/io_err_stat.c        | 28 +++++++++------
 libmultipath/valid.c              | 16 ++-------
 multipath/Makefile                |  2 --
 multipath/modules-load.conf       |  3 --
 multipathd/multipathd.service     |  2 ++
 tests/directio.c                  | 10 +-----
 tests/valid.c                     | 24 ++-----------
 11 files changed, 35 insertions(+), 147 deletions(-)
 delete mode 100644 multipath/modules-load.conf

-- 
2.42.0


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/6] libmultipath: reduce log level of directio messages
  2023-10-24 16:49 [PATCH 0/6] multipath: aio and systemd service improvements mwilck
@ 2023-10-24 16:49 ` mwilck
  2023-10-25 23:10   ` Benjamin Marzinski
  2023-10-24 16:49 ` [PATCH 2/6] libmultipath: directio: don't reset ct->running after io_cancel() mwilck
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: mwilck @ 2023-10-24 16:49 UTC (permalink / raw)
  To: Christophe Varoqui, Benjamin Marzinski
  Cc: dm-devel, Miao Guanqin, Li Xiao Keng, Martin Wilck

From: Martin Wilck <mwilck@suse.com>

The directio checker logs too much at verbosity 3.

Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 libmultipath/checkers/directio.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/libmultipath/checkers/directio.c b/libmultipath/checkers/directio.c
index 2f3ece0..25b2383 100644
--- a/libmultipath/checkers/directio.c
+++ b/libmultipath/checkers/directio.c
@@ -266,7 +266,7 @@ get_events(struct aio_group *aio_grp, struct timespec *timeout)
 		for (i = 0; i < nr; i++) {
 			struct async_req *req = container_of(events[i].obj, struct async_req, io);
 
-			LOG(3, "io finished %lu/%lu", events[i].res,
+			LOG(4, "io finished %lu/%lu", events[i].res,
 			    events[i].res2);
 
 			/* got an orphaned request */
@@ -283,7 +283,7 @@ get_events(struct aio_group *aio_grp, struct timespec *timeout)
 	} while (nr == 128); /* assume there are more events and try again */
 
 	if (nr < 0)
-		LOG(3, "async io getevents returned %i (errno=%s)",
+		LOG(4, "async io getevents returned %i (errno=%s)",
 		    nr, strerror(errno));
 
 	return got_events;
@@ -315,7 +315,7 @@ check_state(int fd, struct directio_context *ct, int sync, int timeout_secs)
 	} else {
 		struct iocb *ios[1] = { &ct->req->io };
 
-		LOG(3, "starting new request");
+		LOG(4, "starting new request");
 		memset(&ct->req->io, 0, sizeof(struct iocb));
 		io_prep_pread(&ct->req->io, fd, ct->req->buf,
 			      ct->req->blksize, 0);
@@ -360,7 +360,7 @@ check_state(int fd, struct directio_context *ct, int sync, int timeout_secs)
 			ct->running = 0;
 		rc = PATH_DOWN;
 	} else {
-		LOG(3, "async io pending");
+		LOG(4, "async io pending");
 		rc = PATH_PENDING;
 	}
 
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/6] libmultipath: directio: don't reset ct->running after io_cancel()
  2023-10-24 16:49 [PATCH 0/6] multipath: aio and systemd service improvements mwilck
  2023-10-24 16:49 ` [PATCH 1/6] libmultipath: reduce log level of directio messages mwilck
@ 2023-10-24 16:49 ` mwilck
  2023-10-25 23:14   ` Benjamin Marzinski
  2023-10-24 16:49 ` [PATCH 3/6] libmultipath: io_err_stat: don't free aio memory before completion mwilck
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: mwilck @ 2023-10-24 16:49 UTC (permalink / raw)
  To: Christophe Varoqui, Benjamin Marzinski
  Cc: dm-devel, Miao Guanqin, Li Xiao Keng, Martin Wilck

From: Martin Wilck <mwilck@suse.com>

io_cancel() never succeeds, and even if it did, io_getevents() must
still be called to reclaim the IO resources from the kernel.
Don't pretend the opposite by resetting ct->running, or freeing the
memory, before io_getevents() has indicated that the request is finished.

Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 libmultipath/checkers/directio.c | 13 ++-----------
 tests/directio.c                 | 10 +---------
 2 files changed, 3 insertions(+), 20 deletions(-)

diff --git a/libmultipath/checkers/directio.c b/libmultipath/checkers/directio.c
index 25b2383..71ce4cb 100644
--- a/libmultipath/checkers/directio.c
+++ b/libmultipath/checkers/directio.c
@@ -216,7 +216,6 @@ out:
 void libcheck_free (struct checker * c)
 {
 	struct directio_context * ct = (struct directio_context *)c->context;
-	struct io_event event;
 	long flags;
 
 	if (!ct)
@@ -232,9 +231,7 @@ void libcheck_free (struct checker * c)
 		}
 	}
 
-	if (ct->running &&
-	    (ct->req->state != PATH_PENDING ||
-	     io_cancel(ct->aio_grp->ioctx, &ct->req->io, &event) == 0))
+	if (ct->running && ct->req->state != PATH_PENDING)
 		ct->running = 0;
 	if (!ct->running) {
 		free(ct->req->buf);
@@ -351,13 +348,7 @@ check_state(int fd, struct directio_context *ct, int sync, int timeout_secs)
 
 		LOG(3, "abort check on timeout");
 
-		r = io_cancel(ct->aio_grp->ioctx, &ct->req->io, &event);
-		/*
-		 * Only reset ct->running if we really
-		 * could abort the pending I/O
-		 */
-		if (!r)
-			ct->running = 0;
+		io_cancel(ct->aio_grp->ioctx, &ct->req->io, &event);
 		rc = PATH_DOWN;
 	} else {
 		LOG(4, "async io pending");
diff --git a/tests/directio.c b/tests/directio.c
index db9643e..b340498 100644
--- a/tests/directio.c
+++ b/tests/directio.c
@@ -501,13 +501,8 @@ static void test_free_with_pending(void **state)
 	check_aio_grp(aio_grp, 2, 0);
         libcheck_free(&c[0]);
 	check_aio_grp(aio_grp, 1, 0);
-        will_return(__wrap_io_cancel, 0);
         libcheck_free(&c[1]);
-#ifdef DIO_TEST_DEV
-	check_aio_grp(aio_grp, 1, 1); /* real cancel doesn't remove request */
-#else
-        check_aio_grp(aio_grp, 0, 0);
-#endif
+	check_aio_grp(aio_grp, 1, 1); /* cancel doesn't remove request */
         do_libcheck_reset(1);
 }
 
@@ -533,7 +528,6 @@ static void test_orphaned_aio_group(void **state)
 	assert_int_equal(i, 1);
 	for (i = 0; i < AIO_GROUP_SIZE; i++) {
 		assert_true(is_checker_running(&c[i]));
-		will_return(__wrap_io_cancel, -1);
 		if (i == AIO_GROUP_SIZE - 1) {
 			/* remove the orphaned group and create a new one */
 			will_return(__wrap_io_destroy, 0);
@@ -637,7 +631,6 @@ static void test_orphan_checker_cleanup(void **state)
 	aio_grp = get_aio_grp(c);
 	return_io_getevents_none();
 	do_check_state(&c[0], 0, 30, PATH_PENDING);
-	will_return(__wrap_io_cancel, -1);
 	check_aio_grp(aio_grp, 2, 0);
 	libcheck_free(&c[0]);
 	check_aio_grp(aio_grp, 2, 1);
@@ -662,7 +655,6 @@ static void test_orphan_reset_cleanup(void **state)
 	orphan_aio_grp = get_aio_grp(&c);
 	return_io_getevents_none();
 	do_check_state(&c, 0, 30, PATH_PENDING);
-	will_return(__wrap_io_cancel, -1);
 	check_aio_grp(orphan_aio_grp, 1, 0);
 	libcheck_free(&c);
 	check_aio_grp(orphan_aio_grp, 1, 1);
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 3/6] libmultipath: io_err_stat: don't free aio memory before completion
  2023-10-24 16:49 [PATCH 0/6] multipath: aio and systemd service improvements mwilck
  2023-10-24 16:49 ` [PATCH 1/6] libmultipath: reduce log level of directio messages mwilck
  2023-10-24 16:49 ` [PATCH 2/6] libmultipath: directio: don't reset ct->running after io_cancel() mwilck
@ 2023-10-24 16:49 ` mwilck
  2023-10-25 23:33   ` Benjamin Marzinski
  2023-10-24 16:49 ` [PATCH 4/6] libmultipath: io_err_stat: call io_destroy() before free_io_err_pathvec() mwilck
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: mwilck @ 2023-10-24 16:49 UTC (permalink / raw)
  To: Christophe Varoqui, Benjamin Marzinski
  Cc: dm-devel, Miao Guanqin, Li Xiao Keng, Martin Wilck

From: Martin Wilck <mwilck@suse.com>

It is wrong to assume that aio data structures can be reused or freed
after io_cancel(). io_cancel() will almost always return -EINPROGRESS,
anyway. Use the io_starttime field to indicate whether an io event
has been completed by the kernel. Make sure no in-flight buffers are freed.

Fixes https://github.com/opensvc/multipath-tools/issues/73.

Signed-off-by: Martin Wilck <mwilck@suse.com>
Cc: Li Xiao Keng <lixiaokeng@huawei.com>
Cc: Miao Guanqin <miaoguanqin@huawei.com>
---
 libmultipath/io_err_stat.c | 26 ++++++++++++++++----------
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/libmultipath/io_err_stat.c b/libmultipath/io_err_stat.c
index dc1c252..c474c34 100644
--- a/libmultipath/io_err_stat.c
+++ b/libmultipath/io_err_stat.c
@@ -111,10 +111,14 @@ static int init_each_dio_ctx(struct dio_ctx *ct, int blksize,
 	return 0;
 }
 
-static void deinit_each_dio_ctx(struct dio_ctx *ct)
+static int deinit_each_dio_ctx(struct dio_ctx *ct)
 {
-	if (ct->buf)
-		free(ct->buf);
+	if (!ct->buf)
+		return 0;
+	if (ct->io_starttime.tv_sec != 0 || ct->io_starttime.tv_nsec != 0)
+		return 1;
+	free(ct->buf);
+	return 0;
 }
 
 static int setup_directio_ctx(struct io_err_stat_path *p)
@@ -164,6 +168,7 @@ fail_close:
 static void free_io_err_stat_path(struct io_err_stat_path *p)
 {
 	int i;
+	int inflight = 0;
 
 	if (!p)
 		return;
@@ -173,8 +178,13 @@ static void free_io_err_stat_path(struct io_err_stat_path *p)
 	cancel_inflight_io(p);
 
 	for (i = 0; i < CONCUR_NR_EVENT; i++)
-		deinit_each_dio_ctx(p->dio_ctx_array + i);
-	free(p->dio_ctx_array);
+		inflight += deinit_each_dio_ctx(p->dio_ctx_array + i);
+
+	if (!inflight)
+		free(p->dio_ctx_array);
+	else
+		io_err_stat_log(2, "%s: can't free aio space of %s, %d IOs in flight",
+				__func__, p->devname, inflight);
 
 	if (p->fd > 0)
 		close(p->fd);
@@ -503,7 +513,7 @@ static int try_to_cancel_timeout_io(struct dio_ctx *ct, struct timespec *t,
 	int		rc = PATH_UNCHECKED;
 	int		r;
 
-	if (ct->io_starttime.tv_sec == 0)
+	if (ct->io_starttime.tv_sec == 0 && ct->io_starttime.tv_nsec == 0)
 		return rc;
 	timespecsub(t, &ct->io_starttime, &difftime);
 	if (difftime.tv_sec > IOTIMEOUT_SEC) {
@@ -514,8 +524,6 @@ static int try_to_cancel_timeout_io(struct dio_ctx *ct, struct timespec *t,
 		if (r)
 			io_err_stat_log(5, "%s: io_cancel error %i",
 					dev, errno);
-		ct->io_starttime.tv_sec = 0;
-		ct->io_starttime.tv_nsec = 0;
 		rc = PATH_TIMEOUT;
 	} else {
 		rc = PATH_PENDING;
@@ -559,8 +567,6 @@ static void cancel_inflight_io(struct io_err_stat_path *pp)
 		if (r)
 			io_err_stat_log(5, "%s: io_cancel error %d, %i",
 					pp->devname, r, errno);
-		ct->io_starttime.tv_sec = 0;
-		ct->io_starttime.tv_nsec = 0;
 	}
 }
 
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 4/6] libmultipath: io_err_stat: call io_destroy() before free_io_err_pathvec()
  2023-10-24 16:49 [PATCH 0/6] multipath: aio and systemd service improvements mwilck
                   ` (2 preceding siblings ...)
  2023-10-24 16:49 ` [PATCH 3/6] libmultipath: io_err_stat: don't free aio memory before completion mwilck
@ 2023-10-24 16:49 ` mwilck
  2023-10-25 23:40   ` Benjamin Marzinski
  2023-10-24 16:49 ` [PATCH 5/6] multipath-tools: systemd: require modprobe@dm_multipath.service mwilck
  2023-10-24 16:49 ` [PATCH 6/6] libmpathutil: remove systemd_service_enabled() mwilck
  5 siblings, 1 reply; 18+ messages in thread
From: mwilck @ 2023-10-24 16:49 UTC (permalink / raw)
  To: Christophe Varoqui, Benjamin Marzinski
  Cc: dm-devel, Miao Guanqin, Li Xiao Keng, Martin Wilck

From: Martin Wilck <mwilck@suse.com>

Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 libmultipath/io_err_stat.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libmultipath/io_err_stat.c b/libmultipath/io_err_stat.c
index c474c34..1982915 100644
--- a/libmultipath/io_err_stat.c
+++ b/libmultipath/io_err_stat.c
@@ -751,6 +751,6 @@ void stop_io_err_stat_thread(void)
 		pthread_cancel(io_err_stat_thr);
 
 	pthread_join(io_err_stat_thr, NULL);
-	free_io_err_pathvec();
 	io_destroy(ioctx);
+	free_io_err_pathvec();
 }
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 5/6] multipath-tools: systemd: require modprobe@dm_multipath.service
  2023-10-24 16:49 [PATCH 0/6] multipath: aio and systemd service improvements mwilck
                   ` (3 preceding siblings ...)
  2023-10-24 16:49 ` [PATCH 4/6] libmultipath: io_err_stat: call io_destroy() before free_io_err_pathvec() mwilck
@ 2023-10-24 16:49 ` mwilck
  2023-10-25 23:43   ` Benjamin Marzinski
  2023-10-24 16:49 ` [PATCH 6/6] libmpathutil: remove systemd_service_enabled() mwilck
  5 siblings, 1 reply; 18+ messages in thread
From: mwilck @ 2023-10-24 16:49 UTC (permalink / raw)
  To: Christophe Varoqui, Benjamin Marzinski
  Cc: dm-devel, Miao Guanqin, Li Xiao Keng, Martin Wilck

From: Martin Wilck <mwilck@suse.com>

Since 92f0893 ("multipath-tools: install modules-load.d/multipath.conf"), we
use modules-load.d to load the dm-multipath module early on. That isn't
optimal, because the module will always be loaded, even if multipathd is
not running.

Use systemd's "modprobe@.service" instead to make sure the module is loaded
before multipathd starts. If "multipath -u" is invoked from udev rules
before multipathd.service has been started, it will access the socket,
which will autostart multipathd via socket activation and cause the
dm-multipath module to be loaded.

Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 multipath/Makefile            | 2 --
 multipath/modules-load.conf   | 3 ---
 multipathd/multipathd.service | 2 ++
 3 files changed, 2 insertions(+), 5 deletions(-)
 delete mode 100644 multipath/modules-load.conf

diff --git a/multipath/Makefile b/multipath/Makefile
index 68cb5ce..a69262b 100644
--- a/multipath/Makefile
+++ b/multipath/Makefile
@@ -28,7 +28,6 @@ install:
 	$(Q)$(INSTALL_PROGRAM) -m 644 11-dm-mpath.rules $(DESTDIR)$(udevrulesdir)
 	$(Q)$(INSTALL_PROGRAM) -m 644 multipath.rules $(DESTDIR)$(udevrulesdir)/56-multipath.rules
 	$(Q)$(INSTALL_PROGRAM) -d $(DESTDIR)$(modulesloaddir)
-	$(Q)$(INSTALL_PROGRAM) -m 644 modules-load.conf $(DESTDIR)$(modulesloaddir)/multipath.conf
 	$(Q)$(INSTALL_PROGRAM) -d $(DESTDIR)$(tmpfilesdir)
 	$(Q)$(INSTALL_PROGRAM) -m 644 tmpfiles.conf $(DESTDIR)$(tmpfilesdir)/multipath.conf
 	$(Q)$(INSTALL_PROGRAM) -d $(DESTDIR)$(mandir)/man8
@@ -44,7 +43,6 @@ endif
 uninstall:
 	$(Q)$(RM) $(DESTDIR)$(bindir)/$(EXEC)
 	$(Q)$(RM) $(DESTDIR)$(udevrulesdir)/11-dm-mpath.rules
-	$(Q)$(RM) $(DESTDIR)$(modulesloaddir)/multipath.conf
 	$(Q)$(RM) $(DESTDIR)$(modulesloaddir)/scsi_dh.conf
 	$(Q)$(RM) $(DESTDIR)$(libudevdir)/rules.d/56-multipath.rules
 	$(Q)$(RM) $(DESTDIR)$(mandir)/man8/$(EXEC).8
diff --git a/multipath/modules-load.conf b/multipath/modules-load.conf
deleted file mode 100644
index b517d32..0000000
--- a/multipath/modules-load.conf
+++ /dev/null
@@ -1,3 +0,0 @@
-# load dm-multipath early, both multipathd and multipath depend on it
-# (note that multipath may be called from udev rules!)
-dm-multipath
diff --git a/multipathd/multipathd.service b/multipathd/multipathd.service
index 5a9cde1..40cf32c 100644
--- a/multipathd/multipathd.service
+++ b/multipathd/multipathd.service
@@ -3,6 +3,8 @@ Description=Device-Mapper Multipath Device Controller
 Before=lvm2-activation-early.service
 Before=local-fs-pre.target blk-availability.service shutdown.target
 Wants=systemd-udevd-kernel.socket
+Requires=modprobe@dm_multipath.service
+After=modprobe@dm_multipath.service
 After=systemd-udevd-kernel.socket
 After=multipathd.socket systemd-remount-fs.service
 Before=initrd-cleanup.service
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 6/6] libmpathutil: remove systemd_service_enabled()
  2023-10-24 16:49 [PATCH 0/6] multipath: aio and systemd service improvements mwilck
                   ` (4 preceding siblings ...)
  2023-10-24 16:49 ` [PATCH 5/6] multipath-tools: systemd: require modprobe@dm_multipath.service mwilck
@ 2023-10-24 16:49 ` mwilck
  2023-10-25 23:58   ` Benjamin Marzinski
  5 siblings, 1 reply; 18+ messages in thread
From: mwilck @ 2023-10-24 16:49 UTC (permalink / raw)
  To: Christophe Varoqui, Benjamin Marzinski
  Cc: dm-devel, Miao Guanqin, Li Xiao Keng, Martin Wilck

From: Martin Wilck <mwilck@suse.com>

Since  c203929 ("multipathd.service: add dependency on
systemd-udevd-kernel.socket"), multipathd will start early during boot.
Moreover, we recommend using socket activation for multipathd,
and if multipathd.socket is enabled, the __mpath_connect()
check will succeed anyway.

The systemd_service_enabled() test was just "good enough" for
standard situations but never robust; it checked for multipathd.wants in
"some" directory, which might or might not be the current target,
and it would return true even of multipathd.service was masked.

Remove this test.

Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 libmpathutil/libmpathutil.version | 17 +++------
 libmpathutil/util.c               | 58 -------------------------------
 libmpathutil/util.h               |  1 -
 libmultipath/valid.c              | 16 ++-------
 tests/valid.c                     | 24 ++-----------
 5 files changed, 9 insertions(+), 107 deletions(-)

diff --git a/libmpathutil/libmpathutil.version b/libmpathutil/libmpathutil.version
index 6ebb718..15ff467 100644
--- a/libmpathutil/libmpathutil.version
+++ b/libmpathutil/libmpathutil.version
@@ -93,12 +93,15 @@ local:
 };
 
 /* symbols referenced internally by libmultipath */
-LIBMPATHUTIL_1.0 {
+LIBMPATHUTIL_2.0 {
 	alloc_bitfield;
 	__append_strbuf_str;
 	append_strbuf_quoted;
 	basenamecpy;
+	cleanup_fd_ptr;
 	cleanup_free_ptr;
+	cleanup_vector_free;
+	cleanup_fclose;
 	filepresent;
 	find_keyword;
 	free_keywords;
@@ -120,21 +123,9 @@ LIBMPATHUTIL_1.0 {
 	snprint_keyword;
 	steal_strbuf_str;
 	strlcat;
-	systemd_service_enabled;
 	validate_config_strvec;
 	vector_find_or_add_slot;
 	vector_insert_slot;
 	vector_move_up;
 	vector_sort;
 };
-
-LIBMPATHUTIL_1.1 {
-global:
-	cleanup_fd_ptr;
-} LIBMPATHUTIL_1.0;
-
-LIBMPATHUTIL_1.2 {
-global:
-	cleanup_vector_free;
-	cleanup_fclose;
-} LIBMPATHUTIL_1.0;
diff --git a/libmpathutil/util.c b/libmpathutil/util.c
index 92f25a5..9d147fc 100644
--- a/libmpathutil/util.c
+++ b/libmpathutil/util.c
@@ -213,64 +213,6 @@ setup_thread_attr(pthread_attr_t *attr, size_t stacksize, int detached)
 	}
 }
 
-int systemd_service_enabled_in(const char *dev, const char *prefix)
-{
-	static const char service[] = "multipathd.service";
-	char path[PATH_MAX], file[PATH_MAX];
-	DIR *dirfd;
-	struct dirent *d;
-	int found = 0;
-
-	if (safe_sprintf(path, "%s/systemd/system", prefix))
-		return 0;
-
-	condlog(3, "%s: checking for %s in %s", dev, service, path);
-
-	dirfd = opendir(path);
-	if (dirfd == NULL)
-		return 0;
-
-	while ((d = readdir(dirfd)) != NULL) {
-		char *p;
-		struct stat stbuf;
-
-		if ((strcmp(d->d_name,".") == 0) ||
-		    (strcmp(d->d_name,"..") == 0))
-			continue;
-
-		if (strlen(d->d_name) < 6)
-			continue;
-
-		p = d->d_name + strlen(d->d_name) - 6;
-		if (strcmp(p, ".wants"))
-			continue;
-		if (!safe_sprintf(file, "%s/%s/%s",
-				  path, d->d_name, service)
-		    && stat(file, &stbuf) == 0) {
-			condlog(3, "%s: found %s", dev, file);
-			found++;
-			break;
-		}
-	}
-	closedir(dirfd);
-
-	return found;
-}
-
-int systemd_service_enabled(const char *dev)
-{
-	int found = 0;
-
-	found = systemd_service_enabled_in(dev, "/etc");
-	if (!found)
-		found = systemd_service_enabled_in(dev, "/usr/lib");
-	if (!found)
-		found = systemd_service_enabled_in(dev, "/lib");
-	if (!found)
-		found = systemd_service_enabled_in(dev, "/run");
-	return found;
-}
-
 static int _linux_version_code;
 static pthread_once_t _lvc_initialized = PTHREAD_ONCE_INIT;
 
diff --git a/libmpathutil/util.h b/libmpathutil/util.h
index 99a471d..de9fcfd 100644
--- a/libmpathutil/util.h
+++ b/libmpathutil/util.h
@@ -21,7 +21,6 @@ size_t strlcat(char * restrict dst, const char * restrict src, size_t size);
 dev_t parse_devt(const char *dev_t);
 char *convert_dev(char *dev, int is_path_device);
 void setup_thread_attr(pthread_attr_t *attr, size_t stacksize, int detached);
-int systemd_service_enabled(const char *dev);
 int get_linux_version_code(void);
 int safe_write(int fd, const void *buf, size_t count);
 void set_max_fds(rlim_t max_fds);
diff --git a/libmultipath/valid.c b/libmultipath/valid.c
index d4dae3e..f223778 100644
--- a/libmultipath/valid.c
+++ b/libmultipath/valid.c
@@ -314,23 +314,11 @@ is_path_valid(const char *name, struct config *conf, struct path *pp,
 		return PATH_IS_VALID_NO_CHECK;
 	}
 
-	/*
-	 * "multipath -u" may be run before the daemon is started. In this
-	 * case, systemd might own the socket but might delay multipathd
-	 * startup until some other unit (udev settle!)  has finished
-	 * starting. With many LUNs, the listen backlog may be exceeded, which
-	 * would cause connect() to block. This causes udev workers calling
-	 * "multipath -u" to hang, and thus creates a deadlock, until "udev
-	 * settle" times out.  To avoid this, call connect() in non-blocking
-	 * mode here, and take EAGAIN as indication for a filled-up systemd
-	 * backlog.
-	 */
-
 	if (check_multipathd) {
 		fd = __mpath_connect(1);
 		if (fd < 0) {
-			if (errno != EAGAIN && !systemd_service_enabled(name)) {
-				condlog(3, "multipathd not running or enabled");
+			if (errno != EAGAIN) {
+				condlog(3, "multipathd not running");
 				return PATH_IS_NOT_VALID;
 			}
 		} else
diff --git a/tests/valid.c b/tests/valid.c
index 7032932..18a5a7b 100644
--- a/tests/valid.c
+++ b/tests/valid.c
@@ -62,11 +62,6 @@ int __wrap___mpath_connect(int nonblocking)
 	return -1;
 }
 
-int __wrap_systemd_service_enabled(const char *dev)
-{
-	return (int)mock_type(bool);
-}
-
 /* There's no point in checking the return value here */
 int __wrap_mpath_disconnect(int fd)
 {
@@ -216,7 +211,6 @@ enum {
 enum {
 	CHECK_MPATHD_RUNNING,
 	CHECK_MPATHD_EAGAIN,
-	CHECK_MPATHD_ENABLED,
 	CHECK_MPATHD_SKIP,
 };
 
@@ -232,11 +226,8 @@ static void setup_passing(char *name, char *wwid, unsigned int check_multipathd,
 	else if (check_multipathd == CHECK_MPATHD_EAGAIN) {
 		will_return(__wrap___mpath_connect, false);
 		will_return(__wrap___mpath_connect, EAGAIN);
-	} else if (check_multipathd == CHECK_MPATHD_ENABLED) {
-		will_return(__wrap___mpath_connect, false);
-		will_return(__wrap___mpath_connect, ECONNREFUSED);
-		will_return(__wrap_systemd_service_enabled, true);
 	}
+
 	/* nothing for CHECK_MPATHD_SKIP */
 	if (stage == STAGE_CHECK_MULTIPATHD)
 		return;
@@ -342,19 +333,10 @@ static void test_check_multipathd(void **state)
 	will_return(__wrap_sysfs_is_multipathed, false);
 	will_return(__wrap___mpath_connect, false);
 	will_return(__wrap___mpath_connect, ECONNREFUSED);
-	will_return(__wrap_systemd_service_enabled, false);
+
 	assert_int_equal(is_path_valid(name, &conf, &pp, true),
 			 PATH_IS_NOT_VALID);
 	assert_string_equal(pp.dev, name);
-	/* test pass because service is enabled. fail getting udev */
-	memset(&pp, 0, sizeof(pp));
-	setup_passing(name, NULL, CHECK_MPATHD_ENABLED, STAGE_CHECK_MULTIPATHD);
-	will_return(__wrap_udev_device_new_from_subsystem_sysname, false);
-	will_return(__wrap_udev_device_new_from_subsystem_sysname,
-		    name);
-	assert_int_equal(is_path_valid(name, &conf, &pp, true),
-			 PATH_IS_ERROR);
-	assert_string_equal(pp.dev, name);
 	/* test pass because connect returned EAGAIN. fail getting udev */
 	setup_passing(name, NULL, CHECK_MPATHD_EAGAIN, STAGE_CHECK_MULTIPATHD);
 	will_return(__wrap_udev_device_new_from_subsystem_sysname, false);
@@ -533,7 +515,7 @@ static void test_check_uuid_present(void **state)
 
 	memset(&pp, 0, sizeof(pp));
 	conf.find_multipaths = FIND_MULTIPATHS_STRICT;
-	setup_passing(name, wwid, CHECK_MPATHD_ENABLED, STAGE_CHECK_WWIDS);
+	setup_passing(name, wwid, CHECK_MPATHD_RUNNING, STAGE_CHECK_WWIDS);
 	will_return(__wrap_dm_map_present_by_uuid, 1);
 	will_return(__wrap_dm_map_present_by_uuid, wwid);
 	assert_int_equal(is_path_valid(name, &conf, &pp, true),
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/6] libmultipath: reduce log level of directio messages
  2023-10-24 16:49 ` [PATCH 1/6] libmultipath: reduce log level of directio messages mwilck
@ 2023-10-25 23:10   ` Benjamin Marzinski
  0 siblings, 0 replies; 18+ messages in thread
From: Benjamin Marzinski @ 2023-10-25 23:10 UTC (permalink / raw)
  To: mwilck; +Cc: Christophe Varoqui, dm-devel, Miao Guanqin, Li Xiao Keng

On Tue, Oct 24, 2023 at 06:49:32PM +0200, mwilck@suse.com wrote:
> From: Martin Wilck <mwilck@suse.com>
> 
> The directio checker logs too much at verbosity 3.
> 
> Signed-off-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
>  libmultipath/checkers/directio.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/libmultipath/checkers/directio.c b/libmultipath/checkers/directio.c
> index 2f3ece0..25b2383 100644
> --- a/libmultipath/checkers/directio.c
> +++ b/libmultipath/checkers/directio.c
> @@ -266,7 +266,7 @@ get_events(struct aio_group *aio_grp, struct timespec *timeout)
>  		for (i = 0; i < nr; i++) {
>  			struct async_req *req = container_of(events[i].obj, struct async_req, io);
>  
> -			LOG(3, "io finished %lu/%lu", events[i].res,
> +			LOG(4, "io finished %lu/%lu", events[i].res,
>  			    events[i].res2);
>  
>  			/* got an orphaned request */
> @@ -283,7 +283,7 @@ get_events(struct aio_group *aio_grp, struct timespec *timeout)
>  	} while (nr == 128); /* assume there are more events and try again */
>  
>  	if (nr < 0)
> -		LOG(3, "async io getevents returned %i (errno=%s)",
> +		LOG(4, "async io getevents returned %i (errno=%s)",
>  		    nr, strerror(errno));
>  
>  	return got_events;
> @@ -315,7 +315,7 @@ check_state(int fd, struct directio_context *ct, int sync, int timeout_secs)
>  	} else {
>  		struct iocb *ios[1] = { &ct->req->io };
>  
> -		LOG(3, "starting new request");
> +		LOG(4, "starting new request");
>  		memset(&ct->req->io, 0, sizeof(struct iocb));
>  		io_prep_pread(&ct->req->io, fd, ct->req->buf,
>  			      ct->req->blksize, 0);
> @@ -360,7 +360,7 @@ check_state(int fd, struct directio_context *ct, int sync, int timeout_secs)
>  			ct->running = 0;
>  		rc = PATH_DOWN;
>  	} else {
> -		LOG(3, "async io pending");
> +		LOG(4, "async io pending");
>  		rc = PATH_PENDING;
>  	}
>  
> -- 
> 2.42.0


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/6] libmultipath: directio: don't reset ct->running after io_cancel()
  2023-10-24 16:49 ` [PATCH 2/6] libmultipath: directio: don't reset ct->running after io_cancel() mwilck
@ 2023-10-25 23:14   ` Benjamin Marzinski
  2023-10-26  5:33     ` Hannes Reinecke
  2023-10-26  8:31     ` Martin Wilck
  0 siblings, 2 replies; 18+ messages in thread
From: Benjamin Marzinski @ 2023-10-25 23:14 UTC (permalink / raw)
  To: mwilck; +Cc: Christophe Varoqui, dm-devel, Miao Guanqin, Li Xiao Keng

On Tue, Oct 24, 2023 at 06:49:33PM +0200, mwilck@suse.com wrote:
> From: Martin Wilck <mwilck@suse.com>
> 
> io_cancel() never succeeds, and even if it did, io_getevents() must
> still be called to reclaim the IO resources from the kernel.
> Don't pretend the opposite by resetting ct->running, or freeing the
> memory, before io_getevents() has indicated that the request is finished.
> 
> Signed-off-by: Martin Wilck <mwilck@suse.com>
> ---
>  libmultipath/checkers/directio.c | 13 ++-----------
>  tests/directio.c                 | 10 +---------
>  2 files changed, 3 insertions(+), 20 deletions(-)
> 
> diff --git a/libmultipath/checkers/directio.c b/libmultipath/checkers/directio.c
> index 25b2383..71ce4cb 100644
> --- a/libmultipath/checkers/directio.c
> +++ b/libmultipath/checkers/directio.c
> @@ -216,7 +216,6 @@ out:
>  void libcheck_free (struct checker * c)
>  {
>  	struct directio_context * ct = (struct directio_context *)c->context;
> -	struct io_event event;
>  	long flags;
>  
>  	if (!ct)
> @@ -232,9 +231,7 @@ void libcheck_free (struct checker * c)
>  		}
>  	}
>  
> -	if (ct->running &&
> -	    (ct->req->state != PATH_PENDING ||
> -	     io_cancel(ct->aio_grp->ioctx, &ct->req->io, &event) == 0))

Does io_cancel() do nothing? If does make the IO likely to end sooner,
even if it always returns failure, then we should probably still run it
when we have an outstanding request that we are going to orphan. We
should just move the io_cancel() to the else block, where we add the
request to the orphans list.

Otherwise, this looks good.
-Ben

> +	if (ct->running && ct->req->state != PATH_PENDING)
>  		ct->running = 0;
>  	if (!ct->running) {
>  		free(ct->req->buf);
> @@ -351,13 +348,7 @@ check_state(int fd, struct directio_context *ct, int sync, int timeout_secs)
>  
>  		LOG(3, "abort check on timeout");
>  
> -		r = io_cancel(ct->aio_grp->ioctx, &ct->req->io, &event);
> -		/*
> -		 * Only reset ct->running if we really
> -		 * could abort the pending I/O
> -		 */
> -		if (!r)
> -			ct->running = 0;
> +		io_cancel(ct->aio_grp->ioctx, &ct->req->io, &event);
>  		rc = PATH_DOWN;
>  	} else {
>  		LOG(4, "async io pending");
> diff --git a/tests/directio.c b/tests/directio.c
> index db9643e..b340498 100644
> --- a/tests/directio.c
> +++ b/tests/directio.c
> @@ -501,13 +501,8 @@ static void test_free_with_pending(void **state)
>  	check_aio_grp(aio_grp, 2, 0);
>          libcheck_free(&c[0]);
>  	check_aio_grp(aio_grp, 1, 0);
> -        will_return(__wrap_io_cancel, 0);
>          libcheck_free(&c[1]);
> -#ifdef DIO_TEST_DEV
> -	check_aio_grp(aio_grp, 1, 1); /* real cancel doesn't remove request */
> -#else
> -        check_aio_grp(aio_grp, 0, 0);
> -#endif
> +	check_aio_grp(aio_grp, 1, 1); /* cancel doesn't remove request */
>          do_libcheck_reset(1);
>  }
>  
> @@ -533,7 +528,6 @@ static void test_orphaned_aio_group(void **state)
>  	assert_int_equal(i, 1);
>  	for (i = 0; i < AIO_GROUP_SIZE; i++) {
>  		assert_true(is_checker_running(&c[i]));
> -		will_return(__wrap_io_cancel, -1);
>  		if (i == AIO_GROUP_SIZE - 1) {
>  			/* remove the orphaned group and create a new one */
>  			will_return(__wrap_io_destroy, 0);
> @@ -637,7 +631,6 @@ static void test_orphan_checker_cleanup(void **state)
>  	aio_grp = get_aio_grp(c);
>  	return_io_getevents_none();
>  	do_check_state(&c[0], 0, 30, PATH_PENDING);
> -	will_return(__wrap_io_cancel, -1);
>  	check_aio_grp(aio_grp, 2, 0);
>  	libcheck_free(&c[0]);
>  	check_aio_grp(aio_grp, 2, 1);
> @@ -662,7 +655,6 @@ static void test_orphan_reset_cleanup(void **state)
>  	orphan_aio_grp = get_aio_grp(&c);
>  	return_io_getevents_none();
>  	do_check_state(&c, 0, 30, PATH_PENDING);
> -	will_return(__wrap_io_cancel, -1);
>  	check_aio_grp(orphan_aio_grp, 1, 0);
>  	libcheck_free(&c);
>  	check_aio_grp(orphan_aio_grp, 1, 1);
> -- 
> 2.42.0


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/6] libmultipath: io_err_stat: don't free aio memory before completion
  2023-10-24 16:49 ` [PATCH 3/6] libmultipath: io_err_stat: don't free aio memory before completion mwilck
@ 2023-10-25 23:33   ` Benjamin Marzinski
  2023-10-26  9:12     ` Martin Wilck
  0 siblings, 1 reply; 18+ messages in thread
From: Benjamin Marzinski @ 2023-10-25 23:33 UTC (permalink / raw)
  To: mwilck; +Cc: Christophe Varoqui, dm-devel, Miao Guanqin, Li Xiao Keng

On Tue, Oct 24, 2023 at 06:49:34PM +0200, mwilck@suse.com wrote:
> From: Martin Wilck <mwilck@suse.com>
> 
> It is wrong to assume that aio data structures can be reused or freed
> after io_cancel(). io_cancel() will almost always return -EINPROGRESS,
> anyway. Use the io_starttime field to indicate whether an io event
> has been completed by the kernel. Make sure no in-flight buffers are freed.

There's a minor issue I mention below, but the bigger issue I noticed
isn't related to your fix.  It's that, unless I'm missing something,
we're not using libaio correctly in this code.  We call io_setup() to
allow CONCUR_NR_EVENT concurrent requests. But we could have up to
CONCUR_NR_EVENT * number_of_delayed_paths concurrent requests. The
easiest fix is to create an ioctx per delayed path. However this would
mean calling io_destroy() after each round of testings, which could
potentially stall this thread.
 
> Fixes https://github.com/opensvc/multipath-tools/issues/73.
> 
> Signed-off-by: Martin Wilck <mwilck@suse.com>
> Cc: Li Xiao Keng <lixiaokeng@huawei.com>
> Cc: Miao Guanqin <miaoguanqin@huawei.com>
> ---
>  libmultipath/io_err_stat.c | 26 ++++++++++++++++----------
>  1 file changed, 16 insertions(+), 10 deletions(-)
> 
> diff --git a/libmultipath/io_err_stat.c b/libmultipath/io_err_stat.c
> index dc1c252..c474c34 100644
> --- a/libmultipath/io_err_stat.c
> +++ b/libmultipath/io_err_stat.c
> @@ -111,10 +111,14 @@ static int init_each_dio_ctx(struct dio_ctx *ct, int blksize,
>  	return 0;
>  }
>  
> -static void deinit_each_dio_ctx(struct dio_ctx *ct)
> +static int deinit_each_dio_ctx(struct dio_ctx *ct)
>  {
> -	if (ct->buf)
> -		free(ct->buf);
> +	if (!ct->buf)
> +		return 0;
> +	if (ct->io_starttime.tv_sec != 0 || ct->io_starttime.tv_nsec != 0)
> +		return 1;
> +	free(ct->buf);
> +	return 0;
>  }
>  
>  static int setup_directio_ctx(struct io_err_stat_path *p)
> @@ -164,6 +168,7 @@ fail_close:
>  static void free_io_err_stat_path(struct io_err_stat_path *p)
>  {
>  	int i;
> +	int inflight = 0;
>  
>  	if (!p)
>  		return;
> @@ -173,8 +178,13 @@ static void free_io_err_stat_path(struct io_err_stat_path *p)
>  	cancel_inflight_io(p);

There's not much point in calling cancel_inflight_io() here, since since
we don't try to handle the done events afterwards, is there? If there
are unhandled ones, they will remain unhandled, and we will end up
leaking memory regardless of whether or not they're cancelled.

-Ben

>  	for (i = 0; i < CONCUR_NR_EVENT; i++)
> -		deinit_each_dio_ctx(p->dio_ctx_array + i);
> -	free(p->dio_ctx_array);
> +		inflight += deinit_each_dio_ctx(p->dio_ctx_array + i);
> +
> +	if (!inflight)
> +		free(p->dio_ctx_array);
> +	else
> +		io_err_stat_log(2, "%s: can't free aio space of %s, %d IOs in flight",
> +				__func__, p->devname, inflight);
>  
>  	if (p->fd > 0)
>  		close(p->fd);
> @@ -503,7 +513,7 @@ static int try_to_cancel_timeout_io(struct dio_ctx *ct, struct timespec *t,
>  	int		rc = PATH_UNCHECKED;
>  	int		r;
>  
> -	if (ct->io_starttime.tv_sec == 0)
> +	if (ct->io_starttime.tv_sec == 0 && ct->io_starttime.tv_nsec == 0)
>  		return rc;
>  	timespecsub(t, &ct->io_starttime, &difftime);
>  	if (difftime.tv_sec > IOTIMEOUT_SEC) {
> @@ -514,8 +524,6 @@ static int try_to_cancel_timeout_io(struct dio_ctx *ct, struct timespec *t,
>  		if (r)
>  			io_err_stat_log(5, "%s: io_cancel error %i",
>  					dev, errno);
> -		ct->io_starttime.tv_sec = 0;
> -		ct->io_starttime.tv_nsec = 0;
>  		rc = PATH_TIMEOUT;
>  	} else {
>  		rc = PATH_PENDING;
> @@ -559,8 +567,6 @@ static void cancel_inflight_io(struct io_err_stat_path *pp)
>  		if (r)
>  			io_err_stat_log(5, "%s: io_cancel error %d, %i",
>  					pp->devname, r, errno);
> -		ct->io_starttime.tv_sec = 0;
> -		ct->io_starttime.tv_nsec = 0;
>  	}
>  }
>  
> -- 
> 2.42.0


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/6] libmultipath: io_err_stat: call io_destroy() before free_io_err_pathvec()
  2023-10-24 16:49 ` [PATCH 4/6] libmultipath: io_err_stat: call io_destroy() before free_io_err_pathvec() mwilck
@ 2023-10-25 23:40   ` Benjamin Marzinski
  0 siblings, 0 replies; 18+ messages in thread
From: Benjamin Marzinski @ 2023-10-25 23:40 UTC (permalink / raw)
  To: mwilck; +Cc: Christophe Varoqui, dm-devel, Miao Guanqin, Li Xiao Keng

On Tue, Oct 24, 2023 at 06:49:35PM +0200, mwilck@suse.com wrote:
> From: Martin Wilck <mwilck@suse.com>
> 
> Signed-off-by: Martin Wilck <mwilck@suse.com>
> ---
>  libmultipath/io_err_stat.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/libmultipath/io_err_stat.c b/libmultipath/io_err_stat.c
> index c474c34..1982915 100644
> --- a/libmultipath/io_err_stat.c
> +++ b/libmultipath/io_err_stat.c
> @@ -751,6 +751,6 @@ void stop_io_err_stat_thread(void)
>  		pthread_cancel(io_err_stat_thr);
>  
>  	pthread_join(io_err_stat_thr, NULL);
> -	free_io_err_pathvec();
>  	io_destroy(ioctx);
> +	free_io_err_pathvec();

This looks to me like another reason why we don't want the
cancel_inflight_io() function in free_io_err_stat_path(). I don't think
it's valid to call io_cancel(), which cancel_inflight_io() does, after
you call io_destroy().

-Ben

>  }
> -- 
> 2.42.0


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/6] multipath-tools: systemd: require modprobe@dm_multipath.service
  2023-10-24 16:49 ` [PATCH 5/6] multipath-tools: systemd: require modprobe@dm_multipath.service mwilck
@ 2023-10-25 23:43   ` Benjamin Marzinski
  2023-10-26  8:44     ` Martin Wilck
  0 siblings, 1 reply; 18+ messages in thread
From: Benjamin Marzinski @ 2023-10-25 23:43 UTC (permalink / raw)
  To: mwilck; +Cc: Christophe Varoqui, dm-devel, Miao Guanqin, Li Xiao Keng

On Tue, Oct 24, 2023 at 06:49:36PM +0200, mwilck@suse.com wrote:
> From: Martin Wilck <mwilck@suse.com>
> 
> Since 92f0893 ("multipath-tools: install modules-load.d/multipath.conf"), we
> use modules-load.d to load the dm-multipath module early on. That isn't
> optimal, because the module will always be loaded, even if multipathd is
> not running.
> 
> Use systemd's "modprobe@.service" instead to make sure the module is loaded
> before multipathd starts. If "multipath -u" is invoked from udev rules
> before multipathd.service has been started, it will access the socket,
> which will autostart multipathd via socket activation and cause the
> dm-multipath module to be loaded.
> 
> Signed-off-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
>  multipath/Makefile            | 2 --
>  multipath/modules-load.conf   | 3 ---
>  multipathd/multipathd.service | 2 ++
>  3 files changed, 2 insertions(+), 5 deletions(-)
>  delete mode 100644 multipath/modules-load.conf
> 
> diff --git a/multipath/Makefile b/multipath/Makefile
> index 68cb5ce..a69262b 100644
> --- a/multipath/Makefile
> +++ b/multipath/Makefile
> @@ -28,7 +28,6 @@ install:
>  	$(Q)$(INSTALL_PROGRAM) -m 644 11-dm-mpath.rules $(DESTDIR)$(udevrulesdir)
>  	$(Q)$(INSTALL_PROGRAM) -m 644 multipath.rules $(DESTDIR)$(udevrulesdir)/56-multipath.rules
>  	$(Q)$(INSTALL_PROGRAM) -d $(DESTDIR)$(modulesloaddir)
> -	$(Q)$(INSTALL_PROGRAM) -m 644 modules-load.conf $(DESTDIR)$(modulesloaddir)/multipath.conf
>  	$(Q)$(INSTALL_PROGRAM) -d $(DESTDIR)$(tmpfilesdir)
>  	$(Q)$(INSTALL_PROGRAM) -m 644 tmpfiles.conf $(DESTDIR)$(tmpfilesdir)/multipath.conf
>  	$(Q)$(INSTALL_PROGRAM) -d $(DESTDIR)$(mandir)/man8
> @@ -44,7 +43,6 @@ endif
>  uninstall:
>  	$(Q)$(RM) $(DESTDIR)$(bindir)/$(EXEC)
>  	$(Q)$(RM) $(DESTDIR)$(udevrulesdir)/11-dm-mpath.rules
> -	$(Q)$(RM) $(DESTDIR)$(modulesloaddir)/multipath.conf
>  	$(Q)$(RM) $(DESTDIR)$(modulesloaddir)/scsi_dh.conf
>  	$(Q)$(RM) $(DESTDIR)$(libudevdir)/rules.d/56-multipath.rules
>  	$(Q)$(RM) $(DESTDIR)$(mandir)/man8/$(EXEC).8
> diff --git a/multipath/modules-load.conf b/multipath/modules-load.conf
> deleted file mode 100644
> index b517d32..0000000
> --- a/multipath/modules-load.conf
> +++ /dev/null
> @@ -1,3 +0,0 @@
> -# load dm-multipath early, both multipathd and multipath depend on it
> -# (note that multipath may be called from udev rules!)
> -dm-multipath
> diff --git a/multipathd/multipathd.service b/multipathd/multipathd.service
> index 5a9cde1..40cf32c 100644
> --- a/multipathd/multipathd.service
> +++ b/multipathd/multipathd.service
> @@ -3,6 +3,8 @@ Description=Device-Mapper Multipath Device Controller
>  Before=lvm2-activation-early.service
>  Before=local-fs-pre.target blk-availability.service shutdown.target
>  Wants=systemd-udevd-kernel.socket
> +Requires=modprobe@dm_multipath.service
> +After=modprobe@dm_multipath.service
>  After=systemd-udevd-kernel.socket
>  After=multipathd.socket systemd-remount-fs.service
>  Before=initrd-cleanup.service
> -- 
> 2.42.0


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 6/6] libmpathutil: remove systemd_service_enabled()
  2023-10-24 16:49 ` [PATCH 6/6] libmpathutil: remove systemd_service_enabled() mwilck
@ 2023-10-25 23:58   ` Benjamin Marzinski
  2023-10-26  8:42     ` Martin Wilck
  0 siblings, 1 reply; 18+ messages in thread
From: Benjamin Marzinski @ 2023-10-25 23:58 UTC (permalink / raw)
  To: mwilck; +Cc: Christophe Varoqui, dm-devel, Miao Guanqin, Li Xiao Keng

On Tue, Oct 24, 2023 at 06:49:37PM +0200, mwilck@suse.com wrote:
> From: Martin Wilck <mwilck@suse.com>
> 
> Since  c203929 ("multipathd.service: add dependency on
> systemd-udevd-kernel.socket"), multipathd will start early during boot.
> Moreover, we recommend using socket activation for multipathd,
> and if multipathd.socket is enabled, the __mpath_connect()
> check will succeed anyway.
> 
> The systemd_service_enabled() test was just "good enough" for
> standard situations but never robust; it checked for multipathd.wants in
> "some" directory, which might or might not be the current target,
> and it would return true even of multipathd.service was masked.
> 
> Remove this test.
> 
> Signed-off-by: Martin Wilck <mwilck@suse.com>

I'd be lying if I said I had no worries at all about this. Removing this
check means that if someone isn't using socket activation, and
multipathd is temporarily down, and a new device appears, it will always
be marked as not claimed by multipath. This could cause the device to be
grabbed by LVM, and not multipathed. This is probably just may paranoia
talking, since the chance of this happening in the real world seems
pretty low. So,

Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>

> ---
>  libmpathutil/libmpathutil.version | 17 +++------
>  libmpathutil/util.c               | 58 -------------------------------
>  libmpathutil/util.h               |  1 -
>  libmultipath/valid.c              | 16 ++-------
>  tests/valid.c                     | 24 ++-----------
>  5 files changed, 9 insertions(+), 107 deletions(-)
> 
> diff --git a/libmpathutil/libmpathutil.version b/libmpathutil/libmpathutil.version
> index 6ebb718..15ff467 100644
> --- a/libmpathutil/libmpathutil.version
> +++ b/libmpathutil/libmpathutil.version
> @@ -93,12 +93,15 @@ local:
>  };
>  
>  /* symbols referenced internally by libmultipath */
> -LIBMPATHUTIL_1.0 {
> +LIBMPATHUTIL_2.0 {
>  	alloc_bitfield;
>  	__append_strbuf_str;
>  	append_strbuf_quoted;
>  	basenamecpy;
> +	cleanup_fd_ptr;
>  	cleanup_free_ptr;
> +	cleanup_vector_free;
> +	cleanup_fclose;
>  	filepresent;
>  	find_keyword;
>  	free_keywords;
> @@ -120,21 +123,9 @@ LIBMPATHUTIL_1.0 {
>  	snprint_keyword;
>  	steal_strbuf_str;
>  	strlcat;
> -	systemd_service_enabled;
>  	validate_config_strvec;
>  	vector_find_or_add_slot;
>  	vector_insert_slot;
>  	vector_move_up;
>  	vector_sort;
>  };
> -
> -LIBMPATHUTIL_1.1 {
> -global:
> -	cleanup_fd_ptr;
> -} LIBMPATHUTIL_1.0;
> -
> -LIBMPATHUTIL_1.2 {
> -global:
> -	cleanup_vector_free;
> -	cleanup_fclose;
> -} LIBMPATHUTIL_1.0;
> diff --git a/libmpathutil/util.c b/libmpathutil/util.c
> index 92f25a5..9d147fc 100644
> --- a/libmpathutil/util.c
> +++ b/libmpathutil/util.c
> @@ -213,64 +213,6 @@ setup_thread_attr(pthread_attr_t *attr, size_t stacksize, int detached)
>  	}
>  }
>  
> -int systemd_service_enabled_in(const char *dev, const char *prefix)
> -{
> -	static const char service[] = "multipathd.service";
> -	char path[PATH_MAX], file[PATH_MAX];
> -	DIR *dirfd;
> -	struct dirent *d;
> -	int found = 0;
> -
> -	if (safe_sprintf(path, "%s/systemd/system", prefix))
> -		return 0;
> -
> -	condlog(3, "%s: checking for %s in %s", dev, service, path);
> -
> -	dirfd = opendir(path);
> -	if (dirfd == NULL)
> -		return 0;
> -
> -	while ((d = readdir(dirfd)) != NULL) {
> -		char *p;
> -		struct stat stbuf;
> -
> -		if ((strcmp(d->d_name,".") == 0) ||
> -		    (strcmp(d->d_name,"..") == 0))
> -			continue;
> -
> -		if (strlen(d->d_name) < 6)
> -			continue;
> -
> -		p = d->d_name + strlen(d->d_name) - 6;
> -		if (strcmp(p, ".wants"))
> -			continue;
> -		if (!safe_sprintf(file, "%s/%s/%s",
> -				  path, d->d_name, service)
> -		    && stat(file, &stbuf) == 0) {
> -			condlog(3, "%s: found %s", dev, file);
> -			found++;
> -			break;
> -		}
> -	}
> -	closedir(dirfd);
> -
> -	return found;
> -}
> -
> -int systemd_service_enabled(const char *dev)
> -{
> -	int found = 0;
> -
> -	found = systemd_service_enabled_in(dev, "/etc");
> -	if (!found)
> -		found = systemd_service_enabled_in(dev, "/usr/lib");
> -	if (!found)
> -		found = systemd_service_enabled_in(dev, "/lib");
> -	if (!found)
> -		found = systemd_service_enabled_in(dev, "/run");
> -	return found;
> -}
> -
>  static int _linux_version_code;
>  static pthread_once_t _lvc_initialized = PTHREAD_ONCE_INIT;
>  
> diff --git a/libmpathutil/util.h b/libmpathutil/util.h
> index 99a471d..de9fcfd 100644
> --- a/libmpathutil/util.h
> +++ b/libmpathutil/util.h
> @@ -21,7 +21,6 @@ size_t strlcat(char * restrict dst, const char * restrict src, size_t size);
>  dev_t parse_devt(const char *dev_t);
>  char *convert_dev(char *dev, int is_path_device);
>  void setup_thread_attr(pthread_attr_t *attr, size_t stacksize, int detached);
> -int systemd_service_enabled(const char *dev);
>  int get_linux_version_code(void);
>  int safe_write(int fd, const void *buf, size_t count);
>  void set_max_fds(rlim_t max_fds);
> diff --git a/libmultipath/valid.c b/libmultipath/valid.c
> index d4dae3e..f223778 100644
> --- a/libmultipath/valid.c
> +++ b/libmultipath/valid.c
> @@ -314,23 +314,11 @@ is_path_valid(const char *name, struct config *conf, struct path *pp,
>  		return PATH_IS_VALID_NO_CHECK;
>  	}
>  
> -	/*
> -	 * "multipath -u" may be run before the daemon is started. In this
> -	 * case, systemd might own the socket but might delay multipathd
> -	 * startup until some other unit (udev settle!)  has finished
> -	 * starting. With many LUNs, the listen backlog may be exceeded, which
> -	 * would cause connect() to block. This causes udev workers calling
> -	 * "multipath -u" to hang, and thus creates a deadlock, until "udev
> -	 * settle" times out.  To avoid this, call connect() in non-blocking
> -	 * mode here, and take EAGAIN as indication for a filled-up systemd
> -	 * backlog.
> -	 */
> -
>  	if (check_multipathd) {
>  		fd = __mpath_connect(1);
>  		if (fd < 0) {
> -			if (errno != EAGAIN && !systemd_service_enabled(name)) {
> -				condlog(3, "multipathd not running or enabled");
> +			if (errno != EAGAIN) {
> +				condlog(3, "multipathd not running");
>  				return PATH_IS_NOT_VALID;
>  			}
>  		} else
> diff --git a/tests/valid.c b/tests/valid.c
> index 7032932..18a5a7b 100644
> --- a/tests/valid.c
> +++ b/tests/valid.c
> @@ -62,11 +62,6 @@ int __wrap___mpath_connect(int nonblocking)
>  	return -1;
>  }
>  
> -int __wrap_systemd_service_enabled(const char *dev)
> -{
> -	return (int)mock_type(bool);
> -}
> -
>  /* There's no point in checking the return value here */
>  int __wrap_mpath_disconnect(int fd)
>  {
> @@ -216,7 +211,6 @@ enum {
>  enum {
>  	CHECK_MPATHD_RUNNING,
>  	CHECK_MPATHD_EAGAIN,
> -	CHECK_MPATHD_ENABLED,
>  	CHECK_MPATHD_SKIP,
>  };
>  
> @@ -232,11 +226,8 @@ static void setup_passing(char *name, char *wwid, unsigned int check_multipathd,
>  	else if (check_multipathd == CHECK_MPATHD_EAGAIN) {
>  		will_return(__wrap___mpath_connect, false);
>  		will_return(__wrap___mpath_connect, EAGAIN);
> -	} else if (check_multipathd == CHECK_MPATHD_ENABLED) {
> -		will_return(__wrap___mpath_connect, false);
> -		will_return(__wrap___mpath_connect, ECONNREFUSED);
> -		will_return(__wrap_systemd_service_enabled, true);
>  	}
> +
>  	/* nothing for CHECK_MPATHD_SKIP */
>  	if (stage == STAGE_CHECK_MULTIPATHD)
>  		return;
> @@ -342,19 +333,10 @@ static void test_check_multipathd(void **state)
>  	will_return(__wrap_sysfs_is_multipathed, false);
>  	will_return(__wrap___mpath_connect, false);
>  	will_return(__wrap___mpath_connect, ECONNREFUSED);
> -	will_return(__wrap_systemd_service_enabled, false);
> +
>  	assert_int_equal(is_path_valid(name, &conf, &pp, true),
>  			 PATH_IS_NOT_VALID);
>  	assert_string_equal(pp.dev, name);
> -	/* test pass because service is enabled. fail getting udev */
> -	memset(&pp, 0, sizeof(pp));
> -	setup_passing(name, NULL, CHECK_MPATHD_ENABLED, STAGE_CHECK_MULTIPATHD);
> -	will_return(__wrap_udev_device_new_from_subsystem_sysname, false);
> -	will_return(__wrap_udev_device_new_from_subsystem_sysname,
> -		    name);
> -	assert_int_equal(is_path_valid(name, &conf, &pp, true),
> -			 PATH_IS_ERROR);
> -	assert_string_equal(pp.dev, name);
>  	/* test pass because connect returned EAGAIN. fail getting udev */
>  	setup_passing(name, NULL, CHECK_MPATHD_EAGAIN, STAGE_CHECK_MULTIPATHD);
>  	will_return(__wrap_udev_device_new_from_subsystem_sysname, false);
> @@ -533,7 +515,7 @@ static void test_check_uuid_present(void **state)
>  
>  	memset(&pp, 0, sizeof(pp));
>  	conf.find_multipaths = FIND_MULTIPATHS_STRICT;
> -	setup_passing(name, wwid, CHECK_MPATHD_ENABLED, STAGE_CHECK_WWIDS);
> +	setup_passing(name, wwid, CHECK_MPATHD_RUNNING, STAGE_CHECK_WWIDS);
>  	will_return(__wrap_dm_map_present_by_uuid, 1);
>  	will_return(__wrap_dm_map_present_by_uuid, wwid);
>  	assert_int_equal(is_path_valid(name, &conf, &pp, true),
> -- 
> 2.42.0


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/6] libmultipath: directio: don't reset ct->running after io_cancel()
  2023-10-25 23:14   ` Benjamin Marzinski
@ 2023-10-26  5:33     ` Hannes Reinecke
  2023-10-26  8:31     ` Martin Wilck
  1 sibling, 0 replies; 18+ messages in thread
From: Hannes Reinecke @ 2023-10-26  5:33 UTC (permalink / raw)
  To: Benjamin Marzinski, mwilck
  Cc: Christophe Varoqui, dm-devel, Miao Guanqin, Li Xiao Keng

On 10/26/23 01:14, Benjamin Marzinski wrote:
> On Tue, Oct 24, 2023 at 06:49:33PM +0200, mwilck@suse.com wrote:
>> From: Martin Wilck <mwilck@suse.com>
>>
>> io_cancel() never succeeds, and even if it did, io_getevents() must
>> still be called to reclaim the IO resources from the kernel.
>> Don't pretend the opposite by resetting ct->running, or freeing the
>> memory, before io_getevents() has indicated that the request is finished.
>>
>> Signed-off-by: Martin Wilck <mwilck@suse.com>
>> ---
>>   libmultipath/checkers/directio.c | 13 ++-----------
>>   tests/directio.c                 | 10 +---------
>>   2 files changed, 3 insertions(+), 20 deletions(-)
>>
>> diff --git a/libmultipath/checkers/directio.c b/libmultipath/checkers/directio.c
>> index 25b2383..71ce4cb 100644
>> --- a/libmultipath/checkers/directio.c
>> +++ b/libmultipath/checkers/directio.c
>> @@ -216,7 +216,6 @@ out:
>>   void libcheck_free (struct checker * c)
>>   {
>>   	struct directio_context * ct = (struct directio_context *)c->context;
>> -	struct io_event event;
>>   	long flags;
>>   
>>   	if (!ct)
>> @@ -232,9 +231,7 @@ void libcheck_free (struct checker * c)
>>   		}
>>   	}
>>   
>> -	if (ct->running &&
>> -	    (ct->req->state != PATH_PENDING ||
>> -	     io_cancel(ct->aio_grp->ioctx, &ct->req->io, &event) == 0))
> 
> Does io_cancel() do nothing? If does make the IO likely to end sooner,
> even if it always returns failure, then we should probably still run it
> when we have an outstanding request that we are going to orphan. We
> should just move the io_cancel() to the else block, where we add the
> request to the orphans list.
> 
Well, it doesn't to _nothing_, but what matters is that it doesn't 
actually affect the running I/O. It just cancels the element in the 
libaio infrastructure, not the I/O itself.

So Martin is perfectly right here.

Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/6] libmultipath: directio: don't reset ct->running after io_cancel()
  2023-10-25 23:14   ` Benjamin Marzinski
  2023-10-26  5:33     ` Hannes Reinecke
@ 2023-10-26  8:31     ` Martin Wilck
  1 sibling, 0 replies; 18+ messages in thread
From: Martin Wilck @ 2023-10-26  8:31 UTC (permalink / raw)
  To: Benjamin Marzinski
  Cc: Christophe Varoqui, dm-devel, Miao Guanqin, Li Xiao Keng

On Wed, 2023-10-25 at 19:14 -0400, Benjamin Marzinski wrote:
> On Tue, Oct 24, 2023 at 06:49:33PM +0200, mwilck@suse.com wrote:
> > 
> > --- a/libmultipath/checkers/directio.c
> > +++ b/libmultipath/checkers/directio.c
> > @@ -216,7 +216,6 @@ out:
> >  void libcheck_free (struct checker * c)
> >  {
> >  	struct directio_context * ct = (struct directio_context
> > *)c->context;
> > -	struct io_event event;
> >  	long flags;
> >  
> >  	if (!ct)
> > @@ -232,9 +231,7 @@ void libcheck_free (struct checker * c)
> >  		}
> >  	}
> >  
> > -	if (ct->running &&
> > -	    (ct->req->state != PATH_PENDING ||
> > -	     io_cancel(ct->aio_grp->ioctx, &ct->req->io, &event)
> > == 0))
> 
> Does io_cancel() do nothing? 

No it doesn't, as Hannes already said. Currently, at least. 
See https://elixir.bootlin.com/linux/latest/source/fs/aio.c:
ki_cancel() is only set for aio_poll() and some USB gadgets. For
regular block devices, io_cancel() is equivalent to 
"return -EINPROGRESS;".

> If does make the IO likely to end sooner,
> even if it always returns failure, then we should probably still run
> it
> when we have an outstanding request that we are going to orphan.
> We
> should just move the io_cancel() to the else block, where we add the
> request to the orphans list.

But I suppose it would be possible to hook ki_cancel() into the SCSI
layer, translating it into a (series of) command aborts. Perhaps
someone will implement that in the future; it seem that just nobody has
bothered yet.

So yes, I will re-add io_cancel() in the else clause in
libcheck_free(). 
It won't do harm, and we'll be prepared for kernel 8.0 ;-)

If we want more predictable timeouts for directio and the io_err_stat
code _today_, we would need to move away from aio and use SG_IO
instead. That would allow us to set SCSI timeouts explicitly, and
(mostly) avoid retries on the SCSI mid layer.

Regards
Martin


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 6/6] libmpathutil: remove systemd_service_enabled()
  2023-10-25 23:58   ` Benjamin Marzinski
@ 2023-10-26  8:42     ` Martin Wilck
  0 siblings, 0 replies; 18+ messages in thread
From: Martin Wilck @ 2023-10-26  8:42 UTC (permalink / raw)
  To: Benjamin Marzinski
  Cc: Christophe Varoqui, dm-devel, Miao Guanqin, Li Xiao Keng

On Wed, 2023-10-25 at 19:58 -0400, Benjamin Marzinski wrote:
> On Tue, Oct 24, 2023 at 06:49:37PM +0200, mwilck@suse.com wrote:
> > From: Martin Wilck <mwilck@suse.com>
> > 
> > Since  c203929 ("multipathd.service: add dependency on
> > systemd-udevd-kernel.socket"), multipathd will start early during
> > boot.
> > Moreover, we recommend using socket activation for multipathd,
> > and if multipathd.socket is enabled, the __mpath_connect()
> > check will succeed anyway.
> > 
> > The systemd_service_enabled() test was just "good enough" for
> > standard situations but never robust; it checked for
> > multipathd.wants in
> > "some" directory, which might or might not be the current target,
> > and it would return true even of multipathd.service was masked.
> > 
> > Remove this test.
> > 
> > Signed-off-by: Martin Wilck <mwilck@suse.com>
> 
> I'd be lying if I said I had no worries at all about this. Removing
> this
> check means that if someone isn't using socket activation, and
> multipathd is temporarily down, and a new device appears, it will
> always
> be marked as not claimed by multipath. This could cause the device to
> be
> grabbed by LVM, and not multipathed. This is probably just may
> paranoia
> talking, since the chance of this happening in the real world seems
> pretty low. So,
> 
> Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>

Thanks. I had the same doubts, but I came to the conclusion that
there's no need to be afraid.

I would put your argument a bit differently: If socket activation is
not used and multipathd is not running, we are in a situation where
multipath can't seriously claim any device. Whether multipathd is
enabled in some (apparently inactive) systemd target doesn't really
matter in such a situation. Note that we never checked  _which_ systemd
unit lists multipathd under its ".wants". It could be
"shutdown.target".

Martin


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/6] multipath-tools: systemd: require modprobe@dm_multipath.service
  2023-10-25 23:43   ` Benjamin Marzinski
@ 2023-10-26  8:44     ` Martin Wilck
  0 siblings, 0 replies; 18+ messages in thread
From: Martin Wilck @ 2023-10-26  8:44 UTC (permalink / raw)
  To: Benjamin Marzinski
  Cc: Christophe Varoqui, dm-devel, Miao Guanqin, Li Xiao Keng

On Wed, 2023-10-25 at 19:43 -0400, Benjamin Marzinski wrote:
> On Tue, Oct 24, 2023 at 06:49:36PM +0200, mwilck@suse.com wrote:
> > From: Martin Wilck <mwilck@suse.com>
> > 
> > Since 92f0893 ("multipath-tools: install modules-
> > load.d/multipath.conf"), we
> > use modules-load.d to load the dm-multipath module early on. That
> > isn't
> > optimal, because the module will always be loaded, even if
> > multipathd is
> > not running.
> > 
> > Use systemd's "modprobe@.service" instead to make sure the module
> > is loaded
> > before multipathd starts. If "multipath -u" is invoked from udev
> > rules
> > before multipathd.service has been started, it will access the
> > socket,
> > which will autostart multipathd via socket activation and cause the
> > dm-multipath module to be loaded.
> > 
> > Signed-off-by: Martin Wilck <mwilck@suse.com>
> Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>

I just realized that modprobe@.service has only be available since
systemd v245 (2020). I should make this change conditional on the
systemd version.

Martin


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/6] libmultipath: io_err_stat: don't free aio memory before completion
  2023-10-25 23:33   ` Benjamin Marzinski
@ 2023-10-26  9:12     ` Martin Wilck
  0 siblings, 0 replies; 18+ messages in thread
From: Martin Wilck @ 2023-10-26  9:12 UTC (permalink / raw)
  To: Benjamin Marzinski
  Cc: Christophe Varoqui, dm-devel, Miao Guanqin, Li Xiao Keng, Guan Junxiong

On Wed, 2023-10-25 at 19:33 -0400, Benjamin Marzinski wrote:
> On Tue, Oct 24, 2023 at 06:49:34PM +0200, mwilck@suse.com wrote:
> > From: Martin Wilck <mwilck@suse.com>
> > 
> > It is wrong to assume that aio data structures can be reused or
> > freed
> > after io_cancel(). io_cancel() will almost always return -
> > EINPROGRESS,
> > anyway. Use the io_starttime field to indicate whether an io event
> > has been completed by the kernel. Make sure no in-flight buffers
> > are freed.
> 
> There's a minor issue I mention below, but the bigger issue I noticed
> isn't related to your fix.  It's that, unless I'm missing something,
> we're not using libaio correctly in this code.  We call io_setup() to
> allow CONCUR_NR_EVENT concurrent requests. But we could have up to
> CONCUR_NR_EVENT * number_of_delayed_paths concurrent requests. The
> easiest fix is to create an ioctx per delayed path. However this
> would
> mean calling io_destroy() after each round of testings, which could
> potentially stall this thread.

Right. I suppose what's going to happen is that io_submit() will simply
fail with -EAGAIN. Note that the code logs io_submit() errors only at
log level 5. I guess if all goes well, the aio requests will terminate
quickly, and the algorithm will still "work", sort of. But if timeouts
happen (which is what this code was written for), the iocbs will be
quickly exhausted, possibly with IOs from just a single path, and
other paths won't be checked at all.

Btw, it's kind of odd to send 32 identical requests (all for block 0)
to the same device basically at the same time. I would think that with
very high probability, all IOs will either succeed or fail. The IO has
no FUA bit set, so even if it's direct IO, I assume that the device
will answer all but the first request from cache. I understand that
this is supposed to detect marginal paths, but it would make more sense
to send these IOs in random intervals than all at once. But maybe I
misunderstand something.

I've added Guan Junxiong, the original author, to the recipients list
of this email. As a first remedy against the iocb starvation, I suggest
to simply allocate a significantly larger ioctx, and increase the
loglevel of the message indicating io_submit() errors. I'll add a patch
to this set.

>  
> > Fixes https://github.com/opensvc/multipath-tools/issues/73.
> > 
> > Signed-off-by: Martin Wilck <mwilck@suse.com>
> > Cc: Li Xiao Keng <lixiaokeng@huawei.com>
> > Cc: Miao Guanqin <miaoguanqin@huawei.com>
> > ---
> >  libmultipath/io_err_stat.c | 26 ++++++++++++++++----------
> >  1 file changed, 16 insertions(+), 10 deletions(-)
> > 
> > diff --git a/libmultipath/io_err_stat.c
> > b/libmultipath/io_err_stat.c
> > index dc1c252..c474c34 100644
> > --- a/libmultipath/io_err_stat.c
> > +++ b/libmultipath/io_err_stat.c
> > @@ -111,10 +111,14 @@ static int init_each_dio_ctx(struct dio_ctx
> > *ct, int blksize,
> >  	return 0;
> >  }
> >  
> > -static void deinit_each_dio_ctx(struct dio_ctx *ct)
> > +static int deinit_each_dio_ctx(struct dio_ctx *ct)
> >  {
> > -	if (ct->buf)
> > -		free(ct->buf);
> > +	if (!ct->buf)
> > +		return 0;
> > +	if (ct->io_starttime.tv_sec != 0 || ct-
> > >io_starttime.tv_nsec != 0)
> > +		return 1;
> > +	free(ct->buf);
> > +	return 0;
> >  }
> >  
> >  static int setup_directio_ctx(struct io_err_stat_path *p)
> > @@ -164,6 +168,7 @@ fail_close:
> >  static void free_io_err_stat_path(struct io_err_stat_path *p)
> >  {
> >  	int i;
> > +	int inflight = 0;
> >  
> >  	if (!p)
> >  		return;
> > @@ -173,8 +178,13 @@ static void free_io_err_stat_path(struct
> > io_err_stat_path *p)
> >  	cancel_inflight_io(p);
> 
> There's not much point in calling cancel_inflight_io() here, since
> since
> we don't try to handle the done events afterwards, is there? If there
> are unhandled ones, they will remain unhandled, and we will end up
> leaking memory regardless of whether or not they're cancelled.

You're right. I'll have a closer look.

Martin


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2023-10-26  9:12 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-10-24 16:49 [PATCH 0/6] multipath: aio and systemd service improvements mwilck
2023-10-24 16:49 ` [PATCH 1/6] libmultipath: reduce log level of directio messages mwilck
2023-10-25 23:10   ` Benjamin Marzinski
2023-10-24 16:49 ` [PATCH 2/6] libmultipath: directio: don't reset ct->running after io_cancel() mwilck
2023-10-25 23:14   ` Benjamin Marzinski
2023-10-26  5:33     ` Hannes Reinecke
2023-10-26  8:31     ` Martin Wilck
2023-10-24 16:49 ` [PATCH 3/6] libmultipath: io_err_stat: don't free aio memory before completion mwilck
2023-10-25 23:33   ` Benjamin Marzinski
2023-10-26  9:12     ` Martin Wilck
2023-10-24 16:49 ` [PATCH 4/6] libmultipath: io_err_stat: call io_destroy() before free_io_err_pathvec() mwilck
2023-10-25 23:40   ` Benjamin Marzinski
2023-10-24 16:49 ` [PATCH 5/6] multipath-tools: systemd: require modprobe@dm_multipath.service mwilck
2023-10-25 23:43   ` Benjamin Marzinski
2023-10-26  8:44     ` Martin Wilck
2023-10-24 16:49 ` [PATCH 6/6] libmpathutil: remove systemd_service_enabled() mwilck
2023-10-25 23:58   ` Benjamin Marzinski
2023-10-26  8:42     ` Martin Wilck

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.