All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/7] multipath: miscellaneous bug fixes
@ 2018-02-08 23:56 Benjamin Marzinski
  2018-02-08 23:56 ` [PATCH v2 1/7] libmultipath: fix tur checker locking Benjamin Marzinski
                   ` (6 more replies)
  0 siblings, 7 replies; 25+ messages in thread
From: Benjamin Marzinski @ 2018-02-08 23:56 UTC (permalink / raw)
  To: device-mapper development

This is just a set of fixes for bugs I've run into.

V2:
	-rework tur locking fix to use atomic variables as Martin
	 suggested. If this version doesn't make people happy, I'll
	 just move the condlog() and be done with it.

	-add comments and fix messages for ev_add_map

	-fix bugs in cli_restore_queueing() and
	 cli_restore_all_queueing()

	-check if paths exist in sysfs in fast listing mode

Benjamin Marzinski (7):
  libmultipath: fix tur checker locking
  multipath: fix DEF_TIMEOUT use
  multipathd: remove coalesce_paths from ev_add_map
  multipathd: remove unused configure parameter
  Fix set_no_path_retry() regression
  multipathd: change spurious uevent msg priority
  multipath: print sysfs state in fast list mode

 libmultipath/checkers/tur.c | 98 ++++++++++++++++++---------------------------
 libmultipath/discovery.c    |  2 +-
 libmultipath/structs_vec.c  |  5 ++-
 multipath/main.c            | 10 +++--
 multipathd/cli_handlers.c   | 20 +++++----
 multipathd/main.c           | 60 +++++++++------------------
 6 files changed, 83 insertions(+), 112 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v2 1/7] libmultipath: fix tur checker locking
  2018-02-08 23:56 [PATCH v2 0/7] multipath: miscellaneous bug fixes Benjamin Marzinski
@ 2018-02-08 23:56 ` Benjamin Marzinski
  2018-02-09 16:15   ` Bart Van Assche
  2018-02-09 20:30   ` Martin Wilck
  2018-02-08 23:56 ` [PATCH v2 2/7] multipath: fix DEF_TIMEOUT use Benjamin Marzinski
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 25+ messages in thread
From: Benjamin Marzinski @ 2018-02-08 23:56 UTC (permalink / raw)
  To: device-mapper development; +Cc: Bart Van Assche, Martin Wilck

Commit 6e2423fd fixed a bug where the tur checker could cancel a
detached thread after it had exitted. However in fixing this, the new
code grabbed a mutex (to call condlog) while holding a spin_lock.  To
deal with this, I've done away with the holder spin_lock completely, and
replaced it with two atomic variables, based on a suggestion by Martin
Wilck.

The holder variable works exactly like before.  When the checker is
initialized, it is set to 1. When a thread is created it is incremented.
When either the thread or the checker are done with the context, they
atomically decrement the holder variable and check its value. If it
is 0, they free the context. If it is 1, they never touch the context
again.

The other variable has changed. First, ct->running and ct->thread have
switched uses. ct->thread is now only ever accessed by the checker,
never the thread.  If it is non-NULL, a thread has started up, but
hasn't been dealt with by the checker yet. It is also obviously used
by the checker to cancel the thread.

ct->running is now an atomic variable.  When the thread is started
it is set to 1. When the checker wants to kill a thread, it atomically
sets the value to 0 and reads the previous value.  If it was 1,
the checker cancels the thread. If it was 0, the nothing needs to be
done.  After the checker has dealt with the thread, it sets ct->thread
to NULL.

When the thread is done, it atomicalllys sets the value of ct->running
to 0 and reads the previous value. If it was 1, the thread just exits.
If it was 0, then the checker is trying to cancel the thread, and so
the thread calls pause(), which is a cancellation point.

Cc: Martin Wilck <mwilck@suse.com>
Cc: Bart Van Assche <Bart.VanAssche@wdc.com>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/checkers/tur.c | 98 ++++++++++++++++++---------------------------
 1 file changed, 40 insertions(+), 58 deletions(-)

diff --git a/libmultipath/checkers/tur.c b/libmultipath/checkers/tur.c
index b4a5cb2..894ad41 100644
--- a/libmultipath/checkers/tur.c
+++ b/libmultipath/checkers/tur.c
@@ -15,6 +15,7 @@
 #include <errno.h>
 #include <sys/time.h>
 #include <pthread.h>
+#include <urcu/uatomic.h>
 
 #include "checkers.h"
 
@@ -44,7 +45,6 @@ struct tur_checker_context {
 	pthread_t thread;
 	pthread_mutex_t lock;
 	pthread_cond_t active;
-	pthread_spinlock_t hldr_lock;
 	int holders;
 	char message[CHECKER_MSG_LEN];
 };
@@ -74,13 +74,12 @@ int libcheck_init (struct checker * c)
 
 	ct->state = PATH_UNCHECKED;
 	ct->fd = -1;
-	ct->holders = 1;
+	uatomic_set(&ct->holders, 1);
 	pthread_cond_init_mono(&ct->active);
 	pthread_mutexattr_init(&attr);
 	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
 	pthread_mutex_init(&ct->lock, &attr);
 	pthread_mutexattr_destroy(&attr);
-	pthread_spin_init(&ct->hldr_lock, PTHREAD_PROCESS_PRIVATE);
 	c->context = ct;
 
 	return 0;
@@ -90,7 +89,6 @@ static void cleanup_context(struct tur_checker_context *ct)
 {
 	pthread_mutex_destroy(&ct->lock);
 	pthread_cond_destroy(&ct->active);
-	pthread_spin_destroy(&ct->hldr_lock);
 	free(ct);
 }
 
@@ -99,16 +97,14 @@ void libcheck_free (struct checker * c)
 	if (c->context) {
 		struct tur_checker_context *ct = c->context;
 		int holders;
-		pthread_t thread;
-
-		pthread_spin_lock(&ct->hldr_lock);
-		ct->holders--;
-		holders = ct->holders;
-		thread = ct->thread;
-		pthread_spin_unlock(&ct->hldr_lock);
-		if (holders)
-			pthread_cancel(thread);
-		else
+		int running;
+
+		running = uatomic_xchg(&ct->running, 0);
+		if (running)
+			pthread_cancel(ct->thread);
+		ct->thread = 0;
+		holders = uatomic_sub_return(&ct->holders, 1);
+		if (!holders)
 			cleanup_context(ct);
 		c->context = NULL;
 	}
@@ -218,26 +214,20 @@ retry:
 
 static void cleanup_func(void *data)
 {
-	int holders;
+	int running, holders;
 	struct tur_checker_context *ct = data;
-	pthread_spin_lock(&ct->hldr_lock);
-	ct->holders--;
-	holders = ct->holders;
-	ct->thread = 0;
-	pthread_spin_unlock(&ct->hldr_lock);
+
+	running = uatomic_xchg(&ct->running, 0);
+	holders = uatomic_sub_return(&ct->holders, 1);
 	if (!holders)
 		cleanup_context(ct);
+	if (!running)
+		pause();
 }
 
 static int tur_running(struct tur_checker_context *ct)
 {
-	pthread_t thread;
-
-	pthread_spin_lock(&ct->hldr_lock);
-	thread = ct->thread;
-	pthread_spin_unlock(&ct->hldr_lock);
-
-	return thread != 0;
+	return (uatomic_read(&ct->running) != 0);
 }
 
 static void copy_msg_to_tcc(void *ct_p, const char *msg)
@@ -325,7 +315,6 @@ int libcheck_check(struct checker * c)
 	int tur_status, r;
 	char devt[32];
 
-
 	if (!ct)
 		return PATH_UNCHECKED;
 
@@ -349,35 +338,26 @@ int libcheck_check(struct checker * c)
 		return PATH_WILD;
 	}
 
-	if (ct->running) {
-		/*
-		 * Check if TUR checker is still running. Hold hldr_lock
-		 * around the pthread_cancel() call to avoid that
-		 * pthread_cancel() gets called after the (detached) TUR
-		 * thread has exited.
-		 */
-		pthread_spin_lock(&ct->hldr_lock);
-		if (ct->thread) {
-			if (tur_check_async_timeout(c)) {
-				condlog(3, "%s: tur checker timeout",
-					tur_devt(devt, sizeof(devt), ct));
+	if (ct->thread) {
+		if (tur_check_async_timeout(c)) {
+			int running = uatomic_xchg(&ct->running, 0);
+			if (running)
 				pthread_cancel(ct->thread);
-				ct->running = 0;
-				MSG(c, MSG_TUR_TIMEOUT);
-				tur_status = PATH_TIMEOUT;
-			} else {
-				condlog(3, "%s: tur checker not finished",
+			condlog(3, "%s: tur checker timeout",
+				tur_devt(devt, sizeof(devt), ct));
+			ct->thread = 0;
+			MSG(c, MSG_TUR_TIMEOUT);
+			tur_status = PATH_TIMEOUT;
+		} else if (tur_running(ct)) {
+			condlog(3, "%s: tur checker not finished",
 					tur_devt(devt, sizeof(devt), ct));
-				ct->running++;
-				tur_status = PATH_PENDING;
-			}
+			tur_status = PATH_PENDING;
 		} else {
 			/* TUR checker done */
-			ct->running = 0;
+			ct->thread = 0;
 			tur_status = ct->state;
 			strlcpy(c->message, ct->message, sizeof(c->message));
 		}
-		pthread_spin_unlock(&ct->hldr_lock);
 		pthread_mutex_unlock(&ct->lock);
 	} else {
 		if (tur_running(ct)) {
@@ -391,19 +371,17 @@ int libcheck_check(struct checker * c)
 		ct->state = PATH_UNCHECKED;
 		ct->fd = c->fd;
 		ct->timeout = c->timeout;
-		pthread_spin_lock(&ct->hldr_lock);
-		ct->holders++;
-		pthread_spin_unlock(&ct->hldr_lock);
+		uatomic_add(&ct->holders, 1);
+		uatomic_set(&ct->running, 1);
 		tur_set_async_timeout(c);
 		setup_thread_attr(&attr, 32 * 1024, 1);
 		r = pthread_create(&ct->thread, &attr, tur_thread, ct);
 		pthread_attr_destroy(&attr);
 		if (r) {
-			pthread_spin_lock(&ct->hldr_lock);
-			ct->holders--;
-			pthread_spin_unlock(&ct->hldr_lock);
-			pthread_mutex_unlock(&ct->lock);
+			uatomic_sub(&ct->holders, 1);
+			uatomic_set(&ct->running, 0);
 			ct->thread = 0;
+			pthread_mutex_unlock(&ct->lock);
 			condlog(3, "%s: failed to start tur thread, using"
 				" sync mode", tur_devt(devt, sizeof(devt), ct));
 			return tur_check(c->fd, c->timeout,
@@ -418,8 +396,12 @@ int libcheck_check(struct checker * c)
 		    (tur_status == PATH_PENDING || tur_status == PATH_UNCHECKED)) {
 			condlog(3, "%s: tur checker still running",
 				tur_devt(devt, sizeof(devt), ct));
-			ct->running = 1;
 			tur_status = PATH_PENDING;
+		} else {
+			int running = uatomic_xchg(&ct->running, 0);
+			if (running)
+				pthread_cancel(ct->thread);
+			ct->thread = 0;
 		}
 	}
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 2/7] multipath: fix DEF_TIMEOUT use
  2018-02-08 23:56 [PATCH v2 0/7] multipath: miscellaneous bug fixes Benjamin Marzinski
  2018-02-08 23:56 ` [PATCH v2 1/7] libmultipath: fix tur checker locking Benjamin Marzinski
@ 2018-02-08 23:56 ` Benjamin Marzinski
  2018-02-08 23:56 ` [PATCH v2 3/7] multipathd: remove coalesce_paths from ev_add_map Benjamin Marzinski
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 25+ messages in thread
From: Benjamin Marzinski @ 2018-02-08 23:56 UTC (permalink / raw)
  To: device-mapper development

DEF_TIMEOUT is specified in seconds. The io_hdr timeout is specified in
milliseconds, so we need to convert it. Multipath should be waiting
longer than 30 milliseconds here. If there are concerns that 30 seconds
may be too long, we could always make this configurable, using
conf->checker_timeout if set.

Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/discovery.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libmultipath/discovery.c b/libmultipath/discovery.c
index 4b31dde..f118800 100644
--- a/libmultipath/discovery.c
+++ b/libmultipath/discovery.c
@@ -766,7 +766,7 @@ do_inq(int sg_fd, int cmddt, int evpd, unsigned int pg_op,
 	io_hdr.dxferp = resp;
 	io_hdr.cmdp = inqCmdBlk;
 	io_hdr.sbp = sense_b;
-	io_hdr.timeout = DEF_TIMEOUT;
+	io_hdr.timeout = DEF_TIMEOUT * 1000;
 
 	if (ioctl(sg_fd, SG_IO, &io_hdr) < 0)
 		return -1;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 3/7] multipathd: remove coalesce_paths from ev_add_map
  2018-02-08 23:56 [PATCH v2 0/7] multipath: miscellaneous bug fixes Benjamin Marzinski
  2018-02-08 23:56 ` [PATCH v2 1/7] libmultipath: fix tur checker locking Benjamin Marzinski
  2018-02-08 23:56 ` [PATCH v2 2/7] multipath: fix DEF_TIMEOUT use Benjamin Marzinski
@ 2018-02-08 23:56 ` Benjamin Marzinski
  2018-02-12 20:30   ` Martin Wilck
  2018-02-08 23:56 ` [PATCH v2 4/7] multipathd: remove unused configure parameter Benjamin Marzinski
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Benjamin Marzinski @ 2018-02-08 23:56 UTC (permalink / raw)
  To: device-mapper development; +Cc: Martin Wilck

If ev_add_map is called for a multipath device that doesn't exist in
device-mapper, it will call coalesce_paths to add it.  This doesn't work
and hasn't for years. It doesn't add the map to the mpvec, or start up
waiters, or do any of the necessary things that do get done when you
call ev_add_map for a map that does exist in device mapper.

Fortunately, there are only two things that call ev_add_map. uev_add_map
makes sure that the device does exist in device-mapper before calling
ev_add_map, and cli_add_map creates the device first and then calls
ev_add_map, if the device doesn't exist.

So, there is no reason for coalesce_paths to be in ev_add_map. This
removes it.

Cc: Martin Wilck <mwilck@suse.com>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 multipathd/main.c | 46 ++++++++++++++--------------------------------
 1 file changed, 14 insertions(+), 32 deletions(-)

diff --git a/multipathd/main.c b/multipathd/main.c
index 27cf234..dbf9890 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -412,18 +412,19 @@ uev_add_map (struct uevent * uev, struct vectors * vecs)
 	return rc;
 }
 
+/*
+ * ev_add_map expects that the multipath device already exists in kernel
+ * before it is called. It just adds a device to multipathd or updates an
+ * existing device.
+ */
 int
 ev_add_map (char * dev, char * alias, struct vectors * vecs)
 {
-	char * refwwid;
 	struct multipath * mpp;
-	int map_present;
-	int r = 1, delayed_reconfig, reassign_maps;
+	int delayed_reconfig, reassign_maps;
 	struct config *conf;
 
-	map_present = dm_map_present(alias);
-
-	if (map_present && !dm_is_mpath(alias)) {
+	if (!dm_is_mpath(alias)) {
 		condlog(4, "%s: not a multipath map", alias);
 		return 0;
 	}
@@ -468,33 +469,14 @@ ev_add_map (char * dev, char * alias, struct vectors * vecs)
 	/*
 	 * now we can register the map
 	 */
-	if (map_present) {
-		if ((mpp = add_map_without_path(vecs, alias))) {
-			sync_map_state(mpp);
-			condlog(2, "%s: devmap %s registered", alias, dev);
-			return 0;
-		} else {
-			condlog(2, "%s: uev_add_map failed", dev);
-			return 1;
-		}
-	}
-	r = get_refwwid(CMD_NONE, dev, DEV_DEVMAP, vecs->pathvec, &refwwid);
-
-	if (refwwid) {
-		r = coalesce_paths(vecs, NULL, refwwid, FORCE_RELOAD_NONE,
-				   CMD_NONE);
-		dm_lib_release();
+	if ((mpp = add_map_without_path(vecs, alias))) {
+		sync_map_state(mpp);
+		condlog(2, "%s: devmap %s registered", alias, dev);
+		return 0;
+	} else {
+		condlog(2, "%s: ev_add_map failed", dev);
+		return 1;
 	}
-
-	if (!r)
-		condlog(2, "%s: devmap %s added", alias, dev);
-	else if (r == 2)
-		condlog(2, "%s: uev_add_map %s blacklisted", alias, dev);
-	else
-		condlog(0, "%s: uev_add_map %s failed", alias, dev);
-
-	FREE(refwwid);
-	return r;
 }
 
 static int
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 4/7] multipathd: remove unused configure parameter
  2018-02-08 23:56 [PATCH v2 0/7] multipath: miscellaneous bug fixes Benjamin Marzinski
                   ` (2 preceding siblings ...)
  2018-02-08 23:56 ` [PATCH v2 3/7] multipathd: remove coalesce_paths from ev_add_map Benjamin Marzinski
@ 2018-02-08 23:56 ` Benjamin Marzinski
  2018-02-08 23:56 ` [PATCH v2 5/7] Fix set_no_path_retry() regression Benjamin Marzinski
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 25+ messages in thread
From: Benjamin Marzinski @ 2018-02-08 23:56 UTC (permalink / raw)
  To: device-mapper development

configure() is always called with start_waiters=1, so there is no point
in having the parameter. Remove it.

Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 multipathd/main.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/multipathd/main.c b/multipathd/main.c
index dbf9890..51e0f5e 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -1955,7 +1955,7 @@ checkerloop (void *ap)
 }
 
 int
-configure (struct vectors * vecs, int start_waiters)
+configure (struct vectors * vecs)
 {
 	struct multipath * mpp;
 	struct path * pp;
@@ -2054,11 +2054,9 @@ configure (struct vectors * vecs, int start_waiters)
 			i--;
 			continue;
 		}
-		if (start_waiters) {
-			if (start_waiter_thread(mpp, vecs)) {
-				remove_map(mpp, vecs, 1);
-				i--;
-			}
+		if (start_waiter_thread(mpp, vecs)) {
+			remove_map(mpp, vecs, 1);
+			i--;
 		}
 	}
 	return 0;
@@ -2125,7 +2123,7 @@ reconfigure (struct vectors * vecs)
 	rcu_assign_pointer(multipath_conf, conf);
 	call_rcu(&old->rcu, rcu_free_config);
 
-	configure(vecs, 1);
+	configure(vecs);
 
 
 	return 0;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 5/7] Fix set_no_path_retry() regression
  2018-02-08 23:56 [PATCH v2 0/7] multipath: miscellaneous bug fixes Benjamin Marzinski
                   ` (3 preceding siblings ...)
  2018-02-08 23:56 ` [PATCH v2 4/7] multipathd: remove unused configure parameter Benjamin Marzinski
@ 2018-02-08 23:56 ` Benjamin Marzinski
  2018-02-12 20:13   ` Martin Wilck
  2018-02-08 23:56 ` [PATCH v2 6/7] multipathd: change spurious uevent msg priority Benjamin Marzinski
  2018-02-08 23:56 ` [PATCH v2 7/7] multipath: print sysfs state in fast list mode Benjamin Marzinski
  6 siblings, 1 reply; 25+ messages in thread
From: Benjamin Marzinski @ 2018-02-08 23:56 UTC (permalink / raw)
  To: device-mapper development; +Cc: Martin Wilck

commit 0f850db7fceb6b2bf4968f3831efd250c17c6138 "multipathd: clean up
set_no_path_retry" has a bug in it. It made set_no_path_retry
never reset mpp->retry_ticks, even if the device was in recovery mode,
and there were valid paths. This meant that adding new paths didn't
remove a device from recovery mode, and queueing could get disabled,
even while there were valid paths. This patch fixes that.

This patch also fixes a bug in cli_restore_queueing() and
cli_restore_all_queueing(), where a device that had no_path_retry
set to "queue" would enter recovery mode (although queueing would
never actually get disabled).

Cc: Martin Wilck <mwilck@suse.com>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/structs_vec.c |  5 +++--
 multipathd/cli_handlers.c  | 20 ++++++++++++--------
 2 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/libmultipath/structs_vec.c b/libmultipath/structs_vec.c
index fbab61f..0de2221 100644
--- a/libmultipath/structs_vec.c
+++ b/libmultipath/structs_vec.c
@@ -343,9 +343,10 @@ static void set_no_path_retry(struct multipath *mpp)
 			dm_queue_if_no_path(mpp->alias, 1);
 		break;
 	default:
-		if (mpp->nr_active > 0)
+		if (mpp->nr_active > 0) {
+			mpp->retry_tick = 0;
 			dm_queue_if_no_path(mpp->alias, 1);
-		else if (is_queueing && mpp->retry_tick == 0)
+		} else if (is_queueing && mpp->retry_tick == 0)
 			enter_recovery_mode(mpp);
 		break;
 	}
diff --git a/multipathd/cli_handlers.c b/multipathd/cli_handlers.c
index 7f13bc9..80519b1 100644
--- a/multipathd/cli_handlers.c
+++ b/multipathd/cli_handlers.c
@@ -995,10 +995,12 @@ cli_restore_queueing(void *v, char **reply, int *len, void *data)
 	if (mpp->no_path_retry != NO_PATH_RETRY_UNDEF &&
 			mpp->no_path_retry != NO_PATH_RETRY_FAIL) {
 		dm_queue_if_no_path(mpp->alias, 1);
-		if (mpp->nr_active > 0)
-			mpp->retry_tick = 0;
-		else
-			enter_recovery_mode(mpp);
+		if (mpp->no_path_retry > 0) {
+			if (mpp->nr_active > 0)
+				mpp->retry_tick = 0;
+			else
+				enter_recovery_mode(mpp);
+		}
 	}
 	return 0;
 }
@@ -1019,10 +1021,12 @@ cli_restore_all_queueing(void *v, char **reply, int *len, void *data)
 		if (mpp->no_path_retry != NO_PATH_RETRY_UNDEF &&
 		    mpp->no_path_retry != NO_PATH_RETRY_FAIL) {
 			dm_queue_if_no_path(mpp->alias, 1);
-			if (mpp->nr_active > 0)
-				mpp->retry_tick = 0;
-			else
-				enter_recovery_mode(mpp);
+			if (mpp->no_path_retry > 0) {
+				if (mpp->nr_active > 0)
+					mpp->retry_tick = 0;
+				else
+					enter_recovery_mode(mpp);
+			}
 		}
 	}
 	return 0;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 6/7] multipathd: change spurious uevent msg priority
  2018-02-08 23:56 [PATCH v2 0/7] multipath: miscellaneous bug fixes Benjamin Marzinski
                   ` (4 preceding siblings ...)
  2018-02-08 23:56 ` [PATCH v2 5/7] Fix set_no_path_retry() regression Benjamin Marzinski
@ 2018-02-08 23:56 ` Benjamin Marzinski
  2018-02-08 23:56 ` [PATCH v2 7/7] multipath: print sysfs state in fast list mode Benjamin Marzinski
  6 siblings, 0 replies; 25+ messages in thread
From: Benjamin Marzinski @ 2018-02-08 23:56 UTC (permalink / raw)
  To: device-mapper development

The "spurious uevent, path already in pathvec" is not anything to worry
about, so it should not have the error priority.

Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 multipathd/main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/multipathd/main.c b/multipathd/main.c
index 51e0f5e..7ac59d9 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -562,7 +562,7 @@ uev_add_path (struct uevent *uev, struct vectors * vecs, int need_do_map)
 	if (pp) {
 		int r;
 
-		condlog(0, "%s: spurious uevent, path already in pathvec",
+		condlog(2, "%s: spurious uevent, path already in pathvec",
 			uev->kernel);
 		if (!pp->mpp && !strlen(pp->wwid)) {
 			condlog(3, "%s: reinitialize path", uev->kernel);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 7/7] multipath: print sysfs state in fast list mode
  2018-02-08 23:56 [PATCH v2 0/7] multipath: miscellaneous bug fixes Benjamin Marzinski
                   ` (5 preceding siblings ...)
  2018-02-08 23:56 ` [PATCH v2 6/7] multipathd: change spurious uevent msg priority Benjamin Marzinski
@ 2018-02-08 23:56 ` Benjamin Marzinski
  2018-02-12 20:33   ` Martin Wilck
  6 siblings, 1 reply; 25+ messages in thread
From: Benjamin Marzinski @ 2018-02-08 23:56 UTC (permalink / raw)
  To: device-mapper development; +Cc: Martin Wilck

commit b123e711ea2a4b471a98ff5e26815df3773636b5 "libmultipath: cleanup
orphan device states" stopped multipathd from showing old state for
orphan paths by checking if pp->mpp was set, and only printing the state
if it was.   Unfortunately, when "multipath -l" is run, pp->mpp isn't
set. While the checker state isn't checked and shouldn't be printed with
"-l", the sysfs state can be, and was before b123e711. This patch sets
pp->mpp in fast list mode, so that the sysfs state gets printed. It
also verifies that the path exists in sysfs, and if not, marks it as
faulty.

Cc: Martin Wilck <mwilck@suse.com>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 multipath/main.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/multipath/main.c b/multipath/main.c
index bffe065..d2415a9 100644
--- a/multipath/main.c
+++ b/multipath/main.c
@@ -149,7 +149,7 @@ usage (char * progname)
 }
 
 static int
-update_paths (struct multipath * mpp)
+update_paths (struct multipath * mpp, int quick)
 {
 	int i, j;
 	struct pathgroup * pgp;
@@ -171,9 +171,12 @@ update_paths (struct multipath * mpp)
 					 * path is not in sysfs anymore
 					 */
 					pp->chkrstate = pp->state = PATH_DOWN;
+					pp->offline = 1;
 					continue;
 				}
 				pp->mpp = mpp;
+				if (quick)
+					continue;
 				conf = get_multipath_config();
 				if (pathinfo(pp, conf, DI_ALL))
 					pp->state = PATH_UNCHECKED;
@@ -181,6 +184,8 @@ update_paths (struct multipath * mpp)
 				continue;
 			}
 			pp->mpp = mpp;
+			if (quick)
+				continue;
 			if (pp->state == PATH_UNCHECKED ||
 			    pp->state == PATH_WILD) {
 				conf = get_multipath_config();
@@ -238,8 +243,7 @@ get_dm_mpvec (enum mpath_cmds cmd, vector curmp, vector pathvec, char * refwwid)
 		 * If not in "fast list mode", we need to fetch information
 		 * about them
 		 */
-		if (cmd != CMD_LIST_SHORT)
-			update_paths(mpp);
+		update_paths(mpp, (cmd == CMD_LIST_SHORT));
 
 		if (cmd == CMD_LIST_LONG)
 			mpp->bestpg = select_path_group(mpp);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 1/7] libmultipath: fix tur checker locking
  2018-02-08 23:56 ` [PATCH v2 1/7] libmultipath: fix tur checker locking Benjamin Marzinski
@ 2018-02-09 16:15   ` Bart Van Assche
  2018-02-09 17:26     ` Benjamin Marzinski
  2018-02-09 20:30   ` Martin Wilck
  1 sibling, 1 reply; 25+ messages in thread
From: Bart Van Assche @ 2018-02-09 16:15 UTC (permalink / raw)
  To: bmarzins, dm-devel; +Cc: mwilck

On Thu, 2018-02-08 at 17:56 -0600, Benjamin Marzinski wrote:
>  static void cleanup_func(void *data)
>  {
> -	int holders;
> +	int running, holders;
>  	struct tur_checker_context *ct = data;
> -	pthread_spin_lock(&ct->hldr_lock);
> -	ct->holders--;
> -	holders = ct->holders;
> -	ct->thread = 0;
> -	pthread_spin_unlock(&ct->hldr_lock);
> +
> +	running = uatomic_xchg(&ct->running, 0);
> +	holders = uatomic_sub_return(&ct->holders, 1);
>  	if (!holders)
>  		cleanup_context(ct);
> +	if (!running)
> +		pause();
>  }

Hello Ben,

Why has the pause() call been added? I think it is safe to call pthread_cancel()
for a non-detached thread that has finished so I don't think that pause() call
is necessary.
 
>  static int tur_running(struct tur_checker_context *ct)
>  {
> -	pthread_t thread;
> -
> -	pthread_spin_lock(&ct->hldr_lock);
> -	thread = ct->thread;
> -	pthread_spin_unlock(&ct->hldr_lock);
> -
> -	return thread != 0;
> +	return (uatomic_read(&ct->running) != 0);
>  }

Is such a one line function really useful? I think the code will be easier to read
if this function is inlined into its callers.

> @@ -418,8 +396,12 @@ int libcheck_check(struct checker * c)
>  		    (tur_status == PATH_PENDING || tur_status == PATH_UNCHECKED)) {
>  			condlog(3, "%s: tur checker still running",
>  				tur_devt(devt, sizeof(devt), ct));
> -			ct->running = 1;
>  			tur_status = PATH_PENDING;
> +		} else {
> +			int running = uatomic_xchg(&ct->running, 0);
> +			if (running)
> +				pthread_cancel(ct->thread);
> +			ct->thread = 0;
>  		}
>  	}

Why has this pthread_cancel() call been added? I think that else clause can only be
reached if ct->running == 0 so I don't think that the pthread_cancel() call will ever
be reached.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 1/7] libmultipath: fix tur checker locking
  2018-02-09 16:15   ` Bart Van Assche
@ 2018-02-09 17:26     ` Benjamin Marzinski
  2018-02-09 17:42       ` Bart Van Assche
  0 siblings, 1 reply; 25+ messages in thread
From: Benjamin Marzinski @ 2018-02-09 17:26 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: dm-devel, mwilck

On Fri, Feb 09, 2018 at 04:15:34PM +0000, Bart Van Assche wrote:
> On Thu, 2018-02-08 at 17:56 -0600, Benjamin Marzinski wrote:
> >  static void cleanup_func(void *data)
> >  {
> > -	int holders;
> > +	int running, holders;
> >  	struct tur_checker_context *ct = data;
> > -	pthread_spin_lock(&ct->hldr_lock);
> > -	ct->holders--;
> > -	holders = ct->holders;
> > -	ct->thread = 0;
> > -	pthread_spin_unlock(&ct->hldr_lock);
> > +
> > +	running = uatomic_xchg(&ct->running, 0);
> > +	holders = uatomic_sub_return(&ct->holders, 1);
> >  	if (!holders)
> >  		cleanup_context(ct);
> > +	if (!running)
> > +		pause();
> >  }
> 
> Hello Ben,
> 
> Why has the pause() call been added? I think it is safe to call pthread_cancel()
> for a non-detached thread that has finished so I don't think that pause() call
> is necessary.

Martin objected to having the threads getting detached as part of
cancelling them (I think. I'm a little fuzzy on what he didn't like).
But he definitely said he preferred the thread to start detached, so in
this version, it does.  That's why we need the pause().  If he's fine with
the threads getting detached later, I will happily replace the pause()
with

if (running)
	pthread_detach(pthread_self());

and add pthread_detach(ct->thread) after the calls to
pthread_cancel(ct->thread). Otherwise we need the pause() to solve your
original bug.

As an aside, Martin, if your problem is the thread detaching itself, we
can skip that if we are fine with a zombie thread hanging around until
the next time we call libcheck_check() or libcheck_free(). Then the
checker can always be in charge of detaching the thread.
 
> >  static int tur_running(struct tur_checker_context *ct)
> >  {
> > -	pthread_t thread;
> > -
> > -	pthread_spin_lock(&ct->hldr_lock);
> > -	thread = ct->thread;
> > -	pthread_spin_unlock(&ct->hldr_lock);
> > -
> > -	return thread != 0;
> > +	return (uatomic_read(&ct->running) != 0);
> >  }
> 
> Is such a one line function really useful?

Nope. I just left it there to keep the number of changes that the patch
makes lower, to make it more straightforward to review. I'm fine will
inlining it.

> I think the code will be easier to read if this function is inlined
> into its callers.

> > @@ -418,8 +396,12 @@ int libcheck_check(struct checker * c)
> >  		    (tur_status == PATH_PENDING || tur_status == PATH_UNCHECKED)) {
> >  			condlog(3, "%s: tur checker still running",
> >  				tur_devt(devt, sizeof(devt), ct));
> > -			ct->running = 1;
> >  			tur_status = PATH_PENDING;
> > +		} else {
> > +			int running = uatomic_xchg(&ct->running, 0);
> > +			if (running)
> > +				pthread_cancel(ct->thread);
> > +			ct->thread = 0;
> >  		}
> >  	}
> 
> Why has this pthread_cancel() call been added? I think that else clause can only be
> reached if ct->running == 0 so I don't think that the pthread_cancel() call will ever
> be reached.

It can be reached if ct->running is 1, as long as tur_status has been
updated.  In practice this means that the thread should have done
everything it needs to do, and all that's left is for it to shutdown.
However, if the thread doesn't shut itself down before the next time you
call libcheck_check(), the checker will give up and return PATH_TIMEOUT.

It seems pretty unlikely that this will happen, since there should be a
significant delay before calling libcheck_check() again. This
theoretical race has been in the code for a while, and AFAIK, it's never
occured. But there definitely is the possiblity that the thread will
still be running at the end of libcheck_check(), and it doesn't hurt
things to forceably shut the thread down, if it is. 

-Ben

> Thanks,
> 
> Bart.
> 
> 
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 1/7] libmultipath: fix tur checker locking
  2018-02-09 17:26     ` Benjamin Marzinski
@ 2018-02-09 17:42       ` Bart Van Assche
  0 siblings, 0 replies; 25+ messages in thread
From: Bart Van Assche @ 2018-02-09 17:42 UTC (permalink / raw)
  To: bmarzins; +Cc: dm-devel, mwilck

On Fri, 2018-02-09 at 11:26 -0600, Benjamin Marzinski wrote:
> On Fri, Feb 09, 2018 at 04:15:34PM +0000, Bart Van Assche wrote:
> > On Thu, 2018-02-08 at 17:56 -0600, Benjamin Marzinski wrote:
> > >  static void cleanup_func(void *data)
> > >  {
> > > -	int holders;
> > > +	int running, holders;
> > >  	struct tur_checker_context *ct = data;
> > > -	pthread_spin_lock(&ct->hldr_lock);
> > > -	ct->holders--;
> > > -	holders = ct->holders;
> > > -	ct->thread = 0;
> > > -	pthread_spin_unlock(&ct->hldr_lock);
> > > +
> > > +	running = uatomic_xchg(&ct->running, 0);
> > > +	holders = uatomic_sub_return(&ct->holders, 1);
> > >  	if (!holders)
> > >  		cleanup_context(ct);
> > > +	if (!running)
> > > +		pause();
> > >  }
> > 
> > Hello Ben,
> > 
> > Why has the pause() call been added? I think it is safe to call pthread_cancel()
> > for a non-detached thread that has finished so I don't think that pause() call
> > is necessary.
> 
> Martin objected to having the threads getting detached as part of
> cancelling them (I think. I'm a little fuzzy on what he didn't like).
> But he definitely said he preferred the thread to start detached, so in
> this version, it does.  That's why we need the pause().  If he's fine with
> the threads getting detached later, I will happily replace the pause()
> with
> 
> if (running)
> 	pthread_detach(pthread_self());
> 
> and add pthread_detach(ct->thread) after the calls to
> pthread_cancel(ct->thread). Otherwise we need the pause() to solve your
> original bug.

Ah, thanks, I had overlooked that the tur checker detaches the checker thread. Have
you considered to add a comment above the pause() call that explains the purpose of
that call?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 1/7] libmultipath: fix tur checker locking
  2018-02-08 23:56 ` [PATCH v2 1/7] libmultipath: fix tur checker locking Benjamin Marzinski
  2018-02-09 16:15   ` Bart Van Assche
@ 2018-02-09 20:30   ` Martin Wilck
  2018-02-09 23:04     ` Benjamin Marzinski
  1 sibling, 1 reply; 25+ messages in thread
From: Martin Wilck @ 2018-02-09 20:30 UTC (permalink / raw)
  To: Benjamin Marzinski, device-mapper development; +Cc: Bart Van Assche

On Thu, 2018-02-08 at 17:56 -0600, Benjamin Marzinski wrote:
> ct->running is now an atomic variable.  When the thread is started
> it is set to 1. When the checker wants to kill a thread, it
> atomically
> sets the value to 0 and reads the previous value.  If it was 1,
> the checker cancels the thread. If it was 0, the nothing needs to be
> done.  After the checker has dealt with the thread, it sets ct-
> >thread
> to NULL.
> 
> When the thread is done, it atomicalllys sets the value of ct-
> >running
> to 0 and reads the previous value. If it was 1, the thread just
> exits.
> If it was 0, then the checker is trying to cancel the thread, and so
> the thread calls pause(), which is a cancellation point.
> 

I'm missing one thing here. My poor brain is aching.

cleanup_func() can be entered in two ways: a) if the thread has been
cancelled and passed a cancellation point already, or b) if it exits
normally and calls pthread_cleanup_pop(). 
In case b), waiting for the cancellation request by calling pause()
makes sense to me. But in case a), the thread has already seen the
cancellation request - wouldn't calling pause() cause it to sleep
forever?

Martin

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 1/7] libmultipath: fix tur checker locking
  2018-02-09 20:30   ` Martin Wilck
@ 2018-02-09 23:04     ` Benjamin Marzinski
  2018-02-09 23:28       ` Martin Wilck
  0 siblings, 1 reply; 25+ messages in thread
From: Benjamin Marzinski @ 2018-02-09 23:04 UTC (permalink / raw)
  To: Martin Wilck; +Cc: Bart Van Assche, device-mapper development

On Fri, Feb 09, 2018 at 09:30:56PM +0100, Martin Wilck wrote:
> On Thu, 2018-02-08 at 17:56 -0600, Benjamin Marzinski wrote:
> > ct->running is now an atomic variable.  When the thread is started
> > it is set to 1. When the checker wants to kill a thread, it
> > atomically
> > sets the value to 0 and reads the previous value.  If it was 1,
> > the checker cancels the thread. If it was 0, the nothing needs to be
> > done.  After the checker has dealt with the thread, it sets ct-
> > >thread
> > to NULL.
> > 
> > When the thread is done, it atomicalllys sets the value of ct-
> > >running
> > to 0 and reads the previous value. If it was 1, the thread just
> > exits.
> > If it was 0, then the checker is trying to cancel the thread, and so
> > the thread calls pause(), which is a cancellation point.
> > 
> 
> I'm missing one thing here. My poor brain is aching.
> 
> cleanup_func() can be entered in two ways: a) if the thread has been
> cancelled and passed a cancellation point already, or b) if it exits
> normally and calls pthread_cleanup_pop(). 
> In case b), waiting for the cancellation request by calling pause()
> makes sense to me. But in case a), the thread has already seen the
> cancellation request - wouldn't calling pause() cause it to sleep
> forever?

Urgh. You're right. If it is in the cleanup helper because it already
has been cancelled, then the pause isn't going get cancelled. So much
for my quick rewrite.

That leaves three options.

1. have either the thread or the checker detach the thread (depending
   on which one exits first)
2. make the checker always cancel and detach the thread. This simplifies
   the code, but there will zombie threads hanging around between calls
   to the checker.
3. just move the condlog

I really don't care which one we pick anymore.

-Ben

> 
> Martin
> 
> -- 
> Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
> HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 1/7] libmultipath: fix tur checker locking
  2018-02-09 23:04     ` Benjamin Marzinski
@ 2018-02-09 23:28       ` Martin Wilck
  2018-02-09 23:36         ` Martin Wilck
  0 siblings, 1 reply; 25+ messages in thread
From: Martin Wilck @ 2018-02-09 23:28 UTC (permalink / raw)
  To: Benjamin Marzinski; +Cc: Bart Van Assche, device-mapper development

[-- Attachment #1: Type: text/plain, Size: 2603 bytes --]

On Fri, 2018-02-09 at 17:04 -0600, Benjamin Marzinski wrote:
> On Fri, Feb 09, 2018 at 09:30:56PM +0100, Martin Wilck wrote:
> > On Thu, 2018-02-08 at 17:56 -0600, Benjamin Marzinski wrote:
> > > ct->running is now an atomic variable.  When the thread is
> > > started
> > > it is set to 1. When the checker wants to kill a thread, it
> > > atomically
> > > sets the value to 0 and reads the previous value.  If it was 1,
> > > the checker cancels the thread. If it was 0, the nothing needs to
> > > be
> > > done.  After the checker has dealt with the thread, it sets ct-
> > > > thread
> > > 
> > > to NULL.
> > > 
> > > When the thread is done, it atomicalllys sets the value of ct-
> > > > running
> > > 
> > > to 0 and reads the previous value. If it was 1, the thread just
> > > exits.
> > > If it was 0, then the checker is trying to cancel the thread, and
> > > so
> > > the thread calls pause(), which is a cancellation point.
> > > 
> > 
> > I'm missing one thing here. My poor brain is aching.
> > 
> > cleanup_func() can be entered in two ways: a) if the thread has
> > been
> > cancelled and passed a cancellation point already, or b) if it
> > exits
> > normally and calls pthread_cleanup_pop(). 
> > In case b), waiting for the cancellation request by calling pause()
> > makes sense to me. But in case a), the thread has already seen the
> > cancellation request - wouldn't calling pause() cause it to sleep
> > forever?
> 
> Urgh. You're right. If it is in the cleanup helper because it already
> has been cancelled, then the pause isn't going get cancelled. So much
> for my quick rewrite.

Maybe it's easier than we thought. Attached is a patch on top of yours
that I think might work, please have a look. 

It's quite late here, so I'll need to ponder your alternatives below
the other day.

Cheers
Martin

> 
> That leaves three options.
> 
> 1. have either the thread or the checker detach the thread (depending
>    on which one exits first)
> 2. make the checker always cancel and detach the thread. This
> simplifies
>    the code, but there will zombie threads hanging around between
> calls
>    to the checker.
> 3. just move the condlog
> 
> I really don't care which one we pick anymore.
> 
> -Ben
> 
> > 
> > Martin
> > 
> > -- 
> > Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
> > SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham
> > Norton
> > HRB 21284 (AG Nürnberg)
> 
> 

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)

[-- Attachment #2: 0001-tur-checker-make-sure-pthread_cancel-isn-t-called-fo.patch --]
[-- Type: text/x-patch, Size: 1352 bytes --]

From 831ef27b41858fa248201b74f2dd8ea5b7c4aece Mon Sep 17 00:00:00 2001
From: Martin Wilck <mwilck@suse.com>
Date: Sat, 10 Feb 2018 00:22:17 +0100
Subject: [PATCH] tur checker: make sure pthread_cancel isn't called for exited
 thread

If we enter the cleanup function as the result of a pthread_cancel by another
thread, we don't need to wait for a cancellation any more. If we exit
regularly, just tell the other thread not to try to cancel us.
---
 libmultipath/checkers/tur.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/libmultipath/checkers/tur.c b/libmultipath/checkers/tur.c
index 894ad41c89c3..31a87d2b5cf2 100644
--- a/libmultipath/checkers/tur.c
+++ b/libmultipath/checkers/tur.c
@@ -221,8 +221,6 @@ static void cleanup_func(void *data)
 	holders = uatomic_sub_return(&ct->holders, 1);
 	if (!holders)
 		cleanup_context(ct);
-	if (!running)
-		pause();
 }
 
 static int tur_running(struct tur_checker_context *ct)
@@ -266,6 +264,9 @@ static void *tur_thread(void *ctx)
 	pthread_cond_signal(&ct->active);
 	pthread_mutex_unlock(&ct->lock);
 
+	/* Tell main checker thread not to cancel us, as we exit anyway */
+	running = uatomic_xchg(&ct->running, 0);
+
 	condlog(3, "%s: tur checker finished, state %s",
 		tur_devt(devt, sizeof(devt), ct), checker_state_name(state));
 	tur_thread_cleanup_pop(ct);
-- 
2.16.1


[-- Attachment #3: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 1/7] libmultipath: fix tur checker locking
  2018-02-09 23:28       ` Martin Wilck
@ 2018-02-09 23:36         ` Martin Wilck
  2018-02-10  0:17           ` Benjamin Marzinski
  2018-02-10  0:36           ` Benjamin Marzinski
  0 siblings, 2 replies; 25+ messages in thread
From: Martin Wilck @ 2018-02-09 23:36 UTC (permalink / raw)
  To: Benjamin Marzinski; +Cc: Bart Van Assche, device-mapper development

[-- Attachment #1: Type: text/plain, Size: 401 bytes --]

On Sat, 2018-02-10 at 00:28 +0100, Martin Wilck wrote:
> Maybe it's easier than we thought. Attached is a patch on top of
> yours that I think might work, please have a look. 
> 

That one didn't even compile. This one is better.

Martin

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)

[-- Attachment #2: 0001-tur-checker-make-sure-pthread_cancel-isn-t-called-fo.patch --]
[-- Type: text/x-patch, Size: 1517 bytes --]

From afb9c7de3658d49c4f28f6b9ee618a87b806ecdd Mon Sep 17 00:00:00 2001
From: Martin Wilck <mwilck@suse.com>
Date: Sat, 10 Feb 2018 00:22:17 +0100
Subject: [PATCH] tur checker: make sure pthread_cancel isn't called for exited
 thread

If we enter the cleanup function as the result of a pthread_cancel by another
thread, we don't need to wait for a cancellation any more. If we exit
regularly, just tell the other thread not to try to cancel us.
---
 libmultipath/checkers/tur.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/libmultipath/checkers/tur.c b/libmultipath/checkers/tur.c
index 894ad41c89c3..5d2b36bfa883 100644
--- a/libmultipath/checkers/tur.c
+++ b/libmultipath/checkers/tur.c
@@ -214,15 +214,13 @@ retry:
 
 static void cleanup_func(void *data)
 {
-	int running, holders;
+	int holders;
 	struct tur_checker_context *ct = data;
 
-	running = uatomic_xchg(&ct->running, 0);
+	uatomic_set(&ct->running, 0);
 	holders = uatomic_sub_return(&ct->holders, 1);
 	if (!holders)
 		cleanup_context(ct);
-	if (!running)
-		pause();
 }
 
 static int tur_running(struct tur_checker_context *ct)
@@ -266,6 +264,9 @@ static void *tur_thread(void *ctx)
 	pthread_cond_signal(&ct->active);
 	pthread_mutex_unlock(&ct->lock);
 
+	/* Tell main checker thread not to cancel us, as we exit anyway */
+	uatomic_set(&ct->running, 0);
+
 	condlog(3, "%s: tur checker finished, state %s",
 		tur_devt(devt, sizeof(devt), ct), checker_state_name(state));
 	tur_thread_cleanup_pop(ct);
-- 
2.16.1


[-- Attachment #3: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 1/7] libmultipath: fix tur checker locking
  2018-02-09 23:36         ` Martin Wilck
@ 2018-02-10  0:17           ` Benjamin Marzinski
  2018-02-10 16:03             ` Martin Wilck
  2018-02-10  0:36           ` Benjamin Marzinski
  1 sibling, 1 reply; 25+ messages in thread
From: Benjamin Marzinski @ 2018-02-10  0:17 UTC (permalink / raw)
  To: Martin Wilck; +Cc: Bart Van Assche, device-mapper development

On Sat, Feb 10, 2018 at 12:36:05AM +0100, Martin Wilck wrote:
> On Sat, 2018-02-10 at 00:28 +0100, Martin Wilck wrote:
> > Maybe it's easier than we thought. Attached is a patch on top of
> > yours that I think might work, please have a look. 
> > 
> 
> That one didn't even compile. This one is better.
> 
> Martin

So if we have this ordering

- checker calls uatomic_xchg() which returns 1 and then gets scheduled
- thread calls uatomic_set() and then runs till it terminates
- checker calls pthread_cancel()

You will get Bart's original bug.  I realize that having the condlog()
after the uatomic_set() in the thread makes this unlikely, but I don't
races like this. I would be happier with simply taking the original code
and moving the condlog(), if neither of my other two options are
acceptable.

-Ben

> 
> -- 
> Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
> HRB 21284 (AG Nürnberg)

> From afb9c7de3658d49c4f28f6b9ee618a87b806ecdd Mon Sep 17 00:00:00 2001
> From: Martin Wilck <mwilck@suse.com>
> Date: Sat, 10 Feb 2018 00:22:17 +0100
> Subject: [PATCH] tur checker: make sure pthread_cancel isn't called for exited
>  thread
> 
> If we enter the cleanup function as the result of a pthread_cancel by another
> thread, we don't need to wait for a cancellation any more. If we exit
> regularly, just tell the other thread not to try to cancel us.
> ---
>  libmultipath/checkers/tur.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/libmultipath/checkers/tur.c b/libmultipath/checkers/tur.c
> index 894ad41c89c3..5d2b36bfa883 100644
> --- a/libmultipath/checkers/tur.c
> +++ b/libmultipath/checkers/tur.c
> @@ -214,15 +214,13 @@ retry:
>  
>  static void cleanup_func(void *data)
>  {
> -	int running, holders;
> +	int holders;
>  	struct tur_checker_context *ct = data;
>  
> -	running = uatomic_xchg(&ct->running, 0);
> +	uatomic_set(&ct->running, 0);
>  	holders = uatomic_sub_return(&ct->holders, 1);
>  	if (!holders)
>  		cleanup_context(ct);
> -	if (!running)
> -		pause();
>  }
>  
>  static int tur_running(struct tur_checker_context *ct)
> @@ -266,6 +264,9 @@ static void *tur_thread(void *ctx)
>  	pthread_cond_signal(&ct->active);
>  	pthread_mutex_unlock(&ct->lock);
>  
> +	/* Tell main checker thread not to cancel us, as we exit anyway */
> +	uatomic_set(&ct->running, 0);
> +
>  	condlog(3, "%s: tur checker finished, state %s",
>  		tur_devt(devt, sizeof(devt), ct), checker_state_name(state));
>  	tur_thread_cleanup_pop(ct);
> -- 
> 2.16.1
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 1/7] libmultipath: fix tur checker locking
  2018-02-09 23:36         ` Martin Wilck
  2018-02-10  0:17           ` Benjamin Marzinski
@ 2018-02-10  0:36           ` Benjamin Marzinski
  2018-02-10 16:11             ` Martin Wilck
  1 sibling, 1 reply; 25+ messages in thread
From: Benjamin Marzinski @ 2018-02-10  0:36 UTC (permalink / raw)
  To: Martin Wilck; +Cc: Bart Van Assche, device-mapper development

On Sat, Feb 10, 2018 at 12:36:05AM +0100, Martin Wilck wrote:
> On Sat, 2018-02-10 at 00:28 +0100, Martin Wilck wrote:
> > Maybe it's easier than we thought. Attached is a patch on top of
> > yours that I think might work, please have a look. 
> > 
> 
> That one didn't even compile. This one is better.
> 
> Martin

How about this one instead. The idea is that once we are in the cleanup
handler, we just cleanup and exit. But before we enter it, we atomically
exchange running, and if running was 0, we pause(), since the checker is
either about to cancel us, or already has.

diff --git a/libmultipath/checkers/tur.c b/libmultipath/checkers/tur.c
index 894ad41..3774a17 100644
--- a/libmultipath/checkers/tur.c
+++ b/libmultipath/checkers/tur.c
@@ -214,15 +214,12 @@ retry:
 
 static void cleanup_func(void *data)
 {
-	int running, holders;
+	int holders;
 	struct tur_checker_context *ct = data;
 
-	running = uatomic_xchg(&ct->running, 0);
 	holders = uatomic_sub_return(&ct->holders, 1);
 	if (!holders)
 		cleanup_context(ct);
-	if (!running)
-		pause();
 }
 
 static int tur_running(struct tur_checker_context *ct)
@@ -242,7 +239,7 @@ static void copy_msg_to_tcc(void *ct_p, const char *msg)
 static void *tur_thread(void *ctx)
 {
 	struct tur_checker_context *ct = ctx;
-	int state;
+	int state, running;
 	char devt[32];
 
 	condlog(3, "%s: tur checker starting up",
@@ -268,6 +265,11 @@ static void *tur_thread(void *ctx)
 
 	condlog(3, "%s: tur checker finished, state %s",
 		tur_devt(devt, sizeof(devt), ct), checker_state_name(state));
+
+	running = uatomic_xchg(&ct->running, 0);
+	if (!running)
+		pause();
+
 	tur_thread_cleanup_pop(ct);
 
 	return ((void *)0);


> 
> -- 
> Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
> HRB 21284 (AG Nürnberg)

> From afb9c7de3658d49c4f28f6b9ee618a87b806ecdd Mon Sep 17 00:00:00 2001
> From: Martin Wilck <mwilck@suse.com>
> Date: Sat, 10 Feb 2018 00:22:17 +0100
> Subject: [PATCH] tur checker: make sure pthread_cancel isn't called for exited
>  thread
> 
> If we enter the cleanup function as the result of a pthread_cancel by another
> thread, we don't need to wait for a cancellation any more. If we exit
> regularly, just tell the other thread not to try to cancel us.
> ---
>  libmultipath/checkers/tur.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/libmultipath/checkers/tur.c b/libmultipath/checkers/tur.c
> index 894ad41c89c3..5d2b36bfa883 100644
> --- a/libmultipath/checkers/tur.c
> +++ b/libmultipath/checkers/tur.c
> @@ -214,15 +214,13 @@ retry:
>  
>  static void cleanup_func(void *data)
>  {
> -	int running, holders;
> +	int holders;
>  	struct tur_checker_context *ct = data;
>  
> -	running = uatomic_xchg(&ct->running, 0);
> +	uatomic_set(&ct->running, 0);
>  	holders = uatomic_sub_return(&ct->holders, 1);
>  	if (!holders)
>  		cleanup_context(ct);
> -	if (!running)
> -		pause();
>  }
>  
>  static int tur_running(struct tur_checker_context *ct)
> @@ -266,6 +264,9 @@ static void *tur_thread(void *ctx)
>  	pthread_cond_signal(&ct->active);
>  	pthread_mutex_unlock(&ct->lock);
>  
> +	/* Tell main checker thread not to cancel us, as we exit anyway */
> +	uatomic_set(&ct->running, 0);
> +
>  	condlog(3, "%s: tur checker finished, state %s",
>  		tur_devt(devt, sizeof(devt), ct), checker_state_name(state));
>  	tur_thread_cleanup_pop(ct);
> -- 
> 2.16.1
> 

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 1/7] libmultipath: fix tur checker locking
  2018-02-10  0:17           ` Benjamin Marzinski
@ 2018-02-10 16:03             ` Martin Wilck
  0 siblings, 0 replies; 25+ messages in thread
From: Martin Wilck @ 2018-02-10 16:03 UTC (permalink / raw)
  To: Benjamin Marzinski; +Cc: Bart Van Assche, device-mapper development

On Fri, 2018-02-09 at 18:17 -0600, Benjamin Marzinski wrote:
> On Sat, Feb 10, 2018 at 12:36:05AM +0100, Martin Wilck wrote:
> > On Sat, 2018-02-10 at 00:28 +0100, Martin Wilck wrote:
> > > Maybe it's easier than we thought. Attached is a patch on top of
> > > yours that I think might work, please have a look. 
> > > 
> > 
> > That one didn't even compile. This one is better.
> > 
> > Martin
> 
> So if we have this ordering
> 
> - checker calls uatomic_xchg() which returns 1 and then gets
> scheduled
> - thread calls uatomic_set() and then runs till it terminates
> - checker calls pthread_cancel()
> 
> You will get Bart's original bug. 

Yes, I realized that overnight :-( I shouldn't post stuff like this
around midnight. But I have another idea in my mind.

Martin

>  I realize that having the condlog()
> after the uatomic_set() in the thread makes this unlikely, but I
> don't
> races like this. I would be happier with simply taking the original
> code
> and moving the condlog(), if neither of my other two options are
> acceptable.
> 
> -Ben
> 
> > 
> > -- 
> > Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
> > SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham
> > Norton
> > HRB 21284 (AG Nürnberg)
> > From afb9c7de3658d49c4f28f6b9ee618a87b806ecdd Mon Sep 17 00:00:00
> > 2001
> > From: Martin Wilck <mwilck@suse.com>
> > Date: Sat, 10 Feb 2018 00:22:17 +0100
> > Subject: [PATCH] tur checker: make sure pthread_cancel isn't called
> > for exited
> >  thread
> > 
> > If we enter the cleanup function as the result of a pthread_cancel
> > by another
> > thread, we don't need to wait for a cancellation any more. If we
> > exit
> > regularly, just tell the other thread not to try to cancel us.
> > ---
> >  libmultipath/checkers/tur.c | 9 +++++----
> >  1 file changed, 5 insertions(+), 4 deletions(-)
> > 
> > diff --git a/libmultipath/checkers/tur.c
> > b/libmultipath/checkers/tur.c
> > index 894ad41c89c3..5d2b36bfa883 100644
> > --- a/libmultipath/checkers/tur.c
> > +++ b/libmultipath/checkers/tur.c
> > @@ -214,15 +214,13 @@ retry:
> >  
> >  static void cleanup_func(void *data)
> >  {
> > -	int running, holders;
> > +	int holders;
> >  	struct tur_checker_context *ct = data;
> >  
> > -	running = uatomic_xchg(&ct->running, 0);
> > +	uatomic_set(&ct->running, 0);
> >  	holders = uatomic_sub_return(&ct->holders, 1);
> >  	if (!holders)
> >  		cleanup_context(ct);
> > -	if (!running)
> > -		pause();
> >  }
> >  
> >  static int tur_running(struct tur_checker_context *ct)
> > @@ -266,6 +264,9 @@ static void *tur_thread(void *ctx)
> >  	pthread_cond_signal(&ct->active);
> >  	pthread_mutex_unlock(&ct->lock);
> >  
> > +	/* Tell main checker thread not to cancel us, as we exit
> > anyway */
> > +	uatomic_set(&ct->running, 0);
> > +
> >  	condlog(3, "%s: tur checker finished, state %s",
> >  		tur_devt(devt, sizeof(devt), ct),
> > checker_state_name(state));
> >  	tur_thread_cleanup_pop(ct);
> > -- 
> > 2.16.1
> > 
> 
> 

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 1/7] libmultipath: fix tur checker locking
  2018-02-10  0:36           ` Benjamin Marzinski
@ 2018-02-10 16:11             ` Martin Wilck
  2018-02-10 19:42               ` Martin Wilck
  0 siblings, 1 reply; 25+ messages in thread
From: Martin Wilck @ 2018-02-10 16:11 UTC (permalink / raw)
  To: Benjamin Marzinski; +Cc: Bart Van Assche, device-mapper development

On Fri, 2018-02-09 at 18:36 -0600, Benjamin Marzinski wrote:
> On Sat, Feb 10, 2018 at 12:36:05AM +0100, Martin Wilck wrote:
> > On Sat, 2018-02-10 at 00:28 +0100, Martin Wilck wrote:
> > > Maybe it's easier than we thought. Attached is a patch on top of
> > > yours that I think might work, please have a look. 
> > > 
> > 
> > That one didn't even compile. This one is better.
> > 
> > Martin
> 
> How about this one instead. The idea is that once we are in the
> cleanup
> handler, we just cleanup and exit. But before we enter it, we
> atomically
> exchange running, and if running was 0, we pause(), since the checker
> is
> either about to cancel us, or already has.
> 

Yes, that should work. Nice.

Regards,
Martin


> diff --git a/libmultipath/checkers/tur.c
> b/libmultipath/checkers/tur.c
> index 894ad41..3774a17 100644
> --- a/libmultipath/checkers/tur.c
> +++ b/libmultipath/checkers/tur.c
> @@ -214,15 +214,12 @@ retry:
>  
>  static void cleanup_func(void *data)
>  {
> -	int running, holders;
> +	int holders;
>  	struct tur_checker_context *ct = data;
>  
> -	running = uatomic_xchg(&ct->running, 0);
>  	holders = uatomic_sub_return(&ct->holders, 1);
>  	if (!holders)
>  		cleanup_context(ct);
> -	if (!running)
> -		pause();
>  }
>  
>  static int tur_running(struct tur_checker_context *ct)
> @@ -242,7 +239,7 @@ static void copy_msg_to_tcc(void *ct_p, const
> char *msg)
>  static void *tur_thread(void *ctx)
>  {
>  	struct tur_checker_context *ct = ctx;
> -	int state;
> +	int state, running;
>  	char devt[32];
>  
>  	condlog(3, "%s: tur checker starting up",
> @@ -268,6 +265,11 @@ static void *tur_thread(void *ctx)
>  
>  	condlog(3, "%s: tur checker finished, state %s",
>  		tur_devt(devt, sizeof(devt), ct),
> checker_state_name(state));
> +
> +	running = uatomic_xchg(&ct->running, 0);
> +	if (!running)
> +		pause();
> +
>  	tur_thread_cleanup_pop(ct);
>  
>  	return ((void *)0);
> 
> 
> > 
> > -- 
> > Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
> > SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham
> > Norton
> > HRB 21284 (AG Nürnberg)
> > From afb9c7de3658d49c4f28f6b9ee618a87b806ecdd Mon Sep 17 00:00:00
> > 2001
> > From: Martin Wilck <mwilck@suse.com>
> > Date: Sat, 10 Feb 2018 00:22:17 +0100
> > Subject: [PATCH] tur checker: make sure pthread_cancel isn't called
> > for exited
> >  thread
> > 
> > If we enter the cleanup function as the result of a pthread_cancel
> > by another
> > thread, we don't need to wait for a cancellation any more. If we
> > exit
> > regularly, just tell the other thread not to try to cancel us.
> > ---
> >  libmultipath/checkers/tur.c | 9 +++++----
> >  1 file changed, 5 insertions(+), 4 deletions(-)
> > 
> > diff --git a/libmultipath/checkers/tur.c
> > b/libmultipath/checkers/tur.c
> > index 894ad41c89c3..5d2b36bfa883 100644
> > --- a/libmultipath/checkers/tur.c
> > +++ b/libmultipath/checkers/tur.c
> > @@ -214,15 +214,13 @@ retry:
> >  
> >  static void cleanup_func(void *data)
> >  {
> > -	int running, holders;
> > +	int holders;
> >  	struct tur_checker_context *ct = data;
> >  
> > -	running = uatomic_xchg(&ct->running, 0);
> > +	uatomic_set(&ct->running, 0);
> >  	holders = uatomic_sub_return(&ct->holders, 1);
> >  	if (!holders)
> >  		cleanup_context(ct);
> > -	if (!running)
> > -		pause();
> >  }
> >  
> >  static int tur_running(struct tur_checker_context *ct)
> > @@ -266,6 +264,9 @@ static void *tur_thread(void *ctx)
> >  	pthread_cond_signal(&ct->active);
> >  	pthread_mutex_unlock(&ct->lock);
> >  
> > +	/* Tell main checker thread not to cancel us, as we exit
> > anyway */
> > +	uatomic_set(&ct->running, 0);
> > +
> >  	condlog(3, "%s: tur checker finished, state %s",
> >  		tur_devt(devt, sizeof(devt), ct),
> > checker_state_name(state));
> >  	tur_thread_cleanup_pop(ct);
> > -- 
> > 2.16.1
> > 
> 
> 

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 1/7] libmultipath: fix tur checker locking
  2018-02-10 16:11             ` Martin Wilck
@ 2018-02-10 19:42               ` Martin Wilck
  2018-02-12 18:44                 ` Benjamin Marzinski
  0 siblings, 1 reply; 25+ messages in thread
From: Martin Wilck @ 2018-02-10 19:42 UTC (permalink / raw)
  To: Benjamin Marzinski; +Cc: Bart Van Assche, device-mapper development

On Sat, 2018-02-10 at 17:11 +0100, Martin Wilck wrote:
> On Fri, 2018-02-09 at 18:36 -0600, Benjamin Marzinski wrote:
> > On Sat, Feb 10, 2018 at 12:36:05AM +0100, Martin Wilck wrote:
> > > On Sat, 2018-02-10 at 00:28 +0100, Martin Wilck wrote:
> > > > Maybe it's easier than we thought. Attached is a patch on top
> > > > of
> > > > yours that I think might work, please have a look. 
> > > > 
> > > 
> > > That one didn't even compile. This one is better.
> > > 
> > > Martin
> > 
> > How about this one instead. The idea is that once we are in the
> > cleanup
> > handler, we just cleanup and exit. But before we enter it, we
> > atomically
> > exchange running, and if running was 0, we pause(), since the
> > checker
> > is
> > either about to cancel us, or already has.
> > 
> 
> Yes, that should work. Nice.

... but I just realized that we don't rcu_register_thread() the TUR
thread. Maybe we should if we use RCU primitives?

Martin

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 1/7] libmultipath: fix tur checker locking
  2018-02-10 19:42               ` Martin Wilck
@ 2018-02-12 18:44                 ` Benjamin Marzinski
  2018-02-12 19:16                   ` Martin Wilck
  0 siblings, 1 reply; 25+ messages in thread
From: Benjamin Marzinski @ 2018-02-12 18:44 UTC (permalink / raw)
  To: Martin Wilck; +Cc: Bart Van Assche, device-mapper development

On Sat, Feb 10, 2018 at 08:42:31PM +0100, Martin Wilck wrote:
> On Sat, 2018-02-10 at 17:11 +0100, Martin Wilck wrote:
> > On Fri, 2018-02-09 at 18:36 -0600, Benjamin Marzinski wrote:
> > > On Sat, Feb 10, 2018 at 12:36:05AM +0100, Martin Wilck wrote:
> > > > On Sat, 2018-02-10 at 00:28 +0100, Martin Wilck wrote:
> > > > > Maybe it's easier than we thought. Attached is a patch on top
> > > > > of
> > > > > yours that I think might work, please have a look. 
> > > > > 
> > > > 
> > > > That one didn't even compile. This one is better.
> > > > 
> > > > Martin
> > > 
> > > How about this one instead. The idea is that once we are in the
> > > cleanup
> > > handler, we just cleanup and exit. But before we enter it, we
> > > atomically
> > > exchange running, and if running was 0, we pause(), since the
> > > checker
> > > is
> > > either about to cancel us, or already has.
> > > 
> > 
> > Yes, that should work. Nice.
> 
> ... but I just realized that we don't rcu_register_thread() the TUR
> thread. Maybe we should if we use RCU primitives?

Looking online, I see this.

"void rcu_register_thread(void): For all RCU flavors other than
bullet-proof RCU, each thread must invoke this function before its first
call to rcu_read_lock() or call_rcu(). If a given thread never invokes
any RCU read-side functions, it need not invoke
rcu_register_thread(void)."

That makes it sound like this is unnecessary for using the atomic
operations. Did you see something that makes you think differently? I
probably won't hurt to add it anyway. But, if it's fuzzy how to
correctly use the rcu operations, we could just go back to wrapping
normal operations in a spin_lock, now that we have a method that's not
excessively complicated, and would only require holding the lock for
simple variable manipulations.

Ben

> 
> Martin
> 
> -- 
> Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
> HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 1/7] libmultipath: fix tur checker locking
  2018-02-12 18:44                 ` Benjamin Marzinski
@ 2018-02-12 19:16                   ` Martin Wilck
  0 siblings, 0 replies; 25+ messages in thread
From: Martin Wilck @ 2018-02-12 19:16 UTC (permalink / raw)
  To: Benjamin Marzinski; +Cc: Bart Van Assche, device-mapper development

On Mon, 2018-02-12 at 12:44 -0600, Benjamin Marzinski wrote:
> On Sat, Feb 10, 2018 at 08:42:31PM +0100, Martin Wilck wrote:
> > On Sat, 2018-02-10 at 17:11 +0100, Martin Wilck wrote:
> > > On Fri, 2018-02-09 at 18:36 -0600, Benjamin Marzinski wrote:
> > > > On Sat, Feb 10, 2018 at 12:36:05AM +0100, Martin Wilck wrote:
> > > > > On Sat, 2018-02-10 at 00:28 +0100, Martin Wilck wrote:
> > > > > > Maybe it's easier than we thought. Attached is a patch on
> > > > > > top
> > > > > > of
> > > > > > yours that I think might work, please have a look. 
> > > > > > 
> > > > > 
> > > > > That one didn't even compile. This one is better.
> > > > > 
> > > > > Martin
> > > > 
> > > > How about this one instead. The idea is that once we are in the
> > > > cleanup
> > > > handler, we just cleanup and exit. But before we enter it, we
> > > > atomically
> > > > exchange running, and if running was 0, we pause(), since the
> > > > checker
> > > > is
> > > > either about to cancel us, or already has.
> > > > 
> > > 
> > > Yes, that should work. Nice.
> > 
> > ... but I just realized that we don't rcu_register_thread() the TUR
> > thread. Maybe we should if we use RCU primitives?
> 
> Looking online, I see this.
> 
> "void rcu_register_thread(void): For all RCU flavors other than
> bullet-proof RCU, each thread must invoke this function before its
> first
> call to rcu_read_lock() or call_rcu(). If a given thread never
> invokes
> any RCU read-side functions, it need not invoke
> rcu_register_thread(void)."
> 
> That makes it sound like this is unnecessary for using the atomic
> operations. Did you see something that makes you think differently? I
> probably won't hurt to add it anyway. But, if it's fuzzy how to
> correctly use the rcu operations, we could just go back to wrapping
> normal operations in a spin_lock, now that we have a method that's
> not
> excessively complicated, and would only require holding the lock for
> simple variable manipulations.

OK, it's not strictly necessary then. Thanks for checking.
I wouldn't prefer going back to spinlocks. I think the atomic
operations are a step in the right direction.

Martin

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 5/7] Fix set_no_path_retry() regression
  2018-02-08 23:56 ` [PATCH v2 5/7] Fix set_no_path_retry() regression Benjamin Marzinski
@ 2018-02-12 20:13   ` Martin Wilck
  0 siblings, 0 replies; 25+ messages in thread
From: Martin Wilck @ 2018-02-12 20:13 UTC (permalink / raw)
  To: Benjamin Marzinski, device-mapper development

On Thu, 2018-02-08 at 17:56 -0600, Benjamin Marzinski wrote:
> commit 0f850db7fceb6b2bf4968f3831efd250c17c6138 "multipathd: clean up
> set_no_path_retry" has a bug in it. It made set_no_path_retry
> never reset mpp->retry_ticks, even if the device was in recovery
> mode,
> and there were valid paths. This meant that adding new paths didn't
> remove a device from recovery mode, and queueing could get disabled,
> even while there were valid paths. This patch fixes that.
> 
> This patch also fixes a bug in cli_restore_queueing() and
> cli_restore_all_queueing(), where a device that had no_path_retry
> set to "queue" would enter recovery mode (although queueing would
> never actually get disabled).
> 
> Cc: Martin Wilck <mwilck@suse.com>
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>

Reviewed-by: Martin Wilck <mwilck@suse.com>

> ---
>  libmultipath/structs_vec.c |  5 +++--
>  multipathd/cli_handlers.c  | 20 ++++++++++++--------
>  2 files changed, 15 insertions(+), 10 deletions(-)
> 
> diff --git a/libmultipath/structs_vec.c b/libmultipath/structs_vec.c
> index fbab61f..0de2221 100644
> --- a/libmultipath/structs_vec.c
> +++ b/libmultipath/structs_vec.c
> @@ -343,9 +343,10 @@ static void set_no_path_retry(struct multipath
> *mpp)
>  			dm_queue_if_no_path(mpp->alias, 1);
>  		break;
>  	default:
> -		if (mpp->nr_active > 0)
> +		if (mpp->nr_active > 0) {
> +			mpp->retry_tick = 0;
>  			dm_queue_if_no_path(mpp->alias, 1);
> -		else if (is_queueing && mpp->retry_tick == 0)
> +		} else if (is_queueing && mpp->retry_tick == 0)
>  			enter_recovery_mode(mpp);
>  		break;
>  	}
> diff --git a/multipathd/cli_handlers.c b/multipathd/cli_handlers.c
> index 7f13bc9..80519b1 100644
> --- a/multipathd/cli_handlers.c
> +++ b/multipathd/cli_handlers.c
> @@ -995,10 +995,12 @@ cli_restore_queueing(void *v, char **reply, int
> *len, void *data)
>  	if (mpp->no_path_retry != NO_PATH_RETRY_UNDEF &&
>  			mpp->no_path_retry != NO_PATH_RETRY_FAIL) {
>  		dm_queue_if_no_path(mpp->alias, 1);
> -		if (mpp->nr_active > 0)
> -			mpp->retry_tick = 0;
> -		else
> -			enter_recovery_mode(mpp);
> +		if (mpp->no_path_retry > 0) {
> +			if (mpp->nr_active > 0)
> +				mpp->retry_tick = 0;
> +			else
> +				enter_recovery_mode(mpp);
> +		}
>  	}
>  	return 0;
>  }
> @@ -1019,10 +1021,12 @@ cli_restore_all_queueing(void *v, char
> **reply, int *len, void *data)
>  		if (mpp->no_path_retry != NO_PATH_RETRY_UNDEF &&
>  		    mpp->no_path_retry != NO_PATH_RETRY_FAIL) {
>  			dm_queue_if_no_path(mpp->alias, 1);
> -			if (mpp->nr_active > 0)
> -				mpp->retry_tick = 0;
> -			else
> -				enter_recovery_mode(mpp);
> +			if (mpp->no_path_retry > 0) {
> +				if (mpp->nr_active > 0)
> +					mpp->retry_tick = 0;
> +				else
> +					enter_recovery_mode(mpp);
> +			}
>  		}
>  	}
>  	return 0;

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 3/7] multipathd: remove coalesce_paths from ev_add_map
  2018-02-08 23:56 ` [PATCH v2 3/7] multipathd: remove coalesce_paths from ev_add_map Benjamin Marzinski
@ 2018-02-12 20:30   ` Martin Wilck
  0 siblings, 0 replies; 25+ messages in thread
From: Martin Wilck @ 2018-02-12 20:30 UTC (permalink / raw)
  To: Benjamin Marzinski, device-mapper development

On Thu, 2018-02-08 at 17:56 -0600, Benjamin Marzinski wrote:
> If ev_add_map is called for a multipath device that doesn't exist in
> device-mapper, it will call coalesce_paths to add it.  This doesn't
> work
> and hasn't for years. It doesn't add the map to the mpvec, or start
> up
> waiters, or do any of the necessary things that do get done when you
> call ev_add_map for a map that does exist in device mapper.
> 
> Fortunately, there are only two things that call ev_add_map.
> uev_add_map
> makes sure that the device does exist in device-mapper before calling
> ev_add_map, and cli_add_map creates the device first and then calls
> ev_add_map, if the device doesn't exist.
> 
> So, there is no reason for coalesce_paths to be in ev_add_map. This
> removes it.
> 
> Cc: Martin Wilck <mwilck@suse.com>
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>


Reviewed-by: Martin Wilck <mwilck@suse.com>


> ---
>  multipathd/main.c | 46 ++++++++++++++-------------------------------
> -
>  1 file changed, 14 insertions(+), 32 deletions(-)
> 
> diff --git a/multipathd/main.c b/multipathd/main.c
> index 27cf234..dbf9890 100644
> --- a/multipathd/main.c
> +++ b/multipathd/main.c
> @@ -412,18 +412,19 @@ uev_add_map (struct uevent * uev, struct
> vectors * vecs)
>  	return rc;
>  }
>  
> +/*
> + * ev_add_map expects that the multipath device already exists in
> kernel
> + * before it is called. It just adds a device to multipathd or
> updates an
> + * existing device.
> + */
>  int
>  ev_add_map (char * dev, char * alias, struct vectors * vecs)
>  {
> -	char * refwwid;
>  	struct multipath * mpp;
> -	int map_present;
> -	int r = 1, delayed_reconfig, reassign_maps;
> +	int delayed_reconfig, reassign_maps;
>  	struct config *conf;
>  
> -	map_present = dm_map_present(alias);
> -
> -	if (map_present && !dm_is_mpath(alias)) {
> +	if (!dm_is_mpath(alias)) {
>  		condlog(4, "%s: not a multipath map", alias);
>  		return 0;
>  	}
> @@ -468,33 +469,14 @@ ev_add_map (char * dev, char * alias, struct
> vectors * vecs)
>  	/*
>  	 * now we can register the map
>  	 */
> -	if (map_present) {
> -		if ((mpp = add_map_without_path(vecs, alias))) {
> -			sync_map_state(mpp);
> -			condlog(2, "%s: devmap %s registered",
> alias, dev);
> -			return 0;
> -		} else {
> -			condlog(2, "%s: uev_add_map failed", dev);
> -			return 1;
> -		}
> -	}
> -	r = get_refwwid(CMD_NONE, dev, DEV_DEVMAP, vecs->pathvec,
> &refwwid);
> -
> -	if (refwwid) {
> -		r = coalesce_paths(vecs, NULL, refwwid,
> FORCE_RELOAD_NONE,
> -				   CMD_NONE);
> -		dm_lib_release();
> +	if ((mpp = add_map_without_path(vecs, alias))) {
> +		sync_map_state(mpp);
> +		condlog(2, "%s: devmap %s registered", alias, dev);
> +		return 0;
> +	} else {
> +		condlog(2, "%s: ev_add_map failed", dev);
> +		return 1;
>  	}
> -
> -	if (!r)
> -		condlog(2, "%s: devmap %s added", alias, dev);
> -	else if (r == 2)
> -		condlog(2, "%s: uev_add_map %s blacklisted", alias,
> dev);
> -	else
> -		condlog(0, "%s: uev_add_map %s failed", alias, dev);
> -
> -	FREE(refwwid);
> -	return r;
>  }
>  
>  static int

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 7/7] multipath: print sysfs state in fast list mode
  2018-02-08 23:56 ` [PATCH v2 7/7] multipath: print sysfs state in fast list mode Benjamin Marzinski
@ 2018-02-12 20:33   ` Martin Wilck
  0 siblings, 0 replies; 25+ messages in thread
From: Martin Wilck @ 2018-02-12 20:33 UTC (permalink / raw)
  To: Benjamin Marzinski, device-mapper development

On Thu, 2018-02-08 at 17:56 -0600, Benjamin Marzinski wrote:
> commit b123e711ea2a4b471a98ff5e26815df3773636b5 "libmultipath:
> cleanup
> orphan device states" stopped multipathd from showing old state for
> orphan paths by checking if pp->mpp was set, and only printing the
> state
> if it was.   Unfortunately, when "multipath -l" is run, pp->mpp isn't
> set. While the checker state isn't checked and shouldn't be printed
> with
> "-l", the sysfs state can be, and was before b123e711. This patch
> sets
> pp->mpp in fast list mode, so that the sysfs state gets printed. It
> also verifies that the path exists in sysfs, and if not, marks it as
> faulty.
> 
> Cc: Martin Wilck <mwilck@suse.com>
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---

Reviewed-by: Martin Wilck <mwilck@suse.com>

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2018-02-12 20:33 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-08 23:56 [PATCH v2 0/7] multipath: miscellaneous bug fixes Benjamin Marzinski
2018-02-08 23:56 ` [PATCH v2 1/7] libmultipath: fix tur checker locking Benjamin Marzinski
2018-02-09 16:15   ` Bart Van Assche
2018-02-09 17:26     ` Benjamin Marzinski
2018-02-09 17:42       ` Bart Van Assche
2018-02-09 20:30   ` Martin Wilck
2018-02-09 23:04     ` Benjamin Marzinski
2018-02-09 23:28       ` Martin Wilck
2018-02-09 23:36         ` Martin Wilck
2018-02-10  0:17           ` Benjamin Marzinski
2018-02-10 16:03             ` Martin Wilck
2018-02-10  0:36           ` Benjamin Marzinski
2018-02-10 16:11             ` Martin Wilck
2018-02-10 19:42               ` Martin Wilck
2018-02-12 18:44                 ` Benjamin Marzinski
2018-02-12 19:16                   ` Martin Wilck
2018-02-08 23:56 ` [PATCH v2 2/7] multipath: fix DEF_TIMEOUT use Benjamin Marzinski
2018-02-08 23:56 ` [PATCH v2 3/7] multipathd: remove coalesce_paths from ev_add_map Benjamin Marzinski
2018-02-12 20:30   ` Martin Wilck
2018-02-08 23:56 ` [PATCH v2 4/7] multipathd: remove unused configure parameter Benjamin Marzinski
2018-02-08 23:56 ` [PATCH v2 5/7] Fix set_no_path_retry() regression Benjamin Marzinski
2018-02-12 20:13   ` Martin Wilck
2018-02-08 23:56 ` [PATCH v2 6/7] multipathd: change spurious uevent msg priority Benjamin Marzinski
2018-02-08 23:56 ` [PATCH v2 7/7] multipath: print sysfs state in fast list mode Benjamin Marzinski
2018-02-12 20:33   ` Martin Wilck

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.