All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/7] multipath: miscellaneous bug fixes
@ 2018-02-13  3:42 Benjamin Marzinski
  2018-02-13  3:42 ` [PATCH v3 1/7] libmultipath: fix tur checker locking Benjamin Marzinski
                   ` (6 more replies)
  0 siblings, 7 replies; 11+ messages in thread
From: Benjamin Marzinski @ 2018-02-13  3:42 UTC (permalink / raw)
  To: device-mapper development

This is just a set of fixes for bugs I've run into.

V3:
	-rework tur locking fix to deal with an error pointed out
	 by Martin, and to remove tur_running(), as suggested by Bart

V2:
        -rework tur locking fix to use atomic variables as Martin
         suggested. If this version doesn't make people happy, I'll
         just move the condlog() and be done with it.

        -add comments and fix messages for ev_add_map

        -fix bugs in cli_restore_queueing() and
         cli_restore_all_queueing()

        -check if paths exist in sysfs in fast listing mode

Benjamin Marzinski (7):
  libmultipath: fix tur checker locking
  multipath: fix DEF_TIMEOUT use
  multipathd: remove coalesce_paths from ev_add_map
  multipathd: remove unused configure parameter
  Fix set_no_path_retry() regression
  multipathd: change spurious uevent msg priority
  multipath: print sysfs state in fast list mode

 libmultipath/checkers/tur.c | 107 ++++++++++++++++++--------------------------
 libmultipath/discovery.c    |   2 +-
 libmultipath/structs_vec.c  |   5 ++-
 multipath/main.c            |  10 +++--
 multipathd/cli_handlers.c   |  20 +++++----
 multipathd/main.c           |  60 +++++++++----------------
 6 files changed, 86 insertions(+), 118 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v3 1/7] libmultipath: fix tur checker locking
  2018-02-13  3:42 [PATCH v3 0/7] multipath: miscellaneous bug fixes Benjamin Marzinski
@ 2018-02-13  3:42 ` Benjamin Marzinski
  2018-02-13  9:16   ` Martin Wilck
  2018-02-13 10:20   ` Hannes Reinecke
  2018-02-13  3:42 ` [PATCH v3 2/7] multipath: fix DEF_TIMEOUT use Benjamin Marzinski
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 11+ messages in thread
From: Benjamin Marzinski @ 2018-02-13  3:42 UTC (permalink / raw)
  To: device-mapper development; +Cc: Bart Van Assche, Martin Wilck

Commit 6e2423fd fixed a bug where the tur checker could cancel a
detached thread after it had exitted. However in fixing this, the new
code grabbed a mutex (to call condlog) while holding a spin_lock.  To
deal with this, I've done away with the holder spin_lock completely, and
replaced it with two atomic variables, based on a suggestion by Martin
Wilck.

The holder variable works exactly like before.  When the checker is
initialized, it is set to 1. When a thread is created it is incremented.
When either the thread or the checker are done with the context, they
atomically decrement the holder variable and check its value. If it
is 0, they free the context. If it is 1, they never touch the context
again.

The other variable has changed. First, ct->running and ct->thread have
switched uses. ct->thread is now only ever accessed by the checker,
never the thread.  If it is non-NULL, a thread has started up, but
hasn't been dealt with by the checker yet. It is also obviously used
by the checker to cancel the thread.

ct->running is now an atomic variable.  When the thread is started
it is set to 1. When the checker wants to kill a thread, it atomically
sets the value to 0 and reads the previous value.  If it was 1,
the checker cancels the thread. If it was 0, the nothing needs to be
done.  After the checker has dealt with the thread, it sets ct->thread
to NULL.

Right before the thread finishes and pops the cleanup handler, it
atomically sets the value of ct->running to 0 and reads the previous
value. If it was 1, the thread just pops the cleanup handler and exits.
If it was 0, then the checker is trying to cancel the thread, and so the
thread calls pause(), which is a cancellation point.

Cc: Martin Wilck <mwilck@suse.com>
Cc: Bart Van Assche <Bart.VanAssche@wdc.com>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/checkers/tur.c | 107 ++++++++++++++++++--------------------------
 1 file changed, 43 insertions(+), 64 deletions(-)

diff --git a/libmultipath/checkers/tur.c b/libmultipath/checkers/tur.c
index b4a5cb2..9155960 100644
--- a/libmultipath/checkers/tur.c
+++ b/libmultipath/checkers/tur.c
@@ -15,6 +15,7 @@
 #include <errno.h>
 #include <sys/time.h>
 #include <pthread.h>
+#include <urcu/uatomic.h>
 
 #include "checkers.h"
 
@@ -44,7 +45,6 @@ struct tur_checker_context {
 	pthread_t thread;
 	pthread_mutex_t lock;
 	pthread_cond_t active;
-	pthread_spinlock_t hldr_lock;
 	int holders;
 	char message[CHECKER_MSG_LEN];
 };
@@ -74,13 +74,12 @@ int libcheck_init (struct checker * c)
 
 	ct->state = PATH_UNCHECKED;
 	ct->fd = -1;
-	ct->holders = 1;
+	uatomic_set(&ct->holders, 1);
 	pthread_cond_init_mono(&ct->active);
 	pthread_mutexattr_init(&attr);
 	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
 	pthread_mutex_init(&ct->lock, &attr);
 	pthread_mutexattr_destroy(&attr);
-	pthread_spin_init(&ct->hldr_lock, PTHREAD_PROCESS_PRIVATE);
 	c->context = ct;
 
 	return 0;
@@ -90,7 +89,6 @@ static void cleanup_context(struct tur_checker_context *ct)
 {
 	pthread_mutex_destroy(&ct->lock);
 	pthread_cond_destroy(&ct->active);
-	pthread_spin_destroy(&ct->hldr_lock);
 	free(ct);
 }
 
@@ -99,16 +97,14 @@ void libcheck_free (struct checker * c)
 	if (c->context) {
 		struct tur_checker_context *ct = c->context;
 		int holders;
-		pthread_t thread;
-
-		pthread_spin_lock(&ct->hldr_lock);
-		ct->holders--;
-		holders = ct->holders;
-		thread = ct->thread;
-		pthread_spin_unlock(&ct->hldr_lock);
-		if (holders)
-			pthread_cancel(thread);
-		else
+		int running;
+
+		running = uatomic_xchg(&ct->running, 0);
+		if (running)
+			pthread_cancel(ct->thread);
+		ct->thread = 0;
+		holders = uatomic_sub_return(&ct->holders, 1);
+		if (!holders)
 			cleanup_context(ct);
 		c->context = NULL;
 	}
@@ -220,26 +216,12 @@ static void cleanup_func(void *data)
 {
 	int holders;
 	struct tur_checker_context *ct = data;
-	pthread_spin_lock(&ct->hldr_lock);
-	ct->holders--;
-	holders = ct->holders;
-	ct->thread = 0;
-	pthread_spin_unlock(&ct->hldr_lock);
+
+	holders = uatomic_sub_return(&ct->holders, 1);
 	if (!holders)
 		cleanup_context(ct);
 }
 
-static int tur_running(struct tur_checker_context *ct)
-{
-	pthread_t thread;
-
-	pthread_spin_lock(&ct->hldr_lock);
-	thread = ct->thread;
-	pthread_spin_unlock(&ct->hldr_lock);
-
-	return thread != 0;
-}
-
 static void copy_msg_to_tcc(void *ct_p, const char *msg)
 {
 	struct tur_checker_context *ct = ct_p;
@@ -252,7 +234,7 @@ static void copy_msg_to_tcc(void *ct_p, const char *msg)
 static void *tur_thread(void *ctx)
 {
 	struct tur_checker_context *ct = ctx;
-	int state;
+	int state, running;
 	char devt[32];
 
 	condlog(3, "%s: tur checker starting up",
@@ -278,6 +260,11 @@ static void *tur_thread(void *ctx)
 
 	condlog(3, "%s: tur checker finished, state %s",
 		tur_devt(devt, sizeof(devt), ct), checker_state_name(state));
+
+	running = uatomic_xchg(&ct->running, 0);
+	if (!running)
+		pause();
+
 	tur_thread_cleanup_pop(ct);
 
 	return ((void *)0);
@@ -325,7 +312,6 @@ int libcheck_check(struct checker * c)
 	int tur_status, r;
 	char devt[32];
 
-
 	if (!ct)
 		return PATH_UNCHECKED;
 
@@ -349,38 +335,29 @@ int libcheck_check(struct checker * c)
 		return PATH_WILD;
 	}
 
-	if (ct->running) {
-		/*
-		 * Check if TUR checker is still running. Hold hldr_lock
-		 * around the pthread_cancel() call to avoid that
-		 * pthread_cancel() gets called after the (detached) TUR
-		 * thread has exited.
-		 */
-		pthread_spin_lock(&ct->hldr_lock);
-		if (ct->thread) {
-			if (tur_check_async_timeout(c)) {
-				condlog(3, "%s: tur checker timeout",
-					tur_devt(devt, sizeof(devt), ct));
+	if (ct->thread) {
+		if (tur_check_async_timeout(c)) {
+			int running = uatomic_xchg(&ct->running, 0);
+			if (running)
 				pthread_cancel(ct->thread);
-				ct->running = 0;
-				MSG(c, MSG_TUR_TIMEOUT);
-				tur_status = PATH_TIMEOUT;
-			} else {
-				condlog(3, "%s: tur checker not finished",
+			condlog(3, "%s: tur checker timeout",
+				tur_devt(devt, sizeof(devt), ct));
+			ct->thread = 0;
+			MSG(c, MSG_TUR_TIMEOUT);
+			tur_status = PATH_TIMEOUT;
+		} else if (uatomic_read(&ct->running) != 0) {
+			condlog(3, "%s: tur checker not finished",
 					tur_devt(devt, sizeof(devt), ct));
-				ct->running++;
-				tur_status = PATH_PENDING;
-			}
+			tur_status = PATH_PENDING;
 		} else {
 			/* TUR checker done */
-			ct->running = 0;
+			ct->thread = 0;
 			tur_status = ct->state;
 			strlcpy(c->message, ct->message, sizeof(c->message));
 		}
-		pthread_spin_unlock(&ct->hldr_lock);
 		pthread_mutex_unlock(&ct->lock);
 	} else {
-		if (tur_running(ct)) {
+		if (uatomic_read(&ct->running) != 0) {
 			/* pthread cancel failed. continue in sync mode */
 			pthread_mutex_unlock(&ct->lock);
 			condlog(3, "%s: tur thread not responding",
@@ -391,19 +368,17 @@ int libcheck_check(struct checker * c)
 		ct->state = PATH_UNCHECKED;
 		ct->fd = c->fd;
 		ct->timeout = c->timeout;
-		pthread_spin_lock(&ct->hldr_lock);
-		ct->holders++;
-		pthread_spin_unlock(&ct->hldr_lock);
+		uatomic_add(&ct->holders, 1);
+		uatomic_set(&ct->running, 1);
 		tur_set_async_timeout(c);
 		setup_thread_attr(&attr, 32 * 1024, 1);
 		r = pthread_create(&ct->thread, &attr, tur_thread, ct);
 		pthread_attr_destroy(&attr);
 		if (r) {
-			pthread_spin_lock(&ct->hldr_lock);
-			ct->holders--;
-			pthread_spin_unlock(&ct->hldr_lock);
-			pthread_mutex_unlock(&ct->lock);
+			uatomic_sub(&ct->holders, 1);
+			uatomic_set(&ct->running, 0);
 			ct->thread = 0;
+			pthread_mutex_unlock(&ct->lock);
 			condlog(3, "%s: failed to start tur thread, using"
 				" sync mode", tur_devt(devt, sizeof(devt), ct));
 			return tur_check(c->fd, c->timeout,
@@ -414,12 +389,16 @@ int libcheck_check(struct checker * c)
 		tur_status = ct->state;
 		strlcpy(c->message, ct->message, sizeof(c->message));
 		pthread_mutex_unlock(&ct->lock);
-		if (tur_running(ct) &&
+		if (uatomic_read(&ct->running) != 0 &&
 		    (tur_status == PATH_PENDING || tur_status == PATH_UNCHECKED)) {
 			condlog(3, "%s: tur checker still running",
 				tur_devt(devt, sizeof(devt), ct));
-			ct->running = 1;
 			tur_status = PATH_PENDING;
+		} else {
+			int running = uatomic_xchg(&ct->running, 0);
+			if (running)
+				pthread_cancel(ct->thread);
+			ct->thread = 0;
 		}
 	}
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v3 2/7] multipath: fix DEF_TIMEOUT use
  2018-02-13  3:42 [PATCH v3 0/7] multipath: miscellaneous bug fixes Benjamin Marzinski
  2018-02-13  3:42 ` [PATCH v3 1/7] libmultipath: fix tur checker locking Benjamin Marzinski
@ 2018-02-13  3:42 ` Benjamin Marzinski
  2018-02-13  3:42 ` [PATCH v3 3/7] multipathd: remove coalesce_paths from ev_add_map Benjamin Marzinski
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Benjamin Marzinski @ 2018-02-13  3:42 UTC (permalink / raw)
  To: device-mapper development

DEF_TIMEOUT is specified in seconds. The io_hdr timeout is specified in
milliseconds, so we need to convert it. Multipath should be waiting
longer than 30 milliseconds here. If there are concerns that 30 seconds
may be too long, we could always make this configurable, using
conf->checker_timeout if set.

Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/discovery.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libmultipath/discovery.c b/libmultipath/discovery.c
index 4b31dde..f118800 100644
--- a/libmultipath/discovery.c
+++ b/libmultipath/discovery.c
@@ -766,7 +766,7 @@ do_inq(int sg_fd, int cmddt, int evpd, unsigned int pg_op,
 	io_hdr.dxferp = resp;
 	io_hdr.cmdp = inqCmdBlk;
 	io_hdr.sbp = sense_b;
-	io_hdr.timeout = DEF_TIMEOUT;
+	io_hdr.timeout = DEF_TIMEOUT * 1000;
 
 	if (ioctl(sg_fd, SG_IO, &io_hdr) < 0)
 		return -1;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v3 3/7] multipathd: remove coalesce_paths from ev_add_map
  2018-02-13  3:42 [PATCH v3 0/7] multipath: miscellaneous bug fixes Benjamin Marzinski
  2018-02-13  3:42 ` [PATCH v3 1/7] libmultipath: fix tur checker locking Benjamin Marzinski
  2018-02-13  3:42 ` [PATCH v3 2/7] multipath: fix DEF_TIMEOUT use Benjamin Marzinski
@ 2018-02-13  3:42 ` Benjamin Marzinski
  2018-02-13  3:42 ` [PATCH v3 4/7] multipathd: remove unused configure parameter Benjamin Marzinski
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Benjamin Marzinski @ 2018-02-13  3:42 UTC (permalink / raw)
  To: device-mapper development

If ev_add_map is called for a multipath device that doesn't exist in
device-mapper, it will call coalesce_paths to add it.  This doesn't work
and hasn't for years. It doesn't add the map to the mpvec, or start up
waiters, or do any of the necessary things that do get done when you
call ev_add_map for a map that does exist in device mapper.

Fortunately, there are only two things that call ev_add_map. uev_add_map
makes sure that the device does exist in device-mapper before calling
ev_add_map, and cli_add_map creates the device first and then calls
ev_add_map, if the device doesn't exist.

So, there is no reason for coalesce_paths to be in ev_add_map. This
removes it.

Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 multipathd/main.c | 46 ++++++++++++++--------------------------------
 1 file changed, 14 insertions(+), 32 deletions(-)

diff --git a/multipathd/main.c b/multipathd/main.c
index 27cf234..dbf9890 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -412,18 +412,19 @@ uev_add_map (struct uevent * uev, struct vectors * vecs)
 	return rc;
 }
 
+/*
+ * ev_add_map expects that the multipath device already exists in kernel
+ * before it is called. It just adds a device to multipathd or updates an
+ * existing device.
+ */
 int
 ev_add_map (char * dev, char * alias, struct vectors * vecs)
 {
-	char * refwwid;
 	struct multipath * mpp;
-	int map_present;
-	int r = 1, delayed_reconfig, reassign_maps;
+	int delayed_reconfig, reassign_maps;
 	struct config *conf;
 
-	map_present = dm_map_present(alias);
-
-	if (map_present && !dm_is_mpath(alias)) {
+	if (!dm_is_mpath(alias)) {
 		condlog(4, "%s: not a multipath map", alias);
 		return 0;
 	}
@@ -468,33 +469,14 @@ ev_add_map (char * dev, char * alias, struct vectors * vecs)
 	/*
 	 * now we can register the map
 	 */
-	if (map_present) {
-		if ((mpp = add_map_without_path(vecs, alias))) {
-			sync_map_state(mpp);
-			condlog(2, "%s: devmap %s registered", alias, dev);
-			return 0;
-		} else {
-			condlog(2, "%s: uev_add_map failed", dev);
-			return 1;
-		}
-	}
-	r = get_refwwid(CMD_NONE, dev, DEV_DEVMAP, vecs->pathvec, &refwwid);
-
-	if (refwwid) {
-		r = coalesce_paths(vecs, NULL, refwwid, FORCE_RELOAD_NONE,
-				   CMD_NONE);
-		dm_lib_release();
+	if ((mpp = add_map_without_path(vecs, alias))) {
+		sync_map_state(mpp);
+		condlog(2, "%s: devmap %s registered", alias, dev);
+		return 0;
+	} else {
+		condlog(2, "%s: ev_add_map failed", dev);
+		return 1;
 	}
-
-	if (!r)
-		condlog(2, "%s: devmap %s added", alias, dev);
-	else if (r == 2)
-		condlog(2, "%s: uev_add_map %s blacklisted", alias, dev);
-	else
-		condlog(0, "%s: uev_add_map %s failed", alias, dev);
-
-	FREE(refwwid);
-	return r;
 }
 
 static int
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v3 4/7] multipathd: remove unused configure parameter
  2018-02-13  3:42 [PATCH v3 0/7] multipath: miscellaneous bug fixes Benjamin Marzinski
                   ` (2 preceding siblings ...)
  2018-02-13  3:42 ` [PATCH v3 3/7] multipathd: remove coalesce_paths from ev_add_map Benjamin Marzinski
@ 2018-02-13  3:42 ` Benjamin Marzinski
  2018-02-13  3:42 ` [PATCH v3 5/7] Fix set_no_path_retry() regression Benjamin Marzinski
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Benjamin Marzinski @ 2018-02-13  3:42 UTC (permalink / raw)
  To: device-mapper development

configure() is always called with start_waiters=1, so there is no point
in having the parameter. Remove it.

Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 multipathd/main.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/multipathd/main.c b/multipathd/main.c
index dbf9890..51e0f5e 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -1955,7 +1955,7 @@ checkerloop (void *ap)
 }
 
 int
-configure (struct vectors * vecs, int start_waiters)
+configure (struct vectors * vecs)
 {
 	struct multipath * mpp;
 	struct path * pp;
@@ -2054,11 +2054,9 @@ configure (struct vectors * vecs, int start_waiters)
 			i--;
 			continue;
 		}
-		if (start_waiters) {
-			if (start_waiter_thread(mpp, vecs)) {
-				remove_map(mpp, vecs, 1);
-				i--;
-			}
+		if (start_waiter_thread(mpp, vecs)) {
+			remove_map(mpp, vecs, 1);
+			i--;
 		}
 	}
 	return 0;
@@ -2125,7 +2123,7 @@ reconfigure (struct vectors * vecs)
 	rcu_assign_pointer(multipath_conf, conf);
 	call_rcu(&old->rcu, rcu_free_config);
 
-	configure(vecs, 1);
+	configure(vecs);
 
 
 	return 0;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v3 5/7] Fix set_no_path_retry() regression
  2018-02-13  3:42 [PATCH v3 0/7] multipath: miscellaneous bug fixes Benjamin Marzinski
                   ` (3 preceding siblings ...)
  2018-02-13  3:42 ` [PATCH v3 4/7] multipathd: remove unused configure parameter Benjamin Marzinski
@ 2018-02-13  3:42 ` Benjamin Marzinski
  2018-02-13  3:42 ` [PATCH v3 6/7] multipathd: change spurious uevent msg priority Benjamin Marzinski
  2018-02-13  3:42 ` [PATCH v3 7/7] multipath: print sysfs state in fast list mode Benjamin Marzinski
  6 siblings, 0 replies; 11+ messages in thread
From: Benjamin Marzinski @ 2018-02-13  3:42 UTC (permalink / raw)
  To: device-mapper development

commit 0f850db7fceb6b2bf4968f3831efd250c17c6138 "multipathd: clean up
set_no_path_retry" has a bug in it. It made set_no_path_retry
never reset mpp->retry_ticks, even if the device was in recovery mode,
and there were valid paths. This meant that adding new paths didn't
remove a device from recovery mode, and queueing could get disabled,
even while there were valid paths. This patch fixes that.

This patch also fixes a bug in cli_restore_queueing() and
cli_restore_all_queueing(), where a device that had no_path_retry
set to "queue" would enter recovery mode (although queueing would
never actually get disabled).

Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/structs_vec.c |  5 +++--
 multipathd/cli_handlers.c  | 20 ++++++++++++--------
 2 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/libmultipath/structs_vec.c b/libmultipath/structs_vec.c
index fbab61f..0de2221 100644
--- a/libmultipath/structs_vec.c
+++ b/libmultipath/structs_vec.c
@@ -343,9 +343,10 @@ static void set_no_path_retry(struct multipath *mpp)
 			dm_queue_if_no_path(mpp->alias, 1);
 		break;
 	default:
-		if (mpp->nr_active > 0)
+		if (mpp->nr_active > 0) {
+			mpp->retry_tick = 0;
 			dm_queue_if_no_path(mpp->alias, 1);
-		else if (is_queueing && mpp->retry_tick == 0)
+		} else if (is_queueing && mpp->retry_tick == 0)
 			enter_recovery_mode(mpp);
 		break;
 	}
diff --git a/multipathd/cli_handlers.c b/multipathd/cli_handlers.c
index 7f13bc9..80519b1 100644
--- a/multipathd/cli_handlers.c
+++ b/multipathd/cli_handlers.c
@@ -995,10 +995,12 @@ cli_restore_queueing(void *v, char **reply, int *len, void *data)
 	if (mpp->no_path_retry != NO_PATH_RETRY_UNDEF &&
 			mpp->no_path_retry != NO_PATH_RETRY_FAIL) {
 		dm_queue_if_no_path(mpp->alias, 1);
-		if (mpp->nr_active > 0)
-			mpp->retry_tick = 0;
-		else
-			enter_recovery_mode(mpp);
+		if (mpp->no_path_retry > 0) {
+			if (mpp->nr_active > 0)
+				mpp->retry_tick = 0;
+			else
+				enter_recovery_mode(mpp);
+		}
 	}
 	return 0;
 }
@@ -1019,10 +1021,12 @@ cli_restore_all_queueing(void *v, char **reply, int *len, void *data)
 		if (mpp->no_path_retry != NO_PATH_RETRY_UNDEF &&
 		    mpp->no_path_retry != NO_PATH_RETRY_FAIL) {
 			dm_queue_if_no_path(mpp->alias, 1);
-			if (mpp->nr_active > 0)
-				mpp->retry_tick = 0;
-			else
-				enter_recovery_mode(mpp);
+			if (mpp->no_path_retry > 0) {
+				if (mpp->nr_active > 0)
+					mpp->retry_tick = 0;
+				else
+					enter_recovery_mode(mpp);
+			}
 		}
 	}
 	return 0;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v3 6/7] multipathd: change spurious uevent msg priority
  2018-02-13  3:42 [PATCH v3 0/7] multipath: miscellaneous bug fixes Benjamin Marzinski
                   ` (4 preceding siblings ...)
  2018-02-13  3:42 ` [PATCH v3 5/7] Fix set_no_path_retry() regression Benjamin Marzinski
@ 2018-02-13  3:42 ` Benjamin Marzinski
  2018-02-13  3:42 ` [PATCH v3 7/7] multipath: print sysfs state in fast list mode Benjamin Marzinski
  6 siblings, 0 replies; 11+ messages in thread
From: Benjamin Marzinski @ 2018-02-13  3:42 UTC (permalink / raw)
  To: device-mapper development

The "spurious uevent, path already in pathvec" is not anything to worry
about, so it should not have the error priority.

Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 multipathd/main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/multipathd/main.c b/multipathd/main.c
index 51e0f5e..7ac59d9 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -562,7 +562,7 @@ uev_add_path (struct uevent *uev, struct vectors * vecs, int need_do_map)
 	if (pp) {
 		int r;
 
-		condlog(0, "%s: spurious uevent, path already in pathvec",
+		condlog(2, "%s: spurious uevent, path already in pathvec",
 			uev->kernel);
 		if (!pp->mpp && !strlen(pp->wwid)) {
 			condlog(3, "%s: reinitialize path", uev->kernel);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v3 7/7] multipath: print sysfs state in fast list mode
  2018-02-13  3:42 [PATCH v3 0/7] multipath: miscellaneous bug fixes Benjamin Marzinski
                   ` (5 preceding siblings ...)
  2018-02-13  3:42 ` [PATCH v3 6/7] multipathd: change spurious uevent msg priority Benjamin Marzinski
@ 2018-02-13  3:42 ` Benjamin Marzinski
  6 siblings, 0 replies; 11+ messages in thread
From: Benjamin Marzinski @ 2018-02-13  3:42 UTC (permalink / raw)
  To: device-mapper development

commit b123e711ea2a4b471a98ff5e26815df3773636b5 "libmultipath: cleanup
orphan device states" stopped multipathd from showing old state for
orphan paths by checking if pp->mpp was set, and only printing the state
if it was.   Unfortunately, when "multipath -l" is run, pp->mpp isn't
set. While the checker state isn't checked and shouldn't be printed with
"-l", the sysfs state can be, and was before b123e711. This patch sets
pp->mpp in fast list mode, so that the sysfs state gets printed. It
also verifies that the path exists in sysfs, and if not, marks it as
faulty.

Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 multipath/main.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/multipath/main.c b/multipath/main.c
index bffe065..d2415a9 100644
--- a/multipath/main.c
+++ b/multipath/main.c
@@ -149,7 +149,7 @@ usage (char * progname)
 }
 
 static int
-update_paths (struct multipath * mpp)
+update_paths (struct multipath * mpp, int quick)
 {
 	int i, j;
 	struct pathgroup * pgp;
@@ -171,9 +171,12 @@ update_paths (struct multipath * mpp)
 					 * path is not in sysfs anymore
 					 */
 					pp->chkrstate = pp->state = PATH_DOWN;
+					pp->offline = 1;
 					continue;
 				}
 				pp->mpp = mpp;
+				if (quick)
+					continue;
 				conf = get_multipath_config();
 				if (pathinfo(pp, conf, DI_ALL))
 					pp->state = PATH_UNCHECKED;
@@ -181,6 +184,8 @@ update_paths (struct multipath * mpp)
 				continue;
 			}
 			pp->mpp = mpp;
+			if (quick)
+				continue;
 			if (pp->state == PATH_UNCHECKED ||
 			    pp->state == PATH_WILD) {
 				conf = get_multipath_config();
@@ -238,8 +243,7 @@ get_dm_mpvec (enum mpath_cmds cmd, vector curmp, vector pathvec, char * refwwid)
 		 * If not in "fast list mode", we need to fetch information
 		 * about them
 		 */
-		if (cmd != CMD_LIST_SHORT)
-			update_paths(mpp);
+		update_paths(mpp, (cmd == CMD_LIST_SHORT));
 
 		if (cmd == CMD_LIST_LONG)
 			mpp->bestpg = select_path_group(mpp);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 1/7] libmultipath: fix tur checker locking
  2018-02-13  3:42 ` [PATCH v3 1/7] libmultipath: fix tur checker locking Benjamin Marzinski
@ 2018-02-13  9:16   ` Martin Wilck
  2018-02-13 10:20   ` Hannes Reinecke
  1 sibling, 0 replies; 11+ messages in thread
From: Martin Wilck @ 2018-02-13  9:16 UTC (permalink / raw)
  To: Benjamin Marzinski, device-mapper development; +Cc: Bart Van Assche

On Mon, 2018-02-12 at 21:42 -0600, Benjamin Marzinski wrote:
> Commit 6e2423fd fixed a bug where the tur checker could cancel a
> detached thread after it had exitted. However in fixing this, the new
> code grabbed a mutex (to call condlog) while holding a spin_lock.  To
> deal with this, I've done away with the holder spin_lock completely,
> and
> replaced it with two atomic variables, based on a suggestion by
> Martin
> Wilck.
> 
> The holder variable works exactly like before.  When the checker is
> initialized, it is set to 1. When a thread is created it is
> incremented.
> When either the thread or the checker are done with the context, they
> atomically decrement the holder variable and check its value. If it
> is 0, they free the context. If it is 1, they never touch the context
> again.
> 
> The other variable has changed. First, ct->running and ct->thread
> have
> switched uses. ct->thread is now only ever accessed by the checker,
> never the thread.  If it is non-NULL, a thread has started up, but
> hasn't been dealt with by the checker yet. It is also obviously used
> by the checker to cancel the thread.
> 
> ct->running is now an atomic variable.  When the thread is started
> it is set to 1. When the checker wants to kill a thread, it
> atomically
> sets the value to 0 and reads the previous value.  If it was 1,
> the checker cancels the thread. If it was 0, the nothing needs to be
> done.  After the checker has dealt with the thread, it sets ct-
> >thread
> to NULL.
> 
> Right before the thread finishes and pops the cleanup handler, it
> atomically sets the value of ct->running to 0 and reads the previous
> value. If it was 1, the thread just pops the cleanup handler and
> exits.
> If it was 0, then the checker is trying to cancel the thread, and so
> the
> thread calls pause(), which is a cancellation point.
> 
> Cc: Martin Wilck <mwilck@suse.com>
> Cc: Bart Van Assche <Bart.VanAssche@wdc.com>
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---

Reviewed-by: Martin Wilck <mwilck@suse.com>

Side note: a little voice in my head keeps whispering that it should be
possible to solve this problem with just one atomic variable instead of
two ("holders" and "running"). But I'm fine with the current approach
for now.

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 1/7] libmultipath: fix tur checker locking
  2018-02-13  3:42 ` [PATCH v3 1/7] libmultipath: fix tur checker locking Benjamin Marzinski
  2018-02-13  9:16   ` Martin Wilck
@ 2018-02-13 10:20   ` Hannes Reinecke
  2018-02-13 20:41     ` Benjamin Marzinski
  1 sibling, 1 reply; 11+ messages in thread
From: Hannes Reinecke @ 2018-02-13 10:20 UTC (permalink / raw)
  To: dm-devel

On 02/13/2018 04:42 AM, Benjamin Marzinski wrote:
> Commit 6e2423fd fixed a bug where the tur checker could cancel a
> detached thread after it had exitted. However in fixing this, the new
> code grabbed a mutex (to call condlog) while holding a spin_lock.  To
> deal with this, I've done away with the holder spin_lock completely, and
> replaced it with two atomic variables, based on a suggestion by Martin
> Wilck.
> 
> The holder variable works exactly like before.  When the checker is
> initialized, it is set to 1. When a thread is created it is incremented.
> When either the thread or the checker are done with the context, they
> atomically decrement the holder variable and check its value. If it
> is 0, they free the context. If it is 1, they never touch the context
> again.
> 
> The other variable has changed. First, ct->running and ct->thread have
> switched uses. ct->thread is now only ever accessed by the checker,
> never the thread.  If it is non-NULL, a thread has started up, but
> hasn't been dealt with by the checker yet. It is also obviously used
> by the checker to cancel the thread.
> 
> ct->running is now an atomic variable.  When the thread is started
> it is set to 1. When the checker wants to kill a thread, it atomically
> sets the value to 0 and reads the previous value.  If it was 1,
> the checker cancels the thread. If it was 0, the nothing needs to be
> done.  After the checker has dealt with the thread, it sets ct->thread
> to NULL.
> 
> Right before the thread finishes and pops the cleanup handler, it
> atomically sets the value of ct->running to 0 and reads the previous
> value. If it was 1, the thread just pops the cleanup handler and exits.
> If it was 0, then the checker is trying to cancel the thread, and so the
> thread calls pause(), which is a cancellation point.
> 
> Cc: Martin Wilck <mwilck@suse.com>
> Cc: Bart Van Assche <Bart.VanAssche@wdc.com>
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
>  libmultipath/checkers/tur.c | 107 ++++++++++++++++++--------------------------
>  1 file changed, 43 insertions(+), 64 deletions(-)
> 
[ .. ]
> @@ -391,19 +368,17 @@ int libcheck_check(struct checker * c)
>  		ct->state = PATH_UNCHECKED;
>  		ct->fd = c->fd;
>  		ct->timeout = c->timeout;
> -		pthread_spin_lock(&ct->hldr_lock);
> -		ct->holders++;
> -		pthread_spin_unlock(&ct->hldr_lock);
> +		uatomic_add(&ct->holders, 1);
> +		uatomic_set(&ct->running, 1);
>  		tur_set_async_timeout(c);
>  		setup_thread_attr(&attr, 32 * 1024, 1);
>  		r = pthread_create(&ct->thread, &attr, tur_thread, ct);
>  		pthread_attr_destroy(&attr);
>  		if (r) {
> -			pthread_spin_lock(&ct->hldr_lock);
> -			ct->holders--;
> -			pthread_spin_unlock(&ct->hldr_lock);
> -			pthread_mutex_unlock(&ct->lock);
> +			uatomic_sub(&ct->holders, 1);
> +			uatomic_set(&ct->running, 0);
>  			ct->thread = 0;
> +			pthread_mutex_unlock(&ct->lock);
>  			condlog(3, "%s: failed to start tur thread, using"
>  				" sync mode", tur_devt(devt, sizeof(devt), ct));
>  			return tur_check(c->fd, c->timeout,
I would rather set 'ct->running' from within the thread; otherwise there
is a chance that pthread_create() returns without error, but the thread
itself hasn't been run yet.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 1/7] libmultipath: fix tur checker locking
  2018-02-13 10:20   ` Hannes Reinecke
@ 2018-02-13 20:41     ` Benjamin Marzinski
  0 siblings, 0 replies; 11+ messages in thread
From: Benjamin Marzinski @ 2018-02-13 20:41 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: dm-devel

On Tue, Feb 13, 2018 at 11:20:39AM +0100, Hannes Reinecke wrote:
> On 02/13/2018 04:42 AM, Benjamin Marzinski wrote:

> I would rather set 'ct->running' from within the thread; otherwise there
> is a chance that pthread_create() returns without error, but the thread
> itself hasn't been run yet.

That worry is why I put ct->running where I did. If pthread_create
returns without an error, but ct->running isn't set yet because the
thread sets it, and it hasn't run yet, then it can appear to the checker
as if the thread has already completed.  This can cause it to return the
results of the thread before it actually has completed, and potentially
even start up a new thread, which would really mess with the value of
ct->running.

On the other hand, if you set running before starting the thread, the
checker won't do anything until the it hits the timeout, at which point
it will cancel the thread.  We are guaranteed that if pthread_create
returns successfully, we have the correct Thread ID to cancel the thread
with, so there's no problem with cancelling a thread that hasn't run.

Am I missing something here?

-Ben

> 
> Cheers,
> 
> Hannes
> -- 
> Dr. Hannes Reinecke		   Teamlead Storage & Networking
> hare@suse.de			               +49 911 74053 688
> SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
> HRB 21284 (AG Nürnberg)
> 
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2018-02-13 20:41 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-13  3:42 [PATCH v3 0/7] multipath: miscellaneous bug fixes Benjamin Marzinski
2018-02-13  3:42 ` [PATCH v3 1/7] libmultipath: fix tur checker locking Benjamin Marzinski
2018-02-13  9:16   ` Martin Wilck
2018-02-13 10:20   ` Hannes Reinecke
2018-02-13 20:41     ` Benjamin Marzinski
2018-02-13  3:42 ` [PATCH v3 2/7] multipath: fix DEF_TIMEOUT use Benjamin Marzinski
2018-02-13  3:42 ` [PATCH v3 3/7] multipathd: remove coalesce_paths from ev_add_map Benjamin Marzinski
2018-02-13  3:42 ` [PATCH v3 4/7] multipathd: remove unused configure parameter Benjamin Marzinski
2018-02-13  3:42 ` [PATCH v3 5/7] Fix set_no_path_retry() regression Benjamin Marzinski
2018-02-13  3:42 ` [PATCH v3 6/7] multipathd: change spurious uevent msg priority Benjamin Marzinski
2018-02-13  3:42 ` [PATCH v3 7/7] multipath: print sysfs state in fast list mode Benjamin Marzinski

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.