All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 00/19] Misc Multipath patches
@ 2018-09-21 23:05 Benjamin Marzinski
  2018-09-21 23:05 ` [PATCH v3 01/19] libmultipath: fix tur checker timeout Benjamin Marzinski
                   ` (19 more replies)
  0 siblings, 20 replies; 58+ messages in thread
From: Benjamin Marzinski @ 2018-09-21 23:05 UTC (permalink / raw)
  To: device-mapper development; +Cc: Martin Wilck

This batch of patches is a resend of the non-merged pathes from my last
set, along with 2 new ones. It would be really great if we could get
a version bug after the outstanding commits go it.

Patches 0001-0005 are a number of fixes to the tur checker.These are
	the ones that should get the most attention.
Patches 0006-0017 are minor issues found by coverity.
	Not all of them are bugs that could actually occur in practice,
	but they are simple and hopefully non-controversial changes.
Patches 0018-0019 are new and related to changing path wwids

Changes in v3
	added patches 0018-0019

Changes in v2
	0002-libmultipath-fix-tur-checker-double-locking.patch now sets
	ct->devt when initially creating the tur_checker_context, while
	that structure is still only referenced by a local variable.
	After that, ct->devt is only ever read. This should remove any
	issues with it needing locking.

Benjamin Marzinski (19):
  libmultipath: fix tur checker timeout
  libmultipath: fix tur checker double locking
  libmultipath: fix tur memory misuse
  libmultipath: cleanup tur locking
  libmultipath: fix tur checker timeout issue
  libmultipath: fix set_int error path
  libmultipath: fix length issues in get_vpd_sgio
  libmultipath: _install_keyword cleanup
  libmultipath: remove unused code
  libmultipath: fix memory issue in path_latency prio
  libmultipath: fix null dereference int alloc_path_group
  libmutipath: don't use malformed uevents
  multipath: fix max array size in print_cmd_valid
  multipathd: function return value tweaks
  multipathd: minor fixes
  multipathd: remove useless check and fix format
  multipathd: fix memory leak on error in configure
  libmultipath: Don't blank intialized paths
  libmultipath: Fixup updating paths

 libmultipath/checkers/tur.c              | 168 +++++++++++--------------------
 libmultipath/dict.c                      |   5 +-
 libmultipath/discovery.c                 |  18 ++--
 libmultipath/parser.c                    |  12 ++-
 libmultipath/print.c                     |   8 --
 libmultipath/prioritizers/path_latency.c |   3 +-
 libmultipath/structs.c                   |   2 +-
 libmultipath/uevent.c                    |   6 ++
 multipath/main.c                         |   2 +-
 multipathd/cli_handlers.c                |  11 +-
 multipathd/main.c                        |  50 +++++----
 11 files changed, 131 insertions(+), 154 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v3 01/19] libmultipath: fix tur checker timeout
  2018-09-21 23:05 [PATCH v3 00/19] Misc Multipath patches Benjamin Marzinski
@ 2018-09-21 23:05 ` Benjamin Marzinski
  2018-10-01 19:51   ` Martin Wilck
  2018-09-21 23:05 ` [PATCH v3 02/19] libmultipath: fix tur checker double locking Benjamin Marzinski
                   ` (18 subsequent siblings)
  19 siblings, 1 reply; 58+ messages in thread
From: Benjamin Marzinski @ 2018-09-21 23:05 UTC (permalink / raw)
  To: device-mapper development; +Cc: Martin Wilck

The code previously was timing out mode if ct->thread was 0 but
ct->running wasn't. This combination never happens.  The idea was to
timeout if for some reason the path checker tried to kill the thread,
but it didn't die.  The correct thing to check for this is ct->holders.
ct->holders will always be at least one when libcheck_check() is called,
since libcheck_free() won't get called until the device is no longer
being checked. So, if ct->holders is 2, that means that the tur thread
is has not shut down yet.

Also, instead of returning PATH_TIMEOUT whenever the thread hasn't died,
it should only time out if the thread didn't successfully get a value,
which means the previous state was already PATH_TIMEOUT.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/checkers/tur.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/libmultipath/checkers/tur.c b/libmultipath/checkers/tur.c
index bf8486d..275541f 100644
--- a/libmultipath/checkers/tur.c
+++ b/libmultipath/checkers/tur.c
@@ -355,12 +355,15 @@ int libcheck_check(struct checker * c)
 		}
 		pthread_mutex_unlock(&ct->lock);
 	} else {
-		if (uatomic_read(&ct->running) != 0) {
-			/* pthread cancel failed. continue in sync mode */
-			pthread_mutex_unlock(&ct->lock);
-			condlog(3, "%s: tur thread not responding",
-				tur_devt(devt, sizeof(devt), ct));
-			return PATH_TIMEOUT;
+		if (uatomic_read(&ct->holders) > 1) {
+			/* pthread cancel failed. If it didn't get the path
+			   state already, timeout */
+			if (ct->state == PATH_PENDING) {
+				pthread_mutex_unlock(&ct->lock);
+				condlog(3, "%s: tur thread not responding",
+					tur_devt(devt, sizeof(devt), ct));
+				return PATH_TIMEOUT;
+			}
 		}
 		/* Start new TUR checker */
 		ct->state = PATH_UNCHECKED;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v3 02/19] libmultipath: fix tur checker double locking
  2018-09-21 23:05 [PATCH v3 00/19] Misc Multipath patches Benjamin Marzinski
  2018-09-21 23:05 ` [PATCH v3 01/19] libmultipath: fix tur checker timeout Benjamin Marzinski
@ 2018-09-21 23:05 ` Benjamin Marzinski
  2018-10-01 20:09   ` Martin Wilck
  2018-09-21 23:05 ` [PATCH v3 03/19] libmultipath: fix tur memory misuse Benjamin Marzinski
                   ` (17 subsequent siblings)
  19 siblings, 1 reply; 58+ messages in thread
From: Benjamin Marzinski @ 2018-09-21 23:05 UTC (permalink / raw)
  To: device-mapper development; +Cc: Bart Van Assche, Martin Wilck

tur_devt() locks ct->lock. However, it is ocassionally called while
ct->lock is already locked. In reality, there is no reason why we need
to lock all the accesses to ct->devt. The tur checker only needs to
write to this variable one time, when it first gets the file descripter
that it is checking.  It also never uses ct->devt directly. Instead, it
always graps the major and minor, and turns them into a string. This
patch changes ct->devt into that string, and sets it in libcheck_init()
when it is first initializing the checker context. After that, ct->devt
is only ever read.

Cc: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/checkers/tur.c | 55 +++++++++++++--------------------------------
 1 file changed, 16 insertions(+), 39 deletions(-)

diff --git a/libmultipath/checkers/tur.c b/libmultipath/checkers/tur.c
index 275541f..d173648 100644
--- a/libmultipath/checkers/tur.c
+++ b/libmultipath/checkers/tur.c
@@ -37,36 +37,24 @@
 #define MSG_TUR_FAILED	"tur checker failed to initialize"
 
 struct tur_checker_context {
-	dev_t devt;
+	char devt[32];
 	int state;
-	int running;
+	int running; /* uatomic access only */
 	int fd;
 	unsigned int timeout;
 	time_t time;
 	pthread_t thread;
 	pthread_mutex_t lock;
 	pthread_cond_t active;
-	int holders;
+	int holders; /* uatomic access only */
 	char message[CHECKER_MSG_LEN];
 };
 
-static const char *tur_devt(char *devt_buf, int size,
-			    struct tur_checker_context *ct)
-{
-	dev_t devt;
-
-	pthread_mutex_lock(&ct->lock);
-	devt = ct->devt;
-	pthread_mutex_unlock(&ct->lock);
-
-	snprintf(devt_buf, size, "%d:%d", major(devt), minor(devt));
-	return devt_buf;
-}
-
 int libcheck_init (struct checker * c)
 {
 	struct tur_checker_context *ct;
 	pthread_mutexattr_t attr;
+	struct stat sb;
 
 	ct = malloc(sizeof(struct tur_checker_context));
 	if (!ct)
@@ -81,6 +69,9 @@ int libcheck_init (struct checker * c)
 	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
 	pthread_mutex_init(&ct->lock, &attr);
 	pthread_mutexattr_destroy(&attr);
+	if (fstat(c->fd, &sb) == 0)
+		snprintf(ct->devt, sizeof(ct->devt), "%d:%d", major(sb.st_rdev),
+			 minor(sb.st_rdev));
 	c->context = ct;
 
 	return 0;
@@ -232,14 +223,12 @@ static void *tur_thread(void *ctx)
 {
 	struct tur_checker_context *ct = ctx;
 	int state, running;
-	char devt[32];
 
 	/* This thread can be canceled, so setup clean up */
 	tur_thread_cleanup_push(ct);
 	rcu_register_thread();
 
-	condlog(3, "%s: tur checker starting up",
-		tur_devt(devt, sizeof(devt), ct));
+	condlog(3, "%s: tur checker starting up", ct->devt);
 
 	/* TUR checker start up */
 	pthread_mutex_lock(&ct->lock);
@@ -256,8 +245,8 @@ static void *tur_thread(void *ctx)
 	pthread_cond_signal(&ct->active);
 	pthread_mutex_unlock(&ct->lock);
 
-	condlog(3, "%s: tur checker finished, state %s",
-		tur_devt(devt, sizeof(devt), ct), checker_state_name(state));
+	condlog(3, "%s: tur checker finished, state %s", ct->devt,
+		checker_state_name(state));
 
 	running = uatomic_xchg(&ct->running, 0);
 	if (!running)
@@ -305,20 +294,12 @@ int libcheck_check(struct checker * c)
 {
 	struct tur_checker_context *ct = c->context;
 	struct timespec tsp;
-	struct stat sb;
 	pthread_attr_t attr;
 	int tur_status, r;
-	char devt[32];
 
 	if (!ct)
 		return PATH_UNCHECKED;
 
-	if (fstat(c->fd, &sb) == 0) {
-		pthread_mutex_lock(&ct->lock);
-		ct->devt = sb.st_rdev;
-		pthread_mutex_unlock(&ct->lock);
-	}
-
 	if (c->sync)
 		return tur_check(c->fd, c->timeout, copy_msg_to_checker, c);
 
@@ -327,8 +308,7 @@ int libcheck_check(struct checker * c)
 	 */
 	r = pthread_mutex_lock(&ct->lock);
 	if (r != 0) {
-		condlog(2, "%s: tur mutex lock failed with %d",
-			tur_devt(devt, sizeof(devt), ct), r);
+		condlog(2, "%s: tur mutex lock failed with %d", ct->devt, r);
 		MSG(c, MSG_TUR_FAILED);
 		return PATH_WILD;
 	}
@@ -338,14 +318,12 @@ int libcheck_check(struct checker * c)
 			int running = uatomic_xchg(&ct->running, 0);
 			if (running)
 				pthread_cancel(ct->thread);
-			condlog(3, "%s: tur checker timeout",
-				tur_devt(devt, sizeof(devt), ct));
+			condlog(3, "%s: tur checker timeout", ct->devt);
 			ct->thread = 0;
 			MSG(c, MSG_TUR_TIMEOUT);
 			tur_status = PATH_TIMEOUT;
 		} else if (uatomic_read(&ct->running) != 0) {
-			condlog(3, "%s: tur checker not finished",
-					tur_devt(devt, sizeof(devt), ct));
+			condlog(3, "%s: tur checker not finished", ct->devt);
 			tur_status = PATH_PENDING;
 		} else {
 			/* TUR checker done */
@@ -361,7 +339,7 @@ int libcheck_check(struct checker * c)
 			if (ct->state == PATH_PENDING) {
 				pthread_mutex_unlock(&ct->lock);
 				condlog(3, "%s: tur thread not responding",
-					tur_devt(devt, sizeof(devt), ct));
+					ct->devt);
 				return PATH_TIMEOUT;
 			}
 		}
@@ -381,7 +359,7 @@ int libcheck_check(struct checker * c)
 			ct->thread = 0;
 			pthread_mutex_unlock(&ct->lock);
 			condlog(3, "%s: failed to start tur thread, using"
-				" sync mode", tur_devt(devt, sizeof(devt), ct));
+				" sync mode", ct->devt);
 			return tur_check(c->fd, c->timeout,
 					 copy_msg_to_checker, c);
 		}
@@ -392,8 +370,7 @@ int libcheck_check(struct checker * c)
 		pthread_mutex_unlock(&ct->lock);
 		if (uatomic_read(&ct->running) != 0 &&
 		    (tur_status == PATH_PENDING || tur_status == PATH_UNCHECKED)) {
-			condlog(3, "%s: tur checker still running",
-				tur_devt(devt, sizeof(devt), ct));
+			condlog(3, "%s: tur checker still running", ct->devt);
 			tur_status = PATH_PENDING;
 		} else {
 			int running = uatomic_xchg(&ct->running, 0);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v3 03/19] libmultipath: fix tur memory misuse
  2018-09-21 23:05 [PATCH v3 00/19] Misc Multipath patches Benjamin Marzinski
  2018-09-21 23:05 ` [PATCH v3 01/19] libmultipath: fix tur checker timeout Benjamin Marzinski
  2018-09-21 23:05 ` [PATCH v3 02/19] libmultipath: fix tur checker double locking Benjamin Marzinski
@ 2018-09-21 23:05 ` Benjamin Marzinski
  2018-10-01 20:59   ` Martin Wilck
  2018-09-21 23:05 ` [PATCH v3 04/19] libmultipath: cleanup tur locking Benjamin Marzinski
                   ` (16 subsequent siblings)
  19 siblings, 1 reply; 58+ messages in thread
From: Benjamin Marzinski @ 2018-09-21 23:05 UTC (permalink / raw)
  To: device-mapper development; +Cc: Martin Wilck

when tur_thread() was calling tur_check(), it was passing ct->message as
the copy argument, but copy_msg_to_tcc() was assuming that it was
getting a tur_checker_context pointer. This means it was treating
ct->message as ct. This is why the tur checker never printed checker
messages. Intead of simply changing the copy argument passed in, I just
removed all the copying code, since it is completely unnecessary. The
callers of tur_check() can just pass in a buffer that it is safe to
write to, and copy it later, if necessary.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/checkers/tur.c | 46 +++++++++++----------------------------------
 1 file changed, 11 insertions(+), 35 deletions(-)

diff --git a/libmultipath/checkers/tur.c b/libmultipath/checkers/tur.c
index d173648..abda162 100644
--- a/libmultipath/checkers/tur.c
+++ b/libmultipath/checkers/tur.c
@@ -103,17 +103,8 @@ void libcheck_free (struct checker * c)
 	return;
 }
 
-#define TUR_MSG(fmt, args...)					\
-	do {							\
-		char msg[CHECKER_MSG_LEN];			\
-								\
-		snprintf(msg, sizeof(msg), fmt, ##args);	\
-		copy_message(cb_arg, msg);			\
-	} while (0)
-
 static int
-tur_check(int fd, unsigned int timeout,
-	  void (*copy_message)(void *, const char *), void *cb_arg)
+tur_check(int fd, unsigned int timeout, char *msg)
 {
 	struct sg_io_hdr io_hdr;
 	unsigned char turCmdBlk[TUR_CMD_LEN] = { 0x00, 0, 0, 0, 0, 0 };
@@ -132,7 +123,7 @@ retry:
 	io_hdr.timeout = timeout * 1000;
 	io_hdr.pack_id = 0;
 	if (ioctl(fd, SG_IO, &io_hdr) < 0) {
-		TUR_MSG(MSG_TUR_DOWN);
+		snprintf(msg, CHECKER_MSG_LEN, MSG_TUR_DOWN);
 		return PATH_DOWN;
 	}
 	if ((io_hdr.status & 0x7e) == 0x18) {
@@ -140,7 +131,7 @@ retry:
 		 * SCSI-3 arrays might return
 		 * reservation conflict on TUR
 		 */
-		TUR_MSG(MSG_TUR_UP);
+		snprintf(msg, CHECKER_MSG_LEN, MSG_TUR_UP);
 		return PATH_UP;
 	}
 	if (io_hdr.info & SG_INFO_OK_MASK) {
@@ -185,14 +176,14 @@ retry:
 				 * LOGICAL UNIT NOT ACCESSIBLE,
 				 * TARGET PORT IN STANDBY STATE
 				 */
-				TUR_MSG(MSG_TUR_GHOST);
+				snprintf(msg, CHECKER_MSG_LEN, MSG_TUR_GHOST);
 				return PATH_GHOST;
 			}
 		}
-		TUR_MSG(MSG_TUR_DOWN);
+		snprintf(msg, CHECKER_MSG_LEN, MSG_TUR_DOWN);
 		return PATH_DOWN;
 	}
-	TUR_MSG(MSG_TUR_UP);
+	snprintf(msg, CHECKER_MSG_LEN, MSG_TUR_UP);
 	return PATH_UP;
 }
 
@@ -210,19 +201,11 @@ static void cleanup_func(void *data)
 	rcu_unregister_thread();
 }
 
-static void copy_msg_to_tcc(void *ct_p, const char *msg)
-{
-	struct tur_checker_context *ct = ct_p;
-
-	pthread_mutex_lock(&ct->lock);
-	strlcpy(ct->message, msg, sizeof(ct->message));
-	pthread_mutex_unlock(&ct->lock);
-}
-
 static void *tur_thread(void *ctx)
 {
 	struct tur_checker_context *ct = ctx;
 	int state, running;
+	char msg[CHECKER_MSG_LEN];
 
 	/* This thread can be canceled, so setup clean up */
 	tur_thread_cleanup_push(ct);
@@ -236,12 +219,13 @@ static void *tur_thread(void *ctx)
 	ct->message[0] = '\0';
 	pthread_mutex_unlock(&ct->lock);
 
-	state = tur_check(ct->fd, ct->timeout, copy_msg_to_tcc, ct->message);
+	state = tur_check(ct->fd, ct->timeout, msg);
 	pthread_testcancel();
 
 	/* TUR checker done */
 	pthread_mutex_lock(&ct->lock);
 	ct->state = state;
+	strlcpy(ct->message, msg, sizeof(ct->message));
 	pthread_cond_signal(&ct->active);
 	pthread_mutex_unlock(&ct->lock);
 
@@ -283,13 +267,6 @@ static int tur_check_async_timeout(struct checker *c)
 	return (now.tv_sec > ct->time);
 }
 
-static void copy_msg_to_checker(void *c_p, const char *msg)
-{
-	struct checker *c = c_p;
-
-	strlcpy(c->message, msg, sizeof(c->message));
-}
-
 int libcheck_check(struct checker * c)
 {
 	struct tur_checker_context *ct = c->context;
@@ -301,7 +278,7 @@ int libcheck_check(struct checker * c)
 		return PATH_UNCHECKED;
 
 	if (c->sync)
-		return tur_check(c->fd, c->timeout, copy_msg_to_checker, c);
+		return tur_check(c->fd, c->timeout, c->message);
 
 	/*
 	 * Async mode
@@ -360,8 +337,7 @@ int libcheck_check(struct checker * c)
 			pthread_mutex_unlock(&ct->lock);
 			condlog(3, "%s: failed to start tur thread, using"
 				" sync mode", ct->devt);
-			return tur_check(c->fd, c->timeout,
-					 copy_msg_to_checker, c);
+			return tur_check(c->fd, c->timeout, c->message);
 		}
 		tur_timeout(&tsp);
 		r = pthread_cond_timedwait(&ct->active, &ct->lock, &tsp);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v3 04/19] libmultipath: cleanup tur locking
  2018-09-21 23:05 [PATCH v3 00/19] Misc Multipath patches Benjamin Marzinski
                   ` (2 preceding siblings ...)
  2018-09-21 23:05 ` [PATCH v3 03/19] libmultipath: fix tur memory misuse Benjamin Marzinski
@ 2018-09-21 23:05 ` Benjamin Marzinski
  2018-10-01 21:08   ` Martin Wilck
  2018-09-21 23:05 ` [PATCH v3 05/19] libmultipath: fix tur checker timeout issue Benjamin Marzinski
                   ` (15 subsequent siblings)
  19 siblings, 1 reply; 58+ messages in thread
From: Benjamin Marzinski @ 2018-09-21 23:05 UTC (permalink / raw)
  To: device-mapper development; +Cc: Martin Wilck

There are only three variables whose access needs to be synchronized
between the tur thread and the path checker itself: state, message, and
active.  The rest of the variables are either only written when the tur
thread isn't running, or they aren't accessed by the tur thread, or they
are atomics that are themselves used to synchronize things.

This patch limits the amount of code that is covered by ct->lock to
only what needs to be locked. It also makes ct->lock no longer a
recursive lock. To make this simpler, tur_thread now only sets the
state and message one time, instead of twice, since PATH_UNCHECKED
was never able to be returned anyway.

One benefit of this is that the tur checker thread gets more time to
call tur_check() and return before libcheck_check() gives up and
return PATH_PENDING.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/checkers/tur.c | 49 ++++++++++++++++++---------------------------
 1 file changed, 20 insertions(+), 29 deletions(-)

diff --git a/libmultipath/checkers/tur.c b/libmultipath/checkers/tur.c
index abda162..9f6ef51 100644
--- a/libmultipath/checkers/tur.c
+++ b/libmultipath/checkers/tur.c
@@ -53,7 +53,6 @@ struct tur_checker_context {
 int libcheck_init (struct checker * c)
 {
 	struct tur_checker_context *ct;
-	pthread_mutexattr_t attr;
 	struct stat sb;
 
 	ct = malloc(sizeof(struct tur_checker_context));
@@ -65,10 +64,7 @@ int libcheck_init (struct checker * c)
 	ct->fd = -1;
 	uatomic_set(&ct->holders, 1);
 	pthread_cond_init_mono(&ct->active);
-	pthread_mutexattr_init(&attr);
-	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
-	pthread_mutex_init(&ct->lock, &attr);
-	pthread_mutexattr_destroy(&attr);
+	pthread_mutex_init(&ct->lock, NULL);
 	if (fstat(c->fd, &sb) == 0)
 		snprintf(ct->devt, sizeof(ct->devt), "%d:%d", major(sb.st_rdev),
 			 minor(sb.st_rdev));
@@ -213,12 +209,6 @@ static void *tur_thread(void *ctx)
 
 	condlog(3, "%s: tur checker starting up", ct->devt);
 
-	/* TUR checker start up */
-	pthread_mutex_lock(&ct->lock);
-	ct->state = PATH_PENDING;
-	ct->message[0] = '\0';
-	pthread_mutex_unlock(&ct->lock);
-
 	state = tur_check(ct->fd, ct->timeout, msg);
 	pthread_testcancel();
 
@@ -283,13 +273,6 @@ int libcheck_check(struct checker * c)
 	/*
 	 * Async mode
 	 */
-	r = pthread_mutex_lock(&ct->lock);
-	if (r != 0) {
-		condlog(2, "%s: tur mutex lock failed with %d", ct->devt, r);
-		MSG(c, MSG_TUR_FAILED);
-		return PATH_WILD;
-	}
-
 	if (ct->thread) {
 		if (tur_check_async_timeout(c)) {
 			int running = uatomic_xchg(&ct->running, 0);
@@ -305,23 +288,29 @@ int libcheck_check(struct checker * c)
 		} else {
 			/* TUR checker done */
 			ct->thread = 0;
+			pthread_mutex_lock(&ct->lock);
 			tur_status = ct->state;
 			strlcpy(c->message, ct->message, sizeof(c->message));
+			pthread_mutex_unlock(&ct->lock);
 		}
-		pthread_mutex_unlock(&ct->lock);
 	} else {
 		if (uatomic_read(&ct->holders) > 1) {
 			/* pthread cancel failed. If it didn't get the path
 			   state already, timeout */
-			if (ct->state == PATH_PENDING) {
-				pthread_mutex_unlock(&ct->lock);
+			pthread_mutex_lock(&ct->lock);
+			tur_status = ct->state;
+			pthread_mutex_unlock(&ct->lock);
+			if (tur_status == PATH_PENDING) {
 				condlog(3, "%s: tur thread not responding",
 					ct->devt);
 				return PATH_TIMEOUT;
 			}
 		}
 		/* Start new TUR checker */
-		ct->state = PATH_UNCHECKED;
+		pthread_mutex_lock(&ct->lock);
+		tur_status = ct->state = PATH_PENDING;
+		ct->message[0] = '\0';
+		pthread_mutex_unlock(&ct->lock);
 		ct->fd = c->fd;
 		ct->timeout = c->timeout;
 		uatomic_add(&ct->holders, 1);
@@ -334,20 +323,22 @@ int libcheck_check(struct checker * c)
 			uatomic_sub(&ct->holders, 1);
 			uatomic_set(&ct->running, 0);
 			ct->thread = 0;
-			pthread_mutex_unlock(&ct->lock);
 			condlog(3, "%s: failed to start tur thread, using"
 				" sync mode", ct->devt);
 			return tur_check(c->fd, c->timeout, c->message);
 		}
 		tur_timeout(&tsp);
-		r = pthread_cond_timedwait(&ct->active, &ct->lock, &tsp);
-		tur_status = ct->state;
-		strlcpy(c->message, ct->message, sizeof(c->message));
+		pthread_mutex_lock(&ct->lock);
+		if (ct->state == PATH_PENDING)
+			r = pthread_cond_timedwait(&ct->active, &ct->lock, 
+						   &tsp);
+		if (!r) {
+			tur_status = ct->state;
+			strlcpy(c->message, ct->message, sizeof(c->message));
+		}
 		pthread_mutex_unlock(&ct->lock);
-		if (uatomic_read(&ct->running) != 0 &&
-		    (tur_status == PATH_PENDING || tur_status == PATH_UNCHECKED)) {
+		if (tur_status == PATH_PENDING) {
 			condlog(3, "%s: tur checker still running", ct->devt);
-			tur_status = PATH_PENDING;
 		} else {
 			int running = uatomic_xchg(&ct->running, 0);
 			if (running)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v3 05/19] libmultipath: fix tur checker timeout issue
  2018-09-21 23:05 [PATCH v3 00/19] Misc Multipath patches Benjamin Marzinski
                   ` (3 preceding siblings ...)
  2018-09-21 23:05 ` [PATCH v3 04/19] libmultipath: cleanup tur locking Benjamin Marzinski
@ 2018-09-21 23:05 ` Benjamin Marzinski
  2018-10-01 21:09   ` Martin Wilck
  2018-09-21 23:05 ` [PATCH v3 06/19] libmultipath: fix set_int error path Benjamin Marzinski
                   ` (14 subsequent siblings)
  19 siblings, 1 reply; 58+ messages in thread
From: Benjamin Marzinski @ 2018-09-21 23:05 UTC (permalink / raw)
  To: device-mapper development; +Cc: Martin Wilck

If the tur checker is run, and the tur_thread has timed out,
libcheck_check() doesn't actually check if the thread is still running.
This means that the thread could have already completed successfully,
but the tur checker would still return PATH_TIMEOUT, instead of the
value returned by the thread. This patch makes libcheck_check() actually
check if the thread completed, and if so, it returns the proper value.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/checkers/tur.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/libmultipath/checkers/tur.c b/libmultipath/checkers/tur.c
index 9f6ef51..4e2c7a8 100644
--- a/libmultipath/checkers/tur.c
+++ b/libmultipath/checkers/tur.c
@@ -276,12 +276,19 @@ int libcheck_check(struct checker * c)
 	if (ct->thread) {
 		if (tur_check_async_timeout(c)) {
 			int running = uatomic_xchg(&ct->running, 0);
-			if (running)
+			if (running) {
 				pthread_cancel(ct->thread);
-			condlog(3, "%s: tur checker timeout", ct->devt);
+				condlog(3, "%s: tur checker timeout", ct->devt);
+				MSG(c, MSG_TUR_TIMEOUT);
+				tur_status = PATH_TIMEOUT;
+			} else {
+				pthread_mutex_lock(&ct->lock);
+				tur_status = ct->state;
+				strlcpy(c->message, ct->message,
+					sizeof(c->message));
+				pthread_mutex_unlock(&ct->lock);
+			}
 			ct->thread = 0;
-			MSG(c, MSG_TUR_TIMEOUT);
-			tur_status = PATH_TIMEOUT;
 		} else if (uatomic_read(&ct->running) != 0) {
 			condlog(3, "%s: tur checker not finished", ct->devt);
 			tur_status = PATH_PENDING;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v3 06/19] libmultipath: fix set_int error path
  2018-09-21 23:05 [PATCH v3 00/19] Misc Multipath patches Benjamin Marzinski
                   ` (4 preceding siblings ...)
  2018-09-21 23:05 ` [PATCH v3 05/19] libmultipath: fix tur checker timeout issue Benjamin Marzinski
@ 2018-09-21 23:05 ` Benjamin Marzinski
  2018-10-01 21:17   ` Martin Wilck
  2018-09-21 23:05 ` [PATCH v3 07/19] libmultipath: fix length issues in get_vpd_sgio Benjamin Marzinski
                   ` (13 subsequent siblings)
  19 siblings, 1 reply; 58+ messages in thread
From: Benjamin Marzinski @ 2018-09-21 23:05 UTC (permalink / raw)
  To: device-mapper development; +Cc: Martin Wilck

set_int() wasn't checking if the line actually had a value before
converting it to an integer.  Found by coverity. Also, it should
be using set_value().

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/dict.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/libmultipath/dict.c b/libmultipath/dict.c
index 32524d5..bf4701e 100644
--- a/libmultipath/dict.c
+++ b/libmultipath/dict.c
@@ -33,7 +33,10 @@ set_int(vector strvec, void *ptr)
 	int *int_ptr = (int *)ptr;
 	char * buff;
 
-	buff = VECTOR_SLOT(strvec, 1);
+	buff = set_value(strvec);
+	if (!buff)
+		return 1;
+
 	*int_ptr = atoi(buff);
 
 	return 0;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v3 07/19] libmultipath: fix length issues in get_vpd_sgio
  2018-09-21 23:05 [PATCH v3 00/19] Misc Multipath patches Benjamin Marzinski
                   ` (5 preceding siblings ...)
  2018-09-21 23:05 ` [PATCH v3 06/19] libmultipath: fix set_int error path Benjamin Marzinski
@ 2018-09-21 23:05 ` Benjamin Marzinski
  2018-10-01 21:25   ` Martin Wilck
  2018-09-21 23:05 ` [PATCH v3 08/19] libmultipath: _install_keyword cleanup Benjamin Marzinski
                   ` (12 subsequent siblings)
  19 siblings, 1 reply; 58+ messages in thread
From: Benjamin Marzinski @ 2018-09-21 23:05 UTC (permalink / raw)
  To: device-mapper development; +Cc: Martin Wilck

When get_vpd_sgio() finds out that the vpd info needed to be truncated
to fit in the buffer, it doesn't trucate the size as well,  which allows
it to overwrite the buffer. Also, in once len is set to -ENODATA,
get_vpd_sgio() should exit, instead of using the negative len in
memcpy(). Found by coverity.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/discovery.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/libmultipath/discovery.c b/libmultipath/discovery.c
index 0b1855d..3e0db7f 100644
--- a/libmultipath/discovery.c
+++ b/libmultipath/discovery.c
@@ -1116,17 +1116,21 @@ get_vpd_sgio (int fd, int pg, char * str, int maxlen)
 		return -ENODATA;
 	}
 	buff_len = get_unaligned_be16(&buff[2]) + 4;
-	if (buff_len > 4096)
+	if (buff_len > 4096) {
 		condlog(3, "vpd pg%02x page truncated", pg);
-
+		buff_len = 4096;
+	}
 	if (pg == 0x80)
 		len = parse_vpd_pg80(buff, str, maxlen);
 	else if (pg == 0x83)
 		len = parse_vpd_pg83(buff, buff_len, str, maxlen);
 	else if (pg == 0xc9 && maxlen >= 8) {
-		len = buff_len < 8 ? -ENODATA :
-			(buff_len <= maxlen ? buff_len : maxlen);
-		memcpy (str, buff, len);
+		if (buff_len < 8)
+			len = -ENODATA;
+		else {
+			len = (buff_len <= maxlen)? buff_len : maxlen;
+			memcpy (str, buff, len);
+		}
 	} else
 		len = -ENOSYS;
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v3 08/19] libmultipath: _install_keyword cleanup
  2018-09-21 23:05 [PATCH v3 00/19] Misc Multipath patches Benjamin Marzinski
                   ` (6 preceding siblings ...)
  2018-09-21 23:05 ` [PATCH v3 07/19] libmultipath: fix length issues in get_vpd_sgio Benjamin Marzinski
@ 2018-09-21 23:05 ` Benjamin Marzinski
  2018-10-01 21:26   ` Martin Wilck
  2018-09-21 23:05 ` [PATCH v3 09/19] libmultipath: remove unused code Benjamin Marzinski
                   ` (11 subsequent siblings)
  19 siblings, 1 reply; 58+ messages in thread
From: Benjamin Marzinski @ 2018-09-21 23:05 UTC (permalink / raw)
  To: device-mapper development; +Cc: Martin Wilck

_install_keyword should use VECTOR_LAST_SLOT(), which has better error
checking. It should also fail if it gets a NULL pointer, instead of
dereferencing it. Found by coverity.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/parser.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/libmultipath/parser.c b/libmultipath/parser.c
index b8b7e0d..92ef7cf 100644
--- a/libmultipath/parser.c
+++ b/libmultipath/parser.c
@@ -79,12 +79,16 @@ _install_keyword(vector keywords, char *string,
 	struct keyword *keyword;
 
 	/* fetch last keyword */
-	keyword = VECTOR_SLOT(keywords, VECTOR_SIZE(keywords) - 1);
+	keyword = VECTOR_LAST_SLOT(keywords);
+	if (!keyword)
+		return 1;
 
 	/* position to last sub level */
-	for (i = 0; i < sublevel; i++)
-		keyword =
-		    VECTOR_SLOT(keyword->sub, VECTOR_SIZE(keyword->sub) - 1);
+	for (i = 0; i < sublevel; i++) {
+		keyword = VECTOR_LAST_SLOT(keyword->sub);
+		if (!keyword)
+			return 1;
+	}
 
 	/* First sub level allocation */
 	if (!keyword->sub)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v3 09/19] libmultipath: remove unused code
  2018-09-21 23:05 [PATCH v3 00/19] Misc Multipath patches Benjamin Marzinski
                   ` (7 preceding siblings ...)
  2018-09-21 23:05 ` [PATCH v3 08/19] libmultipath: _install_keyword cleanup Benjamin Marzinski
@ 2018-09-21 23:05 ` Benjamin Marzinski
  2018-10-01 21:28   ` Martin Wilck
  2018-09-21 23:05 ` [PATCH v3 10/19] libmultipath: fix memory issue in path_latency prio Benjamin Marzinski
                   ` (10 subsequent siblings)
  19 siblings, 1 reply; 58+ messages in thread
From: Benjamin Marzinski @ 2018-09-21 23:05 UTC (permalink / raw)
  To: device-mapper development; +Cc: Martin Wilck

since vector_foreach_slot() already checks if the entry is NULL, there's
no point in checking it in the loop, since it can't be NULL there. Found
by coverity.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/print.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/libmultipath/print.c b/libmultipath/print.c
index 9da6a77..7b610b9 100644
--- a/libmultipath/print.c
+++ b/libmultipath/print.c
@@ -275,8 +275,6 @@ snprint_multipath_vpr (char * buff, size_t len, const struct multipath * mpp)
 	int i, j;
 
 	vector_foreach_slot(mpp->pg, pgp, i) {
-		if (!pgp)
-			continue;
 		vector_foreach_slot(pgp->paths, pp, j) {
 			if (strlen(pp->vendor_id) && strlen(pp->product_id))
 				return snprintf(buff, len, "%s,%s",
@@ -295,8 +293,6 @@ snprint_multipath_vend (char * buff, size_t len, const struct multipath * mpp)
 	int i, j;
 
 	vector_foreach_slot(mpp->pg, pgp, i) {
-		if (!pgp)
-			continue;
 		vector_foreach_slot(pgp->paths, pp, j) {
 			if (strlen(pp->vendor_id))
 				return snprintf(buff, len, "%s", pp->vendor_id);
@@ -313,8 +309,6 @@ snprint_multipath_prod (char * buff, size_t len, const struct multipath * mpp)
 	int i, j;
 
 	vector_foreach_slot(mpp->pg, pgp, i) {
-		if (!pgp)
-			continue;
 		vector_foreach_slot(pgp->paths, pp, j) {
 			if (strlen(pp->product_id))
 				return snprintf(buff, len, "%s", pp->product_id);
@@ -331,8 +325,6 @@ snprint_multipath_rev (char * buff, size_t len, const struct multipath * mpp)
 	int i, j;
 
 	vector_foreach_slot(mpp->pg, pgp, i) {
-		if (!pgp)
-			continue;
 		vector_foreach_slot(pgp->paths, pp, j) {
 			if (strlen(pp->rev))
 				return snprintf(buff, len, "%s", pp->rev);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v3 10/19] libmultipath: fix memory issue in path_latency prio
  2018-09-21 23:05 [PATCH v3 00/19] Misc Multipath patches Benjamin Marzinski
                   ` (8 preceding siblings ...)
  2018-09-21 23:05 ` [PATCH v3 09/19] libmultipath: remove unused code Benjamin Marzinski
@ 2018-09-21 23:05 ` Benjamin Marzinski
  2018-10-01 21:30   ` Martin Wilck
  2018-09-21 23:05 ` [PATCH v3 11/19] libmultipath: fix null dereference int alloc_path_group Benjamin Marzinski
                   ` (9 subsequent siblings)
  19 siblings, 1 reply; 58+ messages in thread
From: Benjamin Marzinski @ 2018-09-21 23:05 UTC (permalink / raw)
  To: device-mapper development; +Cc: Martin Wilck

The path_latency prioriziter was assuming that prepare_directio_read()
always succeeds. However, it doesn't, and when it fails, the prioritizer
used buf without it pointing to alloced memory. Found by coverity.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/prioritizers/path_latency.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/libmultipath/prioritizers/path_latency.c b/libmultipath/prioritizers/path_latency.c
index 765265c..eeee01e 100644
--- a/libmultipath/prioritizers/path_latency.c
+++ b/libmultipath/prioritizers/path_latency.c
@@ -237,7 +237,8 @@ int getprio(struct path *pp, char *args, unsigned int timeout)
 	lg_maxavglatency = log(MAX_AVG_LATENCY) / lg_base;
 	lg_minavglatency = log(MIN_AVG_LATENCY) / lg_base;
 
-	prepare_directio_read(pp->fd, &blksize, &buf, &restore_flags);
+	if (prepare_directio_read(pp->fd, &blksize, &buf, &restore_flags) < 0)
+		return PRIO_UNDEF;
 
 	temp = io_num;
 	while (temp-- > 0) {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v3 11/19] libmultipath: fix null dereference int alloc_path_group
  2018-09-21 23:05 [PATCH v3 00/19] Misc Multipath patches Benjamin Marzinski
                   ` (9 preceding siblings ...)
  2018-09-21 23:05 ` [PATCH v3 10/19] libmultipath: fix memory issue in path_latency prio Benjamin Marzinski
@ 2018-09-21 23:05 ` Benjamin Marzinski
  2018-10-01 21:33   ` Martin Wilck
  2018-09-21 23:05 ` [PATCH v3 12/19] libmutipath: don't use malformed uevents Benjamin Marzinski
                   ` (8 subsequent siblings)
  19 siblings, 1 reply; 58+ messages in thread
From: Benjamin Marzinski @ 2018-09-21 23:05 UTC (permalink / raw)
  To: device-mapper development; +Cc: Martin Wilck

If all_pathgroup failed to allocate a vector for pgp->paths, instead of
failing after it freed pgp, it would set pgp to NULL and then
dereference it. This patch fixes that. Found by coverity.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/structs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libmultipath/structs.c b/libmultipath/structs.c
index ae847d6..caa178a 100644
--- a/libmultipath/structs.c
+++ b/libmultipath/structs.c
@@ -165,7 +165,7 @@ alloc_pathgroup (void)
 
 	if (!pgp->paths) {
 		FREE(pgp);
-		pgp = NULL;
+		return NULL;
 	}
 
 	dm_pathgroup_to_gen(pgp)->ops = &dm_gen_pathgroup_ops;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v3 12/19] libmutipath: don't use malformed uevents
  2018-09-21 23:05 [PATCH v3 00/19] Misc Multipath patches Benjamin Marzinski
                   ` (10 preceding siblings ...)
  2018-09-21 23:05 ` [PATCH v3 11/19] libmultipath: fix null dereference int alloc_path_group Benjamin Marzinski
@ 2018-09-21 23:05 ` Benjamin Marzinski
  2018-10-01 21:31   ` Martin Wilck
  2018-09-21 23:05 ` [PATCH v3 13/19] multipath: fix max array size in print_cmd_valid Benjamin Marzinski
                   ` (7 subsequent siblings)
  19 siblings, 1 reply; 58+ messages in thread
From: Benjamin Marzinski @ 2018-09-21 23:05 UTC (permalink / raw)
  To: device-mapper development; +Cc: Martin Wilck

A uevent that doesn't include the ACTION and DEVPATH fields is
malformed. It should be ignored, instead of used with those fields being
NULL.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/uevent.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/libmultipath/uevent.c b/libmultipath/uevent.c
index fd8ca35..5f910e6 100644
--- a/libmultipath/uevent.c
+++ b/libmultipath/uevent.c
@@ -729,6 +729,12 @@ struct uevent *uevent_from_udev_device(struct udev_device *dev)
 		if (i == HOTPLUG_NUM_ENVP - 1)
 			break;
 	}
+	if (!uev->devpath || ! uev->action) {
+		udev_device_unref(dev);
+		condlog(1, "uevent missing necessary fields");
+		FREE(uev);
+		return NULL;
+	}
 	uev->udev = dev;
 	uev->envp[i] = NULL;
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v3 13/19] multipath: fix max array size in print_cmd_valid
  2018-09-21 23:05 [PATCH v3 00/19] Misc Multipath patches Benjamin Marzinski
                   ` (11 preceding siblings ...)
  2018-09-21 23:05 ` [PATCH v3 12/19] libmutipath: don't use malformed uevents Benjamin Marzinski
@ 2018-09-21 23:05 ` Benjamin Marzinski
  2018-10-01 21:35   ` Martin Wilck
  2018-09-21 23:05 ` [PATCH v3 14/19] multipathd: function return value tweaks Benjamin Marzinski
                   ` (6 subsequent siblings)
  19 siblings, 1 reply; 58+ messages in thread
From: Benjamin Marzinski @ 2018-09-21 23:05 UTC (permalink / raw)
  To: device-mapper development; +Cc: Martin Wilck

The code is attempting to verify that 0 <= k < 3
However, sizeof(val) is 12, assuming 4 byte integers. The check needs to
take integer size into account. Found by coverity.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 multipath/main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/multipath/main.c b/multipath/main.c
index fc5bf16..d5aad95 100644
--- a/multipath/main.c
+++ b/multipath/main.c
@@ -482,7 +482,7 @@ static int print_cmd_valid(int k, const vector pathvec,
 	struct timespec until;
 	struct path *pp;
 
-	if (k < 0 || k >= sizeof(vals))
+	if (k < 0 || k >= (sizeof(vals) / sizeof(int)))
 		return 1;
 
 	if (k == 2) {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v3 14/19] multipathd: function return value tweaks
  2018-09-21 23:05 [PATCH v3 00/19] Misc Multipath patches Benjamin Marzinski
                   ` (12 preceding siblings ...)
  2018-09-21 23:05 ` [PATCH v3 13/19] multipath: fix max array size in print_cmd_valid Benjamin Marzinski
@ 2018-09-21 23:05 ` Benjamin Marzinski
  2018-10-01 21:37   ` Martin Wilck
  2018-09-21 23:05 ` [PATCH v3 15/19] multipathd: minor fixes Benjamin Marzinski
                   ` (5 subsequent siblings)
  19 siblings, 1 reply; 58+ messages in thread
From: Benjamin Marzinski @ 2018-09-21 23:05 UTC (permalink / raw)
  To: device-mapper development; +Cc: Martin Wilck

In cli_add_map() the return value of get_refwwid is never used, and
refwwid is checked to see if the function returned successfully, so the
return value doesn't need to be saved.

In resize_map, if setup_map fails, multipathd shouldn't attempt to
create the device with resulting params string. It should just fail
instead. Found by coverity.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 multipathd/cli_handlers.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/multipathd/cli_handlers.c b/multipathd/cli_handlers.c
index 5682b5c..bb16472 100644
--- a/multipathd/cli_handlers.c
+++ b/multipathd/cli_handlers.c
@@ -796,8 +796,8 @@ cli_add_map (void * v, char ** reply, int * len, void * data)
 		if (!alias && !count) {
 			condlog(2, "%s: mapname not found for %d:%d",
 				param, major, minor);
-			rc = get_refwwid(CMD_NONE, param, DEV_DEVMAP,
-					 vecs->pathvec, &refwwid);
+			get_refwwid(CMD_NONE, param, DEV_DEVMAP,
+				    vecs->pathvec, &refwwid);
 			if (refwwid) {
 				if (coalesce_paths(vecs, NULL, refwwid,
 						   FORCE_RELOAD_NONE, CMD_NONE))
@@ -881,7 +881,12 @@ int resize_map(struct multipath *mpp, unsigned long long size,
 
 	mpp->size = size;
 	update_mpp_paths(mpp, vecs->pathvec);
-	setup_map(mpp, params, PARAMS_SIZE, vecs);
+	if (setup_map(mpp, params, PARAMS_SIZE, vecs) != 0) {
+		condlog(0, "%s: failed to setup map for resize : %s",
+			mpp->alias, strerror(errno));
+		mpp->size = orig_size;
+		return 1;
+	}
 	mpp->action = ACT_RESIZE;
 	mpp->force_udev_reload = 1;
 	if (domap(mpp, params, 1) <= 0) {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v3 15/19] multipathd: minor fixes
  2018-09-21 23:05 [PATCH v3 00/19] Misc Multipath patches Benjamin Marzinski
                   ` (13 preceding siblings ...)
  2018-09-21 23:05 ` [PATCH v3 14/19] multipathd: function return value tweaks Benjamin Marzinski
@ 2018-09-21 23:05 ` Benjamin Marzinski
  2018-10-01 21:38   ` Martin Wilck
  2018-09-21 23:05 ` [PATCH v3 16/19] multipathd: remove useless check and fix format Benjamin Marzinski
                   ` (4 subsequent siblings)
  19 siblings, 1 reply; 58+ messages in thread
From: Benjamin Marzinski @ 2018-09-21 23:05 UTC (permalink / raw)
  To: device-mapper development; +Cc: Martin Wilck

In update_multipath(), conf is set again in a couple of lines, and
nothing uses it before then, so there's no point in setting it twice.
Also, in ev_remove_path(), strncpy() could end up unterminated, so
use strlcpy() instead.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 multipathd/main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/multipathd/main.c b/multipathd/main.c
index cc493c1..125a805 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -429,7 +429,7 @@ int update_multipath (struct vectors *vecs, char *mapname, int reset)
 				continue;
 
 			if (pp->state != PATH_DOWN) {
-				struct config *conf = get_multipath_config();
+				struct config *conf;
 				int oldstate = pp->state;
 				int checkint;
 
@@ -1096,7 +1096,7 @@ ev_remove_path (struct path *pp, struct vectors * vecs, int need_do_map)
 			/*
 			 * flush_map will fail if the device is open
 			 */
-			strncpy(alias, mpp->alias, WWID_SIZE);
+			strlcpy(alias, mpp->alias, WWID_SIZE);
 			if (mpp->flush_on_last_del == FLUSH_ENABLED) {
 				condlog(2, "%s Last path deleted, disabling queueing", mpp->alias);
 				mpp->retry_tick = 0;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v3 16/19] multipathd: remove useless check and fix format
  2018-09-21 23:05 [PATCH v3 00/19] Misc Multipath patches Benjamin Marzinski
                   ` (14 preceding siblings ...)
  2018-09-21 23:05 ` [PATCH v3 15/19] multipathd: minor fixes Benjamin Marzinski
@ 2018-09-21 23:05 ` Benjamin Marzinski
  2018-10-01 21:40   ` Martin Wilck
  2018-09-21 23:05 ` [PATCH v3 17/19] multipathd: fix memory leak on error in configure Benjamin Marzinski
                   ` (3 subsequent siblings)
  19 siblings, 1 reply; 58+ messages in thread
From: Benjamin Marzinski @ 2018-09-21 23:05 UTC (permalink / raw)
  To: device-mapper development; +Cc: Martin Wilck

The only thing this patch changes is to remove the check for pp->mpp
before the check for pp->mpp->prflags, since check_path() already
verified that pp->mpp != NULL. This fixes a number of coverity warnings.
Also, I normalized the spacing and indenting of the nearby code.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 multipathd/main.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/multipathd/main.c b/multipathd/main.c
index 125a805..3c2fe7b 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -1978,14 +1978,14 @@ check_path (struct vectors * vecs, struct path * pp, int ticks)
 			return 1;
 		}
 
-		if(newstate == PATH_UP || newstate == PATH_GHOST){
-			if ( pp->mpp && pp->mpp->prflag ){
+		if (newstate == PATH_UP || newstate == PATH_GHOST) {
+			if (pp->mpp->prflag) {
 				/*
 				 * Check Persistent Reservation.
 				 */
-			condlog(2, "%s: checking persistent reservation "
-				"registration", pp->dev);
-			mpath_pr_event_handle(pp);
+				condlog(2, "%s: checking persistent "
+					"reservation registration", pp->dev);
+				mpath_pr_event_handle(pp);
 			}
 		}
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v3 17/19] multipathd: fix memory leak on error in configure
  2018-09-21 23:05 [PATCH v3 00/19] Misc Multipath patches Benjamin Marzinski
                   ` (15 preceding siblings ...)
  2018-09-21 23:05 ` [PATCH v3 16/19] multipathd: remove useless check and fix format Benjamin Marzinski
@ 2018-09-21 23:05 ` Benjamin Marzinski
  2018-10-01 21:42   ` Martin Wilck
  2018-09-21 23:05 ` [PATCH v3 18/19] libmultipath: Don't blank intialized paths Benjamin Marzinski
                   ` (2 subsequent siblings)
  19 siblings, 1 reply; 58+ messages in thread
From: Benjamin Marzinski @ 2018-09-21 23:05 UTC (permalink / raw)
  To: device-mapper development; +Cc: Martin Wilck

If configure fails after allocing mpvec, it must free it. Found by
coverity.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 multipathd/main.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/multipathd/main.c b/multipathd/main.c
index 3c2fe7b..61ca455 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -2277,7 +2277,7 @@ configure (struct vectors * vecs)
 	ret = path_discovery(vecs->pathvec, DI_ALL);
 	if (ret < 0) {
 		condlog(0, "configure failed at path discovery");
-		return 1;
+		goto fail;
 	}
 
 	vector_foreach_slot (vecs->pathvec, pp, i){
@@ -2294,7 +2294,7 @@ configure (struct vectors * vecs)
 	}
 	if (map_discovery(vecs)) {
 		condlog(0, "configure failed at map discovery");
-		return 1;
+		goto fail;
 	}
 
 	/*
@@ -2308,7 +2308,7 @@ configure (struct vectors * vecs)
 		force_reload = FORCE_RELOAD_YES;
 	if (ret) {
 		condlog(0, "configure failed while coalescing paths");
-		return 1;
+		goto fail;
 	}
 
 	/*
@@ -2317,7 +2317,7 @@ configure (struct vectors * vecs)
 	 */
 	if (coalesce_maps(vecs, mpvec)) {
 		condlog(0, "configure failed while coalescing maps");
-		return 1;
+		goto fail;
 	}
 
 	dm_lib_release();
@@ -2353,6 +2353,10 @@ configure (struct vectors * vecs)
 			i--;
 	}
 	return 0;
+
+fail:
+	vector_free(mpvec);
+	return 1;
 }
 
 int
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v3 18/19] libmultipath: Don't blank intialized paths
  2018-09-21 23:05 [PATCH v3 00/19] Misc Multipath patches Benjamin Marzinski
                   ` (16 preceding siblings ...)
  2018-09-21 23:05 ` [PATCH v3 17/19] multipathd: fix memory leak on error in configure Benjamin Marzinski
@ 2018-09-21 23:05 ` Benjamin Marzinski
  2018-10-01 22:00   ` Martin Wilck
  2018-10-08  9:35   ` Martin Wilck
  2018-09-21 23:05 ` [PATCH v3 19/19] libmultipath: Fixup updating paths Benjamin Marzinski
  2018-10-07  8:36 ` [PATCH v3 00/19] Misc Multipath patches Christophe Varoqui
  19 siblings, 2 replies; 58+ messages in thread
From: Benjamin Marzinski @ 2018-09-21 23:05 UTC (permalink / raw)
  To: device-mapper development; +Cc: Martin Wilck

When pathinfo fails for some likely transient reason, it clears the path
wwid, but otherwise returns successfully, to keep the path around but
not usable until it gets fully initialized. However, if the path has
already been initialized, and pathinfo hits a transient error, it
shouldn't clear the wwid.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/discovery.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libmultipath/discovery.c b/libmultipath/discovery.c
index 3e0db7f..33815dc 100644
--- a/libmultipath/discovery.c
+++ b/libmultipath/discovery.c
@@ -1991,9 +1991,9 @@ blank:
 	/*
 	 * Recoverable error, for example faulty or offline path
 	 */
-	memset(pp->wwid, 0, WWID_SIZE);
 	pp->chkrstate = pp->state = PATH_DOWN;
-	pp->initialized = INIT_FAILED;
+	if (pp->initialized == INIT_FAILED)
+		memset(pp->wwid, 0, WWID_SIZE);
 
 	return PATHINFO_OK;
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH v3 19/19] libmultipath: Fixup updating paths
  2018-09-21 23:05 [PATCH v3 00/19] Misc Multipath patches Benjamin Marzinski
                   ` (17 preceding siblings ...)
  2018-09-21 23:05 ` [PATCH v3 18/19] libmultipath: Don't blank intialized paths Benjamin Marzinski
@ 2018-09-21 23:05 ` Benjamin Marzinski
  2018-10-01 22:30   ` Martin Wilck
  2018-10-07  8:36 ` [PATCH v3 00/19] Misc Multipath patches Christophe Varoqui
  19 siblings, 1 reply; 58+ messages in thread
From: Benjamin Marzinski @ 2018-09-21 23:05 UTC (permalink / raw)
  To: device-mapper development; +Cc: Martin Wilck

Commit 582c56cc broke some code paths in uev_update_path. First, it
changed the handling of paths that were not fully initialized.
uev_update_path was simply setting the wwids for all of these paths.
Instead, it should ignore the ones that had not requested a new uevent.
These paths are likely down, and are already getting handled by
check_path, after it verifies that they have become active. Also,
setting the wwid doesn't update all of the other information that
may have been missed when the path was initially added.

Also, it wasn't possible for pp->wwid_changed to transition back to
zero, unless the path's wwid was empty, in which case there was no
reason to worry about the wwid change in the first place, since the path
hadn't been fully initialized yet. So, even if a path's wwid changed and
then changed back to the original value, the path still could not be
used.

This patch fixes these issues, and also moves the check for paths that
have requested a new uevent up in the functions. These paths will get
fully reinitialized anyway, so there is no reason to do all the other
work first.

Fixes: 582c56cc ("libmultipath: uev_update_path: always warn if WWID
changed")

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 multipathd/main.c | 24 ++++++++++++++----------
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/multipathd/main.c b/multipathd/main.c
index 61ca455..d6d122a 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -1207,6 +1207,15 @@ uev_update_path (struct uevent *uev, struct vectors * vecs)
 		struct multipath *mpp = pp->mpp;
 		char wwid[WWID_SIZE];
 
+		if (pp->initialized == INIT_REQUESTED_UDEV) {
+			needs_reinit = 1;
+			goto out;
+		}
+		/* Don't deal with other types of failed initialization
+		 * now. check_path will handle it */
+		if (!strlen(pp->wwid))
+			goto out;
+
 		strcpy(wwid, pp->wwid);
 		get_uid(pp, pp->state, uev->udev);
 
@@ -1215,9 +1224,8 @@ uev_update_path (struct uevent *uev, struct vectors * vecs)
 				uev->kernel, wwid, pp->wwid,
 				(disable_changed_wwids ? "disallowing" :
 				 "continuing"));
-			if (disable_changed_wwids &&
-			    (strlen(wwid) || pp->wwid_changed)) {
-				strcpy(pp->wwid, wwid);
+			strcpy(pp->wwid, wwid);
+			if (disable_changed_wwids) {
 				if (!pp->wwid_changed) {
 					pp->wwid_changed = 1;
 					pp->tick = 1;
@@ -1225,11 +1233,9 @@ uev_update_path (struct uevent *uev, struct vectors * vecs)
 						dm_fail_path(pp->mpp->alias, pp->dev_t);
 				}
 				goto out;
-			} else if (!disable_changed_wwids)
-				strcpy(pp->wwid, wwid);
-			else
-				pp->wwid_changed = 0;
+			}
 		} else {
+			pp->wwid_changed = 0;
 			udev_device_unref(pp->udev);
 			pp->udev = udev_device_ref(uev->udev);
 			conf = get_multipath_config();
@@ -1240,9 +1246,7 @@ uev_update_path (struct uevent *uev, struct vectors * vecs)
 			pthread_cleanup_pop(1);
 		}
 
-		if (pp->initialized == INIT_REQUESTED_UDEV)
-			needs_reinit = 1;
-		else if (mpp && ro >= 0) {
+		if (mpp && ro >= 0) {
 			condlog(2, "%s: update path write_protect to '%d' (uevent)", uev->kernel, ro);
 
 			if (mpp->wait_for_udev)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [PATCH v3 01/19] libmultipath: fix tur checker timeout
  2018-09-21 23:05 ` [PATCH v3 01/19] libmultipath: fix tur checker timeout Benjamin Marzinski
@ 2018-10-01 19:51   ` Martin Wilck
  2018-10-04 16:31     ` Benjamin Marzinski
  0 siblings, 1 reply; 58+ messages in thread
From: Martin Wilck @ 2018-10-01 19:51 UTC (permalink / raw)
  To: Benjamin Marzinski, device-mapper development

On Fri, 2018-09-21 at 18:05 -0500, Benjamin Marzinski wrote:
> The code previously was timing out mode if ct->thread was 0 but
> ct->running wasn't. This combination never happens.  The idea was to
> timeout if for some reason the path checker tried to kill the thread,
> but it didn't die.  The correct thing to check for this is ct-
> >holders.
> ct->holders will always be at least one when libcheck_check() is
> called,
> since libcheck_free() won't get called until the device is no longer
> being checked. So, if ct->holders is 2, that means that the tur
> thread
> is has not shut down yet.
> 
> Also, instead of returning PATH_TIMEOUT whenever the thread hasn't
> died,
> it should only time out if the thread didn't successfully get a
> value,
> which means the previous state was already PATH_TIMEOUT.

What about PATH_UNCHECKED?

> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
>  libmultipath/checkers/tur.c | 15 +++++++++------
>  1 file changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/libmultipath/checkers/tur.c
> b/libmultipath/checkers/tur.c
> index bf8486d..275541f 100644
> --- a/libmultipath/checkers/tur.c
> +++ b/libmultipath/checkers/tur.c
> @@ -355,12 +355,15 @@ int libcheck_check(struct checker * c)
>  		}
>  		pthread_mutex_unlock(&ct->lock);
>  	} else {
> -		if (uatomic_read(&ct->running) != 0) {
> -			/* pthread cancel failed. continue in sync mode
> */
> -			pthread_mutex_unlock(&ct->lock);
> -			condlog(3, "%s: tur thread not responding",
> -				tur_devt(devt, sizeof(devt), ct));
> -			return PATH_TIMEOUT;
> +		if (uatomic_read(&ct->holders) > 1) {
> +			/* pthread cancel failed. If it didn't get the
> path
> +			   state already, timeout */
> +			if (ct->state == PATH_PENDING) {
> +				pthread_mutex_unlock(&ct->lock);
> +				condlog(3, "%s: tur thread not
> responding",
> +					tur_devt(devt, sizeof(devt),
> ct));
> +				return PATH_TIMEOUT;
> +			}
>  		}

I'm not quite getting it yet. We're entering this code path if 
ct->thread is 0. As you point out, if there are >1 holders, a
previously cancelled TUR thread must be still "alive". But now we want
to check *again*. The previous code refused to do that - a single
hanging TUR thread could block further checks from happening, forever.
The PATH_TIMEOUT return code was actually questionable - in this
libcheck_check() invocation, we didn't even try to check, so nothing
could actually time out (but that's obviously independent of your
patch).

With your patch, if ct->state != PATH_PENDING, we'd try to start
another TUR thread although there's still a hanging one. That's not
necessarily wrong, but I fail to see why it depends on ct->state.
Anyway, if we do this, we take the risk of starting more and more TUR
threads which might all end up hanging; I wonder if we should stop
creating more threads if ct->holders grows further (e.g. > 5).

Regards,
Martin


>  		/* Start new TUR checker */
>  		ct->state = PATH_UNCHECKED;

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v3 02/19] libmultipath: fix tur checker double locking
  2018-09-21 23:05 ` [PATCH v3 02/19] libmultipath: fix tur checker double locking Benjamin Marzinski
@ 2018-10-01 20:09   ` Martin Wilck
  2018-10-01 20:44     ` Martin Wilck
  2018-10-04 16:45     ` Benjamin Marzinski
  0 siblings, 2 replies; 58+ messages in thread
From: Martin Wilck @ 2018-10-01 20:09 UTC (permalink / raw)
  To: Benjamin Marzinski, device-mapper development; +Cc: Bart Van Assche

On Fri, 2018-09-21 at 18:05 -0500, Benjamin Marzinski wrote:
> tur_devt() locks ct->lock. However, it is ocassionally called while
> ct->lock is already locked. In reality, there is no reason why we
> need
> to lock all the accesses to ct->devt. The tur checker only needs to
> write to this variable one time, when it first gets the file
> descripter
> that it is checking.  It also never uses ct->devt directly. Instead,
> it
> always graps the major and minor, and turns them into a string. This
> patch changes ct->devt into that string, and sets it in
> libcheck_init()
> when it is first initializing the checker context. After that, ct-
> >devt
> is only ever read.

I like the lock removal a lot, but not so much the conversion into a
string. Why not keep the dev_t? 

Or maybe even easier, the other way around: why don't we make it a
char* and simply set checker->dev_t = &pp->dev_t?

Regards,
Martin

> 
> Cc: Bart Van Assche <bart.vanassche@wdc.com>4
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
>  libmultipath/checkers/tur.c | 55 +++++++++++++--------------------
> ------------
>  1 file changed, 16 insertions(+), 39 deletions(-)
> 
> diff --git a/libmultipath/checkers/tur.c
> b/libmultipath/checkers/tur.c
> index 275541f..d173648 100644
> --- a/libmultipath/checkers/tur.c
> +++ b/libmultipath/checkers/tur.c
> @@ -37,36 +37,24 @@
>  #define MSG_TUR_FAILED	"tur checker failed to initialize"
>  
>  struct tur_checker_context {
> -	dev_t devt;
> +	char devt[32];
>  	int state;
> -	int running;
> +	int running; /* uatomic access only */
>  	int fd;
>  	unsigned int timeout;
>  	time_t time;
>  	pthread_t thread;
>  	pthread_mutex_t lock;
>  	pthread_cond_t active;
> -	int holders;
> +	int holders; /* uatomic access only */
>  	char message[CHECKER_MSG_LEN];
>  };
>  
> -static const char *tur_devt(char *devt_buf, int size,
> -			    struct tur_checker_context *ct)
> -{
> -	dev_t devt;
> -
> -	pthread_mutex_lock(&ct->lock);
> -	devt = ct->devt;
> -	pthread_mutex_unlock(&ct->lock);
> -
> -	snprintf(devt_buf, size, "%d:%d", major(devt), minor(devt));
> -	return devt_buf;
> -}
> -
>  int libcheck_init (struct checker * c)
>  {
>  	struct tur_checker_context *ct;
>  	pthread_mutexattr_t attr;
> +	struct stat sb;
>  
>  	ct = malloc(sizeof(struct tur_checker_context));
>  	if (!ct)
> @@ -81,6 +69,9 @@ int libcheck_init (struct checker * c)
>  	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
>  	pthread_mutex_init(&ct->lock, &attr);
>  	pthread_mutexattr_destroy(&attr);
> +	if (fstat(c->fd, &sb) == 0)
> +		snprintf(ct->devt, sizeof(ct->devt), "%d:%d",
> major(sb.st_rdev),
> +			 minor(sb.st_rdev));
>  	c->context = ct;
>  
>  	return 0;
> @@ -232,14 +223,12 @@ static void *tur_thread(void *ctx)
>  {
>  	struct tur_checker_context *ct = ctx;
>  	int state, running;
> -	char devt[32];
>  
>  	/* This thread can be canceled, so setup clean up */
>  	tur_thread_cleanup_push(ct);
>  	rcu_register_thread();
>  
> -	condlog(3, "%s: tur checker starting up",
> -		tur_devt(devt, sizeof(devt), ct));
> +	condlog(3, "%s: tur checker starting up", ct->devt);
>  
>  	/* TUR checker start up */
>  	pthread_mutex_lock(&ct->lock);
> @@ -256,8 +245,8 @@ static void *tur_thread(void *ctx)
>  	pthread_cond_signal(&ct->active);
>  	pthread_mutex_unlock(&ct->lock);
>  
> -	condlog(3, "%s: tur checker finished, state %s",
> -		tur_devt(devt, sizeof(devt), ct),
> checker_state_name(state));
> +	condlog(3, "%s: tur checker finished, state %s", ct->devt,
> +		checker_state_name(state));
>  
>  	running = uatomic_xchg(&ct->running, 0);
>  	if (!running)
> @@ -305,20 +294,12 @@ int libcheck_check(struct checker * c)
>  {
>  	struct tur_checker_context *ct = c->context;
>  	struct timespec tsp;
> -	struct stat sb;
>  	pthread_attr_t attr;
>  	int tur_status, r;
> -	char devt[32];
>  
>  	if (!ct)
>  		return PATH_UNCHECKED;
>  
> -	if (fstat(c->fd, &sb) == 0) {
> -		pthread_mutex_lock(&ct->lock);
> -		ct->devt = sb.st_rdev;
> -		pthread_mutex_unlock(&ct->lock);
> -	}
> -
>  	if (c->sync)
>  		return tur_check(c->fd, c->timeout,
> copy_msg_to_checker, c);
>  
> @@ -327,8 +308,7 @@ int libcheck_check(struct checker * c)
>  	 */
>  	r = pthread_mutex_lock(&ct->lock);
>  	if (r != 0) {
> -		condlog(2, "%s: tur mutex lock failed with %d",
> -			tur_devt(devt, sizeof(devt), ct), r);
> +		condlog(2, "%s: tur mutex lock failed with %d", ct-
> >devt, r);
>  		MSG(c, MSG_TUR_FAILED);
>  		return PATH_WILD;
>  	}
> @@ -338,14 +318,12 @@ int libcheck_check(struct checker * c)
>  			int running = uatomic_xchg(&ct->running, 0);
>  			if (running)
>  				pthread_cancel(ct->thread);
> -			condlog(3, "%s: tur checker timeout",
> -				tur_devt(devt, sizeof(devt), ct));
> +			condlog(3, "%s: tur checker timeout", ct-
> >devt);
>  			ct->thread = 0;
>  			MSG(c, MSG_TUR_TIMEOUT);
>  			tur_status = PATH_TIMEOUT;
>  		} else if (uatomic_read(&ct->running) != 0) {
> -			condlog(3, "%s: tur checker not finished",
> -					tur_devt(devt, sizeof(devt),
> ct));
> +			condlog(3, "%s: tur checker not finished", ct-
> >devt);
>  			tur_status = PATH_PENDING;
>  		} else {
>  			/* TUR checker done */
> @@ -361,7 +339,7 @@ int libcheck_check(struct checker * c)
>  			if (ct->state == PATH_PENDING) {
>  				pthread_mutex_unlock(&ct->lock);
>  				condlog(3, "%s: tur thread not
> responding",
> -					tur_devt(devt, sizeof(devt),
> ct));
> +					ct->devt);
>  				return PATH_TIMEOUT;
>  			}
>  		}
> @@ -381,7 +359,7 @@ int libcheck_check(struct checker * c)
>  			ct->thread = 0;
>  			pthread_mutex_unlock(&ct->lock);
>  			condlog(3, "%s: failed to start tur thread,
> using"
> -				" sync mode", tur_devt(devt,
> sizeof(devt), ct));
> +				" sync mode", ct->devt);
>  			return tur_check(c->fd, c->timeout,
>  					 copy_msg_to_checker, c);
>  		}
> @@ -392,8 +370,7 @@ int libcheck_check(struct checker * c)
>  		pthread_mutex_unlock(&ct->lock);
>  		if (uatomic_read(&ct->running) != 0 &&
>  		    (tur_status == PATH_PENDING || tur_status ==
> PATH_UNCHECKED)) {
> -			condlog(3, "%s: tur checker still running",
> -				tur_devt(devt, sizeof(devt), ct));
> +			condlog(3, "%s: tur checker still running", ct-
> >devt);
>  			tur_status = PATH_PENDING;
>  		} else {
>  			int running = uatomic_xchg(&ct->running, 0);

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v3 02/19] libmultipath: fix tur checker double locking
  2018-10-01 20:09   ` Martin Wilck
@ 2018-10-01 20:44     ` Martin Wilck
  2018-10-04 16:47       ` Benjamin Marzinski
  2018-10-04 16:45     ` Benjamin Marzinski
  1 sibling, 1 reply; 58+ messages in thread
From: Martin Wilck @ 2018-10-01 20:44 UTC (permalink / raw)
  Cc: Bart Van Assche, device-mapper development

On Mon, 2018-10-01 at 22:09 +0200, Martin Wilck wrote:
> 
> I like the lock removal a lot, but not so much the conversion into a
> string. Why not keep the dev_t? 
> 
> Or maybe even easier, the other way around: why don't we make it a
> char* and simply set checker->dev_t = &pp->dev_t?

OK, that 2nd one won't work because pp may be destroyed before the
checker terminates. Got it. Still I'd prefer keeping a dev_t in the
checker structure rather than a char[32].

Martin

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v3 03/19] libmultipath: fix tur memory misuse
  2018-09-21 23:05 ` [PATCH v3 03/19] libmultipath: fix tur memory misuse Benjamin Marzinski
@ 2018-10-01 20:59   ` Martin Wilck
  2018-10-02  7:48     ` Martin Wilck
  0 siblings, 1 reply; 58+ messages in thread
From: Martin Wilck @ 2018-10-01 20:59 UTC (permalink / raw)
  To: Benjamin Marzinski, device-mapper development; +Cc: Bart Van Assche

On Fri, 2018-09-21 at 18:05 -0500, Benjamin Marzinski wrote:
> when tur_thread() was calling tur_check(), it was passing ct->message 
> as
> the copy argument, but copy_msg_to_tcc() was assuming that it was
> getting a tur_checker_context pointer. This means it was treating
> ct->message as ct. This is why the tur checker never printed checker
> messages. Intead of simply changing the copy argument passed in, I
> just
> removed all the copying code, since it is completely unnecessary. The
> callers of tur_check() can just pass in a buffer that it is safe to
> write to, and copy it later, if necessary.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
>  libmultipath/checkers/tur.c | 46 +++++++++++----------------------
> ------------
>  1 file changed, 11 insertions(+), 35 deletions(-)

ACK. Bart's valgrind/DRD issues should be fixed by this one, too.

But upon closer inspection, the whole checker message handling seems to
be pointless. Checker messages are only used by the TUR checker, and
they simply duplicate the status code that the checker returns.

We might as well just ditch them...

Martin


> 
> diff --git a/libmultipath/checkers/tur.c
> b/libmultipath/checkers/tur.c
> index d173648..abda162 100644
> --- a/libmultipath/checkers/tur.c
> +++ b/libmultipath/checkers/tur.c
> @@ -103,17 +103,8 @@ void libcheck_free (struct checker * c)
>  	return;
>  }
>  
> -#define TUR_MSG(fmt, args...)					
> \
> -	do {							\
> -		char msg[CHECKER_MSG_LEN];			\
> -								\
> -		snprintf(msg, sizeof(msg), fmt, ##args);	\
> -		copy_message(cb_arg, msg);			\
> -	} while (0)
> -
>  static int
> -tur_check(int fd, unsigned int timeout,
> -	  void (*copy_message)(void *, const char *), void *cb_arg)
> +tur_check(int fd, unsigned int timeout, char *msg)
>  {
>  	struct sg_io_hdr io_hdr;
>  	unsigned char turCmdBlk[TUR_CMD_LEN] = { 0x00, 0, 0, 0, 0, 0 };
> @@ -132,7 +123,7 @@ retry:
>  	io_hdr.timeout = timeout * 1000;
>  	io_hdr.pack_id = 0;
>  	if (ioctl(fd, SG_IO, &io_hdr) < 0) {
> -		TUR_MSG(MSG_TUR_DOWN);
> +		snprintf(msg, CHECKER_MSG_LEN, MSG_TUR_DOWN);
>  		return PATH_DOWN;
>  	}
>  	if ((io_hdr.status & 0x7e) == 0x18) {
> @@ -140,7 +131,7 @@ retry:
>  		 * SCSI-3 arrays might return
>  		 * reservation conflict on TUR
>  		 */
> -		TUR_MSG(MSG_TUR_UP);
> +		snprintf(msg, CHECKER_MSG_LEN, MSG_TUR_UP);
>  		return PATH_UP;
>  	}
>  	if (io_hdr.info & SG_INFO_OK_MASK) {
> @@ -185,14 +176,14 @@ retry:
>  				 * LOGICAL UNIT NOT ACCESSIBLE,
>  				 * TARGET PORT IN STANDBY STATE
>  				 */
> -				TUR_MSG(MSG_TUR_GHOST);
> +				snprintf(msg, CHECKER_MSG_LEN,
> MSG_TUR_GHOST);
>  				return PATH_GHOST;
>  			}
>  		}
> -		TUR_MSG(MSG_TUR_DOWN);
> +		snprintf(msg, CHECKER_MSG_LEN, MSG_TUR_DOWN);
>  		return PATH_DOWN;
>  	}
> -	TUR_MSG(MSG_TUR_UP);
> +	snprintf(msg, CHECKER_MSG_LEN, MSG_TUR_UP);
>  	return PATH_UP;
>  }
>  
> @@ -210,19 +201,11 @@ static void cleanup_func(void *data)
>  	rcu_unregister_thread();
>  }
>  
> -static void copy_msg_to_tcc(void *ct_p, const char *msg)
> -{
> -	struct tur_checker_context *ct = ct_p;
> -
> -	pthread_mutex_lock(&ct->lock);
> -	strlcpy(ct->message, msg, sizeof(ct->message));
> -	pthread_mutex_unlock(&ct->lock);
> -}
> -
>  static void *tur_thread(void *ctx)
>  {
>  	struct tur_checker_context *ct = ctx;
>  	int state, running;
> +	char msg[CHECKER_MSG_LEN];
>  
>  	/* This thread can be canceled, so setup clean up */
>  	tur_thread_cleanup_push(ct);
> @@ -236,12 +219,13 @@ static void *tur_thread(void *ctx)
>  	ct->message[0] = '\0';
>  	pthread_mutex_unlock(&ct->lock);
>  
> -	state = tur_check(ct->fd, ct->timeout, copy_msg_to_tcc, ct-
> >message);
> +	state = tur_check(ct->fd, ct->timeout, msg);
>  	pthread_testcancel();
>  
>  	/* TUR checker done */
>  	pthread_mutex_lock(&ct->lock);
>  	ct->state = state;
> +	strlcpy(ct->message, msg, sizeof(ct->message));
>  	pthread_cond_signal(&ct->active);
>  	pthread_mutex_unlock(&ct->lock);
>  
> @@ -283,13 +267,6 @@ static int tur_check_async_timeout(struct
> checker *c)
>  	return (now.tv_sec > ct->time);
>  }
>  
> -static void copy_msg_to_checker(void *c_p, const char *msg)
> -{
> -	struct checker *c = c_p;
> -
> -	strlcpy(c->message, msg, sizeof(c->message));
> -}
> -
>  int libcheck_check(struct checker * c)
>  {
>  	struct tur_checker_context *ct = c->context;
> @@ -301,7 +278,7 @@ int libcheck_check(struct checker * c)
>  		return PATH_UNCHECKED;
>  
>  	if (c->sync)
> -		return tur_check(c->fd, c->timeout,
> copy_msg_to_checker, c);
> +		return tur_check(c->fd, c->timeout, c->message);
>  
>  	/*
>  	 * Async mode
> @@ -360,8 +337,7 @@ int libcheck_check(struct checker * c)
>  			pthread_mutex_unlock(&ct->lock);
>  			condlog(3, "%s: failed to start tur thread,
> using"
>  				" sync mode", ct->devt);
> -			return tur_check(c->fd, c->timeout,
> -					 copy_msg_to_checker, c);
> +			return tur_check(c->fd, c->timeout, c-
> >message);
>  		}
>  		tur_timeout(&tsp);
>  		r = pthread_cond_timedwait(&ct->active, &ct->lock,
> &tsp);

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v3 04/19] libmultipath: cleanup tur locking
  2018-09-21 23:05 ` [PATCH v3 04/19] libmultipath: cleanup tur locking Benjamin Marzinski
@ 2018-10-01 21:08   ` Martin Wilck
  0 siblings, 0 replies; 58+ messages in thread
From: Martin Wilck @ 2018-10-01 21:08 UTC (permalink / raw)
  To: Benjamin Marzinski, device-mapper development

On Fri, 2018-09-21 at 18:05 -0500, Benjamin Marzinski wrote:
> There are only three variables whose access needs to be synchronized
> between the tur thread and the path checker itself: state, message,
> and
> active.  The rest of the variables are either only written when the
> tur
> thread isn't running, or they aren't accessed by the tur thread, or
> they
> are atomics that are themselves used to synchronize things.
> 
> This patch limits the amount of code that is covered by ct->lock to
> only what needs to be locked. It also makes ct->lock no longer a
> recursive lock. To make this simpler, tur_thread now only sets the
> state and message one time, instead of twice, since PATH_UNCHECKED
> was never able to be returned anyway.
> 
> One benefit of this is that the tur checker thread gets more time to
> call tur_check() and return before libcheck_check() gives up and
> return PATH_PENDING.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>

Reviewed-by: Martin Wilck <mwilck@suse.com>

> ---
>  libmultipath/checkers/tur.c | 49 ++++++++++++++++++-----------------
> ----------
>  1 file changed, 20 insertions(+), 29 deletions(-)
> 
> diff --git a/libmultipath/checkers/tur.c
> b/libmultipath/checkers/tur.c
> index abda162..9f6ef51 100644
> --- a/libmultipath/checkers/tur.c
> +++ b/libmultipath/checkers/tur.c
> @@ -53,7 +53,6 @@ struct tur_checker_context {
>  int libcheck_init (struct checker * c)
>  {
>  	struct tur_checker_context *ct;
> -	pthread_mutexattr_t attr;
>  	struct stat sb;
>  
>  	ct = malloc(sizeof(struct tur_checker_context));
> @@ -65,10 +64,7 @@ int libcheck_init (struct checker * c)
>  	ct->fd = -1;
>  	uatomic_set(&ct->holders, 1);
>  	pthread_cond_init_mono(&ct->active);
> -	pthread_mutexattr_init(&attr);
> -	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
> -	pthread_mutex_init(&ct->lock, &attr);
> -	pthread_mutexattr_destroy(&attr);
> +	pthread_mutex_init(&ct->lock, NULL);
>  	if (fstat(c->fd, &sb) == 0)
>  		snprintf(ct->devt, sizeof(ct->devt), "%d:%d",
> major(sb.st_rdev),
>  			 minor(sb.st_rdev));
> @@ -213,12 +209,6 @@ static void *tur_thread(void *ctx)
>  
>  	condlog(3, "%s: tur checker starting up", ct->devt);
>  
> -	/* TUR checker start up */
> -	pthread_mutex_lock(&ct->lock);
> -	ct->state = PATH_PENDING;
> -	ct->message[0] = '\0';
> -	pthread_mutex_unlock(&ct->lock);
> -
>  	state = tur_check(ct->fd, ct->timeout, msg);
>  	pthread_testcancel();
>  
> @@ -283,13 +273,6 @@ int libcheck_check(struct checker * c)
>  	/*
>  	 * Async mode
>  	 */
> -	r = pthread_mutex_lock(&ct->lock);
> -	if (r != 0) {
> -		condlog(2, "%s: tur mutex lock failed with %d", ct-
> >devt, r);
> -		MSG(c, MSG_TUR_FAILED);
> -		return PATH_WILD;
> -	}
> -
>  	if (ct->thread) {
>  		if (tur_check_async_timeout(c)) {
>  			int running = uatomic_xchg(&ct->running, 0);
> @@ -305,23 +288,29 @@ int libcheck_check(struct checker * c)
>  		} else {
>  			/* TUR checker done */
>  			ct->thread = 0;
> +			pthread_mutex_lock(&ct->lock);
>  			tur_status = ct->state;
>  			strlcpy(c->message, ct->message, sizeof(c-
> >message));
> +			pthread_mutex_unlock(&ct->lock);
>  		}
> -		pthread_mutex_unlock(&ct->lock);
>  	} else {
>  		if (uatomic_read(&ct->holders) > 1) {
>  			/* pthread cancel failed. If it didn't get the
> path
>  			   state already, timeout */
> -			if (ct->state == PATH_PENDING) {
> -				pthread_mutex_unlock(&ct->lock);
> +			pthread_mutex_lock(&ct->lock);
> +			tur_status = ct->state;
> +			pthread_mutex_unlock(&ct->lock);
> +			if (tur_status == PATH_PENDING) {
>  				condlog(3, "%s: tur thread not
> responding",
>  					ct->devt);
>  				return PATH_TIMEOUT;
>  			}
>  		}
>  		/* Start new TUR checker */
> -		ct->state = PATH_UNCHECKED;
> +		pthread_mutex_lock(&ct->lock);
> +		tur_status = ct->state = PATH_PENDING;
> +		ct->message[0] = '\0';
> +		pthread_mutex_unlock(&ct->lock);
>  		ct->fd = c->fd;
>  		ct->timeout = c->timeout;
>  		uatomic_add(&ct->holders, 1);
> @@ -334,20 +323,22 @@ int libcheck_check(struct checker * c)
>  			uatomic_sub(&ct->holders, 1);
>  			uatomic_set(&ct->running, 0);
>  			ct->thread = 0;
> -			pthread_mutex_unlock(&ct->lock);
>  			condlog(3, "%s: failed to start tur thread,
> using"
>  				" sync mode", ct->devt);
>  			return tur_check(c->fd, c->timeout, c-
> >message);
>  		}
>  		tur_timeout(&tsp);
> -		r = pthread_cond_timedwait(&ct->active, &ct->lock,
> &tsp);
> -		tur_status = ct->state;
> -		strlcpy(c->message, ct->message, sizeof(c->message));
> +		pthread_mutex_lock(&ct->lock);
> +		if (ct->state == PATH_PENDING)
> +			r = pthread_cond_timedwait(&ct->active, &ct-
> >lock, 
> +						   &tsp);
> +		if (!r) {
> +			tur_status = ct->state;
> +			strlcpy(c->message, ct->message, sizeof(c-
> >message));
> +		}
>  		pthread_mutex_unlock(&ct->lock);
> -		if (uatomic_read(&ct->running) != 0 &&
> -		    (tur_status == PATH_PENDING || tur_status ==
> PATH_UNCHECKED)) {
> +		if (tur_status == PATH_PENDING) {
>  			condlog(3, "%s: tur checker still running", ct-
> >devt);
> -			tur_status = PATH_PENDING;
>  		} else {
>  			int running = uatomic_xchg(&ct->running, 0);
>  			if (running)

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v3 05/19] libmultipath: fix tur checker timeout issue
  2018-09-21 23:05 ` [PATCH v3 05/19] libmultipath: fix tur checker timeout issue Benjamin Marzinski
@ 2018-10-01 21:09   ` Martin Wilck
  0 siblings, 0 replies; 58+ messages in thread
From: Martin Wilck @ 2018-10-01 21:09 UTC (permalink / raw)
  To: Benjamin Marzinski, device-mapper development

On Fri, 2018-09-21 at 18:05 -0500, Benjamin Marzinski wrote:
> If the tur checker is run, and the tur_thread has timed out,
> libcheck_check() doesn't actually check if the thread is still
> running.
> This means that the thread could have already completed successfully,
> but the tur checker would still return PATH_TIMEOUT, instead of the
> value returned by the thread. This patch makes libcheck_check()
> actually
> check if the thread completed, and if so, it returns the proper
> value.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>

Reviewed-by: Martin Wilck <mwilck@suse.com>

> ---
>  libmultipath/checkers/tur.c | 15 +++++++++++----
>  1 file changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/libmultipath/checkers/tur.c
> b/libmultipath/checkers/tur.c
> index 9f6ef51..4e2c7a8 100644
> --- a/libmultipath/checkers/tur.c
> +++ b/libmultipath/checkers/tur.c
> @@ -276,12 +276,19 @@ int libcheck_check(struct checker * c)
>  	if (ct->thread) {
>  		if (tur_check_async_timeout(c)) {
>  			int running = uatomic_xchg(&ct->running, 0);
> -			if (running)
> +			if (running) {
>  				pthread_cancel(ct->thread);
> -			condlog(3, "%s: tur checker timeout", ct-
> >devt);
> +				condlog(3, "%s: tur checker timeout",
> ct->devt);
> +				MSG(c, MSG_TUR_TIMEOUT);
> +				tur_status = PATH_TIMEOUT;
> +			} else {
> +				pthread_mutex_lock(&ct->lock);
> +				tur_status = ct->state;
> +				strlcpy(c->message, ct->message,
> +					sizeof(c->message));
> +				pthread_mutex_unlock(&ct->lock);
> +			}
>  			ct->thread = 0;
> -			MSG(c, MSG_TUR_TIMEOUT);
> -			tur_status = PATH_TIMEOUT;
>  		} else if (uatomic_read(&ct->running) != 0) {
>  			condlog(3, "%s: tur checker not finished", ct-
> >devt);
>  			tur_status = PATH_PENDING;

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v3 06/19] libmultipath: fix set_int error path
  2018-09-21 23:05 ` [PATCH v3 06/19] libmultipath: fix set_int error path Benjamin Marzinski
@ 2018-10-01 21:17   ` Martin Wilck
  0 siblings, 0 replies; 58+ messages in thread
From: Martin Wilck @ 2018-10-01 21:17 UTC (permalink / raw)
  To: Benjamin Marzinski, device-mapper development

On Fri, 2018-09-21 at 18:05 -0500, Benjamin Marzinski wrote:
> set_int() wasn't checking if the line actually had a value before
> converting it to an integer.  Found by coverity. Also, it should
> be using set_value().
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
>  libmultipath/dict.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/libmultipath/dict.c b/libmultipath/dict.c
> index 32524d5..bf4701e 100644
> --- a/libmultipath/dict.c
> +++ b/libmultipath/dict.c
> @@ -33,7 +33,10 @@ set_int(vector strvec, void *ptr)
>  	int *int_ptr = (int *)ptr;
>  	char * buff;
>  
> -	buff = VECTOR_SLOT(strvec, 1);
> +	buff = set_value(strvec);
> +	if (!buff)
> +		return 1;
> +
>  	*int_ptr = atoi(buff);
>  
>  	return 0;

Well, I believe that validate_config_strvec() would have made sure
that VECTOR_SLOT(strvec, 1) exists and is non-NULL before set_int() is
called via keyword->handler(). Also, if we want to make this more
robust, we might want to use strtol() rather than atoi() to check for
an actual int argument.

Anyway:

Reviewed-by: Martin Wilck <mwilck@suse.com>

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v3 07/19] libmultipath: fix length issues in get_vpd_sgio
  2018-09-21 23:05 ` [PATCH v3 07/19] libmultipath: fix length issues in get_vpd_sgio Benjamin Marzinski
@ 2018-10-01 21:25   ` Martin Wilck
  0 siblings, 0 replies; 58+ messages in thread
From: Martin Wilck @ 2018-10-01 21:25 UTC (permalink / raw)
  To: Benjamin Marzinski, device-mapper development

On Fri, 2018-09-21 at 18:05 -0500, Benjamin Marzinski wrote:
> When get_vpd_sgio() finds out that the vpd info needed to be
> truncated
> to fit in the buffer, it doesn't trucate the size as well,  which
> allows
> it to overwrite the buffer. Also, in once len is set to -ENODATA,
> get_vpd_sgio() should exit, instead of using the negative len in
> memcpy(). Found by coverity.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>

Reviewed-by: Martin Wilck <mwilck@suse.com>


> ---
>  libmultipath/discovery.c | 14 +++++++++-----
>  1 file changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/libmultipath/discovery.c b/libmultipath/discovery.c
> index 0b1855d..3e0db7f 100644
> --- a/libmultipath/discovery.c
> +++ b/libmultipath/discovery.c
> @@ -1116,17 +1116,21 @@ get_vpd_sgio (int fd, int pg, char * str, int
> maxlen)
>  		return -ENODATA;
>  	}
>  	buff_len = get_unaligned_be16(&buff[2]) + 4;
> -	if (buff_len > 4096)
> +	if (buff_len > 4096) {
>  		condlog(3, "vpd pg%02x page truncated", pg);
> -
> +		buff_len = 4096;
> +	}
>  	if (pg == 0x80)
>  		len = parse_vpd_pg80(buff, str, maxlen);
>  	else if (pg == 0x83)
>  		len = parse_vpd_pg83(buff, buff_len, str, maxlen);
>  	else if (pg == 0xc9 && maxlen >= 8) {
> -		len = buff_len < 8 ? -ENODATA :
> -			(buff_len <= maxlen ? buff_len : maxlen);
> -		memcpy (str, buff, len);
> +		if (buff_len < 8)
> +			len = -ENODATA;
> +		else {
> +			len = (buff_len <= maxlen)? buff_len : maxlen;
> +			memcpy (str, buff, len);
> +		}
>  	} else
>  		len = -ENOSYS;
>  

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v3 08/19] libmultipath: _install_keyword cleanup
  2018-09-21 23:05 ` [PATCH v3 08/19] libmultipath: _install_keyword cleanup Benjamin Marzinski
@ 2018-10-01 21:26   ` Martin Wilck
  0 siblings, 0 replies; 58+ messages in thread
From: Martin Wilck @ 2018-10-01 21:26 UTC (permalink / raw)
  To: Benjamin Marzinski, device-mapper development

On Fri, 2018-09-21 at 18:05 -0500, Benjamin Marzinski wrote:
> _install_keyword should use VECTOR_LAST_SLOT(), which has better
> error
> checking. It should also fail if it gets a NULL pointer, instead of
> dereferencing it. Found by coverity.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>

Reviewed-by: Martin Wilck <mwilck@suse.com>


> ---
>  libmultipath/parser.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/libmultipath/parser.c b/libmultipath/parser.c
> index b8b7e0d..92ef7cf 100644
> --- a/libmultipath/parser.c
> +++ b/libmultipath/parser.c
> @@ -79,12 +79,16 @@ _install_keyword(vector keywords, char *string,
>  	struct keyword *keyword;
>  
>  	/* fetch last keyword */
> -	keyword = VECTOR_SLOT(keywords, VECTOR_SIZE(keywords) - 1);
> +	keyword = VECTOR_LAST_SLOT(keywords);
> +	if (!keyword)
> +		return 1;
>  
>  	/* position to last sub level */
> -	for (i = 0; i < sublevel; i++)
> -		keyword =
> -		    VECTOR_SLOT(keyword->sub, VECTOR_SIZE(keyword->sub) 
> - 1);
> +	for (i = 0; i < sublevel; i++) {
> +		keyword = VECTOR_LAST_SLOT(keyword->sub);
> +		if (!keyword)
> +			return 1;
> +	}
>  
>  	/* First sub level allocation */
>  	if (!keyword->sub)

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v3 09/19] libmultipath: remove unused code
  2018-09-21 23:05 ` [PATCH v3 09/19] libmultipath: remove unused code Benjamin Marzinski
@ 2018-10-01 21:28   ` Martin Wilck
  0 siblings, 0 replies; 58+ messages in thread
From: Martin Wilck @ 2018-10-01 21:28 UTC (permalink / raw)
  To: Benjamin Marzinski, device-mapper development

On Fri, 2018-09-21 at 18:05 -0500, Benjamin Marzinski wrote:
> since vector_foreach_slot() already checks if the entry is NULL,
> there's
> no point in checking it in the loop, since it can't be NULL there.
> Found
> by coverity.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>

Let's hope nobody ever removes that NULL check from
vector_foreach_slot()...

Reviewed-by: Martin Wilck <mwilck@suse.com>


> ---
>  libmultipath/print.c | 8 --------
>  1 file changed, 8 deletions(-)
> 
> diff --git a/libmultipath/print.c b/libmultipath/print.c
> index 9da6a77..7b610b9 100644
> --- a/libmultipath/print.c
> +++ b/libmultipath/print.c
> @@ -275,8 +275,6 @@ snprint_multipath_vpr (char * buff, size_t len,
> const struct multipath * mpp)
>  	int i, j;
>  
>  	vector_foreach_slot(mpp->pg, pgp, i) {
> -		if (!pgp)
> -			continue;
>  		vector_foreach_slot(pgp->paths, pp, j) {
>  			if (strlen(pp->vendor_id) && strlen(pp-
> >product_id))
>  				return snprintf(buff, len, "%s,%s",
> @@ -295,8 +293,6 @@ snprint_multipath_vend (char * buff, size_t len,
> const struct multipath * mpp)
>  	int i, j;
>  
>  	vector_foreach_slot(mpp->pg, pgp, i) {
> -		if (!pgp)
> -			continue;
>  		vector_foreach_slot(pgp->paths, pp, j) {
>  			if (strlen(pp->vendor_id))
>  				return snprintf(buff, len, "%s", pp-
> >vendor_id);
> @@ -313,8 +309,6 @@ snprint_multipath_prod (char * buff, size_t len,
> const struct multipath * mpp)
>  	int i, j;
>  
>  	vector_foreach_slot(mpp->pg, pgp, i) {
> -		if (!pgp)
> -			continue;
>  		vector_foreach_slot(pgp->paths, pp, j) {
>  			if (strlen(pp->product_id))
>  				return snprintf(buff, len, "%s", pp-
> >product_id);
> @@ -331,8 +325,6 @@ snprint_multipath_rev (char * buff, size_t len,
> const struct multipath * mpp)
>  	int i, j;
>  
>  	vector_foreach_slot(mpp->pg, pgp, i) {
> -		if (!pgp)
> -			continue;
>  		vector_foreach_slot(pgp->paths, pp, j) {
>  			if (strlen(pp->rev))
>  				return snprintf(buff, len, "%s", pp-
> >rev);

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v3 10/19] libmultipath: fix memory issue in path_latency prio
  2018-09-21 23:05 ` [PATCH v3 10/19] libmultipath: fix memory issue in path_latency prio Benjamin Marzinski
@ 2018-10-01 21:30   ` Martin Wilck
  0 siblings, 0 replies; 58+ messages in thread
From: Martin Wilck @ 2018-10-01 21:30 UTC (permalink / raw)
  To: Benjamin Marzinski, device-mapper development

On Fri, 2018-09-21 at 18:05 -0500, Benjamin Marzinski wrote:
> The path_latency prioriziter was assuming that
> prepare_directio_read()
> always succeeds. However, it doesn't, and when it fails, the
> prioritizer
> used buf without it pointing to alloced memory. Found by coverity.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>

Reviewed-by: Martin Wilck <mwilck@suse.com>

> ---
>  libmultipath/prioritizers/path_latency.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/libmultipath/prioritizers/path_latency.c
> b/libmultipath/prioritizers/path_latency.c
> index 765265c..eeee01e 100644
> --- a/libmultipath/prioritizers/path_latency.c
> +++ b/libmultipath/prioritizers/path_latency.c
> @@ -237,7 +237,8 @@ int getprio(struct path *pp, char *args, unsigned
> int timeout)
>  	lg_maxavglatency = log(MAX_AVG_LATENCY) / lg_base;
>  	lg_minavglatency = log(MIN_AVG_LATENCY) / lg_base;
>  
> -	prepare_directio_read(pp->fd, &blksize, &buf, &restore_flags);
> +	if (prepare_directio_read(pp->fd, &blksize, &buf,
> &restore_flags) < 0)
> +		return PRIO_UNDEF;
>  
>  	temp = io_num;
>  	while (temp-- > 0) {

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v3 12/19] libmutipath: don't use malformed uevents
  2018-09-21 23:05 ` [PATCH v3 12/19] libmutipath: don't use malformed uevents Benjamin Marzinski
@ 2018-10-01 21:31   ` Martin Wilck
  0 siblings, 0 replies; 58+ messages in thread
From: Martin Wilck @ 2018-10-01 21:31 UTC (permalink / raw)
  To: Benjamin Marzinski, device-mapper development

On Fri, 2018-09-21 at 18:05 -0500, Benjamin Marzinski wrote:
> A uevent that doesn't include the ACTION and DEVPATH fields is
> malformed. It should be ignored, instead of used with those fields
> being
> NULL.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>

Reviewed-by: Martin Wilck <mwilck@suse.com>

> ---
>  libmultipath/uevent.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/libmultipath/uevent.c b/libmultipath/uevent.c
> index fd8ca35..5f910e6 100644
> --- a/libmultipath/uevent.c
> +++ b/libmultipath/uevent.c
> @@ -729,6 +729,12 @@ struct uevent *uevent_from_udev_device(struct
> udev_device *dev)
>  		if (i == HOTPLUG_NUM_ENVP - 1)
>  			break;
>  	}
> +	if (!uev->devpath || ! uev->action) {
> +		udev_device_unref(dev);
> +		condlog(1, "uevent missing necessary fields");
> +		FREE(uev);
> +		return NULL;
> +	}
>  	uev->udev = dev;
>  	uev->envp[i] = NULL;
>  

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v3 11/19] libmultipath: fix null dereference int alloc_path_group
  2018-09-21 23:05 ` [PATCH v3 11/19] libmultipath: fix null dereference int alloc_path_group Benjamin Marzinski
@ 2018-10-01 21:33   ` Martin Wilck
  0 siblings, 0 replies; 58+ messages in thread
From: Martin Wilck @ 2018-10-01 21:33 UTC (permalink / raw)
  To: Benjamin Marzinski, device-mapper development

On Fri, 2018-09-21 at 18:05 -0500, Benjamin Marzinski wrote:
> If all_pathgroup failed to allocate a vector for pgp->paths, instead
> of
> failing after it freed pgp, it would set pgp to NULL and then
> dereference it. This patch fixes that. Found by coverity.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>

Reviewed-by: Martin Wilck <mwilck@suse.com>

> ---
>  libmultipath/structs.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/libmultipath/structs.c b/libmultipath/structs.c
> index ae847d6..caa178a 100644
> --- a/libmultipath/structs.c
> +++ b/libmultipath/structs.c
> @@ -165,7 +165,7 @@ alloc_pathgroup (void)
>  
>  	if (!pgp->paths) {
>  		FREE(pgp);
> -		pgp = NULL;
> +		return NULL;
>  	}
>  
>  	dm_pathgroup_to_gen(pgp)->ops = &dm_gen_pathgroup_ops;

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v3 13/19] multipath: fix max array size in print_cmd_valid
  2018-09-21 23:05 ` [PATCH v3 13/19] multipath: fix max array size in print_cmd_valid Benjamin Marzinski
@ 2018-10-01 21:35   ` Martin Wilck
  0 siblings, 0 replies; 58+ messages in thread
From: Martin Wilck @ 2018-10-01 21:35 UTC (permalink / raw)
  To: Benjamin Marzinski, device-mapper development

On Fri, 2018-09-21 at 18:05 -0500, Benjamin Marzinski wrote:
> The code is attempting to verify that 0 <= k < 3
> However, sizeof(val) is 12, assuming 4 byte integers. The check needs
> to
> take integer size into account. Found by coverity.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>

Reviewed-by: Martin Wilck <mwilck@suse.com>

> ---
>  multipath/main.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/multipath/main.c b/multipath/main.c
> index fc5bf16..d5aad95 100644
> --- a/multipath/main.c
> +++ b/multipath/main.c
> @@ -482,7 +482,7 @@ static int print_cmd_valid(int k, const vector
> pathvec,
>  	struct timespec until;
>  	struct path *pp;
>  
> -	if (k < 0 || k >= sizeof(vals))
> +	if (k < 0 || k >= (sizeof(vals) / sizeof(int)))
>  		return 1;
>  
>  	if (k == 2) {

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v3 14/19] multipathd: function return value tweaks
  2018-09-21 23:05 ` [PATCH v3 14/19] multipathd: function return value tweaks Benjamin Marzinski
@ 2018-10-01 21:37   ` Martin Wilck
  0 siblings, 0 replies; 58+ messages in thread
From: Martin Wilck @ 2018-10-01 21:37 UTC (permalink / raw)
  To: Benjamin Marzinski, device-mapper development

On Fri, 2018-09-21 at 18:05 -0500, Benjamin Marzinski wrote:
> In cli_add_map() the return value of get_refwwid is never used, and
> refwwid is checked to see if the function returned successfully, so
> the
> return value doesn't need to be saved.
> 
> In resize_map, if setup_map fails, multipathd shouldn't attempt to
> create the device with resulting params string. It should just fail
> instead. Found by coverity.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>

Reviewed-by: Martin Wilck <mwilck@suse.com>

> ---
>  multipathd/cli_handlers.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/multipathd/cli_handlers.c b/multipathd/cli_handlers.c
> index 5682b5c..bb16472 100644
> --- a/multipathd/cli_handlers.c
> +++ b/multipathd/cli_handlers.c
> @@ -796,8 +796,8 @@ cli_add_map (void * v, char ** reply, int * len,
> void * data)
>  		if (!alias && !count) {
>  			condlog(2, "%s: mapname not found for %d:%d",
>  				param, major, minor);
> -			rc = get_refwwid(CMD_NONE, param, DEV_DEVMAP,
> -					 vecs->pathvec, &refwwid);
> +			get_refwwid(CMD_NONE, param, DEV_DEVMAP,
> +				    vecs->pathvec, &refwwid);
>  			if (refwwid) {
>  				if (coalesce_paths(vecs, NULL, refwwid,
>  						   FORCE_RELOAD_NONE,
> CMD_NONE))
> @@ -881,7 +881,12 @@ int resize_map(struct multipath *mpp, unsigned
> long long size,
>  
>  	mpp->size = size;
>  	update_mpp_paths(mpp, vecs->pathvec);
> -	setup_map(mpp, params, PARAMS_SIZE, vecs);
> +	if (setup_map(mpp, params, PARAMS_SIZE, vecs) != 0) {
> +		condlog(0, "%s: failed to setup map for resize : %s",
> +			mpp->alias, strerror(errno));
> +		mpp->size = orig_size;
> +		return 1;
> +	}
>  	mpp->action = ACT_RESIZE;
>  	mpp->force_udev_reload = 1;
>  	if (domap(mpp, params, 1) <= 0) {

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v3 15/19] multipathd: minor fixes
  2018-09-21 23:05 ` [PATCH v3 15/19] multipathd: minor fixes Benjamin Marzinski
@ 2018-10-01 21:38   ` Martin Wilck
  0 siblings, 0 replies; 58+ messages in thread
From: Martin Wilck @ 2018-10-01 21:38 UTC (permalink / raw)
  To: Benjamin Marzinski, device-mapper development

On Fri, 2018-09-21 at 18:05 -0500, Benjamin Marzinski wrote:
> In update_multipath(), conf is set again in a couple of lines, and
> nothing uses it before then, so there's no point in setting it twice.
> Also, in ev_remove_path(), strncpy() could end up unterminated, so
> use strlcpy() instead.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>

Reviewed-by: Martin Wilck <mwilck@suse.com>

> ---
>  multipathd/main.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/multipathd/main.c b/multipathd/main.c
> index cc493c1..125a805 100644
> --- a/multipathd/main.c
> +++ b/multipathd/main.c
> @@ -429,7 +429,7 @@ int update_multipath (struct vectors *vecs, char
> *mapname, int reset)
>  				continue;
>  
>  			if (pp->state != PATH_DOWN) {
> -				struct config *conf =
> get_multipath_config();
> +				struct config *conf;
>  				int oldstate = pp->state;
>  				int checkint;
>  
> @@ -1096,7 +1096,7 @@ ev_remove_path (struct path *pp, struct vectors
> * vecs, int need_do_map)
>  			/*
>  			 * flush_map will fail if the device is open
>  			 */
> -			strncpy(alias, mpp->alias, WWID_SIZE);
> +			strlcpy(alias, mpp->alias, WWID_SIZE);
>  			if (mpp->flush_on_last_del == FLUSH_ENABLED) {
>  				condlog(2, "%s Last path deleted,
> disabling queueing", mpp->alias);
>  				mpp->retry_tick = 0;

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v3 16/19] multipathd: remove useless check and fix format
  2018-09-21 23:05 ` [PATCH v3 16/19] multipathd: remove useless check and fix format Benjamin Marzinski
@ 2018-10-01 21:40   ` Martin Wilck
  0 siblings, 0 replies; 58+ messages in thread
From: Martin Wilck @ 2018-10-01 21:40 UTC (permalink / raw)
  To: Benjamin Marzinski, device-mapper development

On Fri, 2018-09-21 at 18:05 -0500, Benjamin Marzinski wrote:
> The only thing this patch changes is to remove the check for pp->mpp
> before the check for pp->mpp->prflags, since check_path() already
> verified that pp->mpp != NULL. This fixes a number of coverity
> warnings.
> Also, I normalized the spacing and indenting of the nearby code.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>

Reviewed-by: Martin Wilck <mwilck@suse.com>

> ---
>  multipathd/main.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/multipathd/main.c b/multipathd/main.c
> index 125a805..3c2fe7b 100644
> --- a/multipathd/main.c
> +++ b/multipathd/main.c
> @@ -1978,14 +1978,14 @@ check_path (struct vectors * vecs, struct
> path * pp, int ticks)
>  			return 1;
>  		}
>  
> -		if(newstate == PATH_UP || newstate == PATH_GHOST){
> -			if ( pp->mpp && pp->mpp->prflag ){
> +		if (newstate == PATH_UP || newstate == PATH_GHOST) {
> +			if (pp->mpp->prflag) {
>  				/*
>  				 * Check Persistent Reservation.
>  				 */
> -			condlog(2, "%s: checking persistent reservation
> "
> -				"registration", pp->dev);
> -			mpath_pr_event_handle(pp);
> +				condlog(2, "%s: checking persistent "
> +					"reservation registration", pp-
> >dev);
> +				mpath_pr_event_handle(pp);
>  			}
>  		}
>  

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v3 17/19] multipathd: fix memory leak on error in configure
  2018-09-21 23:05 ` [PATCH v3 17/19] multipathd: fix memory leak on error in configure Benjamin Marzinski
@ 2018-10-01 21:42   ` Martin Wilck
  0 siblings, 0 replies; 58+ messages in thread
From: Martin Wilck @ 2018-10-01 21:42 UTC (permalink / raw)
  To: Benjamin Marzinski, device-mapper development

On Fri, 2018-09-21 at 18:05 -0500, Benjamin Marzinski wrote:
> If configure fails after allocing mpvec, it must free it. Found by
> coverity.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>

Reviewed-by: Martin Wilck <mwilck@suse.com>

> ---
>  multipathd/main.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/multipathd/main.c b/multipathd/main.c
> index 3c2fe7b..61ca455 100644
> --- a/multipathd/main.c
> +++ b/multipathd/main.c
> @@ -2277,7 +2277,7 @@ configure (struct vectors * vecs)
>  	ret = path_discovery(vecs->pathvec, DI_ALL);
>  	if (ret < 0) {
>  		condlog(0, "configure failed at path discovery");
> -		return 1;
> +		goto fail;
>  	}
>  
>  	vector_foreach_slot (vecs->pathvec, pp, i){
> @@ -2294,7 +2294,7 @@ configure (struct vectors * vecs)
>  	}
>  	if (map_discovery(vecs)) {
>  		condlog(0, "configure failed at map discovery");
> -		return 1;
> +		goto fail;
>  	}
>  
>  	/*
> @@ -2308,7 +2308,7 @@ configure (struct vectors * vecs)
>  		force_reload = FORCE_RELOAD_YES;
>  	if (ret) {
>  		condlog(0, "configure failed while coalescing paths");
> -		return 1;
> +		goto fail;
>  	}
>  
>  	/*
> @@ -2317,7 +2317,7 @@ configure (struct vectors * vecs)
>  	 */
>  	if (coalesce_maps(vecs, mpvec)) {
>  		condlog(0, "configure failed while coalescing maps");
> -		return 1;
> +		goto fail;
>  	}
>  
>  	dm_lib_release();
> @@ -2353,6 +2353,10 @@ configure (struct vectors * vecs)
>  			i--;
>  	}
>  	return 0;
> +
> +fail:
> +	vector_free(mpvec);
> +	return 1;
>  }
>  
>  int

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v3 18/19] libmultipath: Don't blank intialized paths
  2018-09-21 23:05 ` [PATCH v3 18/19] libmultipath: Don't blank intialized paths Benjamin Marzinski
@ 2018-10-01 22:00   ` Martin Wilck
  2018-10-02 22:37     ` Martin Wilck
  2018-10-08  9:35   ` Martin Wilck
  1 sibling, 1 reply; 58+ messages in thread
From: Martin Wilck @ 2018-10-01 22:00 UTC (permalink / raw)
  To: Benjamin Marzinski, device-mapper development

On Fri, 2018-09-21 at 18:05 -0500, Benjamin Marzinski wrote:
> When pathinfo fails for some likely transient reason, it clears the
> path
> wwid, but otherwise returns successfully, to keep the path around but
> not usable until it gets fully initialized. However, if the path has
> already been initialized, and pathinfo hits a transient error, it
> shouldn't clear the wwid.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
>  libmultipath/discovery.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/libmultipath/discovery.c b/libmultipath/discovery.c
> index 3e0db7f..33815dc 100644
> --- a/libmultipath/discovery.c
> +++ b/libmultipath/discovery.c
> @@ -1991,9 +1991,9 @@ blank:
>  	/*
>  	 * Recoverable error, for example faulty or offline path
>  	 */
> -	memset(pp->wwid, 0, WWID_SIZE);
>  	pp->chkrstate = pp->state = PATH_DOWN;
> -	pp->initialized = INIT_FAILED;
> +	if (pp->initialized == INIT_FAILED)
> +		memset(pp->wwid, 0, WWID_SIZE);
>  
>  	return PATHINFO_OK;
>  }

I am uncertain about this one. The old code sets pp->initialized to
INIT_FAILED. If the state had been INIT_MISSING_UDEV or
INIT_REQUESTED_UDEV before, this patch might change how the code
behaves later in check_path(), where these conditions are checked.

Likewise, tests for strlen(pp->wwid) are used in various places around
the code. These tests would now yield different results for paths in
"recoverable error" state.

Have you considered these possible side effects?

Martin

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v3 19/19] libmultipath: Fixup updating paths
  2018-09-21 23:05 ` [PATCH v3 19/19] libmultipath: Fixup updating paths Benjamin Marzinski
@ 2018-10-01 22:30   ` Martin Wilck
  2018-10-05 20:32     ` Benjamin Marzinski
  0 siblings, 1 reply; 58+ messages in thread
From: Martin Wilck @ 2018-10-01 22:30 UTC (permalink / raw)
  To: Benjamin Marzinski, device-mapper development

On Fri, 2018-09-21 at 18:05 -0500, Benjamin Marzinski wrote:
> Commit 582c56cc broke some code paths in uev_update_path. First, it
> changed the handling of paths that were not fully initialized.
> uev_update_path was simply setting the wwids for all of these paths.
> Instead, it should ignore the ones that had not requested a new
> uevent.
> These paths are likely down, and are already getting handled by
> check_path, after it verifies that they have become active. Also,
> setting the wwid doesn't update all of the other information that
> may have been missed when the path was initially added.
> 
> Also, it wasn't possible for pp->wwid_changed to transition back to
> zero, unless the path's wwid was empty, in which case there was no
> reason to worry about the wwid change in the first place, since the
> path
> hadn't been fully initialized yet. So, even if a path's wwid changed
> and
> then changed back to the original value, the path still could not be
> used.
> 
> This patch fixes these issues, and also moves the check for paths
> that
> have requested a new uevent up in the functions. These paths will get
> fully reinitialized anyway, so there is no reason to do all the other
> work first.
> 
> Fixes: 582c56cc ("libmultipath: uev_update_path: always warn if WWID
> changed")
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>

Thanks for fixing this. Looking at the cleaned-up logic after this
patch, I believe that we should hard-code "disable_changed_wwids" to 1.
It's crazy to realize that a path WWID has changed and simply go on as
if nothing had happened. We shouldn't allow this configuration any
more.

Reviewed-by: Martin Wilck <mwilck@suse.com>

> ---
>  multipathd/main.c | 24 ++++++++++++++----------
>  1 file changed, 14 insertions(+), 10 deletions(-)
> 
> diff --git a/multipathd/main.c b/multipathd/main.c
> index 61ca455..d6d122a 100644
> --- a/multipathd/main.c
> +++ b/multipathd/main.c
> @@ -1207,6 +1207,15 @@ uev_update_path (struct uevent *uev, struct
> vectors * vecs)
>  		struct multipath *mpp = pp->mpp;
>  		char wwid[WWID_SIZE];
>  
> +		if (pp->initialized == INIT_REQUESTED_UDEV) {
> +			needs_reinit = 1;
> +			goto out;
> +		}
> +		/* Don't deal with other types of failed initialization
> +		 * now. check_path will handle it */
> +		if (!strlen(pp->wwid))
> +			goto out;
> +
>  		strcpy(wwid, pp->wwid);
>  		get_uid(pp, pp->state, uev->udev);
>  
> @@ -1215,9 +1224,8 @@ uev_update_path (struct uevent *uev, struct
> vectors * vecs)
>  				uev->kernel, wwid, pp->wwid,
>  				(disable_changed_wwids ? "disallowing"
> :
>  				 "continuing"));
> -			if (disable_changed_wwids &&
> -			    (strlen(wwid) || pp->wwid_changed)) {
> -				strcpy(pp->wwid, wwid);
> +			strcpy(pp->wwid, wwid);
> +			if (disable_changed_wwids) {
>  				if (!pp->wwid_changed) {
>  					pp->wwid_changed = 1;
>  					pp->tick = 1;
> @@ -1225,11 +1233,9 @@ uev_update_path (struct uevent *uev, struct
> vectors * vecs)
>  						dm_fail_path(pp->mpp-
> >alias, pp->dev_t);
>  				}
>  				goto out;
> -			} else if (!disable_changed_wwids)
> -				strcpy(pp->wwid, wwid);
> -			else
> -				pp->wwid_changed = 0;
> +			}
>  		} else {
> +			pp->wwid_changed = 0;
>  			udev_device_unref(pp->udev);
>  			pp->udev = udev_device_ref(uev->udev);
>  			conf = get_multipath_config();
> @@ -1240,9 +1246,7 @@ uev_update_path (struct uevent *uev, struct
> vectors * vecs)
>  			pthread_cleanup_pop(1);
>  		}
>  
> -		if (pp->initialized == INIT_REQUESTED_UDEV)
> -			needs_reinit = 1;
> -		else if (mpp && ro >= 0) {
> +		if (mpp && ro >= 0) {
>  			condlog(2, "%s: update path write_protect to
> '%d' (uevent)", uev->kernel, ro);
>  
>  			if (mpp->wait_for_udev)

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v3 03/19] libmultipath: fix tur memory misuse
  2018-10-01 20:59   ` Martin Wilck
@ 2018-10-02  7:48     ` Martin Wilck
  0 siblings, 0 replies; 58+ messages in thread
From: Martin Wilck @ 2018-10-02  7:48 UTC (permalink / raw)
  To: Benjamin Marzinski, device-mapper development; +Cc: Bart Van Assche

On Mon, 2018-10-01 at 22:59 +0200, Martin Wilck wrote:
> 
> But upon closer inspection, the whole checker message handling seems
> to
> be pointless. Checker messages are only used by the TUR checker, and
> they simply duplicate the status code that the checker returns.
> 
> We might as well just ditch them...

Forget it. I was fooled by the fact that all checkers except TUR use
MSG() to set the message. Sorry for the noise.

Martin

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v3 18/19] libmultipath: Don't blank intialized paths
  2018-10-01 22:00   ` Martin Wilck
@ 2018-10-02 22:37     ` Martin Wilck
  2018-10-05 19:38       ` Benjamin Marzinski
  0 siblings, 1 reply; 58+ messages in thread
From: Martin Wilck @ 2018-10-02 22:37 UTC (permalink / raw)
  To: Benjamin Marzinski, device-mapper development

Hi Ben,

On Tue, 2018-10-02 at 00:00 +0200, Martin Wilck wrote:
> On Fri, 2018-09-21 at 18:05 -0500, Benjamin Marzinski wrote:
> > When pathinfo fails for some likely transient reason, it clears the
> > path
> > wwid, but otherwise returns successfully, to keep the path around
> > but
> > not usable until it gets fully initialized. However, if the path
> > has
> > already been initialized, and pathinfo hits a transient error, it
> > shouldn't clear the wwid.
> > 
> > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > ---
> >  libmultipath/discovery.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/libmultipath/discovery.c b/libmultipath/discovery.c
> > index 3e0db7f..33815dc 100644
> > --- a/libmultipath/discovery.c
> > +++ b/libmultipath/discovery.c
> > @@ -1991,9 +1991,9 @@ blank:
> >  	/*
> >  	 * Recoverable error, for example faulty or offline path
> >  	 */
> > -	memset(pp->wwid, 0, WWID_SIZE);
> >  	pp->chkrstate = pp->state = PATH_DOWN;
> > -	pp->initialized = INIT_FAILED;
> > +	if (pp->initialized == INIT_FAILED)
> > +		memset(pp->wwid, 0, WWID_SIZE);
> >  
> >  	return PATHINFO_OK;
> >  }
> 
> I am uncertain about this one. The old code sets pp->initialized to
> INIT_FAILED. If the state had been INIT_MISSING_UDEV or
> INIT_REQUESTED_UDEV before, this patch might change how the code
> behaves later in check_path(), where these conditions are checked.
> 
> Likewise, tests for strlen(pp->wwid) are used in various places
> around
> the code. These tests would now yield different results for paths in
> "recoverable error" state.
> 
> Have you considered these possible side effects?

I've pondered over this a lot. The dust is clearing up a bit.

1. With your patch in place, INIT_FAILED is never set except in
alloc_path() (we might rename it to INIT_NEW or the like, but see
below).

2. I don't understand how you handle repeated failure to retrieve the
WWID. I see that get_uid() (actually, scsi_uid_fallback()) would
retrieve the WWID from sysfs after retriggers are exhausted. But I
don't see how pathinfo(DI_WWID) would ever be called in this situation:

In the last invocation, pathinfo() had failed to retrieve the WWID and
set pp->initialized = INIT_MISSING_UDEV. There it will remain because
check_path() won't set it to INIT_REQUESTED_UDEV any more after retries
are exhausted. And now, check_path() won't call pathinfo(DI_ALL) any
more from the "add missing path" code, because of the (pp->initialized
!= INIT_MISSING_UDEV) condition.

Am I overlooking something?

3. If "blank" state means that important device information couldn't be
retrieved because of presumably transient failure conditions, we should
retry to retrieve this information by calling pathinfo again later. But
unless the WWID is (reset to) the empty string, check_path() won't call
pathinfo(DI_ALL) any more.

4. The "blank" logic in pathinfo() combines several very different
cases.
  a) PATH_REMOVED status from path_offline(). This means that
elementary sysfs attributes were missing. This is almost the same as
failure in sysfs_pathinfo(), which results in PATHINFO_FAILED return
status; but for PATH_REMOVED we return PATHINFO_OK and keep the path
around.
  b) Failure in checker_check(). If the path is offline in the first
place, the checker isn't called, and WWID determination is attempted.
But if the checker returns PATH_UNCHECKED or PATH_WILD, we goto "blank"
state.
  c) Failure in scsi_ioctl_pathinfo() or cciss_ioctl_pathinfo(). Both
functions never fail, so this can't happen. I've patches here to fix
that.  
  d) Failure to open pp->fd. 

d) is the only case in which the "blank" logic makes really sense to
me. It can happen only at the first pathinfo() invocation, meaning 
pp->wwid is still empty, and pp->initialized is INIT_FAILED. Your patch
would change nothing for this case.

a) and b) can happen for paths that have been initialized already. I
think in case a) the WWID should be reset, probably initialized should
be set to INIT_FAILED, and PATHINFO_FAILED should be returned. In case
b) we should IMO proceed normally rather than goto "blank". Resetting
the WWID in case b) is nonsense, agreed.

Altogether, if my analysis is correct, your patch (not blanking the
WWID) should be applied to case b) only.

Please comment - I still feel a bit confused and may have overlooked
something essential.

Regards
Martin

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v3 01/19] libmultipath: fix tur checker timeout
  2018-10-01 19:51   ` Martin Wilck
@ 2018-10-04 16:31     ` Benjamin Marzinski
  2018-10-05 10:11       ` Martin Wilck
  0 siblings, 1 reply; 58+ messages in thread
From: Benjamin Marzinski @ 2018-10-04 16:31 UTC (permalink / raw)
  To: Martin Wilck; +Cc: device-mapper development

On Mon, Oct 01, 2018 at 09:51:22PM +0200, Martin Wilck wrote:
> On Fri, 2018-09-21 at 18:05 -0500, Benjamin Marzinski wrote:
> > The code previously was timing out mode if ct->thread was 0 but
> > ct->running wasn't. This combination never happens.  The idea was to
> > timeout if for some reason the path checker tried to kill the thread,
> > but it didn't die.  The correct thing to check for this is ct-
> > >holders.
> > ct->holders will always be at least one when libcheck_check() is
> > called,
> > since libcheck_free() won't get called until the device is no longer
> > being checked. So, if ct->holders is 2, that means that the tur
> > thread
> > is has not shut down yet.
> > 
> > Also, instead of returning PATH_TIMEOUT whenever the thread hasn't
> > died,
> > it should only time out if the thread didn't successfully get a
> > value,
> > which means the previous state was already PATH_TIMEOUT.
> 
> What about PATH_UNCHECKED?
> 

I don't see how that could happen. In this state we've definitely set
ct->state to PATH_PENDING when we started the thread. The thread will
only change  ct->state to the return of tur_check(), which doesn't
ever return PATH_UNCHECKED. Am I missing something here?

> > 
> > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > ---
> >  libmultipath/checkers/tur.c | 15 +++++++++------
> >  1 file changed, 9 insertions(+), 6 deletions(-)
> > 
> > diff --git a/libmultipath/checkers/tur.c
> > b/libmultipath/checkers/tur.c
> > index bf8486d..275541f 100644
> > --- a/libmultipath/checkers/tur.c
> > +++ b/libmultipath/checkers/tur.c
> > @@ -355,12 +355,15 @@ int libcheck_check(struct checker * c)
> >  		}
> >  		pthread_mutex_unlock(&ct->lock);
> >  	} else {
> > -		if (uatomic_read(&ct->running) != 0) {
> > -			/* pthread cancel failed. continue in sync mode
> > */
> > -			pthread_mutex_unlock(&ct->lock);
> > -			condlog(3, "%s: tur thread not responding",
> > -				tur_devt(devt, sizeof(devt), ct));
> > -			return PATH_TIMEOUT;
> > +		if (uatomic_read(&ct->holders) > 1) {
> > +			/* pthread cancel failed. If it didn't get the
> > path
> > +			   state already, timeout */
> > +			if (ct->state == PATH_PENDING) {
> > +				pthread_mutex_unlock(&ct->lock);
> > +				condlog(3, "%s: tur thread not
> > responding",
> > +					tur_devt(devt, sizeof(devt),
> > ct));
> > +				return PATH_TIMEOUT;
> > +			}
> >  		}
> 
> I'm not quite getting it yet. We're entering this code path if 
> ct->thread is 0. As you point out, if there are >1 holders, a
> previously cancelled TUR thread must be still "alive". But now we want
> to check *again*. The previous code refused to do that - a single
> hanging TUR thread could block further checks from happening, forever.
> The PATH_TIMEOUT return code was actually questionable - in this
> libcheck_check() invocation, we didn't even try to check, so nothing
> could actually time out (but that's obviously independent of your
> patch).
> 
> With your patch, if ct->state != PATH_PENDING, we'd try to start
> another TUR thread although there's still a hanging one. That's not
> necessarily wrong, but I fail to see why it depends on ct->state.
> Anyway, if we do this, we take the risk of starting more and more TUR
> threads which might all end up hanging; I wonder if we should stop
> creating more threads if ct->holders grows further (e.g. > 5).

It's been a while, and I'm not exactly sure what I was thinking here.
But IIRC the idea was that if the state isn't set yet, then the old
thread could mess with the results of a future thread. Also, it was to
avoid a corner case where the tur checker was called, started the
thread, got the result before the checker exitted, cancelled the thread
and returned the result and then was called very shortly afterwards,
before the thread could clean itself up.  In that case, the right thing
to do (I thought) would be to start a new thread, because the other one
would be ending soon.  In truth, starting a new thread at all is
probably a bad idea, since the old thread will still mess with the
checker context on exit. 

A better idea might be to simply fail back to a syncronous call to
tur_check() when you notice a cancelled thread that hasn't exitted.
That can cause all the other checkers to get delayed, but at least in
that case you are still checking paths. The other option is to do as
before and just not check this path, which won't effect the other
checkers. It seems very unlikely that the thread could remain
uncancelled forever, especially if the path was usable.

thoughts?

-Ben

> Regards,
> Martin
> 
> 
> >  		/* Start new TUR checker */
> >  		ct->state = PATH_UNCHECKED;
> 
> -- 
> Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
> HRB 21284 (AG Nürnberg)
> 

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v3 02/19] libmultipath: fix tur checker double locking
  2018-10-01 20:09   ` Martin Wilck
  2018-10-01 20:44     ` Martin Wilck
@ 2018-10-04 16:45     ` Benjamin Marzinski
  2018-10-05 10:25       ` Martin Wilck
  1 sibling, 1 reply; 58+ messages in thread
From: Benjamin Marzinski @ 2018-10-04 16:45 UTC (permalink / raw)
  To: Martin Wilck; +Cc: Bart Van Assche, device-mapper development

On Mon, Oct 01, 2018 at 10:09:41PM +0200, Martin Wilck wrote:
> On Fri, 2018-09-21 at 18:05 -0500, Benjamin Marzinski wrote:
> > tur_devt() locks ct->lock. However, it is ocassionally called while
> > ct->lock is already locked. In reality, there is no reason why we
> > need
> > to lock all the accesses to ct->devt. The tur checker only needs to
> > write to this variable one time, when it first gets the file
> > descripter
> > that it is checking.  It also never uses ct->devt directly. Instead,
> > it
> > always graps the major and minor, and turns them into a string. This
> > patch changes ct->devt into that string, and sets it in
> > libcheck_init()
> > when it is first initializing the checker context. After that, ct-
> > >devt
> > is only ever read.
> 
> I like the lock removal a lot, but not so much the conversion into a
> string. Why not keep the dev_t? 

Because we will simply convert it into a string every time we use it,
instead of doing the work one time. It's 24 more bytes in the
tur_checker_context, but the code is easier to read, and we're not doing
the same work again and again.

> 
> Or maybe even easier, the other way around: why don't we make it a
> char* and simply set checker->dev_t = &pp->dev_t?

The whole reason we have tur_checker_context->holders is that it's
possible for a path to be removed (or orphaned) while the thread is
still running. The tur_checker_context needs to keep all its own
storage, so that it never as to worry about pointing to freed memory.

-Ben

> Regards,
> Martin
> 
> > 
> > Cc: Bart Van Assche <bart.vanassche@wdc.com>4
> > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > ---
> >  libmultipath/checkers/tur.c | 55 +++++++++++++--------------------
> > ------------
> >  1 file changed, 16 insertions(+), 39 deletions(-)
> > 
> > diff --git a/libmultipath/checkers/tur.c
> > b/libmultipath/checkers/tur.c
> > index 275541f..d173648 100644
> > --- a/libmultipath/checkers/tur.c
> > +++ b/libmultipath/checkers/tur.c
> > @@ -37,36 +37,24 @@
> >  #define MSG_TUR_FAILED	"tur checker failed to initialize"
> >  
> >  struct tur_checker_context {
> > -	dev_t devt;
> > +	char devt[32];
> >  	int state;
> > -	int running;
> > +	int running; /* uatomic access only */
> >  	int fd;
> >  	unsigned int timeout;
> >  	time_t time;
> >  	pthread_t thread;
> >  	pthread_mutex_t lock;
> >  	pthread_cond_t active;
> > -	int holders;
> > +	int holders; /* uatomic access only */
> >  	char message[CHECKER_MSG_LEN];
> >  };
> >  
> > -static const char *tur_devt(char *devt_buf, int size,
> > -			    struct tur_checker_context *ct)
> > -{
> > -	dev_t devt;
> > -
> > -	pthread_mutex_lock(&ct->lock);
> > -	devt = ct->devt;
> > -	pthread_mutex_unlock(&ct->lock);
> > -
> > -	snprintf(devt_buf, size, "%d:%d", major(devt), minor(devt));
> > -	return devt_buf;
> > -}
> > -
> >  int libcheck_init (struct checker * c)
> >  {
> >  	struct tur_checker_context *ct;
> >  	pthread_mutexattr_t attr;
> > +	struct stat sb;
> >  
> >  	ct = malloc(sizeof(struct tur_checker_context));
> >  	if (!ct)
> > @@ -81,6 +69,9 @@ int libcheck_init (struct checker * c)
> >  	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
> >  	pthread_mutex_init(&ct->lock, &attr);
> >  	pthread_mutexattr_destroy(&attr);
> > +	if (fstat(c->fd, &sb) == 0)
> > +		snprintf(ct->devt, sizeof(ct->devt), "%d:%d",
> > major(sb.st_rdev),
> > +			 minor(sb.st_rdev));
> >  	c->context = ct;
> >  
> >  	return 0;
> > @@ -232,14 +223,12 @@ static void *tur_thread(void *ctx)
> >  {
> >  	struct tur_checker_context *ct = ctx;
> >  	int state, running;
> > -	char devt[32];
> >  
> >  	/* This thread can be canceled, so setup clean up */
> >  	tur_thread_cleanup_push(ct);
> >  	rcu_register_thread();
> >  
> > -	condlog(3, "%s: tur checker starting up",
> > -		tur_devt(devt, sizeof(devt), ct));
> > +	condlog(3, "%s: tur checker starting up", ct->devt);
> >  
> >  	/* TUR checker start up */
> >  	pthread_mutex_lock(&ct->lock);
> > @@ -256,8 +245,8 @@ static void *tur_thread(void *ctx)
> >  	pthread_cond_signal(&ct->active);
> >  	pthread_mutex_unlock(&ct->lock);
> >  
> > -	condlog(3, "%s: tur checker finished, state %s",
> > -		tur_devt(devt, sizeof(devt), ct),
> > checker_state_name(state));
> > +	condlog(3, "%s: tur checker finished, state %s", ct->devt,
> > +		checker_state_name(state));
> >  
> >  	running = uatomic_xchg(&ct->running, 0);
> >  	if (!running)
> > @@ -305,20 +294,12 @@ int libcheck_check(struct checker * c)
> >  {
> >  	struct tur_checker_context *ct = c->context;
> >  	struct timespec tsp;
> > -	struct stat sb;
> >  	pthread_attr_t attr;
> >  	int tur_status, r;
> > -	char devt[32];
> >  
> >  	if (!ct)
> >  		return PATH_UNCHECKED;
> >  
> > -	if (fstat(c->fd, &sb) == 0) {
> > -		pthread_mutex_lock(&ct->lock);
> > -		ct->devt = sb.st_rdev;
> > -		pthread_mutex_unlock(&ct->lock);
> > -	}
> > -
> >  	if (c->sync)
> >  		return tur_check(c->fd, c->timeout,
> > copy_msg_to_checker, c);
> >  
> > @@ -327,8 +308,7 @@ int libcheck_check(struct checker * c)
> >  	 */
> >  	r = pthread_mutex_lock(&ct->lock);
> >  	if (r != 0) {
> > -		condlog(2, "%s: tur mutex lock failed with %d",
> > -			tur_devt(devt, sizeof(devt), ct), r);
> > +		condlog(2, "%s: tur mutex lock failed with %d", ct-
> > >devt, r);
> >  		MSG(c, MSG_TUR_FAILED);
> >  		return PATH_WILD;
> >  	}
> > @@ -338,14 +318,12 @@ int libcheck_check(struct checker * c)
> >  			int running = uatomic_xchg(&ct->running, 0);
> >  			if (running)
> >  				pthread_cancel(ct->thread);
> > -			condlog(3, "%s: tur checker timeout",
> > -				tur_devt(devt, sizeof(devt), ct));
> > +			condlog(3, "%s: tur checker timeout", ct-
> > >devt);
> >  			ct->thread = 0;
> >  			MSG(c, MSG_TUR_TIMEOUT);
> >  			tur_status = PATH_TIMEOUT;
> >  		} else if (uatomic_read(&ct->running) != 0) {
> > -			condlog(3, "%s: tur checker not finished",
> > -					tur_devt(devt, sizeof(devt),
> > ct));
> > +			condlog(3, "%s: tur checker not finished", ct-
> > >devt);
> >  			tur_status = PATH_PENDING;
> >  		} else {
> >  			/* TUR checker done */
> > @@ -361,7 +339,7 @@ int libcheck_check(struct checker * c)
> >  			if (ct->state == PATH_PENDING) {
> >  				pthread_mutex_unlock(&ct->lock);
> >  				condlog(3, "%s: tur thread not
> > responding",
> > -					tur_devt(devt, sizeof(devt),
> > ct));
> > +					ct->devt);
> >  				return PATH_TIMEOUT;
> >  			}
> >  		}
> > @@ -381,7 +359,7 @@ int libcheck_check(struct checker * c)
> >  			ct->thread = 0;
> >  			pthread_mutex_unlock(&ct->lock);
> >  			condlog(3, "%s: failed to start tur thread,
> > using"
> > -				" sync mode", tur_devt(devt,
> > sizeof(devt), ct));
> > +				" sync mode", ct->devt);
> >  			return tur_check(c->fd, c->timeout,
> >  					 copy_msg_to_checker, c);
> >  		}
> > @@ -392,8 +370,7 @@ int libcheck_check(struct checker * c)
> >  		pthread_mutex_unlock(&ct->lock);
> >  		if (uatomic_read(&ct->running) != 0 &&
> >  		    (tur_status == PATH_PENDING || tur_status ==
> > PATH_UNCHECKED)) {
> > -			condlog(3, "%s: tur checker still running",
> > -				tur_devt(devt, sizeof(devt), ct));
> > +			condlog(3, "%s: tur checker still running", ct-
> > >devt);
> >  			tur_status = PATH_PENDING;
> >  		} else {
> >  			int running = uatomic_xchg(&ct->running, 0);
> 
> -- 
> Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
> HRB 21284 (AG Nürnberg)
> 

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v3 02/19] libmultipath: fix tur checker double locking
  2018-10-01 20:44     ` Martin Wilck
@ 2018-10-04 16:47       ` Benjamin Marzinski
  0 siblings, 0 replies; 58+ messages in thread
From: Benjamin Marzinski @ 2018-10-04 16:47 UTC (permalink / raw)
  To: Martin Wilck; +Cc: Bart Van Assche, dm-devel

On Mon, Oct 01, 2018 at 10:44:06PM +0200, Martin Wilck wrote:
> On Mon, 2018-10-01 at 22:09 +0200, Martin Wilck wrote:
> > 
> > I like the lock removal a lot, but not so much the conversion into a
> > string. Why not keep the dev_t? 
> > 
> > Or maybe even easier, the other way around: why don't we make it a
> > char* and simply set checker->dev_t = &pp->dev_t?
> 
> OK, that 2nd one won't work because pp may be destroyed before the
> checker terminates. Got it. Still I'd prefer keeping a dev_t in the
> checker structure rather than a char[32].

Why? We never do anything with it as a dev_t except change it into a
string. If you feel strongly about this, I don't really care, and I'll
change it. I just don't understand your objection.

-Ben

> 
> Martin
> 

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v3 01/19] libmultipath: fix tur checker timeout
  2018-10-04 16:31     ` Benjamin Marzinski
@ 2018-10-05 10:11       ` Martin Wilck
  2018-10-05 17:02         ` Benjamin Marzinski
  0 siblings, 1 reply; 58+ messages in thread
From: Martin Wilck @ 2018-10-05 10:11 UTC (permalink / raw)
  To: Benjamin Marzinski, mwilck+gmail; +Cc: device-mapper development

On Thu, 2018-10-04 at 11:31 -0500,  Benjamin Marzinski  wrote:
> On Mon, Oct 01, 2018 at 09:51:22PM +0200, Martin Wilck wrote:
> > On Fri, 2018-09-21 at 18:05 -0500, Benjamin Marzinski wrote:
> > > The code previously was timing out mode if ct->thread was 0 but
> > > ct->running wasn't. This combination never happens.  The idea was
> > > to
> > > timeout if for some reason the path checker tried to kill the
> > > thread,
> > > but it didn't die.  The correct thing to check for this is ct-
> > > > holders.
> > > 
> > > ct->holders will always be at least one when libcheck_check() is
> > > called,
> > > since libcheck_free() won't get called until the device is no
> > > longer
> > > being checked. So, if ct->holders is 2, that means that the tur
> > > thread
> > > is has not shut down yet.
> > > 
> > > Also, instead of returning PATH_TIMEOUT whenever the thread
> > > hasn't
> > > died,
> > > it should only time out if the thread didn't successfully get a
> > > value,
> > > which means the previous state was already PATH_TIMEOUT.
> > 
> > What about PATH_UNCHECKED?
> > 
> 
> I don't see how that could happen. In this state we've definitely set
> ct->state to PATH_PENDING when we started the thread. The thread will
> only change  ct->state to the return of tur_check(), which doesn't
> ever return PATH_UNCHECKED. Am I missing something here?

OK. The only thing you missed was a comment :-)

This stuff is so subtle that we should describe exactly how
the locking works, which actor is supposed/entitled to set what field
to what value in what situation, and so on.

> 
> > > 
> > > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > > ---
> > >  libmultipath/checkers/tur.c | 15 +++++++++------
> > >  1 file changed, 9 insertions(+), 6 deletions(-)
> > > 
> > > diff --git a/libmultipath/checkers/tur.c
> > > b/libmultipath/checkers/tur.c
> > > index bf8486d..275541f 100644
> > > --- a/libmultipath/checkers/tur.c
> > > +++ b/libmultipath/checkers/tur.c
> > > @@ -355,12 +355,15 @@ int libcheck_check(struct checker * c)
> > >  		}
> > >  		pthread_mutex_unlock(&ct->lock);
> > >  	} else {
> > > -		if (uatomic_read(&ct->running) != 0) {
> > > -			/* pthread cancel failed. continue in sync mode
> > > */
> > > -			pthread_mutex_unlock(&ct->lock);
> > > -			condlog(3, "%s: tur thread not responding",
> > > -				tur_devt(devt, sizeof(devt), ct));
> > > -			return PATH_TIMEOUT;
> > > +		if (uatomic_read(&ct->holders) > 1) {
> > > +			/* pthread cancel failed. If it didn't get the
> > > path
> > > +			   state already, timeout */
> > > +			if (ct->state == PATH_PENDING) {
> > > +				pthread_mutex_unlock(&ct->lock);
> > > +				condlog(3, "%s: tur thread not
> > > responding",
> > > +					tur_devt(devt, sizeof(devt),
> > > ct));
> > > +				return PATH_TIMEOUT;
> > > +			}
> > >  		}
> > 
> > I'm not quite getting it yet. We're entering this code path if 
> > ct->thread is 0. As you point out, if there are >1 holders, a
> > previously cancelled TUR thread must be still "alive". But now we
> > want
> > to check *again*. The previous code refused to do that - a single
> > hanging TUR thread could block further checks from happening,
> > forever.
> > The PATH_TIMEOUT return code was actually questionable - in this
> > libcheck_check() invocation, we didn't even try to check, so
> > nothing
> > could actually time out (but that's obviously independent of your
> > patch).
> > 
> > With your patch, if ct->state != PATH_PENDING, we'd try to start
> > another TUR thread although there's still a hanging one. That's not
> > necessarily wrong, but I fail to see why it depends on ct->state.
> > Anyway, if we do this, we take the risk of starting more and more
> > TUR
> > threads which might all end up hanging; I wonder if we should stop
> > creating more threads if ct->holders grows further (e.g. > 5).
> 
> It's been a while, and I'm not exactly sure what I was thinking here.
> But IIRC the idea was that if the state isn't set yet, then the old
> thread could mess with the results of a future thread. Also, it was
> to
> avoid a corner case where the tur checker was called, started the
> thread, got the result before the checker exitted, cancelled the
> thread
> and returned the result and then was called very shortly afterwards,
> before the thread could clean itself up.  In that case, the right
> thing
> to do (I thought) would be to start a new thread, because the other
> one
> would be ending soon.  In truth, starting a new thread at all is
> probably a bad idea, since the old thread will still mess with the
> checker context on exit. 
> 
> A better idea might be to simply fail back to a syncronous call to
> tur_check() when you notice a cancelled thread that hasn't exitted.
> That can cause all the other checkers to get delayed, but at least in
> that case you are still checking paths. The other option is to do as
> before and just not check this path, which won't effect the other
> checkers. It seems very unlikely that the thread could remain
> uncancelled forever, especially if the path was usable.
>
> thoughts?

Generally speaking, we're deeply in the realm of highly unlikely
situations I would say. But we should get it right eventually.

Maybe we can add logic to the tur thread to keep its hands off the
context if it's a "zombie", like below (just a thought, untested)?

Martin


diff --git a/libmultipath/checkers/tur.c b/libmultipath/checkers/tur.c
index bf8486d3..e8493ca8 100644
--- a/libmultipath/checkers/tur.c
+++ b/libmultipath/checkers/tur.c
@@ -219,11 +219,18 @@ static void cleanup_func(void *data)
 	rcu_unregister_thread();
 }
 
+#define tur_thread_quit_unless_owner(__ct)	   \
+	if (__ct->thread != pthread_self()) {	   \
+		pthread_mutex_unlock(&__ct->lock); \
+		pthread_exit(NULL);		   \
+	}
+
 static void copy_msg_to_tcc(void *ct_p, const char *msg)
 {
 	struct tur_checker_context *ct = ct_p;
 
 	pthread_mutex_lock(&ct->lock);
+	tur_thread_quit_unless_owner(ct);
 	strlcpy(ct->message, msg, sizeof(ct->message));
 	pthread_mutex_unlock(&ct->lock);
 }
@@ -243,6 +250,7 @@ static void *tur_thread(void *ctx)
 
 	/* TUR checker start up */
 	pthread_mutex_lock(&ct->lock);
+	tur_thread_quit_unless_owner(ct);
 	ct->state = PATH_PENDING;
 	ct->message[0] = '\0';
 	pthread_mutex_unlock(&ct->lock);
@@ -252,6 +260,7 @@ static void *tur_thread(void *ctx)
 
 	/* TUR checker done */
 	pthread_mutex_lock(&ct->lock);
+	tur_thread_quit_unless_owner(ct);
 	ct->state = state;
 	pthread_cond_signal(&ct->active);
 	pthread_mutex_unlock(&ct->lock);




-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSELinux GmbH, GF: Felix Imendörffer, Jane Smithard,  Graham Norton
HRB 21284 (AG Nürnberg)


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [PATCH v3 02/19] libmultipath: fix tur checker double locking
  2018-10-04 16:45     ` Benjamin Marzinski
@ 2018-10-05 10:25       ` Martin Wilck
  2018-10-05 17:10         ` Benjamin Marzinski
  0 siblings, 1 reply; 58+ messages in thread
From: Martin Wilck @ 2018-10-05 10:25 UTC (permalink / raw)
  To: Benjamin Marzinski, mwilck+gmail
  Cc: Bart Van Assche, device-mapper development

On Thu, 2018-10-04 at 11:45 -0500,  Benjamin Marzinski  wrote:
> On Mon, Oct 01, 2018 at 10:09:41PM +0200, Martin Wilck wrote:
> > On Fri, 2018-09-21 at 18:05 -0500, Benjamin Marzinski wrote:
> > > tur_devt() locks ct->lock. However, it is ocassionally called
> > > while
> > > ct->lock is already locked. In reality, there is no reason why we
> > > need
> > > to lock all the accesses to ct->devt. The tur checker only needs
> > > to
> > > write to this variable one time, when it first gets the file
> > > descripter
> > > that it is checking.  It also never uses ct->devt directly.
> > > Instead,
> > > it
> > > always graps the major and minor, and turns them into a string.
> > > This
> > > patch changes ct->devt into that string, and sets it in
> > > libcheck_init()
> > > when it is first initializing the checker context. After that,
> > > ct-
> > > > devt
> > > 
> > > is only ever read.
> > 
> > I like the lock removal a lot, but not so much the conversion into
> > a
> > string. Why not keep the dev_t? 
> 
> Because we will simply convert it into a string every time we use it,
> instead of doing the work one time. It's 24 more bytes in the
> tur_checker_context, but the code is easier to read, and we're not
> doing
> the same work again and again.

Well, IMO snprintf(buf, N, "%d:%d", major(x), minor(x)) is not _that_
much work, and it looks a lot cleaner, to me at least.

But OK, I'm not insisting on this. So, if you re-post, I'm going to ack
the patch.

My thought for long term is is this actually this: ct->dev_t is only
needed for creating messages to be stored in the checker->message
string, which I don't like either.

Why don't we replace the "message" field with an "int msgid", and
add a 

     char* (*get_message)(int msgid)

to the checker API?

msgid could actually be another "uatomic" field. We could even go one
step further, create a field for the checker status in struct checker,
and use like one byte for the general path status and the rest of the
field for the msgid, which usually represents just some checker-
specific extra information wrt the status.

> > 
> > Or maybe even easier, the other way around: why don't we make it a
> > char* and simply set checker->dev_t = &pp->dev_t?
> 
> The whole reason we have tur_checker_context->holders is that it's
> possible for a path to be removed (or orphaned) while the thread is
> still running. The tur_checker_context needs to keep all its own
> storage, so that it never as to worry about pointing to freed memory.

Yeah, I got that meanwhile, as posted before. Sorry.

Martin

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v3 01/19] libmultipath: fix tur checker timeout
  2018-10-05 10:11       ` Martin Wilck
@ 2018-10-05 17:02         ` Benjamin Marzinski
  2018-10-05 19:23           ` Martin Wilck
  0 siblings, 1 reply; 58+ messages in thread
From: Benjamin Marzinski @ 2018-10-05 17:02 UTC (permalink / raw)
  To: Martin Wilck; +Cc: mwilck+gmail, device-mapper development

On Fri, Oct 05, 2018 at 12:11:18PM +0200, Martin Wilck wrote:
> On Thu, 2018-10-04 at 11:31 -0500,  Benjamin Marzinski  wrote:
> > It's been a while, and I'm not exactly sure what I was thinking here.
> > But IIRC the idea was that if the state isn't set yet, then the old
> > thread could mess with the results of a future thread. Also, it was
> > to
> > avoid a corner case where the tur checker was called, started the
> > thread, got the result before the checker exitted, cancelled the
> > thread
> > and returned the result and then was called very shortly afterwards,
> > before the thread could clean itself up.  In that case, the right
> > thing
> > to do (I thought) would be to start a new thread, because the other
> > one
> > would be ending soon.  In truth, starting a new thread at all is
> > probably a bad idea, since the old thread will still mess with the
> > checker context on exit. 
> > 
> > A better idea might be to simply fail back to a syncronous call to
> > tur_check() when you notice a cancelled thread that hasn't exitted.
> > That can cause all the other checkers to get delayed, but at least in
> > that case you are still checking paths. The other option is to do as
> > before and just not check this path, which won't effect the other
> > checkers. It seems very unlikely that the thread could remain
> > uncancelled forever, especially if the path was usable.
> >
> > thoughts?
> 
> Generally speaking, we're deeply in the realm of highly unlikely
> situations I would say. But we should get it right eventually.
> 
> Maybe we can add logic to the tur thread to keep its hands off the
> context if it's a "zombie", like below (just a thought, untested)?

This still wouldn't stop a thread from racing with new thread creation
to change ct->holders or ct->running.

-Ben

> Martin
> 
> 
> diff --git a/libmultipath/checkers/tur.c b/libmultipath/checkers/tur.c
> index bf8486d3..e8493ca8 100644
> --- a/libmultipath/checkers/tur.c
> +++ b/libmultipath/checkers/tur.c
> @@ -219,11 +219,18 @@ static void cleanup_func(void *data)
>  	rcu_unregister_thread();
>  }
>  
> +#define tur_thread_quit_unless_owner(__ct)	   \
> +	if (__ct->thread != pthread_self()) {	   \
> +		pthread_mutex_unlock(&__ct->lock); \
> +		pthread_exit(NULL);		   \
> +	}
> +
>  static void copy_msg_to_tcc(void *ct_p, const char *msg)
>  {
>  	struct tur_checker_context *ct = ct_p;
>  
>  	pthread_mutex_lock(&ct->lock);
> +	tur_thread_quit_unless_owner(ct);
>  	strlcpy(ct->message, msg, sizeof(ct->message));
>  	pthread_mutex_unlock(&ct->lock);
>  }
> @@ -243,6 +250,7 @@ static void *tur_thread(void *ctx)
>  
>  	/* TUR checker start up */
>  	pthread_mutex_lock(&ct->lock);
> +	tur_thread_quit_unless_owner(ct);
>  	ct->state = PATH_PENDING;
>  	ct->message[0] = '\0';
>  	pthread_mutex_unlock(&ct->lock);
> @@ -252,6 +260,7 @@ static void *tur_thread(void *ctx)
>  
>  	/* TUR checker done */
>  	pthread_mutex_lock(&ct->lock);
> +	tur_thread_quit_unless_owner(ct);
>  	ct->state = state;
>  	pthread_cond_signal(&ct->active);
>  	pthread_mutex_unlock(&ct->lock);
> 
> 
> 
> 
> -- 
> Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
> SUSELinux GmbH, GF: Felix Imendörffer, Jane Smithard,  Graham Norton
> HRB 21284 (AG Nürnberg)
> 

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v3 02/19] libmultipath: fix tur checker double locking
  2018-10-05 10:25       ` Martin Wilck
@ 2018-10-05 17:10         ` Benjamin Marzinski
  2018-10-05 19:07           ` Martin Wilck
  0 siblings, 1 reply; 58+ messages in thread
From: Benjamin Marzinski @ 2018-10-05 17:10 UTC (permalink / raw)
  To: Martin Wilck; +Cc: Bart Van Assche, mwilck+gmail, device-mapper development

On Fri, Oct 05, 2018 at 12:25:15PM +0200, Martin Wilck wrote:
> On Thu, 2018-10-04 at 11:45 -0500,  Benjamin Marzinski  wrote:
> > On Mon, Oct 01, 2018 at 10:09:41PM +0200, Martin Wilck wrote:
> > > On Fri, 2018-09-21 at 18:05 -0500, Benjamin Marzinski wrote:
> > > > tur_devt() locks ct->lock. However, it is ocassionally called
> > > > while
> > > > ct->lock is already locked. In reality, there is no reason why we
> > > > need
> > > > to lock all the accesses to ct->devt. The tur checker only needs
> > > > to
> > > > write to this variable one time, when it first gets the file
> > > > descripter
> > > > that it is checking.  It also never uses ct->devt directly.
> > > > Instead,
> > > > it
> > > > always graps the major and minor, and turns them into a string.
> > > > This
> > > > patch changes ct->devt into that string, and sets it in
> > > > libcheck_init()
> > > > when it is first initializing the checker context. After that,
> > > > ct-
> > > > > devt
> > > > 
> > > > is only ever read.
> > > 
> > > I like the lock removal a lot, but not so much the conversion into
> > > a
> > > string. Why not keep the dev_t? 
> > 
> > Because we will simply convert it into a string every time we use it,
> > instead of doing the work one time. It's 24 more bytes in the
> > tur_checker_context, but the code is easier to read, and we're not
> > doing
> > the same work again and again.
> 
> Well, IMO snprintf(buf, N, "%d:%d", major(x), minor(x)) is not _that_
> much work, and it looks a lot cleaner, to me at least.
> 
> But OK, I'm not insisting on this. So, if you re-post, I'm going to ack
> the patch.
> 
> My thought for long term is is this actually this: ct->dev_t is only
> needed for creating messages to be stored in the checker->message
> string, which I don't like either.
> 
> Why don't we replace the "message" field with an "int msgid", and
> add a 
> 
>      char* (*get_message)(int msgid)
> 
> to the checker API?

I would not have a problem with a patch that did this.

-Ben

> 
> msgid could actually be another "uatomic" field. We could even go one
> step further, create a field for the checker status in struct checker,
> and use like one byte for the general path status and the rest of the
> field for the msgid, which usually represents just some checker-
> specific extra information wrt the status.
> 
> > > 
> > > Or maybe even easier, the other way around: why don't we make it a
> > > char* and simply set checker->dev_t = &pp->dev_t?
> > 
> > The whole reason we have tur_checker_context->holders is that it's
> > possible for a path to be removed (or orphaned) while the thread is
> > still running. The tur_checker_context needs to keep all its own
> > storage, so that it never as to worry about pointing to freed memory.
> 
> Yeah, I got that meanwhile, as posted before. Sorry.
> 
> Martin
> 
> -- 
> Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
> HRB 21284 (AG Nürnberg)
> 

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v3 02/19] libmultipath: fix tur checker double locking
  2018-10-05 17:10         ` Benjamin Marzinski
@ 2018-10-05 19:07           ` Martin Wilck
  0 siblings, 0 replies; 58+ messages in thread
From: Martin Wilck @ 2018-10-05 19:07 UTC (permalink / raw)
  To: Benjamin Marzinski, mwilck+gmail
  Cc: Bart Van Assche, device-mapper development

On Fri, 2018-10-05 at 12:10 -0500,  Benjamin Marzinski  wrote:
> On Fri, Oct 05, 2018 at 12:25:15PM +0200, Martin Wilck wrote:
> > On Thu, 2018-10-04 at 11:45 -0500,  Benjamin Marzinski  wrote:
> > > On Mon, Oct 01, 2018 at 10:09:41PM +0200, Martin Wilck wrote:
> > > > On Fri, 2018-09-21 at 18:05 -0500, Benjamin Marzinski wrote:
> > > > > tur_devt() locks ct->lock. However, it is ocassionally called
> > > > > while
> > > > > ct->lock is already locked. In reality, there is no reason
> > > > > why we
> > > > > need
> > > > > to lock all the accesses to ct->devt. The tur checker only
> > > > > needs
> > > > > to
> > > > > write to this variable one time, when it first gets the file
> > > > > descripter
> > > > > that it is checking.  It also never uses ct->devt directly.
> > > > > Instead,
> > > > > it
> > > > > always graps the major and minor, and turns them into a
> > > > > string.
> > > > > This
> > > > > patch changes ct->devt into that string, and sets it in
> > > > > libcheck_init()
> > > > > when it is first initializing the checker context. After
> > > > > that,
> > > > > ct-
> > > > > > devt
> > > > > 
> > > > > is only ever read.
> > > > 
> > > > I like the lock removal a lot, but not so much the conversion
> > > > into
> > > > a
> > > > string. Why not keep the dev_t? 
> > > 
> > > Because we will simply convert it into a string every time we use
> > > it,
> > > instead of doing the work one time. It's 24 more bytes in the
> > > tur_checker_context, but the code is easier to read, and we're
> > > not
> > > doing
> > > the same work again and again.
> > 
> > Well, IMO snprintf(buf, N, "%d:%d", major(x), minor(x)) is not
> > _that_
> > much work, and it looks a lot cleaner, to me at least.
> > 
> > But OK, I'm not insisting on this. So, if you re-post, I'm going to
> > ack
> > the patch.
> > 
> > My thought for long term is is this actually this: ct->dev_t is
> > only
> > needed for creating messages to be stored in the checker->message
> > string, which I don't like either.
> > 
> > Why don't we replace the "message" field with an "int msgid", and
> > add a 
> > 
> >      char* (*get_message)(int msgid)
> > 
> > to the checker API?
> 
> I would not have a problem with a patch that did this.

Let's get your series finalized, and possibly merged, first.

Martin

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v3 01/19] libmultipath: fix tur checker timeout
  2018-10-05 17:02         ` Benjamin Marzinski
@ 2018-10-05 19:23           ` Martin Wilck
  0 siblings, 0 replies; 58+ messages in thread
From: Martin Wilck @ 2018-10-05 19:23 UTC (permalink / raw)
  To: Benjamin Marzinski; +Cc: device-mapper development

On Fri, 2018-10-05 at 12:02 -0500, Benjamin Marzinski wrote:
> On Fri, Oct 05, 2018 at 12:11:18PM +0200, Martin Wilck wrote:
> > On Thu, 2018-10-04 at 11:31 -0500,  Benjamin Marzinski  wrote:
> > > 
> > > thoughts?
> > 
> > Generally speaking, we're deeply in the realm of highly unlikely
> > situations I would say. But we should get it right eventually.
> > 
> > Maybe we can add logic to the tur thread to keep its hands off the
> > context if it's a "zombie", like below (just a thought, untested)?
> 
> This still wouldn't stop a thread from racing with new thread
> creation
> to change ct->holders or ct->running.

I don't see a problem with ct->holders. Being a refcount, it seems to
be made quite for this purpose. The zombie thread _must_ decrement it
when it eventually exits. Wrt ct->running, we may have to move it under
the lock again if we want to be absolutely sure. 

Regards
Martin

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v3 18/19] libmultipath: Don't blank intialized paths
  2018-10-02 22:37     ` Martin Wilck
@ 2018-10-05 19:38       ` Benjamin Marzinski
  2018-10-08  9:41         ` Martin Wilck
  0 siblings, 1 reply; 58+ messages in thread
From: Benjamin Marzinski @ 2018-10-05 19:38 UTC (permalink / raw)
  To: Martin Wilck; +Cc: device-mapper development

On Wed, Oct 03, 2018 at 12:37:17AM +0200, Martin Wilck wrote:
> Hi Ben,
> 
> > I am uncertain about this one. The old code sets pp->initialized to
> > INIT_FAILED. If the state had been INIT_MISSING_UDEV or
> > INIT_REQUESTED_UDEV before, this patch might change how the code
> > behaves later in check_path(), where these conditions are checked.
> > 
> > Likewise, tests for strlen(pp->wwid) are used in various places
> > around
> > the code. These tests would now yield different results for paths in
> > "recoverable error" state.
> > 
> > Have you considered these possible side effects?
> 
> I've pondered over this a lot. The dust is clearing up a bit.
> 
> 1. With your patch in place, INIT_FAILED is never set except in
> alloc_path() (we might rename it to INIT_NEW or the like, but see
> below).

Correct. The idea is that INIT_FAILED means that the path has never been
fully initialized and that it is missing more than simply the wwid
(which would set INIT_MISSING_UDEV instead).

> 2. I don't understand how you handle repeated failure to retrieve the
> WWID. I see that get_uid() (actually, scsi_uid_fallback()) would
> retrieve the WWID from sysfs after retriggers are exhausted. But I
> don't see how pathinfo(DI_WWID) would ever be called in this situation:
>
> In the last invocation, pathinfo() had failed to retrieve the WWID and
> set pp->initialized = INIT_MISSING_UDEV. There it will remain because
> check_path() won't set it to INIT_REQUESTED_UDEV any more after retries
> are exhausted. And now, check_path() won't call pathinfo(DI_ALL) any
> more from the "add missing path" code, because of the (pp->initialized
> != INIT_MISSING_UDEV) condition.
> 
> Am I overlooking something?

Your analysis looks right. This shouldn't have anything to do with the
correctness of the not blanking initialized paths, but we should fix it
anyway.  Right now, as you mentioned, if multipathd fails to get the
wwid after retrigger_tries new uevents, it only has one shot at the
failback methods, and then mutipathd cannot use that path in the future.
However, it will still check if the path is up or down, which is pretty
pointless. It should either

1. set pp->intialized to INIT_OK, to keep it from being checked at all
in the future, or

2. try to grab the missing pathinfo, so that it can use the path.

The benefit of doing the first is that mutipathd doesn't keep messing
with paths that really aren't set up to be multipathed. On the other
hand it might ignore some paths that really should be multipathed.
perhaps the best idea is to keep trying to get the missing info, but
just increase pp->tick (possibly progressively) so we try less often.

> 3. If "blank" state means that important device information couldn't be
> retrieved because of presumably transient failure conditions, we should
> retry to retrieve this information by calling pathinfo again later. But
> unless the WWID is (reset to) the empty string, check_path() won't call
> pathinfo(DI_ALL) any more.

But if we set the wwid at some point in time, that means we got all the
necessary information.  My argument is that if something happens to
cause a later pathinfo() call to fail, we shouldn't pretend like we
didn't already get this info.
 
> 4. The "blank" logic in pathinfo() combines several very different
> cases.
>   a) PATH_REMOVED status from path_offline(). This means that
> elementary sysfs attributes were missing. This is almost the same as
> failure in sysfs_pathinfo(), which results in PATHINFO_FAILED return
> status; but for PATH_REMOVED we return PATHINFO_OK and keep the path
> around.

True, but I've always assumed that this means that we will be getting a
uevent to remove this path momentarily, so I doesn't really matter what
we do with it. We could blank the path here, but if we already go the
necessary info, it seems like just setting the path to PATH_DOWN, and
waiting for the remove uevent should be fine.

>   b) Failure in checker_check(). If the path is offline in the first
> place, the checker isn't called, and WWID determination is attempted.
> But if the checker returns PATH_UNCHECKED or PATH_WILD, we goto "blank"
> state.

This is clearly the wrong thing to do.

>   c) Failure in scsi_ioctl_pathinfo() or cciss_ioctl_pathinfo(). Both
> functions never fail, so this can't happen. I've patches here to fix
> that.

I just ignored this state because it was impossible. But even with your
patches, my argument remains the same. If we already got this info,
failing to get it again shouldn't cause us to delete the wwid. We should
probably remember the existing values, to make sure that we don't lose
them if the calls fail for some reason.

>   d) Failure to open pp->fd. 

This is clearly the right thing to do.

> d) is the only case in which the "blank" logic makes really sense to
> me. It can happen only at the first pathinfo() invocation, meaning 
> pp->wwid is still empty, and pp->initialized is INIT_FAILED. Your patch
> would change nothing for this case.
> 
> a) and b) can happen for paths that have been initialized already. I
> think in case a) the WWID should be reset, probably initialized should
> be set to INIT_FAILED, and PATHINFO_FAILED should be returned. In case
> b) we should IMO proceed normally rather than goto "blank". Resetting
> the WWID in case b) is nonsense, agreed.
> 
> Altogether, if my analysis is correct, your patch (not blanking the
> WWID) should be applied to case b) only.

I don't really care about blanking the wwid in case a). I just think
it's unnecessary.  Why do you think we should blank it in case c) if we
have previously gotten those values?

-Ben

> Please comment - I still feel a bit confused and may have overlooked
> something essential.
> 
> Regards
> Martin
> 
> -- 
> Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
> HRB 21284 (AG Nürnberg)
> 

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v3 19/19] libmultipath: Fixup updating paths
  2018-10-01 22:30   ` Martin Wilck
@ 2018-10-05 20:32     ` Benjamin Marzinski
  0 siblings, 0 replies; 58+ messages in thread
From: Benjamin Marzinski @ 2018-10-05 20:32 UTC (permalink / raw)
  To: Martin Wilck; +Cc: device-mapper development

On Tue, Oct 02, 2018 at 12:30:51AM +0200, Martin Wilck wrote:
> On Fri, 2018-09-21 at 18:05 -0500, Benjamin Marzinski wrote:
> > Commit 582c56cc broke some code paths in uev_update_path. First, it
> > changed the handling of paths that were not fully initialized.
> > uev_update_path was simply setting the wwids for all of these paths.
> > Instead, it should ignore the ones that had not requested a new
> > uevent.
> > These paths are likely down, and are already getting handled by
> > check_path, after it verifies that they have become active. Also,
> > setting the wwid doesn't update all of the other information that
> > may have been missed when the path was initially added.
> > 
> > Also, it wasn't possible for pp->wwid_changed to transition back to
> > zero, unless the path's wwid was empty, in which case there was no
> > reason to worry about the wwid change in the first place, since the
> > path
> > hadn't been fully initialized yet. So, even if a path's wwid changed
> > and
> > then changed back to the original value, the path still could not be
> > used.
> > 
> > This patch fixes these issues, and also moves the check for paths
> > that
> > have requested a new uevent up in the functions. These paths will get
> > fully reinitialized anyway, so there is no reason to do all the other
> > work first.
> > 
> > Fixes: 582c56cc ("libmultipath: uev_update_path: always warn if WWID
> > changed")
> > 
> > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> 
> Thanks for fixing this. Looking at the cleaned-up logic after this
> patch, I believe that we should hard-code "disable_changed_wwids" to 1.
> It's crazy to realize that a path WWID has changed and simply go on as
> if nothing had happened. We shouldn't allow this configuration any
> more.

I'm fine with defaulting disable_changed_wwids to 1.

-Ben

> 
> Reviewed-by: Martin Wilck <mwilck@suse.com>
> 
> > ---
> >  multipathd/main.c | 24 ++++++++++++++----------
> >  1 file changed, 14 insertions(+), 10 deletions(-)
> > 
> > diff --git a/multipathd/main.c b/multipathd/main.c
> > index 61ca455..d6d122a 100644
> > --- a/multipathd/main.c
> > +++ b/multipathd/main.c
> > @@ -1207,6 +1207,15 @@ uev_update_path (struct uevent *uev, struct
> > vectors * vecs)
> >  		struct multipath *mpp = pp->mpp;
> >  		char wwid[WWID_SIZE];
> >  
> > +		if (pp->initialized == INIT_REQUESTED_UDEV) {
> > +			needs_reinit = 1;
> > +			goto out;
> > +		}
> > +		/* Don't deal with other types of failed initialization
> > +		 * now. check_path will handle it */
> > +		if (!strlen(pp->wwid))
> > +			goto out;
> > +
> >  		strcpy(wwid, pp->wwid);
> >  		get_uid(pp, pp->state, uev->udev);
> >  
> > @@ -1215,9 +1224,8 @@ uev_update_path (struct uevent *uev, struct
> > vectors * vecs)
> >  				uev->kernel, wwid, pp->wwid,
> >  				(disable_changed_wwids ? "disallowing"
> > :
> >  				 "continuing"));
> > -			if (disable_changed_wwids &&
> > -			    (strlen(wwid) || pp->wwid_changed)) {
> > -				strcpy(pp->wwid, wwid);
> > +			strcpy(pp->wwid, wwid);
> > +			if (disable_changed_wwids) {
> >  				if (!pp->wwid_changed) {
> >  					pp->wwid_changed = 1;
> >  					pp->tick = 1;
> > @@ -1225,11 +1233,9 @@ uev_update_path (struct uevent *uev, struct
> > vectors * vecs)
> >  						dm_fail_path(pp->mpp-
> > >alias, pp->dev_t);
> >  				}
> >  				goto out;
> > -			} else if (!disable_changed_wwids)
> > -				strcpy(pp->wwid, wwid);
> > -			else
> > -				pp->wwid_changed = 0;
> > +			}
> >  		} else {
> > +			pp->wwid_changed = 0;
> >  			udev_device_unref(pp->udev);
> >  			pp->udev = udev_device_ref(uev->udev);
> >  			conf = get_multipath_config();
> > @@ -1240,9 +1246,7 @@ uev_update_path (struct uevent *uev, struct
> > vectors * vecs)
> >  			pthread_cleanup_pop(1);
> >  		}
> >  
> > -		if (pp->initialized == INIT_REQUESTED_UDEV)
> > -			needs_reinit = 1;
> > -		else if (mpp && ro >= 0) {
> > +		if (mpp && ro >= 0) {
> >  			condlog(2, "%s: update path write_protect to
> > '%d' (uevent)", uev->kernel, ro);
> >  
> >  			if (mpp->wait_for_udev)
> 
> -- 
> Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
> HRB 21284 (AG Nürnberg)
> 

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v3 00/19] Misc Multipath patches
  2018-09-21 23:05 [PATCH v3 00/19] Misc Multipath patches Benjamin Marzinski
                   ` (18 preceding siblings ...)
  2018-09-21 23:05 ` [PATCH v3 19/19] libmultipath: Fixup updating paths Benjamin Marzinski
@ 2018-10-07  8:36 ` Christophe Varoqui
  2018-10-09 16:13   ` Benjamin Marzinski
  19 siblings, 1 reply; 58+ messages in thread
From: Christophe Varoqui @ 2018-10-07  8:36 UTC (permalink / raw)
  To: Benjamin Marzinski; +Cc: device-mapper development, Martin Wilck


[-- Attachment #1.1: Type: text/plain, Size: 2763 bytes --]

Hi Ben,

this v3 patchset is ready for merge or do you plan a v4 ?

Thanks,
Christophe

On Sat, Sep 22, 2018 at 1:05 AM Benjamin Marzinski <bmarzins@redhat.com>
wrote:

> This batch of patches is a resend of the non-merged pathes from my last
> set, along with 2 new ones. It would be really great if we could get
> a version bug after the outstanding commits go it.
>
> Patches 0001-0005 are a number of fixes to the tur checker.These are
>         the ones that should get the most attention.
> Patches 0006-0017 are minor issues found by coverity.
>         Not all of them are bugs that could actually occur in practice,
>         but they are simple and hopefully non-controversial changes.
> Patches 0018-0019 are new and related to changing path wwids
>
> Changes in v3
>         added patches 0018-0019
>
> Changes in v2
>         0002-libmultipath-fix-tur-checker-double-locking.patch now sets
>         ct->devt when initially creating the tur_checker_context, while
>         that structure is still only referenced by a local variable.
>         After that, ct->devt is only ever read. This should remove any
>         issues with it needing locking.
>
> Benjamin Marzinski (19):
>   libmultipath: fix tur checker timeout
>   libmultipath: fix tur checker double locking
>   libmultipath: fix tur memory misuse
>   libmultipath: cleanup tur locking
>   libmultipath: fix tur checker timeout issue
>   libmultipath: fix set_int error path
>   libmultipath: fix length issues in get_vpd_sgio
>   libmultipath: _install_keyword cleanup
>   libmultipath: remove unused code
>   libmultipath: fix memory issue in path_latency prio
>   libmultipath: fix null dereference int alloc_path_group
>   libmutipath: don't use malformed uevents
>   multipath: fix max array size in print_cmd_valid
>   multipathd: function return value tweaks
>   multipathd: minor fixes
>   multipathd: remove useless check and fix format
>   multipathd: fix memory leak on error in configure
>   libmultipath: Don't blank intialized paths
>   libmultipath: Fixup updating paths
>
>  libmultipath/checkers/tur.c              | 168
> +++++++++++--------------------
>  libmultipath/dict.c                      |   5 +-
>  libmultipath/discovery.c                 |  18 ++--
>  libmultipath/parser.c                    |  12 ++-
>  libmultipath/print.c                     |   8 --
>  libmultipath/prioritizers/path_latency.c |   3 +-
>  libmultipath/structs.c                   |   2 +-
>  libmultipath/uevent.c                    |   6 ++
>  multipath/main.c                         |   2 +-
>  multipathd/cli_handlers.c                |  11 +-
>  multipathd/main.c                        |  50 +++++----
>  11 files changed, 131 insertions(+), 154 deletions(-)
>
> --
> 2.7.4
>
>

[-- Attachment #1.2: Type: text/html, Size: 3438 bytes --]

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v3 18/19] libmultipath: Don't blank intialized paths
  2018-09-21 23:05 ` [PATCH v3 18/19] libmultipath: Don't blank intialized paths Benjamin Marzinski
  2018-10-01 22:00   ` Martin Wilck
@ 2018-10-08  9:35   ` Martin Wilck
  1 sibling, 0 replies; 58+ messages in thread
From: Martin Wilck @ 2018-10-08  9:35 UTC (permalink / raw)
  To: Benjamin Marzinski, device-mapper development

On Fri, 2018-09-21 at 18:05 -0500,  Benjamin Marzinski  wrote:
> When pathinfo fails for some likely transient reason, it clears the
> path
> wwid, but otherwise returns successfully, to keep the path around but
> not usable until it gets fully initialized. However, if the path has
> already been initialized, and pathinfo hits a transient error, it
> shouldn't clear the wwid.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>

After follow-up discussion:

Reviewed-by: Martin Wilck <mwilck@suse.com>


> ---
>  libmultipath/discovery.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/libmultipath/discovery.c b/libmultipath/discovery.c
> index 3e0db7f..33815dc 100644
> --- a/libmultipath/discovery.c
> +++ b/libmultipath/discovery.c
> @@ -1991,9 +1991,9 @@ blank:
>  	/*
>  	 * Recoverable error, for example faulty or offline path
>  	 */
> -	memset(pp->wwid, 0, WWID_SIZE);
>  	pp->chkrstate = pp->state = PATH_DOWN;
> -	pp->initialized = INIT_FAILED;
> +	if (pp->initialized == INIT_FAILED)
> +		memset(pp->wwid, 0, WWID_SIZE);
>  
>  	return PATHINFO_OK;
>  }

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v3 18/19] libmultipath: Don't blank intialized paths
  2018-10-05 19:38       ` Benjamin Marzinski
@ 2018-10-08  9:41         ` Martin Wilck
  2018-10-09 22:20           ` Benjamin Marzinski
  0 siblings, 1 reply; 58+ messages in thread
From: Martin Wilck @ 2018-10-08  9:41 UTC (permalink / raw)
  To: dm-devel, Benjamin Marzinski

On Fri, 2018-10-05 at 14:38 -0500,  Benjamin Marzinski  wrote:
> On Wed, Oct 03, 2018 at 12:37:17AM +0200, Martin Wilck wrote:
> > Hi Ben,
> > 
> > > I am uncertain about this one. The old code sets pp->initialized
> > > to
> > > INIT_FAILED. If the state had been INIT_MISSING_UDEV or
> > > INIT_REQUESTED_UDEV before, this patch might change how the code
> > > behaves later in check_path(), where these conditions are
> > > checked.
> > > 
> > > Likewise, tests for strlen(pp->wwid) are used in various places
> > > around
> > > the code. These tests would now yield different results for paths
> > > in
> > > "recoverable error" state.
> > > 
> > > Have you considered these possible side effects?
> > 
> > I've pondered over this a lot. The dust is clearing up a bit.
> > 
> > 1. With your patch in place, INIT_FAILED is never set except in
> > alloc_path() (we might rename it to INIT_NEW or the like, but see
> > below).
> 
> Correct. The idea is that INIT_FAILED means that the path has never
> been
> fully initialized and that it is missing more than simply the wwid
> (which would set INIT_MISSING_UDEV instead).
> 
> > 2. I don't understand how you handle repeated failure to retrieve
> > the
> > WWID. I see that get_uid() (actually, scsi_uid_fallback()) would
> > retrieve the WWID from sysfs after retriggers are exhausted. But I
> > don't see how pathinfo(DI_WWID) would ever be called in this
> > situation:
> > 
> > In the last invocation, pathinfo() had failed to retrieve the WWID
> > and
> > set pp->initialized = INIT_MISSING_UDEV. There it will remain
> > because
> > check_path() won't set it to INIT_REQUESTED_UDEV any more after
> > retries
> > are exhausted. And now, check_path() won't call pathinfo(DI_ALL)
> > any
> > more from the "add missing path" code, because of the (pp-
> > >initialized
> > != INIT_MISSING_UDEV) condition.
> > 
> > Am I overlooking something?
> 
> Your analysis looks right. This shouldn't have anything to do with
> the
> correctness of the not blanking initialized paths, but we should fix
> it
> anyway.  Right now, as you mentioned, if multipathd fails to get the
> wwid after retrigger_tries new uevents, it only has one shot at the
> failback methods, and then mutipathd cannot use that path in the
> future.
> However, it will still check if the path is up or down, which is
> pretty
> pointless. It should either
> 
> 1. set pp->intialized to INIT_OK, to keep it from being checked at
> all
> in the future, or

That would be lying to ourselves. I'd rather call it INIT_BROKEN or so,
or maybe even delete the path from pathvec altogether.

[ While we're at this: we call pathinfo with DI_ALL from 
configure->path_discovery(), but with DI_ALL|DI_BLACKLIST from
uev_add_path(). As a result, paths discovered in uevents won't be added
to pathvec in the first place if they're blacklisted, while blacklisted
paths discovered at startup will be in the pathvec, but orphaned. Is
that intentional, and if yes, why ? ]

> 2. try to grab the missing pathinfo, so that it can use the path.
> 
> The benefit of doing the first is that mutipathd doesn't keep messing
> with paths that really aren't set up to be multipathed. On the other
> hand it might ignore some paths that really should be multipathed.
> perhaps the best idea is to keep trying to get the missing info, but
> just increase pp->tick (possibly progressively) so we try less often.

The latter sounds overly complex to me, in particular as in the daemon,
we're only looking at paths retrieved via udev anyway, as you pointed
out in your other post. So perhaps INIT_MISSING_UDEV is just about the
right state for this. We may just consider changing uev_update_path()
such that the path is passed to uev_add_path() not only for
INIT_REQUESTED_UDEV, but also for INIT_MISSING_UDEV state.

> 
> > 3. If "blank" state means that important device information
> > couldn't be
> > retrieved because of presumably transient failure conditions, we
> > should
> > retry to retrieve this information by calling pathinfo again later.
> > But
> > unless the WWID is (reset to) the empty string, check_path() won't
> > call
> > pathinfo(DI_ALL) any more.
> 
> But if we set the wwid at some point in time, that means we got all
> the
> necessary information.  My argument is that if something happens to
> cause a later pathinfo() call to fail, we shouldn't pretend like we
> didn't already get this info.

The question is whether the problem encountered means that the earlier
determined WWID is still reliable. If the path is entirely gone from
sysfs (case a) below), I wouldn't bet on that. In this case we rely on
the assumption that we'll receive an uevent later - likely but not
guaranteed.

OTOH, by blanking the WWID, we forfeit the opportunity to detect a WWID
change later - with that in mind, blanking is actually wrong in this
situation, and your patch should be an improvement.

>  
> > 4. The "blank" logic in pathinfo() combines several very different
> > cases.
> >   a) PATH_REMOVED status from path_offline(). This means that
> > elementary sysfs attributes were missing. This is almost the same
> > as
> > failure in sysfs_pathinfo(), which results in PATHINFO_FAILED
> > return
> > status; but for PATH_REMOVED we return PATHINFO_OK and keep the
> > path
> > around.
> 
> True, but I've always assumed that this means that we will be getting
> a
> uevent to remove this path momentarily, so I doesn't really matter
> what
> we do with it. We could blank the path here, but if we already go the
> necessary info, it seems like just setting the path to PATH_DOWN, and
> waiting for the remove uevent should be fine.

Perhaps. I just wanted to point out the different treatment of two very
similar scenarios.

> >   b) Failure in checker_check(). If the path is offline in the
> > first
> > place, the checker isn't called, and WWID determination is
> > attempted.
> > But if the checker returns PATH_UNCHECKED or PATH_WILD, we goto
> > "blank"
> > state.
> 
> This is clearly the wrong thing to do.
> 
> >   c) Failure in scsi_ioctl_pathinfo() or cciss_ioctl_pathinfo().
> > Both
> > functions never fail, so this can't happen. I've patches here to
> > fix
> > that.
> 
> I just ignored this state because it was impossible. But even with
> your
> patches, my argument remains the same. If we already got this info,
> failing to get it again shouldn't cause us to delete the wwid. We
> should
> probably remember the existing values, to make sure that we don't
> lose
> them if the calls fail for some reason.

My patches wouldn't change a thing here. They just turn these two
functions to void type.

> 
> >   d) Failure to open pp->fd. 
> 
> This is clearly the right thing to do.
> 
> > d) is the only case in which the "blank" logic makes really sense
> > to
> > me. It can happen only at the first pathinfo() invocation, meaning 
> > pp->wwid is still empty, and pp->initialized is INIT_FAILED. Your
> > patch
> > would change nothing for this case.
> > 
> > a) and b) can happen for paths that have been initialized already.
> > I
> > think in case a) the WWID should be reset, probably initialized
> > should
> > be set to INIT_FAILED, and PATHINFO_FAILED should be returned. In
> > case
> > b) we should IMO proceed normally rather than goto "blank".
> > Resetting
> > the WWID in case b) is nonsense, agreed.
> > 
> > Altogether, if my analysis is correct, your patch (not blanking the
> > WWID) should be applied to case b) only.
> 
> I don't really care about blanking the wwid in case a). I just think
> it's unnecessary.  Why do you think we should blank it in case c) if
> we
> have previously gotten those values?

Summarizing, 

 a) you convinced me about that one,
 b) your patch is good, but we may need additional changes in a follow-
up patch,
 c) doesn't happen,
 d) I agreed in the first place.

And my concern about INIT_MISSING_UDEV was orthogonal to your patch
anyway. So I'll give your patch a reviewed-by. Sorry for the noise. I
guess you'll concede the handling of path states in multipath-tools is
non-trivial. 

Martin

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v3 00/19] Misc Multipath patches
  2018-10-07  8:36 ` [PATCH v3 00/19] Misc Multipath patches Christophe Varoqui
@ 2018-10-09 16:13   ` Benjamin Marzinski
  0 siblings, 0 replies; 58+ messages in thread
From: Benjamin Marzinski @ 2018-10-09 16:13 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: device-mapper development, Martin Wilck

On Sun, Oct 07, 2018 at 10:36:41AM +0200, Christophe Varoqui wrote:

I'll send a v4, hopefully today.

-Ben

>    Hi Ben,
>    this v3 patchset is ready for merge or do you plan a v4 ?
>    Thanks,
>    Christophe
>    On Sat, Sep 22, 2018 at 1:05 AM Benjamin Marzinski
>    <[1]bmarzins@redhat.com> wrote:
> 
>      This batch of patches is a resend of the non-merged pathes from my last
>      set, along with 2 new ones. It would be really great if we could get
>      a version bug after the outstanding commits go it.
> 
>      Patches 0001-0005 are a number of fixes to the tur checker.These are
>              the ones that should get the most attention.
>      Patches 0006-0017 are minor issues found by coverity.
>              Not all of them are bugs that could actually occur in practice,
>              but they are simple and hopefully non-controversial changes.
>      Patches 0018-0019 are new and related to changing path wwids
> 
>      Changes in v3
>              added patches 0018-0019
> 
>      Changes in v2
>              0002-libmultipath-fix-tur-checker-double-locking.patch now sets
>              ct->devt when initially creating the tur_checker_context, while
>              that structure is still only referenced by a local variable.
>              After that, ct->devt is only ever read. This should remove any
>              issues with it needing locking.
> 
>      Benjamin Marzinski (19):
>        libmultipath: fix tur checker timeout
>        libmultipath: fix tur checker double locking
>        libmultipath: fix tur memory misuse
>        libmultipath: cleanup tur locking
>        libmultipath: fix tur checker timeout issue
>        libmultipath: fix set_int error path
>        libmultipath: fix length issues in get_vpd_sgio
>        libmultipath: _install_keyword cleanup
>        libmultipath: remove unused code
>        libmultipath: fix memory issue in path_latency prio
>        libmultipath: fix null dereference int alloc_path_group
>        libmutipath: don't use malformed uevents
>        multipath: fix max array size in print_cmd_valid
>        multipathd: function return value tweaks
>        multipathd: minor fixes
>        multipathd: remove useless check and fix format
>        multipathd: fix memory leak on error in configure
>        libmultipath: Don't blank intialized paths
>        libmultipath: Fixup updating paths
> 
>       libmultipath/checkers/tur.c              | 168
>      +++++++++++--------------------
>       libmultipath/dict.c                      |   5 +-
>       libmultipath/discovery.c                 |  18 ++--
>       libmultipath/parser.c                    |  12 ++-
>       libmultipath/print.c                     |   8 --
>       libmultipath/prioritizers/path_latency.c |   3 +-
>       libmultipath/structs.c                   |   2 +-
>       libmultipath/uevent.c                    |   6 ++
>       multipath/main.c                         |   2 +-
>       multipathd/cli_handlers.c                |  11 +-
>       multipathd/main.c                        |  50 +++++----
>       11 files changed, 131 insertions(+), 154 deletions(-)
> 
>      --
>      2.7.4
> 
> References
> 
>    Visible links
>    1. mailto:bmarzins@redhat.com

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH v3 18/19] libmultipath: Don't blank intialized paths
  2018-10-08  9:41         ` Martin Wilck
@ 2018-10-09 22:20           ` Benjamin Marzinski
  0 siblings, 0 replies; 58+ messages in thread
From: Benjamin Marzinski @ 2018-10-09 22:20 UTC (permalink / raw)
  To: Martin Wilck; +Cc: dm-devel

On Mon, Oct 08, 2018 at 11:41:35AM +0200, Martin Wilck wrote:
> On Fri, 2018-10-05 at 14:38 -0500,  Benjamin Marzinski  wrote:
> > On Wed, Oct 03, 2018 at 12:37:17AM +0200, Martin Wilck wrote:

 
> > Your analysis looks right. This shouldn't have anything to do with
> > the
> > correctness of the not blanking initialized paths, but we should fix
> > it
> > anyway.  Right now, as you mentioned, if multipathd fails to get the
> > wwid after retrigger_tries new uevents, it only has one shot at the
> > failback methods, and then mutipathd cannot use that path in the
> > future.
> > However, it will still check if the path is up or down, which is
> > pretty
> > pointless. It should either
> > 
> > 1. set pp->intialized to INIT_OK, to keep it from being checked at
> > all
> > in the future, or
> 
> That would be lying to ourselves. I'd rather call it INIT_BROKEN or so,
> or maybe even delete the path from pathvec altogether.
> 
> [ While we're at this: we call pathinfo with DI_ALL from 
> configure->path_discovery(), but with DI_ALL|DI_BLACKLIST from
> uev_add_path(). As a result, paths discovered in uevents won't be added
> to pathvec in the first place if they're blacklisted, while blacklisted
> paths discovered at startup will be in the pathvec, but orphaned. Is
> that intentional, and if yes, why ? ]

I really don't know why configure() works this way. The call to
filter_path() just after path_discovery() will remove the blacklisted
paths, so they won't actually end up in the pathvec. But it does seem
pretty pointless to not just check if they are blacklisted while
initializing them.

> > 2. try to grab the missing pathinfo, so that it can use the path.
> > 
> > The benefit of doing the first is that mutipathd doesn't keep messing
> > with paths that really aren't set up to be multipathed. On the other
> > hand it might ignore some paths that really should be multipathed.
> > perhaps the best idea is to keep trying to get the missing info, but
> > just increase pp->tick (possibly progressively) so we try less often.
> 
> The latter sounds overly complex to me, in particular as in the daemon,
> we're only looking at paths retrieved via udev anyway, as you pointed
> out in your other post. So perhaps INIT_MISSING_UDEV is just about the
> right state for this. We may just consider changing uev_update_path()
> such that the path is passed to uev_add_path() not only for
> INIT_REQUESTED_UDEV, but also for INIT_MISSING_UDEV state.

The thought behind not just using uev_update_path() is that it doesn't
get called very predicably. There's a chance that the path may become
usable, but we won't get any event for it.  There's also a chance that a
path that will never have a wwid (and thus shouldn't be multipathed)
will be part of a uevent storm, and multipath will do a bunch of
pointless work.  If we do the checks in the checkerloop(), we can
control exactly how often we check these paths.
 
> 
> Summarizing, 
> 
>  a) you convinced me about that one,
>  b) your patch is good, but we may need additional changes in a follow-
> up patch,
>  c) doesn't happen,
>  d) I agreed in the first place.
> 
> And my concern about INIT_MISSING_UDEV was orthogonal to your patch
> anyway. So I'll give your patch a reviewed-by. Sorry for the noise. I
> guess you'll concede the handling of path states in multipath-tools is
> non-trivial. 

Sure. We do need to figure out what to do when we really can't get a
wwid for the device, but that can happen in a different patchset.

-Ben
 
> Martin
> 
> -- 
> Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
> HRB 21284 (AG Nürnberg)
> 

^ permalink raw reply	[flat|nested] 58+ messages in thread

end of thread, other threads:[~2018-10-09 22:20 UTC | newest]

Thread overview: 58+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-21 23:05 [PATCH v3 00/19] Misc Multipath patches Benjamin Marzinski
2018-09-21 23:05 ` [PATCH v3 01/19] libmultipath: fix tur checker timeout Benjamin Marzinski
2018-10-01 19:51   ` Martin Wilck
2018-10-04 16:31     ` Benjamin Marzinski
2018-10-05 10:11       ` Martin Wilck
2018-10-05 17:02         ` Benjamin Marzinski
2018-10-05 19:23           ` Martin Wilck
2018-09-21 23:05 ` [PATCH v3 02/19] libmultipath: fix tur checker double locking Benjamin Marzinski
2018-10-01 20:09   ` Martin Wilck
2018-10-01 20:44     ` Martin Wilck
2018-10-04 16:47       ` Benjamin Marzinski
2018-10-04 16:45     ` Benjamin Marzinski
2018-10-05 10:25       ` Martin Wilck
2018-10-05 17:10         ` Benjamin Marzinski
2018-10-05 19:07           ` Martin Wilck
2018-09-21 23:05 ` [PATCH v3 03/19] libmultipath: fix tur memory misuse Benjamin Marzinski
2018-10-01 20:59   ` Martin Wilck
2018-10-02  7:48     ` Martin Wilck
2018-09-21 23:05 ` [PATCH v3 04/19] libmultipath: cleanup tur locking Benjamin Marzinski
2018-10-01 21:08   ` Martin Wilck
2018-09-21 23:05 ` [PATCH v3 05/19] libmultipath: fix tur checker timeout issue Benjamin Marzinski
2018-10-01 21:09   ` Martin Wilck
2018-09-21 23:05 ` [PATCH v3 06/19] libmultipath: fix set_int error path Benjamin Marzinski
2018-10-01 21:17   ` Martin Wilck
2018-09-21 23:05 ` [PATCH v3 07/19] libmultipath: fix length issues in get_vpd_sgio Benjamin Marzinski
2018-10-01 21:25   ` Martin Wilck
2018-09-21 23:05 ` [PATCH v3 08/19] libmultipath: _install_keyword cleanup Benjamin Marzinski
2018-10-01 21:26   ` Martin Wilck
2018-09-21 23:05 ` [PATCH v3 09/19] libmultipath: remove unused code Benjamin Marzinski
2018-10-01 21:28   ` Martin Wilck
2018-09-21 23:05 ` [PATCH v3 10/19] libmultipath: fix memory issue in path_latency prio Benjamin Marzinski
2018-10-01 21:30   ` Martin Wilck
2018-09-21 23:05 ` [PATCH v3 11/19] libmultipath: fix null dereference int alloc_path_group Benjamin Marzinski
2018-10-01 21:33   ` Martin Wilck
2018-09-21 23:05 ` [PATCH v3 12/19] libmutipath: don't use malformed uevents Benjamin Marzinski
2018-10-01 21:31   ` Martin Wilck
2018-09-21 23:05 ` [PATCH v3 13/19] multipath: fix max array size in print_cmd_valid Benjamin Marzinski
2018-10-01 21:35   ` Martin Wilck
2018-09-21 23:05 ` [PATCH v3 14/19] multipathd: function return value tweaks Benjamin Marzinski
2018-10-01 21:37   ` Martin Wilck
2018-09-21 23:05 ` [PATCH v3 15/19] multipathd: minor fixes Benjamin Marzinski
2018-10-01 21:38   ` Martin Wilck
2018-09-21 23:05 ` [PATCH v3 16/19] multipathd: remove useless check and fix format Benjamin Marzinski
2018-10-01 21:40   ` Martin Wilck
2018-09-21 23:05 ` [PATCH v3 17/19] multipathd: fix memory leak on error in configure Benjamin Marzinski
2018-10-01 21:42   ` Martin Wilck
2018-09-21 23:05 ` [PATCH v3 18/19] libmultipath: Don't blank intialized paths Benjamin Marzinski
2018-10-01 22:00   ` Martin Wilck
2018-10-02 22:37     ` Martin Wilck
2018-10-05 19:38       ` Benjamin Marzinski
2018-10-08  9:41         ` Martin Wilck
2018-10-09 22:20           ` Benjamin Marzinski
2018-10-08  9:35   ` Martin Wilck
2018-09-21 23:05 ` [PATCH v3 19/19] libmultipath: Fixup updating paths Benjamin Marzinski
2018-10-01 22:30   ` Martin Wilck
2018-10-05 20:32     ` Benjamin Marzinski
2018-10-07  8:36 ` [PATCH v3 00/19] Misc Multipath patches Christophe Varoqui
2018-10-09 16:13   ` Benjamin Marzinski

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.