All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/16] multipath-tools: miscellaneous multipath patches
@ 2013-05-02 21:46 Benjamin Marzinski
  2013-05-02 21:46 ` [PATCH 01/16] Make kpartx advise modprobe instead of insmod Benjamin Marzinski
                   ` (16 more replies)
  0 siblings, 17 replies; 25+ messages in thread
From: Benjamin Marzinski @ 2013-05-02 21:46 UTC (permalink / raw)
  To: device-mapper development; +Cc: Christophe Varoqui

Here are the patches I've got since my last resync.

Please apply.

Philipp Schmidt (1):
  Make kpartx correctly handle non-512 byte GPT partitions

Benjamin Marzinski (15):
  Make kpartx advise modprobe instead of insmod
  Fix print_multipath_topology for large outputs
  Don't print checker messages for ghost paths
  Fix a couple of signal issues
  Fix hardware entry matching code
  Fix some socket issues
  Avoid race between ueventloop and uevqloop
  Add existing multipath devices to wwids file on configure
  add wwids file cleanup options
  Fix max path checker timing
  Make set_multipath_wwid actually do something
  Make multipathd deal better with uninitialized paths
  Stop annoying prio_lookup warning messages
  make multipathd disable queue_without_daemon by default
  Use mapname to choose kpartx delimiter

 kpartx/gpt.c                      |   9 ++-
 kpartx/kpartx.c                   |   2 +-
 kpartx/lopart.c                   |   4 +-
 libmpathpersist/mpath_updatepr.c  |   3 +-
 libmultipath/config.c             |   2 +
 libmultipath/dict.c               |  10 ++-
 libmultipath/discovery.c          |   8 ++-
 libmultipath/log_pthread.c        |   3 +
 libmultipath/print.c              |  30 +++++++--
 libmultipath/prio.c               |   5 +-
 libmultipath/structs.h            |   2 +-
 libmultipath/structs_vec.c        |   2 +-
 libmultipath/uevent.c             |  12 ++--
 libmultipath/uxsock.c             |   4 +-
 libmultipath/wwids.c              | 130 ++++++++++++++++++++++++++++++++++++++
 libmultipath/wwids.h              |   2 +
 multipath/main.c                  |  51 +++++++++++++--
 multipath/multipath.8             |   8 ++-
 multipathd/cli.c                  |   3 +
 multipathd/cli.h                  |   2 +
 multipathd/cli_handlers.c         |  18 ++++++
 multipathd/cli_handlers.h         |   2 +
 multipathd/main.c                 |  41 ++++++++----
 multipathd/multipathd.init.redhat |  14 ++--
 24 files changed, 313 insertions(+), 54 deletions(-)

-- 
1.8.2

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 01/16] Make kpartx advise modprobe instead of insmod
  2013-05-02 21:46 [PATCH 00/16] multipath-tools: miscellaneous multipath patches Benjamin Marzinski
@ 2013-05-02 21:46 ` Benjamin Marzinski
  2013-05-02 21:46 ` [PATCH 02/16] Make kpartx correctly handle non-512 byte GPT Benjamin Marzinski
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Benjamin Marzinski @ 2013-05-02 21:46 UTC (permalink / raw)
  To: device-mapper development; +Cc: Christophe Varoqui

Users usually want to use modprobe instead of insmod, since it
handles finding the correct version and dependencies.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 kpartx/lopart.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kpartx/lopart.c b/kpartx/lopart.c
index 9413ab1..79a7593 100644
--- a/kpartx/lopart.c
+++ b/kpartx/lopart.c
@@ -205,13 +205,13 @@ find_unused_loop_device (void)
 		fprintf(stderr,
 		    "mount: Could not find any loop device, and, according to %s,\n"
 		    "       this kernel does not know about the loop device.\n"
-		    "       (If so, then recompile or `insmod loop.o'.)",
+		    "       (If so, then recompile or `modprobe loop'.)",
 		      PROC_DEVICES);
 
 	    else
 		fprintf(stderr,
 		    "mount: Could not find any loop device. Maybe this kernel does not know\n"
-		    "       about the loop device (then recompile or `insmod loop.o'), or\n"
+		    "       about the loop device (then recompile or `modprobe loop'), or\n"
 		    "       maybe /dev/loop# has the wrong major number?");
 
 	} else
-- 
1.8.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 02/16] Make kpartx correctly handle non-512 byte GPT
  2013-05-02 21:46 [PATCH 00/16] multipath-tools: miscellaneous multipath patches Benjamin Marzinski
  2013-05-02 21:46 ` [PATCH 01/16] Make kpartx advise modprobe instead of insmod Benjamin Marzinski
@ 2013-05-02 21:46 ` Benjamin Marzinski
  2013-05-02 21:46 ` [PATCH 03/16] Fix print_multipath_topology for large outputs Benjamin Marzinski
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Benjamin Marzinski @ 2013-05-02 21:46 UTC (permalink / raw)
  To: device-mapper development; +Cc: Christophe Varoqui

The gpt code in kpartx correctly handled non-512 byte gpt
partitions right up until it was time to actually write out the
slice data. At that point it forgot to convert the logical block
address into a the proper slice offset. This patch fixes that.

Signed-off-by: Philipp Schmidt <philipp@ppc.in-berlin.de>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 kpartx/gpt.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/kpartx/gpt.c b/kpartx/gpt.c
index 3082cae..0a22927 100644
--- a/kpartx/gpt.c
+++ b/kpartx/gpt.c
@@ -637,6 +637,7 @@ read_gpt_pt (int fd, struct slice all, struct slice *sp, int ns)
 	uint32_t i;
 	int n = 0;
         int last_used_index=-1;
+	int sector_size_mul = get_sector_size(fd)/512;
 
 	if (!find_valid_gpt (fd, &gpt, &ptes) || !gpt || !ptes) {
 		if (gpt)
@@ -652,9 +653,11 @@ read_gpt_pt (int fd, struct slice all, struct slice *sp, int ns)
 			sp[n].size = 0;
 			n++;
 		} else {
-			sp[n].start = __le64_to_cpu(ptes[i].starting_lba);
-			sp[n].size  = __le64_to_cpu(ptes[i].ending_lba) -
-				__le64_to_cpu(ptes[i].starting_lba) + 1;
+			sp[n].start = sector_size_mul *
+				      __le64_to_cpu(ptes[i].starting_lba);
+			sp[n].size  = sector_size_mul *
+				      (__le64_to_cpu(ptes[i].ending_lba) -
+				       __le64_to_cpu(ptes[i].starting_lba) + 1);
                         last_used_index=n;
 			n++;
 		}
-- 
1.8.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 03/16] Fix print_multipath_topology for large outputs
  2013-05-02 21:46 [PATCH 00/16] multipath-tools: miscellaneous multipath patches Benjamin Marzinski
  2013-05-02 21:46 ` [PATCH 01/16] Make kpartx advise modprobe instead of insmod Benjamin Marzinski
  2013-05-02 21:46 ` [PATCH 02/16] Make kpartx correctly handle non-512 byte GPT Benjamin Marzinski
@ 2013-05-02 21:46 ` Benjamin Marzinski
  2013-05-02 21:46 ` [PATCH 04/16] Don't print checker messages for ghost paths Benjamin Marzinski
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Benjamin Marzinski @ 2013-05-02 21:46 UTC (permalink / raw)
  To: device-mapper development; +Cc: Christophe Varoqui

print_multipath_topology had a hard size limit. With a large number
of LUNs and a large number of paths, it was possible to go over
this limit, and have some of the output cut off.
print_multipath_topology now checks for this, and resizes the
buffer if necessary.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/print.c | 30 ++++++++++++++++++++++++++----
 1 file changed, 26 insertions(+), 4 deletions(-)

diff --git a/libmultipath/print.c b/libmultipath/print.c
index 0e2c680..274f5e7 100644
--- a/libmultipath/print.c
+++ b/libmultipath/print.c
@@ -8,6 +8,8 @@
 #include <sys/stat.h>
 #include <dirent.h>
 #include <unistd.h>
+#include <string.h>
+#include <errno.h>
 
 #include "checkers.h"
 #include "vector.h"
@@ -24,6 +26,7 @@
 #include "switchgroup.h"
 #include "devmapper.h"
 #include "uevent.h"
+#include "debug.h"
 
 #define MAX(x,y) (x > y) ? x : y
 #define TAIL     (line + len - 1 - c)
@@ -754,11 +757,30 @@ snprint_pathgroup (char * line, int len, char * format,
 extern void
 print_multipath_topology (struct multipath * mpp, int verbosity)
 {
-	char buff[MAX_LINE_LEN * MAX_LINES] = {};
+	int resize;
+	char *buff = NULL;
+	char *old = NULL;
+	int len, maxlen = MAX_LINE_LEN * MAX_LINES;
+
+	buff = MALLOC(maxlen);
+	do {
+		if (!buff) {
+			if (old)
+				FREE(old);
+			condlog(0, "couldn't allocate memory for list: %s\n",
+				strerror(errno));
+			return;
+		}
+
+		len = snprint_multipath_topology(buff, maxlen, mpp, verbosity);
+		resize = (len == maxlen - 1);
 
-	memset(&buff[0], 0, MAX_LINE_LEN * MAX_LINES);
-	snprint_multipath_topology(&buff[0], MAX_LINE_LEN * MAX_LINES,
-				   mpp, verbosity);
+		if (resize) {
+			maxlen *= 2;
+			old = buff;
+			buff = REALLOC(buff, maxlen);
+		}
+	} while (resize);
 	printf("%s", buff);
 }
 
-- 
1.8.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 04/16] Don't print checker messages for ghost paths
  2013-05-02 21:46 [PATCH 00/16] multipath-tools: miscellaneous multipath patches Benjamin Marzinski
                   ` (2 preceding siblings ...)
  2013-05-02 21:46 ` [PATCH 03/16] Fix print_multipath_topology for large outputs Benjamin Marzinski
@ 2013-05-02 21:46 ` Benjamin Marzinski
  2013-05-02 21:46 ` [PATCH 05/16] Fix a couple of signal issues Benjamin Marzinski
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Benjamin Marzinski @ 2013-05-02 21:46 UTC (permalink / raw)
  To: device-mapper development; +Cc: Christophe Varoqui

Since PATH_GHOST is not an unexpected state, we don't need to
keep printing out checker messages for these paths.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/discovery.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/libmultipath/discovery.c b/libmultipath/discovery.c
index 0b5fd1d..826ea75 100644
--- a/libmultipath/discovery.c
+++ b/libmultipath/discovery.c
@@ -898,7 +898,8 @@ get_state (struct path * pp, int daemon)
 		c->timeout = DEF_TIMEOUT;
 	state = checker_check(c);
 	condlog(3, "%s: state = %s", pp->dev, checker_state_name(state));
-	if (state != PATH_UP && strlen(checker_message(c)))
+	if (state != PATH_UP && state != PATH_GHOST &&
+	    strlen(checker_message(c)))
 		condlog(3, "%s: checker msg is \"%s\"",
 			pp->dev, checker_message(c));
 	return state;
-- 
1.8.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 05/16] Fix a couple of signal issues
  2013-05-02 21:46 [PATCH 00/16] multipath-tools: miscellaneous multipath patches Benjamin Marzinski
                   ` (3 preceding siblings ...)
  2013-05-02 21:46 ` [PATCH 04/16] Don't print checker messages for ghost paths Benjamin Marzinski
@ 2013-05-02 21:46 ` Benjamin Marzinski
  2013-05-03  6:36   ` Bart Van Assche
  2013-05-02 21:46 ` [PATCH 06/16] Fix hardware entry matching code Benjamin Marzinski
                   ` (11 subsequent siblings)
  16 siblings, 1 reply; 25+ messages in thread
From: Benjamin Marzinski @ 2013-05-02 21:46 UTC (permalink / raw)
  To: device-mapper development; +Cc: Christophe Varoqui

The patch cleans up some signal issues.
First, when the vecs locking around reconfigure got shuffled
around earlier, it was removed from sighup. This patch restores
that.

Second, a new sigusr1 handler was created. However the existing
one was never removed.  Since signal handlers are per-process, and
not per-thread, the original handler will get overwritten by the
new one, so this patch deletes the original handler.

Third, sighup locks the vecs lock and sigusr1 locks logq_lock.
However, these signals weren't being blocked before threads locked
those locks.  This patch blocks those signals while those locks
are being taken to avoid locking deadlocks.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/log_pthread.c |  3 +++
 multipathd/main.c          | 12 +++++-------
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/libmultipath/log_pthread.c b/libmultipath/log_pthread.c
index b8f9119..b3daf8f 100644
--- a/libmultipath/log_pthread.c
+++ b/libmultipath/log_pthread.c
@@ -55,14 +55,17 @@ void log_safe (int prio, const char * fmt, va_list ap)
 
 void log_thread_flush (void)
 {
+	sigset_t old;
 	int empty;
 
 	do {
+		block_signal(SIGUSR1, &old);
 		pthread_mutex_lock(&logq_lock);
 		empty = log_dequeue(la->buff);
 		pthread_mutex_unlock(&logq_lock);
 		if (!empty)
 			log_syslog(la->buff);
+		pthread_sigmask(SIG_SETMASK, &old, NULL);
 	} while (empty == 0);
 }
 
diff --git a/multipathd/main.c b/multipathd/main.c
index 95264fc..f6e68e8 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -1467,7 +1467,9 @@ sighup (int sig)
 	if (running_state != DAEMON_RUNNING)
 		return;
 
+	lock(gvecs->lock);
 	reconfigure(gvecs);
+	unlock(gvecs->lock);
 
 #ifdef _DEBUG_
 	dbg_free_final(NULL);
@@ -1481,16 +1483,9 @@ sigend (int sig)
 }
 
 static void
-sigusr1 (int sig)
-{
-	condlog(3, "SIGUSR1 received");
-}
-
-static void
 signal_init(void)
 {
 	signal_set(SIGHUP, sighup);
-	signal_set(SIGUSR1, sigusr1);
 	signal_set(SIGINT, sigend);
 	signal_set(SIGTERM, sigend);
 	signal(SIGPIPE, SIG_IGN);
@@ -1646,6 +1641,7 @@ child (void * param)
 	 */
 	running_state = DAEMON_CONFIGURE;
 
+	block_signal(SIGHUP, &set);
 	lock(vecs->lock);
 	if (configure(vecs, 1)) {
 		unlock(vecs->lock);
@@ -1653,6 +1649,7 @@ child (void * param)
 		exit(1);
 	}
 	unlock(vecs->lock);
+	pthread_sigmask(SIG_SETMASK, &set, NULL);
 
 	/*
 	 * start threads
@@ -1685,6 +1682,7 @@ child (void * param)
 	 */
 	running_state = DAEMON_SHUTDOWN;
 	pthread_sigmask(SIG_UNBLOCK, &set, NULL);
+	block_signal(SIGUSR1, NULL);
 	block_signal(SIGHUP, NULL);
 	lock(vecs->lock);
 	if (conf->queue_without_daemon == QUE_NO_DAEMON_OFF)
-- 
1.8.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 06/16] Fix hardware entry matching code
  2013-05-02 21:46 [PATCH 00/16] multipath-tools: miscellaneous multipath patches Benjamin Marzinski
                   ` (4 preceding siblings ...)
  2013-05-02 21:46 ` [PATCH 05/16] Fix a couple of signal issues Benjamin Marzinski
@ 2013-05-02 21:46 ` Benjamin Marzinski
  2013-05-02 21:46 ` [PATCH 07/16] Fix some socket issues Benjamin Marzinski
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Benjamin Marzinski @ 2013-05-02 21:46 UTC (permalink / raw)
  To: device-mapper development; +Cc: Christophe Varoqui

When a user defined hardware table entry's identifiers exactly
match a built-in one's, the built-in one is removed, and the list
is rescaned.  However, the built-in entry is not freed, and on the
rescan, the first user defined entry is treated as a built-in
entry. This patch frees the built-in entry, and decrements the
number of built-in entries, so that the rescan works as expected.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/config.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/libmultipath/config.c b/libmultipath/config.c
index 25d3e3d..da676df 100644
--- a/libmultipath/config.c
+++ b/libmultipath/config.c
@@ -436,6 +436,8 @@ restart:
 			merge_hwe(hwe2, hwe1);
 			if (hwe_strmatch(hwe2, hwe1) == 0) {
 				vector_del_slot(hw, i);
+				free_hwe(hwe1);
+				n -= 1;
 				/*
 				 * Play safe here; we have modified
 				 * the original vector so the outer
-- 
1.8.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 07/16] Fix some socket issues
  2013-05-02 21:46 [PATCH 00/16] multipath-tools: miscellaneous multipath patches Benjamin Marzinski
                   ` (5 preceding siblings ...)
  2013-05-02 21:46 ` [PATCH 06/16] Fix hardware entry matching code Benjamin Marzinski
@ 2013-05-02 21:46 ` Benjamin Marzinski
  2013-05-02 21:46 ` [PATCH 08/16] Avoid race between ueventloop and uevqloop Benjamin Marzinski
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Benjamin Marzinski @ 2013-05-02 21:46 UTC (permalink / raw)
  To: device-mapper development; +Cc: Christophe Varoqui

multipath wasn't acutally using /org/kernel/linux/storage/multipathd
for its local socket because when it created and bound to that
socket, it didn't include the size of the structure in the length
it passed with the call.  The result was a trucnated name. Also,
mpathpersist wasn't updated to use the new socket name. This patch
fixes both.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmpathpersist/mpath_updatepr.c | 3 ++-
 libmultipath/uxsock.c            | 4 ++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/libmpathpersist/mpath_updatepr.c b/libmpathpersist/mpath_updatepr.c
index 2982947..8597d40 100644
--- a/libmpathpersist/mpath_updatepr.c
+++ b/libmpathpersist/mpath_updatepr.c
@@ -14,6 +14,7 @@
 #include <debug.h>
 #include "memory.h"
 #include "../libmultipath/uxsock.h"
+#include "../libmultipath/defaults.h"
 
 unsigned long mem_allocated;    /* Total memory used in Bytes */
 
@@ -25,7 +26,7 @@ int update_prflag(char * arg1, char * arg2, int noisy)
 	size_t len;
 	int ret = 0;
 
-	fd = ux_socket_connect("/var/run/multipathd.sock");
+	fd = ux_socket_connect(DEFAULT_SOCKET);
 	if (fd == -1) {
 		condlog (0, "ux socket connect error");
 		return 1 ;
diff --git a/libmultipath/uxsock.c b/libmultipath/uxsock.c
index f3e8dec..ce89428 100644
--- a/libmultipath/uxsock.c
+++ b/libmultipath/uxsock.c
@@ -31,7 +31,7 @@ int ux_socket_connect(const char *name)
 	memset(&addr, 0, sizeof(addr));
 	addr.sun_family = AF_LOCAL;
 	addr.sun_path[0] = '\0';
-	len = strlen(name) + 1;
+	len = strlen(name) + 1 + sizeof(sa_family_t);
 	strncpy(&addr.sun_path[1], name, len);
 
 	fd = socket(AF_LOCAL, SOCK_STREAM, 0);
@@ -62,7 +62,7 @@ int ux_socket_listen(const char *name)
 	memset(&addr, 0, sizeof(addr));
 	addr.sun_family = AF_LOCAL;
 	addr.sun_path[0] = '\0';
-	len = strlen(name) + 1;
+	len = strlen(name) + 1 + sizeof(sa_family_t);
 	strncpy(&addr.sun_path[1], name, len);
 
 	if (bind(fd, (struct sockaddr *)&addr, len) == -1) {
-- 
1.8.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 08/16] Avoid race between ueventloop and uevqloop
  2013-05-02 21:46 [PATCH 00/16] multipath-tools: miscellaneous multipath patches Benjamin Marzinski
                   ` (6 preceding siblings ...)
  2013-05-02 21:46 ` [PATCH 07/16] Fix some socket issues Benjamin Marzinski
@ 2013-05-02 21:46 ` Benjamin Marzinski
  2013-05-02 21:46 ` [PATCH 09/16] Add existing multipath devices to wwids file on Benjamin Marzinski
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Benjamin Marzinski @ 2013-05-02 21:46 UTC (permalink / raw)
  To: device-mapper development; +Cc: Christophe Varoqui

ueventloop sets up uevq_lockp and uev_condp, which uevqloop uses.
If uevqloop accesses these structures before ueventloop has
initialized them, it will not wake up to process uevents. This patch
statically initializes these structures so they will always be
initialized. Also, since calling LIST_HEAD(uevq) initializes it,
there is no reason to call INIT_LIST_HEAD on it later.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/uevent.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/libmultipath/uevent.c b/libmultipath/uevent.c
index bc74d38..0643e14 100644
--- a/libmultipath/uevent.c
+++ b/libmultipath/uevent.c
@@ -53,8 +53,10 @@ typedef int (uev_trigger)(struct uevent *, void * trigger_data);
 
 pthread_t uevq_thr;
 LIST_HEAD(uevq);
-pthread_mutex_t uevq_lock, *uevq_lockp = &uevq_lock;
-pthread_cond_t  uev_cond,  *uev_condp  = &uev_cond;
+pthread_mutex_t uevq_lock = PTHREAD_MUTEX_INITIALIZER;
+pthread_mutex_t *uevq_lockp = &uevq_lock;
+pthread_cond_t uev_cond = PTHREAD_COND_INITIALIZER;
+pthread_cond_t *uev_condp = &uev_cond;
 uev_trigger *my_uev_trigger;
 void * my_trigger_data;
 int servicing_uev;
@@ -409,10 +411,6 @@ int uevent_listen(void)
 	 * thereby not getting to empty the socket's receive buffer queue
 	 * often enough.
 	 */
-	INIT_LIST_HEAD(&uevq);
-
-	pthread_mutex_init(uevq_lockp, NULL);
-	pthread_cond_init(uev_condp, NULL);
 	pthread_cleanup_push(uevq_stop, NULL);
 
 	monitor = udev_monitor_new_from_netlink(conf->udev, "udev");
@@ -525,8 +523,6 @@ out:
 	if (need_failback)
 		err = failback_listen();
 	pthread_cleanup_pop(1);
-	pthread_mutex_destroy(uevq_lockp);
-	pthread_cond_destroy(uev_condp);
 	return err;
 }
 
-- 
1.8.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 09/16] Add existing multipath devices to wwids file on
  2013-05-02 21:46 [PATCH 00/16] multipath-tools: miscellaneous multipath patches Benjamin Marzinski
                   ` (7 preceding siblings ...)
  2013-05-02 21:46 ` [PATCH 08/16] Avoid race between ueventloop and uevqloop Benjamin Marzinski
@ 2013-05-02 21:46 ` Benjamin Marzinski
  2013-05-02 21:46 ` [PATCH 10/16] add wwids file cleanup options Benjamin Marzinski
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Benjamin Marzinski @ 2013-05-02 21:46 UTC (permalink / raw)
  To: device-mapper development; +Cc: Christophe Varoqui

When multipathd started up, it didn't add any existing devices to the
wwids file.  Because of this, devices that were always set up in the
initramfs were not counted as valid multipath devices, and checking
if one of their paths was a multipath path device gave the incorrect
answer.  This patch makes multipath add those devices when it does
its initial configuration on startup.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 multipathd/main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/multipathd/main.c b/multipathd/main.c
index f6e68e8..440d254 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -1357,6 +1357,7 @@ configure (struct vectors * vecs, int start_waiters)
 
 	sync_maps_state(mpvec);
 	vector_foreach_slot(mpvec, mpp, i){
+		remember_wwid(mpp->wwid);
 		update_map_pr(mpp);
 	}
 
-- 
1.8.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 10/16] add wwids file cleanup options
  2013-05-02 21:46 [PATCH 00/16] multipath-tools: miscellaneous multipath patches Benjamin Marzinski
                   ` (8 preceding siblings ...)
  2013-05-02 21:46 ` [PATCH 09/16] Add existing multipath devices to wwids file on Benjamin Marzinski
@ 2013-05-02 21:46 ` Benjamin Marzinski
  2013-05-02 21:46 ` [PATCH 11/16] Fix max path checker timing Benjamin Marzinski
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Benjamin Marzinski @ 2013-05-02 21:46 UTC (permalink / raw)
  To: device-mapper development; +Cc: Christophe Varoqui

This patch adds the "-w <device>" and "-W" options to multipath. They
allow users to either remove a specified device from the wwids file, or
reset the wwids file to only include the wwids for their current
multipath devices.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/discovery.c |   3 +-
 libmultipath/wwids.c     | 130 +++++++++++++++++++++++++++++++++++++++++++++++
 libmultipath/wwids.h     |   2 +
 multipath/main.c         |  51 +++++++++++++++++--
 multipath/multipath.8    |   8 ++-
 5 files changed, 188 insertions(+), 6 deletions(-)

diff --git a/libmultipath/discovery.c b/libmultipath/discovery.c
index 826ea75..4d452a1 100644
--- a/libmultipath/discovery.c
+++ b/libmultipath/discovery.c
@@ -53,7 +53,8 @@ store_pathinfo (vector pathvec, vector hwtable, struct udev_device *udevice,
 		goto out;
 	}
 	pp->udev = udev_device_ref(udevice);
-	err = pathinfo(pp, hwtable, flag | DI_BLACKLIST);
+	err = pathinfo(pp, hwtable,
+		       (conf->dry_run == 3)? flag : (flag | DI_BLACKLIST));
 	if (err)
 		goto out;
 
diff --git a/libmultipath/wwids.c b/libmultipath/wwids.c
index abd23c5..91b07a7 100644
--- a/libmultipath/wwids.c
+++ b/libmultipath/wwids.c
@@ -82,6 +82,136 @@ write_out_wwid(int fd, char *wwid) {
 }
 
 int
+replace_wwids(vector mp)
+{
+	int i, fd, can_write;
+	struct multipath * mpp;
+	size_t len;
+	int ret = -1;
+
+	fd = open_file(conf->wwids_file, &can_write, WWIDS_FILE_HEADER);
+	if (fd < 0)
+		goto out;
+	if (!can_write) {
+		condlog(0, "cannot replace wwids. wwids file is read-only");
+		goto out_file;
+	}
+	if (ftruncate(fd, 0) < 0) {
+		condlog(0, "cannot truncate wwids file : %s", strerror(errno));
+		goto out_file;
+	}
+	len = strlen(WWIDS_FILE_HEADER);
+	if (write_all(fd, WWIDS_FILE_HEADER, len) != len) {
+		condlog(0, "Can't write wwid file header : %s",
+			strerror(errno));
+		/* cleanup partially written header */
+		if (ftruncate(fd, 0) < 0)
+			condlog(0, "Cannot truncate header : %s",
+				strerror(errno));
+		goto out_file;
+	}
+	if (!mp || !mp->allocated) {
+		ret = 0;
+		goto out_file;
+	}
+	vector_foreach_slot(mp, mpp, i) {
+		if (write_out_wwid(fd, mpp->wwid) < 0)
+			goto out_file;
+	}
+	ret = 0;
+out_file:
+	close(fd);
+out:
+	return ret;
+}
+
+int
+do_remove_wwid(int fd, char *str) {
+	char buf[4097];
+	char *ptr;
+	off_t start = 0;
+	int bytes;
+
+	while (1) {
+		if (lseek(fd, start, SEEK_SET) < 0) {
+			condlog(0, "wwid file read lseek failed : %s",
+				strerror(errno));
+			return -1;
+		}
+		bytes = read(fd, buf, 4096);
+		if (bytes < 0) {
+			if (errno == EINTR || errno == EAGAIN)
+				continue;
+			condlog(0, "failed to read from wwids file : %s",
+				strerror(errno));
+			return -1;
+		}
+		if (!bytes) /* didn't find wwid to remove */
+			return 1;
+		buf[bytes] = '\0';
+		ptr = strstr(buf, str);
+		if (ptr != NULL) {
+			condlog(3, "found '%s'", str);
+			if (lseek(fd, start + (ptr - buf), SEEK_SET) < 0) {
+				condlog(0, "write lseek failed : %s",
+						strerror(errno));
+				return -1;
+			}
+			while (1) {
+				if (write(fd, "#", 1) < 0) {
+					if (errno == EINTR || errno == EAGAIN)
+						continue;
+					condlog(0, "failed to write to wwids file : %s", strerror(errno));
+					return -1;
+				}
+				return 0;
+			}
+		}
+		ptr = strrchr(buf, '\n');
+		if (ptr == NULL) { /* shouldn't happen, assume it is EOF */
+			condlog(4, "couldn't find newline, assuming end of file");
+			return 1;
+		}
+		start = start + (ptr - buf) + 1;
+	}
+}
+
+
+int
+remove_wwid(char *wwid) {
+	int fd, len, can_write;
+	char *str;
+	int ret = -1;
+
+	len = strlen(wwid) + 4; /* two slashes the newline and a zero byte */
+	str = malloc(len);
+	if (str == NULL) {
+		condlog(0, "can't allocate memory to remove wwid : %s",
+			strerror(errno));
+		return -1;
+	}
+	if (snprintf(str, len, "/%s/\n", wwid) >= len) {
+		condlog(0, "string overflow trying to remove wwid");
+		goto out;
+	}
+	condlog(3, "removing line '%s' from wwids file", str);
+	fd = open_file(conf->wwids_file, &can_write, WWIDS_FILE_HEADER);
+	if (fd < 0)
+		goto out;
+	if (!can_write) {
+		condlog(0, "cannot remove wwid. wwids file is read-only");
+		goto out_file;
+	}
+	ret = do_remove_wwid(fd, str);
+
+out_file:
+	close(fd);
+out:
+	free(str);
+	return ret;
+}
+
+int
 check_wwids_file(char *wwid, int write_wwid)
 {
 	int fd, can_write, found, ret;
diff --git a/libmultipath/wwids.h b/libmultipath/wwids.h
index 1678f9d..f3b21fa 100644
--- a/libmultipath/wwids.h
+++ b/libmultipath/wwids.h
@@ -14,5 +14,7 @@
 
 int remember_wwid(char *wwid);
 int check_wwids_file(char *wwid, int write_wwid);
+int remove_wwid(char *wwid);
+int replace_wwids(vector mp);
 
 #endif /* _WWIDS_H */
diff --git a/multipath/main.c b/multipath/main.c
index 3038869..f1b3ec9 100644
--- a/multipath/main.c
+++ b/multipath/main.c
@@ -83,7 +83,7 @@ usage (char * progname)
 {
 	fprintf (stderr, VERSION_STRING);
 	fprintf (stderr, "Usage:\n");
-	fprintf (stderr, "  %s [-c] [-d] [-r] [-v lvl] [-p pol] [-b fil] [-q] [dev]\n", progname);
+	fprintf (stderr, "  %s [-c|-w|-W] [-d] [-r] [-v lvl] [-p pol] [-b fil] [-q] [dev]\n", progname);
 	fprintf (stderr, "  %s -l|-ll|-f [-v lvl] [-b fil] [dev]\n", progname);
 	fprintf (stderr, "  %s -F [-v lvl]\n", progname);
 	fprintf (stderr, "  %s -t\n", progname);
@@ -104,6 +104,8 @@ usage (char * progname)
 		"  -B      treat the bindings file as read only\n" \
 		"  -p      policy failover|multibus|group_by_serial|group_by_prio\n" \
 		"  -b fil  bindings file location\n" \
+		"  -w      remove a device from the wwids file\n" \
+		"  -W      reset the wwids file include only the current devices\n" \
 		"  -p pol  force all maps to specified path grouping policy :\n" \
 		"          . failover            one path per priority group\n" \
 		"          . multibus            all paths in one priority group\n" \
@@ -212,7 +214,6 @@ get_dm_mpvec (vector curmp, vector pathvec, char * refwwid)
 
 		if (!conf->dry_run)
 			reinstate_paths(mpp);
-		remember_wwid(mpp->wwid);
 	}
 	return 0;
 }
@@ -262,7 +263,7 @@ configure (void)
 	/*
 	 * if we have a blacklisted device parameter, exit early
 	 */
-	if (dev && conf->dev_type == DEV_DEVNODE &&
+	if (dev && conf->dev_type == DEV_DEVNODE && conf->dry_run != 3 &&
 	    (filter_devnode(conf->blist_devnode,
 			    conf->elist_devnode, dev) > 0)) {
 		if (conf->dry_run == 2)
@@ -284,6 +285,17 @@ configure (void)
 				condlog(3, "scope is nul");
 			goto out;
 		}
+		if (conf->dry_run == 3) {
+			r = remove_wwid(refwwid);
+			if (r == 0)
+				printf("wwid '%s' removed\n", refwwid);
+			else if (r == 1) {
+				printf("wwid '%s' not in wwids file\n",
+					refwwid);
+				r = 0;
+			}
+			goto out;
+		}
 		condlog(3, "scope limited to %s", refwwid);
 		if (conf->dry_run == 2) {
 			if (check_wwids_file(refwwid, 0) == 0){
@@ -439,7 +451,7 @@ main (int argc, char *argv[])
 	if (load_config(DEFAULT_CONFIGFILE))
 		exit(1);
 
-	while ((arg = getopt(argc, argv, ":dchl::FfM:v:p:b:Brtq")) != EOF ) {
+	while ((arg = getopt(argc, argv, ":dchl::FfM:v:p:b:BrtqwW")) != EOF ) {
 		switch(arg) {
 		case 1: printf("optarg : %s\n",optarg);
 			break;
@@ -504,6 +516,12 @@ main (int argc, char *argv[])
 		case 'h':
 			usage(argv[0]);
 			exit(0);
+		case 'w':
+			conf->dry_run = 3;
+			break;
+		case 'W':
+			conf->dry_run = 4;
+			break;
 		case ':':
 			fprintf(stderr, "Missing option argument\n");
 			usage(argv[0]);
@@ -555,6 +573,31 @@ main (int argc, char *argv[])
 		condlog(0, "the -c option requires a path to check");
 		goto out;
 	}
+	if (conf->dry_run == 3 && !conf->dev) {
+		condlog(0, "the -w option requires a device");
+		goto out;
+	}
+	if (conf->dry_run == 4) {
+		struct multipath * mpp;
+		int i;
+		vector curmp;
+
+		curmp = vector_alloc();
+		if (!curmp) {
+			condlog(0, "can't allocate memory for mp list");
+			goto out;
+		}
+		if (dm_get_maps(curmp) == 0)
+			r = replace_wwids(curmp);
+		if (r == 0)
+			printf("successfully reset wwids\n");
+		vector_foreach_slot_backwards(curmp, mpp, i) {
+			vector_del_slot(curmp, i);
+			free_multipath(mpp, KEEP_PATHS);
+		}
+		vector_free(curmp);
+		goto out;
+	}
 	if (conf->remove == FLUSH_ONE) {
 		if (conf->dev_type == DEV_DEVMAP) {
 			r = dm_suspend_and_flush_map(conf->dev);
diff --git a/multipath/multipath.8 b/multipath/multipath.8
index afaa6c4..a2262ac 100644
--- a/multipath/multipath.8
+++ b/multipath/multipath.8
@@ -8,7 +8,7 @@ multipath \- Device mapper target autoconfig
 .RB [\| \-b\ \c
 .IR bindings_file \|]
 .RB [\| \-d \|]
-.RB [\| \-h | \-l | \-ll | \-f | \-t | \-F | \-B | \-c | \-q | \|-r \|]
+.RB [\| \-h | \-l | \-ll | \-f | \-t | \-F | \-B | \-c | \-q | \|-r | \-w | \-W \|]
 .RB [\| \-p\ \c
 .BR failover | multibus | group_by_serial | group_by_prio | group_by_node_name \|]
 .RB [\| device \|]
@@ -68,6 +68,12 @@ check if a block device should be a path in a multipath device
 .B \-q
 allow device tables with queue_if_no_path when multipathd is not running
 .TP
+.B \-w
+remove the wwid for the specified device from the wwids file
+.TP
+.B \-W
+reset the wwids file to only include the current multipath devices
+.TP
 .BI \-p " policy"
 force new maps to use the specified policy:
 .RS 1.2i
-- 
1.8.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 11/16] Fix max path checker timing
  2013-05-02 21:46 [PATCH 00/16] multipath-tools: miscellaneous multipath patches Benjamin Marzinski
                   ` (9 preceding siblings ...)
  2013-05-02 21:46 ` [PATCH 10/16] add wwids file cleanup options Benjamin Marzinski
@ 2013-05-02 21:46 ` Benjamin Marzinski
  2013-05-03  6:59   ` Hannes Reinecke
  2013-05-02 21:46 ` [PATCH 12/16] Make set_multipath_wwid actually do something Benjamin Marzinski
                   ` (5 subsequent siblings)
  16 siblings, 1 reply; 25+ messages in thread
From: Benjamin Marzinski @ 2013-05-02 21:46 UTC (permalink / raw)
  To: device-mapper development; +Cc: Christophe Varoqui

Due to some code being placed inside the wrong block, the number of
seconds to wait between path checks (pp->tick), was only getting set to
the path's individual check interval if that wasn't equal to the max
check interval.  Otherwise it was using the default for a failed path.
This patch makes sure that pp->ticks always always gets set correctly
for active paths.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 multipathd/main.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/multipathd/main.c b/multipathd/main.c
index 440d254..df1c5b9 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -1219,11 +1219,10 @@ check_path (struct vectors * vecs, struct path * pp)
 					pp->checkint = 2 * pp->checkint;
 				else
 					pp->checkint = conf->max_checkint;
-
-				pp->tick = pp->checkint;
-				condlog(4, "%s: delay next check %is",
-					pp->dev_t, pp->tick);
 			}
+			pp->tick = pp->checkint;
+			condlog(4, "%s: delay next check %is",
+				pp->dev_t, pp->tick);
 		}
 	}
 	else if (newstate == PATH_DOWN) {
-- 
1.8.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 12/16] Make set_multipath_wwid actually do something
  2013-05-02 21:46 [PATCH 00/16] multipath-tools: miscellaneous multipath patches Benjamin Marzinski
                   ` (10 preceding siblings ...)
  2013-05-02 21:46 ` [PATCH 11/16] Fix max path checker timing Benjamin Marzinski
@ 2013-05-02 21:46 ` Benjamin Marzinski
  2013-05-02 21:46 ` [PATCH 13/16] Make multipathd deal better with uninitialized paths Benjamin Marzinski
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Benjamin Marzinski @ 2013-05-02 21:46 UTC (permalink / raw)
  To: device-mapper development; +Cc: Christophe Varoqui

mpp->wwid is a character array in the multipath struction, not a pointer,
so it is never NULL. multipath needs to check if the string is empty
instead.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/structs_vec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libmultipath/structs_vec.c b/libmultipath/structs_vec.c
index 7073915..72622b0 100644
--- a/libmultipath/structs_vec.c
+++ b/libmultipath/structs_vec.c
@@ -106,7 +106,7 @@ orphan_paths (vector pathvec, struct multipath * mpp)
 static void
 set_multipath_wwid (struct multipath * mpp)
 {
-	if (mpp->wwid)
+	if (strlen(mpp->wwid))
 		return;
 
 	dm_get_uuid(mpp->alias, mpp->wwid);
-- 
1.8.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 13/16] Make multipathd deal better with uninitialized paths
  2013-05-02 21:46 [PATCH 00/16] multipath-tools: miscellaneous multipath patches Benjamin Marzinski
                   ` (11 preceding siblings ...)
  2013-05-02 21:46 ` [PATCH 12/16] Make set_multipath_wwid actually do something Benjamin Marzinski
@ 2013-05-02 21:46 ` Benjamin Marzinski
  2013-05-02 21:46 ` [PATCH 14/16] Stop annoying prio_lookup warning messages Benjamin Marzinski
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Benjamin Marzinski @ 2013-05-02 21:46 UTC (permalink / raw)
  To: device-mapper development; +Cc: Christophe Varoqui

If multipathd cannot get all the necessary information from a path in
pathinfo, it clears the path's wwid, and adds it to the pathvec without
being initialized.  However, it never tries to reinitialize it later.
This can cause problems at bootup if multipathd is started at around
the same time as some path devices are discovered. multipathd may try
to initalize them in configure() before they are all the way set up.
After the paths are completely set up, multipathd will get a uevent for
them, but it won't try to reinitialize them. This patch adds
reinitialization code to uev_add_path().  Also, since getting the path
uid now just involves reading an attribute set by udev, there's no
reason no to try it for paths that are currently down.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/discovery.c |  2 +-
 multipathd/main.c        | 19 ++++++++++++++++++-
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/libmultipath/discovery.c b/libmultipath/discovery.c
index 4d452a1..26983ca 100644
--- a/libmultipath/discovery.c
+++ b/libmultipath/discovery.c
@@ -1034,7 +1034,7 @@ pathinfo (struct path *pp, vector hwtable, int mask)
 		}
 	}
 
-	if (path_state == PATH_UP && (mask & DI_WWID) && !strlen(pp->wwid))
+	if ((mask & DI_WWID) && !strlen(pp->wwid))
 		get_uid(pp);
 	if (mask & DI_BLACKLIST && mask & DI_WWID) {
 		if (filter_wwid(conf->blist_wwid, conf->elist_wwid,
diff --git a/multipathd/main.c b/multipathd/main.c
index df1c5b9..a8e4725 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -375,7 +375,7 @@ static int
 uev_add_path (struct uevent *uev, struct vectors * vecs)
 {
 	struct path *pp;
-	int ret;
+	int ret, i;
 
 	condlog(2, "%s: add path (uevent)", uev->kernel);
 	if (strstr(uev->kernel, "..") != NULL) {
@@ -392,6 +392,23 @@ uev_add_path (struct uevent *uev, struct vectors * vecs)
 			uev->kernel);
 		if (pp->mpp)
 			return 0;
+		if (!strlen(pp->wwid)) {
+			udev_device_unref(pp->udev);
+			pp->udev = udev_device_ref(uev->udev);
+			ret = pathinfo(pp, conf->hwtable,
+				       DI_ALL | DI_BLACKLIST);
+			if (ret == 2) {
+				i = find_slot(vecs->pathvec, (void *)pp);
+				if (i != -1)
+					vector_del_slot(vecs->pathvec, i);
+				free_path(pp);
+				return 0;
+			} else if (ret == 1) {
+				condlog(0, "%s: failed to reinitialize path",
+					uev->kernel);
+				return 1;
+			}
+		}
 	} else {
 		/*
 		 * get path vital state
-- 
1.8.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 14/16] Stop annoying prio_lookup warning messages
  2013-05-02 21:46 [PATCH 00/16] multipath-tools: miscellaneous multipath patches Benjamin Marzinski
                   ` (12 preceding siblings ...)
  2013-05-02 21:46 ` [PATCH 13/16] Make multipathd deal better with uninitialized paths Benjamin Marzinski
@ 2013-05-02 21:46 ` Benjamin Marzinski
  2013-05-02 21:46 ` [PATCH 15/16] make multipathd disable queue_without_daemon by default Benjamin Marzinski
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Benjamin Marzinski @ 2013-05-02 21:46 UTC (permalink / raw)
  To: device-mapper development; +Cc: Christophe Varoqui

Multipath shouldn't try to look up its prioritizer if it doesn't have
one. Doing so just causes annoying warning messages.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/prio.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/libmultipath/prio.c b/libmultipath/prio.c
index 8e6b93e..186cc4d 100644
--- a/libmultipath/prio.c
+++ b/libmultipath/prio.c
@@ -162,7 +162,10 @@ void prio_put (struct prio * dst)
 	if (!dst)
 		return;
 
-	src = prio_lookup(dst->name);
+	if (!strlen(dst->name))
+		src = NULL;
+	else
+		src = prio_lookup(dst->name);
 	memset(dst, 0x0, sizeof(struct prio));
 	free_prio(src);
 }
-- 
1.8.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 15/16] make multipathd disable queue_without_daemon by default
  2013-05-02 21:46 [PATCH 00/16] multipath-tools: miscellaneous multipath patches Benjamin Marzinski
                   ` (13 preceding siblings ...)
  2013-05-02 21:46 ` [PATCH 14/16] Stop annoying prio_lookup warning messages Benjamin Marzinski
@ 2013-05-02 21:46 ` Benjamin Marzinski
  2013-05-02 21:46 ` [PATCH 16/16] Use mapname to choose kpartx delimiter Benjamin Marzinski
  2013-05-06 19:45 ` [PATCH 00/16] multipath-tools: miscellaneous multipath patches Christophe Varoqui
  16 siblings, 0 replies; 25+ messages in thread
From: Benjamin Marzinski @ 2013-05-02 21:46 UTC (permalink / raw)
  To: device-mapper development; +Cc: Christophe Varoqui

Once multipathd stops, there is nothing to restore access to failed
paths. Any multipath devices with no active paths that are set to
queue_if_no_path will never stop queueing, even if they were supposed
to stop after a number of retries.  This can cause shutdown to hang.
So, this patch turns off queueing by default when multipathd is stopped.
It also adds two multipathd interactive commands "forcequeueing daemon"
and "restorequeueing daemon", which override and reset this behavior,
respectively.

Unfortunately this isn't a perfect solution.  Ideally, when restarting
multipathd, you would first call "forcequeueing daemon", no make sure
that any devices that are queueing without paths continue to do so while
you are restarting the daemon. However there is no way to do this in
systemd as there was in Upstart. There is a languishing RFE that I filed
for an ExecRestartPre option in systemd. But for most users, the only
time when they need to restart multipathd is when upgrading the package,
so forcing queueing can be dealt with there.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/dict.c               | 10 ++++------
 libmultipath/structs.h            |  2 +-
 multipathd/cli.c                  |  3 +++
 multipathd/cli.h                  |  2 ++
 multipathd/cli_handlers.c         | 18 ++++++++++++++++++
 multipathd/cli_handlers.h         |  2 ++
 multipathd/main.c                 |  2 ++
 multipathd/multipathd.init.redhat | 14 ++++++++++----
 8 files changed, 42 insertions(+), 11 deletions(-)

diff --git a/libmultipath/dict.c b/libmultipath/dict.c
index 5154cdd..14e7c57 100644
--- a/libmultipath/dict.c
+++ b/libmultipath/dict.c
@@ -438,14 +438,11 @@ def_queue_without_daemon(vector strvec)
 	if (!buff)
 		return 1;
 
-	if (!strncmp(buff, "off", 3) || !strncmp(buff, "no", 2) ||
-	    !strncmp(buff, "0", 1))
-		conf->queue_without_daemon = QUE_NO_DAEMON_OFF;
-	else if (!strncmp(buff, "on", 2) || !strncmp(buff, "yes", 3) ||
+	if (!strncmp(buff, "on", 2) || !strncmp(buff, "yes", 3) ||
 		 !strncmp(buff, "1", 1))
 		conf->queue_without_daemon = QUE_NO_DAEMON_ON;
 	else
-		conf->queue_without_daemon = QUE_NO_DAEMON_UNDEF;
+		conf->queue_without_daemon = QUE_NO_DAEMON_OFF;
 
 	free(buff);
 	return 0;
@@ -2649,8 +2646,9 @@ snprint_def_queue_without_daemon (char * buff, int len, void * data)
 	case QUE_NO_DAEMON_OFF:
 		return snprintf(buff, len, "\"no\"");
 	case QUE_NO_DAEMON_ON:
-	case QUE_NO_DAEMON_UNDEF:
 		return snprintf(buff, len, "\"yes\"");
+	case QUE_NO_DAEMON_FORCE:
+		return snprintf(buff, len, "\"forced\"");
 	}
 	return 0;
 }
diff --git a/libmultipath/structs.h b/libmultipath/structs.h
index b9ace36..59387d6 100644
--- a/libmultipath/structs.h
+++ b/libmultipath/structs.h
@@ -67,9 +67,9 @@ enum pgstates {
 };
 
 enum queue_without_daemon_states {
-	QUE_NO_DAEMON_UNDEF,
 	QUE_NO_DAEMON_OFF,
 	QUE_NO_DAEMON_ON,
+	QUE_NO_DAEMON_FORCE,
 };
 
 enum pgtimeouts {
diff --git a/multipathd/cli.c b/multipathd/cli.c
index d95cba0..2a5edfa 100644
--- a/multipathd/cli.c
+++ b/multipathd/cli.c
@@ -162,6 +162,7 @@ load_keys (void)
 	r += add_key(keys, "resize", RESIZE, 0);
 	r += add_key(keys, "reset", RESET, 0);
 	r += add_key(keys, "reload", RELOAD, 0);
+	r += add_key(keys, "forcequeueing", FORCEQ, 0);
 	r += add_key(keys, "disablequeueing", DISABLEQ, 0);
 	r += add_key(keys, "restorequeueing", RESTOREQ, 0);
 	r += add_key(keys, "paths", PATHS, 0);
@@ -459,6 +460,8 @@ cli_init (void) {
 	add_handler(GETPRSTATUS+MAP, NULL);
 	add_handler(SETPRSTATUS+MAP, NULL);
 	add_handler(UNSETPRSTATUS+MAP, NULL);
+	add_handler(FORCEQ+DAEMON, NULL);
+	add_handler(RESTOREQ+DAEMON, NULL);
 
 	return 0;
 }
diff --git a/multipathd/cli.h b/multipathd/cli.h
index 6b288d4..09fdc68 100644
--- a/multipathd/cli.h
+++ b/multipathd/cli.h
@@ -10,6 +10,7 @@ enum {
 	__RESIZE,
 	__RESET,
 	__RELOAD,
+	__FORCEQ,
 	__DISABLEQ,
 	__RESTOREQ,
 	__PATHS,
@@ -45,6 +46,7 @@ enum {
 #define RESIZE		(1 << __RESIZE)
 #define RESET		(1 << __RESET)
 #define RELOAD		(1 << __RELOAD)
+#define FORCEQ		(1 << __FORCEQ)
 #define DISABLEQ	(1 << __DISABLEQ)
 #define RESTOREQ	(1 << __RESTOREQ)
 #define PATHS		(1 << __PATHS)
diff --git a/multipathd/cli_handlers.c b/multipathd/cli_handlers.c
index a1e4bde..ac648ad 100644
--- a/multipathd/cli_handlers.c
+++ b/multipathd/cli_handlers.c
@@ -628,6 +628,24 @@ cli_resize(void *v, char **reply, int *len, void *data)
 }
 
 int
+cli_force_no_daemon_q(void * v, char ** reply, int * len, void * data)
+{
+	condlog(2, "force queue_without_daemon (operator)");
+	if (conf->queue_without_daemon == QUE_NO_DAEMON_OFF)
+		conf->queue_without_daemon = QUE_NO_DAEMON_FORCE;
+	return 0;
+}
+
+int
+cli_restore_no_daemon_q(void * v, char ** reply, int * len, void * data)
+{
+	condlog(2, "restore queue_without_daemon (operator)");
+	if (conf->queue_without_daemon == QUE_NO_DAEMON_FORCE)
+		conf->queue_without_daemon = QUE_NO_DAEMON_OFF;
+	return 0;
+}
+
+int
 cli_restore_queueing(void *v, char **reply, int *len, void *data)
 {
 	struct vectors * vecs = (struct vectors *)data;
diff --git a/multipathd/cli_handlers.h b/multipathd/cli_handlers.h
index c62a273..de51961 100644
--- a/multipathd/cli_handlers.h
+++ b/multipathd/cli_handlers.h
@@ -28,6 +28,8 @@ int cli_suspend(void * v, char ** reply, int * len, void * data);
 int cli_resume(void * v, char ** reply, int * len, void * data);
 int cli_reinstate(void * v, char ** reply, int * len, void * data);
 int cli_fail(void * v, char ** reply, int * len, void * data);
+int cli_force_no_daemon_q(void * v, char ** reply, int * len, void * data);
+int cli_restore_no_daemon_q(void * v, char ** reply, int * len, void * data);
 int cli_quit(void * v, char ** reply, int * len, void * data);
 int cli_shutdown(void * v, char ** reply, int * len, void * data);
 int cli_reassign (void * v, char ** reply, int * len, void * data);
diff --git a/multipathd/main.c b/multipathd/main.c
index a8e4725..18c8293 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -898,6 +898,8 @@ uxlsnrloop (void * ap)
 	set_handler_callback(GETPRSTATUS+MAP, cli_getprstatus);
 	set_handler_callback(SETPRSTATUS+MAP, cli_setprstatus);
 	set_handler_callback(UNSETPRSTATUS+MAP, cli_unsetprstatus);
+	set_handler_callback(FORCEQ+DAEMON, cli_force_no_daemon_q);
+	set_handler_callback(RESTOREQ+DAEMON, cli_restore_no_daemon_q);
 
 	umask(077);
 	uxsock_listen(&uxsock_trigger, ap);
diff --git a/multipathd/multipathd.init.redhat b/multipathd/multipathd.init.redhat
index 15b5753..d46ef80 100644
--- a/multipathd/multipathd.init.redhat
+++ b/multipathd/multipathd.init.redhat
@@ -81,23 +81,28 @@ force_stop() {
 	echo
 }
 
-stop() {
+check_root() {
         root_dev=$(awk '{ if ($1 !~ /^[ \t]*#/ && $2 == "/") { print $1; }}' /etc/mtab)
 	dm_num=`dmsetup info -c --noheadings -o minor $root_dev 2> /dev/null`
 	if [ $? -eq 0 ]; then
 		root_dm_device="dm-$dm_num"
 		[ -d $syspath/$root_dm_device ] && teardown_slaves $syspath/$root_dm_device
 	fi
+}
 
-	force_stop
+force_queue_without_daemon() {
+	$DAEMON forcequeueing daemon
 }
 
 restart() {
-	stop
+	force_queue_without_daemon
+	check_root
+	force_stop
 	start
 }
 
 force_restart() {
+	force_queue_without_daemon
 	force_stop
 	start
 }
@@ -115,7 +120,8 @@ start)
 	start
 	;;
 stop)
-	stop
+	check_root
+	force_stop
 	;;
 force-stop)
 	force_stop
-- 
1.8.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 16/16] Use mapname to choose kpartx delimiter
  2013-05-02 21:46 [PATCH 00/16] multipath-tools: miscellaneous multipath patches Benjamin Marzinski
                   ` (14 preceding siblings ...)
  2013-05-02 21:46 ` [PATCH 15/16] make multipathd disable queue_without_daemon by default Benjamin Marzinski
@ 2013-05-02 21:46 ` Benjamin Marzinski
  2013-05-06 19:45 ` [PATCH 00/16] multipath-tools: miscellaneous multipath patches Christophe Varoqui
  16 siblings, 0 replies; 25+ messages in thread
From: Benjamin Marzinski @ 2013-05-02 21:46 UTC (permalink / raw)
  To: device-mapper development; +Cc: Christophe Varoqui

When kpartx creates partition devices, it uses the mapname as the base
for the partition device names. However when choosing the delimiter,
it uses the device name passed in.  So if kpartx is called on /dev/dm-X
it will always add a 'p', even if the mapname ends in a letter.  This
patch fixes that by setting the delimiter based on the mapname.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 kpartx/kpartx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kpartx/kpartx.c b/kpartx/kpartx.c
index 30b3bd9..98d88c0 100644
--- a/kpartx/kpartx.c
+++ b/kpartx/kpartx.c
@@ -348,7 +348,7 @@ main(int argc, char **argv){
 	if (delim == NULL) {
 		delim = malloc(DELIM_SIZE);
 		memset(delim, 0, DELIM_SIZE);
-		set_delimiter(device, delim);
+		set_delimiter(mapname, delim);
 	}
 
 	fd = open(device, O_RDONLY);
-- 
1.8.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 05/16] Fix a couple of signal issues
  2013-05-02 21:46 ` [PATCH 05/16] Fix a couple of signal issues Benjamin Marzinski
@ 2013-05-03  6:36   ` Bart Van Assche
  2013-05-03 20:24     ` Benjamin Marzinski
  0 siblings, 1 reply; 25+ messages in thread
From: Bart Van Assche @ 2013-05-03  6:36 UTC (permalink / raw)
  To: device-mapper development; +Cc: Christophe Varoqui

On 05/02/13 23:46, Benjamin Marzinski wrote:
> The patch cleans up some signal issues.
> First, when the vecs locking around reconfigure got shuffled
> around earlier, it was removed from sighup. This patch restores
> that.
>
> Second, a new sigusr1 handler was created. However the existing
> one was never removed.  Since signal handlers are per-process, and
> not per-thread, the original handler will get overwritten by the
> new one, so this patch deletes the original handler.
>
> Third, sighup locks the vecs lock and sigusr1 locks logq_lock.
> However, these signals weren't being blocked before threads locked
> those locks.  This patch blocks those signals while those locks
> are being taken to avoid locking deadlocks.

Are you aware that POSIX does not allow any locking function to be 
invoked from inside a signal handler ? See e.g. 
http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04 
and the text that starts with "The following table defines a set of 
functions that shall be async-signal-safe. Therefore, applications can 
invoke them, without restriction, from signal-catching functions" for a 
list of C library functions that may be invoked from inside a signal 
handler.

Bart.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 11/16] Fix max path checker timing
  2013-05-02 21:46 ` [PATCH 11/16] Fix max path checker timing Benjamin Marzinski
@ 2013-05-03  6:59   ` Hannes Reinecke
  2013-05-03 15:44     ` Benjamin Marzinski
  0 siblings, 1 reply; 25+ messages in thread
From: Hannes Reinecke @ 2013-05-03  6:59 UTC (permalink / raw)
  To: dm-devel

On 05/02/2013 11:46 PM, Benjamin Marzinski wrote:
> Due to some code being placed inside the wrong block, the number of
> seconds to wait between path checks (pp->tick), was only getting set to
> the path's individual check interval if that wasn't equal to the max
> check interval.  Otherwise it was using the default for a failed path.
> This patch makes sure that pp->ticks always always gets set correctly
> for active paths.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
>  multipathd/main.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/multipathd/main.c b/multipathd/main.c
> index 440d254..df1c5b9 100644
> --- a/multipathd/main.c
> +++ b/multipathd/main.c
> @@ -1219,11 +1219,10 @@ check_path (struct vectors * vecs, struct path * pp)
>  					pp->checkint = 2 * pp->checkint;
>  				else
>  					pp->checkint = conf->max_checkint;
> -
> -				pp->tick = pp->checkint;
> -				condlog(4, "%s: delay next check %is",
> -					pp->dev_t, pp->tick);
>  			}
> +			pp->tick = pp->checkint;
> +			condlog(4, "%s: delay next check %is",
> +				pp->dev_t, pp->tick);
>  		}
>  	}
>  	else if (newstate == PATH_DOWN) {
> 
But then the message is wrong, isn't it?
We should be printing the 'delay next check' message only
if the check was actually delayed, ie inside the block.

So I think something like this should be more appropriate
(minus line-breaks, of course):

diff --git a/multipathd/main.c b/multipathd/main.c
index 6471d24..9901b02 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -1235,10 +1235,10 @@ check_path (struct vectors * vecs, struct
path * pp)
                                else
                                        pp->checkint =
conf->max_checkint;

-                               pp->tick = pp->checkint;
                                condlog(4, "%s: delay next check %is",
                                        pp->dev_t, pp->tick);
                        }
+                       pp->tick = pp->checkint;
                }
        }
        else if (newstate == PATH_DOWN) {


Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 11/16] Fix max path checker timing
  2013-05-03  6:59   ` Hannes Reinecke
@ 2013-05-03 15:44     ` Benjamin Marzinski
  2013-05-03 17:59       ` [PATCH v2 " Benjamin Marzinski
  0 siblings, 1 reply; 25+ messages in thread
From: Benjamin Marzinski @ 2013-05-03 15:44 UTC (permalink / raw)
  To: device-mapper development

On Fri, May 03, 2013 at 08:59:11AM +0200, Hannes Reinecke wrote:
> On 05/02/2013 11:46 PM, Benjamin Marzinski wrote:
> > Due to some code being placed inside the wrong block, the number of
> > seconds to wait between path checks (pp->tick), was only getting set to
> > the path's individual check interval if that wasn't equal to the max
> > check interval.  Otherwise it was using the default for a failed path.
> > This patch makes sure that pp->ticks always always gets set correctly
> > for active paths.
> > 
> > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > ---
> >  multipathd/main.c | 7 +++----
> >  1 file changed, 3 insertions(+), 4 deletions(-)
> > 
> > diff --git a/multipathd/main.c b/multipathd/main.c
> > index 440d254..df1c5b9 100644
> > --- a/multipathd/main.c
> > +++ b/multipathd/main.c
> > @@ -1219,11 +1219,10 @@ check_path (struct vectors * vecs, struct path * pp)
> >  					pp->checkint = 2 * pp->checkint;
> >  				else
> >  					pp->checkint = conf->max_checkint;
> > -
> > -				pp->tick = pp->checkint;
> > -				condlog(4, "%s: delay next check %is",
> > -					pp->dev_t, pp->tick);
> >  			}
> > +			pp->tick = pp->checkint;
> > +			condlog(4, "%s: delay next check %is",
> > +				pp->dev_t, pp->tick);
> >  		}
> >  	}
> >  	else if (newstate == PATH_DOWN) {
> > 
> But then the message is wrong, isn't it?
> We should be printing the 'delay next check' message only
> if the check was actually delayed, ie inside the block.
> 
> So I think something like this should be more appropriate
> (minus line-breaks, of course):
> 
> diff --git a/multipathd/main.c b/multipathd/main.c
> index 6471d24..9901b02 100644
> --- a/multipathd/main.c
> +++ b/multipathd/main.c
> @@ -1235,10 +1235,10 @@ check_path (struct vectors * vecs, struct
> path * pp)
>                                 else
>                                         pp->checkint =
> conf->max_checkint;
> 
> -                               pp->tick = pp->checkint;
>                                 condlog(4, "%s: delay next check %is",
>                                         pp->dev_t, pp->tick);

I think you mean pp->checkint here.  Otherwise the code will print
pp->tick before we set it. But otherwise, I'm fine with this.  I'll
respin the patch.

-Ben

>                         }
> +                       pp->tick = pp->checkint;
>                 }
>         }
>         else if (newstate == PATH_DOWN) {
> 
> 
> Cheers,
> 
> Hannes
> -- 
> Dr. Hannes Reinecke		      zSeries & Storage
> hare@suse.de			      +49 911 74053 688
> SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
> 
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 11/16] Fix max path checker timing
  2013-05-03 15:44     ` Benjamin Marzinski
@ 2013-05-03 17:59       ` Benjamin Marzinski
  2013-05-06  5:42         ` Hannes Reinecke
  0 siblings, 1 reply; 25+ messages in thread
From: Benjamin Marzinski @ 2013-05-03 17:59 UTC (permalink / raw)
  To: device-mapper development; +Cc: Christophe Varoqui

Due to some code being placed inside the wrong block, the number of
seconds to wait between path checks (pp->tick), was only getting set to
the path's individual check interval if that wasn't equal to the max
check interval.  Otherwise it was using the default for a failed path.
This patch makes sure that pp->ticks always always gets set correctly
for active paths.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 multipathd/main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/multipathd/main.c b/multipathd/main.c
index 440d254..9d52465 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -1220,10 +1220,10 @@ check_path (struct vectors * vecs, struct path * pp)
 				else
 					pp->checkint = conf->max_checkint;
 
-				pp->tick = pp->checkint;
 				condlog(4, "%s: delay next check %is",
-					pp->dev_t, pp->tick);
+					pp->dev_t, pp->checkint);
 			}
+			pp->tick = pp->checkint;
 		}
 	}
 	else if (newstate == PATH_DOWN) {
-- 
1.8.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 05/16] Fix a couple of signal issues
  2013-05-03  6:36   ` Bart Van Assche
@ 2013-05-03 20:24     ` Benjamin Marzinski
  2013-05-04  7:19       ` Bart Van Assche
  0 siblings, 1 reply; 25+ messages in thread
From: Benjamin Marzinski @ 2013-05-03 20:24 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: device-mapper development, Christophe Varoqui

On Fri, May 03, 2013 at 08:36:19AM +0200, Bart Van Assche wrote:
> On 05/02/13 23:46, Benjamin Marzinski wrote:
>> The patch cleans up some signal issues.
>> First, when the vecs locking around reconfigure got shuffled
>> around earlier, it was removed from sighup. This patch restores
>> that.
>>
>> Second, a new sigusr1 handler was created. However the existing
>> one was never removed.  Since signal handlers are per-process, and
>> not per-thread, the original handler will get overwritten by the
>> new one, so this patch deletes the original handler.
>>
>> Third, sighup locks the vecs lock and sigusr1 locks logq_lock.
>> However, these signals weren't being blocked before threads locked
>> those locks.  This patch blocks those signals while those locks
>> are being taken to avoid locking deadlocks.
>
> Are you aware that POSIX does not allow any locking function to be invoked 
> from inside a signal handler ? See e.g. 
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04 
> and the text that starts with "The following table defines a set of 
> functions that shall be async-signal-safe. Therefore, applications can 
> invoke them, without restriction, from signal-catching functions" for a 
> list of C library functions that may be invoked from inside a signal 
> handler.

Um. no. I clearly wasn't aware of that. Nuts. So is the risk that a
signal will come into some thread that's not blocking it, and try to
acquire the pthread mutex at the same time as another thread that is
blocking the signal is trying to aquire the same mutex and that this
will corrupt the lock?

The Ubuntu man page for pthread_mutex_lock says

"The mutex functions are not async-signal safe. What this means is  that
 they should not be called from a signal handler. In particular, calling
 pthread_mutex_lock or pthread_mutex_unlock from a  signal handler  may
 deadlock the calling thread."

If that's the only risk, then blocking the signal before we lock the
mutexes should avoid the issue.  Since the point of mutexes is do deal
with multiple threads that are trying to acquire them at the same time,
it seems like they should be able to handle this when one of the threads
happens to be in a signal handler. Also, the locks in sighup were there
for a long time before they were removed, and while I have definitely
seen deadlocks when we don't remember to mask the signal before locking
on that thread, I've never encountered a bug that seems to be related to
a locking corruption like I speculated in the beginning of my reply.

Now, since my patch was to fix some potential deadlocks from using
pthread_mutex_locks in signal handlers, I do realize that the way
we are doing things isn't the safest way around.  Also, it seems pretty
obvious from looking at the URL you posted that we are in undefined
behavior territory, where if we are safe, it's solely based on the
some non-defined parts of the implementation. So I agree, we aren't
following the specs, and I'll work on redesigning things to avoid this.

But the locking I added in this patch fixes corruption that definitely
happens, and is very much able to crash multipathd.  I still think this
patch should go in, since those locks were previously there for a long
time, and I can easily crash multipathd by repeadly sending it sighups
without this patch.

Does that sound reasonable?

-Ben
>
> Bart.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 05/16] Fix a couple of signal issues
  2013-05-03 20:24     ` Benjamin Marzinski
@ 2013-05-04  7:19       ` Bart Van Assche
  0 siblings, 0 replies; 25+ messages in thread
From: Bart Van Assche @ 2013-05-04  7:19 UTC (permalink / raw)
  To: Benjamin Marzinski; +Cc: device-mapper development, Christophe Varoqui

On 05/03/13 22:24, Benjamin Marzinski wrote:
> On Fri, May 03, 2013 at 08:36:19AM +0200, Bart Van Assche wrote:
>> On 05/02/13 23:46, Benjamin Marzinski wrote:
>>> The patch cleans up some signal issues.
>>> First, when the vecs locking around reconfigure got shuffled
>>> around earlier, it was removed from sighup. This patch restores
>>> that.
>>>
>>> Second, a new sigusr1 handler was created. However the existing
>>> one was never removed.  Since signal handlers are per-process, and
>>> not per-thread, the original handler will get overwritten by the
>>> new one, so this patch deletes the original handler.
>>>
>>> Third, sighup locks the vecs lock and sigusr1 locks logq_lock.
>>> However, these signals weren't being blocked before threads locked
>>> those locks.  This patch blocks those signals while those locks
>>> are being taken to avoid locking deadlocks.
>>
>> Are you aware that POSIX does not allow any locking function to be invoked
>> from inside a signal handler ? See e.g.
>> http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04
>> and the text that starts with "The following table defines a set of
>> functions that shall be async-signal-safe. Therefore, applications can
>> invoke them, without restriction, from signal-catching functions" for a
>> list of C library functions that may be invoked from inside a signal
>> handler.
>
> Um. no. I clearly wasn't aware of that. Nuts. So is the risk that a
> signal will come into some thread that's not blocking it, and try to
> acquire the pthread mutex at the same time as another thread that is
> blocking the signal is trying to aquire the same mutex and that this
> will corrupt the lock?
>
> The Ubuntu man page for pthread_mutex_lock says
>
> "The mutex functions are not async-signal safe. What this means is  that
>   they should not be called from a signal handler. In particular, calling
>   pthread_mutex_lock or pthread_mutex_unlock from a  signal handler  may
>   deadlock the calling thread."
>
> If that's the only risk, then blocking the signal before we lock the
> mutexes should avoid the issue.  Since the point of mutexes is do deal
> with multiple threads that are trying to acquire them at the same time,
> it seems like they should be able to handle this when one of the threads
> happens to be in a signal handler. Also, the locks in sighup were there
> for a long time before they were removed, and while I have definitely
> seen deadlocks when we don't remember to mask the signal before locking
> on that thread, I've never encountered a bug that seems to be related to
> a locking corruption like I speculated in the beginning of my reply.
>
> Now, since my patch was to fix some potential deadlocks from using
> pthread_mutex_locks in signal handlers, I do realize that the way
> we are doing things isn't the safest way around.  Also, it seems pretty
> obvious from looking at the URL you posted that we are in undefined
> behavior territory, where if we are safe, it's solely based on the
> some non-defined parts of the implementation. So I agree, we aren't
> following the specs, and I'll work on redesigning things to avoid this.
>
> But the locking I added in this patch fixes corruption that definitely
> happens, and is very much able to crash multipathd.  I still think this
> patch should go in, since those locks were previously there for a long
> time, and I can easily crash multipathd by repeadly sending it sighups
> without this patch.
>
> Does that sound reasonable?

Sorry but in my opinion the patch at the start of this thread makes the 
approach for signal handling in multipathd more complex and harder to 
maintain than strictly necessary. Will e.g. the next person who modifies 
this code be aware of all these subtleties ? The approach I follow in 
multithreaded code that I maintain myself for handling signals is to let 
each signal handler write some data into a pipe, wait on that pipe in 
the main thread and let the main thread take the appropriate action. 
This approach is easy to maintain, does not require to block and unblock 
signals at runtime, is portable between Unix systems and POSIX 
compliant. See e.g. 
http://github.com/bvanassche/srptools/commit/b6589892206ca628dbbb7fd8a1d613bf4f442ee6 
for an example. Note: a possible alternative is to use signalfd() 
instead of creating a pipe.

Bart.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 11/16] Fix max path checker timing
  2013-05-03 17:59       ` [PATCH v2 " Benjamin Marzinski
@ 2013-05-06  5:42         ` Hannes Reinecke
  0 siblings, 0 replies; 25+ messages in thread
From: Hannes Reinecke @ 2013-05-06  5:42 UTC (permalink / raw)
  To: Benjamin Marzinski; +Cc: device-mapper development, Christophe Varoqui

On 05/03/2013 07:59 PM, Benjamin Marzinski wrote:
> Due to some code being placed inside the wrong block, the number of
> seconds to wait between path checks (pp->tick), was only getting set to
> the path's individual check interval if that wasn't equal to the max
> check interval.  Otherwise it was using the default for a failed path.
> This patch makes sure that pp->ticks always always gets set correctly
> for active paths.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Acked-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 00/16] multipath-tools: miscellaneous multipath patches
  2013-05-02 21:46 [PATCH 00/16] multipath-tools: miscellaneous multipath patches Benjamin Marzinski
                   ` (15 preceding siblings ...)
  2013-05-02 21:46 ` [PATCH 16/16] Use mapname to choose kpartx delimiter Benjamin Marzinski
@ 2013-05-06 19:45 ` Christophe Varoqui
  16 siblings, 0 replies; 25+ messages in thread
From: Christophe Varoqui @ 2013-05-06 19:45 UTC (permalink / raw)
  To: Benjamin Marzinski; +Cc: device-mapper development

On jeu., 2013-05-02 at 16:46 -0500, Benjamin Marzinski wrote:
> Here are the patches I've got since my last resync.
> 
> Please apply.
> 
The series is applied and pushed.

- 11v2/16 is merged in place of 11/16
- 05/16 is left out pending analysis of the solution suggested by Bart

Thanks,
Christophe Varoqui
www.opensvc.com

> Philipp Schmidt (1):
>   Make kpartx correctly handle non-512 byte GPT partitions
> 
> Benjamin Marzinski (15):
>   Make kpartx advise modprobe instead of insmod
>   Fix print_multipath_topology for large outputs
>   Don't print checker messages for ghost paths
>   Fix a couple of signal issues
>   Fix hardware entry matching code
>   Fix some socket issues
>   Avoid race between ueventloop and uevqloop
>   Add existing multipath devices to wwids file on configure
>   add wwids file cleanup options
>   Fix max path checker timing
>   Make set_multipath_wwid actually do something
>   Make multipathd deal better with uninitialized paths
>   Stop annoying prio_lookup warning messages
>   make multipathd disable queue_without_daemon by default
>   Use mapname to choose kpartx delimiter
> 
>  kpartx/gpt.c                      |   9 ++-
>  kpartx/kpartx.c                   |   2 +-
>  kpartx/lopart.c                   |   4 +-
>  libmpathpersist/mpath_updatepr.c  |   3 +-
>  libmultipath/config.c             |   2 +
>  libmultipath/dict.c               |  10 ++-
>  libmultipath/discovery.c          |   8 ++-
>  libmultipath/log_pthread.c        |   3 +
>  libmultipath/print.c              |  30 +++++++--
>  libmultipath/prio.c               |   5 +-
>  libmultipath/structs.h            |   2 +-
>  libmultipath/structs_vec.c        |   2 +-
>  libmultipath/uevent.c             |  12 ++--
>  libmultipath/uxsock.c             |   4 +-
>  libmultipath/wwids.c              | 130 ++++++++++++++++++++++++++++++++++++++
>  libmultipath/wwids.h              |   2 +
>  multipath/main.c                  |  51 +++++++++++++--
>  multipath/multipath.8             |   8 ++-
>  multipathd/cli.c                  |   3 +
>  multipathd/cli.h                  |   2 +
>  multipathd/cli_handlers.c         |  18 ++++++
>  multipathd/cli_handlers.h         |   2 +
>  multipathd/main.c                 |  41 ++++++++----
>  multipathd/multipathd.init.redhat |  14 ++--
>  24 files changed, 313 insertions(+), 54 deletions(-)
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2013-05-06 19:45 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-05-02 21:46 [PATCH 00/16] multipath-tools: miscellaneous multipath patches Benjamin Marzinski
2013-05-02 21:46 ` [PATCH 01/16] Make kpartx advise modprobe instead of insmod Benjamin Marzinski
2013-05-02 21:46 ` [PATCH 02/16] Make kpartx correctly handle non-512 byte GPT Benjamin Marzinski
2013-05-02 21:46 ` [PATCH 03/16] Fix print_multipath_topology for large outputs Benjamin Marzinski
2013-05-02 21:46 ` [PATCH 04/16] Don't print checker messages for ghost paths Benjamin Marzinski
2013-05-02 21:46 ` [PATCH 05/16] Fix a couple of signal issues Benjamin Marzinski
2013-05-03  6:36   ` Bart Van Assche
2013-05-03 20:24     ` Benjamin Marzinski
2013-05-04  7:19       ` Bart Van Assche
2013-05-02 21:46 ` [PATCH 06/16] Fix hardware entry matching code Benjamin Marzinski
2013-05-02 21:46 ` [PATCH 07/16] Fix some socket issues Benjamin Marzinski
2013-05-02 21:46 ` [PATCH 08/16] Avoid race between ueventloop and uevqloop Benjamin Marzinski
2013-05-02 21:46 ` [PATCH 09/16] Add existing multipath devices to wwids file on Benjamin Marzinski
2013-05-02 21:46 ` [PATCH 10/16] add wwids file cleanup options Benjamin Marzinski
2013-05-02 21:46 ` [PATCH 11/16] Fix max path checker timing Benjamin Marzinski
2013-05-03  6:59   ` Hannes Reinecke
2013-05-03 15:44     ` Benjamin Marzinski
2013-05-03 17:59       ` [PATCH v2 " Benjamin Marzinski
2013-05-06  5:42         ` Hannes Reinecke
2013-05-02 21:46 ` [PATCH 12/16] Make set_multipath_wwid actually do something Benjamin Marzinski
2013-05-02 21:46 ` [PATCH 13/16] Make multipathd deal better with uninitialized paths Benjamin Marzinski
2013-05-02 21:46 ` [PATCH 14/16] Stop annoying prio_lookup warning messages Benjamin Marzinski
2013-05-02 21:46 ` [PATCH 15/16] make multipathd disable queue_without_daemon by default Benjamin Marzinski
2013-05-02 21:46 ` [PATCH 16/16] Use mapname to choose kpartx delimiter Benjamin Marzinski
2013-05-06 19:45 ` [PATCH 00/16] multipath-tools: miscellaneous multipath patches Christophe Varoqui

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.