* [PATCH 0/3] multipathd per-device waiter fixes
@ 2018-03-16 21:31 Benjamin Marzinski
  2018-03-16 21:31 ` [PATCH 1/3] multipathd: fix waiter thread cancelling Benjamin Marzinski
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Benjamin Marzinski @ 2018-03-16 21:31 UTC (permalink / raw)
  To: device-mapper development; +Cc: Martin Wilck

These patches fix some issues in the old waiter code that I noticed
while writing the polling waiter code. Neither of them is particularly
likely to cause a problem in real-world situations, but I don't think
the fixes should be controversial.

These patches are based on top of my "multipath: new and rebased patches"
set.

Benjamin Marzinski (3):
  multipathd: fix waiter thread cancelling
  multipathd: move __setup_multipath to multipathd
  multipathd: stop waiter in __setup_multipath

 libmultipath/structs_vec.c | 103 ---------------------------------------------
 libmultipath/structs_vec.h |   4 --
 multipathd/dmevents.c      |   1 +
 multipathd/main.c          | 103 +++++++++++++++++++++++++++++++++++++++++++++
 multipathd/main.h          |   4 ++
 multipathd/waiter.c        |  22 ++++++++--
 6 files changed, 127 insertions(+), 110 deletions(-)

-- 
2.7.4


* [PATCH 1/3] multipathd: fix waiter thread cancelling
  2018-03-16 21:31 [PATCH 0/3] multipathd per-device waiter fixes Benjamin Marzinski
@ 2018-03-16 21:31 ` Benjamin Marzinski
  2018-03-19 12:59   ` Martin Wilck
  2018-03-16 21:31 ` [PATCH 2/3] multipathd: move __setup_multipath to multipathd Benjamin Marzinski
  2018-03-16 21:31 ` [PATCH 3/3] multipathd: stop waiter in __setup_multipath Benjamin Marzinski
  2 siblings, 1 reply; 9+ messages in thread
From: Benjamin Marzinski @ 2018-03-16 21:31 UTC (permalink / raw)
  To: device-mapper development; +Cc: Martin Wilck

multipathd was sending a signal to the per-device waiter thread after
cancelling it.  Since these threads are detached, there is a window
where a new thread could have started with the old thread id between the
cancel and the signalling, causing the signal to be delivered to the
wrong thread. Simply reversing the order doesn't fix the issue, since
the waiter threads exit immediately if they receive a signal, again
opening a window for the cancel to be delivered to the wrong thread.

To fix this, multipathd does reverse the order: it signals the thread
first (the signal is needed because the dm_task_run ioctl isn't a
cancellation point) and then cancels it. However, it does both while
holding a new mutex.

The waiter thread can exit without being cancelled for only two reasons:
1. It fails in update_multipath, which removes the device while holding
   the vecs lock.
2. It receives a SIGUSR2 signal while waiting for a dm event.

Case 1 can never race with another thread removing the device, since
removing a device always happens while holding the vecs lock.  This
means that if the device exists to be removed, then the waiter thread
can't exit this way during the removal.

Case 2 is now handled by grabbing the new mutex after dm_task_run()
fails. With the mutex held, the thread checks whether it has been
cancelled; if it hasn't, it continues.

The reason this uses a new mutex, instead of the vecs lock, is that the
vecs lock would keep the thread from exiting until the lock was
released.  Normally that isn't a problem, but during reconfigure the
waiter threads for all devices are stopped and new ones started, all
while holding the vecs lock.  On systems with a large number of
multipath devices, this would leave multipathd with double its already
large number of waiter threads during reconfigure, all locked into
memory.
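
For illustration only, here is a minimal, self-contained sketch of the
signal-then-cancel pattern described above.  The demo_* names are made
up for this sketch and are not the actual multipathd symbols; the real
change, which uses the existing lock helpers, is in the diff below.

#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

static void demo_unlock(void *arg)
{
	pthread_mutex_unlock(arg);
}

static void demo_sigusr2(int sig)
{
	(void)sig;	/* exists only so SIGUSR2 interrupts the wait */
}

/* Stopper side: signal first (the wait below is not a cancellation
 * point), then cancel, both while holding demo_lock. */
static void demo_stop_waiter(pthread_t waiter)
{
	pthread_cleanup_push(demo_unlock, &demo_lock);
	pthread_mutex_lock(&demo_lock);
	pthread_kill(waiter, SIGUSR2);
	pthread_cancel(waiter);
	pthread_cleanup_pop(1);
}

/* Waiter side: the blocking wait runs with cancellation disabled,
 * mimicking an ioctl that is not a cancellation point.  After being
 * interrupted, take demo_lock and test for cancellation; a stopper
 * that signalled this thread has already issued its pthread_cancel()
 * by the time the lock is acquired. */
static void *demo_waiter(void *arg)
{
	struct sigaction sa;
	int old;

	(void)arg;
	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = demo_sigusr2;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGUSR2, &sa, NULL);

	for (;;) {
		pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old);
		pause();		/* stand-in for dm_task_run() */
		pthread_setcancelstate(old, NULL);

		pthread_cleanup_push(demo_unlock, &demo_lock);
		pthread_mutex_lock(&demo_lock);
		pthread_testcancel();	/* exits here if cancelled */
		pthread_cleanup_pop(1);
		/* not cancelled: the signal was not for this thread,
		 * so just wait again */
	}
	return NULL;
}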

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 multipathd/waiter.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/multipathd/waiter.c b/multipathd/waiter.c
index cb9708b..e894294 100644
--- a/multipathd/waiter.c
+++ b/multipathd/waiter.c
@@ -23,6 +23,7 @@
 #include "waiter.h"
 
 pthread_attr_t waiter_attr;
+struct mutex_lock waiter_lock = { .mutex = PTHREAD_MUTEX_INITIALIZER };
 
 static struct event_thread *alloc_waiter (void)
 {
@@ -59,8 +60,12 @@ void stop_waiter_thread (struct multipath *mpp, struct vectors *vecs)
 		mpp->waiter);
 	thread = mpp->waiter;
 	mpp->waiter = (pthread_t)0;
-	pthread_cancel(thread);
+	pthread_cleanup_push(cleanup_lock, &waiter_lock);
+	lock(&waiter_lock);
+	pthread_testcancel();
 	pthread_kill(thread, SIGUSR2);
+	pthread_cancel(thread);
+	lock_cleanup_pop(&waiter_lock);
 }
 
 /*
@@ -114,8 +119,13 @@ static int waiteventloop (struct event_thread *waiter)
 	dm_task_destroy(waiter->dmt);
 	waiter->dmt = NULL;
 
-	if (!r)	/* wait interrupted by signal */
-		return -1;
+	if (!r)	{ /* wait interrupted by signal. check for cancellation */
+		pthread_cleanup_push(cleanup_lock, &waiter_lock);
+		lock(&waiter_lock);
+		pthread_testcancel();
+		lock_cleanup_pop(&waiter_lock);
+		return 1; /* If we weren't cancelled, just reschedule */
+	}
 
 	waiter->event_nr++;
 
-- 
2.7.4


* [PATCH 2/3] multipathd: move __setup_multipath to multipathd
  2018-03-16 21:31 [PATCH 0/3] multipathd per-device waiter fixes Benjamin Marzinski
  2018-03-16 21:31 ` [PATCH 1/3] multipathd: fix waiter thread cancelling Benjamin Marzinski
@ 2018-03-16 21:31 ` Benjamin Marzinski
  2018-03-19 13:07   ` Martin Wilck
  2018-03-16 21:31 ` [PATCH 3/3] multipathd: stop waiter in __setup_multipath Benjamin Marzinski
  2 siblings, 1 reply; 9+ messages in thread
From: Benjamin Marzinski @ 2018-03-16 21:31 UTC (permalink / raw)
  To: device-mapper development; +Cc: Martin Wilck

__setup_multipath is only called from multipathd, so it shouldn't be in
libmultipath.  Move it, update_multipath (which calls it), and
set_no_path_retry (a helper function for it) into multipathd.
None of these functions were changed; they were moved verbatim.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/structs_vec.c | 103 ---------------------------------------------
 libmultipath/structs_vec.h |   4 --
 multipathd/dmevents.c      |   1 +
 multipathd/main.c          | 103 +++++++++++++++++++++++++++++++++++++++++++++
 multipathd/main.h          |   4 ++
 multipathd/waiter.c        |   1 +
 6 files changed, 109 insertions(+), 107 deletions(-)

diff --git a/libmultipath/structs_vec.c b/libmultipath/structs_vec.c
index e9a0274..8c8fb25 100644
--- a/libmultipath/structs_vec.c
+++ b/libmultipath/structs_vec.c
@@ -290,61 +290,6 @@ void enter_recovery_mode(struct multipath *mpp)
 	put_multipath_config(conf);
 }
 
-static void set_no_path_retry(struct multipath *mpp)
-{
-	char is_queueing = 0;
-
-	mpp->nr_active = pathcount(mpp, PATH_UP) + pathcount(mpp, PATH_GHOST);
-	if (mpp->features && strstr(mpp->features, "queue_if_no_path"))
-		is_queueing = 1;
-
-	switch (mpp->no_path_retry) {
-	case NO_PATH_RETRY_UNDEF:
-		break;
-	case NO_PATH_RETRY_FAIL:
-		if (is_queueing)
-			dm_queue_if_no_path(mpp->alias, 0);
-		break;
-	case NO_PATH_RETRY_QUEUE:
-		if (!is_queueing)
-			dm_queue_if_no_path(mpp->alias, 1);
-		break;
-	default:
-		if (mpp->nr_active > 0) {
-			mpp->retry_tick = 0;
-			dm_queue_if_no_path(mpp->alias, 1);
-		} else if (is_queueing && mpp->retry_tick == 0)
-			enter_recovery_mode(mpp);
-		break;
-	}
-}
-
-int __setup_multipath(struct vectors *vecs, struct multipath *mpp,
-		      int reset)
-{
-	if (dm_get_info(mpp->alias, &mpp->dmi)) {
-		/* Error accessing table */
-		condlog(3, "%s: cannot access table", mpp->alias);
-		goto out;
-	}
-
-	if (update_multipath_strings(mpp, vecs->pathvec, 1)) {
-		condlog(0, "%s: failed to setup multipath", mpp->alias);
-		goto out;
-	}
-
-	if (reset) {
-		set_no_path_retry(mpp);
-		if (VECTOR_SIZE(mpp->paths) != 0)
-			dm_cancel_deferred_remove(mpp);
-	}
-
-	return 0;
-out:
-	remove_map(mpp, vecs, PURGE_VEC);
-	return 1;
-}
-
 void
 sync_map_state(struct multipath *mpp)
 {
@@ -468,54 +413,6 @@ int verify_paths(struct multipath *mpp, struct vectors *vecs)
 	return count;
 }
 
-int update_multipath (struct vectors *vecs, char *mapname, int reset)
-{
-	struct multipath *mpp;
-	struct pathgroup  *pgp;
-	struct path *pp;
-	int i, j;
-
-	mpp = find_mp_by_alias(vecs->mpvec, mapname);
-
-	if (!mpp) {
-		condlog(3, "%s: multipath map not found", mapname);
-		return 2;
-	}
-
-	if (__setup_multipath(vecs, mpp, reset))
-		return 1; /* mpp freed in setup_multipath */
-
-	/*
-	 * compare checkers states with DM states
-	 */
-	vector_foreach_slot (mpp->pg, pgp, i) {
-		vector_foreach_slot (pgp->paths, pp, j) {
-			if (pp->dmstate != PSTATE_FAILED)
-				continue;
-
-			if (pp->state != PATH_DOWN) {
-				struct config *conf = get_multipath_config();
-				int oldstate = pp->state;
-				condlog(2, "%s: mark as failed", pp->dev);
-				mpp->stat_path_failures++;
-				pp->state = PATH_DOWN;
-				if (oldstate == PATH_UP ||
-				    oldstate == PATH_GHOST)
-					update_queue_mode_del_path(mpp);
-
-				/*
-				 * if opportune,
-				 * schedule the next check earlier
-				 */
-				if (pp->tick > conf->checkint)
-					pp->tick = conf->checkint;
-				put_multipath_config(conf);
-			}
-		}
-	}
-	return 0;
-}
-
 /*
  * mpp->no_path_retry:
  *   -2 (QUEUE) : queue_if_no_path enabled, never turned off
diff --git a/libmultipath/structs_vec.h b/libmultipath/structs_vec.h
index 0adba17..4220ea3 100644
--- a/libmultipath/structs_vec.h
+++ b/libmultipath/structs_vec.h
@@ -19,9 +19,6 @@ void orphan_path (struct path * pp, const char *reason);
 
 int verify_paths(struct multipath * mpp, struct vectors * vecs);
 int update_mpp_paths(struct multipath * mpp, vector pathvec);
-int __setup_multipath (struct vectors * vecs, struct multipath * mpp,
-		       int reset);
-#define setup_multipath(vecs, mpp) __setup_multipath(vecs, mpp, 1)
 int update_multipath_strings (struct multipath *mpp, vector pathvec,
 			      int is_daemon);
 void extract_hwe_from_path(struct multipath * mpp);
@@ -36,7 +33,6 @@ void remove_maps (struct vectors * vecs);
 void sync_map_state (struct multipath *);
 struct multipath * add_map_with_path (struct vectors * vecs,
 				struct path * pp, int add_vec);
-int update_multipath (struct vectors *vecs, char *mapname, int reset);
 void update_queue_mode_del_path(struct multipath *mpp);
 void update_queue_mode_add_path(struct multipath *mpp);
 int update_multipath_table (struct multipath *mpp, vector pathvec,
diff --git a/multipathd/dmevents.c b/multipathd/dmevents.c
index 1ef811e..2281a10 100644
--- a/multipathd/dmevents.c
+++ b/multipathd/dmevents.c
@@ -22,6 +22,7 @@
 #include "structs_vec.h"
 #include "devmapper.h"
 #include "debug.h"
+#include "main.h"
 #include "dmevents.h"
 
 #ifndef DM_DEV_ARM_POLL
diff --git a/multipathd/main.c b/multipathd/main.c
index e35231e..70aff5d 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -347,6 +347,109 @@ set_multipath_wwid (struct multipath * mpp)
 	dm_get_uuid(mpp->alias, mpp->wwid);
 }
 
+static void set_no_path_retry(struct multipath *mpp)
+{
+	char is_queueing = 0;
+
+	mpp->nr_active = pathcount(mpp, PATH_UP) + pathcount(mpp, PATH_GHOST);
+	if (mpp->features && strstr(mpp->features, "queue_if_no_path"))
+		is_queueing = 1;
+
+	switch (mpp->no_path_retry) {
+	case NO_PATH_RETRY_UNDEF:
+		break;
+	case NO_PATH_RETRY_FAIL:
+		if (is_queueing)
+			dm_queue_if_no_path(mpp->alias, 0);
+		break;
+	case NO_PATH_RETRY_QUEUE:
+		if (!is_queueing)
+			dm_queue_if_no_path(mpp->alias, 1);
+		break;
+	default:
+		if (mpp->nr_active > 0) {
+			mpp->retry_tick = 0;
+			dm_queue_if_no_path(mpp->alias, 1);
+		} else if (is_queueing && mpp->retry_tick == 0)
+			enter_recovery_mode(mpp);
+		break;
+	}
+}
+
+int __setup_multipath(struct vectors *vecs, struct multipath *mpp,
+		      int reset)
+{
+	if (dm_get_info(mpp->alias, &mpp->dmi)) {
+		/* Error accessing table */
+		condlog(3, "%s: cannot access table", mpp->alias);
+		goto out;
+	}
+
+	if (update_multipath_strings(mpp, vecs->pathvec, 1)) {
+		condlog(0, "%s: failed to setup multipath", mpp->alias);
+		goto out;
+	}
+
+	if (reset) {
+		set_no_path_retry(mpp);
+		if (VECTOR_SIZE(mpp->paths) != 0)
+			dm_cancel_deferred_remove(mpp);
+	}
+
+	return 0;
+out:
+	remove_map(mpp, vecs, PURGE_VEC);
+	return 1;
+}
+
+int update_multipath (struct vectors *vecs, char *mapname, int reset)
+{
+	struct multipath *mpp;
+	struct pathgroup  *pgp;
+	struct path *pp;
+	int i, j;
+
+	mpp = find_mp_by_alias(vecs->mpvec, mapname);
+
+	if (!mpp) {
+		condlog(3, "%s: multipath map not found", mapname);
+		return 2;
+	}
+
+	if (__setup_multipath(vecs, mpp, reset))
+		return 1; /* mpp freed in setup_multipath */
+
+	/*
+	 * compare checkers states with DM states
+	 */
+	vector_foreach_slot (mpp->pg, pgp, i) {
+		vector_foreach_slot (pgp->paths, pp, j) {
+			if (pp->dmstate != PSTATE_FAILED)
+				continue;
+
+			if (pp->state != PATH_DOWN) {
+				struct config *conf = get_multipath_config();
+				int oldstate = pp->state;
+				condlog(2, "%s: mark as failed", pp->dev);
+				mpp->stat_path_failures++;
+				pp->state = PATH_DOWN;
+				if (oldstate == PATH_UP ||
+				    oldstate == PATH_GHOST)
+					update_queue_mode_del_path(mpp);
+
+				/*
+				 * if opportune,
+				 * schedule the next check earlier
+				 */
+				if (pp->tick > conf->checkint)
+					pp->tick = conf->checkint;
+				put_multipath_config(conf);
+			}
+		}
+	}
+	return 0;
+}
+
 static int
 update_map (struct multipath *mpp, struct vectors *vecs, int new_map)
 {
diff --git a/multipathd/main.h b/multipathd/main.h
index 0e9c5e3..af39558 100644
--- a/multipathd/main.h
+++ b/multipathd/main.h
@@ -39,5 +39,9 @@ void * mpath_pr_event_handler_fn (void * );
 int update_map_pr(struct multipath *mpp);
 void * mpath_pr_event_handler_fn (void * pathp );
 void handle_signals(bool);
+int __setup_multipath (struct vectors * vecs, struct multipath * mpp,
+		       int reset);
+#define setup_multipath(vecs, mpp) __setup_multipath(vecs, mpp, 1)
+int update_multipath (struct vectors *vecs, char *mapname, int reset);
 
 #endif /* MAIN_H */
diff --git a/multipathd/waiter.c b/multipathd/waiter.c
index e894294..c70ad21 100644
--- a/multipathd/waiter.c
+++ b/multipathd/waiter.c
@@ -21,6 +21,7 @@
 #include "debug.h"
 #include "lock.h"
 #include "waiter.h"
+#include "main.h"
 
 pthread_attr_t waiter_attr;
 struct mutex_lock waiter_lock = { .mutex = PTHREAD_MUTEX_INITIALIZER };
-- 
2.7.4


* [PATCH 3/3] multipathd: stop waiter in __setup_multipath
  2018-03-16 21:31 [PATCH 0/3] multipathd per-device waiter fixes Benjamin Marzinski
  2018-03-16 21:31 ` [PATCH 1/3] multipathd: fix waiter thread cancelling Benjamin Marzinski
  2018-03-16 21:31 ` [PATCH 2/3] multipathd: move __setup_multipath to multipathd Benjamin Marzinski
@ 2018-03-16 21:31 ` Benjamin Marzinski
  2018-03-19 13:08   ` Martin Wilck
  2 siblings, 1 reply; 9+ messages in thread
From: Benjamin Marzinski @ 2018-03-16 21:31 UTC (permalink / raw)
  To: device-mapper development; +Cc: Martin Wilck

__setup_multipath can remove a multipath device from multipathd, and it
can be called either by the waiter thread or by another thread.
Previously, it dealt with this by never stopping the waiter thread; it
simply relied on the waiter thread to notice and stop itself.  Now, when
called by another thread, it explicitly stops the waiter thread.
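
For illustration, a minimal sketch of the guard that makes this safe
when the caller is the waiter itself (the demo_* names are made up for
the sketch; the real check uses pthread_equal() on mpp->waiter and is
shown in the diff below):

#include <pthread.h>
#include <signal.h>

/* Sketch of a stop routine that refuses to cancel the calling thread
 * itself; a waiter that reaches this path is expected to exit on its
 * own instead. */
static void demo_stop_waiter(pthread_t waiter)
{
	if (pthread_equal(waiter, pthread_self()))
		return;		/* never cancel yourself */
	pthread_kill(waiter, SIGUSR2);
	pthread_cancel(waiter);
}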

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 multipathd/main.c   | 2 +-
 multipathd/waiter.c | 5 +++++
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/multipathd/main.c b/multipathd/main.c
index 70aff5d..3ae0442 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -398,7 +398,7 @@ int __setup_multipath(struct vectors *vecs, struct multipath *mpp,
 
 	return 0;
 out:
-	remove_map(mpp, vecs, PURGE_VEC);
+	remove_map_and_stop_waiter(mpp, vecs, PURGE_VEC);
 	return 1;
 }
 
diff --git a/multipathd/waiter.c b/multipathd/waiter.c
index c70ad21..595c69a 100644
--- a/multipathd/waiter.c
+++ b/multipathd/waiter.c
@@ -57,6 +57,11 @@ void stop_waiter_thread (struct multipath *mpp, struct vectors *vecs)
 			mpp->alias);
 		return;
 	}
+	/* Don't cancel yourself. __setup_multipath is called
+	   by the waiter thread, and may remove a multipath device */
+	if (pthread_equal(mpp->waiter, pthread_self()))
+		return;
+
 	condlog(2, "%s: stop event checker thread (%lu)", mpp->alias,
 		mpp->waiter);
 	thread = mpp->waiter;
-- 
2.7.4


* Re: [PATCH 1/3] multipathd: fix waiter thread cancelling
  2018-03-16 21:31 ` [PATCH 1/3] multipathd: fix waiter thread cancelling Benjamin Marzinski
@ 2018-03-19 12:59   ` Martin Wilck
  2018-03-19 16:25     ` Benjamin Marzinski
  0 siblings, 1 reply; 9+ messages in thread
From: Martin Wilck @ 2018-03-19 12:59 UTC (permalink / raw)
  To: Benjamin Marzinski, device-mapper development

On Fri, 2018-03-16 at 16:31 -0500, Benjamin Marzinski wrote:
> multipathd was sending a signal to the per-device waiter thread after
> cancelling it.  Since these threads are detached, there is a window
> where a new thread could have started with the old thread id between
> the
> cancel and the signalling, causing the signal to be delivered to the
> wrong thread. Simply reversing the order doesn't fix the issue, since
> the waiter threads exit immediately if they receive a signal, again
> opening a window for the cancel to be delivered to the wrong thread.
> 
> To fix this, multipathd does reverse the order, so that it signals
> the
> thread first (it needs to signal the thread, since the dm_task_run
> ioctl
> isn't a cancellation point) and then cancels it. However it does this
> while holding a new mutex.
> 
> The waiter thread can only exit without being cancelled for two
> reasons.
> 1. When it fails in update_multipath, which removes the device while
>    holding the vecs lock.
> 2. If it receives a SIGUSR2 signal while waiting for a dm event.
> 
> Case 1 can never race with another thread removing the device, since
> removing a device always happens while holding the vecs lock.  This
> means that if the device exists to be removed, then the waiter thread
> can't exit this way during the removal.
> 
> Case 2 is now solved by grabbing the new mutex after failing
> dm_task_run(). With the mutex held, the thread checks if it has been
> cancelled. If it wasn't cancelled, the thread continues.
> 
> The reason that this uses a new mutex, instead of the vecs lock, is
> that
> using the vecs lock would keep the thread from ending until the vecs
> lock was released.  Normally, this isn't a problem. But during
> reconfigure, the waiter threads for all devices are stopped, and new
> ones started, all while holding the vecs lock.  For systems with a
> large
> number of multipath devices, this will cause multipathd do have
> double
> its already large number of waiter threads during reconfigure, all
> locked
> into memory.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
>  multipathd/waiter.c | 16 +++++++++++++---
>  1 file changed, 13 insertions(+), 3 deletions(-)
> 
> diff --git a/multipathd/waiter.c b/multipathd/waiter.c
> index cb9708b..e894294 100644
> --- a/multipathd/waiter.c
> +++ b/multipathd/waiter.c
> @@ -23,6 +23,7 @@
>  #include "waiter.h"
>  
>  pthread_attr_t waiter_attr;
> +struct mutex_lock waiter_lock = { .mutex = PTHREAD_MUTEX_INITIALIZER
> };
>  
>  static struct event_thread *alloc_waiter (void)
>  {
> @@ -59,8 +60,12 @@ void stop_waiter_thread (struct multipath *mpp,
> struct vectors *vecs)
>  		mpp->waiter);
>  	thread = mpp->waiter;
>  	mpp->waiter = (pthread_t)0;
> -	pthread_cancel(thread);
> +	pthread_cleanup_push(cleanup_lock, &waiter_lock);
> +	lock(&waiter_lock);
> +	pthread_testcancel();

What's the purpose of this pthread_testcancel() call?

>  	pthread_kill(thread, SIGUSR2);
> +	pthread_cancel(thread);
> +	lock_cleanup_pop(&waiter_lock);
>  }
>  
>  /*
> @@ -114,8 +119,13 @@ static int waiteventloop (struct event_thread
> *waiter)
>  	dm_task_destroy(waiter->dmt);
>  	waiter->dmt = NULL;
>  
> -	if (!r)	/* wait interrupted by signal */
> -		return -1;
> +	if (!r)	{ /* wait interrupted by signal. check for
> cancellation */
> +		pthread_cleanup_push(cleanup_lock, &waiter_lock);
> +		lock(&waiter_lock);
> +		pthread_testcancel();
> +		lock_cleanup_pop(&waiter_lock);

Nitpick: I'd prefer pthread_cleanup_pop(1) here.

Regards,
Martin


> +		return 1; /* If we weren't cancelled, just
> reschedule */
> +	}
>  
>  	waiter->event_nr++;
>  

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)



* Re: [PATCH 2/3] multipathd: move __setup_multipath to multipathd
  2018-03-16 21:31 ` [PATCH 2/3] multipathd: move __setup_multipath to multipathd Benjamin Marzinski
@ 2018-03-19 13:07   ` Martin Wilck
  0 siblings, 0 replies; 9+ messages in thread
From: Martin Wilck @ 2018-03-19 13:07 UTC (permalink / raw)
  To: Benjamin Marzinski, device-mapper development

On Fri, 2018-03-16 at 16:31 -0500, Benjamin Marzinski wrote:
> __setup_multipath is only called from multipathd, so it shouldn't be
> in
> libmultipath.  Move it, update_multpath (which calls it) and
> set_no_path_retry (which is a helper function for it) into
> multipathd.
> None of these functions were changed, only copied.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>

One day we may need to split up multipathd/main.c a bit...

Acked-by: Martin Wilck <mwilck@suse.com>

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)



* Re: [PATCH 3/3] multipathd: stop waiter in __setup_multipath
  2018-03-16 21:31 ` [PATCH 3/3] multipathd: stop waiter in __setup_multipath Benjamin Marzinski
@ 2018-03-19 13:08   ` Martin Wilck
  0 siblings, 0 replies; 9+ messages in thread
From: Martin Wilck @ 2018-03-19 13:08 UTC (permalink / raw)
  To: Benjamin Marzinski, device-mapper development

On Fri, 2018-03-16 at 16:31 -0500, Benjamin Marzinski wrote:
> __setup_multipath can remove a multipath device from multipathd, and
> it
> can be called by either by the waiter thread or another thread.
> Previously, it dealt with this by never stopping the waiter
> thread.  It
> simply relied on the waiter thread to notice and stop itself.  Now,
> when
> called by another thread, it explicitly stops the waiter thread.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>

Reviewed-by: Martin Wilck <mwilck@redhat.com>

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)



* Re: [PATCH 1/3] multipathd: fix waiter thread cancelling
  2018-03-19 12:59   ` Martin Wilck
@ 2018-03-19 16:25     ` Benjamin Marzinski
  2018-03-19 20:02       ` Martin Wilck
  0 siblings, 1 reply; 9+ messages in thread
From: Benjamin Marzinski @ 2018-03-19 16:25 UTC (permalink / raw)
  To: Martin Wilck; +Cc: device-mapper development

On Mon, Mar 19, 2018 at 01:59:24PM +0100, Martin Wilck wrote:
> On Fri, 2018-03-16 at 16:31 -0500, Benjamin Marzinski wrote:
> > diff --git a/multipathd/waiter.c b/multipathd/waiter.c
> > index cb9708b..e894294 100644
> > --- a/multipathd/waiter.c
> > +++ b/multipathd/waiter.c
> > @@ -23,6 +23,7 @@
> >  #include "waiter.h"
> >  
> >  pthread_attr_t waiter_attr;
> > +struct mutex_lock waiter_lock = { .mutex = PTHREAD_MUTEX_INITIALIZER
> > };
> >  
> >  static struct event_thread *alloc_waiter (void)
> >  {
> > @@ -59,8 +60,12 @@ void stop_waiter_thread (struct multipath *mpp,
> > struct vectors *vecs)
> >  		mpp->waiter);
> >  	thread = mpp->waiter;
> >  	mpp->waiter = (pthread_t)0;
> > -	pthread_cancel(thread);
> > +	pthread_cleanup_push(cleanup_lock, &waiter_lock);
> > +	lock(&waiter_lock);
> > +	pthread_testcancel();
> 
> What's the purpose of this pthread_testcancel() call?
> 

It's not necessary. I feel like in general, after you get done waiting
on a lock, it's good form to see if you've been cancelled while you were
waiting. However, it's really unlikely that a thread will be waiting
here long, and this pthread_testcancel isn't protecting it from
accessing any freed data. If you think it's likely to confuse people, I
have no problem with removing it.

> >  	pthread_kill(thread, SIGUSR2);
> > +	pthread_cancel(thread);
> > +	lock_cleanup_pop(&waiter_lock);
> >  }
> >  
> >  /*
> > @@ -114,8 +119,13 @@ static int waiteventloop (struct event_thread
> > *waiter)
> >  	dm_task_destroy(waiter->dmt);
> >  	waiter->dmt = NULL;
> >  
> > -	if (!r)	/* wait interrupted by signal */
> > -		return -1;
> > +	if (!r)	{ /* wait interrupted by signal. check for
> > cancellation */
> > +		pthread_cleanup_push(cleanup_lock, &waiter_lock);
> > +		lock(&waiter_lock);
> > +		pthread_testcancel();
> > +		lock_cleanup_pop(&waiter_lock);
> 
> Nitpick: I'd prefer pthread_cleanup_pop(1) here.

I have no problem with changing that.

-Ben

> Regards,
> Martin
> 
> 
> > +		return 1; /* If we weren't cancelled, just
> > reschedule */
> > +	}
> >  
> >  	waiter->event_nr++;
> >  
> 
> -- 
> Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
> HRB 21284 (AG Nürnberg)


* Re: [PATCH 1/3] multipathd: fix waiter thread cancelling
  2018-03-19 16:25     ` Benjamin Marzinski
@ 2018-03-19 20:02       ` Martin Wilck
  0 siblings, 0 replies; 9+ messages in thread
From: Martin Wilck @ 2018-03-19 20:02 UTC (permalink / raw)
  To: Benjamin Marzinski; +Cc: device-mapper development

On Mon, 2018-03-19 at 11:25 -0500, Benjamin Marzinski wrote:
> On Mon, Mar 19, 2018 at 01:59:24PM +0100, Martin Wilck wrote:
> > On Fri, 2018-03-16 at 16:31 -0500, Benjamin Marzinski wrote:
> > > diff --git a/multipathd/waiter.c b/multipathd/waiter.c
> > > index cb9708b..e894294 100644
> > > --- a/multipathd/waiter.c
> > > +++ b/multipathd/waiter.c
> > > @@ -23,6 +23,7 @@
> > >  #include "waiter.h"
> > >  
> > >  pthread_attr_t waiter_attr;
> > > +struct mutex_lock waiter_lock = { .mutex =
> > > PTHREAD_MUTEX_INITIALIZER
> > > };
> > >  
> > >  static struct event_thread *alloc_waiter (void)
> > >  {
> > > @@ -59,8 +60,12 @@ void stop_waiter_thread (struct multipath
> > > *mpp,
> > > struct vectors *vecs)
> > >  		mpp->waiter);
> > >  	thread = mpp->waiter;
> > >  	mpp->waiter = (pthread_t)0;
> > > -	pthread_cancel(thread);
> > > +	pthread_cleanup_push(cleanup_lock, &waiter_lock);
> > > +	lock(&waiter_lock);
> > > +	pthread_testcancel();
> > 
> > What's the purpose of this pthread_testcancel() call?
> > 
> 
> It's not necessary. I feel like in general, after you get done
> waiting
> on a lock, it's good form to see if you've been cancelled while you
> were
> waiting. However, it's really unlikely that a thread will be waiting
> here long, and this pthread_cancel isn't protecting it form accessing
> any freed data. If you think it's likely to confuse people, I have no
> problem with removing it.

My thinking was: this thread is just about to cancel another thread. If
it's cancelled itself just before doing that, the other one won't be
cancelled. I find it hard to imagine conditions where that would be the
right thing. So if at all, I'd call pthread_testcancel() after
releasing the lock. But probably it's best just to remove it here.
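
In other words, a sketch of the suggested placement, using the patch's
own helpers and not a tested change: the optional self-check would go
only after the other thread's cancel has been issued and the lock
dropped, e.g.

	pthread_kill(thread, SIGUSR2);
	pthread_cancel(thread);
	lock_cleanup_pop(&waiter_lock);
	pthread_testcancel();	/* react to our own pending cancellation,
				 * if any, only after the unlock */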

Regards,
Martin

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)


