All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH]multipathd: threads cancellation can't work if new poll arrival just after restart multipath
@ 2018-01-30  9:58 Wuchongyun
  2018-01-30 13:43 ` Martin Wilck
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Wuchongyun @ 2018-01-30  9:58 UTC (permalink / raw)
  To: Christophe Varoqui, Hannes Reinecke, Martin Wilck,
	Xose Vazquez Perez, Benjamin Marzinski, dm-devel
  Cc: Guozhonghua, Changwei Ge, Changlimin

Hi Christophe, Hannes, Martin, Xose, Benjamin and other reviewers

Please help to review this patch, thanks.

Issue:
When restart multipath-tools service, if there is a connection request
or a command at this time, all multipathd related thread cancellation
will not work, all cleanup hook functions will not be called.

Because uxsock_listen will get this poll(poll_count = ppoll()) and the
poll_count will not be 0, in this loop handle_signals() will not be
called. After uxsock_listen finish this connection process, it will take
5 seconds in ppoll() to get a new session, after then it have the chance
to call handle_signals(). In this case uxsock_listen will wait more than
5 seconds to call handle_signals() to hand all multipathd related
process' thread cancellation.  Unfortunately there is only 5 seconds for
thread cancellation, after 5 seconds if the cancellation not finish
multipathd will killed by force. In this case thread cancellation will
have no chance to be processed.

So may be we should process exit signal as qickly as we can. After
finishing ppoll call handle_exit_sig() first, if there is a exit signal
ignore other signals and commands. If we insist on finishing all the new
coming commands and other signals, we will face the risk that all cleanup
functions will not be called and  also the new coming command can't
been fully implemented.

Signed-off-by: Chongyun Wu <wu.chongyun@h3c.com>
---
 multipathd/main.c   |   18 +++++++++++++-----
 multipathd/main.h   |    1 +
 multipathd/uxlsnr.c |    8 ++++++++
 3 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/multipathd/main.c b/multipathd/main.c
index dedc7d2..2ee2538 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -2237,10 +2237,6 @@ signal_set(int signo, void (*func) (int))
 void
 handle_signals(void)
 {
-	if (exit_sig) {
-		condlog(2, "exit (signal)");
-		exit_daemon();
-	}
 	if (reconfig_sig) {
 		condlog(2, "reconfigure (signal)");
 		set_config_state(DAEMON_CONFIGURE);
@@ -2251,11 +2247,23 @@ handle_signals(void)
 		log_reset("multipathd");
 		pthread_mutex_unlock(&logq_lock);
 	}
-	exit_sig = 0;
 	reconfig_sig = 0;
 	log_reset_sig = 0;
 }
 
+int
+handle_exit_sig(void)
+{
+	if (exit_sig) {
+		condlog(2, "exit (signal)");
+		exit_daemon();
+		exit_sig = 0;
+		return 1;
+	}
+
+	return 0;
+}
+
 static void
 sighup (int sig)
 {
diff --git a/multipathd/main.h b/multipathd/main.h
index ededdab..d769fcf 100644
--- a/multipathd/main.h
+++ b/multipathd/main.h
@@ -39,5 +39,6 @@ void * mpath_pr_event_handler_fn (void * );
 int update_map_pr(struct multipath *mpp);
 void * mpath_pr_event_handler_fn (void * pathp );
 void handle_signals(void);
+int handle_exit_sig(void);
 
 #endif /* MAIN_H */
diff --git a/multipathd/uxlsnr.c b/multipathd/uxlsnr.c
index 8aa0f83..9b2f95a 100644
--- a/multipathd/uxlsnr.c
+++ b/multipathd/uxlsnr.c
@@ -240,6 +240,14 @@ void * uxsock_listen(uxsock_trigger_fn uxsock_trigger, void * trigger_data)
 		/* most of our life is spent in this call */
 		poll_count = ppoll(polls, i, &sleep_time, &mask);
 
+		/*
+		* should process exit signal as qickly as we can
+		* otherwise multipathd will be killed by force
+		*/
+		if (handle_exit_sig()) {
+			continue;
+		}
+
 		if (poll_count == -1) {
 			if (errno == EINTR) {
 				handle_signals();
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH]multipathd: threads cancellation can't work if new poll arrival just after restart multipath
  2018-01-30  9:58 [PATCH]multipathd: threads cancellation can't work if new poll arrival just after restart multipath Wuchongyun
@ 2018-01-30 13:43 ` Martin Wilck
  2018-01-30 13:47 ` [PATCH] multipath-tools: handle exit signal immediately Martin Wilck
  2018-01-30 14:16 ` [PATCH v2] " Martin Wilck
  2 siblings, 0 replies; 8+ messages in thread
From: Martin Wilck @ 2018-01-30 13:43 UTC (permalink / raw)
  To: Wuchongyun, Christophe Varoqui, Hannes Reinecke,
	Xose Vazquez Perez, Benjamin Marzinski, dm-devel
  Cc: Guozhonghua, Changwei Ge, Changlimin

On Tue, 2018-01-30 at 09:58 +0000, Wuchongyun wrote:
> Hi Christophe, Hannes, Martin, Xose, Benjamin and other reviewers
> 
> Please help to review this patch, thanks.

Your analysis looks correct to me. I'd like to propose a slightly
different patch which I'm going to post after this email.

Thanks,
Martin

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH] multipath-tools: handle exit signal immediately
  2018-01-30  9:58 [PATCH]multipathd: threads cancellation can't work if new poll arrival just after restart multipath Wuchongyun
  2018-01-30 13:43 ` Martin Wilck
@ 2018-01-30 13:47 ` Martin Wilck
  2018-01-30 14:16 ` [PATCH v2] " Martin Wilck
  2 siblings, 0 replies; 8+ messages in thread
From: Martin Wilck @ 2018-01-30 13:47 UTC (permalink / raw)
  To: Christophe Varoqui, Chongyun Wu
  Cc: Xose Vazquez Perez, Guozhonghua, dm-devel, Changwei Ge,
	Changlimin, Martin Wilck

multipathd shouldn't try to service any more client connections
when it receives an exit signal. Based on an analysis by Chongyun Wu.

Reported-by: Chongyun Wu <wu.chongyun@h3c.com>
Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 multipathd/main.c   | 6 ++++--
 multipathd/main.h   | 2 +-
 multipathd/uxlsnr.c | 5 +++--
 3 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/multipathd/main.c b/multipathd/main.c
index 27cf234623d0..8e96f5dd2d7f 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -2184,12 +2184,15 @@ signal_set(int signo, void (*func) (int))
 }
 
 void
-handle_signals(void)
+handle_signals(bool nonfatal)
 {
 	if (exit_sig) {
 		condlog(2, "exit (signal)");
+		exit_sig = 0;
 		exit_daemon();
 	}
+	if (!nonfatal)
+		return;
 	if (reconfig_sig) {
 		condlog(2, "reconfigure (signal)");
 		set_config_state(DAEMON_CONFIGURE);
@@ -2200,7 +2203,6 @@ handle_signals(void)
 		log_reset("multipathd");
 		pthread_mutex_unlock(&logq_lock);
 	}
-	exit_sig = 0;
 	reconfig_sig = 0;
 	log_reset_sig = 0;
 }
diff --git a/multipathd/main.h b/multipathd/main.h
index ededdaba32fe..e8140feaf291 100644
--- a/multipathd/main.h
+++ b/multipathd/main.h
@@ -38,6 +38,6 @@ int mpath_pr_event_handle(struct path *pp);
 void * mpath_pr_event_handler_fn (void * );
 int update_map_pr(struct multipath *mpp);
 void * mpath_pr_event_handler_fn (void * pathp );
-void handle_signals(void);
+void handle_signals(bool);
 
 #endif /* MAIN_H */
diff --git a/multipathd/uxlsnr.c b/multipathd/uxlsnr.c
index 98ac25a68c43..330685efa27d 100644
--- a/multipathd/uxlsnr.c
+++ b/multipathd/uxlsnr.c
@@ -221,9 +221,10 @@ void * uxsock_listen(uxsock_trigger_fn uxsock_trigger, void * trigger_data)
 		/* most of our life is spent in this call */
 		poll_count = ppoll(polls, i, &sleep_time, &mask);
 
+		handle_signals(false);
 		if (poll_count == -1) {
 			if (errno == EINTR) {
-				handle_signals();
+				handle_signals(true);
 				continue;
 			}
 
@@ -233,7 +234,7 @@ void * uxsock_listen(uxsock_trigger_fn uxsock_trigger, void * trigger_data)
 		}
 
 		if (poll_count == 0) {
-			handle_signals();
+			handle_signals(true);
 			continue;
 		}
 
-- 
2.16.1

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v2] multipath-tools: handle exit signal immediately
  2018-01-30  9:58 [PATCH]multipathd: threads cancellation can't work if new poll arrival just after restart multipath Wuchongyun
  2018-01-30 13:43 ` Martin Wilck
  2018-01-30 13:47 ` [PATCH] multipath-tools: handle exit signal immediately Martin Wilck
@ 2018-01-30 14:16 ` Martin Wilck
  2018-01-31  7:43   ` Chongyun Wu
  2018-02-01 16:08   ` Benjamin Marzinski
  2 siblings, 2 replies; 8+ messages in thread
From: Martin Wilck @ 2018-01-30 14:16 UTC (permalink / raw)
  To: Christophe Varoqui, Chongyun Wu
  Cc: Xose Vazquez Perez, Guozhonghua, dm-devel, Changwei Ge,
	Changlimin, Martin Wilck

multipathd shouldn't try to service any more client connections
when it receives an exit signal. Moreover, ppoll() can return
success even if signals occured. So check for reconfigure or
log_reset signals after handling pending client requests.

Based on an analysis by Chongyun Wu.

Reported-by: Chongyun Wu <wu.chongyun@h3c.com>
Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 multipathd/main.c   | 6 ++++--
 multipathd/main.h   | 2 +-
 multipathd/uxlsnr.c | 7 +++++--
 3 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/multipathd/main.c b/multipathd/main.c
index 27cf234623d0..8e96f5dd2d7f 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -2184,12 +2184,15 @@ signal_set(int signo, void (*func) (int))
 }
 
 void
-handle_signals(void)
+handle_signals(bool nonfatal)
 {
 	if (exit_sig) {
 		condlog(2, "exit (signal)");
+		exit_sig = 0;
 		exit_daemon();
 	}
+	if (!nonfatal)
+		return;
 	if (reconfig_sig) {
 		condlog(2, "reconfigure (signal)");
 		set_config_state(DAEMON_CONFIGURE);
@@ -2200,7 +2203,6 @@ handle_signals(void)
 		log_reset("multipathd");
 		pthread_mutex_unlock(&logq_lock);
 	}
-	exit_sig = 0;
 	reconfig_sig = 0;
 	log_reset_sig = 0;
 }
diff --git a/multipathd/main.h b/multipathd/main.h
index ededdaba32fe..e8140feaf291 100644
--- a/multipathd/main.h
+++ b/multipathd/main.h
@@ -38,6 +38,6 @@ int mpath_pr_event_handle(struct path *pp);
 void * mpath_pr_event_handler_fn (void * );
 int update_map_pr(struct multipath *mpp);
 void * mpath_pr_event_handler_fn (void * pathp );
-void handle_signals(void);
+void handle_signals(bool);
 
 #endif /* MAIN_H */
diff --git a/multipathd/uxlsnr.c b/multipathd/uxlsnr.c
index 98ac25a68c43..dc116cf2515b 100644
--- a/multipathd/uxlsnr.c
+++ b/multipathd/uxlsnr.c
@@ -221,9 +221,10 @@ void * uxsock_listen(uxsock_trigger_fn uxsock_trigger, void * trigger_data)
 		/* most of our life is spent in this call */
 		poll_count = ppoll(polls, i, &sleep_time, &mask);
 
+		handle_signals(false);
 		if (poll_count == -1) {
 			if (errno == EINTR) {
-				handle_signals();
+				handle_signals(true);
 				continue;
 			}
 
@@ -233,7 +234,7 @@ void * uxsock_listen(uxsock_trigger_fn uxsock_trigger, void * trigger_data)
 		}
 
 		if (poll_count == 0) {
-			handle_signals();
+			handle_signals(true);
 			continue;
 		}
 
@@ -292,6 +293,8 @@ void * uxsock_listen(uxsock_trigger_fn uxsock_trigger, void * trigger_data)
 				FREE(inbuf);
 			}
 		}
+		/* see if we got a non-fatal signal */
+		handle_signals(true);
 
 		/* see if we got a new client */
 		if (polls[0].revents & POLLIN) {
-- 
2.16.1

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] multipath-tools: handle exit signal immediately
  2018-01-30 14:16 ` [PATCH v2] " Martin Wilck
@ 2018-01-31  7:43   ` Chongyun Wu
  2018-01-31  9:40     ` Martin Wilck
  2018-02-01 16:08   ` Benjamin Marzinski
  1 sibling, 1 reply; 8+ messages in thread
From: Chongyun Wu @ 2018-01-31  7:43 UTC (permalink / raw)
  To: Martin Wilck, Christophe Varoqui
  Cc: Xose Vazquez Perez, Guozhonghua, dm-devel, Changwei Ge, Changlimin

Hi Martin,

My patch have tested but your patch seems to be more gracefully. May be 
there are some small issues if I am not wrong.Thanks.

On 2018/1/30 22:18, Martin Wilck wrote:
> multipathd shouldn't try to service any more client connections
> when it receives an exit signal. Moreover, ppoll() can return
> success even if signals occured. So check for reconfigure or
> log_reset signals after handling pending client requests.
> 
> Based on an analysis by Chongyun Wu.
> 
> Reported-by: Chongyun Wu <wu.chongyun@h3c.com>
> Signed-off-by: Martin Wilck <mwilck@suse.com>
> ---
>   multipathd/main.c   | 6 ++++--
>   multipathd/main.h   | 2 +-
>   multipathd/uxlsnr.c | 7 +++++--
>   3 files changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/multipathd/main.c b/multipathd/main.c
> index 27cf234623d0..8e96f5dd2d7f 100644
> --- a/multipathd/main.c
> +++ b/multipathd/main.c
> @@ -2184,12 +2184,15 @@ signal_set(int signo, void (*func) (int))
>   }
>   
>   void
> -handle_signals(void)
> +handle_signals(bool nonfatal)
>   {
>   	if (exit_sig) {
>   		condlog(2, "exit (signal)");
> +		exit_sig = 0;

If we received a exit signal should we just escape all below commands 
process to avoid the race condition between thread cancellation and 
commands processing? so maybe flag exit_sig keep to 1 here?

>   		exit_daemon();
>   	}
> +	if (!nonfatal)
> +		return;
>   	if (reconfig_sig) {
>   		condlog(2, "reconfigure (signal)");
>   		set_config_state(DAEMON_CONFIGURE);
> @@ -2200,7 +2203,6 @@ handle_signals(void)
>   		log_reset("multipathd");
>   		pthread_mutex_unlock(&logq_lock);
>   	}
> -	exit_sig = 0;
>   	reconfig_sig = 0;
>   	log_reset_sig = 0;
>   }
> diff --git a/multipathd/main.h b/multipathd/main.h
> index ededdaba32fe..e8140feaf291 100644
> --- a/multipathd/main.h
> +++ b/multipathd/main.h
> @@ -38,6 +38,6 @@ int mpath_pr_event_handle(struct path *pp);
>   void * mpath_pr_event_handler_fn (void * );
>   int update_map_pr(struct multipath *mpp);
>   void * mpath_pr_event_handler_fn (void * pathp );
> -void handle_signals(void);
> +void handle_signals(bool);
>   
>   #endif /* MAIN_H */
> diff --git a/multipathd/uxlsnr.c b/multipathd/uxlsnr.c
> index 98ac25a68c43..dc116cf2515b 100644
> --- a/multipathd/uxlsnr.c
> +++ b/multipathd/uxlsnr.c
> @@ -221,9 +221,10 @@ void * uxsock_listen(uxsock_trigger_fn uxsock_trigger, void * trigger_data)
>   		/* most of our life is spent in this call */
>   		poll_count = ppoll(polls, i, &sleep_time, &mask);
>   
> +		handle_signals(false);
Is it better to add and check the return value from handle_signals()? If 
return true means this a exit signal should run continue to escape all 
below processing.
If not doing that, we may need to add pthread_cleanup_push(cleanup_lock, 
&client_lock) before pthread_mutex_lock(&client_lock) in below command 
processing to avid dead lock which happen when commands processing code 
just have the lock and cancellation works to call uxsock_cleanup hook to 
get the lock and the commands processing code have no chance to release 
the lock.

>   		if (poll_count == -1) {
>   			if (errno == EINTR) {
> -				handle_signals();
> +				handle_signals(true);
>   				continue;
>   			}
>   
> @@ -233,7 +234,7 @@ void * uxsock_listen(uxsock_trigger_fn uxsock_trigger, void * trigger_data)
>   		}
>   
>   		if (poll_count == 0) {
> -			handle_signals();
> +			handle_signals(true);
>   			continue;
>   		}
>   
> @@ -292,6 +293,8 @@ void * uxsock_listen(uxsock_trigger_fn uxsock_trigger, void * trigger_data)
>   				FREE(inbuf);
>   			}
>   		}
> +		/* see if we got a non-fatal signal */
> +		handle_signals(true);
above code maybe not needed. Because it's safe to make sure no commands 
be processed then to handle nonefatal signal, like code "if(poll_count 
==0){...}" do. If here to call hanle_signals(true) again, may facing 
race condition between signals processing and command processing. That 
maybe not so safe. And the handle_signals have been called above. Is 
there a special reason for this which I missed?

>   
>   		/* see if we got a new client */
>   		if (polls[0].revents & POLLIN) {
> 
Chongyun Wu

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] multipath-tools: handle exit signal immediately
  2018-01-31  7:43   ` Chongyun Wu
@ 2018-01-31  9:40     ` Martin Wilck
  2018-01-31 11:59       ` Chongyun Wu
  0 siblings, 1 reply; 8+ messages in thread
From: Martin Wilck @ 2018-01-31  9:40 UTC (permalink / raw)
  To: Chongyun Wu, Christophe Varoqui
  Cc: Xose Vazquez Perez, Guozhonghua, dm-devel, Changwei Ge, Changlimin

On Wed, 2018-01-31 at 07:43 +0000, Chongyun Wu wrote:
> Hi Martin,
> 
> My patch have tested but your patch seems to be more gracefully. May
> be 
> there are some small issues if I am not wrong.Thanks.
> 
> On 2018/1/30 22:18, Martin Wilck wrote:
> > multipathd shouldn't try to service any more client connections
> > when it receives an exit signal. Moreover, ppoll() can return
> > success even if signals occured. So check for reconfigure or
> > log_reset signals after handling pending client requests.
> > 
> > Based on an analysis by Chongyun Wu.
> > 
> > Reported-by: Chongyun Wu <wu.chongyun@h3c.com>
> > Signed-off-by: Martin Wilck <mwilck@suse.com>
> > ---
> >   multipathd/main.c   | 6 ++++--
> >   multipathd/main.h   | 2 +-
> >   multipathd/uxlsnr.c | 7 +++++--
> >   3 files changed, 10 insertions(+), 5 deletions(-)
> > 
> > diff --git a/multipathd/main.c b/multipathd/main.c
> > index 27cf234623d0..8e96f5dd2d7f 100644
> > --- a/multipathd/main.c
> > +++ b/multipathd/main.c
> > @@ -2184,12 +2184,15 @@ signal_set(int signo, void (*func) (int))
> >   }
> >   
> >   void
> > -handle_signals(void)
> > +handle_signals(bool nonfatal)
> >   {
> >   	if (exit_sig) {
> >   		condlog(2, "exit (signal)");
> > +		exit_sig = 0;
> 
> If we received a exit signal should we just escape all below
> commands 
> process to avoid the race condition between thread cancellation and 
> commands processing? so maybe flag exit_sig keep to 1 here?

What would that be good for? We call exit_daemon(), signalling the main
process to shut down. If we don't reset exit_sig, we'll call
exit_daemon() again if handle_signals is ever run again. That's not
necessary and might actually confuse matters.

Or am I overlooking something?

> 
> >   		exit_daemon();
> >   	}
> > +	if (!nonfatal)
> > +		return;
> >   	if (reconfig_sig) {
> >   		condlog(2, "reconfigure (signal)");
> >   		set_config_state(DAEMON_CONFIGURE);
> > @@ -2200,7 +2203,6 @@ handle_signals(void)
> >   		log_reset("multipathd");
> >   		pthread_mutex_unlock(&logq_lock);
> >   	}
> > -	exit_sig = 0;
> >   	reconfig_sig = 0;
> >   	log_reset_sig = 0;
> >   }
> > diff --git a/multipathd/main.h b/multipathd/main.h
> > index ededdaba32fe..e8140feaf291 100644
> > --- a/multipathd/main.h
> > +++ b/multipathd/main.h
> > @@ -38,6 +38,6 @@ int mpath_pr_event_handle(struct path *pp);
> >   void * mpath_pr_event_handler_fn (void * );
> >   int update_map_pr(struct multipath *mpp);
> >   void * mpath_pr_event_handler_fn (void * pathp );
> > -void handle_signals(void);
> > +void handle_signals(bool);
> >   
> >   #endif /* MAIN_H */
> > diff --git a/multipathd/uxlsnr.c b/multipathd/uxlsnr.c
> > index 98ac25a68c43..dc116cf2515b 100644
> > --- a/multipathd/uxlsnr.c
> > +++ b/multipathd/uxlsnr.c
> > @@ -221,9 +221,10 @@ void * uxsock_listen(uxsock_trigger_fn
> > uxsock_trigger, void * trigger_data)
> >   		/* most of our life is spent in this call */
> >   		poll_count = ppoll(polls, i, &sleep_time, &mask);
> >   
> > +		handle_signals(false);
> 
> Is it better to add and check the return value from handle_signals()?
> If 
> return true means this a exit signal should run continue to escape
> all 
> below processing.
> If not doing that, we may need to add
> pthread_cleanup_push(cleanup_lock, 
> &client_lock) before pthread_mutex_lock(&client_lock) in below
> command 
> processing to avid dead lock which happen when commands processing
> code 
> just have the lock and cancellation works to call uxsock_cleanup hook
> to 
> get the lock and the commands processing code have no chance to
> release 
> the lock.

AFAICS it isn't necessary, because the two code parts where the lock is
taken in the uxsock_listen() don't contain cancellation points. If I'm
overlooking something here and pthread_cleanup_push(&client_lock) has
to be used in the listener thread, that would be an independent fix.
  
> > @@ -292,6 +293,8 @@ void * uxsock_listen(uxsock_trigger_fn
> > uxsock_trigger, void * trigger_data)
> >   				FREE(inbuf);
> >   			}
> >   		}
> > +		/* see if we got a non-fatal signal */
> > +		handle_signals(true);
> 
> above code maybe not needed. Because it's safe to make sure no
> commands 
> be processed then to handle nonefatal signal, like code
> "if(poll_count 
> ==0){...}" do. If here to call hanle_signals(true) again, may facing 
> race condition between signals processing and command processing. 

Do you see an actual race condition or just the potential to confuse
connected clients?

> That 
> maybe not so safe. And the handle_signals have been called above. Is 
> there a special reason for this which I missed?

Of the handle_signals() calls above, the first one handles only fatal
signals, and the others are only called if ppoll() returns -1 or 0.

We have two non-fatal signals, "log_reset" and "reconfigure".
"log_reset" can be safely ignored for this discussion. "reconfigure"
may admittedly confuse connected clients if it's called while a session
is active.

The loop over the clients makes sure that commands from clients that
are seen before the signal are received and processed before the
handle_signals() call.

The question is what to do with client sessions that remain open after
the loop has finished (i.e. the current command has been handled). IMO
that's not the common case, more often than not client connections just
send a single command. But of course, it can't be excluded either.

If we don't call handle_signals() at this point, and clients continue
sending commands, ppoll() won't return -1 or 0, and the "reconfigure"
signal might not be processed in a long time, which seems wrong to me.
After all, signals are meant to be processed asynchronously. Note that
if one of the clients sends a "reconfigure" command, it will be handled
immediately, possibly confusing other clients connected at the same
time. Therefore I think it's correct to handle a possibly received
"reconfigure" signal at this point in the code, too.

It might generallly make sense for the uxlsnr thread to pause servicing
commands while the daemon is not in DAEMON_RUNNING or DAEMON_IDLE
state. But that's independent of the current discussion.

Summarizing, unless you or someone else points out an actual race
condition (in the sense that internal multipathd data structures might
be corrupted), which is introduced by this particular handle_signals()
call, I'd prefer adding it.

Martin


-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] multipath-tools: handle exit signal immediately
  2018-01-31  9:40     ` Martin Wilck
@ 2018-01-31 11:59       ` Chongyun Wu
  0 siblings, 0 replies; 8+ messages in thread
From: Chongyun Wu @ 2018-01-31 11:59 UTC (permalink / raw)
  To: Martin Wilck, Christophe Varoqui
  Cc: Xose Vazquez Perez, Guozhonghua, dm-devel, Changwei Ge, Changlimin

On 2018/1/31 17:40, Martin Wilck wrote:
> On Wed, 2018-01-31 at 07:43 +0000, Chongyun Wu wrote:
>> Hi Martin,
>>
>> My patch have tested but your patch seems to be more gracefully. May
>> be
>> there are some small issues if I am not wrong.Thanks.
>>
>> On 2018/1/30 22:18, Martin Wilck wrote:
>>> multipathd shouldn't try to service any more client connections
>>> when it receives an exit signal. Moreover, ppoll() can return
>>> success even if signals occured. So check for reconfigure or
>>> log_reset signals after handling pending client requests.
>>>
>>> Based on an analysis by Chongyun Wu.
>>>
>>> Reported-by: Chongyun Wu <wu.chongyun@h3c.com>
>>> Signed-off-by: Martin Wilck <mwilck@suse.com>
>>> ---
>>>    multipathd/main.c   | 6 ++++--
>>>    multipathd/main.h   | 2 +-
>>>    multipathd/uxlsnr.c | 7 +++++--
>>>    3 files changed, 10 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/multipathd/main.c b/multipathd/main.c
>>> index 27cf234623d0..8e96f5dd2d7f 100644
>>> --- a/multipathd/main.c
>>> +++ b/multipathd/main.c
>>> @@ -2184,12 +2184,15 @@ signal_set(int signo, void (*func) (int))
>>>    }
>>>    
>>>    void
>>> -handle_signals(void)
>>> +handle_signals(bool nonfatal)
>>>    {
>>>    	if (exit_sig) {
>>>    		condlog(2, "exit (signal)");
>>> +		exit_sig = 0;
>>
>> If we received a exit signal should we just escape all below
>> commands
>> process to avoid the race condition between thread cancellation and
>> commands processing? so maybe flag exit_sig keep to 1 here?
> 
> What would that be good for? We call exit_daemon(), signalling the main
> process to shut down. If we don't reset exit_sig, we'll call
> exit_daemon() again if handle_signals is ever run again. That's not
> necessary and might actually confuse matters.
> 
> Or am I overlooking something?

You are right. I just want to escape below code process to make logic 
simple but as you see really have problem in this way.

> 
>>
>>>    		exit_daemon();
>>>    	}
>>> +	if (!nonfatal)
>>> +		return;
>>>    	if (reconfig_sig) {
>>>    		condlog(2, "reconfigure (signal)");
>>>    		set_config_state(DAEMON_CONFIGURE);
>>> @@ -2200,7 +2203,6 @@ handle_signals(void)
>>>    		log_reset("multipathd");
>>>    		pthread_mutex_unlock(&logq_lock);
>>>    	}
>>> -	exit_sig = 0;
>>>    	reconfig_sig = 0;
>>>    	log_reset_sig = 0;
>>>    }
>>> diff --git a/multipathd/main.h b/multipathd/main.h
>>> index ededdaba32fe..e8140feaf291 100644
>>> --- a/multipathd/main.h
>>> +++ b/multipathd/main.h
>>> @@ -38,6 +38,6 @@ int mpath_pr_event_handle(struct path *pp);
>>>    void * mpath_pr_event_handler_fn (void * );
>>>    int update_map_pr(struct multipath *mpp);
>>>    void * mpath_pr_event_handler_fn (void * pathp );
>>> -void handle_signals(void);
>>> +void handle_signals(bool);
>>>    
>>>    #endif /* MAIN_H */
>>> diff --git a/multipathd/uxlsnr.c b/multipathd/uxlsnr.c
>>> index 98ac25a68c43..dc116cf2515b 100644
>>> --- a/multipathd/uxlsnr.c
>>> +++ b/multipathd/uxlsnr.c
>>> @@ -221,9 +221,10 @@ void * uxsock_listen(uxsock_trigger_fn
>>> uxsock_trigger, void * trigger_data)
>>>    		/* most of our life is spent in this call */
>>>    		poll_count = ppoll(polls, i, &sleep_time, &mask);
>>>    
>>> +		handle_signals(false);
>>
>> Is it better to add and check the return value from handle_signals()?
>> If
>> return true means this a exit signal should run continue to escape
>> all
>> below processing.
>> If not doing that, we may need to add
>> pthread_cleanup_push(cleanup_lock,
>> &client_lock) before pthread_mutex_lock(&client_lock) in below
>> command
>> processing to avid dead lock which happen when commands processing
>> code
>> just have the lock and cancellation works to call uxsock_cleanup hook
>> to
>> get the lock and the commands processing code have no chance to
>> release
>> the lock.
> 
> AFAICS it isn't necessary, because the two code parts where the lock is
> taken in the uxsock_listen() don't contain cancellation points. If I'm
> overlooking something here and pthread_cleanup_push(&client_lock) has
> to be used in the listener thread, that would be an independent fix.
>    
>>> @@ -292,6 +293,8 @@ void * uxsock_listen(uxsock_trigger_fn
>>> uxsock_trigger, void * trigger_data)
>>>    				FREE(inbuf);
>>>    			}
>>>    		}
>>> +		/* see if we got a non-fatal signal */
>>> +		handle_signals(true);
>>
>> above code maybe not needed. Because it's safe to make sure no
>> commands
>> be processed then to handle nonefatal signal, like code
>> "if(poll_count
>> ==0){...}" do. If here to call hanle_signals(true) again, may facing
>> race condition between signals processing and command processing.
> 
> Do you see an actual race condition or just the potential to confuse
> connected clients?
> 
>> That
>> maybe not so safe. And the handle_signals have been called above. Is
>> there a special reason for this which I missed?
> 
> Of the handle_signals() calls above, the first one handles only fatal
> signals, and the others are only called if ppoll() returns -1 or 0.
> 
> We have two non-fatal signals, "log_reset" and "reconfigure".
> "log_reset" can be safely ignored for this discussion. "reconfigure"
> may admittedly confuse connected clients if it's called while a session
> is active.
> 
> The loop over the clients makes sure that commands from clients that
> are seen before the signal are received and processed before the
> handle_signals() call.
> 
> The question is what to do with client sessions that remain open after
> the loop has finished (i.e. the current command has been handled). IMO
> that's not the common case, more often than not client connections just
> send a single command. But of course, it can't be excluded either.
> 
> If we don't call handle_signals() at this point, and clients continue
> sending commands, ppoll() won't return -1 or 0, and the "reconfigure"
> signal might not be processed in a long time, which seems wrong to me.
> After all, signals are meant to be processed asynchronously. Note that
> if one of the clients sends a "reconfigure" command, it will be handled
> immediately, possibly confusing other clients connected at the same
> time. Therefore I think it's correct to handle a possibly received
> "reconfigure" signal at this point in the code, too.
Your consideration sounds reasonable.

> 
> It might generallly make sense for the uxlsnr thread to pause servicing
> commands while the daemon is not in DAEMON_RUNNING or DAEMON_IDLE
> state. But that's independent of the current discussion.
> 
> Summarizing, unless you or someone else points out an actual race
> condition (in the sense that internal multipathd data structures might
> be corrupted), which is introduced by this particular handle_signals()
> call, I'd prefer adding it.
Yes, I haven't found any actual race condition just afraid of that. I 
think the patch might be safe after your explanation.

> 
> Martin
> 
> 

Thanks
Chongyun Wu

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] multipath-tools: handle exit signal immediately
  2018-01-30 14:16 ` [PATCH v2] " Martin Wilck
  2018-01-31  7:43   ` Chongyun Wu
@ 2018-02-01 16:08   ` Benjamin Marzinski
  1 sibling, 0 replies; 8+ messages in thread
From: Benjamin Marzinski @ 2018-02-01 16:08 UTC (permalink / raw)
  To: Martin Wilck
  Cc: Xose Vazquez Perez, Guozhonghua, dm-devel, Changwei Ge,
	Changlimin, Chongyun Wu

On Tue, Jan 30, 2018 at 03:16:24PM +0100, Martin Wilck wrote:
> multipathd shouldn't try to service any more client connections
> when it receives an exit signal. Moreover, ppoll() can return
> success even if signals occured. So check for reconfigure or
> log_reset signals after handling pending client requests.
> 
> Based on an analysis by Chongyun Wu.
> 
> Reported-by: Chongyun Wu <wu.chongyun@h3c.com>
> Signed-off-by: Martin Wilck <mwilck@suse.com>
> ---
>  multipathd/main.c   | 6 ++++--
>  multipathd/main.h   | 2 +-
>  multipathd/uxlsnr.c | 7 +++++--
>  3 files changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/multipathd/main.c b/multipathd/main.c
> index 27cf234623d0..8e96f5dd2d7f 100644
> --- a/multipathd/main.c
> +++ b/multipathd/main.c
> @@ -2184,12 +2184,15 @@ signal_set(int signo, void (*func) (int))
>  }
>  
>  void
> -handle_signals(void)
> +handle_signals(bool nonfatal)
>  {
>  	if (exit_sig) {
>  		condlog(2, "exit (signal)");
> +		exit_sig = 0;
>  		exit_daemon();
>  	}
> +	if (!nonfatal)
> +		return;
>  	if (reconfig_sig) {
>  		condlog(2, "reconfigure (signal)");
>  		set_config_state(DAEMON_CONFIGURE);
> @@ -2200,7 +2203,6 @@ handle_signals(void)
>  		log_reset("multipathd");
>  		pthread_mutex_unlock(&logq_lock);
>  	}
> -	exit_sig = 0;
>  	reconfig_sig = 0;
>  	log_reset_sig = 0;
>  }
> diff --git a/multipathd/main.h b/multipathd/main.h
> index ededdaba32fe..e8140feaf291 100644
> --- a/multipathd/main.h
> +++ b/multipathd/main.h
> @@ -38,6 +38,6 @@ int mpath_pr_event_handle(struct path *pp);
>  void * mpath_pr_event_handler_fn (void * );
>  int update_map_pr(struct multipath *mpp);
>  void * mpath_pr_event_handler_fn (void * pathp );
> -void handle_signals(void);
> +void handle_signals(bool);
>  
>  #endif /* MAIN_H */
> diff --git a/multipathd/uxlsnr.c b/multipathd/uxlsnr.c
> index 98ac25a68c43..dc116cf2515b 100644
> --- a/multipathd/uxlsnr.c
> +++ b/multipathd/uxlsnr.c
> @@ -221,9 +221,10 @@ void * uxsock_listen(uxsock_trigger_fn uxsock_trigger, void * trigger_data)
>  		/* most of our life is spent in this call */
>  		poll_count = ppoll(polls, i, &sleep_time, &mask);
>  
> +		handle_signals(false);
>  		if (poll_count == -1) {
>  			if (errno == EINTR) {
> -				handle_signals();
> +				handle_signals(true);
>  				continue;
>  			}
>  
> @@ -233,7 +234,7 @@ void * uxsock_listen(uxsock_trigger_fn uxsock_trigger, void * trigger_data)
>  		}
>  
>  		if (poll_count == 0) {
> -			handle_signals();
> +			handle_signals(true);
>  			continue;
>  		}
>  
> @@ -292,6 +293,8 @@ void * uxsock_listen(uxsock_trigger_fn uxsock_trigger, void * trigger_data)
>  				FREE(inbuf);
>  			}
>  		}
> +		/* see if we got a non-fatal signal */
> +		handle_signals(true);
>  
>  		/* see if we got a new client */
>  		if (polls[0].revents & POLLIN) {
> -- 
> 2.16.1

Looks good to me.

Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2018-02-01 16:08 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-30  9:58 [PATCH]multipathd: threads cancellation can't work if new poll arrival just after restart multipath Wuchongyun
2018-01-30 13:43 ` Martin Wilck
2018-01-30 13:47 ` [PATCH] multipath-tools: handle exit signal immediately Martin Wilck
2018-01-30 14:16 ` [PATCH v2] " Martin Wilck
2018-01-31  7:43   ` Chongyun Wu
2018-01-31  9:40     ` Martin Wilck
2018-01-31 11:59       ` Chongyun Wu
2018-02-01 16:08   ` Benjamin Marzinski

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.