All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/2] libmultipath: fallback to const prio if getprio() fails
@ 2018-03-20 16:50 Martin Wilck
  2018-03-20 16:50 ` [PATCH 2/2] multipathd: handle errors in uxlsnr as fatal Martin Wilck
  2018-03-20 21:13 ` [PATCH 1/2] libmultipath: fallback to const prio if getprio() fails Martin Wilck
  0 siblings, 2 replies; 6+ messages in thread
From: Martin Wilck @ 2018-03-20 16:50 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: dm-devel, Martin Wilck

Some storage controllers don't support ALUA even though they have
hwtable entries telling so. One such example is the IBM IPR SAS controller
found on certain IBM Power systems. IBM has confirmed that reliable
detection of ALUA support for this controller by vendor/model/rev is
impossible.

For such cases, instead of simply failing to setup multipath, fall back
to const prio.

Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 libmultipath/discovery.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/libmultipath/discovery.c b/libmultipath/discovery.c
index 9f2a9c907914..a51eb0b82f53 100644
--- a/libmultipath/discovery.c
+++ b/libmultipath/discovery.c
@@ -1637,9 +1637,26 @@ get_prio (struct path * pp)
 	pp->priority = prio_getprio(p, pp, conf->checker_timeout);
 	put_multipath_config(conf);
 	if (pp->priority < 0) {
-		condlog(3, "%s: %s prio error", pp->dev, prio_name(p));
-		pp->priority = PRIO_UNDEF;
-		return 1;
+		const char origin[] = "(fallback after error)";
+
+		if (strcmp(prio_name(p), DEFAULT_PRIO)) {
+			conf = get_multipath_config();
+			prio_get(conf->multipath_dir, p, DEFAULT_PRIO,
+				 DEFAULT_PRIO_ARGS);
+			pp->priority = prio_getprio(p, pp,
+						    conf->checker_timeout);
+			put_multipath_config(conf);
+		}
+
+		if (pp->priority < 0) {
+			condlog(1, "%s: %s prio error", pp->dev, prio_name(p));
+			pp->priority = PRIO_UNDEF;
+			return 1;
+		}
+
+		condlog(3, "%s: new prio = %s %s", pp->dev, prio_name(p), origin);
+		condlog(3, "%s: new prio args = \"%s\" %s", pp->dev, prio_args(p),
+			origin);
 	}
 	condlog(3, "%s: %s prio = %u",
 		pp->dev, prio_name(p), pp->priority);
-- 
2.16.1

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/2] multipathd: handle errors in uxlsnr as fatal
  2018-03-20 16:50 [PATCH 1/2] libmultipath: fallback to const prio if getprio() fails Martin Wilck
@ 2018-03-20 16:50 ` Martin Wilck
  2018-03-21  2:43   ` Chongyun Wu
  2018-03-22 23:31   ` Benjamin Marzinski
  2018-03-20 21:13 ` [PATCH 1/2] libmultipath: fallback to const prio if getprio() fails Martin Wilck
  1 sibling, 2 replies; 6+ messages in thread
From: Martin Wilck @ 2018-03-20 16:50 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: dm-devel, Martin Wilck

The ppoll() calls of the uxlsnr thread are vital for proper functioning of
multipathd. If the uxlsnr thread can't open the socket or fails to call ppoll()
for other reasons, quit the daemon. If we don't do that, multipathd may
hang in a state where it can't be terminated any more, because the uxlsnr
thread is responsible for handling all signals. This happens e.g. if
systemd's multipathd.socket is running in and multipathd is started from
outside systemd.

24f2844 "multipathd: fix signal blocking logic" has made this problem more
severe. Before that patch, the signals weren't actually blocked in any thread.
That's not to say 24f2844 was wrong. I still think it's correct, we just
need this one on top.

Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 multipathd/uxlsnr.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/multipathd/uxlsnr.c b/multipathd/uxlsnr.c
index cdafd82943e7..6f666663fc6f 100644
--- a/multipathd/uxlsnr.c
+++ b/multipathd/uxlsnr.c
@@ -178,7 +178,7 @@ void * uxsock_listen(uxsock_trigger_fn uxsock_trigger, void * trigger_data)
 
 	if (ux_sock == -1) {
 		condlog(1, "could not create uxsock: %d", errno);
-		return NULL;
+		exit_daemon();
 	}
 
 	pthread_cleanup_push(uxsock_cleanup, (void *)ux_sock);
@@ -187,7 +187,7 @@ void * uxsock_listen(uxsock_trigger_fn uxsock_trigger, void * trigger_data)
 	polls = (struct pollfd *)MALLOC((MIN_POLLS + 1) * sizeof(struct pollfd));
 	if (!polls) {
 		condlog(0, "uxsock: failed to allocate poll fds");
-		return NULL;
+		exit_daemon();
 	}
 	sigfillset(&mask);
 	sigdelset(&mask, SIGINT);
@@ -249,6 +249,7 @@ void * uxsock_listen(uxsock_trigger_fn uxsock_trigger, void * trigger_data)
 
 			/* something went badly wrong! */
 			condlog(0, "uxsock: poll failed with %d", errno);
+			exit_daemon();
 			break;
 		}
 
-- 
2.16.1

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/2] libmultipath: fallback to const prio if getprio() fails
  2018-03-20 16:50 [PATCH 1/2] libmultipath: fallback to const prio if getprio() fails Martin Wilck
  2018-03-20 16:50 ` [PATCH 2/2] multipathd: handle errors in uxlsnr as fatal Martin Wilck
@ 2018-03-20 21:13 ` Martin Wilck
  1 sibling, 0 replies; 6+ messages in thread
From: Martin Wilck @ 2018-03-20 21:13 UTC (permalink / raw)
  To: Christophe Varoqui; +Cc: dm-devel

On Tue, 2018-03-20 at 17:50 +0100, Martin Wilck wrote:
> Some storage controllers don't support ALUA even though they have
> hwtable entries telling so. One such example is the IBM IPR SAS
> controller
> found on certain IBM Power systems. IBM has confirmed that reliable
> detection of ALUA support for this controller by vendor/model/rev is
> impossible.
> 
> For such cases, instead of simply failing to setup multipath, fall
> back
> to const prio.
> 
> Signed-off-by: Martin Wilck <mwilck@suse.com>

NACK to self. I just realized that this won't work, because we might be
resetting e.g. alua prioritizer to "const" because of a temporary
problem. So this needs more thought.

Martin

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] multipathd: handle errors in uxlsnr as fatal
  2018-03-20 16:50 ` [PATCH 2/2] multipathd: handle errors in uxlsnr as fatal Martin Wilck
@ 2018-03-21  2:43   ` Chongyun Wu
  2018-03-21  7:48     ` Martin Wilck
  2018-03-22 23:31   ` Benjamin Marzinski
  1 sibling, 1 reply; 6+ messages in thread
From: Chongyun Wu @ 2018-03-21  2:43 UTC (permalink / raw)
  To: Martin Wilck, Christophe Varoqui; +Cc: dm-devel

On 2018/3/21 0:51, Martin Wilck wrote:
> The ppoll() calls of the uxlsnr thread are vital for proper functioning of
> multipathd. If the uxlsnr thread can't open the socket or fails to call ppoll()
> for other reasons, quit the daemon. If we don't do that, multipathd may
> hang in a state where it can't be terminated any more, because the uxlsnr
> thread is responsible for handling all signals. This happens e.g. if
> systemd's multipathd.socket is running in and multipathd is started from
> outside systemd.
> 
> 24f2844 "multipathd: fix signal blocking logic" has made this problem more
> severe. Before that patch, the signals weren't actually blocked in any thread.
> That's not to say 24f2844 was wrong. I still think it's correct, we just
> need this one on top.
> 
> Signed-off-by: Martin Wilck <mwilck@suse.com>
> ---
>   multipathd/uxlsnr.c | 5 +++--
>   1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/multipathd/uxlsnr.c b/multipathd/uxlsnr.c
> index cdafd82943e7..6f666663fc6f 100644
> --- a/multipathd/uxlsnr.c
> +++ b/multipathd/uxlsnr.c
> @@ -178,7 +178,7 @@ void * uxsock_listen(uxsock_trigger_fn uxsock_trigger, void * trigger_data)
>   
>   	if (ux_sock == -1) {
>   		condlog(1, "could not create uxsock: %d", errno);
> -		return NULL;
> +		exit_daemon();
>   	}
>   
>   	pthread_cleanup_push(uxsock_cleanup, (void *)ux_sock);
> @@ -187,7 +187,7 @@ void * uxsock_listen(uxsock_trigger_fn uxsock_trigger, void * trigger_data)
>   	polls = (struct pollfd *)MALLOC((MIN_POLLS + 1) * sizeof(struct pollfd));
>   	if (!polls) {
>   		condlog(0, "uxsock: failed to allocate poll fds");
> -		return NULL;
> +		exit_daemon();
>   	}
>   	sigfillset(&mask);
>   	sigdelset(&mask, SIGINT);
> @@ -249,6 +249,7 @@ void * uxsock_listen(uxsock_trigger_fn uxsock_trigger, void * trigger_data)
>   
>   			/* something went badly wrong! */
>   			condlog(0, "uxsock: poll failed with %d", errno);
> +			exit_daemon();
>   			break;
>   		}
Hi Martin,

Your analysis is reasonable. It is necessary to deal with fatal error 
not only to return, if not doing this multipathd can't exit normally and 
multipathd commands can't work any more. I think your patch is OK, but I 
have some ideas inspired by your patch.
Calling exit_daemon() is to shut down the multipathd, relay on the 
outside to pull multipathd again. Is there a function can be use to deal 
with fatal error? Its function are close the socket(if create 
successfully before) and create a new socket to make uxlsnr thread work 
properly again or continue to create uxsocket? This function actually is 
try to repair those errors.
It just an idea, maybe not quite right.

Regards,
Chongyun

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] multipathd: handle errors in uxlsnr as fatal
  2018-03-21  2:43   ` Chongyun Wu
@ 2018-03-21  7:48     ` Martin Wilck
  0 siblings, 0 replies; 6+ messages in thread
From: Martin Wilck @ 2018-03-21  7:48 UTC (permalink / raw)
  To: Chongyun Wu, Christophe Varoqui; +Cc: dm-devel

On Wed, 2018-03-21 at 02:43 +0000, Chongyun Wu wrote:
> On 2018/3/21 0:51, Martin Wilck wrote:
> > The ppoll() calls of the uxlsnr thread are vital for proper
> > functioning of
> > multipathd. If the uxlsnr thread can't open the socket or fails to
> > call ppoll()
> > for other reasons, quit the daemon. If we don't do that, multipathd
> > may
> > hang in a state where it can't be terminated any more, because the
> > uxlsnr
> > thread is responsible for handling all signals. This happens e.g.
> > if
> > systemd's multipathd.socket is running in and multipathd is started
> > from
> > outside systemd.
> > 
> > 24f2844 "multipathd: fix signal blocking logic" has made this
> > problem more
> > severe. Before that patch, the signals weren't actually blocked in
> > any thread.
> > That's not to say 24f2844 was wrong. I still think it's correct, we
> > just
> > need this one on top.
> > 
> > Signed-off-by: Martin Wilck <mwilck@suse.com>
> > ---
> >   multipathd/uxlsnr.c | 5 +++--
> >   1 file changed, 3 insertions(+), 2 deletions(-)
> > 
> > diff --git a/multipathd/uxlsnr.c b/multipathd/uxlsnr.c
> > index cdafd82943e7..6f666663fc6f 100644
> > --- a/multipathd/uxlsnr.c
> > +++ b/multipathd/uxlsnr.c
> > @@ -178,7 +178,7 @@ void * uxsock_listen(uxsock_trigger_fn
> > uxsock_trigger, void * trigger_data)
> >   
> >   	if (ux_sock == -1) {
> >   		condlog(1, "could not create uxsock: %d", errno);
> > -		return NULL;
> > +		exit_daemon();
> >   	}
> >   
> >   	pthread_cleanup_push(uxsock_cleanup, (void *)ux_sock);
> > @@ -187,7 +187,7 @@ void * uxsock_listen(uxsock_trigger_fn
> > uxsock_trigger, void * trigger_data)
> >   	polls = (struct pollfd *)MALLOC((MIN_POLLS + 1) *
> > sizeof(struct pollfd));
> >   	if (!polls) {
> >   		condlog(0, "uxsock: failed to allocate poll
> > fds");
> > -		return NULL;
> > +		exit_daemon();
> >   	}
> >   	sigfillset(&mask);
> >   	sigdelset(&mask, SIGINT);
> > @@ -249,6 +249,7 @@ void * uxsock_listen(uxsock_trigger_fn
> > uxsock_trigger, void * trigger_data)
> >   
> >   			/* something went badly wrong! */
> >   			condlog(0, "uxsock: poll failed with %d",
> > errno);
> > +			exit_daemon();
> >   			break;
> >   		}
> 
> Hi Martin,
> 
> Your analysis is reasonable. It is necessary to deal with fatal
> error 
> not only to return, if not doing this multipathd can't exit normally
> and 
> multipathd commands can't work any more. I think your patch is OK,
> but I 
> have some ideas inspired by your patch.
> Calling exit_daemon() is to shut down the multipathd, relay on the 
> outside to pull multipathd again. Is there a function can be use to
> deal 
> with fatal error?

I don't think so, that's why I call the error "fatal". But feel free to
come up with a more intelligent solution. In the only case with
practical relevance I've seen so far (socket can't be opened because
it's open in systemd), it's sufficient that multipathd is started via
systemd (this should be done in production anyway), or that the admin
runs "systemctl stop multipathd.socket" before starting the daemon
manually.

In general, I don't think it makes sense to try and be overly smart.
Errors that prevent the ux socket from being opened (with the mentioned
exception) are likely to be so severe that any attempts to work around
them will probably fail as well.

Martin

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] multipathd: handle errors in uxlsnr as fatal
  2018-03-20 16:50 ` [PATCH 2/2] multipathd: handle errors in uxlsnr as fatal Martin Wilck
  2018-03-21  2:43   ` Chongyun Wu
@ 2018-03-22 23:31   ` Benjamin Marzinski
  1 sibling, 0 replies; 6+ messages in thread
From: Benjamin Marzinski @ 2018-03-22 23:31 UTC (permalink / raw)
  To: Martin Wilck; +Cc: dm-devel

On Tue, Mar 20, 2018 at 05:50:10PM +0100, Martin Wilck wrote:
> The ppoll() calls of the uxlsnr thread are vital for proper functioning of
> multipathd. If the uxlsnr thread can't open the socket or fails to call ppoll()
> for other reasons, quit the daemon. If we don't do that, multipathd may
> hang in a state where it can't be terminated any more, because the uxlsnr
> thread is responsible for handling all signals. This happens e.g. if
> systemd's multipathd.socket is running in and multipathd is started from
> outside systemd.
> 
> 24f2844 "multipathd: fix signal blocking logic" has made this problem more
> severe. Before that patch, the signals weren't actually blocked in any thread.
> That's not to say 24f2844 was wrong. I still think it's correct, we just
> need this one on top.
> 
Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
> Signed-off-by: Martin Wilck <mwilck@suse.com>
> ---
>  multipathd/uxlsnr.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/multipathd/uxlsnr.c b/multipathd/uxlsnr.c
> index cdafd82943e7..6f666663fc6f 100644
> --- a/multipathd/uxlsnr.c
> +++ b/multipathd/uxlsnr.c
> @@ -178,7 +178,7 @@ void * uxsock_listen(uxsock_trigger_fn uxsock_trigger, void * trigger_data)
>  
>  	if (ux_sock == -1) {
>  		condlog(1, "could not create uxsock: %d", errno);
> -		return NULL;
> +		exit_daemon();
>  	}
>  
>  	pthread_cleanup_push(uxsock_cleanup, (void *)ux_sock);
> @@ -187,7 +187,7 @@ void * uxsock_listen(uxsock_trigger_fn uxsock_trigger, void * trigger_data)
>  	polls = (struct pollfd *)MALLOC((MIN_POLLS + 1) * sizeof(struct pollfd));
>  	if (!polls) {
>  		condlog(0, "uxsock: failed to allocate poll fds");
> -		return NULL;
> +		exit_daemon();
>  	}
>  	sigfillset(&mask);
>  	sigdelset(&mask, SIGINT);
> @@ -249,6 +249,7 @@ void * uxsock_listen(uxsock_trigger_fn uxsock_trigger, void * trigger_data)
>  
>  			/* something went badly wrong! */
>  			condlog(0, "uxsock: poll failed with %d", errno);
> +			exit_daemon();
>  			break;
>  		}
>  
> -- 
> 2.16.1

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2018-03-22 23:31 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-20 16:50 [PATCH 1/2] libmultipath: fallback to const prio if getprio() fails Martin Wilck
2018-03-20 16:50 ` [PATCH 2/2] multipathd: handle errors in uxlsnr as fatal Martin Wilck
2018-03-21  2:43   ` Chongyun Wu
2018-03-21  7:48     ` Martin Wilck
2018-03-22 23:31   ` Benjamin Marzinski
2018-03-20 21:13 ` [PATCH 1/2] libmultipath: fallback to const prio if getprio() fails Martin Wilck

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.