All of lore.kernel.org
 help / color / mirror / Atom feed
From: Martin Wilck <mwilck@suse.com>
To: Benjamin Marzinski <bmarzins@redhat.com>,
	device-mapper development <dm-devel@redhat.com>
Subject: Re: [PATCH v3 01/19] libmultipath: fix tur checker timeout
Date: Mon, 01 Oct 2018 21:51:22 +0200	[thread overview]
Message-ID: <deb772629c403acaf8c39a0e74f947a7f8b9d2bf.camel@suse.com> (raw)
In-Reply-To: <1537571127-10143-2-git-send-email-bmarzins@redhat.com>

On Fri, 2018-09-21 at 18:05 -0500, Benjamin Marzinski wrote:
> The code previously was timing out mode if ct->thread was 0 but
> ct->running wasn't. This combination never happens.  The idea was to
> timeout if for some reason the path checker tried to kill the thread,
> but it didn't die.  The correct thing to check for this is ct-
> >holders.
> ct->holders will always be at least one when libcheck_check() is
> called,
> since libcheck_free() won't get called until the device is no longer
> being checked. So, if ct->holders is 2, that means that the tur
> thread
> is has not shut down yet.
> 
> Also, instead of returning PATH_TIMEOUT whenever the thread hasn't
> died,
> it should only time out if the thread didn't successfully get a
> value,
> which means the previous state was already PATH_TIMEOUT.

What about PATH_UNCHECKED?

> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
>  libmultipath/checkers/tur.c | 15 +++++++++------
>  1 file changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/libmultipath/checkers/tur.c
> b/libmultipath/checkers/tur.c
> index bf8486d..275541f 100644
> --- a/libmultipath/checkers/tur.c
> +++ b/libmultipath/checkers/tur.c
> @@ -355,12 +355,15 @@ int libcheck_check(struct checker * c)
>  		}
>  		pthread_mutex_unlock(&ct->lock);
>  	} else {
> -		if (uatomic_read(&ct->running) != 0) {
> -			/* pthread cancel failed. continue in sync mode
> */
> -			pthread_mutex_unlock(&ct->lock);
> -			condlog(3, "%s: tur thread not responding",
> -				tur_devt(devt, sizeof(devt), ct));
> -			return PATH_TIMEOUT;
> +		if (uatomic_read(&ct->holders) > 1) {
> +			/* pthread cancel failed. If it didn't get the
> path
> +			   state already, timeout */
> +			if (ct->state == PATH_PENDING) {
> +				pthread_mutex_unlock(&ct->lock);
> +				condlog(3, "%s: tur thread not
> responding",
> +					tur_devt(devt, sizeof(devt),
> ct));
> +				return PATH_TIMEOUT;
> +			}
>  		}

I'm not quite getting it yet. We're entering this code path if 
ct->thread is 0. As you point out, if there are >1 holders, a
previously cancelled TUR thread must be still "alive". But now we want
to check *again*. The previous code refused to do that - a single
hanging TUR thread could block further checks from happening, forever.
The PATH_TIMEOUT return code was actually questionable - in this
libcheck_check() invocation, we didn't even try to check, so nothing
could actually time out (but that's obviously independent of your
patch).

With your patch, if ct->state != PATH_PENDING, we'd try to start
another TUR thread although there's still a hanging one. That's not
necessarily wrong, but I fail to see why it depends on ct->state.
Anyway, if we do this, we take the risk of starting more and more TUR
threads which might all end up hanging; I wonder if we should stop
creating more threads if ct->holders grows further (e.g. > 5).

Regards,
Martin


>  		/* Start new TUR checker */
>  		ct->state = PATH_UNCHECKED;

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

  reply	other threads:[~2018-10-01 19:51 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-09-21 23:05 [PATCH v3 00/19] Misc Multipath patches Benjamin Marzinski
2018-09-21 23:05 ` [PATCH v3 01/19] libmultipath: fix tur checker timeout Benjamin Marzinski
2018-10-01 19:51   ` Martin Wilck [this message]
2018-10-04 16:31     ` Benjamin Marzinski
2018-10-05 10:11       ` Martin Wilck
2018-10-05 17:02         ` Benjamin Marzinski
2018-10-05 19:23           ` Martin Wilck
2018-09-21 23:05 ` [PATCH v3 02/19] libmultipath: fix tur checker double locking Benjamin Marzinski
2018-10-01 20:09   ` Martin Wilck
2018-10-01 20:44     ` Martin Wilck
2018-10-04 16:47       ` Benjamin Marzinski
2018-10-04 16:45     ` Benjamin Marzinski
2018-10-05 10:25       ` Martin Wilck
2018-10-05 17:10         ` Benjamin Marzinski
2018-10-05 19:07           ` Martin Wilck
2018-09-21 23:05 ` [PATCH v3 03/19] libmultipath: fix tur memory misuse Benjamin Marzinski
2018-10-01 20:59   ` Martin Wilck
2018-10-02  7:48     ` Martin Wilck
2018-09-21 23:05 ` [PATCH v3 04/19] libmultipath: cleanup tur locking Benjamin Marzinski
2018-10-01 21:08   ` Martin Wilck
2018-09-21 23:05 ` [PATCH v3 05/19] libmultipath: fix tur checker timeout issue Benjamin Marzinski
2018-10-01 21:09   ` Martin Wilck
2018-09-21 23:05 ` [PATCH v3 06/19] libmultipath: fix set_int error path Benjamin Marzinski
2018-10-01 21:17   ` Martin Wilck
2018-09-21 23:05 ` [PATCH v3 07/19] libmultipath: fix length issues in get_vpd_sgio Benjamin Marzinski
2018-10-01 21:25   ` Martin Wilck
2018-09-21 23:05 ` [PATCH v3 08/19] libmultipath: _install_keyword cleanup Benjamin Marzinski
2018-10-01 21:26   ` Martin Wilck
2018-09-21 23:05 ` [PATCH v3 09/19] libmultipath: remove unused code Benjamin Marzinski
2018-10-01 21:28   ` Martin Wilck
2018-09-21 23:05 ` [PATCH v3 10/19] libmultipath: fix memory issue in path_latency prio Benjamin Marzinski
2018-10-01 21:30   ` Martin Wilck
2018-09-21 23:05 ` [PATCH v3 11/19] libmultipath: fix null dereference int alloc_path_group Benjamin Marzinski
2018-10-01 21:33   ` Martin Wilck
2018-09-21 23:05 ` [PATCH v3 12/19] libmutipath: don't use malformed uevents Benjamin Marzinski
2018-10-01 21:31   ` Martin Wilck
2018-09-21 23:05 ` [PATCH v3 13/19] multipath: fix max array size in print_cmd_valid Benjamin Marzinski
2018-10-01 21:35   ` Martin Wilck
2018-09-21 23:05 ` [PATCH v3 14/19] multipathd: function return value tweaks Benjamin Marzinski
2018-10-01 21:37   ` Martin Wilck
2018-09-21 23:05 ` [PATCH v3 15/19] multipathd: minor fixes Benjamin Marzinski
2018-10-01 21:38   ` Martin Wilck
2018-09-21 23:05 ` [PATCH v3 16/19] multipathd: remove useless check and fix format Benjamin Marzinski
2018-10-01 21:40   ` Martin Wilck
2018-09-21 23:05 ` [PATCH v3 17/19] multipathd: fix memory leak on error in configure Benjamin Marzinski
2018-10-01 21:42   ` Martin Wilck
2018-09-21 23:05 ` [PATCH v3 18/19] libmultipath: Don't blank intialized paths Benjamin Marzinski
2018-10-01 22:00   ` Martin Wilck
2018-10-02 22:37     ` Martin Wilck
2018-10-05 19:38       ` Benjamin Marzinski
2018-10-08  9:41         ` Martin Wilck
2018-10-09 22:20           ` Benjamin Marzinski
2018-10-08  9:35   ` Martin Wilck
2018-09-21 23:05 ` [PATCH v3 19/19] libmultipath: Fixup updating paths Benjamin Marzinski
2018-10-01 22:30   ` Martin Wilck
2018-10-05 20:32     ` Benjamin Marzinski
2018-10-07  8:36 ` [PATCH v3 00/19] Misc Multipath patches Christophe Varoqui
2018-10-09 16:13   ` Benjamin Marzinski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=deb772629c403acaf8c39a0e74f947a7f8b9d2bf.camel@suse.com \
    --to=mwilck@suse.com \
    --cc=bmarzins@redhat.com \
    --cc=dm-devel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.