* [PATCH v2 00/10] various multipath-tools patches @ 2018-10-23 13:43 Martin Wilck 2018-10-23 13:43 ` [PATCH v2 07/10] libmultipath: handle TUR threads that can't be cancelled Martin Wilck ` (4 more replies) 0 siblings, 5 replies; 13+ messages in thread From: Martin Wilck @ 2018-10-23 13:43 UTC (permalink / raw) To: Christophe Varoqui, Benjamin Marzinski; +Cc: dm-devel, Martin Wilck This is v2 of my series from Oct. 10; starting from 7 as the other patches have already been merged. Changes wrt v2: 7/10: handle failure of libcheck_init() (Ben). I decided not to return PATH_TIMEOUT in this case, rationale in my reply to Ben's review. 9/10: new, add the suggestions from Ben's review of 8/8 of v1. 10/10: new, see Ben's comment on patch 5/8 of v1. Martin Wilck (4): libmultipath: handle TUR threads that can't be cancelled multipathd: handle repeated udev retrigger failure multipathd: improve "add missing path" handling multipath.8: fix description of "device" argument libmultipath/checkers/tur.c | 24 ++++++++++++++-- multipath/multipath.8 | 32 +++++++++++---------- multipathd/main.c | 55 +++++++++++++++++++++++++------------ 3 files changed, 76 insertions(+), 35 deletions(-) -- 2.19.1 ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH v2 07/10] libmultipath: handle TUR threads that can't be cancelled 2018-10-23 13:43 [PATCH v2 00/10] various multipath-tools patches Martin Wilck @ 2018-10-23 13:43 ` Martin Wilck 2018-10-23 19:28 ` Benjamin Marzinski 2018-10-23 13:43 ` [PATCH v2 08/10] multipathd: handle repeated udev retrigger failure Martin Wilck ` (3 subsequent siblings) 4 siblings, 1 reply; 13+ messages in thread From: Martin Wilck @ 2018-10-23 13:43 UTC (permalink / raw) To: Christophe Varoqui, Benjamin Marzinski; +Cc: dm-devel, Martin Wilck When the tur checker code determines that a hanging TUR thread couldn't be cancelled, rather than simply returning, reallocate the checker context and start a new thread. This will leak some memory if the hanging thread never wakes up again, but well, in that highly unlikely case we're leaking threads anyway. Signed-off-by: Martin Wilck <mwilck@suse.com> --- libmultipath/checkers/tur.c | 24 +++++++++++++++++++++--- 1 file changed, 21 insertions(+), 3 deletions(-) diff --git a/libmultipath/checkers/tur.c b/libmultipath/checkers/tur.c index 41210892..a6c88eb2 100644 --- a/libmultipath/checkers/tur.c +++ b/libmultipath/checkers/tur.c @@ -349,11 +349,29 @@ int libcheck_check(struct checker * c) } } else { if (uatomic_read(&ct->holders) > 1) { - /* The thread has been cancelled but hasn't - * quit. exit with timeout. */ + /* + * The thread has been cancelled but hasn't quit. + * We have to prevent it from interfering with the new + * thread. We create a new context and leave the old + * one with the stale thread, hoping it will clean up + * eventually. + */ condlog(3, "%d:%d : tur thread not responding", major(ct->devt), minor(ct->devt)); - return PATH_TIMEOUT; + + /* + * libcheck_init will replace c->context. + * It fails only in OOM situations. In this case, return + * PATH_UNCHECKED to avoid prematurely failing the path. + */ + if (libcheck_init(c) != 0) + return PATH_UNCHECKED; + + if (!uatomic_sub_return(&ct->holders, 1)) + /* It did terminate, eventually */ + cleanup_context(ct); + + ct = c->context; } /* Start new TUR checker */ pthread_mutex_lock(&ct->lock); -- 2.19.1 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH v2 07/10] libmultipath: handle TUR threads that can't be cancelled 2018-10-23 13:43 ` [PATCH v2 07/10] libmultipath: handle TUR threads that can't be cancelled Martin Wilck @ 2018-10-23 19:28 ` Benjamin Marzinski 2018-10-23 20:49 ` Martin Wilck 0 siblings, 1 reply; 13+ messages in thread From: Benjamin Marzinski @ 2018-10-23 19:28 UTC (permalink / raw) To: Martin Wilck; +Cc: dm-devel On Tue, Oct 23, 2018 at 03:43:45PM +0200, Martin Wilck wrote: > When the tur checker code determines that a hanging TUR thread > couldn't be cancelled, rather than simply returning, reallocate > the checker context and start a new thread. This will leak some > memory if the hanging thread never wakes up again, but well, in > that highly unlikely case we're leaking threads anyway. The thing about PATH_UNCHECKED is that we do mark the path as failed. We just don't tell device-mapper. If we get PATH_UNCHECKED in pathinfo(), we set the state to PATH_DOWN. If we get a PATH_UNCHECKED in check_path(), we immediately call pathinfo(), making it likely that we will get PATH_UNCHECKED there as well. The consequence of this is that if the path later changes to PATH_DOWN, which seems likely, we still won't tell device-mapper, since as far as multipathd is concerned, the path hasn't changed state. Looking at most of the code, the way we treat PATH_UNCHECKED really only makes sense when we use it to mean we haven't ever gotten a result from get_state() before. If you want a return code that does just does nothing with the path, except wait, that's PATH_PENDING. It leaves the paths state exactly the same as before. The only issue there is that we schedule another path checker for a second later, which might not be the right answer to an out-of-memory issue. If you've reviewed the code paths that we follow on PATH_UNCHECKED, and still feel that it is the right answer, I won't block it, because this is a pretty remote corner case. But I don't like how PATH_UNCHECKED works like PATH_DOWN, but without actually keeping the state synced with the kernel, since the rest of the multipathd code is expecting the state to be synced. > Signed-off-by: Martin Wilck <mwilck@suse.com> > --- > libmultipath/checkers/tur.c | 24 +++++++++++++++++++++--- > 1 file changed, 21 insertions(+), 3 deletions(-) > > diff --git a/libmultipath/checkers/tur.c b/libmultipath/checkers/tur.c > index 41210892..a6c88eb2 100644 > --- a/libmultipath/checkers/tur.c > +++ b/libmultipath/checkers/tur.c > @@ -349,11 +349,29 @@ int libcheck_check(struct checker * c) > } > } else { > if (uatomic_read(&ct->holders) > 1) { > - /* The thread has been cancelled but hasn't > - * quit. exit with timeout. */ > + /* > + * The thread has been cancelled but hasn't quit. > + * We have to prevent it from interfering with the new > + * thread. We create a new context and leave the old > + * one with the stale thread, hoping it will clean up > + * eventually. > + */ > condlog(3, "%d:%d : tur thread not responding", > major(ct->devt), minor(ct->devt)); > - return PATH_TIMEOUT; > + > + /* > + * libcheck_init will replace c->context. > + * It fails only in OOM situations. In this case, return > + * PATH_UNCHECKED to avoid prematurely failing the path. > + */ > + if (libcheck_init(c) != 0) > + return PATH_UNCHECKED; > + > + if (!uatomic_sub_return(&ct->holders, 1)) > + /* It did terminate, eventually */ > + cleanup_context(ct); > + > + ct = c->context; > } > /* Start new TUR checker */ > pthread_mutex_lock(&ct->lock); > -- > 2.19.1 ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 07/10] libmultipath: handle TUR threads that can't be cancelled 2018-10-23 19:28 ` Benjamin Marzinski @ 2018-10-23 20:49 ` Martin Wilck 2018-10-23 21:21 ` Benjamin Marzinski 0 siblings, 1 reply; 13+ messages in thread From: Martin Wilck @ 2018-10-23 20:49 UTC (permalink / raw) To: Benjamin Marzinski, mwilck+gmail; +Cc: dm-devel On Tue, 2018-10-23 at 14:28 -0500, Benjamin Marzinski wrote: > On Tue, Oct 23, 2018 at 03:43:45PM +0200, Martin Wilck wrote: > > When the tur checker code determines that a hanging TUR thread > > couldn't be cancelled, rather than simply returning, reallocate > > the checker context and start a new thread. This will leak some > > memory if the hanging thread never wakes up again, but well, in > > that highly unlikely case we're leaking threads anyway. > > The thing about PATH_UNCHECKED is that we do mark the path as failed. > We just don't tell device-mapper. If we get PATH_UNCHECKED in > pathinfo(), we set the state to PATH_DOWN. If we get a PATH_UNCHECKED > in > check_path(), we immediately call pathinfo(), But we call pathinfo(pp, conf, 0) there, i.e. all DI flags unset. This comes down to a (partial) blacklist check (which is ignored) and a call to path_offline(). pp->state isn't touched in this code path. It's more or less a NOOP. (BTW, the purpose of this pathinfo() call remains obscure to me. It goes back to the ancient commit 95987395). > making it likely that we > will get PATH_UNCHECKED there as well. The consequence of this is > that > if the path later changes to PATH_DOWN, which seems likely, we still > won't tell device-mapper, since as far as multipathd is concerned, > the > path hasn't changed state. I don't follow you. check_path() quits early when PATH_UNCHECKED is encountered, and doesn't alter pp->state. It will check again in the next round, and if the path switches to DOWN then (or any other state, for that matter) it will treat it right, AFAICS. > Looking at most of the code, the way we > treat PATH_UNCHECKED really only makes sense when we use it to mean > we > haven't ever gotten a result from get_state() before. > > If you want a return code that does just does nothing with the path, > except wait, that's PATH_PENDING. It leaves the paths state exactly > the > same as before. The only issue there is that we schedule another path > checker for a second later, which might not be the right answer to an > out-of-memory issue. > > If you've reviewed the code paths that we follow on PATH_UNCHECKED, > and > still feel that it is the right answer, I won't block it, because > this > is a pretty remote corner case. PATH_UNCHECKED is tested in the following places (ignoring calls from multipath and mpathpersist): 1) pathinfo(): in the "blank" case (we discussed that before, I think it's wrong and I removed that test in 17/21 of my previous 21-part series). 2) sync_map_state(): no attempt to sync the path with dm is made, which is what we want here, IMHO. 3) check_path(): see above. So yes, IMO the code review shows that PATH_UNCHECKED is better then PATH_TIMEOUT for the corner case at hand. Regards Martin > But I don't like how PATH_UNCHECKED > works like PATH_DOWN, but without actually keeping the state synced > with > the kernel, since the rest of the multipathd code is expecting the > state > to be synced. > > > Signed-off-by: Martin Wilck <mwilck@suse.com> > > --- > > libmultipath/checkers/tur.c | 24 +++++++++++++++++++++--- > > 1 file changed, 21 insertions(+), 3 deletions(-) > > > > diff --git a/libmultipath/checkers/tur.c > > b/libmultipath/checkers/tur.c > > index 41210892..a6c88eb2 100644 > > --- a/libmultipath/checkers/tur.c > > +++ b/libmultipath/checkers/tur.c > > @@ -349,11 +349,29 @@ int libcheck_check(struct checker * c) > > } > > } else { > > if (uatomic_read(&ct->holders) > 1) { > > - /* The thread has been cancelled but hasn't > > - * quit. exit with timeout. */ > > + /* > > + * The thread has been cancelled but hasn't > > quit. > > + * We have to prevent it from interfering with > > the new > > + * thread. We create a new context and leave > > the old > > + * one with the stale thread, hoping it will > > clean up > > + * eventually. > > + */ > > condlog(3, "%d:%d : tur thread not responding", > > major(ct->devt), minor(ct->devt)); > > - return PATH_TIMEOUT; > > + > > + /* > > + * libcheck_init will replace c->context. > > + * It fails only in OOM situations. In this > > case, return > > + * PATH_UNCHECKED to avoid prematurely failing > > the path. > > + */ > > + if (libcheck_init(c) != 0) > > + return PATH_UNCHECKED; > > + > > + if (!uatomic_sub_return(&ct->holders, 1)) > > + /* It did terminate, eventually */ > > + cleanup_context(ct); > > + > > + ct = c->context; > > } > > /* Start new TUR checker */ > > pthread_mutex_lock(&ct->lock); > > -- > > 2.19.1 > > -- > dm-devel mailing list > dm-devel@redhat.com > https://www.redhat.com/mailman/listinfo/dm-devel > -- Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107 SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 07/10] libmultipath: handle TUR threads that can't be cancelled 2018-10-23 20:49 ` Martin Wilck @ 2018-10-23 21:21 ` Benjamin Marzinski 2018-10-23 21:31 ` Martin Wilck 0 siblings, 1 reply; 13+ messages in thread From: Benjamin Marzinski @ 2018-10-23 21:21 UTC (permalink / raw) To: Martin Wilck; +Cc: mwilck+gmail, dm-devel On Tue, Oct 23, 2018 at 10:49:49PM +0200, Martin Wilck wrote: > On Tue, 2018-10-23 at 14:28 -0500, Benjamin Marzinski wrote: > > > On Tue, Oct 23, 2018 at 03:43:45PM +0200, Martin Wilck wrote: > > > When the tur checker code determines that a hanging TUR thread > > > couldn't be cancelled, rather than simply returning, reallocate > > > the checker context and start a new thread. This will leak some > > > memory if the hanging thread never wakes up again, but well, in > > > that highly unlikely case we're leaking threads anyway. > > > > The thing about PATH_UNCHECKED is that we do mark the path as failed. > > We just don't tell device-mapper. If we get PATH_UNCHECKED in > > pathinfo(), we set the state to PATH_DOWN. If we get a PATH_UNCHECKED > > in > > check_path(), we immediately call pathinfo(), > > But we call pathinfo(pp, conf, 0) there, i.e. all DI flags unset. This > comes down to a (partial) blacklist check (which is ignored) and a call > to path_offline(). pp->state isn't touched in this code path. It's more > or less a NOOP. (BTW, the purpose of this pathinfo() call remains > obscure to me. It goes back to the ancient commit 95987395). Oops. I overlooked the flags. Well, that takes care of any issue with check_path(). As for the pathinfo call itself, I assume that, at some tine, it was possible to get to check_path() without a fully initialized path device. But that doesn't explain why it doesn't set any flags. I'm not sure if this code currently serves any purpose. > > making it likely that we > > will get PATH_UNCHECKED there as well. The consequence of this is > > that > > if the path later changes to PATH_DOWN, which seems likely, we still > > won't tell device-mapper, since as far as multipathd is concerned, > > the > > path hasn't changed state. > > I don't follow you. check_path() quits early when PATH_UNCHECKED is > encountered, and doesn't alter pp->state. It will check again in the > next round, and if the path switches to DOWN then (or any other state, > for that matter) it will treat it right, AFAICS. If PATH_UNCHECKED triggers the "blank" code in pathinfo, it will set the path's state to PATH_DOWN. > > Looking at most of the code, the way we > > treat PATH_UNCHECKED really only makes sense when we use it to mean > > we > > haven't ever gotten a result from get_state() before. > > > > If you want a return code that does just does nothing with the path, > > except wait, that's PATH_PENDING. It leaves the paths state exactly > > the > > same as before. The only issue there is that we schedule another path > > checker for a second later, which might not be the right answer to an > > out-of-memory issue. > > > > If you've reviewed the code paths that we follow on PATH_UNCHECKED, > > and > > still feel that it is the right answer, I won't block it, because > > this > > is a pretty remote corner case. > > PATH_UNCHECKED is tested in the following places (ignoring calls from > multipath and mpathpersist): > > 1) pathinfo(): in the "blank" case (we discussed that before, I think > it's wrong and I removed that test in 17/21 of my previous 21-part > series). removing the blank case here fixes the issue with getting a PATH_UNCHECKED in pathinfo(), which is the root of my objection to it, so Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com> > 2) sync_map_state(): no attempt to sync the path with dm is made, > which is what we want here, IMHO. > 3) check_path(): see above. > > So yes, IMO the code review shows that PATH_UNCHECKED is better then > PATH_TIMEOUT for the corner case at hand. > > Regards > Martin > > > > But I don't like how PATH_UNCHECKED > > works like PATH_DOWN, but without actually keeping the state synced > > with > > the kernel, since the rest of the multipathd code is expecting the > > state > > to be synced. > > > > > Signed-off-by: Martin Wilck <mwilck@suse.com> > > > --- > > > libmultipath/checkers/tur.c | 24 +++++++++++++++++++++--- > > > 1 file changed, 21 insertions(+), 3 deletions(-) > > > > > > diff --git a/libmultipath/checkers/tur.c > > > b/libmultipath/checkers/tur.c > > > index 41210892..a6c88eb2 100644 > > > --- a/libmultipath/checkers/tur.c > > > +++ b/libmultipath/checkers/tur.c > > > @@ -349,11 +349,29 @@ int libcheck_check(struct checker * c) > > > } > > > } else { > > > if (uatomic_read(&ct->holders) > 1) { > > > - /* The thread has been cancelled but hasn't > > > - * quit. exit with timeout. */ > > > + /* > > > + * The thread has been cancelled but hasn't > > > quit. > > > + * We have to prevent it from interfering with > > > the new > > > + * thread. We create a new context and leave > > > the old > > > + * one with the stale thread, hoping it will > > > clean up > > > + * eventually. > > > + */ > > > condlog(3, "%d:%d : tur thread not responding", > > > major(ct->devt), minor(ct->devt)); > > > - return PATH_TIMEOUT; > > > + > > > + /* > > > + * libcheck_init will replace c->context. > > > + * It fails only in OOM situations. In this > > > case, return > > > + * PATH_UNCHECKED to avoid prematurely failing > > > the path. > > > + */ > > > + if (libcheck_init(c) != 0) > > > + return PATH_UNCHECKED; > > > + > > > + if (!uatomic_sub_return(&ct->holders, 1)) > > > + /* It did terminate, eventually */ > > > + cleanup_context(ct); > > > + > > > + ct = c->context; > > > } > > > /* Start new TUR checker */ > > > pthread_mutex_lock(&ct->lock); > > > -- > > > 2.19.1 > > > > -- > > dm-devel mailing list > > dm-devel@redhat.com > > https://www.redhat.com/mailman/listinfo/dm-devel > > > > -- > Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107 > SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton > HRB 21284 (AG Nürnberg) > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 07/10] libmultipath: handle TUR threads that can't be cancelled 2018-10-23 21:21 ` Benjamin Marzinski @ 2018-10-23 21:31 ` Martin Wilck 0 siblings, 0 replies; 13+ messages in thread From: Martin Wilck @ 2018-10-23 21:31 UTC (permalink / raw) To: Benjamin Marzinski; +Cc: dm-devel On Tue, 2018-10-23 at 16:21 -0500, Benjamin Marzinski wrote: > On Tue, Oct 23, 2018 at 10:49:49PM +0200, Martin Wilck wrote: > > On Tue, 2018-10-23 at 14:28 -0500, Benjamin Marzinski wrote: > > > > > On Tue, Oct 23, 2018 at 03:43:45PM +0200, Martin Wilck wrote: > > > > When the tur checker code determines that a hanging TUR thread > > > > couldn't be cancelled, rather than simply returning, reallocate > > > > the checker context and start a new thread. This will leak some > > > > memory if the hanging thread never wakes up again, but well, in > > > > that highly unlikely case we're leaking threads anyway. > > > > > > The thing about PATH_UNCHECKED is that we do mark the path as > > > failed. > > > We just don't tell device-mapper. If we get PATH_UNCHECKED in > > > pathinfo(), we set the state to PATH_DOWN. If we get a > > > PATH_UNCHECKED > > > in > > > check_path(), we immediately call pathinfo(), > > > > But we call pathinfo(pp, conf, 0) there, i.e. all DI flags unset. > > This > > comes down to a (partial) blacklist check (which is ignored) and a > > call > > to path_offline(). pp->state isn't touched in this code path. It's > > more > > or less a NOOP. (BTW, the purpose of this pathinfo() call remains > > obscure to me. It goes back to the ancient commit 95987395). > > Oops. I overlooked the flags. Well, that takes care of any issue with > check_path(). As for the pathinfo call itself, I assume that, at some > tine, it was possible to get to check_path() without a fully > initialized > path device. But that doesn't explain why it doesn't set any flags. > I'm not sure if this code currently serves any purpose. > > > > making it likely that we > > > will get PATH_UNCHECKED there as well. The consequence of this is > > > that > > > if the path later changes to PATH_DOWN, which seems likely, we > > > still > > > won't tell device-mapper, since as far as multipathd is > > > concerned, > > > the > > > path hasn't changed state. > > > > I don't follow you. check_path() quits early when PATH_UNCHECKED is > > encountered, and doesn't alter pp->state. It will check again in > > the > > next round, and if the path switches to DOWN then (or any other > > state, > > for that matter) it will treat it right, AFAICS. > > If PATH_UNCHECKED triggers the "blank" code in pathinfo, it will set > the > path's state to PATH_DOWN. > > > > Looking at most of the code, the way we > > > treat PATH_UNCHECKED really only makes sense when we use it to > > > mean > > > we > > > haven't ever gotten a result from get_state() before. > > > > > > If you want a return code that does just does nothing with the > > > path, > > > except wait, that's PATH_PENDING. It leaves the paths state > > > exactly > > > the > > > same as before. The only issue there is that we schedule another > > > path > > > checker for a second later, which might not be the right answer > > > to an > > > out-of-memory issue. > > > > > > If you've reviewed the code paths that we follow on > > > PATH_UNCHECKED, > > > and > > > still feel that it is the right answer, I won't block it, because > > > this > > > is a pretty remote corner case. > > > > PATH_UNCHECKED is tested in the following places (ignoring calls > > from > > multipath and mpathpersist): > > > > 1) pathinfo(): in the "blank" case (we discussed that before, I > > think > > it's wrong and I removed that test in 17/21 of my previous 21-part > > series). > > removing the blank case here fixes the issue with getting a > PATH_UNCHECKED in pathinfo(), which is the root of my objection to > it, > so > > Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com> Thanks. Side note, FTR: before the checker notices that the thread is stale, it sees a TUR timeout and thus returns PATH_TIMEOUT, which multipathd treats like PATH_DOWN. The way the patch is now, it just doesn't return PATH_TIMEOUT *again* in the next checker round if it notices that cancellation failed. Martin -- Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107 SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH v2 08/10] multipathd: handle repeated udev retrigger failure 2018-10-23 13:43 [PATCH v2 00/10] various multipath-tools patches Martin Wilck 2018-10-23 13:43 ` [PATCH v2 07/10] libmultipath: handle TUR threads that can't be cancelled Martin Wilck @ 2018-10-23 13:43 ` Martin Wilck 2018-10-23 19:56 ` Benjamin Marzinski 2018-10-23 13:43 ` [PATCH v2 09/10] multipathd: improve "add missing path" handling Martin Wilck ` (2 subsequent siblings) 4 siblings, 1 reply; 13+ messages in thread From: Martin Wilck @ 2018-10-23 13:43 UTC (permalink / raw) To: Christophe Varoqui, Benjamin Marzinski; +Cc: dm-devel, Martin Wilck If a path was still not properly initialized after exhausting the retrigger tries, it used to remain in INIT_MISSING_UDEV state forever. get_uid() might fall back to non-udev-based methods to determine the WWID, but it would never be called for a path in this state any more. This patch changes this behavior by resetting the path back to FAILED state if udev can't provide the WWID even after retriggering. Now, if the path ever happens to be in PATH_UP or PATH_GHOST state again, pathinfo(DI_ALL) will be called from check_path(), and there's at least some chance to obtain a WWID for it. Signed-off-by: Martin Wilck <mwilck@suse.com> --- multipathd/main.c | 32 +++++++++++++++++++++++--------- 1 file changed, 23 insertions(+), 9 deletions(-) diff --git a/multipathd/main.c b/multipathd/main.c index 2d45d989..a9e1a4bd 100644 --- a/multipathd/main.c +++ b/multipathd/main.c @@ -1828,15 +1828,29 @@ check_path (struct vectors * vecs, struct path * pp, int ticks) retrigger_tries = conf->retrigger_tries; checkint = conf->checkint; put_multipath_config(conf); - if (!pp->mpp && pp->initialized == INIT_MISSING_UDEV && - pp->retriggers < retrigger_tries) { - condlog(2, "%s: triggering change event to reinitialize", - pp->dev); - pp->initialized = INIT_REQUESTED_UDEV; - pp->retriggers++; - sysfs_attr_set_value(pp->udev, "uevent", "change", - strlen("change")); - return 0; + if (!pp->mpp && pp->initialized == INIT_MISSING_UDEV) { + if (pp->retriggers < retrigger_tries) { + condlog(2, "%s: triggering change event to reinitialize", + pp->dev); + pp->initialized = INIT_REQUESTED_UDEV; + pp->retriggers++; + sysfs_attr_set_value(pp->udev, "uevent", "change", + strlen("change")); + return 0; + } else { + condlog(1, "%s: not initialized after %d udev retriggers", + pp->dev, retrigger_tries); + /* + * Make sure that the "add missing path" code path + * below may reinstate the path later, if it ever + * comes up again. + * The WWID needs not be cleared; if it was set, the + * state hadn't been INIT_MISSING_UDEV in the first + * place. + */ + pp->initialized = INIT_FAILED; + return 0; + } } /* -- 2.19.1 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH v2 08/10] multipathd: handle repeated udev retrigger failure 2018-10-23 13:43 ` [PATCH v2 08/10] multipathd: handle repeated udev retrigger failure Martin Wilck @ 2018-10-23 19:56 ` Benjamin Marzinski 0 siblings, 0 replies; 13+ messages in thread From: Benjamin Marzinski @ 2018-10-23 19:56 UTC (permalink / raw) To: Martin Wilck; +Cc: dm-devel On Tue, Oct 23, 2018 at 03:43:46PM +0200, Martin Wilck wrote: > If a path was still not properly initialized after exhausting the > retrigger tries, it used to remain in INIT_MISSING_UDEV state forever. > get_uid() might fall back to non-udev-based methods to determine > the WWID, but it would never be called for a path in this state any more. > > This patch changes this behavior by resetting the path back to FAILED > state if udev can't provide the WWID even after retriggering. Now, if > the path ever happens to be in PATH_UP or PATH_GHOST state again, > pathinfo(DI_ALL) will be called from check_path(), and there's at least > some chance to obtain a WWID for it. > > Signed-off-by: Martin Wilck <mwilck@suse.com> Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com> > --- > multipathd/main.c | 32 +++++++++++++++++++++++--------- > 1 file changed, 23 insertions(+), 9 deletions(-) > > diff --git a/multipathd/main.c b/multipathd/main.c > index 2d45d989..a9e1a4bd 100644 > --- a/multipathd/main.c > +++ b/multipathd/main.c > @@ -1828,15 +1828,29 @@ check_path (struct vectors * vecs, struct path * pp, int ticks) > retrigger_tries = conf->retrigger_tries; > checkint = conf->checkint; > put_multipath_config(conf); > - if (!pp->mpp && pp->initialized == INIT_MISSING_UDEV && > - pp->retriggers < retrigger_tries) { > - condlog(2, "%s: triggering change event to reinitialize", > - pp->dev); > - pp->initialized = INIT_REQUESTED_UDEV; > - pp->retriggers++; > - sysfs_attr_set_value(pp->udev, "uevent", "change", > - strlen("change")); > - return 0; > + if (!pp->mpp && pp->initialized == INIT_MISSING_UDEV) { > + if (pp->retriggers < retrigger_tries) { > + condlog(2, "%s: triggering change event to reinitialize", > + pp->dev); > + pp->initialized = INIT_REQUESTED_UDEV; > + pp->retriggers++; > + sysfs_attr_set_value(pp->udev, "uevent", "change", > + strlen("change")); > + return 0; > + } else { > + condlog(1, "%s: not initialized after %d udev retriggers", > + pp->dev, retrigger_tries); > + /* > + * Make sure that the "add missing path" code path > + * below may reinstate the path later, if it ever > + * comes up again. > + * The WWID needs not be cleared; if it was set, the > + * state hadn't been INIT_MISSING_UDEV in the first > + * place. > + */ > + pp->initialized = INIT_FAILED; > + return 0; > + } > } > > /* > -- > 2.19.1 ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH v2 09/10] multipathd: improve "add missing path" handling 2018-10-23 13:43 [PATCH v2 00/10] various multipath-tools patches Martin Wilck 2018-10-23 13:43 ` [PATCH v2 07/10] libmultipath: handle TUR threads that can't be cancelled Martin Wilck 2018-10-23 13:43 ` [PATCH v2 08/10] multipathd: handle repeated udev retrigger failure Martin Wilck @ 2018-10-23 13:43 ` Martin Wilck 2018-10-23 20:11 ` Benjamin Marzinski 2018-10-23 13:43 ` [PATCH v2 10/10] multipath.8: fix description of "device" argument Martin Wilck 2018-11-14 7:38 ` [PATCH v2 00/10] various multipath-tools patches Christophe Varoqui 4 siblings, 1 reply; 13+ messages in thread From: Martin Wilck @ 2018-10-23 13:43 UTC (permalink / raw) To: Christophe Varoqui, Benjamin Marzinski; +Cc: dm-devel, Martin Wilck Only add devices that have been properly initialized by pathinfo(). For others, increase the path check interval to avoid useless checks of devices which are probably not meant to be multipathed anyway. The check for pp->initialized != INIT_MISSING_UDEV is redundant, as check_path() returns early in all other cases. Replace it by a check for INIT_FAILED, in case we ever add more init states. Suggested-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Martin Wilck <mwilck@suse.com> --- multipathd/main.c | 23 ++++++++++++++--------- 1 file changed, 14 insertions(+), 9 deletions(-) diff --git a/multipathd/main.c b/multipathd/main.c index a9e1a4bd..bf5f12a6 100644 --- a/multipathd/main.c +++ b/multipathd/main.c @@ -1811,7 +1811,7 @@ check_path (struct vectors * vecs, struct path * pp, int ticks) int add_active; int disable_reinstate = 0; int oldchkrstate = pp->chkrstate; - int retrigger_tries, checkint; + int retrigger_tries, checkint, max_checkint; struct config *conf; int ret; @@ -1827,6 +1827,7 @@ check_path (struct vectors * vecs, struct path * pp, int ticks) conf = get_multipath_config(); retrigger_tries = conf->retrigger_tries; checkint = conf->checkint; + max_checkint = conf->max_checkint; put_multipath_config(conf); if (!pp->mpp && pp->initialized == INIT_MISSING_UDEV) { if (pp->retriggers < retrigger_tries) { @@ -1891,18 +1892,26 @@ check_path (struct vectors * vecs, struct path * pp, int ticks) return 1; } if (!pp->mpp) { - if (!strlen(pp->wwid) && pp->initialized != INIT_MISSING_UDEV && + if (!strlen(pp->wwid) && pp->initialized == INIT_FAILED && (newstate == PATH_UP || newstate == PATH_GHOST)) { condlog(2, "%s: add missing path", pp->dev); conf = get_multipath_config(); pthread_cleanup_push(put_multipath_config, conf); ret = pathinfo(pp, conf, DI_ALL | DI_BLACKLIST); pthread_cleanup_pop(1); - if (ret == PATHINFO_OK) { + /* INIT_OK implies ret == PATHINFO_OK */ + if (pp->initialized == INIT_OK) { ev_add_path(pp, vecs, 1); pp->tick = 1; - } else if (ret == PATHINFO_SKIPPED) - return -1; + } else { + /* + * We failed multiple times to initialize this + * path properly. Don't re-check too often. + */ + pp->checkint = max_checkint; + if (ret == PATHINFO_SKIPPED) + return -1; + } } return 0; } @@ -2049,11 +2058,7 @@ check_path (struct vectors * vecs, struct path * pp, int ticks) return 0; } } else { - unsigned int max_checkint; LOG_MSG(4, checker_message(&pp->checker)); - conf = get_multipath_config(); - max_checkint = conf->max_checkint; - put_multipath_config(conf); if (pp->checkint != max_checkint) { /* * double the next check delay. -- 2.19.1 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH v2 09/10] multipathd: improve "add missing path" handling 2018-10-23 13:43 ` [PATCH v2 09/10] multipathd: improve "add missing path" handling Martin Wilck @ 2018-10-23 20:11 ` Benjamin Marzinski 0 siblings, 0 replies; 13+ messages in thread From: Benjamin Marzinski @ 2018-10-23 20:11 UTC (permalink / raw) To: Martin Wilck; +Cc: dm-devel On Tue, Oct 23, 2018 at 03:43:47PM +0200, Martin Wilck wrote: > Only add devices that have been properly initialized by pathinfo(). > For others, increase the path check interval to avoid useless checks > of devices which are probably not meant to be multipathed anyway. > > The check for pp->initialized != INIT_MISSING_UDEV is redundant, > as check_path() returns early in all other cases. Replace it by a > check for INIT_FAILED, in case we ever add more init states. > Thanks Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com> > Suggested-by: Benjamin Marzinski <bmarzins@redhat.com> > Signed-off-by: Martin Wilck <mwilck@suse.com> > --- > multipathd/main.c | 23 ++++++++++++++--------- > 1 file changed, 14 insertions(+), 9 deletions(-) > > diff --git a/multipathd/main.c b/multipathd/main.c > index a9e1a4bd..bf5f12a6 100644 > --- a/multipathd/main.c > +++ b/multipathd/main.c > @@ -1811,7 +1811,7 @@ check_path (struct vectors * vecs, struct path * pp, int ticks) > int add_active; > int disable_reinstate = 0; > int oldchkrstate = pp->chkrstate; > - int retrigger_tries, checkint; > + int retrigger_tries, checkint, max_checkint; > struct config *conf; > int ret; > > @@ -1827,6 +1827,7 @@ check_path (struct vectors * vecs, struct path * pp, int ticks) > conf = get_multipath_config(); > retrigger_tries = conf->retrigger_tries; > checkint = conf->checkint; > + max_checkint = conf->max_checkint; > put_multipath_config(conf); > if (!pp->mpp && pp->initialized == INIT_MISSING_UDEV) { > if (pp->retriggers < retrigger_tries) { > @@ -1891,18 +1892,26 @@ check_path (struct vectors * vecs, struct path * pp, int ticks) > return 1; > } > if (!pp->mpp) { > - if (!strlen(pp->wwid) && pp->initialized != INIT_MISSING_UDEV && > + if (!strlen(pp->wwid) && pp->initialized == INIT_FAILED && > (newstate == PATH_UP || newstate == PATH_GHOST)) { > condlog(2, "%s: add missing path", pp->dev); > conf = get_multipath_config(); > pthread_cleanup_push(put_multipath_config, conf); > ret = pathinfo(pp, conf, DI_ALL | DI_BLACKLIST); > pthread_cleanup_pop(1); > - if (ret == PATHINFO_OK) { > + /* INIT_OK implies ret == PATHINFO_OK */ > + if (pp->initialized == INIT_OK) { > ev_add_path(pp, vecs, 1); > pp->tick = 1; > - } else if (ret == PATHINFO_SKIPPED) > - return -1; > + } else { > + /* > + * We failed multiple times to initialize this > + * path properly. Don't re-check too often. > + */ > + pp->checkint = max_checkint; > + if (ret == PATHINFO_SKIPPED) > + return -1; > + } > } > return 0; > } > @@ -2049,11 +2058,7 @@ check_path (struct vectors * vecs, struct path * pp, int ticks) > return 0; > } > } else { > - unsigned int max_checkint; > LOG_MSG(4, checker_message(&pp->checker)); > - conf = get_multipath_config(); > - max_checkint = conf->max_checkint; > - put_multipath_config(conf); > if (pp->checkint != max_checkint) { > /* > * double the next check delay. > -- > 2.19.1 ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH v2 10/10] multipath.8: fix description of "device" argument 2018-10-23 13:43 [PATCH v2 00/10] various multipath-tools patches Martin Wilck ` (2 preceding siblings ...) 2018-10-23 13:43 ` [PATCH v2 09/10] multipathd: improve "add missing path" handling Martin Wilck @ 2018-10-23 13:43 ` Martin Wilck 2018-10-23 20:33 ` Benjamin Marzinski 2018-11-14 7:38 ` [PATCH v2 00/10] various multipath-tools patches Christophe Varoqui 4 siblings, 1 reply; 13+ messages in thread From: Martin Wilck @ 2018-10-23 13:43 UTC (permalink / raw) To: Christophe Varoqui, Benjamin Marzinski; +Cc: dm-devel, Martin Wilck Describe the fact that we can refer to path devices by WWID, and to multipath maps by devnode. Signed-off-by: Martin Wilck <mwilck@suse.com> --- multipath/multipath.8 | 32 ++++++++++++++++++-------------- 1 file changed, 18 insertions(+), 14 deletions(-) diff --git a/multipath/multipath.8 b/multipath/multipath.8 index c9bd23aa..9cdd05a3 100644 --- a/multipath/multipath.8 +++ b/multipath/multipath.8 @@ -95,21 +95,25 @@ is used to detect and coalesce multiple paths to devices, for fail-over or perfo .SH ARGUMENTS .\" ---------------------------------------------------------------------------- . +The \fBdevice\fR argument restricts \fBmultipath\fR's operation to devices matching the given +expression. The argument may refer either to a multipath map or to +its components ("paths"). The expression may be in one of the following formats: +. +.TP 1.4i +.B device node +file name of a device node, e.g. \fI/dev/dm-10\fR or \fI/dev/sda\fR. If the node refers +to an existing device mapper device representing a multipath map, this selects +the map or its paths, depending on the operation mode. Otherwise, it selects a path device. +. .TP -.BI device -Act only on the multipath map specified by -.IR device , -which is either: -.RS 1.2i -.IP \[bu] -A multipath map name. -.IP \[bu] -A path (low-level device) associated with the desired multipath map; the path may be in one of the following formats: -.RS 1.2i -.IP \[bu] -.B /dev/sdX -.IP \[bu] -.B major:minor +.B device ID +kernel device number specified by major:minor numbers, e.g. \fI65:16\fR. This +format can only be used for path devices. +. +.TP +.B WWID +a World Wide Identifier matching a multipath map or its paths. To list WWIDs of devices +present in the system, use e.g. the command "\fImultipath -d -v3 2>/dev/null\fR". . .\" ---------------------------------------------------------------------------- .SH OPERATION MODES -- 2.19.1 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH v2 10/10] multipath.8: fix description of "device" argument 2018-10-23 13:43 ` [PATCH v2 10/10] multipath.8: fix description of "device" argument Martin Wilck @ 2018-10-23 20:33 ` Benjamin Marzinski 0 siblings, 0 replies; 13+ messages in thread From: Benjamin Marzinski @ 2018-10-23 20:33 UTC (permalink / raw) To: Martin Wilck; +Cc: dm-devel On Tue, Oct 23, 2018 at 03:43:48PM +0200, Martin Wilck wrote: > Describe the fact that we can refer to path devices by WWID, and > to multipath maps by devnode. > Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com> > Signed-off-by: Martin Wilck <mwilck@suse.com> > --- > multipath/multipath.8 | 32 ++++++++++++++++++-------------- > 1 file changed, 18 insertions(+), 14 deletions(-) > > diff --git a/multipath/multipath.8 b/multipath/multipath.8 > index c9bd23aa..9cdd05a3 100644 > --- a/multipath/multipath.8 > +++ b/multipath/multipath.8 > @@ -95,21 +95,25 @@ is used to detect and coalesce multiple paths to devices, for fail-over or perfo > .SH ARGUMENTS > .\" ---------------------------------------------------------------------------- > . > +The \fBdevice\fR argument restricts \fBmultipath\fR's operation to devices matching the given > +expression. The argument may refer either to a multipath map or to > +its components ("paths"). The expression may be in one of the following formats: > +. > +.TP 1.4i > +.B device node > +file name of a device node, e.g. \fI/dev/dm-10\fR or \fI/dev/sda\fR. If the node refers > +to an existing device mapper device representing a multipath map, this selects > +the map or its paths, depending on the operation mode. Otherwise, it selects a path device. > +. > .TP > -.BI device > -Act only on the multipath map specified by > -.IR device , > -which is either: > -.RS 1.2i > -.IP \[bu] > -A multipath map name. > -.IP \[bu] > -A path (low-level device) associated with the desired multipath map; the path may be in one of the following formats: > -.RS 1.2i > -.IP \[bu] > -.B /dev/sdX > -.IP \[bu] > -.B major:minor > +.B device ID > +kernel device number specified by major:minor numbers, e.g. \fI65:16\fR. This > +format can only be used for path devices. > +. > +.TP > +.B WWID > +a World Wide Identifier matching a multipath map or its paths. To list WWIDs of devices > +present in the system, use e.g. the command "\fImultipath -d -v3 2>/dev/null\fR". > . > .\" ---------------------------------------------------------------------------- > .SH OPERATION MODES > -- > 2.19.1 ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 00/10] various multipath-tools patches 2018-10-23 13:43 [PATCH v2 00/10] various multipath-tools patches Martin Wilck ` (3 preceding siblings ...) 2018-10-23 13:43 ` [PATCH v2 10/10] multipath.8: fix description of "device" argument Martin Wilck @ 2018-11-14 7:38 ` Christophe Varoqui 4 siblings, 0 replies; 13+ messages in thread From: Christophe Varoqui @ 2018-11-14 7:38 UTC (permalink / raw) To: Martin Wilck; +Cc: device-mapper development [-- Attachment #1.1: Type: text/plain, Size: 999 bytes --] Merged. Thanks. On Tue, Oct 23, 2018 at 3:44 PM Martin Wilck <mwilck@suse.com> wrote: > This is v2 of my series from Oct. 10; starting from 7 as the > other patches have already been merged. > > Changes wrt v2: > 7/10: handle failure of libcheck_init() (Ben). I decided not to > return PATH_TIMEOUT in this case, rationale in my reply to > Ben's review. > 9/10: new, add the suggestions from Ben's review of 8/8 of v1. > 10/10: new, see Ben's comment on patch 5/8 of v1. > > Martin Wilck (4): > libmultipath: handle TUR threads that can't be cancelled > multipathd: handle repeated udev retrigger failure > multipathd: improve "add missing path" handling > multipath.8: fix description of "device" argument > > libmultipath/checkers/tur.c | 24 ++++++++++++++-- > multipath/multipath.8 | 32 +++++++++++---------- > multipathd/main.c | 55 +++++++++++++++++++++++++------------ > 3 files changed, 76 insertions(+), 35 deletions(-) > > -- > 2.19.1 > > [-- Attachment #1.2: Type: text/html, Size: 1387 bytes --] [-- Attachment #2: Type: text/plain, Size: 0 bytes --] ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2018-11-14 7:38 UTC | newest] Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-10-23 13:43 [PATCH v2 00/10] various multipath-tools patches Martin Wilck 2018-10-23 13:43 ` [PATCH v2 07/10] libmultipath: handle TUR threads that can't be cancelled Martin Wilck 2018-10-23 19:28 ` Benjamin Marzinski 2018-10-23 20:49 ` Martin Wilck 2018-10-23 21:21 ` Benjamin Marzinski 2018-10-23 21:31 ` Martin Wilck 2018-10-23 13:43 ` [PATCH v2 08/10] multipathd: handle repeated udev retrigger failure Martin Wilck 2018-10-23 19:56 ` Benjamin Marzinski 2018-10-23 13:43 ` [PATCH v2 09/10] multipathd: improve "add missing path" handling Martin Wilck 2018-10-23 20:11 ` Benjamin Marzinski 2018-10-23 13:43 ` [PATCH v2 10/10] multipath.8: fix description of "device" argument Martin Wilck 2018-10-23 20:33 ` Benjamin Marzinski 2018-11-14 7:38 ` [PATCH v2 00/10] various multipath-tools patches Christophe Varoqui
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.