All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Benjamin Marzinski" <bmarzins@redhat.com>
To: Martin Wilck <mwilck@suse.com>
Cc: device-mapper development <dm-devel@redhat.com>
Subject: Re: [PATCH v3 18/19] libmultipath: Don't blank intialized paths
Date: Fri, 5 Oct 2018 14:38:43 -0500	[thread overview]
Message-ID: <20181005193843.GF3172@octiron.msp.redhat.com> (raw)
In-Reply-To: <f87221ecedb072a7f3551ea0f62e7df586d075f1.camel@suse.com>

On Wed, Oct 03, 2018 at 12:37:17AM +0200, Martin Wilck wrote:
> Hi Ben,
> 
> > I am uncertain about this one. The old code sets pp->initialized to
> > INIT_FAILED. If the state had been INIT_MISSING_UDEV or
> > INIT_REQUESTED_UDEV before, this patch might change how the code
> > behaves later in check_path(), where these conditions are checked.
> > 
> > Likewise, tests for strlen(pp->wwid) are used in various places
> > around
> > the code. These tests would now yield different results for paths in
> > "recoverable error" state.
> > 
> > Have you considered these possible side effects?
> 
> I've pondered over this a lot. The dust is clearing up a bit.
> 
> 1. With your patch in place, INIT_FAILED is never set except in
> alloc_path() (we might rename it to INIT_NEW or the like, but see
> below).

Correct. The idea is that INIT_FAILED means that the path has never been
fully initialized and that it is missing more than simply the wwid
(which would set INIT_MISSING_UDEV instead).

> 2. I don't understand how you handle repeated failure to retrieve the
> WWID. I see that get_uid() (actually, scsi_uid_fallback()) would
> retrieve the WWID from sysfs after retriggers are exhausted. But I
> don't see how pathinfo(DI_WWID) would ever be called in this situation:
>
> In the last invocation, pathinfo() had failed to retrieve the WWID and
> set pp->initialized = INIT_MISSING_UDEV. There it will remain because
> check_path() won't set it to INIT_REQUESTED_UDEV any more after retries
> are exhausted. And now, check_path() won't call pathinfo(DI_ALL) any
> more from the "add missing path" code, because of the (pp->initialized
> != INIT_MISSING_UDEV) condition.
> 
> Am I overlooking something?

Your analysis looks right. This shouldn't have anything to do with the
correctness of the not blanking initialized paths, but we should fix it
anyway.  Right now, as you mentioned, if multipathd fails to get the
wwid after retrigger_tries new uevents, it only has one shot at the
failback methods, and then mutipathd cannot use that path in the future.
However, it will still check if the path is up or down, which is pretty
pointless. It should either

1. set pp->intialized to INIT_OK, to keep it from being checked at all
in the future, or

2. try to grab the missing pathinfo, so that it can use the path.

The benefit of doing the first is that mutipathd doesn't keep messing
with paths that really aren't set up to be multipathed. On the other
hand it might ignore some paths that really should be multipathed.
perhaps the best idea is to keep trying to get the missing info, but
just increase pp->tick (possibly progressively) so we try less often.

> 3. If "blank" state means that important device information couldn't be
> retrieved because of presumably transient failure conditions, we should
> retry to retrieve this information by calling pathinfo again later. But
> unless the WWID is (reset to) the empty string, check_path() won't call
> pathinfo(DI_ALL) any more.

But if we set the wwid at some point in time, that means we got all the
necessary information.  My argument is that if something happens to
cause a later pathinfo() call to fail, we shouldn't pretend like we
didn't already get this info.
 
> 4. The "blank" logic in pathinfo() combines several very different
> cases.
>   a) PATH_REMOVED status from path_offline(). This means that
> elementary sysfs attributes were missing. This is almost the same as
> failure in sysfs_pathinfo(), which results in PATHINFO_FAILED return
> status; but for PATH_REMOVED we return PATHINFO_OK and keep the path
> around.

True, but I've always assumed that this means that we will be getting a
uevent to remove this path momentarily, so I doesn't really matter what
we do with it. We could blank the path here, but if we already go the
necessary info, it seems like just setting the path to PATH_DOWN, and
waiting for the remove uevent should be fine.

>   b) Failure in checker_check(). If the path is offline in the first
> place, the checker isn't called, and WWID determination is attempted.
> But if the checker returns PATH_UNCHECKED or PATH_WILD, we goto "blank"
> state.

This is clearly the wrong thing to do.

>   c) Failure in scsi_ioctl_pathinfo() or cciss_ioctl_pathinfo(). Both
> functions never fail, so this can't happen. I've patches here to fix
> that.

I just ignored this state because it was impossible. But even with your
patches, my argument remains the same. If we already got this info,
failing to get it again shouldn't cause us to delete the wwid. We should
probably remember the existing values, to make sure that we don't lose
them if the calls fail for some reason.

>   d) Failure to open pp->fd. 

This is clearly the right thing to do.

> d) is the only case in which the "blank" logic makes really sense to
> me. It can happen only at the first pathinfo() invocation, meaning 
> pp->wwid is still empty, and pp->initialized is INIT_FAILED. Your patch
> would change nothing for this case.
> 
> a) and b) can happen for paths that have been initialized already. I
> think in case a) the WWID should be reset, probably initialized should
> be set to INIT_FAILED, and PATHINFO_FAILED should be returned. In case
> b) we should IMO proceed normally rather than goto "blank". Resetting
> the WWID in case b) is nonsense, agreed.
> 
> Altogether, if my analysis is correct, your patch (not blanking the
> WWID) should be applied to case b) only.

I don't really care about blanking the wwid in case a). I just think
it's unnecessary.  Why do you think we should blank it in case c) if we
have previously gotten those values?

-Ben

> Please comment - I still feel a bit confused and may have overlooked
> something essential.
> 
> Regards
> Martin
> 
> -- 
> Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
> HRB 21284 (AG Nürnberg)
> 

  reply	other threads:[~2018-10-05 19:38 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-09-21 23:05 [PATCH v3 00/19] Misc Multipath patches Benjamin Marzinski
2018-09-21 23:05 ` [PATCH v3 01/19] libmultipath: fix tur checker timeout Benjamin Marzinski
2018-10-01 19:51   ` Martin Wilck
2018-10-04 16:31     ` Benjamin Marzinski
2018-10-05 10:11       ` Martin Wilck
2018-10-05 17:02         ` Benjamin Marzinski
2018-10-05 19:23           ` Martin Wilck
2018-09-21 23:05 ` [PATCH v3 02/19] libmultipath: fix tur checker double locking Benjamin Marzinski
2018-10-01 20:09   ` Martin Wilck
2018-10-01 20:44     ` Martin Wilck
2018-10-04 16:47       ` Benjamin Marzinski
2018-10-04 16:45     ` Benjamin Marzinski
2018-10-05 10:25       ` Martin Wilck
2018-10-05 17:10         ` Benjamin Marzinski
2018-10-05 19:07           ` Martin Wilck
2018-09-21 23:05 ` [PATCH v3 03/19] libmultipath: fix tur memory misuse Benjamin Marzinski
2018-10-01 20:59   ` Martin Wilck
2018-10-02  7:48     ` Martin Wilck
2018-09-21 23:05 ` [PATCH v3 04/19] libmultipath: cleanup tur locking Benjamin Marzinski
2018-10-01 21:08   ` Martin Wilck
2018-09-21 23:05 ` [PATCH v3 05/19] libmultipath: fix tur checker timeout issue Benjamin Marzinski
2018-10-01 21:09   ` Martin Wilck
2018-09-21 23:05 ` [PATCH v3 06/19] libmultipath: fix set_int error path Benjamin Marzinski
2018-10-01 21:17   ` Martin Wilck
2018-09-21 23:05 ` [PATCH v3 07/19] libmultipath: fix length issues in get_vpd_sgio Benjamin Marzinski
2018-10-01 21:25   ` Martin Wilck
2018-09-21 23:05 ` [PATCH v3 08/19] libmultipath: _install_keyword cleanup Benjamin Marzinski
2018-10-01 21:26   ` Martin Wilck
2018-09-21 23:05 ` [PATCH v3 09/19] libmultipath: remove unused code Benjamin Marzinski
2018-10-01 21:28   ` Martin Wilck
2018-09-21 23:05 ` [PATCH v3 10/19] libmultipath: fix memory issue in path_latency prio Benjamin Marzinski
2018-10-01 21:30   ` Martin Wilck
2018-09-21 23:05 ` [PATCH v3 11/19] libmultipath: fix null dereference int alloc_path_group Benjamin Marzinski
2018-10-01 21:33   ` Martin Wilck
2018-09-21 23:05 ` [PATCH v3 12/19] libmutipath: don't use malformed uevents Benjamin Marzinski
2018-10-01 21:31   ` Martin Wilck
2018-09-21 23:05 ` [PATCH v3 13/19] multipath: fix max array size in print_cmd_valid Benjamin Marzinski
2018-10-01 21:35   ` Martin Wilck
2018-09-21 23:05 ` [PATCH v3 14/19] multipathd: function return value tweaks Benjamin Marzinski
2018-10-01 21:37   ` Martin Wilck
2018-09-21 23:05 ` [PATCH v3 15/19] multipathd: minor fixes Benjamin Marzinski
2018-10-01 21:38   ` Martin Wilck
2018-09-21 23:05 ` [PATCH v3 16/19] multipathd: remove useless check and fix format Benjamin Marzinski
2018-10-01 21:40   ` Martin Wilck
2018-09-21 23:05 ` [PATCH v3 17/19] multipathd: fix memory leak on error in configure Benjamin Marzinski
2018-10-01 21:42   ` Martin Wilck
2018-09-21 23:05 ` [PATCH v3 18/19] libmultipath: Don't blank intialized paths Benjamin Marzinski
2018-10-01 22:00   ` Martin Wilck
2018-10-02 22:37     ` Martin Wilck
2018-10-05 19:38       ` Benjamin Marzinski [this message]
2018-10-08  9:41         ` Martin Wilck
2018-10-09 22:20           ` Benjamin Marzinski
2018-10-08  9:35   ` Martin Wilck
2018-09-21 23:05 ` [PATCH v3 19/19] libmultipath: Fixup updating paths Benjamin Marzinski
2018-10-01 22:30   ` Martin Wilck
2018-10-05 20:32     ` Benjamin Marzinski
2018-10-07  8:36 ` [PATCH v3 00/19] Misc Multipath patches Christophe Varoqui
2018-10-09 16:13   ` Benjamin Marzinski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181005193843.GF3172@octiron.msp.redhat.com \
    --to=bmarzins@redhat.com \
    --cc=dm-devel@redhat.com \
    --cc=mwilck@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.