All of lore.kernel.org
 help / color / mirror / Atom feed
From: Martin Wilck <mwilck@suse.com>
To: Benjamin Marzinski <bmarzins@redhat.com>
Cc: dm-devel@redhat.com
Subject: Re: [PATCH 15/31] multipath: implement "check usable paths" (-C/-U)
Date: Thu, 14 Sep 2017 13:47:31 +0200	[thread overview]
Message-ID: <1505389651.4493.1.camel@suse.com> (raw)
In-Reply-To: <20170913205334.GL3145@octiron.msp.redhat.com>

On Wed, 2017-09-13 at 15:53 -0500, Benjamin Marzinski wrote:
> On Sun, Sep 03, 2017 at 12:38:44AM +0200, Martin Wilck wrote:
> > When we process udev rules, it's crucial to know whether I/O on a
> > given
> > device will succeed. Unfortunately DM_NR_VALID_PATHS is not
> > reliable,
> > because the kernel path events aren't necessarily received in
> > order, and
> > even if they are, the number of usable paths may have changed
> > during
> > udev processing, in particular when there's a lot of load on udev
> > because many paths are failing or reinstating at the same time.
> > The latter problem can't be completely avoided, but the closer the
> > test before the actual "blkid" call, the better.
> > 
> > This patch adds the -C/-U options to multipath to check if a given
> > map has usable paths. Obviously this command must avoid doing any
> > I/O
> > on the multipath map itself, thus no checkers are called; only
> > status
> > from sysfs and dm is collected.
> 
> I'm a little worried about the overhead of adding yet more multipath
> commands to udev.  The multipath command takes a while to exec, and
> already udev hits issues where in event storms, udev can time out
> because it's trying to do too much with too short a timeout.

I was aware of that and tried to make this as lean as possible. On my
system here it takes about 8ms or 500 sytsem calls, which is roughly
the same number as "multipath -c" or "kpartx_id", at least in the case
where there are paths available. AFAICS, most of the time is spent in
libudev collecting device properties. I haven't studied that in depth
though. "blkid" calls are much more expensive AFAICT.

> Do out-of-order uevents really happen? 

For dm-mpath "path events", yes, I'm positive about that. 
See an example at http://paste.opensuse.org/28641254. 
It was taken with an openSUSE Tumbleweed 4.11.8 kernel. It was tkane
from udev monitor data. 
See http://paste.opensuse.org/63686952 for the full log.

You can see that the time stamps and seqnums increase, but
DM_NR_VALID_PATHS does not decrease monitonically as you'd expect (my
script removes all paths of map in order, re-adds them again).
So far I haven't had the time to analyze this on the kernel side. But
even if it could be fixed in the kernel, multipathd and the udev rules
should be able to deal with it.

So, reinforcing my argument from the log message, I truly believe that
DM_NR_VALID_PATHS is not something that we should rely upon too much.

> Delayed ones certainly do, but if
> we really can see out-of-order events, then all that event coalescing
> code that got in should get another pass over it, because I'm pretty
> sure it relied on events not being reordered.

That would need further examination. I thought that the coalescing
logic worked mostly on uevents for path devices, not the
PATH_FAILED/PATH_REINSTATED events for the map devices at which the
udev rules are looking.

> If all we're are worried about is delayed events, then it might be
> o.k.
> to just always disable scanning on PATH_FAILED events, because we
> don't
> know if there are any more of them. When we reload a device, we
> already
> pass the DM_SUBSYSTEM_UDEV_FLAG2 to deal with not having
> DM_NR_VALID_PATHS on reloads. However, I do realize that a path could
> fail immediately after the reload, and your patch does a better job
> keeping that window smaller.
> 
> Also, when you have reinstates and failures at the same time, you
> won't
> run into problems unless the path you just reinstated immediately
> fails
> (otherwise there will be at least one available path, the one you
> just
> reinstated).  This certainly can happen. 

Maybe we could skip calling "multipath -U" for PATH_REINSTATED events.
You're right, the scenario you just describe is really not that likely.

> > Unfortunately, in my
> experience, it usually happens because sysfs says that the path is
> o.k.
> but when the kernel tries to do IO to it, it's flaky. The -C/-U
> callout
> isn't going to catch those cases, because it doesn't do IO.

True, but the whole purpose of this patch is to avoid doing IO in the
first place. We can't do anything about this; both the kernel's and
multipathd's internal representation can only be approximations of the
real device state.

> Now, I agree that you are making the window where things can go wrong
> smaller, but there is a cost that is being incurred on processing a
> large number of uevents to make that window smaller, and I don't know
> exactly how that trade-off works. I've been thinking about making a
> library interface that multipath would use to do the commands which
> are
> also called from udev. That would let udev directly call these
> commands
> if they wanted, which would save on the exec time, and cut out any
> unnecessary cruft that doesn't need to be done for udev to get its
> information.  That might be a solution, in case we do start seeing
> more
> timed-out uevents because of this.

Sure. I've had a similar thought. My tests with "multipath -U" makes me
think that most of the time is spent in collecting properties from
sysfs in libudev. If the code was run in the context of the udev worker
which might have these properties already cached, performance could be
much better. I'm not sure what exactly is cached in the udev workers
though.

Anyway, back to your NAK on this patch, please consider again. 
IMO we're a lot safer with this additional check, in particular in view
of possible out-of-order events.

I introduced this as a replacement for the original "DM_DEPS" check we
had at SUSE. We'd found that to be helpful in avoiding problems during
udev processing in the past. It's always hard to tell if such past
fixes are still required, but at least for SLES we'd risk to cause
customer regressions if we simply dropped it, so we prefer to play safe
here. We can keep this as a SUSE-only patch, if you or others insist
that "multipath -U" is a bad thing.

DM_DEPS just checks if there are any paths (valid or not), and comes
down to a "dmsetup deps" invocation, which takes about 4ms. "multipath
-U" is slower because it needs to look at the paths, but those
additional cycles may pay off if we can avoid a blkid call on a device
with no paths. My first approach to the question "is this map really
ready for IO?" was indeed just a tiny "dmsetup deps" wrapper. But then
I realized the ordering problems for uevents shown above, and I
concluded that a more robust test would be desirable.

Regards,
Martin

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

  reply	other threads:[~2017-09-14 11:47 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-09-02 22:38 [PATCH 00/31] multipath/kpartx udev rules cleanup, and fixes Martin Wilck
2017-09-02 22:38 ` [PATCH 01/31] libmultipath: fix partition_delimiter config option Martin Wilck
2017-09-02 22:38 ` [PATCH 02/31] kpartx: helper functions for name and uuid generation Martin Wilck
2017-09-02 22:38 ` [PATCH 03/31] kpartx: search partitions by UUID, and rename Martin Wilck
2017-09-02 22:38 ` [PATCH 04/31] test-kpartx: add tests for renaming functionality Martin Wilck
2017-09-02 22:38 ` [PATCH 05/31] kpartx: fix a corner case when renaming partitions Martin Wilck
2017-09-02 22:38 ` [PATCH 06/31] kpartx: fix part deletion without partition table Martin Wilck
2017-09-02 22:38 ` [PATCH 07/31] test-kpartx: test deletion with empty part table Martin Wilck
2017-09-02 22:38 ` [PATCH 08/31] kpartx: only recognize dasd part table on DASD Martin Wilck
2017-09-02 22:38 ` [PATCH 09/31] libmultipath: support MPATH_UDEV_NO_PATHS_FLAG on map creation Martin Wilck
2017-09-13 20:33   ` Benjamin Marzinski
2017-09-14 11:53     ` Martin Wilck
2017-09-02 22:38 ` [PATCH 10/31] libmultipath: add get_udev_device Martin Wilck
2017-09-02 22:38 ` [PATCH 11/31] libmultipath: get_refwwid: use get_udev_device Martin Wilck
2017-09-02 22:38 ` [PATCH 12/31] libmultipath: use const char* in some dm helpers Martin Wilck
2017-09-02 22:38 ` [PATCH 13/31] libmultipath: add DI_NOIO flag for pathinfo Martin Wilck
2017-09-02 22:38 ` [PATCH 14/31] libmultipath: add dm_get_multipath Martin Wilck
2017-09-02 22:38 ` [PATCH 15/31] multipath: implement "check usable paths" (-C/-U) Martin Wilck
2017-09-13 20:53   ` Benjamin Marzinski
2017-09-14 11:47     ` Martin Wilck [this message]
2017-09-15 21:06       ` Benjamin Marzinski
2017-09-02 22:38 ` [PATCH 16/31] 11-dm-mpath.rules: multipath -U for READY check Martin Wilck
2017-09-02 22:38 ` [PATCH 17/31] 11-dm-mpath.rules: import more ID_FS_xxx vars from db Martin Wilck
2017-09-02 22:38 ` [PATCH 18/31] 11-dm-mpath.rules: no need to test before IMPORT Martin Wilck
2017-09-02 22:38 ` [PATCH 19/31] 11-dm-mpath.rules: handle new maps with READY==0 Martin Wilck
2017-09-02 22:38 ` [PATCH 20/31] 11-dm-mpath.rules: don't set READY->ACTIVATION Martin Wilck
2017-09-13 21:19   ` Benjamin Marzinski
2017-09-13 21:33     ` Martin Wilck
2017-09-14 12:48     ` Martin Wilck
2017-09-15 20:33       ` Benjamin Marzinski
2017-09-02 22:38 ` [PATCH 21/31] 11-dm-mpath.rules: Remember DM_ACTIVATION Martin Wilck
2017-09-13 21:19   ` Benjamin Marzinski
2017-09-14 13:06     ` Martin Wilck
2017-09-15 20:40       ` Benjamin Marzinski
2017-09-18 19:54         ` Martin Wilck
2017-09-02 22:38 ` [PATCH 22/31] multipath.rules: set ID_FS_TYPE to "mpath_member" Martin Wilck
2017-09-02 22:38 ` [PATCH 23/31] kpartx.rules: don't rely on DM_DEPS and DM_TABLE_STATE Martin Wilck
2017-09-02 22:38 ` [PATCH 24/31] kpartx.rules: respect DM_UDEV_LOW_PRIORITY_FLAG Martin Wilck
2017-09-02 22:38 ` [PATCH 25/31] kpartx.rules: improved logic for by-uuid and by-label links Martin Wilck
2017-09-02 22:38 ` [PATCH 26/31] kpartx.rules: create by-partuuid and by-partlabel symlinks Martin Wilck
2017-09-02 22:38 ` [PATCH 27/31] kpartx.rules: generate type-name links only for multipath devices Martin Wilck
2017-09-02 22:38 ` [PATCH 28/31] kpartx.rules: fix logic for adding partitions Martin Wilck
2017-09-02 22:38 ` [PATCH 29/31] multipath/kpartx rules: avoid superfluous scanning Martin Wilck
2017-09-02 22:38 ` [PATCH 30/31] kpartx/del-part-nodes.rules: new udev file Martin Wilck
2017-09-13 21:23   ` Benjamin Marzinski
2017-09-02 22:39 ` [PATCH 31/31] kpartx.rules: move symlink code to other files Martin Wilck
2017-09-13 21:26   ` Benjamin Marzinski
2017-09-13 21:28 ` [PATCH 00/31] multipath/kpartx udev rules cleanup, and fixes Benjamin Marzinski
2017-09-14 11:56   ` Martin Wilck
2017-09-14 20:00   ` [PATCH v2 30/31] kpartx/del-part-nodes.rules: new udev file Martin Wilck
2017-09-14 20:00     ` [PATCH v2 31/31] kpartx.rules: move symlink code to other files Martin Wilck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1505389651.4493.1.camel@suse.com \
    --to=mwilck@suse.com \
    --cc=bmarzins@redhat.com \
    --cc=dm-devel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.