Re: [PATCH] fstests: UDEV_SETTLE_PROG before dmsetup create

From: Josef Bacik <josef@toxicpanda.com>
To: Filipe Manana <fdmanana@kernel.org>
Cc: fstests@vger.kernel.org, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] fstests: UDEV_SETTLE_PROG before dmsetup create
Date: Thu, 2 Dec 2021 10:45:35 -0500	[thread overview]
Message-ID: <YajqH1Njtqz0ZXF+@localhost.localdomain> (raw)
In-Reply-To: <YajcBNMm3dR4YI/N@debian9.Home>

On Thu, Dec 02, 2021 at 02:45:24PM +0000, Filipe Manana wrote:
> On Wed, Dec 01, 2021 at 01:18:52PM -0500, Josef Bacik wrote:
> > We've been seeing transient errors with any test that uses a dm device
> > for the entirety of the time that we've been running nightly xfstests
> 
> I have been having it on my tests vms since ever as well.
> It's really annoying, but fortunatelly it doesn't happen too often.
> 

Yeah before this change we'd fail ~6 tests on every configruation on every
overnight run.  With this change we fail 0.  It's rare, but with our continual
testing it happens sooooooo much.

> > runs.  This turns out to be because sometimes we get EBUSY while trying
> > to create our new dm device.  Generally this is because the test comes
> > right after another test that messes with the dm device, and thus we
> > still have udev messing around with the device when DM tries to O_EXCL
> > the block device.
> > 
> > Add a UDEV_SETTLE_PROG before creating the device to make sure we can
> > create our new dm device without getting this transient error.
> 
> I suspect this might only make it seem the problem goes away but does not
> really fix it.
> 
> I say that for 2 reasons:
> 
> 1) All tests that use dm end up calling _dmsetup_remove(), like through
>    _log_writes_remove() or _cleanup_flakey() for example. Normally those
>    are called in the _cleanup() function, which ensures it's done even if
>    the test fails for some reason.
> 
>    So I don't understand why we need that UDEV_SETTLE_PROG at _dmsetup_create().
> 
>    And I've seen the ebusy failure happen even when the previous tests did
>    not use any dm device;
> 
> 2) Some tests fail after creating the dm device and using it. For example
>    btrfs/206 often fails when it tries to fsck the filesystem:
> 
>    btrfs/206 3s ... [failed, exit status 1]- output mismatch (see /home/fdmanana/git/hub/xfstests/results//btrfs/206.out.bad)
>         --- tests/btrfs/206.out     2020-10-16 23:13:46.554152652 +0100
>         +++ /home/fdmanana/git/hub/xfstests/results//btrfs/206.out.bad      2021-12-01 21:09:46.317632589 +0000
>         @@ -3,3 +3,5 @@
>         XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>         wrote 8192/8192 bytes at offset 0
>         XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
>         +_check_btrfs_filesystem: filesystem on /dev/sdc is inconsistent
>         +(see /home/fdmanana/git/hub/xfstests/results//btrfs/206.full for details)
>         ...
> 
>        (Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/btrfs/206.out /home/fdmanana/git/hub/xfstests/results//btrfs/206.out.bad'  to see the entire diff)
> 
>     In the .full file I got:
> 
>     (...)
>     replaying 1239@11201: sector 2173408, size 16384, flags 0x10(METADATA)
>     replaying 1240@11234: sector 0, size 0, flags 0x1(FLUSH)
>     replaying 1241@11235: sector 128, size 4096, flags 0x12(FUA|METADATA)
>     _check_btrfs_filesystem: filesystem on /dev/sdc is inconsistent
>     *** fsck.btrfs output ***
>     ERROR: cannot open device '/dev/sdc': Device or resource busy
>     ERROR: cannot open file system
>     Opening filesystem to check...
>     *** end fsck.btrfs output
>     *** mount output ***
> 
>    The ebusy failure is not when the test starts, but when somewhere in the middle
>    of the replay loop when it calls fsck, or when it ends and the fstests framework
>    calls fsck.
> 
>    I've seen that with btrfs/172 too, which also uses dm logwrites in a similar way.
> 
> So to me this suggests 2 things:
> 
> 1) Calling UDEV_SETTLE_PROG at _dmsetup_create() doesn't solve that problem with
>    btrfs/206 (and other tests) - the problem is fsck failing to open the scratch
>    device after it called _log_writes_remove() -> _dmsetup_remove(), and not a
>    failure to create the dm device;
> 
> 2) The problem is likely something missing at _dmsetup_remove(). Perhaps add
>    another UDEV_SETTLE_PROG there:
> 
>    diff --git a/common/rc b/common/rc
>    index 8e351f17..22b34677 100644
>    --- a/common/rc
>    +++ b/common/rc
>    @@ -4563,6 +4563,7 @@ _dmsetup_remove()
>             $UDEV_SETTLE_PROG >/dev/null 2>&1
>             $DMSETUP_PROG remove "$@" >>$seqres.full 2>&1
>             $DMSETUP_PROG mknodes >/dev/null 2>&1
>     +       $UDEV_SETTLE_PROG >/dev/null 2>&1
>      }
>  
>     _dmsetup_create()
> 
>   I can't say if that change to _dmsetup_remove() is correct, or what it's
>   needed, as I really haven't spent time trying to figure out why the issue
>   happens.
> 

I actually tried a few iterations before I settled on this one, but I was only
trying to reproduce the EBUSY when creating the flakey device, I hadn't seen it
with fsck.  So I originally started with your change, but it didn't fix the
problem.  Then I did both, UDEV_SETTLE at the end of remove and at the beginning
of create and the problem went away, and then I removed the one from remove and
the problem still was gone.

But since I've made this change I also have been investigating another problem
where we'll get EBUSY at mkfs time when we use SCRATCH_DEV_POOL.  I have a test
patch in our staging branch to make sure it actuall fixes it, but I have to add
this to the start of _scratch_pool_mkfs as well.

It turns out that udev is doing this thing where it writes to
/sys/block/whatever/uevent to make sure that a KOBJ_CHANGE event gets sent out
for the device.

This is racing with the test doing a mount.  So the mount gets O_EXCL, which
means the uevent doesn't get emitted until umount.  This would explain what
you're seeing, we umount, we get the KOBJ_CHANGE event once the O_EXCL is
dropped, udev runs, and then fsck gets an EBUSY.

This is a very long email to say that udev is causing spurious failures because
of behavior I don't entirely understand.  We're going to need to sprinkle in
UDEV_SETTLE_PROG in different places to kill all of these different scenarios.

What do we think is best here?  Put UDEV_SETTLE_PROG at the start of any
function that needs to do O_EXCL?  So this would mean

_dmsetup_create
_dmsetup_remove
*_mkfs
*_mount
*_check

That would be safest, and I can certainly do that.  My initial approach was just
to do it where it was problematic, but the debugging I did yesterday around
btrfs/223 failures and the fact that udev is queue'ing up events that get
delivered at some point in the future makes it kind of hard to handle on a
case-by-case basis.  Thanks,

Josef