Re: fuzzing bcachefs with dm-flakey

From: Dave Chinner <david@fromorbit.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: Mikulas Patocka <mpatocka@redhat.com>,
	Kent Overstreet <kent.overstreet@linux.dev>,
	linux-bcachefs@vger.kernel.org, dm-devel@redhat.com,
	linux-fsdevel@vger.kernel.org
Subject: Re: fuzzing bcachefs with dm-flakey
Date: Tue, 30 May 2023 09:12:44 +1000
Message-ID: <ZHUxbLh1P9yiq2c9@dread.disaster.area>
In-Reply-To: <ZHUVy7jut1Ex1IGJ@casper.infradead.org>

On Mon, May 29, 2023 at 10:14:51PM +0100, Matthew Wilcox wrote:
> On Mon, May 29, 2023 at 04:59:40PM -0400, Mikulas Patocka wrote:
> > Hi
> > 
> > I improved the dm-flakey device mapper target, so that it can do random 
> > corruption of read and write bios - I uploaded it here: 
> > https://people.redhat.com/~mpatocka/testcases/bcachefs/dm-flakey.c
> > 
> > I set up dm-flakey, so that it corrupts 10% of read bios and 10% of write 
> > bios with this command:
> > dmsetup create flakey --table "0 `blockdev --getsize /dev/ram0` flakey /dev/ram0 0 0 1 4 random_write_corrupt 100000000 random_read_corrupt 100000000"
> 
> I'm not suggesting that any of the bugs you've found are invalid, but 10%
> seems really high.  Is it reasonable to expect any filesystem to cope
> with that level of broken hardware?  Can any of our existing ones cope
> with that level of flakiness?  I mean, I've got some pretty shoddy USB
> cables, but ...

It's realistic in that when you have lots of individual storage
devices, load balanced over all of them, and then one fails
completely, we'll see an IO error rate like this. These are the sorts
of setups I'd expect to be using erasure coding with bcachefs, so
the IO failure rate should be able to head towards 20-30% before
actual loss and/or corruption should start occurring.
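
A back-of-the-envelope sketch of that arithmetic, assuming IO is spread
evenly across devices and an idealised k+m erasure code that survives
the loss of any m devices in a stripe. The function names and the 8+2
and 7+3 geometries are illustrative assumptions, not bcachefs defaults:

```python
from fractions import Fraction


def error_rate_one_failed(n_devices: int) -> Fraction:
    """Fraction of IOs that fail when one of n evenly loaded devices is dead."""
    return Fraction(1, n_devices)


def tolerable_failure_fraction(data: int, parity: int) -> Fraction:
    """Fraction of a stripe's devices that an idealised k+m code can lose."""
    return Fraction(parity, data + parity)


# One dead device out of ten, evenly loaded: 10% of IOs fail.
print(float(error_rate_one_failed(10)))

# Hypothetical stripe geometries (assumed for illustration only).
print(float(tolerable_failure_fraction(8, 2)))
print(float(tolerable_failure_fraction(7, 3)))
```

Under those assumptions, one dead device in ten gives the 10% rate,
and stripes with two or three parity devices out of ten put the
tolerable failure fraction in the 20-30% range.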

In this situation, if the failures were isolated to an individual
device, then I'd want the filesystem to kick that device out of the
backing pool. All the failures then go away and the rebuild of the
redundancy that erasure coding provides can take place. i.e. an IO
failure rate this high should be a very short lived incident for a
filesystem that directly manages individual devices.

But within a single, small device, it's not a particularly realistic
scenario. If it's really corrupting this much active metadata, then
the filesystem should be shutting down at the first
uncorrectable/unrecoverable metadata error and every other IO error
is then superfluous.

Of course, bcachefs might be doing just that - cleanly shutting down
an active filesystem is a very hard problem. XFS still has intricate
and subtle issues with shutdown of active filesystems that can cause
hangs and/or crashes, so I wouldn't expect bcachefs to be able to
handle these scenarios completely cleanly at this stage of its
development....

Perhaps it is worthwhile running the same tests on btrfs so we can
have something to compare the bcachefs behaviour to. I suspect that
btrfs will fare little better on the single-device, no-checksums
corruption test....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com