linux-bcache.vger.kernel.org archive mirror
From: Kai Krakow <kai@kaishome.de>
To: Coly Li <colyli@suse.de>
Cc: Marc Smith <msmith626@gmail.com>,
	"linux-bcache@vger.kernel.org" <linux-bcache@vger.kernel.org>
Subject: Re: Race Condition Leads to Corruption
Date: Fri, 23 Apr 2021 00:19:00 +0200	[thread overview]
Message-ID: <CAC2ZOYvKZBFRPi+-BB8vyTWhMoTGsQZ+7vuFfDmBzpSjzwvVYg@mail.gmail.com> (raw)
In-Reply-To: <e61bcc44-5ac1-e58c-d5c9-fb7257ba044d@suse.de>

Hello Coly!

On Thu, Apr 22, 2021 at 6:05 PM Coly Li <colyli@suse.de> wrote:

> In direct I/Os, to read the just-written data, the reader must wait
> and make sure the previous write has completed; the data read back
> should then be the previously written content. If not, that's a
> bcache bug.

Isn't this report exactly about that? DIO data has been written, then
written again differently by a concurrent process, and when you read
it back, either version may come back (let's call it state A). The
problem here is that this state is not persistent, and that should
not happen: bcache now has stale content in its cache, and after
write-back finishes, the content seen by the earlier read (state A)
changes to a new state B. That is not what you should expect from
direct I/O: the contents literally change under your feet, with far
too high a latency. If a read has already confirmed that the data is
in some state A after concurrent writes, it must not change to a
state B after bcache finishes write-back.
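For illustration, here is a minimal sketch of the invariant I mean.
Note this is only a model of the expected semantics: it uses plain
buffered I/O on a temp file, whereas the real reproducer would use
O_DIRECT with aligned buffers on the bcache device, where the bug
shows up.

```python
import os
import tempfile

BLOCK = 4096

def write_state(fd, data):
    # One "writer": overwrite the block and flush it.
    os.lseek(fd, 0, os.SEEK_SET)
    os.write(fd, data)
    os.fsync(fd)

def read_state(fd):
    # One "reader": read the block back from offset 0.
    os.lseek(fd, 0, os.SEEK_SET)
    return os.read(fd, BLOCK)

fd, path = tempfile.mkstemp()
write_state(fd, b"A" * BLOCK)   # first writer
write_state(fd, b"B" * BLOCK)   # racing second writer wins
state_a = read_state(fd)        # reader pins down the settled "state A"

# ... time passes; on bcache this is where write-back would complete ...

state_later = read_state(fd)
# On a correct block device this always holds; the reported bug is
# that after bcache write-back the content can flip to an older state B.
assert state_later == state_a

os.close(fd)
os.unlink(path)
```

The point is that once any read has observed state A, every later read
must keep returning state A until the next write, no matter what the
cache does internally.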

> You may try the above steps on non-bcache block devices with/without
> file systems, it is probably to reproduce similar "race" with parallel
> direct read and writes.

I'm guessing the bcache results suggest a much higher latency of
inconsistency between racing writes and reads, in the range of
minutes or even hours. So there would be no way to properly verify
your DIO writes with a following read and be sure that the observed
state persists, simply because there might still be outstanding
bcache dirty data.

I wonder if this is why I'm seeing btrfs corruptions with bcache when
I enable auto-defrag in btrfs. OTOH, I didn't check how auto-defrag
is actually implemented and whether it uses some direct-I/O path
under the hood.


Regards,
Kai

Thread overview: 5+ messages
2021-04-20 17:01 Race Condition Leads to Corruption Marc Smith
2021-04-22 16:04 ` Coly Li
2021-04-22 22:19   ` Kai Krakow [this message]
2021-04-23 15:53     ` Coly Li
2021-04-26 16:23       ` Marc Smith
