Re: btrfs vs write caching firmware bugs (was: Re: BTRFS recovery not possible)

From: Chris Murphy <lists@colorremedies.com>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>
Cc: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>,
	Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Date: Mon, 24 Jun 2019 11:31:35 -0600
Message-ID: <CAJCQCtRrT5pUxOxfKWTC=zt9E=ZxRaiLeBxngqc6YVQEYp8n_g@mail.gmail.com>
In-Reply-To: <f1cfe396-aac7-b670-b8de-f5d3b795acfe@gmx.com>

On Sun, Jun 23, 2019 at 7:52 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
> On 2019/6/24 4:45 AM, Zygo Blaxell wrote:
> > I first observed these correlations back in 2016.  We had a lot of WD
> > Green and Black drives in service at the time--too many to replace or
> > upgrade them all early--so I looked for a workaround to force the
> > drives to behave properly.  Since it looked like a write ordering issue,
> > I disabled the write cache on drives with these firmware versions, and
> > found that the transid-verify filesystem failures stopped immediately
> > (they had been bi-weekly events with write cache enabled).
>
> So the worst-case scenario really does happen in the real world:
> badly implemented flush/FUA in firmware.
> Btrfs has no way to fix such a low-level problem.

Right. The questions I have: should Btrfs (or any file system) be able
to detect such devices and still protect the data? i.e. for the file
system to somehow be more suspicious, without impacting performance,
and go read-only sooner so that at least read-only mount can work? Or
is this so much work for such a tiny edge case that it's not worth it?

Arguably the hardware is some kind of zombie saboteur. It's not
totally dead, it gives the impression that it's working most of the
time, and then silently fails to do what we think it should in an
extraordinary departure from specs and expectations.

Are there other failure cases that could look like this and are
therefore worth handling? As storage stacks get more complicated, with
ever more complex firmware and firmware updates in the field, it might
be useful to have at least one file system that can detect such
problems sooner than others and go read-only to prevent further damage.
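
To make the "detect and go read-only" idea concrete, here is a toy
sketch in Python. It is not btrfs code, and every name in it
(ToyDevice, ToyFilesystem, read_child) is made up for illustration. It
assumes a device that reports success for a write it silently drops,
and a file system whose parent pointers record the generation they
expect to find in the child block, which is roughly the check behind
the "parent transid verify failed" message:

```python
# Toy model only: NOT btrfs code. All names here are hypothetical.
from dataclasses import dataclass, field


@dataclass
class Block:
    generation: int  # transaction id stamped into the block header
    payload: str


@dataclass
class ToyDevice:
    """A device that may silently drop writes while reporting success."""
    drop_writes: bool = False
    blocks: dict = field(default_factory=dict)

    def write(self, addr: int, block: Block) -> None:
        if not self.drop_writes:
            self.blocks[addr] = block
        # A misbehaving device "succeeds" either way: no error is raised.

    def read(self, addr: int) -> Block:
        return self.blocks[addr]


class ToyFilesystem:
    def __init__(self, device: ToyDevice) -> None:
        self.device = device
        self.read_only = False

    def read_child(self, addr: int, expected_generation: int) -> Block:
        """Read a block and compare its generation with the one the
        parent pointer recorded; flip to read-only on a mismatch."""
        block = self.device.read(addr)
        if block.generation != expected_generation:
            self.read_only = True
            raise OSError(
                f"transid verify failed on {addr}: "
                f"wanted {expected_generation}, found {block.generation}"
            )
        return block


if __name__ == "__main__":
    dev = ToyDevice()
    dev.write(4096, Block(generation=10, payload="old tree node"))
    dev.drop_writes = True  # firmware starts losing writes
    dev.write(4096, Block(generation=11, payload="new tree node"))

    fs = ToyFilesystem(dev)
    try:
        fs.read_child(4096, expected_generation=11)
    except OSError as err:
        print(err, "-> read_only =", fs.read_only)
```

The point of the sketch is only that the mismatch shows up at read
time, after the bad write has already been acknowledged; catching it
any sooner would need extra reads or other cross-checks, which is
where the performance question comes in.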

> BTW, have you seen any corruption using the bad drives (with write
> cache enabled) with a traditional journal-based fs like XFS/EXT4?
>
> Btrfs relies more heavily on the hardware implementing barrier/flush
> properly; otherwise CoW can easily be ruined.
> If the firmware is only tested (if tested at all) against such
> journal-based filesystems, it may be the vendor's problem.

I think we can definitely say this is a vendor problem. But the
question still is whether the file system has a role in at least
disqualifying hardware when it knows the hardware is acting up, before
the file system is thoroughly damaged.

I also wonder how ext4 and XFS will behave. In some ways they might
tolerate the problem without noticing it for longer, where instead of
kernel space recognizing it, it's actually user space / application
layer that gets confused first, if it's bogus data that's being
returned. Filesystem metadata is a relatively small target for such
corruption when the file system mostly does overwrites.

I also wonder how ZFS handles this. Both in the single device case,
and in the RAIDZ case.

-- 
Chris Murphy