Linux-BTRFS Archive on lore.kernel.org
From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
To: Christoph Anton Mitterer <calestyo@scientia.net>
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: Reproducer for "compressed data + hole data corruption bug, 2018 edition" still works on 4.20.7
Date: Sat, 16 Mar 2019 22:54:50 -0400
Message-ID: <20190317025450.GB16664@hungrycats.org> (raw)
In-Reply-To: <303507e634c36af048ac36c7233c3b2af8311b46.camel@scientia.net>

On Sat, Mar 16, 2019 at 11:11:10PM +0100, Christoph Anton Mitterer wrote:
> On Fri, 2019-03-15 at 01:28 -0400, Zygo Blaxell wrote:
> > > But maybe there should be something like a btrfs-announce list, i.e. a
> > > low volume mailing list, in which (interested) users are informed about
> > > more grave issues.
> > > …
> > I don't know if it would be a low-volume list...every kernel release
> > includes fixes for _some_ exotic corner case.
> 
> Well this one *may* be exotic for many users, but we have at least the
> use case of qemu which seems to be not that exotic at all.
> 
> And the ones you outline below seem even more common?
> 
> Also the other means for end-users to know whether something is stable
> or not like https://btrfs.wiki.kernel.org/index.php/Status don't seem
> to really work out.

It's hard to separate the signal from the noise.  I first detected
the 2018 bug in 2016, but didn't know it was a distinct bug until
after eliminating all the other corruption causes that occurred during
that time.  I am still tracking issue(s) in btrfs that bring servers
down multiple times a week, so I'm not in a hurry to declare any part
of btrfs stable yet.

When could we ever confidently say btrfs is stable?  Some filesystems
are 30 years old and still fixing bugs.  See you in 2037?

Now, that specific wiki page should probably be updated, since at least
one outstanding bug is now known.

> There is a known silent data corruption bug which seems so far only
> fixed in 5.1rc* ... and the page still says stable since 4.14.
> Even now with the fix, one would probably need to wait a year or so
> until one could mark it stable again if nothing had been found by then.

I sometimes use "it has been $N days since the last bug fix in $Y" as a
crude metric of how trustworthy code is.  adfs is 2913 days and counting!
ext2 is only 106 days.  btrfs and xfs seem to be competing for the lowest
value of N, never rising above a few dozen except around holidays and
conferences, with ext4 not far behind.
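
One rough way to approximate that metric is to ask git for the newest
commit touching a directory (for example fs/btrfs in a kernel checkout)
and count the days since then.  This is only a sketch under assumptions:
it counts any commit rather than only bug fixes, and the function names
are made up for illustration.

```python
import subprocess
import time

SECONDS_PER_DAY = 86400


def days_since(commit_ts, now=None):
    """Whole days elapsed between a Unix timestamp and `now`."""
    if now is None:
        now = time.time()
    return int((now - commit_ts) // SECONDS_PER_DAY)


def last_commit_timestamp(repo, path):
    """Committer timestamp of the newest commit touching `path` in `repo`.

    Assumption: "last commit" stands in for "last bug fix"; telling the
    two apart would require reading the commit messages.
    """
    out = subprocess.run(
        ["git", "-C", repo, "log", "-1", "--format=%ct", "--", path],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    if not out:
        raise ValueError("no commits found for " + path)
    return int(out)
```

Calling days_since(last_commit_timestamp("/path/to/linux", "fs/btrfs"))
would give N for btrfs; the checkout path here is a placeholder.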

> So... if btrfs allows for direct IO... and if this isn't stable in some
> situations,... what can one do about it? I mean there doesn't seem to
> be an option to disallow it... 

Sure, but O_DIRECT is a performance/risk tradeoff.  If you ask someone who
uses csums or snapshots, they'll tell you btrfs should always put correct
data and checksums on disk, even if the application does something weird
and undefined like O_DIRECT.  If you ask someone who wants the O_DIRECT
performance, they'll tell you O_DIRECT should not waste time computing,
verifying, reading, or writing csums, nor should users expect correct
behavior from applications that don't follow the filesystem-specific
rules correctly (for some implied definition of how correct applications
should behave, because O_DIRECT is not a concrete specification), and that
includes permitting undetected data corruption to be persisted on disk.

> and any program can do O_DIRECT (without even knowing btrfs is below).

Most filesystems permit silent data corruption all of the time, so btrfs
is weird for disallowing silent data corruption some of the time.

> Guess I have to go deeper down the rabbit hole now for the other
> compressions bugs...
> 
> 
> > I found the 2017 compression bug in a lot of digital photographs.
> 
> Is there any way (apart from having correct checksums) to find out
> whether a file was affected by the 2017-bug?
> Like, I don't know,.. looking for large chunks of zeros?

You need to have an inline extent in the first 4096 bytes of the file
and data starting at 4096 bytes.  Normally that never happens, but it
is possible to construct files that way with the right sequences of
write(), seek(), and fsync().  They occur naturally in about one out
of every 100,000 files written by 'rsync -S', which triggers a similar
sequence of operations internally in the kernel.

The symptom is that the corrupted file has uninitialized kernel memory in
the last bytes of the first 4096 byte block, when the correct file has
0 bytes there.  It turns out that uninitialized kernel memory is often
full of zeros anyway, so even "corrupted" files come out unchanged most
of the time.

If you don't know what is supposed to be in those bytes (either from
the file format, an uncorrupted copy of the file, or unexpected behavior
when the file is used) then there's no way to know they're wrong.
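
If an uncorrupted copy is available, the check described above can be
sketched roughly as follows: read the first 4096 bytes of both copies and
list the offsets where the reference has a 0 byte and the current copy
does not.  The function name is hypothetical, and this only looks at the
one location the 2017 bug is said to affect.

```python
BLOCK = 4096  # the 2017 bug is described as confined to the first 4096-byte block


def suspect_bytes_in_first_block(current_path, reference_path):
    """Offsets in the first block where the reference copy holds a zero
    byte but the current copy holds something else."""
    with open(current_path, "rb") as f:
        cur = f.read(BLOCK)
    with open(reference_path, "rb") as f:
        ref = f.read(BLOCK)
    # zip() stops at the shorter read, so short files are handled too.
    return [i for i, (c, r) in enumerate(zip(cur, ref)) if r == 0 and c != 0]
```

An empty list means the first block shows no "zeros replaced by something
else" pattern relative to the reference; it says nothing about the rest
of the file.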

> And is there any more detailed information available on the 2017-bug,
> in the sense under which occasions it occurred?

The kernel commit message for the fix is quite detailed.

> Like also only on reads (which would mean again that I'd be mostly
> safe, because my checksums should mostly catch this)?

Only reads, and only files with a specific structure, and only at a
single specific location in the file.

> Or just on dedupe or hole punching? Or did it only affect sparse files
> (and there only the holes (blocks of zeros) as in your camera JPG
> example)?

You can't get the 2017 bug with dedupe--inline extents are not dedupable.
You do need a sparse file.

I didn't find the 2017 bug because of bees--I found it because of
rsync -S.

> > It turns out that several popular cameras (including some of the ones I
> > own) put a big chunk of zeros near the beginnings of JPG files, and when
> > rsync copies those it will insert a hole instead of copying the zeros.
> 
> Many other types of files may have such bigger chunks of zeros too...
> basically everything that leaves place for meta-data.

Only contiguous chunks of 0 that end at byte 4096 can be affected.
0 anywhere else in the file is the domain of the 2018 bug.  Also 2017
replaces 0 with invalid data, while 2018 replaces valid data with 0.

> AFAIU, both cp and rsync (--sparse) don't create sparse files actively
> per default,... cp (per default) only creates sparse files when it
> detects the source file to be already sparse.
> Same seems to be the case for tar, which only stores a file sparse
> (inside the archive) when --sparse is used.
> 
> So would one be safe from the 2017 bug if one hasn't had sparse files
> and hasn't activated the sparse option in any of these tools?

Probably.  Even "unsafe" is less than a 1 in 100,000 event, so you're
often safe even when using triggering tools (especially if the system
is lightly loaded).  Lots of tools make sparse files.
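
To get a list of sparse files worth a closer look, one heuristic is to
compare each file's allocated blocks with its length.  A minimal sketch,
assuming a Unix-like system where st_blocks counts 512-byte units; the
function names are invented here, and filesystems can report st_blocks
in ways that make this give false positives or negatives.

```python
import os
import stat

BLOCK = 4096  # only files with data past the first block are of interest


def looks_sparse(size, blocks_512):
    """True if fewer bytes are allocated than the file's length."""
    return blocks_512 * 512 < size


def sparse_candidates(root):
    """Regular files under `root` longer than 4096 bytes that look sparse."""
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # vanished or unreadable; skip it
            if not stat.S_ISREG(st.st_mode):
                continue
            if st.st_size > BLOCK and looks_sparse(st.st_size, st.st_blocks):
                hits.append(path)
    return hits
```

A file being on that list only means it has holes somewhere; it does not
mean it has the specific inline-extent layout described earlier.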

> >   Most photo tools ignore this data completely,
> > so when garbage appears there, nobody notices.
> 
> So the 2017-bug meant that areas that should be zero were filled with
> garbage but everything else was preserved correctly

Yep.

> > I don't think I found an application that cared about the 2017 bug at
> > all.
> 
> Well for me it would be still helpful to know how to find out whether I
> might have been affected or not... I do have some really old backups so
> recovery would be possible in many cases.

You could compare those backups to current copies before discarding them.
Or build a SHA table and keep a copy of it on online media for verification.
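
A SHA table of that kind could be built along these lines.  This is a
sketch, not a recommendation of a particular tool: SHA-256 and the
in-memory dict are choices made here for illustration, and the function
names are hypothetical.

```python
import hashlib
import os


def sha256_file(path, chunk=1 << 20):
    """Hex SHA-256 of a file, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def build_table(root):
    """Map each regular file's path (relative to root) to its digest."""
    table = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                table[os.path.relpath(path, root)] = sha256_file(path)
    return table


def mismatches(table, root):
    """Relative paths that are missing under root or whose digest differs."""
    bad = []
    for rel, digest in sorted(table.items()):
        path = os.path.join(root, rel)
        if not os.path.isfile(path) or sha256_file(path) != digest:
            bad.append(rel)
    return bad
```

Build the table from the old backup, store it somewhere online, and run
mismatches() against the current copy.  A mismatch only says the contents
differ; it does not say which copy is the correct one.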

> > The 2018 bug is a different story--when it hits, it's obvious, and
> > ordinary application things break
> 
> Which one do you mean now? The one recently fixed on
> reads+holepunching/dedupe/clone? Cause I thought that one was not
> that obvious as it was silent...

Many applications will squawk if you delete 32K of data randomly from
the middle of their data files.  There are crashes, garbage output,
error messages, corrupted VM filesystem images (i.e. the guest's fsck
complains).  A lot of issues magically disappear after applying the
"2018" fix.

> Anything still known about the even older compression related
> corruption bugs that Filipe mentioned, in the sense when they occurred
> and how to find out whether one was affected?

Kernels from 2015 and earlier had assorted problems with compressed data.
It's difficult to distinguish between them, or isolate specific syndromes
to specific bug fixes.  Not all of them were silent--there was a bug
in 2014 that returned EIO instead of data when reading files affected
by the 2017 bug (that change in behavior was a good clue about where
to look for the 2017 fix).  One of the bugs eventually manifests itself
as a broken filesystem or a kernel panic when you write to an affected
area of a file.

It's more practical to just assume anything stored on btrfs with
compression on a kernel prior to 2015 is suspect until proven otherwise.
In 2014 and earlier, you have to start suspecting uncompressed data too.
Kernels between 2012 and 2014 crashed so often it was difficult to run
data integrity verification tests with a significant corpus size.

> 
> Thanks,
> Chris.
> 
> 

