linux-btrfs.vger.kernel.org archive mirror
From: Christoph Anton Mitterer <calestyo@scientia.net>
To: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: Reproducer for "compressed data + hole data corruption bug, 2018 edition" still works on 4.20.7
Date: Sat, 16 Mar 2019 23:11:10 +0100
Message-ID: <303507e634c36af048ac36c7233c3b2af8311b46.camel@scientia.net>
In-Reply-To: <20190315052827.GH23918@hungrycats.org>

On Fri, 2019-03-15 at 01:28 -0400, Zygo Blaxell wrote:
> > But maybe there should be something like a btrfs-announce list, i.e. a
> > low volume mailing list, in which (interested) users are informed about
> > more grave issues.
> > …
> I don't know if it would be a low-volume list...every kernel release
> includes fixes for _some_ exotic corner case.

Well this one *may* be exotic for many users, but we have at least the
use case of qemu which seems to be not that exotic at all.

And the ones you outline below seem even more common?


Also the other means for end-users to know whether something is stable
or not like https://btrfs.wiki.kernel.org/index.php/Status don't seem
to really work out.

There is a known silent data corruption bug which seems so far only
fixed in 5.1rc* ... and the page still says stable since 4.14.
Even now with the fix, one would probably need to wait a year or so
until one could mark it stable again if nothing had been found by then.



> > What about the direct IO issues that may be still present and which
> > you've mentioned above... is this used somewhere per default / under
> > normal circumstances?
> 
> Direct IO is an odd case because it's not all that well understood
> what the correct behavior is.  You can't prevent the kernel from making
> copies of data and also expect full data integrity and also lock-free
> performance, all at the same time.  Pick any two, and pay for it with
> losses in the third.
> 
> The bug fixes here are more along the lines of "OK so you're using direct
> IO which means you've basically admitted you don't care about *your* data,
> let's try not to corrupt *other* data on the filesystem at the same time."

So... if btrfs allows for direct IO... and if this isn't stable in some
situations,... what can one do about it? I mean there doesn't seem to
be an option to disallow it... and any program can do O_DIRECT (without
even knowing btrfs is below).




Guess I have to go deeper down the rabbit hole now for the other
compression bugs...


> I found the 2017 compression bug in a lot of digital photographs.

Is there any way (apart from having correct checksums) to find out
whether a file was affected by the 2017-bug?
Like, I don't know,.. looking for large chunks of zeros?
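
A rough sketch of the "look for large chunks of zeros" idea, in Python. The function names and the 4096-byte threshold are made up for illustration. It only reports where long zero runs exist in a file, and whether that says anything about the 2017 bug is exactly the open question here:

```python
import re
from pathlib import Path


def zero_runs(data, min_len=4096):
    """Return (offset, length) for every run of at least min_len zero bytes.

    min_len is an arbitrary threshold chosen for this sketch.
    """
    pattern = re.compile(rb"\x00{%d,}" % min_len)
    return [(m.start(), m.end() - m.start()) for m in pattern.finditer(data)]


def zero_runs_in_file(path, min_len=4096):
    """Read the whole file into memory and scan it (fine for photo-sized files)."""
    return zero_runs(Path(path).read_bytes(), min_len)
```

Comparing the output for a file against the same file from an old backup might show whether a zero region has changed, assuming the backup itself is good.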


And is there any more detailed information available on the 2017-bug,
in the sense under which occasions it occurred?

Like also only on reads (which would mean again that I'd be mostly
safe, because my checksums should mostly catch this)?

Or just on dedupe or hole punching? Or did it only affect sparse files
(and there only the holes (blocks of zeros) as in your camera JPG
example)?


> It turns out that several popular cameras (including some of the ones I
> own) put a big chunk of zeros near the beginnings of JPG files, and when
> rsync copies those it will insert a hole instead of copying the zeros.

Many other types of files may have such bigger chunks of zeros too...
basically everything that leaves place for meta-data.


> The 2017 bug affected "ordinary" holes so standard tools like cp and
> rsync could trigger it.

AFAIU, both cp and rsync (--sparse) don't create sparse files actively
per default,... cp (per default) only creates sparse files when it
detects the source file to be already sparse.
Same seems to be the case for tar, which only stores a file sparse
(inside the archive) when --sparse is used.

So would one be safe from the 2017 bug if one hasn't had sparse files
and hasn't activated the sparse option in any of these tools?
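
To get a hint of whether a given file is currently stored sparsely, one could compare its apparent size with its allocated blocks. A minimal sketch in Python follows; the function names are hypothetical. The allocation count can differ from the size for reasons other than holes (compression, inline data, not-yet-flushed writes), so this is a heuristic, not proof:

```python
import os


def looks_sparse(size_bytes, allocated_blocks):
    """True if fewer bytes are allocated than the file's apparent size.

    st_blocks is counted in 512-byte units, regardless of the
    filesystem's own block size.
    """
    return allocated_blocks * 512 < size_bytes


def file_looks_sparse(path):
    """Apply the heuristic to a file on disk."""
    st = os.stat(path)
    return looks_sparse(st.st_size, st.st_blocks)
```

Running `file_looks_sparse()` over a directory tree would at least narrow down which files are candidates for having holes.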


>   Most photo tools ignore this data completely,
> so when garbage appears there, nobody notices.

So the 2017-bug meant that areas that should be zero were filled with
garbage but everything else was preserved correctly?



> I don't think I found an application that cared about the 2017 bug at
> all.

Well for me it would be still helpful to know how to find out whether I
might have been affected or not... I do have some really old backups so
recovery would be possible in many cases.



> The 2018 bug is a different story--when it hits, it's obvious, and
> ordinary application things break

Which one do you mean now? The one recently fixed on
reads+holepunching/dedupe/clone? Cause I thought that one was not
that obvious as it was silent...


Anything still known about the even older compression related
corruption bugs that Filipe mentioned, in the sense when they occurred
and how to find out whether one was affected?


Thanks,
Chris.

