On Thu, Mar 14, 2019 at 07:58:45PM +0100, Christoph Anton Mitterer wrote:
> Phew... too many [silent] corruption bugs in btrfs... :-(
> 
> Actually I didn't even notice the others (which unfortunately doesn't
> mean I'm definitely not affected), so there's probably not much I can
> do/check about them now... only about the "recent" one that was just fixed.
> 
> But maybe there should be something like a btrfs-announce list, i.e. a
> low volume mailing list, in which (interested) users are informed about
> more grave issues.
> Such things can happen and there's no one to blame for that... but if
> they happen it would be good for users to get notified so that they can
> check their systems and possibly recover data from (still existing)
> other sources.

I don't know if it would be a low-volume list... every kernel release
includes fixes for _some_ exotic corner case.

> What about the direct IO issues that may still be present and which
> you've mentioned above... is direct IO used anywhere by default / under
> normal circumstances?

Direct IO is an odd case because it's not all that well understood
what the correct behavior is.  You can't prevent the kernel from making
copies of data and also expect full data integrity and also lock-free
performance, all at the same time.  Pick any two, and pay for it with
losses in the third.
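To make the "pick any two" trade-off concrete, here is a toy Python model. It is not btrfs code: the function names and the 4 KiB block size are hypothetical, and zlib.crc32 stands in for btrfs's default crc32c. It assumes the filesystem checksums the data at one moment and the device reads the buffer a little later. Without a private copy, an application that changes its buffer mid-write leaves a checksum that no longer matches what was stored; taking a copy first fixes that at the cost of extra memory bandwidth. The third option, blocking the application from touching its buffer until the I/O completes, is where the locking cost comes in.

```python
import zlib

BLOCK = 4096  # hypothetical block size for this toy model


def scribble(buf: bytearray) -> None:
    """Stand-in for an application thread that modifies its buffer mid-I/O."""
    buf[0] ^= 0xFF


def write_without_copy(user_buf: bytearray, mutate) -> tuple[int, bytes]:
    # Checksum is computed directly on the application's live buffer...
    csum = zlib.crc32(user_buf)
    # ...the application changes the buffer before the device reads it...
    mutate(user_buf)
    # ...so the "device" stores the modified contents.
    return csum, bytes(user_buf)


def write_with_copy(user_buf: bytearray, mutate) -> tuple[int, bytes]:
    # Take a private snapshot first (costs an extra copy per write).
    snapshot = bytes(user_buf)
    csum = zlib.crc32(snapshot)
    mutate(user_buf)  # later changes no longer affect what gets written
    return csum, snapshot


def verifies(csum: int, data: bytes) -> bool:
    """What a later read or scrub would check."""
    return zlib.crc32(data) == csum


print(verifies(*write_without_copy(bytearray(b"A" * BLOCK), scribble)))  # False
print(verifies(*write_with_copy(bytearray(b"A" * BLOCK), scribble)))     # True
```

In this model the copy is what buys integrity without locks; a real kernel also has to weigh page pinning, concurrency and performance, which the sketch ignores.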

The bug fixes here are more along the lines of "OK so you're using direct
IO which means you've basically admitted you don't care about *your* data,
let's try not to corrupt *other* data on the filesystem at the same time."

> I think by now I'm pretty confident that I, personally, am safe.

It took me two years to find this bug, and I had to write a tool to
encounter it often enough to notice.  A lot of people are safe.

> > If you had asked questions like "is this bug the reason why I've been
> > seeing random SHA hash verification failures for several years?" then
> > you should worry about this bug; otherwise, it probably didn't affect
> > you.
> 
> I think you're right... but my data, with many thousands of pictures
> etc. from my whole life, is really precious to me, so I wanted to
> understand the issue in "depth"... and I think these questions and your
> answers may still benefit others who also want to find out whether
> they could have been silently affected :-)

I found the 2017 compression bug in a lot of digital photographs.
It turns out that several popular cameras (including some of the ones I
own) put a big chunk of zeros near the beginnings of JPG files, and when
rsync copies those it will insert a hole instead of copying the zeros.
The 2017 bug affected "ordinary" holes so standard tools like cp and
rsync could trigger it.  Most photo tools ignore this data completely,
so when garbage appears there, nobody notices.

A similar thing happens to .o files:  ld aligns things to 4K block
boundaries, triggering the 2017 compressed read bug.  Nobody reads that
data either--it's just alignment padding.
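To check whether a given file has the kind of zero runs described above (in JPG headers or in alignment padding), a minimal Python sketch can list the block-aligned blocks that are entirely zero, i.e. the ones a sparse-aware copy (for example `rsync --sparse` or `cp --sparse=always`) could store as holes. The 4 KiB block size and the `zero_blocks` helper are assumptions for illustration; the real tools' hole-detection rules may differ.

```python
import os
import tempfile

BLOCK = 4096  # assumption: 4 KiB blocks, matching the common page size


def zero_blocks(path: str, block: int = BLOCK) -> list[int]:
    """Byte offsets of block-aligned blocks in `path` that are entirely zero.

    A sparse-aware copier could store each of these as a hole instead of
    writing data. A short final block is skipped: it can't be a whole hole.
    """
    offsets = []
    zero = bytes(block)
    with open(path, "rb") as f:
        offset = 0
        while len(chunk := f.read(block)) == block:
            if chunk == zero:
                offsets.append(offset)
            offset += block
    return offsets


# Fake "photo": a header, a run of zeros, then image data.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\xff\xd8" + b"x" * (BLOCK - 2))  # block 0: header-ish data
    tmp.write(bytes(2 * BLOCK))                 # blocks 1-2: zeros
    tmp.write(b"y" * BLOCK)                     # block 3: "pixels"
print(zero_blocks(tmp.name))  # [4096, 8192]
os.unlink(tmp.name)
```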

I don't think I found an application that cared about the 2017 bug at all.
Only backup verifications.
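As an illustration only (this is not the tool mentioned above), a minimal Python sketch of the kind of backup verification that catches this class of corruption: compare whole-file SHA-256 hashes, and on a mismatch report which blocks differ. The 4 KiB granularity and the function names are assumptions. Damage confined to zero padding shows up here even though a photo viewer would never notice it.

```python
import hashlib
import os
import tempfile

BLOCK = 4096  # assumed comparison granularity


def sha256_file(path: str) -> str:
    """Whole-file SHA-256, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def mismatched_blocks(a: str, b: str, block: int = BLOCK) -> list[int]:
    """Offsets of blocks that differ between two files (length differences count too)."""
    bad = []
    with open(a, "rb") as fa, open(b, "rb") as fb:
        offset = 0
        while True:
            ca, cb = fa.read(block), fb.read(block)
            if not ca and not cb:
                break
            if ca != cb:
                bad.append(offset)
            offset += block
    return bad


def verify_backup(original: str, backup: str) -> list[int]:
    """Empty list if the hashes match, else the offsets of the differing blocks."""
    if sha256_file(original) == sha256_file(backup):
        return []
    return mismatched_blocks(original, backup)


# Demo: garbage lands in zero padding that no photo viewer reads.
good = b"\xff\xd8" + bytes(BLOCK - 2) + b"pixels" * 100
bad = bytearray(good)
bad[100] = 0x5A
d = tempfile.mkdtemp()
paths = []
for name, data in (("orig.jpg", good), ("backup.jpg", bad)):
    paths.append(os.path.join(d, name))
    with open(paths[-1], "wb") as f:
        f.write(data)
print(verify_backup(*paths))  # [0]
```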

The 2018 bug is a different story--when it hits, it's obvious, and
ordinary application things break--but it won't happen to typical photo
image files, even with aggressive dedupe.

> 
> Cheers and thanks,
> Chris.