* btrfs rare silent data corruption with kernel data leak
@ 2016-09-21  4:55 Zygo Blaxell
  2016-09-21 11:14 ` Paul Jones
  2016-10-08  6:10 ` btrfs rare silent data corruption with kernel data leak (updated with some bisection results) Zygo Blaxell
  0 siblings, 2 replies; 11+ messages in thread
From: Zygo Blaxell @ 2016-09-21  4:55 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 8512 bytes --]

Summary: 

There seem to be two btrfs bugs here: one loses data on writes,
and the other leaks data from the kernel to replace it on reads.  It all
happens after checksums are verified, so the corruption is entirely
silent--no EIO errors, kernel messages, or device event statistics.

Compressed extents are corrupted with a kernel data leak.  Uncompressed
extents may be corrupted by deterministically replacing data bytes with
zero, or may not be corrupted at all.  No preconditions for corruption
are known.  Less than one file per hundred thousand seems to be affected.
Only specific parts of any file can be affected.  Kernels v4.0..v4.5.7
were tested; all have the issue.

Background, observations, and analysis:

I've been detecting silent data corruption on btrfs for over a year.
Over time I've been improving data collection and controlling for
confounding factors (other known btrfs bugs, RAM and CPU failures, raid5,
etc).  I have recently isolated the most common remaining corruption mode,
and it seems to be a btrfs bug.

I don't have an easy recipe to create a corrupted file and I don't know
precisely how they come to exist.  In the wild, about one in 10^5..10^7
files is provably corrupted.  The corruption can only occur at one point
in each file so the rate of corruption incidents follows the number
of files.  It seems to occur most often to software builders and rsync
backup receivers.  It seems to happen mostly on busier machines with
mixed workloads and not at all on idle test VMs trying to reproduce this
issue with a script.

One way to get corruption is to set up a series of filesystems and rsync
/usr to them sequentially (i.e. rsync -a /usr /fs-A; rsync -a /fs-A /fs-B;
rsync -a /fs-B /fs-C; ...) and verify each copy by comparison afterwards.
The same host needs to be doing other filesystem workloads or it won't
seem to reproduce this issue.  It took me two weeks to intentionally
create one corrupt file this way.  Good luck.

In cases where this corruption mode is found, the files always have an
extent map following this pattern:

	# filefrag -v usr/share/icons/hicolor/icon-theme.cache
	Filesystem type is: 9123683e
	File size of usr/share/icons/hicolor/icon-theme.cache is 36456 (9 blocks of 4096 bytes)
	 ext:     logical_offset:        physical_offset: length:   expected: flags:
	   0:        0..    4095:          0..      4095:   4096:             encoded,not_aligned,inline
	   1:        1..       8:  182785288.. 182785295:      8:          1: last,encoded,shared,eof
	usr/share/icons/hicolor/icon-theme.cache: 2 extents found

Note the first inline extent followed by one or more non-inline
extents.  I don't know enough about the writing side of btrfs to know
if this is a bug in and of itself.  It _looks_ wrong to me.

Once such an extent is created, the corruption is persistent but not
deterministic.  When I read the extent through btrfs, the file is
different most of the time:

	# cp usr/share/icons/hicolor/icon-theme.cache /tmp/foo
	# ls -l usr/share/icons/hicolor/icon-theme.cache /tmp/foo
	-rw-r--r-- 1 root root 36456 Sep 20 11:41 /tmp/foo
	-rw-r--r-- 1 root root 36456 Sep  6 11:52 usr/share/icons/hicolor/icon-theme.cache
	# while sysctl vm.drop_caches=1; do cmp -l usr/share/icons/hicolor/icon-theme.cache /tmp/foo; done
	vm.drop_caches = 1
	vm.drop_caches = 1
	 4093 213   0
	 4094 177   0
	vm.drop_caches = 1
	 4093 216   0
	 4094  33   0
	 4095 173   0
	 4096  15   0
	vm.drop_caches = 1
	 4093 352   0
	 4094   3   0
	 4095  37   0
	 4096   2   0
	vm.drop_caches = 1
	 4093 243   0
	 4094 372   0
	 4095 154   0
	 4096 221   0
	vm.drop_caches = 1
	 4093 333   0
	 4094 170   0
	 4095 356   0
	 4096 213   0
	vm.drop_caches = 1
	 4093 170   0
	 4094 155   0
	 4095  62   0
	 4096 233   0
	vm.drop_caches = 1
	 4093 263   0
	 4094   6   0
	 4095 363   0
	 4096  44   0
	vm.drop_caches = 1
	 4093 237   0
	 4094 330   0
	 4095 217   0
	 4096 206   0
	^C

In other runs there can be 5 or more consecutive reads with no differences
detected.

I fetched the raw inline extent item for this file through the SEARCH_V2
ioctl and decoded it:

	# head /tmp/bar
	27 5e 06 00 00 00 00 00 [generation 417319]
	fc 0f 00 00 00 00 00 00 [ram_bytes = 0xffc, compression = 1]
	01 00 00 00 00 78 5e 9c [zlib data starts at "78 5e..."]
	97 3d 74 14 55 14 c7 6f
	60 77 b3 9f d9 20 20 08
	28 11 22 a0 66 90 8f a0
	a8 01 a2 80 80 a2 20 e6
	28 20 42 26 bb 93 cd 30
	b3 33 9b d9 99 24 62 d4
	20 f8 51 58 58 50 58 58

Notice ram_bytes is 0xffc, or 4092, but the inline extent's position in
the file covers the offset range 0..4095.
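
For reference, the 21-byte header decoded above is the start of struct
btrfs_file_extent_item from the btrfs on-disk format.  Below is a sketch
of the layout as I recall it from fs/btrfs/ctree.h (verify against your
source tree); for an inline extent the data begins immediately after
'type', which is why the perl one-liner further down skips 21 bytes:

#include <stdio.h>
#include <stdint.h>

/* Regular-extent fields (disk_bytenr etc.) are omitted here: for
 * BTRFS_FILE_EXTENT_INLINE the extent data is stored where they
 * would begin, i.e. at byte offset 21. */
struct btrfs_file_extent_item {
        uint64_t generation;     /* 27 5e 06 00 00 00 00 00 -> 417319 */
        uint64_t ram_bytes;      /* fc 0f 00 00 00 00 00 00 -> 0xffc  */
        uint8_t  compression;    /* 01 -> zlib                        */
        uint8_t  encryption;
        uint16_t other_encoding;
        uint8_t  type;           /* 00 -> BTRFS_FILE_EXTENT_INLINE    */
} __attribute__((packed));

int main(void)
{
        /* prints 21: the offset where the inline (compressed) data starts */
        printf("inline data offset = %zu\n",
               sizeof(struct btrfs_file_extent_item));
        return 0;
}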

When an inline extent is read in btrfs, any gap between the size of the
extent data and the read buffer page size should be zero-filled (memset).
For uncompressed extents, the memset target size is PAGE_CACHE_SIZE in
btrfs_get_extent.  For compressed extents, the decompression function
is passed the ram_bytes field from the extent as the size of the output
buffer.

Unfortunately, in this case, ram_bytes is only 4092 bytes.  The inline
extent is not the last extent in the file, so read() can retrieve data
beyond the end of the extent.  Ideally this data comes from the next
extent, but the next extent's offset (4096) is 4 bytes later.  The last
4 bytes of the first page of the file end up with uninitialized data.
vm.drop_caches triggers an aggressive nondeterministic rearrangement of
buffers in physical kernel memory, which would result in different data
on each read.
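
The mechanism is easy to see in user space with zlib (an illustration
only, not btrfs code; the 0xAA fill stands in for whatever stale data
happens to occupy the destination page):

#include <stdio.h>
#include <string.h>
#include <zlib.h>

int main(void)
{
        unsigned char data[4092], comp[8192], page[4096];
        uLongf clen = sizeof(comp), dlen = sizeof(page);

        memset(data, 'x', sizeof(data));      /* 4092 bytes of "file data" */
        if (compress(comp, &clen, data, sizeof(data)) != Z_OK)
                return 1;

        memset(page, 0xAA, sizeof(page));     /* stand-in for stale memory */
        if (uncompress(page, &dlen, comp, clen) != Z_OK)
                return 1;

        /* Only 4092 bytes were written; the last 4 keep their old contents. */
        printf("decompressed %lu bytes; tail = %02x %02x %02x %02x\n",
               (unsigned long)dlen,
               page[4092], page[4093], page[4094], page[4095]);
        return 0;                             /* build with: cc leak.c -lz */
}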

If I extract the zlib compressed data from the inline extent item, I
can verify that the compressed data decompresses OK and is really 4092
bytes long:

	# perl -MCompress::Zlib -e '$/=undef; open(BAR, "/tmp/bar"); $x = <BAR>; for my $y (split(" ", $x)) { $z .= chr(hex($y)); } print uncompress(substr($z, 21))' | hd | diff -u - <(hd /tmp/foo) | head
	--- -   2016-09-20 23:40:41.168981367 -0400
	+++ /dev/fd/63  2016-09-20 23:40:41.167445549 -0400
	@@ -253,5 +253,2028 @@
	 00000fc0  00 00 00 00 00 09 00 04  00 00 00 00 00 01 00 04  |................|
	 00000fd0  00 00 00 00 00 00 10 20  00 00 0f e0 00 00 0f ec  |....... ........|
	 00000fe0  6b 73 71 75 61 72 65 73  00 00 00 00 00 00 00 06  |ksquares........|
	-00000ff0  00 38 00 04 00 00 00 00  00 30 00 04              |.8.......0..|
	-00000ffc
	+00000ff0  00 38 00 04 00 00 00 00  00 30 00 04 00 00 00 00  |.8.......0......|
	+00001000  00 24 00 04 00 00 00 00  00 13 00 04 00 00 00 00  |.$..............|

I have not found instances of this bug involving uncompressed extents.
Uncompressed extents may have deterministic data corruption (all missing
bytes replaced with zero) without the kernel data leak, or they may not
be corrupted at all.

In the wild I've encountered corrupted files with error runs as long as
3000 bytes in the first page.  At the time the data wasn't clean enough
to say whether all of the bytes in the uncorrupted version of the files
were zero.  The vast majority of the time one side or the other of the
comparison was all-zero, but my testing environment was not set up to
reliably identify which version of an affected file was the correct one,
or to separate this corruption mode from other modes.

What next:

The bug where ram_bytes is trusted instead of calculating an acceptable
output buffer size should be fixed to prevent the kernel data leak
(not to mention possible fuzzing vulnerabilities).

The bug that is causing broken inline extents to be created needs to
be fixed.

What do we do with all the existing broken inline extents on filesystems?
We could detect this case and return EIO.  Since some of the data is
missing, we can't guess what the missing data was, and we can't attest
to userspace that we have read it all correctly.

If we can *prove* that the writing side of this bug *only* occurs in
cases when the missing data is zero (e.g because we find it is triggered
only by a sequence like "create/truncate write(4092) lseek(+4) write"
so the missing data is a hole) then we can safely fill in the missing
data with zeros.  The low rate of occurrence of the bug means that even
a high false positive EIO rate is still a low absolute rate.
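
For concreteness, the hypothesized write sequence above looks roughly
like this in user space (a sketch only--the filename and sizes are
illustrative, and this is not a confirmed reproducer):

#include <fcntl.h>
#include <unistd.h>

int main(void)
{
        char buf[4096] = { 0 };   /* payload contents don't matter here */
        int fd = open("testfile", O_CREAT | O_TRUNC | O_WRONLY, 0644);

        if (fd < 0)
                return 1;
        if (write(fd, buf, 4092) != 4092)    /* first page, 4 bytes short  */
                return 1;
        if (lseek(fd, 4, SEEK_CUR) < 0)      /* hole at offsets 4092..4095 */
                return 1;
        if (write(fd, buf, 4096) != 4096)    /* next extent at offset 4096 */
                return 1;
        return close(fd);
}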

Maybe it's enough to assume the missing data is zero, and issue a release
note telling people to verify and correct their own data after applying
the bug fix to prevent any more corrupted writes.


[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: btrfs rare silent data corruption with kernel data leak
  2016-09-21  4:55 btrfs rare silent data corruption with kernel data leak Zygo Blaxell
@ 2016-09-21 11:14 ` Paul Jones
  2016-09-21 13:02   ` Zygo Blaxell
  2016-09-22 20:42   ` Chris Mason
  2016-10-08  6:10 ` btrfs rare silent data corruption with kernel data leak (updated with some bisection results) Zygo Blaxell
  1 sibling, 2 replies; 11+ messages in thread
From: Paul Jones @ 2016-09-21 11:14 UTC (permalink / raw)
  To: Zygo Blaxell, linux-btrfs

> -----Original Message-----
> From: linux-btrfs-owner@vger.kernel.org [mailto:linux-btrfs-
> owner@vger.kernel.org] On Behalf Of Zygo Blaxell
> Sent: Wednesday, 21 September 2016 2:56 PM
> To: linux-btrfs@vger.kernel.org
> Subject: btrfs rare silent data corruption with kernel data leak
> 
> Summary:
> 
> There seem to be two btrfs bugs here: one loses data on writes, and the
> other leaks data from the kernel to replace it on reads.  It all happens after
> checksums are verified, so the corruption is entirely silent--no EIO errors,
> kernel messages, or device event statistics.
> 
> Compressed extents are corrupted with kernel data leak.  Uncompressed
> extents may not be corrupted, or may be corrupted by deterministically
> replacing data bytes with zero, or may not be corrupted.  No preconditions
> for corruption are known.  Less than one file per hundred thousand seems to
> be affected.  Only specific parts of any file can be affected.
> Kernels v4.0..v4.5.7 tested, all have the issue.

Funny you should bring this up - I think I just suffered from this, or something similar.

I have a mysql database of around 20 GiB which is under relatively heavy workload for weeks at a time. 
I just remembered today that I still had my root partition using compression from when disk space was an issue about 4 months ago. I removed the compress mount option, upgraded the kernel (from 4.7.2 to 4.7.4) and rebooted. MySQL came up properly on reboot.
I stopped MySQL and ran "btrfs filesystem defragment -v -r -c none /var/lib/mysql" to remove the compression; it finished reporting 1 error, but without any actual error messages.
However, MySQL now wouldn't come back up. Remembering what I read earlier today, I thought "oh no...." and checked dmesg:

[  539.166231] BTRFS warning (device sda1): csum failed ino 42906332 off 81920 csum 2566472073 expected csum 1967602629
[  539.166856] BTRFS warning (device sda1): csum failed ino 42906332 off 81920 csum 2566472073 expected csum 1967602629
[  539.166865] BTRFS warning (device sda1): csum failed ino 42906332 off 94208 csum 2566472073 expected csum 1625955513
[  539.166908] BTRFS warning (device sda1): csum failed ino 42906332 off 94208 csum 2566472073 expected csum 1625955513
[  539.167553] BTRFS warning (device sda1): csum failed ino 42906332 off 0 csum 2566472073 expected csum 3995365962
[  539.168234] BTRFS warning (device sda1): csum failed ino 42906332 off 0 csum 2566472073 expected csum 3995365962
[  539.168239] BTRFS warning (device sda1): csum failed ino 42906332 off 4096 csum 2566472073 expected csum 3937913037
[  539.168282] BTRFS warning (device sda1): csum failed ino 42906332 off 4096 csum 2566472073 expected csum 3937913037
[  539.168286] BTRFS warning (device sda1): csum failed ino 42906332 off 8192 csum 2566472073 expected csum 1100728286
[  539.168328] BTRFS warning (device sda1): csum failed ino 42906332 off 8192 csum 2566472073 expected csum 1100728286
[  612.832463] __readpage_endio_check: 2 callbacks suppressed
[  612.832466] BTRFS warning (device sda1): csum failed ino 42906332 off 81920 csum 2566472073 expected csum 1967602629
[  612.833160] BTRFS warning (device sda1): csum failed ino 42906332 off 81920 csum 2566472073 expected csum 1967602629
[  612.833167] BTRFS warning (device sda1): csum failed ino 42906332 off 94208 csum 2566472073 expected csum 1625955513
[  612.833202] BTRFS warning (device sda1): csum failed ino 42906332 off 94208 csum 2566472073 expected csum 1625955513
[  612.833863] BTRFS warning (device sda1): csum failed ino 42906332 off 0 csum 2566472073 expected csum 3995365962
[  612.834549] BTRFS warning (device sda1): csum failed ino 42906332 off 0 csum 2566472073 expected csum 3995365962
[  612.834555] BTRFS warning (device sda1): csum failed ino 42906332 off 4096 csum 2566472073 expected csum 3937913037
[  612.834602] BTRFS warning (device sda1): csum failed ino 42906332 off 4096 csum 2566472073 expected csum 3937913037
[  612.834608] BTRFS warning (device sda1): csum failed ino 42906332 off 8192 csum 2566472073 expected csum 1100728286
[  612.834652] BTRFS warning (device sda1): csum failed ino 42906332 off 8192 csum 2566472073 expected csum 1100728286

Using debug-tree I found that inode 42906332 was the file ibdata1.

I tried to copy the mysql directory elsewhere, but that caused I/O failures in a few files, so I just removed the whole lot and restored from last night's backup.
These are the errors I got before I cancelled the copy:

[ 1284.349881] __readpage_endio_check: 2 callbacks suppressed
[ 1284.349885] BTRFS warning (device sda1): csum failed ino 42906332 off 0 csum 2566472073 expected csum 3995365962
[ 1284.349901] BTRFS warning (device sda1): csum failed ino 42906332 off 65536 csum 2566472073 expected csum 3704130384
[ 1284.349906] BTRFS warning (device sda1): csum failed ino 42906332 off 126976 csum 2566472073 expected csum 254392532
[ 1284.349911] BTRFS warning (device sda1): csum failed ino 42906332 off 8192 csum 2566472073 expected csum 1100728286
[ 1284.349913] BTRFS warning (device sda1): csum failed ino 42906332 off 77824 csum 2566472073 expected csum 716549262
[ 1284.349923] BTRFS warning (device sda1): csum failed ino 42906332 off 131072 csum 2566472073 expected csum 788300917
[ 1284.349925] BTRFS warning (device sda1): csum failed ino 42906332 off 12288 csum 2566472073 expected csum 3265258934
[ 1284.349926] BTRFS warning (device sda1): csum failed ino 42906332 off 81920 csum 2566472073 expected csum 1967602629
[ 1284.349930] BTRFS warning (device sda1): csum failed ino 42906332 off 192512 csum 2566472073 expected csum 2025572636
[ 1284.349934] BTRFS warning (device sda1): csum failed ino 42906332 off 258048 csum 2566472073 expected csum 3392889013
[ 1298.892667] BTRFS info (device sda1): csum failed ino 44628191 extent 228384051200 csum 2566472073 wanted 847116788 mirror 0
[ 1298.892727] BTRFS info (device sda1): csum failed ino 44628191 extent 228384051200 csum 2566472073 wanted 847116788 mirror 2
[ 1298.892732] BTRFS info (device sda1): csum failed ino 44628191 extent 228384051200 csum 2566472073 wanted 847116788 mirror 2
[ 1298.892751] BTRFS info (device sda1): csum failed ino 44628191 extent 228384051200 csum 2566472073 wanted 847116788 mirror 2
[ 1298.892786] BTRFS info (device sda1): csum failed ino 44628191 extent 228383002624 csum 2566472073 wanted 847116788 mirror 1
[ 1298.892792] BTRFS info (device sda1): csum failed ino 44628191 extent 228383002624 csum 2566472073 wanted 847116788 mirror 1
[ 1298.892805] BTRFS info (device sda1): csum failed ino 44628191 extent 228383002624 csum 2566472073 wanted 847116788 mirror 1
[ 1298.892849] BTRFS info (device sda1): csum failed ino 44628191 extent 228384051200 csum 2566472073 wanted 847116788 mirror 0
[ 1298.892896] BTRFS info (device sda1): csum failed ino 44628191 extent 228383002624 csum 2566472073 wanted 847116788 mirror 1
[ 1311.456422] __readpage_endio_check: 4430 callbacks suppressed
[ 1311.456425] BTRFS warning (device sda1): csum failed ino 44628192 off 3221225472 csum 2566472073 expected csum 3669189289
[ 1311.456442] BTRFS warning (device sda1): csum failed ino 44628192 off 3221229568 csum 2566472073 expected csum 317582346
[ 1311.456451] BTRFS warning (device sda1): csum failed ino 44628192 off 3221233664 csum 2566472073 expected csum 1636016048
[ 1311.456459] BTRFS warning (device sda1): csum failed ino 44628192 off 3221237760 csum 2566472073 expected csum 95857614
[ 1311.456467] BTRFS warning (device sda1): csum failed ino 44628192 off 3221241856 csum 2566472073 expected csum 2014942236
[ 1311.456482] BTRFS warning (device sda1): csum failed ino 44628192 off 3221254144 csum 2566472073 expected csum 1884694409
[ 1311.456540] BTRFS warning (device sda1): csum failed ino 44628192 off 3222274048 csum 2566472073 expected csum 2741402016
[ 1311.456542] BTRFS warning (device sda1): csum failed ino 44628192 off 3222339584 csum 2566472073 expected csum 3503993973
[ 1311.456545] BTRFS warning (device sda1): csum failed ino 44628192 off 3222405120 csum 2566472073 expected csum 3548745998
[ 1311.456551] BTRFS warning (device sda1): csum failed ino 44628192 off 3222470656 csum 2566472073 expected csum 2988893031

I'm seeing a lot of checksum 2566472073 - is that the checksum of blank space, I wonder?
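
A quick way to check: 2566472073 is 0x98f94189.  Assuming btrfs data
csums are plain CRC-32C (Castagnoli) over each 4 KiB block--my
understanding, but worth verifying against fs/btrfs, since a different
seed or finalization would change the number--the program below prints
the csum an all-zero block would get, for comparison with the value
repeated in the log:

#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

/* bitwise CRC-32C: seed 0xffffffff, reflected poly 0x82f63b78, final inversion */
static uint32_t crc32c(const uint8_t *buf, size_t len)
{
        uint32_t crc = 0xffffffffu;

        for (size_t i = 0; i < len; i++) {
                crc ^= buf[i];
                for (int bit = 0; bit < 8; bit++)
                        crc = (crc & 1) ? (crc >> 1) ^ 0x82f63b78u : crc >> 1;
        }
        return ~crc;
}

int main(void)
{
        static const uint8_t zeros[4096];   /* one all-zero 4 KiB data block */
        uint32_t csum = crc32c(zeros, sizeof(zeros));

        printf("crc32c of 4096 zero bytes: %u (0x%08x)\n", csum, csum);
        return 0;
}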

Here are the details of the filesystem concerned:

vm-server mysql # btrfs fi show /
Label: 'Root'  uuid: 58d27dbd-7c1e-4ef7-8d43-e93df1537b08
        Total devices 2 FS bytes used 103.21GiB
        devid   13 size 471.93GiB used 245.03GiB path /dev/sda1
        devid   14 size 471.93GiB used 245.03GiB path /dev/sdb1

vm-server mysql # btrfs fi df /
Data, RAID1: total=242.00GiB, used=102.42GiB
System, RAID1: total=32.00MiB, used=64.00KiB
Metadata, RAID1: total=3.00GiB, used=811.02MiB
GlobalReserve, single: total=272.00MiB, used=0.00B

/dev/sda1 on / type btrfs (rw,noatime,ssd,discard,noacl,space_cache=v2,subvolid=5,subvol=/)
(compress was enabled previously)


Regards,
Paul





^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: btrfs rare silent data corruption with kernel data leak
  2016-09-21 11:14 ` Paul Jones
@ 2016-09-21 13:02   ` Zygo Blaxell
  2016-09-22 17:49     ` Kai Krakow
  2016-09-22 20:42   ` Chris Mason
  1 sibling, 1 reply; 11+ messages in thread
From: Zygo Blaxell @ 2016-09-21 13:02 UTC (permalink / raw)
  To: Paul Jones; +Cc: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 2796 bytes --]

On Wed, Sep 21, 2016 at 11:14:35AM +0000, Paul Jones wrote:
> > -----Original Message-----
> > From: linux-btrfs-owner@vger.kernel.org [mailto:linux-btrfs-
> > owner@vger.kernel.org] On Behalf Of Zygo Blaxell
> > Sent: Wednesday, 21 September 2016 2:56 PM
> > To: linux-btrfs@vger.kernel.org
> > Subject: btrfs rare silent data corruption with kernel data leak
> > 
> > Summary:
> > 
> > There seem to be two btrfs bugs here: one loses data on writes, and the
> > other leaks data from the kernel to replace it on reads.  It all happens after
> > checksums are verified, so the corruption is entirely silent--no EIO errors,
> > kernel messages, or device event statistics.
> > 
> > Compressed extents are corrupted with kernel data leak.  Uncompressed
> > extents may not be corrupted, or may be corrupted by deterministically
> > replacing data bytes with zero, or may not be corrupted.  No preconditions
> > for corruption are known.  Less than one file per hundred thousand seems to
> > be affected.  Only specific parts of any file can be affected.
> > Kernels v4.0..v4.5.7 tested, all have the issue.
> 
> Funny you should bring this up - I think I just suffered from this, or something similar.
[...snip...]
> [ 1311.456545] BTRFS warning (device sda1): csum failed ino 44628192 off 3222405120 csum 2566472073 expected csum 3548745998
> [ 1311.456551] BTRFS warning (device sda1): csum failed ino 44628192 off 3222470656 csum 2566472073 expected csum 2988893031
> 
> I'm seeing a lot of checksum 2566472073 - Is that the checksum of blank space I wonder?

The issue I found occurs after successful checksum validation, and
can affect only the first 4096 bytes of a file.  It produces no kernel
messages and no checksum failures.  If you run 'filefrag -v' on the file,
the corrupted extent *must* have the 'inline' flag; otherwise, it's a
different problem.

You have csum failures, and the affected offsets are way beyond 4096.

Holes are not checksummed in general because they are gaps in the
file offset address space.  Metadata records that describe holes are
checksummed, but when those checksums fail the kernel messages look
different.

I don't recognize the symptoms you are having.  After eliminating
hardware problems and making sure your kernel is up to date, I'd try
changing mount options...

> Here are the details of the filesystem concerned:
[...]
> /dev/sda1 on / type btrfs (rw,noatime,ssd,discard,noacl,space_cache=v2,subvolid=5,subvol=/)
> (compress was enabled previously)

...discard and space_cache=v2 in particular.  Free space tree is a new
feature, and discard has had a few bug fixes over the years.  Both are
non-default options.

Speaking of mount options, 'max_inline=0' may prevent the issue I found.


[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: btrfs rare silent data corruption with kernel data leak
  2016-09-21 13:02   ` Zygo Blaxell
@ 2016-09-22 17:49     ` Kai Krakow
  2016-09-22 19:35       ` Christoph Anton Mitterer
  0 siblings, 1 reply; 11+ messages in thread
From: Kai Krakow @ 2016-09-22 17:49 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 4124 bytes --]

On Wed, 21 Sep 2016 09:02:09 -0400,
Zygo Blaxell <ce3g8jdj@umail.furryterror.org> wrote:

> On Wed, Sep 21, 2016 at 11:14:35AM +0000, Paul Jones wrote:
> > > -----Original Message-----
> > > From: linux-btrfs-owner@vger.kernel.org [mailto:linux-btrfs-
> > > owner@vger.kernel.org] On Behalf Of Zygo Blaxell
> > > Sent: Wednesday, 21 September 2016 2:56 PM
> > > To: linux-btrfs@vger.kernel.org
> > > Subject: btrfs rare silent data corruption with kernel data leak
> > > 
> > > Summary:
> > > 
> > > There seem to be two btrfs bugs here: one loses data on writes,
> > > and the other leaks data from the kernel to replace it on reads.
> > > It all happens after checksums are verified, so the corruption is
> > > entirely silent--no EIO errors, kernel messages, or device event
> > > statistics.
> > > 
> > > Compressed extents are corrupted with kernel data leak.
> > > Uncompressed extents may not be corrupted, or may be corrupted by
> > > deterministically replacing data bytes with zero, or may not be
> > > corrupted.  No preconditions for corruption are known.  Less than
> > > one file per hundred thousand seems to be affected.  Only
> > > specific parts of any file can be affected. Kernels v4.0..v4.5.7
> > > tested, all have the issue.  
> > 
> > Funny you should bring this up - I think I just suffered from this,
> > or something similar.  
> [...snip...]
> > [ 1311.456545] BTRFS warning (device sda1): csum failed ino
> > 44628192 off 3222405120 csum 2566472073 expected csum 3548745998
> > [ 1311.456551] BTRFS warning (device sda1): csum failed ino
> > 44628192 off 3222470656 csum 2566472073 expected csum 2988893031
> > 
> > I'm seeing a lot of checksum 2566472073 - Is that the checksum of
> > blank space I wonder?  
> 
> The issue I found occurs after successful checksum validation, and
> can affect only the first 4096 bytes of a file.  It produces no kernel
> messages and no checksum failures.  If you run 'filefrag -v' on the
> file, the corrupted extent *must* have the 'inline' flag; otherwise,
> it's a different problem.
> 
> You have csum failures, and the affected offsets are way beyond 4096.
> 
> Holes are not checksummed in general because they are gaps in the
> file offset address space.  Metadata records that describe holes are
> checksummed, but when those checksums fail the kernel messages look
> different.
> 
> I don't recognize the symptoms you are having.  After eliminating
> hardware problems and making sure your kernel is up to date, I'd try
> changing mount options...
> 
> > Here are the details of the filesystem concerned:  
> [...]
> > /dev/sda1 on / type btrfs
> > (rw,noatime,ssd,discard,noacl,space_cache=v2,subvolid=5,subvol=/)
> > (compress was enabled previously)  
> 
> ...discard and space_cache=v2 in particular.  Free space tree is a new
> feature, and discard has had a few bug fixes over the years.  Both are
> non-default options.
> 
> Speaking of mount options, 'max_inline=0' may prevent the issue I
> found.

This sounds a lot like problems I've seen with my VirtualBox images.
They just accumulate csum errors over time, especially after
defragmenting, no matter whether discard was enabled or not.

The only common mount option is compression; I'm using compress=lzo.

I made the files nocow, which of course disables compression and
probably results in not triggering autodefrag. Of course this also
disables checksums, so you won't see data corruption as a direct error
in the FS. But the VM images are running fine so far without any
problems. Data integrity and data safety look healthy.

You probably want to nocow mysql anyway. I think mysql data files have
their own checksumming, copy-on-write, and transactions. There's really
no point in doing it twice.

So it seems that either btrfs can damage files on defragmentation, or
compression has a bug with files that are heavily modified in place
(thus breaking the file into a lot of extents) - which probably has bad
effects on compressed extents, too.

-- 
Regards,
Kai

Replies to list-only preferred.

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: btrfs rare silent data corruption with kernel data leak
  2016-09-22 17:49     ` Kai Krakow
@ 2016-09-22 19:35       ` Christoph Anton Mitterer
  0 siblings, 0 replies; 11+ messages in thread
From: Christoph Anton Mitterer @ 2016-09-22 19:35 UTC (permalink / raw)
  To: Kai Krakow, linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 447 bytes --]

On Thu, 2016-09-22 at 19:49 +0200, Kai Krakow wrote:
> I think mysql data files
> have
> their own checksumming
Last time I checked, none of the major DBs or VM-image formats
had this... postgresql is the only one supporting something close to
fs-level csumming (but again, not by default).

mysql has, AFAICS, only
https://dev.mysql.com/doc/refman/5.7/en/checksum-table.html
which requires program support.


Cheers,
Chris.

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5930 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: btrfs rare silent data corruption with kernel data leak
  2016-09-21 11:14 ` Paul Jones
  2016-09-21 13:02   ` Zygo Blaxell
@ 2016-09-22 20:42   ` Chris Mason
  2016-09-24  2:28     ` Zygo Blaxell
  2016-09-28 16:18     ` [PATCH][RFC] btrfs rare silent data corruption with kernel data leak (updated, preliminary patch) Zygo Blaxell
  1 sibling, 2 replies; 11+ messages in thread
From: Chris Mason @ 2016-09-22 20:42 UTC (permalink / raw)
  To: Paul Jones, Zygo Blaxell, linux-btrfs

On 09/21/2016 07:14 AM, Paul Jones wrote:
>> -----Original Message-----
>> From: linux-btrfs-owner@vger.kernel.org [mailto:linux-btrfs-
>> owner@vger.kernel.org] On Behalf Of Zygo Blaxell
>> Sent: Wednesday, 21 September 2016 2:56 PM
>> To: linux-btrfs@vger.kernel.org
>> Subject: btrfs rare silent data corruption with kernel data leak
>>
>> Summary:
>>
>> There seem to be two btrfs bugs here: one loses data on writes, and the
>> other leaks data from the kernel to replace it on reads.  It all happens after
>> checksums are verified, so the corruption is entirely silent--no EIO errors,
>> kernel messages, or device event statistics.
>>
>> Compressed extents are corrupted with kernel data leak.  Uncompressed
>> extents may not be corrupted, or may be corrupted by deterministically
>> replacing data bytes with zero, or may not be corrupted.  No preconditions
>> for corruption are known.  Less than one file per hundred thousand seems to
>> be affected.  Only specific parts of any file can be affected.
>> Kernels v4.0..v4.5.7 tested, all have the issue.

Zygo, could you please bounce me your original email?  Somehow Exchange
ate it.

If you're seeing this on databases that use fsync, it could be related to
the fsync fix I put into the last RC.  On my boxes it caused crashes,
but memory corruptions aren't impossible.

Any chance you can do a controlled experiment to rule out compression?

-chris

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: btrfs rare silent data corruption with kernel data leak
  2016-09-22 20:42   ` Chris Mason
@ 2016-09-24  2:28     ` Zygo Blaxell
  2016-09-28 16:18     ` [PATCH][RFC] btrfs rare silent data corruption with kernel data leak (updated, preliminary patch) Zygo Blaxell
  1 sibling, 0 replies; 11+ messages in thread
From: Zygo Blaxell @ 2016-09-24  2:28 UTC (permalink / raw)
  To: Chris Mason; +Cc: Paul Jones, linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 3210 bytes --]

On Thu, Sep 22, 2016 at 04:42:06PM -0400, Chris Mason wrote:
> On 09/21/2016 07:14 AM, Paul Jones wrote:
> >>-----Original Message-----
> >>From: linux-btrfs-owner@vger.kernel.org [mailto:linux-btrfs-
> >>owner@vger.kernel.org] On Behalf Of Zygo Blaxell
> >>Sent: Wednesday, 21 September 2016 2:56 PM
> >>To: linux-btrfs@vger.kernel.org
> >>Subject: btrfs rare silent data corruption with kernel data leak
> >>
> >>Summary:
> >>
> >>There seem to be two btrfs bugs here: one loses data on writes, and the
> >>other leaks data from the kernel to replace it on reads.  It all happens after
> >>checksums are verified, so the corruption is entirely silent--no EIO errors,
> >>kernel messages, or device event statistics.
> >>
> >>Compressed extents are corrupted with kernel data leak.  Uncompressed
> >>extents may not be corrupted, or may be corrupted by deterministically
> >>replacing data bytes with zero, or may not be corrupted.  No preconditions
> >>for corruption are known.  Less than one file per hundred thousand seems to
> >>be affected.  Only specific parts of any file can be affected.
> >>Kernels v4.0..v4.5.7 tested, all have the issue.
> 
> Zygo, could you please bounce me your original email?  Somehow exchange ate
> it.

Resent (also archived at http://www.spinics.net/lists/linux-btrfs/msg59102.html).

> If you're seeing this databases that use fsync, it could be related to the
> fsync fix I put into the last RC.  On my boxes it caused crashes, but memory
> corruptions aren't impossible.

Did you mean cbd60aa7cd "Btrfs: remove root_log_ctx from ctx list before
btrfs_sync_log returns"?  If so, I can cherry-pick that onto a previously
tested kernel and see what happens.

> Any chance you can do a controlled experiment to rule out compression?

It took two weeks of mining to find one extent with compression.
I've never seen one that wasn't compressed, which (I'm assuming) means
they're even harder to find if they exist at all.

> -chris

TBH I'm more concerned about the logic here, which came up the last time
inline extents at the beginning of multi-extent files caused problems:

btrfs_get_extent(...) {

	existing = search_extent_mapping(em_tree, start, len);
	/*
	 * existing will always be non-NULL, since there must be
	 * extent causing the -EEXIST.
	 */
	if (existing->start == em->start &&
	    extent_map_end(existing) == extent_map_end(em) &&
	    em->block_start == existing->block_start) {
		/*
		 * these two extents are the same, it happens
		 * with inlines especially
		 */
		free_extent_map(em);
		em = existing;
		err = 0;

	} else if (start >= extent_map_end(existing) ||
	    start <= existing->start) {
		/*
		 * The existing extent map is the one nearest to
		 * the [start, start + len) range which overlaps
		 */
		err = merge_extent_mapping(em_tree, existing,
					   em, start);
		free_extent_map(existing);
		if (err) {   
			free_extent_map(em);
			em = NULL;
		}
	} else {
		free_extent_map(em);
		em = existing;
		err = 0;
	}

What is extent_map_end for an inline extent?  Could we end up in the
wrong branch of the if statement?  Is this code relevant at all?
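
(For reference, extent_map_end is a small helper in fs/btrfs/extent_map.h,
quoted here from memory--verify against the tree in question:)

static inline u64 extent_map_end(struct extent_map *em)
{
        if (em->start + em->len < em->start)
                return (u64)-1;
        return em->start + em->len;
}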


[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH][RFC] btrfs rare silent data corruption with kernel data leak (updated, preliminary patch)
  2016-09-22 20:42   ` Chris Mason
  2016-09-24  2:28     ` Zygo Blaxell
@ 2016-09-28 16:18     ` Zygo Blaxell
  1 sibling, 0 replies; 11+ messages in thread
From: Zygo Blaxell @ 2016-09-28 16:18 UTC (permalink / raw)
  To: Chris Mason; +Cc: Paul Jones, linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 8311 bytes --]

On Thu, Sep 22, 2016 at 04:42:06PM -0400, Chris Mason wrote:
> On 09/21/2016 07:14 AM, Paul Jones wrote:
> >>-----Original Message-----
> >>From: linux-btrfs-owner@vger.kernel.org [mailto:linux-btrfs-
> >>owner@vger.kernel.org] On Behalf Of Zygo Blaxell
> >>Sent: Wednesday, 21 September 2016 2:56 PM
> >>To: linux-btrfs@vger.kernel.org
> >>Subject: btrfs rare silent data corruption with kernel data leak
> >>
> >>Summary:
> >>
> >>There seem to be two btrfs bugs here: one loses data on writes, and the
> >>other leaks data from the kernel to replace it on reads.  It all happens after
> >>checksums are verified, so the corruption is entirely silent--no EIO errors,
> >>kernel messages, or device event statistics.
> >>
> >>Compressed extents are corrupted with kernel data leak.  Uncompressed
> >>extents may not be corrupted, or may be corrupted by deterministically
> >>replacing data bytes with zero, or may not be corrupted.  No preconditions
> >>for corruption are known.  Less than one file per hundred thousand seems to
> >>be affected.  Only specific parts of any file can be affected.
> >>Kernels v4.0..v4.5.7 tested, all have the issue.
> 
> Zygo, could you please bounce me your original email?  Somehow exchange ate
> it.
> 
> If you're seeing this databases that use fsync, it could be related to the
> fsync fix I put into the last RC.  On my boxes it caused crashes, but memory
> corruptions aren't impossible.

The corruption pattern doesn't look like generic memory corruption.  Data in
the inline extents is never wrong.  Only the data after the end of the inline
extent is affected, and the correct data at those file offsets is always zero.

> Any chance you can do a controlled experiment to rule out compression?

I get uncompressed inline extents, but so far I haven't found any of those
that read corrupted data.

I've tested 4.7.5 and it has the same corruption problem (among some others
that make it hard to use for testing).

The trigger seems to be the '-S' option to rsync, which causes a lot of
short writes with seeks in between.  When there is a seek from within the
first 4096 bytes to outside of the first 4096 bytes, an inline extent
_can_ occur--but usually does not.

Normally, the inline extent disappears in this sequence of operations:

	# head -c 4000 /usr/share/doc/ssh/copyright > f
	# filefrag -v f
	Filesystem type is: 9123683e
	File size of f is 4000 (1 block of 4096 bytes)
	 ext:     logical_offset:        physical_offset: length:   expected: flags:
	   0:        0..    4095:          0..      4095:   4096:             last,not_aligned,inline,eof
	f: 1 extent found
	# head -c 4000 /usr/share/doc/ssh/copyright | dd conv=notrunc seek=1 bs=4k of=f
	0+1 records in
	0+1 records out
	4000 bytes (4.0 kB) copied, 0.00770182 s, 519 kB/s
	# filefrag -v f
	Filesystem type is: 9123683e
	File size of f is 8096 (2 blocks of 4096 bytes)
	 ext:     logical_offset:        physical_offset: length:   expected: flags:
	   0:        0..    4095:          0..      4095:   4096:             not_aligned,inline
	   1:        1..       1:          0..         0:      1:          1: last,unknown_loc,delalloc,eof
	f: 2 extents found
	# sync
	# filefrag -v f
	Filesystem type is: 9123683e
	File size of f is 8096 (2 blocks of 4096 bytes)
	 ext:     logical_offset:        physical_offset: length:   expected: flags:
	   0:        0..       1:    1368948..   1368949:      2:             last,encoded,eof
	f: 1 extent found
	# head -c 4000 /usr/share/doc/ssh/copyright > f

but very rarely (p = 0.00001), the inline extent doesn't go away,
and we get an inline extent followed by more extents (see filefrag
example below).

The inline extents appear with and without compression; however, I
have not been able to find cases where corruption occurs without
compression so far.

Probing a little deeper shows that the inline extent is always shorter
than 4096 bytes, and corruption always happens in the gap between the
end of the inline extent data and the end of the first page (byte 4096),
where the following extent begins.

It looks like the data is OK on disk.  It is just some part of the read
path for compressed extents that injects uninitialized data on read.
Since kernel memory is often filled with zeros, the data is read correctly
much of the time by sheer chance.  Existing data could be read correctly
with a kernel patch.

This reproducer will create corrupted extents in a kvm instance (4GB
memory, 16GB of btrfs filesystem, kernel 4.5.7) in under an hour:

	# mkdir /tmp/eee
	# cd /tmp/eee
	# y=/usr; for x in $(seq 0 9); do rsync -avxHSPW "$y/." "$x"; y="$x"; done &
	# mkdir /tmp/fff
	# cd /tmp/fff
	# y=/usr; for x in $(seq 0 9); do rsync -avxHSPW "$y/." "$x"; y="$x"; done &

This is how to find the inline extents where the corruption can occur:

	# find /tmp/eee /tmp/fff -type f -size +4097c -exec sh -c 'for x; do if filefrag -v "$x" | sed -n "4p" | grep -q "inline"; then ls -l "$x"; filefrag -v "$x"; fi; done' -- {} +
	-rw-r--r-- 1 root root 86040 Nov 11  2014 /tmp/eee/3/share/locale/eo/LC_MESSAGES/glib20.mo
	Filesystem type is: 9123683e
	File size of /tmp/eee/3/share/locale/eo/LC_MESSAGES/glib20.mo is 86040 (22 blocks of 4096 bytes)
	 ext:     logical_offset:        physical_offset: length:   expected: flags:
	   0:        0..    4095:          0..      4095:   4096:             encoded,not_aligned,inline
	   1:        1..      21:    2819748..   2819768:     21:          1: last,encoded,eof
	/tmp/eee/3/share/locale/eo/LC_MESSAGES/glib20.mo: 2 extents found

These are the mount options I used:

	# head -1 /proc/mounts
	/dev/vda / btrfs rw,noatime,max_inline=4095,compress-force=zlib,flushoncommit,space_cache,subvolid=5,subvol=/ 0 0

Mounting with 'compress' or 'compress-force' causes corruption on reads.
'max_inline=4095' made more files with inline extents so I could test
faster.  'flushoncommit' might have an effect on the reproduction rate,
but I tested with and without it and didn't notice a substantial
difference.

I was thinking the problem might be in uncompress_inline, and could be
fixed like this:

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index a39eaa8..512b713 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6762,7 +6762,16 @@ static noinline int uncompress_inline(struct btrfs_path *path,
 
        read_extent_buffer(leaf, tmp, ptr, inline_size);
 
-       max_size = min_t(unsigned long, PAGE_CACHE_SIZE, max_size);
+       /*
+        * We can't have max_size > PAGE_CACHE_SIZE because we only
+        * allocated one page.  We can't have max_size < PAGE_CACHE_SIZE
+        * because we might extend the file past the end of the page,
+        * so we need to memset the end of the buffer to zero.  Since
+        * max_size can't be anything other than PAGE_CACHE_SIZE,
+        * just set it to that value.
+        */
+       WARN_ON(max_size > PAGE_CACHE_SIZE);
+       max_size = PAGE_CACHE_SIZE;
        ret = btrfs_decompress(compress_type, tmp, page,
                               extent_offset, inline_size, max_size);
        kfree(tmp);

Unfortunately I just tested that code, and while it seems to make the
data _less_ nondeterministic, it doesn't fix the problem:

	# history -a; (while :; do sysctl vm.drop_caches=1; cmp -l {/tmp/eee/3,/usr}/share/locale/eo/LC_MESSAGES/glib20.mo; done)
	vm.drop_caches = 1
	vm.drop_caches = 1
	 4094   1   0
	vm.drop_caches = 1
	 4096   1   0
	vm.drop_caches = 1
	 4094   1   0
	vm.drop_caches = 1
	 4094   1   0
	vm.drop_caches = 1
	 4094 105   0
	 4095 124   0
	 4096 137   0
	vm.drop_caches = 1
	 4094   1   0
	vm.drop_caches = 1
	 4096 154   0
	vm.drop_caches = 1
	 4094  40   0
	 4095  40   0
	 4096  40   0
	vm.drop_caches = 1
	vm.drop_caches = 1
	 4094   1   0
	vm.drop_caches = 1
	 4096 154   0
	vm.drop_caches = 1
	vm.drop_caches = 1
	 4094  46   0
	 4095  17   0
	 4096 100   0
	vm.drop_caches = 1
	 4096 325   0
	vm.drop_caches = 1
	vm.drop_caches = 1
	vm.drop_caches = 1

> -chris
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: btrfs rare silent data corruption with kernel data leak (updated with some bisection results)
  2016-09-21  4:55 btrfs rare silent data corruption with kernel data leak Zygo Blaxell
  2016-09-21 11:14 ` Paul Jones
@ 2016-10-08  6:10 ` Zygo Blaxell
  2016-10-08  7:02   ` Zygo Blaxell
  1 sibling, 1 reply; 11+ messages in thread
From: Zygo Blaxell @ 2016-10-08  6:10 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Chris Mason

[-- Attachment #1: Type: text/plain, Size: 3512 bytes --]

(continuing from https://www.spinics.net/lists/linux-btrfs/msg59251.html)

I'm still poking at this bug, and found out some more about it.
Recall that this bug seems to have two parts which together cause data
corruption:

The "write half" of the bug writes a questionable extent structure in
the filesystem once every few hundred thousand files (a compressed inline
extent followed by other non-inline extents when a write occurs at
the beginning of a file, followed by a seek past the end of the first
page of the file, followed by another write).

The "read half" of the bug reads this structure inconsistently (data
between the inline extent and the following extent is random garbage,
different each time the file is read).

After having no success attacking the read half of the bug with a patch,
I tried to bisect to see where the bug was introduced.

The "write half" of the bug seems to appear first somewhere between v3.8
and v3.9.  I have not been able to reproduce it with v3.8.13, v3.7.10, or
v3.6.11.  I can reproduce it in v3.9.11, v3.12.64, and v3.18.13..v4.7.5.

The "read half" of the bug is more interesting.  All kernels I've tested
that have the write half of the bug have the read half as well, but
versions 3.6..3.9 have many more instances of a separate non-repeatable
read corruption (one that does not require the "write half" bug to occur).
These additional bugs were not anticipated by my bisection test case,
so my bisect went in the wrong direction and I didn't cover the right
kernels to understand where these bugs were introduced (yet).

The good news is that whatever went wrong around 3.6..3.9 seems to have
been fixed by v3.12--that kernel has the same behavior as v4.7.5 for
data corruption on reads.


This is my current repro script.  Run it in a shell loop until corruption
occurs, e.g. "while repro; do date; done".  Adjust the "result" function
to taste (e.g. write it to a file, use your own email address, etc).

#!/bin/sh
set -x

result () {
	echo "$@" "$(cat /proc/version)" | mail -s "$(echo "$@" | head -1) $(uname -r)" results@localhost
}

umount /try
mkdir -p /try
for blk in /dev/vdc /dev/sdc; do
	< "$blk" || continue
	mkfs.btrfs -dsingle -mdup -O ^extref,^skinny-metadata,^no-holes -f "$blk" || exit 1
	mount -ocompress-force,flushoncommit,max_inline=4096,noatime "$blk" /try || exit 1
	cd /try || exit 1
	break
done

# Must be on btrfs
btrfs sub list . || exit 1

y=/usr; for x in $(seq 0 9); do rsync -axHSWI "$y/." "$x"; y="$x"; done &
y=/usr; for x in $(seq 10 19); do rsync -axHSWI "$y/." "$x"; y="$x"; done &
y=/usr; for x in $(seq 20 29); do rsync -axHSWI "$y/." "$x"; y="$x"; done &
y=/usr; for x in $(seq 30 39); do rsync -axHSWI "$y/." "$x"; y="$x"; done &

wait

touch list

find -type f -size +4097c -exec sh -c 'for x; do if filefrag -v "$x" | sed -n "4p" | grep -q "inline"; then echo "$x" >> list; fi; done' -- {} +

if [ -s list ]; then
	while read -r x; do
		ls -l "$x"
		filefrag -v "$x"
		sum="$(sha1sum "$x")"
		for y in $(seq 0 99); do
			sysctl vm.drop_caches=1
			sum2="$(sha1sum "$x")"
			if [ "$sum" != "$sum2" ]; then
				result "$x sum1 $sum sum2 $sum2"
				exit 1
			fi
		done
	done < list
	result "No inconsistent reads, $(wc -l < list) inlines"
else
	result "No inline extents"
fi

for x in *9/.; do
	if ! diff -r /usr/. "$x"; then
		result "Differences found in $x"
		# We are looking for corrupted inline extents.
		# Other corruption is interesting but it's not our bug.
		exit 0
	fi
done

result "No corruption found"
exit 0

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: btrfs rare silent data corruption with kernel data leak (updated with some bisection results)
  2016-10-08  6:10 ` btrfs rare silent data corruption with kernel data leak (updated with some bisection results) Zygo Blaxell
@ 2016-10-08  7:02   ` Zygo Blaxell
  2016-10-09  4:11     ` Zygo Blaxell
  0 siblings, 1 reply; 11+ messages in thread
From: Zygo Blaxell @ 2016-10-08  7:02 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Chris Mason

[-- Attachment #1: Type: text/plain, Size: 443 bytes --]

On Sat, Oct 08, 2016 at 02:10:08AM -0400, Zygo Blaxell wrote:
[...]
> The "write half" of the bug seems to appear first somewhere between v3.8
> and v3.9.  I have not been able to reproduce it with v3.8.13, v3.7.10, or
> v3.6.11.  I can reproduce it in v3.9.11, v3.12.64, and v3.18.13..v4.7.5.

After six more iterations of the 'repro' script, I can reproduce "write
half" on 3.8.13.

Bisection is hard.  Bisecting ancient bugs even more so.


[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: btrfs rare silent data corruption with kernel data leak (updated with some bisection results)
  2016-10-08  7:02   ` Zygo Blaxell
@ 2016-10-09  4:11     ` Zygo Blaxell
  0 siblings, 0 replies; 11+ messages in thread
From: Zygo Blaxell @ 2016-10-09  4:11 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Chris Mason

[-- Attachment #1: Type: text/plain, Size: 1015 bytes --]

On Sat, Oct 08, 2016 at 03:02:00AM -0400, Zygo Blaxell wrote:
> On Sat, Oct 08, 2016 at 02:10:08AM -0400, Zygo Blaxell wrote:
> [...]
> > The "write half" of the bug seems to appear first somewhere between v3.8
> > and v3.9.  I have not been able to reproduce it with v3.8.13, v3.7.10, or
> > v3.6.11.  I can reproduce it in v3.9.11, v3.12.64, and v3.18.13..v4.7.5.
> 
> After six more iterations of the 'repro' script, I can reproduce "write
> half" on 3.8.13.
> 
> Bisection is hard.  Bisecting ancient bugs even more so.

I increased the number of iterations of the 'repro' script to 100,
although in practice no more than 20 are required.  With this test
case, I can find the bug in kernels as early as v3.5.7.  v3.0..v3.4 crash
before they complete one run of the 'repro' script.  Earlier kernels
don't work with the userspace on my testing machine, and going back more
than four years is not worth the effort IMHO.

At the other end of the timeline, I also reproduced this bug on 4.8.1.


[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

Thread overview: 11+ messages
2016-09-21  4:55 btrfs rare silent data corruption with kernel data leak Zygo Blaxell
2016-09-21 11:14 ` Paul Jones
2016-09-21 13:02   ` Zygo Blaxell
2016-09-22 17:49     ` Kai Krakow
2016-09-22 19:35       ` Christoph Anton Mitterer
2016-09-22 20:42   ` Chris Mason
2016-09-24  2:28     ` Zygo Blaxell
2016-09-28 16:18     ` [PATCH][RFC] btrfs rare silent data corruption with kernel data leak (updated, preliminary patch) Zygo Blaxell
2016-10-08  6:10 ` btrfs rare silent data corruption with kernel data leak (updated with some bisection results) Zygo Blaxell
2016-10-08  7:02   ` Zygo Blaxell
2016-10-09  4:11     ` Zygo Blaxell
