* Uncorrectable errors with RAID1
@ 2017-01-16 11:10 Christoph Groth
  2017-01-16 13:24 ` Austin S. Hemmelgarn
  2017-01-16 22:45 ` Goldwyn Rodrigues
  0 siblings, 2 replies; 20+ messages in thread
From: Christoph Groth @ 2017-01-16 11:10 UTC (permalink / raw)
  To: linux-btrfs

Hi,

I’ve been using a btrfs RAID1 of two hard disks since early 2012 
on my home server.  The machine has been working well overall, but 
recently some problems with the file system surfaced.  Since I do 
have backups, I do not worry about the data, but I post here to 
better understand what happened.  Also I cannot exclude that my 
case is useful in some way to btrfs development.

First some information about the system:

root@mim:~# uname -a
Linux mim 4.6.0-1-amd64 #1 SMP Debian 4.6.3-1 (2016-07-04) x86_64 
GNU/Linux
root@mim:~# btrfs --version
btrfs-progs v4.7.3
root@mim:~# btrfs fi show
Label: none  uuid: 2da00153-f9ea-4d6c-a6cc-10c913d22686
	Total devices 2 FS bytes used 345.97GiB
	devid    1 size 465.29GiB used 420.06GiB path /dev/sda2
	devid    2 size 465.29GiB used 420.04GiB path /dev/sdb2

root@mim:~# btrfs fi df /
Data, RAID1: total=417.00GiB, used=344.62GiB
Data, single: total=8.00MiB, used=0.00B
System, RAID1: total=40.00MiB, used=68.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, RAID1: total=3.00GiB, used=1.35GiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=464.00MiB, used=0.00B
root@mim:~# dmesg | grep -i btrfs
[    4.165859] Btrfs loaded
[    4.481712] BTRFS: device fsid 
2da00153-f9ea-4d6c-a6cc-10c913d22686 devid 1 transid 2075354 
/dev/sda2
[    4.482025] BTRFS: device fsid 
2da00153-f9ea-4d6c-a6cc-10c913d22686 devid 2 transid 2075354 
/dev/sdb2
[    4.521090] BTRFS info (device sdb2): disk space caching is 
enabled
[    4.628506] BTRFS info (device sdb2): bdev /dev/sdb2 errs: wr 
0, rd 0, flush 0, corrupt 3, gen 0
[    4.628521] BTRFS info (device sdb2): bdev /dev/sda2 errs: wr 
0, rd 0, flush 0, corrupt 3, gen 0
[   18.315694] BTRFS info (device sdb2): disk space caching is 
enabled

The disks themselves have been turning for almost 5 years by now, 
but their SMART health is still fully satisfactory.

I noticed that something was wrong because printing stopped 
working.  So I did a scrub that detected 0 "correctable errors" and 6 
"uncorrectable" errors.  The relevant bits from kern.log are:

Jan 11 11:05:56 mim kernel: [159873.938579] BTRFS warning (device 
sdb2): checksum error at logical 180829634560 on dev /dev/sdb2, 
sector 353143968, root 5, inode 10014144, offset 221184, length 
4096, links 1 (path: usr/lib/x86_64-linux-gnu/libcups.so.2)
Jan 11 11:05:57 mim kernel: [159874.857132] BTRFS warning (device 
sdb2): checksum error at logical 180829634560 on dev /dev/sda2, 
sector 353182880, root 5, inode 10014144, offset 221184, length 
4096, links 1 (path: usr/lib/x86_64-linux-gnu/libcups.so.2)
Jan 11 11:28:42 mim kernel: [161240.083721] BTRFS warning (device 
sdb2): checksum error at logical 260254629888 on dev /dev/sda2, 
sector 508309824, root 5, inode 9990924, offset 6676480, length 
4096, links 1 (path: 
var/lib/apt/lists/ftp.fr.debian.org_debian_dists_unstable_main_binary-amd64_Packages)
Jan 11 11:28:42 mim kernel: [161240.235837] BTRFS warning (device 
sdb2): checksum error at logical 260254638080 on dev /dev/sda2, 
sector 508309840, root 5, inode 9990924, offset 6684672, length 
4096, links 1 (path: 
var/lib/apt/lists/ftp.fr.debian.org_debian_dists_unstable_main_binary-amd64_Packages)
Jan 11 11:37:21 mim kernel: [161759.725120] BTRFS warning (device 
sdb2): checksum error at logical 260254629888 on dev /dev/sdb2, 
sector 508270912, root 5, inode 9990924, offset 6676480, length 
4096, links 1 (path: 
var/lib/apt/lists/ftp.fr.debian.org_debian_dists_unstable_main_binary-amd64_Packages)
Jan 11 11:37:21 mim kernel: [161759.750251] BTRFS warning (device 
sdb2): checksum error at logical 260254638080 on dev /dev/sdb2, 
sector 508270928, root 5, inode 9990924, offset 6684672, length 
4096, links 1 (path: 
var/lib/apt/lists/ftp.fr.debian.org_debian_dists_unstable_main_binary-amd64_Packages)

As you can see each disk has the same three errors, and there are 
no other errors.  Random bad blocks cannot explain this situation. 
I asked on #btrfs and someone suggested that these errors are 
likely due to RAM problems.  This may indeed be the case, since 
the machine has no ECC.  I managed to fix these errors by 
replacing the broken files with good copies.  Scrubbing shows no 
errors now:

root@mim:~# btrfs scrub status /
scrub status for 2da00153-f9ea-4d6c-a6cc-10c913d22686
	scrub started at Sat Jan 14 12:52:03 2017 and finished 
	after 01:49:10
	total bytes scrubbed: 699.17GiB with 0 errors
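
For concreteness, a repair of this kind could look roughly like the 
following (illustrative commands with the obvious package and file 
names, not necessarily the exact ones I used):

# restore the corrupted library from its Debian package
apt-get install --reinstall libcups2
# drop and re-fetch the corrupted apt list file
rm /var/lib/apt/lists/ftp.fr.debian.org_debian_dists_unstable_main_binary-amd64_Packages
apt-get update
# re-check both copies afterwards
btrfs scrub start -B /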

However, there are further problems.  When trying to archive the 
full filesystem I noticed that some files/directories cannot be 
read.  (The problem is localized to some ".git" directory that I 
don’t need.)  Any attempt to read the broken files (or to delete 
them) does not work:

$ du -sh .git
du: cannot access 
'.git/objects/28/ea2aae3fe57ab4328adaa8b79f3c1cf005dd8d': No such 
file or directory
du: cannot access 
'.git/objects/28/fd95a5e9d08b6684819ce6e3d39d99e2ecccd5': Stale 
file handle
du: cannot access 
'.git/objects/28/52e887ed436ed2c549b20d4f389589b7b58e09': Stale 
file handle
du: cannot access '.git/objects/info': Stale file handle
du: cannot access '.git/objects/pack': Stale file handle

During the above command the following lines were added to 
kern.log:

Jan 16 09:41:34 mim kernel: [132206.957566] BTRFS critical (device 
sda2): corrupt leaf, slot offset bad: block=192561152,root=1, 
slot=15
Jan 16 09:41:34 mim kernel: [132206.957924] BTRFS critical (device 
sda2): corrupt leaf, slot offset bad: block=192561152,root=1, 
slot=15
Jan 16 09:41:34 mim kernel: [132206.958505] BTRFS critical (device 
sda2): corrupt leaf, slot offset bad: block=192561152,root=1, 
slot=15
Jan 16 09:41:34 mim kernel: [132206.958971] BTRFS critical (device 
sda2): corrupt leaf, slot offset bad: block=192561152,root=1, 
slot=15
Jan 16 09:41:34 mim kernel: [132206.959534] BTRFS critical (device 
sda2): corrupt leaf, slot offset bad: block=192561152,root=1, 
slot=15
Jan 16 09:41:34 mim kernel: [132206.959874] BTRFS critical (device 
sda2): corrupt leaf, slot offset bad: block=192561152,root=1, 
slot=15
Jan 16 09:41:34 mim kernel: [132206.960523] BTRFS critical (device 
sda2): corrupt leaf, slot offset bad: block=192561152,root=1, 
slot=15
Jan 16 09:41:34 mim kernel: [132206.960943] BTRFS critical (device 
sda2): corrupt leaf, slot offset bad: block=192561152,root=1, 
slot=15

So I tried to repair the file system by running "btrfs check 
--repair", but this doesn’t work:

(initramfs) btrfs --version
btrfs-progs v4.7.3
(initramfs) btrfs check --repair /dev/sda2
UUID: ...
checking extents
incorrect offsets 2527 2543
items overlap, can't fix
cmds-check.c:4297: fix_item_offset: Assertion `ret` failed.
btrfs[0x41a8b4]
btrfs[0x41a8db]
btrfs[0x42428b]
btrfs[0x424f83]
btrfs[0x4259cd]
btrfs(cmd_check+0x1111)[0x427d6d]
btrfs(main+0x12f)[0x40a341]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7fd98859d2b1]
btrfs(_start+0x2a)[0x40a37a]

I now have the following questions:

* So scrubbing is not enough to check the health of a btrfs file 
  system?  It’s also necessary to read all the files?

* Any ideas what could have caused the "stale file handle" errors? 
  Is there any way to fix them?  Of course RAM errors can in 
  principle have _any_ consequences, but I would have hoped that 
  even without ECC RAM it’s practically impossible to end up with 
  an unrepairable file system.  Perhaps I simply had very bad 
  luck.

* I believe that btrfs RAID1 is considered reasonably safe for 
  production use by now.  I want to replace that home server with 
  a new machine (still without ECC).  Is it a good idea to use 
  btrfs for the main file system?  I would certainly hope so! :-)

Thanks for your time,
Christoph


* Re: Uncorrectable errors with RAID1
  2017-01-16 11:10 Uncorrectable errors with RAID1 Christoph Groth
@ 2017-01-16 13:24 ` Austin S. Hemmelgarn
  2017-01-16 15:42   ` Christoph Groth
  2017-01-16 22:45 ` Goldwyn Rodrigues
  1 sibling, 1 reply; 20+ messages in thread
From: Austin S. Hemmelgarn @ 2017-01-16 13:24 UTC (permalink / raw)
  To: Christoph Groth, linux-btrfs

On 2017-01-16 06:10, Christoph Groth wrote:
> Hi,
>
> I’ve been using a btrfs RAID1 of two hard disks since early 2012 on my
> home server.  The machine has been working well overall, but recently
> some problems with the file system surfaced.  Since I do have backups, I
> do not worry about the data, but I post here to better understand what
> happened.  Also I cannot exclude that my case is useful in some way to
> btrfs development.
>
> First some information about the system:
>
> root@mim:~# uname -a
> Linux mim 4.6.0-1-amd64 #1 SMP Debian 4.6.3-1 (2016-07-04) x86_64 GNU/Linux
> root@mim:~# btrfs --version
> btrfs-progs v4.7.3
You get bonus points for being up-to-date both with the kernel and the 
userspace tools.
> root@mim:~# btrfs fi show
> Label: none  uuid: 2da00153-f9ea-4d6c-a6cc-10c913d22686
>     Total devices 2 FS bytes used 345.97GiB
>     devid    1 size 465.29GiB used 420.06GiB path /dev/sda2
>     devid    2 size 465.29GiB used 420.04GiB path /dev/sdb2
>
> root@mim:~# btrfs fi df /
> Data, RAID1: total=417.00GiB, used=344.62GiB
> Data, single: total=8.00MiB, used=0.00B
> System, RAID1: total=40.00MiB, used=68.00KiB
> System, single: total=4.00MiB, used=0.00B
> Metadata, RAID1: total=3.00GiB, used=1.35GiB
> Metadata, single: total=8.00MiB, used=0.00B
> GlobalReserve, single: total=464.00MiB, used=0.00B
Just a general comment on this: you might want to consider running a 
full balance on this filesystem.  You've got a huge amount of slack space 
in the data chunks (over 70GiB), and significant space in the Metadata 
chunks that isn't accounted for by the GlobalReserve, as well as a 
handful of empty single profile chunks which are artifacts from some old 
versions of mkfs.  This isn't of course essential, but keeping ahead of 
such things does help sometimes when you have issues.
> root@mim:~# dmesg | grep -i btrfs
> [    4.165859] Btrfs loaded
> [    4.481712] BTRFS: device fsid 2da00153-f9ea-4d6c-a6cc-10c913d22686
> devid 1 transid 2075354 /dev/sda2
> [    4.482025] BTRFS: device fsid 2da00153-f9ea-4d6c-a6cc-10c913d22686
> devid 2 transid 2075354 /dev/sdb2
> [    4.521090] BTRFS info (device sdb2): disk space caching is enabled
> [    4.628506] BTRFS info (device sdb2): bdev /dev/sdb2 errs: wr 0, rd
> 0, flush 0, corrupt 3, gen 0
> [    4.628521] BTRFS info (device sdb2): bdev /dev/sda2 errs: wr 0, rd
> 0, flush 0, corrupt 3, gen 0
> [   18.315694] BTRFS info (device sdb2): disk space caching is enabled
>
> The disks themselves have been turning for almost 5 years by now, but
> their SMART health is still fully satisfactory.
>
> I noticed that something was wrong because printing stopped working.  So
> I did a scrub that detected 0 "correctable errors" and 6 "uncorrectable"
> errors.  The relevant bits from kern.log are:
>
> Jan 11 11:05:56 mim kernel: [159873.938579] BTRFS warning (device sdb2):
> checksum error at logical 180829634560 on dev /dev/sdb2, sector
> 353143968, root 5, inode 10014144, offset 221184, length 4096, links 1
> (path: usr/lib/x86_64-linux-gnu/libcups.so.2)
> Jan 11 11:05:57 mim kernel: [159874.857132] BTRFS warning (device sdb2):
> checksum error at logical 180829634560 on dev /dev/sda2, sector
> 353182880, root 5, inode 10014144, offset 221184, length 4096, links 1
> (path: usr/lib/x86_64-linux-gnu/libcups.so.2)
> Jan 11 11:28:42 mim kernel: [161240.083721] BTRFS warning (device sdb2):
> checksum error at logical 260254629888 on dev /dev/sda2, sector
> 508309824, root 5, inode 9990924, offset 6676480, length 4096, links 1
> (path:
> var/lib/apt/lists/ftp.fr.debian.org_debian_dists_unstable_main_binary-amd64_Packages)
>
> Jan 11 11:28:42 mim kernel: [161240.235837] BTRFS warning (device sdb2):
> checksum error at logical 260254638080 on dev /dev/sda2, sector
> 508309840, root 5, inode 9990924, offset 6684672, length 4096, links 1
> (path:
> var/lib/apt/lists/ftp.fr.debian.org_debian_dists_unstable_main_binary-amd64_Packages)
>
> Jan 11 11:37:21 mim kernel: [161759.725120] BTRFS warning (device sdb2):
> checksum error at logical 260254629888 on dev /dev/sdb2, sector
> 508270912, root 5, inode 9990924, offset 6676480, length 4096, links 1
> (path:
> var/lib/apt/lists/ftp.fr.debian.org_debian_dists_unstable_main_binary-amd64_Packages)
>
> Jan 11 11:37:21 mim kernel: [161759.750251] BTRFS warning (device sdb2):
> checksum error at logical 260254638080 on dev /dev/sdb2, sector
> 508270928, root 5, inode 9990924, offset 6684672, length 4096, links 1
> (path:
> var/lib/apt/lists/ftp.fr.debian.org_debian_dists_unstable_main_binary-amd64_Packages)
>
>
> As you can see each disk has the same three errors, and there are no
> other errors.  Random bad blocks cannot explain this situation. I asked
> on #btrfs and someone suggested that these errors are likely due to RAM
> problems.  This may indeed be the case, since the machine has no ECC.  I
> managed to fix these errors by replacing the broken files with good
> copies.  Scrubbing shows no errors now:
>
> root@mim:~# btrfs scrub status /
> scrub status for 2da00153-f9ea-4d6c-a6cc-10c913d22686
>     scrub started at Sat Jan 14 12:52:03 2017 and finished     after
> 01:49:10
>     total bytes scrubbed: 699.17GiB with 0 errors
>
> However, there are further problems.  When trying to archive the full
> filesystem I noticed that some files/directories cannot be read.  (The
> problem is localized to some ".git" directory that I don’t need.)  Any
> attempt to read the broken files (or to delete them) does not work:
>
> $ du -sh .git
> du: cannot access
> '.git/objects/28/ea2aae3fe57ab4328adaa8b79f3c1cf005dd8d': No such file
> or directory
> du: cannot access
> '.git/objects/28/fd95a5e9d08b6684819ce6e3d39d99e2ecccd5': Stale file handle
> du: cannot access
> '.git/objects/28/52e887ed436ed2c549b20d4f389589b7b58e09': Stale file handle
> du: cannot access '.git/objects/info': Stale file handle
> du: cannot access '.git/objects/pack': Stale file handle
>
> During the above command the following lines were added to kern.log:
>
> Jan 16 09:41:34 mim kernel: [132206.957566] BTRFS critical (device
> sda2): corrupt leaf, slot offset bad: block=192561152,root=1, slot=15
> Jan 16 09:41:34 mim kernel: [132206.957924] BTRFS critical (device
> sda2): corrupt leaf, slot offset bad: block=192561152,root=1, slot=15
> Jan 16 09:41:34 mim kernel: [132206.958505] BTRFS critical (device
> sda2): corrupt leaf, slot offset bad: block=192561152,root=1, slot=15
> Jan 16 09:41:34 mim kernel: [132206.958971] BTRFS critical (device
> sda2): corrupt leaf, slot offset bad: block=192561152,root=1, slot=15
> Jan 16 09:41:34 mim kernel: [132206.959534] BTRFS critical (device
> sda2): corrupt leaf, slot offset bad: block=192561152,root=1, slot=15
> Jan 16 09:41:34 mim kernel: [132206.959874] BTRFS critical (device
> sda2): corrupt leaf, slot offset bad: block=192561152,root=1, slot=15
> Jan 16 09:41:34 mim kernel: [132206.960523] BTRFS critical (device
> sda2): corrupt leaf, slot offset bad: block=192561152,root=1, slot=15
> Jan 16 09:41:34 mim kernel: [132206.960943] BTRFS critical (device
> sda2): corrupt leaf, slot offset bad: block=192561152,root=1, slot=15
>
> So I tried to repair the file system by running "btrfs check --repair",
> but this doesn’t work:
>
> (initramfs) btrfs --version
> btrfs-progs v4.7.3
> (initramfs) btrfs check --repair /dev/sda2
> UUID: ...
> checking extents
> incorrect offsets 2527 2543
> items overlap, can't fix
> cmds-check.c:4297: fix_item_offset: Assertion `ret` failed.
> btrfs[0x41a8b4]
> btrfs[0x41a8db]
> btrfs[0x42428b]
> btrfs[0x424f83]
> btrfs[0x4259cd]
> btrfs(cmd_check+0x1111)[0x427d6d]
> btrfs(main+0x12f)[0x40a341]
> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7fd98859d2b1]
> btrfs(_start+0x2a)[0x40a37a]
>
> I now have the following questions:
>
> * So scrubbing is not enough to check the health of a btrfs file
>  system?  It’s also necessary to read all the files?
Scrubbing checks data integrity, but not the consistency of the 
filesystem structures.  IOW, you're checking that the data and metadata 
match their checksums, but not necessarily that the filesystem itself 
is valid.
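
If you want to check the structural side as well, the read-only mode 
of 'btrfs check' (without --repair, against an unmounted device, e.g. 
from a rescue environment) is roughly what covers that:

# read-only structural check; the filesystem must be unmounted
btrfs check /dev/sda2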
>
> * Any ideas what could have caused the "stale file handle" errors?  Is
> there any way to fix them?  Of course RAM errors can in  principle have
> _any_ consequences, but I would have hoped that  even without ECC RAM
> it’s practically impossible to end up with an unrepairable file
> system.  Perhaps I simply had very bad  luck.
-ESTALE is _supposed_ to be a networked filesystem only thing.  BTRFS 
returns it somewhere, and I've been meaning to track down where (because 
there is almost certainly a more correct error code to return there), 
but I just haven't had time to do so.

As far as RAM, it absolutely is possible for bad RAM or even just 
transient memory errors to cause filesystem corruption.  The disk itself 
stores exactly what it was told to (in theory), so if it was told to 
store bad data, it stores bad data.  I've lost at least 3 filesystems 
over the past 5 years just due to bad memory, although I've been 
particularly unlucky in that respect.  There are a few things you can do 
to mitigate the risk of not using ECC RAM though:
* Reboot regularly, at least weekly, and possibly more frequently.
* Keep the system cool; warmer components are more likely to have 
transient errors.
* Prefer fewer memory modules when possible.  Fewer modules 
means less total area that could be hit by cosmic rays or other 
high-energy radiation (the main cause of most transient errors).
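
If you suspect the RAM itself, a quick (though not exhaustive) 
userspace sanity check is something like memtester, assuming the 
package is installed (the size and iteration count here are 
arbitrary); a full memtest86+ run from boot media is more thorough if 
you can afford the downtime:

# lock and test 1GiB of RAM for one pass
memtester 1024M 1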
>
> * I believe that btrfs RAID1 is considered reasonably safe for
>  production use by now.  I want to replace that home server with  a new
> machine (still without ECC).  Is it a good idea to use  btrfs for the
> main file system?  I would certainly hope so! :-)
FWIW, this wasn't exactly an issue with BTRFS, any other filesystem 
would have failed similarly, although others likely would have done more 
damage (instead of failing to load libcups due to -EIO, you would have 
seen seemingly random segfaults from apps using it when they tried to 
use the corrupted data).  In fact, if it weren't for the fact that 
you're using BTRFS, it likely would have taken longer for you to figure 
out what had happened.  If you were using ext4 (or XFS, or almost any 
other filesystem except for ZFS), you likely would have had no 
indication that anything was wrong other than printing not working until 
you re-installed whatever package included libcups.

As far as raid1 mode in particular, I consider it stable, and quite a 
few other people do too.  Even the most stable software has issues from 
time to time, but I have not lost a single filesystem using raid1 mode 
to a filesystem bug since at least kernel 3.16.  I have lost a few to 
hardware issues, but if I hadn't been using BTRFS I wouldn't have 
figured out nearly as quickly that I had said hardware issues.


* Re: Uncorrectable errors with RAID1
  2017-01-16 13:24 ` Austin S. Hemmelgarn
@ 2017-01-16 15:42   ` Christoph Groth
  2017-01-16 16:29     ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 20+ messages in thread
From: Christoph Groth @ 2017-01-16 15:42 UTC (permalink / raw)
  To: Austin S. Hemmelgarn; +Cc: linux-btrfs

Austin S. Hemmelgarn wrote:
> On 2017-01-16 06:10, Christoph Groth wrote:

>> root@mim:~# btrfs fi df /
>> Data, RAID1: total=417.00GiB, used=344.62GiB
>> Data, single: total=8.00MiB, used=0.00B
>> System, RAID1: total=40.00MiB, used=68.00KiB
>> System, single: total=4.00MiB, used=0.00B
>> Metadata, RAID1: total=3.00GiB, used=1.35GiB
>> Metadata, single: total=8.00MiB, used=0.00B
>> GlobalReserve, single: total=464.00MiB, used=0.00B

> Just a general comment on this, you might want to consider 
> running a full balance on this filesystem, you've got a huge 
> amount of slack space in the data chunks (over 70GiB), and 
> significant space in the Metadata chunks that isn't accounted 
> for by the GlobalReserve, as well as a handful of empty single 
> profile chunks which are artifacts from some old versions of 
> mkfs.  This isn't of course essential, but keeping ahead of such 
> things does help sometimes when you have issues.

Thanks!  So slack is the difference between "total" and "used"?  I 
saw that the manpage of "btrfs balance" explains this a bit in its 
"examples" section.  Are you aware of any more in-depth 
documentation?  Or does one have to look at the source at this level?

I ran

btrfs balance start -dconvert=raid1,soft -mconvert=raid1,soft /
btrfs balance start -dusage=25 -musage=25 /

This resulted in

root@mim:~# btrfs fi df /
Data, RAID1: total=365.00GiB, used=344.61GiB
System, RAID1: total=32.00MiB, used=64.00KiB
Metadata, RAID1: total=2.00GiB, used=1.35GiB
GlobalReserve, single: total=460.00MiB, used=0.00B

I hope that one day there will be a daemon that silently performs 
all the necessary btrfs maintenance in the background when system 
load is low!

>> * So scrubbing is not enough to check the health of a btrfs 
>> file system?  It’s also necessary to read all the files?

> Scrubbing checks data integrity, but not the state of the data. 
> IOW, you're checking that the data and metadata match with the 
> checksums, but not necessarily that the filesystem itself is 
> valid.

I see, but what should one then do to detect problems such as mine 
as soon as possible?  Periodically calculate hashes for all files? 
I’ve never seen a recommendation to do that for btrfs.

> There are a few things you can do to mitigate the risk of not 
> using ECC RAM though:
> * Reboot regularly, at least weekly, and possibly more 
> frequently.
> * Keep the system cool, warmer components are more likely to 
> have transient errors.
> * Prefer fewer numbers of memory modules when possible.  Fewer 
> modules means less total area that could be hit by cosmic rays 
> or other high-energy radiation (the main cause of most transient 
> errors).

Thanks for the advice, I think I buy the regular reboots.

As a consequence of my problem I think I’ll stop using RAID1 on 
the file server, since this only protects against dead disks, 
which evidently is only part of the problem.  Instead, I’ll make 
sure that the laptop that syncs with the server has an SSD that is 
big enough to hold all the data that is on the server as well (1 
TB SSDs are affordable now).  This way, instead of disk-level 
redundancy, I’ll have machine-level redundancy.  When something 
like the current problem hits one of the two machines, I should 
still have a usable second machine with all the data on it.


* Re: Uncorrectable errors with RAID1
  2017-01-16 15:42   ` Christoph Groth
@ 2017-01-16 16:29     ` Austin S. Hemmelgarn
  2017-01-17  4:50       ` Janos Toth F.
  2017-01-17  9:18       ` Christoph Groth
  0 siblings, 2 replies; 20+ messages in thread
From: Austin S. Hemmelgarn @ 2017-01-16 16:29 UTC (permalink / raw)
  To: Christoph Groth; +Cc: linux-btrfs

On 2017-01-16 10:42, Christoph Groth wrote:
> Austin S. Hemmelgarn wrote:
>> On 2017-01-16 06:10, Christoph Groth wrote:
>
>>> root@mim:~# btrfs fi df /
>>> Data, RAID1: total=417.00GiB, used=344.62GiB
>>> Data, single: total=8.00MiB, used=0.00B
>>> System, RAID1: total=40.00MiB, used=68.00KiB
>>> System, single: total=4.00MiB, used=0.00B
>>> Metadata, RAID1: total=3.00GiB, used=1.35GiB
>>> Metadata, single: total=8.00MiB, used=0.00B
>>> GlobalReserve, single: total=464.00MiB, used=0.00B
>
>> Just a general comment on this, you might want to consider running a
>> full balance on this filesystem, you've got a huge amount of slack
>> space in the data chunks (over 70GiB), and significant space in the
>> Metadata chunks that isn't accounted for by the GlobalReserve, as well
>> as a handful of empty single profile chunks which are artifacts from
>> some old versions of mkfs.  This isn't of course essential, but
>> keeping ahead of such things does help sometimes when you have issues.
>
> Thanks!  So slack is the difference between "total" and "used"?  I saw
> that the manpage of "btrfs balance" explains this a bit in its
> "examples" section.  Are you aware of any more in-depth documentation?
> Or one has to look at the source at this level?
There's not really much in the way of great documentation that I know 
of.  I can however cover the basics here:

BTRFS uses a 2 level allocation system.  At the higher level, you have 
chunks.  These are just big blocks of space on the disk that get used 
for only one type of lower level allocation (Data, Metadata, or System). 
  Data chunks are normally 1GB, Metadata 256MB, and System depends on 
the size of the FS when it was created.  Within these chunks, BTRFS then 
allocates individual blocks just like any other filesystem.  When there 
is no free space in any existing chunks for a new block that needs 
allocated, a new chunk is allocated.  Newly allocated chunks may be 
larger (if the filesystem is really big) or smaller (if the FS doesn't 
have much free space left at the chunk level) than the default.  In the 
event that BTRFS can't allocate a new chunk because there's no room, a 
couple of different things could happen.  If the chunk to be allocated 
was a data chunk, you get -ENOSPC (usually, sometimes you might get 
other odd results) in the userspace application that triggered the 
allocation.  However, if BTRFS needs room for metadata, then it will try 
to use the GlobalReserve instead.  This is a special area within the 
metadata chunks that's reserved for internal operations and trying to 
get out of free space exhaustion situations.  If that fails, the 
filesystem is functionally dead: reads will still work, and you might be 
able to write very small amounts of data at a time, but from a practical 
perspective it's not possible to recover a filesystem in such a 
situation.
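
If you want to see both levels at once, reasonably recent versions of 
btrfs-progs can show the chunk-level allocation next to the block-level 
usage (the exact output layout varies a bit between versions):

# per-profile chunk allocation vs. actual usage, plus the space still
# unallocated at the chunk level
btrfs filesystem usage /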

The 'total' value in fi df output is the total space allocated to chunks 
of that type, while the 'used' value is how much is actually being used. 
  It's worth noting that since GlobalReserve is a part of the Metadata 
chunks, the total there is part of the total for Metadata, but not the 
used value (so in an ideal situation with no slack space at the block 
level, you would still see a difference between metadata total and used 
equal to the global reserve total).

What balancing does is send everything back through the allocator, which 
in turn back-fills chunks that are only partially full, and removes ones 
that are now empty.  In normal usage, it's not absolutely needed.  From 
a practical perspective though, it's generally a good idea to keep the 
slack space (the difference between total and used) within chunks to a 
minimum to try and avoid getting the filesystem stuck with no free space 
at the chunk level.
>
> I ran
>
> btrfs balance start -dconvert=raid1,soft -mconvert=raid1,soft /
> btrfs balance start -dusage=25 -musage=25 /
>
> This resulted in
>
> root@mim:~# btrfs fi df /
> Data, RAID1: total=365.00GiB, used=344.61GiB
> System, RAID1: total=32.00MiB, used=64.00KiB
> Metadata, RAID1: total=2.00GiB, used=1.35GiB
> GlobalReserve, single: total=460.00MiB, used=0.00B
This is a much saner-looking FS: you've only got about 20GB of slack in 
Data chunks, and less than 1GB in metadata, which is reasonable given 
the size of the FS and how much data you have on it.  Ideal values for 
both are actually hard to determine, as having no slack in the chunks 
actually hurts performance a bit, and the ideal values depend on how 
much your workloads hit each type of chunk.
>
> I hope that one day there will be a daemon that silently performs all
> the necessary btrfs maintenance in the background when system load is low!
FWIW, while there isn't a daemon yet that does this, it's a perfect 
thing for a cronjob.  The general maintenance regimen that I use for 
most of my filesystems is:
* Run 'btrfs balance start -dusage=20 -musage=20' daily.  This will 
complete really fast on most filesystems, and keeps the slack-space 
relatively under control (and has the nice bonus that it helps 
defragment free space).
* Run a full scrub on all filesystems weekly.  This catches silent 
corruption of the data, and will fix it if possible.
* Run a full defrag on all filesystems monthly.  This should be run 
before the balance (reasons are complicated and require more explanation 
than you probably care for).  I would run this at least weekly though on 
HDD's, as they tend to be more negatively impacted by fragmentation.
There are a couple of other things I also do (fstrim and punching holes 
in large files to make them sparse), but they're not really BTRFS 
specific.  Overall, with a decent SSD (I usually use Crucial MX series 
SSD's in my personal systems), these have near zero impact most of the 
time, and with decent HDD's, you should have limited issues as long as 
you run them on only one FS at a time.
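
As a sketch, the above could be wired into cron roughly like this (the 
times, the ordering, and the mount point are just examples):

# /etc/cron.d/btrfs-maintenance (illustrative only)
# daily: compact mostly-empty chunks
30 3 * * *  root  btrfs balance start -dusage=20 -musage=20 /
# weekly: full scrub
0 4 * * 0   root  btrfs scrub start -B /
# monthly: recursive defrag (runs before that day's balance)
0 2 1 * *   root  btrfs filesystem defragment -r /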
>
>>> * So scrubbing is not enough to check the health of a btrfs file
>>> system?  It’s also necessary to read all the files?
>
>> Scrubbing checks data integrity, but not the state of the data. IOW,
>> you're checking that the data and metadata match with the checksums,
>> but not necessarily that the filesystem itself is valid.
>
> I see, but what should one then do to detect problems such as mine as
> soon as possible?  Periodically calculate hashes for all files? I’ve
> never seen a recommendation to do that for btrfs.
Scrub will verify that the data is the same as when the kernel 
calculated the block checksum.  That's really the best that can be done. 
  In your case, it couldn't correct the errors because both copies of 
the corrupted blocks were bad (this points at an issue with either RAM 
or the storage controller BTW, not the disks themselves).  Had one of 
the copies been valid, it would have intelligently detected which one 
was bad and fixed things.  It's worth noting that the combination of 
checksumming and scrub actually provides more stringent data integrity 
guarantees than any other widely used filesystem except ZFS.

As far as general monitoring, in addition to scrubbing (and obviously 
watching SMART status) you want to check the output of 'btrfs device 
stats' for non-zero error counters (these are cumulative counters that 
are only reset when the user says to do so, so right now they'll show 
aggregate data for the life of the FS), and if you're paranoid, watch 
that the mount options on the FS don't change (some monitoring software 
such as Monit makes this insanely easy to do), as the FS will go 
read-only if a severe error is detected (stuff like a failed read at the 
device level, not just checksum errors).
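
A minimal sketch of such a check (the mail command assumes a working 
local MTA; any alerting mechanism will do):

#!/bin/sh
# flag any non-zero error counter reported by 'btrfs device stats'
if btrfs device stats / | grep -vE ' 0$' ; then
    echo "btrfs error counters are non-zero on /" | mail -s "btrfs alert" root
fi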
>
>> There are a few things you can do to mitigate the risk of not using
>> ECC RAM though:
>> * Reboot regularly, at least weekly, and possibly more frequently.
>> * Keep the system cool, warmer components are more likely to have
>> transient errors.
>> * Prefer fewer numbers of memory modules when possible.  Fewer modules
>> means less total area that could be hit by cosmic rays or other
>> high-energy radiation (the main cause of most transient errors).
>
> Thanks for the advice, I think I buy the regular reboots.
>
> As a consequence of my problem I think I’ll stop using RAID1 on the file
> server, since this only protects against dead disks, which evidently is
> only part of the problem.  Instead, I’ll make sure that the laptop that
> syncs with the server has a SSD that is big enough to hold all the data
> that is on the server as well (1 TB SSDs are affordable now).  This way,
> instead of disk-level redundancy, I’ll have machine-level redundancy.
> When something like the current problem hits one of the two machines, I
> should still have a usable second machine with all the data on it.
I actually have a similar situation: I've got a laptop that I back up to 
a personal server system.  In my case though, I've taken a much 
higher-level approach; the backup storage is in fact GlusterFS (a 
clustered filesystem) running on top of BTRFS on 3 different systems 
(the server, plus a pair of Intel NUC's that are just dedicated SAN 
systems).  If I didn't have the hardware to do this or cared about 
performance more (I'm lucky if I get 20MB/s write speed, but most of the 
issue is that I went cheap on the NUC's), I would probably still be 
using BTRFS in raid1 mode on the server despite keeping a copy on the 
laptop, simply because that provides an extra layer of protection on the 
server side.


* Re: Uncorrectable errors with RAID1
  2017-01-16 11:10 Uncorrectable errors with RAID1 Christoph Groth
  2017-01-16 13:24 ` Austin S. Hemmelgarn
@ 2017-01-16 22:45 ` Goldwyn Rodrigues
  2017-01-17  8:44   ` Christoph Groth
  1 sibling, 1 reply; 20+ messages in thread
From: Goldwyn Rodrigues @ 2017-01-16 22:45 UTC (permalink / raw)
  To: Christoph Groth, linux-btrfs



On 01/16/2017 05:10 AM, Christoph Groth wrote:
> Hi,
> 
> I’ve been using a btrfs RAID1 of two hard disks since early 2012 on my
> home server.  The machine has been working well overall, but recently
> some problems with the file system surfaced.  Since I do have backups, I
> do not worry about the data, but I post here to better understand what
> happened.  Also I cannot exclude that my case is useful in some way to
> btrfs development.
> 
> First some information about the system:
> 
> root@mim:~# uname -a
> Linux mim 4.6.0-1-amd64 #1 SMP Debian 4.6.3-1 (2016-07-04) x86_64 GNU/Linux
> root@mim:~# btrfs --version
> btrfs-progs v4.7.3
> root@mim:~# btrfs fi show
> Label: none  uuid: 2da00153-f9ea-4d6c-a6cc-10c913d22686
>     Total devices 2 FS bytes used 345.97GiB
>     devid    1 size 465.29GiB used 420.06GiB path /dev/sda2
>     devid    2 size 465.29GiB used 420.04GiB path /dev/sdb2
> 
> root@mim:~# btrfs fi df /
> Data, RAID1: total=417.00GiB, used=344.62GiB
> Data, single: total=8.00MiB, used=0.00B
> System, RAID1: total=40.00MiB, used=68.00KiB
> System, single: total=4.00MiB, used=0.00B
> Metadata, RAID1: total=3.00GiB, used=1.35GiB
> Metadata, single: total=8.00MiB, used=0.00B
> GlobalReserve, single: total=464.00MiB, used=0.00B
> root@mim:~# dmesg | grep -i btrfs
> [    4.165859] Btrfs loaded
> [    4.481712] BTRFS: device fsid 2da00153-f9ea-4d6c-a6cc-10c913d22686
> devid 1 transid 2075354 /dev/sda2
> [    4.482025] BTRFS: device fsid 2da00153-f9ea-4d6c-a6cc-10c913d22686
> devid 2 transid 2075354 /dev/sdb2
> [    4.521090] BTRFS info (device sdb2): disk space caching is enabled
> [    4.628506] BTRFS info (device sdb2): bdev /dev/sdb2 errs: wr 0, rd
> 0, flush 0, corrupt 3, gen 0
> [    4.628521] BTRFS info (device sdb2): bdev /dev/sda2 errs: wr 0, rd
> 0, flush 0, corrupt 3, gen 0
> [   18.315694] BTRFS info (device sdb2): disk space caching is enabled
> 
> The disks themselves have been turning for almost 5 years by now, but
> their SMART health is still fully satisfactory.
> 
> I noticed that something was wrong because printing stopped working.  So
> I did a scrub that detected 0 "correctable errors" and 6 "uncorrectable"
> errors.  The relevant bits from kern.log are:
> 
> Jan 11 11:05:56 mim kernel: [159873.938579] BTRFS warning (device sdb2):
> checksum error at logical 180829634560 on dev /dev/sdb2, sector
> 353143968, root 5, inode 10014144, offset 221184, length 4096, links 1
> (path: usr/lib/x86_64-linux-gnu/libcups.so.2)
> Jan 11 11:05:57 mim kernel: [159874.857132] BTRFS warning (device sdb2):
> checksum error at logical 180829634560 on dev /dev/sda2, sector
> 353182880, root 5, inode 10014144, offset 221184, length 4096, links 1
> (path: usr/lib/x86_64-linux-gnu/libcups.so.2)
> Jan 11 11:28:42 mim kernel: [161240.083721] BTRFS warning (device sdb2):
> checksum error at logical 260254629888 on dev /dev/sda2, sector
> 508309824, root 5, inode 9990924, offset 6676480, length 4096, links 1
> (path:
> var/lib/apt/lists/ftp.fr.debian.org_debian_dists_unstable_main_binary-amd64_Packages)
> 
> Jan 11 11:28:42 mim kernel: [161240.235837] BTRFS warning (device sdb2):
> checksum error at logical 260254638080 on dev /dev/sda2, sector
> 508309840, root 5, inode 9990924, offset 6684672, length 4096, links 1
> (path:
> var/lib/apt/lists/ftp.fr.debian.org_debian_dists_unstable_main_binary-amd64_Packages)
> 
> Jan 11 11:37:21 mim kernel: [161759.725120] BTRFS warning (device sdb2):
> checksum error at logical 260254629888 on dev /dev/sdb2, sector
> 508270912, root 5, inode 9990924, offset 6676480, length 4096, links 1
> (path:
> var/lib/apt/lists/ftp.fr.debian.org_debian_dists_unstable_main_binary-amd64_Packages)
> 
> Jan 11 11:37:21 mim kernel: [161759.750251] BTRFS warning (device sdb2):
> checksum error at logical 260254638080 on dev /dev/sdb2, sector
> 508270928, root 5, inode 9990924, offset 6684672, length 4096, links 1
> (path:
> var/lib/apt/lists/ftp.fr.debian.org_debian_dists_unstable_main_binary-amd64_Packages)
> 
> 
> As you can see each disk has the same three errors, and there are no
> other errors.  Random bad blocks cannot explain this situation. I asked
> on #btrfs and someone suggested that these errors are likely due to RAM
> problems.  This may indeed be the case, since the machine has no ECC.  I
> managed to fix these errors by replacing the broken files with good
> copies.  Scrubbing shows no errors now:
> 
> root@mim:~# btrfs scrub status /
> scrub status for 2da00153-f9ea-4d6c-a6cc-10c913d22686
>     scrub started at Sat Jan 14 12:52:03 2017 and finished     after
> 01:49:10
>     total bytes scrubbed: 699.17GiB with 0 errors
> 
> However, there are further problems.  When trying to archive the full
> filesystem I noticed that some files/directories cannot be read.  (The
> problem is localized to some ".git" directory that I don’t need.)  Any
> attempt to read the broken files (or to delete them) does not work:
> 
> $ du -sh .git
> du: cannot access
> '.git/objects/28/ea2aae3fe57ab4328adaa8b79f3c1cf005dd8d': No such file
> or directory
> du: cannot access
> '.git/objects/28/fd95a5e9d08b6684819ce6e3d39d99e2ecccd5': Stale file handle
> du: cannot access
> '.git/objects/28/52e887ed436ed2c549b20d4f389589b7b58e09': Stale file handle
> du: cannot access '.git/objects/info': Stale file handle
> du: cannot access '.git/objects/pack': Stale file handle
> 
> During the above command the following lines were added to kern.log:
> 
> Jan 16 09:41:34 mim kernel: [132206.957566] BTRFS critical (device
> sda2): corrupt leaf, slot offset bad: block=192561152,root=1, slot=15
> Jan 16 09:41:34 mim kernel: [132206.957924] BTRFS critical (device
> sda2): corrupt leaf, slot offset bad: block=192561152,root=1, slot=15
> Jan 16 09:41:34 mim kernel: [132206.958505] BTRFS critical (device
> sda2): corrupt leaf, slot offset bad: block=192561152,root=1, slot=15
> Jan 16 09:41:34 mim kernel: [132206.958971] BTRFS critical (device
> sda2): corrupt leaf, slot offset bad: block=192561152,root=1, slot=15
> Jan 16 09:41:34 mim kernel: [132206.959534] BTRFS critical (device
> sda2): corrupt leaf, slot offset bad: block=192561152,root=1, slot=15
> Jan 16 09:41:34 mim kernel: [132206.959874] BTRFS critical (device
> sda2): corrupt leaf, slot offset bad: block=192561152,root=1, slot=15
> Jan 16 09:41:34 mim kernel: [132206.960523] BTRFS critical (device
> sda2): corrupt leaf, slot offset bad: block=192561152,root=1, slot=15
> Jan 16 09:41:34 mim kernel: [132206.960943] BTRFS critical (device
> sda2): corrupt leaf, slot offset bad: block=192561152,root=1, slot=15
> 
> So I tried to repair the file system by running "btrfs check --repair",
> but this doesn’t work:
> 
> (initramfs) btrfs --version
> btrfs-progs v4.7.3
> (initramfs) btrfs check --repair /dev/sda2
> UUID: ...
> checking extents
> incorrect offsets 2527 2543
> items overlap, can't fix
> cmds-check.c:4297: fix_item_offset: Assertion `ret` failed.
> btrfs[0x41a8b4]
> btrfs[0x41a8db]
> btrfs[0x42428b]
> btrfs[0x424f83]
> btrfs[0x4259cd]
> btrfs(cmd_check+0x1111)[0x427d6d]
> btrfs(main+0x12f)[0x40a341]
> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7fd98859d2b1]
> btrfs(_start+0x2a)[0x40a37a]
> 

Would you be able to upload a btrfs-image for me to examine?  This is a
core ctree error where the item size is most probably registered
incorrectly.

Thanks,

-- 
Goldwyn


* Re: Uncorrectable errors with RAID1
  2017-01-16 16:29     ` Austin S. Hemmelgarn
@ 2017-01-17  4:50       ` Janos Toth F.
  2017-01-17 12:25         ` Austin S. Hemmelgarn
  2017-01-17  9:18       ` Christoph Groth
  1 sibling, 1 reply; 20+ messages in thread
From: Janos Toth F. @ 2017-01-17  4:50 UTC (permalink / raw)
  To: Btrfs BTRFS

> BTRFS uses a 2 level allocation system.  At the higher level, you have
> chunks.  These are just big blocks of space on the disk that get used for
> only one type of lower level allocation (Data, Metadata, or System).  Data
> chunks are normally 1GB, Metadata 256MB, and System depends on the size of
> the FS when it was created.  Within these chunks, BTRFS then allocates
> individual blocks just like any other filesystem.

This always seems to confuse me when I try to get an abstract idea
about de-/fragmentation of Btrfs.
Can meta-/data be fragmented on both levels? And if so, can defrag
and/or balance "cure" both levels of fragmentation (if any)?
But how? May be several defrag and balance runs, repeated until
returns diminish (or at least you consider them meaningless and/or
unnecessary)?


> What balancing does is send everything back through the allocator, which in
> turn back-fills chunks that are only partially full, and removes ones that
> are now empty.

Doesn't this have a potential chance of introducing (additional)
extent-level fragmentation?

> FWIW, while there isn't a daemon yet that does this, it's a perfect thing
> for a cronjob.  The general maintenance regimen that I use for most of my
> filesystems is:
> * Run 'btrfs balance start -dusage=20 -musage=20' daily.  This will complete
> really fast on most filesystems, and keeps the slack-space relatively
> under-control (and has the nice bonus that it helps defragment free space.
> * Run a full scrub on all filesystems weekly.  This catches silent
> corruption of the data, and will fix it if possible.
> * Run a full defrag on all filesystems monthly.  This should be run before
> the balance (reasons are complicated and require more explanation than you
> probably care for).  I would run this at least weekly though on HDD's, as
> they tend to be more negatively impacted by fragmentation.

I wonder if one should always run a full balance instead of a full
scrub, since balance should also read (and thus theoretically verify)
the meta-/data (does it though? I would expect it to check the
checksums, but who knows...? Maybe it's "optimized" to skip that
step?) and also perform the "consolidation" of the chunk level.

I wish there was some more "integrated" solution for this: a
balance-like operation which consolidates the chunks and also
de-fragments the file extents at the same time while passively
uncovering (and fixing, if necessary and possible) any checksum mismatches
/ data errors, so that balance and defrag can't work against
each other and the overall work is minimized (compared to several full
runs or many different commands).


* Re: Uncorrectable errors with RAID1
  2017-01-16 22:45 ` Goldwyn Rodrigues
@ 2017-01-17  8:44   ` Christoph Groth
  2017-01-17 11:32     ` Goldwyn Rodrigues
  0 siblings, 1 reply; 20+ messages in thread
From: Christoph Groth @ 2017-01-17  8:44 UTC (permalink / raw)
  To: Goldwyn Rodrigues; +Cc: linux-btrfs

Goldwyn Rodrigues wrote:

> Would you be able to upload a btrfs-image for me to 
> examine. This is a core ctree error where most probably item 
> size is incorrectly registered.

Sure, I can do that.  I'd like to use the -s option; will this be 
fine?  Is there some preferred place for the upload?  If not, I 
can use my personal webspace.
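
Concretely, I'd run something along these lines with the file system 
unmounted (the output path is just an example):

btrfs-image -c9 -s /dev/sda2 /tmp/mim-btrfs.img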


* Re: Uncorrectable errors with RAID1
  2017-01-16 16:29     ` Austin S. Hemmelgarn
  2017-01-17  4:50       ` Janos Toth F.
@ 2017-01-17  9:18       ` Christoph Groth
  2017-01-17 12:32         ` Austin S. Hemmelgarn
  1 sibling, 1 reply; 20+ messages in thread
From: Christoph Groth @ 2017-01-17  9:18 UTC (permalink / raw)
  To: Austin S. Hemmelgarn; +Cc: linux-btrfs

Austin S. Hemmelgarn wrote:

> There's not really much in the way of great documentation that I 
> know of.  I can however cover the basics here:
>
> (...)

Thanks for this explanation.  I'm sure it will also be useful to 
others.

> If the chunk to be allocated was a data chunk, you get -ENOSPC 
> (usually, sometimes you might get other odd results) in the 
> userspace application that triggered the allocation.

It seems that the available space reported by the system df 
command corresponds roughly to the size of the block device minus 
all the "used" space as reported by "btrfs fi df".

If I understand what you wrote correctly this means that when 
writing a huge file it may happen that the system df will report 
enough free space, but btrfs will raise ENOSPC.  However, it 
should be possible to keep writing small files even at this point 
(assuming that there's enough space for the metadata).  Or will 
btrfs split the huge file into small pieces to fit it into the 
fragmented free space in the chunks?

Such a situation should be avoided of course.  I'm asking out of 
curiosity.

>>>> * So scrubbing is not enough to check the health of a btrfs 
>>>> file system?  It’s also necessary to read all the files?
>>
>>> Scrubbing checks data integrity, but not the state of the 
>>> data. IOW, you're checking that the data and metadata match 
>>> with the checksums, but not necessarily that the filesystem 
>>> itself is valid.
>>
>> I see, but what should one then do to detect problems such as 
>> mine as soon as possible?  Periodically calculate hashes for 
>> all files? I’ve never seen a recommendation to do that for 
>> btrfs.

> Scrub will verify that the data is the same as when the kernel 
> calculated the block checksum.  That's really the best that can 
> be done. In your case, it couldn't correct the errors because 
> both copies of the corrupted blocks were bad (this points at an 
> issue with either RAM or the storage controller BTW, not the 
> disks themselves).  Had one of the copies been valid, it would 
> have intelligently detected which one was bad and fixed things.

I think I understand the problem with the three corrupted blocks 
that I was able to fix by replacing the files.

But there is also the strange "Stale file handle" error with some 
other files that was not found by scrubbing, and also does not 
seem to appear in the output of "btrfs dev stats", which is BTW

[/dev/sda2].write_io_errs   0
[/dev/sda2].read_io_errs    0
[/dev/sda2].flush_io_errs   0
[/dev/sda2].corruption_errs 3
[/dev/sda2].generation_errs 0
[/dev/sdb2].write_io_errs   0
[/dev/sdb2].read_io_errs    0
[/dev/sdb2].flush_io_errs   0
[/dev/sdb2].corruption_errs 3
[/dev/sdb2].generation_errs 0

(The 2 times 3 corruption errors seem to be the uncorrectable 
errors that I could fix by replacing the files.)

To get the "stale file handle" error I need to try to read the 
affected file.  That's why I was wondering whether reading all the 
files periodically is indeed a useful maintenance procedure with 
btrfs.
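
(By "reading all the files" I mean something as simple as, say,

find / -xdev -type f -exec cat '{}' + > /dev/null

which reads every file once, discards the data, and reports read 
errors such as the ESTALE ones above on stderr.)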

"btrfs check" does find the problem, but it can be only run on an 
unmounted file system.


* Re: Uncorrectable errors with RAID1
  2017-01-17  8:44   ` Christoph Groth
@ 2017-01-17 11:32     ` Goldwyn Rodrigues
  2017-01-17 20:25       ` Christoph Groth
  0 siblings, 1 reply; 20+ messages in thread
From: Goldwyn Rodrigues @ 2017-01-17 11:32 UTC (permalink / raw)
  To: Christoph Groth; +Cc: linux-btrfs




On 01/17/2017 02:44 AM, Christoph Groth wrote:
> Goldwyn Rodrigues wrote:
> 
>> Would you be able to upload a btrfs-image for me to examine. This is a
>> core ctree error where most probably item size is incorrectly registered.
> 
> Sure, I can do that.  I'd like to use the -s option, will this be fine? 

Yes, I think that should be fine.

> Is there some preferred place for the upload?  If not, I can use
> personal webspace.

No, there is no preferred place.  As long as I can download it, it is fine.

-- 
Goldwyn



* Re: Uncorrectable errors with RAID1
  2017-01-17  4:50       ` Janos Toth F.
@ 2017-01-17 12:25         ` Austin S. Hemmelgarn
  0 siblings, 0 replies; 20+ messages in thread
From: Austin S. Hemmelgarn @ 2017-01-17 12:25 UTC (permalink / raw)
  To: Janos Toth F., Btrfs BTRFS

On 2017-01-16 23:50, Janos Toth F. wrote:
>> BTRFS uses a 2 level allocation system.  At the higher level, you have
>> chunks.  These are just big blocks of space on the disk that get used for
>> only one type of lower level allocation (Data, Metadata, or System).  Data
>> chunks are normally 1GB, Metadata 256MB, and System depends on the size of
>> the FS when it was created.  Within these chunks, BTRFS then allocates
>> individual blocks just like any other filesystem.
>
> This always seems to confuse me when I try to get an abstract idea
> about de-/fragmentation of Btrfs.
> Can meta-/data be fragmented on both levels? And if so, can defrag
> and/or balance "cure" both levels of fragmentation (if any)?
> But how? May be several defrag and balance runs, repeated until
> returns diminish (or at least you consider them meaningless and/or
> unnecessary)?
Defrag operates only at the block level.  It won't allocate chunks 
unless it has to, and it won't remove chunks unless they become empty 
from it moving things around (although that's not likely to happen most 
of the time).  Balance functionally operates at both levels, but it 
doesn't really do any defragmentation.  Balance _may_ merge extents 
sometimes, but I'm not sure of this.  It will compact allocations and 
therefore functionally defragment free space within chunks (though not 
necessarily at the chunk-level itself).

Defrag run with the same options _should_ have no net effect after the 
first run, the two exceptions being if the filesystem is close to full 
or if the data set is being modified live while the defrag is happening. 
  Balance run with the same options will eventually hit a point where it 
doesn't do anything (or only touches one chunk of each type but doesn't 
actually give any benefit).  If you're just using the usage filters or 
doing a full balance, this point is the second run.  If you're using 
other filters, it's functionally not possible to determine when that 
point will be without low-level knowledge of the chunk layout.

For an idle filesystem, if you run defrag then a full balance, that will 
get you a near optimal layout.  Running them in the reverse order will 
get you a different layout that may be less optimal than running defrag 
first because defrag may move data in such a way that new chunks get 
allocated.  Repeated runs of defrag and balance will in more than 95% of 
cases provide no extra benefit.
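
Concretely, on an otherwise idle filesystem that would be something 
like the following (a full, unfiltered balance is shown; usage filters 
work too):

btrfs filesystem defragment -r /
# --full-balance just skips the safety prompt newer progs add
btrfs balance start --full-balance /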
>
>
>> What balancing does is send everything back through the allocator, which in
>> turn back-fills chunks that are only partially full, and removes ones that
>> are now empty.
>
> Does't this have a potential chance of introducing (additional)
> extent-level fragmentation?
In theory, yes.  IIRC, extents can't cross a chunk boundary.  Beyond 
that packing constraint, balance shouldn't fragment things further.
>
>> FWIW, while there isn't a daemon yet that does this, it's a perfect thing
>> for a cronjob.  The general maintenance regimen that I use for most of my
>> filesystems is:
>> * Run 'btrfs balance start -dusage=20 -musage=20' daily.  This will complete
>> really fast on most filesystems, and keeps the slack-space relatively
>> under-control (and has the nice bonus that it helps defragment free space.
>> * Run a full scrub on all filesystems weekly.  This catches silent
>> corruption of the data, and will fix it if possible.
>> * Run a full defrag on all filesystems monthly.  This should be run before
>> the balance (reasons are complicated and require more explanation than you
>> probably care for).  I would run this at least weekly though on HDD's, as
>> they tend to be more negatively impacted by fragmentation.
>
> I wonder if one should always run a full balance instead of a full
> scrub, since balance should also read (and thus theoretically verify)
> the meta-/data (does it though? I would expect it to check the
> chekcsums, but who knows...? may be it's "optimized" to skip that
> step?) and also perform the "consolidation" of the chunk level.
Scrub uses fewer resources than balance.  Balance has to read _and_ 
re-write all data in the FS regardless of the state of the data.  Scrub 
only needs to read the data if it's good, and if it's bad it only (for 
raid1) has to re-write the replica that's bad, not both of them.  In 
fact, the only practical reason to run balance on a regular basis at all 
is to compact allocations and defragment free space.  This is why I only 
have it balance chunks that are less than 1/5 full.
>
> I wish there was some more "integrated" solution for this: a
> balance-like operation which consolidates the chunks and also
> de-fragments the file extents at the same time while passively
> uncovers (and fixes if necessary and possible) any checksum mismatches
> / data errors, so that balance and defrag can't work against
> each-other and the overall work is minimized (compared to several full
> runs or many different commands).
More than 90% of the time, the performance difference between the 
absolute optimal layout and the one generated by just running defrag 
then balancing is so small that it's insignificant.  The closer to the 
optimal layout you get, the lower the returns for optimizing further 
(and this applies to any filesystem in fact).  In essence, it's a bit 
like the traveling salesman problem: any arbitrary solution probably 
isn't optimal, but it's generally close enough to not matter.

As far as scrub fitting into all of this, I'd personally rather have a 
daemon that slowly (less than 1% bandwidth usage) scrubs the FS over 
time in the background and logs and fixes errors it encounters (similar 
to how filesystem scrubbing works in many clustered filesystems) instead 
of always having to manually invoke it and jump through hoops to keep 
the bandwidth usage reasonable.
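
Until something like that exists, the closest approximation I know of 
is to start the scrub with idle IO priority, for example:

# -c 3 puts the scrub worker threads in the idle IO scheduling class
btrfs scrub start -c 3 /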


* Re: Uncorrectable errors with RAID1
  2017-01-17  9:18       ` Christoph Groth
@ 2017-01-17 12:32         ` Austin S. Hemmelgarn
  0 siblings, 0 replies; 20+ messages in thread
From: Austin S. Hemmelgarn @ 2017-01-17 12:32 UTC (permalink / raw)
  To: Christoph Groth; +Cc: linux-btrfs

On 2017-01-17 04:18, Christoph Groth wrote:
> Austin S. Hemmelgarn wrote:
>
>> There's not really much in the way of great documentation that I know
>> of.  I can however cover the basics here:
>>
>> (...)
>
> Thanks for this explanation.  I'm sure it will be also useful to others.
Glad I could help.
>
>> If the chunk to be allocated was a data chunk, you get -ENOSPC
>> (usually, sometimes you might get other odd results) in the userspace
>> application that triggered the allocation.
>
> It seems that the available space reported by the system df command
> corresponds roughly to the size of the block device minus all the "used"
> space as reported by "btrfs fi df".
That's correct.
>
> If I understand what you wrote correctly this means that when writing a
> huge file it may happen that the system df will report enough free
> space, but btrfs will raise ENOSPC.  However, it should be possible to
> keep writing small files even at this point (assuming that there's
> enough space for the metadata).  Or will btrfs split the huge file into
> small pieces to fit it into the fragmented free space in the chunks?
OK, so the first thing to understand here is that an extent in a file 
can't be larger than a chunk.  This means that if you have space for 3 
1GB data chunks located in 3 different places on the storage device, you 
can still write a 3GB file to the filesystem; it will just end up with 3 
1GB extents.  The issues with ENOSPC come in when almost all of your 
space is allocated to chunks and one type gets full.  In such a 
situation, if you still have metadata space, you can keep writing to the 
FS, but big writes may fail, and you'll eventually end up in a situation 
where you need to delete things to free up space.
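
A quick way to see how close a filesystem is to that situation is to 
look at how much space is still unallocated (a small sketch; "btrfs 
filesystem usage" assumes a reasonably recent btrfs-progs, older 
versions only have "btrfs fi df" plus "btrfs fi show"):

# Overall view of allocation: once "Device unallocated" approaches zero,
# whichever pool (data or metadata) fills up first starts returning ENOSPC.
btrfs filesystem usage /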
>
> Such a situation should be avoided of course.  I'm asking out of curiosity.
>
>>>>> * So scrubbing is not enough to check the health of a btrfs file
>>>>> system?  It’s also necessary to read all the files?
>>>
>>>> Scrubbing checks data integrity, but not the state of the data. IOW,
>>>> you're checking that the data and metadata match with the checksums,
>>>> but not necessarily that the filesystem itself is valid.
>>>
>>> I see, but what should one then do to detect problems such as mine as
>>> soon as possible?  Periodically calculate hashes for all files? I’ve
>>> never seen a recommendation to do that for btrfs.
>
>> Scrub will verify that the data is the same as when the kernel
>> calculated the block checksum.  That's really the best that can be
>> done. In your case, it couldn't correct the errors because both copies
>> of the corrupted blocks were bad (this points at an issue with either
>> RAM or the storage controller BTW, not the disks themselves).  Had one
>> of the copies been valid, it would have intelligently detected which
>> one was bad and fixed things.
>
> I think I understand the problem with the three corrupted blocks that I
> was able to fix by replacing the files.
>
> But there is also the strange "Stale file handle" error with some other
> files that was not found by scrubbing, and also does not seem to appear
> in the output of "btrfs dev stats", which is BTW
>
> [/dev/sda2].write_io_errs   0
> [/dev/sda2].read_io_errs    0
> [/dev/sda2].flush_io_errs   0
> [/dev/sda2].corruption_errs 3
> [/dev/sda2].generation_errs 0
> [/dev/sdb2].write_io_errs   0
> [/dev/sdb2].read_io_errs    0
> [/dev/sdb2].flush_io_errs   0
> [/dev/sdb2].corruption_errs 3
> [/dev/sdb2].generation_errs 0
>
> (The 2 times 3 corruption errors seem to be the uncorrectable errors
> that I could fix by replacing the files.)
Yep, those correspond directly to the uncorrectable errors you mentioned 
in your original post.
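
For reference, those counters are persistent across reboots; once the 
affected files are repaired you can re-read and zero them so that any 
new corruption stands out (a small sketch, assuming the filesystem is 
mounted at /):

# Print the per-device error counters.
btrfs device stats /
# Print them and then reset them to zero.
btrfs device stats -z /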
>
> To get the "stale file handle" error I need to try to read the affected
> file.  That's why I was wondering whether reading all the files
> periodically is indeed a useful maintenance procedure with btrfs.
In the cases I've seen, no it isn't all that useful.  As far as the 
whole ESTALE thing, that's almost certainly a bug and you either 
shouldn't be getting an error there, or you shouldn't be getting that 
error code there.
>
> "btrfs check" does find the problem, but it can be only run on an
> unmounted file system.
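
For what it's worth, a minimal sketch of running it from a rescue or 
live system (assuming the array is on /dev/sda2 as in your logs; 
without --repair it only reports problems and never writes):

# Boot a live system, make sure the filesystem is NOT mounted, then:
btrfs check /dev/sda2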


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Unocorrectable errors with RAID1
  2017-01-17 11:32     ` Goldwyn Rodrigues
@ 2017-01-17 20:25       ` Christoph Groth
  2017-01-17 21:52         ` Chris Murphy
  2017-01-17 22:57         ` Unocorrectable errors with RAID1 Goldwyn Rodrigues
  0 siblings, 2 replies; 20+ messages in thread
From: Christoph Groth @ 2017-01-17 20:25 UTC (permalink / raw)
  To: Goldwyn Rodrigues; +Cc: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 902 bytes --]

Goldwyn Rodrigues wrote:
> On 01/17/2017 02:44 AM, Christoph Groth wrote:
>> Goldwyn Rodrigues wrote:
>> 
>>> Would you be able to upload a btrfs-image for me to 
>>> examine. This is a
>>> core ctree error where most probably item size is incorrectly 
>>> registered.
>> 
>> Sure, I can do that.  I'd like to use the -s option, will this 
>> be fine? 
>
> Yes, I think that should be fine.

Unfortunately, giving -s causes btrfs-image to segfault.  I tried 
both btrfs-progs 4.7.3 and 4.4.  I also tried different 
compression levels.

Without -s it works, but since this file system contains the 
complete digital life of our family, I would rather not share even 
the file names.

Any ideas on what could be done?  If you need help to debug the 
problem with btrfs-image, please tell me what I should do.  I can 
keep the broken file system around until an image can be created 
at some later time.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Unocorrectable errors with RAID1
  2017-01-17 20:25       ` Christoph Groth
@ 2017-01-17 21:52         ` Chris Murphy
  2017-01-17 23:10           ` Christoph Groth
  2017-01-17 22:57         ` Unocorrectable errors with RAID1 Goldwyn Rodrigues
  1 sibling, 1 reply; 20+ messages in thread
From: Chris Murphy @ 2017-01-17 21:52 UTC (permalink / raw)
  To: Christoph Groth; +Cc: Goldwyn Rodrigues, Btrfs BTRFS

On Tue, Jan 17, 2017 at 1:25 PM, Christoph Groth
<christoph@grothesque.org> wrote:
> Goldwyn Rodrigues wrote:
>>
>> On 01/17/2017 02:44 AM, Christoph Groth wrote:
>>>
>>> Goldwyn Rodrigues wrote:
>>>
>>>> Would you be able to upload a btrfs-image for me to examine. This is a
>>>> core ctree error where most probably item size is incorrectly
>>>> registered.
>>>
>>>
>>> Sure, I can do that.  I'd like to use the -s option, will this be fine?
>>
>>
>> Yes, I think that should be fine.
>
>
> Unfortunately, giving -s causes btrfs-image to segfault.  I tried both
> btrfs-progs 4.7.3 and 4.4.  I also tried different compression levels.
>
> Without -s it works, but since this file system contains the complete
> digital life of our family, I would rather not share even the file names.
>
> Any ideas on what could be done?  If you need help to debug the problem with
> btrfs-image, please tell me what I should do.  I can keep the broken file
> system around until an image can be created at some later time.

Try 4.9, or even 4.8.5; tons of bugs have been fixed since 4.7.3,
although I don't know off hand whether this particular one is among
them.  I did recently do a btrfs-image with btrfs-progs v4.9 with -s
and did not get a segfault.



-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Unocorrectable errors with RAID1
  2017-01-17 20:25       ` Christoph Groth
  2017-01-17 21:52         ` Chris Murphy
@ 2017-01-17 22:57         ` Goldwyn Rodrigues
  2017-01-17 23:22           ` Christoph Groth
  1 sibling, 1 reply; 20+ messages in thread
From: Goldwyn Rodrigues @ 2017-01-17 22:57 UTC (permalink / raw)
  To: Christoph Groth; +Cc: linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 1119 bytes --]



On 01/17/2017 02:25 PM, Christoph Groth wrote:
> Goldwyn Rodrigues wrote:
>> On 01/17/2017 02:44 AM, Christoph Groth wrote:
>>> Goldwyn Rodrigues wrote:
>>>
>>>> Would you be able to upload a btrfs-image for me to examine. This is a
>>>> core ctree error where most probably item size is incorrectly
>>>> registered.
>>>
>>> Sure, I can do that.  I'd like to use the -s option, will this be fine? 
>>
>> Yes, I think that should be fine.
> 
> Unfortunately, giving -s causes btrfs-image to segfault.  I tried both
> btrfs-progs 4.7.3 and 4.4.  I also tried different compression levels.
> 
> Without -s it works, but since this file system contains the complete
> digital life of our family, I would rather not share even the file names.
> 
> Any ideas on what could be done?  If you need help to debug the problem
> with btrfs-image, please tell me what I should do.  I can keep the
> broken file system around until an image can be created at some later time.

As Chris mentioned, try a later version. If you are familiar with git,
you could even try the devel version.
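
Something along these lines should work (a rough sketch; the build 
dependencies (autotools, libuuid, zlib, lzo, libblkid) vary in name 
between distributions):

# Fetch the development branch from the usual kernel.org repository
# and build in place; the freshly built btrfs-image can be run from
# the source tree without installing.
git clone git://git.kernel.org/pub/scm/linux/kernel/git/kdave/btrfs-progs.git
cd btrfs-progs
git checkout devel
./autogen.sh && ./configure && make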

-- 
Goldwyn


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Unocorrectable errors with RAID1
  2017-01-17 21:52         ` Chris Murphy
@ 2017-01-17 23:10           ` Christoph Groth
  2017-01-18  7:13             ` gdb log of crashed "btrfs-image -s" Christoph Groth
  0 siblings, 1 reply; 20+ messages in thread
From: Christoph Groth @ 2017-01-17 23:10 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Goldwyn Rodrigues, Btrfs BTRFS

[-- Attachment #1: Type: text/plain, Size: 1416 bytes --]

Chris Murphy wrote:
> On Tue, Jan 17, 2017 at 1:25 PM, Christoph Groth
> <christoph@grothesque.org> wrote:
>> Any ideas on what could be done?  If you need help to debug the 
>> problem with
>> btrfs-image, please tell me what I should do.  I can keep the 
>> broken file
>> system around until an image can be created at some later time.
>
> Try 4.9, or even 4.8.5, tons of bugs have been fixed since 4.7.3
> although I don't know off hand if this particular bug is 
> fixed. I did
> recently do a btrfs-image with btrfs-progs v4.9 with -s and did 
> not
> get a segfault.

I compiled btrfs-image.static from btrfs-tools 4.9 (from git) and 
started it from Debian testing's initramfs.  The exact command 
that I use is:

/mnt/btrfs-image.static -c3 -s /dev/sda2 /mnt/mim-s.bim

It runs for a couple of seconds (enough to write 20263936 bytes of 
output) and then quits with

*** Error in `/mnt/btrfs-image.static`: double free or corruption 
    (!prev): 0x00000000009f0940 ***
====== Backtrace: ======
[0x45fb97]
[0x465442]
[0x465c1e]
[0x402694]
[0x402dcb]
[0x4031fe]
[0x4050ff]
[0x405783]
[0x44cb73]
[0x44cdfe]
[0x400b2a]

(I had to type the above off the other screen, but I double 
checked that there are no errors.)

The executable that I used can be downloaded from 
http://groth.fr/btrfs-image.static
Its md5sum is 48abbc82ac6d3c0cb88cba1e5edb85fd.

I hope that this can help someone to see what's going on.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Unocorrectable errors with RAID1
  2017-01-17 22:57         ` Unocorrectable errors with RAID1 Goldwyn Rodrigues
@ 2017-01-17 23:22           ` Christoph Groth
  0 siblings, 0 replies; 20+ messages in thread
From: Christoph Groth @ 2017-01-17 23:22 UTC (permalink / raw)
  To: Goldwyn Rodrigues; +Cc: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 302 bytes --]

Goldwyn Rodrigues wrote:

> As Chris mentioned, try a later version. If you are familiar 
> with git, you could even try the devel version.

Looking at the commits in current devel (2f4a73f9a612876116) since 
v4.9, there doesn't seem to be anything relevant, but I can retry 
if you think it's worthwhile.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* gdb log of crashed "btrfs-image -s"
  2017-01-17 23:10           ` Christoph Groth
@ 2017-01-18  7:13             ` Christoph Groth
  2017-01-18 11:49               ` Goldwyn Rodrigues
  0 siblings, 1 reply; 20+ messages in thread
From: Christoph Groth @ 2017-01-18  7:13 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Goldwyn Rodrigues, Btrfs BTRFS


[-- Attachment #1.1: Type: text/plain, Size: 1601 bytes --]

Christoph Groth wrote:
> Chris Murphy wrote:
>> On Tue, Jan 17, 2017 at 1:25 PM, Christoph Groth
>> <christoph@grothesque.org> wrote:
>>> Any ideas on what could be done?  If you need help to debug 
>>> the problem with
>>> btrfs-image, please tell me what I should do.  I can keep the 
>>> broken file
>>> system around until an image can be created at some later 
>>> time.
>>
>> Try 4.9, or even 4.8.5, tons of bugs have been fixed since 
>> 4.7.3
>> although I don't know off hand if this particular bug is 
>> fixed. I did
>> recently do a btrfs-image with btrfs-progs v4.9 with -s and did 
>> not
>> get a segfault.
>
> I compiled btrfs-image.static from btrfs-tools 4.9 (from git) 
> and started it from Debian testing's initramfs.  The exact 
> command that I use is:
>
> /mnt/btrfs-image.static -c3 -s /dev/sda2 /mnt/mim-s.bim
>
> It runs for a couple of seconds (enough to write 20263936 bytes 
> of output) and then quits with
>
> *** Error in `/mnt/btrfs-image.static`: double free or 
> corruption  (!prev): 0x00000000009f0940 ***
> ====== Backtrace: ======
> [0x45fb97]
> [0x465442]
> [0x465c1e]
> [0x402694]
> [0x402dcb]
> [0x4031fe]
> [0x4050ff]
> [0x405783]
> [0x44cb73]
> [0x44cdfe]
> [0x400b2a]
>
> (I had to type the above off the other screen, but I double 
> checked that there are no errors.)
>
> The executable that I used can be downloaded from 
> http://groth.fr/btrfs-image.static
> Its md5sum is 48abbc82ac6d3c0cb88cba1e5edb85fd.
>
> I hope that this can help someone to see what's going on.

I ran the same executable under gdb from a live system.  The log 
is attached.


[-- Attachment #1.2: btrfs-image.log --]
[-- Type: application/octet-stream, Size: 4353 bytes --]

root@xubuntu:/media/xubuntu/wd1t# gdb btrfs-image.static
GNU gdb (Ubuntu 7.11-0ubuntu1) 7.11
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from btrfs-image.static...done.
(gdb) run -c3 -s /dev/sda2 /media/xubuntu/wd1t/mim-s.bim
Starting program: /media/xubuntu/wd1t/btrfs-image.static -c3 -s /dev/sda2 /media/xubuntu/wd1t/mim-s.bim
[New LWP 2334]
[New LWP 2335]
[New LWP 2336]
[New LWP 2337]
*** Error in `/media/xubuntu/wd1t/btrfs-image.static': free(): invalid next size (normal): 0x0000000000762570 ***
======= Backtrace: =========
[0x45fb97]
[0x465442]
[0x465c1e]
[0x402dea]
[0x4031fe]
[0x4050ff]
[0x405783]
[0x44cb73]
[0x44cdfe]
[0x400b2a]
======= Memory map: ========
00400000-00521000 r-xp 00000000 08:31 689                                /media/xubuntu/wd1t/btrfs-image.static
00721000-00728000 rw-p 00121000 08:31 689                                /media/xubuntu/wd1t/btrfs-image.static
00728000-0085b000 rw-p 00000000 00:00 0                                  [heap]
7fffe0000000-7fffe01aa000 rw-p 00000000 00:00 0 
7fffe01aa000-7fffe4000000 ---p 00000000 00:00 0 
7fffe4000000-7fffe4025000 rw-p 00000000 00:00 0 
7fffe4025000-7fffe8000000 ---p 00000000 00:00 0 
7fffe8000000-7fffe8186000 rw-p 00000000 00:00 0 
7fffe8186000-7fffec000000 ---p 00000000 00:00 0 
7fffec000000-7fffec195000 rw-p 00000000 00:00 0 
7fffec195000-7ffff0000000 ---p 00000000 00:00 0 
7ffff0000000-7ffff01b0000 rw-p 00000000 00:00 0 
7ffff01b0000-7ffff4000000 ---p 00000000 00:00 0 
7ffff5ff6000-7ffff5ff7000 rw-p 00000000 00:00 0 
7ffff5ff7000-7ffff5ff8000 ---p 00000000 00:00 0 
7ffff5ff8000-7ffff67f8000 rw-p 00000000 00:00 0 
7ffff67f8000-7ffff67f9000 ---p 00000000 00:00 0 
7ffff67f9000-7ffff6ff9000 rw-p 00000000 00:00 0 
7ffff6ff9000-7ffff6ffa000 ---p 00000000 00:00 0 
7ffff6ffa000-7ffff77fa000 rw-p 00000000 00:00 0 
7ffff77fa000-7ffff77fb000 ---p 00000000 00:00 0 
7ffff77fb000-7ffff7ffb000 rw-p 00000000 00:00 0 
7ffff7ffb000-7ffff7ffd000 r--p 00000000 00:00 0                          [vvar]
7ffff7ffd000-7ffff7fff000 r-xp 00000000 00:00 0                          [vdso]
7ffffffde000-7ffffffff000 rw-p 00000000 00:00 0                          [stack]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

Thread 1 "btrfs-image.sta" received signal SIGABRT, Aborted.
0x00000000004521de in raise ()
(gdb) bt
#0  0x00000000004521de in raise ()
#1  0x00000000004523aa in abort ()
#2  0x000000000045fb9c in __libc_message ()
#3  0x0000000000465442 in malloc_printerr ()
#4  0x0000000000465c1e in _int_free ()
#5  0x0000000000402dea in sanitize_name (slot=<optimized out>, key=<synthetic pointer>, src=<optimized out>, dst=0x76c690 "4\246", <incomplete sequence \367\261>, 
    md=<optimized out>) at image/main.c:574
#6  zero_items (src=0x760450, dst=0x76c690 "4\246", <incomplete sequence \367\261>, md=<optimized out>) at image/main.c:602
#7  copy_buffer (src=0x760450, dst=0x76c690 "4\246", <incomplete sequence \367\261>, md=<optimized out>) at image/main.c:645
#8  flush_pending (md=md@entry=0x7fffffffddc0, done=done@entry=0) at image/main.c:983
#9  0x00000000004031fe in add_extent (start=start@entry=192593920, size=size@entry=4096, md=md@entry=0x7fffffffddc0, data=data@entry=0) at image/main.c:1025
#10 0x00000000004050ff in copy_from_extent_tree (path=0x7fffffffe390, metadump=0x7fffffffddc0) at image/main.c:1280
#11 create_metadump (input=input@entry=0x7fffffffe851 "/dev/sda2", out=out@entry=0x731be0, num_threads=num_threads@entry=4, compress_level=compress_level@entry=3, 
    sanitize=sanitize@entry=1, walk_trees=walk_trees@entry=0) at image/main.c:1370
#12 0x0000000000405783 in main (argc=<optimized out>, argv=0x7fffffffe5d8) at image/main.c:2855

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: gdb log of crashed "btrfs-image -s"
  2017-01-18  7:13             ` gdb log of crashed "btrfs-image -s" Christoph Groth
@ 2017-01-18 11:49               ` Goldwyn Rodrigues
  2017-01-18 20:11                 ` Christoph Groth
  0 siblings, 1 reply; 20+ messages in thread
From: Goldwyn Rodrigues @ 2017-01-18 11:49 UTC (permalink / raw)
  To: Christoph Groth, Chris Murphy; +Cc: Btrfs BTRFS


[-- Attachment #1.1: Type: text/plain, Size: 2335 bytes --]



On 01/18/2017 01:13 AM, Christoph Groth wrote:
> Christoph Groth wrote:
>> Chris Murphy wrote:
>>> On Tue, Jan 17, 2017 at 1:25 PM, Christoph Groth
>>> <christoph@grothesque.org> wrote:
>>>> Any ideas on what could be done?  If you need help to debug the
>>>> problem with
>>>> btrfs-image, please tell me what I should do.  I can keep the broken
>>>> file
>>>> system around until an image can be created at some later time.
>>>
>>> Try 4.9, or even 4.8.5, tons of bugs have been fixed since 4.7.3
>>> although I don't know off hand if this particular bug is fixed. I did
>>> recently do a btrfs-image with btrfs-progs v4.9 with -s and did not
>>> get a segfault.
>>
>> I compiled btrfs-image.static from btrfs-tools 4.9 (from git) and
>> started it from Debian testing's initramfs.  The exact command that I
>> use is:
>>
>> /mnt/btrfs-image.static -c3 -s /dev/sda2 /mnt/mim-s.bim
>>
>> It runs for a couple of seconds (enough to write 20263936 bytes of
>> output) and then quits with
>>
>> *** Error in `/mnt/btrfs-image.static`: double free or corruption 
>> (!prev): 0x00000000009f0940 ***
>> ====== Backtrace: ======
>> [0x45fb97]
>> [0x465442]
>> [0x465c1e]
>> [0x402694]
>> [0x402dcb]
>> [0x4031fe]
>> [0x4050ff]
>> [0x405783]
>> [0x44cb73]
>> [0x44cdfe]
>> [0x400b2a]
>>
>> (I had to type the above off the other screen, but I double checked
>> that there are no errors.)
>>
>> The executable that I used can be downloaded from
>> http://groth.fr/btrfs-image.static
>> Its md5sum is 48abbc82ac6d3c0cb88cba1e5edb85fd.
>>
>> I hope that this can help someone to see what's going on.
> 
> I ran the same executable under gdb from a live system.  The log is
> attached.
> 

Thanks Christoph for the backtrace. I am unable to reproduce it, but
looking at your backtrace, I found a bug. Would you be able to give it a
try and check if it fixes the problem?

diff --git a/image/main.c b/image/main.c
index 58dcecb..0158844 100644
--- a/image/main.c
+++ b/image/main.c
@@ -550,7 +550,7 @@ static void sanitize_name(struct metadump_struct
*md, u8 *dst,
                return;
        }

-       memcpy(eb->data, dst, eb->len);
+       memcpy(eb->data, src->data, src->len);

        switch (key->type) {
        case BTRFS_DIR_ITEM_KEY:
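
(In case it helps, a minimal sketch of applying this on top of the v4.9 
sources and rebuilding; "sanitize-name-fix.diff" is just a hypothetical 
name for the hunk above, and plain "patch -p1" works equally well:)

cd btrfs-progs
git checkout v4.9
git apply sanitize-name-fix.diff    # hypothetical file holding the diff above
./autogen.sh && ./configure && make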



-- 
Goldwyn


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: gdb log of crashed "btrfs-image -s"
  2017-01-18 11:49               ` Goldwyn Rodrigues
@ 2017-01-18 20:11                 ` Christoph Groth
  2017-01-23 12:09                   ` Goldwyn Rodrigues
  0 siblings, 1 reply; 20+ messages in thread
From: Christoph Groth @ 2017-01-18 20:11 UTC (permalink / raw)
  To: Goldwyn Rodrigues; +Cc: Chris Murphy, Btrfs BTRFS


[-- Attachment #1.1: Type: text/plain, Size: 452 bytes --]

Goldwyn Rodrigues wrote:
> Thanks Christoph for the backtrace. I am unable to reproduce it, 
> but looking at your backtrace, I found a bug. Would you be able 
> to give it a try and check if it fixes the problem?

I applied your patch to v4.9, and compiled the static binaries. 
Unfortunately, it still segfaults. (Perhaps your fix is correct, 
and there's a second problem?)  I attach a new backtrace.  Do let 
me know if I can help in another way.


[-- Attachment #1.2: btrfs-image2.log --]
[-- Type: application/octet-stream, Size: 4392 bytes --]

root@xubuntu:~# gdb /mnt/btrfs-image.static 
GNU gdb (Ubuntu 7.11-0ubuntu1) 7.11
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /mnt/btrfs-image.static...done.
(gdb) run -s -c3 /dev/sda2 /mnt/mim.bim
Starting program: /mnt/btrfs-image.static -s -c3 /dev/sda2 /mnt/mim.bim
[New LWP 2334]
[New LWP 2335]
[New LWP 2336]
[New LWP 2337]
*** Error in `/mnt/btrfs-image.static': double free or corruption (out): 0x0000000000772f70 ***
======= Backtrace: =========
[0x45fba7]
[0x465452]
[0x465c2e]
[0x402694]
[0x402dce]
[0x403201]
[0x405102]
[0x405786]
[0x44cb83]
[0x44ce0e]
[0x400b2a]
======= Memory map: ========
00400000-00521000 r-xp 00000000 08:21 689                                /mnt/btrfs-image.static
00721000-00728000 rw-p 00121000 08:21 689                                /mnt/btrfs-image.static
00728000-007e4000 rw-p 00000000 00:00 0                                  [heap]
7fffe0000000-7fffe017e000 rw-p 00000000 00:00 0 
7fffe017e000-7fffe4000000 ---p 00000000 00:00 0 
7fffe4000000-7fffe4025000 rw-p 00000000 00:00 0 
7fffe4025000-7fffe8000000 ---p 00000000 00:00 0 
7fffe8000000-7fffe81a6000 rw-p 00000000 00:00 0 
7fffe81a6000-7fffec000000 ---p 00000000 00:00 0 
7fffec000000-7fffec17c000 rw-p 00000000 00:00 0 
7fffec17c000-7ffff0000000 ---p 00000000 00:00 0 
7ffff0000000-7ffff019a000 rw-p 00000000 00:00 0 
7ffff019a000-7ffff4000000 ---p 00000000 00:00 0 
7ffff5ff6000-7ffff5ff7000 rw-p 00000000 00:00 0 
7ffff5ff7000-7ffff5ff8000 ---p 00000000 00:00 0 
7ffff5ff8000-7ffff67f8000 rw-p 00000000 00:00 0 
7ffff67f8000-7ffff67f9000 ---p 00000000 00:00 0 
7ffff67f9000-7ffff6ff9000 rw-p 00000000 00:00 0 
7ffff6ff9000-7ffff6ffa000 ---p 00000000 00:00 0 
7ffff6ffa000-7ffff77fa000 rw-p 00000000 00:00 0 
7ffff77fa000-7ffff77fb000 ---p 00000000 00:00 0 
7ffff77fb000-7ffff7ffb000 rw-p 00000000 00:00 0 
7ffff7ffb000-7ffff7ffd000 r--p 00000000 00:00 0                          [vvar]
7ffff7ffd000-7ffff7fff000 r-xp 00000000 00:00 0                          [vdso]
7ffffffde000-7ffffffff000 rw-p 00000000 00:00 0                          [stack]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

Thread 1 "btrfs-image.sta" received signal SIGABRT, Aborted.
0x00000000004521ee in raise ()
(gdb) bt
#0  0x00000000004521ee in raise ()
#1  0x00000000004523ba in abort ()
#2  0x000000000045fbac in __libc_message ()
#3  0x0000000000465452 in malloc_printerr ()
#4  0x0000000000465c2e in _int_free ()
#5  0x0000000000402694 in sanitize_inode_ref (md=md@entry=0x7fffffffde00, eb=eb@entry=0x771ee0, slot=slot@entry=16, ext=ext@entry=0) at image/main.c:522
#6  0x0000000000402dce in sanitize_name (slot=16, key=<synthetic pointer>, src=0x764cf0, dst=0x76bed0 "4\246", <incomplete sequence \367\261>, md=0x7fffffffde00)
    at image/main.c:561
#7  zero_items (src=0x764cf0, dst=0x76bed0 "4\246", <incomplete sequence \367\261>, md=<optimized out>) at image/main.c:602
#8  copy_buffer (src=0x764cf0, dst=0x76bed0 "4\246", <incomplete sequence \367\261>, md=<optimized out>) at image/main.c:645
#9  flush_pending (md=md@entry=0x7fffffffde00, done=done@entry=0) at image/main.c:983
#10 0x0000000000403201 in add_extent (start=start@entry=192589824, size=size@entry=4096, md=md@entry=0x7fffffffde00, data=data@entry=0) at image/main.c:1025
#11 0x0000000000405102 in copy_from_extent_tree (path=0x7fffffffe3d0, metadump=0x7fffffffde00) at image/main.c:1280
#12 create_metadump (input=input@entry=0x7fffffffe87f "/dev/sda2", out=out@entry=0x731be0, num_threads=num_threads@entry=4, compress_level=compress_level@entry=3, 
    sanitize=sanitize@entry=1, walk_trees=walk_trees@entry=0) at image/main.c:1370
#13 0x0000000000405786 in main (argc=<optimized out>, argv=0x7fffffffe618) at image/main.c:2855


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: gdb log of crashed "btrfs-image -s"
  2017-01-18 20:11                 ` Christoph Groth
@ 2017-01-23 12:09                   ` Goldwyn Rodrigues
  0 siblings, 0 replies; 20+ messages in thread
From: Goldwyn Rodrigues @ 2017-01-23 12:09 UTC (permalink / raw)
  To: Christoph Groth; +Cc: Chris Murphy, Btrfs BTRFS


[-- Attachment #1.1: Type: text/plain, Size: 881 bytes --]



On 01/18/2017 02:11 PM, Christoph Groth wrote:
> Goldwyn Rodrigues wrote:
>> Thanks Christoph for the backtrace. I am unable to reproduce it, but
>> looking at your backtrace, I found a bug. Would you be able to give it
>> a try and check if it fixes the problem?
> 
> I applied your patch to v4.9, and compiled the static binaries.
> Unfortunately, it still segfaults. (Perhaps your fix is correct, and
> there's a second problem?)  I attach a new backtrace.  Do let me know if
> I can help in another way.

I looked hard, and could not find the reason for the failure here. The
backtrace of the new one is a little different from the previous one, but
I am not sure why it crashes. Until I have a reproduction scenario, I may
not be able to fix this. How about a core dump? However, a core will
contain the values which you are trying to mask with sanitize.
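
If you want to try that, here is a rough sketch of capturing one; since 
you already run the binary under gdb, gdb can also write the core for 
you at the point of the abort:

# Either let the aborting process dump core on its own ...
ulimit -c unlimited
/mnt/btrfs-image.static -s -c3 /dev/sda2 /mnt/mim.bim
# ... or, from the existing gdb session, after the SIGABRT is reported:
(gdb) generate-core-file /mnt/btrfs-image.core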


-- 
Goldwyn


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2017-01-23 12:11 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-16 11:10 Unocorrectable errors with RAID1 Christoph Groth
2017-01-16 13:24 ` Austin S. Hemmelgarn
2017-01-16 15:42   ` Christoph Groth
2017-01-16 16:29     ` Austin S. Hemmelgarn
2017-01-17  4:50       ` Janos Toth F.
2017-01-17 12:25         ` Austin S. Hemmelgarn
2017-01-17  9:18       ` Christoph Groth
2017-01-17 12:32         ` Austin S. Hemmelgarn
2017-01-16 22:45 ` Goldwyn Rodrigues
2017-01-17  8:44   ` Christoph Groth
2017-01-17 11:32     ` Goldwyn Rodrigues
2017-01-17 20:25       ` Christoph Groth
2017-01-17 21:52         ` Chris Murphy
2017-01-17 23:10           ` Christoph Groth
2017-01-18  7:13             ` gdb log of crashed "btrfs-image -s" Christoph Groth
2017-01-18 11:49               ` Goldwyn Rodrigues
2017-01-18 20:11                 ` Christoph Groth
2017-01-23 12:09                   ` Goldwyn Rodrigues
2017-01-17 22:57         ` Unocorrectable errors with RAID1 Goldwyn Rodrigues
2017-01-17 23:22           ` Christoph Groth
