* Uncorrectable errors with RAID1
@ 2017-01-16 11:10 Christoph Groth
2017-01-16 13:24 ` Austin S. Hemmelgarn
2017-01-16 22:45 ` Goldwyn Rodrigues
0 siblings, 2 replies; 20+ messages in thread
From: Christoph Groth @ 2017-01-16 11:10 UTC (permalink / raw)
To: linux-btrfs
Hi,
I’ve been using a btrfs RAID1 of two hard disks since early 2012
on my home server. The machine has been working well overall, but
recently some problems with the file system surfaced. Since I do
have backups, I do not worry about the data, but I post here to
better understand what happened. Also I cannot exclude that my
case is useful in some way to btrfs development.
First some information about the system:
root@mim:~# uname -a
Linux mim 4.6.0-1-amd64 #1 SMP Debian 4.6.3-1 (2016-07-04) x86_64
GNU/Linux
root@mim:~# btrfs --version
btrfs-progs v4.7.3
root@mim:~# btrfs fi show
Label: none uuid: 2da00153-f9ea-4d6c-a6cc-10c913d22686
Total devices 2 FS bytes used 345.97GiB
devid 1 size 465.29GiB used 420.06GiB path /dev/sda2
devid 2 size 465.29GiB used 420.04GiB path /dev/sdb2
root@mim:~# btrfs fi df /
Data, RAID1: total=417.00GiB, used=344.62GiB
Data, single: total=8.00MiB, used=0.00B
System, RAID1: total=40.00MiB, used=68.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, RAID1: total=3.00GiB, used=1.35GiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=464.00MiB, used=0.00B
root@mim:~# dmesg | grep -i btrfs
[ 4.165859] Btrfs loaded
[ 4.481712] BTRFS: device fsid 2da00153-f9ea-4d6c-a6cc-10c913d22686 devid 1 transid 2075354 /dev/sda2
[ 4.482025] BTRFS: device fsid 2da00153-f9ea-4d6c-a6cc-10c913d22686 devid 2 transid 2075354 /dev/sdb2
[ 4.521090] BTRFS info (device sdb2): disk space caching is enabled
[ 4.628506] BTRFS info (device sdb2): bdev /dev/sdb2 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0
[ 4.628521] BTRFS info (device sdb2): bdev /dev/sda2 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0
[ 18.315694] BTRFS info (device sdb2): disk space caching is enabled
The disks themselves have been spinning for almost 5 years now,
but their SMART health is still fully satisfactory.
I noticed that something was wrong because printing stopped
working. So I did a scrub, which detected 0 "correctable errors"
and 6 "uncorrectable errors". The relevant bits from kern.log are:
Jan 11 11:05:56 mim kernel: [159873.938579] BTRFS warning (device sdb2): checksum error at logical 180829634560 on dev /dev/sdb2, sector 353143968, root 5, inode 10014144, offset 221184, length 4096, links 1 (path: usr/lib/x86_64-linux-gnu/libcups.so.2)
Jan 11 11:05:57 mim kernel: [159874.857132] BTRFS warning (device sdb2): checksum error at logical 180829634560 on dev /dev/sda2, sector 353182880, root 5, inode 10014144, offset 221184, length 4096, links 1 (path: usr/lib/x86_64-linux-gnu/libcups.so.2)
Jan 11 11:28:42 mim kernel: [161240.083721] BTRFS warning (device sdb2): checksum error at logical 260254629888 on dev /dev/sda2, sector 508309824, root 5, inode 9990924, offset 6676480, length 4096, links 1 (path: var/lib/apt/lists/ftp.fr.debian.org_debian_dists_unstable_main_binary-amd64_Packages)
Jan 11 11:28:42 mim kernel: [161240.235837] BTRFS warning (device sdb2): checksum error at logical 260254638080 on dev /dev/sda2, sector 508309840, root 5, inode 9990924, offset 6684672, length 4096, links 1 (path: var/lib/apt/lists/ftp.fr.debian.org_debian_dists_unstable_main_binary-amd64_Packages)
Jan 11 11:37:21 mim kernel: [161759.725120] BTRFS warning (device sdb2): checksum error at logical 260254629888 on dev /dev/sdb2, sector 508270912, root 5, inode 9990924, offset 6676480, length 4096, links 1 (path: var/lib/apt/lists/ftp.fr.debian.org_debian_dists_unstable_main_binary-amd64_Packages)
Jan 11 11:37:21 mim kernel: [161759.750251] BTRFS warning (device sdb2): checksum error at logical 260254638080 on dev /dev/sdb2, sector 508270928, root 5, inode 9990924, offset 6684672, length 4096, links 1 (path: var/lib/apt/lists/ftp.fr.debian.org_debian_dists_unstable_main_binary-amd64_Packages)
As you can see, each disk has the same three errors, and there are
no other errors. Random bad blocks cannot explain this situation.
I asked on #btrfs and someone suggested that these errors are
likely due to RAM problems. This may indeed be the case, since
the machine has no ECC. I managed to fix these errors by
replacing the broken files with good copies. Scrubbing shows no
errors now:
root@mim:~# btrfs scrub status /
scrub status for 2da00153-f9ea-4d6c-a6cc-10c913d22686
scrub started at Sat Jan 14 12:52:03 2017 and finished
after 01:49:10
total bytes scrubbed: 699.17GiB with 0 errors
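Incidentally, on Debian the "good copies" for packaged files like libcups can be cross-checked against the md5sums lists that dpkg keeps under /var/lib/dpkg/info/ (the debsums tool automates this). A minimal sketch, assuming the standard dpkg list format ("<md5>  <relative path>" per line); the package file name in the usage comment is illustrative:

```shell
# Verify files under ROOT against a dpkg-style md5sums list.
# Prints "<path>: FAILED" and exits non-zero on any mismatch.
check_md5s() {    # check_md5s LIST ROOT
    list=$(readlink -f "$1")
    (cd "$2" && md5sum --quiet -c "$list")
}
# Example (run as root; exact file name depends on package/arch):
#   check_md5s /var/lib/dpkg/info/libcups2:amd64.md5sums /
```

Reinstalling the affected package (apt-get install --reinstall) is then the easy way to restore good copies.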
However, there are further problems. When trying to archive the
full filesystem I noticed that some files/directories cannot be
read. (The problem is localized to a ".git" directory that I
don’t need.) Every attempt to read the broken files (or to delete
them) fails:
$ du -sh .git
du: cannot access '.git/objects/28/ea2aae3fe57ab4328adaa8b79f3c1cf005dd8d': No such file or directory
du: cannot access '.git/objects/28/fd95a5e9d08b6684819ce6e3d39d99e2ecccd5': Stale file handle
du: cannot access '.git/objects/28/52e887ed436ed2c549b20d4f389589b7b58e09': Stale file handle
du: cannot access '.git/objects/info': Stale file handle
du: cannot access '.git/objects/pack': Stale file handle
During the above command the following lines were added to
kern.log:
Jan 16 09:41:34 mim kernel: [132206.957566] BTRFS critical (device sda2): corrupt leaf, slot offset bad: block=192561152,root=1, slot=15
Jan 16 09:41:34 mim kernel: [132206.957924] BTRFS critical (device sda2): corrupt leaf, slot offset bad: block=192561152,root=1, slot=15
Jan 16 09:41:34 mim kernel: [132206.958505] BTRFS critical (device sda2): corrupt leaf, slot offset bad: block=192561152,root=1, slot=15
Jan 16 09:41:34 mim kernel: [132206.958971] BTRFS critical (device sda2): corrupt leaf, slot offset bad: block=192561152,root=1, slot=15
Jan 16 09:41:34 mim kernel: [132206.959534] BTRFS critical (device sda2): corrupt leaf, slot offset bad: block=192561152,root=1, slot=15
Jan 16 09:41:34 mim kernel: [132206.959874] BTRFS critical (device sda2): corrupt leaf, slot offset bad: block=192561152,root=1, slot=15
Jan 16 09:41:34 mim kernel: [132206.960523] BTRFS critical (device sda2): corrupt leaf, slot offset bad: block=192561152,root=1, slot=15
Jan 16 09:41:34 mim kernel: [132206.960943] BTRFS critical (device sda2): corrupt leaf, slot offset bad: block=192561152,root=1, slot=15
So I tried to repair the file system by running "btrfs check
--repair", but this doesn’t work:
(initramfs) btrfs --version
btrfs-progs v4.7.3
(initramfs) btrfs check --repair /dev/sda2
UUID: ...
checking extents
incorrect offsets 2527 2543
items overlap, can't fix
cmds-check.c:4297: fix_item_offset: Assertion `ret` failed.
btrfs[0x41a8b4]
btrfs[0x41a8db]
btrfs[0x42428b]
btrfs[0x424f83]
btrfs[0x4259cd]
btrfs(cmd_check+0x1111)[0x427d6d]
btrfs(main+0x12f)[0x40a341]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7fd98859d2b1]
btrfs(_start+0x2a)[0x40a37a]
I now have the following questions:
* So scrubbing is not enough to check the health of a btrfs file
system? It’s also necessary to read all the files?
* Any ideas what could have caused the "stale file handle" errors?
Is there any way to fix them? Of course RAM errors can in
principle have _any_ consequences, but I would have hoped that
even without ECC RAM it’s practically impossible to end up with
an unrepairable file system. Perhaps I simply had very bad
luck.
* I believe that btrfs RAID1 is considered reasonably safe for
production use by now. I want to replace that home server with
a new machine (still without ECC). Is it a good idea to use
btrfs for the main file system? I would certainly hope so! :-)
Thanks for your time,
Christoph
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Uncorrectable errors with RAID1
2017-01-16 11:10 Uncorrectable errors with RAID1 Christoph Groth
@ 2017-01-16 13:24 ` Austin S. Hemmelgarn
2017-01-16 15:42 ` Christoph Groth
2017-01-16 22:45 ` Goldwyn Rodrigues
1 sibling, 1 reply; 20+ messages in thread
From: Austin S. Hemmelgarn @ 2017-01-16 13:24 UTC (permalink / raw)
To: Christoph Groth, linux-btrfs
On 2017-01-16 06:10, Christoph Groth wrote:
> Hi,
>
> I’ve been using a btrfs RAID1 of two hard disks since early 2012 on my
> home server. The machine has been working well overall, but recently
> some problems with the file system surfaced. Since I do have backups, I
> do not worry about the data, but I post here to better understand what
> happened. Also I cannot exclude that my case is useful in some way to
> btrfs development.
>
> First some information about the system:
>
> root@mim:~# uname -a
> Linux mim 4.6.0-1-amd64 #1 SMP Debian 4.6.3-1 (2016-07-04) x86_64 GNU/Linux
> root@mim:~# btrfs --version
> btrfs-progs v4.7.3
You get bonus points for being up-to-date both with the kernel and the
userspace tools.
> root@mim:~# btrfs fi show
> Label: none uuid: 2da00153-f9ea-4d6c-a6cc-10c913d22686
> Total devices 2 FS bytes used 345.97GiB
> devid 1 size 465.29GiB used 420.06GiB path /dev/sda2
> devid 2 size 465.29GiB used 420.04GiB path /dev/sdb2
>
> root@mim:~# btrfs fi df /
> Data, RAID1: total=417.00GiB, used=344.62GiB
> Data, single: total=8.00MiB, used=0.00B
> System, RAID1: total=40.00MiB, used=68.00KiB
> System, single: total=4.00MiB, used=0.00B
> Metadata, RAID1: total=3.00GiB, used=1.35GiB
> Metadata, single: total=8.00MiB, used=0.00B
> GlobalReserve, single: total=464.00MiB, used=0.00B
Just a general comment on this: you might want to consider running a
full balance on this filesystem. You've got a huge amount of slack space
in the data chunks (over 70GiB), significant space in the Metadata
chunks that isn't accounted for by the GlobalReserve, and a handful of
empty single-profile chunks which are artifacts from some old versions
of mkfs. This isn't essential, of course, but keeping ahead of such
things sometimes helps when you have issues.
> root@mim:~# dmesg | grep -i btrfs
> [ 4.165859] Btrfs loaded
> [ 4.481712] BTRFS: device fsid 2da00153-f9ea-4d6c-a6cc-10c913d22686
> devid 1 transid 2075354 /dev/sda2
> [ 4.482025] BTRFS: device fsid 2da00153-f9ea-4d6c-a6cc-10c913d22686
> devid 2 transid 2075354 /dev/sdb2
> [ 4.521090] BTRFS info (device sdb2): disk space caching is enabled
> [ 4.628506] BTRFS info (device sdb2): bdev /dev/sdb2 errs: wr 0, rd
> 0, flush 0, corrupt 3, gen 0
> [ 4.628521] BTRFS info (device sdb2): bdev /dev/sda2 errs: wr 0, rd
> 0, flush 0, corrupt 3, gen 0
> [ 18.315694] BTRFS info (device sdb2): disk space caching is enabled
>
> The disks themselves have been turning for almost 5 years by now, but
> their SMART health is still fully satisfactory.
>
> I noticed that something was wrong because printing stopped working. So
> I did a scrub, which detected 0 "correctable errors" and 6
> "uncorrectable errors". The relevant bits from kern.log are:
>
> Jan 11 11:05:56 mim kernel: [159873.938579] BTRFS warning (device sdb2):
> checksum error at logical 180829634560 on dev /dev/sdb2, sector
> 353143968, root 5, inode 10014144, offset 221184, length 4096, links 1
> (path: usr/lib/x86_64-linux-gnu/libcups.so.2)
> Jan 11 11:05:57 mim kernel: [159874.857132] BTRFS warning (device sdb2):
> checksum error at logical 180829634560 on dev /dev/sda2, sector
> 353182880, root 5, inode 10014144, offset 221184, length 4096, links 1
> (path: usr/lib/x86_64-linux-gnu/libcups.so.2)
> Jan 11 11:28:42 mim kernel: [161240.083721] BTRFS warning (device sdb2):
> checksum error at logical 260254629888 on dev /dev/sda2, sector
> 508309824, root 5, inode 9990924, offset 6676480, length 4096, links 1
> (path:
> var/lib/apt/lists/ftp.fr.debian.org_debian_dists_unstable_main_binary-amd64_Packages)
>
> Jan 11 11:28:42 mim kernel: [161240.235837] BTRFS warning (device sdb2):
> checksum error at logical 260254638080 on dev /dev/sda2, sector
> 508309840, root 5, inode 9990924, offset 6684672, length 4096, links 1
> (path:
> var/lib/apt/lists/ftp.fr.debian.org_debian_dists_unstable_main_binary-amd64_Packages)
>
> Jan 11 11:37:21 mim kernel: [161759.725120] BTRFS warning (device sdb2):
> checksum error at logical 260254629888 on dev /dev/sdb2, sector
> 508270912, root 5, inode 9990924, offset 6676480, length 4096, links 1
> (path:
> var/lib/apt/lists/ftp.fr.debian.org_debian_dists_unstable_main_binary-amd64_Packages)
>
> Jan 11 11:37:21 mim kernel: [161759.750251] BTRFS warning (device sdb2):
> checksum error at logical 260254638080 on dev /dev/sdb2, sector
> 508270928, root 5, inode 9990924, offset 6684672, length 4096, links 1
> (path:
> var/lib/apt/lists/ftp.fr.debian.org_debian_dists_unstable_main_binary-amd64_Packages)
>
>
> As you can see each disk has the same three errors, and there are no
> other errors. Random bad blocks cannot explain this situation. I asked
> on #btrfs and someone suggested that these errors are likely due to RAM
> problems. This may indeed be the case, since the machine has no ECC. I
> managed to fix these errors by replacing the broken files with good
> copies. Scrubbing shows no errors now:
>
> root@mim:~# btrfs scrub status /
> scrub status for 2da00153-f9ea-4d6c-a6cc-10c913d22686
> scrub started at Sat Jan 14 12:52:03 2017 and finished after
> 01:49:10
> total bytes scrubbed: 699.17GiB with 0 errors
>
> However, there are further problems. When trying to archive the full
> filesystem I noticed that some files/directories cannot be read. (The
> problem is localized to some ".git" directory that I don’t need.) Any
> attempt to read the broken files (or to delete them) does not work:
>
> $ du -sh .git
> du: cannot access
> '.git/objects/28/ea2aae3fe57ab4328adaa8b79f3c1cf005dd8d': No such file
> or directory
> du: cannot access
> '.git/objects/28/fd95a5e9d08b6684819ce6e3d39d99e2ecccd5': Stale file handle
> du: cannot access
> '.git/objects/28/52e887ed436ed2c549b20d4f389589b7b58e09': Stale file handle
> du: cannot access '.git/objects/info': Stale file handle
> du: cannot access '.git/objects/pack': Stale file handle
>
> During the above command the following lines were added to kern.log:
>
> Jan 16 09:41:34 mim kernel: [132206.957566] BTRFS critical (device
> sda2): corrupt leaf, slot offset bad: block=192561152,root=1, slot=15
> Jan 16 09:41:34 mim kernel: [132206.957924] BTRFS critical (device
> sda2): corrupt leaf, slot offset bad: block=192561152,root=1, slot=15
> Jan 16 09:41:34 mim kernel: [132206.958505] BTRFS critical (device
> sda2): corrupt leaf, slot offset bad: block=192561152,root=1, slot=15
> Jan 16 09:41:34 mim kernel: [132206.958971] BTRFS critical (device
> sda2): corrupt leaf, slot offset bad: block=192561152,root=1, slot=15
> Jan 16 09:41:34 mim kernel: [132206.959534] BTRFS critical (device
> sda2): corrupt leaf, slot offset bad: block=192561152,root=1, slot=15
> Jan 16 09:41:34 mim kernel: [132206.959874] BTRFS critical (device
> sda2): corrupt leaf, slot offset bad: block=192561152,root=1, slot=15
> Jan 16 09:41:34 mim kernel: [132206.960523] BTRFS critical (device
> sda2): corrupt leaf, slot offset bad: block=192561152,root=1, slot=15
> Jan 16 09:41:34 mim kernel: [132206.960943] BTRFS critical (device
> sda2): corrupt leaf, slot offset bad: block=192561152,root=1, slot=15
>
> So I tried to repair the file system by running "btrfs check --repair",
> but this doesn’t work:
>
> (initramfs) btrfs --version
> btrfs-progs v4.7.3
> (initramfs) btrfs check --repair /dev/sda2
> UUID: ...
> checking extents
> incorrect offsets 2527 2543
> items overlap, can't fix
> cmds-check.c:4297: fix_item_offset: Assertion `ret` failed.
> btrfs[0x41a8b4]
> btrfs[0x41a8db]
> btrfs[0x42428b]
> btrfs[0x424f83]
> btrfs[0x4259cd]
> btrfs(cmd_check+0x1111)[0x427d6d]
> btrfs(main+0x12f)[0x40a341]
> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7fd98859d2b1]
> btrfs(_start+0x2a)[0x40a37a]
>
> I now have the following questions:
>
> * So scrubbing is not enough to check the health of a btrfs file
> system? It’s also necessary to read all the files?
Scrubbing checks data integrity, but not filesystem consistency. IOW,
you're checking that the data and metadata match their checksums, but
not necessarily that the filesystem structure itself is valid.
>
> * Any ideas what could have caused the "stale file handle" errors? Is
> there any way to fix them? Of course RAM errors can in principle have
> _any_ consequences, but I would have hoped that even without ECC RAM
> it’s practically impossible to end up with an unrepairable file
> system. Perhaps I simply had very bad luck.
-ESTALE is _supposed_ to be a networked-filesystem-only thing. BTRFS
returns it somewhere, and I've been meaning to track down where (because
there is almost certainly a more correct error code to return there);
I just haven't had time to do so.
As far as RAM, it absolutely is possible for bad RAM or even just
transient memory errors to cause filesystem corruption. The disk itself
stores exactly what it was told to (in theory), so if it was told to
store bad data, it stores bad data. I've lost at least 3 filesystems
over the past 5 years just due to bad memory, although I've been
particularly unlucky in that respect. There are a few things you can do
to mitigate the risk of not using ECC RAM though:
* Reboot regularly, at least weekly, and possibly more frequently.
* Keep the system cool, warmer components are more likely to have
transient errors.
* Prefer fewer memory modules when possible. Fewer modules
means less total area that could be hit by cosmic rays or other
high-energy radiation (the main cause of most transient errors).
>
> * I believe that btrfs RAID1 is considered reasonably safe for
> production use by now. I want to replace that home server with a new
> machine (still without ECC). Is it a good idea to use btrfs for the
> main file system? I would certainly hope so! :-)
FWIW, this wasn't exactly an issue with BTRFS, any other filesystem
would have failed similarly, although others likely would have done more
damage (instead of failing to load libcups due to -EIO, you would have
seen seemingly random segfaults from apps using it when they tried to
use the corrupted data). In fact, if it weren't for the fact that
you're using BTRFS, it likely would have taken longer for you to figure
out what had happened. If you were using ext4 (or XFS, or almost any
other filesystem except for ZFS), you likely would have had no
indication that anything was wrong other than printing not working until
you re-installed whatever package included libcups.
As far as raid1 mode in particular, I consider it stable, and quite a
few other people do too. Even the most stable software has issues from
time to time, but I have not lost a single filesystem using raid1 mode
to a filesystem bug since at least kernel 3.16. I have lost a few to
hardware issues, but if I hadn't been using BTRFS I wouldn't have
figured out nearly as quickly that I had said hardware issues.
* Re: Uncorrectable errors with RAID1
2017-01-16 13:24 ` Austin S. Hemmelgarn
@ 2017-01-16 15:42 ` Christoph Groth
2017-01-16 16:29 ` Austin S. Hemmelgarn
0 siblings, 1 reply; 20+ messages in thread
From: Christoph Groth @ 2017-01-16 15:42 UTC (permalink / raw)
To: Austin S. Hemmelgarn; +Cc: linux-btrfs
Austin S. Hemmelgarn wrote:
> On 2017-01-16 06:10, Christoph Groth wrote:
>> root@mim:~# btrfs fi df /
>> Data, RAID1: total=417.00GiB, used=344.62GiB
>> Data, single: total=8.00MiB, used=0.00B
>> System, RAID1: total=40.00MiB, used=68.00KiB
>> System, single: total=4.00MiB, used=0.00B
>> Metadata, RAID1: total=3.00GiB, used=1.35GiB
>> Metadata, single: total=8.00MiB, used=0.00B
>> GlobalReserve, single: total=464.00MiB, used=0.00B
> Just a general comment on this, you might want to consider
> running a full balance on this filesystem, you've got a huge
> amount of slack space in the data chunks (over 70GiB), and
> significant space in the Metadata chunks that isn't accounted
> for by the GlobalReserve, as well as a handful of empty single
> profile chunks which are artifacts from some old versions of
> mkfs. This isn't of course essential, but keeping ahead of such
> things does help sometimes when you have issues.
Thanks! So slack is the difference between "total" and "used"? I
saw that the manpage of "btrfs balance" explains this a bit in its
"examples" section. Are you aware of any more in-depth
documentation? Or does one have to look at the source at this level?
I ran
btrfs balance start -dconvert=raid1,soft -mconvert=raid1,soft /
btrfs balance start -dusage=25 -musage=25 /
This resulted in
root@mim:~# btrfs fi df /
Data, RAID1: total=365.00GiB, used=344.61GiB
System, RAID1: total=32.00MiB, used=64.00KiB
Metadata, RAID1: total=2.00GiB, used=1.35GiB
GlobalReserve, single: total=460.00MiB, used=0.00B
I hope that one day there will be a daemon that silently performs
all the necessary btrfs maintenance in the background when system
load is low!
>> * So scrubbing is not enough to check the health of a btrfs
>> file system? It’s also necessary to read all the files?
> Scrubbing checks data integrity, but not the state of the data.
> IOW, you're checking that the data and metadata match with the
> checksums, but not necessarily that the filesystem itself is
> valid.
I see, but what should one then do to detect problems such as mine
as soon as possible? Periodically calculate hashes for all files?
I’ve never seen a recommendation to do that for btrfs.
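For what it's worth, one low-tech way to catch this kind of damage between scrubs is exactly that: keep a checksum manifest and re-verify it periodically. A sketch (the function names and layout are mine, not an established btrfs practice; keep the manifest outside the tree being checked):

```shell
# Build a sha256 manifest of every file under DIR, then re-verify it
# later; any silently changed or unreadable file shows up as FAILED.
make_manifest() {    # make_manifest DIR MANIFEST
    out=$(readlink -f "$2")
    (cd "$1" && find . -type f -print0 | sort -z | xargs -0 sha256sum) > "$out"
}
verify_manifest() {  # verify_manifest DIR MANIFEST
    list=$(readlink -f "$2")
    (cd "$1" && sha256sum --quiet -c "$list")
}
```

Note that this only detects changes since the manifest was built; it cannot tell legitimate edits apart from corruption, so it is most useful for archival trees that should not change.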
> There are a few things you can do to mitigate the risk of not
> using ECC RAM though:
> * Reboot regularly, at least weekly, and possibly more
> frequently.
> * Keep the system cool, warmer components are more likely to
> have transient errors.
> * Prefer fewer memory modules when possible. Fewer
> modules means less total area that could be hit by cosmic rays
> or other high-energy radiation (the main cause of most transient
> errors).
Thanks for the advice, I think I buy the regular reboots.
As a consequence of my problem I think I’ll stop using RAID1 on
the file server, since this only protects against dead disks,
which evidently is only part of the problem. Instead, I’ll make
sure that the laptop that syncs with the server has a SSD that is
big enough to hold all the data that is on the server as well (1
TB SSDs are affordable now). This way, instead of disk-level
redundancy, I’ll have machine-level redundancy. When something
like the current problem hits one of the two machines, I should
still have a usable second machine with all the data on it.
* Re: Uncorrectable errors with RAID1
2017-01-16 15:42 ` Christoph Groth
@ 2017-01-16 16:29 ` Austin S. Hemmelgarn
2017-01-17 4:50 ` Janos Toth F.
2017-01-17 9:18 ` Christoph Groth
0 siblings, 2 replies; 20+ messages in thread
From: Austin S. Hemmelgarn @ 2017-01-16 16:29 UTC (permalink / raw)
To: Christoph Groth; +Cc: linux-btrfs
On 2017-01-16 10:42, Christoph Groth wrote:
> Austin S. Hemmelgarn wrote:
>> On 2017-01-16 06:10, Christoph Groth wrote:
>
>>> root@mim:~# btrfs fi df /
>>> Data, RAID1: total=417.00GiB, used=344.62GiB
>>> Data, single: total=8.00MiB, used=0.00B
>>> System, RAID1: total=40.00MiB, used=68.00KiB
>>> System, single: total=4.00MiB, used=0.00B
>>> Metadata, RAID1: total=3.00GiB, used=1.35GiB
>>> Metadata, single: total=8.00MiB, used=0.00B
>>> GlobalReserve, single: total=464.00MiB, used=0.00B
>
>> Just a general comment on this, you might want to consider running a
>> full balance on this filesystem, you've got a huge amount of slack
>> space in the data chunks (over 70GiB), and significant space in the
>> Metadata chunks that isn't accounted for by the GlobalReserve, as well
>> as a handful of empty single profile chunks which are artifacts from
>> some old versions of mkfs. This isn't of course essential, but
>> keeping ahead of such things does help sometimes when you have issues.
>
> Thanks! So slack is the difference between "total" and "used"? I saw
> that the manpage of "btrfs balance" explains this a bit in its
> "examples" section. Are you aware of any more in-depth documentation?
> Or one has to look at the source at this level?
There's not really much in the way of great documentation that I know
of. I can however cover the basics here:
BTRFS uses a 2 level allocation system. At the higher level, you have
chunks. These are just big blocks of space on the disk that get used
for only one type of lower level allocation (Data, Metadata, or System).
Data chunks are normally 1GB, Metadata 256MB, and System depends on
the size of the FS when it was created. Within these chunks, BTRFS then
allocates individual blocks just like any other filesystem. When there
is no free space in any existing chunk for a new block that needs to
be allocated, a new chunk is allocated. Newly allocated chunks may be
larger (if the filesystem is really big) or smaller (if the FS doesn't
have much free space left at the chunk level) than the default. In the
event that BTRFS can't allocate a new chunk because there's no room, a
couple of different things could happen. If the chunk to be allocated
was a data chunk, you get -ENOSPC (usually; sometimes you might get
other odd results) in the userspace application that triggered the
allocation. However, if BTRFS needs room for metadata, then it will try
to use the GlobalReserve instead. This is a special area within the
metadata chunks that's reserved for internal operations and for getting
out of free-space-exhaustion situations. If that fails, then the
filesystem is functionally dead: reads will still work, and you might be
able to write very small amounts of data at a time, but from a
practical perspective it's not possible to recover a filesystem in such
a situation.
The 'total' value in fi df output is the total space allocated to chunks
of that type, while the 'used' value is how much is actually being used.
It's worth noting that since GlobalReserve is a part of the Metadata
chunks, the total there is part of the total for Metadata, but not the
used value (so in an ideal situation with no slack space at the block
level, you would still see a difference between metadata total and used
equal to the global reserve total).
What balancing does is send everything back through the allocator, which
in turn back-fills chunks that are only partially full, and removes ones
that are now empty. In normal usage, it's not absolutely needed. From
a practical perspective though, it's generally a good idea to keep the
slack space (the difference between total and used) within chunks to a
minimum to try and avoid getting the filesystem stuck with no free space
at the chunk level.
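The slack described above (total minus used per chunk type) can be read straight off 'btrfs fi df' output. A small sketch that sums it over the GiB-sized lines only, with the parsing assumptions taken from the output quoted earlier in this thread:

```shell
# Sum slack (total - used) across chunk types whose sizes are reported
# in GiB; smaller (MiB/KiB/B) lines are ignored for brevity.
# Expects lines like: "Data, RAID1: total=417.00GiB, used=344.62GiB"
slack_gib() {
    awk '/total=.*GiB.*used=.*GiB/ {
        t = $0; sub(/.*total=/, "", t); sub(/GiB.*/, "", t)
        u = $0; sub(/.*used=/, "", u);  sub(/GiB.*/, "", u)
        slack += t - u
    } END { printf "%.2f GiB\n", slack }'
}
# Usage: btrfs fi df / | slack_gib
```

Run against the original fi df output from this thread, it reports roughly the 70+ GiB of data slack mentioned above.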
>
> I ran
>
> btrfs balance start -dconvert=raid1,soft -mconvert=raid1,soft /
> btrfs balance start -dusage=25 -musage=25 /
>
> This resulted in
>
> root@mim:~# btrfs fi df /
> Data, RAID1: total=365.00GiB, used=344.61GiB
> System, RAID1: total=32.00MiB, used=64.00KiB
> Metadata, RAID1: total=2.00GiB, used=1.35GiB
> GlobalReserve, single: total=460.00MiB, used=0.00B
This is a much saner looking FS, you've only got about 20GB of slack in
Data chunks, and less than 1GB in metadata, which is reasonable given
the size of the FS and how much data you have on it. Ideal values for
both are actually hard to determine, as having no slack in the chunks
actually hurts performance a bit, and the ideal values depend on how
much your workloads hit each type of chunk.
>
> I hope that one day there will be a daemon that silently performs all
> the necessary btrfs maintenance in the background when system load is low!
FWIW, while there isn't a daemon yet that does this, it's a perfect
thing for a cronjob. The general maintenance regimen that I use for
most of my filesystems is:
* Run 'btrfs balance start -dusage=20 -musage=20' daily. This will
complete really fast on most filesystems, and keeps the slack space
relatively under control (and has the nice bonus that it helps
defragment free space).
* Run a full scrub on all filesystems weekly. This catches silent
corruption of the data, and will fix it if possible.
* Run a full defrag on all filesystems monthly. This should be run
before the balance (reasons are complicated and require more explanation
than you probably care for). I would run this at least weekly though on
HDD's, as they tend to be more negatively impacted by fragmentation.
There are a couple of other things I also do (fstrim and punching holes
in large files to make them sparse), but they're not really BTRFS
specific. Overall, with a decent SSD (I usually use Crucial MX series
SSD's in my personal systems), these have near zero impact most of the
time, and with decent HDD's, you should have limited issues as long as
you run them on only one FS at a time.
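Until a maintenance daemon exists, the regimen above maps directly onto cron. A sketch of a root cron fragment (times, thresholds, and the mount point are illustrative; the defrag line deliberately runs before that day's balance):

```
# /etc/cron.d/btrfs-maintenance (sketch; adjust mount points to taste)
# m  h  dom mon dow  user  command
30   3  *   *   *    root  btrfs balance start -dusage=20 -musage=20 /
45   4  *   *   0    root  btrfs scrub start -Bq /
15   2  1   *   *    root  btrfs filesystem defragment -r /
```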
>
>>> * So scrubbing is not enough to check the health of a btrfs file
>>> system? It’s also necessary to read all the files?
>
>> Scrubbing checks data integrity, but not the state of the data. IOW,
>> you're checking that the data and metadata match with the checksums,
>> but not necessarily that the filesystem itself is valid.
>
> I see, but what should one then do to detect problems such as mine as
> soon as possible? Periodically calculate hashes for all files? I’ve
> never seen a recommendation to do that for btrfs.
Scrub will verify that the data is the same as when the kernel
calculated the block checksum. That's really the best that can be done.
In your case, it couldn't correct the errors because both copies of
the corrupted blocks were bad (this points at an issue with either RAM
or the storage controller BTW, not the disks themselves). Had one of
the copies been valid, it would have intelligently detected which one
was bad and fixed things. It's worth noting that the combination of
checksumming and scrub actually provides more stringent data integrity
guarantees than any other widely used filesystem except ZFS.
As far as general monitoring, in addition to scrubbing (and obviously
watching SMART status) you want to check the output of 'btrfs device
stats' for non-zero error counters (these are cumulative counters that
are only reset when the user says to do so, so right now they'll show
aggregate data for the life of the FS), and if you're paranoid, watch
that the mount options on the FS don't change (some monitoring software
such as Monit makes this insanely easy to do), as the FS will go
read-only if a severe error is detected (stuff like a failed read at the
device level, not just checksum errors).
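Checking 'btrfs device stats' for non-zero counters is also easy to automate. A sketch, assuming the btrfs-progs output format of one "<counter> <value>" pair per line:

```shell
# Print any non-zero btrfs error counters and fail if one is found.
# Expects `btrfs device stats` lines like: [/dev/sdb2].corruption_errs 3
stats_ok() {
    awk '$2 != 0 { print; bad = 1 } END { exit bad }'
}
# Usage (e.g. from cron, requires root):
#   btrfs device stats / | stats_ok || echo "WARNING: btrfs errors" >&2
```

Since the counters are cumulative, a monitoring job should compare against the last-seen values (or reset them with 'btrfs device stats -z' after investigating) rather than alerting forever on an old incident.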
>
>> There are a few things you can do to mitigate the risk of not using
>> ECC RAM though:
>> * Reboot regularly, at least weekly, and possibly more frequently.
>> * Keep the system cool, warmer components are more likely to have
>> transient errors.
>> * Prefer fewer memory modules when possible. Fewer modules
>> means less total area that could be hit by cosmic rays or other
>> high-energy radiation (the main cause of most transient errors).
>
> Thanks for the advice, I think I buy the regular reboots.
>
> As a consequence of my problem I think I’ll stop using RAID1 on the file
> server, since this only protects against dead disks, which evidently is
> only part of the problem. Instead, I’ll make sure that the laptop that
> syncs with the server has a SSD that is big enough to hold all the data
> that is on the server as well (1 TB SSDs are affordable now). This way,
> instead of disk-level redundancy, I’ll have machine-level redundancy.
> When something like the current problem hits one of the two machines, I
> should still have a usable second machine with all the data on it.
I actually have a similar situation: I've got a laptop that I back up to
a personal server system. In my case though, I've taken a much
higher-level approach; the backup storage is in fact GlusterFS (a
clustered filesystem) running on top of BTRFS on 3 different systems
(the server, plus a pair of Intel NUC's that are just dedicated SAN
systems). If I didn't have the hardware to do this or cared about
performance more (I'm lucky if I get 20MB/s write speed, but most of the
issue is that I went cheap on the NUC's), I would probably still be
using BTRFS in raid1 mode on the server despite keeping a copy on the
laptop, simply because that provides an extra layer of protection on the
server side.
* Re: Unocorrectable errors with RAID1
2017-01-16 11:10 Unocorrectable errors with RAID1 Christoph Groth
2017-01-16 13:24 ` Austin S. Hemmelgarn
@ 2017-01-16 22:45 ` Goldwyn Rodrigues
2017-01-17 8:44 ` Christoph Groth
1 sibling, 1 reply; 20+ messages in thread
From: Goldwyn Rodrigues @ 2017-01-16 22:45 UTC (permalink / raw)
To: Christoph Groth, linux-btrfs
On 01/16/2017 05:10 AM, Christoph Groth wrote:
> Hi,
>
> I’ve been using a btrfs RAID1 of two hard disks since early 2012 on my
> home server. The machine has been working well overall, but recently
> some problems with the file system surfaced. Since I do have backups, I
> do not worry about the data, but I post here to better understand what
> happened. Also I cannot exclude that my case is useful in some way to
> btrfs development.
>
> First some information about the system:
>
> root@mim:~# uname -a
> Linux mim 4.6.0-1-amd64 #1 SMP Debian 4.6.3-1 (2016-07-04) x86_64 GNU/Linux
> root@mim:~# btrfs --version
> btrfs-progs v4.7.3
> root@mim:~# btrfs fi show
> Label: none uuid: 2da00153-f9ea-4d6c-a6cc-10c913d22686
> Total devices 2 FS bytes used 345.97GiB
> devid 1 size 465.29GiB used 420.06GiB path /dev/sda2
> devid 2 size 465.29GiB used 420.04GiB path /dev/sdb2
>
> root@mim:~# btrfs fi df /
> Data, RAID1: total=417.00GiB, used=344.62GiB
> Data, single: total=8.00MiB, used=0.00B
> System, RAID1: total=40.00MiB, used=68.00KiB
> System, single: total=4.00MiB, used=0.00B
> Metadata, RAID1: total=3.00GiB, used=1.35GiB
> Metadata, single: total=8.00MiB, used=0.00B
> GlobalReserve, single: total=464.00MiB, used=0.00B
> root@mim:~# dmesg | grep -i btrfs
> [ 4.165859] Btrfs loaded
> [ 4.481712] BTRFS: device fsid 2da00153-f9ea-4d6c-a6cc-10c913d22686
> devid 1 transid 2075354 /dev/sda2
> [ 4.482025] BTRFS: device fsid 2da00153-f9ea-4d6c-a6cc-10c913d22686
> devid 2 transid 2075354 /dev/sdb2
> [ 4.521090] BTRFS info (device sdb2): disk space caching is enabled
> [ 4.628506] BTRFS info (device sdb2): bdev /dev/sdb2 errs: wr 0, rd
> 0, flush 0, corrupt 3, gen 0
> [ 4.628521] BTRFS info (device sdb2): bdev /dev/sda2 errs: wr 0, rd
> 0, flush 0, corrupt 3, gen 0
> [ 18.315694] BTRFS info (device sdb2): disk space caching is enabled
>
> The disks themselves have been turning for almost 5 years by now, but
> their SMART health is still fully satisfactory.
>
> I noticed that something was wrong because printing stopped working. So
> I did a scrub, which detected 0 "correctable" and 6 "uncorrectable"
> errors. The relevant bits from kern.log are:
>
> Jan 11 11:05:56 mim kernel: [159873.938579] BTRFS warning (device sdb2):
> checksum error at logical 180829634560 on dev /dev/sdb2, sector
> 353143968, root 5, inode 10014144, offset 221184, length 4096, links 1
> (path: usr/lib/x86_64-linux-gnu/libcups.so.2)
> Jan 11 11:05:57 mim kernel: [159874.857132] BTRFS warning (device sdb2):
> checksum error at logical 180829634560 on dev /dev/sda2, sector
> 353182880, root 5, inode 10014144, offset 221184, length 4096, links 1
> (path: usr/lib/x86_64-linux-gnu/libcups.so.2)
> Jan 11 11:28:42 mim kernel: [161240.083721] BTRFS warning (device sdb2):
> checksum error at logical 260254629888 on dev /dev/sda2, sector
> 508309824, root 5, inode 9990924, offset 6676480, length 4096, links 1
> (path:
> var/lib/apt/lists/ftp.fr.debian.org_debian_dists_unstable_main_binary-amd64_Packages)
>
> Jan 11 11:28:42 mim kernel: [161240.235837] BTRFS warning (device sdb2):
> checksum error at logical 260254638080 on dev /dev/sda2, sector
> 508309840, root 5, inode 9990924, offset 6684672, length 4096, links 1
> (path:
> var/lib/apt/lists/ftp.fr.debian.org_debian_dists_unstable_main_binary-amd64_Packages)
>
> Jan 11 11:37:21 mim kernel: [161759.725120] BTRFS warning (device sdb2):
> checksum error at logical 260254629888 on dev /dev/sdb2, sector
> 508270912, root 5, inode 9990924, offset 6676480, length 4096, links 1
> (path:
> var/lib/apt/lists/ftp.fr.debian.org_debian_dists_unstable_main_binary-amd64_Packages)
>
> Jan 11 11:37:21 mim kernel: [161759.750251] BTRFS warning (device sdb2):
> checksum error at logical 260254638080 on dev /dev/sdb2, sector
> 508270928, root 5, inode 9990924, offset 6684672, length 4096, links 1
> (path:
> var/lib/apt/lists/ftp.fr.debian.org_debian_dists_unstable_main_binary-amd64_Packages)
>
>
> As you can see each disk has the same three errors, and there are no
> other errors. Random bad blocks cannot explain this situation. I asked
> on #btrfs and someone suggested that these errors are likely due to RAM
> problems. This may indeed be the case, since the machine has no ECC. I
> managed to fix these errors by replacing the broken files with good
> copies. Scrubbing shows no errors now:
>
> root@mim:~# btrfs scrub status /
> scrub status for 2da00153-f9ea-4d6c-a6cc-10c913d22686
> scrub started at Sat Jan 14 12:52:03 2017 and finished after
> 01:49:10
> total bytes scrubbed: 699.17GiB with 0 errors
>
> However, there are further problems. When trying to archive the full
> filesystem I noticed that some files/directories cannot be read. (The
> problem is localized to some ".git" directory that I don’t need.)
> Attempts to read (or delete) the broken files fail:
>
> $ du -sh .git
> du: cannot access
> '.git/objects/28/ea2aae3fe57ab4328adaa8b79f3c1cf005dd8d': No such file
> or directory
> du: cannot access
> '.git/objects/28/fd95a5e9d08b6684819ce6e3d39d99e2ecccd5': Stale file handle
> du: cannot access
> '.git/objects/28/52e887ed436ed2c549b20d4f389589b7b58e09': Stale file handle
> du: cannot access '.git/objects/info': Stale file handle
> du: cannot access '.git/objects/pack': Stale file handle
>
> During the above command the following lines were added to kern.log:
>
> Jan 16 09:41:34 mim kernel: [132206.957566] BTRFS critical (device
> sda2): corrupt leaf, slot offset bad: block=192561152,root=1, slot=15
> Jan 16 09:41:34 mim kernel: [132206.957924] BTRFS critical (device
> sda2): corrupt leaf, slot offset bad: block=192561152,root=1, slot=15
> Jan 16 09:41:34 mim kernel: [132206.958505] BTRFS critical (device
> sda2): corrupt leaf, slot offset bad: block=192561152,root=1, slot=15
> Jan 16 09:41:34 mim kernel: [132206.958971] BTRFS critical (device
> sda2): corrupt leaf, slot offset bad: block=192561152,root=1, slot=15
> Jan 16 09:41:34 mim kernel: [132206.959534] BTRFS critical (device
> sda2): corrupt leaf, slot offset bad: block=192561152,root=1, slot=15
> Jan 16 09:41:34 mim kernel: [132206.959874] BTRFS critical (device
> sda2): corrupt leaf, slot offset bad: block=192561152,root=1, slot=15
> Jan 16 09:41:34 mim kernel: [132206.960523] BTRFS critical (device
> sda2): corrupt leaf, slot offset bad: block=192561152,root=1, slot=15
> Jan 16 09:41:34 mim kernel: [132206.960943] BTRFS critical (device
> sda2): corrupt leaf, slot offset bad: block=192561152,root=1, slot=15
>
> So I tried to repair the file system by running "btrfs check --repair",
> but this doesn’t work:
>
> (initramfs) btrfs --version
> btrfs-progs v4.7.3
> (initramfs) btrfs check --repair /dev/sda2
> UUID: ...
> checking extents
> incorrect offsets 2527 2543
> items overlap, can't fix
> cmds-check.c:4297: fix_item_offset: Assertion `ret` failed.
> btrfs[0x41a8b4]
> btrfs[0x41a8db]
> btrfs[0x42428b]
> btrfs[0x424f83]
> btrfs[0x4259cd]
> btrfs(cmd_check+0x1111)[0x427d6d]
> btrfs(main+0x12f)[0x40a341]
> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7fd98859d2b1]
> btrfs(_start+0x2a)[0x40a37a]
>
Would you be able to upload a btrfs-image for me to examine? This is a
core ctree error where most probably the item size is incorrectly
registered.
Thanks,
--
Goldwyn
* Re: Unocorrectable errors with RAID1
2017-01-16 16:29 ` Austin S. Hemmelgarn
@ 2017-01-17 4:50 ` Janos Toth F.
2017-01-17 12:25 ` Austin S. Hemmelgarn
2017-01-17 9:18 ` Christoph Groth
1 sibling, 1 reply; 20+ messages in thread
From: Janos Toth F. @ 2017-01-17 4:50 UTC (permalink / raw)
To: Btrfs BTRFS
> BTRFS uses a 2 level allocation system. At the higher level, you have
> chunks. These are just big blocks of space on the disk that get used for
> only one type of lower level allocation (Data, Metadata, or System). Data
> chunks are normally 1GB, Metadata 256MB, and System depends on the size of
> the FS when it was created. Within these chunks, BTRFS then allocates
> individual blocks just like any other filesystem.
This always seems to confuse me when I try to get an abstract idea
of de-/fragmentation in Btrfs.
Can meta-/data be fragmented on both levels? And if so, can defrag
and/or balance "cure" both levels of fragmentation (if any)?
But how? Maybe several defrag and balance runs, repeated until the
returns diminish (or at least until you consider them meaningless
and/or unnecessary)?
> What balancing does is send everything back through the allocator, which in
> turn back-fills chunks that are only partially full, and removes ones that
> are now empty.
Doesn't this have a potential chance of introducing (additional)
extent-level fragmentation?
> FWIW, while there isn't a daemon yet that does this, it's a perfect thing
> for a cronjob. The general maintenance regimen that I use for most of my
> filesystems is:
>> * Run 'btrfs balance start -dusage=20 -musage=20' daily. This will complete
>> really fast on most filesystems, and keeps the slack-space relatively
>> under control (and has the nice bonus that it helps defragment free space).
> * Run a full scrub on all filesystems weekly. This catches silent
> corruption of the data, and will fix it if possible.
> * Run a full defrag on all filesystems monthly. This should be run before
> the balance (reasons are complicated and require more explanation than you
> probably care for). I would run this at least weekly though on HDD's, as
> they tend to be more negatively impacted by fragmentation.
I wonder if one should always run a full balance instead of a full
scrub, since balance should also read (and thus theoretically verify)
the meta-/data (does it, though? I would expect it to check the
checksums, but who knows... maybe it's "optimized" to skip that
step?) and also perform the "consolidation" at the chunk level.
I wish there were some more "integrated" solution for this: a
balance-like operation which consolidates the chunks and also
defragments the file extents at the same time, while passively
uncovering (and fixing, where necessary and possible) any checksum
mismatches / data errors, so that balance and defrag can't work
against each other and the overall work is minimized (compared to
several full runs of many different commands).
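(For reference, the regimen quoted above maps directly onto cron; a
sketch, where the file name, the mount point / and the exact times are
arbitrary assumptions:)

```
# Hypothetical /etc/cron.d/btrfs-maintenance implementing the quoted
# regimen.  Note that the monthly defrag (02:00 on day 1) runs before
# that day's balance, matching the recommended ordering.
# m  h  dom mon dow user command
30   3  *   *   *   root  btrfs balance start -dusage=20 -musage=20 /
0    4  *   *   0   root  btrfs scrub start -B /
0    2  1   *   *   root  btrfs filesystem defragment -r /
```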
* Re: Unocorrectable errors with RAID1
2017-01-16 22:45 ` Goldwyn Rodrigues
@ 2017-01-17 8:44 ` Christoph Groth
2017-01-17 11:32 ` Goldwyn Rodrigues
0 siblings, 1 reply; 20+ messages in thread
From: Christoph Groth @ 2017-01-17 8:44 UTC (permalink / raw)
To: Goldwyn Rodrigues; +Cc: linux-btrfs
Goldwyn Rodrigues wrote:
> Would you be able to upload a btrfs-image for me to
> examine. This is a core ctree error where most probably item
> size is incorrectly registered.
Sure, I can do that. I'd like to use the -s option; will this be
fine? Is there some preferred place for the upload? If not, I can
use personal webspace.
* Re: Unocorrectable errors with RAID1
2017-01-16 16:29 ` Austin S. Hemmelgarn
2017-01-17 4:50 ` Janos Toth F.
@ 2017-01-17 9:18 ` Christoph Groth
2017-01-17 12:32 ` Austin S. Hemmelgarn
1 sibling, 1 reply; 20+ messages in thread
From: Christoph Groth @ 2017-01-17 9:18 UTC (permalink / raw)
To: Austin S. Hemmelgarn; +Cc: linux-btrfs
Austin S. Hemmelgarn wrote:
> There's not really much in the way of great documentation that I
> know of. I can however cover the basics here:
>
> (...)
Thanks for this explanation. I'm sure it will also be useful to
others.
> If the chunk to be allocated was a data chunk, you get -ENOSPC
> (usually, sometimes you might get other odd results) in the
> userspace application that triggered the allocation.
It seems that the available space reported by the system df
command corresponds roughly to the size of the block device minus
all the "used" space as reported by "btrfs fi df".
If I understand what you wrote correctly, this means that when
writing a huge file it may happen that the system df reports enough
free space but btrfs raises ENOSPC. However, it should be possible to
keep writing small files even at this point (assuming that there's
enough space for the metadata). Or will btrfs split the huge file
into small pieces to fit it into the fragmented free space in the
chunks?
Such a situation should be avoided of course. I'm asking out of
curiosity.
>>>> * So scrubbing is not enough to check the health of a btrfs
>>>> file system? It’s also necessary to read all the files?
>>
>>> Scrubbing checks data integrity, but not the state of the
>>> data. IOW, you're checking that the data and metadata match
>>> with the checksums, but not necessarily that the filesystem
>>> itself is valid.
>>
>> I see, but what should one then do to detect problems such as
>> mine as soon as possible? Periodically calculate hashes for
>> all files? I’ve never seen a recommendation to do that for
>> btrfs.
> Scrub will verify that the data is the same as when the kernel
> calculated the block checksum. That's really the best that can
> be done. In your case, it couldn't correct the errors because
> both copies of the corrupted blocks were bad (this points at an
> issue with either RAM or the storage controller BTW, not the
> disks themselves). Had one of the copies been valid, it would
> have intelligently detected which one was bad and fixed things.
I think I understand the problem with the three corrupted blocks
that I was able to fix by replacing the files.
But there is also the strange "Stale file handle" error with some
other files, which was not found by scrubbing and also does not
seem to appear in the output of "btrfs dev stats", which is, BTW:
[/dev/sda2].write_io_errs 0
[/dev/sda2].read_io_errs 0
[/dev/sda2].flush_io_errs 0
[/dev/sda2].corruption_errs 3
[/dev/sda2].generation_errs 0
[/dev/sdb2].write_io_errs 0
[/dev/sdb2].read_io_errs 0
[/dev/sdb2].flush_io_errs 0
[/dev/sdb2].corruption_errs 3
[/dev/sdb2].generation_errs 0
(The 2 times 3 corruption errors seem to be the uncorrectable
errors that I could fix by replacing the files.)
To get the "stale file handle" error I need to try to read the
affected file. That's why I was wondering whether reading all the
files periodically is indeed a useful maintenance procedure with
btrfs.
"btrfs check" does find the problem, but it can only be run on an
unmounted file system.
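A brute-force way to do that periodic read would be something like the
following sketch (the starting directory, the log handling, and the
use of plain find/cat are my own choices, not a btrfs facility):

```shell
#!/bin/sh
# Read every regular file once so that btrfs verifies checksums and
# tree blocks on access; print the names of files that fail to read.
# Starting at / is an assumption; -xdev keeps find on one filesystem.
find / -xdev -type f -exec sh -c '
    for f; do
        cat -- "$f" > /dev/null 2>&1 || printf "unreadable: %s\n" "$f"
    done
' read-check {} +
```

Errors like the "Stale file handle" above, which neither scrub nor the
device counters surface, would show up in this output.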
* Re: Unocorrectable errors with RAID1
2017-01-17 8:44 ` Christoph Groth
@ 2017-01-17 11:32 ` Goldwyn Rodrigues
2017-01-17 20:25 ` Christoph Groth
0 siblings, 1 reply; 20+ messages in thread
From: Goldwyn Rodrigues @ 2017-01-17 11:32 UTC (permalink / raw)
To: Christoph Groth; +Cc: linux-btrfs
On 01/17/2017 02:44 AM, Christoph Groth wrote:
> Goldwyn Rodrigues wrote:
>
>> Would you be able to upload a btrfs-image for me to examine. This is a
>> core ctree error where most probably item size is incorrectly registered.
>
> Sure, I can do that. I'd like to use the -s option, will this be fine?
Yes, I think that should be fine.
> Is there some preferred place for the upload? If not, I can use
> personal webspace.
No, there is no preferred place. As far as I can download it, it is fine.
--
Goldwyn
* Re: Unocorrectable errors with RAID1
2017-01-17 4:50 ` Janos Toth F.
@ 2017-01-17 12:25 ` Austin S. Hemmelgarn
0 siblings, 0 replies; 20+ messages in thread
From: Austin S. Hemmelgarn @ 2017-01-17 12:25 UTC (permalink / raw)
To: Janos Toth F., Btrfs BTRFS
On 2017-01-16 23:50, Janos Toth F. wrote:
>> BTRFS uses a 2 level allocation system. At the higher level, you have
>> chunks. These are just big blocks of space on the disk that get used for
>> only one type of lower level allocation (Data, Metadata, or System). Data
>> chunks are normally 1GB, Metadata 256MB, and System depends on the size of
>> the FS when it was created. Within these chunks, BTRFS then allocates
>> individual blocks just like any other filesystem.
>
> This always seems to confuse me when I try to get an abstract idea
> about de-/fragmentation of Btrfs.
> Can meta-/data be fragmented on both levels? And if so, can defrag
> and/or balance "cure" both levels of fragmentation (if any)?
> But how? May be several defrag and balance runs, repeated until
> returns diminish (or at least you consider them meaningless and/or
> unnecessary)?
Defrag operates only at the block level. It won't allocate chunks
unless it has to, and it won't remove chunks unless they become empty
from it moving things around (although that's not likely to happen most
of the time). Balance functionally operates at both levels, but it
doesn't really do any defragmentation. Balance _may_ merge extents
sometimes, but I'm not sure of this. It will compact allocations and
therefore functionally defragment free space within chunks (though not
necessarily at the chunk-level itself).
Defrag run with the same options _should_ have no net effect after the
first run, the two exceptions being if the filesystem is close to full
or if the data set is being modified live while the defrag is happening.
Balance run with the same options will eventually hit a point where it
doesn't do anything (or only touches one chunk of each type but doesn't
actually give any benefit). If you're just using the usage filters or
doing a full balance, this point is the second run. If you're using
other filters, it's functionally not possible to determine when that
point will be without low-level knowledge of the chunk layout.
For an idle filesystem, if you run defrag then a full balance, that will
get you a near optimal layout. Running them in the reverse order will
get you a different layout that may be less optimal than running defrag
first because defrag may move data in such a way that new chunks get
allocated. Repeated runs of defrag and balance will in more than 95% of
cases provide no extra benefit.
>
>
>> What balancing does is send everything back through the allocator, which in
>> turn back-fills chunks that are only partially full, and removes ones that
>> are now empty.
>
> Doesn't this have a potential chance of introducing (additional)
> extent-level fragmentation?
In theory, yes. IIRC, extents can't cross a chunk boundary. Beyond
that packing constraint, balance shouldn't fragment things further.
>
>> FWIW, while there isn't a daemon yet that does this, it's a perfect thing
>> for a cronjob. The general maintenance regimen that I use for most of my
>> filesystems is:
>> * Run 'btrfs balance start -dusage=20 -musage=20' daily. This will complete
>> really fast on most filesystems, and keeps the slack-space relatively
>> under control (and has the nice bonus that it helps defragment free space).
>> * Run a full scrub on all filesystems weekly. This catches silent
>> corruption of the data, and will fix it if possible.
>> * Run a full defrag on all filesystems monthly. This should be run before
>> the balance (reasons are complicated and require more explanation than you
>> probably care for). I would run this at least weekly though on HDD's, as
>> they tend to be more negatively impacted by fragmentation.
>
> I wonder if one should always run a full balance instead of a full
> scrub, since balance should also read (and thus theoretically verify)
> the meta-/data (does it, though? I would expect it to check the
> checksums, but who knows... maybe it's "optimized" to skip that
> step?) and also perform the "consolidation" at the chunk level.
Scrub uses fewer resources than balance. Balance has to read _and_
re-write all data in the FS regardless of the state of the data. Scrub
only needs to read the data if it's good, and if it's bad it only (for
raid1) has to re-write the replica that's bad, not both of them. In
fact, the only practical reason to run balance on a regular basis at all
is to compact allocations and defragment free space. This is why I only
have it balance chunks that are less than 1/5 full.
>
> I wish there was some more "integrated" solution for this: a
> balance-like operation which consolidates the chunks and also
> de-fragments the file extents at the same time while passively
> uncovers (and fixes if necessary and possible) any checksum mismatches
> / data errors, so that balance and defrag can't work against
> each-other and the overall work is minimized (compared to several full
> runs or many different commands).
More than 90% of the time, the performance difference between the
absolute optimal layout and the one generated by just running defrag
then balancing is so small that it's insignificant. The closer to the
optimal layout you get, the lower the returns for optimizing further
(and this applies to any filesystem in fact). In essence, it's a bit
like the traveling salesman problem, any arbitrary solution probably
isn't optimal, but it's generally close enough to not matter.
As far as scrub fitting into all of this, I'd personally rather have a
daemon that slowly (less than 1% bandwidth usage) scrubs the FS over
time in the background and logs and fixes errors it encounters (similar
to how filesystem scrubbing works in many clustered filesystems) instead
of always having to manually invoke it and jump through hoops to keep
the bandwidth usage reasonable.
* Re: Unocorrectable errors with RAID1
2017-01-17 9:18 ` Christoph Groth
@ 2017-01-17 12:32 ` Austin S. Hemmelgarn
0 siblings, 0 replies; 20+ messages in thread
From: Austin S. Hemmelgarn @ 2017-01-17 12:32 UTC (permalink / raw)
To: Christoph Groth; +Cc: linux-btrfs
On 2017-01-17 04:18, Christoph Groth wrote:
> Austin S. Hemmelgarn wrote:
>
>> There's not really much in the way of great documentation that I know
>> of. I can however cover the basics here:
>>
>> (...)
>
> Thanks for this explanation. I'm sure it will also be useful to others.
Glad I could help.
>
>> If the chunk to be allocated was a data chunk, you get -ENOSPC
>> (usually, sometimes you might get other odd results) in the userspace
>> application that triggered the allocation.
>
> It seems that the available space reported by the system df command
> corresponds roughly to the size of the block device minus all the "used"
> space as reported by "btrfs fi df".
That's correct.
>
> If I understand what you wrote correctly this means that when writing a
> huge file it may happen that the system df will report enough free
> space, but btrfs will raise ENOSPC. However, it should be possible to
> keep writing small files even at this point (assuming that there's
> enough space for the metadata). Or will btrfs split the huge file into
> small pieces to fit it into the fragmented free space in the chunks?
OK, so the first thing to understand is that an extent in a file
can't be larger than a chunk. This means that if you have space for 3
1GB data chunks located in 3 different places on the storage device,
you can still write a 3GB file to the filesystem; it will just end up
with 3 1GB extents. The issues with ENOSPC come in when almost all of
your space is allocated to chunks and one type gets full. In such a
situation, if you have metadata space, you can keep writing to the FS,
but big writes may fail, and you'll eventually end up in a situation
where you need to delete things to free up space.
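As a toy illustration of the split (the 1 GiB default data-chunk size
is the one given earlier in this thread; the 3 GiB file is
hypothetical):

```shell
#!/bin/sh
# An extent cannot exceed a data chunk (1 GiB by default), so a 3 GiB
# write needs at least three extents, one per chunk, yet remains a
# single file.  Pure arithmetic; no filesystem is touched.
CHUNK=$((1024 * 1024 * 1024))        # default data chunk size: 1 GiB
FILE_SIZE=$((3 * CHUNK))             # a hypothetical 3 GiB file
MIN_EXTENTS=$(( (FILE_SIZE + CHUNK - 1) / CHUNK ))   # ceiling division
echo "$MIN_EXTENTS"                  # prints 3
```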
>
> Such a situation should be avoided of course. I'm asking out of curiosity.
>
>>>>> * So scrubbing is not enough to check the health of a btrfs file
>>>>> system? It’s also necessary to read all the files?
>>>
>>>> Scrubbing checks data integrity, but not the state of the data. IOW,
>>>> you're checking that the data and metadata match with the checksums,
>>>> but not necessarily that the filesystem itself is valid.
>>>
>>> I see, but what should one then do to detect problems such as mine as
>>> soon as possible? Periodically calculate hashes for all files? I’ve
>>> never seen a recommendation to do that for btrfs.
>
>> Scrub will verify that the data is the same as when the kernel
>> calculated the block checksum. That's really the best that can be
>> done. In your case, it couldn't correct the errors because both copies
>> of the corrupted blocks were bad (this points at an issue with either
>> RAM or the storage controller BTW, not the disks themselves). Had one
>> of the copies been valid, it would have intelligently detected which
>> one was bad and fixed things.
>
> I think I understand the problem with the three corrupted blocks that I
> was able to fix by replacing the files.
>
> But there is also the strange "Stale file handle" error with some other
> files that was not found by scrubbing, and also does not seem to appear
> in the output of "btrfs dev stats", which is BTW
>
> [/dev/sda2].write_io_errs 0
> [/dev/sda2].read_io_errs 0
> [/dev/sda2].flush_io_errs 0
> [/dev/sda2].corruption_errs 3
> [/dev/sda2].generation_errs 0
> [/dev/sdb2].write_io_errs 0
> [/dev/sdb2].read_io_errs 0
> [/dev/sdb2].flush_io_errs 0
> [/dev/sdb2].corruption_errs 3
> [/dev/sdb2].generation_errs 0
>
> (The 2 times 3 corruption errors seem to be the uncorrectable errors
> that I could fix by replacing the files.)
Yep, those correspond directly to the uncorrectable errors you mentioned
in your original post.
>
> To get the "stale file handle" error I need to try to read the affected
> file. That's why I was wondering whether reading all the files
> periodically is indeed a useful maintenance procedure with btrfs.
In the cases I've seen, no, it isn't all that useful. As for the
whole ESTALE thing, that's almost certainly a bug: either you
shouldn't be getting an error there at all, or you shouldn't be
getting that particular error code.
>
> "btrfs check" does find the problem, but it can be only run on an
> unmounted file system.
* Re: Unocorrectable errors with RAID1
2017-01-17 11:32 ` Goldwyn Rodrigues
@ 2017-01-17 20:25 ` Christoph Groth
2017-01-17 21:52 ` Chris Murphy
2017-01-17 22:57 ` Unocorrectable errors with RAID1 Goldwyn Rodrigues
0 siblings, 2 replies; 20+ messages in thread
From: Christoph Groth @ 2017-01-17 20:25 UTC (permalink / raw)
To: Goldwyn Rodrigues; +Cc: linux-btrfs
Goldwyn Rodrigues wrote:
> On 01/17/2017 02:44 AM, Christoph Groth wrote:
>> Goldwyn Rodrigues wrote:
>>
>>> Would you be able to upload a btrfs-image for me to
>>> examine. This is a
>>> core ctree error where most probably item size is incorrectly
>>> registered.
>>
>> Sure, I can do that. I'd like to use the -s option, will this
>> be fine?
>
> Yes, I think that should be fine.
Unfortunately, giving -s causes btrfs-image to segfault. I tried
both btrfs-progs 4.7.3 and 4.4. I also tried different
compression levels.
Without -s it works, but since this file system contains the
complete digital life of our family, I would rather not share even
the file names.
Any ideas on what could be done? If you need help to debug the
problem with btrfs-image, please tell me what I should do. I can
keep the broken file system around until an image can be created
at some later time.
* Re: Unocorrectable errors with RAID1
2017-01-17 20:25 ` Christoph Groth
@ 2017-01-17 21:52 ` Chris Murphy
2017-01-17 23:10 ` Christoph Groth
2017-01-17 22:57 ` Unocorrectable errors with RAID1 Goldwyn Rodrigues
1 sibling, 1 reply; 20+ messages in thread
From: Chris Murphy @ 2017-01-17 21:52 UTC (permalink / raw)
To: Christoph Groth; +Cc: Goldwyn Rodrigues, Btrfs BTRFS
On Tue, Jan 17, 2017 at 1:25 PM, Christoph Groth
<christoph@grothesque.org> wrote:
> Goldwyn Rodrigues wrote:
>>
>> On 01/17/2017 02:44 AM, Christoph Groth wrote:
>>>
>>> Goldwyn Rodrigues wrote:
>>>
>>>> Would you be able to upload a btrfs-image for me to examine. This is a
>>>> core ctree error where most probably item size is incorrectly
>>>> registered.
>>>
>>>
>>> Sure, I can do that. I'd like to use the -s option, will this be fine?
>>
>>
>> Yes, I think that should be fine.
>
>
> Unfortunately, giving -s causes btrfs-image to segfault. I tried both
> btrfs-progs 4.7.3 and 4.4. I also tried different compression levels.
>
> Without -s it works, but since this file system contains the complete
> digital life of our family, I would rather not share even the file names.
>
> Any ideas on what could be done? If you need help to debug the problem with
> btrfs-image, please tell me what I should do. I can keep the broken file
> system around until an image can be created at some later time.
Try 4.9, or even 4.8.5; tons of bugs have been fixed since 4.7.3,
although I don't know offhand whether this particular bug is among
them. I did recently run btrfs-image from btrfs-progs v4.9 with -s
and did not get a segfault.
--
Chris Murphy
* Re: Unocorrectable errors with RAID1
2017-01-17 20:25 ` Christoph Groth
2017-01-17 21:52 ` Chris Murphy
@ 2017-01-17 22:57 ` Goldwyn Rodrigues
2017-01-17 23:22 ` Christoph Groth
1 sibling, 1 reply; 20+ messages in thread
From: Goldwyn Rodrigues @ 2017-01-17 22:57 UTC (permalink / raw)
To: Christoph Groth; +Cc: linux-btrfs
On 01/17/2017 02:25 PM, Christoph Groth wrote:
> Goldwyn Rodrigues wrote:
>> On 01/17/2017 02:44 AM, Christoph Groth wrote:
>>> Goldwyn Rodrigues wrote:
>>>
>>>> Would you be able to upload a btrfs-image for me to examine. This is a
>>>> core ctree error where most probably item size is incorrectly
>>>> registered.
>>>
>>> Sure, I can do that. I'd like to use the -s option, will this be fine?
>>
>> Yes, I think that should be fine.
>
> Unfortunately, giving -s causes btrfs-image to segfault. I tried both
> btrfs-progs 4.7.3 and 4.4. I also tried different compression levels.
>
> Without -s it works, but since this file system contains the complete
> digital life of our family, I would rather not share even the file names.
>
> Any ideas on what could be done? If you need help to debug the problem
> with btrfs-image, please tell me what I should do. I can keep the
> broken file system around until an image can be created at some later time.
As Chris mentioned, try a later version. If you are familiar with git,
you could even try the devel version.
--
Goldwyn
* Re: Unocorrectable errors with RAID1
2017-01-17 21:52 ` Chris Murphy
@ 2017-01-17 23:10 ` Christoph Groth
2017-01-18 7:13 ` gdb log of crashed "btrfs-image -s" Christoph Groth
0 siblings, 1 reply; 20+ messages in thread
From: Christoph Groth @ 2017-01-17 23:10 UTC (permalink / raw)
To: Chris Murphy; +Cc: Goldwyn Rodrigues, Btrfs BTRFS
[-- Attachment #1: Type: text/plain, Size: 1416 bytes --]
Chris Murphy wrote:
> On Tue, Jan 17, 2017 at 1:25 PM, Christoph Groth
> <christoph@grothesque.org> wrote:
>> Any ideas on what could be done? If you need help to debug the
>> problem with btrfs-image, please tell me what I should do. I can keep
>> the broken file system around until an image can be created at some
>> later time.
>
> Try 4.9, or even 4.8.5, tons of bugs have been fixed since 4.7.3
> although I don't know off hand if this particular bug is fixed. I did
> recently do a btrfs-image with btrfs-progs v4.9 with -s and did not
> get a segfault.
I compiled btrfs-image.static from btrfs-tools 4.9 (from git) and
started it from Debian testing's initramfs. The exact command
that I use is:
/mnt/btrfs-image.static -c3 -s /dev/sda2 /mnt/mim-s.bim
It runs for a couple of seconds (enough to write 20263936 bytes of
output) and then quits with
*** Error in `/mnt/btrfs-image.static`: double free or corruption
(!prev): 0x00000000009f0940 ***
====== Backtrace: ======
[0x45fb97]
[0x465442]
[0x465c1e]
[0x402694]
[0x402dcb]
[0x4031fe]
[0x4050ff]
[0x405783]
[0x44cb73]
[0x44cdfe]
[0x400b2a]
(I had to type the above off the other screen, but I double
checked that there are no errors.)
The executable that I used can be downloaded from
http://groth.fr/btrfs-image.static
Its md5sum is 48abbc82ac6d3c0cb88cba1e5edb85fd.
I hope that this can help someone to see what's going on.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]
* Re: Unocorrectable errors with RAID1
2017-01-17 22:57 ` Unocorrectable errors with RAID1 Goldwyn Rodrigues
@ 2017-01-17 23:22 ` Christoph Groth
0 siblings, 0 replies; 20+ messages in thread
From: Christoph Groth @ 2017-01-17 23:22 UTC (permalink / raw)
To: Goldwyn Rodrigues; +Cc: linux-btrfs
[-- Attachment #1: Type: text/plain, Size: 302 bytes --]
Goldwyn Rodrigues wrote:
> As Chris mentioned, try a later version. If you are familiar
> with git, you could even try the devel version.
Looking at the commits in current devel (2f4a73f9a612876116) since
v4.9, there doesn't seem to be anything relevant, but I can retry
if you think it's worth it.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]
* gdb log of crashed "btrfs-image -s"
2017-01-17 23:10 ` Christoph Groth
@ 2017-01-18 7:13 ` Christoph Groth
2017-01-18 11:49 ` Goldwyn Rodrigues
0 siblings, 1 reply; 20+ messages in thread
From: Christoph Groth @ 2017-01-18 7:13 UTC (permalink / raw)
To: Chris Murphy; +Cc: Goldwyn Rodrigues, Btrfs BTRFS
[-- Attachment #1.1: Type: text/plain, Size: 1601 bytes --]
Christoph Groth wrote:
> Chris Murphy wrote:
>> On Tue, Jan 17, 2017 at 1:25 PM, Christoph Groth
>> <christoph@grothesque.org> wrote:
>>> Any ideas on what could be done? If you need help to debug the
>>> problem with btrfs-image, please tell me what I should do. I can
>>> keep the broken file system around until an image can be created at
>>> some later time.
>>
>> Try 4.9, or even 4.8.5, tons of bugs have been fixed since 4.7.3
>> although I don't know off hand if this particular bug is fixed. I did
>> recently do a btrfs-image with btrfs-progs v4.9 with -s and did not
>> get a segfault.
>
> I compiled btrfs-image.static from btrfs-tools 4.9 (from git)
> and started it from Debian testing's initramfs. The exact
> command that I use is:
>
> /mnt/btrfs-image.static -c3 -s /dev/sda2 /mnt/mim-s.bim
>
> It runs for a couple of seconds (enough to write 20263936 bytes
> of output) and then quits with
>
> *** Error in `/mnt/btrfs-image.static`: double free or
> corruption (!prev): 0x00000000009f0940 ***
> ====== Backtrace: ======
> [0x45fb97]
> [0x465442]
> [0x465c1e]
> [0x402694]
> [0x402dcb]
> [0x4031fe]
> [0x4050ff]
> [0x405783]
> [0x44cb73]
> [0x44cdfe]
> [0x400b2a]
>
> (I had to type the above off the other screen, but I double
> checked that there are no errors.)
>
> The executable that I used can be downloaded from
> http://groth.fr/btrfs-image.static
> Its md5sum is 48abbc82ac6d3c0cb88cba1e5edb85fd.
>
> I hope that this can help someone to see what's going on.
I ran the same executable under gdb from a live system. The log
is attached.
[-- Attachment #1.2: btrfs-image.log --]
[-- Type: application/octet-stream, Size: 4353 bytes --]
root@xubuntu:/media/xubuntu/wd1t# gdb btrfs-image.static
GNU gdb (Ubuntu 7.11-0ubuntu1) 7.11
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from btrfs-image.static...done.
(gdb) run -c3 -s /dev/sda2 /media/xubuntu/wd1t/mim-s.bim
Starting program: /media/xubuntu/wd1t/btrfs-image.static -c3 -s /dev/sda2 /media/xubuntu/wd1t/mim-s.bim
[New LWP 2334]
[New LWP 2335]
[New LWP 2336]
[New LWP 2337]
*** Error in `/media/xubuntu/wd1t/btrfs-image.static': free(): invalid next size (normal): 0x0000000000762570 ***
======= Backtrace: =========
[0x45fb97]
[0x465442]
[0x465c1e]
[0x402dea]
[0x4031fe]
[0x4050ff]
[0x405783]
[0x44cb73]
[0x44cdfe]
[0x400b2a]
======= Memory map: ========
00400000-00521000 r-xp 00000000 08:31 689 /media/xubuntu/wd1t/btrfs-image.static
00721000-00728000 rw-p 00121000 08:31 689 /media/xubuntu/wd1t/btrfs-image.static
00728000-0085b000 rw-p 00000000 00:00 0 [heap]
7fffe0000000-7fffe01aa000 rw-p 00000000 00:00 0
7fffe01aa000-7fffe4000000 ---p 00000000 00:00 0
7fffe4000000-7fffe4025000 rw-p 00000000 00:00 0
7fffe4025000-7fffe8000000 ---p 00000000 00:00 0
7fffe8000000-7fffe8186000 rw-p 00000000 00:00 0
7fffe8186000-7fffec000000 ---p 00000000 00:00 0
7fffec000000-7fffec195000 rw-p 00000000 00:00 0
7fffec195000-7ffff0000000 ---p 00000000 00:00 0
7ffff0000000-7ffff01b0000 rw-p 00000000 00:00 0
7ffff01b0000-7ffff4000000 ---p 00000000 00:00 0
7ffff5ff6000-7ffff5ff7000 rw-p 00000000 00:00 0
7ffff5ff7000-7ffff5ff8000 ---p 00000000 00:00 0
7ffff5ff8000-7ffff67f8000 rw-p 00000000 00:00 0
7ffff67f8000-7ffff67f9000 ---p 00000000 00:00 0
7ffff67f9000-7ffff6ff9000 rw-p 00000000 00:00 0
7ffff6ff9000-7ffff6ffa000 ---p 00000000 00:00 0
7ffff6ffa000-7ffff77fa000 rw-p 00000000 00:00 0
7ffff77fa000-7ffff77fb000 ---p 00000000 00:00 0
7ffff77fb000-7ffff7ffb000 rw-p 00000000 00:00 0
7ffff7ffb000-7ffff7ffd000 r--p 00000000 00:00 0 [vvar]
7ffff7ffd000-7ffff7fff000 r-xp 00000000 00:00 0 [vdso]
7ffffffde000-7ffffffff000 rw-p 00000000 00:00 0 [stack]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
Thread 1 "btrfs-image.sta" received signal SIGABRT, Aborted.
0x00000000004521de in raise ()
(gdb) bt
#0 0x00000000004521de in raise ()
#1 0x00000000004523aa in abort ()
#2 0x000000000045fb9c in __libc_message ()
#3 0x0000000000465442 in malloc_printerr ()
#4 0x0000000000465c1e in _int_free ()
#5 0x0000000000402dea in sanitize_name (slot=<optimized out>, key=<synthetic pointer>, src=<optimized out>, dst=0x76c690 "4\246", <incomplete sequence \367\261>,
md=<optimized out>) at image/main.c:574
#6 zero_items (src=0x760450, dst=0x76c690 "4\246", <incomplete sequence \367\261>, md=<optimized out>) at image/main.c:602
#7 copy_buffer (src=0x760450, dst=0x76c690 "4\246", <incomplete sequence \367\261>, md=<optimized out>) at image/main.c:645
#8 flush_pending (md=md@entry=0x7fffffffddc0, done=done@entry=0) at image/main.c:983
#9 0x00000000004031fe in add_extent (start=start@entry=192593920, size=size@entry=4096, md=md@entry=0x7fffffffddc0, data=data@entry=0) at image/main.c:1025
#10 0x00000000004050ff in copy_from_extent_tree (path=0x7fffffffe390, metadump=0x7fffffffddc0) at image/main.c:1280
#11 create_metadump (input=input@entry=0x7fffffffe851 "/dev/sda2", out=out@entry=0x731be0, num_threads=num_threads@entry=4, compress_level=compress_level@entry=3,
sanitize=sanitize@entry=1, walk_trees=walk_trees@entry=0) at image/main.c:1370
#12 0x0000000000405783 in main (argc=<optimized out>, argv=0x7fffffffe5d8) at image/main.c:2855
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]
* Re: gdb log of crashed "btrfs-image -s"
2017-01-18 7:13 ` gdb log of crashed "btrfs-image -s" Christoph Groth
@ 2017-01-18 11:49 ` Goldwyn Rodrigues
2017-01-18 20:11 ` Christoph Groth
0 siblings, 1 reply; 20+ messages in thread
From: Goldwyn Rodrigues @ 2017-01-18 11:49 UTC (permalink / raw)
To: Christoph Groth, Chris Murphy; +Cc: Btrfs BTRFS
[-- Attachment #1.1: Type: text/plain, Size: 2335 bytes --]
On 01/18/2017 01:13 AM, Christoph Groth wrote:
> Christoph Groth wrote:
>> Chris Murphy wrote:
>>> On Tue, Jan 17, 2017 at 1:25 PM, Christoph Groth
>>> <christoph@grothesque.org> wrote:
>>>> Any ideas on what could be done? If you need help to debug the
>>>> problem with
>>>> btrfs-image, please tell me what I should do. I can keep the broken
>>>> file
>>>> system around until an image can be created at some later time.
>>>
>>> Try 4.9, or even 4.8.5, tons of bugs have been fixed since 4.7.3
>>> although I don't know off hand if this particular bug is fixed. I did
>>> recently do a btrfs-image with btrfs-progs v4.9 with -s and did not
>>> get a segfault.
>>
>> I compiled btrfs-image.static from btrfs-tools 4.9 (from git) and
>> started it from Debian testing's initramfs. The exact command that I
>> use is:
>>
>> /mnt/btrfs-image.static -c3 -s /dev/sda2 /mnt/mim-s.bim
>>
>> It runs for a couple of seconds (enough to write 20263936 bytes of
>> output) and then quits with
>>
>> *** Error in `/mnt/btrfs-image.static`: double free or corruption
>> (!prev): 0x00000000009f0940 ***
>> ====== Backtrace: ======
>> [0x45fb97]
>> [0x465442]
>> [0x465c1e]
>> [0x402694]
>> [0x402dcb]
>> [0x4031fe]
>> [0x4050ff]
>> [0x405783]
>> [0x44cb73]
>> [0x44cdfe]
>> [0x400b2a]
>>
>> (I had to type the above off the other screen, but I double checked
>> that there are no errors.)
>>
>> The executable that I used can be downloaded from
>> http://groth.fr/btrfs-image.static
>> Its md5sum is 48abbc82ac6d3c0cb88cba1e5edb85fd.
>>
>> I hope that this can help someone to see what's going on.
>
> I ran the same executable under gdb from a live system. The log is
> attached.
>
Thanks Christoph for the backtrace. I am unable to reproduce it, but
looking at your backtrace, I found a bug. Would you be able to give it a
try and check if it fixes the problem?
diff --git a/image/main.c b/image/main.c
index 58dcecb..0158844 100644
--- a/image/main.c
+++ b/image/main.c
@@ -550,7 +550,7 @@ static void sanitize_name(struct metadump_struct *md, u8 *dst,
 		return;
 	}
 
-	memcpy(eb->data, dst, eb->len);
+	memcpy(eb->data, src->data, src->len);
 
 	switch (key->type) {
 	case BTRFS_DIR_ITEM_KEY:
--
Goldwyn
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 473 bytes --]
* Re: gdb log of crashed "btrfs-image -s"
2017-01-18 11:49 ` Goldwyn Rodrigues
@ 2017-01-18 20:11 ` Christoph Groth
2017-01-23 12:09 ` Goldwyn Rodrigues
0 siblings, 1 reply; 20+ messages in thread
From: Christoph Groth @ 2017-01-18 20:11 UTC (permalink / raw)
To: Goldwyn Rodrigues; +Cc: Chris Murphy, Btrfs BTRFS
[-- Attachment #1.1: Type: text/plain, Size: 452 bytes --]
Goldwyn Rodrigues wrote:
> Thanks Christoph for the backtrace. I am unable to reproduce it,
> but looking at your backtrace, I found a bug. Would you be able
> to give it a try and check if it fixes the problem?
I applied your patch to v4.9, and compiled the static binaries.
Unfortunately, it still segfaults. (Perhaps your fix is correct,
and there's a second problem?) I attach a new backtrace. Do let
me know if I can help in another way.
[-- Attachment #1.2: btrfs-image2.log --]
[-- Type: application/octet-stream, Size: 4392 bytes --]
root@xubuntu:~# gdb /mnt/btrfs-image.static
GNU gdb (Ubuntu 7.11-0ubuntu1) 7.11
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /mnt/btrfs-image.static...done.
(gdb) run -s -c3 /dev/sda2 /mnt/mim.bim
Starting program: /mnt/btrfs-image.static -s -c3 /dev/sda2 /mnt/mim.bim
[New LWP 2334]
[New LWP 2335]
[New LWP 2336]
[New LWP 2337]
*** Error in `/mnt/btrfs-image.static': double free or corruption (out): 0x0000000000772f70 ***
======= Backtrace: =========
[0x45fba7]
[0x465452]
[0x465c2e]
[0x402694]
[0x402dce]
[0x403201]
[0x405102]
[0x405786]
[0x44cb83]
[0x44ce0e]
[0x400b2a]
======= Memory map: ========
00400000-00521000 r-xp 00000000 08:21 689 /mnt/btrfs-image.static
00721000-00728000 rw-p 00121000 08:21 689 /mnt/btrfs-image.static
00728000-007e4000 rw-p 00000000 00:00 0 [heap]
7fffe0000000-7fffe017e000 rw-p 00000000 00:00 0
7fffe017e000-7fffe4000000 ---p 00000000 00:00 0
7fffe4000000-7fffe4025000 rw-p 00000000 00:00 0
7fffe4025000-7fffe8000000 ---p 00000000 00:00 0
7fffe8000000-7fffe81a6000 rw-p 00000000 00:00 0
7fffe81a6000-7fffec000000 ---p 00000000 00:00 0
7fffec000000-7fffec17c000 rw-p 00000000 00:00 0
7fffec17c000-7ffff0000000 ---p 00000000 00:00 0
7ffff0000000-7ffff019a000 rw-p 00000000 00:00 0
7ffff019a000-7ffff4000000 ---p 00000000 00:00 0
7ffff5ff6000-7ffff5ff7000 rw-p 00000000 00:00 0
7ffff5ff7000-7ffff5ff8000 ---p 00000000 00:00 0
7ffff5ff8000-7ffff67f8000 rw-p 00000000 00:00 0
7ffff67f8000-7ffff67f9000 ---p 00000000 00:00 0
7ffff67f9000-7ffff6ff9000 rw-p 00000000 00:00 0
7ffff6ff9000-7ffff6ffa000 ---p 00000000 00:00 0
7ffff6ffa000-7ffff77fa000 rw-p 00000000 00:00 0
7ffff77fa000-7ffff77fb000 ---p 00000000 00:00 0
7ffff77fb000-7ffff7ffb000 rw-p 00000000 00:00 0
7ffff7ffb000-7ffff7ffd000 r--p 00000000 00:00 0 [vvar]
7ffff7ffd000-7ffff7fff000 r-xp 00000000 00:00 0 [vdso]
7ffffffde000-7ffffffff000 rw-p 00000000 00:00 0 [stack]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
Thread 1 "btrfs-image.sta" received signal SIGABRT, Aborted.
0x00000000004521ee in raise ()
(gdb) bt
#0 0x00000000004521ee in raise ()
#1 0x00000000004523ba in abort ()
#2 0x000000000045fbac in __libc_message ()
#3 0x0000000000465452 in malloc_printerr ()
#4 0x0000000000465c2e in _int_free ()
#5 0x0000000000402694 in sanitize_inode_ref (md=md@entry=0x7fffffffde00, eb=eb@entry=0x771ee0, slot=slot@entry=16, ext=ext@entry=0) at image/main.c:522
#6 0x0000000000402dce in sanitize_name (slot=16, key=<synthetic pointer>, src=0x764cf0, dst=0x76bed0 "4\246", <incomplete sequence \367\261>, md=0x7fffffffde00)
at image/main.c:561
#7 zero_items (src=0x764cf0, dst=0x76bed0 "4\246", <incomplete sequence \367\261>, md=<optimized out>) at image/main.c:602
#8 copy_buffer (src=0x764cf0, dst=0x76bed0 "4\246", <incomplete sequence \367\261>, md=<optimized out>) at image/main.c:645
#9 flush_pending (md=md@entry=0x7fffffffde00, done=done@entry=0) at image/main.c:983
#10 0x0000000000403201 in add_extent (start=start@entry=192589824, size=size@entry=4096, md=md@entry=0x7fffffffde00, data=data@entry=0) at image/main.c:1025
#11 0x0000000000405102 in copy_from_extent_tree (path=0x7fffffffe3d0, metadump=0x7fffffffde00) at image/main.c:1280
#12 create_metadump (input=input@entry=0x7fffffffe87f "/dev/sda2", out=out@entry=0x731be0, num_threads=num_threads@entry=4, compress_level=compress_level@entry=3,
sanitize=sanitize@entry=1, walk_trees=walk_trees@entry=0) at image/main.c:1370
#13 0x0000000000405786 in main (argc=<optimized out>, argv=0x7fffffffe618) at image/main.c:2855
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]
* Re: gdb log of crashed "btrfs-image -s"
2017-01-18 20:11 ` Christoph Groth
@ 2017-01-23 12:09 ` Goldwyn Rodrigues
0 siblings, 0 replies; 20+ messages in thread
From: Goldwyn Rodrigues @ 2017-01-23 12:09 UTC (permalink / raw)
To: Christoph Groth; +Cc: Chris Murphy, Btrfs BTRFS
[-- Attachment #1.1: Type: text/plain, Size: 881 bytes --]
On 01/18/2017 02:11 PM, Christoph Groth wrote:
> Goldwyn Rodrigues wrote:
>> Thanks Christoph for the backtrace. I am unable to reproduce it, but
>> looking at your backtrace, I found a bug. Would you be able to give it
>> a try and check if it fixes the problem?
>
> I applied your patch to v4.9, and compiled the static binaries.
> Unfortunately, it still segfaults. (Perhaps your fix is correct, and
> there's a second problem?) I attach a new backtrace. Do let me know if
> I can help in another way.
I looked hard, and could not find the reason for the failure here. The
backtrace of the new one is a little different from the previous one,
but I am not sure why it crashes. Until I have a reproduction scenario,
I may not be able to fix this. How about a core? However, a core will
have the values which you are trying to mask with sanitize.
--
Goldwyn
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 473 bytes --]
end of thread, other threads:[~2017-01-23 12:11 UTC | newest]
Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-16 11:10 Unocorrectable errors with RAID1 Christoph Groth
2017-01-16 13:24 ` Austin S. Hemmelgarn
2017-01-16 15:42 ` Christoph Groth
2017-01-16 16:29 ` Austin S. Hemmelgarn
2017-01-17 4:50 ` Janos Toth F.
2017-01-17 12:25 ` Austin S. Hemmelgarn
2017-01-17 9:18 ` Christoph Groth
2017-01-17 12:32 ` Austin S. Hemmelgarn
2017-01-16 22:45 ` Goldwyn Rodrigues
2017-01-17 8:44 ` Christoph Groth
2017-01-17 11:32 ` Goldwyn Rodrigues
2017-01-17 20:25 ` Christoph Groth
2017-01-17 21:52 ` Chris Murphy
2017-01-17 23:10 ` Christoph Groth
2017-01-18 7:13 ` gdb log of crashed "btrfs-image -s" Christoph Groth
2017-01-18 11:49 ` Goldwyn Rodrigues
2017-01-18 20:11 ` Christoph Groth
2017-01-23 12:09 ` Goldwyn Rodrigues
2017-01-17 22:57 ` Unocorrectable errors with RAID1 Goldwyn Rodrigues
2017-01-17 23:22 ` Christoph Groth