csum failed on innexistent inode

* csum failed on innexistent inode
@ 2016-04-04  7:50 Jérôme Poulin
  2016-04-04  9:42 ` Henk Slager
  2016-04-04 20:17 ` Kai Krakow
  0 siblings, 2 replies; 9+ messages in thread
From: Jérôme Poulin @ 2016-04-04  7:50 UTC (permalink / raw)
  To: linux-btrfs

Hi all,

I have a BTRFS on disks running in RAID10 meta+data, one of the disk
has been going bad and scrub was showing 18 uncorrectable errors
(which is weird in RAID10). I tried using --repair-sector with hdparm
even if it shouldn't be necessary since BTRFS would overwrite the
sector. Repair sector fixed the sector in SMART but BTRFS was still
showing 18 uncorr. errors.

I finally decided to give up this opportunity to test the error
correction property of BTRFS (this is a home system, backed up) and
installed a brand new disk in the machine. After running btrfs
replace, everything was fine, I decided to run btrfs scrub again and I
still have the same 18 uncorrectable errors.

Later on, since I had a new disk with more space, I decided to run a
balance to free up the new space but the balance has stopped with csum
errors too. Here are the output of multiple programs.

How is it possible to get rid of the referenced csum errors if they do
not exist? Also, the expected checksum looks suspiciously the same for
multiple errors. Could it be bad RAM in that case? Can I convince
BTRFS to update the csum?

# btrfs inspect-internal logical-resolve -v 1809149952 /mnt/btrfs/
ioctl ret=-1, error: No such file or directory
# btrfs inspect-internal inode-resolve -v 296 /mnt/btrfs/
ioctl ret=-1, error: No such file or directory

dmesg after first bad sector:
avr 01 18:29:52 p4.i.ticpu.net kernel: BTRFS info (device dm-43): read
error corrected: ino 1 off 655368716288 (dev /dev/dm-42 sector
2939136)
avr 01 18:29:52 p4.i.ticpu.net kernel: BTRFS info (device dm-43): read
error corrected: ino 1 off 655368720384 (dev /dev/dm-42 sector
2939144)
avr 01 18:29:52 p4.i.ticpu.net kernel: BTRFS info (device dm-43): read
error corrected: ino 1 off 655368724480 (dev /dev/dm-42 sector
2939152)
avr 01 18:29:52 p4.i.ticpu.net kernel: BTRFS info (device dm-43): read
error corrected: ino 1 off 655368728576 (dev /dev/dm-42 sector
2939160)

dmesg after balance:
[1738474.444648] BTRFS warning (device dm-40): csum failed ino 296 off
1809195008 csum 1515428513 expected csum 2566472073
[1738474.444649] BTRFS warning (device dm-40): csum failed ino 296 off
1809084416 csum 4147641019 expected csum 1755301217
[1738474.444702] BTRFS warning (device dm-40): csum failed ino 296 off
1809199104 csum 1927504681 expected csum 2566472073
[1738474.444717] BTRFS warning (device dm-40): csum failed ino 296 off
1809211392 csum 3086571080 expected csum 2566472073
[1738474.444917] BTRFS warning (device dm-40): csum failed ino 296 off
1809084416 csum 4147641019 expected csum 1755301217
[1738474.444962] BTRFS warning (device dm-40): csum failed ino 296 off
1809195008 csum 1515428513 expected csum 2566472073
[1738474.444998] BTRFS warning (device dm-40): csum failed ino 296 off
1809199104 csum 1927504681 expected csum 2566472073
[1738474.445034] BTRFS warning (device dm-40): csum failed ino 296 off
1809211392 csum 3086571080 expected csum 2566472073
[1738474.473286] BTRFS warning (device dm-40): csum failed ino 296 off
1809149952 csum 3254083717 expected csum 2566472073
[1738474.473357] BTRFS warning (device dm-40): csum failed ino 296 off
1809162240 csum 3157020538 expected csum 2566472073

btrfs check:
./btrfs check /dev/mapper/luksbtrfsdata2
Checking filesystem on /dev/mapper/luksbtrfsdata2
UUID: 805f6ad7-1188-448d-aee4-8ddeeb70c8a7
checking extents
bad metadata [1453741768704, 1453741785088) crossing stripe boundary
bad metadata [1454487764992, 1454487781376) crossing stripe boundary
bad metadata [1454828552192, 1454828568576) crossing stripe boundary
bad metadata [1454879735808, 1454879752192) crossing stripe boundary
bad metadata [1455087222784, 1455087239168) crossing stripe boundary
bad metadata [1456269426688, 1456269443072) crossing stripe boundary
bad metadata [1456273227776, 1456273244160) crossing stripe boundary
bad metadata [1456404234240, 1456404250624) crossing stripe boundary
bad metadata [1456418914304, 1456418930688) crossing stripe boundary
checking free space cache
checking fs roots
checking csums
checking root refs
found 689292505473 bytes used err is 0
total csum bytes: 660112536
total tree bytes: 1764098048
total fs tree bytes: 961921024
total extent tree bytes: 79331328
btree space waste bytes: 232774315
file data blocks allocated: 4148513517568
 referenced 972284129280

btrfs scrub:
I don't have the output handy but the dmesg output were pairs of
logical blocks like balance and no errors were corrected.

^ permalink raw reply	[flat|nested] 9+ messages in thread