On 2018/11/26 3:19 PM, Alexander Fieroch wrote:
> Hi,
>
> My data partition with btrfs RAID 0 (/dev/sdc0 and /dev/sdd0) shows
> errors in syslog:
>
> BTRFS error (device sdc): cleaner transaction attach returned -30
> BTRFS info (device sdc): disk space caching is enabled
> BTRFS info (device sdc): has skinny extents
> BTRFS info (device sdc): bdev /dev/sdc errs: wr 0, rd 0, flush 0,
> corrupt 3, gen 1
> BTRFS info (device sdc): bdev /dev/sdd errs: wr 0, rd 0, flush 0,
> corrupt 6, gen 2

Generation mismatch means something more serious.

>
>
> BTRFS error (device sdc): scrub: tree block 858803990528 spanning
> stripes, ignored. logical=858803929088

The "spanning stripes" message, by contrast, only means the scrub code
can't check that tree block because it crosses a stripe boundary.

It's normally nothing to worry about, and it's usually caused by an old
kernel. Newer kernels avoid creating such tree blocks, but existing ones
are simply skipped by scrub.

> BTRFS error (device sdc): scrub: tree block 858803990528 spanning
> stripes, ignored. logical=858803994624
> BTRFS warning (device sdc): checksum error at logical 858803961856 on
> dev /dev/sdd, physical 385263894528: metadata leaf (level 0) in tree 7
> BTRFS warning (device sdc): checksum error at logical 858803961856 on
> dev /dev/sdd, physical 385263894528: metadata leaf (level 0) in tree 7

This means some csum tree blocks are corrupted.

> BTRFS error (device sdc): bdev /dev/sdd errs: wr 0, rd 0, flush 0,
> corrupt 4, gen 1
> BTRFS error (device sdc): scrub: tree block 858820505600 spanning
> stripes, ignored. logical=858820444160
> BTRFS error (device sdc): scrub: tree block 858820505600 spanning
> stripes, ignored. logical=858820509696
> BTRFS error (device sdc): unable to fixup (regular) error at logical
> 858803961856 on dev /dev/sdd
> BTRFS error (device sdc): scrub: tree block 858821292032 spanning
> stripes, ignored. logical=858821230592
> BTRFS error (device sdc): scrub: tree block 858821292032 spanning
> stripes, ignored. logical=858821296128
> BTRFS warning (device sdc): checksum error at logical 858821263360 on
> dev /dev/sdd, physical 385281196032: metadata leaf (level 0) in tree 7
> BTRFS warning (device sdc): checksum error at logical 858821263360 on
> dev /dev/sdd, physical 385281196032: metadata leaf (level 0) in tree 7
> BTRFS error (device sdc): bdev /dev/sdd errs: wr 0, rd 0, flush 0,
> corrupt 5, gen 1
> BTRFS error (device sdc): unable to fixup (regular) error at logical
> 858821263360 on dev /dev/sdd
> BTRFS warning (device sdc): checksum/header error at logical
> 858820476928 on dev /dev/sdd, physical 385280409600: metadata leaf
> (level 0) in tree 7
> BTRFS warning (device sdc): checksum/header error at logical
> 858820476928 on dev /dev/sdd, physical 385280409600: metadata leaf
> (level 0) in tree 7
> BTRFS error (device sdc): bdev /dev/sdd errs: wr 0, rd 0, flush 0,
> corrupt 5, gen 2
> BTRFS warning (device sdc): checksum error at logical 858820489216 on
> dev /dev/sdd, physical 385280421888: metadata leaf (level 0) in tree 2
> BTRFS warning (device sdc): checksum error at logical 858820489216 on
> dev /dev/sdd, physical 385280421888: metadata leaf (level 0) in tree 2

This is an error in the extent tree, and I'd say it's a serious problem
which may affect later write operations.
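
As an aside, the "in tree N" number in these warnings is a btrfs tree
object ID: 7 is the csum tree and 2 is the extent tree, as noted above.
A minimal shell sketch for summarizing such warnings from a saved log
follows. The names tree_name and summarize are made up for illustration,
and the two sample lines are copied (unwrapped) from the log above.

```shell
# Map the "in tree N" number to a tree name.
# IDs follow the btrfs on-disk format (BTRFS_*_TREE_OBJECTID).
tree_name() {
    case "$1" in
        1) echo "root tree" ;;
        2) echo "extent tree" ;;
        3) echo "chunk tree" ;;
        4) echo "device tree" ;;
        5) echo "fs tree" ;;
        7) echo "csum tree" ;;
        *) echo "tree $1" ;;
    esac
}

# Sample warnings taken from the report (one per line, unwrapped).
log='BTRFS warning (device sdc): checksum error at logical 858803961856 on dev /dev/sdd, physical 385263894528: metadata leaf (level 0) in tree 7
BTRFS warning (device sdc): checksum error at logical 858820489216 on dev /dev/sdd, physical 385280421888: metadata leaf (level 0) in tree 2'

# Read log lines on stdin and print "<logical> <tree name>" once per
# distinct (logical, tree) pair.
summarize() {
    sed -n 's/.*error at logical \([0-9]*\) on dev.*in tree \([0-9]*\).*/\1 \2/p' |
    sort -u |
    while read -r logical id; do
        echo "$logical $(tree_name "$id")"
    done
}

printf '%s\n' "$log" | summarize
```

Run against the full syslog, this would show at a glance which trees the
uncorrectable blocks belong to.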

> BTRFS error (device sdc): bdev /dev/sdd errs: wr 0, rd 0, flush 0,
> corrupt 6, gen 2
> BTRFS error (device sdc): unable to fixup (regular) error at logical
> 858820476928 on dev /dev/sdd
> BTRFS error (device sdc): unable to fixup (regular) error at logical
> 858820489216 on dev /dev/sdd0
>
>
> $ btrfs filesystem show /mnt/data/
> Label: none  uuid: 5e6506b0-bf15-4b2e-b5f4-322c44b89db6
>           Total devices 2 FS bytes used 10.17TiB
>           devid    1 size 5.46TiB used 5.43TiB path /dev/sdc
>           devid    2 size 5.46TiB used 5.43TiB path /dev/sdd
>
> $ btrfs --version
> btrfs-progs v4.15.1
>
> $ uname -a
> Linux gpur1 4.15.0-39-generic #42-Ubuntu SMP Tue Oct 23 15:48:01 UTC
> 2018 x86_64 x86_64 x86_64 GNU/Linux
>
>
> $ btrfs dev stats /dev/sdc
> [/dev/sdc].write_io_errs    0
> [/dev/sdc].read_io_errs     0
> [/dev/sdc].flush_io_errs    0
> [/dev/sdc].corruption_errs  3
> [/dev/sdc].generation_errs  1
>
> $ btrfs dev stats /dev/sdd
> [/dev/sdd].write_io_errs    0
> [/dev/sdd].read_io_errs     0
> [/dev/sdd].flush_io_errs    0
> [/dev/sdd].corruption_errs  3
> [/dev/sdd].generation_errs  1
>
> $ btrfs fi show
> Label: 'system'  uuid: ae121e8e-d483-45f4-8568-2817f5c5d497
>         Total devices 1 FS bytes used 194.05GiB
>         devid    1 size 228.66GiB used 199.03GiB path /dev/sda3
> Label: none  uuid: 5e6506b0-bf15-4b2e-b5f4-322c44b89db6
>         Total devices 2 FS bytes used 10.17TiB
>         devid    1 size 5.46TiB used 5.43TiB path /dev/sdc
>         devid    2 size 5.46TiB used 5.43TiB path /dev/sdd
>
> $ btrfs fi df /mnt/data/
> Data, RAID0: total=10.84TiB, used=10.15TiB
> System, RAID1: total=8.00MiB, used=896.00KiB
> Metadata, RAID1: total=15.00GiB, used=13.28GiB
> GlobalReserve, single: total=512.00MiB, used=0.00
>
> $ btrfs scrub start -B /dev/sdc
> ERROR: scrubbing /dev/sdc failed for device id 1: ret=-1, errno=5
> (Input/output error)
> scrub canceled for 5e6506b0-bf15-4b2e-b5f4-322c44b89db6
>          scrub started at Thu Nov 22 07:43:45 2018 and was aborted after
> 02:31:49
>          total bytes scrubbed: 1.58TiB with 10 errors
>          error details: verify=1 csum=3
>          corrected errors: 0, uncorrectable errors: 10, unverified
> errors: 0
>
>
>
> I've tried
> $ btrfs check /dev/sdc
> Checking filesystem on /dev/sdc
> UUID: 5e6506b0-bf15-4b2e-b5f4-322c44b89db6
> btrfs check --repairchecking extents

Don't use --repair unless you know what you're doing.

>   ERROR: add_tree_backref failed (extent items shared block): File exists
> ERROR: add_tree_backref failed (extent items tree block): File exists
> ERROR: add_tree_backref failed (extent items tree block): File exists
> /dev/sdc
> ERROR: add_tree_backref failed (non-leaf block): File exists
>
> ERROR: add_tree_backref failed (non-leaf block): File exists
> checksum verify failed on 858803961856 found B2C0FAD9 wanted F31F8495
> checksum verify failed on 858803961856 found B2C0FAD9 wanted F31F8495
> checksum verify failed on 858803961856 found B2C0FAD9 wanted F31F8495
> checksum verify failed on 858803961856 found B2C0FAD9 wanted F31F8495
> Csum didn't match
> checksum verify failed on 858821263360 found 15208BF4 wanted D68B2514
> checksum verify failed on 858821263360 found 15208BF4 wanted D68B2514
> checksum verify failed on 858821263360 found 15208BF4 wanted D68B2514
> checksum verify failed on 858821263360 found 15208BF4 wanted D68B2514
> Csum didn't match
> ref mismatch on [8631607296 77824] extent item 1, found 0
> incorrect local backref count on 8631607296 parent 858803974144 owner 0
> offset 0 found 0 wanted 1 back 0x55f8522a5b10
> backref disk bytenr does not match extent record, bytenr=8631607296, ref
> bytenr=0
> backpointer mismatch on [8631607296 77824]
> owner ref check failed [8631607296 77824]
> ref mismatch on [35613634560 77824] extent item 1, found 0
> incorrect local backref count on 35613634560 parent 858803974144 owner 0
> offset 0 found 0 wanted 1 back 0x55f86d87d810
> backref disk bytenr does not match extent record, bytenr=35613634560,
> ref bytenr=0
> backpointer mismatch on [35613634560 77824]
> owner ref check failed [35613634560 77824]
> ref mismatch on [36010762240 77824] extent item 1, found 0
> [...]
> ERROR: errors found in extent allocation tree or chunk allocation
> checking free space cache
> checking fs roots
> extent_io.c:605: free_extent_buffer_internal: BUG_ON `eb->refs < 0`
> triggered, value 1
> btrfs(+0x29d87)[0x55f83fd51d87]
> btrfs(+0x2a0b4)[0x55f83fd520b4]
> btrfs(alloc_extent_buffer+0x77)[0x55f83fd527af]
> btrfs(read_tree_block+0x44)[0x55f83fd45802]
> btrfs(btrfs_next_leaf+0x6e)[0x55f83fd43ad9]
> btrfs(count_csum_range+0x1e1)[0x55f83fd89fac]
> btrfs(+0x14b33)[0x55f83fd3cb33]
> btrfs(cmd_check+0x19fb)[0x55f83fd7bfe2]
> btrfs(main+0x143)[0x55f83fd3ec87]
> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7ff69e81cb97]
> btrfs(_start+0x2a)[0x55f83fd3ecca]
> Aborted (core dumped)
>
>
>
> $ btrfs check --repair /dev/sdc
> enabling repair mode
> Checking filesystem on /dev/sdc
> UUID: 5e6506b0-bf15-4b2e-b5f4-322c44b89db6
> Fixed 0 roots.
> checking extents
> ERROR: add_tree_backref failed (extent items shared block): File exists
> ERROR: add_tree_backref failed (extent items tree block): File exists
> ERROR: add_tree_backref failed (extent items tree block): File exists
> ERROR: add_tree_backref failed (non-leaf block): File exists
> ERROR: add_tree_backref failed (non-leaf block): File exists
> checksum verify failed on 858803961856 found B2C0FAD9 wanted F31F8495
> checksum verify failed on 858803961856 found B2C0FAD9 wanted F31F8495
> checksum verify failed on 858803961856 found B2C0FAD9 wanted F31F8495
> checksum verify failed on 858803961856 found B2C0FAD9 wanted F31F8495
> Csum didn't match
> checksum verify failed on 858821263360 found 15208BF4 wanted D68B2514
> checksum verify failed on 858821263360 found 15208BF4 wanted D68B2514
> checksum verify failed on 858821263360 found 15208BF4 wanted D68B2514
> checksum verify failed on 858821263360 found 15208BF4 wanted D68B2514
> Csum didn't match
> well this shouldn't happen, extent record overlaps but is metadata?
> [858803974144, 16384]
> Aborted (core dumped)
>
>
>
>
> How can I fix the error?
> Is there any possibility to see which files are affected?

It's not data/files (at least from what I can see); some essential
metadata has been corrupted.

Unlike in traditional filesystems, corruption in the extent tree can lead
to a lot of problems, and it's pretty hard to fix due to the tree's
complexity.

The corruption itself looks like a disk error, not a btrfs error such as
a transid error.

I recommend mounting the fs RO and salvaging your data.
If even the RO mount fails, you can fall back to "btrfs restore".

Since there is something wrong with the csum tree, some EIO should be
expected during the copy.

Thanks,
Qu

> Please have a look at the full log attached.
>
> Thanks!
>
> Best regards,
> Alexander
>
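
For reference, the salvage sequence suggested above might look roughly
like the sketch below. The device and mount point are taken from the
report; /mnt/rescue is a hypothetical destination on a separate, healthy
filesystem; run, copy_from_ro_mount and restore_offline are illustrative
helper names; and the -i (ignore errors) and -v (verbose) options to
"btrfs restore" are a choice made for this sketch, not something stated
in the thread. By default it only prints the commands.

```shell
DEV=/dev/sdc        # device from the report
MNT=/mnt/data       # mount point from the report
DEST=/mnt/rescue    # hypothetical destination on another filesystem

# Print the command when DRY_RUN=1 (the default), execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Preferred path: copy everything out of the read-only mount.
# Some files may fail with EIO because of the damaged csum tree.
copy_from_ro_mount() {
    run cp -a "$MNT/." "$DEST/"
}

# Fallback: pull files straight off the unmounted device.
restore_offline() {
    run btrfs restore -i -v "$DEV" "$DEST"
}

if run mount -o ro "$DEV" "$MNT"; then
    copy_from_ro_mount
else
    restore_offline
fi
```

Set DRY_RUN=0 only after checking the printed commands and confirming
that the destination has enough free space.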