On 2019/11/22 8:37 PM, devel@roosoft.ltd.uk wrote:
> So I've been discussing this on IRC, but it looks like more sage advice
> is needed.

You're not the only one hitting the bug.
(Not sure if that makes you feel a little better.)

>
> So a quick history. A BTRFS filesystem was initially created with 2x 6Tb
> and 2x 1Tb drives. This was configured with RAID1 metadata/system and
> RAID5 data.
>
> Recently 2 more 6Tb drives were used to replace the 2x 1Tb ones. One was
> added to the filesystem and then the original 1Tb was deleted; the other
> was just a direct replace.
>
> The filesystem was then expanded to fill the new space with
>
>>    btrfs fi resize 4:max /mnt/media/
>
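For readers following along, that history presumably maps to a command
sequence roughly like the one below; the device names are only
placeholders, not the actual ones from this box:

  # add one new 6Tb drive, then remove one of the old 1Tb drives
  btrfs device add /dev/sdX /mnt/media
  btrfs device delete /dev/sdY /mnt/media

  # directly replace the other 1Tb drive with the second new 6Tb drive
  btrfs replace start /dev/sdZ /dev/sdW /mnt/media

  # a replace keeps the old device's size, so the replaced device
  # (devid 4 here) has to be grown afterwards to use the full 6Tb
  btrfs fi resize 4:max /mnt/media/
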
> The filesystem was very unbalanced, and it currently looks like this
> after an attempt to rebalance it failed.
>
> btrfs fi show
> Label: none  uuid: 6abaa68a-2670-4d8b-8d2a-fd7321df9242
>     Total devices 4 FS bytes used 2.80TiB
>     devid    1 size 5.46TiB used 1.20TiB path /dev/sdb
>     devid    2 size 5.46TiB used 1.20TiB path /dev/sdc
>     devid    4 size 5.46TiB used 826.03GiB path /dev/sde
>     devid    5 size 5.46TiB used 826.03GiB path /dev/sdd
>
> btrfs fi usage /mnt/media/
> WARNING: RAID56 detected, not implemented
> Overall:
>     Device size:                 21.83TiB
>     Device allocated:             8.06GiB
>     Device unallocated:          21.82TiB
>     Device missing:                 0.00B
>     Used:                         6.26GiB
>     Free (estimated):               0.00B    (min: 8.00EiB)
>     Data ratio:                      0.00
>     Metadata ratio:                  2.00
>     Global reserve:             512.00MiB    (used: 0.00B)
>
> Data,RAID5: Size:2.80TiB, Used:2.80TiB
>    /dev/sdb       1.20TiB
>    /dev/sdc       1.20TiB
>    /dev/sdd     822.00GiB
>    /dev/sde     822.00GiB
>
> Metadata,RAID1: Size:4.00GiB, Used:3.13GiB
>    /dev/sdd       4.00GiB
>    /dev/sde       4.00GiB
>
> System,RAID1: Size:32.00MiB, Used:256.00KiB
>    /dev/sdd      32.00MiB
>    /dev/sde      32.00MiB
>
> Unallocated:
>    /dev/sdb       4.26TiB
>    /dev/sdc       4.26TiB
>    /dev/sdd       4.65TiB
>    /dev/sde       4.65TiB
>
> A scrub is clean:
>
> btrfs scrub status /mnt/media/
> UUID:             6abaa68a-2670-4d8b-8d2a-fd7321df9242
> Scrub started:    Thu Nov 21 11:30:49 2019
> Status:           finished
> Duration:         17:20:11
> Total to scrub:   2.80TiB
> Rate:             47.10MiB/s
> Error summary:    no errors found
>
> A read-only fs check is clean:
>
> Opening filesystem to check...
> WARNING: filesystem mounted, continuing because of --force
> Checking filesystem on /dev/sdb
> UUID: 6abaa68a-2670-4d8b-8d2a-fd7321df9242
> [1/7] checking root items                      (0:00:13 elapsed, 373111 items checked)
> [2/7] checking extents                         (0:04:18 elapsed, 205334 items checked)
> [3/7] checking free space cache                (0:00:37 elapsed, 1233 items checked)
> [4/7] checking fs roots                        (0:00:10 elapsed, 10714 items checked)
> [5/7] checking csums (without verifying data)  (0:00:02 elapsed, 414138 items checked)
> [6/7] checking root refs                       (0:00:00 elapsed, 90 items checked)
> [7/7] checking quota groups skipped (not enabled on this FS)
> found 3079343529984 bytes used, no error found
> total csum bytes: 3003340776
> total tree bytes: 3362635776
> total fs tree bytes: 177717248
> total extent tree bytes: 35635200
> btree space waste bytes: 153780830
> file data blocks allocated: 3077709344768
>  referenced 3077349277696
>
> A full balance is now failing:
>
> [Fri Nov 22 11:31:27 2019] BTRFS info (device sdb): relocating block group 8808400289792 flags data|raid5
> [Fri Nov 22 11:32:07 2019] BTRFS info (device sdb): found 74 extents
> [Fri Nov 22 11:32:24 2019] BTRFS info (device sdb): found 74 extents
> [Fri Nov 22 11:32:43 2019] BTRFS info (device sdb): relocating block group 8805179064320 flags data|raid5
> [Fri Nov 22 11:33:24 2019] BTRFS info (device sdb): found 61 extents
> [Fri Nov 22 11:33:44 2019] BTRFS info (device sdb): found 61 extents
> [Fri Nov 22 11:33:52 2019] BTRFS info (device sdb): relocating block group 8801957838848 flags data|raid5
> [Fri Nov 22 11:33:54 2019] BTRFS warning (device sdb): csum failed root -9 ino 307 off 131760128 csum 0x07436c62 expected csum 0x0001cbde mirror 1
> [Fri Nov 22 11:33:54 2019] BTRFS warning (device sdb): csum failed root -9 ino 307 off 131764224 csum 0xd009e874 expected csum 0x00000000 mirror 1
> [Fri Nov 22 11:33:54 2019] BTRFS warning (device sdb): csum failed root -9 ino 307 off 131760128 csum 0x07436c62 expected csum 0x0001cbde mirror 2
> [Fri Nov 22 11:33:54 2019] BTRFS warning (device sdb): csum failed root -9 ino 307 off 131764224 csum 0xd009e874 expected csum 0x00000000 mirror 2
> [Fri Nov 22 11:33:54 2019] BTRFS warning (device sdb): csum failed root -9 ino 307 off 131760128 csum 0x07436c62 expected csum 0x0001cbde mirror 1
> [Fri Nov 22 11:33:54 2019] BTRFS warning (device sdb): csum failed root -9 ino 307 off 131760128 csum 0x07436c62 expected csum 0x0001cbde mirror 2
> [Fri Nov 22 11:34:02 2019] BTRFS info (device sdb): balance: ended with status: -5

The csum error is from the data reloc tree, which is a tree used to
record the new (relocated) data.

So the good news is that your old data is not corrupted, and since we
hit EIO before switching tree blocks, the corrupted data is just
deleted.

I have also seen the bug on a single-device filesystem with DUP
metadata and SINGLE data, so I believe there is something wrong with
the data reloc tree itself.

The problem here is that I can't find a way to reproduce it, so it
will take us longer to debug.

Despite that, have you seen any other problem? Especially ENOSPC
(which needs the enospc_debug mount option to report details).

The only time I hit it, I was debugging an ENOSPC bug in relocation.

Thanks,
Qu

>
> Any idea how to proceed from here? The drives are relatively new, and
> there were no issues in the 2x 6Tb + 2x 1Tb configuration.
>
> Thanks in advance.
>
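For reference, one way to gather the enospc_debug information for just
the failing block group would be roughly the following; the mount point
and block-group address are taken from the log above, and the vrange
balance filter is documented in btrfs-balance(8):

  # remount with verbose ENOSPC reporting enabled
  mount -o remount,enospc_debug /mnt/media

  # retry only the block group that hit the csum error, so the failure
  # (and any ENOSPC dump) is easy to capture
  btrfs balance start -dvrange=8801957838848..8801957838849 /mnt/media

Any csum or ENOSPC messages from that run should then show up in dmesg.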