[Resent because the message was too long for the list]

On Tue, 2020-08-11 at 13:17 -0600, Chris Murphy wrote:
> > > My advice is to mount ro, backup (or two copies for important
> > > info), and start with a new Btrfs file system and restore. It's
> > > not worth repairing.
> >
> > Sigh, I was expecting I'd have to do this. At least no data was
> > lost, and the system still functions even though it's read-only.
> > Do you think check --repair is not worth trying? Everything of
> > value is already backed up, but restoring it would take many hours
> > of work.
> >
> > Metadata, RAID10: total=9.00GiB, used=7.57GiB
>
> Ballpark 8 hours for --repair given metadata size and spinning
> drives. It'll add some time adding --init-extent-tree which... is
> decently likely to be needed here. So the gotcha is, see if --repair
> works, and it fixes some stuff but still needs the extent tree
> repaired anyway. Now you have to do that and it could be another 8
> hours. Or do you go with the heavy hammer right away to save time
> and do both at once? But the heavy hammer is riskier.
>
> Whether you repair or start over, you need to have the backup plus
> 2x for important stuff. To do the repair you need to be prepared for
> the possibility that things get worse. I'll argue strongly that it's
> a bug if things get worse (i.e. now you can't mount ro at all), but
> as a risk assessment it has to be considered.

So, I've finally managed to get someone to add a disk to this system
and ran a btrfs check --repair. It failed almost immediately with:

    Starting repair.
    Opening filesystem to check...
    Checking filesystem on /dev/disk/by-label/Susanita
    UUID: 4d3acf20-d408-49ab-b0a6-182396a9f27c
    [1/7] checking root items
    checksum verify failed on 10919566688256 found 0000006E wanted 00000066
    checksum verify failed on 10919566688256 found 0000006E wanted 00000066
    bad tree block 10919566688256, bytenr mismatch, want=10919566688256, have=17196831625821864417
    ERROR: failed to repair root items: Input/output error

so I ran btrfs check --init-extent-tree, and it's still running after
24 hours.
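For completeness, the commands were roughly the following, all run
against the unmounted filesystem (device path as in the log above):

    # Read-only check; reports problems without modifying anything.
    btrfs check --readonly /dev/disk/by-label/Susanita

    # The repair attempt that failed on the root items.
    btrfs check --repair /dev/disk/by-label/Susanita

    # The heavy hammer, still running: rebuild the extent tree.
    btrfs check --init-extent-tree /dev/disk/by-label/Susanita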
It seems to have processed 2 GiB of... something:

    [2/7] checking extents (0:04:22 elapsed, 434185 items checked)
    ref mismatch on [331916251136 4096] extent item 0, found 1
    data backref 331916251136 parent 10915911958528 owner 0 offset 0 num_refs 0 not found in extent tree
    incorrect local backref count on 331916251136 parent 10915911958528 owner 0 offset 0 found 1 wanted 0 back 0x557cdf7560f0
    backpointer mismatch on [331916251136 4096]
    adding new data backref on 331916251136 parent 10915911958528 owner 0 offset 0 found 1
    Repaired extent references for 331916251136

[24 hours later]

    [2/7] checking extents (23:47:26 elapsed, 434185 items checked)
    ref mismatch on [334605303808 188416] extent item 0, found 2
    data backref 334605303808 parent 10915986505728 owner 0 offset 0 num_refs 0 not found in extent tree
    incorrect local backref count on 334605303808 parent 10915986505728 owner 0 offset 0 found 1 wanted 0 back 0x557ce0ac16c0
    data backref 334605303808 root 10455 owner 219090 offset 921600 num_refs 0 not found in extent tree
    incorrect local backref count on 334605303808 root 10455 owner 219090 offset 921600 found 1 wanted 0 back 0x557d14faebc0
    backpointer mismatch on [334605303808 188416]
    adding new data backref on 334605303808 parent 10915986505728 owner 0 offset 0 found 1
    adding new data backref on 334605303808 root 10455 owner 219090 offset 921600 found 1
    Repaired extent references for 334605303808

But now I've got no idea whether it's doing something useful or
whether I'd better ^C it and give up on this filesystem. I've
attached the logs of the ongoing repair and of the read-only check I
ran immediately before.

Cheers.
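PS: If I do end up giving up, my fallback would be roughly Chris's
original advice. A sketch, assuming the damaged filesystem still
mounts read-only and the new disk shows up as /dev/sdX (placeholder):

    # Mount the damaged filesystem read-only and copy everything off.
    mkdir -p /mnt/old /mnt/new
    mount -o ro /dev/disk/by-label/Susanita /mnt/old

    # Fresh filesystem on the new disk, then copy the data over,
    # preserving hard links, ACLs and xattrs.
    mkfs.btrfs /dev/sdX
    mount /dev/sdX /mnt/new
    rsync -aHAX /mnt/old/ /mnt/new/

    # If the read-only mount stops working, btrfs restore can still
    # pull files off the unmounted device.
    btrfs restore /dev/disk/by-label/Susanita /mnt/new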