On 2019/8/15 10:21 PM, Tim Walberg wrote:
> 'dump-super -Ffa' from all three devices attached.
>
> 'btrfs restore' did appear to recover most of the main data, minus
> snapshots, which would have greatly increased the required time and
> capacity, since I was recovering to XFS.

That's why I recommend that experimental patchset: it will make the fs
mountable (RO though), with all btrfs snapshots available.

>
> 'btrfs rescue chunk-recover' ran, but failed to fix anything.
> 'btrfs rescue super-recover' says all supers are fine.

Those are useless for your case.

>
> Initial corruption was due to a hard hang, which didn't leave enough
> crumbs to determine the source - might have been btrfs, might have
> been nvidia, might have been something completely different.

Anyway, the corruption is a little strange.

First of all, even a hard hang/power loss shouldn't cause btrfs to
overwrite its tree blocks, thus even if a hard hang/power loss happens,
btrfs shouldn't be corrupted.
But that's definitely not the case here. (We have quite a few such
reports, but haven't pinned down the cause yet)

Secondly, the generation of your fs is strange.
The latest generation of your tree root is 49750, which matches your
corrupted tree block, but your extent tree is definitely older.

So it looks like your super blocks (all nine!) reached disk before some
tree blocks reached the disk.

Finally, the superblock doesn't record the previous transaction
correctly. It doesn't have transaction 49749 in its backup roots.

Not 100% sure, but it looks somewhat like the problem fixed by this patch:
Btrfs: fix race leading to fs corruption after transaction abortion

It should have been backported to all stable releases recently.

Thanks,
Qu

>
>
> On 08/15/2019 22:07 +0800, Qu Wenruo wrote:
>>>
>>>
>>> On 2019/8/15 9:52, Tim Walberg wrote:
>>> > Had to wait for 'btrfs recover' to finish before I proceed farther.
>>> >
>>> > Kernel is 4.19.45, tools are 4.19.1
>>> >
>>> > File system is a 3-disk RAID10 with WD3003FZEX (WD Black 3TB)
>>> >
>>> > Output from attempting to mount:
>>> >
>>> > # mount -o ro,usebackuproot /dev/sdc1 /mnt
>>> > mount: wrong fs type, bad option, bad superblock on /dev/sdc1,
>>> > missing codepage or helper program, or other error
>>> >
>>> > In some cases useful info is found in syslog - try
>>> > dmesg | tail or so.
>>> >
>>> > Kernel messages from the mount attempt:
>>> >
>>> > [Thu Aug 15 08:47:42 2019] BTRFS info (device sdc1): trying to use backup root at mount time
>>> > [Thu Aug 15 08:47:42 2019] BTRFS info (device sdc1): disk space caching is enabled
>>> > [Thu Aug 15 08:47:42 2019] BTRFS info (device sdc1): has skinny extents
>>> > [Thu Aug 15 08:47:42 2019] BTRFS error (device sdc1): parent transid verify failed on 229846466560 wanted 49749 found 49750
>>> > [Thu Aug 15 08:47:42 2019] BTRFS error (device sdc1): parent transid verify failed on 229846466560 wanted 49749 found 49750
>>> > [Thu Aug 15 08:47:42 2019] BTRFS error (device sdc1): failed to read block groups: -5
>>>
>>> Extent tree corruption.
>>>
>>> So if that's the only corruption, you have a very high chance to recover
>>> most of your data.
>>>
>>> Btrfs rescue can work, or you can try the experimental patches which
>>> provide the rescue=skip_bg mount option to allow you to mount the fs RO
>>> and receive your data (the latter is way faster than user space rescue)
>>> https://patchwork.kernel.org/project/linux-btrfs/list/?series=130637
>>>
>>> Also, your dump super output doesn't provide much info.
>>>
>>> You would want to use the -Ffa option for more info.
>>> Also, you could try that on all 3 devices, to find out which one
>>> has the lower generation.
>>>
>>> Also, please provide the history of the corruption.
>>> One-generation corruption is a little rare. Is sudden power loss
>>> involved in this case?
>>>
>>> Thanks,
>>> Qu
>>>
>>> > [Thu Aug 15 08:47:42 2019] BTRFS error (device sdc1): open_ctree failed
>>> >
>>> > Output from 'btrfs check -p /dev/sdc1':
>>> >
>>> > # btrfs check -p /dev/sdc1
>>> > Opening filesystem to check...
>>> > parent transid verify failed on 229846466560 wanted 49749 found 49750
>>> > Ignoring transid failure
>>> > ERROR: child eb corrupted: parent bytenr=229845336064 item=0 parent level=1 child level=2
>>> > ERROR: cannot open file system
>>> >
>>> >
>>> >
>>> > On 08/15/2019 10:35 +0800, Qu Wenruo wrote:
>>> >>>
>>> >>>
>>> >>> On 2019/8/15 2:32, Tim Walberg wrote:
>>> >>> > Most of the recommendations I've found online deal with when "wanted" is
>>> >>> > greater than "found", which, if I understand correctly means that one or
>>> >>> > more transactions were interrupted/lost before fully committed.
>>> >>>
>>> >>> No matter what the case is, a proper transaction shouldn't have any tree
>>> >>> block overwritten.
>>> >>>
>>> >>> That means, either the FLUSH/FUA of the hardware/lower block layer is
>>> >>> screwed up, or the COW of tree block is already screwed up.
>>> >>>
>>> >>> >
>>> >>> > Are the recommendations for recovery the same if the system is reporting a
>>> >>> > "wanted" that is less than "found"?
>>> >>> >
>>> >>> The salvage is no different from any other transid mismatch, no matter
>>> >>> if it's larger or smaller.
>>> >>>
>>> >>> It depends on the tree block.
>>> >>>
>>> >>> Please provide full dmesg output and btrfs check for further advice.
>>> >>>
>>> >>> Thanks,
>>> >>> Qu
>>> >>>
>>> >
>>> >
>>> >
>>> >
>>> >
>
>
> End of included message
>
>
>
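
For anyone triaging a similar report, here is a minimal sketch (not part of btrfs-progs) that pulls the "parent transid verify failed" entries out of dmesg or 'btrfs check' output and says whether the on-disk generation is newer or older than the one the parent expected. The function names and the regular expression are my own assumptions, based only on the message text quoted in this thread; other kernel or tool versions may word the message differently.

```python
import re

# Assumption: the message wording matches the lines quoted in this thread
# (kernel 4.19.45, btrfs-progs 4.19.1); other versions may differ.
TRANSID_RE = re.compile(
    r"parent transid verify failed on (\d+) wanted (\d+) found (\d+)"
)


def parse_transid_errors(log_text):
    """Return unique (bytenr, wanted, found) tuples in order of first appearance."""
    seen = []
    for match in TRANSID_RE.finditer(log_text):
        entry = tuple(int(group) for group in match.groups())
        if entry not in seen:
            seen.append(entry)
    return seen


def describe(entry):
    """Summarize one mismatch in plain words."""
    bytenr, wanted, found = entry
    if found > wanted:
        relation = "newer than"
    elif found < wanted:
        relation = "older than"
    else:
        relation = "equal to"
    return (
        f"block {bytenr}: on-disk generation {found} "
        f"is {relation} the expected {wanted}"
    )


# Sample input copied from the kernel messages earlier in the thread.
SAMPLE = """\
[Thu Aug 15 08:47:42 2019] BTRFS error (device sdc1): parent transid verify failed on 229846466560 wanted 49749 found 49750
[Thu Aug 15 08:47:42 2019] BTRFS error (device sdc1): parent transid verify failed on 229846466560 wanted 49749 found 49750
"""

if __name__ == "__main__":
    for error in parse_transid_errors(SAMPLE):
        print(describe(error))
```

This only classifies the mismatch. As noted above, the salvage approach is the same in both directions and depends on which tree the block belongs to.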