On 2020/7/1 6:16 PM, Illia Bobyr wrote:
> On 6/30/2020 6:36 PM, Qu Wenruo wrote:
>> On 2020/7/1 3:41 AM, Illia Bobyr wrote:
>>> Hi,
>>>
>>> I have a btrfs with bcache setup that failed during a boot yesterday.
>>> There is one SSD with bcache that is used as a cache for 3 btrfs HDDs.
>>>
>>> Having read through a number of discussions, I've decided to ask for
>>> advice here.  Should I be running "btrfs check --repair"?
>>>
>>> The last messages in the dmesg log are these:
>>>
>>> Btrfs loaded, crc32c=crc32c-intel
>>> BTRFS: device label root devid 3 transid 138434 /dev/bcache2 scanned
>>> by btrfs (341)
>>> BTRFS: device label root devid 2 transid 138434 /dev/bcache1 scanned
>>> by btrfs (341)
>>> BTRFS: device label root devid 1 transid 138434 /dev/bcache0 scanned
>>> by btrfs (341)
>>> BTRFS info (device bcache0): disk space caching is enabled
>>> BTRFS info (device bcache0): has skinny extents
>>> BTRFS error (device bcache0): parent transid verify failed on
>>> 16984159518720 wanted 138414 found 138207
>>> BTRFS error (device bcache0): parent transid verify failed on
>>> 16984159518720 wanted 138414 found 138207
>>> BTRFS error (device bcache0): open_ctree failed
>>
>> Looks like some tree blocks were not written back correctly.
>>
>> Considering we don't have any known write-back related bugs in 5.6, I
>> guess bcache may be involved again?
>
> A bit more detail: the system started to misbehave.
> The interactive session was saying that the main file system had
> become read-only.

Do you have any dmesg output from that RO event?
That would be the most valuable info to help us locate the bug and fix it.

I guess something went wrong before that, and it somehow corrupted the
extent tree, breaking the copy-on-write protection of metadata and
screwing up everything.

> Then the SSH session disconnected and never reconnected.
> The machine did not seem to reboot correctly after I pressed the
> reboot button, so I did a hard reboot.
> And now it cannot mount the root partition any more.
>
>>> Trying to mount it in the recovery mode does not seem to work:
>>>
>>> [...]
>>>
>>> I have tried booting a live ISO with a 5.8.0 kernel and btrfs-progs
>>> v5.6.1 from http://defender.exton.net/.
>>> After booting, I tried mounting the bcache using the same command as
>>> above.  The only message in the console was "Killed".
>>> /dev/kmsg, on the other hand, lists messages very similar to the ones
>>> I've seen in the initramfs environment: https://pastebin.com/Vhy072Mx
>>
>> It looks like there is a chance to recover, as there is a root backup
>> with a newer generation.
>>
>> Meanwhile, the tree-checker is rejecting the newer-generation one.
>>
>> The kernel panic is caused by some corner-case error handling in the
>> root backup cleanup.
>> We need to fix it anyway.
>>
>> In this case, I guess "btrfs ins dump-super -fFa" output would help to
>> show whether it's possible to recover.
>
> Here is the output: https://pastebin.com/raw/DtJd813y

OK, the backup roots are fine.

So this means the metadata COW is corrupted, which caused the transid
mismatch.

>> Anyway, something looks strange.
>>
>> The backup roots having a newer generation while the super block is
>> still old doesn't look correct at all.
>
> Just in case, here is the output of "btrfs check", as suggested by
> "A L".  It does not seem to contain any new information.
>
> Opening filesystem to check...
> parent transid verify failed on 16984014372864 wanted 138350 found 131117
> parent transid verify failed on 16984014405632 wanted 138350 found 131127
> parent transid verify failed on 16984013406208 wanted 138350 found 131112
> parent transid verify failed on 16984075436032 wanted 138384 found 131136
> parent transid verify failed on 16984075436032 wanted 138384 found 131136
> parent transid verify failed on 16984075436032 wanted 138384 found 131136
> Ignoring transid failure
> ERROR: child eb corrupted: parent bytenr=16984175853568 item=8 parent
> level=2 child level=0
> ERROR: failed to read block groups: Input/output error
> ERROR: cannot open file system

The extent tree is completely screwed up; no wonder the transid errors
happen.

I don't believe it's reasonably possible to restore the fs to RW status.

The only remaining method left is btrfs restore, then.
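For reference, a minimal btrfs restore sequence might look like the
sketch below.  The target directory /mnt/recovery is an assumption, as
is the choice of /dev/bcache0 (any of the three devices should do), and
<bytenr> stands for a tree root bytenr taken from the dump-super output:

    # Dry run first: -D only lists what would be restored, writing nothing.
    btrfs restore -D -v /dev/bcache0 /mnt/recovery

    # Real run: -m restores owner/mode/timestamps, -S symlinks, -x xattrs.
    btrfs restore -v -m -S -x /dev/bcache0 /mnt/recovery

    # If the default tree root is unreadable, -t can point at an
    # alternative tree root, e.g. one of the backup roots from dump-super.
    btrfs restore -t <bytenr> -v /dev/bcache0 /mnt/recovery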
> As I was running the commands, I accidentally ran the following:
>
>     btrfs inspect-internal dump-super -fFa >/dev/bcache0 2>&1
>
> effectively overwriting the first 10 KiB of the partition :(

That's not a problem at all.
Btrfs reserves the first 1 MiB of the device, so as long as you don't
overwrite the super block at [64K, 68K) you're completely fine.

Thanks,
Qu

>
> It seems the superblock starts at 64 KiB.  So, I hope, this would not
> cause any more damage.
>
> P.S. Thanks a lot for your reply, Qu Wenruo!
>
> Thank you,
> Illia
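As a follow-up to the superblock question above, one quick sanity check
(a sketch; only the device path is carried over from the thread, while
the offsets come from the btrfs on-disk format) is to read the
superblock magic directly, or to let btrfs-progs validate the whole
superblock:

    # The primary superblock starts at 64 KiB; the "_BHRfS_M" magic
    # string sits 64 bytes into it, i.e. at absolute byte offset 65600.
    dd if=/dev/bcache0 bs=1 skip=65600 count=8 2>/dev/null; echo

    # Or have btrfs-progs read it and verify its checksum:
    btrfs inspect-internal dump-super /dev/bcache0

If either reports garbage, the mirror copies at 64 MiB and 256 GiB (on
devices large enough to have them) can still be inspected with
dump-super's -s option.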