On 2018/12/24 8:58 AM, Chris Murphy wrote:
> On Sat, Dec 22, 2018 at 10:22 AM Peter Chant wrote:
>
>> btrfs rescue super -v /dev/sdb2
> ...
>> All supers are valid, no need to recover
>>
>>
>> btrfs insp dump-s -f
> ...
>> generation 7937947
> ...
>> backup 0:
>> backup_tree_root: 1113909100544 gen: 7937935 level: 1
> ...
>> backup 1:
>> backup_tree_root: 1113907347456 gen: 7937936 level: 1
> ...
>> backup 2:
>> backup_tree_root: 1113911951360 gen: 7937937 level: 1
> ...
>> backup 3:
>> backup_tree_root: 1113907494912 gen: 7937934 level: 1
> ...
>
>
> The kernel wrote out three valid checksummed supers, with what seems
> to be a rather significant sanity violation. The super generation and
> tree root address do not match any of the backup tree roots. The
> *current* tree root is supposed to be in one of the backups as well.

Oh, I missed this.

Indeed, the super generation should match one of the backup roots.

There are cases where we don't record backup roots, but that only
happens for fsync() calls, and in that case only log_root should be
updated, not the root generation.
(I'm not that familiar with the fsync() or log tree code, though, so I
could be totally wrong.)

But at least log_root is 0 here, so it's less likely that the fsync()
routine caused this.

>
> Qu, any idea how this is even theoretically possible? Bit flip right
> before the super is computed and checksummed?

7937947 = 0x791f9b
7937937 = 0x791f91

Only the lowest half byte differs, 0xb vs 0x1:
0xb = 1011
0x1 = 0001

That's 2 bits flipped, not 1. I'm not sure that's possible.

> Seems like some kind of
> corruption before checksum is computed.
>
>
>> I'm getting suspicious of the drive as when I was trying the various
>> btrfs rescue * tools I saw a 'bad block', or similar, error displayed.
>> I also have a separate basic install on ext4 on the same disk. Though
>> e2fsck shows no errors and mounts fine I cannot log into that install.
>> Maybe a coincidence, but too many bad things thrown up make me
>> suspicious.
>> Whatever is happening this seems to be really fighting me.
>
> I'm not sure how even a bad device accounts for the super generation
> and backup mismatches. That's damn strange.

Yes, indeed.

I'd recommend running "btrfs check -r 1113911951360" (the tree root of
backup 2, the newest backup root) to verify whether only the superblock
generation is corrupted.

Thanks,
Qu

>
> If you get bored with the back and forth and just want to give up,
> that's fine. I suggest that if you have the time and space, to take a
> btrfs-image in case Qu or some other developer wants to look at this
> file system at some point. The btrfs-image is a read only process, can
> be set to scrub filenames, and only contains metadata. Size of the
> resulting file is around 1/2 of the size of metadata, when doing
> 'btrfs filesystem usage' or 'btrfs filesystem df'. So you'll need that
> much free space to direct the command to.
>
> btrfs-image -ss -c9 -t4 pathtofile
>
> It might fail, if so you can try adding -w and see if that helps.
>
> There is no log listed in the super so zero-log isn't indicated, and
> also tells me there were no fsync's still flushing at the time of the
> crash. The loss should be at most a minute of data, not an
> inconsistent file system that can't be mounted anymore. Pretty weird.
>
> What were your mount options? Defaults? Anything custom like discard,
> commit=, notreelog? Any non-default mount options themselves would not
> be the cause of the problem, but might suggest partial ideas for what
> might have happened.
>
>
>
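
For anyone who wants to re-check the generation comparison in this
thread, here is a minimal Python sketch. It is not part of btrfs-progs;
the names (super_generation, backup_generations, flipped_bits) are made
up for illustration, and the numbers are copied by hand from the
dump-super output quoted above. It checks whether any backup slot has
the super generation, then counts how many bits differ between the super
generation and the newest backup generation.

```python
# Values copied by hand from the "btrfs insp dump-s -f" output in this thread.
super_generation = 7937947
backup_generations = {0: 7937935, 1: 7937936, 2: 7937937, 3: 7937934}


def flipped_bits(a: int, b: int) -> int:
    """Return the number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


# A healthy super is expected to have its generation in one backup slot.
matches = [slot for slot, gen in backup_generations.items()
           if gen == super_generation]

# The newest backup is the closest candidate for the "real" generation.
newest_slot = max(backup_generations, key=backup_generations.get)
newest_gen = backup_generations[newest_slot]

print("backup slots matching super generation:", matches)
print(f"super generation : {super_generation:#x}")
print(f"newest backup    : {newest_gen:#x} (slot {newest_slot})")
print("differing bits   :", flipped_bits(super_generation, newest_gen))
```

With the numbers above, no slot matches, the newest backup is slot 2,
and the two generations differ in 2 bits, which agrees with the
hand calculation (0xb vs 0x1).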