On 2019/8/22 5:38 AM, Peter Chant wrote:
> On 8/21/19 8:29 AM, Qu Wenruo wrote:
>
>>> I'll run the checks shortly.
>>
>> Well, check will also report that transid mismatch, and possibly a lot
>> of extent tree corruption.
>>
>
> Depends on what is 'a lot', over 400 lines here:
>
> parent transid verify failed on 11000765267968 wanted 2265511 found 2265437
> parent transid verify failed on 11000777834496 wanted 2265511 found 2265453
> parent transid verify failed on 11001243893760 wanted 2265512 found 2265500
[...]
> Ignoring transid failure
> ERROR: child eb corrupted: parent bytenr=11016181694464 item=83 parent
> level=1 child level=1
> ERROR: failed to repair root items: Input/output error
> Opening filesystem to check...
> Checking filesystem on /dev/mapper/data_disk_1
> UUID: 159b8826-8380-45be-acb6-0cb992a8dfd7
>
>>> [ 99.710315] EDAC amd64: Node 0: DRAM ECC disabled.
>>> [ 99.710317] EDAC amd64: ECC disabled in the BIOS or no ECC
>>> capability, module will not load.
>>> Either enable ECC checking or force module loading by
>>> setting 'ecc_enable_override'.
>>> (Note that use of the override may cause unknown side
>>> effects.)
>> Not sure what the ECC part is doing, but it repeats quite some times.
>> I'd assume it's unrelated though.
>>
>
> Not sure either. I've not got ECC RAM. Motherboard is capable I think.
>
>> [...]
>>> [ 142.507291] BTRFS error (device dm-2): parent transid verify failed
>>> on 13395960053760 wanted 2265296 found 2263090
>>> [ 142.544548] BTRFS error (device dm-2): parent transid verify failed
>>> on 13395960053760 wanted 2265296 found 2263090
>>> [ 142.544561] BTRFS: error (device dm-2) in
>>> btrfs_run_delayed_refs:2907: errno=-5 IO failure
>>
>> This means, btrfs is trying to read extent tree for CoW, but at that
>> time, extent tree is already corrupted, thus it returns -EIO.
>>
>> And btrfs_run_delayed_refs just returns error.
>>
>> Not sure if it's related to device replace, but anyway the corruption
>> just happened.
>> The device replace may be an interesting clue, as currently our
>> dm-log-writes are mostly focused on single device usage.
>
> Sorry, 'device replace'? I've not done that lately. I _may_ have tried
> that years back with this file system. However, iirc it failed as the
> new, allegedly same size new disk was possibly slightly smaller.

OK, that was my misreading of the following lines:

[ 99.237670] BTRFS info (device dm-2): device fsid 159b8826-8380-45be-acb6-0cb992a8dfd7 devid 4 moved old:/dev/dm-1 new:/dev/mapper/data_disk_1
[ 99.241061] BTRFS info (device dm-4): device fsid 6b0245ec-bdd4-4076-b800-2243d466b174 devid 1 moved old:/dev/dm-4 new:/dev/mapper/nvme0_vg-lxc
[ 99.242692] BTRFS info (device dm-2): device fsid 159b8826-8380-45be-acb6-0cb992a8dfd7 devid 3 moved old:/dev/dm-2

It's just a device path update, not a big deal.

>
> From the above it looks like it is not a specific hardware failure.

Yep, no hardware related error message at all.

>
>>
>> Then I'd recommend to do regular rescue procedure:
>> - Try that skip_bg patchset if possible
>>   This provides the best salvage method so far, full subvolume
>>   available, although needs out-of-tree patches.
>>   https://patchwork.kernel.org/project/linux-btrfs/list/?series=130637
>>
>
> I can give that a go, but not for a while.
>
> I seem to be able to read the file system as is, as it goes read only.
> But perhaps 'seems' is the operative word.

As long as you can mount RO, it should be mostly OK for data salvage.

Thanks,
Qu

>
>> - btrfs-restore
>>   The regular unmounted recover, needs extra space. Latest btrfs-progs
>>   recommended.
>
> I've got the latest btrfs progs. If neither of those two work I have
> a backup.
>
> So, basically, make a new file system and recover the data to it. I've
> a new disk on the way, so I can create a file system as single and once
> I'm happy I've migrated data to it, wipe the old disks and move one or
> two to the new array and rebalance.
>
>>
>> Thanks,
>> Qu
>>
>
> Thank you, very much appreciated.
>
> Pete
>