On 2019/8/22 5:38 AM, Peter Chant wrote:
> On 8/21/19 8:29 AM, Qu Wenruo wrote:
> 
>>> I'll run the checks shortly.
>>
>> Well, check will also report that transid mismatch, and possibly a lot
>> of extent tree corruption.
>>
> 
> 
> Depends on what is 'a lot', over 400 lines here:
> 
> parent transid verify failed on 11000765267968 wanted 2265511 found 2265437
> parent transid verify failed on 11000777834496 wanted 2265511 found 2265453
> parent transid verify failed on 11001243893760 wanted 2265512 found 2265500
[...]
> Ignoring transid failure
> ERROR: child eb corrupted: parent bytenr=11016181694464 item=83 parent
> level=1 child level=1
> ERROR: failed to repair root items: Input/output error
> Opening filesystem to check...
> Checking filesystem on /dev/mapper/data_disk_1
> UUID: 159b8826-8380-45be-acb6-0cb992a8dfd7
> 
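In case it helps to summarize those 400+ lines, here is a minimal sketch that tallies the gap between the wanted and found generation for each failing block. It assumes the check or dmesg output has been saved as text and keeps the "parent transid verify failed on ... wanted ... found ..." wording shown above; the function name transid_gaps is hypothetical, not part of btrfs-progs.

```python
import re

# Matches messages such as:
#   parent transid verify failed on 11000765267968 wanted 2265511 found 2265437
# \s+ also tolerates the line wrapping that mail clients introduce.
TRANSID_RE = re.compile(
    r"parent transid verify failed\s+on\s+(\d+)\s+wanted\s+(\d+)\s+found\s+(\d+)"
)


def transid_gaps(text):
    """Return {bytenr: (wanted, found, wanted - found)} for each failing block.

    Repeated messages for the same bytenr collapse into one entry.
    """
    gaps = {}
    for bytenr, wanted, found in TRANSID_RE.findall(text):
        w, f = int(wanted), int(found)
        gaps[int(bytenr)] = (w, f, w - f)
    return gaps
```

Feeding it the saved output and sorting by the third tuple element shows which tree blocks are furthest behind the generation their parent expects.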
>>> [   99.710315] EDAC amd64: Node 0: DRAM ECC disabled.
>>> [   99.710317] EDAC amd64: ECC disabled in the BIOS or no ECC
>>> capability, module will not load.
>>>                 Either enable ECC checking or force module loading by
>>> setting 'ecc_enable_override'.
>>>                 (Note that use of the override may cause unknown side
>>> effects.)
>> Not sure what the ECC part is doing, but it repeats quite a few times.
>> I'd assume it's unrelated though.
>>
> 
> Not sure either.  I've not got ECC RAM.  Motherboard is capable I think.
> 
> 
> 
>> [...]
>>> [  142.507291] BTRFS error (device dm-2): parent transid verify failed
>>> on 13395960053760 wanted 2265296 found 2263090
>>> [  142.544548] BTRFS error (device dm-2): parent transid verify failed
>>> on 13395960053760 wanted 2265296 found 2263090
>>> [  142.544561] BTRFS: error (device dm-2) in
>>> btrfs_run_delayed_refs:2907: errno=-5 IO failure
>>
>> This means btrfs is trying to read the extent tree for CoW, but by that
>> time the extent tree is already corrupted, so it returns -EIO.
>>
>> And btrfs_run_delayed_refs just returns the error.
>>
>> Not sure if it's related to device replace, but anyway the corruption
>> just happened.
>> The device replace may be an interesting clue, as currently our
>> dm-log-writes are mostly focused on single device usage.
> 
> Sorry, 'device replace'?  I've not done that lately.  I _may_ have tried
> that years back with this file system.  However, iirc it failed as the
> new, allegedly same-size disk was possibly slightly smaller.

OK, my bad, I misread the following lines:

[   99.237670] BTRFS info (device dm-2): device fsid
159b8826-8380-45be-acb6-0cb992a8dfd7 devid 4 moved old:/dev/dm-1
new:/dev/mapper/data_disk_1
[   99.241061] BTRFS info (device dm-4): device fsid
6b0245ec-bdd4-4076-b800-2243d466b174 devid 1 moved old:/dev/dm-4
new:/dev/mapper/nvme0_vg-lxc
[   99.242692] BTRFS info (device dm-2): device fsid
159b8826-8380-45be-acb6-0cb992a8dfd7 devid 3 moved old:/dev/dm-2

It's just a device path update, not a big deal.
> 
> From the above it looks like it is not a specific hardware failure.

Yep, no hardware related error message at all.

> 
> 
>>
>> Then I'd recommend the regular rescue procedure:
>> - Try that skip_bg patchset if possible
>>   This provides the best salvage method so far, with the full subvolume
>>   available, although it needs out-of-tree patches.
>>   https://patchwork.kernel.org/project/linux-btrfs/list/?series=130637
>>
> 
> I can give that a go, but not for a while.
> 
> I seem to be able to read the file system as is, as it goes read only.
> But perhaps 'seems' is the operative word.

As long as you can mount RO, it should be mostly OK for data salvage.
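
If some files do hit -EIO during the copy, a salvage run that records failures and keeps going may be more convenient than one that aborts. A minimal sketch, assuming the old filesystem is mounted read-only and the new one is mounted elsewhere; the function name salvage_tree and the mount points in the usage note are hypothetical, and plain cp or rsync would be the usual tools for this.

```python
import os
import shutil


def salvage_tree(src_root, dst_root):
    """Copy src_root into dst_root, skipping anything that raises OSError.

    Returns the list of source paths that could not be read or copied.
    """
    failed = []

    def note_walk_error(err):
        # os.walk calls this when a directory cannot be listed.
        failed.append(err.filename)

    for dirpath, _dirnames, filenames in os.walk(src_root, onerror=note_walk_error):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            try:
                # copy2 keeps timestamps; symlinks are copied as symlinks.
                shutil.copy2(src, os.path.join(target_dir, name),
                             follow_symlinks=False)
            except OSError:
                # Covers EIO from a corrupted extent as well as other I/O errors.
                failed.append(src)
    return failed
```

Something like salvage_tree("/mnt/old", "/mnt/new") would then return the paths to fetch from backup instead.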

Thanks,
Qu

> 
>> - btrfs-restore
>>   The regular unmounted recovery, needs extra space. Latest btrfs-progs
>>   recommended.
> 
> I've got the latest btrfs progs.  If neither of those two works, I have
> a backup.
> 
> So, basically, make a new file system and recover the data to it.  I've
> a new disk on the way, so I can create a file system as single and once
> I'm happy I've migrated data to it, wipe the old disks and move one or
> two to the new array and rebalance.
> 
>>
>> Thanks,
>> Qu
>>
> 
> Thank you, very much appreciated.
> 
> Pete
>