Re: first it froze, now the (btrfs) root fs won't mount ...

From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Christian Pernegger <pernegger@gmail.com>,
	linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: first it froze, now the (btrfs) root fs won't mount ...
Date: Sun, 20 Oct 2019 18:28:26 +0800	[thread overview]
Message-ID: <fe882b36-0f7b-5ad5-e62e-06def50acd30@gmx.com> (raw)
In-Reply-To: <CAKbQEqGwYCB1N+MQVYLNVm5o10acOFAa_JyO8NefGZfVtdyVBQ@mail.gmail.com>

[-- Attachment #1.1: Type: text/plain, Size: 12952 bytes --]

On 2019/10/20 下午6:22, Christian Pernegger wrote:
> [Please CC me, I'm not on the list.]
> 
> The current plan is to dump the whole NVMe with dd (ongoing ...) and
> experiment on that. Safer that way.
> 
> Question: Can I work with the mounted backup image on the machine that
> also contains the original disc? I vaguely recall something about
> btrfs really not liking clones.

If your fs only contains one device (single fs on single device), then
you should be mostly fine.

Btrfs doesn't like clones because it needs to assemble multiple devices,
but for single device fs, it should be mostly OK.

Thanks,
Qu

> 
> Cheers,
> Christian
> 
> 
> Am So., 20. Okt. 2019 um 09:41 Uhr schrieb Qu Wenruo <quwenruo.btrfs@gmx.com>:
>>
>>
>>
>> On 2019/10/20 下午3:01, Christian Pernegger wrote:
>>> [Please CC me, I'm not on the list.]
>>>
>>> Good morning & thank you.
>>>
>>> Am So., 20. Okt. 2019 um 02:38 Uhr schrieb Qu Wenruo <quwenruo.btrfs@gmx.com>:
>>>> It looks like you're using eGPU and the thunderbolt 3 connection disconnect?
>>>> That would cause a kernel panic/hang or whatever.
>>>
>>> No, it's a Radeon VII in a Gigabyte X570 Aorus Master. The board has
>>> PCIe 4, otherwise nothing exotic.
>>
>> Since Radeon 7 doesn't support PCIe 4, they would just negotiate to use
>> PCIE 3, thus really nothing exotic.
>>
>> Just a kernel bug in amdgpu.
>> But since you're already using Radeon 7, it's recommended to use newer
>> kernel for latest drm updates.
>>
>>>
>>>>> [...]
>>>>> BTRFS error [...]: bad tree block start, want 284041084928 have 0
>>>>> BTRFS error [...]: failed to read block groups: -5
>>>>> BTRFS error [...]: open_ctree failed
>>> ["big number" filled in above]
>>>
>>>> This means some tree blocks didn't reach disk or just got wiped out.
>>>> Are you using discard mount option?
>>>
>>> Not to my knowledge. As in, I didn't set "discard", as far as I can
>>> remember it didn't show up in mount output, but it's possible it's on
>>> by default.
>>
>> Discard won't turn on by default IIRC.
>> So it's not discard related.
>>
>>>
>>>>> running btrfs check gives:
>>>>> checksum verify failed on 284041084928 found E4E3BDB6 wanted 00000000
>>>>> checksum verify failed on 284041084928 found E4E3BDB6 wanted 00000000
>>
>> This matches the kernel output, means that tree block doesn't reach disk
>> at all.
>>
>>>>> bytenr mismatch, want=284041084928, have=0
>>>>> ERROR: cannot open filesystem.
>>> ["big number" and "8-digit hex" filled in above]
>>>
>>>> Again, some old tree blocks got wiped out.
>>>> BTW, you don't need to wipe the numbers, sometimes it help developer to find some corner problem.
>>>
>>> I was just being lazy, sorry about that.
>>>
>>>> If it's the only problem, you can try this kernel branch to at least do
>>>> a RO mount:
>>>> https://github.com/adam900710/linux/tree/rescue_options
>>>>
>>>> Then mount the fs with "rescue=skipbg,ro" option.
>>>> If the bad tree block is the only problem, it should be able to mount it.
>>>>
>>>> If that mount succeeded, and you can access all files, then it means
>>>> only extent tree is corrupted, then you can try btrfs check
>>>> --init-extent-tree, there are some reports of --init-extent-tree fixed
>>>> the problem.
>>>
>>> You wouldn't happen to know of a bootable rescue image that has this?
>>
>> Archlinux iso at least has the latest btrfs-progs.
>> You can try that.
>>
>> The latest btrfs check is not that super dangerous compared to older
>> versions.
>> You can try --init-extent-tree, if it finishes it should give you a more
>> or less mountable fs.
>>
>> If it crashes, then it shouldn't cause extra damage, but still it's not
>> 100% safe.
>>
>>
>> I'd recommend the following safer methods before trying --init-extent-tree:
>>
>> - Dump backup roots first:
>>   # btrfs ins dump-super -f <dev> | grep backup_treee_root
>>   Then grab all big numbers.
>>
>> - Try backup_extent_root numbers in btrfs check first
>>   # btrfs check -r <above big number> <dev>
>>   Use the number with highest generation first.
>>
>>   It's the equivalent of kernel usebackuproot mount option, but more
>>   control as you can try every backup and find which one can pass the
>>   extent tree failure.
>>
>>   If all backup fails to pass basic btrfs check, and all happen to have
>>   the same "wanted 00000000" then it means a big range of tree blocks
>>   get wiped out, not really related to btrfs but some hardware wipe.
>>
>>   If one can pass the initial mount and gives extra errors, then you can
>>   add --repair to hope for a better chance to repair.
>>
>>> The affected machine obviously doesn't boot, getting the NVMe out
>>> requires dismantling the CPU cooler, and TBH, I haven't built a kernel
>>> in ~15 years.
>>
>> The safest one is still that out-of-tree rescue patchset, especially
>> when we can't rule out other corruptions in other trees.
>> I should really push that patchset harder into mainline.
>>
>> Just another unrelated hardware recommend, since you're already using
>> Radeon 7 and X570 board, I guess using an AIO will make M.2 SSD more
>> accessible.
>>
>> Or keep the exotic tower cooler, and use an M.2 to PCIe adapter to make
>> your SSD more accessible, as CrossFire is already dead, I guess you have
>> some free PCIE x4 slots.
>>
>>>
>>>> About the cause, either btrfs didn't write some tree blocks correctly or
>>>> the NVMe doesn't implement FUA/FLUSH correctly (which I don't believe is
>>>> the case).
>>>>
>>>> So it's recommended to update the kernel to 5.3 kernel.
>>>
>>> FWIW, it's a Samsung 970 Evo Plus.
>>
>> It doesn't look like a hardware problem, but I keep my conclusion until
>> you have tried all backup roots.
>>
>> Thanks,
>> Qu
>>
>>> TBH, I didn't expect to lose more than the last couple minutes of
>>> writes in such a crash, certainly not an unmountable filesystem. So
>>> I'd love to know what caused this so I can avoid it in future.> But
>>> first things first, have to get this thing up & running again ...
>>>
>>> Cheers,
>>> Christian
>>>
>>
> 
> Am So., 20. Okt. 2019 um 12:11 Uhr schrieb Christian Pernegger
> <pernegger@gmail.com>:
>>
>> [Re-send, hit reply instead of reply-all by mistake. Please CC me, I'm
>> not on the list.]
>>
>> Good morning & thank you.
>>
>> Am So., 20. Okt. 2019 um 02:38 Uhr schrieb Qu Wenruo <quwenruo.btrfs@gmx.com>:
>>> It looks like you're using eGPU and the thunderbolt 3 connection disconnect?
>>> That would cause a kernel panic/hang or whatever.
>>
>> No, it's a Radeon VII in a Gigabyte X570 Aorus Master. The board has
>> PCIe 4, otherwise nothing exotic.
>>
>>>> [...]
>>>> BTRFS error [...]: bad tree block start, want 284041084928 have 0
>>>> BTRFS error [...]: failed to read block groups: -5
>>>> BTRFS error [...]: open_ctree failed
>> ["big number" filled in above]
>>
>>> This means some tree blocks didn't reach disk or just got wiped out.
>>> Are you using discard mount option?
>>
>> Not to my knowledge. As in, I didn't set "discard", as far as I can
>> remember it didn't show up in mount output, but it's possible it's on
>> by default.
>>
>>>> running btrfs check gives:
>>>> checksum verify failed on 284041084928 found E4E3BDB6 wanted 00000000
>>>> checksum verify failed on 284041084928 found E4E3BDB6 wanted 00000000
>>>> bytenr mismatch, want=284041084928, have=0
>>>> ERROR: cannot open filesystem.
>> ["big number" and "8-digit hex" filled in above]
>>
>>> Again, some old tree blocks got wiped out.
>>> BTW, you don't need to wipe the numbers, sometimes it help developer to find some corner problem.
>>
>> I was just being lazy, sorry about that.
>>
>>> If it's the only problem, you can try this kernel branch to at least do
>>> a RO mount:
>>> https://github.com/adam900710/linux/tree/rescue_options
>>>
>>> Then mount the fs with "rescue=skipbg,ro" option.
>>> If the bad tree block is the only problem, it should be able to mount it.
>>>
>>> If that mount succeeded, and you can access all files, then it means
>>> only extent tree is corrupted, then you can try btrfs check
>>> --init-extent-tree, there are some reports of --init-extent-tree fixed
>>> the problem.
>>
>> You wouldn't happen to know of a bootable rescue image that has this?
>> The affected machine obviously doesn't boot, getting the NVMe out
>> requires dismantling the CPU cooler, and TBH, I haven't built a kernel
>> in ~15 years.
>>
>>> About the cause, either btrfs didn't write some tree blocks correctly or
>>> the NVMe doesn't implement FUA/FLUSH correctly (which I don't believe is
>>> the case).
>>>
>>> So it's recommended to update the kernel to 5.3 kernel.
>>
>> FWIW, it's a Samsung 970 Evo Plus.
>> TBH, I didn't expect to lose more than the last couple minutes of
>> writes in such a crash, certainly not an unmountable filesystem. So
>> I'd love to know what caused this so I can avoid it in future. But
>> first things first, have to get this thing up & running again ...
>>
>> Cheers,
>> Christian
>>
>> Am So., 20. Okt. 2019 um 02:38 Uhr schrieb Qu Wenruo <quwenruo.btrfs@gmx.com>:
>>>
>>>
>>>
>>> On 2019/10/20 上午6:34, Christian Pernegger wrote:
>>>> [Please CC me, I'm not on the list.]
>>>>
>>>> Hello,
>>>>
>>>> I'm afraid I could use some help.
>>>>
>>>> The affected machine froze during a game, was entirely unresponsive
>>>> locally, though ssh still worked. For completeness' sake, dmesg had:
>>>> [110592.128512] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0
>>>> timeout, signaled seq=3404070, emitted seq=3404071
>>>> [110592.128545] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process
>>>> information: process Xorg pid 1191 thread Xorg:cs0 pid 1204
>>>> [110592.128549] amdgpu 0000:0c:00.0: GPU reset begin!
>>>> [110592.138530] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx
>>>> timeout, signaled seq=13149116, emitted seq=13149118
>>>> [110592.138577] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process
>>>> information: process Overcooked.exe pid 4830 thread dxvk-submit pid
>>>> 4856
>>>> [110592.138579] amdgpu 0000:0c:00.0: GPU reset begin!
>>>
>>> It looks like you're using eGPU and the thunderbolt 3 connection disconnect?
>>> That would cause a kernel panic/hang or whatever.
>>>
>>>>
>>>> Oh well, I thought, and "shutdown -h now" it. That quit my ssh session
>>>> and locked me out, but otherwise didn't take, no reboot, still frozen.
>>>> Alt-SysRq-REISUB it was. That did it.
>>>>
>>>> Only now all I get is a rescue shell, the pertinent messages look to
>>>> be [everything is copied off the screen by hand]:
>>>> [...]
>>>> BTRFS info [...]: disk space caching is enabled
>>>> BTRFS info [...]: has skinny extents
>>>> BTRFS error [...]: bad tree block start, want [big number] have 0
>>>> BTRFS error [...]: failed to read block groups: -5
>>>> BTRFS error [...]: open_ctree failed
>>>
>>> This means some tree blocks didn't reach disk or just got wiped out.
>>>
>>> Are you using discard mount option?
>>>
>>>>
>>>> Mounting with -o ro,usebackuproot doesn't change anything.
>>>>
>>>> running btrfs check gives:
>>>> checksum verify failed on [same big number] found [8 digits hex] wanted 00000000
>>>> checksum verify failed on [same big number] found [8 digits hex] wanted 00000000
>>>
>>> Again, some old tree blocks got wiped out.
>>>
>>> BTW, you don't need to wipe the numbers, sometimes it help developer to
>>> find some corner problem.
>>>
>>>> bytenr mismatch, want=[same big number], have=0
>>>> ERROR: cannot open filesystem.
>>>>
>>>> That's all I've got, I'd really appreciate some help. There's hourly
>>>> snapshots courtesy of Timeshift, though I have a feeling those won't
>>>> help ...
>>>
>>> If it's the only problem, you can try this kernel branch to at least do
>>> a RO mount:
>>> https://github.com/adam900710/linux/tree/rescue_options
>>>
>>> Then mount the fs with "rescue=skipbg,ro" option.
>>> If the bad tree block is the only problem, it should be able to mount it.
>>>
>>> If that mount succeeded, and you can access all files, then it means
>>> only extent tree is corrupted, then you can try btrfs check
>>> --init-extent-tree, there are some reports of --init-extent-tree fixed
>>> the problem.
>>>
>>>>
>>>> Oh, it's a recent Linux Mint 19.2 install, default layout (@, @home),
>>>> Timeshift enabled; on a single device (NVMe). HWE kernel (Kernel
>>>> 5.0.0-31-generic), btrfs-progs 4.15.1.
>>>
>>> About the cause, either btrfs didn't write some tree blocks correctly or
>>> the NVMe doesn't implement FUA/FLUSH correctly (which I don't believe is
>>> the case).
>>>
>>> So it's recommended to update the kernel to 5.3 kernel.
>>>
>>> Thanks,
>>> Qu
>>>
>>>>
>>>> TIA,
>>>> Christian
>>>>
>>>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]