* first it froze, now the (btrfs) root fs won't mount ...
       [not found] <CAKbQEqE7xN1q3byFL7-_pD=_pGJ0Vm9pj7d-g+rRgtONeH-GrA@mail.gmail.com>
@ 2019-10-19 22:34 ` Christian Pernegger
  2019-10-20  0:38   ` Qu Wenruo
  0 siblings, 1 reply; 29+ messages in thread
From: Christian Pernegger @ 2019-10-19 22:34 UTC (permalink / raw)
To: linux-btrfs

[Please CC me, I'm not on the list.]

Hello,

I'm afraid I could use some help.

The affected machine froze during a game and was entirely unresponsive
locally, though ssh still worked. For completeness' sake, dmesg had:

[110592.128512] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=3404070, emitted seq=3404071
[110592.128545] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1191 thread Xorg:cs0 pid 1204
[110592.128549] amdgpu 0000:0c:00.0: GPU reset begin!
[110592.138530] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=13149116, emitted seq=13149118
[110592.138577] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Overcooked.exe pid 4830 thread dxvk-submit pid 4856
[110592.138579] amdgpu 0000:0c:00.0: GPU reset begin!

Oh well, I thought, and "shutdown -h now" it. That quit my ssh session
and locked me out, but otherwise didn't take: no reboot, still frozen.
Alt-SysRq-REISUB it was. That did it.

Only now all I get is a rescue shell; the pertinent messages look to
be [everything is copied off the screen by hand]:

[...]
BTRFS info [...]: disk space caching is enabled
BTRFS info [...]: has skinny extents
BTRFS error [...]: bad tree block start, want [big number] have 0
BTRFS error [...]: failed to read block groups: -5
BTRFS error [...]: open_ctree failed

Mounting with -o ro,usebackuproot doesn't change anything.

Running btrfs check gives:

checksum verify failed on [same big number] found [8 digits hex] wanted 00000000
checksum verify failed on [same big number] found [8 digits hex] wanted 00000000
bytenr mismatch, want=[same big number], have=0
ERROR: cannot open filesystem.

That's all I've got; I'd really appreciate some help. There are hourly
snapshots courtesy of Timeshift, though I have a feeling those won't
help ...

Oh, it's a recent Linux Mint 19.2 install, default layout (@, @home),
Timeshift enabled, on a single device (NVMe). HWE kernel
(5.0.0-31-generic), btrfs-progs 4.15.1.

TIA,
Christian

^ permalink raw reply	[flat|nested] 29+ messages in thread
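[Editor's note: the read-only recovery attempts described in this message amount to the following commands. The device path and mount point are placeholders, not taken from the thread; adjust them to the actual layout.]

```shell
# Assumed device path for the single-device btrfs root fs (placeholder).
DEV=/dev/nvme0n1p2

# Read-only mount that also tries older superblock backup roots.
mount -o ro,usebackuproot "$DEV" /mnt

# Offline check; read-only by default, run only on an unmounted fs.
btrfs check "$DEV"
```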
* Re: first it froze, now the (btrfs) root fs won't mount ...
  2019-10-19 22:34 ` first it froze, now the (btrfs) root fs won't mount ... Christian Pernegger
@ 2019-10-20  0:38 ` Qu Wenruo
  2019-10-20 10:11   ` Christian Pernegger
  0 siblings, 1 reply; 29+ messages in thread
From: Qu Wenruo @ 2019-10-20 0:38 UTC (permalink / raw)
To: Christian Pernegger, linux-btrfs

On 2019/10/20 6:34 AM, Christian Pernegger wrote:
> The affected machine froze during a game, was entirely unresponsive
> locally, though ssh still worked. For completeness' sake, dmesg had:
> [110592.128512] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0
> timeout, signaled seq=3404070, emitted seq=3404071
> [110592.128545] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process
> information: process Xorg pid 1191 thread Xorg:cs0 pid 1204
> [110592.128549] amdgpu 0000:0c:00.0: GPU reset begin!
> [110592.138530] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx
> timeout, signaled seq=13149116, emitted seq=13149118
> [110592.138577] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process
> information: process Overcooked.exe pid 4830 thread dxvk-submit pid
> 4856
> [110592.138579] amdgpu 0000:0c:00.0: GPU reset begin!

It looks like you're using an eGPU and the Thunderbolt 3 connection
dropped? That could cause a kernel panic/hang or whatever.

> Only now all I get is a rescue shell, the pertinent messages look to
> be [everything is copied off the screen by hand]:
> [...]
> BTRFS info [...]: disk space caching is enabled
> BTRFS info [...]: has skinny extents
> BTRFS error [...]: bad tree block start, want [big number] have 0
> BTRFS error [...]: failed to read block groups: -5
> BTRFS error [...]: open_ctree failed

This means some tree blocks didn't reach disk or just got wiped out.
Are you using the discard mount option?

> Mounting with -o ro,usebackuproot doesn't change anything.
>
> running btrfs check gives:
> checksum verify failed on [same big number] found [8 digits hex] wanted 00000000
> checksum verify failed on [same big number] found [8 digits hex] wanted 00000000

Again, some old tree blocks got wiped out.
BTW, you don't need to wipe the numbers; sometimes they help developers
find corner-case problems.

> bytenr mismatch, want=[same big number], have=0
> ERROR: cannot open filesystem.
>
> That's all I've got, I'd really appreciate some help. There's hourly
> snapshots courtesy of Timeshift, though I have a feeling those won't
> help ...

If it's the only problem, you can try this kernel branch to at least do
a RO mount:
https://github.com/adam900710/linux/tree/rescue_options

Then mount the fs with the "rescue=skipbg,ro" options.
If the bad tree block is the only problem, it should be able to mount.

If that mount succeeds and you can access all files, then only the
extent tree is corrupted, and you can try btrfs check
--init-extent-tree; there are some reports of --init-extent-tree
fixing the problem.

> Oh, it's a recent Linux Mint 19.2 install, default layout (@, @home),
> Timeshift enabled; on a single device (NVMe). HWE kernel (Kernel
> 5.0.0-31-generic), btrfs-progs 4.15.1.

About the cause: either btrfs didn't write some tree blocks correctly,
or the NVMe doesn't implement FUA/FLUSH correctly (which I don't
believe is the case).

So it's recommended to update to a 5.3 kernel.

Thanks,
Qu
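[Editor's note: Qu's suggestion, sketched as commands. This assumes a kernel built from the rescue_options branch linked above; the device path is a placeholder. Later mainline kernels expose similar functionality under a different name (rescue=ignorebadroots, if memory serves), so the exact option spelling depends on the kernel in use.]

```shell
# Assumed device path (placeholder).
DEV=/dev/nvme0n1p2

# With a rescue_options-branch kernel, mount read-only while skipping
# the block-group (extent tree) reads that currently fail:
mount -o ro,rescue=skipbg "$DEV" /mnt

# If all files are readable, the damage is likely confined to the
# extent tree. Only then, and ideally on a copy of the disk, attempt
# a rebuild (this rewrites metadata and is not risk-free):
# btrfs check --init-extent-tree "$DEV"
```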
* Re: first it froze, now the (btrfs) root fs won't mount ...
  2019-10-20  0:38 ` Qu Wenruo
@ 2019-10-20 10:11 ` Christian Pernegger
  2019-10-20 10:22   ` Christian Pernegger
  0 siblings, 1 reply; 29+ messages in thread
From: Christian Pernegger @ 2019-10-20 10:11 UTC (permalink / raw)
To: linux-btrfs

[Re-send, hit reply instead of reply-all by mistake. Please CC me, I'm
not on the list.]

Good morning & thank you.

On Sun, 20 Oct 2019 at 02:38, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> It looks like you're using an eGPU and the Thunderbolt 3 connection
> dropped? That could cause a kernel panic/hang or whatever.

No, it's a Radeon VII in a Gigabyte X570 Aorus Master. The board has
PCIe 4, otherwise nothing exotic.

> > BTRFS error [...]: bad tree block start, want 284041084928 have 0
> > BTRFS error [...]: failed to read block groups: -5
> > BTRFS error [...]: open_ctree failed
["big number" filled in above]

> This means some tree blocks didn't reach disk or just got wiped out.
> Are you using the discard mount option?

Not to my knowledge. As in, I didn't set "discard", and as far as I can
remember it didn't show up in mount output, but it's possible it's on
by default.

> > running btrfs check gives:
> > checksum verify failed on 284041084928 found E4E3BDB6 wanted 00000000
> > checksum verify failed on 284041084928 found E4E3BDB6 wanted 00000000
> > bytenr mismatch, want=284041084928, have=0
> > ERROR: cannot open filesystem.
["big number" and "8-digit hex" filled in above]

> Again, some old tree blocks got wiped out.
> BTW, you don't need to wipe the numbers; sometimes they help developers
> find corner-case problems.

I was just being lazy, sorry about that.

> If it's the only problem, you can try this kernel branch to at least do
> a RO mount:
> https://github.com/adam900710/linux/tree/rescue_options
>
> Then mount the fs with the "rescue=skipbg,ro" options.
> If the bad tree block is the only problem, it should be able to mount.
>
> If that mount succeeds and you can access all files, then only the
> extent tree is corrupted, and you can try btrfs check
> --init-extent-tree; there are some reports of --init-extent-tree
> fixing the problem.

You wouldn't happen to know of a bootable rescue image that has this?
The affected machine obviously doesn't boot, getting the NVMe out
requires dismantling the CPU cooler, and TBH, I haven't built a kernel
in ~15 years.

> About the cause: either btrfs didn't write some tree blocks correctly,
> or the NVMe doesn't implement FUA/FLUSH correctly (which I don't
> believe is the case).
>
> So it's recommended to update to a 5.3 kernel.

FWIW, it's a Samsung 970 Evo Plus.
TBH, I didn't expect to lose more than the last couple of minutes of
writes in such a crash, certainly not an unmountable filesystem. So
I'd love to know what caused this so I can avoid it in the future. But
first things first, I have to get this thing up & running again ...

Cheers,
Christian
* Re: first it froze, now the (btrfs) root fs won't mount ...
  2019-10-20 10:11 ` Christian Pernegger
@ 2019-10-20 10:22 ` Christian Pernegger
  2019-10-20 10:28   ` Qu Wenruo
  0 siblings, 1 reply; 29+ messages in thread
From: Christian Pernegger @ 2019-10-20 10:22 UTC (permalink / raw)
To: linux-btrfs

[Please CC me, I'm not on the list.]

The current plan is to dump the whole NVMe with dd (ongoing ...) and
experiment on that. Safer that way.

Question: Can I work with the mounted backup image on the machine that
also contains the original disc? I vaguely recall something about
btrfs really not liking clones.

Cheers,
Christian

On Sun, 20 Oct 2019 at 09:41, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
> On 2019/10/20 3:01 PM, Christian Pernegger wrote:
> > No, it's a Radeon VII in a Gigabyte X570 Aorus Master. The board has
> > PCIe 4, otherwise nothing exotic.
>
> Since the Radeon VII doesn't support PCIe 4, they would just negotiate
> PCIe 3, so really nothing exotic.
>
> Just a kernel bug in amdgpu.
> But since you're already using a Radeon VII, it's recommended to use a
> newer kernel for the latest drm updates.
>
> > Not to my knowledge. As in, I didn't set "discard", as far as I can
> > remember it didn't show up in mount output, but it's possible it's on
> > by default.
>
> Discard isn't on by default IIRC.
> So it's not discard related.
>
> > > checksum verify failed on 284041084928 found E4E3BDB6 wanted 00000000
> > > checksum verify failed on 284041084928 found E4E3BDB6 wanted 00000000
>
> This matches the kernel output: that tree block didn't reach the disk
> at all.
>
> > You wouldn't happen to know of a bootable rescue image that has this?
>
> The Arch Linux ISO at least has the latest btrfs-progs.
> You can try that.
>
> The latest btrfs check is not that super dangerous compared to older
> versions.
> You can try --init-extent-tree; if it finishes, it should give you a
> more or less mountable fs.
>
> If it crashes, it shouldn't cause extra damage, but it's still not
> 100% safe.
>
> I'd recommend the following safer methods before trying
> --init-extent-tree:
>
> - Dump the backup roots first:
>   # btrfs ins dump-super -f <dev> | grep backup_tree_root
>   Then grab all the big numbers.
>
> - Try those backup_tree_root numbers in btrfs check first:
>   # btrfs check -r <above big number> <dev>
>   Use the number with the highest generation first.
>
> It's the equivalent of the kernel usebackuproot mount option, but with
> more control, as you can try every backup and find which one passes the
> extent tree failure.
>
> If all backups fail the basic btrfs check, and all happen to hit the
> same "wanted 00000000", then a big range of tree blocks got wiped out,
> which is not really a btrfs issue but some hardware wipe.
>
> If one passes the initial check but gives extra errors, you can add
> --repair for a better chance of repairing.
>
> > The affected machine obviously doesn't boot, getting the NVMe out
> > requires dismantling the CPU cooler, and TBH, I haven't built a kernel
> > in ~15 years.
>
> The safest option is still that out-of-tree rescue patchset, especially
> since we can't rule out corruption in other trees.
> I should really push that patchset harder into mainline.
>
> Just another unrelated hardware recommendation: since you're already
> using a Radeon VII and an X570 board, I guess an AIO would make the
> M.2 SSD more accessible.
>
> Or keep the exotic tower cooler and use an M.2-to-PCIe adapter to make
> your SSD more accessible; as CrossFire is already dead, I guess you
> have some free PCIe x4 slots.
>
> > FWIW, it's a Samsung 970 Evo Plus.
>
> It doesn't look like a hardware problem, but I'll hold my conclusion
> until you've tried all the backup roots.
>
> Thanks,
> Qu
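[Editor's note: the backup-root procedure Qu quotes above can be sketched as follows. The device path is a placeholder; `ins` is the documented shorthand for `inspect-internal`.]

```shell
# Assumed device path (placeholder).
DEV=/dev/nvme0n1p2

# List the backup tree roots recorded in the superblock, with a line of
# context so the matching generation numbers are visible too.
btrfs inspect-internal dump-super -f "$DEV" | grep -A1 backup_tree_root

# Feed each backup root bytenr to btrfs check, highest generation first.
# -r overrides the tree root used; the check itself stays read-only.
# btrfs check -r <backup_tree_root_bytenr> "$DEV"
```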
* Re: first it froze, now the (btrfs) root fs won't mount ...
  2019-10-20 10:22 ` Christian Pernegger
@ 2019-10-20 10:28 ` Qu Wenruo
  2019-10-21 10:47   ` Christian Pernegger
  0 siblings, 1 reply; 29+ messages in thread
From: Qu Wenruo @ 2019-10-20 10:28 UTC (permalink / raw)
To: Christian Pernegger, linux-btrfs

On 2019/10/20 6:22 PM, Christian Pernegger wrote:
> [Please CC me, I'm not on the list.]
>
> The current plan is to dump the whole NVMe with dd (ongoing ...) and
> experiment on that. Safer that way.
>
> Question: Can I work with the mounted backup image on the machine that
> also contains the original disc? I vaguely recall something about
> btrfs really not liking clones.

If your fs contains only one device (a single fs on a single device),
then you should be mostly fine.

Btrfs doesn't like clones because it needs to assemble multiple
devices, but for a single-device fs it should be mostly OK.

Thanks,
Qu
>>> >>> BTW, you don't need to wipe the numbers, sometimes it help developer to >>> find some corner problem. >>> >>>> bytenr mismatch, want=[same big number], have=0 >>>> ERROR: cannot open filesystem. >>>> >>>> That's all I've got, I'd really appreciate some help. There's hourly >>>> snapshots courtesy of Timeshift, though I have a feeling those won't >>>> help ... >>> >>> If it's the only problem, you can try this kernel branch to at least do >>> a RO mount: >>> https://github.com/adam900710/linux/tree/rescue_options >>> >>> Then mount the fs with "rescue=skipbg,ro" option. >>> If the bad tree block is the only problem, it should be able to mount it. >>> >>> If that mount succeeded, and you can access all files, then it means >>> only extent tree is corrupted, then you can try btrfs check >>> --init-extent-tree, there are some reports of --init-extent-tree fixed >>> the problem. >>> >>>> >>>> Oh, it's a recent Linux Mint 19.2 install, default layout (@, @home), >>>> Timeshift enabled; on a single device (NVMe). HWE kernel (Kernel >>>> 5.0.0-31-generic), btrfs-progs 4.15.1. >>> >>> About the cause, either btrfs didn't write some tree blocks correctly or >>> the NVMe doesn't implement FUA/FLUSH correctly (which I don't believe is >>> the case). >>> >>> So it's recommended to update the kernel to 5.3 kernel. >>> >>> Thanks, >>> Qu >>> >>>> >>>> TIA, >>>> Christian >>>> >>> [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: first it froze, now the (btrfs) root fs won't mount ...
  2019-10-20 10:28 ` Qu Wenruo
@ 2019-10-21 10:47 ` Christian Pernegger
  2019-10-21 10:55 ` Qu Wenruo
  2019-10-21 11:47 ` Austin S. Hemmelgarn
  0 siblings, 2 replies; 29+ messages in thread
From: Christian Pernegger @ 2019-10-21 10:47 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs

[Please CC me, I'm not on the list.]

On Sun, 20 Oct 2019 at 12:28, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> > Question: Can I work with the mounted backup image on the machine that
> > also contains the original disc? I vaguely recall something about
> > btrfs really not liking clones.
>
> If your fs only contains one device (single fs on single device), then
> you should be mostly fine. [...] mostly OK.

Should? Mostly? What a nightmare-inducing, yet pleasantly Adams-esque
way of putting things ... :-)

Anyway, I have an image of the whole disk on a server now and am
feeling all the more adventurous for it. (The first try failed a
couple of MB from completion due to spurious network issues, which is
why I've taken so long to reply.)

> > You wouldn't happen to know of a [suitable] bootable rescue image [...]?
>
> The Arch Linux ISO at least has the latest btrfs-progs.

I'm on the Ubuntu 19.10 live CD (btrfs-progs 5.2.1, kernel 5.3.0)
until further notice. Exploring other options (incl. running your
rescue kernel on another machine and serving the disk via nbd) in
parallel.

> I'd recommend the following safer methods before trying --init-extent-tree:
>
> - Dump backup roots first:
>   # btrfs ins dump-super -f <dev> | grep backup_tree_root
>   Then grab all big numbers.
# btrfs inspect-internal dump-super -f /dev/nvme0n1p2 | grep backup_tree_root backup_tree_root: 284041969664 gen: 58600 level: 1 backup_tree_root: 284041953280 gen: 58601 level: 1 backup_tree_root: 284042706944 gen: 58602 level: 1 backup_tree_root: 284045410304 gen: 58603 level: 1 > - Try backup_extent_root numbers in btrfs check first > # btrfs check -r <above big number> <dev> > Use the number with highest generation first. Assuming backup_extent_root == backup_tree_root ... # btrfs check --tree-root 284045410304 /dev/nvme0n1p2 Opening filesystem to check... checksum verify failed on 284041084928 found E4E3BDB6 wanted 00000000 checksum verify failed on 284041084928 found E4E3BDB6 wanted 00000000 bad tree block 284041084928, bytenr mismatch, want=284041084928, have=0 ERROR: cannot open file system # btrfs check --tree-root 284042706944 /dev/nvme0n1p2 Opening filesystem to check... checksum verify failed on 284042706944 found E4E3BDB6 wanted 00000000 checksum verify failed on 284042706944 found E4E3BDB6 wanted 00000000 bad tree block 284042706944, bytenr mismatch, want=284042706944, have=0 Couldn't read tree root ERROR: cannot open file system # btrfs check --tree-root 284041953280 /dev/nvme0n1p2 Opening filesystem to check... checksum verify failed on 284041953280 found E4E3BDB6 wanted 00000000 checksum verify failed on 284041953280 found E4E3BDB6 wanted 00000000 bad tree block 284041953280, bytenr mismatch, want=284041953280, have=0 Couldn't read tree root ERROR: cannot open file system # btrfs check --tree-root 284041969664 /dev/nvme0n1p2 Opening filesystem to check... 
checksum verify failed on 284041969664 found E4E3BDB6 wanted 00000000
checksum verify failed on 284041969664 found E4E3BDB6 wanted 00000000
bad tree block 284041969664, bytenr mismatch, want=284041969664, have=0
Couldn't read tree root
ERROR: cannot open file system

> If all backup fails to pass basic btrfs check, and all happen to have
> the same "wanted 00000000" then it means a big range of tree blocks
> get wiped out, not really related to btrfs but some hardware wipe.

Doesn't look good, does it? Any further ideas at all, or is this the
end of the line? TBH, at this point, I don't mind having to re-install
the box so much as the idea that the same thing might happen again --
either to this one, or to my work machine, which is very similar. If
nothing else, I'd really appreciate knowing what exactly happened here
and why -- a bug in the GPU and/or its driver shouldn't cause this --
and an avoidance strategy that goes beyond upgrade-and-pray.

Cheers,
Christian
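[Editor's note: picking the backup roots in generation order, as Qu suggests, can be scripted. The helper below is a hypothetical sketch, not something from the thread; it parses a captured sample of the dump-super output shown above, since the real command needs the affected device.]

```shell
# Sort the backup_tree_root entries by generation, newest first, and
# print the bytenrs in the order they should be fed to
# `btrfs check --tree-root <bytenr> <dev>`.
# `sample` mirrors the dump-super output quoted above.
sample='backup_tree_root: 284041969664 gen: 58600 level: 1
backup_tree_root: 284041953280 gen: 58601 level: 1
backup_tree_root: 284042706944 gen: 58602 level: 1
backup_tree_root: 284045410304 gen: 58603 level: 1'

# Field 4 is the generation, field 2 the tree root bytenr.
roots=$(printf '%s\n' "$sample" | awk '{print $4, $2}' | sort -rn | awk '{print $2}')
printf '%s\n' "$roots"
```

On the real device one would then loop over the result, e.g. `for r in $roots; do btrfs check --tree-root "$r" /dev/nvme0n1p2; done`, stopping at the first root that opens cleanly.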
* Re: first it froze, now the (btrfs) root fs won't mount ... 2019-10-21 10:47 ` Christian Pernegger @ 2019-10-21 10:55 ` Qu Wenruo 2019-10-21 11:47 ` Austin S. Hemmelgarn 1 sibling, 0 replies; 29+ messages in thread From: Qu Wenruo @ 2019-10-21 10:55 UTC (permalink / raw) To: Christian Pernegger; +Cc: linux-btrfs [-- Attachment #1.1: Type: text/plain, Size: 4815 bytes --] On 2019/10/21 下午6:47, Christian Pernegger wrote: > [Please CC me, I'm not on the list.] > > Am So., 20. Okt. 2019 um 12:28 Uhr schrieb Qu Wenruo <quwenruo.btrfs@gmx.com>: >>> Question: Can I work with the mounted backup image on the machine that >>> also contains the original disc? I vaguely recall something about >>> btrfs really not liking clones. >> >> If your fs only contains one device (single fs on single device), then >> you should be mostly fine. [...] mostly OK. > > Should? Mostly? What a nightmare-inducing, yet pleasantly Adams-esqe > way of putting things ... :-) > > Anyway, I have an image of the whole disk on a server now and am > feeling all the more adventurous for it. (The first try failed a > couple of MB from completion due to spurious network issues, which is > why I've taken so long to reply.) > >>> You wouldn't happen to know of a [suitable] bootable rescue image [...]? >> >> Archlinux iso at least has the latest btrfs-progs. > > I'm on the Ubuntu 19.10 live CD (btrfs-progs 5.2.1, kernel 5.3.0) > until further notice. Exploring other options (incl. running your > rescue kernel on another machine and serving the disk via nbd) in > parallel. > >> I'd recommend the following safer methods before trying --init-extent-tree: >> >> - Dump backup roots first: >> # btrfs ins dump-super -f <dev> | grep backup_treee_root >> Then grab all big numbers. 
> > # btrfs inspect-internal dump-super -f /dev/nvme0n1p2 | grep backup_tree_root > backup_tree_root: 284041969664 gen: 58600 level: 1 > backup_tree_root: 284041953280 gen: 58601 level: 1 > backup_tree_root: 284042706944 gen: 58602 level: 1 > backup_tree_root: 284045410304 gen: 58603 level: 1 > >> - Try backup_extent_root numbers in btrfs check first >> # btrfs check -r <above big number> <dev> >> Use the number with highest generation first. > > Assuming backup_extent_root == backup_tree_root ... > > # btrfs check --tree-root 284045410304 /dev/nvme0n1p2 > Opening filesystem to check... > checksum verify failed on 284041084928 found E4E3BDB6 wanted 00000000 > checksum verify failed on 284041084928 found E4E3BDB6 wanted 00000000 > bad tree block 284041084928, bytenr mismatch, want=284041084928, have=0 > ERROR: cannot open file system > > # btrfs check --tree-root 284042706944 /dev/nvme0n1p2 > Opening filesystem to check... > checksum verify failed on 284042706944 found E4E3BDB6 wanted 00000000 > checksum verify failed on 284042706944 found E4E3BDB6 wanted 00000000 > bad tree block 284042706944, bytenr mismatch, want=284042706944, have=0 > Couldn't read tree root > ERROR: cannot open file system > > # btrfs check --tree-root 284041953280 /dev/nvme0n1p2 > Opening filesystem to check... > checksum verify failed on 284041953280 found E4E3BDB6 wanted 00000000 > checksum verify failed on 284041953280 found E4E3BDB6 wanted 00000000 > bad tree block 284041953280, bytenr mismatch, want=284041953280, have=0 > Couldn't read tree root > ERROR: cannot open file system > > # btrfs check --tree-root 284041969664 /dev/nvme0n1p2 > Opening filesystem to check... > checksum verify failed on 284041969664 found E4E3BDB6 wanted 00000000 > checksum verify failed on 284041969664 found E4E3BDB6 wanted 00000000 > bad tree block 284041969664, bytenr mismatch, want=284041969664, have=0 > Couldn't read tree root > ERROR: cannot open file system This doesn't look good at all. 
All 4 copies are wiped out, so it doesn't look like a bug in btrfs but
like some other problem wiping out a full range of tree blocks.

>
>> If all backup fails to pass basic btrfs check, and all happen to have
>> the same "wanted 00000000" then it means a big range of tree blocks
>> get wiped out, not really related to btrfs but some hardware wipe.
>
> Doesn't look good, does it? Any further ideas at all or is this the
> end of the line? TBH, at this point, I don't mind having to re-install
> the box so much as the idea that the same thing might happen again --

I don't have a good idea. The result looks like something has wiped
part of your tree blocks (not a single one, but a range).

> either to this one, or to my work machine, which is very similar. If
> nothing else, I'd really appreciate knowing what exactly happened here
> and why -- a bug in the GPU and/or its driver shouldn't cause this --;
> and an avoidance strategy that goes beyond-upgrade-and-pray.

At this stage, I'm sorry that I have no idea at all.

If you're 100% sure that you haven't enabled discard for a while, then
I guess it doesn't look like btrfs at least. Btrfs shouldn't wipe that
many tree blocks, even on a v5.0 kernel.

Thanks,
Qu

> Cheers,
> Christian
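[Editor's note: Qu's heuristic -- every failing backup reads back as "wanted 00000000" -- can be checked mechanically. A hypothetical sketch against the check output captured earlier in the thread; the giveaway is that every failure reads back as zeros rather than as stale-but-valid metadata.]

```shell
# Count how many checksum failures read back 00000000. If every single
# failure matches, a contiguous zeroed range (trim/wipe) is more likely
# than ordinary btrfs corruption. `log` mirrors the output quoted above.
log='checksum verify failed on 284045410304 found E4E3BDB6 wanted 00000000
checksum verify failed on 284042706944 found E4E3BDB6 wanted 00000000
checksum verify failed on 284041953280 found E4E3BDB6 wanted 00000000
checksum verify failed on 284041969664 found E4E3BDB6 wanted 00000000'

total=$(printf '%s\n' "$log" | grep -c 'checksum verify failed')
zeroed=$(printf '%s\n' "$log" | grep -c 'wanted 00000000$')
[ "$total" -eq "$zeroed" ] && echo "all failures read back as zeros"
```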
* Re: first it froze, now the (btrfs) root fs won't mount ... 2019-10-21 10:47 ` Christian Pernegger 2019-10-21 10:55 ` Qu Wenruo @ 2019-10-21 11:47 ` Austin S. Hemmelgarn 2019-10-21 13:02 ` Christian Pernegger 1 sibling, 1 reply; 29+ messages in thread From: Austin S. Hemmelgarn @ 2019-10-21 11:47 UTC (permalink / raw) To: Christian Pernegger; +Cc: Qu Wenruo, linux-btrfs On 2019-10-21 06:47, Christian Pernegger wrote: > [Please CC me, I'm not on the list.] > > Am So., 20. Okt. 2019 um 12:28 Uhr schrieb Qu Wenruo <quwenruo.btrfs@gmx.com>: >>> Question: Can I work with the mounted backup image on the machine that >>> also contains the original disc? I vaguely recall something about >>> btrfs really not liking clones. >> >> If your fs only contains one device (single fs on single device), then >> you should be mostly fine. [...] mostly OK. > > Should? Mostly? What a nightmare-inducing, yet pleasantly Adams-esqe > way of putting things ... :-) > > Anyway, I have an image of the whole disk on a server now and am > feeling all the more adventurous for it. (The first try failed a > couple of MB from completion due to spurious network issues, which is > why I've taken so long to reply.) I've done stuff like this dozens of times on single-device volumes with exactly zero issues. The only time you're likely to see problems is if the kernel thinks (either correctly or incorrectly) that the volume should consist of multiple devices. Ultimately, the issue is that the kernel tries to use all devices it knows of with the same volume UUID when you mount the volume, without validating the number of devices and that there are no duplicate device UUID's in the volume, so it can accidentally pull in multiple instances of the same 'device' when mounting. > >>> You wouldn't happen to know of a [suitable] bootable rescue image [...]? >> >> Archlinux iso at least has the latest btrfs-progs. > > I'm on the Ubuntu 19.10 live CD (btrfs-progs 5.2.1, kernel 5.3.0) > until further notice. 
Exploring other options (incl. running your > rescue kernel on another machine and serving the disk via nbd) in > parallel. > >> I'd recommend the following safer methods before trying --init-extent-tree: >> >> - Dump backup roots first: >> # btrfs ins dump-super -f <dev> | grep backup_treee_root >> Then grab all big numbers. > > # btrfs inspect-internal dump-super -f /dev/nvme0n1p2 | grep backup_tree_root > backup_tree_root: 284041969664 gen: 58600 level: 1 > backup_tree_root: 284041953280 gen: 58601 level: 1 > backup_tree_root: 284042706944 gen: 58602 level: 1 > backup_tree_root: 284045410304 gen: 58603 level: 1 > >> - Try backup_extent_root numbers in btrfs check first >> # btrfs check -r <above big number> <dev> >> Use the number with highest generation first. > > Assuming backup_extent_root == backup_tree_root ... > > # btrfs check --tree-root 284045410304 /dev/nvme0n1p2 > Opening filesystem to check... > checksum verify failed on 284041084928 found E4E3BDB6 wanted 00000000 > checksum verify failed on 284041084928 found E4E3BDB6 wanted 00000000 > bad tree block 284041084928, bytenr mismatch, want=284041084928, have=0 > ERROR: cannot open file system > > # btrfs check --tree-root 284042706944 /dev/nvme0n1p2 > Opening filesystem to check... > checksum verify failed on 284042706944 found E4E3BDB6 wanted 00000000 > checksum verify failed on 284042706944 found E4E3BDB6 wanted 00000000 > bad tree block 284042706944, bytenr mismatch, want=284042706944, have=0 > Couldn't read tree root > ERROR: cannot open file system > > # btrfs check --tree-root 284041953280 /dev/nvme0n1p2 > Opening filesystem to check... > checksum verify failed on 284041953280 found E4E3BDB6 wanted 00000000 > checksum verify failed on 284041953280 found E4E3BDB6 wanted 00000000 > bad tree block 284041953280, bytenr mismatch, want=284041953280, have=0 > Couldn't read tree root > ERROR: cannot open file system > > # btrfs check --tree-root 284041969664 /dev/nvme0n1p2 > Opening filesystem to check... 
> checksum verify failed on 284041969664 found E4E3BDB6 wanted 00000000 > checksum verify failed on 284041969664 found E4E3BDB6 wanted 00000000 > bad tree block 284041969664, bytenr mismatch, want=284041969664, have=0 > Couldn't read tree root > ERROR: cannot open file system > >> If all backup fails to pass basic btrfs check, and all happen to have >> the same "wanted 00000000" then it means a big range of tree blocks >> get wiped out, not really related to btrfs but some hardware wipe. > > Doesn't look good, does it? Any further ideas at all or is this the > end of the line? TBH, at this point, I don't mind having to re-install > the box so much as the idea that the same thing might happen again -- > either to this one, or to my work machine, which is very similar. If > nothing else, I'd really appreciate knowing what exactly happened here > and why -- a bug in the GPU and/or its driver shouldn't cause this --; > and an avoidance strategy that goes beyond-upgrade-and-pray. There are actually two possible ways I can think of a buggy GPU driver causing this type of issue: * The GPU driver in some way caused memory corruption, which in turn caused other problems. * The GPU driver confused the GPU enough that it issued a P2P transfer on the PCI-e bus to the NVMe device, which in turn caused data corruption on the NVMe device. Both are reasonably unlikely, but definitely possible. Your best option for mitigation (other than just not using that version of that GPU driver) is to ensure that your hardware has an IOMMU (as long as it's not a super-cheap CPU or MB, and both are relatively recent, you _should_ have one) and ensure it's enabled in firmware (on Intel platforms, it's usually labeled as 'VT-d' in firmware configuration, AMD platforms typically just call it an IOMMU). However, there's also the possibility that you may have hardware issues. 
Any of your RAM, PSU, MB, or CPU being bad could easily cause both the
data corruption you're seeing as well as the GPU issues, so I'd suggest
double checking your hardware if you haven't already.
* Re: first it froze, now the (btrfs) root fs won't mount ... 2019-10-21 11:47 ` Austin S. Hemmelgarn @ 2019-10-21 13:02 ` Christian Pernegger 2019-10-21 13:34 ` Qu Wenruo 2019-10-21 14:02 ` Austin S. Hemmelgarn 0 siblings, 2 replies; 29+ messages in thread From: Christian Pernegger @ 2019-10-21 13:02 UTC (permalink / raw) To: Austin S. Hemmelgarn; +Cc: Qu Wenruo, linux-btrfs [Please CC me, I'm not on the list.] Am Mo., 21. Okt. 2019 um 13:47 Uhr schrieb Austin S. Hemmelgarn <ahferroin7@gmail.com>: > I've [worked with fs clones] like this dozens of times on single-device volumes with exactly zero issues. Thank you, I have taken precautions, but it does seem to work fine. > There are actually two possible ways I can think of a buggy GPU driver causing this type of issue: [snip] Interesting and plausible, but ... > Your best option for mitigation [...] is to ensure that your hardware has an IOMMU [...] and ensure it's enabled in firmware. It has and it is. (The machine's been specced so GPU pass-through is an option, should it be required. I haven't gotten around to setting that up yet, haven't even gotten a second GPU, but I have laid the groundwork, the IOMMU is enabled and, as far as one can tell from logs and such, working.) > However, there's also the possibility that you may have hardware issues. Don't I know it ... The problem is, if there are hardware issues, that's the first I've seen of them, and while I didn't run torture tests, there was quite a lot of benchmarking when it was new. Needle in a haystack. Some memory testing can't hurt, I suppose. Any other ideas (for hardware testing)? Back on the topic of TRIM: I'm 99 % certain discard wasn't set on the mount (not by me, in any case), but I think Mint runs fstrim periodically by default. Just to be sure, should any form of TRIM be disabled? The only other idea I've got is Timeshift's hourly snapshots. (How) would btrfs deal with a crash during snapshot creation? 
In other news, I've still not quite given up, mainly because the fs
doesn't look all that broken. The output of btrfs inspect-internal
dump-tree (incl. options), for instance, looks like gibberish to me of
course, but it looks sane, doesn't spew warnings, doesn't error out or
crash. Also plain btrfs check --init-extent-tree errored out, same
with -s0, but with -s1 it's now chugging along. (BTW, is there a
hierarchy among the super block slots, a best or newest one?)

Will keep you posted.

Cheers,
C.
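[Editor's note on the superblock question: btrfs keeps the primary superblock at 64 KiB and mirrors at 64 MiB and 256 GiB (slots 0, 1, 2); all are written on each commit, and the kernel mounts only from the primary, so the "best" copy is simply the one reporting the highest generation. The numbers below are hypothetical, standing in for what `btrfs inspect-internal dump-super -s <slot> <dev> | grep '^generation'` would print per slot.]

```shell
# Pick the superblock slot with the highest generation. The pairs are
# "<slot> <generation>", hypothetical values for illustration.
gens='0 58604
1 58603
2 58603'

best_slot=$(printf '%s\n' "$gens" | sort -rn -k2 | head -n1 | awk '{print $1}')
echo "use superblock slot $best_slot"
```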
* Re: first it froze, now the (btrfs) root fs won't mount ... 2019-10-21 13:02 ` Christian Pernegger @ 2019-10-21 13:34 ` Qu Wenruo 2019-10-22 22:56 ` Christian Pernegger 2019-10-21 14:02 ` Austin S. Hemmelgarn 1 sibling, 1 reply; 29+ messages in thread From: Qu Wenruo @ 2019-10-21 13:34 UTC (permalink / raw) To: Christian Pernegger, Austin S. Hemmelgarn; +Cc: linux-btrfs [-- Attachment #1.1: Type: text/plain, Size: 3808 bytes --] On 2019/10/21 下午9:02, Christian Pernegger wrote: > [Please CC me, I'm not on the list.] > > Am Mo., 21. Okt. 2019 um 13:47 Uhr schrieb Austin S. Hemmelgarn > <ahferroin7@gmail.com>: >> I've [worked with fs clones] like this dozens of times on single-device volumes with exactly zero issues. > > Thank you, I have taken precautions, but it does seem to work fine. > >> There are actually two possible ways I can think of a buggy GPU driver causing this type of issue: [snip] > > Interesting and plausible, but ... > >> Your best option for mitigation [...] is to ensure that your hardware has an IOMMU [...] and ensure it's enabled in firmware. > > It has and it is. (The machine's been specced so GPU pass-through is > an option, should it be required. I haven't gotten around to setting > that up yet, haven't even gotten a second GPU, but I have laid the > groundwork, the IOMMU is enabled and, as far as one can tell from logs > and such, working.) > >> However, there's also the possibility that you may have hardware issues. > > Don't I know it ... The problem is, if there are hardware issues, > that's the first I've seen of them, and while I didn't run torture > tests, there was quite a lot of benchmarking when it was new. Needle > in a haystack. Some memory testing can't hurt, I suppose. Any other > ideas (for hardware testing)? > > Back on the topic of TRIM: I'm 99 % certain discard wasn't set on the > mount (not by me, in any case), but I think Mint runs fstrim > periodically by default. 
Oh, that explains why only one root (the current-generation one) is
not all zeros.

Then it should be a false alarm: fstrim simply wiped some old tree
blocks. But maybe it's some unfortunate race in which fstrim trimmed
tree blocks still in use. That would mean it's a btrfs bug. However, I
can't find a trim-related bug fix in v5.0..v5.3 (only a v5.2
regression and its fixes), so it may be a hidden bug.

> Just to be sure, should any form of TRIM be
> disabled?

Not exactly. There is space that is completely safe to trim (the
unallocated space), and there is space that is tricky to trim (tree
blocks -- the bug you hit).

One good compromise is to trim only unallocated space. For that you
need to pass extra parameters to fstrim, like -o 0 -l 1M. Newer
kernels only try to trim block groups in that range, and since a
modern btrfs has no block groups there, fstrim will then go on to trim
all the unallocated space. So with those options fstrim should be safe.

BTW, as you have already found, trimmed blocks can make recovery
trickier: old tree blocks are simply gone, and there's no way to rely
on trimmed data.

> The only other idea I've got is Timeshift's hourly snapshots. (How)
> would btrfs deal with a crash during snapshot creation?

In theory, btrfs has transactions and tree-block CoW to take care of
everything, so no matter when the crash happens it should be safe.
But in the real world, you know life is always hard ...

> In other news, I've still not quite given up, mainly because the fs
> doesn't look all that broken. The output of btrfs inspect-internal
> dump-tree (incl. options), for instance, looks like gibberish to me of
> course, but it looks sane, doesn't spew warnings, doesn't error out or
> crash. Also plain btrfs check --init-extent-tree errored out, same
> with -s0, but with -s1 it's now chugging along. (BTW, is there a
> hierarchy among the super block slots, a best or newest one?)

Your corruption is only in the extent tree.
With my patchset, you should be able to mount it, so it's not that
badly screwed up.

But updating the extent tree is really somewhat trickier than I thought.

Thanks,
Qu

>
> Will keep you posted.
>
> Cheers,
> C.
>
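[Editor's note: Qu's `-o 0 -l 1M` suggestion can be wired into the periodic trim that Mint/Ubuntu run via systemd's fstrim.timer. The drop-in below is a hypothetical sketch -- the service name, fstrim path, and mountpoint vary by distro, though -o/--offset and -l/--length are standard util-linux fstrim flags. It is written to a scratch directory here purely for illustration; real deployment would place it under /etc/systemd/system/.]

```shell
# Build a systemd drop-in that restricts the scheduled fstrim to the
# first 1 MiB of the filesystem's address space; per Qu's explanation
# above, on btrfs this makes the kernel trim only unallocated space.
dropin_dir="$(mktemp -d)/fstrim.service.d"
mkdir -p "$dropin_dir"
cat > "$dropin_dir/override.conf" <<'EOF'
[Service]
ExecStart=
ExecStart=/sbin/fstrim -o 0 -l 1M /
EOF
cat "$dropin_dir/override.conf"
```

The empty `ExecStart=` line clears the distro's original command before the replacement is set, which is the standard systemd drop-in idiom for overriding `ExecStart`.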
* Re: first it froze, now the (btrfs) root fs won't mount ...
  2019-10-21 13:34 ` Qu Wenruo
@ 2019-10-22 22:56 ` Christian Pernegger
  2019-10-23  0:25 ` Qu Wenruo
  2019-10-23 11:31 ` Austin S. Hemmelgarn
  0 siblings, 2 replies; 29+ messages in thread
From: Christian Pernegger @ 2019-10-22 22:56 UTC (permalink / raw)
To: Qu Wenruo; +Cc: Austin S. Hemmelgarn, linux-btrfs

[Please CC me, I'm not on the list.]

On Mon, 21 Oct 2019 at 15:34, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> [...] just fstrim wiped some old tree blocks. But maybe it's some unfortunate race, that fstrim trimmed some tree blocks still in use.

Forgive me for asking, but assuming that's what happened, why are the
backup blocks "not in use" from fstrim's perspective in the first
place? I'd consider backup (meta)data to be valuable payload data,
something to be stored extra carefully. No use making them if they're
no good when you need them, after all. In other words, does fstrim by
default trim btrfs metadata (in which case fstrim's broken), or does
btrfs in effect store backup data in "unused" space (in which case
btrfs is broken)?

> [...] One good compromise is, only trim unallocated space.
[y/N]: y ERROR: Corrupted fs, no valid METADATA block group found ERROR: failed to zero log tree: -117 ERROR: attempt to start transaction over already running one # rollback $ btrfs rescue zero-log patient checksum verify failed on 284041084928 found E4E3BDB6 wanted 00000000 checksum verify failed on 284041084928 found E4E3BDB6 wanted 00000000 bad tree block 284041084928, bytenr mismatch, want=284041084928, have=0 ERROR: could not open ctree # rollback # hm, super 0 has log_root 284056535040, super 1 and 2 have log_root 0 ... $ btrfs check -s1 --init-extent-tree patient [...] ERROR: errors found in fs roots No device size related problem found cache and super generation don't match, space cache will be invalidated found 431478808576 bytes used, error(s) found total csum bytes: 417926772 total tree bytes: 2203549696 total fs tree bytes: 1754415104 total extent tree bytes: 49152 btree space waste bytes: 382829965 file data blocks allocated: 1591388033024 referenced 539237134336 That ran a good while, generating a couple of hundred MB of output (available on request, of course). In any case, it didn't help. $ ~/local/bin/btrfs check -s1 --repair patient using SB copy 1, bytenr 67108864 enabling repair mode Opening filesystem to check... checksum verify failed on 427311104 found 000000C8 wanted FFFFFF99 checksum verify failed on 427311104 found 000000C8 wanted FFFFFF99 Csum didn't match ERROR: cannot open file system I don't suppose the roots found by btrfs-find-root and/or subvolumes identified by btrfs restore -l would be any help? It's not like the real fs root contained anything, just @ [/], @home [/home], and the Timeshift subvolumes. If btrfs restore -D is to be believed, the casualties under @home, for example, are inconsequential, caches and the like, stuff that was likely open for writing at the time. 
I don't know, it just seems strange that with all the (meta)data
that's obviously still there, it shouldn't be possible to restore the
fs to some sort of consistent state.

Good night,
Christian

> But extent tree update is really somehow trickier than I thought.
>
> Thanks,
> Qu
>
>> Will keep you posted.
>>
>> Cheers,
>> C.
* Re: first it froze, now the (btrfs) root fs won't mount ...
  2019-10-22 22:56 ` Christian Pernegger
@ 2019-10-23  0:25 ` Qu Wenruo
  0 siblings, 0 replies; 29+ messages in thread
From: Qu Wenruo @ 2019-10-23 0:25 UTC (permalink / raw)
To: Christian Pernegger; +Cc: Austin S. Hemmelgarn, linux-btrfs

On 2019/10/23 6:56 AM, Christian Pernegger wrote:
> [Please CC me, I'm not on the list.]
>
> On Mon, 21 Oct 2019 at 15:34, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>> [...] just fstrim wiped some old tree blocks. But maybe it's some unfortunate race, that fstrim trimmed some tree blocks still in use.
>
> Forgive me for asking, but assuming that's what happened, why are the
> backup blocks "not in use" from fstrim's perspective in the first
> place? I'd consider backup (meta)data to be valuable payload data,
> something to be stored extra carefully. No use making them if they're
> no good when you need them, after all. In other words, does fstrim by
> default trim btrfs metadata (in which case fstrim's broken) or does
> btrfs in effect store backup data in "unused" space (in which case
> btrfs is broken)?

Even if the backup roots themselves are not trimmed, they're of no use
on their own: each is just a pointer to older tree blocks, and those
older tree blocks do get trimmed, since they are no longer in use.

Btrfs has protection against trimming tree blocks that are still in
use, but I don't know why it didn't work here.

BTW, to make it clear, a "used" block group here just means it has
some space in use, not that all of its space is in use. Trimming a
"used" block group only trims the unused space in it (like any other
fs).

And to your last question: yes, the backup roots are in unused space,
and thus they get trimmed. But the timing is such that this happens
only after the current transaction is fully committed, to ensure a
crash won't cause any problem. (All in theory, though.)

>
>> [...] One good compromise is, only trim unallocated space.
> > It had never occurred to me that anything would purposely try to trim > allocated space ... > >> As your corruption is only in extent tree. With my patchset, you should be able to mount it, so it's not that screwed up. > > To be clear, we're talking data recovery, not (progress towards) fs > repair, even if I manage to boot your rescue patchset? Then you should have all your fs accessible, although only read-only. (Isn't that obvious since the skipbg mount option is under rescue= group?) Btrfs-progs won't really help here, as it just like kernel, needs to read extent tree to go on. But in fact, extent tree is only needed for write operations. That's exactly what my patchset is doing, skip extent tree completely for rescue=skipbg mount option. > > A few more random observations from playing with the drive image: > $ btrfs check --init-extent-tree patient > Opening filesystem to check... > Checking filesystem on patient > UUID: c2bd83d6-2261-47bb-8d18-5aba949651d7 > repair mode will force to clear out log tree, are you sure? [y/N]: y > ERROR: Corrupted fs, no valid METADATA block group found > ERROR: failed to zero log tree: -117 > ERROR: attempt to start transaction over already running one > # rollback > > $ btrfs rescue zero-log patient > checksum verify failed on 284041084928 found E4E3BDB6 wanted 00000000 > checksum verify failed on 284041084928 found E4E3BDB6 wanted 00000000 > bad tree block 284041084928, bytenr mismatch, want=284041084928, have=0 > ERROR: could not open ctree > # rollback > > # hm, super 0 has log_root 284056535040, super 1 and 2 have log_root 0 ... > $ btrfs check -s1 --init-extent-tree patient > [...] 
> ERROR: errors found in fs roots > No device size related problem found > cache and super generation don't match, space cache will be invalidated > found 431478808576 bytes used, error(s) found > total csum bytes: 417926772 > total tree bytes: 2203549696 > total fs tree bytes: 1754415104 > total extent tree bytes: 49152 > btree space waste bytes: 382829965 > file data blocks allocated: 1591388033024 > referenced 539237134336 > > That ran a good while, generating a couple of hundred MB of output > (available on request, of course). In any case, it didn't help. > > $ ~/local/bin/btrfs check -s1 --repair patient > using SB copy 1, bytenr 67108864 > enabling repair mode > Opening filesystem to check... > checksum verify failed on 427311104 found 000000C8 wanted FFFFFF99 > checksum verify failed on 427311104 found 000000C8 wanted FFFFFF99 > Csum didn't match > ERROR: cannot open file system > > I don't suppose the roots found by btrfs-find-root and/or subvolumes > identified by btrfs restore -l would be any help? No help at all, especially for trimmed fs. > It's not like the > real fs root contained anything, just @ [/], @home [/home], and the > Timeshift subvolumes. If btrfs restore -D is to be believed, the > casualties under @home, for example, are inconsequential, caches and > the like, stuff that was likely open for writing at the time. btrfs restore is the skip_bg equivalent in btrfs-progs. It doesn't read extent tree at all, purely use fs trees to read the data. The only disadvantage is, you can't access the fs like regular fs, but only through btrfs restore. Thanks, Qu > > I don't know, it just seems strange that with all the (meta)data > that's obviously still there, it shouldn't be possible to restore the > fs to some sort of consistent state. > > Good night, > Christian > > > > > > > > > > > >> >> But extent tree update is really somehow trickier than I thought. >> >> Thanks, >> Qu >> >>> >>> Will keep you posted. >>> >>> Cheers, >>> C. 
>>> >> ^ permalink raw reply [flat|nested] 29+ messages in thread
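To put Qu's description in concrete terms: since btrfs restore walks only the fs trees, it can be driven from btrfs-progs against an image of the damaged device. A hedged sketch, not from the thread -- the image name "patient" follows Christian's usage above, and the root id and output directory are placeholders:

```shell
# List the tree roots / subvolumes restore can find (no extent tree needed).
$ btrfs restore -l patient

# Dry run first: -D lists what would be restored without writing anything.
$ btrfs restore -D -r 258 patient /tmp/dryrun

# Real run: -i ignores errors, -v is verbose, -m restores metadata
# (ownership/permissions/timestamps), -r selects the root to restore from.
$ btrfs restore -ivm -r 258 patient /mnt/recovery/
```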
* Re: first it froze, now the (btrfs) root fs won't mount ... 2019-10-22 22:56 ` Christian Pernegger 2019-10-23 0:25 ` Qu Wenruo @ 2019-10-23 11:31 ` Austin S. Hemmelgarn 2019-10-24 10:41 ` Christian Pernegger 1 sibling, 1 reply; 29+ messages in thread From: Austin S. Hemmelgarn @ 2019-10-23 11:31 UTC (permalink / raw) To: Christian Pernegger, Qu Wenruo; +Cc: linux-btrfs On 2019-10-22 18:56, Christian Pernegger wrote: > [Please CC me, I'm not on the list.] > > On Mon, 21 Oct 2019 at 15:34, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: >> [...] just fstrim wiped some old tree blocks. But maybe it's some unfortunate race, that fstrim trimmed some tree blocks still in use. > > Forgive me for asking, but assuming that's what happened, why are the > backup blocks "not in use" from fstrim's perspective in the first > place? I'd consider backup (meta)data to be valuable payload data, > something to be stored extra carefully. No use making them if they're > no good when you need them, after all. In other words, does fstrim by > default trim btrfs metadata (in which case fstrim's broken) or does > btrfs in effect store backup data in "unused" space (in which case > btrfs is broken)? Because they aren't in use unless you've mounted the volume using them. BTRFS doesn't go out of its way to get rid of them, but it really isn't using them either once the active tree is fully committed. Note, however, that you're not guaranteed to have working backup metadata trees even if you aren't using TRIM, because BTRFS _will_ overwrite them eventually, and that might happen as soon as BTRFS starts preparing the next commit. There has been some discussion about how to deal with this sanely, but AFAIK, it hasn't produced any patches yet. > >> [...] One good compromise is, only trim unallocated space. > > It had never occurred to me that anything would purposely try to trim > allocated space ... 
I believe Qu is referring specifically to space not allocated at the chunk level, not at the block level. Nothing should be discarding space that's allocated at the block level right now, but the current implementation will discard space within chunks that is not allocated at the block level, which may include old metadata trees. > >> As your corruption is only in extent tree. With my patchset, you should be able to mount it, so it's not that screwed up. > > To be clear, we're talking data recovery, not (progress towards) fs > repair, even if I manage to boot your rescue patchset? > > A few more random observations from playing with the drive image: > $ btrfs check --init-extent-tree patient > Opening filesystem to check... > Checking filesystem on patient > UUID: c2bd83d6-2261-47bb-8d18-5aba949651d7 > repair mode will force to clear out log tree, are you sure? [y/N]: y > ERROR: Corrupted fs, no valid METADATA block group found > ERROR: failed to zero log tree: -117 > ERROR: attempt to start transaction over already running one > # rollback > > $ btrfs rescue zero-log patient > checksum verify failed on 284041084928 found E4E3BDB6 wanted 00000000 > checksum verify failed on 284041084928 found E4E3BDB6 wanted 00000000 > bad tree block 284041084928, bytenr mismatch, want=284041084928, have=0 > ERROR: could not open ctree > # rollback > > # hm, super 0 has log_root 284056535040, super 1 and 2 have log_root 0 ... > $ btrfs check -s1 --init-extent-tree patient > [...] 
> ERROR: errors found in fs roots > No device size related problem found > cache and super generation don't match, space cache will be invalidated > found 431478808576 bytes used, error(s) found > total csum bytes: 417926772 > total tree bytes: 2203549696 > total fs tree bytes: 1754415104 > total extent tree bytes: 49152 > btree space waste bytes: 382829965 > file data blocks allocated: 1591388033024 > referenced 539237134336 > > That ran a good while, generating a couple of hundred MB of output > (available on request, of course). In any case, it didn't help. > > $ ~/local/bin/btrfs check -s1 --repair patient > using SB copy 1, bytenr 67108864 > enabling repair mode > Opening filesystem to check... > checksum verify failed on 427311104 found 000000C8 wanted FFFFFF99 > checksum verify failed on 427311104 found 000000C8 wanted FFFFFF99 > Csum didn't match > ERROR: cannot open file system > > I don't suppose the roots found by btrfs-find-root and/or subvolumes > identified by btrfs restore -l would be any help? It's not like the > real fs root contained anything, just @ [/], @home [/home], and the > Timeshift subvolumes. If btrfs restore -D is to be believed, the > casualties under @home, for example, are inconsequential, caches and > the like, stuff that was likely open for writing at the time. > > I don't know, it just seems strange that with all the (meta)data > that's obviously still there, it shouldn't be possible to restore the > fs to some sort of consistent state. Not all metadata is created equally... Losing the extent tree shouldn't break things this bad in most cases, but there are certain parts of the metadata that if lost mean you've got a dead FS with no way to rebuild (the chunk tree for example). ^ permalink raw reply [flat|nested] 29+ messages in thread
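Austin's distinction between chunk-level and block-level allocation can be inspected directly on a healthy filesystem. A hedged sketch, assuming a btrfs mounted at /mnt (the path is a placeholder, not from the thread):

```shell
# "Device allocated" vs "Device unallocated" is the chunk-level split;
# trimming only the unallocated part is the compromise Qu proposed.
$ sudo btrfs filesystem usage /mnt

# Per-profile view: "total" is chunk-level allocation, "used" is
# block-level usage inside those chunks -- the gap between the two is
# the space fstrim currently discards within chunks.
$ sudo btrfs filesystem df /mnt

# fstrim reports how much space the filesystem handed back for discard.
$ sudo fstrim -v /mnt
```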
* Re: first it froze, now the (btrfs) root fs won't mount ... 2019-10-23 11:31 ` Austin S. Hemmelgarn @ 2019-10-24 10:41 ` Christian Pernegger 2019-10-24 11:26 ` Qu Wenruo 2019-10-24 11:40 ` Austin S. Hemmelgarn 0 siblings, 2 replies; 29+ messages in thread From: Christian Pernegger @ 2019-10-24 10:41 UTC (permalink / raw) To: Austin S. Hemmelgarn, Qu Wenruo; +Cc: linux-btrfs I must admit, this discussion is going (technical) places I don't know anything about, and much as I enjoy learning things, I'd rather not waste your time (go make btrfs better! :-p). When all is said and done I'm just a user. I still don't understand how (barring creatively defective hardware, which is of course always in the cards) a crash that looked comparatively benign could lead to an fs that's not only unmountable but unfixable; how metadata that's effectively a single point of failure could not have backup copies designed in that are neither stale nor left to the elements, seems awfully fragile -- but I can accept it. Repair is out. Recovery it is, then. I'd like to try and build this rescue branch of yours. Does it have to be the whole thing, or can btrfs alone be built against the headers of the distro kernel somehow, or can the distro kernel source be patched with the rescue stuff? Git wasn't a thing the last time I played with kernels, a shove in the right direction would be appreciated. Relapse prevention. "Update everything and pray it's either been fixed or at least isn't triggered any more" isn't all that confidence-inspiring. Desktop computers running remotely current software will crash from time to time, after all, if not amdgpu then something else. At which point we're back at "a crash shouldn't have caused this". If excerpts from the damaged image are any help in finding the actual issue, I can keep it around for a while. Disaster recovery. 
What do people use to quickly get back up and running from bare metal that integrates well with btrfs (and is suitable just for a handful of machines)? Cheers, C. P.S.: MemTest86 hasn't found anything in (as yet) 6 passes, nothing glaringly wrong with the RAM. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: first it froze, now the (btrfs) root fs won't mount ... 2019-10-24 10:41 ` Christian Pernegger @ 2019-10-24 11:26 ` Qu Wenruo 2019-10-24 11:40 ` Austin S. Hemmelgarn 1 sibling, 0 replies; 29+ messages in thread From: Qu Wenruo @ 2019-10-24 11:26 UTC (permalink / raw) To: Christian Pernegger, Austin S. Hemmelgarn; +Cc: linux-btrfs On 2019/10/24 6:41 PM, Christian Pernegger wrote: > I must admit, this discussion is going (technical) places I don't know > anything about, and much as I enjoy learning things, I'd rather not > waste your time (go make btrfs better! :-p). When all is said and done > I'm just a user. I still don't understand how (barring creatively > defective hardware, which is of course always in the cards) a crash > that looked comparatively benign could lead to an fs that's not only > unmountable but unfixable; how metadata that's effectively a single > point of failure could not have backup copies designed in that are > neither stale nor left to the elements, seems awfully fragile -- but I > can accept it. Repair is out. > > Recovery it is, then. I'd like to try and build this rescue branch of > yours. Does it have to be the whole thing, or can btrfs alone be built > against the headers of the distro kernel somehow, or can the distro > kernel source be patched with the rescue stuff? Git wasn't a thing the > last time I played with kernels, a shove in the right direction would > be appreciated. Since you're using a v5.0 kernel, it's pretty hard to compile just the btrfs module, as there are three kernel releases in between. Before compiling the kernel, you need a working toolchain. 
Please refer to your distro's documentation (you'll be seeing this line a lot). For example, on Arch Linux you need: # pacman -S base-devel bc ncurses I'd recommend the following way to compile the kernel: $ cd kernel-src/ $ make localmodconfig $ make -j12 This will compile the kernel, with everything currently loaded built as a module. Then you need to copy the kernel, install the modules, and -- the most important part -- generate an initramfs, then point your boot loader at the new kernel. # cp arch/x86/boot/bzImage /boot/vmlinuz-custom # make modules_install For initramfs creation, you again need to refer to your distro. I can only give an Arch Linux example: # cat > /etc/mkinitcpio.d/custom.preset <<EOF ALL_config="/etc/mkinitcpio.conf" ALL_kver="/boot/vmlinuz-custom" PRESETS=('default') #default_config="/etc/mkinitcpio.conf" default_image="/boot/initramfs-custom.img" EOF # mkinitcpio -p custom For the bootloader, please also refer to your distro, but I guess it's less of a problem than compiling the kernel. Then you can boot into the new kernel and try mounting with -o "rescue=skip_bg,ro", recording the dmesg if anything goes wrong. > > Relapse prevention. "Update everything and pray it's either been fixed > or at least isn't triggered any more" isn't all that > confidence-inspiring. Desktop computers running remotely current > software will crash from time to time, after all, if not amdgpu then > something else. At which point we're back at "a crash shouldn't have > caused this". If excerpts from the damaged image are any help in > finding the actual issue, I can keep it around for a while. > > Disaster recovery. What do people use to quickly get back up and > running from bare metal that integrates well with btrfs (and is > suitable just for a handful of machines)? > > Cheers, > C. > > P.S.: MemTest86 hasn't found anything in (as yet) 6 passes, nothing > glaringly wrong with the RAM. This doesn't look like RAM corruption at all, so don't worry about that. 
Thanks, Qu > ^ permalink raw reply [flat|nested] 29+ messages in thread
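Since Christian is on Mint rather than Arch, the copy/initramfs/bootloader steps Qu walks through above can be folded into kbuild's Debian packaging target, which installs the kernel through dpkg and triggers the distro's own initramfs and bootloader hooks. A hedged sketch, untested against this exact branch (repo and branch names taken from Qu's patchset):

```shell
$ git clone -b rescue_options https://github.com/adam900710/linux.git
$ cd linux
$ make localmodconfig            # base the config on currently loaded modules
$ make -j"$(nproc)" bindeb-pkg   # build linux-image/linux-headers .deb packages
$ sudo dpkg -i ../linux-image-*.deb ../linux-headers-*.deb
```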
* Re: first it froze, now the (btrfs) root fs won't mount ... 2019-10-24 10:41 ` Christian Pernegger 2019-10-24 11:26 ` Qu Wenruo @ 2019-10-24 11:40 ` Austin S. Hemmelgarn 2019-10-25 16:43 ` Christian Pernegger 1 sibling, 1 reply; 29+ messages in thread From: Austin S. Hemmelgarn @ 2019-10-24 11:40 UTC (permalink / raw) To: Christian Pernegger, Qu Wenruo; +Cc: linux-btrfs On 2019-10-24 06:41, Christian Pernegger wrote: > I must admit, this discussion is going (technical) places I don't know > anything about, and much as I enjoy learning things, I'd rather not > waste your time (go make btrfs better! :-p). When all is said and done > I'm just a user. I still don't understand how (barring creatively > defective hardware, which is of course always in the cards) a crash > that looked comparatively benign could lead to an fs that's not only > unmountable but unfixable; how metadata that's effectively a single > point of failure could not have backup copies designed in that are > neither stale nor left to the elements, seems awfully fragile -- but I > can accept it. Repair is out. > > Recovery it is, then. I'd like to try and build this rescue branch of > yours. Does it have to be the whole thing, or can btrfs alone be built > against the headers of the distro kernel somehow, or can the distro > kernel source be patched with the rescue stuff? Git wasn't a thing the > last time I played with kernels, a shove in the right direction would > be appreciated. Trying to build the module by itself against your existing kernel is likely to not work, it's technically possible, but you really need to know what you're doing for it to have any chance of working. Your best option is probably to just pull down a copy of the repository and build that as-is. Most distros don't strictly depend on any specific kernel patches, and I'm fairly certain that Mint isn't doing anything weird here, so unless you need specific third-party kernel modules, you shouldn't have any issues. 
> > Relapse prevention. "Update everything and pray it's either been fixed > or at least isn't triggered any more" isn't all that > confidence-inspiring. Desktop computers running remotely current > software will crash from time to time, after all, if not amdgpu then > something else. At which point we're back at "a crash shouldn't have > caused this". If excerpts from the damaged image are any help in > finding the actual issue, I can keep it around for a while. > > Disaster recovery. What do people use to quickly get back up and > running from bare metal that integrates well with btrfs (and is > suitable just for a handful of machines)? Backups, flavor of your choice. I used to use AMANDA, but have recently become a fan of borgbackup (other than its lack of parallelization support, it's way more efficient than most other options I've tried, and it's dead simple to set up). I store enough extra info in the backup to be able to rebuild the storage stack by hand from a rescue environment (I usually use SystemRescueCD, but any live environment where you can get your backup software working and rebuild your storage stack will work). The trick here is not to ask 'what integrates well with BTRFS', but instead 'what doesn't care at all what filesystem I'm running on', and then find something that works for you to replicate whatever special layout requirements you have. ^ permalink raw reply [flat|nested] 29+ messages in thread
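For reference, a minimal borgbackup setup along the lines Austin describes could look like the following. This is a sketch only: the repository location and include list are placeholders, and the retention numbers are an example, not a recommendation.

```shell
# One-time repository setup (repokey stores the key inside the repo).
$ borg init --encryption=repokey /backup/desktop.borg

# One archive per run; {hostname} and {now} are borg placeholder syntax.
$ borg create --stats /backup/desktop.borg::'{hostname}-{now}' /etc /home /root

# Thin out old archives.
$ borg prune --keep-daily=7 --keep-weekly=4 --keep-monthly=6 /backup/desktop.borg
```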
* Re: first it froze, now the (btrfs) root fs won't mount ... 2019-10-24 11:40 ` Austin S. Hemmelgarn @ 2019-10-25 16:43 ` Christian Pernegger 2019-10-25 17:05 ` Christian Pernegger ` (2 more replies) 0 siblings, 3 replies; 29+ messages in thread From: Christian Pernegger @ 2019-10-25 16:43 UTC (permalink / raw) To: Austin S. Hemmelgarn; +Cc: Qu Wenruo, linux-btrfs On Thu, 24 Oct 2019 at 13:26, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: > Since you're using a v5.0 kernel, it's pretty hard to compile just the btrfs module. Oh, I'm not married to 5.0, it's just that I'd prefer building a kernel package, so it integrates properly with the system. For posterity, on Linux Mint 19.2: # clone repo $ git clone https://github.com/adam900710/linux.git # check commit history (https://github.com/adam900710/linux/commits/rescue_options) for the release the rescue_options branch is based on => 5.3-rc7 # download the Ubuntu patches matching that version (=> https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.3-rc7/) and apply them to the rescue_options branch $ wget https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.3-rc7/0001-base-packaging.patch [... repeat for the rest ...] $ cd linux $ git checkout rescue_options $ patch -p1 <../0001-*.patch [... repeat for the rest ...] # set number of threads to use for compilation, optional & to taste $ export DEB_BUILD_OPTIONS='parallel=16' # build; could probably trim these targets down more $ fakeroot debian/rules clean $ fakeroot debian/rules do_mainline_build=true binary-headers binary-generic binary-perarch $ cd .. # install $ sudo dpkg -i linux-image-*_amd64.deb linux-modules-*_amd64.deb linux-headers-*.deb Ok then, let's do this: > Then you can boot into the new kernel, then try mount it with -o > "rescue=skip_bg,ro". 
[ 565.097058] nbd0: p1 p2 [ 565.192002] BTRFS: device fsid c2bd83d6-2261-47bb-8d18-5aba949651d7 devid 1 transid 58603 /dev/nbd0p2 [ 568.490654] nbd0: p1 p2 [ 869.871598] BTRFS info (device dm-1): unrecognized rescue option 'skip_bg' [ 869.884644] BTRFS error (device dm-1): open_ctree failed Hmm, glancing at the source I think it's "skipbg", no underscore(?) [ 1350.402586] BTRFS info (device dm-1): skip mount time block group searching [ 1350.402588] BTRFS info (device dm-1): disk space caching is enabled [ 1350.402589] BTRFS info (device dm-1): has skinny extents [ 1350.402590] BTRFS error (device dm-1): skip_bg must be used with notreelog mount option for dirty log [ 1350.419849] BTRFS error (device dm-1): open_ctree failed Fine by me, let's add "notreelog" as well. Note that "skip_bg" is actually written with an underscore above. [ 1399.169484] BTRFS info (device dm-1): disabling tree log [ 1399.169487] BTRFS info (device dm-1): skip mount time block group searching [ 1399.169488] BTRFS info (device dm-1): disk space caching is enabled [ 1399.169488] BTRFS info (device dm-1): has skinny extents [ 1399.319294] BTRFS info (device dm-1): enabling ssd optimizations [ 1399.376181] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 1399.376185] #PF: supervisor write access in kernel mode [ 1399.376186] #PF: error_code(0x0002) - not-present page [ 1399.376187] PGD 0 P4D 0 [ 1399.376190] Oops: 0002 [#1] SMP NOPTI [ 1399.376193] CPU: 10 PID: 3730 Comm: mount Not tainted 5.3.0-050300rc7-generic #201909021831 [ 1399.376194] Hardware name: Gigabyte Technology Co., Ltd. 
X570 AORUS MASTER/X570 AORUS MASTER, BIOS F7a 09/09/2019 [ 1399.376199] RIP: 0010:_raw_spin_lock+0x10/0x30 [ 1399.376201] Code: 01 00 00 75 06 48 89 d8 5b 5d c3 e8 4a 75 66 ff 48 89 d8 5b 5d c3 0f 1f 40 00 0f 1f 44 00 00 55 48 89 e5 31 c0 ba 01 00 00 00 <f0> 0f b1 17 75 02 5d c3 89 c6 e8 01 5d 66 ff 66 90 5d c3 0f 1f 00 [ 1399.376202] RSP: 0018:ffffaec3c2d33370 EFLAGS: 00010246 [ 1399.376204] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 [ 1399.376205] RDX: 0000000000000001 RSI: ffffa0487f540248 RDI: 0000000000000000 [ 1399.376206] RBP: ffffaec3c2d33370 R08: ffffaec3c2d335a7 R09: 0000000000000002 [ 1399.376207] R10: 00000000ffffffff R11: ffffaec3c2d338a5 R12: 0000000000004000 [ 1399.376208] R13: ffffa0487f540000 R14: 0000000000000000 R15: ffffa04d0e6d5800 [ 1399.376210] FS: 00007fafdb068080(0000) GS:ffffa04d7ea80000(0000) knlGS:0000000000000000 [ 1399.376211] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1399.376212] CR2: 0000000000000000 CR3: 0000000494ea0000 CR4: 0000000000340ee0 [ 1399.376213] Call Trace: [ 1399.376239] btrfs_reserve_metadata_bytes+0x51/0x9c0 [btrfs] [ 1399.376241] ? __switch_to_asm+0x40/0x70 [ 1399.376242] ? __switch_to_asm+0x34/0x70 [ 1399.376243] ? __switch_to_asm+0x40/0x70 [ 1399.376245] ? __switch_to_asm+0x34/0x70 [ 1399.376246] ? __switch_to_asm+0x40/0x70 [ 1399.376247] ? __switch_to_asm+0x34/0x70 [ 1399.376248] ? __switch_to_asm+0x40/0x70 [ 1399.376249] ? __switch_to_asm+0x34/0x70 [ 1399.376250] ? __switch_to_asm+0x40/0x70 [ 1399.376251] ? __switch_to_asm+0x34/0x70 [ 1399.376252] ? __switch_to_asm+0x40/0x70 [ 1399.376271] btrfs_use_block_rsv+0xd0/0x180 [btrfs] [ 1399.376286] btrfs_alloc_tree_block+0x83/0x550 [btrfs] [ 1399.376288] ? 
__schedule+0x2b0/0x670 [ 1399.376302] alloc_tree_block_no_bg_flush+0x4a/0x60 [btrfs] [ 1399.376315] __btrfs_cow_block+0x12f/0x590 [btrfs] [ 1399.376329] btrfs_cow_block+0xf0/0x1b0 [btrfs] [ 1399.376342] btrfs_search_slot+0x531/0xad0 [btrfs] [ 1399.376356] btrfs_insert_empty_items+0x71/0xc0 [btrfs] [ 1399.376374] overwrite_item+0xef/0x5e0 [btrfs] [ 1399.376390] replay_one_buffer+0x584/0x890 [btrfs] [ 1399.376404] walk_down_log_tree+0x192/0x420 [btrfs] [ 1399.376419] walk_log_tree+0xce/0x1f0 [btrfs] [ 1399.376433] btrfs_recover_log_trees+0x1ef/0x4a0 [btrfs] [ 1399.376446] ? replay_one_extent+0x7e0/0x7e0 [btrfs] [ 1399.376462] open_ctree+0x1a23/0x2100 [btrfs] [ 1399.376476] btrfs_mount_root+0x612/0x760 [btrfs] [ 1399.376489] ? btrfs_mount_root+0x612/0x760 [btrfs] [ 1399.376492] ? __lookup_constant+0x4d/0x70 [ 1399.376494] legacy_get_tree+0x2b/0x50 [ 1399.376495] ? legacy_get_tree+0x2b/0x50 [ 1399.376497] vfs_get_tree+0x2a/0x100 [ 1399.376499] fc_mount+0x12/0x40 [ 1399.376501] vfs_kern_mount.part.30+0x76/0x90 [ 1399.376502] vfs_kern_mount+0x13/0x20 [ 1399.376515] btrfs_mount+0x179/0x920 [btrfs] [ 1399.376518] ? __check_object_size+0xdb/0x1b0 [ 1399.376520] legacy_get_tree+0x2b/0x50 [ 1399.376521] ? legacy_get_tree+0x2b/0x50 [ 1399.376523] vfs_get_tree+0x2a/0x100 [ 1399.376526] ? 
capable+0x19/0x20 [ 1399.376528] do_mount+0x6dc/0xa10 [ 1399.376529] ksys_mount+0x98/0xe0 [ 1399.376531] __x64_sys_mount+0x25/0x30 [ 1399.376534] do_syscall_64+0x5a/0x130 [ 1399.376536] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 1399.376537] RIP: 0033:0x7fafda93e3ca [ 1399.376539] Code: 48 8b 0d c1 8a 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8e 8a 2c 00 f7 d8 64 89 01 48 [ 1399.376540] RSP: 002b:00007ffd17cb9798 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5 [ 1399.376542] RAX: ffffffffffffffda RBX: 000056262be79a40 RCX: 00007fafda93e3ca [ 1399.376543] RDX: 000056262be867c0 RSI: 000056262be7b970 RDI: 000056262be7a940 [ 1399.376544] RBP: 0000000000000000 R08: 000056262be79c80 R09: 00007fafda98a1b0 [ 1399.376545] R10: 00000000c0ed0001 R11: 0000000000000202 R12: 000056262be7a940 [ 1399.376546] R13: 000056262be867c0 R14: 0000000000000000 R15: 00007fafdae5f8a4 [ 1399.376547] Modules linked in: nbd k10temp rfcomm edac_mce_amd kvm_amd cmac bnep kvm irqbypass snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi crct10dif_pclmul snd_hda_intel crc32_pclmul snd_hda_codec ghash_clmulni_intel snd_hda_core snd_hwdep snd_pcm input_leds snd_seq_midi snd_seq_midi_event snd_rawmidi btusb btrtl btbcm aesni_intel btintel bluetooth snd_seq aes_x86_64 nls_iso8859_1 crypto_simd cryptd ecdh_generic snd_seq_device glue_helper ecc snd_timer iwlmvm mac80211 wmi_bmof snd libarc4 soundcore iwlwifi ccp cfg80211 mac_hid sch_fq_codel it87 hwmon_vid parport_pc ppdev lp parport ip_tables x_tables autofs4 btrfs xor zstd_compress raid6_pq libcrc32c dm_mirror dm_region_hash dm_log hid_generic usbhid hid amdgpu mxm_wmi amd_iommu_v2 gpu_sched ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm i2c_piix4 igb ahci nvme libahci dca nvme_core i2c_algo_bit wmi [ 1399.376583] CR2: 0000000000000000 [ 1399.376585] ---[ end trace 651b3238b53fecb1 ]--- [ 1399.376587] 
RIP: 0010:_raw_spin_lock+0x10/0x30 [ 1399.376588] Code: 01 00 00 75 06 48 89 d8 5b 5d c3 e8 4a 75 66 ff 48 89 d8 5b 5d c3 0f 1f 40 00 0f 1f 44 00 00 55 48 89 e5 31 c0 ba 01 00 00 00 <f0> 0f b1 17 75 02 5d c3 89 c6 e8 01 5d 66 ff 66 90 5d c3 0f 1f 00 [ 1399.376589] RSP: 0018:ffffaec3c2d33370 EFLAGS: 00010246 [ 1399.376590] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 [ 1399.376591] RDX: 0000000000000001 RSI: ffffa0487f540248 RDI: 0000000000000000 [ 1399.376592] RBP: ffffaec3c2d33370 R08: ffffaec3c2d335a7 R09: 0000000000000002 [ 1399.376593] R10: 00000000ffffffff R11: ffffaec3c2d338a5 R12: 0000000000004000 [ 1399.376594] R13: ffffa0487f540000 R14: 0000000000000000 R15: ffffa04d0e6d5800 [ 1399.376596] FS: 00007fafdb068080(0000) GS:ffffa04d7ea80000(0000) knlGS:0000000000000000 [ 1399.376597] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1399.376598] CR2: 0000000000000000 CR3: 0000000494ea0000 CR4: 0000000000340ee0 Well, that didn't work ... At least this unclean shutdown didn't eat the btrfs of the machine I just spent two days installing from scratch, so, yay. In other news. Both Linux Mint 19.2 and Ubuntu 18.04.3 do run "fstrim -av" once a week via systemd. In the meantime, I maybe got @home off via btrfs restore v5.3: $ egrep -v '^(Restoring|Done searching|SYMLINK:|offset is) ' restore.typescript Script started on 2019-10-25 10:20:43+0200 [$ fakeroot ~/local/btrfs-progs/btrfs-progs/btrf]s restore -ivmS -r 258 /dev/mapper/nbd0p2 test/ We seem to be looping a lot on test/hana/.mozilla/firefox/ffjpjbix.windows-move/places.sqlite, do you want to keep going on ? (y/N/a): a ERROR: cannot map block logical 131018522624 length 1073741824: -2 Error copying data for test/hana/.mozilla/firefox/ffjpjbix.windows-move/favicons.sqlite We seem to be looping a lot on test/hana/.steam/steam/steamapps/common/Lord of the Rings Online/client_cell_1.dat, do you want to keep going on ? 
(y/N/a): a We seem to be looping a lot on test/hana/.steam/steam/steamapps/common/Lord of the Rings Online/client_gamelogic.dat, do you want to keep going on ? (y/N/a): a We seem to be looping a lot on test/hana/.steam/steam/steamapps/common/Lord of the Rings Online/client_highres.dat, do you want to keep going on ? (y/N/a): a We seem to be looping a lot on test/hana/.steam/steam/steamapps/common/Lord of the Rings Online/client_highres_aux_1.datx, do you want to keep going on ? (y/N/a): a We seem to be looping a lot on test/hana/.steam/steam/steamapps/common/Lord of the Rings Online/client_mesh.dat, do you want to keep going on ? (y/N/a): a We seem to be looping a lot on test/hana/.steam/steam/steamapps/common/Lord of the Rings Online/client_sound.dat, do you want to keep going on ? (y/N/a): a We seem to be looping a lot on test/hana/.steam/steam/steamapps/common/Lord of the Rings Online/client_surface.dat, do you want to keep going on ? (y/N/a): a We seem to be looping a lot on test/hana/.steam/steam/steamapps/common/Lord of the Rings Online/client_surface_aux_1.datx, do you want to keep going on ? (y/N/a): a We seem to be looping a lot on test/hana/.steam/steam/steamapps/common/Prey/Whiplash/GameSDK/Precache/Campaign_textures-part2.pak, do you want to keep going on ? (y/N/a): a We seem to be looping a lot on test/hana/.steam/steam/steamapps/common/Path of Exile/Content.ggpk, do you want to keep going on ? (y/N/a): a We seem to be looping a lot on test/chris/.steam/steam/steamapps/common/The Talos Principle/Content/Talos/00_All.gro, do you want to keep going on ? (y/N/a): a We seem to be looping a lot on test/chris/.steam/steam/steamapps/common/Prey/GameSDK/Objects-part1.pak, do you want to keep going on ? (y/N/a): a We seem to be looping a lot on test/chris/.steam/steam/steamapps/common/Prey/GameSDK/Objects-part2.pak, do you want to keep going on ? 
(y/N/a): a We seem to be looping a lot on test/chris/.steam/steam/steamapps/common/Prey/GameSDK/Objects-part3.pak, do you want to keep going on ? (y/N/a): a We seem to be looping a lot on test/chris/.steam/steam/steamapps/common/Prey/GameSDK/Objects-part4.pak, do you want to keep going on ? (y/N/a): a We seem to be looping a lot on test/chris/.steam/steam/steamapps/common/Prey/GameSDK/Objects-part5.pak, do you want to keep going on ? (y/N/a): a We seem to be looping a lot on test/chris/.steam/steam/steamapps/common/Prey/GameSDK/Objects-part6.pak, do you want to keep going on ? (y/N/a): a We seem to be looping a lot on test/chris/.steam/steam/steamapps/common/Prey/Whiplash/GameSDK/Precache/Campaign_textures-part2.pak, do you want to keep going on ? (y/N/a): a chris@chibi:~/local/chibi-rescue$ exit exit Script done on 2019-10-25 14:25:08+0200 How does that look to you? So "favicons.sqlite" is shot, don't care, but how about the other files? Is the looping message harmless as long as it finishes eventually or are these likely to be corrupted? Can & does restore verify the data checksums? In other words, can I expect the files that were restored without comment to be ... consistent? up-to-date? full of random data? Would it silently drop damaged files/directories or would it at least complain? Am Do., 24. Okt. 2019 um 13:40 Uhr schrieb Austin S. Hemmelgarn <ahferroin7@gmail.com>: > Backups, flavor of your choice. [...] I store enough extra info in the backup to > be able to rebuild the storage stack by hand from a rescue environment > (I usually use SystemRescueCD, but any live environment where you can > get your backup software working and rebuild your storage stack will work). Oh, I have backups. Well, usually :-p. But files do not a running system make. For my servers, rebuilding the "storage stack" isn't a problem, since I built it manually in the first place and have notes. This is a family member's personal desktop, with a GUI install of Linux Mint on it. 
I opted for manual partitioning and btrfs, but beyond that I don't even know the specifics of the setup. I mean, I can boot from a live USB, create partitions and filesystems, restore from backup and install grub, but I'm not confident that it would boot, let alone survive the next system update, because some distro-specific special sauce is missing. And even so, the emphasis is on I -- try telling that to a non-technical person.

What I'm looking for is something you can boot off of and restore the system automatically. ReaR has the right idea, but uses either tar (really?) or enterprise-scale backup backends. By the time I've ironed all the bugs out of a custom backup-restore script for it, it'll be too late. Timeshift covers the user-error and userspace-bug use cases brilliantly; something as idiot-proof for actual backups would be nice.

Cheers,
C.

P.S.: Having to stay near the computer during a btrfs restore just to enter "a" at the loop prompts isn't ideal. How about "A", always continue looping, for all files?

^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: first it froze, now the (btrfs) root fs won't mount ... 2019-10-25 16:43 ` Christian Pernegger @ 2019-10-25 17:05 ` Christian Pernegger 2019-10-25 17:16 ` Austin S. Hemmelgarn 2019-10-25 17:12 ` Austin S. Hemmelgarn 2019-10-26 0:01 ` Qu Wenruo 2 siblings, 1 reply; 29+ messages in thread From: Christian Pernegger @ 2019-10-25 17:05 UTC (permalink / raw) To: Austin S. Hemmelgarn; +Cc: Qu Wenruo, linux-btrfs

P.P.S. (sorry): Would using the DUP profile for metadata conceivably be an extra line of defence in such cases (assuming the NVMe doesn't just eat the extra copies outright)? If so, is enabling it after fs creation safe, and should the system profile be DUP as well? Something like:

# btrfs balance start -mconvert=dup [-sconvert=dup -f] $PATH

Lastly, is $PATH just used to identify the fs, or does it act as a filter? IOW, can I use whatever, or should it be run on the real root of the fs?

Cheers,
C.

On Fri, 25 Oct 2019 at 18:43, Christian Pernegger <pernegger@gmail.com> wrote:
> > On Thu, 24 Oct 2019 at 13:26, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: > > Since you're using v5.0 kernel, it's pretty hard to just compile the btrfs module. > > Oh, I'm not married to 5.0, it's just that I'd prefer building a > kernel package, so it integrates properly with the system. For > posterity, on Linux Mint 19.2: > > # clone repo > $ git clone https://github.com/adam900710/linux.git > > # check commit history > (https://github.com/adam900710/linux/commits/rescue_options) for the > release the rescue_options branch is based on => 5.3-rc7 > # download the Ubuntu patches matching that version (=> > https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.3-rc7/) and apply > them to the rescue_options branch > $ wget https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.3-rc7/0001-base-packaging.patch > [... repeat for the rest ...] > $ cd linux > $ git checkout rescue_options > $ patch -p1 <../0001-*.patch > [... repeat for the rest ...] 
> > # set number of threads to use for compilation, optional & to taste > export DEB_BUILD_OPTIONS='parallel=16' > > # build; could probably trim these targets down more > $ fakeroot debian/rules clean > $ fakeroot debian/rules do_mainline_build=true binary-headers > binary-generic binary-perarch > cd .. > > # install > dpkg -i linux-image-*_amd64.deb linux-modules-*_amd64.deb linux-headers-*.deb > > > Ok then, let's do this: > > Then you can boot into the new kernel, then try mount it with -o > > "resuce=skip_bg,ro". > > [ 565.097058] nbd0: p1 p2 > [ 565.192002] BTRFS: device fsid c2bd83d6-2261-47bb-8d18-5aba949651d7 > devid 1 transid 58603 /dev/nbd0p2 > [ 568.490654] nbd0: p1 p2 > [ 869.871598] BTRFS info (device dm-1): unrecognized rescue option 'skip_bg' > [ 869.884644] BTRFS error (device dm-1): open_ctree failed > > Hmm, glancing at the source I think it's "skipbg", no underscore(?) > > [ 1350.402586] BTRFS info (device dm-1): skip mount time block group searching > [ 1350.402588] BTRFS info (device dm-1): disk space caching is enabled > [ 1350.402589] BTRFS info (device dm-1): has skinny extents > [ 1350.402590] BTRFS error (device dm-1): skip_bg must be used with > notreelog mount option for dirty log > [ 1350.419849] BTRFS error (device dm-1): open_ctree failed > > Fine by me, let's add "notreelog" as well. Note that "skip_bg" is > actually written with an underscore above. 
> > [ 1399.169484] BTRFS info (device dm-1): disabling tree log > [ 1399.169487] BTRFS info (device dm-1): skip mount time block group searching > [ 1399.169488] BTRFS info (device dm-1): disk space caching is enabled > [ 1399.169488] BTRFS info (device dm-1): has skinny extents > [ 1399.319294] BTRFS info (device dm-1): enabling ssd optimizations > [ 1399.376181] BUG: kernel NULL pointer dereference, address: 0000000000000000 > [ 1399.376185] #PF: supervisor write access in kernel mode > [ 1399.376186] #PF: error_code(0x0002) - not-present page > [ 1399.376187] PGD 0 P4D 0 > [ 1399.376190] Oops: 0002 [#1] SMP NOPTI > [ 1399.376193] CPU: 10 PID: 3730 Comm: mount Not tainted > 5.3.0-050300rc7-generic #201909021831 > [ 1399.376194] Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS > MASTER/X570 AORUS MASTER, BIOS F7a 09/09/2019 > [ 1399.376199] RIP: 0010:_raw_spin_lock+0x10/0x30 > [ 1399.376201] Code: 01 00 00 75 06 48 89 d8 5b 5d c3 e8 4a 75 66 ff > 48 89 d8 5b 5d c3 0f 1f 40 00 0f 1f 44 00 00 55 48 89 e5 31 c0 ba 01 > 00 00 00 <f0> 0f b1 17 75 02 5d c3 89 c6 e8 01 5d 66 ff 66 90 5d c3 0f > 1f 00 > [ 1399.376202] RSP: 0018:ffffaec3c2d33370 EFLAGS: 00010246 > [ 1399.376204] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 > [ 1399.376205] RDX: 0000000000000001 RSI: ffffa0487f540248 RDI: 0000000000000000 > [ 1399.376206] RBP: ffffaec3c2d33370 R08: ffffaec3c2d335a7 R09: 0000000000000002 > [ 1399.376207] R10: 00000000ffffffff R11: ffffaec3c2d338a5 R12: 0000000000004000 > [ 1399.376208] R13: ffffa0487f540000 R14: 0000000000000000 R15: ffffa04d0e6d5800 > [ 1399.376210] FS: 00007fafdb068080(0000) GS:ffffa04d7ea80000(0000) > knlGS:0000000000000000 > [ 1399.376211] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 1399.376212] CR2: 0000000000000000 CR3: 0000000494ea0000 CR4: 0000000000340ee0 > [ 1399.376213] Call Trace: > [ 1399.376239] btrfs_reserve_metadata_bytes+0x51/0x9c0 [btrfs] > [ 1399.376241] ? __switch_to_asm+0x40/0x70 > [ 1399.376242] ? 
__switch_to_asm+0x34/0x70 > [ 1399.376243] ? __switch_to_asm+0x40/0x70 > [ 1399.376245] ? __switch_to_asm+0x34/0x70 > [ 1399.376246] ? __switch_to_asm+0x40/0x70 > [ 1399.376247] ? __switch_to_asm+0x34/0x70 > [ 1399.376248] ? __switch_to_asm+0x40/0x70 > [ 1399.376249] ? __switch_to_asm+0x34/0x70 > [ 1399.376250] ? __switch_to_asm+0x40/0x70 > [ 1399.376251] ? __switch_to_asm+0x34/0x70 > [ 1399.376252] ? __switch_to_asm+0x40/0x70 > [ 1399.376271] btrfs_use_block_rsv+0xd0/0x180 [btrfs] > [ 1399.376286] btrfs_alloc_tree_block+0x83/0x550 [btrfs] > [ 1399.376288] ? __schedule+0x2b0/0x670 > [ 1399.376302] alloc_tree_block_no_bg_flush+0x4a/0x60 [btrfs] > [ 1399.376315] __btrfs_cow_block+0x12f/0x590 [btrfs] > [ 1399.376329] btrfs_cow_block+0xf0/0x1b0 [btrfs] > [ 1399.376342] btrfs_search_slot+0x531/0xad0 [btrfs] > [ 1399.376356] btrfs_insert_empty_items+0x71/0xc0 [btrfs] > [ 1399.376374] overwrite_item+0xef/0x5e0 [btrfs] > [ 1399.376390] replay_one_buffer+0x584/0x890 [btrfs] > [ 1399.376404] walk_down_log_tree+0x192/0x420 [btrfs] > [ 1399.376419] walk_log_tree+0xce/0x1f0 [btrfs] > [ 1399.376433] btrfs_recover_log_trees+0x1ef/0x4a0 [btrfs] > [ 1399.376446] ? replay_one_extent+0x7e0/0x7e0 [btrfs] > [ 1399.376462] open_ctree+0x1a23/0x2100 [btrfs] > [ 1399.376476] btrfs_mount_root+0x612/0x760 [btrfs] > [ 1399.376489] ? btrfs_mount_root+0x612/0x760 [btrfs] > [ 1399.376492] ? __lookup_constant+0x4d/0x70 > [ 1399.376494] legacy_get_tree+0x2b/0x50 > [ 1399.376495] ? legacy_get_tree+0x2b/0x50 > [ 1399.376497] vfs_get_tree+0x2a/0x100 > [ 1399.376499] fc_mount+0x12/0x40 > [ 1399.376501] vfs_kern_mount.part.30+0x76/0x90 > [ 1399.376502] vfs_kern_mount+0x13/0x20 > [ 1399.376515] btrfs_mount+0x179/0x920 [btrfs] > [ 1399.376518] ? __check_object_size+0xdb/0x1b0 > [ 1399.376520] legacy_get_tree+0x2b/0x50 > [ 1399.376521] ? legacy_get_tree+0x2b/0x50 > [ 1399.376523] vfs_get_tree+0x2a/0x100 > [ 1399.376526] ? 
capable+0x19/0x20 > [ 1399.376528] do_mount+0x6dc/0xa10 > [ 1399.376529] ksys_mount+0x98/0xe0 > [ 1399.376531] __x64_sys_mount+0x25/0x30 > [ 1399.376534] do_syscall_64+0x5a/0x130 > [ 1399.376536] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > [ 1399.376537] RIP: 0033:0x7fafda93e3ca > [ 1399.376539] Code: 48 8b 0d c1 8a 2c 00 f7 d8 64 89 01 48 83 c8 ff > c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 > 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8e 8a 2c 00 f7 d8 64 89 > 01 48 > [ 1399.376540] RSP: 002b:00007ffd17cb9798 EFLAGS: 00000202 ORIG_RAX: > 00000000000000a5 > [ 1399.376542] RAX: ffffffffffffffda RBX: 000056262be79a40 RCX: 00007fafda93e3ca > [ 1399.376543] RDX: 000056262be867c0 RSI: 000056262be7b970 RDI: 000056262be7a940 > [ 1399.376544] RBP: 0000000000000000 R08: 000056262be79c80 R09: 00007fafda98a1b0 > [ 1399.376545] R10: 00000000c0ed0001 R11: 0000000000000202 R12: 000056262be7a940 > [ 1399.376546] R13: 000056262be867c0 R14: 0000000000000000 R15: 00007fafdae5f8a4 > [ 1399.376547] Modules linked in: nbd k10temp rfcomm edac_mce_amd > kvm_amd cmac bnep kvm irqbypass snd_hda_codec_realtek > snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi > crct10dif_pclmul snd_hda_intel crc32_pclmul snd_hda_codec > ghash_clmulni_intel snd_hda_core snd_hwdep snd_pcm input_leds > snd_seq_midi snd_seq_midi_event snd_rawmidi btusb btrtl btbcm > aesni_intel btintel bluetooth snd_seq aes_x86_64 nls_iso8859_1 > crypto_simd cryptd ecdh_generic snd_seq_device glue_helper ecc > snd_timer iwlmvm mac80211 wmi_bmof snd libarc4 soundcore iwlwifi ccp > cfg80211 mac_hid sch_fq_codel it87 hwmon_vid parport_pc ppdev lp > parport ip_tables x_tables autofs4 btrfs xor zstd_compress raid6_pq > libcrc32c dm_mirror dm_region_hash dm_log hid_generic usbhid hid > amdgpu mxm_wmi amd_iommu_v2 gpu_sched ttm drm_kms_helper syscopyarea > sysfillrect sysimgblt fb_sys_fops drm i2c_piix4 igb ahci nvme libahci > dca nvme_core i2c_algo_bit wmi > [ 1399.376583] CR2: 0000000000000000 > [ 
1399.376585] ---[ end trace 651b3238b53fecb1 ]--- > [ 1399.376587] RIP: 0010:_raw_spin_lock+0x10/0x30 > [ 1399.376588] Code: 01 00 00 75 06 48 89 d8 5b 5d c3 e8 4a 75 66 ff > 48 89 d8 5b 5d c3 0f 1f 40 00 0f 1f 44 00 00 55 48 89 e5 31 c0 ba 01 > 00 00 00 <f0> 0f b1 17 75 02 5d c3 89 c6 e8 01 5d 66 ff 66 90 5d c3 0f > 1f 00 > [ 1399.376589] RSP: 0018:ffffaec3c2d33370 EFLAGS: 00010246 > [ 1399.376590] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 > [ 1399.376591] RDX: 0000000000000001 RSI: ffffa0487f540248 RDI: 0000000000000000 > [ 1399.376592] RBP: ffffaec3c2d33370 R08: ffffaec3c2d335a7 R09: 0000000000000002 > [ 1399.376593] R10: 00000000ffffffff R11: ffffaec3c2d338a5 R12: 0000000000004000 > [ 1399.376594] R13: ffffa0487f540000 R14: 0000000000000000 R15: ffffa04d0e6d5800 > [ 1399.376596] FS: 00007fafdb068080(0000) GS:ffffa04d7ea80000(0000) > knlGS:0000000000000000 > [ 1399.376597] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 1399.376598] CR2: 0000000000000000 CR3: 0000000494ea0000 CR4: 0000000000340ee0 > > Well, that didn't work ... > > At least this unclean shutdown didn't eat the btrfs of the machine I > just spent two days installing from scratch, so, yay. > > In other news. Both Linux Mint 19.2 and Ubuntu 18.04.3 do run "fstrim > -av" once a week via systemd. > > In the meantime, I maybe got @home off via btrfs restore v5.3: > $ egrep -v '^(Restoring|Done searching|SYMLINK:|offset is) ' restore.typescript > Script started on 2019-10-25 10:20:43+0200 > [$ fakeroot ~/local/btrfs-progs/btrfs-progs/btrf]s restore -ivmS -r > 258 /dev/mapper/nbd0p2 test/ > We seem to be looping a lot on > test/hana/.mozilla/firefox/ffjpjbix.windows-move/places.sqlite, do you > want to keep going on ? 
(y/N/a): a > ERROR: cannot map block logical 131018522624 length 1073741824: -2 > Error copying data for > test/hana/.mozilla/firefox/ffjpjbix.windows-move/favicons.sqlite > We seem to be looping a lot on > test/hana/.steam/steam/steamapps/common/Lord of the Rings > Online/client_cell_1.dat, do you want to keep going on ? (y/N/a): a > We seem to be looping a lot on > test/hana/.steam/steam/steamapps/common/Lord of the Rings > Online/client_gamelogic.dat, do you want to keep going on ? (y/N/a): a > We seem to be looping a lot on > test/hana/.steam/steam/steamapps/common/Lord of the Rings > Online/client_highres.dat, do you want to keep going on ? (y/N/a): a > We seem to be looping a lot on > test/hana/.steam/steam/steamapps/common/Lord of the Rings > Online/client_highres_aux_1.datx, do you want to keep going on ? > (y/N/a): a > We seem to be looping a lot on > test/hana/.steam/steam/steamapps/common/Lord of the Rings > Online/client_mesh.dat, do you want to keep going on ? (y/N/a): a > We seem to be looping a lot on > test/hana/.steam/steam/steamapps/common/Lord of the Rings > Online/client_sound.dat, do you want to keep going on ? (y/N/a): a > We seem to be looping a lot on > test/hana/.steam/steam/steamapps/common/Lord of the Rings > Online/client_surface.dat, do you want to keep going on ? (y/N/a): a > We seem to be looping a lot on > test/hana/.steam/steam/steamapps/common/Lord of the Rings > Online/client_surface_aux_1.datx, do you want to keep going on ? > (y/N/a): a > We seem to be looping a lot on > test/hana/.steam/steam/steamapps/common/Prey/Whiplash/GameSDK/Precache/Campaign_textures-part2.pak, > do you want to keep going on ? (y/N/a): a > We seem to be looping a lot on > test/hana/.steam/steam/steamapps/common/Path of Exile/Content.ggpk, do > you want to keep going on ? (y/N/a): a > We seem to be looping a lot on > test/chris/.steam/steam/steamapps/common/The Talos > Principle/Content/Talos/00_All.gro, do you want to keep going on ? 
> (y/N/a): a > We seem to be looping a lot on > test/chris/.steam/steam/steamapps/common/Prey/GameSDK/Objects-part1.pak, > do you want to keep going on ? (y/N/a): a > We seem to be looping a lot on > test/chris/.steam/steam/steamapps/common/Prey/GameSDK/Objects-part2.pak, > do you want to keep going on ? (y/N/a): a > We seem to be looping a lot on > test/chris/.steam/steam/steamapps/common/Prey/GameSDK/Objects-part3.pak, > do you want to keep going on ? (y/N/a): a > We seem to be looping a lot on > test/chris/.steam/steam/steamapps/common/Prey/GameSDK/Objects-part4.pak, > do you want to keep going on ? (y/N/a): a > We seem to be looping a lot on > test/chris/.steam/steam/steamapps/common/Prey/GameSDK/Objects-part5.pak, > do you want to keep going on ? (y/N/a): a > We seem to be looping a lot on > test/chris/.steam/steam/steamapps/common/Prey/GameSDK/Objects-part6.pak, > do you want to keep going on ? (y/N/a): a > We seem to be looping a lot on > test/chris/.steam/steam/steamapps/common/Prey/Whiplash/GameSDK/Precache/Campaign_textures-part2.pak, > do you want to keep going on ? (y/N/a): a > chris@chibi:~/local/chibi-rescue$ exit > exit > > Script done on 2019-10-25 14:25:08+0200 > > How does that look to you? > So "favicons.sqlite" is shot, don't care, but how about the other files? > Is the looping message harmless as long as it finishes eventually or > are these likely to be corrupted? > Can & does restore verify the data checksums? In other words, can I > expect the files that were restored without comment to be ... > consistent? up-to-date? full of random data? > Would it silently drop damaged files/directories or would it at least complain? > > Am Do., 24. Okt. 2019 um 13:40 Uhr schrieb Austin S. Hemmelgarn > <ahferroin7@gmail.com>: > > Backups, flavor of your choice. [...] 
I store enough extra info in the backup to > > be able to rebuild the storage stack by hand from a rescue environment > > (I usually use SystemRescueCD, but any live environment where you can > > get your backup software working and rebuild your storage stack will work). > > Oh, I have backups. Well, usually :-p. But files do not a running > system make. For my servers, rebuilding the "storage stack" isn't a > problem, since I built it manually in the first place and have notes. > This is a family member's personal desktop, with a GUI install of > Linux Mint on it. I opted for manual partitioning and btrfs but beyond > that I don't even know the specifics of the setup. I mean, I can boot > from a live USB, create partitions and filesystems, restore from > backup and install grub, but I'm not confident that it would boot, let > alone survive the next system update, because some distro-specific > special sauce is missing. And even so, the emphasis is on I -- try telling > that to a non-technical person. > > What I'm looking for is something you can boot off of and restore the > system automatically. ReaR has the right idea, but uses either tar > (really?) or Enterprise-scale backup backends. By the time I've ironed all > the bugs out of a custom backup-restore script for it, it'll be > too late. Timeshift covers the user error and userspace bug use-cases > brilliantly, something as idiot-proof for actual backups would be > nice. > > Cheers, > C. > > P.S.: Having to stay near the computer during a btrfs restore just to > enter "a" at the loop prompts isn't ideal. How about "A", always > continue looping, for all files? ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: first it froze, now the (btrfs) root fs won't mount ... 2019-10-25 17:05 ` Christian Pernegger @ 2019-10-25 17:16 ` Austin S. Hemmelgarn 0 siblings, 0 replies; 29+ messages in thread From: Austin S. Hemmelgarn @ 2019-10-25 17:16 UTC (permalink / raw) To: Christian Pernegger; +Cc: Qu Wenruo, linux-btrfs On 2019-10-25 13:05, Christian Pernegger wrote:

> P.P.S. (sorry): Would using the DUP profile for metadata conceivably > be an extra line of defence in such cases (assuming the NVMe doesn't > just eat the extra copies outright)? If so, is enabling it after fs > creation safe and should system be DUP as well? Something like: > # btrfs balance start -mconvert=dup [-sconvert=dup -f] $PATH

Yes, using the dup profile for metadata should help, provided it's not an issue with the rest of the system (if the metadata gets corrupted in memory, two bad copies will get written out). On-line conversion is perfectly safe, and should not require explicit conversion of the system chunks (converting metadata will do that automatically).

> > Lastly, is $PATH just used to identify the fs, or does it act as a > filter? IOW, can I use whatever or should it be run on the real root > of the fs?

I think any path on the volume will work, though it's best to use it on an actual mount point so you know concretely what it's running on. Most of the ioctls are pretty forgiving like this, because it's not unusual to only have non-root subvolumes mounted from a volume.

^ permalink raw reply [flat|nested] 29+ messages in thread
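[The conversion discussed above can be sketched as a pair of commands. This is only an illustration of the reply, not a transcript from the thread; /mnt stands in for whatever mount point the filesystem actually has, and the second command merely verifies the result.]

```shell
# Convert existing metadata chunks to the DUP profile; per the reply above,
# the system chunks should be converted along with them automatically.
btrfs balance start -mconvert=dup /mnt

# Verify: the Metadata (and System) lines should now report DUP.
btrfs filesystem df /mnt
```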
* Re: first it froze, now the (btrfs) root fs won't mount ... 2019-10-25 16:43 ` Christian Pernegger 2019-10-25 17:05 ` Christian Pernegger @ 2019-10-25 17:12 ` Austin S. Hemmelgarn 2019-10-26 0:01 ` Qu Wenruo 2 siblings, 0 replies; 29+ messages in thread From: Austin S. Hemmelgarn @ 2019-10-25 17:12 UTC (permalink / raw) To: Christian Pernegger; +Cc: Qu Wenruo, linux-btrfs On 2019-10-25 12:43, Christian Pernegger wrote: > Am Do., 24. Okt. 2019 um 13:26 Uhr schrieb Qu Wenruo <quwenruo.btrfs@gmx.com>: >> Since you're using v5.0 kernel, it's pretty hard to just compile the btrfs module. > > Oh, I'm not married to 5.0, it's just that I'd prefer building a > kernel package, so it integrates properly with the system. For > posterity, on Linux Mint 19.2: > > # clone repo > $ git clone https://github.com/adam900710/linux.git > > # check commit history > (https://github.com/adam900710/linux/commits/rescue_options) for the > release the rescue_options branch is based on => 5.3-rc7 > # download the Ubuntu patches matching that version (=> > https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.3-rc7/) and apply > them to the rescue_options branch > $ wget https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.3-rc7/0001-base-packaging.patch > [... repeat for the rest ...] > $ cd linux > $ git checkout rescue_options > $ patch -p1 <../0001-*.patch > [... repeat for the rest ...] > > # set number of threads to use for compilation, optional & to taste > export DEB_BUILD_OPTIONS='parallel=16' > > # build; could probably trim these targets down more > $ fakeroot debian/rules clean > $ fakeroot debian/rules do_mainline_build=true binary-headers > binary-generic binary-perarch > cd .. > > # install > dpkg -i linux-image-*_amd64.deb linux-modules-*_amd64.deb linux-headers-*.deb > > > Ok then, let's do this: >> Then you can boot into the new kernel, then try mount it with -o >> "resuce=skip_bg,ro". 
> > [ 565.097058] nbd0: p1 p2 > [ 565.192002] BTRFS: device fsid c2bd83d6-2261-47bb-8d18-5aba949651d7 > devid 1 transid 58603 /dev/nbd0p2 > [ 568.490654] nbd0: p1 p2 > [ 869.871598] BTRFS info (device dm-1): unrecognized rescue option 'skip_bg' > [ 869.884644] BTRFS error (device dm-1): open_ctree failed > > Hmm, glancing at the source I think it's "skipbg", no underscore(?) > > [ 1350.402586] BTRFS info (device dm-1): skip mount time block group searching > [ 1350.402588] BTRFS info (device dm-1): disk space caching is enabled > [ 1350.402589] BTRFS info (device dm-1): has skinny extents > [ 1350.402590] BTRFS error (device dm-1): skip_bg must be used with > notreelog mount option for dirty log > [ 1350.419849] BTRFS error (device dm-1): open_ctree failed > > Fine by me, let's add "notreelog" as well. Note that "skip_bg" is > actually written with an underscore above. > > [ 1399.169484] BTRFS info (device dm-1): disabling tree log > [ 1399.169487] BTRFS info (device dm-1): skip mount time block group searching > [ 1399.169488] BTRFS info (device dm-1): disk space caching is enabled > [ 1399.169488] BTRFS info (device dm-1): has skinny extents > [ 1399.319294] BTRFS info (device dm-1): enabling ssd optimizations > [ 1399.376181] BUG: kernel NULL pointer dereference, address: 0000000000000000 > [ 1399.376185] #PF: supervisor write access in kernel mode > [ 1399.376186] #PF: error_code(0x0002) - not-present page > [ 1399.376187] PGD 0 P4D 0 > [ 1399.376190] Oops: 0002 [#1] SMP NOPTI > [ 1399.376193] CPU: 10 PID: 3730 Comm: mount Not tainted > 5.3.0-050300rc7-generic #201909021831 > [ 1399.376194] Hardware name: Gigabyte Technology Co., Ltd. 
X570 AORUS > MASTER/X570 AORUS MASTER, BIOS F7a 09/09/2019 > [ 1399.376199] RIP: 0010:_raw_spin_lock+0x10/0x30 > [ 1399.376201] Code: 01 00 00 75 06 48 89 d8 5b 5d c3 e8 4a 75 66 ff > 48 89 d8 5b 5d c3 0f 1f 40 00 0f 1f 44 00 00 55 48 89 e5 31 c0 ba 01 > 00 00 00 <f0> 0f b1 17 75 02 5d c3 89 c6 e8 01 5d 66 ff 66 90 5d c3 0f > 1f 00 > [ 1399.376202] RSP: 0018:ffffaec3c2d33370 EFLAGS: 00010246 > [ 1399.376204] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 > [ 1399.376205] RDX: 0000000000000001 RSI: ffffa0487f540248 RDI: 0000000000000000 > [ 1399.376206] RBP: ffffaec3c2d33370 R08: ffffaec3c2d335a7 R09: 0000000000000002 > [ 1399.376207] R10: 00000000ffffffff R11: ffffaec3c2d338a5 R12: 0000000000004000 > [ 1399.376208] R13: ffffa0487f540000 R14: 0000000000000000 R15: ffffa04d0e6d5800 > [ 1399.376210] FS: 00007fafdb068080(0000) GS:ffffa04d7ea80000(0000) > knlGS:0000000000000000 > [ 1399.376211] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 1399.376212] CR2: 0000000000000000 CR3: 0000000494ea0000 CR4: 0000000000340ee0 > [ 1399.376213] Call Trace: > [ 1399.376239] btrfs_reserve_metadata_bytes+0x51/0x9c0 [btrfs] > [ 1399.376241] ? __switch_to_asm+0x40/0x70 > [ 1399.376242] ? __switch_to_asm+0x34/0x70 > [ 1399.376243] ? __switch_to_asm+0x40/0x70 > [ 1399.376245] ? __switch_to_asm+0x34/0x70 > [ 1399.376246] ? __switch_to_asm+0x40/0x70 > [ 1399.376247] ? __switch_to_asm+0x34/0x70 > [ 1399.376248] ? __switch_to_asm+0x40/0x70 > [ 1399.376249] ? __switch_to_asm+0x34/0x70 > [ 1399.376250] ? __switch_to_asm+0x40/0x70 > [ 1399.376251] ? __switch_to_asm+0x34/0x70 > [ 1399.376252] ? __switch_to_asm+0x40/0x70 > [ 1399.376271] btrfs_use_block_rsv+0xd0/0x180 [btrfs] > [ 1399.376286] btrfs_alloc_tree_block+0x83/0x550 [btrfs] > [ 1399.376288] ? 
__schedule+0x2b0/0x670 > [ 1399.376302] alloc_tree_block_no_bg_flush+0x4a/0x60 [btrfs] > [ 1399.376315] __btrfs_cow_block+0x12f/0x590 [btrfs] > [ 1399.376329] btrfs_cow_block+0xf0/0x1b0 [btrfs] > [ 1399.376342] btrfs_search_slot+0x531/0xad0 [btrfs] > [ 1399.376356] btrfs_insert_empty_items+0x71/0xc0 [btrfs] > [ 1399.376374] overwrite_item+0xef/0x5e0 [btrfs] > [ 1399.376390] replay_one_buffer+0x584/0x890 [btrfs] > [ 1399.376404] walk_down_log_tree+0x192/0x420 [btrfs] > [ 1399.376419] walk_log_tree+0xce/0x1f0 [btrfs] > [ 1399.376433] btrfs_recover_log_trees+0x1ef/0x4a0 [btrfs] > [ 1399.376446] ? replay_one_extent+0x7e0/0x7e0 [btrfs] > [ 1399.376462] open_ctree+0x1a23/0x2100 [btrfs] > [ 1399.376476] btrfs_mount_root+0x612/0x760 [btrfs] > [ 1399.376489] ? btrfs_mount_root+0x612/0x760 [btrfs] > [ 1399.376492] ? __lookup_constant+0x4d/0x70 > [ 1399.376494] legacy_get_tree+0x2b/0x50 > [ 1399.376495] ? legacy_get_tree+0x2b/0x50 > [ 1399.376497] vfs_get_tree+0x2a/0x100 > [ 1399.376499] fc_mount+0x12/0x40 > [ 1399.376501] vfs_kern_mount.part.30+0x76/0x90 > [ 1399.376502] vfs_kern_mount+0x13/0x20 > [ 1399.376515] btrfs_mount+0x179/0x920 [btrfs] > [ 1399.376518] ? __check_object_size+0xdb/0x1b0 > [ 1399.376520] legacy_get_tree+0x2b/0x50 > [ 1399.376521] ? legacy_get_tree+0x2b/0x50 > [ 1399.376523] vfs_get_tree+0x2a/0x100 > [ 1399.376526] ? 
capable+0x19/0x20 > [ 1399.376528] do_mount+0x6dc/0xa10 > [ 1399.376529] ksys_mount+0x98/0xe0 > [ 1399.376531] __x64_sys_mount+0x25/0x30 > [ 1399.376534] do_syscall_64+0x5a/0x130 > [ 1399.376536] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > [ 1399.376537] RIP: 0033:0x7fafda93e3ca > [ 1399.376539] Code: 48 8b 0d c1 8a 2c 00 f7 d8 64 89 01 48 83 c8 ff > c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 > 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8e 8a 2c 00 f7 d8 64 89 > 01 48 > [ 1399.376540] RSP: 002b:00007ffd17cb9798 EFLAGS: 00000202 ORIG_RAX: > 00000000000000a5 > [ 1399.376542] RAX: ffffffffffffffda RBX: 000056262be79a40 RCX: 00007fafda93e3ca > [ 1399.376543] RDX: 000056262be867c0 RSI: 000056262be7b970 RDI: 000056262be7a940 > [ 1399.376544] RBP: 0000000000000000 R08: 000056262be79c80 R09: 00007fafda98a1b0 > [ 1399.376545] R10: 00000000c0ed0001 R11: 0000000000000202 R12: 000056262be7a940 > [ 1399.376546] R13: 000056262be867c0 R14: 0000000000000000 R15: 00007fafdae5f8a4 > [ 1399.376547] Modules linked in: nbd k10temp rfcomm edac_mce_amd > kvm_amd cmac bnep kvm irqbypass snd_hda_codec_realtek > snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi > crct10dif_pclmul snd_hda_intel crc32_pclmul snd_hda_codec > ghash_clmulni_intel snd_hda_core snd_hwdep snd_pcm input_leds > snd_seq_midi snd_seq_midi_event snd_rawmidi btusb btrtl btbcm > aesni_intel btintel bluetooth snd_seq aes_x86_64 nls_iso8859_1 > crypto_simd cryptd ecdh_generic snd_seq_device glue_helper ecc > snd_timer iwlmvm mac80211 wmi_bmof snd libarc4 soundcore iwlwifi ccp > cfg80211 mac_hid sch_fq_codel it87 hwmon_vid parport_pc ppdev lp > parport ip_tables x_tables autofs4 btrfs xor zstd_compress raid6_pq > libcrc32c dm_mirror dm_region_hash dm_log hid_generic usbhid hid > amdgpu mxm_wmi amd_iommu_v2 gpu_sched ttm drm_kms_helper syscopyarea > sysfillrect sysimgblt fb_sys_fops drm i2c_piix4 igb ahci nvme libahci > dca nvme_core i2c_algo_bit wmi > [ 1399.376583] CR2: 0000000000000000 > [ 
1399.376585] ---[ end trace 651b3238b53fecb1 ]--- > [ 1399.376587] RIP: 0010:_raw_spin_lock+0x10/0x30 > [ 1399.376588] Code: 01 00 00 75 06 48 89 d8 5b 5d c3 e8 4a 75 66 ff > 48 89 d8 5b 5d c3 0f 1f 40 00 0f 1f 44 00 00 55 48 89 e5 31 c0 ba 01 > 00 00 00 <f0> 0f b1 17 75 02 5d c3 89 c6 e8 01 5d 66 ff 66 90 5d c3 0f > 1f 00 > [ 1399.376589] RSP: 0018:ffffaec3c2d33370 EFLAGS: 00010246 > [ 1399.376590] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 > [ 1399.376591] RDX: 0000000000000001 RSI: ffffa0487f540248 RDI: 0000000000000000 > [ 1399.376592] RBP: ffffaec3c2d33370 R08: ffffaec3c2d335a7 R09: 0000000000000002 > [ 1399.376593] R10: 00000000ffffffff R11: ffffaec3c2d338a5 R12: 0000000000004000 > [ 1399.376594] R13: ffffa0487f540000 R14: 0000000000000000 R15: ffffa04d0e6d5800 > [ 1399.376596] FS: 00007fafdb068080(0000) GS:ffffa04d7ea80000(0000) > knlGS:0000000000000000 > [ 1399.376597] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 1399.376598] CR2: 0000000000000000 CR3: 0000000494ea0000 CR4: 0000000000340ee0 > > Well, that didn't work ... > > At least this unclean shutdown didn't eat the btrfs of the machine I > just spent two days installing from scratch, so, yay. > > In other news. Both Linux Mint 19.2 and Ubuntu 18.04.3 do run "fstrim > -av" once a week via systemd. > > In the meantime, I maybe got @home off via btrfs restore v5.3: > $ egrep -v '^(Restoring|Done searching|SYMLINK:|offset is) ' restore.typescript > Script started on 2019-10-25 10:20:43+0200 > [$ fakeroot ~/local/btrfs-progs/btrfs-progs/btrf]s restore -ivmS -r > 258 /dev/mapper/nbd0p2 test/ > We seem to be looping a lot on > test/hana/.mozilla/firefox/ffjpjbix.windows-move/places.sqlite, do you > want to keep going on ? 
(y/N/a): a > ERROR: cannot map block logical 131018522624 length 1073741824: -2 > Error copying data for > test/hana/.mozilla/firefox/ffjpjbix.windows-move/favicons.sqlite > We seem to be looping a lot on > test/hana/.steam/steam/steamapps/common/Lord of the Rings > Online/client_cell_1.dat, do you want to keep going on ? (y/N/a): a > We seem to be looping a lot on > test/hana/.steam/steam/steamapps/common/Lord of the Rings > Online/client_gamelogic.dat, do you want to keep going on ? (y/N/a): a > We seem to be looping a lot on > test/hana/.steam/steam/steamapps/common/Lord of the Rings > Online/client_highres.dat, do you want to keep going on ? (y/N/a): a > We seem to be looping a lot on > test/hana/.steam/steam/steamapps/common/Lord of the Rings > Online/client_highres_aux_1.datx, do you want to keep going on ? > (y/N/a): a > We seem to be looping a lot on > test/hana/.steam/steam/steamapps/common/Lord of the Rings > Online/client_mesh.dat, do you want to keep going on ? (y/N/a): a > We seem to be looping a lot on > test/hana/.steam/steam/steamapps/common/Lord of the Rings > Online/client_sound.dat, do you want to keep going on ? (y/N/a): a > We seem to be looping a lot on > test/hana/.steam/steam/steamapps/common/Lord of the Rings > Online/client_surface.dat, do you want to keep going on ? (y/N/a): a > We seem to be looping a lot on > test/hana/.steam/steam/steamapps/common/Lord of the Rings > Online/client_surface_aux_1.datx, do you want to keep going on ? > (y/N/a): a > We seem to be looping a lot on > test/hana/.steam/steam/steamapps/common/Prey/Whiplash/GameSDK/Precache/Campaign_textures-part2.pak, > do you want to keep going on ? (y/N/a): a > We seem to be looping a lot on > test/hana/.steam/steam/steamapps/common/Path of Exile/Content.ggpk, do > you want to keep going on ? (y/N/a): a > We seem to be looping a lot on > test/chris/.steam/steam/steamapps/common/The Talos > Principle/Content/Talos/00_All.gro, do you want to keep going on ? 
> (y/N/a): a > We seem to be looping a lot on > test/chris/.steam/steam/steamapps/common/Prey/GameSDK/Objects-part1.pak, > do you want to keep going on ? (y/N/a): a > We seem to be looping a lot on > test/chris/.steam/steam/steamapps/common/Prey/GameSDK/Objects-part2.pak, > do you want to keep going on ? (y/N/a): a > We seem to be looping a lot on > test/chris/.steam/steam/steamapps/common/Prey/GameSDK/Objects-part3.pak, > do you want to keep going on ? (y/N/a): a > We seem to be looping a lot on > test/chris/.steam/steam/steamapps/common/Prey/GameSDK/Objects-part4.pak, > do you want to keep going on ? (y/N/a): a > We seem to be looping a lot on > test/chris/.steam/steam/steamapps/common/Prey/GameSDK/Objects-part5.pak, > do you want to keep going on ? (y/N/a): a > We seem to be looping a lot on > test/chris/.steam/steam/steamapps/common/Prey/GameSDK/Objects-part6.pak, > do you want to keep going on ? (y/N/a): a > We seem to be looping a lot on > test/chris/.steam/steam/steamapps/common/Prey/Whiplash/GameSDK/Precache/Campaign_textures-part2.pak, > do you want to keep going on ? (y/N/a): a > chris@chibi:~/local/chibi-rescue$ exit > exit > > Script done on 2019-10-25 14:25:08+0200 > > How does that look to you? > So "favicons.sqlite" is shot, don't care, but how about the other files? > Is the looping message harmless as long as it finishes eventually or > are these likely to be corrupted? > Can & does restore verify the data checksums? In other words, can I > expect the files that were restored without comment to be ... > consistent? up-to-date? full of random data? > Would it silently drop damaged files/directories or would it at least complain? I think it will try to restore the data without looking at checksums, but I'm not 100% certain. 
On a side note, I've had some success on occasion getting data off of damaged BTRFS volumes using GRUB's BTRFS driver (the package you want for that is probably called 'grub-mount', it's essentially a FUSE module that uses GRUB's filesystem drivers to provide read-only access to any filesystem type supported by GRUB) in cases where `btrfs restore` couldn't do it. > > Am Do., 24. Okt. 2019 um 13:40 Uhr schrieb Austin S. Hemmelgarn > <ahferroin7@gmail.com>: >> Backups, flavor of your choice. [...] I store enough extra info in the backup to >> be able to rebuild the storage stack by hand from a rescue environment >> (I usually use SystemRescueCD, but any live environment where you can >> get your backup software working and rebuild your storage stack will work). > > Oh, I have backups. Well, usually :-p. But files do not a running > system make. For my servers, rebuilding the "storage stack" isn't a > problem, since I built it manually in the first place and have notes. > This is a family member's personal desktop, with a GUI install of > Linux Mint on it. I opted for manual partitioning and btrfs but beyond > that I don't even know the specifics of the setup. I mean, I can boot > from a live USB, create partitions and filesystems, restore from > backup and install grub, but I'm not confident that it would boot, let > alone survive the next system update, because some distro-specific > special sauce is missing. And even so, emphasis is on I -- try telling > that to a non-technical person. Actually, it should boot and survive the next upgrade just fine. Just make sure that: * You used the correct version of GRUB. * The GRUB config is correct. * The storage configuration is the same. The last one is the tricky one, as most systems require even the UUIDs to be the same (since most Linux distros mount by UUID these days). > > What I'm looking for is something you can boot off of and restore the > system automatically. 
ReaR has the right idea, but uses either tar > (really?) or Enterprise-scale backup backends. By the time I've all > the bugs ironed out of a custom backup-restore script for it, it'll be > too late. Timeshift covers the user error and userspace bug use-cases > brilliantly, something as idiot-proof for actual backups would be > nice. I have yet to see such a tool for Linux in general, unfortunately. BTW, don't underestimate tar, it's rock-solid reliable if you use it right, and works _everywhere_. Yeah, it doesn't do deduplication or encryption or checksumming or compression, but all of that is easy to layer in. The only major issue is that it requires the state of the volume to not change while it's running (which can be mitigated by splitting backups into logical subsets instead of running for the whole system all at once). > > Cheers, > C. > > P.S.: Having to stay near the computer during a btrfs restore just to > enter "a" at the loop prompts isn't ideal. How about "A", always > continue looping, for all files? Or, alternatively: yes a | btrfs restore ^ permalink raw reply [flat|nested] 29+ messages in thread
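[Editorial note: the "layer it in" approach to tar that Austin describes above can be sketched in a few lines of shell. This is a hypothetical example, not something from the thread; the paths are demo placeholders, gzip stands in for whichever compressor you prefer, and encryption could be layered in the same way by piping through gpg.]

```shell
# Sketch: plain tar as the archive layer, with compression and an
# integrity checksum layered on top. All paths are demo placeholders.
set -e
SRC=/tmp/tar-demo-src
DST=/tmp/tar-demo-backup
mkdir -p "$SRC" "$DST"
echo "example data" > "$SRC/file.txt"

# tar stays simple: archive only; compression is layered in via a pipe
tar -C "$SRC" -cf - . | gzip > "$DST/home.tar.gz"

# layer in checksumming so later bit-rot is detectable
( cd "$DST" && sha256sum home.tar.gz > home.tar.gz.sha256 )

# verify before trusting the backup
( cd "$DST" && sha256sum -c home.tar.gz.sha256 )
```

The same pipeline idea covers Austin's point about splitting backups into logical subsets: one tar invocation per subtree keeps each archive small and limits how long any one subtree must stay unchanged.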
* Re: first it froze, now the (btrfs) root fs won't mount ... 2019-10-25 16:43 ` Christian Pernegger 2019-10-25 17:05 ` Christian Pernegger 2019-10-25 17:12 ` Austin S. Hemmelgarn @ 2019-10-26 0:01 ` Qu Wenruo 2019-10-26 9:23 ` Christian Pernegger 2 siblings, 1 reply; 29+ messages in thread From: Qu Wenruo @ 2019-10-26 0:01 UTC (permalink / raw) To: Christian Pernegger, Austin S. Hemmelgarn; +Cc: linux-btrfs [-- Attachment #1.1: Type: text/plain, Size: 16180 bytes --] On 2019/10/26 上午12:43, Christian Pernegger wrote: > Am Do., 24. Okt. 2019 um 13:26 Uhr schrieb Qu Wenruo <quwenruo.btrfs@gmx.com>: >> Since you're using v5.0 kernel, it's pretty hard to just compile the btrfs module. > > Oh, I'm not married to 5.0, it's just that I'd prefer building a > kernel package, so it integrates properly with the system. For > posterity, on Linux Mint 19.2: > > # clone repo > $ git clone https://github.com/adam900710/linux.git > > # check commit history > (https://github.com/adam900710/linux/commits/rescue_options) for the > release the rescue_options branch is based on => 5.3-rc7 > # download the Ubuntu patches matching that version (=> > https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.3-rc7/) and apply > them to the rescue_options branch > $ wget https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.3-rc7/0001-base-packaging.patch > [... repeat for the rest ...] > $ cd linux > $ git checkout rescue_options > $ patch -p1 <../0001-*.patch > [... repeat for the rest ...] > > # set number of threads to use for compilation, optional & to taste > export DEB_BUILD_OPTIONS='parallel=16' > > # build; could probably trim these targets down more > $ fakeroot debian/rules clean > $ fakeroot debian/rules do_mainline_build=true binary-headers > binary-generic binary-perarch > cd .. 
> > # install > dpkg -i linux-image-*_amd64.deb linux-modules-*_amd64.deb linux-headers-*.deb > > > Ok then, let's do this: >> Then you can boot into the new kernel, then try mount it with -o >> "resuce=skip_bg,ro". > > [ 565.097058] nbd0: p1 p2 > [ 565.192002] BTRFS: device fsid c2bd83d6-2261-47bb-8d18-5aba949651d7 > devid 1 transid 58603 /dev/nbd0p2 > [ 568.490654] nbd0: p1 p2 > [ 869.871598] BTRFS info (device dm-1): unrecognized rescue option 'skip_bg' > [ 869.884644] BTRFS error (device dm-1): open_ctree failed > > Hmm, glancing at the source I think it's "skipbg", no underscore(?) > > [ 1350.402586] BTRFS info (device dm-1): skip mount time block group searching > [ 1350.402588] BTRFS info (device dm-1): disk space caching is enabled > [ 1350.402589] BTRFS info (device dm-1): has skinny extents > [ 1350.402590] BTRFS error (device dm-1): skip_bg must be used with > notreelog mount option for dirty log > [ 1350.419849] BTRFS error (device dm-1): open_ctree failed > > Fine by me, let's add "notreelog" as well. Note that "skip_bg" is > actually written with an underscore above. > > [ 1399.169484] BTRFS info (device dm-1): disabling tree log > [ 1399.169487] BTRFS info (device dm-1): skip mount time block group searching > [ 1399.169488] BTRFS info (device dm-1): disk space caching is enabled > [ 1399.169488] BTRFS info (device dm-1): has skinny extents > [ 1399.319294] BTRFS info (device dm-1): enabling ssd optimizations > [ 1399.376181] BUG: kernel NULL pointer dereference, address: 0000000000000000 > [ 1399.376185] #PF: supervisor write access in kernel mode > [ 1399.376186] #PF: error_code(0x0002) - not-present page > [ 1399.376187] PGD 0 P4D 0 > [ 1399.376190] Oops: 0002 [#1] SMP NOPTI > [ 1399.376193] CPU: 10 PID: 3730 Comm: mount Not tainted > 5.3.0-050300rc7-generic #201909021831 > [ 1399.376194] Hardware name: Gigabyte Technology Co., Ltd. 
X570 AORUS > MASTER/X570 AORUS MASTER, BIOS F7a 09/09/2019 > [ 1399.376199] RIP: 0010:_raw_spin_lock+0x10/0x30 > [ 1399.376201] Code: 01 00 00 75 06 48 89 d8 5b 5d c3 e8 4a 75 66 ff > 48 89 d8 5b 5d c3 0f 1f 40 00 0f 1f 44 00 00 55 48 89 e5 31 c0 ba 01 > 00 00 00 <f0> 0f b1 17 75 02 5d c3 89 c6 e8 01 5d 66 ff 66 90 5d c3 0f > 1f 00 > [ 1399.376202] RSP: 0018:ffffaec3c2d33370 EFLAGS: 00010246 > [ 1399.376204] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 > [ 1399.376205] RDX: 0000000000000001 RSI: ffffa0487f540248 RDI: 0000000000000000 > [ 1399.376206] RBP: ffffaec3c2d33370 R08: ffffaec3c2d335a7 R09: 0000000000000002 > [ 1399.376207] R10: 00000000ffffffff R11: ffffaec3c2d338a5 R12: 0000000000004000 > [ 1399.376208] R13: ffffa0487f540000 R14: 0000000000000000 R15: ffffa04d0e6d5800 > [ 1399.376210] FS: 00007fafdb068080(0000) GS:ffffa04d7ea80000(0000) > knlGS:0000000000000000 > [ 1399.376211] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 1399.376212] CR2: 0000000000000000 CR3: 0000000494ea0000 CR4: 0000000000340ee0 > [ 1399.376213] Call Trace: > [ 1399.376239] btrfs_reserve_metadata_bytes+0x51/0x9c0 [btrfs] > [ 1399.376241] ? __switch_to_asm+0x40/0x70 > [ 1399.376242] ? __switch_to_asm+0x34/0x70 > [ 1399.376243] ? __switch_to_asm+0x40/0x70 > [ 1399.376245] ? __switch_to_asm+0x34/0x70 > [ 1399.376246] ? __switch_to_asm+0x40/0x70 > [ 1399.376247] ? __switch_to_asm+0x34/0x70 > [ 1399.376248] ? __switch_to_asm+0x40/0x70 > [ 1399.376249] ? __switch_to_asm+0x34/0x70 > [ 1399.376250] ? __switch_to_asm+0x40/0x70 > [ 1399.376251] ? __switch_to_asm+0x34/0x70 > [ 1399.376252] ? __switch_to_asm+0x40/0x70 > [ 1399.376271] btrfs_use_block_rsv+0xd0/0x180 [btrfs] > [ 1399.376286] btrfs_alloc_tree_block+0x83/0x550 [btrfs] > [ 1399.376288] ? 
__schedule+0x2b0/0x670 > [ 1399.376302] alloc_tree_block_no_bg_flush+0x4a/0x60 [btrfs] > [ 1399.376315] __btrfs_cow_block+0x12f/0x590 [btrfs] > [ 1399.376329] btrfs_cow_block+0xf0/0x1b0 [btrfs] > [ 1399.376342] btrfs_search_slot+0x531/0xad0 [btrfs] > [ 1399.376356] btrfs_insert_empty_items+0x71/0xc0 [btrfs] > [ 1399.376374] overwrite_item+0xef/0x5e0 [btrfs] > [ 1399.376390] replay_one_buffer+0x584/0x890 [btrfs] > [ 1399.376404] walk_down_log_tree+0x192/0x420 [btrfs] > [ 1399.376419] walk_log_tree+0xce/0x1f0 [btrfs] > [ 1399.376433] btrfs_recover_log_trees+0x1ef/0x4a0 [btrfs] It's already working, the problem is, there is a dirty log while nologreplay mount option doesn't really work. You can btrfs-zero-log to clear the log, then try again using skipbg mount option. And thanks for the report, I'll look into why nologreplay doesn't work. Thanks, Qu > [ 1399.376446] ? replay_one_extent+0x7e0/0x7e0 [btrfs] > [ 1399.376462] open_ctree+0x1a23/0x2100 [btrfs] > [ 1399.376476] btrfs_mount_root+0x612/0x760 [btrfs] > [ 1399.376489] ? btrfs_mount_root+0x612/0x760 [btrfs] > [ 1399.376492] ? __lookup_constant+0x4d/0x70 > [ 1399.376494] legacy_get_tree+0x2b/0x50 > [ 1399.376495] ? legacy_get_tree+0x2b/0x50 > [ 1399.376497] vfs_get_tree+0x2a/0x100 > [ 1399.376499] fc_mount+0x12/0x40 > [ 1399.376501] vfs_kern_mount.part.30+0x76/0x90 > [ 1399.376502] vfs_kern_mount+0x13/0x20 > [ 1399.376515] btrfs_mount+0x179/0x920 [btrfs] > [ 1399.376518] ? __check_object_size+0xdb/0x1b0 > [ 1399.376520] legacy_get_tree+0x2b/0x50 > [ 1399.376521] ? legacy_get_tree+0x2b/0x50 > [ 1399.376523] vfs_get_tree+0x2a/0x100 > [ 1399.376526] ? 
capable+0x19/0x20 > [ 1399.376528] do_mount+0x6dc/0xa10 > [ 1399.376529] ksys_mount+0x98/0xe0 > [ 1399.376531] __x64_sys_mount+0x25/0x30 > [ 1399.376534] do_syscall_64+0x5a/0x130 > [ 1399.376536] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > [ 1399.376537] RIP: 0033:0x7fafda93e3ca > [ 1399.376539] Code: 48 8b 0d c1 8a 2c 00 f7 d8 64 89 01 48 83 c8 ff > c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 > 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 8e 8a 2c 00 f7 d8 64 89 > 01 48 > [ 1399.376540] RSP: 002b:00007ffd17cb9798 EFLAGS: 00000202 ORIG_RAX: > 00000000000000a5 > [ 1399.376542] RAX: ffffffffffffffda RBX: 000056262be79a40 RCX: 00007fafda93e3ca > [ 1399.376543] RDX: 000056262be867c0 RSI: 000056262be7b970 RDI: 000056262be7a940 > [ 1399.376544] RBP: 0000000000000000 R08: 000056262be79c80 R09: 00007fafda98a1b0 > [ 1399.376545] R10: 00000000c0ed0001 R11: 0000000000000202 R12: 000056262be7a940 > [ 1399.376546] R13: 000056262be867c0 R14: 0000000000000000 R15: 00007fafdae5f8a4 > [ 1399.376547] Modules linked in: nbd k10temp rfcomm edac_mce_amd > kvm_amd cmac bnep kvm irqbypass snd_hda_codec_realtek > snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi > crct10dif_pclmul snd_hda_intel crc32_pclmul snd_hda_codec > ghash_clmulni_intel snd_hda_core snd_hwdep snd_pcm input_leds > snd_seq_midi snd_seq_midi_event snd_rawmidi btusb btrtl btbcm > aesni_intel btintel bluetooth snd_seq aes_x86_64 nls_iso8859_1 > crypto_simd cryptd ecdh_generic snd_seq_device glue_helper ecc > snd_timer iwlmvm mac80211 wmi_bmof snd libarc4 soundcore iwlwifi ccp > cfg80211 mac_hid sch_fq_codel it87 hwmon_vid parport_pc ppdev lp > parport ip_tables x_tables autofs4 btrfs xor zstd_compress raid6_pq > libcrc32c dm_mirror dm_region_hash dm_log hid_generic usbhid hid > amdgpu mxm_wmi amd_iommu_v2 gpu_sched ttm drm_kms_helper syscopyarea > sysfillrect sysimgblt fb_sys_fops drm i2c_piix4 igb ahci nvme libahci > dca nvme_core i2c_algo_bit wmi > [ 1399.376583] CR2: 0000000000000000 > [ 
1399.376585] ---[ end trace 651b3238b53fecb1 ]--- > [ 1399.376587] RIP: 0010:_raw_spin_lock+0x10/0x30 > [ 1399.376588] Code: 01 00 00 75 06 48 89 d8 5b 5d c3 e8 4a 75 66 ff > 48 89 d8 5b 5d c3 0f 1f 40 00 0f 1f 44 00 00 55 48 89 e5 31 c0 ba 01 > 00 00 00 <f0> 0f b1 17 75 02 5d c3 89 c6 e8 01 5d 66 ff 66 90 5d c3 0f > 1f 00 > [ 1399.376589] RSP: 0018:ffffaec3c2d33370 EFLAGS: 00010246 > [ 1399.376590] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 > [ 1399.376591] RDX: 0000000000000001 RSI: ffffa0487f540248 RDI: 0000000000000000 > [ 1399.376592] RBP: ffffaec3c2d33370 R08: ffffaec3c2d335a7 R09: 0000000000000002 > [ 1399.376593] R10: 00000000ffffffff R11: ffffaec3c2d338a5 R12: 0000000000004000 > [ 1399.376594] R13: ffffa0487f540000 R14: 0000000000000000 R15: ffffa04d0e6d5800 > [ 1399.376596] FS: 00007fafdb068080(0000) GS:ffffa04d7ea80000(0000) > knlGS:0000000000000000 > [ 1399.376597] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 1399.376598] CR2: 0000000000000000 CR3: 0000000494ea0000 CR4: 0000000000340ee0 > > Well, that didn't work ... > > At least this unclean shutdown didn't eat the btrfs of the machine I > just spent two days installing from scratch, so, yay. > > In other news. Both Linux Mint 19.2 and Ubuntu 18.04.3 do run "fstrim > -av" once a week via systemd. > > In the meantime, I maybe got @home off via btrfs restore v5.3: > $ egrep -v '^(Restoring|Done searching|SYMLINK:|offset is) ' restore.typescript > Script started on 2019-10-25 10:20:43+0200 > [$ fakeroot ~/local/btrfs-progs/btrfs-progs/btrf]s restore -ivmS -r > 258 /dev/mapper/nbd0p2 test/ > We seem to be looping a lot on > test/hana/.mozilla/firefox/ffjpjbix.windows-move/places.sqlite, do you > want to keep going on ? 
(y/N/a): a > ERROR: cannot map block logical 131018522624 length 1073741824: -2 > Error copying data for > test/hana/.mozilla/firefox/ffjpjbix.windows-move/favicons.sqlite > We seem to be looping a lot on > test/hana/.steam/steam/steamapps/common/Lord of the Rings > Online/client_cell_1.dat, do you want to keep going on ? (y/N/a): a > We seem to be looping a lot on > test/hana/.steam/steam/steamapps/common/Lord of the Rings > Online/client_gamelogic.dat, do you want to keep going on ? (y/N/a): a > We seem to be looping a lot on > test/hana/.steam/steam/steamapps/common/Lord of the Rings > Online/client_highres.dat, do you want to keep going on ? (y/N/a): a > We seem to be looping a lot on > test/hana/.steam/steam/steamapps/common/Lord of the Rings > Online/client_highres_aux_1.datx, do you want to keep going on ? > (y/N/a): a > We seem to be looping a lot on > test/hana/.steam/steam/steamapps/common/Lord of the Rings > Online/client_mesh.dat, do you want to keep going on ? (y/N/a): a > We seem to be looping a lot on > test/hana/.steam/steam/steamapps/common/Lord of the Rings > Online/client_sound.dat, do you want to keep going on ? (y/N/a): a > We seem to be looping a lot on > test/hana/.steam/steam/steamapps/common/Lord of the Rings > Online/client_surface.dat, do you want to keep going on ? (y/N/a): a > We seem to be looping a lot on > test/hana/.steam/steam/steamapps/common/Lord of the Rings > Online/client_surface_aux_1.datx, do you want to keep going on ? > (y/N/a): a > We seem to be looping a lot on > test/hana/.steam/steam/steamapps/common/Prey/Whiplash/GameSDK/Precache/Campaign_textures-part2.pak, > do you want to keep going on ? (y/N/a): a > We seem to be looping a lot on > test/hana/.steam/steam/steamapps/common/Path of Exile/Content.ggpk, do > you want to keep going on ? (y/N/a): a > We seem to be looping a lot on > test/chris/.steam/steam/steamapps/common/The Talos > Principle/Content/Talos/00_All.gro, do you want to keep going on ? 
> (y/N/a): a > We seem to be looping a lot on > test/chris/.steam/steam/steamapps/common/Prey/GameSDK/Objects-part1.pak, > do you want to keep going on ? (y/N/a): a > We seem to be looping a lot on > test/chris/.steam/steam/steamapps/common/Prey/GameSDK/Objects-part2.pak, > do you want to keep going on ? (y/N/a): a > We seem to be looping a lot on > test/chris/.steam/steam/steamapps/common/Prey/GameSDK/Objects-part3.pak, > do you want to keep going on ? (y/N/a): a > We seem to be looping a lot on > test/chris/.steam/steam/steamapps/common/Prey/GameSDK/Objects-part4.pak, > do you want to keep going on ? (y/N/a): a > We seem to be looping a lot on > test/chris/.steam/steam/steamapps/common/Prey/GameSDK/Objects-part5.pak, > do you want to keep going on ? (y/N/a): a > We seem to be looping a lot on > test/chris/.steam/steam/steamapps/common/Prey/GameSDK/Objects-part6.pak, > do you want to keep going on ? (y/N/a): a > We seem to be looping a lot on > test/chris/.steam/steam/steamapps/common/Prey/Whiplash/GameSDK/Precache/Campaign_textures-part2.pak, > do you want to keep going on ? (y/N/a): a > chris@chibi:~/local/chibi-rescue$ exit > exit > > Script done on 2019-10-25 14:25:08+0200 > > How does that look to you? > So "favicons.sqlite" is shot, don't care, but how about the other files? > Is the looping message harmless as long as it finishes eventually or > are these likely to be corrupted? > Can & does restore verify the data checksums? In other words, can I > expect the files that were restored without comment to be ... > consistent? up-to-date? full of random data? > Would it silently drop damaged files/directories or would it at least complain? > > Am Do., 24. Okt. 2019 um 13:40 Uhr schrieb Austin S. Hemmelgarn > <ahferroin7@gmail.com>: >> Backups, flavor of your choice. [...] 
I store enough extra info in the backup to >> be able to rebuild the storage stack by hand from a rescue environment >> (I usually use SystemRescueCD, but any live environment where you can >> get your backup software working and rebuild your storage stack will work). > > Oh, I have backups. Well, usually :-p. But files do not a running > system make. For my servers, rebuilding the "storage stack" isn't a > problem, since I built it manually in the first place and have notes. > This is a family member's personal desktop, with a GUI install of > Linux Mint on it. I opted for manual partitioning and btrfs but beyond > that I don't even know the specifics of the setup. I mean, I can boot > from a live USB, create partitions and filesystems, restore from > backup and install grub, but I'm not confident that it would boot, let > alone survive the next system update, because some distro-specific > special sauce is missing. And even so, emphasis is on I -- try telling > that to a non-technical person. > > What I'm looking for is something you can boot off of and restore the > system automatically. ReaR has the right idea, but uses either tar > (really?) or Enterprise-scale backup backends. By the time I've all > the bugs ironed out of a custom backup-restore script for it, it'll be > too late. Timeshift covers the user error and userspace bug use-cases > brilliantly, something as idiot-proof for actual backups would be > nice. > > Cheers, > C. > > P.S.: Having to stay near the computer during a btrfs restore just to > enter "a" at the loop prompts isn't ideal. How about "A", always > continue looping, for all files? > [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: first it froze, now the (btrfs) root fs won't mount ... 2019-10-26 0:01 ` Qu Wenruo @ 2019-10-26 9:23 ` Christian Pernegger 2019-10-26 9:41 ` Qu Wenruo 0 siblings, 1 reply; 29+ messages in thread From: Christian Pernegger @ 2019-10-26 9:23 UTC (permalink / raw) To: Qu Wenruo; +Cc: Austin S. Hemmelgarn, linux-btrfs Am Sa., 26. Okt. 2019 um 02:01 Uhr schrieb Qu Wenruo <quwenruo.btrfs@gmx.com>: > It's already working, the problem is, there is a dirty log while > nologreplay mount option doesn't really work. For the record, I didn't try to mount using nologreplay, only notreelog. (Apologies if notreelog and/or skipbg imply nologreplay.) > You can btrfs-zero-log to clear the log, then try again using skipbg > mount option. I don't think I can, actually. At least, zeroing the log didn't work back when btrfs check --repair was still on the table. Admittedly, that was using Ubuntu eoan's 5.3 kernel, not yours, and with their btrfs-progs (5.2.1); I don't think I'd gotten around to compiling btrfs-progs 5.3, yet. So if you think trying again with the rescue_options kernel and/or latest btrfs-progs might allow me to zero the log, I'll try again. Alternatively, using backup super 1 or 2 got me past that hurdle with btrfs check --repair, so if there's an option to mount using one of these ...? (Output quoted below for reference.) > > $ btrfs check --init-extent-tree patient > > Opening filesystem to check... > > Checking filesystem on patient > > UUID: c2bd83d6-2261-47bb-8d18-5aba949651d7 > > repair mode will force to clear out log tree, are you sure? 
[y/N]: y > > ERROR: Corrupted fs, no valid METADATA block group found > > ERROR: failed to zero log tree: -117 > > ERROR: attempt to start transaction over already running one > > # rollback > > > > $ btrfs rescue zero-log patient > > checksum verify failed on 284041084928 found E4E3BDB6 wanted 00000000 > > checksum verify failed on 284041084928 found E4E3BDB6 wanted 00000000 > > bad tree block 284041084928, bytenr mismatch, want=284041084928, have=0 > > ERROR: could not open ctree > > # rollback > > > > # hm, super 0 has log_root 284056535040, super 1 and 2 have log_root 0 ... > > [...] > And thanks for the report, I'll look into why nologreplay doesn't work. On the contrary, thank you! It's the least I can do. If there's anything else I can do to help make it less likely (something like) this bites me or anyone else again, just say the word. Also, I'm curious as to the state of the data and btrfs restore doesn't care about checksums, so I'd love to be able to ro-mount the image sometime. Cheers, C. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: first it froze, now the (btrfs) root fs won't mount ... 2019-10-26 9:23 ` Christian Pernegger @ 2019-10-26 9:41 ` Qu Wenruo 2019-10-26 13:52 ` Christian Pernegger 0 siblings, 1 reply; 29+ messages in thread From: Qu Wenruo @ 2019-10-26 9:41 UTC (permalink / raw) To: Christian Pernegger; +Cc: Austin S. Hemmelgarn, linux-btrfs [-- Attachment #1.1: Type: text/plain, Size: 3325 bytes --] On 2019/10/26 下午5:23, Christian Pernegger wrote: > Am Sa., 26. Okt. 2019 um 02:01 Uhr schrieb Qu Wenruo <quwenruo.btrfs@gmx.com>: >> It's already working, the problem is, there is a dirty log while >> nologreplay mount option doesn't really work. > > For the record, I didn't try to mount using nologreplay, only > notreelog. (Apologies if notreelog and/or skipbg imply nologreplay.) Then that's the problem. With skipbg, all block groups are marked readonly, so any write should get an ENOSPC error. Thus I put a log tree check in the skipbg mount option, and if it detects a log tree, it refuses to mount and outputs a kernel message asking for nologreplay. > >> You can btrfs-zero-log to clear the log, then try again using skipbg >> mount option. > > I don't think I can, actually. At least, zeroing the log didn't work > back when btrfs check --repair was still on the table. Admittedly, > that was using Ubuntu eoan's 5.3 kernel, not yours, and with their > btrfs-progs (5.2.1); I don't think I'd gotten around to compiling > btrfs-progs 5.3, yet. So if you think trying again with the > rescue_options kernel and/or latest btrfs-progs might allow me to zero > the log, I'll try again. > Alternatively, using backup super 1 or 2 got me past that hurdle with > btrfs check --repair, so if there's an option to mount using one of > these ...? > (Output quoted below for reference.) > >>> $ btrfs check --init-extent-tree patient >>> Opening filesystem to check... >>> Checking filesystem on patient >>> UUID: c2bd83d6-2261-47bb-8d18-5aba949651d7 >>> repair mode will force to clear out log tree, are you sure? 
[y/N]: y >>> ERROR: Corrupted fs, no valid METADATA block group found >>> ERROR: failed to zero log tree: -117 >>> ERROR: attempt to start transaction over already running one >>> # rollback >>> >>> $ btrfs rescue zero-log patient >>> checksum verify failed on 284041084928 found E4E3BDB6 wanted 00000000 >>> checksum verify failed on 284041084928 found E4E3BDB6 wanted 00000000 >>> bad tree block 284041084928, bytenr mismatch, want=284041084928, have=0 >>> ERROR: could not open ctree >>> # rollback >>> >>> # hm, super 0 has log_root 284056535040, super 1 and 2 have log_root 0 ... >>> [...] > >> And thanks for the report, I'll look into why nologreplay doesn't work. > > On the contrary, thank you! It's the least I can do. If there's > anything else I can do to help make it less likely (something like) this > bites me or anyone else again, just say the word. Also, I'm curious as > to the state of the data and btrfs restore doesn't care about > checksums, so I'd love to be able to ro-mount the image sometime. Then you can try btrfs-sb-mod, which modifies the superblock without mounting the fs. # ./btrfs-sb-mod /dev/nvme/btrfs log_root =0 Of course, you'll need to compile btrfs-progs. You don't need to compile the full btrfs-progs, which has quite a few dependencies; you only need to: # cd btrfs-progs/ # ./autogen.sh # make btrfs-sb-mod Then try the above command. You should get something like: $ ./btrfs-sb-mod /dev/nvme/btrfs log_root =0 super block checksum is ok GET: log_root xxxxx (0xXXXXXX) SET: log_root 0 (0x0) Update csum Then try mounting with rescue=skipbg,ro again. Thanks, Qu > > Cheers, > C. > [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: first it froze, now the (btrfs) root fs won't mount ... 2019-10-26 9:41 ` Qu Wenruo @ 2019-10-26 13:52 ` Christian Pernegger 2019-10-26 14:06 ` Qu Wenruo 0 siblings, 1 reply; 29+ messages in thread From: Christian Pernegger @ 2019-10-26 13:52 UTC (permalink / raw) To: Qu Wenruo; +Cc: Austin S. Hemmelgarn, linux-btrfs Am Sa., 26. Okt. 2019 um 11:41 Uhr schrieb Qu Wenruo <quwenruo.btrfs@gmx.com>: > Thus I put a log tree check in the skipbg mount option, and if it detects > a log tree, it refuses to mount and outputs a kernel message asking for > nologreplay. No, it doesn't, it asks for notreelog (quoted from a few e-mails back): > > [ 1350.402586] BTRFS info (device dm-1): skip mount time block group searching > > [ 1350.402588] BTRFS info (device dm-1): disk space caching is enabled > > [ 1350.402589] BTRFS info (device dm-1): has skinny extents > > [ 1350.402590] BTRFS error (device dm-1): skip_bg must be used with > > notreelog mount option for dirty log > > [ 1350.419849] BTRFS error (device dm-1): open_ctree failed > > > > Fine by me, let's add "notreelog" as well. Note that "skip_bg" is > > actually written with an underscore above. > Then you can try btrfs-sb-mod, [...] Yep, that did the trick. Can mount the fs ro now, access files and it even reports checksum errors for a couple of them. Promising. Cheers, C. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: first it froze, now the (btrfs) root fs won't mount ... 2019-10-26 13:52 ` Christian Pernegger @ 2019-10-26 14:06 ` Qu Wenruo 2019-10-26 16:30 ` Christian Pernegger 0 siblings, 1 reply; 29+ messages in thread From: Qu Wenruo @ 2019-10-26 14:06 UTC (permalink / raw) To: Christian Pernegger; +Cc: Austin S. Hemmelgarn, linux-btrfs [-- Attachment #1.1: Type: text/plain, Size: 1454 bytes --] On 2019/10/26 下午9:52, Christian Pernegger wrote: > Am Sa., 26. Okt. 2019 um 11:41 Uhr schrieb Qu Wenruo <quwenruo.btrfs@gmx.com>: >> Thus I put a log tree check in the skipbg mount option, and if it detects >> a log tree, it refuses to mount and outputs a kernel message asking for >> nologreplay. > > No, it doesn't, it asks for notreelog (quoted from a few e-mails back): Exposed a bug! My bad! > >>> [ 1350.402586] BTRFS info (device dm-1): skip mount time block group searching >>> [ 1350.402588] BTRFS info (device dm-1): disk space caching is enabled >>> [ 1350.402589] BTRFS info (device dm-1): has skinny extents >>> [ 1350.402590] BTRFS error (device dm-1): skip_bg must be used with >>> notreelog mount option for dirty log >>> [ 1350.419849] BTRFS error (device dm-1): open_ctree failed >>> >>> Fine by me, let's add "notreelog" as well. Note that "skip_bg" is >>> actually written with an underscore above. > >> Then you can try btrfs-sb-mod, [...] > > Yep, that did the trick. Can mount the fs ro now, access files and it > even reports checksum errors for a couple of them. Promising. Mind sharing the csum error log? As you may already have found out, a certain CRC32 value means all zeros, and can indicate further trim damage to the data. Otherwise, it indeed looks promising. I'll fix the bug and update the patchset. (Maybe also make btrfs fall back to skipbg by default.) Thanks, Qu > > Cheers, > C. > [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: first it froze, now the (btrfs) root fs won't mount ... 2019-10-26 14:06 ` Qu Wenruo @ 2019-10-26 16:30 ` Christian Pernegger 2019-10-27 0:46 ` Qu Wenruo 0 siblings, 1 reply; 29+ messages in thread From: Christian Pernegger @ 2019-10-26 16:30 UTC (permalink / raw) To: Qu Wenruo; +Cc: Austin S. Hemmelgarn, linux-btrfs Am Sa., 26. Okt. 2019 um 16:07 Uhr schrieb Qu Wenruo <quwenruo.btrfs@gmx.com>: > Mind sharing the csum error log? Certainly. That is, you're welcome to it, if you tell me how to generate such a thing. Is there an elegant way to walk the entire filesystem, trigger csum calculations for everything and generate a log? Something like a poor man's read-only scrub? > I'll fix the bug and update the patchset. > (Maybe also make btrfs fall back to skipbg by default.) I appreciate it. Cheers, C. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: first it froze, now the (btrfs) root fs won't mount ... 2019-10-26 16:30 ` Christian Pernegger @ 2019-10-27 0:46 ` Qu Wenruo [not found] ` <CAKbQEqFne8eohE3gvCMm8LqA-KimFrwwvE5pUBTn-h-VBhJq1A@mail.gmail.com> 0 siblings, 1 reply; 29+ messages in thread From: Qu Wenruo @ 2019-10-27 0:46 UTC (permalink / raw) To: Christian Pernegger; +Cc: Austin S. Hemmelgarn, linux-btrfs [-- Attachment #1.1: Type: text/plain, Size: 830 bytes --] On 2019/10/27 上午12:30, Christian Pernegger wrote: > Am Sa., 26. Okt. 2019 um 16:07 Uhr schrieb Qu Wenruo <quwenruo.btrfs@gmx.com>: >> Mind sharing the csum error log? > > Certainly. That is, you're welcome to it, if you tell me how to > generate such a thing. Is there an elegant way to walk the entire > filesystem, trigger csum calculations for everything and generate a > log? Something like a poor man's read-only scrub? The csum errors can be seen in the kernel messages. So "dmesg" is enough to output them. To find all csum errors, there is the poor man's read-only scrub: # find <mnt> -type f -exec cat {} > /dev/null \; Thanks, Qu > >> I'll fix the bug and update the patchset. >> (Maybe also make btrfs fall back to skipbg by default.) > > I appreciate it. > > Cheers, > C. > [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 29+ messages in thread
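[Editorial note: Qu's one-liner above can be elaborated so an unattended run leaves a record of which files failed to read (a csum mismatch surfaces as an I/O error on read, visible in dmesg). This is a hypothetical sketch, not from the thread; MNT is a placeholder for the ro-mounted filesystem, and the demo file only exists so the sketch is runnable as-is.]

```shell
# Poor man's read-only scrub with a failure log. cat forces the kernel
# to read (and checksum-verify) every data block; files that fail with
# an I/O error are logged instead of aborting the walk.
MNT=${MNT:-/tmp/scrub-demo}   # placeholder: point at the ro mount
LOG=/tmp/scrub-failed.log
mkdir -p "$MNT"
echo "readable" > "$MNT/ok.txt"   # demo file so the sketch runs
: > "$LOG"

find "$MNT" -type f -exec sh -c '
    log=$1; shift
    for f; do
        # on csum mismatch cat gets EIO; record the path and keep going
        cat "$f" > /dev/null 2>&1 || printf "%s\n" "$f" >> "$log"
    done' sh "$LOG" {} +

echo "scan done; unreadable files (if any) are listed in $LOG"
```

Pairing this with `dmesg` afterwards gives both the failing paths and the kernel's csum error lines.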
[parent not found: <CAKbQEqFne8eohE3gvCMm8LqA-KimFrwwvE5pUBTn-h-VBhJq1A@mail.gmail.com>]
* Re: first it froze, now the (btrfs) root fs won't mount ... [not found] ` <CAKbQEqFne8eohE3gvCMm8LqA-KimFrwwvE5pUBTn-h-VBhJq1A@mail.gmail.com> @ 2019-10-27 13:38 ` Qu Wenruo 0 siblings, 0 replies; 29+ messages in thread From: Qu Wenruo @ 2019-10-27 13:38 UTC (permalink / raw) To: Christian Pernegger; +Cc: linux-btrfs [-- Attachment #1.1: Type: text/plain, Size: 1491 bytes --] On 2019/10/27 下午8:41, Christian Pernegger wrote: > [Replying off-list. There shouldn't be anything private in there, but > you never know, and it does have filenames. I'd appreciate it if you > were to delete the data once you're sure you've gotten everything you > can from it.] Sure. BTW, this use case reminds me to add an option to censor the filenames... > > Am So., 27. Okt. 2019 um 02:46 Uhr schrieb Qu Wenruo <quwenruo.btrfs@gmx.com>: >> So "dmesg" is enough to output them. >> >> To find all csum errors, there is the poor man's read-only scrub: >> # find <mnt> -type f -exec cat {} > /dev/null \; > > I ended up diffing the files that were either in the data restored by > btrfs restore or the ro-mount courtesy of rescue_branch, in order to > read all files, expose files that were restored differently [1] or > missing in one copy [just pipes & such]. Note that this is just for > @home. Considering you have the log tree populated, it may have something to do with the log tree. If you want to be extra safe, notreelog (yes, this time I didn't screw up the mount option name) could make it a little safer, while making fsync() a little slower. Furthermore, according to your kernel log, it's not your data that's corrupted but the csum tree, so btrfs fails to read out the csums and then reports -EIO. It's possible that your data is just fine. Maybe I can also enhance that part of btrfs... Thanks, Qu > > Cheers, > C. > [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 29+ messages in thread
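[Editorial note: the comparison Christian describes above (diffing the btrfs-restore output against the ro-mounted copy) might look like the sketch below. Both directory paths are placeholders, and the demo files stand in for the two rescued trees; reading every byte on the ro-mount side also forces the kernel's csum verification as a side effect.]

```shell
# Compare two rescued copies of the same tree: files present in only
# one copy and files whose contents differ both show up in the report.
A=/tmp/cmp-restore   # placeholder: btrfs restore output
B=/tmp/cmp-romount   # placeholder: read-only mount of the damaged fs
mkdir -p "$A" "$B"
echo "same"      > "$A/intact.txt";  echo "same"      > "$B/intact.txt"
echo "version 1" > "$A/damaged.txt"; echo "version 2" > "$B/damaged.txt"

# -r recurse, -q only report which files differ or are missing;
# diff exits nonzero when differences exist, hence the || true
diff -rq "$A" "$B" > /tmp/cmp-report 2>&1 || true
cat /tmp/cmp-report
```

Any file named in the report is worth restoring from backup or inspecting by hand; files absent from the report matched byte-for-byte in both copies.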
* Re: first it froze, now the (btrfs) root fs won't mount ...
  2019-10-21 13:02 ` Christian Pernegger
  2019-10-21 13:34 ` Qu Wenruo
@ 2019-10-21 14:02 ` Austin S. Hemmelgarn
  1 sibling, 0 replies; 29+ messages in thread
From: Austin S. Hemmelgarn @ 2019-10-21 14:02 UTC (permalink / raw)
  To: Christian Pernegger; +Cc: Qu Wenruo, linux-btrfs

On 2019-10-21 09:02, Christian Pernegger wrote:
> [Please CC me, I'm not on the list.]
>
> On Mon, 21 Oct 2019 at 13:47, Austin S. Hemmelgarn
> <ahferroin7@gmail.com> wrote:
>> I've [worked with fs clones] like this dozens of times on
>> single-device volumes with exactly zero issues.
>
> Thank you. I have taken precautions, but it does seem to work fine.
>
>> There are actually two possible ways I can think of for a buggy GPU
>> driver to cause this type of issue: [snip]
>
> Interesting and plausible, but ...
>
>> Your best option for mitigation [...] is to ensure that your hardware
>> has an IOMMU [...] and ensure it's enabled in firmware.
>
> It has and it is. (The machine was specced so that GPU pass-through is
> an option, should it be required. I haven't gotten around to setting
> that up yet, haven't even gotten a second GPU, but I have laid the
> groundwork; the IOMMU is enabled and, as far as one can tell from logs
> and such, working.)
>
>> However, there's also the possibility that you may have hardware issues.
>
> Don't I know it ... The problem is, if there are hardware issues,
> that's the first I've seen of them, and while I didn't run torture
> tests, there was quite a lot of benchmarking when the machine was new.
> Needle in a haystack. Some memory testing can't hurt, I suppose. Any
> other ideas (for hardware testing)?

The power supply would be the other big one I'd suggest testing, as a
bad PSU can cause all kinds of odd intermittent issues. Just like with
RAM, you can't easily cover everything, but you can check some things
that have very low false-negative rates when indicating problems.

The typical procedure I use is:

1. Completely disconnect the PSU from _everything_ inside the computer.
   (If you're really paranoid, you can remove the PSU from the case as
   well, though that won't make the testing any more reliable or safer.)
2. Make sure the PSU itself is plugged in to mains power, with the
   switch on the back (if it has one) turned on.
3. Connect a good multimeter to the 24-pin main power connector, with
   the positive probe on pin 8 and the negative probe on pin 7, set to
   measure DC voltages in the double-digit range with the highest
   precision possible.
4. Short pins 15 and 16 of the 24-pin main power connector using a
   short piece of solid copper wire. At this point, if the PSU has a
   fan, the fan should turn on, and the multimeter should read +5 V
   within half a second or less.
5. Check the voltage of each power rail relative to ground. Watch each
   one for a couple of seconds to catch any fluctuations, and make a
   point of checking _each_ set of wires coming off the PSU separately
   (as well as each wire in each connector independently, even if
   they're supposed to be tied together internally).
6. Check the +5 V standby rail by hooking the multimeter up to it and a
   ground pin, then disconnecting the copper wire from step 4. It
   should maintain its voltage while you're disconnecting the wire and
   afterwards, even once the fan stops.

You can find the respective pinouts online in many places (for example,
[1]). Tolerances are +/- 5% on everything except the negative voltages,
which are +/- 10%. The -5 V pin may show nothing, which is normal
(modern systems do not use -5 V for anything, and most don't actually
use -12 V anymore either, though it's still provided).
This won't confirm that the PSU isn't suspect (it could still have
issues under load), but if any of this testing fails, you can be 100%
certain that you either have a bad PSU or that your mains power is
suspect (usually the issue there is very high line noise, though you'll
need special equipment to test for that).

> Back on the topic of TRIM: I'm 99% certain discard wasn't set on the
> mount (not by me, in any case), but I think Mint runs fstrim
> periodically by default. Just to be sure, should any form of TRIM be
> disabled?

The issue with TRIM is that it drops old copies of the on-disk data
structures used by BTRFS, which can make recovery more difficult in the
event of a crash. Running `fstrim` at regular intervals is not as much
of an issue as inline discard, but it still drops the old trees, so
there's a window of time right after it runs during which you are more
vulnerable. Additionally, some SSDs have had issues with TRIM causing
data corruption elsewhere on the disk, but it's been years since I've
seen a report of such issues, and I don't think a Samsung device as
recent as yours is likely to have such problems.

> The only other idea I've got is Timeshift's hourly snapshots. (How)
> would btrfs deal with a crash during snapshot creation?

It should have no issues whatsoever most of the time. The only case I
can think of where it might is if you're snapshotting a subvolume
that's being written to at the same time. Snapshots on BTRFS are only
truly atomic if none of the data being snapshotted is being written to
concurrently. If there are pending writes, there are some indeterminate
states involved, and crashing then might produce a corrupted snapshot,
but it shouldn't cause any other issues.

> In other news, I've still not quite given up, mainly because the fs
> doesn't look all that broken. The output of btrfs inspect-internal
> dump-tree (incl. options), for instance, looks like gibberish to me,
> of course, but it looks sane, doesn't spew warnings, and doesn't error
> out or crash. Also, plain btrfs check --init-extent-tree errored out,
> same with -s0, but with -s1 it's now chugging along. (BTW, is there a
> hierarchy among the super block slots, a best or newest one?)

AIUI, when they get updated, they get written out in the order they
occur on disk, but other than that they're supposed to always be in
sync. So if you hit an issue while the first one is being written out,
you can often recover by using the second or a later one.

> Will keep you posted.
>
> Cheers,
> C.

^ permalink raw reply	[flat|nested] 29+ messages in thread
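[Editor's note: both the fstrim question and the superblock-slot question above can be checked from a shell. A sketch under assumptions: the device path is a placeholder, the distro is systemd-based (Mint 19.2 qualifies), and `inspect-internal dump-super -s` comes from btrfs-progs.]

```shell
# DEV is a placeholder -- substitute the affected device.
DEV=/dev/nvme0n1p2

# 1) Is the periodic fstrim timer active?
systemctl is-enabled fstrim.timer 2>/dev/null || echo "fstrim.timer disabled or not present"

# 2) Is inline discard set on the root mount?
findmnt -no OPTIONS / 2>/dev/null | tr ',' '\n' | grep -qx discard \
    && echo "inline discard enabled" || echo "no inline discard"

# 3) Which superblock copy is newest? btrfs keeps up to three copies
#    (at 64 KiB, 64 MiB, and 256 GiB); -s selects which copy to dump.
#    The copy with the highest generation is the most recently written.
for s in 0 1 2; do
    gen=$(btrfs inspect-internal dump-super -s "$s" "$DEV" 2>/dev/null \
          | awk '/^generation/ {print $2}')
    echo "superblock copy $s: generation ${gen:-unreadable}"
done
```

On small devices the third superblock copy does not exist (it sits at the 256 GiB offset), so an "unreadable" result for copy 2 alone is not necessarily a problem.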
end of thread, other threads:[~2019-10-27 13:38 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <CAKbQEqE7xN1q3byFL7-_pD=_pGJ0Vm9pj7d-g+rRgtONeH-GrA@mail.gmail.com>
2019-10-19 22:34 ` first it froze, now the (btrfs) root fs won't mount ... Christian Pernegger
2019-10-20  0:38 ` Qu Wenruo
2019-10-20 10:11 ` Christian Pernegger
2019-10-20 10:22 ` Christian Pernegger
2019-10-20 10:28 ` Qu Wenruo
2019-10-21 10:47 ` Christian Pernegger
2019-10-21 10:55 ` Qu Wenruo
2019-10-21 11:47 ` Austin S. Hemmelgarn
2019-10-21 13:02 ` Christian Pernegger
2019-10-21 13:34 ` Qu Wenruo
2019-10-22 22:56 ` Christian Pernegger
2019-10-23  0:25 ` Qu Wenruo
2019-10-23 11:31 ` Austin S. Hemmelgarn
2019-10-24 10:41 ` Christian Pernegger
2019-10-24 11:26 ` Qu Wenruo
2019-10-24 11:40 ` Austin S. Hemmelgarn
2019-10-25 16:43 ` Christian Pernegger
2019-10-25 17:05 ` Christian Pernegger
2019-10-25 17:16 ` Austin S. Hemmelgarn
2019-10-25 17:12 ` Austin S. Hemmelgarn
2019-10-26  0:01 ` Qu Wenruo
2019-10-26  9:23 ` Christian Pernegger
2019-10-26  9:41 ` Qu Wenruo
2019-10-26 13:52 ` Christian Pernegger
2019-10-26 14:06 ` Qu Wenruo
2019-10-26 16:30 ` Christian Pernegger
2019-10-27  0:46 ` Qu Wenruo
[not found] ` <CAKbQEqFne8eohE3gvCMm8LqA-KimFrwwvE5pUBTn-h-VBhJq1A@mail.gmail.com>
2019-10-27 13:38 ` Qu Wenruo
2019-10-21 14:02 ` Austin S. Hemmelgarn