On 2019/10/20 6:34 AM, Christian Pernegger wrote:
> [Please CC me, I'm not on the list.]
>
> Hello,
>
> I'm afraid I could use some help.
>
> The affected machine froze during a game, was entirely unresponsive
> locally, though ssh still worked. For completeness' sake, dmesg had:
> [110592.128512] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0
> timeout, signaled seq=3404070, emitted seq=3404071
> [110592.128545] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process
> information: process Xorg pid 1191 thread Xorg:cs0 pid 1204
> [110592.128549] amdgpu 0000:0c:00.0: GPU reset begin!
> [110592.138530] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx
> timeout, signaled seq=13149116, emitted seq=13149118
> [110592.138577] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process
> information: process Overcooked.exe pid 4830 thread dxvk-submit pid
> 4856
> [110592.138579] amdgpu 0000:0c:00.0: GPU reset begin!

It looks like you're using an eGPU and the Thunderbolt 3 connection dropped?
That would cause a kernel panic, a hang, or something similar.

>
> Oh well, I thought, and "shutdown -h now" it. That quit my ssh session
> and locked me out, but otherwise didn't take, no reboot, still frozen.
> Alt-SysRq-REISUB it was. That did it.
>
> Only now all I get is a rescue shell, the pertinent messages look to
> be [everything is copied off the screen by hand]:
> [...]
> BTRFS info [...]: disk space caching is enabled
> BTRFS info [...]: has skinny extents
> BTRFS error [...]: bad tree block start, want [big number] have 0
> BTRFS error [...]: failed to read block groups: -5
> BTRFS error [...]: open_ctree failed

This means some tree blocks either never reached the disk or were wiped out.

Are you using the discard mount option?

>
> Mounting with -o ro,usebackuproot doesn't change anything.
>
> running btrfs check gives:
> checksum verify failed on [same big number] found [8 digits hex] wanted 00000000
> checksum verify failed on [same big number] found [8 digits hex] wanted 00000000

Again, some old tree blocks got wiped out.

BTW, you don't need to redact the numbers; sometimes they help developers
track down corner cases.

> bytenr mismatch, want=[same big number], have=0
> ERROR: cannot open filesystem.
>
> That's all I've got, I'd really appreciate some help. There's hourly
> snapshots courtesy of Timeshift, though I have a feeling those won't
> help ...

If that's the only problem, you can try this kernel branch to at least do a
RO mount:
https://github.com/adam900710/linux/tree/rescue_options

Then mount the fs with the "rescue=skipbg,ro" option.
If the bad tree block is the only problem, the mount should succeed.

If that mount succeeds and you can access all files, then only the extent
tree is corrupted, and you can try btrfs check --init-extent-tree. There are
some reports of --init-extent-tree fixing this kind of problem.

>
> Oh, it's a recent Linux Mint 19.2 install, default layout (@, @home),
> Timeshift enabled; on a single device (NVMe). HWE kernel (Kernel
> 5.0.0-31-generic), btrfs-progs 4.15.1.

As for the cause, either btrfs didn't write some tree blocks correctly, or
the NVMe doesn't implement FUA/FLUSH correctly (which I don't believe is the
case).

So I'd recommend updating to a 5.3 kernel.

Thanks,
Qu

>
> TIA,
> Christian
>
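
For reference, the recovery sequence suggested above could be sketched roughly as follows. This is only an outline under assumptions: the device path and mount point are hypothetical placeholders (not taken from this thread), the "rescue=skipbg" option is only available when booted into the rescue_options branch, and the script prints the commands instead of running them unless RUN=1 is set.

```shell
#!/bin/sh
# Hypothetical placeholders; substitute the real device and mount point.
DEV="${DEV:-/dev/nvme0n1p2}"
MNT="${MNT:-/mnt/rescue}"

# Print the command by default; execute it only when RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Step 1: read-only mount that skips block group loading.
# Requires a kernel built from the rescue_options branch.
run mount -o rescue=skipbg,ro "$DEV" "$MNT"

# Step 2 (manual): check that all files are readable under $MNT.
# Copying important data elsewhere at this point is a precaution
# added here, not something stated in the thread.

# Step 3: only if step 1 worked and the files are accessible,
# unmount and rebuild the extent tree.
run umount "$MNT"
run btrfs check --init-extent-tree "$DEV"
```

Since --init-extent-tree rewrites metadata, it seems safer to treat step 3 as a last step once the data is readable elsewhere.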