Bug ID 107572
Summary Unrecoverable GPU hang with IP block:gfx_v8_0 is hung
Product DRI
Version unspecified
Hardware x86-64 (AMD64)
OS Linux (All)
Status NEW
Severity normal
Priority medium
Component DRM/AMDgpu
Assignee dri-devel@lists.freedesktop.org
Reporter madcatx@atlas.cz

Hello,

I have been experiencing a worrying number of these hangs ever since I got my
RX 570 a few months ago. I can reproduce the hang quite reliably with some 3D
workloads; for instance, Unigine Superposition on High quality or Witcher 3
(through WINE) hangs the GPU within minutes.

Once that happens I can always SSH into the machine and try to get at least
some debugging information. Unfortunately, there does not seem to be much to go
on.

dmesg does not tell me more than this:
[  254.704581] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=103742, last emitted seq=103745
[  254.704586] [drm] IP block:gfx_v8_0 is hung!
[  254.704629] [drm] GPU recovery disabled.
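
For anyone collecting these messages over SSH, the timeout line above can be picked apart with a small script. This is only a sketch: the function name parse_ring_timeout is made up for illustration, and reading "emitted minus signaled" as the number of submissions still outstanding on the ring is my assumption about what the sequence numbers mean.

```python
import re

# Matches the amdgpu ring-timeout message quoted in this report.
# \s+ is used so the pattern still matches if the mail client wrapped the line.
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout,\s+"
    r"last signaled seq=(?P<signaled>\d+),\s+"
    r"last emitted seq=(?P<emitted>\d+)"
)


def parse_ring_timeout(log_text):
    """Return (ring, signaled, emitted, pending) for the first timeout, else None."""
    match = TIMEOUT_RE.search(log_text)
    if match is None:
        return None
    signaled = int(match.group("signaled"))
    emitted = int(match.group("emitted"))
    # Assumption: the difference is the count of fences emitted but never signaled.
    return match.group("ring"), signaled, emitted, emitted - signaled


sample = (
    "[  254.704581] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,\n"
    "last signaled seq=103742, last emitted seq=103745\n"
)
print(parse_ring_timeout(sample))  # ('gfx', 103742, 103745, 3)
```

On the affected machine the text would come from dmesg or the journal rather than from the sample string.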

Here are a few things I have tried so far:
- Boot with amdgpu.dc=0
- Boot with amdgpu.vm_update_mode=3
- Force the GPU to max power state
- Disable IOMMU (both by iommu=off and by disabling VT-d in BIOS)
- Boot with amdgpu.gpu_recovery=1 (does not produce any additional info)
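
To double-check that the boot options in the list above were really passed to the kernel, the command line can be inspected after boot. A minimal sketch follows; the helper name amdgpu_cmdline_params and the sample command line are hypothetical, and on a real system the string would be read from /proc/cmdline.

```python
def amdgpu_cmdline_params(cmdline):
    """Extract amdgpu.<name>=<value> options from a kernel command line string."""
    params = {}
    for token in cmdline.split():
        if token.startswith("amdgpu.") and "=" in token:
            key, value = token[len("amdgpu."):].split("=", 1)
            params[key] = value
    return params


# Hypothetical command line for illustration only; on the affected machine use
# open("/proc/cmdline").read() instead.
sample_cmdline = "root=/dev/sda1 ro amdgpu.dc=0 amdgpu.vm_update_mode=3 iommu=off"
print(amdgpu_cmdline_params(sample_cmdline))  # {'dc': '0', 'vm_update_mode': '3'}
```

Options without the amdgpu. prefix, such as iommu=off, are deliberately ignored by this helper.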

I grabbed the umr tool to try to get the state of the GPU when it crashes, but
it does not seem to be able to read anything. Running:

umr -R gfx[.]

Leaves me with:

[ERROR]: Could not open ring debugfs file

I checked that the entries in /sys/kernel/debug/amdgpu that look relevant are
there, but cat'ing them gives me "Operation not permitted". Yes, I am doing it
as root.
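
In case it helps narrow this down, here is a rough sketch that records the exact errno for each file instead of only the message text. "Operation not permitted" corresponds to EPERM, which is distinct from EACCES ("Permission denied") and ENOENT. The function name probe_readable is made up, and the path in the example is simply the one mentioned above; it may need adjusting to the files actually present.

```python
import errno


def probe_readable(paths):
    """Try to read one byte from each path; map path -> 'ok' or the errno name."""
    results = {}
    for path in paths:
        try:
            with open(path, "rb") as handle:
                handle.read(1)
            results[path] = "ok"
        except OSError as exc:
            # errno.errorcode maps numeric codes to names such as 'EPERM'.
            results[path] = errno.errorcode.get(exc.errno, str(exc.errno))
    return results


if __name__ == "__main__":
    # Path taken from this report; run as root and extend the list as needed.
    for path, status in probe_readable(["/sys/kernel/debug/amdgpu"]).items():
        print(path, status)
```

Running it once before and once after the hang would show whether the failure is tied to the hung state or is there from boot.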

Once this happens the only way out is a hard reboot.

I am running up-to-date Fedora 28, kernel 4.17.2, Mesa 18.0 series, LLVM 6.0.1.

Is there anything else I can do?

Thanks.

