From: bugzilla-daemon@freedesktop.org
To: dri-devel@lists.freedesktop.org
Subject: [Bug 108493] Unigine Heaven at 4K crashes amdgpu and causes a GPU hang
Date: Fri, 19 Oct 2018 10:24:55 +0000

https://bugs.freedesktop.org/show_bug.cgi?id=108493

Bug ID:    108493
Summary:   Unigine Heaven at 4K crashes amdgpu and causes a GPU hang
Product:   DRI
Version:   unspecified
Hardware:  x86-64 (AMD64)
OS:        Linux (All)
Status:    NEW
Severity:  normal
Priority:  medium
Component: DRM/AMDgpu
Assignee:  dri-devel@lists.freedesktop.org
Reporter:  venemo@msn.com

I experience a consistent amdgpu crash when using my AMD GPU with a 4K screen.

Hardware:
* Sapphire Radeon RX 570 Pulse ITX 4GB
* Zotac AMP box mini external GPU enclosure
* Dell XPS 13 9370 laptop
* Dell U2718Q 4K display

Software:
First tried with Fedora 28, now using Fedora 29. Tried kernel versions 4.18.12,
4.18.13 and 4.19-rc7; the issue appears with all of these. Mesa version is
18.2.2, but the crash also occurs with 18.0 (on Fedora 28).

Steps to reproduce the crash:
1. Turn off the laptop
2. Attach the eGPU to the laptop
3. Attach a 4K screen to the HDMI output of the AMD GPU
4. Turn on the laptop
5. Add the following to the kernel command line: 'module_blacklist=i915 3' (to
ensure the Intel GPU is not used at all, plus the graphical login won't
interfere)
6. Launch the operating system
7. Log in from the console
8. Launch an X session with 'startx'
9. Start the Unigine Heaven benchmark in fullscreen 4K
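
Before step 8 it can be worth confirming that the options from step 5 actually reached the kernel. The following is only a minimal sketch: the helper name check_cmdline is hypothetical, and it assumes a plain whitespace split of the command line is good enough (quoted arguments containing spaces are not handled).

```python
def check_cmdline(cmdline):
    """Report whether i915 is blacklisted and runlevel 3 is requested.

    `cmdline` is the kernel command line as a single string, for
    example the contents of /proc/cmdline.
    """
    args = cmdline.split()
    blacklisted = set()
    for arg in args:
        # module_blacklist= takes a comma-separated list of module names.
        if arg.startswith("module_blacklist="):
            blacklisted.update(arg.split("=", 1)[1].split(","))
    return {
        "i915_blacklisted": "i915" in blacklisted,
        "runlevel_3": "3" in args,
    }


if __name__ == "__main__":
    with open("/proc/cmdline") as f:
        print(check_cmdline(f.read()))
```

If either value comes back False, the boot entry was probably not edited as described in step 5.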

Expected outcome:
Unigine Heaven should show up and run in a stable and performant manner.

Actual outcome:
Unigine Heaven shows up, runs for a couple of seconds and then the screen goes
dark. I can still log into the machine over SSH, but cannot kill X or interact
with the AMD GPU in any way. I can't even reboot the machine; the only thing that
works is long-pressing the power key.

Relevant lines from dmesg log:
[  305.078426] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=147930, emitted seq=147933
[  305.078567] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=3176, emitted seq=3178
[  305.078573] [drm] GPU recovery disabled.
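
For anyone comparing several logs, the timeout messages can be pulled out mechanically. This is a rough sketch rather than an official tool: the names TIMEOUT_RE and parse_ring_timeouts are made up for illustration, and it assumes each message sits on a single line with the mail encoding already decoded (seq=..., not seq=3D...). My reading is that the gap between the emitted and signaled sequence numbers is the number of jobs submitted to the ring that never completed.

```python
import re

# Matches the amdgpu_job_timedout messages quoted in this report, e.g.
# "... *ERROR* ring gfx timeout, signaled seq=147930, emitted seq=147933"
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def parse_ring_timeouts(log_text):
    """Return (ring, signaled, emitted, pending) for each timeout message."""
    results = []
    for match in TIMEOUT_RE.finditer(log_text):
        signaled = int(match.group("signaled"))
        emitted = int(match.group("emitted"))
        results.append(
            (match.group("ring"), signaled, emitted, emitted - signaled)
        )
    return results


# The three dmesg lines from this report.
SAMPLE = """\
[  305.078426] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=147930, emitted seq=147933
[  305.078567] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=3176, emitted seq=3178
[  305.078573] [drm] GPU recovery disabled.
"""

if __name__ == "__main__":
    for ring, signaled, emitted, pending in parse_ring_timeouts(SAMPLE):
        print(f"{ring}: {pending} job(s) emitted but not signaled")
```

On the lines above this reports 3 outstanding jobs on the gfx ring and 2 on sdma0.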

Possible workaround:
* The crash does not happen when I disable power management with amdgpu.dpm=0,
however performance is then very poor.
* The crash also doesn't happen when I use 'echo low >
/sys/class/drm/card0/device/power_dpm_force_performance_level', with the same
caveat about poor performance.
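
To make the second workaround easy to toggle while testing, something like the following could be used. It is a sketch under assumptions: the helper names read_level and force_level are hypothetical, the sysfs path is the one from this report (the GPU may not be card0 on other systems), and writing to it needs root.

```python
from pathlib import Path

# Path taken from this report; "card0" may differ on other systems.
DEFAULT_PATH = Path(
    "/sys/class/drm/card0/device/power_dpm_force_performance_level"
)


def read_level(path=DEFAULT_PATH):
    """Return the current forced performance level, e.g. 'auto' or 'low'."""
    return Path(path).read_text().strip()


def force_level(level, path=DEFAULT_PATH):
    """Write a new level (needs root on a real system).

    Returns the previous level so the caller can restore it later.
    """
    previous = read_level(path)
    Path(path).write_text(level + "\n")
    return previous
```

Typical use would be previous = force_level("low") before starting the benchmark and force_level(previous) afterwards, so the card goes back to its original setting.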

Additional information:
* Note that running any other graphics-intensive application (e.g. your
favourite game) will also result in the same crash, but Unigine Heaven is what
I found to be the quickest way to reproduce it.
* Also note that the crash is not X-specific, but again this is what I found to
be the simplest way to reproduce it.
* The very same hardware works correctly on Windows without a crash, so this is
probably not a hardware defect.
* The crash is almost immediate at 4K, but it also occurs at other
resolutions; it just takes more time. At 1440p it takes a couple of minutes but
still crashes. At 1080p I could run it for several minutes without a crash (did
not test further than that).
* The problem seems to be similar to these:
https://bugs.freedesktop.org/show_bug.cgi?id=105733 and
https://bugs.freedesktop.org/show_bug.cgi?id=102322 - the difference is that
the suggested workarounds don't help, just seem to postpone the crash by a very
small margin. It still crashes in less than a minute though.
* Enabling GPU recovery does not actually manage to recover the GPU.

If you need any other kind of log or any more info, please let me know. Thank
you in advance for looking into solving this problem.


You are receiving this mail because:
  • You are the assignee for the bug.