From mboxrd@z Thu Jan 1 00:00:00 1970
From: bugzilla-daemon@freedesktop.org
Subject: [Bug 108493] Unigine Heaven at 4K crashes amdgpu and causes a GPU hang
Date: Fri, 19 Oct 2018 10:24:55 +0000
Message-ID:
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============0065153075=="
Return-path:
Received: from culpepper.freedesktop.org (culpepper.freedesktop.org
[IPv6:2610:10:20:722:a800:ff:fe98:4b55])
by gabe.freedesktop.org (Postfix) with ESMTP id 420EE6E27A
for ; Fri, 19 Oct 2018 10:24:55 +0000 (UTC)
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel"
To: dri-devel@lists.freedesktop.org
List-Id: dri-devel@lists.freedesktop.org
--===============0065153075==
Content-Type: multipart/alternative; boundary="15399446950.ED60f.13761"
Content-Transfer-Encoding: 7bit
--15399446950.ED60f.13761
Date: Fri, 19 Oct 2018 10:24:55 +0000
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://bugs.freedesktop.org/
Auto-Submitted: auto-generated
https://bugs.freedesktop.org/show_bug.cgi?id=108493
Bug ID: 108493
Summary: Unigine Heaven at 4K crashes amdgpu and causes a GPU hang
Product: DRI
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: NEW
Severity: normal
Priority: medium
Component: DRM/AMDgpu
Assignee: dri-devel@lists.freedesktop.org
Reporter: venemo@msn.com
I experience a consistent amdgpu crash when using my AMD GPU with a 4K screen.
Hardware:
* Sapphire Radeon RX 570 Pulse ITX 4GB
* Zotac AMP box mini external GPU enclosure
* Dell XPS 13 9370 laptop
* Dell U2718Q 4K display
Software:
I first tried Fedora 28 and am now using Fedora 29. I tried kernel versions 4.18.12,
4.18.13 and 4.19-rc7; the issue appears with all of them. The Mesa version is
18.2.2, but the crash also occurs with 18.0 (on Fedora 28).
Steps to reproduce the crash:
1. Turn off the laptop
2. Attach the eGPU to the laptop
3. Attach a 4K screen to the HDMI output of the AMD GPU
4. Turn on the laptop
5. Add the following to the kernel command line: 'module_blacklist=i915 3' (to
ensure the Intel GPU is not used at all, and so that the graphical login won't
interfere)
6. Launch the operating system
7. Log in from the console
8. Launch an X session with 'startx'
9. Start the Unigine Heaven benchmark in fullscreen 4K
Expected outcome:
Unigine Heaven should show up and run in a stable and performant manner.
Actual outcome:
Unigine Heaven shows up, runs for a couple of seconds and then the screen goes
dark. I can still log into the machine with SSH, but cannot kill X or interact
with the AMD GPU in any way. I can't even reboot the machine; the only thing that
works is long-pressing the power key.
Relevant lines from dmesg log:
[ 305.078426] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=147930, emitted seq=147933
[ 305.078567] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=3176, emitted seq=3178
[ 305.078573] [drm] GPU recovery disabled.
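For anyone comparing these lines against their own dmesg output, here is a minimal Python sketch that extracts the ring name and the two sequence numbers from amdgpu ring timeout messages. The function name `parse_ring_timeouts` and the `outstanding` field are illustrative and not part of any amdgpu tooling. Reading `emitted - signaled` as the number of fences still outstanding on the ring is an assumption about what these counters mean.

```python
import re

# Matches the amdgpu_job_timedout lines in the format quoted in this report.
# \s+ after "timeout," also tolerates a line wrap between the two halves.
TIMEOUT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+"
    r"\[drm:amdgpu_job_timedout \[amdgpu\]\] \*ERROR\* "
    r"ring (?P<ring>\S+) timeout,\s+"
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def parse_ring_timeouts(log_text):
    """Return one dict per ring timeout message found in log_text."""
    results = []
    for match in TIMEOUT_RE.finditer(log_text):
        signaled = int(match.group("signaled"))
        emitted = int(match.group("emitted"))
        results.append({
            "timestamp": float(match.group("ts")),
            "ring": match.group("ring"),
            "signaled": signaled,
            "emitted": emitted,
            # Assumption: fences emitted but not yet signaled on this ring.
            "outstanding": emitted - signaled,
        })
    return results


SAMPLE = (
    "[ 305.078426] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, "
    "signaled seq=147930, emitted seq=147933\n"
    "[ 305.078567] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, "
    "signaled seq=3176, emitted seq=3178\n"
    "[ 305.078573] [drm] GPU recovery disabled.\n"
)

if __name__ == "__main__":
    for entry in parse_ring_timeouts(SAMPLE):
        print(entry["ring"], entry["outstanding"])
```

For the two timeout lines above, this reports 3 outstanding on the gfx ring and 2 on sdma0; the "GPU recovery disabled" line is ignored because it is not a timeout message.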
Possible workaround:
* The crash does not happen when I disable power management with amdgpu.dpm=0,
however performance is then very poor.
* The crash also doesn't happen when I use 'echo low >
/sys/class/drm/card0/device/power_dpm_force_performance_level', with the same
caveat about poor performance.
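The second workaround can also be scripted. Below is a minimal Python sketch that writes a level to the sysfs file named above and reads it back. The helper name `force_performance_level` is hypothetical, `card0` is taken from the report and may be a different card index on other systems, and writing to the real file requires root.

```python
from pathlib import Path

# Path as quoted in the report; the card index (card0) is system-dependent.
DEFAULT_LEVEL_FILE = Path(
    "/sys/class/drm/card0/device/power_dpm_force_performance_level"
)


def force_performance_level(level, level_file=DEFAULT_LEVEL_FILE):
    """Write `level` to the sysfs file and return the value read back.

    Equivalent in spirit to: echo <level> > <level_file>
    """
    level_file = Path(level_file)
    level_file.write_text(level + "\n")
    return level_file.read_text().strip()


if __name__ == "__main__":
    # Needs root and an amdgpu device at card0; 'low' is the value used in the report.
    print(force_performance_level("low"))
```

Reading the value back is only a sanity check that the write was accepted; it says nothing about whether the hang is avoided.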
Additional information:
* Note that running any other graphics-intensive application (e.g. your
favourite game) will also result in the same crash, but Unigine Heaven is what
I found to be the quickest way to reproduce it.
* Also note that the crash is not X-specific, but again this is what I found to
be the simplest way to reproduce it.
* The very same hardware works correctly on Windows without a crash, so this is
probably not a hardware defect.
* The crash is almost immediate at 4K, but it also occurs at other
resolutions; it just takes more time. At 1440p it takes a couple of minutes but
still crashes. At 1080p I could run it for several minutes without a crash (I did
not test further than that).
* The problem seems to be similar to these:
https://bugs.freedesktop.org/show_bug.cgi?id=105733 and
https://bugs.freedesktop.org/show_bug.cgi?id=102322 - the difference is that
the suggested workarounds don't help; they just seem to postpone the crash by a very
small margin. It still crashes in less than a minute, though.
* Enabling GPU recovery does not actually manage to recover the GPU.
If you need any other kind of log or any more info, please let me know. Thank
you in advance for looking into solving this problem.
-- 
You are receiving this mail because:
You are the assignee for the bug.
--15399446950.ED60f.13761--
--===============0065153075==
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Content-Disposition: inline
X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs
IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz
dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg==
--===============0065153075==--