* [Bug 108493] Unigine Heaven at 4K crashes amdgpu and causes a GPU hang
From: bugzilla-daemon @ 2018-10-19 10:24 UTC
  To: dri-devel


https://bugs.freedesktop.org/show_bug.cgi?id=108493

            Bug ID: 108493
           Summary: Unigine Heaven at 4K crashes amdgpu and causes a GPU
                    hang
           Product: DRI
           Version: unspecified
          Hardware: x86-64 (AMD64)
                OS: Linux (All)
            Status: NEW
          Severity: normal
          Priority: medium
         Component: DRM/AMDgpu
          Assignee: dri-devel@lists.freedesktop.org
          Reporter: venemo@msn.com

I experience a consistent amdgpu crash when using my AMD GPU with a 4K screen.

Hardware:
* Sapphire Radeon RX 570 Pulse ITX 4GB
* Zotac AMP box mini external GPU enclosure
* Dell XPS 13 9370 laptop
* Dell U2718Q 4K display

Software:
I first tried Fedora 28 and am now using Fedora 29. I tried kernel versions
4.18.12, 4.18.13 and 4.19-rc7; the issue appears with all of them. The Mesa
version is 18.2.2, but the crash also occurs with 18.0 (on Fedora 28).

Steps to reproduce the crash:
1. Turn off the laptop
2. Attach the eGPU to the laptop
3. Attach a 4K screen to the HDMI output of the AMD GPU
4. Turn on the laptop
5. Add the following to the kernel command line: 'module_blacklist=i915 3' (to
ensure the Intel GPU is not used at all, plus the graphical login won't
interfere)
6. Launch the operating system
7. Log in from the console
8. Launch an X session with 'startx'
9. Start the Unigine Heaven benchmark in fullscreen 4K

Expected outcome:
Unigine Heaven should show up and run in a stable and performant manner.

Actual outcome:
Unigine Heaven shows up, runs for a couple of seconds and then the screen goes
dark. I can still log into the machine with SSH, but cannot kill X or interact
with the AMD GPU in any way. I can't even reboot the machine; the only thing
that works is long-pressing the power key.

Relevant lines from dmesg log:
[  305.078426] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
signaled seq=147930, emitted seq=147933
[  305.078567] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout,
signaled seq=3176, emitted seq=3178
[  305.078573] [drm] GPU recovery disabled.
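
In each timeout line, the emitted sequence number minus the signaled one gives a rough count of fences that were submitted but never completed. The following is only a sketch for reading that off a log; `pending_fences` is a hypothetical helper name, and it assumes the message format quoted above (it also tolerates the line wrapping introduced by mail).

```shell
#!/bin/sh
# Sketch: print "<ring> <emitted - signaled>" for each amdgpu ring timeout.
# Reads dmesg text on stdin. Assumes the message format quoted in this report.
pending_fences() {
    awk '
    /ring [^ ]+ timeout/ {
        # Remember the ring name that follows the word "ring".
        for (i = 1; i < NF; i++) if ($i == "ring") ring = $(i + 1)
    }
    /signaled seq=[0-9]+, emitted seq=[0-9]+/ {
        # Extract both sequence numbers from the (possibly wrapped) line.
        s = $0; sub(/.*signaled seq=/, "", s); sub(/,.*/, "", s)
        e = $0; sub(/.*emitted seq=/, "", e); sub(/[^0-9].*/, "", e)
        print ring, e - s
    }'
}
```

Typical use would be `dmesg | pending_fences`.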

Possible workaround:
* The crash does not happen when I disable power management with amdgpu.dpm=0,
however then it has very poor performance.
* The crash also doesn't happen when I use 'echo low >
/sys/class/drm/card0/device/power_dpm_force_performance_level' with the same
note about bad performance.
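
For anyone repeating the second workaround, here is a minimal sketch of a wrapper that validates the level before writing it. `set_perf_level` is a hypothetical name, the default path assumes the GPU is card0 as in this report, and only the levels auto, low and high are accepted here.

```shell
#!/bin/sh
# Sketch: write a DPM performance level after a basic sanity check.
# $1: level (auto, low or high); $2: optional target file, defaulting to
# the card0 sysfs attribute used in this report (an assumption; the card
# index can differ on other systems).
set_perf_level() {
    level=$1
    file=${2:-/sys/class/drm/card0/device/power_dpm_force_performance_level}
    case $level in
        auto|low|high) ;;
        *) echo "unsupported level: $level" >&2; return 1 ;;
    esac
    printf '%s\n' "$level" > "$file"
}
```

Run as root, `set_perf_level low` applies the workaround and `set_perf_level auto` should hand control back to the driver.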

Additional information:
* Note that running any other graphics-intensive application (e.g. your
favourite game) will also result in the same crash, but Unigine Heaven is what
I found to be the quickest way to reproduce it.
* Also note that the crash is not X-specific but again this is what I found to
be the simplest way to reproduce it.
* The very same hardware works correctly on Windows without a crash. So this is
probably not a hardware defect.
* The crash is almost immediate at 4K, but it also occurs at other
resolutions; it just takes more time. At 1440p it takes a couple of minutes but
still crashes. At 1080p I could run it for several minutes without a crash (did
not test further than that).
* The problem seems to be similar to these:
https://bugs.freedesktop.org/show_bug.cgi?id=105733 and
https://bugs.freedesktop.org/show_bug.cgi?id=102322 - the difference is that
the suggested workarounds don't help, just seem to postpone the crash by a very
small margin. It still crashes in less than a minute though.
* Enabling GPU recovery does not actually manage to recover the GPU.

If you need any other kind of log or any more info, please let me know. Thank
you in advance for looking into solving this problem.

-- 
You are receiving this mail because:
You are the assignee for the bug.

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

From: bugzilla-daemon @ 2018-10-22 19:20 UTC
  To: dri-devel

https://bugs.freedesktop.org/show_bug.cgi?id=108493

--- Comment #1 from keramidasceid@gmail.com ---
I have the exact same problem.

The only differences are the following:
* I have the Asus Radeon RX 580 ROG Strix TOP OC 8GB GPU
* I use the Unigine Superposition to reproduce the problem quickly
* I have kernel 4.18.14 on Fedora 28
* I have mesa 18.0.5

From: bugzilla-daemon @ 2018-10-22 19:26 UTC
  To: dri-devel

https://bugs.freedesktop.org/show_bug.cgi?id=108493

--- Comment #2 from Timur Kristóf <venemo@msn.com> ---
Created attachment 142139
  --> https://bugs.freedesktop.org/attachment.cgi?id=142139&action=edit
dmesg after the crash

From: bugzilla-daemon @ 2018-10-22 19:27 UTC
  To: dri-devel

https://bugs.freedesktop.org/show_bug.cgi?id=108493

--- Comment #3 from Timur Kristóf <venemo@msn.com> ---
Created attachment 142140
  --> https://bugs.freedesktop.org/attachment.cgi?id=142140&action=edit
ddebug dump from unigine heaven 0

From: bugzilla-daemon @ 2018-10-22 19:27 UTC
  To: dri-devel

https://bugs.freedesktop.org/show_bug.cgi?id=108493

--- Comment #4 from Timur Kristóf <venemo@msn.com> ---
Created attachment 142141
  --> https://bugs.freedesktop.org/attachment.cgi?id=142141&action=edit
ddebug dump from unigine heaven 1

From: bugzilla-daemon @ 2018-10-22 19:27 UTC
  To: dri-devel

https://bugs.freedesktop.org/show_bug.cgi?id=108493

--- Comment #5 from Timur Kristóf <venemo@msn.com> ---
Created attachment 142142
  --> https://bugs.freedesktop.org/attachment.cgi?id=142142&action=edit
ddebug dump from unigine heaven 2

From: bugzilla-daemon @ 2018-10-22 19:30 UTC
  To: dri-devel

https://bugs.freedesktop.org/show_bug.cgi?id=108493

--- Comment #6 from Timur Kristóf <venemo@msn.com> ---
On freenode in #dri-devel I got the suggestion to run unigine heaven with
GALLIUM_DDEBUG="1000". So I just did that. It created 3 files, which I attached
to this bug report along with the dmesg log that I took after the crash.

Some people suggested that this may in fact be an issue with radeonsi (and not
amdgpu). If this is the case, please reassign this bug appropriately.

From: bugzilla-daemon @ 2018-10-23 16:01 UTC
  To: dri-devel

https://bugs.freedesktop.org/show_bug.cgi?id=108493

--- Comment #7 from Timur Kristóf <venemo@msn.com> ---
1. It was suggested that this is a thermal issue. So I monitored the GPU
temperatures with GALLIUM_HUD and it was about 40 Celsius when the crash
happened.

2. It was also suggested that this is a VRAM memory leak. Again with
GALLIUM_HUD I could see that about 1 GB of VRAM gets used (out of the 4 GB),
when the crash happens.

3. Also, just to see if this is a power consumption issue (i.e. the GPU drawing
more power than can be supplied), I tried lowering the value in
/sys/class/drm/card0/device/hwmon/hwmon0/power1_cap to 80 Watts. It did not
stop the crash from happening.
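
A note on units for item 3: hwmon power attributes such as power1_cap are expressed in microwatts, so an 80 W cap is written as 80000000. A small sketch of the conversion follows; `watts_to_microwatts` is a hypothetical helper name and it only handles whole watts.

```shell
#!/bin/sh
# Sketch: convert whole watts to the microwatt value that hwmon
# power attributes (such as power1_cap) expect.
watts_to_microwatts() {
    echo $(( $1 * 1000000 ))
}
```

As root, something like `watts_to_microwatts 80 > /sys/class/drm/card0/device/hwmon/hwmon0/power1_cap` would apply the cap; the hwmon index may differ between systems.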

From: bugzilla-daemon @ 2018-10-23 16:12 UTC
  To: dri-devel

https://bugs.freedesktop.org/show_bug.cgi?id=108493

--- Comment #8 from Timur Kristóf <venemo@msn.com> ---
Upgraded to kernel 4.18.16 and Mesa 18.2.3, which is supposed to fix a GPU hang.
It did not help; the problem is still there.

From: bugzilla-daemon @ 2018-10-27  5:32 UTC
  To: dri-devel

https://bugs.freedesktop.org/show_bug.cgi?id=108493

--- Comment #9 from Timur Kristóf <venemo@msn.com> ---
I think I discovered a possible reason for this issue. If you look at the
DDEBUG dumps, it says in several places: "This slot was corrupted in GPU
memory". So I began to suspect something was wrong with the VRAM.

After looking around a bit, I found that the amdgpu driver does not honor the
voltage settings from the VBIOS, and sets the memory to use lower voltages
instead. So basically the driver undervolts the VRAM without me asking to do
so. I guess this might be considered a feature for some people.

However, when I manually edit pp_od_clk_voltage to increase the OD_MCLK
voltages, then the card begins to work in a stable manner and the GPU hang is
gone. (Or at the very least I haven't seen a hang yet, whereas previously it
used to hang in less than a minute.)

In my case, the VBIOS wants to set the MCLK voltages to 1000 mV at all
frequencies, while amdgpu sets them to 750 mV, 800 mV, and 900 mV. And it turns
out that 900 mV is just too low for my card at 1750 MHz.

[root@timur-xps ~]# cat /sys/class/drm/card0/device/pp_od_clk_voltage 
OD_SCLK:
0:        300MHz        750mV
1:        588MHz        765mV
2:        952MHz        900mV
3:       1041MHz        975mV
4:       1106MHz       1031mV
5:       1168MHz       1093mV
6:       1209MHz       1143mV
7:       1244MHz       1150mV
OD_MCLK:
0:        300MHz        750mV
1:       1000MHz        800mV
2:       1750MHz        900mV
OD_RANGE:
SCLK:     300MHz       2000MHz
MCLK:     300MHz       2250MHz
VDDC:     750mV        1150mV
[root@timur-xps ~]# cat /sys/kernel/debug/dri/0/amdgpu_vbios > mybios.rom
[root@timur-xps ~]# pbec -i mybios.rom -s -r MEMORY_CLOCK

----
[DEFAULT] ATOM_MCLK_ENTRY Array
----

Entry: 0
        Frequency: 300 MHz.
        Voltage:. 1000 MV
Entry: 1
        Frequency: 1000 MHz.
        Voltage:. 1000 MV
Entry: 2
        Frequency: 1750 MHz.
        Voltage:. 1000 MV
----
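
To compare tables like the ones above without eyeballing them, the pp_od_clk_voltage listing can be reduced to plain numbers. This is only a sketch: `od_table` is a hypothetical helper, and it assumes the layout printed above (an `OD_...:` header followed by `index: <clock>MHz <voltage>mV` rows).

```shell
#!/bin/sh
# Sketch: print "<state> <MHz> <mV>" for one section of pp_od_clk_voltage.
# $1: section name without the colon, e.g. OD_SCLK or OD_MCLK.
# Reads the file contents on stdin.
od_table() {
    awk -v section="$1:" '
        # A header line switches the active section on or off.
        /^OD_[A-Z]+:/ { active = ($1 == section); next }
        # State rows look like "2:  1750MHz  900mV".
        active && /^[0-9]+:/ {
            idx = $1; sub(/:/, "", idx)
            clk = $2; sub(/MHz/, "", clk)
            mv = $3; sub(/mV/, "", mv)
            print idx, clk, mv
        }'
}
```

For example, `od_table OD_MCLK < /sys/class/drm/card0/device/pp_od_clk_voltage` would list the three memory states shown above.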


Here is some info about the VBIOS:

[root@timur-xps ~]# cat /sys/class/drm/card0/device/subsystem_device
0xe343
[root@timur-xps ~]# cat /sys/class/drm/card0/device/subsystem_vendor
0x1da2
[root@timur-xps ~]# cat /sys/class/drm/card0/device/vbios_version
113-D00034-S07

From: bugzilla-daemon @ 2018-10-27  5:38 UTC
  To: dri-devel

https://bugs.freedesktop.org/show_bug.cgi?id=108493

--- Comment #10 from Timur Kristóf <venemo@msn.com> ---
Created attachment 142227
  --> https://bugs.freedesktop.org/attachment.cgi?id=142227&action=edit
Content of /sys/kernel/debug/dri/0/amdgpu_vbios

From: bugzilla-daemon @ 2018-10-27  5:39 UTC
  To: dri-devel

https://bugs.freedesktop.org/show_bug.cgi?id=108493

--- Comment #11 from Timur Kristóf <venemo@msn.com> ---
Created attachment 142228
  --> https://bugs.freedesktop.org/attachment.cgi?id=142228&action=edit
Content of /sys/class/drm/card0/device/pp_table

From: bugzilla-daemon @ 2018-10-27  5:50 UTC
  To: dri-devel

https://bugs.freedesktop.org/show_bug.cgi?id=108493

--- Comment #12 from Timur Kristóf <venemo@msn.com> ---
By the way the voltage issue has already been reported against ROCm and is
supposed to be already fixed. The details are here:
https://github.com/RadeonOpenCompute/ROCm/issues/348

From: bugzilla-daemon @ 2018-10-29 20:41 UTC
  To: dri-devel

https://bugs.freedesktop.org/show_bug.cgi?id=108493

--- Comment #13 from Alex Deucher <alexdeucher@gmail.com> ---
(In reply to Timur Kristóf from comment #9)

> OD_MCLK:
> 0:        300MHz        750mV
> 1:       1000MHz        800mV
> 2:       1750MHz        900mV

This is vddc.

> [DEFAULT] ATOM_MCLK_ENTRY Array
> ----
> 
> Entry: 0
> 	Frequency: 300 MHz.
> 	Voltage:. 1000 MV
> Entry: 1
> 	Frequency: 1000 MHz.
> 	Voltage:. 1000 MV
> Entry: 2
> 	Frequency: 1750 MHz.
> 	Voltage:. 1000 MV
> ----

This is mvdd.  These are not the same voltages.  The pp_od_clk_voltage
interface only allows you to adjust vddc.  The vddc values match what is in the
vbios for your card.

From: bugzilla-daemon @ 2018-10-29 21:01 UTC
  To: dri-devel

https://bugs.freedesktop.org/show_bug.cgi?id=108493

--- Comment #14 from Alex Deucher <alexdeucher@gmail.com> ---
I suspect the display may require additional voltage in your case which is why
you see the issue at 4K.  The display requirements are not handled as finely on
Linux as they are in Windows.

From: bugzilla-daemon @ 2018-10-31  8:34 UTC
  To: dri-devel

https://bugs.freedesktop.org/show_bug.cgi?id=108493

--- Comment #15 from Timur Kristóf <venemo@msn.com> ---
(In reply to Alex Deucher from comment #14)
> I suspect the display may require additional voltage in your case which is
> why you see the issue at 4K.  The display requirements are not handled as
> finely on Linux as they are in Windows.

Thanks Alex for explaining the difference between vddc and mvdd.

After using the GPU in this way for a couple of days I can tell you that
increasing the voltage definitely improves the stability of the system but
ultimately it doesn't fix the problem. The GPU can still hang with the same
"ring gfx timeout", it just takes more time before the problem occurs.

Some additional comments:

* I'd like to emphasize that the problem is not specific to 4K and will happen
at 1080p, just later. I.e. the GPU hangs in a couple of minutes instead of
immediately.
* echo high > /sys/class/drm/card0/device/power_dpm_force_performance_level
does not help at all.
* I also tried amdgpu.vm_update_mode=3 (found as a suggestion from another
similar bug report) but it doesn't help at all.
* I also tried manually setting the sclk to a fixed, lower level (another
suggestion from another bug report) which seems to improve the stability by a
small margin but it also doesn't prevent the GPU from hanging.

From: bugzilla-daemon @ 2018-11-18 21:52 UTC
  To: dri-devel

https://bugs.freedesktop.org/show_bug.cgi?id=108493

--- Comment #16 from Timur Kristóf <venemo@msn.com> ---
Hi,

After some more experimentation it seems that increasing the highest mclk
voltage above 900 mV, while setting all other voltages in pp_od_clk_voltage so
that they remain below 1000 mV, is a viable workaround that makes the GPU
stable.

Here is what I do to achieve this:

echo "2" > /sys/class/drm/card0/device/pp_sclk_od
echo "2" > /sys/class/drm/card0/device/pp_mclk_od
echo "s 0 300 750" > /sys/class/drm/card0/device/pp_od_clk_voltage
echo "s 1 588 765" > /sys/class/drm/card0/device/pp_od_clk_voltage
echo "s 2 952 900" > /sys/class/drm/card0/device/pp_od_clk_voltage
echo "s 3 1041 970" > /sys/class/drm/card0/device/pp_od_clk_voltage
echo "s 4 1106 970" > /sys/class/drm/card0/device/pp_od_clk_voltage
echo "s 5 1168 970" > /sys/class/drm/card0/device/pp_od_clk_voltage
echo "s 6 1209 970" > /sys/class/drm/card0/device/pp_od_clk_voltage
echo "s 7 1244 970" > /sys/class/drm/card0/device/pp_od_clk_voltage
echo "c" > /sys/class/drm/card0/device/pp_od_clk_voltage
echo "m 0 300 750" > /sys/class/drm/card0/device/pp_od_clk_voltage
echo "m 1 1000 850" > /sys/class/drm/card0/device/pp_od_clk_voltage
echo "m 2 1750 970" > /sys/class/drm/card0/device/pp_od_clk_voltage
echo "c" > /sys/class/drm/card0/device/pp_od_clk_voltage

After running this script, I can play on the GPU for several hours and I don't
see the hang happening anymore.
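
The sclk half of the script above amounts to clamping the stock voltages at 970 mV. A sketch of generating such lines instead of typing them by hand follows; `cap_cmds` is a hypothetical helper, and its input format (one `state MHz mV` triple per line) is an assumption made for illustration.

```shell
#!/bin/sh
# Sketch: turn "<state> <MHz> <mV>" triples on stdin into
# pp_od_clk_voltage commands with the voltage clamped to a ceiling.
# $1: table letter ("s" for sclk, "m" for mclk); $2: ceiling in mV.
cap_cmds() {
    awk -v t="$1" -v cap="$2" '
        NF == 3 {
            # Keep the stock voltage unless it exceeds the ceiling.
            mv = ($3 + 0 > cap + 0) ? cap + 0 : $3 + 0
            print t, $1, $2, mv
        }'
}
```

Each emitted line would then be written to pp_od_clk_voltage, followed by a final "c" to commit, exactly as in the script above. Note that the mclk values in that script were raised rather than clamped, so they still need to be chosen by hand.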

From: bugzilla-daemon @ 2019-01-14 16:17 UTC
  To: dri-devel

https://bugs.freedesktop.org/show_bug.cgi?id=108493

--- Comment #17 from Timur Kristóf <venemo@msn.com> ---
Hi Everyone,

I've just tested Linux 5.0-rc1 and have not encountered the problem so far.
Looking into it more, I think the same patch set that fixed the Sapphire RX 590
for Michael @ Phoronix also fixed my Sapphire RX 570.

Assuming this is the main patch that fixed the issue:
https://github.com/torvalds/linux/commit/816b6931315b641c5864cf33a9363cb89da05d0b
(specifically the line that sets ucEnableApplyAVFS_CKS_OFF_Voltage). Looking at
the code, it seems a bunch of other GPUs are affected (besides the RX 590 and
RX 570).

Could you guys please send this patch series for inclusion into the stable
kernel? Since it fixes a huge stability issue I think it is a reasonable
request.

From: bugzilla-daemon @ 2019-03-14  1:47 UTC
  To: dri-devel

https://bugs.freedesktop.org/show_bug.cgi?id=108493

--- Comment #18 from fin4478@hotmail.com ---
Add amdgpu.ppfeaturemask=0xfffd7fff to the kernel command line to make
powerplay work with the RX 570 at 4K 60 Hz.
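
To see what that mask changes, here is a minimal Python sketch. It assumes the
parameter is treated as a 32-bit bitmask and compares it against an all-ones
baseline (0xffffffff). That baseline is only an assumption for illustration, as
the driver's real default can differ between kernel versions. The helper name
cleared_bits is hypothetical. The sketch reports bit positions only; what each
bit controls is defined in the kernel source, which is not quoted in this
thread.

```python
def cleared_bits(mask, baseline=0xFFFFFFFF):
    """Return the positions of bits that are set in baseline but cleared in mask."""
    diff = baseline & ~mask
    return [bit for bit in range(32) if diff & (1 << bit)]


if __name__ == "__main__":
    mask = 0xFFFD7FFF  # value suggested in the comment above
    print(hex(0xFFFFFFFF & ~mask))  # bits turned off relative to all-ones
    print(cleared_bits(mask))
```

Relative to all-ones, the suggested value turns off exactly two bits (positions
15 and 17) and leaves every other bit set.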

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug 108493] Unigine Heaven at 4K crashes amdgpu and causes a GPU hang
  2018-10-19 10:24 [Bug 108493] Unigine Heaven at 4K crashes amdgpu and causes a GPU hang bugzilla-daemon
                   ` (17 preceding siblings ...)
  2019-03-14  1:47 ` bugzilla-daemon
@ 2019-03-15 13:04 ` bugzilla-daemon
  2019-08-25 16:26 ` bugzilla-daemon
  2019-11-19  8:59 ` bugzilla-daemon
  20 siblings, 0 replies; 22+ messages in thread
From: bugzilla-daemon @ 2019-03-15 13:04 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 556 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=108493

Timur Kristóf <venemo@msn.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #19 from Timur Kristóf <venemo@msn.com> ---
Since this is fixed by kernel 5.0, I'm marking it as RESOLVED FIXED.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug 108493] Unigine Heaven at 4K crashes amdgpu and causes a GPU hang
  2018-10-19 10:24 [Bug 108493] Unigine Heaven at 4K crashes amdgpu and causes a GPU hang bugzilla-daemon
                   ` (18 preceding siblings ...)
  2019-03-15 13:04 ` bugzilla-daemon
@ 2019-08-25 16:26 ` bugzilla-daemon
  2019-11-19  8:59 ` bugzilla-daemon
  20 siblings, 0 replies; 22+ messages in thread
From: bugzilla-daemon @ 2019-08-25 16:26 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1303 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=108493

Łukasz Posadowski <mail@lukaszposadowski.pl> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|FIXED                       |---
             Status|RESOLVED                    |REOPENED

--- Comment #20 from Łukasz Posadowski <mail@lukaszposadowski.pl> ---
Since this is my first comment here (technically my second, but my GPU crashed
before I was able to finish the first one), hi everyone.

I am experiencing a very similar problem on kernel 5.2.8 on Fedora 30 with an
AMD RX 570 PCI Express card. I described it here:
https://bugzilla.redhat.com/show_bug.cgi?id=1739766

I can run X with 'low' in
/sys/class/drm/card0/device/power_dpm_force_performance_level. Basically, every
time the GPU has ~100% load, the card resets itself and never comes back. I can
ssh into the system and switch to the text console; it is just the GPU with the
X server that doesn't work.

The card also crashes sometimes with the 'low' setting, usually after 30 minutes
or an hour of gaming, but then it's a hard crash and I can't even switch to the
text console.

Thanks for any help.
Łukasz
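
The sysfs setting mentioned above can be scripted. The following is a minimal
Python sketch, assuming the card is card0 as in the path quoted above and that
the script runs with enough privileges to write to sysfs. The function names
are hypothetical, and the accepted values are restricted to the commonly
documented levels auto, low, high and manual (the driver may accept others).

```python
from pathlib import Path

# Path taken from the comment above; "card0" is an assumption and may differ.
DPM_LEVEL_PATH = Path(
    "/sys/class/drm/card0/device/power_dpm_force_performance_level"
)

# Levels this sketch allows; the driver may support more than these.
KNOWN_LEVELS = ("auto", "low", "high", "manual")


def read_level(path=DPM_LEVEL_PATH):
    """Return the currently selected performance level as a string."""
    return Path(path).read_text().strip()


def set_level(level, path=DPM_LEVEL_PATH):
    """Write a performance level after checking it against KNOWN_LEVELS."""
    if level not in KNOWN_LEVELS:
        raise ValueError(f"unsupported level: {level!r}")
    Path(path).write_text(level + "\n")
```

Calling set_level("low") corresponds to the setting described above, and
set_level("auto") hands control back to the driver. The change does not persist
across reboots.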

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug 108493] Unigine Heaven at 4K crashes amdgpu and causes a GPU hang
  2018-10-19 10:24 [Bug 108493] Unigine Heaven at 4K crashes amdgpu and causes a GPU hang bugzilla-daemon
                   ` (19 preceding siblings ...)
  2019-08-25 16:26 ` bugzilla-daemon
@ 2019-11-19  8:59 ` bugzilla-daemon
  20 siblings, 0 replies; 22+ messages in thread
From: bugzilla-daemon @ 2019-11-19  8:59 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 806 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=108493

Martin Peres <martin.peres@free.fr> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |MOVED
             Status|REOPENED                    |RESOLVED

--- Comment #21 from Martin Peres <martin.peres@free.fr> ---
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been
closed from further activity.

You can subscribe and participate further in the new bug through this link
to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/564.

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2019-11-19  8:59 UTC | newest]

Thread overview: 22+ messages
2018-10-19 10:24 [Bug 108493] Unigine Heaven at 4K crashes amdgpu and causes a GPU hang bugzilla-daemon
2018-10-22 19:20 ` bugzilla-daemon
2018-10-22 19:26 ` bugzilla-daemon
2018-10-22 19:27 ` bugzilla-daemon
2018-10-22 19:27 ` bugzilla-daemon
2018-10-22 19:27 ` bugzilla-daemon
2018-10-22 19:30 ` bugzilla-daemon
2018-10-23 16:01 ` bugzilla-daemon
2018-10-23 16:12 ` bugzilla-daemon
2018-10-27  5:32 ` bugzilla-daemon
2018-10-27  5:38 ` bugzilla-daemon
2018-10-27  5:39 ` bugzilla-daemon
2018-10-27  5:50 ` bugzilla-daemon
2018-10-29 20:41 ` bugzilla-daemon
2018-10-29 21:01 ` bugzilla-daemon
2018-10-31  8:34 ` bugzilla-daemon
2018-11-18 21:52 ` bugzilla-daemon
2019-01-14 16:17 ` bugzilla-daemon
2019-03-14  1:47 ` bugzilla-daemon
2019-03-15 13:04 ` bugzilla-daemon
2019-08-25 16:26 ` bugzilla-daemon
2019-11-19  8:59 ` bugzilla-daemon
