All of lore.kernel.org
* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
@ 2019-03-11 11:27 ` bugzilla-daemon
  2019-03-22 20:01 ` bugzilla-daemon
                   ` (136 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-03-11 11:27 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 656 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

Michel Dänzer <michel@daenzer.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|mesa-dev@lists.freedesktop. |dri-devel@lists.freedesktop
                   |org                         |.org
          Component|Mesa core                   |Drivers/Gallium/radeonsi
         QA Contact|mesa-dev@lists.freedesktop. |dri-devel@lists.freedesktop
                   |org                         |.org

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 1543 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 147+ messages in thread

* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
  2019-03-11 11:27 ` [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming bugzilla-daemon
@ 2019-03-22 20:01 ` bugzilla-daemon
  2019-03-22 20:02 ` bugzilla-daemon
                   ` (135 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-03-22 20:01 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 327 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #1 from Mauro Gaspari <ilvipero@gmx.com> ---
Created attachment 143759
  --> https://bugs.freedesktop.org/attachment.cgi?id=143759&action=edit
syslog lines relevant to the crash


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
  2019-03-11 11:27 ` [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming bugzilla-daemon
  2019-03-22 20:01 ` bugzilla-daemon
@ 2019-03-22 20:02 ` bugzilla-daemon
  2019-03-22 20:02 ` bugzilla-daemon
                   ` (134 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-03-22 20:02 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 315 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #2 from Mauro Gaspari <ilvipero@gmx.com> ---
Created attachment 143760
  --> https://bugs.freedesktop.org/attachment.cgi?id=143760&action=edit
full dmesg after crash


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (2 preceding siblings ...)
  2019-03-22 20:02 ` bugzilla-daemon
@ 2019-03-22 20:02 ` bugzilla-daemon
  2019-04-11  6:37 ` bugzilla-daemon
                   ` (133 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-03-22 20:02 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 718 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #3 from Mauro Gaspari <ilvipero@gmx.com> ---
New reports as the issue is still happening:

I found a link on phoronix that describes with pictures exactly what is
happening:
https://www.phoronix.com/forums/forum/linux-graphics-x-org-drivers/open-source-amd-linux/1049483-amd-devs-error-ring-gfx-timeout


OS: OpenSUSE tumbleweed x86_64 updated (2019 03 23)
Kernel: 5.0.2-1-default
Desktop Environment: KDE Plasma (x11)
OpenGL version string: 4.5 (Compatibility Profile) Mesa 19.0.0
GPU: AMD Radeon RX Vega 64 8GB

Attaching log files and dmesg after crash.


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (3 preceding siblings ...)
  2019-03-22 20:02 ` bugzilla-daemon
@ 2019-04-11  6:37 ` bugzilla-daemon
  2019-04-12 21:37 ` bugzilla-daemon
                   ` (132 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-04-11  6:37 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 2181 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #4 from Mauro Gaspari <ilvipero@gmx.com> ---
The issue still happens despite kernel and Mesa updates on openSUSE
Tumbleweed. The same happens on Kubuntu with the oibaf PPA, and on Arch.

This bug seems to affect many people using amdgpu on Linux, and I found some
interesting workarounds. I looked at the kernel options and applied them via
GRUB. After 2 weeks of extensive testing so far, I have not had a single
system freeze or hang.

-> BEGIN KERNEL PARAMETERS <-
This is what I am using now. Please note that some of those settings are to
enable debugging and should not be left there forever. I will remove those once
I am confident with the stability of the system.

AMDGPU
amdgpu.dc=1 amdgpu.vm_update_mode=0 amdgpu.dpm=-1
amdgpu.ppfeaturemask=0xffffffff amdgpu.vm_fault_stop=2 amdgpu.vm_debug=1
amdgpu.gpu_recovery=0


- Kernel parameters explained from:
https://www.kernel.org/doc/html/latest/gpu/amdgpu.html

--- dc (int)
Disable/Enable Display Core driver for debugging (1 = enable, 0 = disable).
The default is -1 (automatic for each asic).


--- dpm (int)
Override for dynamic power management setting (1 = enable, 0 = disable). The
default is -1 (auto).

--- vm_update_mode (int)
Override VM update mode. VM updated by using CPU (0 = never, 1 = Graphics
only, 2 = Compute only, 3 = Both). The default is -1 (Only in large BAR(LB)
systems Compute VM tables will be updated by CPU, otherwise 0, never).

--- ppfeaturemask (uint)
Override power features enabled. See enum PP_FEATURE_MASK in
drivers/gpu/drm/amd/include/amd_shared.h. The default is the current set of
stable power features.

--- vm_fault_stop (int)
Stop on VM fault for debugging (0 = never, 1 = print first, 2 = always). The
default is 0 (No stop).

--- vm_debug (int)
Debug VM handling (0 = disabled, 1 = enabled). The default is 0 (Disabled).

--- gpu_recovery (int)
Set to enable GPU recovery mechanism (1 = enable, 0 = disable). The default
is -1 (auto, disabled except SRIOV).

-> END KERNEL PARAMETERS <-
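
As a minimal sketch of how the parameter list above could be checked against a running system: the snippet below parses the posted string into name/value pairs and compares it with the kernel command line. The helper names (`parse_module_params`, `missing_from_running_kernel`) are hypothetical, and reading `/proc/cmdline` assumes the parameters were passed on the boot command line rather than through a modprobe configuration file.

```python
from pathlib import Path

# Parameters exactly as posted in the comment above.
POSTED = (
    "amdgpu.dc=1 amdgpu.vm_update_mode=0 amdgpu.dpm=-1 "
    "amdgpu.ppfeaturemask=0xffffffff amdgpu.vm_fault_stop=2 "
    "amdgpu.vm_debug=1 amdgpu.gpu_recovery=0"
)


def parse_module_params(cmdline, module="amdgpu"):
    """Return {name: value} for every `module.name=value` token in cmdline."""
    prefix = module + "."
    params = {}
    for token in cmdline.split():
        if token.startswith(prefix) and "=" in token:
            name, value = token[len(prefix):].split("=", 1)
            params[name] = value
    return params


def missing_from_running_kernel(wanted, proc_cmdline="/proc/cmdline"):
    """Return the wanted parameters the running kernel was not booted with."""
    path = Path(proc_cmdline)
    # Assumption: a missing file simply means "nothing is set".
    running = parse_module_params(path.read_text()) if path.exists() else {}
    return {k: v for k, v in wanted.items() if running.get(k) != v}


wanted = parse_module_params(POSTED)

if __name__ == "__main__":
    for name, value in missing_from_running_kernel(wanted).items():
        print(f"not active on this boot: amdgpu.{name}={value}")
```

Values are kept as strings on purpose, so `0xffffffff` and `-1` are compared exactly as typed.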


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (4 preceding siblings ...)
  2019-04-11  6:37 ` bugzilla-daemon
@ 2019-04-12 21:37 ` bugzilla-daemon
  2019-04-12 22:10 ` bugzilla-daemon
                   ` (131 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-04-12 21:37 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 798 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #5 from Jaap Buurman <jaapbuurman@gmail.com> ---
I have the exact same problem with my Vega 64. Crashes when playing games.
Happens with Vulkan games (RADV), OpenGL games (RadeonSI) and DirectX 9 games
via Wine (Gallium9). It happens only for some games, presumably because it
depends on the workload.

I also suspect power management issues. This might be a long shot, but it is
worth a try. I know for a fact that power management works slightly differently
when multiple monitors are connected, as memory isn't clocked back as much in
that case. For the people also experiencing this issue, are you guys running
multiple monitors like I am?


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (5 preceding siblings ...)
  2019-04-12 21:37 ` bugzilla-daemon
@ 2019-04-12 22:10 ` bugzilla-daemon
  2019-04-13  9:34 ` bugzilla-daemon
                   ` (130 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-04-12 22:10 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 668 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #6 from Jaap Buurman <jaapbuurman@gmail.com> ---
Another question: What is the output of the following command for you guys?

cat /sys/class/drm/card0/device/vbios_version 

I am running the following version:

113-D0500100-103

According to the techpowerup GPU bios database, this is a vega bios that was
replaced two days (!) later by a new version. Perhaps issues were found that
required another bios update? I might install Windows on a spare HDD and try to
flash my Vega to see if that changes anything.
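
For machines with more than one GPU, the same check can be scripted. This is only a sketch: it assumes each card exposes a `device/vbios_version` file like the `card0` path in the command above, and the function name `read_vbios_versions` is made up for illustration.

```python
from pathlib import Path


def read_vbios_versions(drm_root="/sys/class/drm"):
    """Map each cardN directory to the contents of its device/vbios_version file."""
    versions = {}
    for card in sorted(Path(drm_root).glob("card[0-9]*")):
        if "-" in card.name:
            continue  # skip connector entries such as card0-DP-1
        vbios = card / "device" / "vbios_version"
        if vbios.is_file():
            versions[card.name] = vbios.read_text().strip()
    return versions


if __name__ == "__main__":
    for card, version in read_vbios_versions().items():
        print(f"{card}: {version}")
```

Cards whose driver does not provide the file are silently skipped, so an empty result does not necessarily mean that no GPU is present.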


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (6 preceding siblings ...)
  2019-04-12 22:10 ` bugzilla-daemon
@ 2019-04-13  9:34 ` bugzilla-daemon
  2019-04-13  9:41 ` bugzilla-daemon
                   ` (129 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-04-13  9:34 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 300 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #7 from Mauro Gaspari <ilvipero@gmx.com> ---
@ Jaap Buurman 
I run a single monitor, ultra-wide 3440x1440 @ 100 Hz.

my bios version: 113-D0500100-103


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (7 preceding siblings ...)
  2019-04-13  9:34 ` bugzilla-daemon
@ 2019-04-13  9:41 ` bugzilla-daemon
  2019-04-13  9:49 ` bugzilla-daemon
                   ` (128 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-04-13  9:41 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 570 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #8 from Jaap Buurman <jaapbuurman@gmail.com> ---
I guess we can rule out a multi-monitor issue then. But I find it VERY
interesting that you also run the exact same BIOS version. It was replaced
two days later, so it should be fairly rare. Perhaps it is buggy and was
therefore replaced only 2 days after it was released? I am going to try to
flash my GPU in Windows on a separate HDD and see if that fixes anything.


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (8 preceding siblings ...)
  2019-04-13  9:41 ` bugzilla-daemon
@ 2019-04-13  9:49 ` bugzilla-daemon
  2019-04-13  9:52 ` bugzilla-daemon
                   ` (127 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-04-13  9:49 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 493 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #9 from Mauro Gaspari <ilvipero@gmx.com> ---
Interesting catch about the card's BIOS.

I have a separate SSD with Windows 10 that I use to test this card's stability.
I will check my Windows MSI update tool to see if it offers me an updated BIOS.
If I do have an updated BIOS, I will temporarily remove my workarounds and see
how it goes.


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (9 preceding siblings ...)
  2019-04-13  9:49 ` bugzilla-daemon
@ 2019-04-13  9:52 ` bugzilla-daemon
  2019-04-13 11:34 ` bugzilla-daemon
                   ` (126 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-04-13  9:52 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 701 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #10 from Jaap Buurman <jaapbuurman@gmail.com> ---
You will have to flash using Atiflash:

https://www.techpowerup.com/download/ati-atiflash/

And download the latest BIOS for your card from Techpowerup as well:

https://www.techpowerup.com/vgabios/

Bios updates are usually not supported directly by the vendor, but I have never
worked with MSI update tool, so I am not 100% sure.

Make sure you are very careful when picking the bios. Some bioses are for the
watercooling variant, variants with aftermarket coolers, or overclocked ones.


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (10 preceding siblings ...)
  2019-04-13  9:52 ` bugzilla-daemon
@ 2019-04-13 11:34 ` bugzilla-daemon
  2019-04-13 13:19 ` bugzilla-daemon
                   ` (125 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-04-13 11:34 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 766 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #11 from Mauro Gaspari <ilvipero@gmx.com> ---
You are right. MSI tools do not offer any BIOS update for GPU.

I downloaded the utility and filtered the BIOS list by vendor and device ID. I
saw the 3 BIOS versions, including the one that, as you said, was released 2
days after the one we are using.

I do not have high hopes, because with current BIOS, all games on windows run
fine. But well, cannot hurt to try the upgrade. Worst case I will re-introduce
my workarounds. I had zero freezes with those enabled in the last 2 weeks. 

And if I end up bricking my GPU out of warranty, I have the excuse to get a new
RadeonVII :D


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (11 preceding siblings ...)
  2019-04-13 11:34 ` bugzilla-daemon
@ 2019-04-13 13:19 ` bugzilla-daemon
  2019-04-13 13:45 ` bugzilla-daemon
                   ` (124 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-04-13 13:19 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 684 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #12 from Jaap Buurman <jaapbuurman@gmail.com> ---
My Vega64 was also 100% stable on the exact same build under Windows 10. So I
am also not getting my hopes up, but I am really frustrated. I am hoping it is
some kind of incompatibility problem. I have honestly tried so many things,
that I am willing to give the long-shots a chance as well. 

Since my switch to Linux ~1.5 years ago, stability with the Vega64 has been
very finicky. Some games run fine, while some games cause this crash pretty
reliably. Very, very frustrating.


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (12 preceding siblings ...)
  2019-04-13 13:19 ` bugzilla-daemon
@ 2019-04-13 13:45 ` bugzilla-daemon
  2019-04-15 12:51 ` bugzilla-daemon
                   ` (123 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-04-13 13:45 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1096 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #13 from Mauro Gaspari <ilvipero@gmx.com> ---
Status update: I updated the BIOS and now disabled all kernel parameters I
previously used. It might take some time to make sure the system is stable. 

Regarding your frustrations,
AMD released open source drivers and that is a major improvement for people on
Linux. I got the Vega RX64 to support that. I expected a few bumps in the road
but well, it is taking longer than anticipated.

Having said that, here are all the kernel parameters I enabled. With those,
as I said, I have not had a single freeze. They are not fixes, most
likely optimizations and workarounds. Still, they work pretty well for me.

CPU
rcu_nocbs=0-15 (adjust to the number of cores of your cpu)
idle=nomwait
processor.max_cstate=5
pcie_aspm=off 

GPU
amdgpu.dc=1
amdgpu.vm_update_mode=0
amdgpu.dpm=-1
amdgpu.ppfeaturemask=0xffffffff
amdgpu.vm_fault_stop=2
amdgpu.vm_debug=1
amdgpu.gpu_recovery=0
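
One way to fold these parameters into an existing boot command line is sketched below. It rests on assumptions not stated in the comment: that the options live in `GRUB_CMDLINE_LINUX_DEFAULT` in `/etc/default/grub`, and that the existing value is `quiet splash` (a placeholder). `merge_cmdline` is a hypothetical helper, not part of any GRUB tooling.

```python
CPU_PARAMS = [
    "rcu_nocbs=0-15",  # adjust to the number of cores of your CPU
    "idle=nomwait",
    "processor.max_cstate=5",
    "pcie_aspm=off",
]
GPU_PARAMS = [
    "amdgpu.dc=1",
    "amdgpu.vm_update_mode=0",
    "amdgpu.dpm=-1",
    "amdgpu.ppfeaturemask=0xffffffff",
    "amdgpu.vm_fault_stop=2",
    "amdgpu.vm_debug=1",
    "amdgpu.gpu_recovery=0",
]


def merge_cmdline(existing, extra):
    """Merge extra parameters into an existing command line.

    A later token replaces an earlier one with the same key, so a stale
    `amdgpu.dc=0` does not survive next to a new `amdgpu.dc=1`.
    """
    merged = {}
    for token in existing.split() + list(extra):
        key = token.split("=", 1)[0]
        merged[key] = token  # reassigning keeps the key's original position
    return " ".join(merged.values())


if __name__ == "__main__":
    existing = "quiet splash"  # placeholder for the current value
    line = merge_cmdline(existing, CPU_PARAMS + GPU_PARAMS)
    print(f'GRUB_CMDLINE_LINUX_DEFAULT="{line}"')
```

The printed line would still have to be placed in the GRUB defaults file by hand, followed by regenerating the GRUB configuration with the distribution's own tool.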


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (13 preceding siblings ...)
  2019-04-13 13:45 ` bugzilla-daemon
@ 2019-04-15 12:51 ` bugzilla-daemon
  2019-04-25 19:44 ` bugzilla-daemon
                   ` (122 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-04-15 12:51 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 692 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #14 from Mauro Gaspari <ilvipero@gmx.com> ---
Quick update.


OS: OpenSUSE tumbleweed x86_64 updated (2019 04 15)
Kernel: 5.0.7-1-default
Desktop Environment: KDE Plasma (x11)
OpenGL version string: 4.5 (Compatibility Profile) Mesa 19.0.1
GPU: AMD Radeon RX Vega 64 8GB


The GPU firmware upgrade did not change much.
I disabled the kernel parameters in GRUB, upgraded the BIOS, and ran some
games. The same old system freeze came back.

After that, I re-enabled the kernel parameters in GRUB and rebooted. No more
system freezes.


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (14 preceding siblings ...)
  2019-04-15 12:51 ` bugzilla-daemon
@ 2019-04-25 19:44 ` bugzilla-daemon
  2019-04-28 16:33 ` bugzilla-daemon
                   ` (121 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-04-25 19:44 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 757 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #15 from Jaap Buurman <jaapbuurman@gmail.com> ---
That's bad to hear :( Worth a try though. How often do you experience freezes
by the way? And is this for all games, or are some games completely stable? For
me, I am getting crashes in Kerbal Space Program, but not in Final Fantasy XII
or World of Warcraft, even after hundreds of hours in both of these stable
games.

Also, have you ever figured out which kernel parameter in particular makes your
setup stable? It might help identify where the problem exists. Or do you need
that exact combination of all those parameters to get your system stable?
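
A simple way to approach that question is a leave-one-out test: boot once per parameter with only that parameter removed and see whether the freeze returns. The sketch below just generates the boot lines; the parameter list is copied from the earlier comment, and `leave_one_out` is a hypothetical helper. It cannot detect interactions where two parameters are each sufficient on their own.

```python
PARAMS = [
    "rcu_nocbs=0-15",
    "idle=nomwait",
    "processor.max_cstate=5",
    "pcie_aspm=off",
    "amdgpu.dc=1",
    "amdgpu.vm_update_mode=0",
    "amdgpu.dpm=-1",
    "amdgpu.ppfeaturemask=0xffffffff",
    "amdgpu.vm_fault_stop=2",
    "amdgpu.vm_debug=1",
    "amdgpu.gpu_recovery=0",
]


def leave_one_out(params):
    """Yield (dropped, remaining_cmdline) pairs, one boot configuration per parameter."""
    for i, dropped in enumerate(params):
        remaining = params[:i] + params[i + 1:]
        yield dropped, " ".join(remaining)


if __name__ == "__main__":
    for dropped, cmdline in leave_one_out(PARAMS):
        print(f"without {dropped}:\n  {cmdline}")
```

If the freeze comes back in exactly one of these configurations, the dropped parameter in that run is the likely candidate. If it never comes back, more than one parameter may be masking the problem.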


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (15 preceding siblings ...)
  2019-04-25 19:44 ` bugzilla-daemon
@ 2019-04-28 16:33 ` bugzilla-daemon
  2019-04-29  1:15 ` bugzilla-daemon
                   ` (120 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-04-28 16:33 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 22938 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #16 from Jaap Buurman <jaapbuurman@gmail.com> ---
Just got a crash in World of Warcraft as well, running via vkd3d. It happens
instantly after trying to log into the game world, so the issue is nicely
reproducible for me. If you want me to get any traces, please let me know what
you would like me to run to get them. dmesg logs for now:

[   78.450637] amdgpu 0000:09:00.0: [gfxhub] VMC page fault (src_id:0 ring:158
vmid:1 pasid:32769, for process WoW.exe pid 2349 thread WoW.exe:cs0 pid 2370)
[   78.450641] amdgpu 0000:09:00.0:   in page starting at address
0x0000984ec2d4b000 from 27
[   78.450642] amdgpu 0000:09:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0010113D
[   78.450648] amdgpu 0000:09:00.0: [gfxhub] VMC page fault (src_id:0 ring:158
vmid:1 pasid:32769, for process WoW.exe pid 2349 thread WoW.exe:cs0 pid 2370)
[   78.450650] amdgpu 0000:09:00.0:   in page starting at address
0x0000850e92553000 from 27
[   78.450652] amdgpu 0000:09:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0010113D
[   78.450656] amdgpu 0000:09:00.0: [gfxhub] VMC page fault (src_id:0 ring:158
vmid:1 pasid:32769, for process WoW.exe pid 2349 thread WoW.exe:cs0 pid 2370)
[   78.450658] amdgpu 0000:09:00.0:   in page starting at address
0x0000984ec2d4e000 from 27
[   78.450660] amdgpu 0000:09:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0010113D
[   78.450665] amdgpu 0000:09:00.0: [gfxhub] VMC page fault (src_id:0 ring:158
vmid:1 pasid:32769, for process WoW.exe pid 2349 thread WoW.exe:cs0 pid 2370)
[   78.450666] amdgpu 0000:09:00.0:   in page starting at address
0x0000850e92542000 from 27
[   78.450668] amdgpu 0000:09:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0010113D
[   78.450673] amdgpu 0000:09:00.0: [gfxhub] VMC page fault (src_id:0 ring:158
vmid:1 pasid:32769, for process WoW.exe pid 2349 thread WoW.exe:cs0 pid 2370)
[   78.450674] amdgpu 0000:09:00.0:   in page starting at address
0x0000984ec2d42000 from 27
[   78.450676] amdgpu 0000:09:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0010113D
[   78.450680] amdgpu 0000:09:00.0: [gfxhub] VMC page fault (src_id:0 ring:158
vmid:1 pasid:32769, for process WoW.exe pid 2349 thread WoW.exe:cs0 pid 2370)
[   78.450682] amdgpu 0000:09:00.0:   in page starting at address
0x0000850e92552000 from 27
[   78.450683] amdgpu 0000:09:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0010113D
[   78.450688] amdgpu 0000:09:00.0: [gfxhub] VMC page fault (src_id:0 ring:158
vmid:1 pasid:32769, for process WoW.exe pid 2349 thread WoW.exe:cs0 pid 2370)
[   78.450690] amdgpu 0000:09:00.0:   in page starting at address
0x0000984ec2d40000 from 27
[   78.450691] amdgpu 0000:09:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0010113D
[   78.450696] amdgpu 0000:09:00.0: [gfxhub] VMC page fault (src_id:0 ring:158
vmid:1 pasid:32769, for process WoW.exe pid 2349 thread WoW.exe:cs0 pid 2370)
[   78.450697] amdgpu 0000:09:00.0:   in page starting at address
0x0000850e92552000 from 27
[   78.450699] amdgpu 0000:09:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0010113D
[   78.450703] amdgpu 0000:09:00.0: [gfxhub] VMC page fault (src_id:0 ring:158
vmid:1 pasid:32769, for process WoW.exe pid 2349 thread WoW.exe:cs0 pid 2370)
[   78.450705] amdgpu 0000:09:00.0:   in page starting at address
0x0000984ec2d49000 from 27
[   78.450706] amdgpu 0000:09:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0010113D
[   78.450711] amdgpu 0000:09:00.0: [gfxhub] VMC page fault (src_id:0 ring:158
vmid:1 pasid:32769, for process WoW.exe pid 2349 thread WoW.exe:cs0 pid 2370)
[   78.450713] amdgpu 0000:09:00.0:   in page starting at address
0x0000850ea1eb2000 from 27
[   78.450714] amdgpu 0000:09:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0010113D
[   78.454307] amdgpu 0000:09:00.0: IH ring buffer overflow (0x000BEDC0,
0x0003EEC0, 0x0003EDE0)
[   88.570062] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
signaled seq=25317, emitted seq=25319
[   88.570099] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process WoW.exe pid 2349 thread WoW.exe:cs0 pid 2370
[   88.570102] amdgpu 0000:09:00.0: GPU reset begin!
[   88.831392] amdgpu 0000:09:00.0: GPU reset
[   89.356679] [drm] psp mode1 reset succeed 
[   89.475356] amdgpu 0000:09:00.0: GPU reset succeeded, trying to resume
[   89.475465] [drm] PCIE GART of 512M enabled (table at 0x000000F400900000).
[   89.475508] [drm:amdgpu_device_gpu_recover [amdgpu]] *ERROR* VRAM is lost!
[   89.475642] [drm] PSP is resuming...
[   89.623052] [drm] reserve 0x400000 from 0xf400d00000 for PSP TMR SIZE
[   89.806625] [drm] SADs count is: -2, don't need to read it
[   89.856619] [drm] SADs count is: -2, don't need to read it
[   89.938255] [drm] UVD and UVD ENC initialized successfully.
[   90.038674] [drm] VCE initialized successfully.
[   90.039672] [drm] recover vram bo from shadow start
[   90.047496] [drm] recover vram bo from shadow done
[   90.047497] [drm] Skip scheduling IBs!
[   90.047499] [drm] Skip scheduling IBs!
[   90.047511] [drm] Skip scheduling IBs!
[   90.047518] [drm] Skip scheduling IBs!
[   90.047523] [drm] Skip scheduling IBs!
[   90.047524] [drm] Skip scheduling IBs!
[   90.047530] [drm] Skip scheduling IBs!
[   90.047531] [drm] Skip scheduling IBs!
[   90.047533] [drm] Skip scheduling IBs!
[   90.047535] [drm] Skip scheduling IBs!
[   90.047536] [drm] Skip scheduling IBs!
[   90.047538] [drm] Skip scheduling IBs!
[   90.047539] [drm] Skip scheduling IBs!
[   90.047555] amdgpu 0000:09:00.0: GPU reset(2) succeeded!
[   90.047796] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   90.049377] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   90.050524] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   90.051990] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   90.055576] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   90.136508] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   90.180374] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   90.181405] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   90.246698] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   90.313258] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   90.380264] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   90.446291] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   90.513947] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   90.579552] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.218785] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.218976] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.219571] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.219745] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.221821] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.221969] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.222145] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.222360] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.229911] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.230213] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.231183] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.231328] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.231487] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.231703] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.233480] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.247154] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.249213] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.249437] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.250924] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.251258] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.251320] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.252417] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.252532] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.252739] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.252994] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.254745] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.265835] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.265974] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.266056] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.266222] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.266342] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.266436] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.266516] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.266646] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.266796] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.266997] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.271605] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.274639] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.274699] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.274747] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.274794] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.274869] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.274929] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.274981] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.275033] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.275373] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.284443] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.286591] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.286881] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.302782] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.319311] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.335908] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.353111] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.369124] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.385670] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.402801] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.421232] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.737933] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.738054] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.742378] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.742737] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.742845] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.744592] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.744806] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.751833] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.752108] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.752371] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.752475] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.752604] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.752762] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.754128] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.765700] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.766154] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.766250] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.767140] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.767447] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.789098] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.789205] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.789293] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.789364] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.789473] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.789598] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.789675] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.789745] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.790301] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.803790] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.811866] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.821133] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.837593] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.841186] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.854467] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.870915] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.871297] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.887676] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.901326] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.902101] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.903913] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.927724] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.938301] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.941050] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.952885] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.975232] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.975468] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[   99.986053] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  100.005910] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  100.018771] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  100.036370] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  100.052090] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  100.067194] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  100.067901] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  100.068016] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  100.081081] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  100.081359] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  100.081525] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  100.081618] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  100.081721] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  100.081845] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  100.082026] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  100.082151] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  100.082246] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  100.082329] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  100.082439] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  100.082579] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  100.082757] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  100.086543] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  100.098769] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  100.102700] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  100.445931] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  100.446590] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  100.946103] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  100.946823] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  101.446237] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  101.446803] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  101.946107] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  101.946642] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  102.445541] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  102.446075] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  102.946163] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  102.946730] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  103.446040] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  103.446555] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  103.945513] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  103.945951] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  104.437414] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  104.437827] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  104.946771] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  104.947166] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  105.446585] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  105.447008] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  105.937954] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  105.938407] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  106.445966] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  106.446429] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  106.945528] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  106.945999] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  107.445983] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  107.446405] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  107.946131] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  107.946642] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  108.446428] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  108.446960] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  108.946992] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  108.947500] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  109.445052] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  109.445477] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  109.533707] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  109.946108] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  109.946604] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  110.445730] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  110.446232] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  110.943308] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  110.943823] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  111.036544] kauditd_printk_skb: 16509 callbacks suppressed
[  111.036545] audit: type=1006 audit(1556468881.509:99): pid=2590 uid=0
old-auid=4294967295 auid=1000 tty=(none) old-ses=4294967295 ses=4 res=1
[  111.446470] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  111.446899] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  111.945982] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[  111.946413] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
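
Most of the log above is one message repeated many times. As a reading aid, here is a minimal Python sketch that re-joins the mail-wrapped lines and counts each distinct message. The helper names `join_wrapped` and `summarize` are hypothetical and not part of any tool mentioned in this thread, and the sketch assumes that every line without a leading timestamp continues the previous record.

```python
import re
from collections import Counter

# A dmesg line starts with a bracketed timestamp such as "[   90.047796] ".
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s?(.*)$")


def join_wrapped(lines):
    """Re-join log lines that were wrapped onto a second line in the mail.

    Any non-empty line without a leading timestamp is treated as a
    continuation of the previous record (an assumption, see lead-in).
    """
    records = []
    for line in lines:
        line = line.rstrip("\n")
        match = TIMESTAMP.match(line)
        if match:
            records.append([float(match.group(1)), match.group(2)])
        elif records and line.strip():
            records[-1][1] += " " + line.strip()
    return [(ts, msg) for ts, msg in records]


def summarize(lines):
    """Count how often each distinct message occurs, ignoring timestamps."""
    return Counter(msg for _, msg in join_wrapped(lines))


if __name__ == "__main__":
    sample = [
        "[   90.047539] [drm] Skip scheduling IBs!",
        "[   90.047555] amdgpu 0000:09:00.0: GPU reset(2) succeeded!",
        "[   90.047796] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize",
        "parser -125!",
        "[   90.049377] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize",
        "parser -125!",
    ]
    for message, count in summarize(sample).most_common():
        print(f"{count:5d}x {message}")
```

Feeding the saved dmesg output through `summarize` should make it easier to see the few distinct events (page faults, ring timeout, reset, rejected submissions) behind the repetition.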

^ permalink raw reply	[flat|nested] 147+ messages in thread

* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (16 preceding siblings ...)
  2019-04-28 16:33 ` bugzilla-daemon
@ 2019-04-29  1:15 ` bugzilla-daemon
  2019-04-29 10:41 ` bugzilla-daemon
                   ` (119 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-04-29  1:15 UTC (permalink / raw)
  To: dri-devel


https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #17 from Alex Deucher <alexdeucher@gmail.com> ---
(In reply to Jaap Buurman from comment #16)
> Just got a crash in World of Warcraft as well, running via vkd3d. It happens
> instantly after trying to log into the game world, so the issue is nicely
> reproducible for me. If you want me to get any traces, please let me know
> what you would like me to run to get them. dmesg logs for now:
> 
> [   78.450637] amdgpu 0000:09:00.0: [gfxhub] VMC page fault (src_id:0
> ring:158 vmid:1 pasid:32769, for process WoW.exe pid 2349 thread WoW.exe:cs0
> pid 2370)
> [   78.450641] amdgpu 0000:09:00.0:   in page starting at address
> 0x0000984ec2d4b000 from 27
> [   78.450642] amdgpu 0000:09:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0010113D
> [   78.450648] amdgpu 0000:09:00.0: [gfxhub] VMC page fault (src_id:0
> ring:158 vmid:1 pasid:32769, for process WoW.exe pid 2349 thread WoW.exe:cs0
> pid 2370)
> [   78.450650] amdgpu 0000:09:00.0:   in page starting at address
> 0x0000850e92553000 from 27
> [   78.450652] amdgpu 0000:09:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0010113D
> [   78.450656] amdgpu 0000:09:00.0: [gfxhub] VMC page fault (src_id:0
> ring:158 vmid:1 pasid:32769, for process WoW.exe pid 2349 thread WoW.exe:cs0
> pid 2370)
> [   78.450658] amdgpu 0000:09:00.0:   in page starting at address
> 0x0000984ec2d4e000 from 27
> [   78.450660] amdgpu 0000:09:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0010113D
> [   78.450665] amdgpu 0000:09:00.0: [gfxhub] VMC page fault (src_id:0
> ring:158 vmid:1 pasid:32769, for process WoW.exe pid 2349 thread WoW.exe:cs0
> pid 2370)
> [   78.450666] amdgpu 0000:09:00.0:   in page starting at address
> 0x0000850e92542000 from 27
> [   78.450668] amdgpu 0000:09:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0010113D
> [   78.450673] amdgpu 0000:09:00.0: [gfxhub] VMC page fault (src_id:0
> ring:158 vmid:1 pasid:32769, for process WoW.exe pid 2349 thread WoW.exe:cs0
> pid 2370)
> [   78.450674] amdgpu 0000:09:00.0:   in page starting at address
> 0x0000984ec2d42000 from 27
> [   78.450676] amdgpu 0000:09:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0010113D
> [   78.450680] amdgpu 0000:09:00.0: [gfxhub] VMC page fault (src_id:0
> ring:158 vmid:1 pasid:32769, for process WoW.exe pid 2349 thread WoW.exe:cs0
> pid 2370)
> [   78.450682] amdgpu 0000:09:00.0:   in page starting at address
> 0x0000850e92552000 from 27
> [   78.450683] amdgpu 0000:09:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0010113D
> [   78.450688] amdgpu 0000:09:00.0: [gfxhub] VMC page fault (src_id:0
> ring:158 vmid:1 pasid:32769, for process WoW.exe pid 2349 thread WoW.exe:cs0
> pid 2370)
> [   78.450690] amdgpu 0000:09:00.0:   in page starting at address
> 0x0000984ec2d40000 from 27
> [   78.450691] amdgpu 0000:09:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0010113D
> [   78.450696] amdgpu 0000:09:00.0: [gfxhub] VMC page fault (src_id:0
> ring:158 vmid:1 pasid:32769, for process WoW.exe pid 2349 thread WoW.exe:cs0
> pid 2370)
> [   78.450697] amdgpu 0000:09:00.0:   in page starting at address
> 0x0000850e92552000 from 27
> [   78.450699] amdgpu 0000:09:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0010113D
> [   78.450703] amdgpu 0000:09:00.0: [gfxhub] VMC page fault (src_id:0
> ring:158 vmid:1 pasid:32769, for process WoW.exe pid 2349 thread WoW.exe:cs0
> pid 2370)
> [   78.450705] amdgpu 0000:09:00.0:   in page starting at address
> 0x0000984ec2d49000 from 27
> [   78.450706] amdgpu 0000:09:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0010113D
> [   78.450711] amdgpu 0000:09:00.0: [gfxhub] VMC page fault (src_id:0
> ring:158 vmid:1 pasid:32769, for process WoW.exe pid 2349 thread WoW.exe:cs0
> pid 2370)
> [   78.450713] amdgpu 0000:09:00.0:   in page starting at address
> 0x0000850ea1eb2000 from 27
> [   78.450714] amdgpu 0000:09:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0010113D
> [   78.454307] amdgpu 0000:09:00.0: IH ring buffer overflow (0x000BEDC0,
> 0x0003EEC0, 0x0003EDE0)
> [   88.570062] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
> signaled seq=25317, emitted seq=25319
> [   88.570099] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process
> information: process WoW.exe pid 2349 thread WoW.exe:cs0 pid 2370
> [   88.570102] amdgpu 0000:09:00.0: GPU reset begin!
> [   88.831392] amdgpu 0000:09:00.0: GPU reset
> [   89.356679] [drm] psp mode1 reset succeed 
> [   89.475356] amdgpu 0000:09:00.0: GPU reset succeeded, trying to resume
> [   89.475465] [drm] PCIE GART of 512M enabled (table at 0x000000F400900000).
> [   89.475508] [drm:amdgpu_device_gpu_recover [amdgpu]] *ERROR* VRAM is lost!
> [   89.475642] [drm] PSP is resuming...
> [   89.623052] [drm] reserve 0x400000 from 0xf400d00000 for PSP TMR SIZE
> [   89.806625] [drm] SADs count is: -2, don't need to read it
> [   89.856619] [drm] SADs count is: -2, don't need to read it
> [   89.938255] [drm] UVD and UVD ENC initialized successfully.
> [   90.038674] [drm] VCE initialized successfully.
> [   90.039672] [drm] recover vram bo from shadow start
> [   90.047496] [drm] recover vram bo from shadow done
> [   90.047497] [drm] Skip scheduling IBs!
> [   90.047499] [drm] Skip scheduling IBs!
> [   90.047511] [drm] Skip scheduling IBs!
> [   90.047518] [drm] Skip scheduling IBs!
> [   90.047523] [drm] Skip scheduling IBs!
> [   90.047524] [drm] Skip scheduling IBs!
> [   90.047530] [drm] Skip scheduling IBs!
> [   90.047531] [drm] Skip scheduling IBs!
> [   90.047533] [drm] Skip scheduling IBs!
> [   90.047535] [drm] Skip scheduling IBs!
> [   90.047536] [drm] Skip scheduling IBs!
> [   90.047538] [drm] Skip scheduling IBs!
> [   90.047539] [drm] Skip scheduling IBs!
> [   90.047555] amdgpu 0000:09:00.0: GPU reset(2) succeeded!

The GPU reset succeeded.  You'll need to restart your desktop manager to
recover because currently no desktop managers handle GPU reset errors and
re-initialize their contexts.
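
For context on the "Failed to initialize parser -125!" lines that follow the reset in comment #16: kernel code conventionally returns negated errno values, and on Linux errno 125 is ECANCELED. My understanding, which is not stated in this thread, is that these are submissions from contexts created before the reset being rejected. A minimal Python sketch for pulling the code out of such a line and naming it; `decode_parser_error` is a hypothetical helper.

```python
import errno
import re

# Matches the numeric code at the end of the amdgpu_cs_ioctl error message,
# even when the message was wrapped before "parser".
PARSER_ERROR = re.compile(r"Failed to initialize\s+parser (-?\d+)!")


def decode_parser_error(message):
    """Return (code, symbolic name) for a 'Failed to initialize parser' line.

    Kernel code returns negated errno values, so -125 is looked up as 125.
    The errno table is the one of the machine running this script; the
    mapping 125 -> ECANCELED is specific to Linux.
    """
    match = PARSER_ERROR.search(message)
    if match is None:
        return None
    code = int(match.group(1))
    name = errno.errorcode.get(abs(code), "UNKNOWN")
    return code, name


if __name__ == "__main__":
    line = "[drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!"
    print(decode_parser_error(line))
```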

^ permalink raw reply	[flat|nested] 147+ messages in thread

* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (17 preceding siblings ...)
  2019-04-29  1:15 ` bugzilla-daemon
@ 2019-04-29 10:41 ` bugzilla-daemon
  2019-04-29 11:35 ` bugzilla-daemon
                   ` (118 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-04-29 10:41 UTC (permalink / raw)
  To: dri-devel


https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #18 from Jaap Buurman <jaapbuurman@gmail.com> ---
I was aware of that. I was more curious whether the bug that is causing the
crash can be identified and, hopefully, fixed. I can provide traces if required,
since it seems I can easily reproduce the crash.

^ permalink raw reply	[flat|nested] 147+ messages in thread

* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (18 preceding siblings ...)
  2019-04-29 10:41 ` bugzilla-daemon
@ 2019-04-29 11:35 ` bugzilla-daemon
  2019-04-29 11:37 ` bugzilla-daemon
                   ` (117 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-04-29 11:35 UTC (permalink / raw)
  To: dri-devel


https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #19 from Mauro Gaspari <ilvipero@gmx.com> ---
(In reply to Jaap Buurman from comment #15)
> That's bad to hear :( Worth a try though. How often do you experience
> freezes by the way? And is this for all games, or are some games completely
> stable? For me, I am getting crashes in Kerbal Space Program, but not in
> Final Fantasy XII or World of Warcraft, even after hundreds of hours in both
> of these stable games.
> 
> Also, have you ever figured out which kernel parameter in particular makes
> your setup stable? It might help identify where the problem exists. Or do
> you need that exact combination of all those parameters to get your system
> stable?

Hi, regarding the parameters I am using:
Unfortunately, the issue is not easy for me to reproduce. Even without the
parameters enabled, it still takes hours for a crash to happen. On top of that,
Mesa and kernel updates are very frequent on Tumbleweed, which is another
variable that makes troubleshooting harder unless I can find a really fast way
to reproduce the issue.

Regarding which games crash: with those kernel parameters applied, the only
crashes I noticed were when I tried to run games through Wine in DX11 mode with
DXVK, which I believe needs at least LLVM 8 to be stable on Vega GPUs.
Currently my Tumbleweed install has LLVM 7, so I just stick to non-DXVK games
or, even better, native ones, until LLVM 8 is available for Tumbleweed.

If you want to give it a try and you run Ubuntu, you can check this article:
https://github.com/lutris/lutris/wiki/Installing-drivers

If you do so, I recommend you run a full system backup using Clonezilla or
similar software, since those PPAs are marked as unstable.

^ permalink raw reply	[flat|nested] 147+ messages in thread

* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (19 preceding siblings ...)
  2019-04-29 11:35 ` bugzilla-daemon
@ 2019-04-29 11:37 ` bugzilla-daemon
  2019-04-29 13:52 ` bugzilla-daemon
                   ` (116 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-04-29 11:37 UTC (permalink / raw)
  To: dri-devel


https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #20 from Jaap Buurman <jaapbuurman@gmail.com> ---
I already run LLVM 8.0.0, since it's the latest stable in Arch's repository.
Thanks for the tip though :)

^ permalink raw reply	[flat|nested] 147+ messages in thread

* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (20 preceding siblings ...)
  2019-04-29 11:37 ` bugzilla-daemon
@ 2019-04-29 13:52 ` bugzilla-daemon
  2019-05-24  5:12 ` bugzilla-daemon
                   ` (115 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-04-29 13:52 UTC (permalink / raw)
  To: dri-devel


https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #21 from Mauro Gaspari <ilvipero@gmx.com> ---
(In reply to Jaap Buurman from comment #20)
> I already run LLVM 8.0.0, since it's the latest stable in Arch's repository.
> Thanks for the tip though :)

Since it is very easy for you to reproduce the freeze, it would be great if you
could add those kernel parameters and see if they help.
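
Before comparing results, it may help to confirm which amdgpu parameters the running kernel was actually booted with. A minimal Python sketch, assuming the parameters are passed on the kernel command line in `amdgpu.<name>=<value>` form. The helper name `amdgpu_params` and the fallback sample values are illustrative only and are not the parameters discussed in this bug.

```python
from pathlib import Path


def amdgpu_params(cmdline):
    """Extract amdgpu.<name>=<value> options from a kernel command line."""
    params = {}
    for token in cmdline.split():
        if token.startswith("amdgpu.") and "=" in token:
            key, value = token[len("amdgpu."):].split("=", 1)
            params[key] = value
    return params


if __name__ == "__main__":
    proc = Path("/proc/cmdline")
    if proc.exists():
        # On a running Linux system this is the command line actually in use.
        cmdline = proc.read_text()
    else:
        # Placeholder values for illustration, not a recommendation.
        cmdline = "root=/dev/sda2 quiet amdgpu.example_a=1 amdgpu.example_b=0x10"
    for name, value in sorted(amdgpu_params(cmdline).items()):
        print(f"amdgpu.{name} = {value}")
```

Posting this output together with the test result would make it clear which combination of parameters each run used.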

^ permalink raw reply	[flat|nested] 147+ messages in thread

* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (21 preceding siblings ...)
  2019-04-29 13:52 ` bugzilla-daemon
@ 2019-05-24  5:12 ` bugzilla-daemon
  2019-05-24 12:24   ` sylvain.bertrand
  2019-05-24 12:25 ` bugzilla-daemon
                   ` (114 subsequent siblings)
  137 siblings, 1 reply; 147+ messages in thread
From: bugzilla-daemon @ 2019-05-24  5:12 UTC (permalink / raw)
  To: dri-devel


https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #22 from Mauro Gaspari <ilvipero@gmx.com> ---
I ran more tests:

1. Installed Arch Linux, Vulkan, and LLVM 8, and ran Wine games with DXVK. With
the same kernel parameters in GRUB: no freezes, no crashes, great performance.

2. Installed Ubuntu Budgie 19.04 and the Oibaf PPA, with updated Mesa and
LLVM 8. Same as with Arch Linux: with the same kernel parameters in GRUB, no
freezes, no crashes, great performance.

The only problem with not being able to reproduce the issue quickly is that I
cannot clearly tell when it has been resolved by Mesa. It sometimes takes hours
for me to get the freeze.
If someone has a quick way to trigger the system freeze, I am happy to run more
tests.

^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
  2019-05-24  5:12 ` bugzilla-daemon
@ 2019-05-24 12:24   ` sylvain.bertrand
  0 siblings, 0 replies; 147+ messages in thread
From: sylvain.bertrand @ 2019-05-24 12:24 UTC (permalink / raw)
  To: bugzilla-daemon; +Cc: dri-devel

It seems I get the same freezes as you. It takes hours of gaming to get a
random hard hang (no log). I thought the system was overheating, but realized
that it is on "vacation" while playing.
Linux amd-staging-drm-new/X11 native/Mesa/LLVM (erk...), all from git no older
than a week.
Playing mostly Dota 2 with Vulkan on an AMD Tahiti XT.

^ permalink raw reply	[flat|nested] 147+ messages in thread

* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (22 preceding siblings ...)
  2019-05-24  5:12 ` bugzilla-daemon
@ 2019-05-24 12:25 ` bugzilla-daemon
  2019-05-24 13:44 ` bugzilla-daemon
                   ` (113 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-05-24 12:25 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 538 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #23 from Sylvain BERTRAND <sylvain.bertrand@gmail.com> ---
It seems I get the same freezes as you. It takes hours of gaming to get some
random hard hang (no log). I thought I was overheating, but realized that my
system is on
"vacation" while playing.
linux amd-staging-drm-new/x11 native/mesa/llvm(erk...), all git no older than a
week.
playing mostly dota2 vulkan on AMD TAHITI XT


^ permalink raw reply	[flat|nested] 147+ messages in thread

* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (23 preceding siblings ...)
  2019-05-24 12:25 ` bugzilla-daemon
@ 2019-05-24 13:44 ` bugzilla-daemon
  2019-06-03  8:07 ` bugzilla-daemon
                   ` (112 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-05-24 13:44 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1060 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #24 from Mauro Gaspari <ilvipero@gmx.com> ---
(In reply to Sylvain BERTRAND from comment #23)
> It seems I get the same freezes than you. It takes hours of gaming to get
> some
> random hard hang (no log). I thought I was overheating, but realized that my
> system is on
> "vacation" while playing.
> linux amd-staging-drm-new/x11 native/mesa/llvm(erk...), all git no older
> than a
> week.
> playing mostly dota2 vulkan on AMD TAHITI XT

Hi, a bit frustrating eh? :)
I have been asking around and it seems that the Radeon VII and RX 590 do not
suffer from those issues. This is probably related to the default clock speeds
set by manufacturers.

Anyway, if you try the kernel parameters I mentioned above, those should help.
I have not had crashes in weeks after I enabled those in my GRUB config. It
does not seem related to the distribution either: those GRUB kernel settings
worked for me on Tumbleweed, Arch, and Ubuntu Budgie.

I hope it helps.


^ permalink raw reply	[flat|nested] 147+ messages in thread

* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (24 preceding siblings ...)
  2019-05-24 13:44 ` bugzilla-daemon
@ 2019-06-03  8:07 ` bugzilla-daemon
  2019-06-03 20:10 ` bugzilla-daemon
                   ` (111 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-06-03  8:07 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 497 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #25 from Matt Coffin <mcoffin13@gmail.com> ---
(In reply to Mauro Gaspari from comment #24)

> Hi, a bit frustrating eh? :)
> I have been asking around and it seems that RadeonVII and RX590 do not
> suffer those issues. Probably related to default clock speeds by
> manufacturers.

FWIW, I'm seeing this exact same issue, and I'm on an RX590.


^ permalink raw reply	[flat|nested] 147+ messages in thread

* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (25 preceding siblings ...)
  2019-06-03  8:07 ` bugzilla-daemon
@ 2019-06-03 20:10 ` bugzilla-daemon
  2019-06-04 21:43 ` bugzilla-daemon
                   ` (110 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-06-03 20:10 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1401 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #26 from Matt Coffin <mcoffin13@gmail.com> ---
For reproducibility, here's what I've been using. (I can reproduce this crash
on both the RADV and AMDVLK Vulkan implementations, and can reproduce it both
on top of sway 1.1 (wayland), and xfce4 (X11)).

* 5.1.3-arch2-1-ARCH
* LLVM 8.0.0
* mesa/vulkan-radeon: 19.0.4
* AMDVLK: (dev branch from nighttime Mountain time 20190602)
* DXVK: winelib version - release 1.2.1

I run "House Flipper" from Steam with DXVK_FILTER_DEVICE_NAME=590.

On 1080p@60Hz with v-sync, it runs quite well and stable (for hours). If I
disable v-sync and framerate limiting, the crash occurs within a minute
usually.

At 2560x1440 resolution, neither refresh rate I have tried (60Hz and 144Hz)
runs in a stable manner.

With the game rendering 1080p but scaling up to a 2560x1440 display, I saw it
crash once, but was unable to duplicate it again.

I'm new to low-level development, and would like to help. If I can provide any
information since I can reliably reproduce the issue, I'd love to. Let me know
what would be useful and I'd be happy to get it out to you.

I've also seen the bugs listed in my other comment on the other bug here:
https://bugs.freedesktop.org/show_bug.cgi?id=102322#c82


^ permalink raw reply	[flat|nested] 147+ messages in thread

* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (26 preceding siblings ...)
  2019-06-03 20:10 ` bugzilla-daemon
@ 2019-06-04 21:43 ` bugzilla-daemon
  2019-06-05  6:34 ` bugzilla-daemon
                   ` (109 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-06-04 21:43 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1701 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

Sam <samueldgv@mailbox.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |samueldgv@mailbox.org

--- Comment #27 from Sam <samueldgv@mailbox.org> ---
Hello! I can confirm that I have the same issues. I am using a Vega 56 and
openSUSE Tumbleweed (X11 and KDE) with:

Kernel Version:  5.1.5-1-default
X Server Release:  12004000
Driver:  X.Org Radeon RX Vega (VEGA10, DRM 3.30.0, 5.1.5-1-default, LLVM 7.0.1)


I have been having the same freezes exactly as described here since, as far as
I can remember, mesa 19.0.4 and 5.0.13 (based on the Tumbleweed snapshots from
when this started happening)

This was definitely not happening before on mesa 18.x/LLVM 6 and 7 and kernel
4.20. I neither run overclocks nor have I ever messed with firmware/BIOS, etc.
Everything has been running as-is since Oct. 2018, so firmware or BIOS issues
can be ruled out, I guess.

In my case, I have also experienced this issue when running non-demanding
OpenGL games and even desktop applications (I had a crash happen on the desktop
with just WxMaxima, a computer algebra system GUI, opened doing nothing)

The easiest way for me to reproduce it is by simply leaving Pillars of Eternity
(an OpenGL Unity game) open and idle for an hour or so. I have tried setting up
Kdump and trying to catch some error messages in the logs with no luck. I'm
definitely open for directions on how to get more info if this can help.


^ permalink raw reply	[flat|nested] 147+ messages in thread

* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (27 preceding siblings ...)
  2019-06-04 21:43 ` bugzilla-daemon
@ 2019-06-05  6:34 ` bugzilla-daemon
  2019-06-09 18:46 ` bugzilla-daemon
                   ` (108 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-06-05  6:34 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 656 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #28 from Mauro Gaspari <ilvipero@gmx.com> ---
Thanks all for adding comments and testing to this bug. I believe that if we
show there are enough people affected on different cards, it will get the
attention it needs, and hopefully a permanent Mesa fix can be found and
implemented.

For those affected, if you don't mind testing the kernel parameters workaround
I described above and posting your results, that would be a nice start.
If you need help on how to do that, you can reach out to me via PM or email.


^ permalink raw reply	[flat|nested] 147+ messages in thread

* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (28 preceding siblings ...)
  2019-06-05  6:34 ` bugzilla-daemon
@ 2019-06-09 18:46 ` bugzilla-daemon
  2019-06-10 17:13 ` bugzilla-daemon
                   ` (107 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-06-09 18:46 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 703 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #29 from Sam <samueldgv@mailbox.org> ---
I have been trying myself for the moment to get some info with just debug
parameters:

amdgpu.dc=1 
amdgpu.vm_fault_stop=2 
amdgpu.vm_debug=1 
amdgpu.gpu_recovery=0 

Incidentally I couldn't get any freeze to happen after running two troublesome
games for about two hours each (left idle but on load, Pillars of Eternity and
Surviving Mars) but this could mean anything as they happen completely
randomly. 

Perhaps someone who can reproduce the issue instantly can test the parameters
more reliably?
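
For anyone repeating this test, it may be worth confirming that the parameters
really reached the running kernel before drawing conclusions. A minimal Python
sketch, assuming a Linux system where /proc/cmdline is readable; the names
WANTED and missing_params are only illustrative and not part of amdgpu:

```python
from pathlib import Path

# The debug parameters listed in the comment above.
WANTED = [
    "amdgpu.dc=1",
    "amdgpu.vm_fault_stop=2",
    "amdgpu.vm_debug=1",
    "amdgpu.gpu_recovery=0",
]


def missing_params(cmdline: str, wanted: list[str]) -> list[str]:
    """Return the entries of `wanted` that are absent from a kernel command line."""
    present = set(cmdline.split())
    return [param for param in wanted if param not in present]


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line the running kernel was booted with."""
    return Path(path).read_text()


# Hypothetical usage on the test machine:
#   print(missing_params(read_cmdline(), WANTED) or "all parameters active")
```

An empty result only says the parameters were passed on the command line; it
does not say whether they have any effect on the freeze.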


^ permalink raw reply	[flat|nested] 147+ messages in thread

* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (29 preceding siblings ...)
  2019-06-09 18:46 ` bugzilla-daemon
@ 2019-06-10 17:13 ` bugzilla-daemon
  2019-06-13 21:04 ` bugzilla-daemon
                   ` (106 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-06-10 17:13 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 653 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #30 from Sam <samueldgv@mailbox.org> ---
Update: I can now confirm, at least in my case, that the freezes DO occur using
the parameters above, and also with all of them (shown below), while doing
another test round on Pillars of Eternity.

amdgpu.dc=1 
amdgpu.vm_update_mode=0 
amdgpu.dpm=-1 
amdgpu.ppfeaturemask=0xffffffff 
amdgpu.vm_fault_stop=2 
amdgpu.vm_debug=1 
amdgpu.gpu_recovery=0 

I was continuously writing dmesg to a file but yet again I didn't get any
messages/warnings/errors.
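
One thing that might be worth ruling out is that the last kernel messages were
still sitting in a write buffer when the machine locked up. A rough Python
sketch of logging that forces every line to disk; the function name
append_durably and the log file name are assumptions for illustration, and it
cannot help if the kernel prints nothing before the hang:

```python
import os
from typing import Iterable


def append_durably(lines: Iterable[str], path: str) -> int:
    """Append each line to `path`, forcing it to disk before reading the next one."""
    written = 0
    with open(path, "a", encoding="utf-8") as log:
        for line in lines:
            log.write(line if line.endswith("\n") else line + "\n")
            log.flush()             # push Python's buffer to the OS
            os.fsync(log.fileno())  # ask the OS to push it to the disk
            written += 1
    return written


# Hypothetical usage (needs permission to read the kernel log):
#   import subprocess
#   proc = subprocess.Popen(["dmesg", "--follow"], stdout=subprocess.PIPE, text=True)
#   append_durably(proc.stdout, "dmesg-freeze.log")
```

Syncing after every line is slow, but the kernel log is low-volume enough that
this should not matter for a test run.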


^ permalink raw reply	[flat|nested] 147+ messages in thread

* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (30 preceding siblings ...)
  2019-06-10 17:13 ` bugzilla-daemon
@ 2019-06-13 21:04 ` bugzilla-daemon
  2019-06-13 21:04 ` bugzilla-daemon
                   ` (105 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-06-13 21:04 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 751 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #31 from Sam <samueldgv@mailbox.org> ---
I have attached another trace I managed to get today at 22:24 while playing
Pillars Of Eternity (OpenGL) 

It didn't freeze the whole system as usual, just the Plasma and X sessions, so
the other TTYs were accessible. This is the first time this has happened.
I was using the latest kernel default from the openSUSE Kernel:stable repo
(5.1.9-5.1), as per request on
https://bugzilla.opensuse.org/show_bug.cgi?id=1136293

Note that, as in the other dmesgs attached, the crash seems to be caused by
amdgpu. Should the bug category be moved there?


^ permalink raw reply	[flat|nested] 147+ messages in thread

* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (31 preceding siblings ...)
  2019-06-13 21:04 ` bugzilla-daemon
@ 2019-06-13 21:04 ` bugzilla-daemon
  2019-06-14  5:48 ` bugzilla-daemon
                   ` (104 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-06-13 21:04 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 375 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #32 from Sam <samueldgv@mailbox.org> ---
Created attachment 144535
  --> https://bugs.freedesktop.org/attachment.cgi?id=144535&action=edit
dmesg from the freeze which didn't completely bork everything. It starts on
line 1181


^ permalink raw reply	[flat|nested] 147+ messages in thread

* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (32 preceding siblings ...)
  2019-06-13 21:04 ` bugzilla-daemon
@ 2019-06-14  5:48 ` bugzilla-daemon
  2019-06-14 14:33 ` bugzilla-daemon
                   ` (103 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-06-14  5:48 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 2355 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

Jiri Slaby <jirislaby@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         QA Contact|dri-devel@lists.freedesktop |
                   |.org                        |
          Component|Drivers/Gallium/radeonsi    |DRM/AMDgpu
            Product|Mesa                        |DRI
            Version|18.3                        |unspecified

--- Comment #33 from Jiri Slaby <jirislaby@gmail.com> ---
(In reply to Sam from comment #32)
> Created attachment 144535 [details]
> dmesg from the freeze which didn't completely bork everything. It starts on
> line 1181

Attaching the relevant part inline:

> [drm:amdgpu_dm_commit_planes.isra.0 [amdgpu]] *ERROR* Waiting for fences timed out.
> [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=726226, emitted seq=726228
> [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process PillarsOfEterni pid 12250 thread PillarsOfE:cs0 pid 12254
> amdgpu 0000:1e:00.0: GPU reset begin!
> [drm:amdgpu_dm_commit_planes.isra.0 [amdgpu]] *ERROR* Waiting for fences timed out.
> amdgpu 0000:1e:00.0: GPU BACO reset
> amdgpu 0000:1e:00.0: GPU reset succeeded, trying to resume
> [drm] PCIE GART of 512M enabled (table at 0x000000F400900000).
> [drm:amdgpu_device_gpu_recover [amdgpu]] *ERROR* VRAM is lost!
> [drm] PSP is resuming...
> [drm] reserve 0x400000 from 0xf400d00000 for PSP TMR SIZE
> [drm] UVD and UVD ENC initialized successfully.
> [drm] VCE initialized successfully.
> [drm] recover vram bo from shadow start
> [drm] recover vram bo from shadow done
> [drm] Skip scheduling IBs!
> [drm] Skip scheduling IBs!
> amdgpu 0000:1e:00.0: GPU reset(2) succeeded!
> [drm] Skip scheduling IBs!
> ...
> [drm] Skip scheduling IBs!
> [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
> [drm] Skip scheduling IBs!
> ...
> [drm] Skip scheduling IBs!
> [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
> [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
> [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
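
For comparing several dmesg captures, the ring timeout line can be reduced to
the ring name and the gap between the two sequence numbers. A small Python
sketch, assuming the line format stays as in the excerpt above; the reading of
the gap as "fences emitted but not yet signaled" is my interpretation of the
message, and pending_jobs is a made-up helper name:

```python
import re

# Matches the "ring <name> timeout" lines as they appear in the excerpt above.
RING_TIMEOUT = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def pending_jobs(dmesg_line: str):
    """Return (ring, emitted - signaled) for a ring-timeout line, else None."""
    match = RING_TIMEOUT.search(dmesg_line)
    if match is None:
        return None
    return match["ring"], int(match["emitted"]) - int(match["signaled"])


line = ("[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, "
        "signaled seq=726226, emitted seq=726228")
print(pending_jobs(line))  # ('gfx', 2)
```

Lines that are not ring timeouts, such as "Skip scheduling IBs!", simply return
None, so the helper can be run over a whole log.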


^ permalink raw reply	[flat|nested] 147+ messages in thread

* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (33 preceding siblings ...)
  2019-06-14  5:48 ` bugzilla-daemon
@ 2019-06-14 14:33 ` bugzilla-daemon
  2019-07-06  9:30 ` bugzilla-daemon
                   ` (102 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-06-14 14:33 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 839 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #34 from Alex Deucher <alexdeucher@gmail.com> ---
(In reply to Jiri Slaby from comment #33)
> > amdgpu 0000:1e:00.0: GPU reset(2) succeeded!
> > [drm] Skip scheduling IBs!
> > ...
> > [drm] Skip scheduling IBs!
> > [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
> > [drm] Skip scheduling IBs!
> > ...
> > [drm] Skip scheduling IBs!
> > [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
> > [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
> > [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!

The GPU reset was successful.  You need to restart your desktop environment to
recover.


^ permalink raw reply	[flat|nested] 147+ messages in thread

* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (34 preceding siblings ...)
  2019-06-14 14:33 ` bugzilla-daemon
@ 2019-07-06  9:30 ` bugzilla-daemon
  2019-07-07  5:31 ` bugzilla-daemon
                   ` (101 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-07-06  9:30 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1134 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #35 from shadow.archemage@gmail.com ---
(In reply to Mauro Gaspari from comment #22)

> The only issue I have not being able to reproduce the issue quickly, is to
> clearly understand when the issue is resolved by Mesa. It takes hours for me
> to get the freeze sometimes. 
> If someone has a quick way to trigger system freeze, I am happy to run more
> tests.

Hi Mauro,

The issue happened to me much more frequently when I opted into Steam beta and
ran Monster Hunter: World. Before opting in, the crashes happened around 1-2
hours after the game started. With Steam beta, though, they happen less than 5
minutes in.

The only change that I noted when I opted into Steam beta was that the games
suddenly downloaded some shader pre-caching stuff. Unfortunately, I'm not too
familiar with it, and I'm not too sure if it is related to the problem.

I am running Manjaro, Gnome 3.32.2, Kernel version 5.1.15-1, Mesa 19.1.1.
Let me know if I missed something.

Thanks,
Eph


^ permalink raw reply	[flat|nested] 147+ messages in thread

* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (35 preceding siblings ...)
  2019-07-06  9:30 ` bugzilla-daemon
@ 2019-07-07  5:31 ` bugzilla-daemon
  2019-07-07 17:41   ` sylvain.bertrand
  2019-07-07 10:55 ` bugzilla-daemon
                   ` (100 subsequent siblings)
  137 siblings, 1 reply; 147+ messages in thread
From: bugzilla-daemon @ 2019-07-07  5:31 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 2109 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #36 from Mauro Gaspari <ilvipero@gmx.com> ---
(In reply to shadow.archemage from comment #35)
> (In reply to Mauro Gaspari from comment #22)
> 
> > The only issue I have not being able to reproduce the issue quickly, is to
> > clearly understand when the issue is resolved by Mesa. It takes hours for me
> > to get the freeze sometimes. 
> > If someone has a quick way to trigger system freeze, I am happy to run more
> > tests.
> 
> Hi Mauro,
> 
> The issue happened to me much more frequently when I opted into Steam beta
> and ran Monster Hunter: World. Before opting in, the crashes happen around
> 1-2 hours after the game starts. With Steam beta though, it happens around
> <5 minutes in.
> 
> The only change that I noted when I opted into Steam beta was that the games
> suddenly downloaded some shader pre-caching stuff. Unfortunately, I'm not
> too familiar with it, and I'm not too sure if it is related to the problem.
> 
> I am running Manjaro, Gnome 3.32.2, Kernel version 5.1.15-1, Mesa 19.1.1.
> Let me know if I missed something.
> 
> Thanks,
> Eph

I am not an expert, but I am quite sure shaders play a big part in this. If you
can, disable shader caching.
There are a few tests you can do:
1. Did you try with the kernel parameters I posted above? I always ran all the
parameters together (GPU+CPU), and at the time I did not have crashes for
weeks on my Vega64. I am using a Radeon VII now and it seems those parameters
are not needed.
2. Valve sponsored an interesting project that removes the dependency of AMD
Mesa on LLVM and uses ACO instead. Valve made this available for Arch-based
systems via AUR, and for Ubuntu-based systems via PPA. If you want to test it,
you can check the posts below. I am going to test this myself on both Arch and
Ubuntu.
https://steamcommunity.com/games/221410/announcements/detail/1602634609636894200
https://steamcommunity.com/app/221410/discussions/0/1640915206474070669/
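
For the two tests suggested above, one way to keep runs comparable is to launch
the game with an explicitly built environment. A Python sketch under stated
assumptions: that Mesa of this era reads MESA_GLSL_CACHE_DISABLE to turn off
its on-disk shader cache, and that the ACO test builds are enabled through
RADV_PERFTEST=aco. Neither is confirmed in this thread, Steam's own shader
pre-caching is a separate setting that this does not touch, and launch_env is
a hypothetical helper name:

```python
import os


def launch_env(disable_shader_cache: bool = False, use_aco: bool = False, base=None) -> dict:
    """Return a copy of the environment with the requested test toggles set."""
    env = dict(os.environ if base is None else base)
    if disable_shader_cache:
        # Assumption: Mesa 19.x honours this variable for its on-disk shader cache.
        env["MESA_GLSL_CACHE_DISABLE"] = "true"
    if use_aco:
        # Assumption: the ACO test builds are switched on through RADV_PERFTEST.
        # Note that this overwrites any RADV_PERFTEST value already set.
        env["RADV_PERFTEST"] = "aco"
    return env


# Hypothetical usage:
#   import subprocess
#   subprocess.run(["steam"], env=launch_env(disable_shader_cache=True))
```

Changing one toggle per run makes it easier to say which of them, if any,
affects how quickly the freeze shows up.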


^ permalink raw reply	[flat|nested] 147+ messages in thread

* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (36 preceding siblings ...)
  2019-07-07  5:31 ` bugzilla-daemon
@ 2019-07-07 10:55 ` bugzilla-daemon
  2019-07-07 17:42 ` bugzilla-daemon
                   ` (99 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-07-07 10:55 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1370 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #37 from shadow.archemage@gmail.com ---
(In reply to Mauro Gaspari from comment #36)
> (In reply to shadow.archemage from comment #35) 
> I am not an expert, but I am quite sure shaders have a big part in this. If
> you can, disable shader caching.
> There are a few tests you can do:
> 1. Did you try with the kernel parameters I posted above? I always ran all
> the parameters together. GPU+CPU and at the time, I did not have crashes for
> weeks on my Vega64. I am using a RadeonVII now and it seems those parameters
> are not needed.

I tried the kernel parameters above, and the game still crashed for me.

> 2. Valve sponsored an interesting project that removes dependency of AMD
> Mesa from LLVM. And instead uses ACO. Valve made this available for Arch
> based systems via AUR, and Ubuntu based system via PPA. If you want to test
> it, you can check the posts below. I am going to test this myself on both
> Arch and Ubuntu. 
> https://steamcommunity.com/games/221410/announcements/detail/1602634609636894200
> https://steamcommunity.com/app/221410/discussions/0/1640915206474070669/

Will check this out, but will also keep an eye on this thread about the results
of your tests. Thanks!


^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
  2019-07-07  5:31 ` bugzilla-daemon
@ 2019-07-07 17:41   ` sylvain.bertrand
  0 siblings, 0 replies; 147+ messages in thread
From: sylvain.bertrand @ 2019-07-07 17:41 UTC (permalink / raw)
  To: bugzilla-daemon; +Cc: dri-devel

On Sun, Jul 07, 2019 at 05:31:34AM +0000, bugzilla-daemon@freedesktop.org wrote:
> 2. Valve sponsored an interesting project that removes dependency of AMD Mesa
> from LLVM. And instead uses ACO. Valve made this available for Arch based
> systems via AUR, and Ubuntu based system via PPA. If you want to test it, you
> can check the posts below. I am going to test this myself on both Arch and
> Ubuntu. 
> https://steamcommunity.com/games/221410/announcements/detail/1602634609636894200
> https://steamcommunity.com/app/221410/discussions/0/1640915206474070669/

Huho!

Cons:
    - it's c++
    - only GFX8 and GFX9 (I have GFX6 :( )
    - some nasty python scripts (there are tons in mesa)

Pros:
    - it's several orders of magnitude less brain f*cked than llvm.
    - it is actual working code which does decouple mesa from llvm.

conclusion:
    - for GFX8 and GFX9, it's less bad than llvm.
    - I was asking for a clean GCN ABI definition document from shaders
      perspective, maybe this code will help to write one (or it is an AMD
      confidential document??).

-- 
Sylvain

^ permalink raw reply	[flat|nested] 147+ messages in thread

* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (37 preceding siblings ...)
  2019-07-07 10:55 ` bugzilla-daemon
@ 2019-07-07 17:42 ` bugzilla-daemon
  2019-07-08  5:29 ` bugzilla-daemon
                   ` (98 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-07-07 17:42 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1310 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #38 from Sylvain BERTRAND <sylvain.bertrand@gmail.com> ---
On Sun, Jul 07, 2019 at 05:31:34AM +0000, bugzilla-daemon@freedesktop.org
wrote:
> 2. Valve sponsored an interesting project that removes dependency of AMD Mesa
> from LLVM. And instead uses ACO. Valve made this available for Arch based
> systems via AUR, and Ubuntu based system via PPA. If you want to test it, you
> can check the posts below. I am going to test this myself on both Arch and
> Ubuntu. 
> https://steamcommunity.com/games/221410/announcements/detail/1602634609636894200
> https://steamcommunity.com/app/221410/discussions/0/1640915206474070669/

Huho!

Cons:
    - it's c++
    - only GFX8 and GFX9 (I have GFX6 :( )
    - some nasty python scripts (there are tons in mesa)

Pros:
    - it's several orders of magnitude less brain f*cked than llvm.
    - it is actual working code which does disjoint mesa from llvm.

conclusion:
    - for GFX8 and GFX9, it's less worse than llvm.
    - I was asking for a clean GCN ABI definition document from shaders
      perspective, maybe this code will help to write one (or it is an AMD
      confidential document??).


^ permalink raw reply	[flat|nested] 147+ messages in thread

* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (38 preceding siblings ...)
  2019-07-07 17:42 ` bugzilla-daemon
@ 2019-07-08  5:29 ` bugzilla-daemon
  2019-07-09 14:29 ` bugzilla-daemon
                   ` (97 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-07-08  5:29 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 384 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #39 from Samuel Sieb <samuel@sieb.net> ---
(In reply to shadow.archemage from comment #37)
> I tried the kernel parameters above, and the game still crashed for me.

Are you saying that the game is crashing or the graphics device is?


^ permalink raw reply	[flat|nested] 147+ messages in thread

* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (39 preceding siblings ...)
  2019-07-08  5:29 ` bugzilla-daemon
@ 2019-07-09 14:29 ` bugzilla-daemon
  2019-07-09 18:05   ` sylvain.bertrand
  2019-07-09 18:06 ` bugzilla-daemon
                   ` (96 subsequent siblings)
  137 siblings, 1 reply; 147+ messages in thread
From: bugzilla-daemon @ 2019-07-09 14:29 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 728 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #40 from Wilko Bartels <me@jasondaigo.de> ---
Since I have been experiencing the same issue since June (I didn't game much), I
want to share my system info.
I am on a Ryzen 2600X, Vega 56 Pulse, Strix B450. Using Arch 5.1.
I tested every window manager I know, and also tested 60Hz and 144Hz. The crashes
are totally random. I only play Dota 2. Last Friday I played about 6 games in a
row without a single issue. The day after, I crashed about 7 times per game. I
always have to press reset on my PC.
Is it known whether this issue is related to a kernel or Mesa update? I mean, it
wasn't always like this, was it?


^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
  2019-07-09 14:29 ` bugzilla-daemon
@ 2019-07-09 18:05   ` sylvain.bertrand
  0 siblings, 0 replies; 147+ messages in thread
From: sylvain.bertrand @ 2019-07-09 18:05 UTC (permalink / raw)
  To: bugzilla-daemon; +Cc: dri-devel

Guys,

I am getting freezes on tahiti xt/fx9590 recently... But I am not logging a bug yet
because I think the reason is summer heat.

Try to game with an opened computer case with a big fan blowing
into it.

^ permalink raw reply	[flat|nested] 147+ messages in thread

* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (40 preceding siblings ...)
  2019-07-09 14:29 ` bugzilla-daemon
@ 2019-07-09 18:06 ` bugzilla-daemon
  2019-07-10  7:25 ` bugzilla-daemon
                   ` (95 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-07-09 18:06 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 421 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #41 from Sylvain BERTRAND <sylvain.bertrand@gmail.com> ---
Guys,

I am getting freezes on tahiti xt/fx9590 recently... But I am not logging a bug yet
because I think the reason is summer heat.

Try to game with an opened computer case with a big fan blowing
into it.


^ permalink raw reply	[flat|nested] 147+ messages in thread

* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (41 preceding siblings ...)
  2019-07-09 18:06 ` bugzilla-daemon
@ 2019-07-10  7:25 ` bugzilla-daemon
  2019-07-10  8:03 ` bugzilla-daemon
                   ` (94 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-07-10  7:25 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 904 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #43 from Mauro Gaspari <ilvipero@gmx.com> ---
Hi,
No it was not always like this. I was using Kubuntu and my games were really
smooth for months. Zero crashes. Then after a mesa update, I do not recall
exactly the version but was around 18.5 or something like that, it all got
worse. 

Same game on same PC same hardware same power supply, same cooling, but on
windows, zero crashes.
same game on same PC with NVIDIA gpu, zero crashes.

I wish we could get the attention of someone @AMD because there is clearly some
issue going on. I would be very happy to help troubleshooting, if only we had
some contact with AMD. 

I have not used AMDGPU-PRO in ages, anyone here got that one to check if the
same issue happens with proprietary drivers?


^ permalink raw reply	[flat|nested] 147+ messages in thread

* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (42 preceding siblings ...)
  2019-07-10  7:25 ` bugzilla-daemon
@ 2019-07-10  8:03 ` bugzilla-daemon
  2019-07-10  8:19 ` bugzilla-daemon
                   ` (93 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-07-10  8:03 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1357 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #44 from Wilko Bartels <me@jasondaigo.de> ---
(In reply to Mauro Gaspari from comment #43)
> Hi,
> No it was not always like this. I was using Kubuntu and my games were really
> smooth for months. Zero crashes. Then after a mesa update, I do not recall
> exactly the version but was around 18.5 or something like that, it all got
> worse. 
> 
> Same game on same PC same hardware same power supply, same cooling, but on
> windows, zero crashes.
> same game on same PC with NVIDIA gpu, zero crashes.
> 
> I wish we could get the attention of someone @AMD because there is clearly
> some issue going on. I would be very happy to help troubleshooting, if only
> we had some contact with AMD. 
> 
> I have not used AMDGPU-PRO in ages, anyone here got that one to check if the
> same issue happens with proprietary drivers?

I was also thinking about AMDGPU-PRO, but I would want to install Ubuntu LTS on
another disk for that. It might take me several weeks or even longer to test.
And I am not even sure that's super helpful. I'm pretty sure that, at least on
Arch, I had zero problems at the end of 2018. At least with my Vega ;-)
Maybe I was wrong to switch from green to red after 10 years. Hehe.


^ permalink raw reply	[flat|nested] 147+ messages in thread

* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (43 preceding siblings ...)
  2019-07-10  8:03 ` bugzilla-daemon
@ 2019-07-10  8:19 ` bugzilla-daemon
  2019-07-10  8:26 ` bugzilla-daemon
                   ` (92 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-07-10  8:19 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 745 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #45 from Wilko Bartels <me@jasondaigo.de> ---
(In reply to Mauro Gaspari from comment #43)
> Hi,
> No it was not always like this. I was using Kubuntu and my games were really
> smooth for months. Zero crashes. Then after a mesa update, I do not recall
> exactly the version but was around 18.5 or something like that, it all got
> worse. 
But is it proven that Mesa is the problem here? There was once an issue with the
linux-firmware package in early 2018, if I remember correctly. Users had to roll
back back then.
I might roll back to Mesa 18.3 to test anyway, if I can manage that.


^ permalink raw reply	[flat|nested] 147+ messages in thread

* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (44 preceding siblings ...)
  2019-07-10  8:19 ` bugzilla-daemon
@ 2019-07-10  8:26 ` bugzilla-daemon
  2019-07-10  9:41 ` bugzilla-daemon
                   ` (91 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-07-10  8:26 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1129 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #46 from Mauro Gaspari <ilvipero@gmx.com> ---
This is exactly the reason why I wish we could get more attention to this
issue. 
I have seen so many people in forums on the internet replacing their AMD cards
with NVIDIA due to similar issues. Or switching back to windows. 

I do not have the proof that the issue is just Mesa, could be a combination of
mesa, kernel, firmware for all I know. 

I opened this bug to see if I could get help troubleshooting the issue and
finding a permanent fix for all affected users. If there is a better place to
report this, I am happy to open as many tickets and send as many emails as
needed :)

Also, it would be extremely helpful if we had a script or something to trigger
the freeze quickly and consistently, so that troubleshooting Mesa, kernel, and
firmware combinations would be much easier and more reliable.
If anyone has a test suite, script, or some automated check that can trigger
the issue quickly, please share.


^ permalink raw reply	[flat|nested] 147+ messages in thread

* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (45 preceding siblings ...)
  2019-07-10  8:26 ` bugzilla-daemon
@ 2019-07-10  9:41 ` bugzilla-daemon
  2019-07-10 14:44 ` bugzilla-daemon
                   ` (90 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-07-10  9:41 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1151 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #47 from Sam <samueldgv@mailbox.org> ---
The relevant issue and bug report here (the system freezing completely or if
lucky just killing the X session, NOT games crashing) seems to be related
exclusively to AMDGPU, and not to mesa. Whereas I got the same issues over and
over after trying out several versions of mesa, switching to older versions of
the kernel "fixes" it for me (the latest version I tried out which didn't have
these issues is Kernel 4.20.13, in my case from
https://download.opensuse.org/repositories/home:/tiwai:/kernel:/4.20/standard/x86_64/)

There is also a report from another user which temporarily fixed it by forcing
the gpu to run at the maximum power setting
(https://bugzilla.opensuse.org/show_bug.cgi?id=1136293):

# echo manual > /sys/class/drm/card0/device/power_dpm_force_performance_level
# echo 7 > /sys/class/drm/card0/device/pp_dpm_sclk

and then to reset back to normal:

# echo auto > /sys/class/drm/card0/device/power_dpm_force_performance_level
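
For anyone testing this, the steps above can be wrapped in a small script. This
is only a sketch under assumptions: the function names and the DEV variable are
made up for illustration, card0 may not be the right card on a multi-GPU
system, and instead of hard-coding 7 it takes the last level listed in
pp_dpm_sclk, assuming the levels are listed in ascending order. It has to run
as root.

```shell
#!/bin/sh
# Sketch only: DEV and both function names are illustrative assumptions.
# card0 is taken from the commands quoted above; adjust it if needed.
DEV="${DEV:-/sys/class/drm/card0/device}"

force_max_sclk() {
    # pp_dpm_sclk lists one level per line, e.g. "7: 1630Mhz *".
    # Assumption: the last line is the highest level.
    level=$(tail -n 1 "$DEV/pp_dpm_sclk" | cut -d: -f1)
    echo manual > "$DEV/power_dpm_force_performance_level"
    echo "$level" > "$DEV/pp_dpm_sclk"
}

restore_auto() {
    # Hand clock selection back to the driver.
    echo auto > "$DEV/power_dpm_force_performance_level"
}
```

After sourcing the file as root, call force_max_sclk before gaming and
restore_auto afterwards.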


^ permalink raw reply	[flat|nested] 147+ messages in thread

* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (46 preceding siblings ...)
  2019-07-10  9:41 ` bugzilla-daemon
@ 2019-07-10 14:44 ` bugzilla-daemon
  2019-07-10 18:42 ` bugzilla-daemon
                   ` (89 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-07-10 14:44 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 546 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #48 from Mauro Gaspari <ilvipero@gmx.com> ---
@Sam,

Thank you, this is helpful. Since it is not distribution-specific and not Mesa
related, do you think we should keep the bug here, merge it with other similar
bugs, or open a new one on another bug tracker?
Happy to help and troubleshoot more from my side, and/or push for this to be
resolved once and for all, for all AMDGPU users.

Thanks
Mauro


^ permalink raw reply	[flat|nested] 147+ messages in thread

* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (47 preceding siblings ...)
  2019-07-10 14:44 ` bugzilla-daemon
@ 2019-07-10 18:42 ` bugzilla-daemon
  2019-07-12 15:26 ` bugzilla-daemon
                   ` (88 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-07-10 18:42 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1585 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #49 from Wilko Bartels <me@jasondaigo.de> ---
(In reply to Sam from comment #47)
> The relevant issue and bug report here (the system freezing completely or if
> lucky just killing the X session, NOT games crashing) seems to be related
> exclusively to AMDGPU, and not to mesa. Whereas I got the same issues over
> and over after trying out several versions of mesa, switching to older
> versions of the kernel "fixes" it for me (the latest version I tried out
> which didn't have these issues is Kernel 4.20.13, in my case from
> https://download.opensuse.org/repositories/home:/tiwai:/kernel:/4.20/
> standard/x86_64/)
> 
> There is also a report from another user which temporarily fixed it by
> forcing the gpu to run at the maximum power setting
> (https://bugzilla.opensuse.org/show_bug.cgi?id=1136293):
> 
> # echo manual > /sys/class/drm/card0/device/power_dpm_force_performance_level
> # echo 7 > /sys/class/drm/card0/device/pp_dpm_sclk
> 
> and then to reset back to normal:
> 
> # echo auto > /sys/class/drm/card0/device/power_dpm_force_performance_level

I am currently on my 4th game of Dota in a row after setting the performance
level to manual and sclk to 7. Working so far. Everyone should test this now so
we have more reliable data. As we all know, the issue can be gone for several
hours, so my experience means nothing yet.
It would be amazing if we could pin the issue down to the performance level of
the cards.


^ permalink raw reply	[flat|nested] 147+ messages in thread

* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (48 preceding siblings ...)
  2019-07-10 18:42 ` bugzilla-daemon
@ 2019-07-12 15:26 ` bugzilla-daemon
  2019-07-13 17:22 ` bugzilla-daemon
                   ` (87 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-07-12 15:26 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 600 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #50 from shadow.archemage@gmail.com ---
(In reply to Samuel Sieb from comment #39)
> (In reply to shadow.archemage from comment #37)
> > I tried the kernel parameters above, and the game still crashed for me.
> 
> Are you saying that the game is crashing or the graphics device is?

Apologies, what I meant by this is that my system locks up, not just the game
crashing. I can't recover from it except by resetting my PC using the power
button.


^ permalink raw reply	[flat|nested] 147+ messages in thread

* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (49 preceding siblings ...)
  2019-07-12 15:26 ` bugzilla-daemon
@ 2019-07-13 17:22 ` bugzilla-daemon
  2019-07-16  8:28 ` bugzilla-daemon
                   ` (86 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-07-13 17:22 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1890 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #51 from shadow.archemage@gmail.com ---
(In reply to Wilko Bartels from comment #49)
> (In reply to Sam from comment #47)
> > The relevant issue and bug report here (the system freezing completely or if
> > lucky just killing the X session, NOT games crashing) seems to be related
> > exclusively to AMDGPU, and not to mesa. Whereas I got the same issues over
> > and over after trying out several versions of mesa, switching to older
> > versions of the kernel "fixes" it for me (the latest version I tried out
> > which didn't have these issues is Kernel 4.20.13, in my case from
> > https://download.opensuse.org/repositories/home:/tiwai:/kernel:/4.20/
> > standard/x86_64/)
> > 
> > There is also a report from another user which temporarily fixed it by
> > forcing the gpu to run at the maximum power setting
> > (https://bugzilla.opensuse.org/show_bug.cgi?id=1136293):
> > 
> > # echo manual > /sys/class/drm/card0/device/power_dpm_force_performance_level
> > # echo 7 > /sys/class/drm/card0/device/pp_dpm_sclk
> > 
> > and then to reset back to normal:
> > 
> > # echo auto > /sys/class/drm/card0/device/power_dpm_force_performance_level
> 
> I am currently on my 4th game of dota in a row when setting performance
> level manual to 7. working so far. Everyone should test this now so we have
> more reliable data. As we all know the issue can be gone for several hours so
> my experience means nothing yet. 
> Would be amazing if we can pin down the issue to the  performance level of
> the cards.

Played Monster Hunter and Dota 2 for quite a long time, and I didn't experience
any system freezes with the max performance settings. Will test again tomorrow
to see if the workaround is consistent enough.


^ permalink raw reply	[flat|nested] 147+ messages in thread

* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (50 preceding siblings ...)
  2019-07-13 17:22 ` bugzilla-daemon
@ 2019-07-16  8:28 ` bugzilla-daemon
  2019-07-17  3:34 ` bugzilla-daemon
                   ` (85 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-07-16  8:28 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 326 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #52 from Wilko Bartels <me@jasondaigo.de> ---
I played about 30 Dota 2 matches without a single freeze. It's safe to say this
is it. Where is the right place to report this issue?


^ permalink raw reply	[flat|nested] 147+ messages in thread

* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (51 preceding siblings ...)
  2019-07-16  8:28 ` bugzilla-daemon
@ 2019-07-17  3:34 ` bugzilla-daemon
  2019-07-17 16:02   ` sylvain.bertrand
  2019-07-17 16:02 ` bugzilla-daemon
                   ` (84 subsequent siblings)
  137 siblings, 1 reply; 147+ messages in thread
From: bugzilla-daemon @ 2019-07-17  3:34 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 319 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #53 from Mauro Gaspari <ilvipero@gmx.com> ---
Thank you all for the great work.
I will post on AMD support forums and add the link of this and other AMDGPU
related bugs.


^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
  2019-07-17  3:34 ` bugzilla-daemon
@ 2019-07-17 16:02   ` sylvain.bertrand
  0 siblings, 0 replies; 147+ messages in thread
From: sylvain.bertrand @ 2019-07-17 16:02 UTC (permalink / raw)
  To: bugzilla-daemon; +Cc: dri-devel

Power management code lives in amdgpu, so the right place is here: the "dri" and
"amd-gfx" mailing lists (aka the Linux GPU driver mailing lists).

As far as I am concerned, when I play dota2, I always switch the GPU dpm to
high and the CPU freq governor to perf (because, all those things steal a
significant amount of fps... actually, I do switch my GPU dpm to high just in
case it would be nasty like the cpu governor).

-- 
Sylvain

^ permalink raw reply	[flat|nested] 147+ messages in thread

* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (52 preceding siblings ...)
  2019-07-17  3:34 ` bugzilla-daemon
@ 2019-07-17 16:02 ` bugzilla-daemon
  2019-07-18  2:30 ` bugzilla-daemon
                   ` (83 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-07-17 16:02 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 638 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #54 from Sylvain BERTRAND <sylvain.bertrand@gmail.com> ---
Power management code lives in amdgpu, so the right place is here: the "dri" and
"amd-gfx" mailing lists (aka the Linux GPU driver mailing lists).

As far as I am concerned, when I play dota2, I always switch the GPU dpm to
high and the CPU freq governor to perf (because, all those things steal a
significant amount of fps... actually, I do switch my GPU dpm to high just in
case it would be nasty like the cpu governor).
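
A minimal sketch of that pre-game switch, assuming the amdgpu sysfs file
already quoted in this thread and the usual cpufreq sysfs layout. The function
name and the GPU_DEV and CPU_ROOT variables are hypothetical, and "perf" is
taken to mean the governor named performance. It has to run as root.

```shell
#!/bin/sh
# Sketch only: variable and function names are illustrative assumptions.
GPU_DEV="${GPU_DEV:-/sys/class/drm/card0/device}"
CPU_ROOT="${CPU_ROOT:-/sys/devices/system/cpu}"

set_gaming_profile() {
    # Pin the GPU to its highest DPM state.
    echo high > "$GPU_DEV/power_dpm_force_performance_level"

    # Switch every CPU that exposes cpufreq to the performance governor.
    for gov in "$CPU_ROOT"/cpu[0-9]*/cpufreq/scaling_governor; do
        if [ -w "$gov" ]; then
            echo performance > "$gov"
        fi
    done
}
```

Writing auto back to power_dpm_force_performance_level returns the GPU to
driver-managed clocks; the previous CPU governor has to be restored separately.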


^ permalink raw reply	[flat|nested] 147+ messages in thread

* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (53 preceding siblings ...)
  2019-07-17 16:02 ` bugzilla-daemon
@ 2019-07-18  2:30 ` bugzilla-daemon
  2019-07-18 13:44   ` sylvain.bertrand
  2019-07-18 13:44 ` bugzilla-daemon
                   ` (82 subsequent siblings)
  137 siblings, 1 reply; 147+ messages in thread
From: bugzilla-daemon @ 2019-07-18  2:30 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 432 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #55 from Hadet <hadet@protonmail.com> ---
So I think this might have something to do with Xorg, because since I tried
Wayland on a whim I have gamed for many hours without it happening. I now have
21 hours of uptime with no random crashes.


^ permalink raw reply	[flat|nested] 147+ messages in thread

* Re: [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
  2019-07-18  2:30 ` bugzilla-daemon
@ 2019-07-18 13:44   ` sylvain.bertrand
  0 siblings, 0 replies; 147+ messages in thread
From: sylvain.bertrand @ 2019-07-18 13:44 UTC (permalink / raw)
  To: bugzilla-daemon; +Cc: dri-devel

Playing dota2 with Vulkan or GL?

I guess it's Vulkan, and there I don't know how Vulkan deals with multiple WSIs,
or how dota2 selects the one it will use.

The idea is to clearly identify the code paths which would be "buggy".

(my custom distro is x11 native)

That said, I don't know the status of Wayland: did they reach the same "cluster
f*ck" level that x11 is at? (Ironic, since Wayland's reason to exist is to be
orders of magnitude less kludgy than x11.)

-- 
Sylvain

^ permalink raw reply	[flat|nested] 147+ messages in thread

* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (54 preceding siblings ...)
  2019-07-18  2:30 ` bugzilla-daemon
@ 2019-07-18 13:44 ` bugzilla-daemon
  2019-07-19  0:12 ` bugzilla-daemon
                   ` (81 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-07-18 13:44 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 673 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #56 from Sylvain BERTRAND <sylvain.bertrand@gmail.com> ---
Playing dota2 with Vulkan or GL?

I guess it's Vulkan, and there I don't know how Vulkan deals with multiple WSIs,
or how dota2 selects the one it will use.

The idea is to clearly identify the code paths which would be "buggy".

(my custom distro is x11 native)

That said, I don't know the status of Wayland: did they reach the same "cluster
f*ck" level that x11 is at? (Ironic, since Wayland's reason to exist is to be
orders of magnitude less kludgy than x11.)


^ permalink raw reply	[flat|nested] 147+ messages in thread

* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (55 preceding siblings ...)
  2019-07-18 13:44 ` bugzilla-daemon
@ 2019-07-19  0:12 ` bugzilla-daemon
  2019-07-22  5:19 ` bugzilla-daemon
                   ` (80 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-07-19  0:12 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 388 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #57 from Hadet <hadet@protonmail.com> ---
Created attachment 144821
  --> https://bugs.freedesktop.org/attachment.cgi?id=144821&action=edit
Dmesg after crash

I spoke too soon: it's happening on Wayland now too, just a lot less frequently.


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (56 preceding siblings ...)
  2019-07-19  0:12 ` bugzilla-daemon
@ 2019-07-22  5:19 ` bugzilla-daemon
  2019-07-23 16:25 ` bugzilla-daemon
                   ` (79 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-07-22  5:19 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1647 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #58 from Mauro Gaspari <ilvipero@gmx.com> ---
After a long time without crashes on Tumbleweed, I wanted to prepare a test
setup for valve mesa built with ACO. So I installed Ubuntu Budgie 18.04 LTS
with the hardware enablement stack and I noticed the OS freezes are now back,
even on the Radeon VII.

Here is what I noticed in the game's behavior. The game runs on CrossOver
(wine) with DX11 and DXVK. I want to point out that I do alt-tab out of games
to do other things, so this might be a factor to consider. But again, I do the
same on my NVIDIA-GPU laptop and I never had a single freeze or fps drop.
Not sure if points 2 and 3 are related, I just wanted to share my observations.

1. Game starts with excellent FPS. I can hear GPU fans spinning.
2. After a while, the game loses a lot of FPS and becomes slow and sluggish;
the GPU seems to be no longer doing much and I can no longer hear the fans
spinning.
3. After a while longer, the whole OS freezes as described in my first post.


What I am going to do next:
1. Use the workaround from comment #47 and test for a few days.
2. Install Valve mesa-aco from the Ubuntu PPA and test (without workarounds)
for a few days.

I will report back when I have more details on my tests.

System info:
OS: Ubuntu 18.04.2 LTS x86_64 
Kernel: 5.0.0-21-generic
Resolution: 3440x1440
CPU: AMD Ryzen 7 2700X (16) @ 3.700G 
GPU: AMD Vega 20 
Memory: 2650MiB / 64398MiB
OpenGL version string: 4.5 (Compatibility Profile) Mesa 19.0.2


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (57 preceding siblings ...)
  2019-07-22  5:19 ` bugzilla-daemon
@ 2019-07-23 16:25 ` bugzilla-daemon
  2019-07-23 16:30 ` bugzilla-daemon
                   ` (78 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-07-23 16:25 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1235 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #59 from wedens13@yandex.ru ---
I have similar issues with Sapphire Pulse Vega 56.
Arch Linux
Kernel versions: 4.19.60-1-lts, 5.2.1-1
mesa: 19.1.3-1, mesa with ACO (f9b38efdda166f2b79562525e72fe135c6b23d54)
llvm: 8.0.0

I've also tried booting with integrated video and using DRI_PRIME=1 to offload
to vega. It crashes similarly (after 5min of playing witcher 3 with dxvk
1.3.1):

Jul 23 22:44:01 wedens-pc kernel: amdgpu 0000:03:00.0: [mmhub] VMC page fault (src_id:0 ring:154 vmid:1 pasid:32771, for process  pid 0 thread  pid 0)
Jul 23 22:44:01 wedens-pc kernel: amdgpu 0000:03:00.0:   at address 0x0000800100a00000 from 18
Jul 23 22:44:01 wedens-pc kernel: amdgpu 0000:03:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00100134
Jul 23 22:44:11 wedens-pc kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, signaled seq=230, emitted seq=233
Jul 23 22:44:11 wedens-pc kernel: [drm] GPU recovery disabled.


I'm going to try mesa master and the manual power level workaround (when should
I use the "reset to normal" command?).
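
For anyone comparing logs across reports, here is a minimal sketch that pulls
the ring name and sequence numbers out of "ring ... timeout" lines like the one
quoted above. The function name ring_timeouts is made up for illustration, and
it assumes each log entry sits on one unwrapped line, as journalctl or dmesg
print it. The last column is emitted minus signaled, which presumably is the
number of jobs still pending when the timeout fired.

```shell
#!/bin/sh
# Hypothetical helper, not part of any amdgpu tooling.
# Reads kernel log text on stdin and prints, per ring timeout:
#   <ring> <signaled seq> <emitted seq> <emitted - signaled>
ring_timeouts() {
    sed -nE 's/.*ring ([^ ]+) timeout, signaled seq=([0-9]+), emitted seq=([0-9]+).*/\1 \2 \3/p' |
        awk '{ print $1, $2, $3, $3 - $2 }'
}
```

Possible usage, assuming a persistent systemd journal so that the boot which
froze is still available: journalctl -k -b -1 | ring_timeouts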


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (58 preceding siblings ...)
  2019-07-23 16:25 ` bugzilla-daemon
@ 2019-07-23 16:30 ` bugzilla-daemon
  2019-07-23 17:14 ` bugzilla-daemon
                   ` (77 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-07-23 16:30 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 253 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #60 from wedens13@yandex.ru ---
A couple of relevant log fragments with crashes: https://paste.ee/p/rtDEg


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (59 preceding siblings ...)
  2019-07-23 16:30 ` bugzilla-daemon
@ 2019-07-23 17:14 ` bugzilla-daemon
  2019-07-23 20:17   ` sylvain.bertrand
  2019-07-23 20:18 ` bugzilla-daemon
                   ` (76 subsequent siblings)
  137 siblings, 1 reply; 147+ messages in thread
From: bugzilla-daemon @ 2019-07-23 17:14 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 422 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #61 from wedens13@yandex.ru ---
I've tried starting witcher 3 after executing
echo manual > /sys/class/drm/card0/device/power_dpm_force_performance_level
echo 7 > /sys/class/drm/card0/device/pp_dpm_sclk

and it still crashes immediately.

log: https://paste.ee/p/thvXf
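
A minimal sketch of the two commands above wrapped as functions, together with
a way to hand control back to the driver afterwards. The function names and the
DEV variable are made up for illustration. Writing "auto" to
power_dpm_force_performance_level is assumed here to be what the "reset to
normal" command from the earlier workaround refers to; the thread itself does
not spell it out. Writing to the real sysfs files needs root.

```shell
#!/bin/sh
# Sketch only: DEV defaults to the card0 path used in comment #61.
# Override DEV for a different card.
DEV="${DEV:-/sys/class/drm/card0/device}"

# Switch to manual mode and pin sclk to one level.
# $1 is a level index as listed in pp_dpm_sclk (7 in comment #61).
force_sclk_level() {
    echo manual > "$DEV/power_dpm_force_performance_level" &&
        echo "$1" > "$DEV/pp_dpm_sclk"
}

# Give clock selection back to the driver (assumed "reset to normal").
reset_performance_level() {
    echo auto > "$DEV/power_dpm_force_performance_level"
}
```

Possible usage as root: force_sclk_level 7 before starting the game, then
reset_performance_level once testing is done.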


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (60 preceding siblings ...)
  2019-07-23 17:14 ` bugzilla-daemon
@ 2019-07-23 20:18 ` bugzilla-daemon
  2019-07-24  4:14 ` bugzilla-daemon
                   ` (75 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-07-23 20:18 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 273 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #62 from Sylvain BERTRAND <sylvain.bertrand@gmail.com> ---
unstable power supply lines to the gpu if overheating is excluded?


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (61 preceding siblings ...)
  2019-07-23 20:18 ` bugzilla-daemon
@ 2019-07-24  4:14 ` bugzilla-daemon
  2019-07-24 13:08   ` sylvain.bertrand
  2019-07-24 13:09 ` bugzilla-daemon
                   ` (74 subsequent siblings)
  137 siblings, 1 reply; 147+ messages in thread
From: bugzilla-daemon @ 2019-07-24  4:14 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 563 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #63 from Mauro Gaspari <ilvipero@gmx.com> ---
(In reply to Sylvain BERTRAND from comment #62)
> unstable power supply lines to the gpu if overheating is excluded?

I cannot speak for others. In my case, I would say no. I installed Windows 10 on
a separate SSD, just to check there was no hardware issue of any kind.
On Windows 10 with the latest AMD drivers, I have no freezes or any other issue
running the same games.


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (62 preceding siblings ...)
  2019-07-24  4:14 ` bugzilla-daemon
@ 2019-07-24 13:09 ` bugzilla-daemon
  2019-07-24 14:27 ` bugzilla-daemon
                   ` (73 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-07-24 13:09 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 516 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #64 from Sylvain BERTRAND <sylvain.bertrand@gmail.com> ---
> I cannot speak for others. In my case, I would say no. I installed Windows 10 on
> a separate SSD, just to check there was no hardware issue of any kind.
> On Windows 10 with the latest AMD drivers, I have no freezes or any other issue
> running the same games.

Native gnu/linux game or going through wine/dxvk?


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (63 preceding siblings ...)
  2019-07-24 13:09 ` bugzilla-daemon
@ 2019-07-24 14:27 ` bugzilla-daemon
  2019-07-24 14:41 ` bugzilla-daemon
                   ` (72 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-07-24 14:27 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 754 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #65 from wedens13@yandex.ru ---
(In reply to Sylvain BERTRAND from comment #62)
> unstable power supply lines to the gpu if overheating is excluded?

It's not overheating in my case, but my PSU is pretty old (I'm waiting for
components for my new build to arrive, including a new PSU). I've lowered the
power limit (to 80W) and I haven't had any crashes yet.

So, in my case the problem *might be* related to the PSU. But I can't exclude
(nor confirm) the possibility of driver problems with higher power states
(until I have a better PSU).

I'll report back if I have any crashes with the new PSU or the lowered power
limit.
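
For others who want to try the same experiment, a minimal sketch of lowering
the power limit through sysfs. The comment above does not say which tool was
used, so this is only one possible way: it assumes the amdgpu hwmon file
power1_cap, which takes the limit in microwatts. The function name and the DEV
variable are made up for illustration, and writing to the real file needs root.

```shell
#!/bin/sh
# Sketch only: assumes the card is card0; override DEV otherwise.
DEV="${DEV:-/sys/class/drm/card0/device}"

# Set the power cap on every hwmon node under DEV.
# $1 is the limit in watts; power1_cap expects microwatts.
set_power_cap_watts() {
    for cap in "$DEV"/hwmon/hwmon*/power1_cap; do
        [ -e "$cap" ] || continue   # glob did not match: nothing to do
        echo "$(( $1 * 1000000 ))" > "$cap"
    done
}
```

Possible usage as root: set_power_cap_watts 80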


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (64 preceding siblings ...)
  2019-07-24 14:27 ` bugzilla-daemon
@ 2019-07-24 14:41 ` bugzilla-daemon
  2019-07-24 14:55   ` sylvain.bertrand
  2019-07-24 14:56 ` bugzilla-daemon
                   ` (71 subsequent siblings)
  137 siblings, 1 reply; 147+ messages in thread
From: bugzilla-daemon @ 2019-07-24 14:41 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 646 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #66 from Hadet <hadet@protonmail.com> ---
I don't think it's faulty hardware in any of our cases, to be perfectly honest;
it's a bad instruction set. This didn't happen with older kernels or firmware,
and the issue now is that there are so few of us with Vega cards that we're
really on our own trying to troubleshoot this situation.

Since switching to wayland my crashing has been a lot less frequent: I'd say
once every couple of days, as opposed to once every few hours, when gaming with
Vulkan/DXVK.


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (65 preceding siblings ...)
  2019-07-24 14:41 ` bugzilla-daemon
@ 2019-07-24 14:56 ` bugzilla-daemon
  2019-07-27 11:28 ` bugzilla-daemon
                   ` (70 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-07-24 14:56 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 330 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #67 from Sylvain BERTRAND <sylvain.bertrand@gmail.com> ---
> ...
> Vulkan/DXVK

The bugs may be in wine/DXVK then. You should report a bug to them and link
this bug to theirs.


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (66 preceding siblings ...)
  2019-07-24 14:56 ` bugzilla-daemon
@ 2019-07-27 11:28 ` bugzilla-daemon
  2019-07-27 13:19   ` sylvain.bertrand
  2019-07-27 13:19 ` bugzilla-daemon
                   ` (69 subsequent siblings)
  137 siblings, 1 reply; 147+ messages in thread
From: bugzilla-daemon @ 2019-07-27 11:28 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1130 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #68 from Mauro Gaspari <ilvipero@gmx.com> ---
(In reply to Sylvain BERTRAND from comment #67)
> > ...
> > Vulkan/DXVK
> 
> The bugs may be in wine/DXVK then. You should report a bug to them and link
> this bug to theirs.

If any of you opened bugs on other bug trackers, please post a link here so we
can all contribute to both.

I did some tests on my end and I can report the following:

System info:
OS: Ubuntu 18.04.2 LTS x86_64 
Kernel: 5.0.0-21-generic
Resolution: 3440x1440
CPU: AMD Ryzen 7 2700X (16) @ 3.700G 
GPU: AMD Vega 20 
Memory: 2650MiB / 64398MiB
OpenGL version string: 4.5 (Compatibility Profile) Mesa 19.0.2

1. Setting the power profile to manual did not help.
2. Mesa-ACO from valve seems to have helped quite a bit. So far, no system
freezes.

I installed Arch on another SSD and will try to reproduce the same tests:
1. Plain Arch - crash or not?
2. Arch with forced power profile - crash or not?
3. Arch with mesa-ACO - crash or not?


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (67 preceding siblings ...)
  2019-07-27 11:28 ` bugzilla-daemon
@ 2019-07-27 13:19 ` bugzilla-daemon
  2019-07-27 17:32 ` bugzilla-daemon
                   ` (68 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-07-27 13:19 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 307 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #69 from Sylvain BERTRAND <sylvain.bertrand@gmail.com> ---
Don't forget to provide the software stack used:

Which software (game, CAD...)? wine/dxvk? native?


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (68 preceding siblings ...)
  2019-07-27 13:19 ` bugzilla-daemon
@ 2019-07-27 17:32 ` bugzilla-daemon
  2019-07-28  3:14 ` bugzilla-daemon
                   ` (67 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-07-27 17:32 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 461 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #70 from Mauro Gaspari <ilvipero@gmx.com> ---
(In reply to Sylvain BERTRAND from comment #69)
> Don't forget to provide the software stack used:
> 
> Which software (game, CAD...)? wine/dxvk? native?

Good point. Games being tested:

Pillars of Eternity - Native
Battletech - Native
Eve Online - Wine+DXVK


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (69 preceding siblings ...)
  2019-07-27 17:32 ` bugzilla-daemon
@ 2019-07-28  3:14 ` bugzilla-daemon
  2019-08-03 13:35 ` bugzilla-daemon
                   ` (66 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-07-28  3:14 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 283 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #71 from Yury Zhuravlev <stalkerg@gmail.com> ---
Can somebody try games without any fps limits?
For example with vblank_mode=0 and with in-game limits turned off.
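
A minimal sketch of how such a test run could be started. vblank_mode=0 is a
Mesa environment variable that turns off vsync for OpenGL applications; whether
it has any effect on Vulkan or DXVK titles is not something this sketch
assumes. The function name is made up for illustration, and in-game frame
limiters still have to be switched off in the game's own settings.

```shell
#!/bin/sh
# Hypothetical wrapper: run one command with Mesa's OpenGL vsync disabled,
# without changing the environment of the calling shell.
run_without_vsync() {
    env vblank_mode=0 "$@"
}
```

Possible usage: run_without_vsync glxgears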


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (70 preceding siblings ...)
  2019-07-28  3:14 ` bugzilla-daemon
@ 2019-08-03 13:35 ` bugzilla-daemon
  2019-08-03 16:54 ` bugzilla-daemon
                   ` (65 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-08-03 13:35 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 31274 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #72 from Mauro Gaspari <ilvipero@gmx.com> ---
After a few weeks without crashes on Ubuntu Budgie 18.04 LTS with valve
mesa-aco, I moved to another distribution that does not have valve mesa-aco to
cross check.

This is what I am using:
OS: openSUSE Tumbleweed x86_64 
Kernel: 5.2.2-1-default
Resolution: 3440x1440
DE: Xfce
WM: Xfwm4
CPU: AMD Ryzen 7 2700X (16) @ 3.700GHz
GPU: AMD ATI Radeon VII
Memory: 1644MiB / 64387MiB 
OpenGL version string: 4.5 (Compatibility Profile) Mesa 19.1.3
No kernel parameters configured, just out of the box openSUSE

I had 3 freezes:

1. While I was playing Albion Online (native). No full system freeze: I was able
to drop to a tty and noticed this error: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR*
Failed to initialize parser -125!

2. As I closed down Albion Online (native) and returned to the desktop. Full
system freeze.

3. As I was doing regular desktop operations on XFCE, with no 3D gaming going
on. Please see the logs below:

DMESG after crash:

ilvipero@MGDT-ROG:~> dmesg | grep amdgpu
[    5.758450] [drm] amdgpu kernel modesetting enabled.
[    5.758569] amdgpu 0000:0a:00.0: remove_conflicting_pci_framebuffers: bar 0:
0xe0000000 -> 0xefffffff
[    5.758570] amdgpu 0000:0a:00.0: remove_conflicting_pci_framebuffers: bar 2:
0xf0000000 -> 0xf01fffff
[    5.758571] amdgpu 0000:0a:00.0: remove_conflicting_pci_framebuffers: bar 5:
0xfcd00000 -> 0xfcd7ffff
[    5.758573] fb0: switching to amdgpudrmfb from EFI VGA
[    5.758646] amdgpu 0000:0a:00.0: vgaarb: deactivate vga console
[    5.758826] amdgpu 0000:0a:00.0: No more image in the PCI ROM
[    5.758870] amdgpu 0000:0a:00.0: VRAM: 16368M 0x0000008000000000 -
0x00000083FEFFFFFF (16368M used)
[    5.758871] amdgpu 0000:0a:00.0: GART: 512M 0x0000000000000000 -
0x000000001FFFFFFF
[    5.758872] amdgpu 0000:0a:00.0: AGP: 267894784M 0x0000008400000000 -
0x0000FFFFFFFFFFFF
[    5.758936] [drm] amdgpu: 16368M of VRAM memory ready
[    5.758938] [drm] amdgpu: 16368M of GTT memory ready.
[    5.759204] amdgpu 0000:0a:00.0: Direct firmware load for
amdgpu/vega20_ta.bin failed with error -2
[    5.759205] amdgpu 0000:0a:00.0: psp v11.0: Failed to load firmware
"amdgpu/vega20_ta.bin"
[    6.855053] fbcon: amdgpudrmfb (fb0) is primary device
[    6.913835] amdgpu 0000:0a:00.0: fb0: amdgpudrmfb frame buffer device
[    6.928054] amdgpu 0000:0a:00.0: ring gfx uses VM inv eng 0 on hub 0
[    6.928055] amdgpu 0000:0a:00.0: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[    6.928056] amdgpu 0000:0a:00.0: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[    6.928056] amdgpu 0000:0a:00.0: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[    6.928057] amdgpu 0000:0a:00.0: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[    6.928058] amdgpu 0000:0a:00.0: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[    6.928059] amdgpu 0000:0a:00.0: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[    6.928059] amdgpu 0000:0a:00.0: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[    6.928060] amdgpu 0000:0a:00.0: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[    6.928060] amdgpu 0000:0a:00.0: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
[    6.928061] amdgpu 0000:0a:00.0: ring sdma0 uses VM inv eng 0 on hub 1
[    6.928062] amdgpu 0000:0a:00.0: ring page0 uses VM inv eng 1 on hub 1
[    6.928063] amdgpu 0000:0a:00.0: ring sdma1 uses VM inv eng 4 on hub 1
[    6.928063] amdgpu 0000:0a:00.0: ring page1 uses VM inv eng 5 on hub 1
[    6.928064] amdgpu 0000:0a:00.0: ring uvd_0 uses VM inv eng 6 on hub 1
[    6.928064] amdgpu 0000:0a:00.0: ring uvd_enc_0.0 uses VM inv eng 7 on hub 1
[    6.928065] amdgpu 0000:0a:00.0: ring uvd_enc_0.1 uses VM inv eng 8 on hub 1
[    6.928066] amdgpu 0000:0a:00.0: ring uvd_1 uses VM inv eng 9 on hub 1
[    6.928066] amdgpu 0000:0a:00.0: ring uvd_enc_1.0 uses VM inv eng 10 on hub
1
[    6.928067] amdgpu 0000:0a:00.0: ring uvd_enc_1.1 uses VM inv eng 11 on hub
1
[    6.928067] amdgpu 0000:0a:00.0: ring vce0 uses VM inv eng 12 on hub 1
[    6.928068] amdgpu 0000:0a:00.0: ring vce1 uses VM inv eng 13 on hub 1
[    6.928068] amdgpu 0000:0a:00.0: ring vce2 uses VM inv eng 14 on hub 1
[    7.609167] [drm] Initialized amdgpu 3.32.0 20150101 for 0000:0a:00.0 on
minor 0

system logs:

2019-08-03T18:51:21.779695+08:00 MGDT-ROG kernel: [11817.727681] pcieport
0000:00:03.1: AER: Multiple Corrected error received: 0000:00:00.0
2019-08-03T18:51:21.779730+08:00 MGDT-ROG kernel: [11817.771355] pcieport
0000:00:03.1: AER: PCIe Bus Error: severity=Corrected, type=Data Link Layer,
(Transmitter ID)
2019-08-03T18:51:21.779735+08:00 MGDT-ROG kernel: [11817.771358] pcieport
0000:00:03.1: AER:   device [1022:1453] error status/mask=00003100/00006000
2019-08-03T18:51:21.779737+08:00 MGDT-ROG kernel: [11817.771361] pcieport
0000:00:03.1: AER:    [ 8] Rollover              
2019-08-03T18:51:21.779738+08:00 MGDT-ROG kernel: [11817.771371] pcieport
0000:00:03.1: AER:    [12] Timeout               
2019-08-03T18:51:26.721833+08:00 MGDT-ROG sudo: pam_unix(sudo:session): session
closed for user root
2019-08-03T18:51:31.983837+08:00 MGDT-ROG kernel: [11827.971739]
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled
seq=2324984, emitted seq=2324986
2019-08-03T18:51:31.983851+08:00 MGDT-ROG kernel: [11827.971800]
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process X pid
2132 thread X:cs0 pid 2139
2019-08-03T18:51:31.983853+08:00 MGDT-ROG kernel: [11827.971804] amdgpu
0000:0a:00.0: GPU reset begin!
2019-08-03T18:51:32.751834+08:00 MGDT-ROG kernel: [11828.741066] amdgpu:
[powerplay] Failed to send message 0x47, response 0xffffffff
2019-08-03T18:51:32.751846+08:00 MGDT-ROG kernel: [11828.741077] amdgpu:
[powerplay] Failed to send message 0x28, response 0xffffffff
2019-08-03T18:51:32.751849+08:00 MGDT-ROG kernel: [11828.741078] amdgpu:
[powerplay] [SetUclkToHightestDpmLevel] Set hard min uclk failed!
2019-08-03T18:51:32.751850+08:00 MGDT-ROG kernel: [11828.741090] amdgpu:
[powerplay] Failed to send message 0x28, response 0xffffffff
2019-08-03T18:51:32.751852+08:00 MGDT-ROG kernel: [11828.741091] amdgpu:
[powerplay] Attempt to set Hard Min for DCEFCLK Failed!
2019-08-03T18:51:32.751854+08:00 MGDT-ROG kernel: [11828.741102] amdgpu:
[powerplay] Failed to send message 0x28, response 0xffffffff
2019-08-03T18:51:32.751855+08:00 MGDT-ROG kernel: [11828.741102] amdgpu:
[powerplay] [SetHardMinFreq] Set hard min uclk failed!
2019-08-03T18:51:32.751856+08:00 MGDT-ROG kernel: [11828.741113] amdgpu:
[powerplay] Failed to send message 0x26, response 0xffffffff
2019-08-03T18:51:32.751858+08:00 MGDT-ROG kernel: [11828.741114] amdgpu:
[powerplay] Failed to set soft min gfxclk !
2019-08-03T18:51:32.751859+08:00 MGDT-ROG kernel: [11828.741114] amdgpu:
[powerplay] Failed to upload DPM Bootup Levels!
2019-08-03T18:51:32.787843+08:00 MGDT-ROG kernel: [11828.775671] [drm] REG_WAIT
timeout 10us * 3000 tries - dce110_stream_encoder_dp_blank line:951
2019-08-03T18:51:32.787852+08:00 MGDT-ROG kernel: [11828.775672] ------------[
cut here ]------------
2019-08-03T18:51:32.787853+08:00 MGDT-ROG kernel: [11828.775778] WARNING: CPU:
1 PID: 10195 at drivers/gpu/drm/amd/amdgpu/../display/dc/dc_helper.c:329
generic_reg_wait.cold+0x31/0x53 [amdgpu]
2019-08-03T18:51:32.787855+08:00 MGDT-ROG kernel: [11828.775779] Modules linked
in: tun fuse af_packet ebtable_filter ebtables ip6table_filter ip6_tables
iptable_filter ip_tables x_tables bpfilter uvcvideo videobuf2_vmalloc
videobuf2_memops videobuf2_v4l2 snd_usb_audio videobuf2_common snd_usbmidi_lib
videodev snd_rawmidi snd_seq_device media joydev scsi_transport_iscsi msr
nls_iso8859_1 nls_cp437 vfat fat edac_mce_amd kvm_amd kvm irqbypass
snd_hda_codec_realtek crct10dif_pclmul snd_hda_codec_generic crc32_pclmul
ledtrig_audio snd_hda_codec_hdmi ghash_clmulni_intel snd_hda_intel
snd_hda_codec snd_hda_core snd_hwdep aesni_intel eeepc_wmi asus_wmi aes_x86_64
sparse_keymap snd_pcm crypto_simd rfkill cryptd video glue_helper wmi_bmof
mxm_wmi igb snd_timer sp5100_tco snd ptp pcspkr i2c_piix4 pps_core dca k10temp
ccp soundcore gpio_amdpt gpio_generic pcc_cpufreq button acpi_cpufreq btrfs
libcrc32c xor hid_generic usbhid amdgpu raid6_pq amd_iommu_v2 gpu_sched
i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
xhci_pci drm
2019-08-03T18:51:32.787858+08:00 MGDT-ROG kernel: [11828.775807]  crc32c_intel
xhci_hcd usbcore sr_mod cdrom wmi pinctrl_amd l2tp_ppp l2tp_netlink l2tp_core
ip6_udp_tunnel udp_tunnel pppox ppp_generic slhc sg dm_multipath dm_mod
scsi_dh_rdac scsi_dh_emc scsi_dh_alua efivarfs
2019-08-03T18:51:32.787860+08:00 MGDT-ROG kernel: [11828.775817] CPU: 1 PID:
10195 Comm: kworker/1:0 Not tainted 5.2.3-1-default #1 openSUSE Tumbleweed
(unreleased)
2019-08-03T18:51:32.787861+08:00 MGDT-ROG kernel: [11828.775818] Hardware name:
System manufacturer System Product Name/ROG STRIX X470-F GAMING, BIOS 5007
06/17/2019
2019-08-03T18:51:32.787862+08:00 MGDT-ROG kernel: [11828.775822] Workqueue:
events drm_sched_job_timedout [gpu_sched]
2019-08-03T18:51:32.787863+08:00 MGDT-ROG kernel: [11828.775897] RIP:
0010:generic_reg_wait.cold+0x31/0x53 [amdgpu]
2019-08-03T18:51:32.787864+08:00 MGDT-ROG kernel: [11828.775899] Code: 4c 24 18
44 89 fa 89 ee 48 c7 c7 68 7c 75 c0 e8 e9 71 84 f4 83 7b 20 01 0f 84 2b 1b fe
ff 48 c7 c7 d8 7b 75 c0 e8 d3 71 84 f4 <0f> 0b e9 18 1b fe ff 48 c7 c7 d8 7b 75
c0 89 54 24 04 e8 bc 71 84
2019-08-03T18:51:32.787866+08:00 MGDT-ROG kernel: [11828.775901] RSP:
0018:ffffab7acdeb77e8 EFLAGS: 00010282
2019-08-03T18:51:32.787867+08:00 MGDT-ROG kernel: [11828.775902] RAX:
0000000000000024 RBX: ffff960e92c3c880 RCX: 0000000000000006
2019-08-03T18:51:32.787868+08:00 MGDT-ROG kernel: [11828.775903] RDX:
0000000000000007 RSI: 0000000000000096 RDI: ffff960e9e659a10
2019-08-03T18:51:32.787869+08:00 MGDT-ROG kernel: [11828.775903] RBP:
000000000000000a R08: 00000000000004da R09: 0000000000000001
2019-08-03T18:51:32.787870+08:00 MGDT-ROG kernel: [11828.775904] R10:
0000000000000000 R11: 0000000000000001 R12: 0000000000004ee2
2019-08-03T18:51:32.787871+08:00 MGDT-ROG kernel: [11828.775905] R13:
0000000000000bb9 R14: 0000000000000000 R15: 0000000000000bb8
2019-08-03T18:51:32.787872+08:00 MGDT-ROG kernel: [11828.775906] FS: 
0000000000000000(0000) GS:ffff960e9e640000(0000) knlGS:0000000000000000
2019-08-03T18:51:32.787874+08:00 MGDT-ROG kernel: [11828.775907] CS:  0010 DS:
0000 ES: 0000 CR0: 0000000080050033
2019-08-03T18:51:32.787874+08:00 MGDT-ROG kernel: [11828.775907] CR2:
000055d4170da000 CR3: 0000000f03cd6000 CR4: 00000000003406e0
2019-08-03T18:51:32.787875+08:00 MGDT-ROG kernel: [11828.775908] Call Trace:
2019-08-03T18:51:32.787876+08:00 MGDT-ROG kernel: [11828.775982] 
dce110_stream_encoder_dp_blank+0xda/0x120 [amdgpu]
2019-08-03T18:51:32.787877+08:00 MGDT-ROG kernel: [11828.776049] 
core_link_disable_stream+0x32/0x260 [amdgpu]
2019-08-03T18:51:32.787878+08:00 MGDT-ROG kernel: [11828.776054]  ?
printk+0x48/0x4a
2019-08-03T18:51:32.787879+08:00 MGDT-ROG kernel: [11828.776119] 
dce110_reset_hw_ctx_wrap+0xc1/0x1e0 [amdgpu]
2019-08-03T18:51:32.787881+08:00 MGDT-ROG kernel: [11828.776192]  ?
vega20_dpm_force_dpm_level.cold+0x5b/0x90 [amdgpu]
2019-08-03T18:51:32.787882+08:00 MGDT-ROG kernel: [11828.776256] 
dce110_apply_ctx_to_hw+0x3a/0x470 [amdgpu]
2019-08-03T18:51:32.787883+08:00 MGDT-ROG kernel: [11828.776318]  ?
hwmgr_handle_task+0x66/0xc0 [amdgpu]
2019-08-03T18:51:32.787884+08:00 MGDT-ROG kernel: [11828.776322]  ?
mutex_lock+0xe/0x30
2019-08-03T18:51:32.787885+08:00 MGDT-ROG kernel: [11828.776385]  ?
pp_dpm_dispatch_tasks+0x45/0x60 [amdgpu]
2019-08-03T18:51:32.787886+08:00 MGDT-ROG kernel: [11828.776450]  ?
dm_pp_apply_display_requirements+0x1a1/0x1c0 [amdgpu]
2019-08-03T18:51:32.787887+08:00 MGDT-ROG kernel: [11828.776513] 
dc_commit_state_no_check+0x200/0x530 [amdgpu]
2019-08-03T18:51:32.787888+08:00 MGDT-ROG kernel: [11828.776516]  ?
get_page_from_freelist+0x289/0x380
2019-08-03T18:51:32.787889+08:00 MGDT-ROG kernel: [11828.776579] 
dc_commit_state+0x8f/0xb0 [amdgpu]
2019-08-03T18:51:32.787889+08:00 MGDT-ROG kernel: [11828.776644] 
amdgpu_dm_atomic_commit_tail+0x3a6/0xd30 [amdgpu]
2019-08-03T18:51:32.787890+08:00 MGDT-ROG kernel: [11828.776709]  ?
bw_calcs+0x8ac/0x1440 [amdgpu]
2019-08-03T18:51:32.787892+08:00 MGDT-ROG kernel: [11828.776711]  ?
__ww_mutex_lock.isra.0+0x2a/0x780
2019-08-03T18:51:32.787893+08:00 MGDT-ROG kernel: [11828.776714]  ?
_raw_spin_unlock_irqrestore+0x24/0x40
2019-08-03T18:51:32.787893+08:00 MGDT-ROG kernel: [11828.776717]  ?
__wake_up_common_lock+0x7c/0xa0
2019-08-03T18:51:32.787894+08:00 MGDT-ROG kernel: [11828.776719]  ?
wait_for_completion_timeout+0xf3/0x110
2019-08-03T18:51:32.787895+08:00 MGDT-ROG kernel: [11828.776720]  ?
wait_for_completion_interruptible+0x10b/0x150
2019-08-03T18:51:32.787896+08:00 MGDT-ROG kernel: [11828.776728]  ?
commit_tail+0x3c/0x70 [drm_kms_helper]
2019-08-03T18:51:32.787897+08:00 MGDT-ROG kernel: [11828.776735] 
commit_tail+0x3c/0x70 [drm_kms_helper]
2019-08-03T18:51:32.787898+08:00 MGDT-ROG kernel: [11828.776742] 
drm_atomic_helper_commit+0x108/0x110 [drm_kms_helper]
2019-08-03T18:51:32.787899+08:00 MGDT-ROG kernel: [11828.776749] 
drm_atomic_helper_disable_all+0x144/0x160 [drm_kms_helper]
2019-08-03T18:51:32.787900+08:00 MGDT-ROG kernel: [11828.776756] 
drm_atomic_helper_suspend+0x4c/0xe0 [drm_kms_helper]
2019-08-03T18:51:32.787901+08:00 MGDT-ROG kernel: [11828.776820] 
dm_suspend+0x20/0x60 [amdgpu]
2019-08-03T18:51:32.787902+08:00 MGDT-ROG kernel: [11828.776861] 
amdgpu_device_ip_suspend_phase1+0x8b/0xc0 [amdgpu]
2019-08-03T18:51:32.787903+08:00 MGDT-ROG kernel: [11828.776903] 
amdgpu_device_ip_suspend+0x1c/0x60 [amdgpu]
2019-08-03T18:51:32.787904+08:00 MGDT-ROG kernel: [11828.776975] 
amdgpu_device_pre_asic_reset+0x1f4/0x209 [amdgpu]
2019-08-03T18:51:32.787905+08:00 MGDT-ROG kernel: [11828.777047] 
amdgpu_device_gpu_recover+0x67/0x765 [amdgpu]
2019-08-03T18:51:32.787906+08:00 MGDT-ROG kernel: [11828.777106] 
amdgpu_job_timedout+0xf7/0x120 [amdgpu]
2019-08-03T18:51:32.787906+08:00 MGDT-ROG kernel: [11828.777110] 
drm_sched_job_timedout+0x3a/0x70 [gpu_sched]
2019-08-03T18:51:32.787907+08:00 MGDT-ROG kernel: [11828.777113] 
process_one_work+0x1df/0x3c0
2019-08-03T18:51:32.787908+08:00 MGDT-ROG kernel: [11828.777115] 
worker_thread+0x4d/0x400
2019-08-03T18:51:32.787909+08:00 MGDT-ROG kernel: [11828.777117] 
kthread+0xf9/0x130
2019-08-03T18:51:32.787910+08:00 MGDT-ROG kernel: [11828.777119]  ?
process_one_work+0x3c0/0x3c0
2019-08-03T18:51:32.787911+08:00 MGDT-ROG kernel: [11828.777120]  ?
kthread_park+0x80/0x80
2019-08-03T18:51:32.787912+08:00 MGDT-ROG kernel: [11828.777122] 
ret_from_fork+0x27/0x50
2019-08-03T18:51:32.787913+08:00 MGDT-ROG kernel: [11828.777125] ---[ end trace
9aaf1f62ae398b4b ]---
2019-08-03T18:51:37.791882+08:00 MGDT-ROG kernel: [11833.780084]
[drm:atom_op_jump [amdgpu]] *ERROR* atombios stuck in loop for more than 5secs
aborting
2019-08-03T18:51:37.791896+08:00 MGDT-ROG kernel: [11833.780129]
[drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck
executing B0B0 (len 2971, WS 4, PS 0) @ 0xB963
2019-08-03T18:51:37.791898+08:00 MGDT-ROG kernel: [11833.780172]
[drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck
executing AFB0 (len 255, WS 4, PS 0) @ 0xB089
2019-08-03T18:51:37.791899+08:00 MGDT-ROG kernel: [11833.780240]
[drm:dce110_link_encoder_disable_output [amdgpu]] *ERROR*
dce110_link_encoder_disable_output: Failed to execute VBIOS command table!
2019-08-03T18:51:37.791901+08:00 MGDT-ROG kernel: [11833.780240] ------------[
cut here ]------------
2019-08-03T18:51:37.791902+08:00 MGDT-ROG kernel: [11833.780328] WARNING: CPU:
1 PID: 10195 at
drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dce_link_encoder.c:1096
dce110_link_encoder_disable_output+0x13d/0x150 [amdgpu]
2019-08-03T18:51:37.791903+08:00 MGDT-ROG kernel: [11833.780329] Modules linked
in: tun fuse af_packet ebtable_filter ebtables ip6table_filter ip6_tables
iptable_filter ip_tables x_tables bpfilter uvcvideo videobuf2_vmalloc
videobuf2_memops videobuf2_v4l2 snd_usb_audio videobuf2_common snd_usbmidi_lib
videodev snd_rawmidi snd_seq_device media joydev scsi_transport_iscsi msr
nls_iso8859_1 nls_cp437 vfat fat edac_mce_amd kvm_amd kvm irqbypass
snd_hda_codec_realtek crct10dif_pclmul snd_hda_codec_generic crc32_pclmul
ledtrig_audio snd_hda_codec_hdmi ghash_clmulni_intel snd_hda_intel
snd_hda_codec snd_hda_core snd_hwdep aesni_intel eeepc_wmi asus_wmi aes_x86_64
sparse_keymap snd_pcm crypto_simd rfkill cryptd video glue_helper wmi_bmof
mxm_wmi igb snd_timer sp5100_tco snd ptp pcspkr i2c_piix4 pps_core dca k10temp
ccp soundcore gpio_amdpt gpio_generic pcc_cpufreq button acpi_cpufreq btrfs
libcrc32c xor hid_generic usbhid amdgpu raid6_pq amd_iommu_v2 gpu_sched
i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
xhci_pci drm
2019-08-03T18:51:37.791905+08:00 MGDT-ROG kernel: [11833.780356]  crc32c_intel
xhci_hcd usbcore sr_mod cdrom wmi pinctrl_amd l2tp_ppp l2tp_netlink l2tp_core
ip6_udp_tunnel udp_tunnel pppox ppp_generic slhc sg dm_multipath dm_mod
scsi_dh_rdac scsi_dh_emc scsi_dh_alua efivarfs
2019-08-03T18:51:37.791907+08:00 MGDT-ROG kernel: [11833.780365] CPU: 1 PID:
10195 Comm: kworker/1:0 Tainted: G        W         5.2.3-1-default #1 openSUSE
Tumbleweed (unreleased)
2019-08-03T18:51:37.791908+08:00 MGDT-ROG kernel: [11833.780366] Hardware name:
System manufacturer System Product Name/ROG STRIX X470-F GAMING, BIOS 5007
06/17/2019
2019-08-03T18:51:37.791910+08:00 MGDT-ROG kernel: [11833.780370] Workqueue:
events drm_sched_job_timedout [gpu_sched]
2019-08-03T18:51:37.791911+08:00 MGDT-ROG kernel: [11833.780435] RIP:
0010:dce110_link_encoder_disable_output+0x13d/0x150 [amdgpu]
2019-08-03T18:51:37.791912+08:00 MGDT-ROG kernel: [11833.780437] Code: ff ff 48
83 c4 38 5b 5d 41 5c c3 48 c7 c6 c0 c8 6f c0 48 c7 c7 d8 d9 74 c0 e8 cf bb de
ff 48 c7 c7 70 d9 74 c0 e8 61 13 8c f4 <0f> 0b eb d4 66 66 2e 0f 1f 84 00 00 00
00 00 0f 1f 40 00 0f 1f 44
2019-08-03T18:51:37.791913+08:00 MGDT-ROG kernel: [11833.780438] RSP:
0018:ffffab7acdeb77f8 EFLAGS: 00010282
2019-08-03T18:51:37.791914+08:00 MGDT-ROG kernel: [11833.780439] RAX:
0000000000000024 RBX: ffff960e96034a80 RCX: 0000000000000006
2019-08-03T18:51:37.791915+08:00 MGDT-ROG kernel: [11833.780440] RDX:
0000000000000007 RSI: 0000000000000096 RDI: ffff960e9e659a10
2019-08-03T18:51:37.791917+08:00 MGDT-ROG kernel: [11833.780441] RBP:
0000000000000020 R08: 0000000000000518 R09: 0000000000000001
2019-08-03T18:51:37.791918+08:00 MGDT-ROG kernel: [11833.780441] R10:
0000000000000000 R11: 0000000000000001 R12: ffffab7acdeb77fc
2019-08-03T18:51:37.791919+08:00 MGDT-ROG kernel: [11833.780442] R13:
ffff95ffc13c1000 R14: 0000000000000000 R15: ffff9601c92c8188
2019-08-03T18:51:37.791920+08:00 MGDT-ROG kernel: [11833.780443] FS: 
0000000000000000(0000) GS:ffff960e9e640000(0000) knlGS:0000000000000000
2019-08-03T18:51:37.791921+08:00 MGDT-ROG kernel: [11833.780444] CS:  0010 DS:
0000 ES: 0000 CR0: 0000000080050033
2019-08-03T18:51:37.791922+08:00 MGDT-ROG kernel: [11833.780445] CR2:
000055d4170da000 CR3: 0000000f03cd6000 CR4: 00000000003406e0
2019-08-03T18:51:37.791923+08:00 MGDT-ROG kernel: [11833.780446] Call Trace:
2019-08-03T18:51:37.791924+08:00 MGDT-ROG kernel: [11833.780512] 
dp_disable_link_phy+0x73/0x110 [amdgpu]
2019-08-03T18:51:37.791925+08:00 MGDT-ROG kernel: [11833.780576] 
core_link_disable_stream+0xb6/0x260 [amdgpu]
2019-08-03T18:51:37.791926+08:00 MGDT-ROG kernel: [11833.780580]  ?
printk+0x48/0x4a
2019-08-03T18:51:37.791927+08:00 MGDT-ROG kernel: [11833.780642] 
dce110_reset_hw_ctx_wrap+0xc1/0x1e0 [amdgpu]
2019-08-03T18:51:37.791928+08:00 MGDT-ROG kernel: [11833.780716]  ?
vega20_dpm_force_dpm_level.cold+0x5b/0x90 [amdgpu]
2019-08-03T18:51:37.791929+08:00 MGDT-ROG kernel: [11833.780779] 
dce110_apply_ctx_to_hw+0x3a/0x470 [amdgpu]
2019-08-03T18:51:37.791930+08:00 MGDT-ROG kernel: [11833.780840]  ?
hwmgr_handle_task+0x66/0xc0 [amdgpu]
2019-08-03T18:51:37.791931+08:00 MGDT-ROG kernel: [11833.780843]  ?
mutex_lock+0xe/0x30
2019-08-03T18:51:37.791933+08:00 MGDT-ROG kernel: [11833.780905]  ?
pp_dpm_dispatch_tasks+0x45/0x60 [amdgpu]
2019-08-03T18:51:37.791934+08:00 MGDT-ROG kernel: [11833.780969]  ?
dm_pp_apply_display_requirements+0x1a1/0x1c0 [amdgpu]
2019-08-03T18:51:37.791935+08:00 MGDT-ROG kernel: [11833.781032] 
dc_commit_state_no_check+0x200/0x530 [amdgpu]
2019-08-03T18:51:37.791936+08:00 MGDT-ROG kernel: [11833.781036]  ?
get_page_from_freelist+0x289/0x380
2019-08-03T18:51:37.791937+08:00 MGDT-ROG kernel: [11833.781098] 
dc_commit_state+0x8f/0xb0 [amdgpu]
2019-08-03T18:51:37.791938+08:00 MGDT-ROG kernel: [11833.781162] 
amdgpu_dm_atomic_commit_tail+0x3a6/0xd30 [amdgpu]
2019-08-03T18:51:37.791939+08:00 MGDT-ROG kernel: [11833.781227]  ?
bw_calcs+0x8ac/0x1440 [amdgpu]
2019-08-03T18:51:37.791940+08:00 MGDT-ROG kernel: [11833.781229]  ?
__ww_mutex_lock.isra.0+0x2a/0x780
2019-08-03T18:51:37.791941+08:00 MGDT-ROG kernel: [11833.781231]  ?
_raw_spin_unlock_irqrestore+0x24/0x40
2019-08-03T18:51:37.791942+08:00 MGDT-ROG kernel: [11833.781234]  ?
__wake_up_common_lock+0x7c/0xa0
2019-08-03T18:51:37.791943+08:00 MGDT-ROG kernel: [11833.781236]  ?
wait_for_completion_timeout+0xf3/0x110
2019-08-03T18:51:37.791944+08:00 MGDT-ROG kernel: [11833.781237]  ?
wait_for_completion_interruptible+0x10b/0x150
2019-08-03T18:51:37.791945+08:00 MGDT-ROG kernel: [11833.781245]  ?
commit_tail+0x3c/0x70 [drm_kms_helper]
2019-08-03T18:51:37.791946+08:00 MGDT-ROG kernel: [11833.781251] 
commit_tail+0x3c/0x70 [drm_kms_helper]
2019-08-03T18:51:37.791947+08:00 MGDT-ROG kernel: [11833.781258] 
drm_atomic_helper_commit+0x108/0x110 [drm_kms_helper]
2019-08-03T18:51:37.791948+08:00 MGDT-ROG kernel: [11833.781265] 
drm_atomic_helper_disable_all+0x144/0x160 [drm_kms_helper]
2019-08-03T18:51:37.791949+08:00 MGDT-ROG kernel: [11833.781272] 
drm_atomic_helper_suspend+0x4c/0xe0 [drm_kms_helper]
2019-08-03T18:51:37.791950+08:00 MGDT-ROG kernel: [11833.781335] 
dm_suspend+0x20/0x60 [amdgpu]
2019-08-03T18:51:37.791951+08:00 MGDT-ROG kernel: [11833.781377] 
amdgpu_device_ip_suspend_phase1+0x8b/0xc0 [amdgpu]
2019-08-03T18:51:37.791952+08:00 MGDT-ROG kernel: [11833.781418] 
amdgpu_device_ip_suspend+0x1c/0x60 [amdgpu]
2019-08-03T18:51:37.791953+08:00 MGDT-ROG kernel: [11833.781490] 
amdgpu_device_pre_asic_reset+0x1f4/0x209 [amdgpu]
2019-08-03T18:51:37.791954+08:00 MGDT-ROG kernel: [11833.781561] 
amdgpu_device_gpu_recover+0x67/0x765 [amdgpu]
2019-08-03T18:51:37.791955+08:00 MGDT-ROG kernel: [11833.781620] 
amdgpu_job_timedout+0xf7/0x120 [amdgpu]
2019-08-03T18:51:37.791956+08:00 MGDT-ROG kernel: [11833.781624] 
drm_sched_job_timedout+0x3a/0x70 [gpu_sched]
2019-08-03T18:51:37.791957+08:00 MGDT-ROG kernel: [11833.781627] 
process_one_work+0x1df/0x3c0
2019-08-03T18:51:37.791958+08:00 MGDT-ROG kernel: [11833.781629] 
worker_thread+0x4d/0x400
2019-08-03T18:51:37.791959+08:00 MGDT-ROG kernel: [11833.781631] 
kthread+0xf9/0x130
2019-08-03T18:51:37.791960+08:00 MGDT-ROG kernel: [11833.781633]  ?
process_one_work+0x3c0/0x3c0
2019-08-03T18:51:37.791961+08:00 MGDT-ROG kernel: [11833.781634]  ?
kthread_park+0x80/0x80
2019-08-03T18:51:37.791962+08:00 MGDT-ROG kernel: [11833.781636] 
ret_from_fork+0x27/0x50
2019-08-03T18:51:37.791963+08:00 MGDT-ROG kernel: [11833.781639] ---[ end trace
9aaf1f62ae398b4c ]---
2019-08-03T18:51:42.796019+08:00 MGDT-ROG kernel: [11838.784083]
[drm:atom_op_jump [amdgpu]] *ERROR* atombios stuck in loop for more than 5secs
aborting
2019-08-03T18:51:42.796034+08:00 MGDT-ROG kernel: [11838.784127]
[drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck
executing A048 (len 62, WS 0, PS 0) @ 0xA064
2019-08-03T18:51:42.796035+08:00 MGDT-ROG kernel: [11838.784208] amdgpu:
[powerplay] Failed to send message 0x28, response 0xffffffff
2019-08-03T18:51:42.796036+08:00 MGDT-ROG kernel: [11838.784219] amdgpu:
[powerplay] Failed to send message 0x28, response 0xffffffff
2019-08-03T18:51:42.796038+08:00 MGDT-ROG kernel: [11838.784233] amdgpu:
[powerplay] Failed to send message 0x47, response 0xffffffff
2019-08-03T18:51:42.796039+08:00 MGDT-ROG kernel: [11838.784245] amdgpu:
[powerplay] Failed to send message 0x28, response 0xffffffff
2019-08-03T18:51:42.796040+08:00 MGDT-ROG kernel: [11838.784245] amdgpu:
[powerplay] [SetUclkToHightestDpmLevel] Set hard min uclk failed!
2019-08-03T18:51:42.796041+08:00 MGDT-ROG kernel: [11838.784258] amdgpu:
[powerplay] Failed to send message 0x28, response 0xffffffff
2019-08-03T18:51:42.796042+08:00 MGDT-ROG kernel: [11838.784258] amdgpu:
[powerplay] Attempt to set Hard Min for DCEFCLK Failed!
2019-08-03T18:51:42.796044+08:00 MGDT-ROG kernel: [11838.784269] amdgpu:
[powerplay] Failed to send message 0x28, response 0xffffffff
2019-08-03T18:51:42.796045+08:00 MGDT-ROG kernel: [11838.784270] amdgpu:
[powerplay] [SetHardMinFreq] Set hard min uclk failed!
2019-08-03T18:51:42.796046+08:00 MGDT-ROG kernel: [11838.784281] amdgpu:
[powerplay] Failed to send message 0x26, response 0xffffffff
2019-08-03T18:51:42.796047+08:00 MGDT-ROG kernel: [11838.784282] amdgpu:
[powerplay] Failed to set soft min gfxclk !
2019-08-03T18:51:42.796048+08:00 MGDT-ROG kernel: [11838.784282] amdgpu:
[powerplay] Failed to upload DPM Bootup Levels!
2019-08-03T18:51:43.656061+08:00 MGDT-ROG kernel: [11839.645436] amdgpu:
[powerplay] Failed to send message 0x26, response 0xffffffff
2019-08-03T18:51:43.656078+08:00 MGDT-ROG kernel: [11839.645438] amdgpu:
[powerplay] Failed to set soft min gfxclk !
2019-08-03T18:51:43.656080+08:00 MGDT-ROG kernel: [11839.645438] amdgpu:
[powerplay] Failed to upload DPM Bootup Levels!
2019-08-03T18:51:43.656081+08:00 MGDT-ROG kernel: [11839.645449] amdgpu:
[powerplay] Failed to send message 0x7, response 0xffffffff
2019-08-03T18:51:43.656082+08:00 MGDT-ROG kernel: [11839.645450] amdgpu:
[powerplay] [DisableAllSMUFeatures] Failed to disable all smu features!
2019-08-03T18:51:43.656083+08:00 MGDT-ROG kernel: [11839.645450] amdgpu:
[powerplay] [DisableDpmTasks] Failed to disable all smu features!
2019-08-03T18:51:43.656084+08:00 MGDT-ROG kernel: [11839.645451] amdgpu:
[powerplay] [PowerOffAsic] Failed to disable DPM!
2019-08-03T18:51:43.656086+08:00 MGDT-ROG kernel: [11839.645497]
[drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block
<powerplay> failed -5
2019-08-03T18:51:43.911990+08:00 MGDT-ROG kernel: [11839.902893] amdgpu
0000:0a:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0
test failed (-110)
2019-08-03T18:51:43.912001+08:00 MGDT-ROG kernel: [11839.902947]
[drm:gfx_v9_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
2019-08-03T18:51:44.167806+08:00 MGDT-ROG kernel: [11840.159797] [drm] Timeout
wait for RLC serdes 0,0
2019-08-03T18:51:44.191826+08:00 MGDT-ROG kernel: [11840.180793] amdgpu
0000:0a:00.0: GPU mode1 reset
2019-08-03T18:51:44.451982+08:00 MGDT-ROG kernel: [11840.442308] [drm] psp is
not working correctly before mode1 reset!
2019-08-03T18:51:44.451993+08:00 MGDT-ROG kernel: [11840.442310] amdgpu
0000:0a:00.0: GPU mode1 reset failed
2019-08-03T18:51:44.719056+08:00 MGDT-ROG kernel: [11840.710967]
[drm:amdgpu_device_gpu_recover [amdgpu]] *ERROR* ASIC reset failed with error,
-22 for drm dev, 0000:0a:00.0
2019-08-03T18:51:44.719066+08:00 MGDT-ROG kernel: [11840.711014] amdgpu
0000:0a:00.0: GPU reset(1) failed
2019-08-03T18:51:44.719068+08:00 MGDT-ROG kernel: [11840.711033] [drm] Skip
scheduling IBs!
2019-08-03T18:51:44.719068+08:00 MGDT-ROG kernel: [11840.711038] [drm] Skip
scheduling IBs!
2019-08-03T18:51:44.719070+08:00 MGDT-ROG kernel: [11840.711040] [drm] Skip
scheduling IBs!
2019-08-03T18:51:44.719071+08:00 MGDT-ROG kernel: [11840.711043] [drm] Skip
scheduling IBs!
2019-08-03T18:51:44.719072+08:00 MGDT-ROG kernel: [11840.711045] [drm] Skip
scheduling IBs!
2019-08-03T18:51:44.719073+08:00 MGDT-ROG kernel: [11840.711049] [drm] Skip
scheduling IBs!
2019-08-03T18:51:44.719075+08:00 MGDT-ROG kernel: [11840.711051] [drm] Skip
scheduling IBs!
2019-08-03T18:51:44.719076+08:00 MGDT-ROG kernel: [11840.711053] [drm] Skip
scheduling IBs!
2019-08-03T18:51:44.719077+08:00 MGDT-ROG kernel: [11840.711057] [drm] Skip
scheduling IBs!
2019-08-03T18:51:44.719078+08:00 MGDT-ROG kernel: [11840.711059] [drm] Skip
scheduling IBs!
2019-08-03T18:51:44.719079+08:00 MGDT-ROG kernel: [11840.711061] [drm] Skip
scheduling IBs!
2019-08-03T18:51:44.719080+08:00 MGDT-ROG kernel: [11840.711064] [drm] Skip
scheduling IBs!
2019-08-03T18:51:44.719081+08:00 MGDT-ROG kernel: [11840.711066] [drm] Skip
scheduling IBs!
2019-08-03T18:51:44.719082+08:00 MGDT-ROG kernel: [11840.711068] [drm] Skip
scheduling IBs!
2019-08-03T18:51:44.719083+08:00 MGDT-ROG kernel: [11840.711072] [drm] Skip
scheduling IBs!
2019-08-03T18:51:44.719084+08:00 MGDT-ROG kernel: [11840.711075] [drm] Skip
scheduling IBs!
2019-08-03T18:51:44.719085+08:00 MGDT-ROG kernel: [11840.711077] [drm] Skip
scheduling IBs!
2019-08-03T18:51:44.719086+08:00 MGDT-ROG kernel: [11840.711080] [drm] Skip
scheduling IBs!
2019-08-03T18:51:44.719087+08:00 MGDT-ROG kernel: [11840.711083] [drm] Skip
scheduling IBs!
2019-08-03T18:51:44.719088+08:00 MGDT-ROG kernel: [11840.711085] [drm] Skip
scheduling IBs!
2019-08-03T18:51:44.719089+08:00 MGDT-ROG kernel: [11840.711087] [drm] Skip
scheduling IBs!
2019-08-03T18:51:44.719090+08:00 MGDT-ROG kernel: [11840.711090] [drm] Skip
scheduling IBs!
2019-08-03T18:51:44.719091+08:00 MGDT-ROG kernel: [11840.711092] [drm] Skip
scheduling IBs!
2019-08-03T18:51:44.719092+08:00 MGDT-ROG kernel: [11840.711094] [drm] Skip
scheduling IBs!
2019-08-03T18:51:44.719093+08:00 MGDT-ROG kernel: [11840.711096] [drm] Skip
scheduling IBs!
2019-08-03T18:51:44.719094+08:00 MGDT-ROG kernel: [11840.711097] [drm] Skip
scheduling IBs!
2019-08-03T18:51:44.719095+08:00 MGDT-ROG kernel: [11840.711100] [drm] Skip
scheduling IBs!
2019-08-03T18:51:44.719096+08:00 MGDT-ROG kernel: [11840.711102] amdgpu
0000:0a:00.0: GPU reset end with ret = -22
2019-08-03T18:51:44.719097+08:00 MGDT-ROG kernel: [11840.711102] [drm] Skip
scheduling IBs!
2019-08-03T18:51:44.719098+08:00 MGDT-ROG kernel: [11840.711104] [drm] Skip
scheduling IBs!
2019-08-03T18:51:44.719099+08:00 MGDT-ROG kernel: [11840.711106] [drm] Skip
scheduling IBs!
2019-08-03T18:51:54.767980+08:00 MGDT-ROG kernel: [11850.756186]
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled
seq=2324986, emitted seq=2324986
2019-08-03T18:51:54.767994+08:00 MGDT-ROG kernel: [11850.756247]
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process X pid
2132 thread X:cs0 pid 2139
2019-08-03T18:51:54.767996+08:00 MGDT-ROG kernel: [11850.756251] amdgpu
0000:0a:00.0: GPU reset begin!
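
The two `amdgpu_job_timedout` reports in the log above name the same ring and differ only in the signaled sequence number. As a rough aid for reading such logs, here is a minimal Python sketch that extracts the ring name and the gap between the emitted and signaled sequence numbers. The function name `parse_ring_timeouts` and the reading of that gap as "fences still pending" are assumptions made for illustration, not something stated in the report.

```python
import re

# Matches messages such as:
#   ... *ERROR* ring sdma0 timeout, signaled seq=2324984, emitted seq=2324986
RING_TIMEOUT = re.compile(
    r"ring (?P<ring>\S+) timeout, signaled seq=(?P<signaled>\d+), "
    r"emitted seq=(?P<emitted>\d+)"
)


def parse_ring_timeouts(log_text):
    """Return (ring, signaled, emitted, pending) for each timeout report."""
    # Collapse all whitespace so that messages wrapped across lines
    # (as in the mail above) are matched as a single message.
    flat = " ".join(log_text.split())
    results = []
    for m in RING_TIMEOUT.finditer(flat):
        signaled = int(m.group("signaled"))
        emitted = int(m.group("emitted"))
        results.append((m.group("ring"), signaled, emitted, emitted - signaled))
    return results


# Sample text copied from the log above, including the line wrapping.
sample = """
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled
seq=2324984, emitted seq=2324986
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled
seq=2324986, emitted seq=2324986
"""

for entry in parse_ring_timeouts(sample):
    print(entry)
# ('sdma0', 2324984, 2324986, 2)
# ('sdma0', 2324986, 2324986, 0)
```

In this log the first report shows a gap of 2 and the second, after the failed reset, a gap of 0.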


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (71 preceding siblings ...)
  2019-08-03 13:35 ` bugzilla-daemon
@ 2019-08-03 16:54 ` bugzilla-daemon
  2019-08-03 17:43 ` bugzilla-daemon
                   ` (64 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-08-03 16:54 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 563 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #73 from Sylvain BERTRAND <sylvain.bertrand@gmail.com> ---
On Sat, Aug 03, 2019 at 01:35:55PM +0000, bugzilla-daemon@freedesktop.org
wrote:
> [    5.759204] amdgpu 0000:0a:00.0: Direct firmware load for
> amdgpu/vega20_ta.bin failed with error -2
> [    5.759205] amdgpu 0000:0a:00.0: psp v11.0: Failed to load firmware
> "amdgpu/vega20_ta.bin"

Did you get the latest and "greatest" amdgpu firmware package?


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (72 preceding siblings ...)
  2019-08-03 16:54 ` bugzilla-daemon
@ 2019-08-03 17:43 ` bugzilla-daemon
  2019-08-03 18:46 ` bugzilla-daemon
                   ` (63 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-08-03 17:43 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 870 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #74 from Mauro Gaspari <ilvipero@gmx.com> ---
(In reply to Sylvain BERTRAND from comment #73)
> On Sat, Aug 03, 2019 at 01:35:55PM +0000, bugzilla-daemon@freedesktop.org
> wrote:
> > [    5.759204] amdgpu 0000:0a:00.0: Direct firmware load for
> > amdgpu/vega20_ta.bin failed with error -2
> > [    5.759205] amdgpu 0000:0a:00.0: psp v11.0: Failed to load firmware
> > "amdgpu/vega20_ta.bin"
> 
> Did you get the latest and "greatest" amdgpu firmware package?

This is a fresh install I made to test this issue, so for now I have only
installed the packages listed on the openSUSE wiki:
https://en.opensuse.org/SDB:AMDGPU

I have taken a btrfs snapshot with snapper, so if there is anything you want me
to test, I am ready.


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (73 preceding siblings ...)
  2019-08-03 17:43 ` bugzilla-daemon
@ 2019-08-03 18:46 ` bugzilla-daemon
  2019-08-04  5:05 ` bugzilla-daemon
                   ` (62 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-08-03 18:46 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 625 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #75 from Sylvain BERTRAND <sylvain.bertrand@gmail.com> ---
On Sat, Aug 03, 2019 at 05:43:01PM +0000, bugzilla-daemon@freedesktop.org
wrote:
> > > [    5.759204] amdgpu 0000:0a:00.0: Direct firmware load for
> > > amdgpu/vega20_ta.bin failed with error -2
> > > [    5.759205] amdgpu 0000:0a:00.0: psp v11.0: Failed to load firmware
> > > "amdgpu/vega20_ta.bin"

It seems you have a corrupted/old/missing vega20_ta.bin firmware file.
It looks like outdated distro files.
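
For reference, the `-2` in the quoted message is a negated errno value, and errno 2 on Linux is `ENOENT` (no such file or directory), which points to the "missing" case of the three possibilities above. A minimal Python sketch, assuming the firmware would normally live under `/lib/firmware` or `/usr/lib/firmware` (these search roots and both function names are illustrative assumptions, and distributions may place or compress the files differently):

```python
import errno
import os
from pathlib import Path


def describe_kernel_error(code):
    """Translate a negative kernel return code such as -2 into its errno name."""
    number = abs(code)
    name = errno.errorcode.get(number, "UNKNOWN")
    return f"{name}: {os.strerror(number)}"


def firmware_present(name, roots=("/lib/firmware", "/usr/lib/firmware")):
    """Return the first existing path for a firmware file, or None if absent.

    The default roots are assumptions for illustration only.
    """
    for root in roots:
        candidate = Path(root) / name
        if candidate.is_file():
            return candidate
    return None


print(describe_kernel_error(-2))
print(firmware_present("amdgpu/vega20_ta.bin"))
```

If the second call prints `None`, the file is absent from the directories checked, which would be consistent with the load error in the log.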


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (74 preceding siblings ...)
  2019-08-03 18:46 ` bugzilla-daemon
@ 2019-08-04  5:05 ` bugzilla-daemon
  2019-08-04 14:18 ` bugzilla-daemon
                   ` (61 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-08-04  5:05 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 3887 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #76 from Mauro Gaspari <ilvipero@gmx.com> ---
(In reply to Sylvain BERTRAND from comment #75)
> On Sat, Aug 03, 2019 at 05:43:01PM +0000, bugzilla-daemon@freedesktop.org
> wrote:
> > > > [    5.759204] amdgpu 0000:0a:00.0: Direct firmware load for
> > > > amdgpu/vega20_ta.bin failed with error -2
> > > > [    5.759205] amdgpu 0000:0a:00.0: psp v11.0: Failed to load firmware
> > > > "amdgpu/vega20_ta.bin"
> 
> It seems you have a corrupted/old/missing vega20_ta.bin firmware file.
> It looks like outdated distro files.

Hello,
I did a quick search online, and this seems to be a common problem for many
amdgpu users. In other reports, these messages seem to be dismissed as
warnings, with the firmware file treated as not mandatory. I am not an expert
and I do not want to dismiss it here; I am just reporting what I see.

By the way, it is interesting that even my Ubuntu Budgie LTS install, with
Valve's mesa-aco and a different kernel, shows the same warning.

[    5.435346] [drm] amdgpu kernel modesetting enabled.
[    5.435500] fb0: switching to amdgpudrmfb from EFI VGA
[    5.735058] amdgpu 0000:0a:00.0: No more image in the PCI ROM
[    5.735102] amdgpu 0000:0a:00.0: VRAM: 16368M 0x0000008000000000 -
0x00000083FEFFFFFF (16368M used)
[    5.735103] amdgpu 0000:0a:00.0: GART: 512M 0x0000000000000000 -
0x000000001FFFFFFF
[    5.735104] amdgpu 0000:0a:00.0: AGP: 267894784M 0x0000008400000000 -
0x0000FFFFFFFFFFFF
[    5.735185] [drm] amdgpu: 16368M of VRAM memory ready
[    5.735186] [drm] amdgpu: 16368M of GTT memory ready.
[    5.739656] amdgpu 0000:0a:00.0: Direct firmware load for
amdgpu/vega20_ta.bin failed with error -2
[    5.739659] amdgpu 0000:0a:00.0: psp v11.0: Failed to load firmware
"amdgpu/vega20_ta.bin"
[    6.354308] fbcon: amdgpudrmfb (fb0) is primary device
[    6.354490] amdgpu 0000:0a:00.0: fb0: amdgpudrmfb frame buffer device
[    6.384079] amdgpu 0000:0a:00.0: ring gfx uses VM inv eng 0 on hub 0
[    6.384080] amdgpu 0000:0a:00.0: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[    6.384081] amdgpu 0000:0a:00.0: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[    6.384082] amdgpu 0000:0a:00.0: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[    6.384083] amdgpu 0000:0a:00.0: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[    6.384084] amdgpu 0000:0a:00.0: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[    6.384084] amdgpu 0000:0a:00.0: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[    6.384085] amdgpu 0000:0a:00.0: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[    6.384086] amdgpu 0000:0a:00.0: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[    6.384087] amdgpu 0000:0a:00.0: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
[    6.384088] amdgpu 0000:0a:00.0: ring sdma0 uses VM inv eng 0 on hub 1
[    6.384089] amdgpu 0000:0a:00.0: ring page0 uses VM inv eng 1 on hub 1
[    6.384089] amdgpu 0000:0a:00.0: ring sdma1 uses VM inv eng 4 on hub 1
[    6.384090] amdgpu 0000:0a:00.0: ring page1 uses VM inv eng 5 on hub 1
[    6.384090] amdgpu 0000:0a:00.0: ring uvd_0 uses VM inv eng 6 on hub 1
[    6.384091] amdgpu 0000:0a:00.0: ring uvd_enc_0.0 uses VM inv eng 7 on hub 1
[    6.384092] amdgpu 0000:0a:00.0: ring uvd_enc_0.1 uses VM inv eng 8 on hub 1
[    6.384092] amdgpu 0000:0a:00.0: ring uvd_1 uses VM inv eng 9 on hub 1
[    6.384093] amdgpu 0000:0a:00.0: ring uvd_enc_1.0 uses VM inv eng 10 on hub 1
[    6.384094] amdgpu 0000:0a:00.0: ring uvd_enc_1.1 uses VM inv eng 11 on hub 1
[    6.384094] amdgpu 0000:0a:00.0: ring vce0 uses VM inv eng 12 on hub 1
[    6.384095] amdgpu 0000:0a:00.0: ring vce1 uses VM inv eng 13 on hub 1
[    6.384096] amdgpu 0000:0a:00.0: ring vce2 uses VM inv eng 14 on hub 1
[    7.067068] [drm] Initialized amdgpu 3.27.0 20150101 for 0000:0a:00.0 on minor 0


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
From: bugzilla-daemon @ 2019-08-04 14:18 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1007 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #77 from Sylvain BERTRAND <sylvain.bertrand@gmail.com> ---
On Sun, Aug 04, 2019 at 05:05:52AM +0000, bugzilla-daemon@freedesktop.org
wrote:
> By the way, Interesting to see that even my ubuntu budgie LTS with valve
> mesa-aco and different kernel, has the same warning.
> [    5.739656] amdgpu 0000:0a:00.0: Direct firmware load for
> amdgpu/vega20_ta.bin failed with error -2
> [    5.739659] amdgpu 0000:0a:00.0: psp v11.0: Failed to load firmware
> "amdgpu/vega20_ta.bin"

I don't know of an AMD GPU part able to run without properly loaded firmware.

That would have to be confirmed by official AMD devs which are the sole ppl
with that knowledge.

In the very probable case that the firmware _must_ be loaded for proper gpu
operations, you have to tell the maintainers of the distros you use to update
their linux/amdgpu firmware package.


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
From: bugzilla-daemon @ 2019-08-04 16:17 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1615 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #78 from Mauro Gaspari <ilvipero@gmx.com> ---
(In reply to Sylvain BERTRAND from comment #77)
> On Sun, Aug 04, 2019 at 05:05:52AM +0000, bugzilla-daemon@freedesktop.org
> wrote:
> > By the way, Interesting to see that even my ubuntu budgie LTS with valve
> > mesa-aco and different kernel, has the same warning.
> > [    5.739656] amdgpu 0000:0a:00.0: Direct firmware load for
> > amdgpu/vega20_ta.bin failed with error -2
> > [    5.739659] amdgpu 0000:0a:00.0: psp v11.0: Failed to load firmware
> > "amdgpu/vega20_ta.bin"
> 
> I don't know of an AMD GPU part able to run without properly loaded firmware.
> 
> That would have to be confirmed by official AMD devs which are the sole ppl
> with that knowledge.
> 
> In the very probable case that the firmware _must_ be loaded for proper gpu
> operations, you have to tell the maintainers of the distros you use to update
> their linux/amdgpu firmware package.

I believe so too; it makes sense that a piece of hardware needs the correct
firmware to work properly.
I will open bugs for openSUSE and Ubuntu, ask the question, and point to this
bug tracker. Let's see what comes out. I will report back as I hear from the
distribution maintainers.

I am using a Radeon VII at the moment. Is there anyone with a Vega 64 or
Vega 56 who can run the same tests and let me know if they see the same issue?
I am happy to include those cards in my bug reports if someone can confirm.


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
From: bugzilla-daemon @ 2019-08-05  5:54 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 346 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #79 from Alex Deucher <alexdeucher@gmail.com> ---
the ta bin is optional.  It's only used for server cards with xgmi and ras
features.  Consumer cards don't support those features and don't use it.
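
For anyone checking their own logs, failed firmware loads can be listed with a
small shell filter. This is only a sketch: fw_load_failures is a hypothetical
helper name, and it assumes the kernel message wording shown in the dmesg
output earlier in this thread. It joins lines first because the logs posted
here are wrapped at 80 columns.

```shell
# Print "<firmware file> <errno>" for every failed direct firmware load
# found in a kernel log read from stdin.
# fw_load_failures is an illustrative name, not an existing tool.
fw_load_failures() {
    # Join wrapped lines, pull out each failure message, keep file and errno.
    tr '\n' ' ' |
        grep -oE 'Direct firmware load for +[^ ]+ +failed with error +-?[0-9]+' |
        awk '{ print $5, $NF }'
}
```

Typical use would be "dmesg | fw_load_failures". Error -2 is -ENOENT, meaning
the firmware file was not found; per the comment above, that is harmless for
vega20_ta.bin on consumer cards.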


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
From: bugzilla-daemon @ 2019-08-05  6:16 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 609 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #80 from Mauro Gaspari <ilvipero@gmx.com> ---
(In reply to Alex Deucher from comment #79)
> the ta bin is optional.  It's only used for server cards with xgmi and ras
> features.  Consumer cards don't support those features and don't use it.

Alex,
Thank you for confirming this. Good to know.
Regarding the logs and dmesg I posted above, in comment #72, do you see
anything useful? Are there any other specific tests I can do to help pinpoint
the issue?


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
From: bugzilla-daemon @ 2019-08-07  9:53 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 319 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #81 from Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> ---
Can anyone provide an apitrace/RenderDoc capture that can reliably reproduce
the crash/freeze?


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
From: bugzilla-daemon @ 2019-08-11  9:31 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 6959 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #82 from Mauro Gaspari <ilvipero@gmx.com> ---
(In reply to Pierre-Eric Pelloux-Prayer from comment #81)
> Can anyone provide a apitrace/renderdoc capture that can reliably reproduce
> the crash/freeze?

Hello, Sadly my freezes are hard to reproduce. Sometimes I can play for a day
with no freeze, sometimes it freezes in 10 minutes, one hour, and so on.

I had another freeze today:

OS: openSUSE Tumbleweed x86_64 
Kernel: 5.2.5-1-default
Resolution: 3440x1440
DE: Xfce
WM: Xfwm4
CPU: AMD Ryzen 7 2700X (16) @ 3.700GHz
GPU: AMD ATI Radeon VII
Memory: 3791MiB / 64387MiB 
OpenGL version string: 4.5 (Compatibility Profile) Mesa 19.1.3

Game: EVE Online, Wine+DXVK (CrossOver 18.5.0), vsync off, frame limiter off
Problem description: After roughly 1 hour of gameplay, the desktop froze for a
few seconds but managed to recover. The game did not recover, so I killed the
process.

DMESG:

[20612.721860] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=12880412, emitted seq=12880414
[20612.721921] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process exefile.exe pid 1980 thread exefile.ex:cs0 pid 2057
[20612.721925] amdgpu 0000:0a:00.0: GPU reset begin!
[20613.526448] amdgpu 0000:0a:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
[20613.526502] [drm:gfx_v9_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
[20613.547524] amdgpu 0000:0a:00.0: GPU mode1 reset
[20614.055810] [drm] psp mode1 reset succeed 
[20614.128815] amdgpu 0000:0a:00.0: GPU reset succeeded, trying to resume
[20614.128943] [drm] PCIE GART of 512M enabled (table at 0x0000008000300000).
[20614.129304] [drm] PSP is resuming...
[20614.192202] [drm] reserve 0x400000 from 0x8000c00000 for PSP TMR SIZE
[20614.649220] [drm] UVD and UVD ENC initialized successfully.
[20614.748872] [drm] VCE initialized successfully.
[20615.271942] [drm] Fence fallback timer expired on ring gfx
[20615.783826] [drm] Fence fallback timer expired on ring comp_1.0.0
[20616.616023] [drm] Fence fallback timer expired on ring uvd_1
[20617.127844] [drm] Fence fallback timer expired on ring uvd_enc_1.0
[20617.639836] [drm] Fence fallback timer expired on ring uvd_enc_1.1
[20617.739606] [drm] recover vram bo from shadow start
[20617.742231] [drm] recover vram bo from shadow done
[20617.742233] [drm] Skip scheduling IBs!
[20617.742234] [drm] Skip scheduling IBs!
[20617.742259] amdgpu 0000:0a:00.0: GPU reset(2) succeeded!
[20617.742289] [drm] Skip scheduling IBs!
[20617.742309] [drm] Skip scheduling IBs!
[20617.742314] [drm] Skip scheduling IBs!
[20617.742316] [drm] Skip scheduling IBs!
[20617.742318] [drm] Skip scheduling IBs!
[20617.742320] [drm] Skip scheduling IBs!
[20617.743840] [drm] Skip scheduling IBs!
[20617.744006] [drm] Skip scheduling IBs!
[20617.744180] [drm] Skip scheduling IBs!
[20617.744450] [drm] Skip scheduling IBs!

System Logs:

2019-08-11T17:13:10.377029+08:00 MGDT-ROG kernel: [20612.721860] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=12880412, emitted seq=12880414
2019-08-11T17:13:10.377046+08:00 MGDT-ROG kernel: [20612.721921] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process exefile.exe pid 1980 thread exefile.ex:cs0 pid 2057
2019-08-11T17:13:10.377047+08:00 MGDT-ROG kernel: [20612.721925] amdgpu 0000:0a:00.0: GPU reset begin!
2019-08-11T17:13:11.182763+08:00 MGDT-ROG kernel: [20613.526448] amdgpu 0000:0a:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
2019-08-11T17:13:11.182776+08:00 MGDT-ROG kernel: [20613.526502] [drm:gfx_v9_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
2019-08-11T17:13:11.202766+08:00 MGDT-ROG kernel: [20613.547524] amdgpu 0000:0a:00.0: GPU mode1 reset
2019-08-11T17:13:11.714757+08:00 MGDT-ROG kernel: [20614.055810] [drm] psp mode1 reset succeed
2019-08-11T17:13:11.786740+08:00 MGDT-ROG kernel: [20614.128815] amdgpu 0000:0a:00.0: GPU reset succeeded, trying to resume
2019-08-11T17:13:11.786749+08:00 MGDT-ROG kernel: [20614.128943] [drm] PCIE GART of 512M enabled (table at 0x0000008000300000).
2019-08-11T17:13:11.786751+08:00 MGDT-ROG kernel: [20614.129304] [drm] PSP is resuming...
2019-08-11T17:13:11.850739+08:00 MGDT-ROG kernel: [20614.192202] [drm] reserve 0x400000 from 0x8000c00000 for PSP TMR SIZE
2019-08-11T17:13:12.306756+08:00 MGDT-ROG kernel: [20614.649220] [drm] UVD and UVD ENC initialized successfully.
2019-08-11T17:13:12.406756+08:00 MGDT-ROG kernel: [20614.748872] [drm] VCE initialized successfully.
2019-08-11T17:13:12.926899+08:00 MGDT-ROG kernel: [20615.271942] [drm] Fence fallback timer expired on ring gfx
2019-08-11T17:13:13.438783+08:00 MGDT-ROG kernel: [20615.783826] [drm] Fence fallback timer expired on ring comp_1.0.0
2019-08-11T17:13:14.274773+08:00 MGDT-ROG kernel: [20616.616023] [drm] Fence fallback timer expired on ring uvd_1
2019-08-11T17:13:14.671435+08:00 MGDT-ROG tracker-store[4801]: OK
2019-08-11T17:13:14.672970+08:00 MGDT-ROG systemd[2481]: tracker-store.service: Succeeded.
2019-08-11T17:13:14.782896+08:00 MGDT-ROG kernel: [20617.127844] [drm] Fence fallback timer expired on ring uvd_enc_1.0
2019-08-11T17:13:15.294768+08:00 MGDT-ROG kernel: [20617.639836] [drm] Fence fallback timer expired on ring uvd_enc_1.1
2019-08-11T17:13:15.394759+08:00 MGDT-ROG kernel: [20617.739606] [drm] recover vram bo from shadow start
2019-08-11T17:13:15.397215+08:00 MGDT-ROG kernel: [20617.742231] [drm] recover vram bo from shadow done
2019-08-11T17:13:15.397227+08:00 MGDT-ROG kernel: [20617.742233] [drm] Skip scheduling IBs!
2019-08-11T17:13:15.397228+08:00 MGDT-ROG kernel: [20617.742234] [drm] Skip scheduling IBs!
2019-08-11T17:13:15.397231+08:00 MGDT-ROG kernel: [20617.742259] amdgpu 0000:0a:00.0: GPU reset(2) succeeded!
2019-08-11T17:13:15.397233+08:00 MGDT-ROG kernel: [20617.742289] [drm] Skip scheduling IBs!
2019-08-11T17:13:15.397235+08:00 MGDT-ROG kernel: [20617.742309] [drm] Skip scheduling IBs!
2019-08-11T17:13:15.397242+08:00 MGDT-ROG kernel: [20617.742314] [drm] Skip scheduling IBs!
2019-08-11T17:13:15.397262+08:00 MGDT-ROG kernel: [20617.742316] [drm] Skip scheduling IBs!
2019-08-11T17:13:15.397265+08:00 MGDT-ROG kernel: [20617.742318] [drm] Skip scheduling IBs!
2019-08-11T17:13:15.397268+08:00 MGDT-ROG kernel: [20617.742320] [drm] Skip scheduling IBs!
2019-08-11T17:13:15.402744+08:00 MGDT-ROG kernel: [20617.743840] [drm] Skip scheduling IBs!
2019-08-11T17:13:15.402753+08:00 MGDT-ROG kernel: [20617.744006] [drm] Skip scheduling IBs!
2019-08-11T17:13:15.402755+08:00 MGDT-ROG kernel: [20617.744180] [drm] Skip scheduling IBs!
2019-08-11T17:13:15.402757+08:00 MGDT-ROG kernel: [20617.744450] [drm] Skip scheduling IBs!
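
To compare timeouts across reports, the ring name and the two fence sequence
numbers can be pulled out of a log like the one above. A rough sketch,
assuming the "ring <name> timeout, signaled seq=..., emitted seq=..." wording
shown here; ring_timeouts is a hypothetical helper name, and reading the gap
between the two numbers as "jobs submitted but not yet completed" is my
interpretation, not something confirmed in this thread.

```shell
# Summarise amdgpu ring timeouts from a kernel log read from stdin.
# Output per event: <ring> signaled=<n> emitted=<m> pending=<m-n>
# ring_timeouts is an illustrative name, not an existing tool.
ring_timeouts() {
    # Join wrapped lines so a message split across two lines still matches.
    tr '\n' ' ' |
        grep -oE 'ring [^ ]+ timeout, +signaled seq=[0-9]+, +emitted seq=[0-9]+' |
        sed -E 's/ring ([^ ]+) timeout, +signaled seq=([0-9]+), +emitted seq=([0-9]+)/\1 \2 \3/' |
        awk '{ print $1, "signaled=" $2, "emitted=" $3, "pending=" ($3 - $2) }'
}
```

Typical use would be "dmesg | ring_timeouts". An event that appears in both
dmesg and the system journal is printed once per occurrence in the input.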


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
From: bugzilla-daemon @ 2019-08-12  2:50 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 572 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #83 from J. Andrew Lanz-O'Brien <jlanzobr@gmail.com> ---
I can confirm that this bug is still present as of August 11, 2019 on kernel
5.2.8 with Mesa 19.1.4. Borderlands 2 hard-locked my system about 5 times
tonight. Manually setting the power profile didn't help either, i.e. these two
commands:

echo manual > /sys/class/drm/card0/device/power_dpm_force_performance_level
echo 7 > /sys/class/drm/card0/device/pp_dpm_sclk
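
The two commands above can be wrapped so that they fail cleanly when the sysfs
files are missing and can be undone afterwards. This is a sketch under
assumptions: the function names are made up for illustration, the device
directory is the card0 path used in this thread, and level 7 only makes sense
if the card lists that level when pp_dpm_sclk is read. Writing to these files
requires root.

```shell
# Pin the shader clock to a single DPM level.
#   $1: sysfs device directory, e.g. /sys/class/drm/card0/device
#   $2: DPM level index (7 in this thread)
# force_sclk_level and restore_auto are illustrative names.
force_sclk_level() {
    dev="$1"
    level="$2"
    if [ ! -w "$dev/power_dpm_force_performance_level" ]; then
        echo "cannot write to $dev (wrong card, or not root?)" >&2
        return 1
    fi
    echo manual > "$dev/power_dpm_force_performance_level"
    echo "$level" > "$dev/pp_dpm_sclk"
}

# Hand clock selection back to the driver.
restore_auto() {
    echo auto > "$1/power_dpm_force_performance_level"
}
```

For example: "force_sclk_level /sys/class/drm/card0/device 7" before starting
a game and "restore_auto /sys/class/drm/card0/device" afterwards. The setting
does not survive a reboot.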


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
From: bugzilla-daemon @ 2019-08-12  8:16 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1005 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #84 from Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> ---
(In reply to Mauro Gaspari from comment #82)
> (In reply to Pierre-Eric Pelloux-Prayer from comment #81)
> > Can anyone provide a apitrace/renderdoc capture that can reliably reproduce
> > the crash/freeze?
> 
> Hello, Sadly my freezes are hard to reproduce. Sometimes I can play for a
> day with no freeze, sometimes it freezes in 10 minutes, one hour, and so on.
> 

Ok.

This patch https://patchwork.freedesktop.org/series/64792/ might help: it won't
fix any issue, but when a timeout is detected it should allow the soft recovery
of the GPU.

Other things worth trying: setting AMD_DEBUG environment variables. I'd
suggest:

   AMD_DEBUG=zerovram,nodma,nodpbb

There are others (see mesa/src/gallium/drivers/radeonsi/si_pipe.c) to try if
these don't help.
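
One way to apply the suggested flags to a single program rather than the whole
session is to set the variable only for that command. A minimal sketch;
run_with_amd_debug is a hypothetical wrapper name, and the flag list is the
one suggested above.

```shell
# Run a command with the suggested AMD_DEBUG flags set in its environment
# only; the calling shell's environment is left untouched.
# run_with_amd_debug is an illustrative name, not an existing tool.
run_with_amd_debug() {
    env AMD_DEBUG=zerovram,nodma,nodpbb "$@"
}
```

For example: "run_with_amd_debug glxgears". For Steam games, the equivalent
is to put "AMD_DEBUG=zerovram,nodma,nodpbb %command%" in the game's launch
options.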


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
From: bugzilla-daemon @ 2019-08-12 14:10 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1484 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #85 from Mauro Gaspari <ilvipero@gmx.com> ---
(In reply to Pierre-Eric Pelloux-Prayer from comment #84)
> (In reply to Mauro Gaspari from comment #82)
> > (In reply to Pierre-Eric Pelloux-Prayer from comment #81)
> > > Can anyone provide a apitrace/renderdoc capture that can reliably reproduce
> > > the crash/freeze?
> > 
> > Hello, Sadly my freezes are hard to reproduce. Sometimes I can play for a
> > day with no freeze, sometimes it freezes in 10 minutes, one hour, and so on.
> > 
> 
> Ok.
> 
> This patch https://patchwork.freedesktop.org/series/64792/ might help: it
> won't fix any issue, but when a timeout is detected it should allow the soft
> recovery of the GPU.
> 
> Other things worth trying: setting AMD_DEBUG environment variables. I'd
> suggest:
> 
>    AMD_DEBUG=zerovram,nodma,nodpbb
> 
> There are others (see mesa/src/gallium/drivers/radeonsi/si_pipe.c) to try if
> these don't help.

Thank you.

I will first try to reintroduce the kernel parameters I previously used. Do you
think those can help at all?

CPU
rcu_nocbs=0-15 (adjust to the number of cores of your cpu)
idle=nomwait
processor.max_cstate=5
pcie_aspm=off 

GPU
amdgpu.dc=1
amdgpu.vm_update_mode=0
amdgpu.dpm=-1
amdgpu.ppfeaturemask=0xffffffff
amdgpu.vm_fault_stop=2
amdgpu.vm_debug=1
amdgpu.gpu_recovery=0


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
From: bugzilla-daemon @ 2019-08-13 15:59 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 982 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #86 from Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> ---
(In reply to Mauro Gaspari from comment #85)
> I will first try to reintroduce the kernel parameters I previously used.
> Do you think those can help at all?
> [...]
> GPU
> amdgpu.dc=1

Not needed: dc will be automatically enabled on recent GPU

> amdgpu.vm_update_mode=0

Shouldn't be needed since it should be the default value. 

> amdgpu.dpm=-1

Not needed: this is the default value

> amdgpu.ppfeaturemask=0xffffffff

The only difference with the default value is that you're enabling Overdrive.
I'd suggest to keep the default parameter here.

> amdgpu.vm_fault_stop=2

I think this one isn't helpful (it's a debugging tool)

> amdgpu.vm_debug=1

This one can help.

> amdgpu.gpu_recovery=0

No opinion on this one :)
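
To see which values the driver is actually running with, and so whether a
parameter differs from its default at all, the loaded module's parameters can
be read back from sysfs. A sketch only: show_amdgpu_params is a hypothetical
helper name, and it assumes each of the parameters discussed above is exposed
as a readable file under /sys/module/amdgpu/parameters on the running kernel.

```shell
# Print the current value of each amdgpu parameter discussed above.
#   $1: parameters directory (defaults to /sys/module/amdgpu/parameters)
# show_amdgpu_params is an illustrative name, not an existing tool.
show_amdgpu_params() {
    dir="${1:-/sys/module/amdgpu/parameters}"
    for name in dc vm_update_mode dpm ppfeaturemask vm_fault_stop vm_debug gpu_recovery; do
        if [ -r "$dir/$name" ]; then
            printf '%s=%s\n' "$name" "$(cat "$dir/$name")"
        else
            # Parameter not exposed by this kernel, or module not loaded.
            printf '%s=<not exposed>\n' "$name"
        fi
    done
}
```

Running it once without any amdgpu.* options on the kernel command line and
once with them shows which options actually change anything.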


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
From: bugzilla-daemon @ 2019-08-13 16:19 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1400 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #87 from Mauro Gaspari <ilvipero@gmx.com> ---
(In reply to Pierre-Eric Pelloux-Prayer from comment #86)
> (In reply to Mauro Gaspari from comment #85)
> > I will first try to reintroduce the kernel parameters I previously used.
> > Do you think those can help at all?
> > [...]
> > GPU
> > amdgpu.dc=1
> 
> Not needed: dc will be automatically enabled on recent GPU
> 
> > amdgpu.vm_update_mode=0
> 
> Shouldn't be needed since it should be the default value. 
> 
> > amdgpu.dpm=-1
> 
> Not needed: this is the default value
> 
> > amdgpu.ppfeaturemask=0xffffffff
> 
> The only difference with the default value is that you're enabling Overdrive.
> I'd suggest to keep the default parameter here.
> 
> > amdgpu.vm_fault_stop=2
> 
> I think this one isn't helpful (it's a debugging tool)
> 
> > amdgpu.vm_debug=1
> 
> This one can help.
> 
> > amdgpu.gpu_recovery=0
> 
> No opinion on this one :)

Thank you!

I am currently testing on Ubuntu Budgie with the Valve-released Mesa-ACO and,
so far, I have had no freezes or crashes: a couple of days without incidents.
But as I posted previously, it is all a bit random, so I think I will need to
run this setup for at least a week.

I will report back soon with my findings.


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
From: bugzilla-daemon @ 2019-08-30 19:01 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 512 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #88 from Sam <samueldgv@mailbox.org> ---
I have recently started to get even more frequent freezes, now even on Vulkan,
on kernel 5.2.10.

The power profile workaround still works (for me) and is the only way to
avoid them:

# echo manual > /sys/class/drm/card0/device/power_dpm_force_performance_level
# echo 7 > /sys/class/drm/card0/device/pp_dpm_sclk


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
From: bugzilla-daemon @ 2019-08-31  1:00 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 253 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #89 from Jaap Buurman <jaapbuurman@gmail.com> ---
Freezes are getting way more frequent for me as well :(


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
From: bugzilla-daemon @ 2019-08-31  5:21 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1467 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #90 from Mauro Gaspari <ilvipero@gmx.com> ---
@Sam and @Jaap Buurman

Can you please help and post system info regarding your crash? I hope that with
more detailed reports, we can get better help.

Example:

OS Info can be taken from neofetch:
System info:
OS: openSUSE Tumbleweed
Kernel: 5.2.10-1-default
Resolution: 3440x1440
CPU: AMD Ryzen 7 2700X (16) @ 3.700GHz
GPU: AMD ATI Radeon VII 
Memory: 6308MiB / 64387MiB 


Mesa info can be taken from this command:
glxinfo | grep "OpenGL version" 
OpenGL version string: 4.5 (Compatibility Profile) Mesa 19.1.5


Game being played: Eve Online
Native or Wine or Wine+DXVK: Wine+DXVK Directx11


Crash type: Game crash? Full system freeze? System freeze where you can still
drop to a tty?



DMESG output after the crash:
sudo dmesg | grep amdgpu



systemd logs output after the crash (If your system did not freeze and you can
get it before reboot):
sudo journalctl -b | grep amdgpu


systemd logs output after the crash (If your system froze and you get logs
after reboot):
sudo journalctl -b -1 | grep amdgpu

If your distribution does not keep persistent systemd logs by default, you can
enable them following your distribution's documentation. Example for openSUSE:
https://www.suse.com/documentation/sles-12/book_sle_admin/data/journalctl_persistent.html
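
Note that filtering on "amdgpu" alone drops kernel lines that carry only the
[drm] prefix, such as the "Skip scheduling IBs!" lines in comment #82. The
sketch below keeps both kinds of line and checks whether journald is keeping
logs across reboots. The function names are made up for illustration, and the
persistence check assumes journald's default Storage=auto, where logs persist
only if /var/log/journal exists.

```shell
# Keep kernel log lines from amdgpu and from the DRM core (stdin to stdout).
# gpu_lines and journal_is_persistent are illustrative names.
gpu_lines() {
    grep -E 'amdgpu|\[drm'
}

# Succeed if the persistent journal directory exists.
#   $1: journal directory (defaults to /var/log/journal)
journal_is_persistent() {
    [ -d "${1:-/var/log/journal}" ]
}
```

For example: "sudo dmesg | gpu_lines" right after a recovered hang, or
"sudo journalctl -k -b -1 | gpu_lines" after a forced reboot, provided
journal_is_persistent succeeds.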


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
From: bugzilla-daemon @ 2019-08-31 22:38 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 260 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #91 from Wilko Bartels <me@jasondaigo.de> ---
how big are your swap partitions guys? just toying around here :-)


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (90 preceding siblings ...)
  2019-08-31 22:38 ` bugzilla-daemon
@ 2019-09-01 22:49 ` bugzilla-daemon
  2019-09-02  7:48 ` bugzilla-daemon
                   ` (45 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-09-01 22:49 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 2332 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #92 from Jaap Buurman <jaapbuurman@gmail.com> ---
(In reply to Mauro Gaspari from comment #90)
> @Sam and @Jaap Buurman
> 
> Can you please help and post system info regarding your crash? I hope that
> with more detailed reports, we can get better help.

OS: Arch Linux x86_64
Host: AB350-Gaming 3
Kernel: 5.2.11-arch1-1-ARCH
Uptime: 1 min
Packages: 1229 (pacman)
Shell: bash 5.0.9
Terminal: /dev/pts/0
CPU: AMD Ryzen 7 1800X (16) @ 3.600GHz
GPU: AMD ATI Radeon RX Vega 56/64
Memory: 1178MiB / 48304MiB



> Mesa info can be taken from this command:
> glxinfo | grep "OpenGL version" 

[jaap@Jaap-Desktop ~]$ glxinfo | grep "OpenGL version"
OpenGL version string: 4.5 (Compatibility Profile) Mesa 19.3.0-devel (git-db73bde35c)

I am running this version because I was trying out the mesa-aco from the AUR. I
experienced the same crashes with the regular mesa drivers from Arch's official
repositories.

> Game being played: 

World of Warcraft: Classic Wine/DXVK 1.3.2

> Crash type: Game crash? Full System freeze? System freeze but still can drop
> to tty?

GPU doesn't successfully reset. Cannot drop to a different tty. However, I am
able to access logs via SSH. Full dmesg log: https://pastebin.com/E2071wHF

> DMESG output after the crash:
> sudo dmesg | grep amdgpu

https://pastebin.com/2kWpeP1y

> systemd logs output after the crash (If your system did not freeze and you
> can get it before reboot):
> sudo journalctl -b | grep amdgpu

https://pastebin.com/4e1PkJ39

> systemd logs output after the crash (If your system froze and you get logs
> after reboot):
> sudo journalctl -b -1 | grep amdgpu

https://pastebin.com/4mqXNsNQ



Hopefully this information is detailed enough to assist in tracking down the
root cause of the issue!


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (91 preceding siblings ...)
  2019-09-01 22:49 ` bugzilla-daemon
@ 2019-09-02  7:48 ` bugzilla-daemon
  2019-09-02 10:07 ` bugzilla-daemon
                   ` (44 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-09-02  7:48 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 585 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #93 from Wilko Bartels <me@jasondaigo.de> ---
(In reply to Wilko Bartels from comment #91)
> how big are your swap partitions guys? just toying around here :-)

Also, I want to know if anyone else on Arch has tested amdgpu-pro yet?
I have only played for 3 hours so far. We all know that doesn't mean anything :-)
But fingers crossed.
I also have no idea how to confirm it's even being used. The kernel module
shows amdgpu in both cases, right?
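
One possible way to check, given as a sketch only: the Mesa reports elsewhere
in this thread all carry "Mesa" in the "OpenGL version string" line, so the
absence of that word suggests a different OpenGL userspace driver is active.
The helper name classify_gl_version is made up, and treating every non-Mesa
string as "non-mesa" is an assumption, not a reliable AMDGPU-PRO detector.

```shell
#!/bin/sh
# Sketch: guess the active OpenGL userspace driver from the
# "OpenGL version string" line. classify_gl_version is a made-up helper.
classify_gl_version() {
    case "$1" in
        *Mesa*) echo "mesa" ;;      # Mesa puts its name in the version string
        "")     echo "unknown" ;;   # glxinfo missing or no display available
        *)      echo "non-mesa" ;;  # some other OpenGL implementation
    esac
}

# Example use (needs glxinfo and a running X session):
# classify_gl_version "$(glxinfo | grep 'OpenGL version')"
```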


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (92 preceding siblings ...)
  2019-09-02  7:48 ` bugzilla-daemon
@ 2019-09-02 10:07 ` bugzilla-daemon
  2019-09-04 20:41 ` bugzilla-daemon
                   ` (43 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-09-02 10:07 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1375 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #94 from Mauro Gaspari <ilvipero@gmx.com> ---
(In reply to Wilko Bartels from comment #93)
> (In reply to Wilko Bartels from comment #91)
> > how big are your swap partitions guys? just toying around here :-)
> 
> also i wanna know if anyone else on arch tested the amdgpu-pro yet?
> i played only 3 hours now. we all know that doesnt mean anything :-)
> but fingers crossed.
> i also have no idea how to confirm its even used. the kernel module showing
> amdgpu in both circumstances right?

Hello,
I am testing on multiple distributions with different Mesa drivers. Swap size
is 2GB to 8GB depending on the distro. With 64GB of RAM, my swap is constantly
empty.
So far the best performance I have had is on Ubuntu Budgie 18.04 with MESA-ACO
released by Valve. I have had no crashes in quite some time. But I have not had
much time to play lately, so I need more time to test.

Regarding AMDGPU-PRO, I tested it on Ubuntu a very long time ago, and it was
quite bad. But I think it makes sense to test and compare. I will install
another Ubuntu Budgie 18.04 on a separate SSD and use it with AMDGPU-PRO to see
whether the same issues occur there as with AMDGPU.

Thanks, and let me know how AMDGPU-PRO works on Arch.


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (93 preceding siblings ...)
  2019-09-02 10:07 ` bugzilla-daemon
@ 2019-09-04 20:41 ` bugzilla-daemon
  2019-09-07  3:48 ` bugzilla-daemon
                   ` (42 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-09-04 20:41 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 5664 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #95 from koala_man <spam@vidarholen.net> ---
I am also seeing this issue on my stock Ubuntu. 

>OS Info can be taken from neofetch
OS: Ubuntu 19.04 x86_64
Host: All Series
Kernel: 5.0.0-27-generic
Uptime: 8 mins
Packages: 2671 (dpkg), 6 (flatpak), 10 (snap)
Shell: bash 5.0.3
Terminal: /dev/pts/1
CPU: Intel i5-4690 (4) @ 3.900GHz
GPU: Intel HD Graphics
GPU: AMD ATI Radeon RX Vega 64
Memory: 861MiB / 23976MiB

> glxinfo | grep "OpenGL version" 
OpenGL version string: 4.5 (Compatibility Profile) Mesa 19.0.8

>Game being played
glxgears in a window, no other applications running

>Native or Wine or Wine+DXVK
Native

> Crash type: 
X crashed with a colorful pattern and stopped responding to Ctrl-Alt-Fx. `ssh`
still works. The X server does not accept new commands, e.g.
`DISPLAY=:0 glxgears`

>sudo dmesg | grep amdgpu
[    2.328917] [drm] amdgpu kernel modesetting enabled.
[    2.331916] fb0: switching to amdgpudrmfb from EFI VGA
[    2.333325] amdgpu 0000:03:00.0: No more image in the PCI ROM
[    2.333400] amdgpu 0000:03:00.0: VRAM: 8176M 0x000000F400000000 - 0x000000F5FEFFFFFF (8176M used)
[    2.333401] amdgpu 0000:03:00.0: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
[    2.333403] amdgpu 0000:03:00.0: AGP: 267419648M 0x000000F800000000 - 0x0000FFFFFFFFFFFF
[    2.333866] [drm] amdgpu: 8176M of VRAM memory ready
[    2.333870] [drm] amdgpu: 8176M of GTT memory ready.
[    2.871622] fbcon: amdgpudrmfb (fb0) is primary device
[    2.929315] amdgpu 0000:03:00.0: fb0: amdgpudrmfb frame buffer device
[    2.944233] amdgpu 0000:03:00.0: ring gfx uses VM inv eng 0 on hub 0
[    2.944249] amdgpu 0000:03:00.0: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[    2.944264] amdgpu 0000:03:00.0: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[    2.944279] amdgpu 0000:03:00.0: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[    2.944294] amdgpu 0000:03:00.0: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[    2.944308] amdgpu 0000:03:00.0: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[    2.944323] amdgpu 0000:03:00.0: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[    2.944338] amdgpu 0000:03:00.0: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[    2.944353] amdgpu 0000:03:00.0: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[    2.944368] amdgpu 0000:03:00.0: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
[    2.944382] amdgpu 0000:03:00.0: ring sdma0 uses VM inv eng 0 on hub 1
[    2.944396] amdgpu 0000:03:00.0: ring page0 uses VM inv eng 1 on hub 1
[    2.944410] amdgpu 0000:03:00.0: ring sdma1 uses VM inv eng 4 on hub 1
[    2.944424] amdgpu 0000:03:00.0: ring page1 uses VM inv eng 5 on hub 1
[    2.944438] amdgpu 0000:03:00.0: ring uvd_0 uses VM inv eng 6 on hub 1
[    2.944452] amdgpu 0000:03:00.0: ring uvd_enc_0.0 uses VM inv eng 7 on hub 1
[    2.944467] amdgpu 0000:03:00.0: ring uvd_enc_0.1 uses VM inv eng 8 on hub 1
[    2.944482] amdgpu 0000:03:00.0: ring vce0 uses VM inv eng 9 on hub 1
[    2.944496] amdgpu 0000:03:00.0: ring vce1 uses VM inv eng 10 on hub 1
[    2.944510] amdgpu 0000:03:00.0: ring vce2 uses VM inv eng 11 on hub 1
[    2.945073] [drm] Initialized amdgpu 3.27.0 20150101 for 0000:03:00.0 on minor 1
[  288.676190] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=72560, emitted seq=72562
[  288.676350] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process glxgears pid 2963 thread glxgears:cs0 pid 2964
[  288.676358] amdgpu 0000:03:00.0: GPU reset begin!
[  288.759763] amdgpu 0000:03:00.0: GPU reset
[  289.208563] RIP: 0010:amdgpu_cs_ioctl+0xaa3/0x1320 [amdgpu]
[  289.208604]  ? amdgpu_cs_find_mapping+0x120/0x120 [amdgpu]
[  289.208647]  ? amdgpu_cs_find_mapping+0x120/0x120 [amdgpu]
[  289.208673]  amdgpu_drm_ioctl+0x4f/0x80 [amdgpu]
[  289.208690] Modules linked in: aufs overlay cmac bnep binfmt_misc
nls_iso8859_1 snd_hda_codec_ca0132 snd_hda_codec_realtek snd_hda_codec_generic
snd_hda_codec_hdmi ledtrig_audio snd_hda_intel snd_hda_codec snd_hda_core
snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi btusb input_leds
btrtl btbcm btintel bluetooth eeepc_wmi asus_wmi snd_seq ecdh_generic
intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp sparse_keymap
kvm_intel intel_cstate intel_rapl_perf snd_seq_device snd_timer wmi_bmof snd
soundcore mei_me mei tpm_infineon mac_hid acpi_pad sch_fq_codel parport_pc
ppdev lp parport ip_tables x_tables autofs4 algif_skcipher af_alg hid_generic
usbhid hid dm_crypt crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
aesni_intel amdgpu i915 kvmgt vfio_mdev mdev chash aes_x86_64 amd_iommu_v2
crypto_simd vfio_iommu_type1 gpu_sched cryptd glue_helper ttm vfio ahci libahci
i2c_i801 kvm mxm_wmi lpc_ich irqbypass i2c_algo_bit pata_acpi e1000e
drm_kms_helper syscopyarea sysfillrect
[  289.208743] RIP: 0010:amdgpu_cs_ioctl+0xaa3/0x1320 [amdgpu]
[  289.395715] amdgpu 0000:03:00.0: GPU reset succeeded, trying to resume
[  289.395813] [drm:amdgpu_device_gpu_recover [amdgpu]] *ERROR* VRAM is lost!
[  289.969158] amdgpu 0000:03:00.0: GPU reset(2) succeeded!
[  289.969333] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[  289.969519] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!


>sudo journalctl -b | grep amdgpu

Same as dmesg output (after dropping timestamps), verified by vimdiff.

>Other

No swap, 144 Hz monitor. The GPU was very hot to the touch considering it had
only run glxgears @ 144 fps for 5 minutes.
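
For comparing reports like this one, the ring timeouts can be pulled out of a
log with a short filter. This is a sketch only: ring_timeouts is a made-up
helper name, and it assumes each "ring ... timeout" message is on a single
line, as dmesg prints it before mail wrapping.

```shell
#!/bin/sh
# Sketch: print "<ring> <signaled seq> <emitted seq>" for every amdgpu
# ring timeout found on stdin. ring_timeouts is an illustrative name.
ring_timeouts() {
    sed -n 's/.*\*ERROR\* ring \([^ ]*\) timeout, signaled seq=\([0-9]*\), emitted seq=\([0-9]*\).*/\1 \2 \3/p'
}

# Example use:
# sudo dmesg | ring_timeouts
```

For the log above this would print `gfx 72560 72562`.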


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (94 preceding siblings ...)
  2019-09-04 20:41 ` bugzilla-daemon
@ 2019-09-07  3:48 ` bugzilla-daemon
  2019-09-07  3:50 ` bugzilla-daemon
                   ` (41 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-09-07  3:48 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 31419 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #96 from Rodney A Morris <rodamorris@gmail.com> ---
(In reply to Mauro Gaspari from comment #90)

I am experiencing periodic lockups with various games, including Hearts of Iron
IV, BATTLETECH, and Stellaris, all played through Steam.  Below is the most
recent crash, which occurred after less than 5 minutes of Hearts of Iron IV.



> 
> OS Info can be taken from neofetch:
> System info:

OS: Fedora release 30 (Thirty) x86_64
Kernel: 5.2.11-200.fc30.x86_64+debug
Uptime: 11 mins
Packages: 2198 (rpm), 27 (flatpak)
Shell: bash 5.0.7
Resolution: 2560x1440
DE: GNOME 3.32.2
WM: GNOME Shell
WM Theme: Adwaita
Theme: Adapta-Nokto-Eta [GTK2/3]
Icons: Adwaita [GTK2/3]
Terminal: tilix
CPU: Intel i7-6850K (12) @ 4.000GHz
GPU: AMD ATI Radeon RX Vega 56/64
Memory: 1666MiB / 32045MiB

> 
> Mesa info can be taken from this command:
> glxinfo | grep "OpenGL version" 

OpenGL version string: 4.5 (Compatibility Profile) Mesa 19.1.5

> 
> Game being played: 

Hearts of Iron IV through Steam for Linux

> Native or Wine or Wine+DXVK:

Native

> 
> Crash type: Game crash? Full System freeze? System freeze but still can drop
> to tty?

The screen suddenly goes black while the music continues to play for less than
a minute; then the music begins to loop and the computer reboots.

> 
> DMESG output after the crash:
> sudo dmesg | grep amdgpu

Here is the pertinent part of dmesg with kernel debugging turned on.  Some of
the information about the crash would not be captured by grepping for amdgpu.
The entire dmesg is provided as an attachment.

[46957.810300] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out or interrupted!
[46962.941366] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=2446766, emitted seq=2446767
[46962.941453] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process hoi4 pid 24014 thread hoi4:cs0 pid 24015
[46962.941459] amdgpu 0000:06:00.0: GPU reset begin!

[46962.942698] ======================================================
[46962.942700] WARNING: possible circular locking dependency detected
[46962.942702] 5.2.11-200.fc30.x86_64+debug #1 Not tainted
[46962.942704] ------------------------------------------------------
[46962.942705] kworker/3:0/20416 is trying to acquire lock:
[46962.942708] 00000000a4a3593f (&(&ring->fence_drv.lock)->rlock){-.-.}, at:
dma_fence_remove_callback+0x1a/0x60
[46962.942717] 
               but task is already holding lock:
[46962.942718] 00000000d45cbf2b (&(&sched->job_list_lock)->rlock){-.-.}, at:
drm_sched_stop+0x34/0x130 [gpu_sched]
[46962.942724] 
               which lock already depends on the new lock.

[46962.942725] 
               the existing dependency chain (in reverse order) is:
[46962.942727] 
               -> #1 (&(&sched->job_list_lock)->rlock){-.-.}:
[46962.942735]        _raw_spin_lock_irqsave+0x49/0x83
[46962.942738]        drm_sched_process_job+0x4d/0x180 [gpu_sched]
[46962.942741]        dma_fence_signal+0x111/0x1a0
[46962.942794]        amdgpu_fence_process+0xa3/0x100 [amdgpu]
[46962.942858]        sdma_v4_0_process_trap_irq+0x8d/0xa0 [amdgpu]
[46962.942918]        amdgpu_irq_dispatch+0xc0/0x250 [amdgpu]
[46962.942978]        amdgpu_ih_process+0x8d/0x110 [amdgpu]
[46962.943038]        amdgpu_irq_handler+0x1b/0x50 [amdgpu]
[46962.943043]        __handle_irq_event_percpu+0x3f/0x290
[46962.943046]        handle_irq_event_percpu+0x31/0x80
[46962.943048]        handle_irq_event+0x34/0x51
[46962.943053]        handle_edge_irq+0x83/0x1a0
[46962.943057]        handle_irq+0x1c/0x30
[46962.943059]        do_IRQ+0x61/0x120
[46962.943063]        ret_from_intr+0x0/0x22
[46962.943067]        cpuidle_enter_state+0xc9/0x450
[46962.943069]        cpuidle_enter+0x29/0x40
[46962.943074]        do_idle+0x1ec/0x280
[46962.943076]        cpu_startup_entry+0x19/0x20
[46962.943079]        start_secondary+0x189/0x1e0
[46962.943083]        secondary_startup_64+0xa4/0xb0
[46962.943087] 
               -> #0 (&(&ring->fence_drv.lock)->rlock){-.-.}:
[46962.943095]        lock_acquire+0xa2/0x1b0
[46962.943105]        _raw_spin_lock_irqsave+0x49/0x83
[46962.943109]        dma_fence_remove_callback+0x1a/0x60
[46962.943114]        drm_sched_stop+0x59/0x130 [gpu_sched]
[46962.943225]        amdgpu_device_pre_asic_reset+0x41/0x20c [amdgpu]
[46962.943338]        amdgpu_device_gpu_recover+0x77/0x788 [amdgpu]
[46962.943413]        amdgpu_job_timedout+0x109/0x130 [amdgpu]
[46962.943418]        drm_sched_job_timedout+0x40/0x70 [gpu_sched]
[46962.943421]        process_one_work+0x272/0x5e0
[46962.943423]        worker_thread+0x50/0x3b0
[46962.943427]        kthread+0x108/0x140
[46962.943431]        ret_from_fork+0x3a/0x50
[46962.943432] 
               other info that might help us debug this:

[46962.943435]  Possible unsafe locking scenario:

[46962.943437]        CPU0                    CPU1
[46962.943438]        ----                    ----
[46962.943439]   lock(&(&sched->job_list_lock)->rlock);
[46962.943441]                                lock(&(&ring->fence_drv.lock)->rlock);
[46962.943443]                                lock(&(&sched->job_list_lock)->rlock);
[46962.943445]   lock(&(&ring->fence_drv.lock)->rlock);
[46962.943447] 
                *** DEADLOCK ***

[46962.943449] 5 locks held by kworker/3:0/20416:
[46962.943450]  #0: 0000000043c92b99 ((wq_completion)events){+.+.}, at: process_one_work+0x1e9/0x5e0
[46962.943456]  #1: 000000000c360f0c ((work_completion)(&(&sched->work_tdr)->work)){+.+.}, at: process_one_work+0x1e9/0x5e0
[46962.943459]  #2: 000000007a135814 (&adev->lock_reset){+.+.}, at: amdgpu_device_lock_adev+0x17/0x39 [amdgpu]
[46962.943543]  #3: 00000000e83f7d6b (&dqm->lock_hidden){+.+.}, at: kgd2kfd_pre_reset+0x30/0x60 [amdgpu]
[46962.943614]  #4: 00000000d45cbf2b (&(&sched->job_list_lock)->rlock){-.-.}, at: drm_sched_stop+0x34/0x130 [gpu_sched]
[46962.943620] 
               stack backtrace:
[46962.943629] CPU: 3 PID: 20416 Comm: kworker/3:0 Not tainted 5.2.11-200.fc30.x86_64+debug #1
[46962.943631] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X99 Taichi, BIOS P1.80 04/06/2018
[46962.943636] Workqueue: events drm_sched_job_timedout [gpu_sched]
[46962.943638] Call Trace:
[46962.943648]  dump_stack+0x85/0xc0
[46962.943654]  print_circular_bug.cold+0x15c/0x195
[46962.943658]  __lock_acquire+0x167c/0x1c90
[46962.943664]  lock_acquire+0xa2/0x1b0
[46962.943668]  ? dma_fence_remove_callback+0x1a/0x60
[46962.943674]  _raw_spin_lock_irqsave+0x49/0x83
[46962.943677]  ? dma_fence_remove_callback+0x1a/0x60
[46962.943680]  dma_fence_remove_callback+0x1a/0x60
[46962.943684]  drm_sched_stop+0x59/0x130 [gpu_sched]
[46962.943764]  amdgpu_device_pre_asic_reset+0x41/0x20c [amdgpu]
[46962.943847]  amdgpu_device_gpu_recover+0x77/0x788 [amdgpu]
[46962.943923]  amdgpu_job_timedout+0x109/0x130 [amdgpu]
[46962.943930]  drm_sched_job_timedout+0x40/0x70 [gpu_sched]
[46962.943934]  process_one_work+0x272/0x5e0
[46962.943938]  worker_thread+0x50/0x3b0
[46962.943942]  kthread+0x108/0x140
[46962.943945]  ? process_one_work+0x5e0/0x5e0
[46962.943948]  ? kthread_park+0x80/0x80
[46962.943952]  ret_from_fork+0x3a/0x50
[46962.961034] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error received: 0000:00:03.0
[46962.961044] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
[46962.961048] pcieport 0000:00:03.0: AER:   device [8086:6f08] error status/mask=00004000/00000000
[46962.961051] pcieport 0000:00:03.0: AER:    [14] CmpltTO                (First)
[46962.961149] pcieport 0000:00:03.0: AER: Device recovery failed
[46963.955209] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring page1 timeout, signaled seq=95391072, emitted seq=95391072
[46963.955328] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
[46963.955336] amdgpu 0000:06:00.0: GPU reset begin!
[46968.050083] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:47:crtc-0] flip_done timed out
[46973.170223] [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:47:crtc-0] hw_done or flip_done timed out
[46983.410080] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:47:crtc-0] flip_done timed out
[46993.650098] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:45:plane-5] flip_done timed out
[46993.962192] amdgpu: [powerplay] No response from smu
[46993.962195] amdgpu: [powerplay] Failed message: 0xe, input parameter: 0x0, error code: 0x0
[46994.277773] amdgpu: [powerplay] No response from smu
[46994.593416] amdgpu: [powerplay] No response from smu
[46994.593420] amdgpu: [powerplay] Failed message: 0x42, input parameter: 0x1, error code: 0x0
[46994.908354] amdgpu: [powerplay] No response from smu
[46995.223718] amdgpu: [powerplay] No response from smu
[46995.223722] amdgpu: [powerplay] Failed message: 0x24, input parameter: 0x0, error code: 0x0
[46995.286504] [drm] REG_WAIT timeout 10us * 3500 tries - dce_mi_free_dmif line:634
[46995.286506] ------------[ cut here ]------------
[46995.286605] WARNING: CPU: 3 PID: 20416 at drivers/gpu/drm/amd/amdgpu/../display/dc/dc_helper.c:329 generic_reg_wait.cold+0x31/0x53 [amdgpu]
[46995.286606] Modules linked in: vhost_net vhost tap rfcomm xt_CHECKSUM
xt_MASQUERADE tun bridge stp llc nf_conntrack_netbios_ns nf_conntrack_broadcast
xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4
xt_conntrack ebtable_nat ip6table_nat ip6table_mangle ip6table_raw
ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw
iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_set
nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter
ip_tables bnep nct6775 hwmon_vid intel_rapl vfat fat arc4 x86_pkg_temp_thermal
intel_powerclamp coretemp fuse kvm_intel kvm iwlmvm irqbypass iTCO_wdt
iTCO_vendor_support mac80211 crct10dif_pclmul crc32_pclmul
snd_hda_codec_realtek ghash_clmulni_intel intel_cstate snd_hda_codec_generic
iwlwifi snd_hda_codec_hdmi ledtrig_audio intel_uncore snd_hda_intel
intel_rapl_perf cfg80211 snd_hda_codec btusb mxm_wmi snd_hda_core btrtl btbcm
snd_hwdep btintel snd_seq i2c_i801 lpc_ich bluetooth
[46995.286626]  snd_seq_device joydev snd_pcm ecdh_generic snd_timer rfkill ecc
mei_me snd mei soundcore pcc_cpufreq binfmt_misc auth_rpcgss sunrpc amdgpu
amd_iommu_v2 gpu_sched ttm drm_kms_helper crc32c_intel igb uas drm usb_storage
dca mpt3sas i2c_algo_bit e1000e nvme raid_class nvme_core scsi_transport_sas
wmi
[46995.286638] CPU: 3 PID: 20416 Comm: kworker/3:0 Not tainted 5.2.11-200.fc30.x86_64+debug #1
[46995.286639] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X99 Taichi, BIOS P1.80 04/06/2018
[46995.286643] Workqueue: events drm_sched_job_timedout [gpu_sched]
[46995.286682] RIP: 0010:generic_reg_wait.cold+0x31/0x53 [amdgpu]
[46995.286684] Code: 4c 24 18 44 89 fa 89 ee 48 c7 c7 78 93 80 c0 e8 45 fd a0 ca 83 7b 20 01 0f 84 27 11 fe ff 48 c7 c7 70 92 80 c0 e8 2f fd a0 ca <0f> 0b e9 14 11 fe ff 48 c7 c7 70 92 80 c0 89 54 24 04 e8 18 fd a0
[46995.286685] RSP: 0018:ffff9cd009b3f728 EFLAGS: 00010246
[46995.286687] RAX: 0000000000000024 RBX: ffff8ada6be8a780 RCX: 0000000000000006
[46995.286688] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8ada7ebd9c80
[46995.286689] RBP: 000000000000000a R08: 0000000000000001 R09: 0000000000000000
[46995.286690] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000000035af
[46995.286691] R13: 0000000000000dad R14: 0000000000000001 R15: 0000000000000dac
[46995.286692] FS:  0000000000000000(0000) GS:ffff8ada7ea00000(0000) knlGS:0000000000000000
[46995.286694] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[46995.286695] CR2: 0000085777c78000 CR3: 00000003cb612005 CR4: 00000000003606e0
[46995.286696] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[46995.286697] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[46995.286698] Call Trace:
[46995.286741]  dce_mi_free_dmif+0xef/0x150 [amdgpu]
[46995.286780]  dce110_reset_hw_ctx_wrap+0x14a/0x1e0 [amdgpu]
[46995.286819]  dce110_apply_ctx_to_hw+0x4a/0x490 [amdgpu]
[46995.286843]  ? amdgpu_pm_compute_clocks.part.0+0xcb/0x610 [amdgpu]
[46995.286882]  ? dm_pp_apply_display_requirements+0x19e/0x1c0 [amdgpu]
[46995.286920]  dc_commit_state+0x262/0x580 [amdgpu]
[46995.286925]  ? vsnprintf+0x3aa/0x4f0
[46995.286965]  amdgpu_dm_atomic_commit_tail+0xc34/0x1970 [amdgpu]
[46995.286971]  ? console_unlock+0x363/0x5d0
[46995.286976]  ? __irq_work_queue_local+0x50/0x60
[46995.286977]  ? irq_work_queue+0x4d/0x60
[46995.286979]  ? wake_up_klogd+0x37/0x40
[46995.286984]  ? wait_for_completion_timeout+0x4c/0x190
[46995.286987]  ? _raw_spin_unlock_irq+0x29/0x40
[46995.286989]  ? wait_for_completion_timeout+0x75/0x190
[46995.287016]  ? commit_tail+0x3c/0x70 [drm_kms_helper]
[46995.287021]  commit_tail+0x3c/0x70 [drm_kms_helper]
[46995.287026]  drm_atomic_helper_commit+0xe3/0x150 [drm_kms_helper]
[46995.287031]  drm_atomic_helper_disable_all+0x14c/0x160 [drm_kms_helper]
[46995.287035]  drm_atomic_helper_suspend+0x66/0x100 [drm_kms_helper]
[46995.287076]  dm_suspend+0x20/0x60 [amdgpu]
[46995.287098]  amdgpu_device_ip_suspend_phase1+0x91/0xc0 [amdgpu]
[46995.287123]  amdgpu_device_ip_suspend+0x1c/0x60 [amdgpu]
[46995.287164]  amdgpu_device_pre_asic_reset+0x1f7/0x20c [amdgpu]
[46995.287204]  amdgpu_device_gpu_recover+0x77/0x788 [amdgpu]
[46995.287242]  amdgpu_job_timedout+0x109/0x130 [amdgpu]
[46995.287246]  drm_sched_job_timedout+0x40/0x70 [gpu_sched]
[46995.287249]  process_one_work+0x272/0x5e0
[46995.287252]  worker_thread+0x50/0x3b0
[46995.287256]  kthread+0x108/0x140
[46995.287258]  ? process_one_work+0x5e0/0x5e0
[46995.287260]  ? kthread_park+0x80/0x80
[46995.287263]  ret_from_fork+0x3a/0x50
[46995.287267] irq event stamp: 6288284
[46995.287269] hardirqs last  enabled at (6288283): [<ffffffff8bb04d8b>] _raw_spin_unlock_irqrestore+0x4b/0x60
[46995.287271] hardirqs last disabled at (6288284): [<ffffffff8bb05533>] _raw_spin_lock_irqsave+0x23/0x83
[46995.287273] softirqs last  enabled at (6288276): [<ffffffff8be0035d>] __do_softirq+0x35d/0x468
[46995.287276] softirqs last disabled at (6288269): [<ffffffff8b0f07a2>] irq_exit+0x102/0x110
[46995.287277] ---[ end trace 6a2158c4cfef5172 ]---
[46995.603082] amdgpu: [powerplay] No response from smu
[46995.918767] amdgpu: [powerplay] No response from smu
[46995.918770] amdgpu: [powerplay] Failed message: 0x4c, input parameter: 0x1, error code: 0x0
[46996.233769] amdgpu: [powerplay] No response from smu
[46996.549255] amdgpu: [powerplay] No response from smu
[46996.549258] amdgpu: [powerplay] Failed message: 0x4c, input parameter: 0x3, error code: 0x0
[46996.865320] amdgpu: [powerplay] No response from smu
[46997.181203] amdgpu: [powerplay] No response from smu
[46997.181206] amdgpu: [powerplay] Failed message: 0x9, input parameter: 0xf4, error code: 0x0
[46997.495804] amdgpu: [powerplay] No response from smu
[46997.811227] amdgpu: [powerplay] No response from smu
[46997.811231] amdgpu: [powerplay] Failed message: 0xa, input parameter: 0xa0b000, error code: 0x0
[46998.126794] amdgpu: [powerplay] No response from smu
[46998.442559] amdgpu: [powerplay] No response from smu
[46998.442561] amdgpu: [powerplay] Failed message: 0xe, input parameter: 0x0, error code: 0x0
[46998.756884] amdgpu: [powerplay] No response from smu
[46999.072680] amdgpu: [powerplay] No response from smu
[46999.072684] amdgpu: [powerplay] Failed message: 0x4, input parameter: 0x400, error code: 0x0
[46999.388310] amdgpu: [powerplay] No response from smu
[46999.704067] amdgpu: [powerplay] No response from smu
[46999.704069] amdgpu: [powerplay] Failed message: 0x42, input parameter: 0x1, error code: 0x0
[47000.019626] amdgpu: [powerplay] No response from smu
[47000.334247] amdgpu: [powerplay] No response from smu
[47000.334251] amdgpu: [powerplay] Failed message: 0x24, input parameter: 0x0, error code: 0x0
[47000.350026] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error received: 0000:00:03.0
[47000.350043] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
[47000.350052] pcieport 0000:00:03.0: AER:   device [8086:6f08] error status/mask=00004000/00000000
[47000.350061] pcieport 0000:00:03.0: AER:    [14] CmpltTO                (First)
[47000.350202] pcieport 0000:00:03.0: AER: Device recovery failed
[47000.367437] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error received: 0000:00:03.0
[47000.367443] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
[47000.367444] pcieport 0000:00:03.0: AER:   device [8086:6f08] error status/mask=00004000/00000000
[47000.367446] pcieport 0000:00:03.0: AER:    [14] CmpltTO                (First)
[47000.367486] pcieport 0000:00:03.0: AER: Device recovery failed
[47000.384977] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error received: 0000:00:03.0
[47000.384982] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
[47000.384983] pcieport 0000:00:03.0: AER:   device [8086:6f08] error status/mask=00004000/00000000
[47000.384985] pcieport 0000:00:03.0: AER:    [14] CmpltTO                (First)
[47000.385055] pcieport 0000:00:03.0: AER: Device recovery failed
[47000.402521] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error received: 0000:00:03.0
[47000.402530] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
[47000.402532] pcieport 0000:00:03.0: AER:   device [8086:6f08] error status/mask=00004000/00000000
[47000.402535] pcieport 0000:00:03.0: AER:    [14] CmpltTO                (First)
[47000.402578] pcieport 0000:00:03.0: AER: Device recovery failed
[47000.420068] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error received: 0000:00:03.0
[47000.420079] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
[47000.420085] pcieport 0000:00:03.0: AER:   device [8086:6f08] error status/mask=00004000/00000000
[47000.420090] pcieport 0000:00:03.0: AER:    [14] CmpltTO                (First)
[47000.420186] pcieport 0000:00:03.0: AER: Device recovery failed
[47000.437608] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error received: 0000:00:03.0
[47000.437617] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
[47000.437621] pcieport 0000:00:03.0: AER:   device [8086:6f08] error status/mask=00004000/00000000
[47000.437625] pcieport 0000:00:03.0: AER:    [14] CmpltTO                (First)
[47000.437726] pcieport 0000:00:03.0: AER: Device recovery failed
[47000.455143] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error received: 0000:00:03.0
[47000.455151] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[47000.455154] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[47000.455157] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[47000.455209] pcieport 0000:00:03.0: AER: Device recovery failed
[47000.472688] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[47000.472698] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[47000.472703] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[47000.472708] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[47000.472826] pcieport 0000:00:03.0: AER: Device recovery failed
[47000.490225] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[47000.490232] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[47000.490236] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[47000.490239] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[47000.490289] pcieport 0000:00:03.0: AER: Device recovery failed
[47000.507760] pcieport 0000:00:03.0: AER: Multiple Uncorrected (Non-Fatal)
error received: 0000:00:03.0
[47000.735787] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[47000.735791] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[47000.735793] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[47000.735824] pcieport 0000:00:03.0: AER: Device recovery failed
[47000.735826] pcieport 0000:00:03.0: AER: Multiple Uncorrected (Non-Fatal)
error received: 0000:00:03.0
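
A note for readers decoding the AER lines above: the status value 00004000 has
only bit 14 set, which is the bit the kernel labels CmpltTO (Completion
Timeout). The following is a minimal sketch of that arithmetic, assuming the
standard PCIe AER Uncorrectable Error Status bit layout. The function name and
the lookup table are hypothetical helpers written for illustration, not kernel
or library APIs, and the mask half of the field is parsed but not interpreted.

```python
# Bit positions of the PCIe AER Uncorrectable Error Status register.
# The labels are descriptive names chosen for this sketch; only "CmpltTO"
# appears verbatim in the log above.
UNCORRECTABLE_BITS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout (CmpltTO)",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request",
}


def decode_aer_status(status_mask):
    """Return the names of the bits set in the status half of 'status/mask'."""
    status_hex, _mask_hex = status_mask.split("/")
    status = int(status_hex, 16)
    return [
        UNCORRECTABLE_BITS.get(bit, "bit %d" % bit)
        for bit in range(32)
        if status & (1 << bit)
    ]


# Value copied from the "error status/mask=" lines in the log.
print(decode_aer_status("00004000/00000000"))
```

Every AER record in this log carries the same status value, so they all decode
to the same single error type.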


> systemd logs output after the crash (If your system froze and you get logs
> after reboot):

Sep 06 08:36:58 ezra.blanchardmorris.net kernel: Command line:
BOOT_IMAGE=(hd4,gpt6)/vmlinuz-5.2.11-200.fc30.x86_64+debug
root=UUID=e7b8b34a-e17f-4c2b-b223-eaa636249d2d ro
resume=UUID=52cc8cd8-b06f-4613-8781-a105d0ebf44a rhgb quiet amdgpu.vm_debug=1
Sep 06 08:36:58 ezra.blanchardmorris.net kernel: Kernel command line:
BOOT_IMAGE=(hd4,gpt6)/vmlinuz-5.2.11-200.fc30.x86_64+debug
root=UUID=e7b8b34a-e17f-4c2b-b223-eaa636249d2d ro
resume=UUID=52cc8cd8-b06f-4613-8781-a105d0ebf44a rhgb quiet amdgpu.vm_debug=1
Sep 06 08:36:59 ezra.blanchardmorris.net dracut-cmdline[361]: Using kernel
command line parameters:
BOOT_IMAGE=(hd4,gpt6)/vmlinuz-5.2.11-200.fc30.x86_64+debug
root=UUID=e7b8b34a-e17f-4c2b-b223-eaa636249d2d ro
resume=UUID=52cc8cd8-b06f-4613-8781-a105d0ebf44a rhgb quiet amdgpu.vm_debug=1
Sep 06 08:37:00 ezra.blanchardmorris.net kernel: [drm] amdgpu kernel
modesetting enabled.
Sep 06 08:37:00 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0:
remove_conflicting_pci_framebuffers: bar 0: 0xe0000000 -> 0xefffffff
Sep 06 08:37:00 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0:
remove_conflicting_pci_framebuffers: bar 2: 0xf0000000 -> 0xf01fffff
Sep 06 08:37:00 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0:
remove_conflicting_pci_framebuffers: bar 5: 0xfb600000 -> 0xfb67ffff
Sep 06 08:37:00 ezra.blanchardmorris.net kernel: fb0: switching to amdgpudrmfb
from EFI VGA
Sep 06 08:37:00 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: vgaarb:
deactivate vga console
Sep 06 08:37:00 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: No more
image in the PCI ROM
Sep 06 08:37:00 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: VRAM:
8176M 0x000000F400000000 - 0x000000F5FEFFFFFF (8176M used)
Sep 06 08:37:00 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: GART:
512M 0x0000000000000000 - 0x000000001FFFFFFF
Sep 06 08:37:00 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: AGP:
267419648M 0x000000F800000000 - 0x0000FFFFFFFFFFFF
Sep 06 08:37:00 ezra.blanchardmorris.net kernel: [drm] amdgpu: 8176M of VRAM
memory ready
Sep 06 08:37:00 ezra.blanchardmorris.net kernel: [drm] amdgpu: 8176M of GTT
memory ready.
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: fbcon: amdgpudrmfb (fb0) is
primary device
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: fb0:
amdgpudrmfb frame buffer device
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring gfx
uses VM inv eng 0 on hub 0
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
comp_1.0.0 uses VM inv eng 1 on hub 0
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
comp_1.1.0 uses VM inv eng 4 on hub 0
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
comp_1.2.0 uses VM inv eng 5 on hub 0
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
comp_1.3.0 uses VM inv eng 6 on hub 0
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
comp_1.0.1 uses VM inv eng 7 on hub 0
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
comp_1.1.1 uses VM inv eng 8 on hub 0
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
comp_1.2.1 uses VM inv eng 9 on hub 0
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
comp_1.3.1 uses VM inv eng 10 on hub 0
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
kiq_2.1.0 uses VM inv eng 11 on hub 0
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
sdma0 uses VM inv eng 0 on hub 1
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
page0 uses VM inv eng 1 on hub 1
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
sdma1 uses VM inv eng 4 on hub 1
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
page1 uses VM inv eng 5 on hub 1
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
uvd_0 uses VM inv eng 6 on hub 1
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
uvd_enc_0.0 uses VM inv eng 7 on hub 1
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
uvd_enc_0.1 uses VM inv eng 8 on hub 1
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring vce0
uses VM inv eng 9 on hub 1
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring vce1
uses VM inv eng 10 on hub 1
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring vce2
uses VM inv eng 11 on hub 1
Sep 06 08:37:01 ezra.blanchardmorris.net kernel: [drm] Initialized amdgpu
3.32.0 20150101 for 0000:06:00.0 on minor 0
Sep 06 08:37:48 ezra.blanchardmorris.net /usr/libexec/gdm-x-session[1969]:
Kernel command line: BOOT_IMAGE=(hd4,gpt6)/vmlinuz-5.2.11-200.fc30.x86_64+debug
root=UUID=e7b8b34a-e17f-4c2b-b223-eaa636249d2d ro
resume=UUID=52cc8cd8-b06f-4613-8781-a105d0ebf44a rhgb quiet amdgpu.vm_debug=1
Sep 06 08:37:48 ezra.blanchardmorris.net /usr/libexec/gdm-x-session[1969]:     
   loading driver: amdgpu
Sep 06 08:37:48 ezra.blanchardmorris.net /usr/libexec/gdm-x-session[1969]: (==)
Matched amdgpu as autoconfigured driver 0
Sep 06 08:37:48 ezra.blanchardmorris.net /usr/libexec/gdm-x-session[1969]: (II)
LoadModule: "amdgpu"
Sep 06 08:37:48 ezra.blanchardmorris.net /usr/libexec/gdm-x-session[1969]: (II)
Loading /usr/lib64/xorg/modules/drivers/amdgpu_drv.so
Sep 06 08:37:48 ezra.blanchardmorris.net /usr/libexec/gdm-x-session[1969]: (II)
Module amdgpu: vendor="X.Org Foundation"
Sep 06 08:37:48 ezra.blanchardmorris.net /usr/libexec/gdm-x-session[1969]:     
   All GPUs supported by the amdgpu kernel driver
Sep 06 16:13:18 ezra.blanchardmorris.net net.lutris.Lutris.desktop[2234]:
2019-09-06 16:13:18,530: GPU: 1002:687F 1002:0B36 using amdgpu drivers
Sep 06 21:39:39 ezra.blanchardmorris.net kernel:
[drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed
out or interrupted!
Sep 06 21:39:39 ezra.blanchardmorris.net kernel: [drm:amdgpu_job_timedout
[amdgpu]] *ERROR* ring gfx timeout, signaled seq=2446766, emitted seq=2446767
Sep 06 21:39:39 ezra.blanchardmorris.net kernel: [drm:amdgpu_job_timedout
[amdgpu]] *ERROR* Process information: process hoi4 pid 24014 thread hoi4:cs0
pid 24015
Sep 06 21:39:39 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: GPU reset
begin!
Sep 06 21:39:39 ezra.blanchardmorris.net kernel:       
amdgpu_fence_process+0xa3/0x100 [amdgpu]
Sep 06 21:39:39 ezra.blanchardmorris.net kernel:       
sdma_v4_0_process_trap_irq+0x8d/0xa0 [amdgpu]
Sep 06 21:39:39 ezra.blanchardmorris.net kernel:       
amdgpu_irq_dispatch+0xc0/0x250 [amdgpu]
Sep 06 21:39:39 ezra.blanchardmorris.net kernel:       
amdgpu_ih_process+0x8d/0x110 [amdgpu]
Sep 06 21:39:39 ezra.blanchardmorris.net kernel:       
amdgpu_irq_handler+0x1b/0x50 [amdgpu]
Sep 06 21:39:39 ezra.blanchardmorris.net kernel:       
amdgpu_device_pre_asic_reset+0x41/0x20c [amdgpu]
Sep 06 21:39:39 ezra.blanchardmorris.net kernel:       
amdgpu_device_gpu_recover+0x77/0x788 [amdgpu]
Sep 06 21:39:39 ezra.blanchardmorris.net kernel:       
amdgpu_job_timedout+0x109/0x130 [amdgpu]
Sep 06 21:39:39 ezra.blanchardmorris.net kernel:  #2: 000000007a135814
(&adev->lock_reset){+.+.}, at: amdgpu_device_lock_adev+0x17/0x39 [amdgpu]
Sep 06 21:39:39 ezra.blanchardmorris.net kernel:  #3: 00000000e83f7d6b
(&dqm->lock_hidden){+.+.}, at: kgd2kfd_pre_reset+0x30/0x60 [amdgpu]
Sep 06 21:39:39 ezra.blanchardmorris.net kernel: 
amdgpu_device_pre_asic_reset+0x41/0x20c [amdgpu]
Sep 06 21:39:39 ezra.blanchardmorris.net kernel: 
amdgpu_device_gpu_recover+0x77/0x788 [amdgpu]
Sep 06 21:39:39 ezra.blanchardmorris.net kernel: 
amdgpu_job_timedout+0x109/0x130 [amdgpu]
Sep 06 21:39:40 ezra.blanchardmorris.net kernel: [drm:amdgpu_job_timedout
[amdgpu]] *ERROR* ring page1 timeout, signaled seq=95391072, emitted
seq=95391072
Sep 06 21:39:40 ezra.blanchardmorris.net kernel: [drm:amdgpu_job_timedout
[amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
Sep 06 21:39:40 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: GPU reset
begin!
Sep 06 21:39:49 ezra.blanchardmorris.net kernel: [drm:amdgpu_dm_atomic_check
[amdgpu]] *ERROR* [CRTC:47:crtc-0] hw_done or flip_done timed out
Sep 06 21:40:10 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] No
response from smu
Sep 06 21:40:10 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] Failed
message: 0xe, input parameter: 0x0, error code: 0x0
Sep 06 21:40:10 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] No
response from smu
Sep 06 21:40:10 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] No
response from smu
Sep 06 21:40:10 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] Failed
message: 0x42, input parameter: 0x1, error code: 0x0
Sep 06 21:40:11 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] No
response from smu
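
For anyone comparing the ring timeouts across these reports, the gap between
the "signaled" and "emitted" sequence numbers can be pulled out of the log
text mechanically. This is a rough sketch, assuming the gap roughly
corresponds to fences submitted but not yet completed; that reading is an
assumption, not something confirmed in this thread. The regular expression
and the function name are hypothetical, and the pattern tolerates the line
wrapping introduced by the mail client.

```python
import re

# Matches "ring <name> timeout, signaled seq=N, emitted seq=M", allowing the
# whitespace (including newlines) that e-mail wrapping inserts between tokens.
RING_TIMEOUT = re.compile(
    r"ring\s+(\S+)\s+timeout,\s+signaled\s+seq=(\d+),\s+emitted\s+seq=(\d+)"
)


def pending_fences(log_text):
    """Return (ring name, emitted - signaled) for each ring timeout found."""
    return [
        (ring, int(emitted) - int(signaled))
        for ring, signaled, emitted in RING_TIMEOUT.findall(log_text)
    ]


# Two entries copied from the journal output above, wrapping included.
sample = """\
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=2446766, emitted seq=2446767
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring page1 timeout, signaled seq=95391072, emitted
seq=95391072
"""

print(pending_fences(sample))
```

The same function can be fed a whole saved dmesg or journal file read into a
string.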

I will run apitrace on Hearts of Iron IV to try to capture more information.
Please let me know if I can be of further assistance in squashing this bug,
for example by providing crash information with the mesa debug packages
installed.


^ permalink raw reply	[flat|nested] 147+ messages in thread

* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (95 preceding siblings ...)
  2019-09-07  3:48 ` bugzilla-daemon
@ 2019-09-07  3:50 ` bugzilla-daemon
  2019-09-12 20:08 ` bugzilla-daemon
                   ` (40 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-09-07  3:50 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 406 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #97 from Rodney A Morris <rodamorris@gmail.com> ---
Created attachment 145290
  --> https://bugs.freedesktop.org/attachment.cgi?id=145290&action=edit
dmesg for crash

dmesg from crash while playing Hearts of Iron IV using Steam.  Related to
comment #96.


^ permalink raw reply	[flat|nested] 147+ messages in thread

* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (96 preceding siblings ...)
  2019-09-07  3:50 ` bugzilla-daemon
@ 2019-09-12 20:08 ` bugzilla-daemon
  2019-09-15  1:16 ` bugzilla-daemon
                   ` (39 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-09-12 20:08 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 522 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #98 from koala_man <spam@vidarholen.net> ---
(In reply to koala_man from comment #95)
> I am also seeing this issue on my stock Ubuntu. 

In my case it appears to have been faulty hardware. I tried it on Windows 10
with the latest drivers and still got crashes and reboots. Performance
throttling did not help. I swapped out the GPU and haven't seen any crashes
since.


^ permalink raw reply	[flat|nested] 147+ messages in thread

* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (97 preceding siblings ...)
  2019-09-12 20:08 ` bugzilla-daemon
@ 2019-09-15  1:16 ` bugzilla-daemon
  2019-09-15  1:20 ` bugzilla-daemon
                   ` (38 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-09-15  1:16 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 569 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #99 from Rodney A Morris <rodamorris@gmail.com> ---
Created attachment 145366
  --> https://bugs.freedesktop.org/attachment.cgi?id=145366&action=edit
apitrace of Hearts of Iron IV hard lock

Apitrace from a hard lock while playing Hearts of Iron IV without Steam.
Replaying this trace hard locks the computer, though inconsistently: of three
replays, one hard locked the computer.


^ permalink raw reply	[flat|nested] 147+ messages in thread

* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (98 preceding siblings ...)
  2019-09-15  1:16 ` bugzilla-daemon
@ 2019-09-15  1:20 ` bugzilla-daemon
  2019-09-15  1:21 ` bugzilla-daemon
                   ` (37 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-09-15  1:20 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1819 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #100 from Rodney A Morris <rodamorris@gmail.com> ---
(In reply to Rodney A Morris from comment #99)
> Created attachment 145366 [details]
> apitrace of Hearts of Iron IV hard lock
> 
> Apitrace from hard lock playing Hearts of Iron IV without Steam.  The replay
> from this trace will hard lock the computer, though inconsistently.  I've
> replayed the trace three times. The replay hard locked computer one time.

neofetch from hardlock:

          /:-------------:\          
       :-------------------::        -------------------------------- 
     :-----------/shhOHbmp---:\      OS: Fedora release 30 (Thirty) x86_64 
   /-----------omMMMNNNMMD  ---:     Kernel: 5.2.13-200.fc30.x86_64 
  :-----------sMMMMNMNMP.    ---:    Uptime: 25 mins 
 :-----------:MMMdP-------    ---\   Packages: 2202 (rpm), 27 (flatpak) 
,------------:MMMd--------    ---:   Shell: bash 5.0.7 
:------------:MMMd-------    .---:   Resolution: 2560x1440 
:----    oNMMMMMMMMMNho     .----:   DE: GNOME 3.32.2 
:--     .+shhhMMMmhhy++   .------/   WM: GNOME Shell 
:-    -------:MMMd--------------:    WM Theme: Adwaita 
:-   --------/MMMd-------------;     Theme: Adapta-Nokto-Eta [GTK2/3] 
:-    ------/hMMMy------------:      Icons: Adwaita [GTK2/3] 
:-- :dMNdhhdNMMNo------------;       Terminal: tilix 
:---:sdNMMMMNds:------------:        CPU: Intel i7-6850K (12) @ 4.000GHz 
:------:://:-------------::          GPU: AMD ATI Radeon RX Vega 56/64 
:---------------------://            Memory: 2478MiB / 32084MiB 

OpenGL version string: 4.5 (Compatibility Profile) Mesa 19.1.6

Note: the hard lock during replay occurred while the Discord flatpak was also
running.


^ permalink raw reply	[flat|nested] 147+ messages in thread

* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (100 preceding siblings ...)
  2019-09-15  1:21 ` bugzilla-daemon
@ 2019-09-15  4:35 ` bugzilla-daemon
  2019-09-21  2:05 ` bugzilla-daemon
                   ` (35 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-09-15  4:35 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 23823 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #102 from Rodney A Morris <rodamorris@gmail.com> ---
Created attachment 145367
  --> https://bugs.freedesktop.org/attachment.cgi?id=145367&action=edit
Full dmesg from Stellaris crash

I had another crash and soft lockup tonight playing Stellaris through Steam. 
Unfortunately, while I had the mesa debuginfo packages installed, I did not
have the debug kernel installed.

          /:-------------:\          
       :-------------------::        -------------------------------- 
     :-----------/shhOHbmp---:\      OS: Fedora release 30 (Thirty) x86_64 
   /-----------omMMMNNNMMD  ---:     Kernel: 5.2.13-200.fc30.x86_64 
  :-----------sMMMMNMNMP.    ---:    Uptime: 25 mins 
 :-----------:MMMdP-------    ---\   Packages: 2202 (rpm), 27 (flatpak) 
,------------:MMMd--------    ---:   Shell: bash 5.0.7 
:------------:MMMd-------    .---:   Resolution: 2560x1440 
:----    oNMMMMMMMMMNho     .----:   DE: GNOME 3.32.2 
:--     .+shhhMMMmhhy++   .------/   WM: GNOME Shell 
:-    -------:MMMd--------------:    WM Theme: Adwaita 
:-   --------/MMMd-------------;     Theme: Adapta-Nokto-Eta [GTK2/3] 
:-    ------/hMMMy------------:      Icons: Adwaita [GTK2/3] 
:-- :dMNdhhdNMMNo------------;       Terminal: tilix 
:---:sdNMMMMNds:------------:        CPU: Intel i7-6850K (12) @ 4.000GHz 
:------:://:-------------::          GPU: AMD ATI Radeon RX Vega 56/64 
:---------------------://            Memory: 2478MiB / 32084MiB 

OpenGL version string: 4.5 (Compatibility Profile) Mesa 19.1.6

> Game being played: 


Stellaris through Steam for Linux.  As in the earlier crashes, Discord was
running.

> Native or Wine or Wine+DXVK:


Native

> 
> Crash type: Game crash? Full System freeze? System freeze but still can drop
> to tty?


The screen suddenly goes black while the music continues to play for less than
a minute; the music then begins to loop, and the computer reboots.

> 
> DMESG output after the crash:
Below are the pertinent dmesg messages.  The full file is attached.

[ 5292.563342] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for
fences timed out or interrupted!
[ 5297.683350] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring page1 timeout,
signaled seq=97861046, emitted seq=97861048
[ 5297.683465] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process  pid 0 thread  pid 0
[ 5297.683470] amdgpu 0000:06:00.0: GPU reset begin!
[ 5297.693302] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
signaled seq=1321512, emitted seq=1321513
[ 5297.693406] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process stellaris pid 5624 thread stellaris:cs0 pid 5625
[ 5297.693409] amdgpu 0000:06:00.0: GPU reset begin!
[ 5297.709624] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[ 5297.709631] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[ 5297.709634] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[ 5297.709637] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[ 5297.709706] pcieport 0000:00:03.0: AER: Device recovery failed
[ 5302.803236] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]]
*ERROR* [CRTC:47:crtc-0] flip_done timed out
[ 5307.923355] [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:47:crtc-0]
hw_done or flip_done timed out
[ 5318.163235] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]]
*ERROR* [CRTC:47:crtc-0] flip_done timed out
[ 5328.403235] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]]
*ERROR* [PLANE:45:plane-5] flip_done timed out
[ 5328.717149] amdgpu: [powerplay] No response from smu
[ 5328.717151] amdgpu: [powerplay] Failed message: 0xe, input parameter: 0x0,
error code: 0x0
[ 5329.031482] amdgpu: [powerplay] No response from smu
[ 5329.345845] amdgpu: [powerplay] No response from smu
[ 5329.345847] amdgpu: [powerplay] Failed message: 0x42, input parameter: 0x1,
error code: 0x0
[ 5329.659470] amdgpu: [powerplay] No response from smu
[ 5329.973320] amdgpu: [powerplay] No response from smu
[ 5329.973322] amdgpu: [powerplay] Failed message: 0x24, input parameter: 0x0,
error code: 0x0
[ 5330.044255] [drm] REG_WAIT timeout 10us * 3500 tries - dce_mi_free_dmif
line:634
[ 5330.044255] ------------[ cut here ]------------
[ 5330.044355] WARNING: CPU: 9 PID: 7317 at
drivers/gpu/drm/amd/amdgpu/../display/dc/dc_helper.c:329
generic_reg_wait.cold+0x31/0x53 [amdgpu]
[ 5330.044356] Modules linked in: rfcomm xt_CHECKSUM xt_MASQUERADE tun bridge
stp llc nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter
ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat
ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat
iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6
nf_defrag_ipv4 libcrc32c ip_set nfnetlink ebtable_filter ebtables
ip6table_filter ip6_tables iptable_filter ip_tables bnep nct6775 hwmon_vid
intel_rapl arc4 x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel vfat
fat kvm fuse irqbypass iwlmvm iTCO_wdt iTCO_vendor_support mac80211
crct10dif_pclmul crc32_pclmul snd_hda_codec_realtek ghash_clmulni_intel
intel_cstate btusb iwlwifi snd_hda_codec_generic btrtl btbcm btintel
ledtrig_audio snd_hda_codec_hdmi intel_uncore bluetooth snd_hda_intel
intel_rapl_perf snd_hda_codec cfg80211 snd_hda_core snd_hwdep mxm_wmi i2c_i801
joydev snd_seq snd_seq_device xpad ecdh_generic
[ 5330.044372]  ff_memless snd_pcm rfkill ecc snd_timer mei_me snd mei
soundcore lpc_ich pcc_cpufreq auth_rpcgss binfmt_misc sunrpc amdgpu
amd_iommu_v2 gpu_sched ttm drm_kms_helper drm mpt3sas igb crc32c_intel e1000e
nvme raid_class nvme_core dca i2c_algo_bit scsi_transport_sas wmi uas
usb_storage
[ 5330.044380] CPU: 9 PID: 7317 Comm: kworker/9:0 Not tainted
5.2.13-200.fc30.x86_64 #1
[ 5330.044381] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X99
Taichi, BIOS P1.80 04/06/2018
[ 5330.044384] Workqueue: events drm_sched_job_timedout [gpu_sched]
[ 5330.044424] RIP: 0010:generic_reg_wait.cold+0x31/0x53 [amdgpu]
[ 5330.044425] Code: 4c 24 18 44 89 fa 89 ee 48 c7 c7 b8 e2 7b c0 e8 fb d4 a2
fc 83 7b 20 01 0f 84 8d 14 fe ff 48 c7 c7 28 e2 7b c0 e8 e5 d4 a2 fc <0f> 0b e9
7a 14 fe ff 48 c7 c7 28 e2 7b c0 89 54 24 04 e8 ce d4 a2
[ 5330.044426] RSP: 0000:ffffb980493f37b8 EFLAGS: 00010246
[ 5330.044426] RAX: 0000000000000024 RBX: ffff911f70720780 RCX:
0000000000000006
[ 5330.044427] RDX: 0000000000000000 RSI: 0000000000000086 RDI:
ffff911f7fa57900
[ 5330.044427] RBP: 000000000000000a R08: 0000000000000001 R09:
0000000000000737
[ 5330.044428] R10: 0000000000026ddc R11: 0000000000000003 R12:
00000000000035af
[ 5330.044428] R13: 0000000000000dad R14: 0000000000000001 R15:
0000000000000dac
[ 5330.044429] FS:  0000000000000000(0000) GS:ffff911f7fa40000(0000)
knlGS:0000000000000000
[ 5330.044429] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5330.044430] CR2: 000006af3a9fb000 CR3: 00000007ab40a003 CR4:
00000000003606e0
[ 5330.044430] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ 5330.044431] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[ 5330.044431] Call Trace:
[ 5330.044487]  dce_mi_free_dmif+0xef/0x150 [amdgpu]
[ 5330.044524]  dce110_reset_hw_ctx_wrap+0x14a/0x1e0 [amdgpu]
[ 5330.044562]  dce110_apply_ctx_to_hw+0x4a/0x490 [amdgpu]
[ 5330.044588]  ? amdgpu_pm_compute_clocks.part.0+0xcb/0x610 [amdgpu]
[ 5330.044590]  ? _cond_resched+0x15/0x30
[ 5330.044629]  ? dm_pp_apply_display_requirements+0x1a8/0x1c0 [amdgpu]
[ 5330.044666]  dc_commit_state+0x27b/0x5c0 [amdgpu]
[ 5330.044669]  ? number+0x31c/0x360
[ 5330.044707]  amdgpu_dm_atomic_commit_tail+0xc15/0x1930 [amdgpu]
[ 5330.044710]  ? va_format.isra.0+0x6e/0xa0
[ 5330.044713]  ? sched_clock+0x5/0x10
[ 5330.044716]  ? sched_clock_cpu+0xc/0xa0
[ 5330.044719]  ? up+0x12/0x60
[ 5330.044721]  ? __irq_work_queue_local+0x50/0x60
[ 5330.044722]  ? irq_work_queue+0x46/0x50
[ 5330.044725]  ? wake_up_klogd+0x30/0x40
[ 5330.044726]  ? vprintk_emit+0x17c/0x260
[ 5330.044727]  ? printk+0x58/0x6f
[ 5330.044728]  ? __next_timer_interrupt+0xd0/0xd0
[ 5330.044736]  ? drm_atomic_helper_wait_for_dependencies+0x1e4/0x1f0
[drm_kms_helper]
[ 5330.044748]  ? drm_err+0x72/0x90 [drm]
[ 5330.044749]  ? _cond_resched+0x15/0x30
[ 5330.044750]  ? wait_for_completion_timeout+0x38/0x170
[ 5330.044754]  ? commit_tail+0x3c/0x70 [drm_kms_helper]
[ 5330.044791]  ? amdgpu_dm_atomic_check+0x6d0/0x6d0 [amdgpu]
[ 5330.044795]  commit_tail+0x3c/0x70 [drm_kms_helper]
[ 5330.044799]  drm_atomic_helper_commit+0x108/0x110 [drm_kms_helper]
[ 5330.044803]  drm_atomic_helper_disable_all+0x144/0x160 [drm_kms_helper]
[ 5330.044807]  drm_atomic_helper_suspend+0x60/0xf0 [drm_kms_helper]
[ 5330.044844]  dm_suspend+0x20/0x60 [amdgpu]
[ 5330.044867]  amdgpu_device_ip_suspend_phase1+0x8b/0xc0 [amdgpu]
[ 5330.044890]  amdgpu_device_ip_suspend+0x1c/0x60 [amdgpu]
[ 5330.044927]  amdgpu_device_pre_asic_reset+0x1f4/0x209 [amdgpu]
[ 5330.044965]  amdgpu_device_gpu_recover+0x77/0x785 [amdgpu]
[ 5330.044998]  amdgpu_job_timedout+0xf7/0x120 [amdgpu]
[ 5330.045000]  drm_sched_job_timedout+0x3a/0x70 [gpu_sched]
[ 5330.045003]  process_one_work+0x19d/0x380
[ 5330.045005]  worker_thread+0x50/0x3b0
[ 5330.045007]  kthread+0xfb/0x130
[ 5330.045008]  ? process_one_work+0x380/0x380
[ 5330.045009]  ? kthread_park+0x80/0x80
[ 5330.045010]  ret_from_fork+0x35/0x40
[ 5330.045012] ---[ end trace 7beee32e6101e37d ]---
[ 5330.358847] amdgpu: [powerplay] No response from smu
[ 5330.673262] amdgpu: [powerplay] No response from smu
[ 5330.673263] amdgpu: [powerplay] Failed message: 0x4c, input parameter: 0x1,
error code: 0x0
[ 5330.987579] amdgpu: [powerplay] No response from smu
[ 5331.302073] amdgpu: [powerplay] No response from smu
[ 5331.302074] amdgpu: [powerplay] Failed message: 0x4c, input parameter: 0x3,
error code: 0x0
[ 5331.616202] amdgpu: [powerplay] No response from smu
[ 5331.929678] amdgpu: [powerplay] No response from smu
[ 5331.929681] amdgpu: [powerplay] Failed message: 0x9, input parameter: 0xf4,
error code: 0x0
[ 5332.243534] amdgpu: [powerplay] No response from smu
[ 5332.557383] amdgpu: [powerplay] No response from smu
[ 5332.557384] amdgpu: [powerplay] Failed message: 0xa, input parameter:
0xa0b000, error code: 0x0
[ 5332.871126] amdgpu: [powerplay] No response from smu
[ 5333.185009] amdgpu: [powerplay] No response from smu
[ 5333.185011] amdgpu: [powerplay] Failed message: 0xe, input parameter: 0x0,
error code: 0x0
[ 5333.498596] amdgpu: [powerplay] No response from smu
[ 5333.812147] amdgpu: [powerplay] No response from smu
[ 5333.812155] amdgpu: [powerplay] Failed message: 0x4, input parameter: 0x400,
error code: 0x0
[ 5334.126013] amdgpu: [powerplay] No response from smu
[ 5334.440194] amdgpu: [powerplay] No response from smu
[ 5334.440197] amdgpu: [powerplay] Failed message: 0x42, input parameter: 0x1,
error code: 0x0
[ 5334.753930] amdgpu: [powerplay] No response from smu
[ 5335.067603] amdgpu: [powerplay] No response from smu
[ 5335.067605] amdgpu: [powerplay] Failed message: 0x24, input parameter: 0x0,
error code: 0x0
[ 5335.083579] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[ 5335.083589] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[ 5335.083599] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[ 5335.083603] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[ 5335.083694] pcieport 0000:00:03.0: AER: Device recovery failed
[ 5335.101028] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[ 5335.101034] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[ 5335.101036] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[ 5335.101039] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[ 5335.101085] pcieport 0000:00:03.0: AER: Device recovery failed
[ 5335.118568] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[ 5335.118573] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[ 5335.118575] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[ 5335.118577] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[ 5335.118621] pcieport 0000:00:03.0: AER: Device recovery failed
[ 5335.136108] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[ 5335.136113] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[ 5335.136116] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[ 5335.136118] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[ 5335.136189] pcieport 0000:00:03.0: AER: Device recovery failed
[ 5335.153649] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[ 5335.153654] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[ 5335.153656] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[ 5335.153658] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[ 5335.153702] pcieport 0000:00:03.0: AER: Device recovery failed
[ 5335.171189] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[ 5335.171194] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[ 5335.171196] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[ 5335.171199] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[ 5335.171242] pcieport 0000:00:03.0: AER: Device recovery failed
[ 5335.188769] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[ 5335.188774] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[ 5335.188776] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[ 5335.188778] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[ 5335.188819] pcieport 0000:00:03.0: AER: Device recovery failed
[ 5335.206263] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[ 5335.206266] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[ 5335.206267] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[ 5335.206268] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[ 5335.206286] pcieport 0000:00:03.0: AER: Device recovery failed
[ 5335.223806] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[ 5335.223809] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[ 5335.223811] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[ 5335.223812] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[ 5335.223837] pcieport 0000:00:03.0: AER: Device recovery failed
[ 5335.241348] pcieport 0000:00:03.0: AER: Multiple Uncorrected (Non-Fatal)
error received: 0000:00:03.0
[ 5335.469372] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[ 5335.469374] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[ 5335.469375] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[ 5335.469405] pcieport 0000:00:03.0: AER: Device recovery failed
[ 5335.469406] pcieport 0000:00:03.0: AER: Multiple Uncorrected (Non-Fatal)
error received: 0000:00:03.0

> systemd logs output after the crash (If your system froze and you get logs
> after reboot):

Sep 14 20:52:48 ezra.blanchardmorris.net kernel: Command line:
BOOT_IMAGE=(hd4,gpt6)/vmlinuz-5.2.13-200.fc30.x86_64
root=UUID=e7b8b34a-e17f-4c2b-b223-eaa636249d2d ro
resume=UUID=52cc8cd8-b06f-4613-8781-a105d0ebf44a rhgb quiet amdgpu.vm_debug=1
Sep 14 20:52:48 ezra.blanchardmorris.net kernel: Kernel command line:
BOOT_IMAGE=(hd4,gpt6)/vmlinuz-5.2.13-200.fc30.x86_64
root=UUID=e7b8b34a-e17f-4c2b-b223-eaa636249d2d ro
resume=UUID=52cc8cd8-b06f-4613-8781-a105d0ebf44a rhgb quiet amdgpu.vm_debug=1
Sep 14 20:52:49 ezra.blanchardmorris.net dracut-cmdline[363]: Using kernel
command line parameters: BOOT_IMAGE=(hd4,gpt6)/vmlinuz-5.2.13-200.fc30.x86_64
root=UUID=e7b8b34a-e17f-4c2b-b223-eaa636249d2d ro
resume=UUID=52cc8cd8-b06f-4613-8781-a105d0ebf44a rhgb quiet amdgpu.vm_debug=1
Sep 14 20:52:49 ezra.blanchardmorris.net kernel: [drm] amdgpu kernel
modesetting enabled.
Sep 14 20:52:49 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0:
remove_conflicting_pci_framebuffers: bar 0: 0xe0000000 -> 0xefffffff
Sep 14 20:52:49 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0:
remove_conflicting_pci_framebuffers: bar 2: 0xf0000000 -> 0xf01fffff
Sep 14 20:52:49 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0:
remove_conflicting_pci_framebuffers: bar 5: 0xfb600000 -> 0xfb67ffff
Sep 14 20:52:49 ezra.blanchardmorris.net kernel: fb0: switching to amdgpudrmfb
from EFI VGA
Sep 14 20:52:49 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: vgaarb:
deactivate vga console
Sep 14 20:52:49 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: No more
image in the PCI ROM
Sep 14 20:52:49 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: VRAM:
8176M 0x000000F400000000 - 0x000000F5FEFFFFFF (8176M used)
Sep 14 20:52:49 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: GART:
512M 0x0000000000000000 - 0x000000001FFFFFFF
Sep 14 20:52:49 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: AGP:
267419648M 0x000000F800000000 - 0x0000FFFFFFFFFFFF
Sep 14 20:52:49 ezra.blanchardmorris.net kernel: [drm] amdgpu: 8176M of VRAM
memory ready
Sep 14 20:52:49 ezra.blanchardmorris.net kernel: [drm] amdgpu: 8176M of GTT
memory ready.
Sep 14 20:52:50 ezra.blanchardmorris.net kernel: fbcon: amdgpudrmfb (fb0) is
primary device
Sep 14 20:52:50 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: fb0:
amdgpudrmfb frame buffer device
Sep 14 20:52:50 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring gfx
uses VM inv eng 0 on hub 0
Sep 14 20:52:50 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
comp_1.0.0 uses VM inv eng 1 on hub 0
Sep 14 20:52:50 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
comp_1.1.0 uses VM inv eng 4 on hub 0
Sep 14 20:52:50 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
comp_1.2.0 uses VM inv eng 5 on hub 0
Sep 14 20:52:50 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
comp_1.3.0 uses VM inv eng 6 on hub 0
Sep 14 20:52:50 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
comp_1.0.1 uses VM inv eng 7 on hub 0
Sep 14 20:52:50 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
comp_1.1.1 uses VM inv eng 8 on hub 0
Sep 14 20:52:50 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
comp_1.2.1 uses VM inv eng 9 on hub 0
Sep 14 20:52:50 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
comp_1.3.1 uses VM inv eng 10 on hub 0
Sep 14 20:52:50 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
kiq_2.1.0 uses VM inv eng 11 on hub 0
Sep 14 20:52:50 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
sdma0 uses VM inv eng 0 on hub 1
Sep 14 20:52:50 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
page0 uses VM inv eng 1 on hub 1
Sep 14 20:52:50 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
sdma1 uses VM inv eng 4 on hub 1
Sep 14 20:52:50 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
page1 uses VM inv eng 5 on hub 1
Sep 14 20:52:50 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
uvd_0 uses VM inv eng 6 on hub 1
Sep 14 20:52:50 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
uvd_enc_0.0 uses VM inv eng 7 on hub 1
Sep 14 20:52:50 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring
uvd_enc_0.1 uses VM inv eng 8 on hub 1
Sep 14 20:52:50 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring vce0
uses VM inv eng 9 on hub 1
Sep 14 20:52:50 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring vce1
uses VM inv eng 10 on hub 1
Sep 14 20:52:50 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: ring vce2
uses VM inv eng 11 on hub 1
Sep 14 20:52:50 ezra.blanchardmorris.net kernel: [drm] Initialized amdgpu
3.32.0 20150101 for 0000:06:00.0 on minor 0
Sep 14 20:53:20 ezra.blanchardmorris.net /usr/libexec/gdm-x-session[1928]:
Kernel command line: BOOT_IMAGE=(hd4,gpt6)/vmlinuz-5.2.13-200.fc30.x86_64
root=UUID=e7b8b34a-e17f-4c2b-b223-eaa636249d2d ro
resume=UUID=52cc8cd8-b06f-4613-8781-a105d0ebf44a rhgb quiet amdgpu.vm_debug=1
Sep 14 20:53:20 ezra.blanchardmorris.net /usr/libexec/gdm-x-session[1928]:     
   loading driver: amdgpu
Sep 14 20:53:20 ezra.blanchardmorris.net /usr/libexec/gdm-x-session[1928]: (==)
Matched amdgpu as autoconfigured driver 0
Sep 14 20:53:20 ezra.blanchardmorris.net /usr/libexec/gdm-x-session[1928]: (II)
LoadModule: "amdgpu"
Sep 14 20:53:20 ezra.blanchardmorris.net /usr/libexec/gdm-x-session[1928]: (II)
Loading /usr/lib64/xorg/modules/drivers/amdgpu_drv.so
Sep 14 20:53:20 ezra.blanchardmorris.net /usr/libexec/gdm-x-session[1928]: (II)
Module amdgpu: vendor="X.Org Foundation"
Sep 14 20:53:20 ezra.blanchardmorris.net /usr/libexec/gdm-x-session[1928]:     
   All GPUs supported by the amdgpu kernel driver
Sep 14 22:21:05 ezra.blanchardmorris.net kernel:
[drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed
out or interrupted!
Sep 14 22:21:05 ezra.blanchardmorris.net kernel: [drm:amdgpu_job_timedout
[amdgpu]] *ERROR* ring page1 timeout, signaled seq=97861046, emitted
seq=97861048
Sep 14 22:21:05 ezra.blanchardmorris.net kernel: [drm:amdgpu_job_timedout
[amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
Sep 14 22:21:05 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: GPU reset
begin!
Sep 14 22:21:05 ezra.blanchardmorris.net kernel: [drm:amdgpu_job_timedout
[amdgpu]] *ERROR* ring gfx timeout, signaled seq=1321512, emitted seq=1321513
Sep 14 22:21:05 ezra.blanchardmorris.net kernel: [drm:amdgpu_job_timedout
[amdgpu]] *ERROR* Process information: process stellaris pid 5624 thread
stellaris:cs0 pid 5625
Sep 14 22:21:05 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: GPU reset
begin!
Sep 14 22:21:15 ezra.blanchardmorris.net kernel: [drm:amdgpu_dm_atomic_check
[amdgpu]] *ERROR* [CRTC:47:crtc-0] hw_done or flip_done timed out
Sep 14 22:21:36 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] No
response from smu
Sep 14 22:21:36 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] Failed
message: 0xe, input parameter: 0x0, error code: 0x0
Sep 14 22:21:36 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] No
response from smu
Sep 14 22:21:37 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] No
response from smu
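
The ring timeout messages above report the last signaled and last emitted fence sequence numbers for each ring. A minimal sketch of pulling those pairs out of a pasted, line-wrapped journal excerpt follows. The helper name `ring_timeouts` and the `sample` string are illustrative assumptions, not part of any amdgpu tool; the regular expression only matches the message wording seen in this log.

```python
import re

# Matches the "ring <name> timeout, signaled seq=N, emitted seq=M" wording
# seen in the journal excerpt above (assumed stable only for this log).
RING_TIMEOUT = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def ring_timeouts(text):
    """Return (ring, signaled, emitted) tuples found in a wrapped log."""
    # Mail wrapping can split one message across lines, so collapse all
    # whitespace runs to single spaces before matching.
    flat = " ".join(text.split())
    return [
        (m["ring"], int(m["signaled"]), int(m["emitted"]))
        for m in RING_TIMEOUT.finditer(flat)
    ]


# Hypothetical sample built from the two timeout lines in the excerpt.
sample = """\
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring page1 timeout, signaled seq=97861046, emitted
seq=97861048
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=1321512, emitted seq=1321513
"""

for ring, signaled, emitted in ring_timeouts(sample):
    print(f"{ring}: {emitted - signaled} fence(s) emitted but not signaled")
```

The difference between the two numbers is how many submitted jobs on that ring had not completed when the timeout fired.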


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (101 preceding siblings ...)
  2019-09-15  4:35 ` bugzilla-daemon
@ 2019-09-21  2:05 ` bugzilla-daemon
  2019-09-23  2:49 ` bugzilla-daemon
                   ` (34 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-09-21  2:05 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 2201 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #103 from Mauro Gaspari <ilvipero@gmx.com> ---
(In reply to Rodney A Morris from comment #101)
> (In reply to Rodney A Morris from comment #99)
> > Created attachment 145366 [details]
> > apitrace of Hearts of Iron IV hard lock
> > 
> > Apitrace from hard lock playing Hearts of Iron IV without Steam.  The replay
> > from this trace will hard lock the computer, though inconsistently.  I've
> > replayed the trace three times. The replay hard locked computer one time.
> 
> neofetch from hardlock:
> 
>           /:-------------:\          
>        :-------------------::        -------------------------------- 
>      :-----------/shhOHbmp---:\      OS: Fedora release 30 (Thirty) x86_64 
>    /-----------omMMMNNNMMD  ---:     Kernel: 5.2.13-200.fc30.x86_64 
>   :-----------sMMMMNMNMP.    ---:    Uptime: 25 mins 
>  :-----------:MMMdP-------    ---\   Packages: 2202 (rpm), 27 (flatpak) 
> ,------------:MMMd--------    ---:   Shell: bash 5.0.7 
> :------------:MMMd-------    .---:   Resolution: 2560x1440 
> :----    oNMMMMMMMMMNho     .----:   DE: GNOME 3.32.2 
> :--     .+shhhMMMmhhy++   .------/   WM: GNOME Shell 
> :-    -------:MMMd--------------:    WM Theme: Adwaita 
> :-   --------/MMMd-------------;     Theme: Adapta-Nokto-Eta [GTK2/3] 
> :-    ------/hMMMy------------:      Icons: Adwaita [GTK2/3] 
> :-- :dMNdhhdNMMNo------------;       Terminal: tilix 
> :---:sdNMMMMNds:------------:        CPU: Intel i7-6850K (12) @ 4.000GHz 
> :------:://:-------------::          GPU: AMD ATI Radeon RX Vega 56/64 
> :---------------------://            Memory: 2478MiB / 32084MiB 
> 
> OpenGL version string: 4.5 (Compatibility Profile) Mesa 19.1.6
> 
> Note:  hard lock replayed occurred when the Discord flatpak is also running.

I also noticed some errors in my logs that pointed to Discord. In my case,
Discord was installed via a .deb package.
Could you please try disabling hardware acceleration in Discord's settings
(Appearance menu)? Please let me know if it helps or changes anything.
Thanks!


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (102 preceding siblings ...)
  2019-09-21  2:05 ` bugzilla-daemon
@ 2019-09-23  2:49 ` bugzilla-daemon
  2019-09-23  3:06 ` bugzilla-daemon
                   ` (33 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-09-23  2:49 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 2654 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #104 from Rodney A Morris <rodamorris@gmail.com> ---
(In reply to Mauro Gaspari from comment #103)
> (In reply to Rodney A Morris from comment #101)
> > (In reply to Rodney A Morris from comment #99)
> > > Created attachment 145366 [details]
> > > apitrace of Hearts of Iron IV hard lock
> > > 
> > > Apitrace from hard lock playing Hearts of Iron IV without Steam.  The replay
> > > from this trace will hard lock the computer, though inconsistently.  I've
> > > replayed the trace three times. The replay hard locked computer one time.
> > 
> > neofetch from hardlock:
> > 
> >           /:-------------:\          
> >        :-------------------::        -------------------------------- 
> >      :-----------/shhOHbmp---:\      OS: Fedora release 30 (Thirty) x86_64 
> >    /-----------omMMMNNNMMD  ---:     Kernel: 5.2.13-200.fc30.x86_64 
> >   :-----------sMMMMNMNMP.    ---:    Uptime: 25 mins 
> >  :-----------:MMMdP-------    ---\   Packages: 2202 (rpm), 27 (flatpak) 
> > ,------------:MMMd--------    ---:   Shell: bash 5.0.7 
> > :------------:MMMd-------    .---:   Resolution: 2560x1440 
> > :----    oNMMMMMMMMMNho     .----:   DE: GNOME 3.32.2 
> > :--     .+shhhMMMmhhy++   .------/   WM: GNOME Shell 
> > :-    -------:MMMd--------------:    WM Theme: Adwaita 
> > :-   --------/MMMd-------------;     Theme: Adapta-Nokto-Eta [GTK2/3] 
> > :-    ------/hMMMy------------:      Icons: Adwaita [GTK2/3] 
> > :-- :dMNdhhdNMMNo------------;       Terminal: tilix 
> > :---:sdNMMMMNds:------------:        CPU: Intel i7-6850K (12) @ 4.000GHz 
> > :------:://:-------------::          GPU: AMD ATI Radeon RX Vega 56/64 
> > :---------------------://            Memory: 2478MiB / 32084MiB 
> > 
> > OpenGL version string: 4.5 (Compatibility Profile) Mesa 19.1.6
> > 
> > Note:  hard lock replayed occurred when the Discord flatpak is also running.
> 
> I also noticed some errors that pointed to discord in my logs. In my case
> discord was installed via .deb package. 
> Could you please try and disable hardware acceleration in discord settings -
> appearance menu? Please let me know if it helps or changes anything. 
> Thanks!

I have disabled hardware acceleration in Discord's settings to see if that
improves my experience, and I will report back my results.  I am doubtful that
it will help much.  At least on the 5.2.11 kernel, I had lockups with or
without Discord running.  Running Discord just seemed to make the problem
appear more consistently.


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (103 preceding siblings ...)
  2019-09-23  2:49 ` bugzilla-daemon
@ 2019-09-23  3:06 ` bugzilla-daemon
  2019-09-26 10:37 ` bugzilla-daemon
                   ` (32 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-09-23  3:06 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 11052 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #105 from Rodney A Morris <rodamorris@gmail.com> ---
Created attachment 145462
  --> https://bugs.freedesktop.org/attachment.cgi?id=145462&action=edit
dmesg from Stellaris crash 2019-09-20

I had another lockup on Friday while playing Stellaris again.  This time I had
the debug kernel running and the Mesa debug packages installed.  I do not plan
to post dmesg and journalctl dumps for future crashes unless the logs indicate
a new problem or I can obtain more information than I previously provided.
Like the crash I reported for Hearts of Iron IV, this Stellaris crash seems to
be caused by a circular lock dependency.
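
For readers unfamiliar with the term, a circular lock dependency means two code paths take the same pair of locks in opposite orders. The following is a toy sketch of that check, assuming a simple "held lock -> lock taken next" graph; it is not the kernel's lockdep implementation, and the class name `LockOrderGraph` is made up for illustration. The two lock names are shortened from the lockdep report in this comment.

```python
from collections import defaultdict


class LockOrderGraph:
    """Toy lock-order validator (illustrative only, not kernel lockdep)."""

    def __init__(self):
        # Maps a held lock to the set of locks acquired while holding it.
        self.edges = defaultdict(set)

    def _reaches(self, src, dst):
        """Depth-first search: is dst reachable from src via recorded orders?"""
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.edges[node])
        return False

    def acquire(self, held, new):
        """Record 'new taken while holding held'.

        Returns False (and records nothing) if the new edge would close a
        cycle, which is the situation reported as a circular dependency.
        """
        if self._reaches(new, held):
            return False
        self.edges[held].add(new)
        return True


graph = LockOrderGraph()
# Interrupt path (chain entry #1 in the report): dma_fence_signal() runs with
# the fence lock held and drm_sched_process_job() then takes job_list_lock.
irq_ok = graph.acquire("fence_drv.lock", "job_list_lock")
# Reset path (chain entry #0): drm_sched_stop() holds job_list_lock and
# dma_fence_remove_callback() then takes the fence lock.
reset_ok = graph.acquire("job_list_lock", "fence_drv.lock")
print(irq_ok, reset_ok)
```

The second call is rejected because the opposite order is already on record, which mirrors the two chain entries in the report.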

If someone believes my problems are caused by faulty hardware, please let me
know.  As an FYI, this problem does not seem to manifest under Windows 10
when playing the same game.

Card:

Sapphire Radeon Vega 64

OS Info:

          /:-------------:\           
       :-------------------::        -------------------------------- 
     :-----------/shhOHbmp---:\      OS: Fedora release 30 (Thirty) x86_64 
   /-----------omMMMNNNMMD  ---:     Kernel: 5.2.15-200.fc30.x86_64 
  :-----------sMMMMNMNMP.    ---:    Uptime: 1 day, 22 hours, 37 mins 
 :-----------:MMMdP-------    ---\   Packages: 2211 (rpm), 30 (flatpak) 
,------------:MMMd--------    ---:   Shell: bash 5.0.7 
:------------:MMMd-------    .---:   Resolution: 2560x1440 
:----    oNMMMMMMMMMNho     .----:   DE: GNOME 3.32.2 
:--     .+shhhMMMmhhy++   .------/   WM: Mutter 
:-    -------:MMMd--------------:    WM Theme: Adwaita 
:-   --------/MMMd-------------;     Theme: Adapta-Nokto-Eta [GTK2/3] 
:-    ------/hMMMy------------:      Icons: Adwaita [GTK2/3] 
:-- :dMNdhhdNMMNo------------;       Terminal: tilix 
:---:sdNMMMMNds:------------:        CPU: Intel i7-6850K (12) @ 4.000GHz 
:------:://:-------------::          GPU: AMD ATI Radeon RX Vega 56/64 
:---------------------://            Memory: 3097MiB / 32084MiB 

Mesa info:

OpenGL version string: 4.5 (Compatibility Profile) Mesa 19.1.6

Game being played:

Stellaris through Steam for Linux

Native or Wine:

Native

Crash Type:

The screen suddenly goes black while the music continues to play for less than
a minute; the music then begins to loop, and the computer reboots.

Full dmesg attached.  Pertinent part of dmesg with debug kernel:

[ 2383.732727] perf: interrupt took too long (2502 > 2500), lowering
kernel.perf_event_max_sample_rate to 79000
[ 2923.530873] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for
fences timed out or interrupted!
[ 2928.651952] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring page1 timeout,
signaled seq=51954680, emitted seq=51954682
[ 2928.652090] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process  pid 0 thread  pid 0
[ 2928.652098] amdgpu 0000:06:00.0: GPU reset begin!
[ 2928.661852] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
signaled seq=734676, emitted seq=734677
[ 2928.661898] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process stellaris pid 5395 thread stellaris:cs0 pid 5397
[ 2928.661901] amdgpu 0000:06:00.0: GPU reset begin!

[ 2928.661997] ======================================================
[ 2928.661999] WARNING: possible circular locking dependency detected
[ 2928.662003] 5.2.15-200.fc30.x86_64+debug #1 Not tainted
[ 2928.662005] ------------------------------------------------------
[ 2928.662007] kworker/10:2/974 is trying to acquire lock:
[ 2928.662010] 00000000d514cf70 (&(&ring->fence_drv.lock)->rlock){-.-.}, at:
dma_fence_remove_callback+0x1a/0x60
[ 2928.662021] 
               but task is already holding lock:
[ 2928.662023] 00000000e6ce7c0d (&(&sched->job_list_lock)->rlock){-.-.}, at:
drm_sched_stop+0x34/0x130 [gpu_sched]
[ 2928.662031] 
               which lock already depends on the new lock.

[ 2928.662033] 
               the existing dependency chain (in reverse order) is:
[ 2928.662035] 
               -> #1 (&(&sched->job_list_lock)->rlock){-.-.}:
[ 2928.662044]        _raw_spin_lock_irqsave+0x49/0x83
[ 2928.662049]        drm_sched_process_job+0x4d/0x180 [gpu_sched]
[ 2928.662052]        dma_fence_signal+0x111/0x1a0
[ 2928.662128]        amdgpu_fence_process+0xa3/0x100 [amdgpu]
[ 2928.662223]        sdma_v4_0_process_trap_irq+0x8d/0xa0 [amdgpu]
[ 2928.662310]        amdgpu_irq_dispatch+0xc0/0x250 [amdgpu]
[ 2928.662398]        amdgpu_ih_process+0x8d/0x110 [amdgpu]
[ 2928.662482]        amdgpu_irq_handler+0x1b/0x50 [amdgpu]
[ 2928.662487]        __handle_irq_event_percpu+0x3f/0x290
[ 2928.662491]        handle_irq_event_percpu+0x31/0x80
[ 2928.662495]        handle_irq_event+0x34/0x51
[ 2928.662498]        handle_edge_irq+0x83/0x1a0
[ 2928.662502]        handle_irq+0x1c/0x30
[ 2928.662507]        do_IRQ+0x61/0x120
[ 2928.662511]        ret_from_intr+0x0/0x22
[ 2928.662517]        cpuidle_enter_state+0xc9/0x450
[ 2928.662519]        cpuidle_enter+0x29/0x40
[ 2928.662524]        do_idle+0x1ec/0x280
[ 2928.662528]        cpu_startup_entry+0x19/0x20
[ 2928.662531]        start_secondary+0x189/0x1e0
[ 2928.662537]        secondary_startup_64+0xa4/0xb0
[ 2928.662539] 
               -> #0 (&(&ring->fence_drv.lock)->rlock){-.-.}:
[ 2928.662548]        lock_acquire+0xa2/0x1b0
[ 2928.662551]        _raw_spin_lock_irqsave+0x49/0x83
[ 2928.662555]        dma_fence_remove_callback+0x1a/0x60
[ 2928.662560]        drm_sched_stop+0x59/0x130 [gpu_sched]
[ 2928.662709]        amdgpu_device_pre_asic_reset+0x41/0x20c [amdgpu]
[ 2928.662866]        amdgpu_device_gpu_recover+0x77/0x788 [amdgpu]
[ 2928.663007]        amdgpu_job_timedout+0x109/0x130 [amdgpu]
[ 2928.663018]        drm_sched_job_timedout+0x40/0x70 [gpu_sched]
[ 2928.663024]        process_one_work+0x272/0x5e0
[ 2928.663029]        worker_thread+0x50/0x3b0
[ 2928.663037]        kthread+0x108/0x140
[ 2928.663045]        ret_from_fork+0x3a/0x50
[ 2928.663048] 
               other info that might help us debug this:

[ 2928.663051]  Possible unsafe locking scenario:

[ 2928.663055]        CPU0                    CPU1
[ 2928.663059]        ----                    ----
[ 2928.663062]   lock(&(&sched->job_list_lock)->rlock);
[ 2928.663068]                                lock(&(&ring->fence_drv.lock)->rlock);
[ 2928.663072]                                lock(&(&sched->job_list_lock)->rlock);
[ 2928.663076]   lock(&(&ring->fence_drv.lock)->rlock);
[ 2928.663080] 
                *** DEADLOCK ***

[ 2928.663085] 5 locks held by kworker/10:2/974:
[ 2928.663090]  #0: 0000000057c9a435 ((wq_completion)events){+.+.}, at:
process_one_work+0x1e9/0x5e0
[ 2928.663100]  #1: 00000000aadd5dda
((work_completion)(&(&sched->work_tdr)->work)){+.+.}, at:
process_one_work+0x1e9/0x5e0
[ 2928.663108]  #2: 0000000007db378b (&adev->lock_reset){+.+.}, at:
amdgpu_device_lock_adev+0x17/0x39 [amdgpu]
[ 2928.663261]  #3: 000000001e0a2926 (&dqm->lock_hidden){+.+.}, at:
kgd2kfd_pre_reset+0x30/0x60 [amdgpu]
[ 2928.663392]  #4: 00000000e6ce7c0d (&(&sched->job_list_lock)->rlock){-.-.},
at: drm_sched_stop+0x34/0x130 [gpu_sched]
[ 2928.663403] 
               stack backtrace:
[ 2928.663409] CPU: 10 PID: 974 Comm: kworker/10:2 Not tainted
5.2.15-200.fc30.x86_64+debug #1
[ 2928.663413] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X99
Taichi, BIOS P1.80 04/06/2018
[ 2928.663423] Workqueue: events drm_sched_job_timedout [gpu_sched]
[ 2928.663428] Call Trace:
[ 2928.663442]  dump_stack+0x85/0xc0
[ 2928.663453]  print_circular_bug.cold+0x15c/0x195
[ 2928.663462]  __lock_acquire+0x167c/0x1c90
[ 2928.663475]  lock_acquire+0xa2/0x1b0
[ 2928.663482]  ? dma_fence_remove_callback+0x1a/0x60
[ 2928.663494]  _raw_spin_lock_irqsave+0x49/0x83
[ 2928.663499]  ? dma_fence_remove_callback+0x1a/0x60
[ 2928.663506]  dma_fence_remove_callback+0x1a/0x60
[ 2928.663515]  drm_sched_stop+0x59/0x130 [gpu_sched]
[ 2928.663663]  amdgpu_device_pre_asic_reset+0x41/0x20c [amdgpu]
[ 2928.663818]  amdgpu_device_gpu_recover+0x77/0x788 [amdgpu]
[ 2928.663960]  amdgpu_job_timedout+0x109/0x130 [amdgpu]
[ 2928.663974]  drm_sched_job_timedout+0x40/0x70 [gpu_sched]
[ 2928.663981]  process_one_work+0x272/0x5e0
[ 2928.663991]  worker_thread+0x50/0x3b0
[ 2928.664000]  kthread+0x108/0x140
[ 2928.664005]  ? process_one_work+0x5e0/0x5e0
[ 2928.664011]  ? kthread_park+0x80/0x80
[ 2928.664021]  ret_from_fork+0x3a/0x50
[ 2928.681831] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error received: 0000:00:03.0
[ 2928.681846] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
[ 2928.681851] pcieport 0000:00:03.0: AER:   device [8086:6f08] error status/mask=00004000/00000000
[ 2928.681857] pcieport 0000:00:03.0: AER:    [14] CmpltTO                (First)
[ 2928.681963] pcieport 0000:00:03.0: AER: Device recovery failed
[ 2933.771664] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:47:crtc-0] flip_done timed out
[ 2938.890758] [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:47:crtc-0] hw_done or flip_done timed out
[ 2939.118467] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error received: 0000:00:03.0
[ 2939.118475] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
[ 2939.118477] pcieport 0000:00:03.0: AER:   device [8086:6f08] error status/mask=00004000/00000000
[ 2939.118479] pcieport 0000:00:03.0: AER:    [14] CmpltTO                (First)
[ 2939.118536] pcieport 0000:00:03.0: AER: Device recovery failed
[ 2939.141034] pcieport 0000:00:03.0: AER: Multiple Uncorrected (Non-Fatal) error received: 0000:00:03.0
[ 2939.369014] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
[ 2939.369018] pcieport 0000:00:03.0: AER:   device [8086:6f08] error status/mask=00004000/00000000
[ 2939.369021] pcieport 0000:00:03.0: AER:    [14] CmpltTO                (First)
[ 2939.369072] pcieport 0000:00:03.0: AER: Device recovery failed
[ 2939.369075] pcieport 0000:00:03.0: AER: Multiple Uncorrected (Non-Fatal) error received: 0000:00:03.0
[ 2939.597051] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
[ 2939.597055] pcieport 0000:00:03.0: AER:   device [8086:6f08] error status/mask=00004000/00000000
[ 2939.597057] pcieport 0000:00:03.0: AER:    [14] CmpltTO                (First)
[ 2939.597103] pcieport 0000:00:03.0: AER: Device recovery failed
[ 2939.597106] pcieport 0000:00:03.0: AER: Multiple Uncorrected (Non-Fatal) error received: 0000:00:03.0

systemd logs:

Nothing interesting appears in the logs, not even the information from dmesg. 
I'm unsure if systemd captured anything from the crash.
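
Editorial sketch, not part of the original report: the AER lines in the dmesg excerpt above pair `status/mask=00004000/00000000` with `[14] CmpltTO`. A small Python helper can show which bits of such a status word are set. The only bit name used here is the one the kernel printed in the log (bit 14, `CmpltTO`); the helper name `set_bits` and the `KNOWN_BITS` table are hypothetical and exist only for illustration.

```python
def set_bits(status_hex):
    """Return the positions of all bits set in a hexadecimal status word."""
    value = int(status_hex, 16)
    return [bit for bit in range(32) if value & (1 << bit)]


# Bit name exactly as printed by the kernel in the log above.
KNOWN_BITS = {14: "CmpltTO"}

# The "status/mask" pair as it appears in the AER message.
status, mask = "00004000/00000000".split("/")

reported = [KNOWN_BITS.get(bit, "bit %d" % bit) for bit in set_bits(status)]
masked = set_bits(mask)

print("status bits:", reported)
print("masked bits:", masked)
```

Under these assumptions the status word has a single bit set, which matches the one `[14]` line the kernel prints, and the mask has none.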


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (104 preceding siblings ...)
  2019-09-23  3:06 ` bugzilla-daemon
@ 2019-09-26 10:37 ` bugzilla-daemon
  2019-09-26 12:56 ` bugzilla-daemon
                   ` (31 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-09-26 10:37 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 899 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #106 from jeroenimo <freedesktop@jeroenimo.nl> ---

This is quite a severe bug.
I have a reasonably stable system with Mint 19.2 (it runs for hours without a
crash):
uname -a
Linux jeroenimo-amd 4.15.0-64-generic #73-Ubuntu SMP Thu Sep 12 13:16:13 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

(X)ubuntu 18.04 LTS crashes a lot faster (within 1 or 2 minutes) with the
5.0.0.29 kernel.

I can reproduce the bug with glmark2 instantly, 100% of the time
(https://launchpad.net/glmark2, or sudo apt install glmark2).

I'm not very good at debugging, but this is what my dmesg looks like when I ssh
in and run glmark2:

[ 6619.587749] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:45:crtc-1] flip_done timed out

And that's it, no more info.


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (105 preceding siblings ...)
  2019-09-26 10:37 ` bugzilla-daemon
@ 2019-09-26 12:56 ` bugzilla-daemon
  2019-09-28  7:02 ` bugzilla-daemon
                   ` (30 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-09-26 12:56 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 843 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #107 from jeroenimo <freedesktop@jeroenimo.nl> ---
I have a workaround that at least makes the system workable.

After some testing I managed to run glmark2 at the lowest and second-lowest
clock speeds on my RX560.

From root:
echo manual > /sys/class/drm/card0/device/power_dpm_force_performance_level
echo 1 > /sys/class/drm/card0/device/pp_dpm_sclk

giving me this
cat /sys/class/drm/card0/device/pp_dpm_sclk 
0: 214Mhz 
1: 387Mhz *
2: 843Mhz 
3: 995Mhz 
4: 1062Mhz 
5: 1108Mhz 
6: 1149Mhz 
7: 1176Mhz 

Obviously this decreases performance big time, but I don't really game so it
makes my system usable.

Any clock speed over 4: 1062Mhz crashes my system immediately.
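
Editorial sketch, not part of the original comment: when experimenting with forced clock levels it can help to read back which level is currently active. The Python below parses text in the format shown by `cat pp_dpm_sclk` above, where a trailing `*` marks the active level. The function names `parse_dpm_levels` and `active_level` are hypothetical, and the sample text is copied from the comment rather than read from sysfs.

```python
import re

# Matches lines such as "1: 387Mhz *"; the trailing "*" marks the active level.
LEVEL_RE = re.compile(r"^\s*(\d+):\s*(\d+)\s*Mhz\s*(\*)?\s*$", re.IGNORECASE)


def parse_dpm_levels(text):
    """Return a list of (index, mhz, is_active) tuples from pp_dpm_sclk-style text."""
    levels = []
    for line in text.splitlines():
        match = LEVEL_RE.match(line)
        if match:
            index, mhz, star = match.groups()
            levels.append((int(index), int(mhz), star is not None))
    return levels


def active_level(levels):
    """Return (index, mhz) of the first level marked active, or None."""
    for index, mhz, is_active in levels:
        if is_active:
            return index, mhz
    return None


# Sample copied from the comment above (not read from a live system).
SAMPLE = """\
0: 214Mhz
1: 387Mhz *
2: 843Mhz
3: 995Mhz
4: 1062Mhz
5: 1108Mhz
6: 1149Mhz
7: 1176Mhz
"""

levels = parse_dpm_levels(SAMPLE)
print(active_level(levels))
```

On a real system the same text could be read from /sys/class/drm/card0/device/pp_dpm_sclk, assuming the card index matches the one used in the comment.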


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (106 preceding siblings ...)
  2019-09-26 12:56 ` bugzilla-daemon
@ 2019-09-28  7:02 ` bugzilla-daemon
  2019-09-28 11:05 ` bugzilla-daemon
                   ` (29 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-09-28  7:02 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 6879 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #108 from Wilko Bartels <me@jasondaigo.de> ---
Did you try the amdgpu-pro driver as well?
I just did four runs of glmark2 and they all went through for me, going up to a
1600 MHz shader clock. I tested both the closed and open-source drivers. Vega
Pulse here.

mesa result:

=======================================================
    glmark2 2014.03
=======================================================
    OpenGL Information
    GL_VENDOR:     X.Org
    GL_RENDERER:   Radeon RX Vega (VEGA10, DRM 3.33.0, 5.3.1-arch1-1-ARCH, LLVM 8.0.1)
    GL_VERSION:    4.5 (Compatibility Profile) Mesa 19.1.7
=======================================================
[build] use-vbo=false: FPS: 8617 FrameTime: 0.116 ms
[build] use-vbo=true: FPS: 10534 FrameTime: 0.095 ms
[texture] texture-filter=nearest: FPS: 11214 FrameTime: 0.089 ms
[texture] texture-filter=linear: FPS: 11274 FrameTime: 0.089 ms
[texture] texture-filter=mipmap: FPS: 10197 FrameTime: 0.098 ms
[shading] shading=gouraud: FPS: 9790 FrameTime: 0.102 ms
[shading] shading=blinn-phong-inf: FPS: 10979 FrameTime: 0.091 ms
[shading] shading=phong: FPS: 10167 FrameTime: 0.098 ms
[shading] shading=cel: FPS: 9662 FrameTime: 0.103 ms
[bump] bump-render=high-poly: FPS: 9830 FrameTime: 0.102 ms
[bump] bump-render=normals: FPS: 10151 FrameTime: 0.099 ms
[bump] bump-render=height: FPS: 10870 FrameTime: 0.092 ms
libpng warning: iCCP: known incorrect sRGB profile
[effect2d] kernel=0,1,0;1,-4,1;0,1,0;: FPS: 12008 FrameTime: 0.083 ms
libpng warning: iCCP: known incorrect sRGB profile
[effect2d] kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 10876 FrameTime: 0.092 ms
[pulsar] light=false:quads=5:texture=false: FPS: 10232 FrameTime: 0.098 ms
libpng warning: iCCP: known incorrect sRGB profile
[desktop] blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 6842 FrameTime: 0.146 ms
libpng warning: iCCP: known incorrect sRGB profile
[desktop] effect=shadow:windows=4: FPS: 7934 FrameTime: 0.126 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 1770 FrameTime: 0.565 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata: FPS: 2308 FrameTime: 0.433 ms
[buffer] columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 1875 FrameTime: 0.533 ms
[ideas] speed=duration: FPS: 4475 FrameTime: 0.223 ms
[jellyfish] <default>: FPS: 9499 FrameTime: 0.105 ms
[terrain] <default>: FPS: 2593 FrameTime: 0.386 ms
[shadow] <default>: FPS: 9423 FrameTime: 0.106 ms
[refract] <default>: FPS: 6008 FrameTime: 0.166 ms
[conditionals] fragment-steps=0:vertex-steps=0: FPS: 11364 FrameTime: 0.088 ms
[conditionals] fragment-steps=5:vertex-steps=0: FPS: 10816 FrameTime: 0.092 ms
[conditionals] fragment-steps=0:vertex-steps=5: FPS: 12000 FrameTime: 0.083 ms
[function] fragment-complexity=low:fragment-steps=5: FPS: 10932 FrameTime: 0.091 ms
[function] fragment-complexity=medium:fragment-steps=5: FPS: 11690 FrameTime: 0.086 ms
[loop] fragment-loop=false:fragment-steps=5:vertex-steps=5: FPS: 11119 FrameTime: 0.090 ms
[loop] fragment-steps=5:fragment-uniform=false:vertex-steps=5: FPS: 11003 FrameTime: 0.091 ms
[loop] fragment-steps=5:fragment-uniform=true:vertex-steps=5: FPS: 12886 FrameTime: 0.078 ms
=======================================================
                                  glmark2 Score: 9119 
=======================================================

amdgpu-pro result:

=======================================================
    glmark2 2014.03
=======================================================
    OpenGL Information
    GL_VENDOR:     ATI Technologies Inc.
    GL_RENDERER:   Radeon RX Vega
    GL_VERSION:    4.6.13572 Compatibility Profile Context
=======================================================
[build] use-vbo=false: FPS: 3727 FrameTime: 0.268 ms
[build] use-vbo=true: FPS: 9516 FrameTime: 0.105 ms
[texture] texture-filter=nearest: FPS: 7346 FrameTime: 0.136 ms
[texture] texture-filter=linear: FPS: 9236 FrameTime: 0.108 ms
[texture] texture-filter=mipmap: FPS: 9161 FrameTime: 0.109 ms
[shading] shading=gouraud: FPS: 9184 FrameTime: 0.109 ms
[shading] shading=blinn-phong-inf: FPS: 9363 FrameTime: 0.107 ms
[shading] shading=phong: FPS: 9424 FrameTime: 0.106 ms
[shading] shading=cel: FPS: 9060 FrameTime: 0.110 ms
[bump] bump-render=high-poly: FPS: 9047 FrameTime: 0.111 ms
[bump] bump-render=normals: FPS: 8804 FrameTime: 0.114 ms
[bump] bump-render=height: FPS: 9156 FrameTime: 0.109 ms
libpng warning: iCCP: known incorrect sRGB profile
[effect2d] kernel=0,1,0;1,-4,1;0,1,0;: FPS: 9121 FrameTime: 0.110 ms
libpng warning: iCCP: known incorrect sRGB profile
[effect2d] kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 8866 FrameTime: 0.113 ms
[pulsar] light=false:quads=5:texture=false: FPS: 8286 FrameTime: 0.121 ms
libpng warning: iCCP: known incorrect sRGB profile
[desktop] blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 3789 FrameTime: 0.264 ms
libpng warning: iCCP: known incorrect sRGB profile
[desktop] effect=shadow:windows=4: FPS: 4491 FrameTime: 0.223 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 1026 FrameTime: 0.975 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata: FPS: 2228 FrameTime: 0.449 ms
[buffer] columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 1275 FrameTime: 0.784 ms
[ideas] speed=duration: FPS: 4038 FrameTime: 0.248 ms
[jellyfish] <default>: FPS: 7342 FrameTime: 0.136 ms
[terrain] <default>: FPS: 790 FrameTime: 1.266 ms
[shadow] <default>: FPS: 6002 FrameTime: 0.167 ms
[refract] <default>: FPS: 4273 FrameTime: 0.234 ms
[conditionals] fragment-steps=0:vertex-steps=0: FPS: 9208 FrameTime: 0.109 ms
[conditionals] fragment-steps=5:vertex-steps=0: FPS: 8964 FrameTime: 0.112 ms
[conditionals] fragment-steps=0:vertex-steps=5: FPS: 8984 FrameTime: 0.111 ms
[function] fragment-complexity=low:fragment-steps=5: FPS: 9360 FrameTime: 0.107 ms
[function] fragment-complexity=medium:fragment-steps=5: FPS: 9214 FrameTime: 0.109 ms
[loop] fragment-loop=false:fragment-steps=5:vertex-steps=5: FPS: 8945 FrameTime: 0.112 ms
[loop] fragment-steps=5:fragment-uniform=false:vertex-steps=5: FPS: 9218 FrameTime: 0.108 ms
[loop] fragment-steps=5:fragment-uniform=true:vertex-steps=5: FPS: 9077 FrameTime: 0.110 ms
=======================================================
                                  glmark2 Score: 7197 
=======================================================


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (107 preceding siblings ...)
  2019-09-28  7:02 ` bugzilla-daemon
@ 2019-09-28 11:05 ` bugzilla-daemon
  2019-09-28 12:25 ` bugzilla-daemon
                   ` (28 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-09-28 11:05 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 680 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #109 from jeroenimo <freedesktop@jeroenimo.nl> ---
(In reply to Wilko Bartels from comment #108)
> Did u try the amdgpu-pro driver as well?
> i just did four runs of glmark and it just went through for me. going up to
> 1600mhz shader clock. tested both closed and opensource drivers. vega pulse
> here.
> 
Yes, I tried all versions. I'm pretty sure it's not the driver, as they all
give the same result: any higher clock speed just crashes.

I have an NVIDIA 1030 installed now, which is also buggy but at least it
doesn't crash.


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (108 preceding siblings ...)
  2019-09-28 11:05 ` bugzilla-daemon
@ 2019-09-28 12:25 ` bugzilla-daemon
  2019-10-03  9:57 ` bugzilla-daemon
                   ` (27 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-09-28 12:25 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 3720 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #110 from Rodney A Morris <rodamorris@gmail.com> ---
(In reply to Rodney A Morris from comment #104)
> (In reply to Mauro Gaspari from comment #103)
> > (In reply to Rodney A Morris from comment #101)
> > > (In reply to Rodney A Morris from comment #99)
> > > > Created attachment 145366 [details]
> > > > apitrace of Hearts of Iron IV hard lock
> > > > 
> > > > Apitrace from hard lock playing Hearts of Iron IV without Steam.  The replay
> > > > from this trace will hard lock the computer, though inconsistently.  I've
> > > > replayed the trace three times. The replay hard locked computer one time.
> > > 
> > > neofetch from hardlock:
> > > 
> > >           /:-------------:\          
> > >        :-------------------::        -------------------------------- 
> > >      :-----------/shhOHbmp---:\      OS: Fedora release 30 (Thirty) x86_64 
> > >    /-----------omMMMNNNMMD  ---:     Kernel: 5.2.13-200.fc30.x86_64 
> > >   :-----------sMMMMNMNMP.    ---:    Uptime: 25 mins 
> > >  :-----------:MMMdP-------    ---\   Packages: 2202 (rpm), 27 (flatpak) 
> > > ,------------:MMMd--------    ---:   Shell: bash 5.0.7 
> > > :------------:MMMd-------    .---:   Resolution: 2560x1440 
> > > :----    oNMMMMMMMMMNho     .----:   DE: GNOME 3.32.2 
> > > :--     .+shhhMMMmhhy++   .------/   WM: GNOME Shell 
> > > :-    -------:MMMd--------------:    WM Theme: Adwaita 
> > > :-   --------/MMMd-------------;     Theme: Adapta-Nokto-Eta [GTK2/3] 
> > > :-    ------/hMMMy------------:      Icons: Adwaita [GTK2/3] 
> > > :-- :dMNdhhdNMMNo------------;       Terminal: tilix 
> > > :---:sdNMMMMNds:------------:        CPU: Intel i7-6850K (12) @ 4.000GHz 
> > > :------:://:-------------::          GPU: AMD ATI Radeon RX Vega 56/64 
> > > :---------------------://            Memory: 2478MiB / 32084MiB 
> > > 
> > > OpenGL version string: 4.5 (Compatibility Profile) Mesa 19.1.6
> > > 
> > > Note:  hard lock replayed occurred when the Discord flatpak is also running.
> > 
> > I also noticed some errors that pointed to discord in my logs. In my case
> > discord was installed via .deb package. 
> > Could you please try and disable hardware acceleration in discord settings -
> > appearance menu? Please let me know if it helps or changes anything. 
> > Thanks!
> 
> I have disabled hardware acceleration in discord settings to see if that
> improves my experience and report back my results.  I am doubtful that it
> will help much.  At least on the 5.2.11 kernel, I had lockups with or
> without discord running.  Discord running just seemed to make the problem
> appear more consistently.

Another lockup and crash of Stellaris last night, with the same dmesg kernel
information as in comment 105.

Kernel for this crash: 5.2.17.

Unlike previous attempts, I also had cpupower configured to run the CPU in
performance mode and was running Feral GameMode.  Although I still wonder if my
hardware has an issue, I am able to run Stellaris without issue under Windows.

Final note: getting an apitrace of my crash under Stellaris is not feasible for
two reasons.  First, the crash typically happens between 30 and 40 minutes into
game play, resulting in a monster trace file.  Second, I cannot get
apitrace to run correctly with Steam and a 64-bit game, which is necessary
since the crashes happen most frequently in multiplayer.

I am happy to provide more data if someone can point me in the direction to
capture it.  Aside from trying the amdgpu-pro drivers, is there anything else I
can try?


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (109 preceding siblings ...)
  2019-09-28 12:25 ` bugzilla-daemon
@ 2019-10-03  9:57 ` bugzilla-daemon
  2019-10-05 10:12 ` bugzilla-daemon
                   ` (26 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-10-03  9:57 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 513 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #111 from Yury Zhuravlev <stalkerg@gmail.com> ---
OK, this has been mentioned here many times:
echo 7 > /sys/class/drm/card0/device/pp_dpm_sclk

This also helped me. Without it, many games freeze my PC, with nothing in the
logs and no working ssh.

Something is wrong with the PowerPlay system on Vega cards. Can anybody open a
ticket on the kernel bug tracker?


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (110 preceding siblings ...)
  2019-10-03  9:57 ` bugzilla-daemon
@ 2019-10-05 10:12 ` bugzilla-daemon
  2019-10-05 12:02 ` bugzilla-daemon
                   ` (25 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-10-05 10:12 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 2754 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #112 from Jan Orsag <janorsag@outlook.sk> ---
screenfetch
 ██████████████████  ████████     johanides@johanides-manjaro
 ██████████████████  ████████     OS: Manjaro 18.1.0 Juhraya
 ██████████████████  ████████     Kernel: x86_64 Linux 4.19.69-1-MANJARO
 ██████████████████  ████████     Uptime: 18m
 ████████            ████████     Packages: 1186
 ████████  ████████  ████████     Shell: bash
 ████████  ████████  ████████     Resolution: 2560x1440
 ████████  ████████  ████████     DE: GNOME 3.32.2
 ████████  ████████  ████████     WM: Mutter
 ████████  ████████  ████████     WM Theme: Adapta-Nokto-Eta-Maia
 ████████  ████████  ████████     GTK Theme: Adapta-Nokto-Eta-Maia [GTK2/3]
 ████████  ████████  ████████     Icon Theme: Papirus-Adapta-Nokto-Maia
 ████████  ████████  ████████     Font: Noto Sans 10
 ████████  ████████  ████████     Disk: 565G / 1,2T (50%)
                                  CPU: AMD Ryzen 5 1600X Six-Core @ 12x 3.6GHz
                                  GPU: Radeon RX Vega (VEGA10, DRM 3.27.0,
4.19.69-1-MANJARO, LLVM 8.0.1)
                                  RAM: 2320MiB / 16050MiB

The system hard freezes after some playtime in Civilization 6 (black/green/gray
screen, music still playing, need to use the reset button).

Errors in system logs:
sep 19 16:39:13 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=7763335, emitted seq=7763337
sep 19 16:39:13 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, signaled seq=7703731, emitted seq=7703733
sep 19 16:41:11 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=374796, emitted seq=374798

On my computer, however, the crash/freeze occurs sooner with kernels 5.x and
higher than with kernel 4.19. It's approximately 1 hour of playtime (kernel 5+)
vs. 8 hours (kernel 4.19). It doesn't matter which Mesa I use; I tried
mesa-aco-git 19.3 and mesa 19.1.
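
Editorial sketch, not part of the original comment: the `amdgpu_job_timedout` lines above each name a ring and give a signaled and an emitted sequence number. The Python below pulls those fields out of log text and computes the difference, which is read here (as an assumption) as the number of fences emitted but not yet signaled when the timeout fired. The function name `parse_ring_timeouts` is hypothetical. Whitespace is normalised first so that log lines wrapped by a mail client still match.

```python
import re

RING_TIMEOUT_RE = re.compile(
    r"ring (\S+) timeout, signaled seq=(\d+), emitted seq=(\d+)"
)


def parse_ring_timeouts(log_text):
    """Return (ring, signaled, emitted, outstanding) for each timeout message."""
    # Collapse newlines and runs of spaces so wrapped lines are rejoined.
    flat = " ".join(log_text.split())
    results = []
    for match in RING_TIMEOUT_RE.finditer(flat):
        ring = match.group(1)
        signaled = int(match.group(2))
        emitted = int(match.group(3))
        results.append((ring, signaled, emitted, emitted - signaled))
    return results


# Sample copied from the comment above, including the mail line wrapping.
SAMPLE = """\
sep 19 16:39:13 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx
timeout, signaled seq=7763335, emitted seq=7763337
sep 19 16:39:13 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1
timeout, signaled seq=7703731, emitted seq=7703733
sep 19 16:41:11 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0
timeout, signaled seq=374796, emitted seq=374798
"""

timeouts = parse_ring_timeouts(SAMPLE)
for ring, signaled, emitted, outstanding in timeouts:
    print(ring, signaled, emitted, outstanding)
```

In this sample all three rings (gfx, sdma1, sdma0) show the same gap of 2 between emitted and signaled.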


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (111 preceding siblings ...)
  2019-10-05 10:12 ` bugzilla-daemon
@ 2019-10-05 12:02 ` bugzilla-daemon
  2019-10-19 21:26 ` bugzilla-daemon
                   ` (24 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-10-05 12:02 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1574 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #113 from Jason Playne <jason@jasonplayne.com> ---
As others have noted, with powerplay doing its thing we get system freezes.

Just had a successful 6+ hour gaming session on a kernel 5.3.2-050302-generic
with the following being done:
 * Forcing high perf state
 * Undervolt/Overclock
 * Higher fan curve (https://github.com/grmat/amdgpu-fancontrol)

I know that I have been messing with all sorts here, but I think it suggests
that PowerPlay may be at fault here when my system *does* crash (which is all
the time without the force high perf state)

All details below:

# Forcing High Perf
echo high | sudo tee /sys/class/drm/card0/device/power_dpm_force_performance_level

# Undervolt / Overclock
I also have done some messing around with voltages/clocks

$ cat /sys/class/drm/card0/device/pp_od_clk_voltage
OD_SCLK:
0:        852Mhz        800mV
1:        991Mhz        900mV
2:       1084Mhz        940mV
3:       1138Mhz        990mV
4:       1200Mhz       1040mV
5:       1401Mhz       1090mV
6:       1536Mhz       1140mV
7:       1630Mhz       1190mV
OD_MCLK:
0:        167Mhz        800mV
1:        500Mhz        800mV
2:        850Mhz        940mV
3:       1000Mhz       1100mV
OD_RANGE:
SCLK:     852MHz       2400MHz
MCLK:     167MHz       1500MHz
VDDC:     800mV        1200mV


# Settings for AMDGPU Fancontrol
TEMPS=( 35000 70000 80000 )
PWMS=(     70   180   255 )
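
Editorial sketch, not part of the original comment: the TEMPS/PWMS pairs above define a three-point fan curve, with temperatures in millidegrees Celsius (35000 is 35 °C). Assuming the fan-control script interpolates linearly between neighbouring points and clamps outside the range (an assumption that has not been checked against the script itself), the resulting PWM value for a given temperature can be estimated as follows. The function name `pwm_for_temp` is hypothetical.

```python
# Curve points copied from the comment above.
TEMPS = [35000, 70000, 80000]  # millidegrees Celsius
PWMS = [70, 180, 255]          # PWM duty values (0-255)


def pwm_for_temp(temp, temps=TEMPS, pwms=PWMS):
    """Estimate the PWM value for a temperature by linear interpolation.

    Assumes clamping to the first/last PWM value outside the curve.
    """
    if temp <= temps[0]:
        return pwms[0]
    if temp >= temps[-1]:
        return pwms[-1]
    for i in range(1, len(temps)):
        if temp <= temps[i]:
            lo_t, hi_t = temps[i - 1], temps[i]
            lo_p, hi_p = pwms[i - 1], pwms[i]
            # Integer arithmetic, rounding down.
            return lo_p + (temp - lo_t) * (hi_p - lo_p) // (hi_t - lo_t)
    return pwms[-1]


for t in (30000, 52500, 70000, 75000, 85000):
    print(t, pwm_for_temp(t))
```

Under these assumptions the fan runs at full PWM (255) from 80 °C upward.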


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (112 preceding siblings ...)
  2019-10-05 12:02 ` bugzilla-daemon
@ 2019-10-19 21:26 ` bugzilla-daemon
  2019-10-19 21:27 ` bugzilla-daemon
                   ` (23 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-10-19 21:26 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 29323 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #114 from Rodney A Morris <rodamorris@gmail.com> ---
To rule out possible hardware issues, I purchased another Vega 64 card, this
time a factory-overclocked one.  Since installing the card, I have experienced
three lockups: two while playing Stellaris and one while playing a YouTube video.
After playing Stellaris without issue two weeks ago, the computer locked up
twice last night.  While my previous problems seemed to be, in part, linked to
a circular lock dependency, the latest logs indicate something different.  I'm
seeing a lot of powerplay errors after the fence timeout.  I hope this new
information provides some insight into the problem.

         /:-------------:\          rmorris@ezra.blanchardmorris.net 
       :-------------------::        -------------------------------- 
     :-----------/shhOHbmp---:\      OS: Fedora release 30 (Thirty) x86_64 
   /-----------omMMMNNNMMD  ---:     Kernel: 5.3.6-200.fc30.x86_64 
  :-----------sMMMMNMNMP.    ---:    Uptime: 16 hours, 21 mins 
 :-----------:MMMdP-------    ---\   Packages: 2214 (rpm), 36 (flatpak) 
,------------:MMMd--------    ---:   Shell: bash 5.0.7 
:------------:MMMd-------    .---:   Resolution: 2560x1440 
:----    oNMMMMMMMMMNho     .----:   DE: GNOME 3.32.2 
:--     .+shhhMMMmhhy++   .------/   WM: Mutter 
:-    -------:MMMd--------------:    WM Theme: Adwaita 
:-   --------/MMMd-------------;     Theme: Adapta-Nokto-Eta [GTK2/3] 
:-    ------/hMMMy------------:      Icons: Adwaita [GTK2/3] 
:-- :dMNdhhdNMMNo------------;       Terminal: tilix 
:---:sdNMMMMNds:------------:        CPU: Intel i7-6850K (12) @ 4.000GHz 
:------:://:-------------::          GPU: AMD ATI Radeon RX Vega 56/64 
:---------------------://            Memory: 2814MiB / 32036MiB 


Card:

MSI Vega 64 OC (Card works fine under windows 10)

Game being played:

Stellaris

Native Game

Description of Event:
The screen goes blank while music and sound continue to play, then the computer
locks up or reboots.

relevant dmesg from crash:
[ 4244.670269] perf: interrupt took too long (2502 > 2500), lowering kernel.perf_event_max_sample_rate to 79000
[ 4298.241156] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out or interrupted!
[ 4304.385587] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring page1 timeout, signaled seq=60549844, emitted seq=60549846
[ 4304.385634] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
[ 4304.385637] amdgpu 0000:06:00.0: GPU reset begin!
[ 4304.402938] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error received: 0000:00:03.0
[ 4304.402945] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
[ 4304.402947] pcieport 0000:00:03.0: AER:   device [8086:6f08] error status/mask=00004000/00000000
[ 4304.402948] pcieport 0000:00:03.0: AER:    [14] CmpltTO                (First)
[ 4304.404006] pcieport 0000:00:03.0: AER: Device recovery failed
[ 4308.481068] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:47:crtc-0] flip_done timed out
[ 4314.625180] [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:47:crtc-0] hw_done or flip_done timed out
[ 4324.865057] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:47:crtc-0] flip_done timed out
[ 4335.105035] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:45:plane-5] flip_done timed out
[ 4336.695112] amdgpu: [powerplay] No response from smu
[ 4336.695128] amdgpu: [powerplay] Failed message: 0xe, input parameter: 0x0, error code: 0x0
[ 4338.307125] amdgpu: [powerplay] No response from smu
[ 4339.922039] amdgpu: [powerplay] No response from smu
[ 4339.922043] amdgpu: [powerplay] Failed message: 0x42, input parameter: 0x1, error code: 0x0
[ 4341.541675] amdgpu: [powerplay] No response from smu
[ 4343.162102] amdgpu: [powerplay] No response from smu
[ 4343.162105] amdgpu: [powerplay] Failed message: 0x24, input parameter: 0x0, error code: 0x0
[ 4343.221953] [drm] REG_WAIT timeout 10us * 3500 tries - dce_mi_free_dmif line:634
[ 4343.221962] ------------[ cut here ]------------
[ 4343.222070] WARNING: CPU: 0 PID: 16500 at drivers/gpu/drm/amd/amdgpu/../display/dc/dc_helper.c:332 generic_reg_wait.cold+0x31/0x53 [amdgpu]
[ 4343.222072] Modules linked in: rfcomm xt_CHECKSUM xt_MASQUERADE tun bridge
stp llc nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter
ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat
ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security
iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack
nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip_set nfnetlink ebtable_filter
ebtables ip6table_filter ip6_tables iptable_filter ip_tables cmac bnep nct6775
hwmon_vid intel_rapl_msr intel_rapl_common vfat fat fuse x86_pkg_temp_thermal
intel_powerclamp coretemp iwlmvm kvm_intel iTCO_wdt iTCO_vendor_support
mac80211 kvm snd_hda_codec_realtek irqbypass snd_hda_codec_generic
snd_hda_codec_hdmi libarc4 ledtrig_audio crct10dif_pclmul snd_hda_intel
crc32_pclmul iwlwifi snd_hda_codec snd_hda_core btusb ghash_clmulni_intel btrtl
intel_cstate snd_hwdep btbcm btintel intel_uncore snd_seq snd_seq_device
intel_rapl_perf bluetooth
[ 4343.222099]  mxm_wmi cfg80211 snd_pcm joydev ecdh_generic ecc mei_me
snd_timer rfkill snd mei i2c_i801 soundcore lpc_ich binfmt_misc auth_rpcgss
sunrpc amdgpu amd_iommu_v2 gpu_sched ttm drm_kms_helper crc32c_intel uas
mpt3sas igb drm e1000e nvme usb_storage dca i2c_algo_bit raid_class nvme_core
scsi_transport_sas wmi
[ 4343.222114] CPU: 0 PID: 16500 Comm: kworker/0:1 Not tainted 5.3.6-200.fc30.x86_64+debug #1
[ 4343.222115] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X99 Taichi, BIOS P1.80 04/06/2018
[ 4343.222119] Workqueue: events drm_sched_job_timedout [gpu_sched]
[ 4343.222167] RIP: 0010:generic_reg_wait.cold+0x31/0x53 [amdgpu]
[ 4343.222169] Code: 4c 24 18 44 89 fa 89 ee 48 c7 c7 f8 9d 73 c0 e8 60 46 b0 fa 83 7b 20 01 0f 84 02 ee fd ff 48 c7 c7 f0 9c 73 c0 e8 4a 46 b0 fa <0f> 0b e9 ef ed fd ff 48 c7 c7 f0 9c 73 c0 89 54 24 04 e8 33 46 b0
[ 4343.222170] RSP: 0018:ffffabda8729b690 EFLAGS: 00010246
[ 4343.222172] RAX: 0000000000000024 RBX: ffff9ceeab58f700 RCX: 0000000000000006
[ 4343.222173] RDX: 0000000000000000 RSI: ffff9ceeb50c8e50 RDI: ffff9ceebe5d9e00
[ 4343.222174] RBP: 000000000000000a R08: 000003f33c33ca38 R09: 0000000000000000
[ 4343.222175] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000000035af
[ 4343.222176] R13: 0000000000000dad R14: 0000000000000001 R15: 0000000000000dac
[ 4343.222178] FS:  0000000000000000(0000) GS:ffff9ceebe400000(0000) knlGS:0000000000000000
[ 4343.222179] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4343.222180] CR2: 00007f1480ef70c0 CR3: 0000000703f30002 CR4: 00000000003606f0
[ 4343.222182] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4343.222183] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 4343.222184] Call Trace:
[ 4343.222237]  dce_mi_free_dmif+0xef/0x150 [amdgpu]
[ 4343.222285]  dce110_reset_hw_ctx_wrap+0x15f/0x200 [amdgpu]
[ 4343.222333]  dce110_apply_ctx_to_hw+0x4b/0x530 [amdgpu]
[ 4343.222365]  ? amdgpu_pm_compute_clocks+0xc9/0x5f0 [amdgpu]
[ 4343.222414]  ? dm_pp_apply_display_requirements+0x1a8/0x1c0 [amdgpu]
[ 4343.222461]  dc_commit_state+0x26b/0x590 [amdgpu]
[ 4343.222514]  amdgpu_dm_atomic_commit_tail+0xd18/0x1cf0 [amdgpu]
[ 4343.222521]  ? __lock_acquire+0x247/0x1910
[ 4343.222525]  ? find_held_lock+0x32/0x90
[ 4343.222529]  ? find_held_lock+0x32/0x90
[ 4343.222533]  ? sched_clock+0x5/0x10
[ 4343.222536]  ? mark_held_locks+0x50/0x80
[ 4343.222540]  ? __lock_acquire+0x247/0x1910
[ 4343.222545]  ? wake_up_klogd+0x37/0x40
[ 4343.222549]  ? find_held_lock+0x32/0x90
[ 4343.222552]  ? mark_held_locks+0x50/0x80
[ 4343.222556]  ? _raw_spin_unlock_irq+0x29/0x40
[ 4343.222559]  ? lockdep_hardirqs_on+0xf0/0x180
[ 4343.222561]  ? _raw_spin_unlock_irq+0x29/0x40
[ 4343.222564]  ? wait_for_completion_timeout+0x75/0x190
[ 4343.222576]  ? commit_tail+0x3c/0x70 [drm_kms_helper]
[ 4343.222622]  ? amdgpu_dm_audio_eld_notify+0x60/0x60 [amdgpu]
[ 4343.222628]  commit_tail+0x3c/0x70 [drm_kms_helper]
[ 4343.222634]  drm_atomic_helper_commit+0xe3/0x150 [drm_kms_helper]
[ 4343.222640]  drm_atomic_helper_disable_all+0x14c/0x160 [drm_kms_helper]
[ 4343.222647]  drm_atomic_helper_suspend+0x66/0x100 [drm_kms_helper]
[ 4343.222698]  dm_suspend+0x20/0x60 [amdgpu]
[ 4343.222726]  amdgpu_device_ip_suspend_phase1+0x91/0xc0 [amdgpu]
[ 4343.222755]  amdgpu_device_ip_suspend+0x1c/0x60 [amdgpu]
[ 4343.222801]  amdgpu_device_pre_asic_reset+0x191/0x1a4 [amdgpu]
[ 4343.222849]  amdgpu_device_gpu_recover+0x260/0x934 [amdgpu]
[ 4343.222893]  amdgpu_job_timedout+0x115/0x140 [amdgpu]
[ 4343.222899]  drm_sched_job_timedout+0x44/0xa0 [gpu_sched]
[ 4343.222903]  process_one_work+0x272/0x5a0
[ 4343.222908]  worker_thread+0x50/0x3b0
[ 4343.222915]  kthread+0x108/0x140
[ 4343.222916]  ? process_one_work+0x5a0/0x5a0
[ 4343.222918]  ? kthread_park+0x80/0x80
[ 4343.222921]  ret_from_fork+0x3a/0x50
[ 4343.222929] irq event stamp: 82808
[ 4343.222931] hardirqs last  enabled at (82807): [<ffffffffbb1716eb>]
console_unlock+0x46b/0x5d0
[ 4343.222935] hardirqs last disabled at (82808): [<ffffffffbb0038da>]
trace_hardirqs_off_thunk+0x1a/0x20
[ 4343.222938] softirqs last  enabled at (82794): [<ffffffffbbe0035d>]
__do_softirq+0x35d/0x45d
[ 4343.222942] softirqs last disabled at (82787): [<ffffffffbb0f2077>]
irq_exit+0xf7/0x100
[ 4343.222943] ---[ end trace 71731c9cc205c24d ]---
[ 4344.758203] amdgpu: [powerplay] No response from smu
[ 4346.363061] amdgpu: [powerplay] No response from smu
[ 4346.363065] amdgpu: [powerplay] Failed to send message: 0x26, ret value: 0x0
[ 4347.973948] amdgpu: [powerplay] No response from smu
[ 4349.588168] amdgpu: [powerplay] No response from smu
[ 4349.588173] amdgpu: [powerplay] Failed message: 0x4c, input parameter: 0x1,
error code: 0x0
[ 4351.152764] amdgpu: [powerplay] No response from smu
[ 4352.722063] amdgpu: [powerplay] No response from smu
[ 4352.722068] amdgpu: [powerplay] Failed message: 0x4c, input parameter: 0x3,
error code: 0x0
[ 4354.325541] amdgpu: [powerplay] No response from smu
[ 4355.924138] amdgpu: [powerplay] No response from smu
[ 4355.924141] amdgpu: [powerplay] Failed to send message: 0x63, ret value: 0x0
[ 4357.537736] amdgpu: [powerplay] No response from smu
[ 4359.154141] amdgpu: [powerplay] No response from smu
[ 4359.154146] amdgpu: [powerplay] Failed message: 0x9, input parameter: 0xf4,
error code: 0x0
[ 4360.760856] amdgpu: [powerplay] No response from smu
[ 4362.372410] amdgpu: [powerplay] No response from smu
[ 4362.372414] amdgpu: [powerplay] Failed message: 0xa, input parameter:
0xa0b000, error code: 0x0
[ 4363.985961] amdgpu: [powerplay] No response from smu
[ 4365.599325] amdgpu: [powerplay] No response from smu
[ 4365.599331] amdgpu: [powerplay] Failed message: 0xe, input parameter: 0x0,
error code: 0x0
[ 4367.214945] amdgpu: [powerplay] No response from smu
[ 4368.829650] amdgpu: [powerplay] No response from smu
[ 4368.829655] amdgpu: [powerplay] Failed message: 0x42, input parameter: 0x1,
error code: 0x0
[ 4370.443783] amdgpu: [powerplay] No response from smu
[ 4372.057288] amdgpu: [powerplay] No response from smu
[ 4372.057293] amdgpu: [powerplay] Failed message: 0x24, input parameter: 0x0,
error code: 0x0
[ 4372.074301] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[ 4372.074308] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[ 4372.074310] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[ 4372.074312] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[ 4372.074569] pcieport 0000:00:03.0: AER: Device recovery failed
[ 4372.091832] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[ 4372.091837] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[ 4372.091839] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[ 4372.091840] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[ 4372.091889] pcieport 0000:00:03.0: AER: Device recovery failed
[ 4372.109371] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[ 4372.109376] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[ 4372.109378] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[ 4372.109380] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[ 4372.126998] pcieport 0000:00:03.0: AER: Device recovery failed
[ 4372.127002] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[ 4372.127009] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[ 4372.127021] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[ 4372.127024] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[ 4372.127083] pcieport 0000:00:03.0: AER: Device recovery failed
[ 4372.144452] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[ 4372.144457] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[ 4372.144458] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[ 4372.144460] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[ 4372.144514] pcieport 0000:00:03.0: AER: Device recovery failed
[ 4372.161992] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[ 4372.161997] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[ 4372.161999] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[ 4372.162001] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[ 4372.162086] pcieport 0000:00:03.0: AER: Device recovery failed
[ 4372.179534] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[ 4372.179538] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[ 4372.179540] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[ 4372.179542] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[ 4372.179674] pcieport 0000:00:03.0: AER: Device recovery failed
[ 4372.197074] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[ 4372.197079] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[ 4372.197081] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[ 4372.197082] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[ 4372.197131] pcieport 0000:00:03.0: AER: Device recovery failed
[ 4372.214616] pcieport 0000:00:03.0: AER: Multiple Uncorrected (Non-Fatal)
error received: 0000:00:03.0
[ 4372.267239] amdgpu: [powerplay] Failed to send message: 0x61, ret value:
0xffffffff

Relevant journalctl messages:

Oct 18 21:49:47 ezra.blanchardmorris.net kernel: perf: interrupt took too long
(2502 > 2500), lowering kernel.perf_event_max_sample_rate to 79000
Oct 18 21:50:47 ezra.blanchardmorris.net kernel:
[drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed
out or interrupted!
Oct 18 21:50:47 ezra.blanchardmorris.net kernel: [drm:amdgpu_job_timedout
[amdgpu]] *ERROR* ring page1 timeout, signaled seq=60549844, emitted
seq=60549846
Oct 18 21:50:47 ezra.blanchardmorris.net kernel: [drm:amdgpu_job_timedout
[amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
Oct 18 21:50:47 ezra.blanchardmorris.net kernel: amdgpu 0000:06:00.0: GPU reset
begin!
Oct 18 21:50:47 ezra.blanchardmorris.net kernel: pcieport 0000:00:03.0: AER:
Uncorrected (Non-Fatal) error received: 0000:00:03.0
Oct 18 21:50:47 ezra.blanchardmorris.net kernel: pcieport 0000:00:03.0: AER:
PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer,
(Requester ID)
Oct 18 21:50:47 ezra.blanchardmorris.net kernel: pcieport 0000:00:03.0: AER:  
device [8086:6f08] error status/mask=00004000/00000000
Oct 18 21:50:47 ezra.blanchardmorris.net kernel: pcieport 0000:00:03.0: AER:   
[14] CmpltTO                (First)
Oct 18 21:50:47 ezra.blanchardmorris.net kernel: pcieport 0000:00:03.0: AER:
Device recovery failed
Oct 18 21:50:51 ezra.blanchardmorris.net kernel:
[drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR*
[CRTC:47:crtc-0] flip_done timed out
Oct 18 21:50:57 ezra.blanchardmorris.net kernel: [drm:amdgpu_dm_atomic_check
[amdgpu]] *ERROR* [CRTC:47:crtc-0] hw_done or flip_done timed out
Oct 18 21:51:07 ezra.blanchardmorris.net kernel:
[drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR*
[CRTC:47:crtc-0] flip_done timed out
Oct 18 21:51:18 ezra.blanchardmorris.net kernel:
[drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR*
[PLANE:45:plane-5] flip_done timed out
Oct 18 21:51:19 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] No
response from smu
Oct 18 21:51:19 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] Failed
message: 0xe, input parameter: 0x0, error code: 0x0
Oct 18 21:51:21 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] No
response from smu
Oct 18 21:51:22 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] No
response from smu
Oct 18 21:51:22 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] Failed
message: 0x42, input parameter: 0x1, error code: 0x0
Oct 18 21:51:24 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] No
response from smu
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] No
response from smu
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] Failed
message: 0x24, input parameter: 0x0, error code: 0x0
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: [drm] REG_WAIT timeout 10us *
3500 tries - dce_mi_free_dmif line:634
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: ------------[ cut here
]------------
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: WARNING: CPU: 0 PID: 16500 at
drivers/gpu/drm/amd/amdgpu/../display/dc/dc_helper.c:332
generic_reg_wait.cold+0x31/0x53 [amdgpu]
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: Modules linked in: rfcomm
xt_CHECKSUM xt_MASQUERADE tun bridge stp llc nf_conntrack_netbios_ns
nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute ip6table_nat
ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat
iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6
nf_defrag_ipv4 libcrc32c ip_set nfnetlink ebtable_filter ebtables
ip6table_filter ip6_tables iptable_filter ip_tables cmac bnep nct6775 hwmon_vid
intel_rapl_msr intel_rapl_common vfat fat fuse x86_pkg_temp_thermal
intel_powerclamp coretemp iwlmvm kvm_intel iTCO_wdt iTCO_vendor_support
mac80211 kvm snd_hda_codec_realtek irqbypass snd_hda_codec_generic
snd_hda_codec_hdmi libarc4 ledtrig_audio crct10dif_pclmul snd_hda_intel
crc32_pclmul iwlwifi snd_hda_codec snd_hda_core btusb ghash_clmulni_intel btrtl
intel_cstate snd_hwdep btbcm btintel intel_uncore snd_seq snd_seq_device
intel_rapl_perf bluetooth
Oct 18 21:51:26 ezra.blanchardmorris.net kernel:  mxm_wmi cfg80211 snd_pcm
joydev ecdh_generic ecc mei_me snd_timer rfkill snd mei i2c_i801 soundcore
lpc_ich binfmt_misc auth_rpcgss sunrpc amdgpu amd_iommu_v2 gpu_sched ttm
drm_kms_helper crc32c_intel uas mpt3sas igb drm e1000e nvme usb_storage dca
i2c_algo_bit raid_class nvme_core scsi_transport_sas wmi
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: CPU: 0 PID: 16500 Comm:
kworker/0:1 Not tainted 5.3.6-200.fc30.x86_64+debug #1
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: Hardware name: To Be Filled By
O.E.M. To Be Filled By O.E.M./X99 Taichi, BIOS P1.80 04/06/2018
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: Workqueue: events
drm_sched_job_timedout [gpu_sched]
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: RIP:
0010:generic_reg_wait.cold+0x31/0x53 [amdgpu]
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: Code: 4c 24 18 44 89 fa 89 ee
48 c7 c7 f8 9d 73 c0 e8 60 46 b0 fa 83 7b 20 01 0f 84 02 ee fd ff 48 c7 c7 f0
9c 73 c0 e8 4a 46 b0 fa <0f> 0b e9 ef ed fd ff 48 c7 c7 f0 9c 73 c0 89 54 24 04
e8 33 46 b0
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: RSP: 0018:ffffabda8729b690
EFLAGS: 00010246
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: RAX: 0000000000000024 RBX:
ffff9ceeab58f700 RCX: 0000000000000006
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: RDX: 0000000000000000 RSI:
ffff9ceeb50c8e50 RDI: ffff9ceebe5d9e00
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: RBP: 000000000000000a R08:
000003f33c33ca38 R09: 0000000000000000
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: R10: 0000000000000000 R11:
0000000000000000 R12: 00000000000035af
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: R13: 0000000000000dad R14:
0000000000000001 R15: 0000000000000dac
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: FS:  0000000000000000(0000)
GS:ffff9ceebe400000(0000) knlGS:0000000000000000
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: CS:  0010 DS: 0000 ES: 0000
CR0: 0000000080050033
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: CR2: 00007f1480ef70c0 CR3:
0000000703f30002 CR4: 00000000003606f0
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: DR0: 0000000000000000 DR1:
0000000000000000 DR2: 0000000000000000
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: DR3: 0000000000000000 DR6:
00000000fffe0ff0 DR7: 0000000000000400
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: Call Trace:
Oct 18 21:51:26 ezra.blanchardmorris.net kernel:  dce_mi_free_dmif+0xef/0x150
[amdgpu]
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: 
dce110_reset_hw_ctx_wrap+0x15f/0x200 [amdgpu]
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: 
dce110_apply_ctx_to_hw+0x4b/0x530 [amdgpu]
Oct 18 21:51:26 ezra.blanchardmorris.net kernel:  ?
amdgpu_pm_compute_clocks+0xc9/0x5f0 [amdgpu]
Oct 18 21:51:26 ezra.blanchardmorris.net kernel:  ?
dm_pp_apply_display_requirements+0x1a8/0x1c0 [amdgpu]
Oct 18 21:51:26 ezra.blanchardmorris.net kernel:  dc_commit_state+0x26b/0x590
[amdgpu]
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: 
amdgpu_dm_atomic_commit_tail+0xd18/0x1cf0 [amdgpu]
Oct 18 21:51:26 ezra.blanchardmorris.net kernel:  ? __lock_acquire+0x247/0x1910
Oct 18 21:51:26 ezra.blanchardmorris.net kernel:  ? find_held_lock+0x32/0x90
Oct 18 21:51:26 ezra.blanchardmorris.net kernel:  ? find_held_lock+0x32/0x90
Oct 18 21:51:26 ezra.blanchardmorris.net kernel:  ? sched_clock+0x5/0x10
Oct 18 21:51:26 ezra.blanchardmorris.net kernel:  ? mark_held_locks+0x50/0x80
Oct 18 21:51:26 ezra.blanchardmorris.net kernel:  ? __lock_acquire+0x247/0x1910
Oct 18 21:51:26 ezra.blanchardmorris.net kernel:  ? wake_up_klogd+0x37/0x40
Oct 18 21:51:26 ezra.blanchardmorris.net kernel:  ? find_held_lock+0x32/0x90
Oct 18 21:51:26 ezra.blanchardmorris.net kernel:  ? mark_held_locks+0x50/0x80
Oct 18 21:51:26 ezra.blanchardmorris.net kernel:  ?
_raw_spin_unlock_irq+0x29/0x40
Oct 18 21:51:26 ezra.blanchardmorris.net kernel:  ?
lockdep_hardirqs_on+0xf0/0x180
Oct 18 21:51:26 ezra.blanchardmorris.net kernel:  ?
_raw_spin_unlock_irq+0x29/0x40
Oct 18 21:51:26 ezra.blanchardmorris.net kernel:  ?
wait_for_completion_timeout+0x75/0x190
Oct 18 21:51:26 ezra.blanchardmorris.net kernel:  ? commit_tail+0x3c/0x70
[drm_kms_helper]
Oct 18 21:51:26 ezra.blanchardmorris.net kernel:  ?
amdgpu_dm_audio_eld_notify+0x60/0x60 [amdgpu]
Oct 18 21:51:26 ezra.blanchardmorris.net kernel:  commit_tail+0x3c/0x70
[drm_kms_helper]
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: 
drm_atomic_helper_commit+0xe3/0x150 [drm_kms_helper]
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: 
drm_atomic_helper_disable_all+0x14c/0x160 [drm_kms_helper]
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: 
drm_atomic_helper_suspend+0x66/0x100 [drm_kms_helper]
Oct 18 21:51:26 ezra.blanchardmorris.net kernel:  dm_suspend+0x20/0x60 [amdgpu]
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: 
amdgpu_device_ip_suspend_phase1+0x91/0xc0 [amdgpu]
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: 
amdgpu_device_ip_suspend+0x1c/0x60 [amdgpu]
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: 
amdgpu_device_pre_asic_reset+0x191/0x1a4 [amdgpu]
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: 
amdgpu_device_gpu_recover+0x260/0x934 [amdgpu]
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: 
amdgpu_job_timedout+0x115/0x140 [amdgpu]
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: 
drm_sched_job_timedout+0x44/0xa0 [gpu_sched]
Oct 18 21:51:26 ezra.blanchardmorris.net kernel:  process_one_work+0x272/0x5a0
Oct 18 21:51:26 ezra.blanchardmorris.net kernel:  worker_thread+0x50/0x3b0
Oct 18 21:51:26 ezra.blanchardmorris.net kernel:  kthread+0x108/0x140
Oct 18 21:51:26 ezra.blanchardmorris.net kernel:  ?
process_one_work+0x5a0/0x5a0
Oct 18 21:51:26 ezra.blanchardmorris.net kernel:  ? kthread_park+0x80/0x80
Oct 18 21:51:26 ezra.blanchardmorris.net kernel:  ret_from_fork+0x3a/0x50
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: irq event stamp: 82808
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: hardirqs last  enabled at
(82807): [<ffffffffbb1716eb>] console_unlock+0x46b/0x5d0
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: hardirqs last disabled at
(82808): [<ffffffffbb0038da>] trace_hardirqs_off_thunk+0x1a/0x20
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: softirqs last  enabled at
(82794): [<ffffffffbbe0035d>] __do_softirq+0x35d/0x45d
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: softirqs last disabled at
(82787): [<ffffffffbb0f2077>] irq_exit+0xf7/0x100
Oct 18 21:51:26 ezra.blanchardmorris.net kernel: ---[ end trace
71731c9cc205c24d ]---
Oct 18 21:51:27 ezra.blanchardmorris.net abrt-dump-journal-oops[1493]:
abrt-dump-journal-oops: Found oopses: 1
Oct 18 21:51:27 ezra.blanchardmorris.net abrt-dump-journal-oops[1493]:
abrt-dump-journal-oops: Creating problem directories
Oct 18 21:51:27 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] No
response from smu
Oct 18 21:51:28 ezra.blanchardmorris.net abrt-dump-journal-oops[1493]: Reported
1 kernel oopses to Abrt
Oct 18 21:51:29 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] No
response from smu
Oct 18 21:51:29 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] Failed to
send message: 0x26, ret value: 0x0
Oct 18 21:51:30 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] No
response from smu
Oct 18 21:51:32 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] No
response from smu
Oct 18 21:51:32 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] Failed
message: 0x4c, input parameter: 0x1, error code: 0x0
Oct 18 21:51:34 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] No
response from smu
Oct 18 21:51:35 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] No
response from smu
Oct 18 21:51:35 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] Failed
message: 0x4c, input parameter: 0x3, error code: 0x0
Oct 18 21:51:37 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] No
response from smu
Oct 18 21:51:38 ezra.blanchardmorris.net abrt-server[16691]: Can't find a
meaningful backtrace for hashing in '.'
Oct 18 21:51:38 ezra.blanchardmorris.net abrt-server[16691]: Option
'DropNotReportableOopses' is not configured
Oct 18 21:51:38 ezra.blanchardmorris.net abrt-server[16691]: Preserving oops
'.' because DropNotReportableOopses is 'no'
Oct 18 21:51:38 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] No
response from smu
Oct 18 21:51:38 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] Failed to
send message: 0x63, ret value: 0x0
Oct 18 21:51:40 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] No
response from smu
Oct 18 21:51:42 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] No
response from smu
Oct 18 21:51:42 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] Failed
message: 0x9, input parameter: 0xf4, error code: 0x0
Oct 18 21:51:42 ezra.blanchardmorris.net abrt-notification[16713]: System
encountered a non-fatal error in ??()
Oct 18 21:51:43 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] No
response from smu
Oct 18 21:51:45 ezra.blanchardmorris.net kernel: amdgpu: [powerplay] No
response from smu


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (113 preceding siblings ...)
  2019-10-19 21:26 ` bugzilla-daemon
@ 2019-10-19 21:27 ` bugzilla-daemon
  2019-10-19 21:28 ` bugzilla-daemon
                   ` (22 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-10-19 21:27 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 347 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #115 from Rodney A Morris <rodamorris@gmail.com> ---
Created attachment 145776
  --> https://bugs.freedesktop.org/attachment.cgi?id=145776&action=edit
Full dmesg from crash

Full dmesg from crash


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (114 preceding siblings ...)
  2019-10-19 21:27 ` bugzilla-daemon
@ 2019-10-19 21:28 ` bugzilla-daemon
  2019-10-21 16:24 ` bugzilla-daemon
                   ` (21 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-10-19 21:28 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 373 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #116 from Rodney A Morris <rodamorris@gmail.com> ---
Created attachment 145777
  --> https://bugs.freedesktop.org/attachment.cgi?id=145777&action=edit
Full journal from start to crash

Full journalctl from start to crash.


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (115 preceding siblings ...)
  2019-10-19 21:28 ` bugzilla-daemon
@ 2019-10-21 16:24 ` bugzilla-daemon
  2019-10-23  1:52 ` bugzilla-daemon
                   ` (20 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-10-21 16:24 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 339 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #117 from haro41@gmx.de ---
...are these crashes more frequent with VSYNC enabled?

If yes, it could be the same issue as this bug:

https://bugs.freedesktop.org/show_bug.cgi?id=110777


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (116 preceding siblings ...)
  2019-10-21 16:24 ` bugzilla-daemon
@ 2019-10-23  1:52 ` bugzilla-daemon
  2019-10-23  8:51 ` bugzilla-daemon
                   ` (19 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-10-23  1:52 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1419 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #118 from Rodney A Morris <rodamorris@gmail.com> ---
(In reply to haro41 from comment #117)
> ...are this craches more frequently with VSYNC enabled?
> 
> If yes, it could be the same thing like this bug:
> 
> https://bugs.freedesktop.org/show_bug.cgi?id=110777

vsync is definitely on for both Stellaris and Hearts of Iron.

I looked over the bug report you linked to.  It is very interesting and I will
follow with interest.  The next time I play Stellaris or Hearts of Iron IV, I
will have to see if I can record my memory frequency values to see if they are
indeed not moving off the base frequency under low load with v-sync enabled. 
The problem manifesting under low load would explain why I cannot replicate the
problem while running Unigine Superposition.
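
A minimal sketch of one way to check this: amdgpu lists the memory DPM levels
in pp_dpm_mclk and marks the active one with a trailing "*". The helper name
active_level and the card0 path are assumptions for illustration; the card
index may differ on other systems.

```shell
#!/bin/bash
# Hypothetical helper: read pp_dpm_mclk-style text on stdin and print the
# index and clock of the level that is marked active (line ending in '*').
active_level() {
  awk '/\*[[:space:]]*$/ { sub(/:/, "", $1); print $1, $2 }'
}

# Assumed path; override MCLK_FILE if the GPU is not card0.
MCLK_FILE=${MCLK_FILE:-/sys/class/drm/card0/device/pp_dpm_mclk}
if [ -r "$MCLK_FILE" ]; then
  active_level < "$MCLK_FILE"
fi
```

Running this in a loop while the game is open would show whether the active
level ever leaves index 0.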

After reading the following bug report for Vega 20, I began to wonder whether
powerplay and the frequencies at which the chip and memory were operating
might be the problem:

https://bugs.freedesktop.org/show_bug.cgi?id=110674

Last Friday, I attempted to capture the operating frequency and temps, but my
attempt utterly failed.

I will disable v-sync, see if that improves things, and report back here.  If I
manage to capture frequency data, I will report it here and maybe in your
thread as well.


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (117 preceding siblings ...)
  2019-10-23  1:52 ` bugzilla-daemon
@ 2019-10-23  8:51 ` bugzilla-daemon
  2019-10-24  3:12 ` bugzilla-daemon
                   ` (18 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-10-23  8:51 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1594 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #119 from haro41@gmx.de ---
Below is a simple script I use to record DPM data in the background:

######################################################
#!/bin/bash

# adapt this sample interval (seconds)
SLEEP_INTERVAL=0.05

# adapt the paths to your need
FILE_SCLK=/sys/class/drm/card0/device/hwmon/hwmon0/freq1_input
FILE_MCLK=/sys/class/drm/card0/device/hwmon/hwmon0/freq2_input
FILE_PWM=/sys/class/drm/card0/device/hwmon/hwmon0/pwm1
FILE_TEMP=/sys/class/drm/card0/device/hwmon/hwmon0/temp1_input
FILE_FAN=/sys/class/drm/card0/device/hwmon/hwmon0/fan1_input
FILE_GFXVDD=/sys/class/drm/card0/device/hwmon/hwmon0/in0_input
FILE_POW=/sys/class/drm/card0/device/hwmon/hwmon0/power1_average
FILE_BUS=/sys/class/drm/card0/device/gpu_busy_percent

# checking for privileges
# (this script only reads sysfs, so the check may be unnecessary on systems
# where these files are world-readable)
if [ "$UID" -ne 0 ]
then
  echo "Writing to sysfs requires privileges, relaunch as root"
  exit 1
fi

# read one sample from each sysfs file and print it as a single line
function read_output {

  SCLK=$(cat "$FILE_SCLK")
  MCLK=$(cat "$FILE_MCLK")
  TEMP=$(cat "$FILE_TEMP")
  FAN=$(cat "$FILE_FAN")
  GFXVDD=$(cat "$FILE_GFXVDD")
  POW=$(cat "$FILE_POW")
  BUS=$(cat "$FILE_BUS")

#  echo "sclk: $SCLK mclk: $MCLK gfx_vdd: $GFXVDD"
  echo "sclk: $SCLK mclk: $MCLK temp: $TEMP fan: $FAN gfx_vdd: $GFXVDD pow: $POW bus: $BUS"
}

function run_daemon {
  while :; do
    read_output
    sleep $SLEEP_INTERVAL
  done
}

# finally start the loop
run_daemon

######################################################
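
If the script's output is redirected to a file, a recording can be checked for
the condition discussed earlier in the thread (memory clock stuck at one
value). The following is only a sketch: the function name mclk_summary and the
file name dpm.log are assumptions, and it relies on the "mclk: <value>" field
printed by read_output above.

```shell
#!/bin/bash
# Hypothetical helper: read recorded sample lines on stdin and print each
# distinct mclk value together with the number of samples that showed it.
mclk_summary() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "mclk:") seen[$(i + 1)]++ }
       END { for (v in seen) print v, seen[v] }' | sort -n
}

# Example (dpm.log is an assumed file name):
#   mclk_summary < dpm.log
```

A recording with only one line of output would mean the memory clock never
changed during the session.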


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (118 preceding siblings ...)
  2019-10-23  8:51 ` bugzilla-daemon
@ 2019-10-24  3:12 ` bugzilla-daemon
  2019-10-24  4:58 ` bugzilla-daemon
                   ` (17 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-10-24  3:12 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1163 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #120 from blppt@yahoo.com ---
I don't have anything to attach here, but I see the same issue: Ubuntu 19.04,
kernel 5.4-rc3, Vega 64 W/C, Mesa 19.3.0. It only seems to occur with DXVK and
not D9VK, for some reason.

Example: GW2 (DX9 game) will work perfectly under heavy load in WvW with
massive zergs for hours with no crash, but FFXIV (DX11) will always lock the
entire system up after a time.

That being said, when you force the top clock using

echo manual > /sys/class/drm/card0/device/power_dpm_force_performance_level

and

echo 7 > /sys/class/drm/card0/device/pp_dpm_sclk

FFXIV no longer locks the system at all. It does eat up a good deal more watts
according to my UPS meter though, so resetting to auto is necessary IMHO.

So, it sounds like you guys are on the right track with the whole "power
management" thing being the culprit. Just wanted to add my experience to this.

(And yes, echoing the guy above, the exact same system is stable in Windows 10,
so it's not a hardware issue.)
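
The two echo commands above, plus the reset to auto, can be wrapped in small
helpers. This is only a sketch: the function names and the DEVICE_DIR override
are hypothetical, the card0 path is the one quoted in this comment, and level 7
is assumed to be the top sclk level as it is here. Writing to the real sysfs
files requires root.

```shell
#!/bin/bash
# Sketch only. DEVICE_DIR defaults to the card0 path used in this thread;
# the variable and the function names below are hypothetical.
DEVICE_DIR="${DEVICE_DIR:-/sys/class/drm/card0/device}"

# Write "manual" or "auto" to power_dpm_force_performance_level.
# Other values are rejected here because this thread only uses these two.
set_perf_level() {
  local level=$1
  case "$level" in
    manual|auto) ;;
    *) echo "unsupported level: $level" >&2; return 1 ;;
  esac
  echo "$level" > "$DEVICE_DIR/power_dpm_force_performance_level"
}

# Force sclk level 7, exactly as the two echo commands above do.
force_top_sclk() {
  set_perf_level manual && echo 7 > "$DEVICE_DIR/pp_dpm_sclk"
}

# Hand clock selection back to the driver after the gaming session.
reset_to_auto() {
  set_perf_level auto
}
```

Calling force_top_sclk before starting the game and reset_to_auto afterwards
keeps the extra power draw limited to the time the workaround is needed.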


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (119 preceding siblings ...)
  2019-10-24  3:12 ` bugzilla-daemon
@ 2019-10-24  4:58 ` bugzilla-daemon
  2019-10-24  9:09 ` bugzilla-daemon
                   ` (16 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-10-24  4:58 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1888 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #121 from Mauro Gaspari <ilvipero@gmx.com> ---
(In reply to blppt from comment #120)
> I dont have anything to attach here, but same issue here, ubuntu 19.04,
> kernel 5.4-rc3, vega64 W/C, Mesa 19.3.0 -- it only seems to occur with DXVK
> and not D9VK for some reason.
> 
> Example: GW2 (DX9 game) will work perfectly under heavy load in WvW with
> massive zergs for hours with no crash, but FFXIV (DX11) will always lock the
> entire system up after a time.
> 
> That being said, when you force the top clock using
> 
> echo manual > /sys/class/drm/card0/device/power_dpm_force_performance_level
> 
> and
> 
> echo 7 > /sys/class/drm/card0/device/pp_dpm_sclk
> 
> FFXIV no longer locks the system at all. It does eat up a good deal more
> watts according to my UPS meter though, so resetting to auto is necessary
> IMHO.
> 
> So, it sounds like you guys are on the right track with the whole "power
> management" thing being the culprit. Just wanted to add my experience to
> this.
> 
> (and yes, echoing the guy above, the exact same system is stable in windows
> 10, so its not a hardware issue).

I agree with this. I am having a much better experience myself, even without
the commands to force the power performance level, by doing the following:
- change the game to windowed or full-screen borderless (fixed window)
- disable vsync
- disable the frame limiter

With these three changes, the GPU seems to be forced into its max power state
all the time while playing. I have been using this method for a few days with
DXVK games and have had no freeze so far.

But again this is just a temporary workaround. So is the command to manually
force high power performance level. Hopefully a permanent fix comes with
AMDGPU/Kernel updates.


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (120 preceding siblings ...)
  2019-10-24  4:58 ` bugzilla-daemon
@ 2019-10-24  9:09 ` bugzilla-daemon
  2019-10-24  9:10 ` bugzilla-daemon
                   ` (15 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-10-24  9:09 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 742 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #122 from haro41@gmx.de ---
In my experience, this issue is related to mclk switching and it affects the
lowest mclk level only.

So you guys can save a lot of power if, instead of switching to the highest
gfx level or disabling vsync, you just disable the lowest mclk level with:

echo "manual" > /sys/class/drm/card0/device/power_dpm_force_performance_level
echo "1 2 3" > /sys/class/drm/card0/device/pp_dpm_mclk

If you are building your kernel locally, look in this thread for a driver code
modification that works without disabling the lowest mclk level (saves a few
watts at idle).
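
The "1 2 3" list above is specific to a card with four mclk levels. As a
sketch, the list can instead be derived from the pp_dpm_mclk listing itself,
assuming (as in this thread) that level 0 is the lowest level. The function
name is hypothetical.

```shell
#!/bin/bash
# Sketch only: read a pp_dpm_mclk-style listing ("0: 167Mhz", "1: 500Mhz *", ...)
# on stdin and print every level index except 0, space-separated on one line.
mclk_levels_without_lowest() {
  awk -F: '/^[0-9]+:/ && $1 != 0 { printf "%s%s", sep, $1; sep = " " }
           END { print "" }'
}

# Possible use as root, after switching to "manual":
#   levels=$(mclk_levels_without_lowest < /sys/class/drm/card0/device/pp_dpm_mclk)
#   echo "$levels" > /sys/class/drm/card0/device/pp_dpm_mclk
```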


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (121 preceding siblings ...)
  2019-10-24  9:09 ` bugzilla-daemon
@ 2019-10-24  9:10 ` bugzilla-daemon
  2019-10-29 19:00 ` bugzilla-daemon
                   ` (14 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-10-24  9:10 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 275 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #123 from haro41@gmx.de ---
... i forgot the link to a related thread:


https://bugs.freedesktop.org/show_bug.cgi?id=110777


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (122 preceding siblings ...)
  2019-10-24  9:10 ` bugzilla-daemon
@ 2019-10-29 19:00 ` bugzilla-daemon
  2019-11-05 18:01 ` bugzilla-daemon
                   ` (13 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-10-29 19:00 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1333 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #124 from blppt@yahoo.com ---
(In reply to haro41 from comment #122)
> In my experience, this issue is related to mclk switching and it affects the
> lowest mclk level only.
> 
> So you guy's can save a lot of power, if you, insteed of switching to
> highest gfxlevel or to disable vsync, just disable the lowest mclk level by:
> 
> echo "manual" > /sys/class/drm/card0/device/power_dpm_force_performance_level
> echo "1 2 3" > /sys/class/drm/card0/device/pp_dpm_mclk
> 
> If you are building your kernel locally, look in this thread for a driver
> code modification that works, without disabling the lowest mclk level (saves
> a few watt on idle).

Ooh, that seems to have solved it. Haven't had a crash yet, ran The Outer
Worlds for hours (addicting game!), ran FFXIV, ran GW2, no lockups. And, if
there is much of a difference at idle in watt usage, I don't see it on the UPS
meter.

Thanks a million!

(Also of note: when using the Valve ACO compiler, as others have noted, you
don't even have to do the above to (apparently) solve the problem.
Unfortunately, that has other issues: my V64 won't clock up high enough when
using ACO for some reason, so I don't use it.)


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (123 preceding siblings ...)
  2019-10-29 19:00 ` bugzilla-daemon
@ 2019-11-05 18:01 ` bugzilla-daemon
  2019-11-06  2:46 ` bugzilla-daemon
                   ` (12 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-11-05 18:01 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 607 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #125 from haro41@gmx.de ---
... thanks for your feedback, so it seems we are faced with the same bug ...

Btw, I got crashes with at least one Vulkan game with the ACO compiler backend
enabled, too.
I think it really depends on the load pattern. Enabled vsync triggers the
typical load pattern, with at least one transient (from high to low load) per
frame.

Is anyone here who is affected by this bug usually building the kernel from
source locally?


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (124 preceding siblings ...)
  2019-11-05 18:01 ` bugzilla-daemon
@ 2019-11-06  2:46 ` bugzilla-daemon
  2019-11-06  9:49 ` bugzilla-daemon
                   ` (11 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-11-06  2:46 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1007 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #126 from Rodney A Morris <rodamorris@gmail.com> ---
(In reply to haro41 from comment #125)
> ... thanks for your feedback, so it seems we are faced with the same bug ...
> 
> Btw, i got crashes with at least one vulkan game and ACO compiler backend
> enabled too.
> I think it really depends of the load pattern. And enabled vsync is
> triggering the typical load pattern, with at least one transient (from high
> to low load) per frame.
> 
> Is someone affected with this bug here, usually building the kernel from
> source locally?

If you want someone to apply your changes in bug report no. 110777 to the
kernel for testing, I can do so, but will not be able to get to it until this
weekend.

As a side note, I've had great success manually limiting the memory clock to
level 1,2,3 on my Vega 64.  I've played over 7 hours of Stellaris without a
crash.


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (125 preceding siblings ...)
  2019-11-06  2:46 ` bugzilla-daemon
@ 2019-11-06  9:49 ` bugzilla-daemon
  2019-11-06 10:23 ` bugzilla-daemon
                   ` (10 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-11-06  9:49 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1451 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #127 from haro41@gmx.de ---
(In reply to Rodney A Morris from comment #126)
> If you want someone to apply your changes in bug report no. 110777 to the
> kernel for testing, I can so but will not be to it until this weekend. 

... thanks for your reply. Yes, that was the idea, and it would be very nice ...

Since I think the proposed fix is more relevant to this very thread, let me
repeat the proposed patch here:

in 'drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c':

static void vega10_notify_smc_display_change(struct pp_hwmgr *hwmgr,
                bool has_disp)
{
        smum_send_msg_to_smc_with_parameter(hwmgr,
                                            PPSMC_MSG_SetUclkFastSwitch,
                                            has_disp ? 1 : 0);
        /* proposed fix for crashes because of frequently mclk level 0/1 switching */
        smum_send_msg_to_smc_with_parameter(hwmgr, PPSMC_MSG_SetUclkDownHyst, 1);
}

Only the module 'amdgpu.ko' needs to be rebuilt and copied, like this:

$ cd /home/user/linux-5.x.x && make -j8 -C . M=drivers/gpu/drm/amd/amdgpu

# cp /home/user/linux-5.x.x/drivers/gpu/drm/amd/amdgpu/amdgpu.ko \
    /lib/modules/5.x.x/kernel/drivers/gpu/drm/amd/amdgpu/amdgpu.ko && \
    update-initramfs -u

... 'user' and 'x.x' have to be adapted, most likely ...


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (126 preceding siblings ...)
  2019-11-06  9:49 ` bugzilla-daemon
@ 2019-11-06 10:23 ` bugzilla-daemon
  2019-11-06 17:32 ` bugzilla-daemon
                   ` (9 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-11-06 10:23 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 931 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #128 from haro41@gmx.de ---
Created attachment 145901
  --> https://bugs.freedesktop.org/attachment.cgi?id=145901&action=edit
proposed fix for crashes, caused by frequent mclk level 0/1 switches

At least one of the causes of the crashes occurs more frequently if vsync is
enabled.

In this case, memory clock levels are usually switched more frequently.
Through experiments I found that the transition between level 1 and level 0 in
particular is critical. The fact that disabling memory level 0 helps as a
workaround confirms that this approach points in the right direction.

Result of further experiments:
By sending a 'PPSMC_MSG_SetUclkDownHyst' message to the SMC (enabling a
hysteresis feature?), the crashes can be avoided, even with mclk level 0
enabled and vsync activated.
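
What the SMC firmware actually does on 'PPSMC_MSG_SetUclkDownHyst' is not
documented in this thread, so the following is only a toy model of down
hysteresis in general, not of the firmware. It assumes one load sample per
tick (1 = busy, 0 = idle) and a hypothetical hold parameter: the clock drops
from level 1 to level 0 only after more than "hold" consecutive idle samples.
All names are made up for illustration.

```shell
#!/bin/bash
# Toy model only; this is NOT the SMC firmware algorithm.
# Usage: count_down_switches HOLD SAMPLE...
# Prints how many level 1 -> level 0 switches the sample sequence causes.
count_down_switches() {
  local hold=$1
  shift
  local level=1 idle=0 switches=0 load
  for load in "$@"; do
    if [ "$load" -eq 1 ]; then
      # Any busy sample raises the level immediately and resets the idle count.
      level=1
      idle=0
    else
      idle=$((idle + 1))
      # Drop to level 0 only after more than "hold" idle samples in a row.
      if [ "$level" -eq 1 ] && [ "$idle" -gt "$hold" ]; then
        level=0
        switches=$((switches + 1))
      fi
    fi
  done
  echo "$switches"
}

# A vsync-like pattern with one short idle gap per frame:
#   count_down_switches 0 1 0 1 0 1 0 0 0   # no hysteresis
#   count_down_switches 2 1 0 1 0 1 0 0 0   # hold of 2 samples
```

In this model the short per-frame idle gaps no longer cause a downward switch
once the hold is longer than the gap, which matches the idea that fewer level
1 to level 0 transitions mean fewer chances to hit the critical transition.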


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (127 preceding siblings ...)
  2019-11-06 10:23 ` bugzilla-daemon
@ 2019-11-06 17:32 ` bugzilla-daemon
  2019-11-06 18:32 ` bugzilla-daemon
                   ` (8 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-11-06 17:32 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1001 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #129 from Wilko Bartels <me@jasondaigo.de> ---
(In reply to haro41 from comment #122)
> In my experience, this issue is related to mclk switching and it affects the
> lowest mclk level only.
> 
> So you guy's can save a lot of power, if you, insteed of switching to
> highest gfxlevel or to disable vsync, just disable the lowest mclk level by:
> 
> echo "manual" > /sys/class/drm/card0/device/power_dpm_force_performance_level
> echo "1 2 3" > /sys/class/drm/card0/device/pp_dpm_mclk
> 
> If you are building your kernel locally, look in this thread for a driver
> code modification that works, without disabling the lowest mclk level (saves
> a few watt on idle).

Do you have any suggestion for automating this? So far I can only run these
commands after su. Not even sudo works with scripts running these commands,
nor do systemd files.


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (128 preceding siblings ...)
  2019-11-06 17:32 ` bugzilla-daemon
@ 2019-11-06 18:32 ` bugzilla-daemon
  2019-11-06 19:26 ` bugzilla-daemon
                   ` (7 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-11-06 18:32 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1373 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #130 from haro41@gmx.de ---
> > 
> > echo "manual" > /sys/class/drm/card0/device/power_dpm_force_performance_level
> > echo "1 2 3" > /sys/class/drm/card0/device/pp_dpm_mclk
> > 
> 
> do you have any suggestion to automate this? so far i can strictly run these
> commands after su. not even sudo works with scripts running these commands.
> or systemd files.

Currently I use my patch (see above) to work around the crashes.
If you prefer not to touch your kernel, you could create a systemd service: 

# cat /etc/systemd/system/amd-pp.service: 

[Unit]
Description=AMD PP adjust service
[Service]
User=root
Group=root
GuessMainPID=no
ExecStart=/srv/amdgpu-pp.sh
[Install]
WantedBy=multi-user.target
---------------------------------------------------------------
# cat /srv/amdgpu-pp.sh:

#!/bin/bash
echo "manual" > /sys/class/drm/card0/device/power_dpm_force_performance_level
echo "1 2 3" > /sys/class/drm/card0/device/pp_dpm_mclk
---------------------------------------------------------------
#systemctl enable amd-pp.service
#systemctl start amd-pp.service
---------------------------------------------------------------

... assuming you have 'amdgpu.ppfeaturemask=0xffffffff' set ...


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (129 preceding siblings ...)
  2019-11-06 18:32 ` bugzilla-daemon
@ 2019-11-06 19:26 ` bugzilla-daemon
  2019-11-07 10:25 ` bugzilla-daemon
                   ` (6 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-11-06 19:26 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1734 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #131 from Wilko Bartels <me@jasondaigo.de> ---
(In reply to haro41 from comment #130)
> > > 
> > > echo "manual" > /sys/class/drm/card0/device/power_dpm_force_performance_level
> > > echo "1 2 3" > /sys/class/drm/card0/device/pp_dpm_mclk
> > > 
> > 
> > do you have any suggestion to automate this? so far i can strictly run these
> > commands after su. not even sudo works with scripts running these commands.
> > or systemd files.
> 
> Currently i use my patch (see above) to workaround the crashes.
> If you prefer not to touch your kernel, you could create a systemd service: 
> 
> # cat /etc/systemd/system/amd-pp.service: 
> 
> [Unit]
> Description=AMD PP adjust service
> [Service]
> User=root
> Group=root
> GuessMainPID=no
> ExecStart=/srv/amdgpu-pp.sh
> [Install]
> WantedBy=multi-user.target
> ---------------------------------------------------------------
> # cat /srv/amdgpu-pp.sh:
> 
> #!/bin/bash
> echo "manual" > /sys/class/drm/card0/device/power_dpm_force_performance_level
> echo "1 2 3" > /sys/class/drm/card0/device/pp_dpm_mclk
> ---------------------------------------------------------------
> #systemctl enable amd-pp.service
> #systemctl start amd-pp.service
> ---------------------------------------------------------------
> 
> ... assuming you have 'amdgpu.ppfeaturemask=0xffffffff' set ...

Thank you. I already tried exactly that, and the unit is unable to autostart
(permission denied). Only a manual systemctl start works. I don't know why.

I would try to patch the kernel instead if I had any clue how to do the steps.


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (130 preceding siblings ...)
  2019-11-06 19:26 ` bugzilla-daemon
@ 2019-11-07 10:25 ` bugzilla-daemon
  2019-11-07 16:50 ` bugzilla-daemon
                   ` (5 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-11-07 10:25 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 672 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #132 from haro41@gmx.de ---
(In reply to Wilko Bartels from comment #131)
> Thank you. I already tried exactly that. And the unit unable to autostart
> (permission denied). Only manual systemctl start works. Dont know why. 

If you have double-checked the permissions of both the .service and the .sh
files, you could try delaying the automatic service start, for example by
replacing:

'WantedBy=multi-user.target' with 'WantedBy=graphical.target'

and maybe inserting a line in the [Unit] section: 'After=multi-user.target'


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (131 preceding siblings ...)
  2019-11-07 10:25 ` bugzilla-daemon
@ 2019-11-07 16:50 ` bugzilla-daemon
  2019-11-12 11:03 ` bugzilla-daemon
                   ` (4 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-11-07 16:50 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1428 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #133 from Wilko Bartels <me@jasondaigo.de> ---
(In reply to haro41 from comment #132)
> (In reply to Wilko Bartels from comment #131)
> > Thank you. I already tried exactly that. And the unit unable to autostart
> > (permission denied). Only manual systemctl start works. Dont know why. 
> 
> If you double checked the permissions of both, the .service and the .sh
> files,
> you could try delay the automatic service start, for example by replacing:
> 
> 'WantedBy=multi-user.target' with 'WantedBy=graphical.target'
> 
> and maybe insert a line in the [Unit] section: 'After=multi-user.target'

Sadly, that doesn't change a thing:

line 2: /sys/class/drm/card0/device/power_dpm_force_performance_level: Permission denied
line 3: /sys/class/drm/card0/device/pp_dpm_mclk: Permission denied
amd-pp.service: Main process exited, code=exited, status=1/FAILURE

-rw-r--r-- 1 root root 4,0K  7. Nov 17:45 /sys/class/drm/card0/device/power_dpm_force_performance_level
-rw-r--r-- 1 root root 4,0K  7. Nov 17:45 /sys/class/drm/card0/device/pp_dpm_mclk

Again, after logging in (i3/xinit or plasma/sddm), I have no errors with
systemctl start, and it works:

[jason@behemoth ~]$ cat /sys/class/drm/card0/device/pp_dpm_mclk
0: 167Mhz 
1: 500Mhz *
2: 700Mhz 
3: 800Mhz
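
In a listing like the one above, the '*' marks the level that is currently
active. As a sketch, a check along these lines could confirm after login
whether the workaround took effect; the function name is hypothetical.

```shell
#!/bin/bash
# Sketch only: read a pp_dpm_mclk-style listing on stdin and print the index
# of the level marked with '*' (the currently active one).
active_mclk_level() {
  awk -F: '/\*[[:space:]]*$/ { print $1 }'
}

# Possible use:
#   active_mclk_level < /sys/class/drm/card0/device/pp_dpm_mclk
```

If this keeps printing 0 while the GPU is idle, the lowest level is still
enabled and the two echo commands have not been applied.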


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (132 preceding siblings ...)
  2019-11-07 16:50 ` bugzilla-daemon
@ 2019-11-12 11:03 ` bugzilla-daemon
  2019-11-17 14:24 ` bugzilla-daemon
                   ` (3 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-11-12 11:03 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1649 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #134 from Wilko Bartels <me@jasondaigo.de> ---
(In reply to Wilko Bartels from comment #133)
> (In reply to haro41 from comment #132)
> > (In reply to Wilko Bartels from comment #131)
> > > Thank you. I already tried exactly that. And the unit unable to autostart
> > > (permission denied). Only manual systemctl start works. Dont know why. 
> > 
> > If you double checked the permissions of both, the .service and the .sh
> > files,
> > you could try delay the automatic service start, for example by replacing:
> > 
> > 'WantedBy=multi-user.target' with 'WantedBy=graphical.target'
> > 
> > and maybe insert a line in the [Unit] section: 'After=multi-user.target'
> 
> sadly that doesnt change a thing
> line 2: /sys/class/drm/card0/device/power_dpm_force_performance_level:
> Permission denied
> 
> line 3: /sys/class/drm/card0/device/pp_dpm_mclk: Permission denied
> amd-pp.service: Main process exited, code=exited, status=1/FAILURE
> 
> -rw-r--r-- 1 root root 4,0K  7. Nov 17:45
> /sys/class/drm/card0/device/power_dpm_force_performance_level
> 
> -rw-r--r-- 1 root root 4,0K  7. Nov 17:45
> /sys/class/drm/card0/device/pp_dpm_mclk
> 
> again after logging (i3/xinit or plasma/sddm i have no errors with systemctl
> start and it works
> 
> [jason@behemoth ~]$ cat /sys/class/drm/card0/device/pp_dpm_mclk
> 0: 167Mhz 
> 1: 500Mhz *
> 2: 700Mhz 
> 3: 800Mhz

I am now running a script at Plasma login, with no password required for that
command in sudoers. It also runs after waking from sleep.


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (133 preceding siblings ...)
  2019-11-12 11:03 ` bugzilla-daemon
@ 2019-11-17 14:24 ` bugzilla-daemon
  2019-11-17 17:13 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-11-17 14:24 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 36815 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #135 from Rodney A Morris <rodamorris@gmail.com> ---
(In reply to haro41 from comment #127)
> (In reply to Rodney A Morris from comment #126)
> > If you want someone to apply your changes in bug report no. 110777 to the
> > kernel for testing, I can so but will not be to it until this weekend. 
>  
> ... thanks for you reply. Yes, that was the idea and would be very nice...
> 
> Since i thing the proposed fix is more relevant to this very thread, let me
> repeat the proposed patch here:
> 
> in 'drivers/gpu/drm/amd/powerplay/hwmgr/vega10_hwmgr.c':
> 
> static void vega10_notify_smc_display_change(struct pp_hwmgr *hwmgr,
>                 bool has_disp)
> {
> 	smum_send_msg_to_smc_with_parameter(hwmgr,
> 	                                    PPSMC_MSG_SetUclkFastSwitch,
> 	                                    has_disp ? 1 : 0);
> /* proposed fix for crashes because of frequently mclk level 0/1 switching */
> 	smum_send_msg_to_smc_with_parameter(hwmgr, PPSMC_MSG_SetUclkDownHyst, 1);
> }
> 
> Only module 'amdgpu.ko' needs to be rebuild and copied, like this:
> 
> $ cd /home/user/linux-5.x.x && make -j8 -C . M=drivers/gpu/drm/amd/amdgpu
> 
> # cp /home/user/linux-5.x.x/drivers/gpu/drm/amd/amdgpu/amdgpu.ko
> /lib/modules/5.x.x/kernel/drivers/gpu/drm/amd/amdgpu/amdgpu.ko &&
> update-initramfs -u
> 
> ... 'user' and 'x.x' have to be adapted, most likely ...

I applied the patch and recompiled the kernel with the modified amdgpu driver.
Unfortunately, the patch did not resolve my issues. I experienced a crash with
the same symptoms as before within 20 minutes of playing Battletech and within
40 minutes of playing Stellaris. Again, limiting the HBM memory clock to
levels 1, 2, and 3 prevents the system from crashing, indicating that something
in the switching of the memory clock between level 0 and levels 1, 2, and 3 is
causing the crash.

Interestingly, the debug output indicates a possible problem in
amdgpu/../display/dc/dc_helper.c at, I am guessing, line 332. If I have time
later this week, I may take a look at the code in that file. Here are the
pertinent details from the Stellaris crash.

Distro:  Fedora
Kernel:  5.3.11

dmesg crash output:

[19792.781681] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=3875204, emitted seq=3875205
[19792.781727] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process stellaris pid 13309 thread stellaris:cs0 pid 13310
[19792.781731] amdgpu 0000:06:00.0: GPU reset begin!
[19792.798997] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error received: 0000:00:03.0
[19792.799004] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
[19792.799006] pcieport 0000:00:03.0: AER:   device [8086:6f08] error status/mask=00004000/00000000
[19792.799007] pcieport 0000:00:03.0: AER:    [14] CmpltTO                (First)
[19792.800004] pcieport 0000:00:03.0: AER: Device recovery failed
[19794.419525] amdgpu: [powerplay] No response from smu
[19794.419542] amdgpu: [powerplay] Failed message: 0xe, input parameter: 0x0, error code: 0x0
[19796.043441] amdgpu: [powerplay] No response from smu
[19797.665903] amdgpu: [powerplay] No response from smu
[19797.665907] amdgpu: [powerplay] Failed message: 0x42, input parameter: 0x1, error code: 0x0
[19799.287749] amdgpu: [powerplay] No response from smu
[19800.910845] amdgpu: [powerplay] No response from smu
[19800.910850] amdgpu: [powerplay] Failed message: 0x24, input parameter: 0x0, error code: 0x0
[19800.977846] [drm] REG_WAIT timeout 10us * 3500 tries - dce_mi_free_dmif line:634
[19800.977855] ------------[ cut here ]------------
[19800.977967] WARNING: CPU: 10 PID: 15123 at drivers/gpu/drm/amd/amdgpu/../display/dc/dc_helper.c:332 generic_reg_wait.cold+0x31/0x53 [amdgpu]
[19800.977968] Modules linked in: rfcomm xt_CHECKSUM xt_MASQUERADE nf_nat_tftp
nf_conntrack_tftp tun bridge stp llc nf_conntrack_netbios_ns
nf_conntrack_broadcast xt_CT ip6t_REJECT nf_reject_ipv6 ip6t_rpfilter
ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute ip6table_nat
ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat
iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6
nf_defrag_ipv4 libcrc32c ip_set nfnetlink ebtable_filter ebtables
ip6table_filter ip6_tables iptable_filter cmac bnep nct6775 hwmon_vid vfat fat
intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp
kvm_intel kvm iTCO_wdt iTCO_vendor_support irqbypass iwlmvm crct10dif_pclmul
snd_hda_codec_realtek crc32_pclmul snd_hda_codec_generic ledtrig_audio
snd_hda_codec_hdmi ghash_clmulni_intel mac80211 snd_hda_intel intel_cstate
snd_hda_codec libarc4 intel_uncore snd_hda_core btusb snd_hwdep btrtl
intel_rapl_perf btbcm iwlwifi snd_seq btintel snd_seq_device
[19800.977994]  bluetooth joydev mxm_wmi snd_pcm cfg80211 snd_timer
ecdh_generic ecc rfkill snd mei_me soundcore i2c_i801 lpc_ich mei binfmt_misc
auth_rpcgss sunrpc ip_tables amdgpu amd_iommu_v2 gpu_sched ttm drm_kms_helper
drm crc32c_intel mpt3sas igb nvme e1000e dca raid_class i2c_algo_bit
scsi_transport_sas nvme_core wmi usb_storage fuse
[19800.978009] CPU: 10 PID: 15123 Comm: kworker/10:1 Not tainted
5.3.11-300.RAM.local.fc31.x86_64+debug #1
[19800.978011] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X99
Taichi, BIOS P1.80 04/06/2018
[19800.978014] Workqueue: events drm_sched_job_timedout [gpu_sched]
[19800.978082] RIP: 0010:generic_reg_wait.cold+0x31/0x53 [amdgpu]
[19800.978084] Code: 4c 24 18 44 89 fa 89 ee 48 c7 c7 a8 ee 7e c0 e8 82 00 a5
fa 83 7b 20 01 0f 84 94 ee fd ff 48 c7 c7 a0 ed 7e c0 e8 6c 00 a5 fa <0f> 0b e9
81 ee fd ff 48 c7 c7 a0 ed 7e c0 89 54 24 04 e8 55 00 a5
[19800.978086] RSP: 0018:ffff957a0520f690 EFLAGS: 00010246
[19800.978087] RAX: 0000000000000024 RBX: ffff88d6a8030780 RCX:
0000000000000006
[19800.978089] RDX: 0000000000000000 RSI: ffff88d645a10e50 RDI:
ffff88d6bf9d9e00
[19800.978090] RBP: 000000000000000a R08: 0000120246405906 R09:
0000000000000000
[19800.978091] R10: 0000000000000000 R11: 0000000000000000 R12:
00000000000035af
[19800.978092] R13: 0000000000000dad R14: 0000000000000001 R15:
0000000000000dac
[19800.978093] FS:  0000000000000000(0000) GS:ffff88d6bf800000(0000)
knlGS:0000000000000000
[19800.978095] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[19800.978096] CR2: 0000289e30054000 CR3: 0000000278612003 CR4:
00000000003606e0
[19800.978097] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[19800.978098] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[19800.978100] Call Trace:
[19800.978152]  dce_mi_free_dmif+0xef/0x150 [amdgpu]
[19800.978200]  dce110_reset_hw_ctx_wrap+0x15f/0x200 [amdgpu]
[19800.978261]  dce110_apply_ctx_to_hw+0x4b/0x530 [amdgpu]
[19800.978316]  ? amdgpu_pm_compute_clocks+0xc9/0x5f0 [amdgpu]
[19800.978383]  ? dm_pp_apply_display_requirements+0x1a8/0x1c0 [amdgpu]
[19800.978429]  dc_commit_state+0x26b/0x590 [amdgpu]
[19800.978479]  amdgpu_dm_atomic_commit_tail+0xd18/0x1cf0 [amdgpu]
[19800.978486]  ? check_irq_usage+0xa7/0x460
[19800.978488]  ? find_held_lock+0x32/0x90
[19800.978494]  ? check_path+0x22/0x40
[19800.978496]  ? check_noncircular+0xaf/0x1b0
[19800.978501]  ? __lock_acquire+0x247/0x1910
[19800.978507]  ? find_held_lock+0x32/0x90
[19800.978511]  ? mark_held_locks+0x50/0x80
[19800.978513]  ? _raw_spin_unlock_irq+0x29/0x40
[19800.978516]  ? lockdep_hardirqs_on+0xf0/0x180
[19800.978518]  ? _raw_spin_unlock_irq+0x29/0x40
[19800.978521]  ? wait_for_completion_timeout+0x75/0x190
[19800.978534]  ? commit_tail+0x3c/0x70 [drm_kms_helper]
[19800.978578]  ? amdgpu_dm_audio_eld_notify+0x60/0x60 [amdgpu]
[19800.978583]  commit_tail+0x3c/0x70 [drm_kms_helper]
[19800.978588]  drm_atomic_helper_commit+0xe3/0x150 [drm_kms_helper]
[19800.978595]  drm_atomic_helper_disable_all+0x14c/0x160 [drm_kms_helper]
[19800.978601]  drm_atomic_helper_suspend+0x66/0x100 [drm_kms_helper]
[19800.978652]  dm_suspend+0x20/0x60 [amdgpu]
[19800.978679]  amdgpu_device_ip_suspend_phase1+0x91/0xc0 [amdgpu]
[19800.978707]  amdgpu_device_ip_suspend+0x1c/0x60 [amdgpu]
[19800.978753]  amdgpu_device_pre_asic_reset+0x191/0x1a4 [amdgpu]
[19800.978799]  amdgpu_device_gpu_recover+0x260/0x934 [amdgpu]
[19800.978843]  amdgpu_job_timedout+0x115/0x140 [amdgpu]
[19800.978848]  drm_sched_job_timedout+0x44/0xa0 [gpu_sched]
[19800.978852]  process_one_work+0x272/0x5a0
[19800.978858]  worker_thread+0x50/0x3b0
[19800.978863]  kthread+0x108/0x140
[19800.978865]  ? process_one_work+0x5a0/0x5a0
[19800.978867]  ? kthread_park+0x80/0x80
[19800.978870]  ret_from_fork+0x3a/0x50
[19800.978878] irq event stamp: 211500
[19800.978881] hardirqs last  enabled at (211499): [<ffffffffbb1715db>]
console_unlock+0x46b/0x5d0
[19800.978885] hardirqs last disabled at (211500): [<ffffffffbb0038da>]
trace_hardirqs_off_thunk+0x1a/0x20
[19800.978887] softirqs last  enabled at (211486): [<ffffffffbbe0035d>]
__do_softirq+0x35d/0x45d
[19800.978889] softirqs last disabled at (211479): [<ffffffffbb0f20c7>]
irq_exit+0xf7/0x100
[19800.978891] ---[ end trace 722d34fe8b4d4012 ]---
[19802.595549] amdgpu: [powerplay] No response from smu
[19804.214995] amdgpu: [powerplay] No response from smu
[19804.215000] amdgpu: [powerplay] Failed message: 0x4c, input parameter: 0x1,
error code: 0x0
[19805.837985] amdgpu: [powerplay] No response from smu
[19807.458610] amdgpu: [powerplay] No response from smu
[19807.458614] amdgpu: [powerplay] Failed message: 0x4c, input parameter: 0x3,
error code: 0x0
[19809.078189] amdgpu: [powerplay] No response from smu
[19810.698831] amdgpu: [powerplay] No response from smu
[19810.698835] amdgpu: [powerplay] Failed message: 0x9, input parameter: 0xf4,
error code: 0x0
[19812.321202] amdgpu: [powerplay] No response from smu
[19813.938039] amdgpu: [powerplay] No response from smu
[19813.938043] amdgpu: [powerplay] Failed message: 0xa, input parameter:
0xa0b000, error code: 0x0
[19815.558461] amdgpu: [powerplay] No response from smu
[19817.179965] amdgpu: [powerplay] No response from smu
[19817.179969] amdgpu: [powerplay] Failed message: 0xe, input parameter: 0x0,
error code: 0x0
[19818.790507] amdgpu: [powerplay] No response from smu
[19820.409551] amdgpu: [powerplay] No response from smu
[19820.409555] amdgpu: [powerplay] Failed message: 0x42, input parameter: 0x1,
error code: 0x0
[19822.030397] amdgpu: [powerplay] No response from smu
[19823.648860] amdgpu: [powerplay] No response from smu
[19823.648864] amdgpu: [powerplay] Failed message: 0x43, input parameter: 0x1,
error code: 0x0
[19825.269615] amdgpu: [powerplay] No response from smu
[19826.890755] amdgpu: [powerplay] No response from smu
[19826.890760] amdgpu: [powerplay] Failed message: 0x24, input parameter: 0x0,
error code: 0x0
[19826.907783] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[19826.907789] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[19826.907791] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[19826.907793] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[19826.907853] pcieport 0000:00:03.0: AER: Device recovery failed
[19826.925319] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[19826.925325] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[19826.925326] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[19826.925328] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[19826.925371] pcieport 0000:00:03.0: AER: Device recovery failed
[19826.942858] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[19826.942863] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[19826.942865] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[19826.942867] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[19826.942922] pcieport 0000:00:03.0: AER: Device recovery failed
[19826.960471] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[19826.960477] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[19826.960480] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[19826.960483] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[19826.960532] pcieport 0000:00:03.0: AER: Device recovery failed
[19826.977940] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[19826.977945] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[19826.977947] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[19826.977949] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[19826.977988] pcieport 0000:00:03.0: AER: Device recovery failed
[19826.995481] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[19826.995486] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[19826.995487] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[19826.995489] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[19826.995529] pcieport 0000:00:03.0: AER: Device recovery failed
[19827.013021] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[19827.013026] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[19827.013027] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[19827.013029] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[19827.013091] pcieport 0000:00:03.0: AER: Device recovery failed
[19827.030562] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[19827.030567] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[19827.030568] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[19827.030570] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[19827.030610] pcieport 0000:00:03.0: AER: Device recovery failed
[19827.048102] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[19827.048106] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[19827.048108] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[19827.048110] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[19827.048148] pcieport 0000:00:03.0: AER: Device recovery failed
[19827.065644] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[19827.065648] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[19827.065650] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[19827.065652] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[19827.065692] pcieport 0000:00:03.0: AER: Device recovery failed
[19827.083183] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[19827.083188] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[19827.083190] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[19827.083192] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[19827.083231] pcieport 0000:00:03.0: AER: Device recovery failed
[19827.100724] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[19827.100729] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[19827.100731] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[19827.100732] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[19827.100772] pcieport 0000:00:03.0: AER: Device recovery failed
[19827.118264] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[19827.118269] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[19827.118270] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[19827.118272] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[19827.118310] pcieport 0000:00:03.0: AER: Device recovery failed
[19827.135804] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[19827.135809] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[19827.135811] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[19827.135812] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[19827.135852] pcieport 0000:00:03.0: AER: Device recovery failed
[19827.153345] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[19827.153350] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[19827.153352] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[19827.153353] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[19827.153393] pcieport 0000:00:03.0: AER: Device recovery failed
[19827.170887] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[19827.170892] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[19827.170893] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[19827.170895] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[19827.170934] pcieport 0000:00:03.0: AER: Device recovery failed
[19827.188426] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[19827.188431] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[19827.188433] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[19827.188435] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[19827.188473] pcieport 0000:00:03.0: AER: Device recovery failed
[19827.205966] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[19827.205971] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[19827.205973] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[19827.205974] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[19827.206013] pcieport 0000:00:03.0: AER: Device recovery failed
[19827.223507] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[19827.223512] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[19827.223514] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[19827.223515] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[19827.223554] pcieport 0000:00:03.0: AER: Device recovery failed
[19827.241053] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[19827.241058] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[19827.241059] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[19827.241061] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[19827.241120] pcieport 0000:00:03.0: AER: Device recovery failed
[19827.258589] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[19827.258594] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[19827.258595] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[19827.258597] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[19827.258637] pcieport 0000:00:03.0: AER: Device recovery failed
[19827.276129] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[19827.276134] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[19827.276135] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[19827.276137] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[19827.276176] pcieport 0000:00:03.0: AER: Device recovery failed
[19827.293670] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[19827.293675] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[19827.293676] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[19827.293678] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[19827.293718] pcieport 0000:00:03.0: AER: Device recovery failed
[19827.311211] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[19827.311215] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[19827.311217] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[19827.311219] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[19827.311259] pcieport 0000:00:03.0: AER: Device recovery failed
[19827.328751] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[19827.328756] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[19827.328758] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[19827.328759] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[19827.328800] pcieport 0000:00:03.0: AER: Device recovery failed
[19827.346291] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[19827.346295] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[19827.346297] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[19827.346299] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[19827.346344] pcieport 0000:00:03.0: AER: Device recovery failed
[19827.363831] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[19827.363836] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[19827.363838] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[19827.363839] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[19827.363886] pcieport 0000:00:03.0: AER: Device recovery failed
[19827.381372] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[19827.381376] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[19827.381378] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[19827.381380] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[19827.381425] pcieport 0000:00:03.0: AER: Device recovery failed
[19827.398913] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[19827.398917] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[19827.398919] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[19827.398921] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[19827.398959] pcieport 0000:00:03.0: AER: Device recovery failed
[19827.416453] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[19827.416458] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[19827.416460] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[19827.416467] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[19827.416507] pcieport 0000:00:03.0: AER: Device recovery failed
[19827.433994] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[19827.433999] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[19827.434001] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[19827.434002] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[19827.434042] pcieport 0000:00:03.0: AER: Device recovery failed
[19827.451536] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[19827.451542] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[19827.451544] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[19827.451545] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[19827.451588] pcieport 0000:00:03.0: AER: Device recovery failed
[19827.469085] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[19827.469091] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[19827.469092] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[19827.469094] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[19827.469136] pcieport 0000:00:03.0: AER: Device recovery failed
[19827.486616] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[19827.486626] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[19827.486628] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[19827.486630] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[19827.486670] pcieport 0000:00:03.0: AER: Device recovery failed
[19827.504161] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[19827.504167] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[19827.504170] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[19827.504171] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[19827.504218] pcieport 0000:00:03.0: AER: Device recovery failed
[19827.521697] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[19827.521702] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[19827.521704] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[19827.521706] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[19827.521934] pcieport 0000:00:03.0: AER: Device recovery failed
[19827.539242] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[19827.539247] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[19827.539249] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[19827.539250] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[19827.539290] pcieport 0000:00:03.0: AER: Device recovery failed
[19827.556778] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[19827.556782] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[19827.556784] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[19827.556786] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[19827.556836] pcieport 0000:00:03.0: AER: Device recovery failed
[19827.574325] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[19827.574330] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[19827.574332] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[19827.574334] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[19827.574373] pcieport 0000:00:03.0: AER: Device recovery failed
[19827.591858] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[19827.591863] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[19827.591865] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[19827.591867] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[19827.591908] pcieport 0000:00:03.0: AER: Device recovery failed
[19827.609401] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[19827.609405] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[19827.609407] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[19827.609409] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[19827.609448] pcieport 0000:00:03.0: AER: Device recovery failed
[19827.626939] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[19827.626944] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[19827.626946] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[19827.626947] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[19827.626986] pcieport 0000:00:03.0: AER: Device recovery failed
[19827.644481] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[19827.644486] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[19827.644488] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[19827.644489] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[19827.644528] pcieport 0000:00:03.0: AER: Device recovery failed
[19827.662021] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[19827.662026] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[19827.662028] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[19827.662029] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[19827.662087] pcieport 0000:00:03.0: AER: Device recovery failed
[19827.679561] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[19827.679566] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[19827.679568] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[19827.679570] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[19827.679608] pcieport 0000:00:03.0: AER: Device recovery failed
[19827.697101] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[19827.697106] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected
(Non-Fatal), type=Transaction Layer, (Requester ID)
[19827.697108] pcieport 0000:00:03.0: AER:   device [8086:6f08] error
status/mask=00004000/00000000
[19827.697110] pcieport 0000:00:03.0: AER:    [14] CmpltTO               
(First)
[19827.697149] pcieport 0000:00:03.0: AER: Device recovery failed
[19827.714648] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error
received: 0000:00:03.0
[19827.714653] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
[19827.714655] pcieport 0000:00:03.0: AER:   device [8086:6f08] error status/mask=00004000/00000000
[19827.714656] pcieport 0000:00:03.0: AER:    [14] CmpltTO                (First)
[19827.714703] pcieport 0000:00:03.0: AER: Device recovery failed
[19827.732183] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error received: 0000:00:03.0
[19827.732188] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
[19827.732190] pcieport 0000:00:03.0: AER:   device [8086:6f08] error status/mask=00004000/00000000
[19827.732191] pcieport 0000:00:03.0: AER:    [14] CmpltTO                (First)
[19827.732230] pcieport 0000:00:03.0: AER: Device recovery failed
[19827.749724] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error received: 0000:00:03.0
[19827.749729] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
[19827.749730] pcieport 0000:00:03.0: AER:   device [8086:6f08] error status/mask=00004000/00000000
[19827.749732] pcieport 0000:00:03.0: AER:    [14] CmpltTO                (First)
[19827.767327] pcieport 0000:00:03.0: AER: Device recovery failed
[19827.767330] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error received: 0000:00:03.0
[19827.767335] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
[19827.767336] pcieport 0000:00:03.0: AER:   device [8086:6f08] error status/mask=00004000/00000000
[19827.767338] pcieport 0000:00:03.0: AER:    [14] CmpltTO                (First)
[19827.767364] pcieport 0000:00:03.0: AER: Device recovery failed
[19827.784805] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error received: 0000:00:03.0
[19827.784810] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
[19827.784812] pcieport 0000:00:03.0: AER:   device [8086:6f08] error status/mask=00004000/00000000
[19827.784813] pcieport 0000:00:03.0: AER:    [14] CmpltTO                (First)
[19827.784853] pcieport 0000:00:03.0: AER: Device recovery failed
[19827.802345] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error received: 0000:00:03.0
[19827.802350] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
[19827.802352] pcieport 0000:00:03.0: AER:   device [8086:6f08] error status/mask=00004000/00000000
[19827.802354] pcieport 0000:00:03.0: AER:    [14] CmpltTO                (First)
[19827.802394] pcieport 0000:00:03.0: AER: Device recovery failed
[19827.819886] pcieport 0000:00:03.0: AER: Uncorrected (Non-Fatal) error received: 0000:00:03.0
[19827.819891] pcieport 0000:00:03.0: AER: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
[19827.819893] pcieport 0000:00:03.0: AER:   device [8086:6f08] error status/mask=00004000/00000000
[19827.819894] pcieport 0000:00:03.0: AER:    [14] CmpltTO                (First)
[19827.819934] pcieport 0000:00:03.0: AER: Device recovery failed
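
For readers of the log above: the `status/mask=00004000/00000000` field and the `[14] CmpltTO` line describe the same thing, since 0x00004000 has only bit 14 set and nothing is masked. The following is a minimal sketch of that arithmetic; the helper names `set_bits` and `parse_status_mask` are made up for illustration and are not part of any kernel or tooling API.

```python
def set_bits(status_hex: str) -> list[int]:
    """Return the indices of the bits set in a 32-bit hexadecimal word."""
    value = int(status_hex, 16)
    return [bit for bit in range(32) if value & (1 << bit)]


def parse_status_mask(field: str) -> tuple[list[int], list[int]]:
    """Split a 'status/mask' field as printed in the AER log lines above
    and return the set bit indices of each half."""
    status, mask = field.split("/")
    return set_bits(status), set_bits(mask)


# Value copied from the log; bit 14 is the one the kernel labels CmpltTO.
print(parse_status_mask("00004000/00000000"))  # ([14], [])
```

An empty mask list means the error was not masked, which fits with it being reported on every occurrence.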


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (134 preceding siblings ...)
  2019-11-17 14:24 ` bugzilla-daemon
@ 2019-11-17 17:13 ` bugzilla-daemon
  2019-11-17 17:18 ` [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming (VSYNC enabled) bugzilla-daemon
  2019-11-20  7:52 ` bugzilla-daemon
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-11-17 17:13 UTC (permalink / raw)
  To: dri-devel



https://bugs.freedesktop.org/show_bug.cgi?id=109955

--- Comment #136 from haro41@gmx.de ---
Thank you for testing and reporting back.

I think the crashes are caused by voltage drops, followed by a hardware
failure.
That would also explain the many different kernel logs, because from the
driver's point of view the failure is random.

If vsync is enabled, the mclk level is switched at least twice per frame
(down/up). In some cases I have seen more switches within a frame.

I am not sure whether switching the memory clock level this quickly, multiple
times during a frame, is really useful. It does not save much power, but
apparently it makes the system unstable.

I don't think this is intended behavior; it looks more like a firmware bug, IMO.

Maybe an open-source driver developer can help us understand?
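
For anyone who wants to check the mclk switching on their own system, here is a rough sketch that polls the amdgpu `pp_dpm_mclk` sysfs file and counts how often the active level changes. The path (in particular the `card0` index) and the `N: <freq>Mhz *` line layout are assumptions about a typical setup, and the function names are only illustrative. Userspace polling is far too coarse to prove "twice per frame"; it can only show that switching happens frequently.

```python
import re
import time
from pathlib import Path

# Assumed sysfs location; the card index may differ on your system.
MCLK_PATH = Path("/sys/class/drm/card0/device/pp_dpm_mclk")

# Assumed line layout: "<index>: <freq>Mhz", with " *" on the active level.
LEVEL_RE = re.compile(r"^(\d+):\s*(\d+)\s*MHz\s*(\*)?\s*$", re.IGNORECASE)


def active_level(text):
    """Return (index, MHz) of the level marked with '*', or None."""
    for line in text.splitlines():
        match = LEVEL_RE.match(line.strip())
        if match and match.group(3):
            return int(match.group(1)), int(match.group(2))
    return None


def count_switches(samples):
    """Count how often consecutive samples differ."""
    switches = 0
    previous = None
    for level in samples:
        if previous is not None and level != previous:
            switches += 1
        previous = level
    return switches


def poll(duration_s=1.0, interval_s=0.001):
    """Sample the active mclk level for duration_s seconds and return
    the number of level changes seen between samples."""
    samples = []
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        level = active_level(MCLK_PATH.read_text())
        if level is not None:
            samples.append(level[0])
        time.sleep(interval_s)
    return count_switches(samples)
```

Calling `poll()` while a game runs with vsync on and again with vsync off should give a rough comparison of how often the level changes in each case.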


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming (VSYNC enabled)
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (135 preceding siblings ...)
  2019-11-17 17:13 ` bugzilla-daemon
@ 2019-11-17 17:18 ` bugzilla-daemon
  2019-11-20  7:52 ` bugzilla-daemon
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-11-17 17:18 UTC (permalink / raw)
  To: dri-devel



https://bugs.freedesktop.org/show_bug.cgi?id=109955

haro41@gmx.de changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|amdgpu [RX Vega 64] system  |amdgpu [RX Vega 64] system
                   |freeze while gaming         |freeze while gaming (VSYNC
                   |                            |enabled)


* [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming (VSYNC enabled)
       [not found] <bug-109955-502@http.bugs.freedesktop.org/>
                   ` (136 preceding siblings ...)
  2019-11-17 17:18 ` [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming (VSYNC enabled) bugzilla-daemon
@ 2019-11-20  7:52 ` bugzilla-daemon
  137 siblings, 0 replies; 147+ messages in thread
From: bugzilla-daemon @ 2019-11-20  7:52 UTC (permalink / raw)
  To: dri-devel



https://bugs.freedesktop.org/show_bug.cgi?id=109955

Martin Peres <martin.peres@free.fr> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |MOVED

--- Comment #137 from Martin Peres <martin.peres@free.fr> ---
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been
closed from further activity.

You can subscribe and participate further through the new bug through this link
to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/716.


