dri-devel.lists.freedesktop.org archive mirror
* [Bug 201957] New: amdgpu: ring gfx timeout
@ 2018-12-11  4:52 bugzilla-daemon
  2018-12-11 14:57 ` [Bug 201957] " bugzilla-daemon
                   ` (98 more replies)
  0 siblings, 99 replies; 100+ messages in thread
From: bugzilla-daemon @ 2018-12-11  4:52 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

            Bug ID: 201957
           Summary: amdgpu: ring gfx timeout
           Product: Drivers
           Version: 2.5
    Kernel Version: 4.19.8, 4.20-rc5
          Hardware: Intel
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: blocking
          Priority: P1
         Component: Video(DRI - non Intel)
          Assignee: drivers_video-dri@kernel-bugs.osdl.org
          Reporter: felix.adrianto@gmail.com
        Regression: No

Error message: 
[Dec 5 22:08] amdgpu 0000:23:00.0: GPU fault detected: 146 0x0000480c for
process yuzu pid 2920 thread yuzu:cs0 pid 2935
[  +0.000005] amdgpu 0000:23:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR  
0x00000000
[  +0.000002] amdgpu 0000:23:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS
0x0604800C
[  +0.000003] amdgpu 0000:23:00.0: VM fault (0x0c, vmid 3, pasid 32770) at page
0, read from 'TC4' (0x54433400) (72)
[ +10.053011] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
signaled seq=37241, emitted seq=37244
[  +0.000007] [drm] GPU recovery disabled.
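
For anyone decoding the fault lines above by hand, here is a minimal Python sketch. The value printed next to 'TC4' is simply the tag as big-endian ASCII. The bit positions used for the status word are an assumption, chosen only because they reproduce the values the driver printed (0x0c, vmid 3, client 72, read); they are not taken from register documentation. The function names are illustrative.

```python
import struct


def decode_client_tag(tag: int) -> str:
    """Interpret a 32-bit value as big-endian ASCII and drop NUL padding."""
    raw = struct.pack(">I", tag)
    return raw.rstrip(b"\x00").decode("ascii")


def split_fault_status(status: int) -> dict:
    """Split the fault status word into fields.

    ASSUMPTION: the shifts and masks below were picked to match the
    values shown in the log line; treat them as illustrative only.
    """
    return {
        "protections": status & 0xFF,        # printed as 0x0c
        "client_id": (status >> 12) & 0xFF,  # printed as (72)
        "is_write": bool((status >> 24) & 0x1),  # log says "read"
        "vmid": (status >> 25) & 0xF,        # printed as vmid 3
    }


if __name__ == "__main__":
    print(decode_client_tag(0x54433400))  # TC4
    print(split_fault_status(0x0604800C))
```

With the values from this report, the tag decodes to TC4 and the split fields agree with what the kernel printed, which is the only check this sketch offers.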


How to reproduce the issue:
1. Start the yuzu emulator.
2. Load Super Mario Odyssey.
3. Start a new game.
4. When Mario is about to jump for the first time after being woken up by
Cappy, the bug always occurs.

During the issue, the following occurred:
1. Graphics locked up.
2. The system could still be accessed through SSH.

System specification:
Debian Sid
Radeon RX 580

I have tried the following combinations:
1. Kernel 4.17, 4.18, 4.19, 4.20, drm-next-4.21.wip
2. Mesa 18.2, 18.3, 19.0-development branch

None of the above combinations fixes the issue. Let me know if you need more
information or more testing from me.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
@ 2018-12-11 14:57 ` bugzilla-daemon
  2018-12-11 18:18 ` bugzilla-daemon
                   ` (97 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2018-12-11 14:57 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

Alex Deucher (alexdeucher@gmail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |alexdeucher@gmail.com

--- Comment #1 from Alex Deucher (alexdeucher@gmail.com) ---
This is more likely a mesa issue than a kernel issue.


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
  2018-12-11 14:57 ` [Bug 201957] " bugzilla-daemon
@ 2018-12-11 18:18 ` bugzilla-daemon
  2019-03-07  5:20 ` bugzilla-daemon
                   ` (96 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2018-12-11 18:18 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

--- Comment #2 from e88z4 (felix.adrianto@gmail.com) ---
I will try to test with amdgpu-pro sometime this week with the kernels that I
mentioned above. If the application works as expected, it could be a Mesa
OpenGL bug.


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
  2018-12-11 14:57 ` [Bug 201957] " bugzilla-daemon
  2018-12-11 18:18 ` bugzilla-daemon
@ 2019-03-07  5:20 ` bugzilla-daemon
  2019-03-07  5:24 ` bugzilla-daemon
                   ` (95 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2019-03-07  5:20 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

John Doe (anode.dev@gmail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |anode.dev@gmail.com

--- Comment #3 from John Doe (anode.dev@gmail.com) ---
(In reply to Alex Deucher from comment #1)
> This is more likely a mesa issue than a kernel issue.

No, the 4.14 kernel with the latest Mesa libs works very well without any hangs,
but from 4.20.4 on, in all later kernels (including 5.0), the OS freezes every
30 s to 1 min, for about 30 s each time, when browsing YouTube with HW
acceleration enabled (UVD) or playing a game. RX550, Arch, vanilla kernel.

[  365.021164] amdgpu: [powerplay] 
                last message was failed ret is 0
[  365.045198] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but
soft recovered
[  365.570667] amdgpu: [powerplay] 
                failed to send message 133 ret is 0 
[  366.115228] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout,
signaled seq=9365, emitted seq=9365
[  366.115377] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process  pid 0 thread  pid 0
[  366.115388] [drm] Timeout, but no hardware hang detected.
[  366.689407] amdgpu: [powerplay] 
                last message was failed ret is 0
[  367.232287] amdgpu: [powerplay] 
                failed to send message 306 ret is 0 
[  367.787043] amdgpu: [powerplay] 
                last message was failed ret is 0
[  368.320138] amdgpu: [powerplay] 
                failed to send message 5e ret is 0 
[  369.367739] amdgpu: [powerplay] 
                last message was failed ret is 0
[  369.907559] amdgpu: [powerplay] 
                failed to send message 145 ret is 0 
[  370.994478] amdgpu: [powerplay] 
                last message was failed ret is 0
[  371.538753] amdgpu: [powerplay] 
                failed to send message 146 ret is 0 
[  372.075079] amdgpu: [powerplay] 
                last message was failed ret is 0
[  372.598565] amdgpu: [powerplay] 
                failed to send message 148 ret is 0 
[  373.657188] amdgpu: [powerplay] 
                last message was failed ret is 0
[  374.198637] amdgpu: [powerplay] 
                failed to send message 145 ret is 0 
[  375.075076] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but
soft recovered
[  375.284948] amdgpu: [powerplay] 
                last message was failed ret is 0
[  375.830347] amdgpu: [powerplay] 
                failed to send message 146 ret is 0 
[  376.138428] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout,
signaled seq=10113, emitted seq=10113
[  376.138783] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process  pid 0 thread  pid 0
[  376.138797] [drm] IP block:sdma_v3_0 is hung!
[  376.138809] [drm] GPU recovery disabled.
[  376.394657] amdgpu: [powerplay] 
                last message was failed ret is 0
[  376.934375] amdgpu: [powerplay] 
                failed to send message 16a ret is 0 
[  377.463230] amdgpu: [powerplay] 
                last message was failed ret is 0
[  377.977725] amdgpu: [powerplay] 
                failed to send message 186 ret is 0 
[  378.518406] amdgpu: [powerplay] 
                last message was failed ret is 0
[  379.060098] amdgpu: [powerplay] 
                failed to send message 54 ret is 0 
[  379.556880] amdgpu: [powerplay] 
                last message was failed ret is 0
[  380.075217] amdgpu: [powerplay] 
                failed to send message 26b ret is 0 
[  380.605976] amdgpu: [powerplay] 
                last message was failed ret is 0
[  381.134301] amdgpu: [powerplay] 
                failed to send message 13d ret is 0 
[  381.657486] amdgpu: [powerplay] 
                last message was failed ret is 0
[  382.204551] amdgpu: [powerplay] 
                failed to send message 14f ret is 0 
[  382.741827] amdgpu: [powerplay] 
                last message was failed ret is 0
[  383.281165] amdgpu: [powerplay] 
                failed to send message 151 ret is 0 
[  383.824923] amdgpu: [powerplay] 
                last message was failed ret is 0
[  384.362266] amdgpu: [powerplay] 
                failed to send message 135 ret is 0 
[  384.903686] amdgpu: [powerplay] 
                last message was failed ret is 0
[  385.101515] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but
soft recovered
[  385.461515] amdgpu: [powerplay] 
                failed to send message 190 ret is 0 
[  386.014015] amdgpu: [powerplay] 
                last message was failed ret is 0
[  386.164818] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout,
signaled seq=10761, emitted seq=10761
[  386.164970] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process  pid 0 thread  pid 0
[  386.164985] [drm] Timeout, but no hardware hang detected.


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (2 preceding siblings ...)
  2019-03-07  5:20 ` bugzilla-daemon
@ 2019-03-07  5:24 ` bugzilla-daemon
  2019-03-12 13:15 ` bugzilla-daemon
                   ` (94 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2019-03-07  5:24 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

--- Comment #4 from Alex Deucher (alexdeucher@gmail.com) ---
Can you bisect?
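
For readers unfamiliar with the request: bisecting is a binary search over the history between a known-good and a known-bad kernel, which `git bisect` automates on real commits. A minimal Python sketch of the idea follows. The revision names and the `is_bad` test are placeholders, not results from this bug; in practice `is_bad` means building and booting that revision and checking whether the freeze shows up.

```python
from typing import Callable, Sequence


def first_bad(revisions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad revision.

    Assumes revisions[0] is good, revisions[-1] is bad, and that every
    revision after the first bad one is also bad.
    """
    lo, hi = 0, len(revisions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid  # the regression is at mid or earlier
        else:
            lo = mid  # the regression is after mid
    return revisions[hi]


if __name__ == "__main__":
    # Placeholder history: pretend the regression landed in "commit-5".
    history = [f"commit-{i}" for i in range(8)]
    bad_from = 5
    print(first_bad(history, lambda rev: history.index(rev) >= bad_from))
```

Each test halves the remaining range, so even a few thousand commits between two releases need only around a dozen build-and-boot cycles.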


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (3 preceding siblings ...)
  2019-03-07  5:24 ` bugzilla-daemon
@ 2019-03-12 13:15 ` bugzilla-daemon
  2019-04-01 18:20 ` bugzilla-daemon
                   ` (93 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2019-03-12 13:15 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

Cameron (kernel@cameron.bz) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kernel@cameron.bz

--- Comment #5 from Cameron (kernel@cameron.bz) ---
I'm having a very similar issue, running Linux Mint 19.1. The issue has
persisted since at least 4.15; I'm currently running 5.0.1 and the issue
remains.

Here is the latest syslog of the error:

[37258.615599] gmc_v9_0_process_interrupt: 10 callbacks suppressed
[37258.615608] amdgpu 0000:06:00.0: [gfxhub] VMC page fault (src_id:0 ring:24
vmid:3 pasid:32768, for process Xorg pid 1287 thread Xorg:cs0 pid 1317)
[37258.615615] amdgpu 0000:06:00.0:   in page starting at address
0x0000800107805000 from 27
[37258.615619] amdgpu 0000:06:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00301031
[37258.615629] amdgpu 0000:06:00.0: [gfxhub] VMC page fault (src_id:0 ring:24
vmid:3 pasid:32768, for process Xorg pid 1287 thread Xorg:cs0 pid 1317)
[37258.615633] amdgpu 0000:06:00.0:   in page starting at address
0x0000800107807000 from 27
[37258.615636] amdgpu 0000:06:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[37258.615645] amdgpu 0000:06:00.0: [gfxhub] VMC page fault (src_id:0 ring:24
vmid:3 pasid:32768, for process Xorg pid 1287 thread Xorg:cs0 pid 1317)
[37258.615648] amdgpu 0000:06:00.0:   in page starting at address
0x0000800107801000 from 27
[37258.615651] amdgpu 0000:06:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[37258.615660] amdgpu 0000:06:00.0: [gfxhub] VMC page fault (src_id:0 ring:24
vmid:3 pasid:32768, for process Xorg pid 1287 thread Xorg:cs0 pid 1317)
[37258.615663] amdgpu 0000:06:00.0:   in page starting at address
0x0000800107803000 from 27
[37258.615666] amdgpu 0000:06:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[37258.615675] amdgpu 0000:06:00.0: [gfxhub] VMC page fault (src_id:0 ring:24
vmid:3 pasid:32768, for process Xorg pid 1287 thread Xorg:cs0 pid 1317)
[37258.615678] amdgpu 0000:06:00.0:   in page starting at address
0x0000800107809000 from 27
[37258.615681] amdgpu 0000:06:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[37258.615689] amdgpu 0000:06:00.0: [gfxhub] VMC page fault (src_id:0 ring:24
vmid:3 pasid:32768, for process Xorg pid 1287 thread Xorg:cs0 pid 1317)
[37258.615692] amdgpu 0000:06:00.0:   in page starting at address
0x000080010780b000 from 27
[37258.615695] amdgpu 0000:06:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[37258.615704] amdgpu 0000:06:00.0: [gfxhub] VMC page fault (src_id:0 ring:24
vmid:3 pasid:32768, for process Xorg pid 1287 thread Xorg:cs0 pid 1317)
[37258.615707] amdgpu 0000:06:00.0:   in page starting at address
0x0000800107805000 from 27
[37258.615710] amdgpu 0000:06:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[37258.615740] amdgpu 0000:06:00.0: [gfxhub] VMC page fault (src_id:0 ring:24
vmid:3 pasid:32768, for process Xorg pid 1287 thread Xorg:cs0 pid 1317)
[37258.615743] amdgpu 0000:06:00.0:   in page starting at address
0x0000800107807000 from 27
[37258.615746] amdgpu 0000:06:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[37258.615756] amdgpu 0000:06:00.0: [gfxhub] VMC page fault (src_id:0 ring:24
vmid:3 pasid:32768, for process Xorg pid 1287 thread Xorg:cs0 pid 1317)
[37258.615759] amdgpu 0000:06:00.0:   in page starting at address
0x0000800107801000 from 27
[37258.615762] amdgpu 0000:06:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[37258.615771] amdgpu 0000:06:00.0: [gfxhub] VMC page fault (src_id:0 ring:24
vmid:3 pasid:32768, for process Xorg pid 1287 thread Xorg:cs0 pid 1317)
[37258.615774] amdgpu 0000:06:00.0:   in page starting at address
0x0000800107803000 from 27
[37258.615777] amdgpu 0000:06:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[37268.712339] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
signaled seq=602475, emitted seq=602478
[37268.712387] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37268.712389] [drm] GPU recovery disabled.
[37278.952537] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
signaled seq=602475, emitted seq=602478
[37278.952624] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37278.952628] [drm] GPU recovery disabled.
[37289.192390] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
signaled seq=602475, emitted seq=602478
[37289.192478] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37289.192481] [drm] GPU recovery disabled.
[37299.432447] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
signaled seq=602475, emitted seq=602478
[37299.432534] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37299.432538] [drm] GPU recovery disabled.
[37309.676431] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
signaled seq=602475, emitted seq=602478
[37309.676518] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37309.676522] [drm] GPU recovery disabled.
[37319.912444] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
signaled seq=602475, emitted seq=602478
[37319.912536] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37319.912541] [drm] GPU recovery disabled.
[37330.156619] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
signaled seq=602475, emitted seq=602478
[37330.156706] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37330.156710] [drm] GPU recovery disabled.
[37340.392424] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
signaled seq=602475, emitted seq=602478
[37340.392511] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37340.392515] [drm] GPU recovery disabled.
[37350.632424] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
signaled seq=602475, emitted seq=602478
[37350.632511] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37350.632514] [drm] GPU recovery disabled.
[37360.872417] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
signaled seq=602475, emitted seq=602478
[37360.872508] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37360.872511] [drm] GPU recovery disabled.
[37371.112436] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
signaled seq=602475, emitted seq=602478
[37371.112523] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37371.112527] [drm] GPU recovery disabled.
[37381.352427] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
signaled seq=602475, emitted seq=602478
[37381.352514] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37381.352517] [drm] GPU recovery disabled.
[37391.592410] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
signaled seq=602475, emitted seq=602478
[37391.592497] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37391.592500] [drm] GPU recovery disabled.
[37401.836426] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
signaled seq=602475, emitted seq=602478
[37401.836513] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37401.836517] [drm] GPU recovery disabled.
[37412.072433] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
signaled seq=602475, emitted seq=602478
[37412.072520] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37412.072524] [drm] GPU recovery disabled.
[37422.312442] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
signaled seq=602475, emitted seq=602478
[37422.312528] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37422.312532] [drm] GPU recovery disabled.
[37432.552428] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
signaled seq=602475, emitted seq=602478
[37432.552515] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37432.552519] [drm] GPU recovery disabled.
[37442.792418] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
signaled seq=602475, emitted seq=602478
[37442.792506] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37442.792510] [drm] GPU recovery disabled.
[37453.032397] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
signaled seq=602475, emitted seq=602478
[37453.032483] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37453.032487] [drm] GPU recovery disabled.
[37463.272534] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
signaled seq=602475, emitted seq=602478
[37463.272621] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37463.272624] [drm] GPU recovery disabled.
[37473.512589] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
signaled seq=602475, emitted seq=602478
[37473.512676] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37473.512680] [drm] GPU recovery disabled.
[37483.752954] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
signaled seq=602475, emitted seq=602478
[37483.753041] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37483.753044] [drm] GPU recovery disabled.
[37493.992566] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
signaled seq=602475, emitted seq=602478
[37493.992654] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process Xorg pid 1287 thread Xorg:cs0 pid 1317
[37493.992657] [drm] GPU recovery disabled.
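
The repeated timeout entries above can be summarized mechanically. The Python sketch below pulls the timestamp and sequence numbers out of each "ring ... timeout" entry and flags entries where the signaled sequence number did not move since the previous timeout on the same ring. Reading an unchanged signaled value as "the ring made no progress" is my interpretation, not something stated in this thread. The regular expression only targets the wrapped format shown here, and the names are illustrative.

```python
import re

# Matches one timeout entry; \s+ absorbs the line wrap added by the mailer.
TIMEOUT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] \[drm:amdgpu_job_timedout \[amdgpu\]\] "
    r"\*ERROR\* ring (?P<ring>\w+) timeout,\s+"
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def summarize_timeouts(text: str) -> list:
    """Return one dict per timeout entry found in the log text."""
    events = []
    last_signaled = {}  # ring name -> signaled seq at the previous timeout
    for m in TIMEOUT_RE.finditer(text):
        ring = m["ring"]
        signaled = int(m["signaled"])
        emitted = int(m["emitted"])
        events.append({
            "time": float(m["ts"]),
            "ring": ring,
            "pending": emitted - signaled,  # emitted but not yet signaled
            "stuck": last_signaled.get(ring) == signaled,
        })
        last_signaled[ring] = signaled
    return events


SAMPLE = """\
[37268.712339] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
signaled seq=602475, emitted seq=602478
[37268.712389] [drm] GPU recovery disabled.
[37278.952537] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
signaled seq=602475, emitted seq=602478
"""

if __name__ == "__main__":
    for event in summarize_timeouts(SAMPLE):
        print(event)
```

Run over the full log, every entry after the first would be flagged, since signaled stays at 602475 and emitted at 602478 throughout.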

During this time the laptop continues to operate (plays music and can SSH in),
however the display and any input (keyboard / mouse) do not respond. The caps
lock light for example does not toggle. The only way to recover is a force
reboot by holding the power button.

I'm unable to provide any steps to reproduce it, as the issue happens at
completely random times when performing different tasks or when leaving the
machine idle.

System specs:
Lenovo ThinkPad A485
AMD Ryzen 7 PRO 2700U with Radeon Vega Mobile Gfx
Linux Mint 19.1
Kernel 5.0.1 (installed via ukuu)


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (4 preceding siblings ...)
  2019-03-12 13:15 ` bugzilla-daemon
@ 2019-04-01 18:20 ` bugzilla-daemon
  2019-04-01 18:44 ` bugzilla-daemon
                   ` (92 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2019-04-01 18:20 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

--- Comment #6 from Dev Bazilio (anode.dev@gmail.com) ---
Tried linux-amd-staging-drm-next-git-5.1.811103.2acb851ad43b, and dmesg still
has a lot of warnings. Also tested YouTube in Chrome with UVD and got a minor
freeze and a long (~30 sec) freeze of the system.

Apr 01 21:01:03 kernel: amdgpu 0000:03:00.0: [drm:amdgpu_ring_test_helper
[amdgpu]] *ERROR* ring uvd_enc0 test failed (-110)
Apr 01 21:01:03 kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR*
resume of IP block <uvd_v6_0> failed -110
Apr 01 21:01:03 kernel: [drm:amdgpu_device_resume [amdgpu]] *ERROR*
amdgpu_device_ip_resume failed (-110).


Apr 01 20:26:59 kernel: [drm] amdgpu kernel modesetting enabled.
Apr 01 20:26:59 kernel: vga_switcheroo: detected switching method
\_SB_.PCI0.VGA_.ATPX handle
Apr 01 20:26:59 kernel: [drm] initializing kernel modesetting (CARRIZO
0x1002:0x9874 0x1025:0x1201 0xCA).
Apr 01 20:26:59 kernel: [drm] register mmio base: 0xD1500000
Apr 01 20:26:59 kernel: [drm] register mmio size: 262144
Apr 01 20:26:59 kernel: [drm] add ip block number 0 <vi_common>
Apr 01 20:26:59 kernel: [drm] add ip block number 1 <gmc_v8_0>
Apr 01 20:26:59 kernel: [drm] add ip block number 2 <cz_ih>
Apr 01 20:26:59 kernel: [drm] add ip block number 3 <gfx_v8_0>
Apr 01 20:26:59 kernel: [drm] add ip block number 4 <sdma_v3_0>
Apr 01 20:26:59 kernel: [drm] add ip block number 5 <powerplay>
Apr 01 20:26:59 kernel: [drm] add ip block number 6 <dm>
Apr 01 20:26:59 kernel: [drm] add ip block number 7 <uvd_v6_0>
Apr 01 20:26:59 kernel: [drm] add ip block number 8 <vce_v3_0>
Apr 01 20:26:59 kernel: [drm] add ip block number 9 <acp_ip>
Apr 01 20:26:59 kernel: [drm] UVD is enabled in physical mode
Apr 01 20:26:59 kernel: [drm] VCE enabled in physical mode
Apr 01 20:26:59 kernel: ATOM BIOS: 113-C91400-007
Apr 01 20:26:59 kernel: [drm] RAS INFO: ras initialized successfully, hardware
ability[0] ras_mask[0]
Apr 01 20:26:59 kernel: [drm] vm size is 64 GB, 2 levels, block size is 10-bit,
fragment size is 9-bit
Apr 01 20:26:59 kernel: amdgpu 0000:00:01.0: VRAM: 512M 0x000000F400000000 -
0x000000F41FFFFFFF (512M used)
Apr 01 20:26:59 kernel: amdgpu 0000:00:01.0: GART: 1024M 0x000000FF00000000 -
0x000000FF3FFFFFFF
Apr 01 20:26:59 kernel: [drm] Detected VRAM RAM=512M, BAR=512M
Apr 01 20:26:59 kernel: [drm] RAM width 64bits UNKNOWN
Apr 01 20:26:59 kernel: [TTM] Zone  kernel: Available graphics memory: 3804974
KiB
Apr 01 20:26:59 kernel: [TTM] Zone   dma32: Available graphics memory: 2097152
KiB
Apr 01 20:26:59 kernel: [TTM] Initializing pool allocator
Apr 01 20:26:59 kernel: [TTM] Initializing DMA pool allocator
Apr 01 20:26:59 kernel: [drm] amdgpu: 512M of VRAM memory ready
Apr 01 20:26:59 kernel: [drm] amdgpu: 3072M of GTT memory ready.
Apr 01 20:26:59 kernel: [drm] GART: num cpu pages 262144, num gpu pages 262144
Apr 01 20:26:59 kernel: [drm] PCIE GART of 1024M enabled (table at
0x000000F4007E9000).
Apr 01 20:26:59 kernel: [drm] Found UVD firmware Version: 1.91 Family ID: 11
Apr 01 20:26:59 kernel: [drm] UVD ENC is disabled
Apr 01 20:26:59 kernel: [drm] Found VCE firmware Version: 52.4 Binary ID: 3
Apr 01 20:26:59 kernel: smu version 27.17.00
Apr 01 20:26:59 kernel: [drm] DM_PPLIB: values for Engine clock
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:         300000
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:         480000
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:         533340
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:         576000
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:         626090
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:         685720
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:         720000
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:         757900
Apr 01 20:26:59 kernel: [drm] DM_PPLIB: Validation clocks:
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:    engine_max_clock: 75790
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:    memory_max_clock: 93300
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:    level           : 8
Apr 01 20:26:59 kernel: [drm] DM_PPLIB: values for Display clock
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:         300000
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:         400000
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:         496560
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:         626090
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:         685720
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:         757900
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:         800000
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:         847060
Apr 01 20:26:59 kernel: [drm] DM_PPLIB: Validation clocks:
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:    engine_max_clock: 75790
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:    memory_max_clock: 93300
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:    level           : 8
Apr 01 20:26:59 kernel: [drm] DM_PPLIB: values for Memory clock
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:         667000
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:         933000
Apr 01 20:26:59 kernel: [drm] DM_PPLIB: Validation clocks:
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:    engine_max_clock: 75790
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:    memory_max_clock: 93300
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:    level           : 8
Apr 01 20:26:59 kernel: [drm:construct [amdgpu]] *ERROR* construct: Invalid
Connector ObjectID from Adapter Service for connector index:2! type 0 expected
3
Apr 01 20:26:59 kernel: [drm] Display Core initialized with v3.2.24!
Apr 01 20:26:59 kernel: [drm] SADs count is: -2, don't need to read it
Apr 01 20:26:59 kernel: [drm] Supports vblank timestamp caching Rev 2
(21.10.2013).
Apr 01 20:26:59 kernel: [drm] Driver supports precise vblank timestamp query.

Apr 01 20:26:59 kernel: [drm] UVD initialized successfully.
Apr 01 20:26:59 kernel: [drm] VCE initialized successfully.
Apr 01 20:26:59 kernel: kfd kfd: Allocated 3969056 bytes on gart
Apr 01 20:26:59 kernel: Topology: Add APU node [0x9874:0x1002]
Apr 01 20:26:59 kernel: kfd kfd: added device 1002:9874
Apr 01 20:26:59 kernel: [drm] fb mappable at 0x21FDCD000
Apr 01 20:26:59 kernel: [drm] vram apper at 0x21F000000
Apr 01 20:26:59 kernel: [drm] size 8294400
Apr 01 20:26:59 kernel: [drm] fb depth is 24
Apr 01 20:26:59 kernel: [drm]    pitch is 7680
Apr 01 20:26:59 kernel: fbcon: amdgpudrmfb (fb0) is primary device
Apr 01 20:26:59 kernel: Console: switching to colour frame buffer device 240x67
Apr 01 20:26:59 kernel: amdgpu 0000:00:01.0: fb0: amdgpudrmfb frame buffer
device
Apr 01 20:26:59 kernel: [drm] Initialized amdgpu 3.31.0 20150101 for
0000:00:01.0 on minor 0
Apr 01 20:26:59 kernel: amdgpu 0000:03:00.0: enabling device (0002 -> 0003)
Apr 01 20:26:59 kernel: [drm] initializing kernel modesetting (POLARIS12
0x1002:0x699F 0x1025:0x1210 0xC3).
Apr 01 20:26:59 kernel: [drm] register mmio base: 0xD1200000
Apr 01 20:26:59 kernel: [drm] register mmio size: 262144
Apr 01 20:26:59 kernel: [drm] add ip block number 0 <vi_common>
Apr 01 20:26:59 kernel: [drm] add ip block number 1 <gmc_v8_0>
Apr 01 20:26:59 kernel: [drm] add ip block number 2 <tonga_ih>
Apr 01 20:26:59 kernel: [drm] add ip block number 3 <gfx_v8_0>
Apr 01 20:26:59 kernel: [drm] add ip block number 4 <sdma_v3_0>
Apr 01 20:26:59 kernel: [drm] add ip block number 5 <powerplay>
Apr 01 20:26:59 kernel: [drm] add ip block number 6 <dm>
Apr 01 20:26:59 kernel: [drm] add ip block number 7 <uvd_v6_0>
Apr 01 20:26:59 kernel: [drm] add ip block number 8 <vce_v3_0>
Apr 01 20:26:59 kernel: kfd kfd: skipped device 1002:699f, PCI rejects atomics
Apr 01 20:26:59 kernel: [drm] UVD is enabled in VM mode
Apr 01 20:26:59 kernel: [drm] UVD ENC is enabled in VM mode
Apr 01 20:26:59 kernel: [drm] VCE enabled in VM mode
Apr 01 20:26:59 kernel: vga_switcheroo: enabled
Apr 01 20:26:59 kernel: ATOM BIOS: SWBRT23054.001
Apr 01 20:26:59 kernel: [drm] GPU posting now...
Apr 01 20:26:59 kernel: [drm] RAS INFO: ras initialized successfully, hardware
ability[0] ras_mask[0]
Apr 01 20:26:59 kernel: [drm] vm size is 64 GB, 2 levels, block size is 10-bit,
fragment size is 9-bit
Apr 01 20:26:59 kernel: amdgpu 0000:03:00.0: VRAM: 2048M 0x000000F400000000 -
0x000000F47FFFFFFF (2048M used)
Apr 01 20:26:59 kernel: amdgpu 0000:03:00.0: GART: 256M 0x000000FF00000000 -
0x000000FF0FFFFFFF
Apr 01 20:26:59 kernel: [drm] Detected VRAM RAM=2048M, BAR=256M
Apr 01 20:26:59 kernel: [drm] RAM width 128bits GDDR5
Apr 01 20:26:59 kernel: [drm] amdgpu: 2048M of VRAM memory ready
Apr 01 20:26:59 kernel: [drm] amdgpu: 3072M of GTT memory ready.
Apr 01 20:26:59 kernel: [drm] GART: num cpu pages 65536, num gpu pages 65536
Apr 01 20:26:59 kernel: [drm] PCIE GART of 256M enabled (table at
0x000000F400000000).
Apr 01 20:26:59 kernel: [drm] Chained IB support enabled!
Apr 01 20:26:59 kernel: [drm] Found UVD firmware Version: 1.130 Family ID: 16
Apr 01 20:26:59 kernel: [drm] Found VCE firmware Version: 53.26 Binary ID: 3
Apr 01 20:26:59 kernel: amdgpu: [powerplay] Voltage value looks like a Leakage
ID but it's not patched 
Apr 01 20:26:59 kernel: amdgpu: [powerplay] Voltage value looks like a Leakage
ID but it's not patched 
Apr 01 20:26:59 kernel: amdgpu: [powerplay] Voltage value looks like a Leakage
ID but it's not patched 
Apr 01 20:26:59 kernel: amdgpu: [powerplay] Voltage value looks like a Leakage
ID but it's not patched 
Apr 01 20:26:59 kernel: amdgpu: [powerplay] Voltage value looks like a Leakage
ID but it's not patched 
Apr 01 20:26:59 kernel: amdgpu: [powerplay] Voltage value looks like a Leakage
ID but it's not patched 
Apr 01 20:26:59 kernel: amdgpu: [powerplay] Voltage value looks like a Leakage
ID but it's not patched 
Apr 01 20:26:59 kernel: [drm] DM_PPLIB: values for Engine clock
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:         214000
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:         551000
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:         734000
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:         921000
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:         980000
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:         1046000
Apr 01 20:26:59 kernel: [drm] DM_PPLIB: Validation clocks:
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:    engine_max_clock: 104600
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:    memory_max_clock: 125000
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:    level           : 8
Apr 01 20:26:59 kernel: [drm] DM_PPLIB: values for Memory clock
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:         300000
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:         625000
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:         1250000
Apr 01 20:26:59 kernel: [drm] DM_PPLIB: Validation clocks:
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:    engine_max_clock: 104600
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:    memory_max_clock: 125000
Apr 01 20:26:59 kernel: [drm] DM_PPLIB:    level           : 8
Apr 01 20:26:59 kernel: [drm] Display Core initialized with v3.2.24!
Apr 01 20:26:59 kernel: [drm] Supports vblank timestamp caching Rev 2
(21.10.2013).
Apr 01 20:26:59 kernel: [drm] Driver supports precise vblank timestamp query.
Apr 01 20:26:59 kernel: [drm] UVD and UVD ENC initialized successfully.
Apr 01 20:26:59 kernel: [drm] VCE initialized successfully.
Apr 01 20:26:59 kernel: [drm] Initialized amdgpu 3.31.0 20150101 for
0000:03:00.0 on minor 1
Apr 01 20:26:59 kernel: amdgpu: [powerplay] 
                                failed to send message 15b ret is 0 
Apr 01 20:26:59 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 0
Apr 01 20:26:59 kernel: amdgpu: [powerplay] 
                                failed to send message 15a ret is 0 
Apr 01 20:26:59 kernel: [drm:amdgpu_device_ip_late_init_func_handler [amdgpu]]
*ERROR* ib ring test failed (-110).
Apr 01 20:26:59 kernel: EXT4-fs (sda3): mounted filesystem with ordered data
mode. Opts: (null)
Apr 01 20:26:59 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 0
Apr 01 20:26:59 kernel: amdgpu: [powerplay] 
                                failed to send message 155 ret is 0 
Apr 01 20:26:59 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 0
Apr 01 20:26:59 kernel: amdgpu: [powerplay] 
                                failed to send message 15b ret is 0
Apr 01 20:27:48 kernel: [drm] PCIE GART of 256M enabled (table at
0x000000F400000000).
Apr 01 20:27:48 kernel: amdgpu: [powerplay] 
                                failed to send message 154 ret is 0 
Apr 01 20:27:49 kernel: amdgpu 0000:03:00.0: [drm:amdgpu_ring_test_helper
[amdgpu]] *ERROR* ring uvd_enc0 test failed (-110)
Apr 01 20:27:49 kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR*
resume of IP block <uvd_v6_0> failed -110
Apr 01 20:27:49 kernel: [drm:amdgpu_device_resume [amdgpu]] *ERROR*
amdgpu_device_ip_resume failed (-110).
Apr 01 20:27:50 kernel: amdgpu: [powerplay]
Apr 01 20:28:30 kernel: amdgpu: [powerplay] 
                                failed to send message 261 ret is 0 
Apr 01 20:28:31 kernel: [drm] Fence fallback timer expired on ring sdma0
Apr 01 20:28:31 kernel: [drm] Fence fallback timer expired on ring sdma0
Apr 01 20:28:32 kernel: [drm] Fence fallback timer expired on ring sdma0
Apr 01 20:28:32 kernel: [drm] Fence fallback timer expired on ring sdma0
Apr 01 20:28:33 kernel: [drm] Fence fallback timer expired on ring sdma0
Apr 01 20:28:33 kernel: [drm] Fence fallback timer expired on ring sdma0
Apr 01 20:28:34 kernel: [drm] Fence fallback timer expired on ring sdma0
Apr 01 20:28:34 kernel: [drm] Fence fallback timer expired on ring sdma0
Apr 01 20:28:35 kernel: [drm] Fence fallback timer expired on ring sdma0
Apr 01 20:28:35 kernel: [drm] Fence fallback timer expired on ring sdma0
Apr 01 20:28:36 kernel: [drm] Fence fallback timer expired on ring sdma0
Apr 01 20:28:36 kernel: [drm] Fence fallback timer expired on ring sdma0
Apr 01 20:28:37 kernel: [drm] Fence fallback timer expired on ring sdma0
Apr 01 20:28:37 kernel: [drm] Fence fallback timer expired on ring sdma0
Apr 01 20:28:38 kernel: [drm] Fence fallback timer expired on ring sdma0
Apr 01 20:28:39 kernel: amdgpu: [powerplay] 

Apr 01 20:29:12 kernel: amdgpu: [powerplay] 
                                failed to send message 154 ret is 0 
Apr 01 20:29:13 kernel: amdgpu 0000:03:00.0: [drm:amdgpu_ring_test_helper
[amdgpu]] *ERROR* ring uvd_enc0 test failed (-110)
Apr 01 20:29:13 kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR*
resume of IP block <uvd_v6_0> failed -110
Apr 01 20:29:13 kernel: [drm:amdgpu_device_resume [amdgpu]] *ERROR*
amdgpu_device_ip_resume failed (-110).

Apr 01 20:30:06 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 0
Apr 01 20:30:06 kernel: amdgpu: [powerplay] 
                                failed to send message 135 ret is 0 
Apr 01 20:30:07 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 0
Apr 01 20:30:07 kernel: amdgpu: [powerplay] 
                                failed to send message 190 ret is 0 
Apr 01 20:30:08 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 0
Apr 01 20:30:08 kernel: amdgpu: [powerplay] 
                                failed to send message 63 ret is 0 
Apr 01 20:30:09 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 0
Apr 01 20:30:09 kernel: amdgpu: [powerplay] 
                                failed to send message 84 ret is 0 
Apr 01 20:30:09 kernel: amdgpu 0000:03:00.0: GPU pci config reset
Apr 01 20:34:17 kernel: [drm] PCIE GART of 256M enabled (table at
0x000000F400000000).
Apr 01 20:34:18 kernel: amdgpu: [powerplay] 
                                failed to send message 154 ret is 0 
Apr 01 20:34:18 kernel: amdgpu 0000:03:00.0: [drm:amdgpu_ring_test_helper
[amdgpu]] *ERROR* ring uvd_enc0 test failed (-110)
Apr 01 20:34:18 kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR*
resume of IP block <uvd_v6_0> failed -110
Apr 01 20:34:18 kernel: [drm:amdgpu_device_resume [amdgpu]] *ERROR*
amdgpu_device_ip_resume failed (-110).
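
A note for readers of the log above: the -110 reported by the ring tests and by amdgpu_device_ip_resume is a negated Linux errno, and 110 is ETIMEDOUT on Linux, i.e. the block did not answer in time. The Python sketch below pulls such codes out of log text and names them. The function name decode_failures and the small ERRNO_NAMES table are illustrative assumptions, not part of any kernel tooling, and the table lists only a few common Linux values.

```python
import re

# A few Linux errno values (asm-generic numbering); hypothetical helper table,
# extend as needed.
ERRNO_NAMES = {12: "ENOMEM", 16: "EBUSY", 22: "EINVAL", 110: "ETIMEDOUT"}

# Matches both "failed (-110)." and "failed -110" as seen in the log above.
FAILED_RE = re.compile(r"failed \(?-(\d+)\)?")


def decode_failures(log_text):
    """Return the errno name for every 'failed -N' occurrence, in order."""
    names = []
    for match in FAILED_RE.finditer(log_text):
        code = int(match.group(1))
        names.append(ERRNO_NAMES.get(code, "errno %d" % code))
    return names


sample = """\
[drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring uvd_enc0 test failed (-110)
resume of IP block <uvd_v6_0> failed -110
amdgpu_device_ip_resume failed (-110).
"""
print(decode_failures(sample))
```

The powerplay lines ("failed to send message ... ret is 0") are deliberately not matched, since they carry no errno value.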

-- 
You are receiving this mail because:
You are watching the assignee of the bug.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (5 preceding siblings ...)
  2019-04-01 18:20 ` bugzilla-daemon
@ 2019-04-01 18:44 ` bugzilla-daemon
  2019-08-20 15:06 ` bugzilla-daemon
                   ` (91 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2019-04-01 18:44 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

--- Comment #7 from Dev Bazilio (anode.dev@gmail.com) ---
(In reply to Alex Deucher from comment #4)
> Can you bisect?

Unfortunately this is not possible, because all recent kernels are shipped with
Display Core enabled by default. As I said, the vanilla 4.14 kernel works like a
charm on the same hardware with the same Mesa libraries: no lags, no hangs or
freezes, and none of the warnings listed above. So it makes no sense to run
"git bisect", as the problem is not a single commit that works incorrectly with
the GPU. DC is completely new functionality that replaces the old amdgpu code.


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (6 preceding siblings ...)
  2019-04-01 18:44 ` bugzilla-daemon
@ 2019-08-20 15:06 ` bugzilla-daemon
  2019-09-11  8:36 ` bugzilla-daemon
                   ` (90 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2019-08-20 15:06 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

jens harms (au1064@gmail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |au1064@gmail.com

--- Comment #8 from jens harms (au1064@gmail.com) ---
Hi, I have a very similar problem. My system works with 4.15 and with
5.1.16, but not with other 5.x kernels:

The system does not boot with the other 5.x kernels. With 5.1.16 the GUI
sometimes freezes, but sshd and the mouse still work.


CPU: Ryzen 5 2400g, BOARD: AORUS B450 I PRO WIFI, X Server 1.19.6

Kernel 5.0.x not working (blank screen after boot)
Kernel 5.2.x ( x <= 9 ) is not working (blank screen after boot)

but Kernel 5.1.16 is working (mostly)!


Error LOG with 5.1.16:
[Mi Aug 14 14:22:21 2019] amdgpu 0000:09:00.0:
VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[Mi Aug 14 14:22:21 2019] amdgpu 0000:09:00.0: [gfxhub] no-retry page fault
(src_id:0 ring:24 vmid:3 pasid:32768, for process Xorg pid 1848 thread Xorg:cs0
pid 1849)
[Mi Aug 14 14:22:21 2019] amdgpu 0000:09:00.0:   in page starting at address
0x000080010c205000 from 27
[Mi Aug 14 14:22:21 2019] amdgpu 0000:09:00.0:
VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[Mi Aug 14 14:22:31 2019] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx
timeout, signaled seq=840738, emitted seq=840740
[Mi Aug 14 14:22:31 2019] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process
information: process Xorg pid 1848 thread Xorg:cs0 pid 1849
[Mi Aug 14 14:22:31 2019] [drm] GPU recovery disabled.
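
In the timeout line above, the two sequence numbers appear to be the last fence the ring signaled and the last one emitted to it, so their difference would be the number of submissions still outstanding when the job timed out (2 in this log). The Python sketch below extracts that from pasted log text. The name pending_fences is a hypothetical helper, and the code ignores counter wraparound.

```python
import re

# \s+ lets the pattern span the line breaks introduced by mail wrapping.
TIMEOUT_RE = re.compile(
    r"ring\s+(\S+)\s+timeout,\s+signaled\s+seq=(\d+),\s+emitted\s+seq=(\d+)"
)


def pending_fences(log_text):
    """Return (ring, signaled, emitted, emitted - signaled) per timeout."""
    results = []
    for ring, signaled, emitted in TIMEOUT_RE.findall(log_text):
        s, e = int(signaled), int(emitted)
        results.append((ring, s, e, e - s))
    return results


sample = """\
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx
timeout, signaled seq=840738, emitted seq=840740
"""
for ring, signaled, emitted, pending in pending_fences(sample):
    print(ring, signaled, emitted, pending)
```

The same pattern also picks up timeouts on other rings, such as the sdma0 timeouts reported elsewhere in this thread.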


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (7 preceding siblings ...)
  2019-08-20 15:06 ` bugzilla-daemon
@ 2019-09-11  8:36 ` bugzilla-daemon
  2019-09-20 11:37 ` bugzilla-daemon
                   ` (89 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2019-09-11  8:36 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

Ungureanu Alexandru (ungu_93@yahoo.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ungu_93@yahoo.com

--- Comment #9 from Ungureanu Alexandru (ungu_93@yahoo.com) ---
Just got something similar while playing Left 4 Dead. The system simply froze
with altered colors on the screen and the sound just looping over the last
second or so. Cannot confirm SSH access.


journalctl -b -1 ends with


[drm:gfx_v8_0_priv_reg_irq [amdgpu]] *ERROR* Illegal register access in command
stream
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled
seq=2225992, emitted seq=2225993
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process
hl2_linux pid 12532 thread hl2_


OS: Ubuntu 19.04 on 
Kernel: 5.0.0-27-generic
GPU: Radeon RX580
CPU: Ryzen 5 1600x

Thanks!


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (8 preceding siblings ...)
  2019-09-11  8:36 ` bugzilla-daemon
@ 2019-09-20 11:37 ` bugzilla-daemon
  2019-10-02 10:39 ` bugzilla-daemon
                   ` (88 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2019-09-20 11:37 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

--- Comment #10 from Dev Bazilio (anode.dev@gmail.com) ---
(In reply to Ungureanu Alexandru from comment #9)
> Just got something similar while playing Left 4 Dead. The system simply
> froze with altered colors on the screen and the sound just looping over the
> last second or so. Cannot confirm SSH access.

> Kernel: 5.0.0-27-generic
> GPU: Radeon RX580
> CPU: Ryzen 5 1600x

5.0 is a very outdated kernel; use the latest from kernel.org.

For me everything works perfectly in 5.3 (Polaris chip, RX540).
I finally no longer get errors like these:
- ERROR* resume of IP block <uvd_v6_0> failed -110
- [drm] Fence fallback timer expired on ring sdma0
- last message was failed ret is **
- [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled
seq...
- IP block:sdma_v3_0 is hung!
- Timeout, but no hardware hang detected.

Tested on YouTube with hardware-accelerated video and in several games.
Many thanks to the AMD guys; I had to wait over a year to get these bugs fixed.


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (9 preceding siblings ...)
  2019-09-20 11:37 ` bugzilla-daemon
@ 2019-10-02 10:39 ` bugzilla-daemon
  2019-10-11 22:00 ` bugzilla-daemon
                   ` (87 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2019-10-02 10:39 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

lekto (lekto@o2.pl) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |lekto@o2.pl

--- Comment #11 from lekto (lekto@o2.pl) ---
Same problem here. It happens when I run looking-glass [1], but not every time.
I tried downgrading my kernel from 5.3.1 to 5.2.11 (I'm pretty sure it worked
then), downgrading mesa from 19.2.0 to 19.1.7 (I'm sure it worked with
19.2.0-rc) and downgrading my firmware to 2019-09-23 (oldest in repo).

When it happens, looking-glass starts blinking and sometimes my other monitor
gets stuck so that I can only move the cursor on it.

Spec:
Gentoo ~amd64
Ryzen 1600 (others have Ryzen too, coincidence?)
Linux GPU: R7 240 (with radeon driver)
Windows GPU: RX580
ASRock X370 Gaming X


[1] https://looking-glass.hostfission.com/


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (10 preceding siblings ...)
  2019-10-02 10:39 ` bugzilla-daemon
@ 2019-10-11 22:00 ` bugzilla-daemon
  2019-10-14 17:18 ` bugzilla-daemon
                   ` (86 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2019-10-11 22:00 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

Matthias Heinz (mh@familie-heinz.name) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mh@familie-heinz.name

--- Comment #12 from Matthias Heinz (mh@familie-heinz.name) ---
Hi,

I think I have the same bug and opened
https://bugzilla.kernel.org/show_bug.cgi?id=204683.

At first it looked a bit different, because in newer kernels the error message
has changed. But as you can see I did some testing and this seems to go way
back. Sadly I couldn't test a 4.18 kernel.

Can somebody mark my report as a duplicate? I think it is one.

Also, would some more debug info help?


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (11 preceding siblings ...)
  2019-10-11 22:00 ` bugzilla-daemon
@ 2019-10-14 17:18 ` bugzilla-daemon
  2019-10-24 16:39 ` bugzilla-daemon
                   ` (85 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2019-10-14 17:18 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

--- Comment #13 from Matthias Heinz (mh@familie-heinz.name) ---
*** Bug 204683 has been marked as a duplicate of this bug. ***


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (12 preceding siblings ...)
  2019-10-14 17:18 ` bugzilla-daemon
@ 2019-10-24 16:39 ` bugzilla-daemon
  2019-10-24 16:40 ` bugzilla-daemon
                   ` (84 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2019-10-24 16:39 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

Konstantin Pereiaslov (perk11@perk11.info) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |perk11@perk11.info

--- Comment #14 from Konstantin Pereiaslov (perk11@perk11.info) ---
I am also experiencing this with a Radeon RX 5700 XT and amdgpu
19.1.0+git1910111930.b467d2~oibaf~b.

The GPU was not under any heavy load.

First, some artifacts appeared on the Plasma Hard Disk Monitor widget and the
CPU Load widget (here is a screenshot:
https://i.perk11.info/20191024_193152_kernel.png) while the PC was idle and the
screen was locked, but everything else continued to work fine.

I checked the logs for the period when this could have happened, but the only
entries from that period are from KScreen, and they start like this:

Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xcb.helper:
RRNotify_OutputProperty (ignored)
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xcb.helper:        
Output:  88
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xcb.helper:        
Property:  EDID
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xcb.helper:        
State (newValue, Deleted):  1
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xcb.helper:
RRNotify_OutputProperty (ignored)
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xcb.helper:        
Output:  88
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xcb.helper:        
Property:  EDID
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xcb.helper:        
State (newValue, Deleted):  1
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xcb.helper:
RRNotify_OutputChange
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xcb.helper:        
Output:  88
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xcb.helper:        
CRTC:  81
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xcb.helper:        
Mode:  97
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xcb.helper:        
Rotation:  "Rotate_0"
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xcb.helper:        
Connection:  "Disconnected"
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xcb.helper:        
Subpixel Order:  0
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xcb.helper:
RRScreenChangeNotify
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xcb.helper:        
Window: 18874373
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xcb.helper:        
Root: 1744
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xcb.helper:        
Rotation:  "Rotate_0"
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xcb.helper:        
Size ID: 65535
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xcb.helper:        
Size:  7280 1440
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xcb.helper:        
SizeMM:  1926 381
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xcb.helper:
RRNotify_OutputChange
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xcb.helper:        
Output:  88
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xcb.helper:        
CRTC:  81
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xcb.helper:        
Mode:  97
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xcb.helper:        
Rotation:  "Rotate_0"
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xcb.helper:        
Connection:  "Disconnected"
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xcb.helper:        
Subpixel Order:  0
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xrandr:
XRandROutput 88 update
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]:          m_connected: 0
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]:          m_crtc
XRandRCrtc(0x5655577da9f0)
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]:          CRTC: 81
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]:          MODE: 97
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]:          Connection: 1
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]:          Primary: false
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xrandr: Output 88 :
connected = false , enabled = true
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]: kscreen.xrandr:
XRandROutput 88 update
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]:          m_connected: 1
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]:          m_crtc
XRandRCrtc(0x5655577da9f0)
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]:          CRTC: 81
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]:          MODE: 97
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]:          Connection: 1
Oct 24 16:34:58 perk11-home org.kde.KScreen[25804]:          Primary: false



90 minutes later, the system became unresponsive while I was typing a message
in Skype. The audio I had playing in Audacity continued to play and the cron
jobs continued running normally for a few minutes while I tried to get the
system unstuck without rebooting it, which I could not do.

Here are the errors:

Oct 24 19:04:10 perk11-home kernel: [drm:amdgpu_dm_commit_planes.constprop.0
[amdgpu]] *ERROR* Waiting for fences timed out or interrupted!
Oct 24 19:04:10 perk11-home kernel: [drm:amdgpu_dm_commit_planes.constprop.0
[amdgpu]] *ERROR* Waiting for fences timed out or interrupted!
Oct 24 19:04:15 perk11-home kernel: [drm:amdgpu_dm_commit_planes.constprop.0
[amdgpu]] *ERROR* Waiting for fences timed out or interrupted!
Oct 24 19:04:15 perk11-home kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR*
ring sdma0 timeout, signaled seq=3485981, emitted seq=3485983
Oct 24 19:04:15 perk11-home kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR*
Process information: process Xorg pid 2469 thread Xorg:cs0 pid 2491
Oct 24 19:04:15 perk11-home kernel: [drm] GPU recovery disabled.


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (13 preceding siblings ...)
  2019-10-24 16:39 ` bugzilla-daemon
@ 2019-10-24 16:40 ` bugzilla-daemon
  2019-10-27 18:44 ` bugzilla-daemon
                   ` (83 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2019-10-24 16:40 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

--- Comment #15 from Konstantin Pereiaslov (perk11@perk11.info) ---
My kernel version is 5.3.7-050307-generic, running on KDE Neon User Edition
with the latest updates.


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (14 preceding siblings ...)
  2019-10-24 16:40 ` bugzilla-daemon
@ 2019-10-27 18:44 ` bugzilla-daemon
  2019-11-10  7:11 ` bugzilla-daemon
                   ` (82 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2019-10-27 18:44 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

--- Comment #16 from shallowaloe@gmail.com ---
Created attachment 285665
  --> https://bugzilla.kernel.org/attachment.cgi?id=285665&action=edit
5 second video clip that triggers a crash

Hi,

I think I'm having the same problem as you guys.  I run a mythbackend where I
record cable television and those recordings often crash my system when
hardware decoding is enabled.  Usually it's just the screen that freezes and I
can still ssh to it.  

Kernel 5.1.6 was an exception for me too, with that kernel I'm able to restart
the display manager and recover without having to reboot.

Attached is a short video that crashes my system.  I can trigger the alert by
running:

mpv --vo=vaapi out.ts

I'm wondering if it crashes your systems too and if it's related.


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (15 preceding siblings ...)
  2019-10-27 18:44 ` bugzilla-daemon
@ 2019-11-10  7:11 ` bugzilla-daemon
  2019-11-25  9:43 ` bugzilla-daemon
                   ` (81 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2019-11-10  7:11 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

jmstylr@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jmstylr@gmail.com

--- Comment #17 from jmstylr@gmail.com ---
(In reply to shallowaloe from comment #16)
> Created attachment 285665 [details]
> 5 second video clip that triggers a crash
> 
> Hi,
> 
> I think I'm having the same problem as you guys.  I run a mythbackend where
> I record cable television and those recordings often crash my system when
> hardware decoding is enabled.  Usually it's just the screen that freezes and
> I can still ssh to it.  
> 
> Kernel 5.1.6 was an exception for me too, with that kernel I'm able to
> restart the display manager and recover without having to reboot.
> 
> Attached is a short video that crashes my system.  I can trigger the alert
> by running:
> 
> mpv --vo=vaapi out.ts
> 
> I'm wondering if it crashes your systems too and if it's related.


Just to add a data point, I tried running `mpv --vo=vaapi out.ts` against your
file, and while it crashed the application, it did not freeze the system. 

My hardware is a Ryzen 3700X with a Radeon RX 5700, running Ubuntu 19.10 with
default kernel (5.3.0-19-generic).

The command did result in the following lines in /var/log/syslog repeated every
5 seconds:

Nov 10 07:04:23 redacted kernel: [ 2266.802162] gmc_v10_0_process_interrupt:
23900 callbacks suppressed
Nov 10 07:04:23 redacted kernel: [ 2266.802166] amdgpu 0000:0b:00.0: [mmhub]
VMC page fault (src_id:0 ring:158 vmid:0 pasid:0)
Nov 10 07:04:23 redacted kernel: [ 2266.802170] amdgpu 0000:0b:00.0:   at page
0x0000000000000000 from 18
Nov 10 07:04:23 redacted kernel: [ 2266.802171] amdgpu 0000:0b:00.0:
VM_L2_PROTECTION_FAULT_STATUS:0x0000213D
Nov 10 07:04:23 redacted kernel: [ 2266.802176] amdgpu 0000:0b:00.0: [mmhub]
VMC page fault (src_id:0 ring:158 vmid:0 pasid:0)
Nov 10 07:04:23 redacted kernel: [ 2266.802178] amdgpu 0000:0b:00.0:   at page
0x0000000000000000 from 18
Nov 10 07:04:23 redacted kernel: [ 2266.802179] amdgpu 0000:0b:00.0:
VM_L2_PROTECTION_FAULT_STATUS:0x00000000
Nov 10 07:04:23 redacted kernel: [ 2266.802566] amdgpu 0000:0b:00.0: [mmhub]
VMC page fault (src_id:0 ring:158 vmid:0 pasid:0)
Nov 10 07:04:23 redacted kernel: [ 2266.802568] amdgpu 0000:0b:00.0:   at page
0x0000000000000000 from 18
Nov 10 07:04:23 redacted kernel: [ 2266.802569] amdgpu 0000:0b:00.0:
VM_L2_PROTECTION_FAULT_STATUS:0x0000213D
Nov 10 07:04:23 redacted kernel: [ 2266.802573] amdgpu 0000:0b:00.0: [mmhub]
VMC page fault (src_id:0 ring:158 vmid:0 pasid:0)
Nov 10 07:04:23 redacted kernel: [ 2266.802575] amdgpu 0000:0b:00.0:   at page
0x0000000000000000 from 18
Nov 10 07:04:23 redacted kernel: [ 2266.802576] amdgpu 0000:0b:00.0:
VM_L2_PROTECTION_FAULT_STATUS:0x00000000
Nov 10 07:04:23 redacted kernel: [ 2266.802984] amdgpu 0000:0b:00.0: [mmhub]
VMC page fault (src_id:0 ring:158 vmid:0 pasid:0)
Nov 10 07:04:23 redacted kernel: [ 2266.802985] amdgpu 0000:0b:00.0:   at page
0x0000000000000000 from 18
Nov 10 07:04:23 redacted kernel: [ 2266.802987] amdgpu 0000:0b:00.0:
VM_L2_PROTECTION_FAULT_STATUS:0x0000213D
Nov 10 07:04:23 redacted kernel: [ 2266.802993] amdgpu 0000:0b:00.0: [mmhub]
VMC page fault (src_id:0 ring:158 vmid:0 pasid:0)
Nov 10 07:04:23 redacted kernel: [ 2266.802994] amdgpu 0000:0b:00.0:   at page
0x0000000000000000 from 18
Nov 10 07:04:23 redacted kernel: [ 2266.802995] amdgpu 0000:0b:00.0:
VM_L2_PROTECTION_FAULT_STATUS:0x00000000
Nov 10 07:04:23 redacted kernel: [ 2266.803403] amdgpu 0000:0b:00.0: [mmhub]
VMC page fault (src_id:0 ring:158 vmid:0 pasid:0)
Nov 10 07:04:23 redacted kernel: [ 2266.803404] amdgpu 0000:0b:00.0:   at page
0x0000000000000000 from 18
Nov 10 07:04:23 redacted kernel: [ 2266.803406] amdgpu 0000:0b:00.0:
VM_L2_PROTECTION_FAULT_STATUS:0x0000213D
Nov 10 07:04:23 redacted kernel: [ 2266.803410] amdgpu 0000:0b:00.0: [mmhub]
VMC page fault (src_id:0 ring:158 vmid:0 pasid:0)
Nov 10 07:04:23 redacted kernel: [ 2266.803411] amdgpu 0000:0b:00.0:   at page
0x0000000000000000 from 18
Nov 10 07:04:23 redacted kernel: [ 2266.803412] amdgpu 0000:0b:00.0:
VM_L2_PROTECTION_FAULT_STATUS:0x00000000
Nov 10 07:04:23 redacted kernel: [ 2266.803822] amdgpu 0000:0b:00.0: [mmhub]
VMC page fault (src_id:0 ring:158 vmid:0 pasid:0)
Nov 10 07:04:23 redacted kernel: [ 2266.803824] amdgpu 0000:0b:00.0:   at page
0x0000000000000000 from 18
Nov 10 07:04:23 redacted kernel: [ 2266.803825] amdgpu 0000:0b:00.0:
VM_L2_PROTECTION_FAULT_STATUS:0x0000213D
Nov 10 07:04:23 redacted kernel: [ 2266.803831] amdgpu 0000:0b:00.0: [mmhub]
VMC page fault (src_id:0 ring:158 vmid:0 pasid:0)
Nov 10 07:04:23 redacted kernel: [ 2266.803833] amdgpu 0000:0b:00.0:   at page
0x0000000000000000 from 18
Nov 10 07:04:23 redacted kernel: [ 2266.803834] amdgpu 0000:0b:00.0:
VM_L2_PROTECTION_FAULT_STATUS:0x00000000
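
For anyone comparing their own syslog against the excerpt above, here is a minimal Python sketch that counts the repeated page-fault records. The function name `summarize_faults` and the regular expressions are illustrative assumptions based only on the line format shown in this mail; they are not part of any amdgpu tooling, and the status values are counted as-is without decoding any bits.

```python
import re
from collections import Counter

# Two fault records copied from the syslog excerpt above, hard-wrapped as in the mail.
SAMPLE = """\
Nov 10 07:04:23 redacted kernel: [ 2266.802166] amdgpu 0000:0b:00.0: [mmhub]
VMC page fault (src_id:0 ring:158 vmid:0 pasid:0)
Nov 10 07:04:23 redacted kernel: [ 2266.802170] amdgpu 0000:0b:00.0:   at page
0x0000000000000000 from 18
Nov 10 07:04:23 redacted kernel: [ 2266.802171] amdgpu 0000:0b:00.0:
VM_L2_PROTECTION_FAULT_STATUS:0x0000213D
Nov 10 07:04:23 redacted kernel: [ 2266.802176] amdgpu 0000:0b:00.0: [mmhub]
VMC page fault (src_id:0 ring:158 vmid:0 pasid:0)
Nov 10 07:04:23 redacted kernel: [ 2266.802178] amdgpu 0000:0b:00.0:   at page
0x0000000000000000 from 18
Nov 10 07:04:23 redacted kernel: [ 2266.802179] amdgpu 0000:0b:00.0:
VM_L2_PROTECTION_FAULT_STATUS:0x00000000
"""

# Hypothetical patterns, written against the message format shown in this thread.
FAULT_RE = re.compile(
    r"\[(?P<hub>\w+)\]\s+VMC page fault \(src_id:(?P<src>\d+) "
    r"ring:(?P<ring>\d+) vmid:(?P<vmid>\d+) pasid:(?P<pasid>\d+)\)"
)
STATUS_RE = re.compile(r"VM_L2_PROTECTION_FAULT_STATUS:(0x[0-9A-Fa-f]+)")


def summarize_faults(text):
    """Count page faults per (hub, ring, vmid) and per raw status value."""
    # Collapse all whitespace so records split by mail wrapping become contiguous.
    flat = " ".join(text.split())
    faults = Counter(
        (m["hub"], int(m["ring"]), int(m["vmid"]))
        for m in FAULT_RE.finditer(flat)
    )
    statuses = Counter(STATUS_RE.findall(flat))
    return faults, statuses


if __name__ == "__main__":
    faults, statuses = summarize_faults(SAMPLE)
    for key, count in faults.items():
        print(key, count)
    for value, count in statuses.items():
        print(value, count)
```

Collapsing whitespace before matching is a deliberate choice: the mail client wrapped each kernel message across two lines, so a line-by-line match would miss most records.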

-- 
You are receiving this mail because:
You are watching the assignee of the bug.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (16 preceding siblings ...)
  2019-11-10  7:11 ` bugzilla-daemon
@ 2019-11-25  9:43 ` bugzilla-daemon
  2019-12-03 15:53 ` bugzilla-daemon
                   ` (80 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2019-11-25  9:43 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

--- Comment #18 from Matthias Heinz (mh@familie-heinz.name) ---
Hi,

I recently built a 5.4.0-rc7 from drm-next (my HEAD was
17eee668b3cad423a47c090fe2275733c55db910) and also updated Mesa to 19.3.0-RC1.

Since then I haven't had any crashes. I have tested this for a few hours now,
but it's entirely possible that I just didn't run into the bug for some reason,
although it usually appeared after half an hour.

If possible, please try this setup and see if it is fixed for you.


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (17 preceding siblings ...)
  2019-11-25  9:43 ` bugzilla-daemon
@ 2019-12-03 15:53 ` bugzilla-daemon
  2019-12-03 16:07 ` bugzilla-daemon
                   ` (79 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2019-12-03 15:53 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

j.cordoba (j.cordoba@gmx.net) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |j.cordoba@gmx.net

--- Comment #19 from j.cordoba (j.cordoba@gmx.net) ---
Hi,

This issue is still present in the latest kernels:

5.4.1, 5.4, 5.3.14

Last usable kernel for me is 4.20.17

System Specs

- Gigabyte b450-ds3h
- Ryzen 5 3400G (with RX Vega 11)
- Mesa 19.1.2 - padoka PPA (Stable)
- Ubuntu 18.04.3 LTS


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (18 preceding siblings ...)
  2019-12-03 15:53 ` bugzilla-daemon
@ 2019-12-03 16:07 ` bugzilla-daemon
  2019-12-03 21:34 ` bugzilla-daemon
                   ` (78 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2019-12-03 16:07 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

--- Comment #20 from Matthias Heinz (mh@familie-heinz.name) ---
Dear j.cordoba,

Could you try building 5.4.0-rc7 from drm-next and test it, as I mentioned in
Comment 18?

I have been running this for some time now and the bug should have appeared by
now, so I'm getting more confident that it is fixed.

Best regards
Matthias


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (19 preceding siblings ...)
  2019-12-03 16:07 ` bugzilla-daemon
@ 2019-12-03 21:34 ` bugzilla-daemon
  2019-12-04  9:54 ` bugzilla-daemon
                   ` (77 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2019-12-03 21:34 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

Łukasz Żarnowiecki (lukasz@zarnowiecki.pl) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |lukasz@zarnowiecki.pl

--- Comment #21 from Łukasz Żarnowiecki (lukasz@zarnowiecki.pl) ---
The same is happening to me on 5.4.1.  No issue with 4.9.

[   44.172714] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for
fences timed out!
[   49.292694] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but
soft recovered
[   58.469316] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for
fences timed out!
[   63.586055] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but
soft recovered
[  156.606591] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but
soft recovered


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (20 preceding siblings ...)
  2019-12-03 21:34 ` bugzilla-daemon
@ 2019-12-04  9:54 ` bugzilla-daemon
  2019-12-08 17:32 ` bugzilla-daemon
                   ` (76 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2019-12-04  9:54 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

Pierre-Eric Pelloux-Prayer (pierre-eric.pelloux-prayer@amd.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pierre-eric.pelloux-prayer@
                   |                            |amd.com

--- Comment #22 from Pierre-Eric Pelloux-Prayer (pierre-eric.pelloux-prayer@amd.com) ---
(In reply to shallowaloe from comment #16)
> Created attachment 285665 [details]
> 5 second video clip that triggers a crash
> 
> Hi,
> 
> I think I'm having the same problem as you guys.  I run a mythbackend where
> I record cable television and those recordings often crash my system when
> hardware decoding is enabled.  Usually it's just the screen that freezes and
> I can still ssh to it.  
> 
> Kernel 5.1.6 was an exception for me too, with that kernel I'm able to
> restart the display manager and recover without having to reboot.
> 
> Attached is a short video that crashes my system.  I can trigger the alert
> by running:
> 
> mpv --vo=vaapi out.ts
> 
> I'm wondering if it crashes your systems too and if it's related.


This one is probably a Mesa issue, see
https://gitlab.freedesktop.org/mesa/mesa/issues/2177

What Mesa version are you using?


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (21 preceding siblings ...)
  2019-12-04  9:54 ` bugzilla-daemon
@ 2019-12-08 17:32 ` bugzilla-daemon
  2020-01-02  8:30 ` bugzilla-daemon
                   ` (75 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2019-12-08 17:32 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

--- Comment #23 from shallowaloe@gmail.com ---
Thanks for the link to the bug. I'm running an Ubuntu-based system and am
using the oibaf PPA.  The current Mesa version there is 20.0.





* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (22 preceding siblings ...)
  2019-12-08 17:32 ` bugzilla-daemon
@ 2020-01-02  8:30 ` bugzilla-daemon
  2020-01-02  9:11 ` bugzilla-daemon
                   ` (74 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2020-01-02  8:30 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

Janpieter Sollie (janpieter.sollie@dommel.be) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |janpieter.sollie@dommel.be

--- Comment #24 from Janpieter Sollie (janpieter.sollie@dommel.be) ---
Hi everyone,

I have the same issue with a Fiji Nano GPU: with the amdgpu driver, the UVD6 and
VCE3 rings time out in the ring buffer test at boot. The other rings seem to
work correctly. To rule out a hardware error, I would like to increase the
timeout value; where in the amdgpu driver can I do that?


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (23 preceding siblings ...)
  2020-01-02  8:30 ` bugzilla-daemon
@ 2020-01-02  9:11 ` bugzilla-daemon
  2020-01-19 17:03 ` bugzilla-daemon
                   ` (73 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2020-01-02  9:11 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

--- Comment #25 from Janpieter Sollie (janpieter.sollie@dommel.be) ---
Created attachment 286575
  --> https://bugzilla.kernel.org/attachment.cgi?id=286575&action=edit
kernel config 5.4.7 Fiji

Some additional info for my case:
- Running kernel 5.4.7 (vanilla), firmware 20191108 on Gentoo
- dmesg | grep -E "(drm)|(amdgpu)":
[    3.930023] [drm] amdgpu kernel modesetting enabled.
[    3.930217] amdgpu 0000:0a:00.0: remove_conflicting_pci_framebuffers: bar 0:
0xe0000000 -> 0xefffffff
[    3.930219] amdgpu 0000:0a:00.0: remove_conflicting_pci_framebuffers: bar 2:
0xf0000000 -> 0xf01fffff
[    3.930221] amdgpu 0000:0a:00.0: remove_conflicting_pci_framebuffers: bar 5:
0xfce00000 -> 0xfce3ffff
[    3.930224] fb0: switching to amdgpudrmfb from EFI VGA
[    3.930475] [drm] initializing kernel modesetting (FIJI 0x1002:0x7300
0x1002:0x0B36 0xCA).
[    3.930486] [drm] register mmio base: 0xFCE00000
[    3.930486] [drm] register mmio size: 262144
[    3.930495] [drm] add ip block number 0 <vi_common>
[    3.930495] [drm] add ip block number 1 <gmc_v8_0>
[    3.930496] [drm] add ip block number 2 <tonga_ih>
[    3.930497] [drm] add ip block number 3 <gfx_v8_0>
[    3.930498] [drm] add ip block number 4 <sdma_v3_0>
[    3.930498] [drm] add ip block number 5 <powerplay>
[    3.930499] [drm] add ip block number 6 <dm>
[    3.930500] [drm] add ip block number 7 <uvd_v6_0>
[    3.930500] [drm] add ip block number 8 <vce_v3_0>
[    3.930715] [drm] UVD is enabled in physical mode
[    3.930715] [drm] VCE enabled in physical mode
[    3.930743] [drm] vm size is 64 GB, 2 levels, block size is 10-bit, fragment
size is 9-bit
[    3.930751] amdgpu 0000:0a:00.0: VRAM: 4096M 0x000000F400000000 -
0x000000F4FFFFFFFF (4096M used)
[    3.930753] amdgpu 0000:0a:00.0: GART: 1024M 0x000000FF00000000 -
0x000000FF3FFFFFFF
[    3.930758] [drm] Detected VRAM RAM=4096M, BAR=256M
[    3.930759] [drm] RAM width 512bits HBM
[    3.930838] [drm] amdgpu: 4096M of VRAM memory ready
[    3.930841] [drm] amdgpu: 4096M of GTT memory ready.
[    3.930860] [drm] GART: num cpu pages 262144, num gpu pages 262144
[    3.930928] [drm] PCIE GART of 1024M enabled (table at 0x000000F4001D5000).
[    3.934174] [drm] Chained IB support enabled!
[    3.940198] amdgpu: [powerplay] hwmgr_sw_init smu backed is fiji_smu
[    3.941748] [drm] Found UVD firmware Version: 1.91 Family ID: 12
[    3.941752] [drm] UVD ENC is disabled
[    3.943542] [drm] Found VCE firmware Version: 55.2 Binary ID: 3
[    4.009146] [drm] dce110_link_encoder_construct: Failed to get
encoder_cap_info from VBIOS with error code 4!
[    4.040084] [drm] Display Core initialized with v3.2.48!
[    4.040542] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    4.040543] [drm] Driver supports precise vblank timestamp query.
[    4.067774] [drm] UVD initialized successfully.
[    4.168780] [drm] VCE initialized successfully.
[    4.170163] [drm] Cannot find any crtc or sizes
[    4.171948] [drm] Initialized amdgpu 3.35.0 20150101 for 0000:0a:00.0 on
minor 0
[    7.280062] amdgpu 0000:0a:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR*
IB test failed on uvd (-110).
[    8.400365] amdgpu 0000:0a:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR*
IB test failed on vce0 (-110).
[    8.400370] [drm:process_one_work] *ERROR* ib ring test failed (-110).
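
As a reading aid for the last three lines of this log, here is a small Python sketch that extracts the failing rings and their error codes. The code -110 is the negated Linux errno ETIMEDOUT (110), which fits the timeout described in Comment 24. The function name `failed_ib_tests` and the regular expression are illustrative assumptions derived from the message format above, not driver code.

```python
import re

# Linux errno value: 110 is ETIMEDOUT on Linux (the number differs on other platforms).
LINUX_ERRNO_NAMES = {110: "ETIMEDOUT"}

# The failure lines from the dmesg output above, hard-wrapped as in the mail.
SAMPLE = """\
[    7.280062] amdgpu 0000:0a:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR*
IB test failed on uvd (-110).
[    8.400365] amdgpu 0000:0a:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR*
IB test failed on vce0 (-110).
[    8.400370] [drm:process_one_work] *ERROR* ib ring test failed (-110).
"""

# Hypothetical pattern matching only the per-ring failure messages.
IB_FAIL_RE = re.compile(r"IB test failed on (?P<ring>\S+) \((?P<code>-?\d+)\)")


def failed_ib_tests(text):
    """Return (ring, code, errno_name) for each per-ring IB test failure."""
    # Collapse whitespace so messages split by mail wrapping become contiguous.
    flat = " ".join(text.split())
    results = []
    for m in IB_FAIL_RE.finditer(flat):
        code = int(m["code"])
        # Kernel functions return negated errno values, so look up -code.
        name = LINUX_ERRNO_NAMES.get(-code, "unknown")
        results.append((m["ring"], code, name))
    return results


if __name__ == "__main__":
    for ring, code, name in failed_ib_tests(SAMPLE):
        print(f"{ring}: {code} ({name})")
```

The summary line from `process_one_work` is intentionally not matched, since it names no ring.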


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (24 preceding siblings ...)
  2020-01-02  9:11 ` bugzilla-daemon
@ 2020-01-19 17:03 ` bugzilla-daemon
  2020-01-19 17:04 ` bugzilla-daemon
                   ` (72 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2020-01-19 17:03 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

F. Delente (delentef@gmail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |delentef@gmail.com

--- Comment #26 from F. Delente (delentef@gmail.com) ---
Hello, I have the same problem on a Huawei Matebook D laptop; the processor is
an AMD Ryzen 5 with an integrated Radeon Vega Mobile GPU.

I use Fedora 31. The problem appeared when upgrading from the 5.3.16 kernel to
the 5.4.6 kernel. Reverting to 5.3.16 solved the issue.

At times the UI (XFCE) freezes for about 5 seconds; I can move the mouse
cursor but I can't get any keyboard input (neither in X nor by switching
consoles). Each time the freeze occurs, dmesg shows these messages:

[   45.530374] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for
fences timed out!
[   50.139408] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but
soft recovered

I am attaching the /proc/cpuinfo and lspci outputs.


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (25 preceding siblings ...)
  2020-01-19 17:03 ` bugzilla-daemon
@ 2020-01-19 17:04 ` bugzilla-daemon
  2020-01-19 17:04 ` bugzilla-daemon
                   ` (71 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2020-01-19 17:04 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

--- Comment #27 from F. Delente (delentef@gmail.com) ---
Created attachment 286899
  --> https://bugzilla.kernel.org/attachment.cgi?id=286899&action=edit
/proc/cpuinfo


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (26 preceding siblings ...)
  2020-01-19 17:04 ` bugzilla-daemon
@ 2020-01-19 17:04 ` bugzilla-daemon
  2020-01-19 17:13 ` bugzilla-daemon
                   ` (70 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2020-01-19 17:04 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

--- Comment #28 from F. Delente (delentef@gmail.com) ---
Created attachment 286901
  --> https://bugzilla.kernel.org/attachment.cgi?id=286901&action=edit
lspci output


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (27 preceding siblings ...)
  2020-01-19 17:04 ` bugzilla-daemon
@ 2020-01-19 17:13 ` bugzilla-daemon
  2020-04-04 21:54 ` bugzilla-daemon
                   ` (69 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2020-01-19 17:13 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

--- Comment #29 from Matthias Heinz (mh@familie-heinz.name) ---
Hi. I have already reported this bug here:
https://gitlab.freedesktop.org/drm/amd/issues/953

If possible, try a 5.5-rc kernel and see if it's fixed there. It's fixed, at
least for me, in the drm tree.

Best regards
Matthias


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (28 preceding siblings ...)
  2020-01-19 17:13 ` bugzilla-daemon
@ 2020-04-04 21:54 ` bugzilla-daemon
  2020-05-01  9:03 ` bugzilla-daemon
                   ` (68 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2020-04-04 21:54 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

Steven Ellis (sellis@redhat.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |sellis@redhat.com

--- Comment #30 from Steven Ellis (sellis@redhat.com) ---
I'm seeing the same issue on Ubuntu 18.04 with

Upstream PPA "sudo add-apt-repository ppa:oibaf/graphics-drivers"

[  321.412530] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for
fences timed out or interrupted!
[  326.286306] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
signaled seq=4447, emitted seq=4449
[  326.286395] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process mythfrontend.re pid 2410 thread mythfronte:cs0 pid 2880


AMDGPUPRO driver 19.50-967956

[20913.330563] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for
fences timed out!
[20918.450513] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but
soft recovered
[20923.570306] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for
fences timed out!
[20928.690699] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but
soft recovered
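
The two log excerpts above show both forms of the timeout message: one reporting sequence numbers and one reporting a soft recovery. Here is a minimal Python sketch that tells them apart. It assumes, based on the wording of the message, that `emitted seq` minus `signaled seq` is the number of fences still outstanding on the ring; the function name `parse_timeouts` and the regular expression are hypothetical and written only against the format quoted in this thread.

```python
import re

# One line of each form, copied from the logs above and hard-wrapped as in the mail.
SAMPLE = """\
[  326.286306] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
signaled seq=4447, emitted seq=4449
[20918.450513] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but
soft recovered
"""

# Hypothetical pattern covering both message variants seen in this thread.
TIMEOUT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] \[drm:amdgpu_job_timedout \[amdgpu\]\] \*ERROR\* "
    r"ring (?P<ring>\S+) timeout, "
    r"(?:signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
    r"|(?P<soft>but soft recovered))"
)


def parse_timeouts(text):
    """Return one dict per ring timeout found in a (possibly wrapped) dmesg dump."""
    # Collapse whitespace so messages split by mail wrapping become contiguous.
    flat = " ".join(text.split())
    events = []
    for m in TIMEOUT_RE.finditer(flat):
        soft = m["soft"] is not None
        # Assumption: the difference is the count of emitted but unsignaled fences.
        pending = None if soft else int(m["emitted"]) - int(m["signaled"])
        events.append(
            {
                "time": float(m["ts"]),
                "ring": m["ring"],
                "soft_recovered": soft,
                "pending": pending,
            }
        )
    return events


if __name__ == "__main__":
    for event in parse_timeouts(SAMPLE):
        print(event)
```

Separating the two variants makes it easier to compare reports in this bug: a soft-recovered timeout appears as a short freeze, while the sequence-number form precedes a hang or a GPU reset in the logs posted here.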


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (29 preceding siblings ...)
  2020-04-04 21:54 ` bugzilla-daemon
@ 2020-05-01  9:03 ` bugzilla-daemon
  2020-05-01 19:52 ` bugzilla-daemon
                   ` (67 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2020-05-01  9:03 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

--- Comment #31 from Matthias Heinz (mh@familie-heinz.name) ---
Hi,

For me, this bug is fixed with a 5.5 kernel. I'm wondering whether it is fixed
for all of you, too.

Best
Matthias


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (30 preceding siblings ...)
  2020-05-01  9:03 ` bugzilla-daemon
@ 2020-05-01 19:52 ` bugzilla-daemon
  2020-05-25 12:21 ` bugzilla-daemon
                   ` (66 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2020-05-01 19:52 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

--- Comment #32 from j.cordoba (j.cordoba@gmx.net) ---
I agree. Fixed for me too


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (31 preceding siblings ...)
  2020-05-01 19:52 ` bugzilla-daemon
@ 2020-05-25 12:21 ` bugzilla-daemon
  2020-06-19 19:11 ` bugzilla-daemon
                   ` (65 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2020-05-25 12:21 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

udo (udovdh@xs4all.nl) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |udovdh@xs4all.nl

--- Comment #33 from udo (udovdh@xs4all.nl) ---
I still see them on 5.6.13:

[191571.372560] sd 11:0:0:0: [sde] Synchronize Cache(10) failed: Result:
hostbyte=0x01 driverbyte=0x00
[205796.424607] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout,
signaled seq=4518280, emitted seq=4518282
[205796.424637] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process mpv pid 488243 thread mpv:cs0 pid 488257
[205796.424640] amdgpu 0000:0a:00.0: GPU reset begin!
[205800.840504] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for
fences timed out!
[205800.937565] amdgpu 0000:0a:00.0: GPU reset succeeded, trying to resume
[205800.938060] [drm] PCIE GART of 1024M enabled (table at 0x000000F400900000).
[205800.938849] [drm] PSP is resuming...
[205800.958729] [drm] reserve 0x400000 from 0xf47f800000 for PSP TMR
[205800.972414] [drm] psp command (0x5) failed and response status is
(0xFFFF0007)
[205801.176411] amdgpu 0000:0a:00.0: RAS: ras ta ucode is not available
[205801.460775] [drm] kiq ring mec 2 pipe 1 q 0
[205801.460986] amdgpu 0000:0a:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT
domain=0x0002 address=0x800002300 flags=0x0000]
[205801.516698] [drm] VCN decode and encode initialized successfully(under DPG
Mode).
[205801.516709] amdgpu 0000:0a:00.0: ring gfx uses VM inv eng 0 on hub 0
[205801.516713] amdgpu 0000:0a:00.0: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[205801.516717] amdgpu 0000:0a:00.0: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[205801.516720] amdgpu 0000:0a:00.0: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[205801.516724] amdgpu 0000:0a:00.0: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[205801.516727] amdgpu 0000:0a:00.0: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[205801.516730] amdgpu 0000:0a:00.0: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[205801.516733] amdgpu 0000:0a:00.0: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[205801.516736] amdgpu 0000:0a:00.0: ring comp_1.3.1 uses VM inv eng 10 on hub
0
[205801.516740] amdgpu 0000:0a:00.0: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
[205801.516743] amdgpu 0000:0a:00.0: ring sdma0 uses VM inv eng 0 on hub 1
[205801.516746] amdgpu 0000:0a:00.0: ring vcn_dec uses VM inv eng 1 on hub 1
[205801.516749] amdgpu 0000:0a:00.0: ring vcn_enc0 uses VM inv eng 4 on hub 1
[205801.516752] amdgpu 0000:0a:00.0: ring vcn_enc1 uses VM inv eng 5 on hub 1
[205801.516755] amdgpu 0000:0a:00.0: ring jpeg_dec uses VM inv eng 6 on hub 1
[205801.525996] [drm] recover vram bo from shadow start
[205801.525998] [drm] recover vram bo from shadow done
[205801.526008] [drm] Skip scheduling IBs!
[205801.526051] amdgpu 0000:0a:00.0: GPU reset(1) succeeded!
[205802.536444] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout,
signaled seq=4518342, emitted seq=4518344
[205802.536523] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process gnome-shell pid 3825 thread gnome-shel:cs0 pid 3834
[205802.536531] amdgpu 0000:0a:00.0: GPU reset begin!
[205806.728558] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for
fences timed out!
[205806.821326] amdgpu 0000:0a:00.0: GPU reset succeeded, trying to resume
[205806.821578] [drm] PCIE GART of 1024M enabled (table at 0x000000F400900000).
[205806.821899] [drm] PSP is resuming...
[205806.841769] [drm] reserve 0x400000 from 0xf47f800000 for PSP TMR
[205806.856213] [drm] psp command (0x5) failed and response status is
(0xFFFF0007)
[205807.072210] amdgpu 0000:0a:00.0: RAS: ras ta ucode is not available
[205807.355997] [drm] kiq ring mec 2 pipe 1 q 0
[205807.356308] amdgpu 0000:0a:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT
domain=0x0002 address=0x800072f00 flags=0x0000]
[205807.409389] [drm] VCN decode and encode initialized successfully(under DPG
Mode).
[205807.409401] amdgpu 0000:0a:00.0: ring gfx uses VM inv eng 0 on hub 0
[205807.409406] amdgpu 0000:0a:00.0: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[205807.409410] amdgpu 0000:0a:00.0: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[205807.409415] amdgpu 0000:0a:00.0: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[205807.409418] amdgpu 0000:0a:00.0: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[205807.409422] amdgpu 0000:0a:00.0: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[205807.409425] amdgpu 0000:0a:00.0: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[205807.409429] amdgpu 0000:0a:00.0: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[205807.409432] amdgpu 0000:0a:00.0: ring comp_1.3.1 uses VM inv eng 10 on hub
0
[205807.409436] amdgpu 0000:0a:00.0: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
[205807.409440] amdgpu 0000:0a:00.0: ring sdma0 uses VM inv eng 0 on hub 1
[205807.409444] amdgpu 0000:0a:00.0: ring vcn_dec uses VM inv eng 1 on hub 1
[205807.409447] amdgpu 0000:0a:00.0: ring vcn_enc0 uses VM inv eng 4 on hub 1
[205807.409451] amdgpu 0000:0a:00.0: ring vcn_enc1 uses VM inv eng 5 on hub 1
[205807.409454] amdgpu 0000:0a:00.0: ring jpeg_dec uses VM inv eng 6 on hub 1
[205807.418547] [drm] recover vram bo from shadow start
[205807.418549] [drm] recover vram bo from shadow done
[205807.418567] [drm] Skip scheduling IBs!
[205807.418569] [drm] Skip scheduling IBs!
[205807.418592] [drm] Skip scheduling IBs!
[205807.418613] amdgpu 0000:0a:00.0: GPU reset(2) succeeded!
[205808.428469] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
but soft recovered
[205809.458201] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
signaled seq=11463546, emitted seq=11463549
[205809.458282] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process Xorg pid 3513 thread Xorg:cs0 pid 3514
[205809.458289] amdgpu 0000:0a:00.0: GPU reset begin!
[205812.872123] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for
fences timed out!
[205812.981471] amdgpu 0000:0a:00.0: GPU reset succeeded, trying to resume
[205812.981823] [drm] PCIE GART of 1024M enabled (table at 0x000000F400900000).
[205812.982264] [drm] PSP is resuming...
[205813.002134] [drm] reserve 0x400000 from 0xf47f800000 for PSP TMR
[205813.012088] [drm] psp command (0x5) failed and response status is
(0xFFFF0007)
[205813.208005] amdgpu 0000:0a:00.0: RAS: ras ta ucode is not available
[205813.497603] [drm] kiq ring mec 2 pipe 1 q 0
[205813.551494] [drm] VCN decode and encode initialized successfully(under DPG
Mode).
[205813.551506] amdgpu 0000:0a:00.0: ring gfx uses VM inv eng 0 on hub 0
[205813.551510] amdgpu 0000:0a:00.0: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[205813.551514] amdgpu 0000:0a:00.0: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[205813.551517] amdgpu 0000:0a:00.0: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[205813.551520] amdgpu 0000:0a:00.0: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[205813.551524] amdgpu 0000:0a:00.0: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[205813.551526] amdgpu 0000:0a:00.0: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[205813.551529] amdgpu 0000:0a:00.0: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[205813.551532] amdgpu 0000:0a:00.0: ring comp_1.3.1 uses VM inv eng 10 on hub
0
[205813.551535] amdgpu 0000:0a:00.0: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
[205813.551538] amdgpu 0000:0a:00.0: ring sdma0 uses VM inv eng 0 on hub 1
[205813.551541] amdgpu 0000:0a:00.0: ring vcn_dec uses VM inv eng 1 on hub 1
[205813.551543] amdgpu 0000:0a:00.0: ring vcn_enc0 uses VM inv eng 4 on hub 1
[205813.551546] amdgpu 0000:0a:00.0: ring vcn_enc1 uses VM inv eng 5 on hub 1
[205813.551549] amdgpu 0000:0a:00.0: ring jpeg_dec uses VM inv eng 6 on hub 1
[205902.384966] traps: Bluez D-Bus thr[409727] trap invalid opcode
ip:555cd19202af sp:7f265cf9de10 error:0 in skypeforlinux[555ccfa02000+542a000]

-- 
You are receiving this mail because:
You are watching the assignee of the bug.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (32 preceding siblings ...)
  2020-05-25 12:21 ` bugzilla-daemon
@ 2020-06-19 19:11 ` bugzilla-daemon
  2020-08-10 23:49 ` bugzilla-daemon
                   ` (64 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2020-06-19 19:11 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

Panagiotis Polychronis (panospolychronis@gmail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |panospolychronis@gmail.com

--- Comment #34 from Panagiotis Polychronis (panospolychronis@gmail.com) ---
The problem still exists with Linux kernel 5.8-rc1 from git. (My graphics card
is a Radeon RX 5600 XT.)


[20581.087159] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0
timeout, signaled seq=2768656, emitted seq=2768658
[20581.087212] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process DOOMEternalx64v pid 8875 thread DOOMEternalx64v pid 8875
[20581.087217] amdgpu 0000:29:00.0: amdgpu: GPU reset begin!
[20583.381257] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for
fences timed out!
[20585.087232] amdgpu 0000:29:00.0: amdgpu: failed to suspend display audio
[20585.156036] snd_hda_codec_hdmi hdaudioC0D0: HDMI: ELD buf size is 0, force
128
[20585.156052] snd_hda_codec_hdmi hdaudioC0D0: HDMI: invalid ELD data byte 0
[20585.463157] amdgpu 0000:29:00.0: [drm:amdgpu_ring_test_helper [amdgpu]]
*ERROR* ring kiq_2.1.0 test failed (-110)
[20585.463205] [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KGQ disable failed
[20585.694999] amdgpu 0000:29:00.0: [drm:amdgpu_ring_test_helper [amdgpu]]
*ERROR* ring kiq_2.1.0 test failed (-110)
[20585.695047] [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
[20585.926951] [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx
[20588.045497] amdgpu 0000:29:00.0: amdgpu: GPU reset succeeded, trying to
resume
[20588.045605] [drm] PCIE GART of 512M enabled (table at 0x0000008000E10000).
[20588.045682] [drm] VRAM is lost due to GPU reset!
[20588.048023] [drm] PSP is resuming...
[20588.218089] [drm] reserve 0x900000 from 0x817e400000 for PSP TMR
[20588.287093] amdgpu 0000:29:00.0: amdgpu: RAS: optional ras ta ucode is not
available
[20588.293101] amdgpu: SMU is resuming...
[20588.295088] amdgpu: SMU is resumed successfully!
[20588.413155] [drm] kiq ring mec 2 pipe 1 q 0
[20588.417493] [drm] VCN decode and encode initialized successfully(under DPG
Mode).
[20588.417632] [drm] JPEG decode initialized successfully.
[20588.417690] amdgpu 0000:29:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on
hub 0
[20588.417693] amdgpu 0000:29:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1
on hub 0
[20588.417697] amdgpu 0000:29:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4
on hub 0
[20588.417700] amdgpu 0000:29:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5
on hub 0
[20588.417703] amdgpu 0000:29:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6
on hub 0
[20588.417707] amdgpu 0000:29:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7
on hub 0
[20588.417709] amdgpu 0000:29:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8
on hub 0
[20588.417713] amdgpu 0000:29:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9
on hub 0
[20588.417716] amdgpu 0000:29:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10
on hub 0
[20588.417719] amdgpu 0000:29:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11
on hub 0
[20588.417721] amdgpu 0000:29:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on
hub 0
[20588.417724] amdgpu 0000:29:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on
hub 0
[20588.417726] amdgpu 0000:29:00.0: amdgpu: ring vcn_dec uses VM inv eng 0 on
hub 1
[20588.417728] amdgpu 0000:29:00.0: amdgpu: ring vcn_enc0 uses VM inv eng 1 on
hub 1
[20588.417730] amdgpu 0000:29:00.0: amdgpu: ring vcn_enc1 uses VM inv eng 4 on
hub 1
[20588.417732] amdgpu 0000:29:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on
hub 1
[20588.421588] [drm] recover vram bo from shadow start
[20588.427530] [drm] recover vram bo from shadow done
[20588.427534] [drm] Skip scheduling IBs!
[20588.427537] [drm] Skip scheduling IBs!
[20588.427565] [drm] Skip scheduling IBs!
[20588.427573] [drm] Skip scheduling IBs!
[20588.427583] [drm] Skip scheduling IBs!
[20588.427591] [drm] Skip scheduling IBs!
[20588.427597] [drm] Skip scheduling IBs!
[20588.427649] [drm] Skip scheduling IBs!
[20588.427669] [drm] Skip scheduling IBs!
[20588.427680] [drm] Skip scheduling IBs!
[20588.427692] [drm] Skip scheduling IBs!
[20588.427693] [drm] Skip scheduling IBs!
[20588.427699] [drm] Skip scheduling IBs!
[20588.427703] [drm] Skip scheduling IBs!
[20588.427710] [drm] Skip scheduling IBs!
[20588.427714] amdgpu 0000:29:00.0: amdgpu: GPU reset(2) succeeded!
[20588.427719] [drm] Skip scheduling IBs!
[20588.427721] [drm] Skip scheduling IBs!
[20588.427724] [drm] Skip scheduling IBs!
[20588.427726] [drm] Skip scheduling IBs!
[20600.095254] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0
timeout, signaled seq=2768668, emitted seq=2768669
[20600.095404] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process plasmashell pid 1570 thread plasmashel:cs0 pid 1713
[20600.095413] amdgpu 0000:29:00.0: amdgpu: GPU reset begin!
[20604.095435] amdgpu 0000:29:00.0: amdgpu: failed to suspend display audio
[20604.448799] amdgpu 0000:29:00.0: [drm:amdgpu_ring_test_helper [amdgpu]]
*ERROR* ring kiq_2.1.0 test failed (-110)
[20604.448848] [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KGQ disable failed
[20604.681029] amdgpu 0000:29:00.0: [drm:amdgpu_ring_test_helper [amdgpu]]
*ERROR* ring kiq_2.1.0 test failed (-110)
[20604.681078] [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
[20604.913262] [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx
[20605.288303] Disabling lock debugging due to kernel taint
[20605.288325] mce: [Hardware Error]: Machine check events logged
[20605.288327] [Hardware Error]: Uncorrected, software restartable error.
[20605.288330] [Hardware Error]: CPU:1 (17:8:2)
MC0_STATUS[-|UE|MiscV|AddrV|-|-|-|UECC|-|Poison|-]: 0xbc002800000c0135
[20605.288335] [Hardware Error]: Error Addr: 0x00000000e8ac0000
[20605.288337] [Hardware Error]: IPID: 0x000000b000000000
[20605.288339] [Hardware Error]: Load Store Unit Ext. Error Code: 12, DC Data
error type 1 and poison consumption.
[20605.288341] [Hardware Error]: cache level: L1, tx: DATA, mem-tx: DRD
[20605.288345] mce: Uncorrected hardware memory error in user-access at
e8ac0000
[20605.288347] Memory failure: 0xe8ac0: memory outside kernel control
[20605.288348] mce: Memory error not recovered
[20605.288361] amdgpu 0000:29:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT
domain=0x0003 address=0x8ac0000 flags=0x0000]
[20605.288375] amdgpu 0000:29:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT
domain=0x0003 address=0x8ac0000 flags=0x0000]
[20607.031477] amdgpu 0000:29:00.0: amdgpu: GPU reset succeeded, trying to
resume
[20607.031591] [drm] PCIE GART of 512M enabled (table at 0x0000008000E10000).
[20607.031613] [drm] VRAM is lost due to GPU reset!
[20607.034094] [drm] PSP is resuming...
[20607.204092] [drm] reserve 0x900000 from 0x817e400000 for PSP TMR
[20607.273093] amdgpu 0000:29:00.0: amdgpu: RAS: optional ras ta ucode is not
available
[20607.279097] amdgpu: SMU is resuming...
[20607.281035] amdgpu: SMU is resumed successfully!
[20607.397649] [drm] kiq ring mec 2 pipe 1 q 0
[20607.402090] [drm] VCN decode and encode initialized successfully(under DPG
Mode).
[20607.402494] [drm] JPEG decode initialized successfully.
[20607.402540] amdgpu 0000:29:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on
hub 0
[20607.402542] amdgpu 0000:29:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1
on hub 0
[20607.402544] amdgpu 0000:29:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4
on hub 0
[20607.402546] amdgpu 0000:29:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5
on hub 0
[20607.402548] amdgpu 0000:29:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6
on hub 0
[20607.402549] amdgpu 0000:29:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7
on hub 0
[20607.402551] amdgpu 0000:29:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8
on hub 0
[20607.402553] amdgpu 0000:29:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9
on hub 0
[20607.402554] amdgpu 0000:29:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10
on hub 0
[20607.402556] amdgpu 0000:29:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11
on hub 0
[20607.402558] amdgpu 0000:29:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on
hub 0
[20607.402559] amdgpu 0000:29:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on
hub 0
[20607.402561] amdgpu 0000:29:00.0: amdgpu: ring vcn_dec uses VM inv eng 0 on
hub 1
[20607.402563] amdgpu 0000:29:00.0: amdgpu: ring vcn_enc0 uses VM inv eng 1 on
hub 1
[20607.402564] amdgpu 0000:29:00.0: amdgpu: ring vcn_enc1 uses VM inv eng 4 on
hub 1
[20607.402566] amdgpu 0000:29:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on
hub 1
[20607.405742] [drm] recover vram bo from shadow start
[20607.409317] [drm] recover vram bo from shadow done
[20607.409320] [drm] Skip scheduling IBs!
[20607.409410] amdgpu 0000:29:00.0: amdgpu: GPU reset(4) succeeded!
[20607.493800] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* got no
status for stream 00000000fbb3d792 on acrtc00000000bb69f545
[20607.494597] ------------[ cut here ]------------
[20607.494599] WARNING: CPU: 10 PID: 999 at
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:7429
amdgpu_dm_atomic_commit_tail+0x1ada/0x22b0 [amdgpu]
[20607.494599] Modules linked in: snd_seq_dummy snd_hrtimer snd_seq fuse joydev
mousedev input_leds hid_generic usbhid hid uvcvideo videobuf2_vmalloc
videobuf2_memops videobuf2_v4l2 snd_usb_audio videobuf2_common videodev
snd_usbmidi_lib snd_rawmidi snd_seq_device mc rfkill squashfs nls_iso8859_1
snd_hda_codec_realtek nls_cp437 vfat snd_hda_codec_generic fat ledtrig_audio
snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg loop snd_hda_codec
edac_mce_amd amd_energy snd_hda_core kvm_amd snd_hwdep kvm wmi_bmof snd_pcm
irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel
snd_timer aesni_intel r8169 snd crypto_simd realtek cryptd ccp glue_helper
sp5100_tco k10temp soundcore libphy i2c_piix4 rng_core pcspkr wmi evdev mac_hid
pinctrl_amd gpio_amdpt acpi_cpufreq uinput sg crypto_user ip_tables x_tables
xhci_pci xhci_pci_renesas xhci_hcd amdgpu gpu_sched i2c_algo_bit ttm
drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec rc_core drm
[20607.494633] CPU: 10 PID: 999 Comm: Xorg Tainted: G   M             
5.8.0-rc1-MANJARO+ #2
[20607.494634] Hardware name: Micro-Star International Co., Ltd. MS-7B79/X470
GAMING PLUS (MS-7B79), BIOS A.G0 11/11/2019
[20607.494635] RIP: 0010:amdgpu_dm_atomic_commit_tail+0x1ada/0x22b0 [amdgpu]
[20607.494636] Code: 8b bd e8 fc ff ff e8 d5 7f 10 00 48 85 c0 0f 85 23 e9 ff
ff 49 8b b5 e8 01 00 00 4c 89 e2 48 c7 c7 e0 5c 91 c0 e8 f6 74 d0 ff <0f> 0b 49
8b 4f 08 e9 10 e9 ff ff 49 8b 45 00 48 8d b8 78 01 00 00
[20607.494637] RSP: 0018:ffffa6b781987838 EFLAGS: 00010246
[20607.494638] RAX: 0000000000000000 RBX: 0000000000000000 RCX:
0000000000000000
[20607.494639] RDX: 0000000000000000 RSI: ffffffffaaf63047 RDI:
00000000ffffffff
[20607.494640] RBP: ffffa6b781987ba8 R08: 000000000000053e R09:
0000000000000001
[20607.494641] R10: 0000000000000000 R11: 0000000000000001 R12:
ffff941201964000
[20607.494641] R13: ffff9410db79d400 R14: ffff94110b71bc00 R15:
ffff9410fcc69880
[20607.494642] FS:  00007f87fbe2be80(0000) GS:ffff94120ea80000(0000)
knlGS:0000000000000000
[20607.494643] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[20607.494644] CR2: 0000000000fb1fe8 CR3: 0000000402700000 CR4:
00000000003406e0
[20607.494644] Call Trace:
[20607.494644]  ? sched_clock+0x5/0x10
[20607.494645]  ? irqtime_account_irq+0x90/0xc0
[20607.494646]  ? preempt_count_add+0x68/0xa0
[20607.494646]  commit_tail+0x94/0x130 [drm_kms_helper]
[20607.494647]  drm_atomic_helper_commit+0x113/0x140 [drm_kms_helper]
[20607.494648]  drm_atomic_helper_update_plane+0xe9/0x140 [drm_kms_helper]
[20607.494648]  drm_mode_cursor_universal+0x128/0x240 [drm]
[20607.494649]  drm_mode_cursor_common+0x102/0x230 [drm]
[20607.494650]  ? drm_mode_cursor_ioctl+0x70/0x70 [drm]
[20607.494650]  drm_ioctl_kernel+0xb2/0x100 [drm]
[20607.494651]  drm_ioctl+0x208/0x360 [drm]
[20607.494651]  ? drm_mode_cursor_ioctl+0x70/0x70 [drm]
[20607.494652]  amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
[20607.494652]  ksys_ioctl+0x82/0xc0
[20607.494653]  __x64_sys_ioctl+0x16/0x20
[20607.494653]  do_syscall_64+0x44/0x70
[20607.494654]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[20607.494655] RIP: 0033:0x7f87fca658eb
[20607.494655] Code: Bad RIP value.
[20607.494656] RSP: 002b:00007ffc20a98628 EFLAGS: 00000246 ORIG_RAX:
0000000000000010
[20607.494657] RAX: ffffffffffffffda RBX: 00007ffc20a98660 RCX:
00007f87fca658eb
[20607.494658] RDX: 00007ffc20a98660 RSI: 00000000c02464bb RDI:
000000000000000d
[20607.494659] RBP: 00000000c02464bb R08: 000055c87121c270 R09:
000000000000007f
[20607.494659] R10: 0000000000000a00 R11: 0000000000000246 R12:
000055c87109aad0
[20607.494660] R13: 000000000000000d R14: 0000000000000004 R15:
000055c87109b210
[20607.494661] ---[ end trace 96f7cc95700c9634 ]---
[20610.652685] GpuWatchdog[5225]: segfault at 0 ip 000055f7f6e6f76d sp
00007fa63e0b05d0 error 6 in chrome[55f7f27c2000+785b000]
[20610.652696] Code: Bad RIP value.
[20610.652994] audit: type=1701 audit(1592593154.666:113): auid=1000 uid=1000
gid=1000 ses=2 subj==unconfined pid=5147 comm="GpuWatchdog"
exe="/opt/google/chrome/chrome" sig=11 res=1
[20610.674438] audit: type=1334 audit(1592593154.687:114): prog-id=15 op=LOAD
[20610.674597] audit: type=1334 audit(1592593154.687:115): prog-id=16 op=LOAD
[20610.675951] audit: type=1130 audit(1592593154.688:116): pid=1 uid=0
auid=4294967295 ses=4294967295 subj==unconfined
msg='unit=systemd-coredump@0-10631-0 comm="systemd"
exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[20611.663071] audit: type=1131 audit(1592593155.675:117): pid=1 uid=0
auid=4294967295 ses=4294967295 subj==unconfined
msg='unit=systemd-coredump@0-10631-0 comm="systemd"
exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[20611.701231] audit: type=1334 audit(1592593155.714:118): prog-id=16 op=UNLOAD
[20611.701236] audit: type=1334 audit(1592593155.714:119): prog-id=15 op=UNLOAD
[20617.685151] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]]
*ERROR* [CRTC:62:crtc-0] flip_done timed out
[20617.694549] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[20627.925351] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]]
*ERROR* [CRTC:62:crtc-0] flip_done timed out
[20638.165634] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]]
*ERROR* [CONNECTOR:80:DP-2] flip_done timed out
[20648.405154] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]]
*ERROR* [PLANE:55:plane-5] flip_done timed out
[20658.645157] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]]
*ERROR* [PLANE:61:plane-7] flip_done timed out
[20658.646471] ------------[ cut here ]------------
[20658.646473] WARNING: CPU: 10 PID: 999 at
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:7016
amdgpu_dm_atomic_commit_tail+0x2139/0x22b0 [amdgpu]
[20658.646474] Modules linked in: snd_seq_dummy snd_hrtimer snd_seq fuse joydev
mousedev input_leds hid_generic usbhid hid uvcvideo videobuf2_vmalloc
videobuf2_memops videobuf2_v4l2 snd_usb_audio videobuf2_common videodev
snd_usbmidi_lib snd_rawmidi snd_seq_device mc rfkill squashfs nls_iso8859_1
snd_hda_codec_realtek nls_cp437 vfat snd_hda_codec_generic fat ledtrig_audio
snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg loop snd_hda_codec
edac_mce_amd amd_energy snd_hda_core kvm_amd snd_hwdep kvm wmi_bmof snd_pcm
irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel
snd_timer aesni_intel r8169 snd crypto_simd realtek cryptd ccp glue_helper
sp5100_tco k10temp soundcore libphy i2c_piix4 rng_core pcspkr wmi evdev mac_hid
pinctrl_amd gpio_amdpt acpi_cpufreq uinput sg crypto_user ip_tables x_tables
xhci_pci xhci_pci_renesas xhci_hcd amdgpu gpu_sched i2c_algo_bit ttm
drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec rc_core drm
[20658.646503] CPU: 10 PID: 999 Comm: Xorg Tainted: G   M    W        
5.8.0-rc1-MANJARO+ #2
[20658.646504] Hardware name: Micro-Star International Co., Ltd. MS-7B79/X470
GAMING PLUS (MS-7B79), BIOS A.G0 11/11/2019
[20658.646505] RIP: 0010:amdgpu_dm_atomic_commit_tail+0x2139/0x22b0 [amdgpu]
[20658.646506] Code: 22 ef ff ff 41 8b 4c 24 60 48 c7 c2 20 bc 89 c0 bf 02 00
00 00 48 c7 c6 88 58 91 c0 e8 e0 6d d0 ff 49 8b 4f 08 e9 8f e0 ff ff <0f> 0b e9
0a f0 ff ff 0f 0b 0f 0b e9 21 f0 ff ff 48 8b 85 f0 fc ff
[20658.646506] RSP: 0018:ffffa6b781987948 EFLAGS: 00010002
[20658.646507] RAX: 0000000000000286 RBX: 0000000000000bfc RCX:
0000000000000000
[20658.646508] RDX: 0000000000000002 RSI: 0000000000000206 RDI:
0000000000000000
[20658.646509] RBP: ffffa6b781987cb8 R08: 0000000000000005 R09:
0000000000000000
[20658.646509] R10: ffffa6b7819878b0 R11: ffffa6b7819878b4 R12:
0000000000000286
[20658.646510] R13: ffff941201964000 R14: ffff9410db79c000 R15:
ffff9410fcc69600
[20658.646511] FS:  00007f87fbe2be80(0000) GS:ffff94120ea80000(0000)
knlGS:0000000000000000
[20658.646511] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[20658.646512] CR2: 00001a0ee45cb008 CR3: 0000000402700000 CR4:
00000000003406e0
[20658.646512] Call Trace:
[20658.646513]  commit_tail+0x94/0x130 [drm_kms_helper]
[20658.646514]  drm_atomic_helper_commit+0x113/0x140 [drm_kms_helper]
[20658.646514]  drm_mode_obj_set_property_ioctl+0x156/0x320 [drm]
[20658.646515]  ? drm_mode_obj_find_prop_id+0x40/0x40 [drm]
[20658.646515]  drm_ioctl_kernel+0xb2/0x100 [drm]
[20658.646516]  drm_ioctl+0x208/0x360 [drm]
[20658.646516]  ? drm_mode_obj_find_prop_id+0x40/0x40 [drm]
[20658.646517]  amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
[20658.646517]  ksys_ioctl+0x82/0xc0
[20658.646518]  __x64_sys_ioctl+0x16/0x20
[20658.646518]  do_syscall_64+0x44/0x70
[20658.646519]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[20658.646519] RIP: 0033:0x7f87fca658eb
[20658.646520] Code: Bad RIP value.
[20658.646520] RSP: 002b:00007ffc20a995c8 EFLAGS: 00000246 ORIG_RAX:
0000000000000010
[20658.646521] RAX: ffffffffffffffda RBX: 00007ffc20a99600 RCX:
00007f87fca658eb
[20658.646522] RDX: 00007ffc20a99600 RSI: 00000000c01864ba RDI:
000000000000000d
[20658.646523] RBP: 00000000c01864ba R08: 000000000000006c R09:
00000000cccccccc
[20658.646523] R10: 0000000000000fff R11: 0000000000000246 R12:
000055c87121db90
[20658.646524] R13: 000000000000000d R14: 0000000000000000 R15:
0000000000000003
[20658.646525] ---[ end trace 96f7cc95700c9635 ]---
[20658.646525] ------------[ cut here ]------------
[20658.646526] WARNING: CPU: 10 PID: 999 at
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:6613
amdgpu_dm_atomic_commit_tail+0x2142/0x22b0 [amdgpu]
[20658.646527] Modules linked in: snd_seq_dummy snd_hrtimer snd_seq fuse joydev
mousedev input_leds hid_generic usbhid hid uvcvideo videobuf2_vmalloc
videobuf2_memops videobuf2_v4l2 snd_usb_audio videobuf2_common videodev
snd_usbmidi_lib snd_rawmidi snd_seq_device mc rfkill squashfs nls_iso8859_1
snd_hda_codec_realtek nls_cp437 vfat snd_hda_codec_generic fat ledtrig_audio
snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg loop snd_hda_codec
edac_mce_amd amd_energy snd_hda_core kvm_amd snd_hwdep kvm wmi_bmof snd_pcm
irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel
snd_timer aesni_intel r8169 snd crypto_simd realtek cryptd ccp glue_helper
sp5100_tco k10temp soundcore libphy i2c_piix4 rng_core pcspkr wmi evdev mac_hid
pinctrl_amd gpio_amdpt acpi_cpufreq uinput sg crypto_user ip_tables x_tables
xhci_pci xhci_pci_renesas xhci_hcd amdgpu gpu_sched i2c_algo_bit ttm
drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec rc_core drm
[20658.646556] CPU: 10 PID: 999 Comm: Xorg Tainted: G   M    W        
5.8.0-rc1-MANJARO+ #2
[20658.646557] Hardware name: Micro-Star International Co., Ltd. MS-7B79/X470
GAMING PLUS (MS-7B79), BIOS A.G0 11/11/2019
[20658.646557] RIP: 0010:amdgpu_dm_atomic_commit_tail+0x2142/0x22b0 [amdgpu]
[20658.646558] Code: 48 c7 c2 20 bc 89 c0 bf 02 00 00 00 48 c7 c6 88 58 91 c0
e8 e0 6d d0 ff 49 8b 4f 08 e9 8f e0 ff ff 0f 0b e9 0a f0 ff ff 0f 0b <0f> 0b e9
21 f0 ff ff 48 8b 85 f0 fc ff ff 48 8d 8d 64 fd ff ff 48
[20658.646559] RSP: 0018:ffffa6b781987948 EFLAGS: 00010082
[20658.646560] RAX: 0000000000000001 RBX: 0000000000000bfc RCX:
0000000000000000
[20658.646561] RDX: 0000000000000002 RSI: 0000000000000206 RDI:
0000000000000000
[20658.646561] RBP: ffffa6b781987cb8 R08: 0000000000000005 R09:
0000000000000000
[20658.646562] R10: ffffa6b7819878b0 R11: ffffa6b7819878b4 R12:
0000000000000286
[20658.646563] R13: ffff941201964000 R14: ffff9410db79c000 R15:
ffff9410fcc69600
[20658.646563] FS:  00007f87fbe2be80(0000) GS:ffff94120ea80000(0000)
knlGS:0000000000000000
[20658.646564] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[20658.646564] CR2: 00001a0ee45cb008 CR3: 0000000402700000 CR4:
00000000003406e0
[20658.646565] Call Trace:
[20658.646565]  commit_tail+0x94/0x130 [drm_kms_helper]
[20658.646566]  drm_atomic_helper_commit+0x113/0x140 [drm_kms_helper]
[20658.646567]  drm_mode_obj_set_property_ioctl+0x156/0x320 [drm]
[20658.646567]  ? drm_mode_obj_find_prop_id+0x40/0x40 [drm]
[20658.646568]  drm_ioctl_kernel+0xb2/0x100 [drm]
[20658.646568]  drm_ioctl+0x208/0x360 [drm]
[20658.646569]  ? drm_mode_obj_find_prop_id+0x40/0x40 [drm]
[20658.646569]  amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
[20658.646570]  ksys_ioctl+0x82/0xc0
[20658.646570]  __x64_sys_ioctl+0x16/0x20
[20658.646571]  do_syscall_64+0x44/0x70
[20658.646571]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[20658.646572] RIP: 0033:0x7f87fca658eb
[20658.646572] Code: Bad RIP value.
[20658.646573] RSP: 002b:00007ffc20a995c8 EFLAGS: 00000246 ORIG_RAX:
0000000000000010
[20658.646574] RAX: ffffffffffffffda RBX: 00007ffc20a99600 RCX:
00007f87fca658eb
[20658.646574] RDX: 00007ffc20a99600 RSI: 00000000c01864ba RDI:
000000000000000d
[20658.646575] RBP: 00000000c01864ba R08: 000000000000006c R09:
00000000cccccccc
[20658.646576] R10: 0000000000000fff R11: 0000000000000246 R12:
000055c87121db90
[20658.646576] R13: 000000000000000d R14: 0000000000000000 R15:
0000000000000003
[20658.646577] ---[ end trace 96f7cc95700c9636 ]---
[20668.885142] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]]
*ERROR* [CRTC:62:crtc-0] flip_done timed out
[20684.245559] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]]
*ERROR* [CRTC:62:crtc-0] flip_done timed out
[20694.485139] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]]
*ERROR* [PLANE:61:plane-7] flip_done timed out
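
The "ring ... timeout" lines in the log above can be pulled out and compared
with a short script. This is only a sketch: the function name ring_timeouts is
made up for illustration, and reading "emitted seq - signaled seq" as the number
of jobs still outstanding is an interpretation of the message, not something
stated in the log itself. The sample text is copied from the two timeout
messages above, including the line wrapping added by the mail archive.

```python
import re

# Matches the amdgpu_job_timedout message after wrapped lines are re-joined.
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def ring_timeouts(log_text):
    """Return (ring, signaled, emitted, pending) for each timeout message."""
    # The archive wraps long lines, so collapse every whitespace run first.
    flat = " ".join(log_text.split())
    results = []
    for match in TIMEOUT_RE.finditer(flat):
        signaled = int(match.group("signaled"))
        emitted = int(match.group("emitted"))
        # Assumption: the difference is the count of unsignaled fences.
        results.append((match.group("ring"), signaled, emitted, emitted - signaled))
    return results


sample = """[20581.087159] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0
timeout, signaled seq=2768656, emitted seq=2768658
[20600.095254] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0
timeout, signaled seq=2768668, emitted seq=2768669"""

for entry in ring_timeouts(sample):
    print(entry)
```

Messages of the form "ring gfx timeout, but soft recovered" carry no sequence
numbers and are deliberately not matched.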



* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (33 preceding siblings ...)
  2020-06-19 19:11 ` bugzilla-daemon
@ 2020-08-10 23:49 ` bugzilla-daemon
  2020-09-01 14:00 ` bugzilla-daemon
                   ` (63 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2020-08-10 23:49 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

Randy (randyk161@gmail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |randyk161@gmail.com

--- Comment #35 from Randy (randyk161@gmail.com) ---
I've been getting "ring gfx timeout" errors for some time, mostly when the
computer has had no input for a while (while I'm away from it).  When it
freezes I can SSH into it, but when I run "shutdown -h now" it boots me out of
SSH as it should, yet the computer never seems to actually shut down.  The
screen stays frozen with whatever was on the display when it froze.  Any
help would be greatly appreciated. Here is my info:

Mobo: AsRock AB350 Pro4 UEFI: 5.80
Video card: Sapphire Nitro+ RX580 (8GB)
Distro: Manjaro
Kernel: 5.7.9-1-MANJARO
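
In the log below, the AER lines report "error status/mask=00200000/00000000"
and then name bit [21] (ACSViol). As a quick cross-check, the status word can
be decoded into bit positions with a few lines of Python. This is a minimal
sketch; the helper name set_bits is hypothetical and it only lists which bits
are set, it does not map them to error names.

```python
def set_bits(status_hex):
    """Return the positions of the bits set in a hexadecimal status string."""
    value = int(status_hex, 16)
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


# Value copied from the "error status/mask=" field in the AER log line.
status, mask = "00200000/00000000".split("/")

print(set_bits(status))  # bit positions flagged in the status word
print(set_bits(mask))    # bit positions that are masked
```

The only bit set in the status word is 21, which agrees with the "[21] ACSViol"
line the kernel prints, and the mask is empty.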

Aug 09 21:33:06.054857 kernel: pcieport 0000:00:03.1: AER: Multiple Uncorrected
(Non-Fatal) error received: 0000:00:00.0
Aug 09 21:33:06.068305 kernel: pcieport 0000:00:03.1: AER: PCIe Bus Error:
severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Receiver ID)
Aug 09 21:33:06.068636 kernel: pcieport 0000:00:03.1: AER:   device [1022:1453]
error status/mask=00200000/00000000
Aug 09 21:33:06.068863 kernel: pcieport 0000:00:03.1: AER:    [21] ACSViol     
          (First)
Aug 09 21:33:06.069137 kernel: amdgpu 0000:0a:00.0: AER: can't recover (no
error_detected callback)
Aug 09 21:33:06.069421 kernel: snd_hda_intel 0000:0a:00.1: AER: can't recover
(no error_detected callback)
Aug 09 21:33:06.069633 kernel: pcieport 0000:00:03.1: AER: device recovery
failed
Aug 09 21:33:16.258283 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring
sdma0 timeout, signaled seq=9087, emitted seq=9089
Aug 09 21:33:16.258412 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR*
Process information: process  pid 0 thread  pid 0
Aug 09 21:33:16.258446 kernel: amdgpu 0000:0a:00.0: GPU reset begin!
Aug 09 21:33:16.258741 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring
gfx timeout, but soft recovered
Aug 09 21:33:16.258773 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:16.258803 kernel: amdgpu: [powerplay] 
                                failed to send message 261 ret is 65535 
Aug 09 21:33:16.258835 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:16.258869 kernel: amdgpu: [powerplay] 
                                failed to send message 261 ret is 65535 
Aug 09 21:33:16.258896 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:16.258925 kernel: amdgpu: [powerplay] 
                                failed to send message 261 ret is 65535 
Aug 09 21:33:16.258951 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:16.258977 kernel: amdgpu: [powerplay] 
                                failed to send message 261 ret is 65535 
Aug 09 21:33:16.259009 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:16.259035 kernel: amdgpu: [powerplay] 
                                failed to send message 261 ret is 65535 
Aug 09 21:33:16.259060 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:16.259084 kernel: amdgpu: [powerplay] 
                                failed to send message 261 ret is 65535 
Aug 09 21:33:16.259108 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:16.259131 kernel: amdgpu: [powerplay] 
                                failed to send message 261 ret is 65535 
Aug 09 21:33:16.259156 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:16.259186 kernel: amdgpu: [powerplay] 
                                failed to send message 261 ret is 65535 
Aug 09 21:33:16.259213 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:16.259242 kernel: amdgpu: [powerplay] 
                                failed to send message 261 ret is 65535 
Aug 09 21:33:16.259272 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:16.259298 kernel: amdgpu: [powerplay] 
                                failed to send message 261 ret is 65535 
Aug 09 21:33:16.259324 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:16.259350 kernel: amdgpu: [powerplay] 
                                failed to send message 261 ret is 65535 
Aug 09 21:33:16.259373 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:16.259400 kernel: amdgpu: [powerplay] 
                                failed to send message 261 ret is 65535 
Aug 09 21:33:16.259426 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:16.259456 kernel: amdgpu: [powerplay] 
                                failed to send message 261 ret is 65535 
Aug 09 21:33:16.259483 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:16.259509 kernel: amdgpu: [powerplay] 
                                failed to send message 261 ret is 65535 
Aug 09 21:33:16.259540 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:16.259566 kernel: amdgpu: [powerplay] 
                                failed to send message 261 ret is 65535 
Aug 09 21:33:16.259592 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:16.259617 kernel: amdgpu: [powerplay] 
                                failed to send message 261 ret is 65535 
Aug 09 21:33:16.259642 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:16.259671 kernel: amdgpu: [powerplay] 
                                failed to send message 261 ret is 65535 
Aug 09 21:33:16.259697 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:16.259723 kernel: amdgpu: [powerplay] 
                                failed to send message 306 ret is 65535 
Aug 09 21:33:16.259754 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:16.259785 kernel: amdgpu: [powerplay] 
                                failed to send message 5e ret is 65535 
Aug 09 21:33:16.259816 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:16.259860 kernel: amdgpu: [powerplay] 
                                failed to send message 145 ret is 65535 
Aug 09 21:33:16.259913 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:16.259947 kernel: amdgpu: [powerplay] 
                                failed to send message 146 ret is 65535 
Aug 09 21:33:16.259976 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:16.260003 kernel: amdgpu: [powerplay] 
                                failed to send message 148 ret is 65535 
Aug 09 21:33:16.260034 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:16.260061 kernel: amdgpu: [powerplay] 
                                failed to send message 145 ret is 65535 
Aug 09 21:33:16.260088 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:16.260114 kernel: amdgpu: [powerplay] 
                                failed to send message 146 ret is 65535 
Aug 09 21:33:16.291929 kernel: [drm] REG_WAIT timeout 10us * 3000 tries -
dce110_stream_encoder_dp_blank line:955
Aug 09 21:33:16.292012 kernel: ------------[ cut here ]------------
Aug 09 21:33:16.292044 kernel: WARNING: CPU: 3 PID: 154 at
drivers/gpu/drm/amd/amdgpu/../display/dc/dc_helper.c:526
generic_reg_wait.cold+0x26/0x2d [amdgpu]
Aug 09 21:33:16.292070 kernel: Modules linked in: snd_seq_dummy snd_hrtimer
snd_seq fuse nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet
nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_tables
ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_mangle iptable_raw
iptable_security nfnetlink ip6table_filter ip6_tables iptable_filter squashfs
loop nls_iso8859_1 nls_cp437 vfat fat uvcvideo videobuf2_vmalloc
videobuf2_memops videobuf2_v4l2 videobuf2_common snd_usb_audio videodev
snd_usbmidi_lib snd_rawmidi snd_seq_device mc joydev mousedev input_leds
wmi_bmof amdgpu snd_hda_codec_realtek snd_hda_codec_generic wl(POE)
ledtrig_audio snd_hda_codec_hdmi snd_hda_intel gpu_sched i2c_algo_bit
edac_mce_amd snd_intel_dspcfg ttm snd_hda_codec kvm_amd drm_kms_helper r8169
snd_hda_core kvm cfg80211 snd_hwdep snd_pcm cec realtek irqbypass rc_core
snd_timer libphy syscopyarea snd rfkill sysfillrect k10temp
Aug 09 21:33:16.292112 kernel:  pcspkr sysimgblt sp5100_tco i2c_piix4
fb_sys_fops soundcore wmi evdev mac_hid gpio_amdpt pinctrl_amd acpi_cpufreq drm
uinput sg crypto_user agpgart ip_tables x_tables ext4 crc32c_generic crc16
mbcache jbd2 dm_crypt dm_mod uas usb_storage hid_logitech ff_memless
hid_generic usbhid hid crct10dif_pclmul crc32_pclmul crc32c_intel
ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper ccp xhci_pci
mpt3sas rng_core xhci_hcd raid_class scsi_transport_sas
Aug 09 21:33:16.292141 kernel: CPU: 3 PID: 154 Comm: kworker/3:1 Tainted: P    
      OE     5.7.9-1-MANJARO #1
Aug 09 21:33:16.292164 kernel: Hardware name: To Be Filled By O.E.M. To Be
Filled By O.E.M./AB350 Pro4, BIOS P5.80 06/14/2019
Aug 09 21:33:16.292188 kernel: Workqueue: events drm_sched_job_timedout
[gpu_sched]
Aug 09 21:33:16.292213 kernel: RIP: 0010:generic_reg_wait.cold+0x26/0x2d
[amdgpu]
Aug 09 21:33:16.292240 kernel: Code: a7 41 fd ff 44 8b 44 24 24 48 8b 4c 24 18
89 ee 48 c7 c7 08 14 cd c1 8b 54 24 20 e8 7a 91 d2 f9 83 7b 20 01 0f 84 c3 52
fd ff <0f> 0b e9 bc 52 fd ff 48 c7 c7 fd 4c c8 c1 e8 f3 c2 12 fa e8 4a 29
Aug 09 21:33:16.292263 kernel: RSP: 0018:ffffab9b806c3610 EFLAGS: 00010297
Aug 09 21:33:16.292284 kernel: RAX: 0000000000000052 RBX: ffff92334ad7fa40 RCX:
0000000000000000
Aug 09 21:33:16.292306 kernel: RDX: 0000000000000000 RSI: ffff92334e8d9ac8 RDI:
00000000ffffffff
Aug 09 21:33:16.292335 kernel: RBP: 000000000000000a R08: 0000000000000561 R09:
0000000000000001
Aug 09 21:33:16.292356 kernel: R10: 0000000000000000 R11: 0000000000000001 R12:
0000000000000000
Aug 09 21:33:16.292376 kernel: R13: 0000000000010000 R14: 0000000000004ea4 R15:
0000000000000bb9
Aug 09 21:33:16.292398 kernel: FS:  0000000000000000(0000)
GS:ffff92334e8c0000(0000) knlGS:0000000000000000
Aug 09 21:33:16.292421 kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
0000000080050033
Aug 09 21:33:16.292446 kernel: CR2: 00007f494fc04000 CR3: 00000003af1ce000 CR4:
00000000003406e0
Aug 09 21:33:16.292466 kernel: Call Trace:
Aug 09 21:33:16.292485 kernel:  dce110_stream_encoder_dp_blank+0xea/0x140
[amdgpu]
Aug 09 21:33:16.292507 kernel:  core_link_disable_stream+0x9c/0x200 [amdgpu]
Aug 09 21:33:16.292525 kernel:  dce110_reset_hw_ctx_wrap+0xbe/0x240 [amdgpu]
Aug 09 21:33:16.292543 kernel:  dce110_apply_ctx_to_hw+0x4f/0x570 [amdgpu]
Aug 09 21:33:16.292560 kernel:  ? hwmgr_handle_task+0x98/0xf0 [amdgpu]
Aug 09 21:33:16.292578 kernel:  ? pp_dpm_dispatch_tasks+0x45/0x60 [amdgpu]
Aug 09 21:33:16.292598 kernel:  ? dm_pp_apply_display_requirements+0x19e/0x1c0
[amdgpu]
Aug 09 21:33:16.292619 kernel:  dc_commit_state+0x323/0x970 [amdgpu]
Aug 09 21:33:16.292640 kernel:  amdgpu_dm_atomic_commit_tail+0x38c/0x2310
[amdgpu]
Aug 09 21:33:16.292662 kernel:  ? free_one_page+0x57/0xd0
Aug 09 21:33:16.292680 kernel:  ? kfree+0x219/0x250
Aug 09 21:33:16.292698 kernel:  ? bw_calcs+0xa30/0x4380 [amdgpu]
Aug 09 21:33:16.292718 kernel:  ? dc_validate_global_state+0x2f2/0x390 [amdgpu]
Aug 09 21:33:16.292736 kernel:  commit_tail+0x94/0x130 [drm_kms_helper]
Aug 09 21:33:16.292757 kernel:  drm_atomic_helper_commit+0x113/0x140
[drm_kms_helper]
Aug 09 21:33:16.292776 kernel:  drm_atomic_helper_disable_all+0x175/0x190
[drm_kms_helper]
Aug 09 21:33:16.292792 kernel:  drm_atomic_helper_suspend+0x78/0x150
[drm_kms_helper]
Aug 09 21:33:16.292810 kernel:  dm_suspend+0x1c/0x60 [amdgpu]
Aug 09 21:33:16.292869 kernel:  amdgpu_device_ip_suspend_phase1+0x83/0xe0
[amdgpu]
Aug 09 21:33:16.292889 kernel:  ? _raw_spin_lock+0x13/0x30
Aug 09 21:33:16.292908 kernel:  amdgpu_device_ip_suspend+0x1c/0x60 [amdgpu]
Aug 09 21:33:16.292926 kernel:  amdgpu_device_pre_asic_reset+0x16b/0x182
[amdgpu]
Aug 09 21:33:16.292944 kernel:  amdgpu_device_gpu_recover.cold+0x42a/0xc74
[amdgpu]
Aug 09 21:33:16.292962 kernel:  amdgpu_job_timedout+0x105/0x130 [amdgpu]
Aug 09 21:33:16.292981 kernel:  drm_sched_job_timedout+0x64/0xe0 [gpu_sched]
Aug 09 21:33:16.293001 kernel:  process_one_work+0x1da/0x3d0
Aug 09 21:33:16.293017 kernel:  worker_thread+0x4d/0x3e0
Aug 09 21:33:16.293036 kernel:  ? rescuer_thread+0x3f0/0x3f0
Aug 09 21:33:16.293057 kernel:  kthread+0x13e/0x160
Aug 09 21:33:16.293074 kernel:  ? __kthread_bind_mask+0x60/0x60
Aug 09 21:33:16.293097 kernel:  ret_from_fork+0x22/0x40
Aug 09 21:33:16.293123 kernel: ---[ end trace aa4b924a702f7188 ]---
Aug 09 21:33:26.298272 kernel: [drm:atom_op_jump [amdgpu]] *ERROR* atombios
stuck in loop for more than 10secs aborting
Aug 09 21:33:26.298425 kernel: [drm:amdgpu_atom_execute_table_locked [amdgpu]]
*ERROR* atombios stuck executing DB6E (len 824, WS 0, PS 0) @ 0xDCEE
Aug 09 21:33:26.298470 kernel: [drm:amdgpu_atom_execute_table_locked [amdgpu]]
*ERROR* atombios stuck executing DA28 (len 326, WS 0, PS 0) @ 0xDB18
Aug 09 21:33:26.298505 kernel: [drm:dce110_link_encoder_disable_output
[amdgpu]] *ERROR* dce110_link_encoder_disable_output: Failed to execute VBIOS
command table!
Aug 09 21:33:26.298535 kernel: ------------[ cut here ]------------
Aug 09 21:33:26.298571 kernel: WARNING: CPU: 3 PID: 154 at
drivers/gpu/drm/amd/amdgpu/../display/dc/dce/dce_link_encoder.c:1099
dce110_link_encoder_disable_output+0x141/0x150 [amdgpu]
Aug 09 21:33:26.298607 kernel: Modules linked in: snd_seq_dummy snd_hrtimer
snd_seq fuse nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet
nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_tables
ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_mangle iptable_raw
iptable_security nfnetlink ip6table_filter ip6_tables iptable_filter squashfs
loop nls_iso8859_1 nls_cp437 vfat fat uvcvideo videobuf2_vmalloc
videobuf2_memops videobuf2_v4l2 videobuf2_common snd_usb_audio videodev
snd_usbmidi_lib snd_rawmidi snd_seq_device mc joydev mousedev input_leds
wmi_bmof amdgpu snd_hda_codec_realtek snd_hda_codec_generic wl(POE)
ledtrig_audio snd_hda_codec_hdmi snd_hda_intel gpu_sched i2c_algo_bit
edac_mce_amd snd_intel_dspcfg ttm snd_hda_codec kvm_amd drm_kms_helper r8169
snd_hda_core kvm cfg80211 snd_hwdep snd_pcm cec realtek irqbypass rc_core
snd_timer libphy syscopyarea snd rfkill sysfillrect k10temp
Aug 09 21:33:26.298656 kernel:  pcspkr sysimgblt sp5100_tco i2c_piix4
fb_sys_fops soundcore wmi evdev mac_hid gpio_amdpt pinctrl_amd acpi_cpufreq drm
uinput sg crypto_user agpgart ip_tables x_tables ext4 crc32c_generic crc16
mbcache jbd2 dm_crypt dm_mod uas usb_storage hid_logitech ff_memless
hid_generic usbhid hid crct10dif_pclmul crc32_pclmul crc32c_intel
ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper ccp xhci_pci
mpt3sas rng_core xhci_hcd raid_class scsi_transport_sas
Aug 09 21:33:26.298691 kernel: CPU: 3 PID: 154 Comm: kworker/3:1 Tainted: P    
   W  OE     5.7.9-1-MANJARO #1
Aug 09 21:33:26.298722 kernel: Hardware name: To Be Filled By O.E.M. To Be
Filled By O.E.M./AB350 Pro4, BIOS P5.80 06/14/2019
Aug 09 21:33:26.298753 kernel: Workqueue: events drm_sched_job_timedout
[gpu_sched]
Aug 09 21:33:26.298783 kernel: RIP:
0010:dce110_link_encoder_disable_output+0x141/0x150 [amdgpu]
Aug 09 21:33:26.298811 kernel: Code: 44 24 38 65 48 2b 04 25 28 00 00 00 75 20
48 83 c4 40 5b 5d 41 5c c3 48 c7 c6 60 4a c4 c1 48 c7 c7 30 f2 cb c1 e8 4f 2c
bd fe <0f> 0b eb d0 e8 76 01 db f9 66 0f 1f 44 00 00 0f 1f 44 00 00 41 57
Aug 09 21:33:26.298840 kernel: RSP: 0018:ffffab9b806c3600 EFLAGS: 00010246
Aug 09 21:33:26.298865 kernel: RAX: 0000000000000000 RBX: 0000000000000020 RCX:
0000000000000000
Aug 09 21:33:26.298896 kernel: RDX: 0000000000000000 RSI: 0000000000000086 RDI:
00000000ffffffff
Aug 09 21:33:26.298926 kernel: RBP: ffff923349574720 R08: 0000000000000598 R09:
0000000000000001
Aug 09 21:33:26.298954 kernel: R10: 0000000000000000 R11: 0000000000000001 R12:
ffffab9b806c3604
Aug 09 21:33:26.298979 kernel: R13: 0000000000000000 R14: ffff923251500000 R15:
ffff92334c016900
Aug 09 21:33:26.299004 kernel: FS:  0000000000000000(0000)
GS:ffff92334e8c0000(0000) knlGS:0000000000000000
Aug 09 21:33:26.299032 kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
0000000080050033
Aug 09 21:33:26.299059 kernel: CR2: 00007f494fc04000 CR3: 000000038dd62000 CR4:
00000000003406e0
Aug 09 21:33:26.299087 kernel: Call Trace:
Aug 09 21:33:26.299111 kernel:  dp_disable_link_phy+0x83/0x150 [amdgpu]
Aug 09 21:33:26.299142 kernel:  disable_link+0x4f/0xa0 [amdgpu]
Aug 09 21:33:26.299170 kernel:  core_link_disable_stream+0xd6/0x200 [amdgpu]
Aug 09 21:33:26.299203 kernel:  dce110_reset_hw_ctx_wrap+0xbe/0x240 [amdgpu]
Aug 09 21:33:26.299231 kernel:  dce110_apply_ctx_to_hw+0x4f/0x570 [amdgpu]
Aug 09 21:33:26.299258 kernel:  ? hwmgr_handle_task+0x98/0xf0 [amdgpu]
Aug 09 21:33:26.299283 kernel:  ? pp_dpm_dispatch_tasks+0x45/0x60 [amdgpu]
Aug 09 21:33:26.299309 kernel:  ? dm_pp_apply_display_requirements+0x19e/0x1c0
[amdgpu]
Aug 09 21:33:26.299361 kernel:  dc_commit_state+0x323/0x970 [amdgpu]
Aug 09 21:33:26.299392 kernel:  amdgpu_dm_atomic_commit_tail+0x38c/0x2310
[amdgpu]
Aug 09 21:33:26.299421 kernel:  ? free_one_page+0x57/0xd0
Aug 09 21:33:26.299448 kernel:  ? kfree+0x219/0x250
Aug 09 21:33:26.299476 kernel:  ? bw_calcs+0xa30/0x4380 [amdgpu]
Aug 09 21:33:26.299502 kernel:  ? dc_validate_global_state+0x2f2/0x390 [amdgpu]
Aug 09 21:33:26.299532 kernel:  commit_tail+0x94/0x130 [drm_kms_helper]
Aug 09 21:33:26.299555 kernel:  drm_atomic_helper_commit+0x113/0x140
[drm_kms_helper]
Aug 09 21:33:26.299581 kernel:  drm_atomic_helper_disable_all+0x175/0x190
[drm_kms_helper]
Aug 09 21:33:26.299606 kernel:  drm_atomic_helper_suspend+0x78/0x150
[drm_kms_helper]
Aug 09 21:33:26.299633 kernel:  dm_suspend+0x1c/0x60 [amdgpu]
Aug 09 21:33:26.299660 kernel:  amdgpu_device_ip_suspend_phase1+0x83/0xe0
[amdgpu]
Aug 09 21:33:26.299685 kernel:  ? _raw_spin_lock+0x13/0x30
Aug 09 21:33:26.299710 kernel:  amdgpu_device_ip_suspend+0x1c/0x60 [amdgpu]
Aug 09 21:33:26.299736 kernel:  amdgpu_device_pre_asic_reset+0x16b/0x182
[amdgpu]
Aug 09 21:33:26.299761 kernel:  amdgpu_device_gpu_recover.cold+0x42a/0xc74
[amdgpu]
Aug 09 21:33:26.299787 kernel:  amdgpu_job_timedout+0x105/0x130 [amdgpu]
Aug 09 21:33:26.299818 kernel:  drm_sched_job_timedout+0x64/0xe0 [gpu_sched]
Aug 09 21:33:26.299844 kernel:  process_one_work+0x1da/0x3d0
Aug 09 21:33:26.299872 kernel:  worker_thread+0x4d/0x3e0
Aug 09 21:33:26.299898 kernel:  ? rescuer_thread+0x3f0/0x3f0
Aug 09 21:33:26.299925 kernel:  kthread+0x13e/0x160
Aug 09 21:33:26.299951 kernel:  ? __kthread_bind_mask+0x60/0x60
Aug 09 21:33:26.299979 kernel:  ret_from_fork+0x22/0x40
Aug 09 21:33:26.300004 kernel: ---[ end trace aa4b924a702f7189 ]---
Aug 09 21:33:36.301609 kernel: [drm:atom_op_jump [amdgpu]] *ERROR* atombios
stuck in loop for more than 10secs aborting
Aug 09 21:33:36.301729 kernel: [drm:amdgpu_atom_execute_table_locked [amdgpu]]
*ERROR* atombios stuck executing C51A (len 62, WS 0, PS 0) @ 0xC536
Aug 09 21:33:36.334815 kernel: [drm] REG_WAIT timeout 10us * 3000 tries -
dce110_stream_encoder_dp_blank line:955
Aug 09 21:33:46.338270 kernel: [drm:atom_op_jump [amdgpu]] *ERROR* atombios
stuck in loop for more than 10secs aborting
Aug 09 21:33:46.338400 kernel: [drm:amdgpu_atom_execute_table_locked [amdgpu]]
*ERROR* atombios stuck executing DB6E (len 824, WS 0, PS 0) @ 0xDCEE
Aug 09 21:33:46.338434 kernel: [drm:amdgpu_atom_execute_table_locked [amdgpu]]
*ERROR* atombios stuck executing DA28 (len 326, WS 0, PS 0) @ 0xDB18
Aug 09 21:33:46.338466 kernel: [drm:dce110_link_encoder_disable_output
[amdgpu]] *ERROR* dce110_link_encoder_disable_output: Failed to execute VBIOS
command table!
Aug 09 21:33:56.339196 plasmashell[1403]:
qrc:/plasma/plasmoids/org.kde.plasma.volume/contents/ui/ListItemBase.qml:151:
TypeError: Cannot read property 'ports' of undefined
Aug 09 21:33:56.346378 kernel: [drm:atom_op_jump [amdgpu]] *ERROR* atombios
stuck in loop for more than 10secs aborting
Aug 09 21:33:56.346481 kernel: [drm:amdgpu_atom_execute_table_locked [amdgpu]]
*ERROR* atombios stuck executing C51A (len 62, WS 0, PS 0) @ 0xC536
Aug 09 21:33:56.346519 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:56.346572 kernel: amdgpu: [powerplay] 
                                failed to send message 148 ret is 65535 
Aug 09 21:33:56.346606 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:56.346632 kernel: amdgpu: [powerplay] 
                                failed to send message 145 ret is 65535 
Aug 09 21:33:56.346659 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:56.346692 kernel: amdgpu: [powerplay] 
                                failed to send message 146 ret is 65535 
Aug 09 21:33:56.345571 plasmashell[1403]:
qrc:/plasma/plasmoids/org.kde.plasma.volume/contents/ui/main.qml:550:39: QML
DeviceListItem: Binding loop detected for property "width"
Aug 09 21:33:56.591481 kernel: [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]]
*ERROR* suspend of IP block <vce_v3_0> failed -110
Aug 09 21:33:57.054823 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:57.054914 kernel: amdgpu: [powerplay] 
                                failed to send message 133 ret is 65535 
Aug 09 21:33:57.054952 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:57.054971 kernel: amdgpu: [powerplay] 
                                failed to send message 306 ret is 65535 
Aug 09 21:33:57.054990 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:57.055010 kernel: amdgpu: [powerplay] 
                                failed to send message 5e ret is 65535 
Aug 09 21:33:57.055027 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:57.055047 kernel: amdgpu: [powerplay] 
                                failed to send message 145 ret is 65535 
Aug 09 21:33:57.055064 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:57.055080 kernel: amdgpu: [powerplay] 
                                failed to send message 146 ret is 65535 
Aug 09 21:33:57.055097 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:57.055113 kernel: amdgpu: [powerplay] 
                                failed to send message 148 ret is 65535 
Aug 09 21:33:57.055134 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:57.055151 kernel: amdgpu: [powerplay] 
                                failed to send message 145 ret is 65535 
Aug 09 21:33:57.055165 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:57.055180 kernel: amdgpu: [powerplay] 
                                failed to send message 146 ret is 65535 
Aug 09 21:33:57.055195 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:57.055208 kernel: amdgpu: [powerplay] 
                                failed to send message 16a ret is 65535 
Aug 09 21:33:57.055225 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:57.055238 kernel: amdgpu: [powerplay] 
                                failed to send message 186 ret is 65535 
Aug 09 21:33:57.055253 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:57.055267 kernel: amdgpu: [powerplay] 
                                failed to send message 54 ret is 65535 
Aug 09 21:33:57.558146 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:57.558240 kernel: amdgpu: [powerplay] 
                                failed to send message 26b ret is 65535 
Aug 09 21:33:57.558260 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:57.558279 kernel: amdgpu: [powerplay] 
                                failed to send message 13d ret is 65535 
Aug 09 21:33:57.558297 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:57.558313 kernel: amdgpu: [powerplay] 
                                failed to send message 14f ret is 65535 
Aug 09 21:33:57.558329 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:57.558342 kernel: amdgpu: [powerplay] 
                                failed to send message 151 ret is 65535 
Aug 09 21:33:57.558356 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:57.558369 kernel: amdgpu: [powerplay] 
                                failed to send message 135 ret is 65535 
Aug 09 21:33:57.558384 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:57.558398 kernel: amdgpu: [powerplay] 
                                failed to send message 190 ret is 65535 
Aug 09 21:33:57.558415 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:57.558428 kernel: amdgpu: [powerplay] 
                                failed to send message 63 ret is 65535 
Aug 09 21:33:57.558442 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:33:57.558454 kernel: amdgpu: [powerplay] 
                                failed to send message 84 ret is 65535 
Aug 09 21:33:57.558468 kernel: amdgpu: [powerplay] Failed to force to switch
arbf0!
Aug 09 21:33:57.558485 kernel: amdgpu: [powerplay] [disable_dpm_tasks] Failed
to disable DPM!
Aug 09 21:33:57.558502 kernel: [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]]
*ERROR* suspend of IP block <powerplay> failed -22
Aug 09 21:33:57.811494 kernel: amdgpu 0000:0a:00.0:
[drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed
(-110)
Aug 09 21:33:57.811816 kernel: [drm:gfx_v8_0_hw_fini [amdgpu]] *ERROR* KCQ
disable failed
Aug 09 21:33:58.314928 kernel: cp is busy, skip halt cp
Aug 09 21:33:58.564823 kernel: rlc is busy, skip halt rlc
Aug 09 21:33:58.818145 kernel: amdgpu 0000:0a:00.0: GPU BACO reset
Aug 09 21:34:59.601512 kernel: [drm:atom_op_jump [amdgpu]] *ERROR* atombios
stuck in loop for more than 10secs aborting
Aug 09 21:34:59.601664 kernel: [drm:amdgpu_atom_execute_table_locked [amdgpu]]
*ERROR* atombios stuck executing C51A (len 62, WS 0, PS 0) @ 0xC536
Aug 09 21:34:59.601700 kernel: [drm:amdgpu_atom_execute_table_locked [amdgpu]]
*ERROR* atombios stuck executing ADA0 (len 142, WS 0, PS 8) @ 0xADBB
Aug 09 21:34:59.601732 kernel: [drm] asic atom init failed!
Aug 09 21:34:59.601767 kernel: amdgpu 0000:0a:00.0: GPU reset succeeded, trying
to resume
Aug 09 21:34:59.851491 kernel: amdgpu 0000:0a:00.0: Wait for MC idle timedout !
Aug 09 21:35:00.101588 kernel: amdgpu 0000:0a:00.0: Wait for MC idle timedout !
Aug 09 21:35:00.104823 kernel: [drm] PCIE GART of 256M enabled (table at
0x000000F4007E9000).
Aug 09 21:35:00.104893 kernel: [drm] VRAM is lost due to GPU reset!
Aug 09 21:35:00.121493 kernel: amdgpu: [powerplay] Failed to send Message.
Aug 09 21:35:00.121580 kernel: amdgpu: [powerplay] SMC address must be 4 byte
aligned.
Aug 09 21:35:00.121616 kernel: amdgpu: [powerplay]
[AVFS][Polaris10_SetupGfxLvlStruct] Problems copying VRConfig value over to SMC
Aug 09 21:35:00.121645 kernel: amdgpu: [powerplay]
[AVFS][Polaris10_AVFSEventMgr] Could not Copy Graphics Level table over to SMU
Aug 09 21:35:00.121672 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:35:00.121706 kernel: amdgpu: [powerplay] 
                                failed to send message 252 ret is 65535 
Aug 09 21:35:00.121740 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:35:00.121767 kernel: amdgpu: [powerplay] 
                                failed to send message 253 ret is 65535 
Aug 09 21:35:00.121796 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:35:00.121822 kernel: amdgpu: [powerplay] 
                                failed to send message 250 ret is 65535 
Aug 09 21:35:00.121853 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:35:00.121879 kernel: amdgpu: [powerplay] 
                                failed to send message 251 ret is 65535 
Aug 09 21:35:00.121911 kernel: amdgpu: [powerplay] 
                                last message was failed ret is 65535
Aug 09 21:35:00.121940 kernel: amdgpu: [powerplay] 
                                failed to send message 254 ret is 65535 
Aug 09 21:35:00.374824 kernel: [drm] Timeout wait for RLC serdes 0,0
Aug 09 21:35:00.624828 kernel: amdgpu 0000:0a:00.0:
[drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring gfx test failed (-110)
Aug 09 21:35:00.625100 kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]]
*ERROR* resume of IP block <gfx_v8_0> failed -110
Aug 09 21:35:00.625130 kernel: [drm] Skip scheduling IBs!
Aug 09 21:35:00.625152 kernel: [drm] Skip scheduling IBs!
Aug 09 21:35:00.625166 kernel: [drm] Skip scheduling IBs!
Aug 09 21:35:00.625180 kernel: amdgpu 0000:0a:00.0: GPU reset(2) failed
Aug 09 21:35:00.625307 kernel: [drm] Skip scheduling IBs!
Aug 09 21:35:00.625320 kernel: amdgpu 0000:0a:00.0: GPU reset end with ret =
-110
Aug 09 21:35:10.818142 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring
sdma0 timeout, signaled seq=9089, emitted seq=9089
Aug 09 21:35:10.818255 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR*
Process information: process  pid 0 thread  pid 0
Aug 09 21:35:10.818280 kernel: amdgpu 0000:0a:00.0: GPU reset begin!

-- 
You are receiving this mail because:
You are watching the assignee of the bug.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (34 preceding siblings ...)
  2020-08-10 23:49 ` bugzilla-daemon
@ 2020-09-01 14:00 ` bugzilla-daemon
  2020-09-13 11:14 ` bugzilla-daemon
                   ` (62 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2020-09-01 14:00 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

Evgeniy A. Dushistov (dushistov@mail.ru) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dushistov@mail.ru

--- Comment #36 from Evgeniy A. Dushistov (dushistov@mail.ru) ---
Linux kernel 5.4.61 (amd64) with a Radeon RX 560 hit the same problem today:

[86631.543134] [drm] Fence fallback timer expired on ring gfx
[86642.133543] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
signaled seq=1349762, emitted seq=1349767
[86642.133628] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process Xorg pid 8032 thread Xorg:cs0 pid 8155
[86642.133634] amdgpu 0000:41:00.0: GPU reset begin!
[86642.134073] amdgpu: [powerplay] 
                last message was failed ret is 65535
[86642.134075] amdgpu: [powerplay] 
                failed to send message 281 ret is 65535 

I have never seen a similar problem before.
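
For anyone comparing reports in this thread, the timeout line can be pulled
apart mechanically. The sketch below is not part of the original report. It
assumes the message format shown above, and it assumes (from my reading of the
driver) that "signaled" is the last fence sequence number the hardware
completed and "emitted" is the last one submitted, so their difference is the
number of outstanding fences. The function name is made up for illustration.

```python
import re

# Matches the amdgpu_job_timedout message; \s+ tolerates the line wrap that
# mail clients and journal output insert after "timeout,".
TIMEOUT_RE = re.compile(
    r"\bring\s+(?P<ring>\S+)\s+timeout,\s+"
    r"signaled\s+seq=(?P<signaled>\d+),\s+"
    r"emitted\s+seq=(?P<emitted>\d+)"
)

def parse_ring_timeout(text):
    """Return ring name, both sequence numbers and their difference, or None."""
    m = TIMEOUT_RE.search(text)
    if m is None:
        return None
    signaled = int(m["signaled"])
    emitted = int(m["emitted"])
    return {
        "ring": m["ring"],
        "signaled": signaled,
        "emitted": emitted,
        # Assumption: emitted - signaled = fences still outstanding.
        "pending": emitted - signaled,
    }

sample = (
    "[86642.133543] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,\n"
    "signaled seq=1349762, emitted seq=1349767"
)
print(parse_ring_timeout(sample))
# {'ring': 'gfx', 'signaled': 1349762, 'emitted': 1349767, 'pending': 5}
```

Under that reading, a difference of 0 (as in the sdma0 timeout in the previous
comment) would mean the ring had nothing outstanding when the timeout fired.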


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (35 preceding siblings ...)
  2020-09-01 14:00 ` bugzilla-daemon
@ 2020-09-13 11:14 ` bugzilla-daemon
  2020-11-23 16:27 ` bugzilla-daemon
                   ` (61 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2020-09-13 11:14 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

juan.zenos@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |juan.zenos@gmail.com

--- Comment #37 from juan.zenos@gmail.com ---
I have this problem with two different brand-new RX 580s, in a brand-new ASUS
Prime-P X570 and in an old ASUS P9X79, with various Ubuntu 20.04 kernels
(5.4.x - 5.8.x - ...).

I wanted to play these games on Linux so badly; the heartbreaking solution is
to purchase a Windows license... ;_;


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (36 preceding siblings ...)
  2020-09-13 11:14 ` bugzilla-daemon
@ 2020-11-23 16:27 ` bugzilla-daemon
  2021-01-24 19:37 ` bugzilla-daemon
                   ` (60 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2020-11-23 16:27 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

majorgonzo@juno.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |majorgonzo@juno.com

--- Comment #38 from majorgonzo@juno.com ---
I have a similar problem, a cascade of errors that typically starts with one of
these:
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled
seq=1093546, emitted seq=1093548

This used to occur only when playing Dauntless, and only after my MSI Radeon
RX580 ran hot for a while.  Warframe never crashed.  Totally different methods
of running the games (Dauntless=Lutris and Epic Games Store, Warframe = Steam
and Proton).  Something then changed after one of the updates within the last
month, and now it crashes on both Warframe and Dauntless well before the card
is at a high temp.  Basically can't run more than about 5 minutes.  

I was running Ubuntu 18.04, so I figured maybe a newer kernel would fix this,
but updating to 20.10 did nothing but waste a couple of days of reloading
everything.

System:  Ryzen 5 3600 on Gigabyte x570 UD with a MSI Radeon RX580 8GB

I'm willing to work with whoever can help, and to send whatever info/logs are
necessary to get this fixed.


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (37 preceding siblings ...)
  2020-11-23 16:27 ` bugzilla-daemon
@ 2021-01-24 19:37 ` bugzilla-daemon
  2021-01-24 22:26 ` bugzilla-daemon
                   ` (59 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2021-01-24 19:37 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

--- Comment #39 from Randune (randyk161@gmail.com) ---
There doesn't appear to be any progress on this bug, does anyone have any
suggestions with regards on how to fix this issue?

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (38 preceding siblings ...)
  2021-01-24 19:37 ` bugzilla-daemon
@ 2021-01-24 22:26 ` bugzilla-daemon
  2021-01-24 22:51 ` bugzilla-daemon
                   ` (58 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2021-01-24 22:26 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

--- Comment #40 from j.cordoba (j.cordoba@gmx.net) ---
(In reply to Randune from comment #39)
> There doesn't appear to be any progress on this bug, does anyone have any
> suggestions with regards on how to fix this issue?

Try to add iommu=pt as parameter

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (39 preceding siblings ...)
  2021-01-24 22:26 ` bugzilla-daemon
@ 2021-01-24 22:51 ` bugzilla-daemon
  2021-01-24 22:56 ` bugzilla-daemon
                   ` (57 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2021-01-24 22:51 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

--- Comment #41 from Panagiotis Polychronis (panospolychronis@gmail.com) ---
(In reply to j.cordoba from comment #40)
> (In reply to Randune from comment #39)
> > There doesn't appear to be any progress on this bug, does anyone have any
> > suggestions with regards on how to fix this issue?
> 
> Try to add iommu=pt as parameter

I'm running Linux kernel 5.10.9 with these kernel parameters:
"amdgpu.ppfeaturemask=0xffffbffb amdgpu.noretry=0 amdgpu.lockup_timeout=0
amdgpu.gpu_recovery=1 amdgpu.audio=0 amdgpu.deep_color=1 amd_iommu=on iommu=pt"
My graphics card is a Radeon 5600XT and I can confirm that this issue still
exists :)
Meanwhile, I looked at
https://lists.freedesktop.org/archives/amd-gfx/2021-January/date.html and there
are some patches about ring timeouts which I think aren't merged yet for
the next Linux kernel release. Probably Alex Deucher will merge them later.
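
Since several comments list different parameter sets, a small sketch for turning a kernel command line into a dictionary may make them easier to compare. The function name is hypothetical and the parser is deliberately naive (it does not handle quoted values containing spaces).

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into a {name: value} dict.

    Flags without '=' (for example "quiet") map to None. Quoted values
    containing spaces are not handled; this is a sketch, not a full parser.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


# The parameter string reported in this comment.
reported = (
    "amdgpu.ppfeaturemask=0xffffbffb amdgpu.noretry=0 amdgpu.lockup_timeout=0 "
    "amdgpu.gpu_recovery=1 amdgpu.audio=0 amdgpu.deep_color=1 amd_iommu=on iommu=pt"
)
params = parse_cmdline(reported)
amdgpu_params = {k: v for k, v in params.items() if k.startswith("amdgpu.")}
print(amdgpu_params)
```

On a running Linux system the same function could be fed the contents of /proc/cmdline instead of the literal string.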

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (40 preceding siblings ...)
  2021-01-24 22:51 ` bugzilla-daemon
@ 2021-01-24 22:56 ` bugzilla-daemon
  2021-01-25 22:24 ` bugzilla-daemon
                   ` (56 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2021-01-24 22:56 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

--- Comment #42 from MajorGonzo (majorgonzo@juno.com) ---
I made a change a while back.  I added:
amdgpu.gpu_recovery=1 
as a grub parameter.  I have no other (of the many suggested) parameters set:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amdgpu.ppfeaturemask=0xfffd7fff
amdgpu.gpu_recovery=1"

The feature mask was used to enable reducing the top speed of my video card to
reduce heating, and I was using corectrl for that.  However, it was something I
had to set manually after each boot.  Of course, I forgot to do so, and yet it
still stopped occurring.  So in reality, I don't think I need that anymore,
either.
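
Two different ppfeaturemask values appear in this thread (0xfffd7fff here, 0xffffbffb in comment #41). As a rough aid, the sketch below only lists which bit positions are cleared in each value. It deliberately does not name the features behind those bits, since that mapping depends on the amdgpu headers of the kernel in use and I don't want to guess it. The helper name is my own.

```python
def cleared_bits(mask, width=32):
    """Return the bit positions (0 = least significant) that are 0 in mask."""
    return [bit for bit in range(width) if not (mask >> bit) & 1]


for value in (0xfffd7fff, 0xffffbffb):
    print(hex(value), cleared_bits(value))
# 0xfffd7fff [15, 17]
# 0xffffbffb [2, 14]
```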

Just checked my linux logs grepping for "ring gfx".  Before the change, I had
multiples each day up to Dec 10th.  Since then, I've had 3.
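
The per-day count described above can be sketched as follows. This assumes syslog-style lines that start with a "Mon DD" timestamp and are not wrapped. The sample lines are made up for illustration and are not taken from my logs.

```python
from collections import Counter


def timeouts_per_day(lines, needle="ring gfx timeout"):
    """Count lines containing needle, keyed by the leading 'Mon DD' fields."""
    counts = Counter()
    for line in lines:
        if needle in line:
            fields = line.split()
            if len(fields) >= 2:
                counts[" ".join(fields[:2])] += 1
    return counts


# Hypothetical sample input; real input would come from the system log.
sample_lines = [
    "Dec  9 20:01:13 host kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=10, emitted seq=12",
    "Dec  9 22:40:02 host kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=31, emitted seq=33",
    "Dec 10 18:15:46 host kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=7, emitted seq=9",
    "Dec 10 18:15:47 host kernel: [drm] GPU recovery disabled.",
]
for day, count in timeouts_per_day(sample_lines).items():
    print(day, count)
```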

Also of note - for the last two, it was when I WASN'T playing.  Well, I was
playing a game, but I was AFK.  It seemed when I returned and did something, it
went black then.

Lastly, just to confirm, I checked my change log (my own log), and I did,
indeed, make that change on 10 Dec.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (41 preceding siblings ...)
  2021-01-24 22:56 ` bugzilla-daemon
@ 2021-01-25 22:24 ` bugzilla-daemon
  2021-01-26  3:22 ` bugzilla-daemon
                   ` (55 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2021-01-25 22:24 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

--- Comment #43 from Randune (randyk161@gmail.com) ---
(In reply to Panagiotis Polychronis from comment #41)
> (In reply to j.cordoba from comment #40)
> > (In reply to Randune from comment #39)
> > > There doesn't appear to be any progress on this bug, does anyone have any
> > > suggestions with regards on how to fix this issue?
> > 
> > Try to add iommu=pt as parameter
> 
> I'm running Linux kernel 5.10.9 with these kernel parameters:
> "amdgpu.ppfeaturemask=0xffffbffb amdgpu.noretry=0 amdgpu.lockup_timeout=0
> amdgpu.gpu_recovery=1 amdgpu.audio=0 amdgpu.deep_color=1 amd_iommu=on
> iommu=pt" My graphics card is a Radeon 5600XT and I can confirm that this
> issue still exists :)
> Meanwhile, I looked at
> https://lists.freedesktop.org/archives/amd-gfx/2021-January/date.html and
> there are some patches about ring timeouts which I think aren't merged yet
> for the next Linux kernel release. Probably Alex Deucher will merge
> them later.

Thanks for the suggestion, Panagiotis Polychronis. I've tried that in the past
and it didn't seem to help.  I'm currently running Manjaro on the Linux
5.11-rc3 kernel, as supposedly there are many AMDGPU changes in it (I'm not
sure how many of them affect my RX580), but it's worth a shot.  I'm
basically shooting in the dark at this point :).

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (42 preceding siblings ...)
  2021-01-25 22:24 ` bugzilla-daemon
@ 2021-01-26  3:22 ` bugzilla-daemon
  2021-02-14 19:48 ` bugzilla-daemon
                   ` (54 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2021-01-26  3:22 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

--- Comment #44 from MajorGonzo (majorgonzo@juno.com) ---
Here's another thing I tried which also may have made a difference.  Gonna
sound weird, but worth a try.  I had a 675VA UPS that my system was plugged
into.  One time, it started shrieking (weird beepish sounds) as I was doing
heavy gaming with lots of visual effects going on.  I looked it up, and it
seems that if your UPS, or your power strip, can't deliver enough power, it can
cause the issues with these GPU cards.  I mentioned Dec 10th as the date I made
the change for my boot parameters, but it's also the date I plugged my system
directly into the wall.  Responding yesterday reminded me I have a new, more
powerful UPS and I plugged my system into that today.  I'll see if it changes
anything.

P.S.  I know the argument...power is power...but it's not.  If the surge
protector or UPS has cheap, thin wiring, then that restricts the amount of
amps that can flow through it.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (43 preceding siblings ...)
  2021-01-26  3:22 ` bugzilla-daemon
@ 2021-02-14 19:48 ` bugzilla-daemon
  2021-02-28 12:35 ` bugzilla-daemon
                   ` (53 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2021-02-14 19:48 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

slidercrank (playdohcrow@gmail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |playdohcrow@gmail.com

--- Comment #45 from slidercrank (playdohcrow@gmail.com) ---
I still have this issue when I play "Interstellar Marines"

kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled
seq=163824, emitted seq=163826
kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process
InterstellarMar pid 4378 thread Interstell:cs0 pid 4382

kernel: 5.10.14-200.fc33.

videocard: Radeon HD7770

When this happens, the image freezes, the system stops responding to keypresses
but the background music plays for a few minutes and I have to hit <reset>.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (44 preceding siblings ...)
  2021-02-14 19:48 ` bugzilla-daemon
@ 2021-02-28 12:35 ` bugzilla-daemon
  2021-03-28 13:19 ` bugzilla-daemon
                   ` (52 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2021-02-28 12:35 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

Fice (fice@inbox.ru) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |fice@inbox.ru

--- Comment #46 from Fice (fice@inbox.ru) ---
(In reply to MajorGonzo from comment #44)
> Here's another thing I tried which also may have made a difference.  Gonna
> sound weird, but worth a try.  I had a 675VA UPS that my system was plugged
> into.  One time, it started shrieking (weird beepish sounds) as I was doing
> heavy gaming with lots of visual effects going on.  I looked it up, and it
> seems that if your UPS, or your power strip, can't deliver enough power, it
> can cause the issues with these GPU cards.  I mentioned Dec 10th as the date
> I made the change for my boot parameters, but it's also the date I plugged
> my system directly into the wall.  Responding yesterday reminded me I have a
> new, more powerful UPS and I plugged my system into that today.  I'll see if
> it changes anything.
> 
> P.S.  I know the argument...power is power...but it's not.  If the surge
> protector, or UPS has cheap, thin wiring, then that restricts the amount of
> amps that can flow through them.

I had an old PSU, which was repaired once, so I replaced it. That did not
resolve the issue. The PSU is connected directly to the wall socket.

Kernel 5.10.18-200.fc33
AMD Ryzen 3 2200G with Radeon Vega Graphics

The bug is most often triggered when using Firefox.

[42174.187004] amdgpu 0000:06:00.0: amdgpu: [gfxhub0] retry page fault
(src_id:0 ring:0 vmid:1 pasid:32772, for process firefox pid 21156 thread
firefox:cs0 pid 21244)
[42174.187007] amdgpu 0000:06:00.0: amdgpu:   in page starting at address
0x0000000000200000 from client 27
[42174.187008] amdgpu 0000:06:00.0: amdgpu:
VM_L2_PROTECTION_FAULT_STATUS:0x00100431
[42174.187009] amdgpu 0000:06:00.0: amdgpu:      Faulty UTCL2 client ID: IA
(0x2)
[42174.187010] amdgpu 0000:06:00.0: amdgpu:      MORE_FAULTS: 0x1
[42174.187010] amdgpu 0000:06:00.0: amdgpu:      WALKER_ERROR: 0x0
[42174.187011] amdgpu 0000:06:00.0: amdgpu:      PERMISSION_FAULTS: 0x3
[42174.187012] amdgpu 0000:06:00.0: amdgpu:      MAPPING_ERROR: 0x0
[42174.187012] amdgpu 0000:06:00.0: amdgpu:      RW: 0x0
... (the above messages are repeated many times)
[42184.187655] amdgpu 0000:06:00.0: amdgpu: [gfxhub0] retry page fault
(src_id:0 ring:0 vmid:1 pasid:32772, for process firefox pid 21156 thread
firefox:cs0 pid 21244)
[42184.187656] amdgpu 0000:06:00.0: amdgpu:   in page starting at address
0x0000000000200000 from client 27
[42184.187656] amdgpu 0000:06:00.0: amdgpu:
VM_L2_PROTECTION_FAULT_STATUS:0x00100431
[42184.187657] amdgpu 0000:06:00.0: amdgpu:      Faulty UTCL2 client ID: IA
(0x2)
[42184.187657] amdgpu 0000:06:00.0: amdgpu:      MORE_FAULTS: 0x1
[42184.187658] amdgpu 0000:06:00.0: amdgpu:      WALKER_ERROR: 0x0
[42184.187658] amdgpu 0000:06:00.0: amdgpu:      PERMISSION_FAULTS: 0x3
[42184.187659] amdgpu 0000:06:00.0: amdgpu:      MAPPING_ERROR: 0x0
[42184.187660] amdgpu 0000:06:00.0: amdgpu:      RW: 0x0
[42184.328388] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
signaled seq=109568, emitted seq=109570
[42184.328538] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process firefox pid 21156 thread firefox:cs0 pid 21244
[42184.328542] amdgpu 0000:06:00.0: amdgpu: GPU reset begin!
[42184.330868] amdgpu 0000:06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT
domain=0x0000 address=0x10cd079a0 flags=0x0070]
[42184.330878] amdgpu 0000:06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT
domain=0x0000 address=0x10cd079c0 flags=0x0070]
[42184.330894] amdgpu 0000:06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT
domain=0x0000 address=0x10cd40000 flags=0x0070]
[42184.330901] amdgpu 0000:06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT
domain=0x0000 address=0x10cd079e0 flags=0x0070]
[42184.330909] amdgpu 0000:06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT
domain=0x0000 address=0x10cd40000 flags=0x0070]
[42184.330917] amdgpu 0000:06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT
domain=0x0000 address=0x10cd07a00 flags=0x0070]
[42184.330924] amdgpu 0000:06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT
domain=0x0000 address=0x10cd40000 flags=0x0070]
[42184.330942] amdgpu 0000:06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT
domain=0x0000 address=0x10cd07a20 flags=0x0070]
[42184.330950] amdgpu 0000:06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT
domain=0x0000 address=0x10cd40000 flags=0x0070]
[42184.330966] amdgpu 0000:06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT
domain=0x0000 address=0x10cd07a40 flags=0x0070]
[42184.421882] [drm] free PSP TMR buffer
[42184.451954] amdgpu 0000:06:00.0: amdgpu: GPU reset succeeded, trying to
resume
[42184.452090] [drm] PCIE GART of 1024M enabled (table at 0x000000F400900000).
[42184.452275] [drm] PSP is resuming...
[42184.472305] [drm] reserve 0x400000 from 0xf47fc00000 for PSP TMR
[42184.537825] amdgpu 0000:06:00.0: amdgpu: RAS: optional ras ta ucode is not
available
[42184.546811] amdgpu 0000:06:00.0: amdgpu: RAP: optional rap ta ucode is not
available
[42184.759724] [drm] kiq ring mec 2 pipe 1 q 0
[42184.958535] amdgpu 0000:06:00.0: [drm:amdgpu_ring_test_helper [amdgpu]]
*ERROR* ring gfx test failed (-110)
[42184.958584] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of
IP block <gfx_v9_0> failed -110
[42184.958597] amdgpu 0000:06:00.0: amdgpu: GPU reset(2) failed
[42184.958667] amdgpu 0000:06:00.0: amdgpu: GPU reset end with ret = -110
[42195.061025] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but
soft recovered
[42205.292585] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but
soft recovered
[42243.148200] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout,
signaled seq=21546, emitted seq=21548
[42243.148346] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process  pid 0 thread  pid 0
[42243.148351] amdgpu 0000:06:00.0: amdgpu: GPU reset begin!
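
To see whether the IO_PAGE_FAULT events above cluster on particular addresses, a quick tally can help. This is only a sketch: the regular expression and function name are my own, and the sample text is rebuilt from the ten addresses in the log above rather than read from a file.

```python
import re
from collections import Counter

# \s+ tolerates the mail line wrap between "IO_PAGE_FAULT" and "domain=".
ADDR_RE = re.compile(r"IO_PAGE_FAULT\s+domain=\S+\s+address=(0x[0-9a-fA-F]+)")


def fault_addresses(log_text):
    """Return a Counter of the addresses found in AMD-Vi IO_PAGE_FAULT events."""
    return Counter(ADDR_RE.findall(log_text))


# The ten fault addresses from the log above, in the order they were logged.
addresses = [
    "0x10cd079a0", "0x10cd079c0", "0x10cd40000", "0x10cd079e0", "0x10cd40000",
    "0x10cd07a00", "0x10cd40000", "0x10cd07a20", "0x10cd40000", "0x10cd07a40",
]
log_text = "\n".join(
    "amdgpu 0000:06:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT\n"
    f"domain=0x0000 address={addr} flags=0x0070]"
    for addr in addresses
)
counts = fault_addresses(log_text)
print(len(counts), counts.most_common(1))  # 7 [('0x10cd40000', 4)]
```

In this excerpt one address repeats four times while the others advance in steps of 0x20. I am not drawing any conclusion about the cause from that.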

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (45 preceding siblings ...)
  2021-02-28 12:35 ` bugzilla-daemon
@ 2021-03-28 13:19 ` bugzilla-daemon
  2021-08-22 20:01 ` bugzilla-daemon
                   ` (51 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2021-03-28 13:19 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

Csaba Tímár (csaba.timar01@gmail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |csaba.timar01@gmail.com

--- Comment #47 from Csaba Tímár (csaba.timar01@gmail.com) ---
I have something very similar with my Vega56. I can reproduce it with Win10
too. 
I think it's an AMD hardware issue. 

march 28 15:07:35 PC-home kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]]
*ERROR* Waiting for fences timed out!
march 28 15:07:35 PC-home kernel: qcm fence wait loop timeout expired
march 28 15:07:35 PC-home kernel: The cp might be in an unrecoverable state due
to an unsuccessful queues preemption
march 28 15:07:35 PC-home kernel: amdgpu: Failed to evict process queues
march 28 15:07:35 PC-home kernel: amdgpu 0000:0a:00.0: amdgpu: GPU reset begin!
march 28 15:07:35 PC-home kernel: amdgpu: Failed to quiesce KFD
march 28 15:07:35 PC-home kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR*
ring gfx timeout, signaled seq=567492, emitted seq=567494
march 28 15:07:35 PC-home kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR*
Process information: process vkcube pid 7677 thread vkcube pid 7677
march 28 15:07:35 PC-home kernel: amdgpu 0000:0a:00.0: amdgpu: GPU reset begin!
march 28 15:07:35 PC-home kernel: amdgpu 0000:0a:00.0: amdgpu: Bailing on TDR
for s_job:869c2, as another already in progress
march 28 15:07:36 PC-home kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR*
ring page1 timeout, signaled seq=20352, emitted seq=20353
march 28 15:07:36 PC-home kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR*
Process information: process  pid 0 thread  pid 0
march 28 15:07:36 PC-home kernel: amdgpu 0000:0a:00.0: amdgpu: GPU reset begin!
march 28 15:07:36 PC-home kernel: amdgpu 0000:0a:00.0: amdgpu: Bailing on TDR
for s_job:4f80, as another already in progress
march 28 15:07:39 PC-home kernel: amdgpu 0000:0a:00.0: amdgpu: failed to
suspend display audio
march 28 15:07:39 PC-home kernel: BUG: unable to handle page fault for address:
ffffa9c54bb4f910
march 28 15:07:39 PC-home kernel: #PF: supervisor write access in kernel mode
march 28 15:07:39 PC-home kernel: #PF: error_code(0x0002) - not-present page
march 28 15:07:39 PC-home kernel: PGD 100000067 P4D 100000067 PUD 1001b9067 PMD
1cdabb067 PTE 0
march 28 15:07:39 PC-home kernel: Oops: 0002 [#1] PREEMPT SMP NOPTI
march 28 15:07:39 PC-home kernel: CPU: 9 PID: 8586 Comm: kworker/9:0 Tainted: G
          OE     5.11.6-1-MANJARO #1


march 28 15:07:39 PC-home kernel: Hardware name: System manufacturer System
Product Name/PRIME A320M-K, BIOS 5603 10/14/2020
march 28 15:07:39 PC-home kernel: Workqueue: events kfd_process_hw_exception
[amdgpu]
march 28 15:07:39 PC-home kernel: RIP: 0010:amdgpu_device_lock_adev+0x2b/0x83
[amdgpu]
march 28 15:07:39 PC-home kernel: Code: 1f 44 00 00 31 c0 ba 01 00 00 00 f0 0f
b1 97 f4 77 01 00 45 31 c0 85 c0 75 64 53 48 89 fb 48 8d bf 00 78 01 00 e8 e7
16 27 c9 <f0> ff 83 40 >
march 28 15:07:39 PC-home kernel: RSP: 0018:ffffa9c54c73be00 EFLAGS: 00010246
march 28 15:07:39 PC-home kernel: RAX: ffff951f0c155dc0 RBX: ffffa9c54bb495d0
RCX: 0000000000000001
march 28 15:07:39 PC-home kernel: RDX: 0000000000000001 RSI: 0000000000000000
RDI: ffffa9c54bb60dd0
march 28 15:07:39 PC-home kernel: RBP: 0000000000000000 R08: 0000000000000000
R09: 0000000000000000
march 28 15:07:39 PC-home kernel: R10: 0000000000000003 R11: 0000000000000000
R12: ffffa9c54bb495d0
march 28 15:07:39 PC-home kernel: R13: ffff951e19160000 R14: ffff951e19170e30
R15: 00000000000000e0
march 28 15:07:39 PC-home kernel: FS:  0000000000000000(0000)
GS:ffff95210ea40000(0000) knlGS:0000000000000000
march 28 15:07:39 PC-home kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
0000000080050033
march 28 15:07:39 PC-home kernel: CR2: ffffa9c54bb4f910 CR3: 0000000385410000
CR4: 00000000003506e0
march 28 15:07:39 PC-home kernel: Call Trace:
march 28 15:07:39 PC-home kernel:  amdgpu_device_gpu_recover.cold+0x180/0x95d
[amdgpu]
march 28 15:07:39 PC-home kernel:  ?
amdgpu_device_doorbell_init.part.0+0x71/0xc0 [amdgpu]
march 28 15:07:39 PC-home kernel:  process_one_work+0x214/0x3e0
march 28 15:07:39 PC-home kernel:  worker_thread+0x4d/0x3d0
march 28 15:07:39 PC-home kernel:  ? rescuer_thread+0x3c0/0x3c0
march 28 15:07:39 PC-home kernel:  kthread+0x142/0x160
march 28 15:07:39 PC-home kernel:  ? __kthread_bind_mask+0x60/0x60
march 28 15:07:39 PC-home kernel:  ret_from_fork+0x22/0x30
march 28 15:07:39 PC-home kernel: Modules linked in: rfcomm cmac algif_hash
algif_skcipher af_alg bnep btusb btrtl btbcm btintel bluetooth ecdh_generic ecc
uas usb_storage mousedev>
march 28 15:07:39 PC-home kernel:  gpio_amdpt acpi_cpufreq drm uinput sg fuse
crypto_user agpgart ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2
crc32c_intel xhci_pci
march 28 15:07:39 PC-home kernel: CR2: ffffa9c54bb4f910
march 28 15:07:39 PC-home kernel: ---[ end trace 2eaf88bedaabd891 ]---
march 28 15:07:39 PC-home kernel: RIP: 0010:amdgpu_device_lock_adev+0x2b/0x83
[amdgpu]
march 28 15:07:39 PC-home kernel: Code: 1f 44 00 00 31 c0 ba 01 00 00 00 f0 0f
b1 97 f4 77 01 00 45 31 c0 85 c0 75 64 53 48 89 fb 48 8d bf 00 78 01 00 e8 e7
16 27 c9 <f0> ff 83 40 >
march 28 15:07:39 PC-home kernel: RSP: 0018:ffffa9c54c73be00 EFLAGS: 00010246
march 28 15:07:39 PC-home kernel: RAX: ffff951f0c155dc0 RBX: ffffa9c54bb495d0
RCX: 0000000000000001
march 28 15:07:39 PC-home kernel: RDX: 0000000000000001 RSI: 0000000000000000
RDI: ffffa9c54bb60dd0
march 28 15:07:39 PC-home kernel: RBP: 0000000000000000 R08: 0000000000000000
R09: 0000000000000000
march 28 15:07:39 PC-home kernel: R10: 0000000000000003 R11: 0000000000000000
R12: ffffa9c54bb495d0
march 28 15:07:39 PC-home kernel: R13: ffff951e19160000 R14: ffff951e19170e30
R15: 00000000000000e0
march 28 15:07:39 PC-home kernel: FS:  0000000000000000(0000)
GS:ffff95210ea40000(0000) knlGS:0000000000000000
march 28 15:07:39 PC-home kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
0000000080050033
march 28 15:07:39 PC-home kernel: CR2: ffffa9c54bb4f910 CR3: 00000002fa6de000
CR4: 00000000003506e0

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (46 preceding siblings ...)
  2021-03-28 13:19 ` bugzilla-daemon
@ 2021-08-22 20:01 ` bugzilla-daemon
  2021-11-17  7:14 ` bugzilla-daemon
                   ` (50 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2021-08-22 20:01 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

--- Comment #48 from i-am-not-a-robot@riseup.net ---
This seems to be a firmware(-related) problem. After downgrading to linux
firmware 2020-09-18, I've been running for 6 days without a crash on the same
workloads. (I was getting multiple crashes per day before.)

My GPU is Vega8 Mobile (ThinkPad A485). Currently running 5.13.11.

An extensive discussion of different firmware versions in the context of a
similar issue on Arch Forums:
https://bbs.archlinux.org/viewtopic.php?id=266358&p=5

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (47 preceding siblings ...)
  2021-08-22 20:01 ` bugzilla-daemon
@ 2021-11-17  7:14 ` bugzilla-daemon
  2021-11-26  2:09 ` bugzilla-daemon
                   ` (49 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2021-11-17  7:14 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

Fusion Future (qydwhotmail@gmail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |qydwhotmail@gmail.com

--- Comment #49 from Fusion Future (qydwhotmail@gmail.com) ---
Ryzen 4700U same error. openSUSE Tumbleweed

X11 

Kernel version is 5.14.14 

Mesa version is 21.2.5-293.2 

Firmware version is 20211027-1.1

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (48 preceding siblings ...)
  2021-11-17  7:14 ` bugzilla-daemon
@ 2021-11-26  2:09 ` bugzilla-daemon
  2021-12-12 21:59 ` bugzilla-daemon
                   ` (48 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2021-11-26  2:09 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

T. Noah (aussir@tutanota.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |aussir@tutanota.com

--- Comment #50 from T. Noah (aussir@tutanota.com) ---
(In reply to i-am-not-a-robot from comment #48)
> This seems to be a firmware(-related) problem. After downgrading to linux
> firmware  2020-09-18, I'm running 6 days without a crash on the same work
> loads. (I was getting multiple crashes per day before).

Did you test any other versions? Was 09-18 the last working release?

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (49 preceding siblings ...)
  2021-11-26  2:09 ` bugzilla-daemon
@ 2021-12-12 21:59 ` bugzilla-daemon
  2021-12-22 20:33 ` bugzilla-daemon
                   ` (47 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2021-12-12 21:59 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

--- Comment #51 from T. Noah (aussir@tutanota.com) ---
A possible workaround is to pass
amdgpu.dpm=0
as a kernel boot parameter.

However, this kills fps in many games and probably in anything that depends on
the GPU for rendering.
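
To check which value a module parameter actually ended up with after boot, one option is to read it back from sysfs. The kernel exposes readable module parameters under /sys/module/<module>/parameters/. Whether `dpm` is among the readable ones on a given kernel is an assumption on my part, so the sketch below just reports what it finds. The function name is hypothetical.

```python
from pathlib import Path


def read_module_params(param_dir):
    """Return {name: value} for every readable file in param_dir."""
    params = {}
    directory = Path(param_dir)
    if not directory.is_dir():
        return params
    for entry in sorted(directory.iterdir()):
        try:
            params[entry.name] = entry.read_text().strip()
        except OSError:
            # Some parameters are not readable by the current user; skip them.
            continue
    return params


if __name__ == "__main__":
    current = read_module_params("/sys/module/amdgpu/parameters")
    print("dpm =", current.get("dpm", "<not exposed>"))
```

On a machine without the amdgpu module loaded the directory does not exist and the function simply returns an empty dict.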

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (50 preceding siblings ...)
  2021-12-12 21:59 ` bugzilla-daemon
@ 2021-12-22 20:33 ` bugzilla-daemon
  2022-01-01  4:29 ` bugzilla-daemon
                   ` (46 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2021-12-22 20:33 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

roman (coolx67@gmx.at) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |coolx67@gmx.at

--- Comment #52 from roman (coolx67@gmx.at) ---
I can confirm that 
amdgpu.dpm=0 
removes the issue on an AMD Radeon PRO FIJI (Dual Fury).
Kernel: 5.15.10 | FW: 20211027.1d00989-1 | Mesa: 21.3.2-1

It works perfectly fine in GNOME as long as no application accesses the
2nd GPU. 

When I open Radeon-profile, there is no issue as long as card0 is selected, but
as soon as I select card1 I instantly get:
Dec 22 21:15:46 Workstation kernel: amdgpu: 
                                     failed to send message 171 ret is 0 
Dec 22 21:15:49 Workstation kernel: amdgpu: 
                                     last message was failed ret is 0

The Radeon-profile application freezes, but the desktop stays responsive. 

When running CS:GO with MangoHud, the result depends on the configured pci_dev:

pci_dev = 0000:3d:00.0 # primary card, works fine
or
pci_dev = 0000:3e:00.0 # secondary card: the errors above occur, CS:GO
loads very slowly, and it gets stuck once the menu is visible 

When CSM is disabled in the BIOS, I have 2 GPUs:

Dec 22 20:45:50 Workstation kernel: [drm] amdgpu kernel modesetting enabled.
Dec 22 20:45:50 Workstation kernel: amdgpu: CRAT table not found
Dec 22 20:45:50 Workstation kernel: amdgpu: Virtual CRAT table created for CPU
Dec 22 20:45:50 Workstation kernel: amdgpu: Topology: Add CPU node
Dec 22 20:45:50 Workstation kernel: amdgpu 0000:3d:00.0: vgaarb: deactivate vga
console
Dec 22 20:45:50 Workstation kernel: amdgpu 0000:3d:00.0: enabling device (0106
-> 0107)
Dec 22 20:45:50 Workstation kernel: amdgpu 0000:3d:00.0: amdgpu: Trusted Memory
Zone (TMZ) feature not supported
Dec 22 20:45:50 Workstation kernel: amdgpu 0000:3d:00.0: amdgpu: Fetched VBIOS
from ROM BAR
Dec 22 20:45:50 Workstation kernel: amdgpu: ATOM BIOS: 113-C88801MS-102
Dec 22 20:45:50 Workstation kernel: amdgpu 0000:3d:00.0: amdgpu: VRAM: 4096M
0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
Dec 22 20:45:50 Workstation kernel: amdgpu 0000:3d:00.0: amdgpu: GART: 1024M
0x000000FF00000000 - 0x000000FF3FFFFFFF
Dec 22 20:45:50 Workstation kernel: [drm] amdgpu: 4096M of VRAM memory ready
Dec 22 20:45:50 Workstation kernel: [drm] amdgpu: 4096M of GTT memory ready.
Dec 22 20:45:50 Workstation kernel: amdgpu: hwmgr_sw_init smu backed is
fiji_smu
Dec 22 20:45:50 Workstation kernel: snd_hda_intel 0000:3d:00.1: bound
0000:3d:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
Dec 22 20:45:50 Workstation kernel: [drm:retrieve_link_cap [amdgpu]] *ERROR*
retrieve_link_cap: Read receiver caps dpcd data failed.
Dec 22 20:45:50 Workstation kernel: kfd kfd: amdgpu: Allocated 3969056 bytes on
gart
Dec 22 20:45:50 Workstation kernel: amdgpu: Virtual CRAT table created for GPU
Dec 22 20:45:50 Workstation kernel: amdgpu: Topology: Add dGPU node
[0x7300:0x1002]
Dec 22 20:45:50 Workstation kernel: kfd kfd: amdgpu: added device 1002:7300
Dec 22 20:45:50 Workstation kernel: amdgpu 0000:3d:00.0: amdgpu: SE 4, SH per
SE 1, CU per SH 16, active_cu_number 64
Dec 22 20:45:50 Workstation kernel: fbcon: amdgpu (fb0) is primary device
Dec 22 20:45:51 Workstation kernel: amdgpu 0000:3d:00.0: [drm] fb0: amdgpu
frame buffer device
Dec 22 20:45:51 Workstation kernel: amdgpu 0000:3d:00.0: amdgpu: Using BACO for
runtime pm
Dec 22 20:45:51 Workstation kernel: [drm] Initialized amdgpu 3.42.0 20150101
for 0000:3d:00.0 on minor 0
Dec 22 20:45:51 Workstation kernel: amdgpu 0000:3e:00.0: enabling device (0106
-> 0107)
Dec 22 20:45:51 Workstation kernel: amdgpu 0000:3e:00.0: amdgpu: Trusted Memory
Zone (TMZ) feature not supported
Dec 22 20:45:51 Workstation kernel: amdgpu 0000:3e:00.0: amdgpu: Fetched VBIOS
from ROM BAR
Dec 22 20:45:51 Workstation kernel: amdgpu: ATOM BIOS: 113-C88801SL-102
Dec 22 20:45:51 Workstation kernel: amdgpu 0000:3e:00.0: amdgpu: VRAM: 4096M
0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
Dec 22 20:45:51 Workstation kernel: amdgpu 0000:3e:00.0: amdgpu: GART: 1024M
0x000000FF00000000 - 0x000000FF3FFFFFFF
Dec 22 20:45:51 Workstation kernel: [drm] amdgpu: 4096M of VRAM memory ready
Dec 22 20:45:51 Workstation kernel: [drm] amdgpu: 4096M of GTT memory ready.
Dec 22 20:45:51 Workstation kernel: amdgpu: hwmgr_sw_init smu backed is
fiji_smu
Dec 22 20:45:51 Workstation kernel: kfd kfd: amdgpu: Allocated 3969056 bytes on
gart
Dec 22 20:45:51 Workstation kernel: amdgpu: Virtual CRAT table created for GPU
Dec 22 20:45:51 Workstation kernel: amdgpu: Topology: Add dGPU node
[0x7300:0x1002]
Dec 22 20:45:51 Workstation kernel: kfd kfd: amdgpu: added device 1002:7300
Dec 22 20:45:51 Workstation kernel: amdgpu 0000:3e:00.0: amdgpu: SE 4, SH per
SE 1, CU per SH 16, active_cu_number 64
Dec 22 20:45:51 Workstation kernel: amdgpu 0000:3e:00.0: amdgpu: Using BACO for
runtime pm
Dec 22 20:45:51 Workstation kernel: [drm] Initialized amdgpu 3.42.0 20150101
for 0000:3e:00.0 on minor 1
Dec 22 20:45:53 Workstation gnome-shell[1988]: Added device '/dev/dri/card0'
(amdgpu) using atomic mode setting.
Dec 22 20:45:53 Workstation gnome-shell[1988]: Added device '/dev/dri/card1'
(amdgpu) using atomic mode setting.
Dec 22 20:45:55 Workstation gnome-shell[1988]: Disabling DMA buffer screen
sharing for driver 'amdgpu'.
Dec 22 20:46:03 Workstation gnome-shell[2527]: Added device '/dev/dri/card0'
(amdgpu) using atomic mode setting.
Dec 22 20:46:04 Workstation gnome-shell[2527]: Added device '/dev/dri/card1'
(amdgpu) using atomic mode setting.
Dec 22 20:46:05 Workstation gnome-shell[2527]: Disabling DMA buffer screen
sharing for driver 'amdgpu'.


With CSM enabled, only the primary GPU is available:
Dec 17 18:17:51 Workstation kernel: [drm] amdgpu kernel modesetting enabled.
Dec 17 18:17:51 Workstation kernel: amdgpu: CRAT table not found
Dec 17 18:17:51 Workstation kernel: amdgpu: Virtual CRAT table created for CPU
Dec 17 18:17:51 Workstation kernel: amdgpu: Topology: Add CPU node
Dec 17 18:17:51 Workstation kernel: fb0: switching to amdgpu from EFI VGA
Dec 17 18:17:51 Workstation kernel: amdgpu 0000:3d:00.0: vgaarb: deactivate vga
console
Dec 17 18:17:51 Workstation kernel: amdgpu 0000:3d:00.0: amdgpu: Trusted Memory
Zone (TMZ) feature not supported
Dec 17 18:17:51 Workstation kernel: amdgpu 0000:3d:00.0: No more image in the
PCI ROM
Dec 17 18:17:51 Workstation kernel: amdgpu 0000:3d:00.0: amdgpu: Fetched VBIOS
from ROM BAR
Dec 17 18:17:51 Workstation kernel: amdgpu: ATOM BIOS: 113-C88801MS-102
Dec 17 18:17:51 Workstation kernel: amdgpu 0000:3d:00.0: BAR 2: releasing [mem
0xb0000000-0xb01fffff 64bit pref]
Dec 17 18:17:51 Workstation kernel: amdgpu 0000:3d:00.0: BAR 0: releasing [mem
0xa0000000-0xafffffff 64bit pref]
Dec 17 18:17:51 Workstation kernel: amdgpu 0000:3d:00.0: BAR 0: assigned [mem
0x388000000000-0x3880ffffffff 64bit pref]
Dec 17 18:17:51 Workstation kernel: amdgpu 0000:3d:00.0: BAR 2: assigned [mem
0x388100000000-0x3881001fffff 64bit pref]
Dec 17 18:17:51 Workstation kernel: amdgpu 0000:3d:00.0: amdgpu: VRAM: 4096M
0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
Dec 17 18:17:51 Workstation kernel: amdgpu 0000:3d:00.0: amdgpu: GART: 1024M
0x000000FF00000000 - 0x000000FF3FFFFFFF
Dec 17 18:17:51 Workstation kernel: [drm] amdgpu: 4096M of VRAM memory ready
Dec 17 18:17:51 Workstation kernel: [drm] amdgpu: 4096M of GTT memory ready.
Dec 17 18:17:51 Workstation kernel: amdgpu: hwmgr_sw_init smu backed is
fiji_smu
Dec 17 18:17:51 Workstation kernel: snd_hda_intel 0000:3d:00.1: bound
0000:3d:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
Dec 17 18:17:51 Workstation kernel: [drm:retrieve_link_cap [amdgpu]] *ERROR*
retrieve_link_cap: Read receiver caps dpcd data failed.
Dec 17 18:17:51 Workstation kernel: kfd kfd: amdgpu: Allocated 3969056 bytes on
gart
Dec 17 18:17:51 Workstation kernel: amdgpu: Virtual CRAT table created for GPU
Dec 17 18:17:51 Workstation kernel: amdgpu: Topology: Add dGPU node
[0x7300:0x1002]
Dec 17 18:17:51 Workstation kernel: kfd kfd: amdgpu: added device 1002:7300
Dec 17 18:17:51 Workstation kernel: amdgpu 0000:3d:00.0: amdgpu: SE 4, SH per
SE 1, CU per SH 16, active_cu_number 64
Dec 17 18:17:51 Workstation kernel: fbcon: amdgpu (fb0) is primary device
Dec 17 18:17:51 Workstation kernel: amdgpu 0000:3d:00.0: [drm] fb0: amdgpu
frame buffer device
Dec 17 18:17:51 Workstation kernel: amdgpu 0000:3d:00.0: amdgpu: Using BACO for
runtime pm
Dec 17 18:17:51 Workstation kernel: [drm] Initialized amdgpu 3.42.0 20150101
for 0000:3d:00.0 on minor 0
Dec 17 18:17:51 Workstation kernel: amdgpu 0000:3e:00.0: enabling device (0100
-> 0103)
Dec 17 18:17:51 Workstation kernel: amdgpu 0000:3e:00.0: amdgpu: Trusted Memory
Zone (TMZ) feature not supported
Dec 17 18:17:52 Workstation kernel: amdgpu 0000:3e:00.0: amdgpu: Fetched VBIOS
from ROM BAR
Dec 17 18:17:52 Workstation kernel: amdgpu: ATOM BIOS: 113-C88801SL-102
Dec 17 18:17:52 Workstation kernel: amdgpu 0000:3e:00.0: BAR 2: releasing [???
0x00000000 flags 0x0]
Dec 17 18:17:52 Workstation kernel: amdgpu 0000:3e:00.0: BAR 0: releasing [???
0x00000000 flags 0x0]
Dec 17 18:17:52 Workstation kernel: [drm:amdgpu_device_resize_fb_bar [amdgpu]]
*ERROR* Problem resizing BAR0 (-16).
Dec 17 18:17:52 Workstation kernel: [drm:amdgpu_device_init.cold [amdgpu]]
*ERROR* sw_init of IP block <gmc_v8_0> failed -19
Dec 17 18:17:52 Workstation kernel: amdgpu 0000:3e:00.0: amdgpu:
amdgpu_device_ip_init failed
Dec 17 18:17:52 Workstation kernel: amdgpu 0000:3e:00.0: amdgpu: Fatal error
during GPU init
Dec 17 18:17:52 Workstation kernel: amdgpu 0000:3e:00.0: amdgpu: amdgpu:
finishing device.
Dec 17 18:18:00 Workstation gnome-shell[1921]: Added device '/dev/dri/card0'
(amdgpu) using atomic mode setting.
Dec 17 18:18:02 Workstation gnome-shell[1921]: Disabling DMA buffer screen
sharing for driver 'amdgpu'.
Dec 17 18:18:13 Workstation gnome-shell[2410]: Added device '/dev/dri/card0'
(amdgpu) using atomic mode setting.
Dec 17 18:18:14 Workstation gnome-shell[2410]: Disabling DMA buffer screen
sharing for driver 'amdgpu'.

Hopefully @Alex can look into this or forward it, since this is a P1 blocking
issue that has been open for 3 years.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (51 preceding siblings ...)
  2021-12-22 20:33 ` bugzilla-daemon
@ 2022-01-01  4:29 ` bugzilla-daemon
  2022-01-09 18:06 ` bugzilla-daemon
                   ` (45 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2022-01-01  4:29 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

Spencer (smp@nandre.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |smp@nandre.com

--- Comment #53 from Spencer (smp@nandre.com) ---
(In reply to roman from comment #52)
> I can confirm that 
> amdgpu.dpm=0 
> removes the issue 
> on an AMD Radeon PRO FIJI (Dual Fury) kernel: 5.15.10|FW:
> 20211027.1d00989-1|mesa: 21.3.2-1
> 
> Works perfectly fine in Gnome as long as there is no application accessing
> the 2nd GPU. 

In Source games it works fine for me, but in many non-Source games it'll just
fucking die.
Anyway, now I can't boot without dpm; it freezes, meaning that Source games
will crash, along with Risk of Rain 2 and others.

> Hopefully @Alex  can do/forward this since this is a P1 blocking issue and
> open for 3 years.

I can only hope it gets fixed one day soon.


^ permalink raw reply	[flat|nested] 100+ messages in thread

* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (52 preceding siblings ...)
  2022-01-01  4:29 ` bugzilla-daemon
@ 2022-01-09 18:06 ` bugzilla-daemon
  2022-01-22 23:54 ` bugzilla-daemon
                   ` (44 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2022-01-09 18:06 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

James (james.a.elian@gmail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |james.a.elian@gmail.com

--- Comment #54 from James (james.a.elian@gmail.com) ---
I can confirm as well that disabling dynamic power management with the
amdgpu.dpm=0 kernel parameter removes the issue with Dishonored 2 on Ubuntu
21.10, kernel 5.13.0, Radeon RX 580 with Mesa 21.2.2.

Same boat as Spencer: hope it gets fixed one day.
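
For anyone checking whether the workaround is actually in effect, here is a
minimal sketch that looks for the amdgpu.dpm setting in a kernel command line
string. The function name amdgpu_dpm_setting is hypothetical, and keeping the
last occurrence when the parameter is repeated is an assumption of this sketch,
not something stated in the thread.

```python
def amdgpu_dpm_setting(cmdline):
    """Return the value given to amdgpu.dpm on a kernel command line, or None.

    Hypothetical helper: it only inspects the string it is given and does not
    query the driver itself.
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == "amdgpu.dpm" and sep:
            value = val  # assumption: keep the last occurrence seen
    return value


# Example strings modelled on the launch options quoted in this thread.
print(amdgpu_dpm_setting("root=/dev/sda2 ro quiet amdgpu.dpm=0"))
print(amdgpu_dpm_setting("root=/dev/sda2 ro quiet"))
```

On a running system the string would typically come from /proc/cmdline.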


^ permalink raw reply	[flat|nested] 100+ messages in thread

* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (53 preceding siblings ...)
  2022-01-09 18:06 ` bugzilla-daemon
@ 2022-01-22 23:54 ` bugzilla-daemon
  2022-01-22 23:56 ` bugzilla-daemon
                   ` (43 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2022-01-22 23:54 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

techxgames@outlook.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |techxgames@outlook.com

--- Comment #55 from techxgames@outlook.com ---
I don't know if it's related, but my display freaks out before shutting off.
The machine itself stays on, and it doesn't reboot when I try over SSH.  I have
to do it on the desktop itself.

Jan 22 06:17:30 Y4M1-II kernel: [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR*
resume of IP block <psp> failed -62
Jan 22 06:17:30 Y4M1-II kernel: [drm:psp_resume [amdgpu]] *ERROR* PSP resume
failed
Jan 22 06:17:30 Y4M1-II kernel: [drm:psp_hw_start [amdgpu]] *ERROR* PSP create
ring failed!
Jan 22 06:17:30 Y4M1-II kernel: [drm] PSP is resuming...
Jan 22 06:17:30 Y4M1-II kernel: [drm] VRAM is lost due to GPU reset!
Jan 22 06:17:30 Y4M1-II kernel: [drm] PCIE GART of 512M enabled (table at
0x0000008000753000).
Jan 22 06:17:30 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: GPU reset
succeeded, trying to resume
Jan 22 06:17:26 Y4M1-II kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed
to initialize parser -125!
Jan 22 06:17:19 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: GPU smu mode1
reset
Jan 22 06:17:19 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: GPU mode1 reset
Jan 22 06:17:19 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: MODE1 reset
Jan 22 06:17:19 Y4M1-II kernel: [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]]
*ERROR* suspend of IP block <psp> failed -22
Jan 22 06:17:19 Y4M1-II kernel: [drm:psp_suspend [amdgpu]] *ERROR* Failed to
terminate ras ta
Jan 22 06:17:19 Y4M1-II kernel: [drm] psp gfx command UNLOAD_TA(0x2) failed and
response status is (0x0)
Jan 22 06:17:16 Y4M1-II kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed
to initialize parser -125!
Jan 22 06:17:16 Y4M1-II kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed
to initialize parser -125!
Jan 22 06:17:15 Y4M1-II kernel: [drm] REG_WAIT timeout 1us * 200 tries -
hubp2_set_blank line:950
Jan 22 06:17:15 Y4M1-II kernel: [drm] REG_WAIT timeout 1us * 200 tries -
hubp2_set_blank line:950
Jan 22 06:17:15 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: Failed to disable
gfxoff!
Jan 22 06:17:15 Y4M1-II kernel: [drm:drm_atomic_helper_wait_for_flip_done
[drm_kms_helper]] *ERROR* [CRTC:80:crtc-1] flip_done timed out
Jan 22 06:17:15 Y4M1-II kernel: [drm:drm_atomic_helper_wait_for_flip_done
[drm_kms_helper]] *ERROR* [CRTC:77:crtc-0] flip_done timed out
Jan 22 06:17:10 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: Bailing on TDR for
s_job:18e3f, as another already in progress
Jan 22 06:17:10 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: GPU reset begin!
Jan 22 06:17:10 Y4M1-II kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR*
Process information: process Xorg pid 1688 thread Xorg:cs0 pid 1731
Jan 22 06:17:10 Y4M1-II kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring
gfx_0.0.0 timeout, signaled seq=112513, emitted seq=112515
Jan 22 06:17:10 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: GPU reset begin!
Jan 22 06:17:10 Y4M1-II kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR*
Process information: process  pid 0 thread  pid 0
Jan 22 06:17:10 Y4M1-II kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring
sdma1 timeout, signaled seq=7058, emitted seq=7059
Jan 22 06:17:10 Y4M1-II kernel: [drm:amdgpu_dm_commit_planes [amdgpu]] *ERROR*
Waiting for fences timed out!
Jan 22 06:17:05 Y4M1-II kernel: [drm:amdgpu_dm_commit_planes [amdgpu]] *ERROR*
Waiting for fences timed out!


^ permalink raw reply	[flat|nested] 100+ messages in thread

* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (54 preceding siblings ...)
  2022-01-22 23:54 ` bugzilla-daemon
@ 2022-01-22 23:56 ` bugzilla-daemon
  2022-01-24 23:17 ` bugzilla-daemon
                   ` (42 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2022-01-22 23:56 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

--- Comment #56 from techxgames@outlook.com ---
Another instance: when my desktop has been idle and the display has been off
for a while, the display won't come back on.  Here are the journal entries I
think are relevant:

Jan 22 08:07:58 Y4M1-II kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 22 08:07:58 Y4M1-II kernel:       Tainted: G           OE    
5.15.11-76051511-generic #202112220937~1640185481~21.10~b3a2c21
Jan 22 08:07:58 Y4M1-II kernel: INFO: task Xorg:1692 blocked for more than 120
seconds.
Jan 22 08:07:58 Y4M1-II kernel:  </TASK>
Jan 22 08:07:58 Y4M1-II kernel:  ret_from_fork+0x22/0x30
Jan 22 08:07:58 Y4M1-II kernel:  ? set_kthread_struct+0x50/0x50
Jan 22 08:07:58 Y4M1-II kernel:  ? process_one_work+0x3d0/0x3d0
Jan 22 08:07:58 Y4M1-II kernel:  kthread+0x11e/0x140
Jan 22 08:07:58 Y4M1-II kernel:  worker_thread+0x53/0x420
Jan 22 08:07:58 Y4M1-II kernel:  process_one_work+0x22b/0x3d0
Jan 22 08:07:58 Y4M1-II kernel:  drm_sched_job_timedout+0x6f/0x110 [gpu_sched]
Jan 22 08:07:58 Y4M1-II kernel:  amdgpu_job_timedout+0x14f/0x170 [amdgpu]
Jan 22 08:07:58 Y4M1-II kernel:  amdgpu_device_gpu_recover.cold+0x6ec/0x8f8
[amdgpu]
Jan 22 08:07:58 Y4M1-II kernel:  ? drm_fb_helper_set_suspend_unlocked+0x33/0xa0
[drm_kms_helper]
Jan 22 08:07:58 Y4M1-II kernel:  amdgpu_device_pre_asic_reset+0xdd/0x480
[amdgpu]
Jan 22 08:07:58 Y4M1-II kernel:  amdgpu_device_ip_suspend+0x21/0x70 [amdgpu]
Jan 22 08:07:58 Y4M1-II kernel:  amdgpu_device_ip_suspend_phase1+0xa3/0x180
[amdgpu]
Jan 22 08:07:58 Y4M1-II kernel:  ? amdgpu_device_set_cg_state+0x12f/0x280
[amdgpu]
Jan 22 08:07:58 Y4M1-II kernel:  ? nv_common_set_clockgating_state+0x9f/0xb0
[amdgpu]
Jan 22 08:07:58 Y4M1-II kernel:  dm_suspend+0xaa/0x270 [amdgpu]
Jan 22 08:07:58 Y4M1-II kernel:  mutex_lock+0x34/0x40
Jan 22 08:07:58 Y4M1-II kernel:  __mutex_lock_slowpath+0x13/0x20
Jan 22 08:07:58 Y4M1-II kernel:  __mutex_lock.constprop.0+0x263/0x490
Jan 22 08:07:58 Y4M1-II kernel:  schedule_preempt_disabled+0xe/0x10
Jan 22 08:07:58 Y4M1-II kernel:  schedule+0x4e/0xb0
Jan 22 08:07:58 Y4M1-II kernel:  __schedule+0x23d/0x590
Jan 22 08:07:58 Y4M1-II kernel:  <TASK>
Jan 22 08:07:58 Y4M1-II kernel: Call Trace:
Jan 22 08:07:58 Y4M1-II kernel: Workqueue: events drm_sched_job_timedout
[gpu_sched]
Jan 22 08:07:58 Y4M1-II kernel: task:kworker/12:1    state:D stack:    0 pid: 
246 ppid:     2 flags:0x00004000
Jan 22 08:07:58 Y4M1-II kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 22 08:07:58 Y4M1-II kernel:       Tainted: G           OE    
5.15.11-76051511-generic #202112220937~1640185481~21.10~b3a2c21
Jan 22 08:07:58 Y4M1-II kernel: INFO: task kworker/12:1:246 blocked for more
than 120 seconds.
Jan 22 08:05:24 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: Bailing on TDR for
s_job:1123, as another already in progress
Jan 22 08:05:24 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: Bailing on TDR for
s_job:43c, as another already in progress
Jan 22 08:05:24 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: GPU reset begin!
Jan 22 08:05:24 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: GPU reset begin!
Jan 22 08:05:24 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: GPU reset begin!
Jan 22 08:05:24 Y4M1-II kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR*
Process information: process  pid 0 thread  pid 0
Jan 22 08:05:24 Y4M1-II kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR*
Process information: process  pid 0 thread  pid 0
Jan 22 08:05:24 Y4M1-II kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR*
Process information: process  pid 0 thread  pid 0
Jan 22 08:05:24 Y4M1-II kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring
sdma0 timeout, signaled seq=4303, emitted seq=4305
Jan 22 08:05:24 Y4M1-II kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring
sdma3 timeout, signaled seq=1084, emitted seq=1086
Jan 22 08:05:24 Y4M1-II kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring
sdma2 timeout, signaled seq=4379, emitted seq=4381
Jan 22 08:05:20 Y4M1-II kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed
to initialize parser -125!
Jan 22 08:05:20 Y4M1-II kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed
to initialize parser -125!
Jan 22 08:05:19 Y4M1-II kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed
to initialize parser -125!
Jan 22 08:05:19 Y4M1-II kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed
to initialize parser -125!
Jan 22 08:05:19 Y4M1-II kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed
to initialize parser -125!
Jan 22 08:05:19 Y4M1-II kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed
to initialize parser -125!
Jan 22 08:05:19 Y4M1-II kernel: amdgpu_cs_ioctl: 59 callbacks suppressed
Jan 22 08:05:14 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: GPU reset end with
ret = -62
Jan 22 08:05:14 Y4M1-II kernel: snd_hda_intel 0000:0c:00.1: CORB reset
timeout#2, CORBRP = 65535
Jan 22 08:05:14 Y4M1-II kernel: snd_hda_intel 0000:0c:00.1: refused to change
power state from D3hot to D0
Jan 22 08:05:14 Y4M1-II kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed
to initialize parser -125!
Jan 22 08:05:14 Y4M1-II kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed
to initialize parser -125!
Jan 22 08:05:14 Y4M1-II kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed
to initialize parser -125!
Jan 22 08:05:14 Y4M1-II kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed
to initialize parser -125!
Jan 22 08:05:14 Y4M1-II kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed
to initialize parser -125!
Jan 22 08:05:14 Y4M1-II kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed
to initialize parser -125!
Jan 22 08:05:14 Y4M1-II kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed
to initialize parser -125!
Jan 22 08:05:14 Y4M1-II kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed
to initialize parser -125!
Jan 22 08:05:14 Y4M1-II kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed
to initialize parser -125!
Jan 22 08:05:14 Y4M1-II kernel: [drm] Skip scheduling IBs!
Jan 22 08:05:14 Y4M1-II kernel: [drm] Skip scheduling IBs!
Jan 22 08:05:14 Y4M1-II kernel: [drm] Skip scheduling IBs!
Jan 22 08:05:14 Y4M1-II kernel: [drm] Skip scheduling IBs!
Jan 22 08:05:14 Y4M1-II kernel: [drm] Skip scheduling IBs!
Jan 22 08:05:14 Y4M1-II kernel: [drm] Skip scheduling IBs!
Jan 22 08:05:14 Y4M1-II kernel: [drm] Skip scheduling IBs!
Jan 22 08:05:14 Y4M1-II kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed
to initialize parser -125!
Jan 22 08:05:14 Y4M1-II kernel: [drm] Skip scheduling IBs!
Jan 22 08:05:14 Y4M1-II kernel: [drm] Skip scheduling IBs!
...
Jan 22 08:05:14 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: GPU reset(2)
failed
Jan 22 08:05:14 Y4M1-II kernel: [drm] Skip scheduling IBs!
Jan 22 08:05:14 Y4M1-II kernel: [drm] Skip scheduling IBs!
Jan 22 08:05:14 Y4M1-II kernel: [drm] Skip scheduling IBs!
Jan 22 08:05:14 Y4M1-II kernel: [drm] Skip scheduling IBs!
Jan 22 08:05:14 Y4M1-II kernel: [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR*
resume of IP block <psp> failed -62
Jan 22 08:05:14 Y4M1-II kernel: [drm:psp_resume [amdgpu]] *ERROR* PSP resume
failed
Jan 22 08:05:14 Y4M1-II kernel: [drm:psp_hw_start [amdgpu]] *ERROR* PSP create
ring failed!
Jan 22 08:05:14 Y4M1-II kernel: [drm] PSP is resuming...
Jan 22 08:05:14 Y4M1-II kernel: [drm] VRAM is lost due to GPU reset!
Jan 22 08:05:14 Y4M1-II kernel: [drm] PCIE GART of 512M enabled (table at
0x0000008000753000).
Jan 22 08:05:14 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: GPU reset
succeeded, trying to resume
Jan 22 08:05:03 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: ASIC reset failed
with error, -62 for drm dev, 0000:0c:00.0
Jan 22 08:05:03 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: GPU mode1 reset
failed
Jan 22 08:05:03 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: SMU: I'm not done
with your previous command!
Jan 22 08:04:58 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: GPU smu mode1
reset
Jan 22 08:04:58 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: GPU mode1 reset
Jan 22 08:04:58 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: MODE1 reset
Jan 22 08:04:58 Y4M1-II kernel: [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]]
*ERROR* suspend of IP block <psp> failed -22
Jan 22 08:04:58 Y4M1-II kernel: [drm:psp_suspend [amdgpu]] *ERROR* Failed to
terminate ras ta
Jan 22 08:04:58 Y4M1-II kernel: [drm] psp gfx command UNLOAD_TA(0x2) failed and
response status is (0x0)
Jan 22 08:04:56 Y4M1-II kernel: [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]]
*ERROR* suspend of IP block <smu> failed -62
Jan 22 08:04:56 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: Fail to disable
dpm features!
Jan 22 08:04:56 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: Failed to disable
smu features.
Jan 22 08:04:51 Y4M1-II kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KCQ
disable failed
Jan 22 08:04:51 Y4M1-II kernel: amdgpu 0000:0c:00.0:
[drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed
(-110)
Jan 22 08:04:50 Y4M1-II kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KGQ
disable failed
Jan 22 08:04:50 Y4M1-II kernel: amdgpu 0000:0c:00.0:
[drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed
(-110)
Jan 22 08:04:50 Y4M1-II kernel: amdgpu 0000:0c:00.0: amdgpu: GPU reset begin!
Jan 22 08:04:50 Y4M1-II kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR*
Process information: process Xorg pid 1692 thread Xorg:cs0 pid 1745
Jan 22 08:04:50 Y4M1-II kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring
gfx_0.0.0 timeout, signaled seq=570767, emitted seq=570769
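
To compare reports like the ones above, the ring timeout lines can be pulled
out of a journal dump with a short script. This is only a sketch: the names
RING_TIMEOUT and ring_timeouts are hypothetical, and reading "emitted minus
signaled" as the number of outstanding jobs is an assumption about what the
two sequence numbers mean.

```python
import re

# Matches amdgpu_job_timedout messages such as
# "ring gfx_0.0.0 timeout, signaled seq=570767, emitted seq=570769".
# \s+ is used between tokens because the mail archive wraps long lines.
RING_TIMEOUT = re.compile(
    r"ring\s+(\S+)\s+timeout,\s+signaled\s+seq=(\d+),\s+emitted\s+seq=(\d+)"
)


def ring_timeouts(log_text):
    """Return (ring, signaled, emitted, outstanding) for each timeout found."""
    results = []
    for match in RING_TIMEOUT.finditer(log_text):
        ring = match.group(1)
        signaled = int(match.group(2))
        emitted = int(match.group(3))
        # Assumption: the gap between the two counters is the work still pending.
        results.append((ring, signaled, emitted, emitted - signaled))
    return results


# Two entries copied from the journal excerpt in this comment.
sample = """\
Jan 22 08:05:24 Y4M1-II kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring
sdma0 timeout, signaled seq=4303, emitted seq=4305
Jan 22 08:04:50 Y4M1-II kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring
gfx_0.0.0 timeout, signaled seq=570767, emitted seq=570769
"""
for entry in ring_timeouts(sample):
    print(entry)
```

Lines such as "ring kiq_2.1.0 test failed" are deliberately not matched, since
they report a ring test failure rather than a job timeout.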


^ permalink raw reply	[flat|nested] 100+ messages in thread

* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (55 preceding siblings ...)
  2022-01-22 23:56 ` bugzilla-daemon
@ 2022-01-24 23:17 ` bugzilla-daemon
  2022-01-25  8:56 ` bugzilla-daemon
                   ` (41 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2022-01-24 23:17 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

--- Comment #57 from Spencer (smp@nandre.com) ---
Created attachment 300315
  --> https://bugzilla.kernel.org/attachment.cgi?id=300315&action=edit
Kernel config

OS: Gentoo
Kernel: 5.15.16, config attached, built with make -j12
Launch options: root=/dev/sda2 ro quiet

I'd like to be able to boot with amdgpu.dpm=0, as this seems to fix the bug
with minor tradeoffs. However, when I boot with dpm disabled, my screen
freezes and leaves this nice little stinker to ruin my day:

Jan 24 16:33:05 [kernel] [    2.572474] Loading firmware: amdgpu/navi10_pfp.bin
Jan 24 16:33:05 [kernel] [    2.572475] Loading firmware: amdgpu/navi10_me.bin
Jan 24 16:33:05 [kernel] [    2.572476] Loading firmware: amdgpu/navi10_ce.bin
Jan 24 16:33:05 [kernel] [    2.572477] Loading firmware: amdgpu/navi10_rlc.bin
Jan 24 16:33:05 [kernel] [    2.572477] Loading firmware: amdgpu/navi10_mec.bin
Jan 24 16:33:05 [kernel] [    2.572478] Loading firmware:
amdgpu/navi10_mec2.bin
Jan 24 16:33:05 [kernel] [    2.572968] EXT4-fs (sdb1): mounted filesystem with
ordered data mode. Opts: discard. Quota mode: none.
Jan 24 16:33:05 [kernel] [    2.573030] Loading firmware:
amdgpu/navi10_sdma.bin
Jan 24 16:33:05 [kernel] [    2.573032] Loading firmware:
amdgpu/navi10_sdma1.bin
Jan 24 16:33:05 [kernel] [    2.573071] Loading firmware: amdgpu/navi10_vcn.bin
Jan 24 16:33:05 [kernel] [    2.573072] [drm] Found VCN firmware Version ENC:
1.14 DEC: 5 VEP: 0 Revision: 20
Jan 24 16:33:05 [kernel] [    2.573075] amdgpu 0000:28:00.0: amdgpu: Will use
PSP to load VCN firmware
Jan 24 16:33:05 [kernel] [    2.747244] [drm] reserve 0x900000 from
0x817e400000 for PSP TMR
Jan 24 16:33:05 [kernel] [    2.785931] amdgpu 0000:28:00.0: amdgpu: RAS:
optional ras ta ucode is not available
Jan 24 16:33:05 [kernel] [    2.790137] amdgpu 0000:28:00.0: amdgpu: RAP:
optional rap ta ucode is not available
Jan 24 16:33:05 [kernel] [    2.790138] amdgpu 0000:28:00.0: amdgpu:
SECUREDISPLAY: securedisplay ta ucode is not available
Jan 24 16:33:05 [kernel] [    2.790140] amdgpu: smu firmware loading failed
Jan 24 16:33:05 [kernel] [    2.790141] amdgpu 0000:28:00.0: amdgpu:
amdgpu_device_ip_init failed
Jan 24 16:33:05 [kernel] [    2.790143] amdgpu 0000:28:00.0: amdgpu: Fatal
error during GPU init
Jan 24 16:33:05 [kernel] [    2.790144] amdgpu 0000:28:00.0: amdgpu: amdgpu:
finishing device.
Jan 24 16:33:05 [kernel] [    2.793726] [drm] free PSP TMR buffer
Jan 24 16:33:05 [kernel] [    2.825874] amdgpu: probe of 0000:28:00.0 failed
with error -95
Jan 24 16:33:05 [kernel] [    2.825951] BUG: unable to handle page fault for
address: ffffa4af5100d000
Jan 24 16:33:05 [kernel] [    2.825954] #PF: supervisor write access in kernel
mode
Jan 24 16:33:05 [kernel] [    2.825955] #PF: error_code(0x0002) - not-present
page
Jan 24 16:33:05 [kernel] [    2.825957] PGD 100000067 P4D 100000067 PUD
100104067 PMD 0
Jan 24 16:33:05 [kernel] [    2.825960] Oops: 0002 [#1] SMP NOPTI
Jan 24 16:33:05 [kernel] [    2.825962] CPU: 6 PID: 759 Comm: systemd-udevd Not
tainted 5.15.16-gentoo #8
Jan 24 16:33:05 [kernel] [    2.825965] Hardware name: Micro-Star International
Co., Ltd MS-7B86/B450 GAMING PLUS MAX (MS-7B86), BIOS H.60 04/18/2020
Jan 24 16:33:05 [kernel] [    2.825967] RIP: 0010:vcn_v2_0_sw_fini+0x65/0x80
[amdgpu]
Jan 24 16:33:05 [kernel] [    2.826139] Code: 89 ef e8 fe 1b ff ff 85 c0 75 08
48 89 ef e8 42 1a ff ff 48 8b 54 24 08 65 48 2b 14 25 28 00 00 00 75 18 48 83
c4 10 5b 5d c3 <c7> 03 00 00 00 00 8b 7c 24 04 e8 4c c4 4d e9 eb bc e8 15 cd ab
e9
Jan 24 16:33:05 [kernel] [    2.826142] RSP: 0018:ffffa4af40bc7c30 EFLAGS:
00010202

TL;DR: amdgpu: smu firmware loading failed
What it means exactly, I know not, but I know it means my screen is frozen.
Is there a trick or a workaround for this?
If there is some info I left out, ask for it and I'll fetch it.


^ permalink raw reply	[flat|nested] 100+ messages in thread

* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (56 preceding siblings ...)
  2022-01-24 23:17 ` bugzilla-daemon
@ 2022-01-25  8:56 ` bugzilla-daemon
  2022-01-25 18:19 ` bugzilla-daemon
                   ` (40 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2022-01-25  8:56 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

--- Comment #58 from Andrew Ammerlaan (andrewammerlaan@gentoo.org) ---
> Jan 24 16:33:05 [kernel] [    2.785931] amdgpu 0000:28:00.0: amdgpu: RAS:
> optional ras ta ucode is not available
> Jan 24 16:33:05 [kernel] [    2.790137] amdgpu 0000:28:00.0: amdgpu: RAP:
> optional rap ta ucode is not available
> Jan 24 16:33:05 [kernel] [    2.790138] amdgpu 0000:28:00.0: amdgpu:
> SECUREDISPLAY: securedisplay ta ucode is not available
> Jan 24 16:33:05 [kernel] [    2.790140] amdgpu: smu firmware loading failed
> Jan 24 16:33:05 [kernel] [    2.790141] amdgpu 0000:28:00.0: amdgpu:
> amdgpu_device_ip_init failed
> Jan 24 16:33:05 [kernel] [    2.790143] amdgpu 0000:28:00.0: amdgpu: Fatal
> error during GPU init


Is this a custom-built kernel? Is amdgpu built into the kernel or enabled as a
module? In the former case, is all required firmware also built into the
kernel? In the latter case, is all required firmware available on the initramfs
(if amdgpu is incorporated in the initramfs)? The required firmware files are
listed here: https://wiki.gentoo.org/wiki/AMDGPU#Known_firmware_blobs


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (57 preceding siblings ...)
  2022-01-25  8:56 ` bugzilla-daemon
@ 2022-01-25 18:19 ` bugzilla-daemon
  2022-01-25 18:49 ` bugzilla-daemon
                   ` (39 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2022-01-25 18:19 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

--- Comment #59 from Spencer (smp@nandre.com) ---
>Is this a custom built kernel? Is amdgpu built into the kernel or enabled as a
>module? In the former case, is all required firmware also built into the
>kernel? In the later case, is all required firmware available on the initramfs
>(if amdgpu is incorporated in the initramfs)? The required firmware files are
>listed here: 

It's a custom kernel, but I have them all built in.
>grep navi10 .config && echo
>amdgpu/navi10_{asd,ce,gpu_info,me,mec2,mec,pfp,rlc,sdma1,sdma,smc,sos,ta,vcn}.bin
amdgpu/navi10_asd.bin amdgpu/navi10_ce.bin amdgpu/navi10_gpu_info.bin
amdgpu/navi10_me.bin amdgpu/navi10_mec2.bin amdgpu/navi10_mec.bin
amdgpu/navi10_pfp.bin amdgpu/navi10_rlc.bin amdgpu/navi10_sdma1.bin
amdgpu/navi10_sdma.bin amdgpu/navi10_smc.bin amdgpu/navi10_sos.bin
amdgpu/navi10_ta.bin amdgpu/navi10_vcn.bin


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (58 preceding siblings ...)
  2022-01-25 18:19 ` bugzilla-daemon
@ 2022-01-25 18:49 ` bugzilla-daemon
  2022-02-02 11:39 ` bugzilla-daemon
                   ` (38 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2022-01-25 18:49 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

--- Comment #60 from Spencer (smp@nandre.com) ---
As an addendum to both comments, a working boot prints this:

Loading firmware: amdgpu/navi10_sos.bin
Loading firmware: amdgpu/navi10_asd.bin
Loading firmware: amdgpu/navi10_ta.bin
amdgpu 0000:28:00.0: amdgpu: PSP runtime database doesn't exist
Loading firmware: amdgpu/navi10_smc.bin
Loading firmware: amdgpu/navi10_pfp.bin
Loading firmware: amdgpu/navi10_me.bin
Loading firmware: amdgpu/navi10_ce.bin
Loading firmware: amdgpu/navi10_rlc.bin
Loading firmware: amdgpu/navi10_mec.bin
Loading firmware: amdgpu/navi10_mec2.bin
Loading firmware: amdgpu/navi10_sdma.bin
Loading firmware: amdgpu/navi10_sdma1.bin
Loading firmware: amdgpu/navi10_vcn.bin
amdgpu 0000:28:00.0: amdgpu: Will use PSP to load VCN firmware
amdgpu 0000:28:00.0: amdgpu: RAS: optional ras ta ucode is not available
amdgpu 0000:28:00.0: amdgpu: RAP: optional rap ta ucode is not available
amdgpu 0000:28:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not
available
amdgpu 0000:28:00.0: amdgpu: use vbios provided pptable
amdgpu 0000:28:00.0: amdgpu: smc_dpm_info table revision(format.content): 4.5
amdgpu 0000:28:00.0: amdgpu: SMU is initialized successfully!
kfd kfd: amdgpu: Allocated 3969056 bytes on gart
amdgpu: HMM registered 6128MB device memory
amdgpu: SRAT table not found
amdgpu: Virtual CRAT table created for GPU
amdgpu: Topology: Add dGPU node [0x731f:0x1002]
kfd kfd: amdgpu: added device 1002:731f
amdgpu 0000:28:00.0: amdgpu: SE 2, SH per SE 2, CU per SH 10, active_cu_number
36
fbcon: amdgpudrmfb (fb0) is primary device


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (59 preceding siblings ...)
  2022-01-25 18:49 ` bugzilla-daemon
@ 2022-02-02 11:39 ` bugzilla-daemon
  2022-02-03  1:37 ` bugzilla-daemon
                   ` (37 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2022-02-02 11:39 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

Jon (kernelorg@digininja.net) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kernelorg@digininja.net

--- Comment #61 from Jon (kernelorg@digininja.net) ---
Chiming in as another victim of:
[drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!

Radeon RX 6700 XT (NAVY_FLOUNDER, DRM 3.42.0, 5.15.15-76051515-generic, LLVM
12.0.1)
AMD Ryzen 9 5900X 
Ubuntu Mate
Mesa 21.2.2

Haven't attempted the amdgpu.dpm=0 workaround because the side effects of it
appear to be bad.


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (60 preceding siblings ...)
  2022-02-02 11:39 ` bugzilla-daemon
@ 2022-02-03  1:37 ` bugzilla-daemon
  2022-02-03  1:39 ` bugzilla-daemon
                   ` (36 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2022-02-03  1:37 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

--- Comment #62 from Randune (randyk161@gmail.com) ---
I've been getting "ring gfx timeouts" for some time (see comment 35). Most of
the time it happens when the computer has not had any input for a while (while
I'm away from it).  When it freezes I can SSH into it, but when I run
"shutdown -h now" it boots me out of SSH as it should, yet the computer never
seems to actually shut down.

I've tried many different kernel parameters but no luck so far.  I'm now trying
amdgpu.runpm=0 as suggested here: https://wiki.archlinux.org/title/AMDGPU
(at the very bottom of the page: Issues with power management / dynamic
re-activation of a discrete amdgpu graphics card). I haven't seen any
performance repercussions yet. I'll just have to wait it out and see.

For my system specs see my previous comment 35.


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (61 preceding siblings ...)
  2022-02-03  1:37 ` bugzilla-daemon
@ 2022-02-03  1:39 ` bugzilla-daemon
  2022-02-03  3:42 ` bugzilla-daemon
                   ` (35 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2022-02-03  1:39 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

--- Comment #63 from Randune (randyk161@gmail.com) ---
(In reply to Jon from comment #61)
> Chiming in as another victim of:
> [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
> 
> Radeon RX 6700 XT (NAVY_FLOUNDER, DRM 3.42.0, 5.15.15-76051515-generic, LLVM
> 12.0.1)
> AMD Ryzen 9 5900X 
> Ubuntu Mate
> Mesa 21.2.2
> 
> Haven't attempted the amdgpu.dpm=0 workaround because the side effects of it
> appear to be bad.

I've tried amdgpu.dpm=0 and it seriously kills the frame rate, in SuperTuxKart
at least.


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (62 preceding siblings ...)
  2022-02-03  1:39 ` bugzilla-daemon
@ 2022-02-03  3:42 ` bugzilla-daemon
  2022-02-11 12:23 ` bugzilla-daemon
                   ` (34 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2022-02-03  3:42 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

--- Comment #64 from Alex Deucher (alexdeucher@gmail.com) ---
(In reply to Jon from comment #61)
> Chiming in as another victim of:
> [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
> 

This is just a symptom of an application trying to use the GPU after a GPU
reset without re-initializing its context.  The cause of a GPU reset can be a
lot of things.  If you have different hardware from other people on this
ticket, it's not likely the same issue.
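
To illustrate the point about -125 being a symptom rather than a cause, here
is a small sketch that decodes the number in the "Failed to initialize parser"
line as a negated errno value. On Linux, errno 125 is ECANCELED ("Operation
canceled"), which fits a submission being rejected after a reset. The function
name is hypothetical, and treating the number as a negated errno is an
assumption based on the usual kernel convention.

```python
import errno
import os
import re

PARSER_ERROR = re.compile(r"Failed to initialize parser (-?\d+)!")


def decode_parser_error(line):
    """Return (number, symbolic name, message) for a parser-init failure
    line, or None if the line does not match."""
    match = PARSER_ERROR.search(line)
    if match is None:
        return None
    code = abs(int(match.group(1)))          # kernel logs the negated errno
    name = errno.errorcode.get(code, "UNKNOWN")
    return code, name, os.strerror(code)
```

Because the name lookup uses the errno table of the machine running the
script, it should be run on Linux to get the same mapping the kernel uses.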


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (63 preceding siblings ...)
  2022-02-03  3:42 ` bugzilla-daemon
@ 2022-02-11 12:23 ` bugzilla-daemon
  2022-02-24 23:40 ` bugzilla-daemon
                   ` (33 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2022-02-11 12:23 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

Ilia (inferrna@gmail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |inferrna@gmail.com

--- Comment #65 from Ilia (inferrna@gmail.com) ---
I have the same bug with Firefox (it has happened about once a day, starting
about a week ago)

[ 4409.071226] BUG: unable to handle page fault for address: fffffffffffffff8
[ 4409.071234] #PF: supervisor read access in kernel mode
[ 4409.071235] #PF: error_code(0x0000) - not-present page
[ 4409.071237] PGD 427e12067 P4D 427e12067 PUD 427e14067 PMD 0 
[ 4409.071240] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 4409.071242] CPU: 18 PID: 191 Comm: uvd Tainted: G           OE    
5.16.8uksm #1
[ 4409.071245] Hardware name: Hewlett-Packard HP Z420 Workstation/1589, BIOS
J61 v03.96 10/29/2019
[ 4409.071246] RIP: 0010:swake_up_locked+0x17/0x40
[ 4409.071251] Code: ff ff ff eb ad 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00
0f 1f 44 00 00 48 8b 57 08 48 8d 47 08 48 39 c2 74 25 53 48 8b 5f 08 <48> 8b 7b
f8 e8 80 7f fe ff 48 8b 13 48 8b 43 08 48 89 42 08 48 89
[ 4409.071253] RSP: 0018:ffffbbdf012b7e70 EFLAGS: 00010007
[ 4409.071254] RAX: ffff9719549270b0 RBX: 0000000000000000 RCX:
0000000000000000
[ 4409.071256] RDX: 0000000000000000 RSI: ffff97185d547250 RDI:
ffff9719549270a8
[ 4409.071257] RBP: ffff9719549270a8 R08: ffff9716473efec0 R09:
ffff9716473efed8
[ 4409.071258] R10: ffff971646cc3000 R11: ffff971646cc3000 R12:
0000000000000286
[ 4409.071259] R13: ffff9716473eebe0 R14: ffff9716ee901bc0 R15:
ffff9719549270a0
[ 4409.071260] FS:  0000000000000000(0000) GS:ffff97213fc80000(0000)
knlGS:0000000000000000
[ 4409.071262] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4409.071263] CR2: fffffffffffffff8 CR3: 0000000427e10006 CR4:
00000000001706e0
[ 4409.071264] Call Trace:
[ 4409.071267]  <TASK>
[ 4409.071269]  complete+0x2f/0x40
[ 4409.071271]  drm_sched_main+0x24b/0x450
[ 4409.071274]  ? wait_woken+0x70/0x70
[ 4409.071289]  ? drm_sched_job_done.isra.0+0x130/0x130
[ 4409.071290]  kthread+0x169/0x190
[ 4409.071294]  ? set_kthread_struct+0x40/0x40
[ 4409.071297]  ret_from_fork+0x1f/0x30
[ 4409.071301]  </TASK>
[ 4409.071302] Modules linked in: xt_conntrack nfnetlink xfrm_user xfrm_algo
xt_addrtype br_netfilter cmac rfcomm vboxnetadp(OE) vboxnetflt(OE)
iptable_mangle xt_CHECKSUM xt_tcpudp iptable_nat xt_comment xt_MASQUERADE
nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bridge stp llc overlay
iptable_filter vboxdrv(OE) bnep cpufreq_powersave zram binfmt_misc squashfs
snd_emu10k1_synth snd_hda_codec_realtek snd_emux_synth snd_seq_midi_emul
snd_seq_virmidi snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi
snd_hda_intel intel_rapl_msr snd_intel_dspcfg intel_rapl_common snd_emu10k1
snd_hda_codec snd_util_mem snd_ac97_codec snd_hda_core nls_iso8859_1 hp_wmi
nls_cp866 ac97_bus platform_profile sparse_keymap snd_hwdep wmi_bmof btusb
snd_pcm sb_edac btrtl x86_pkg_temp_thermal intel_powerclamp snd_seq_midi btbcm
snd_seq_midi_event btintel snd_rawmidi kvm_intel bluetooth input_leds snd_seq
kvm ecdh_generic snd_seq_device snd_timer irqbypass emu10k1_gp serio_raw snd
gameport ioatdma soundcore dca
[ 4409.071342]  wmi mac_hid xpad ff_memless coretemp mei_me mei hwmon_vid
i5500_temp msr ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq
zstd_compress libcrc32c hid_logitech_hidpp hid_logitech_dj hid_generic usbhid
hid crc32_pclmul ghash_clmulni_intel aesni_intel e1000e psmouse crypto_simd
cryptd ahci i2c_i801 libahci lpc_ich i2c_smbus [last unloaded: cpuid]
[ 4409.071362] CR2: fffffffffffffff8
[ 4409.071364] ---[ end trace a6d18badbe55bb92 ]---
[ 4409.071365] RIP: 0010:swake_up_locked+0x17/0x40
[ 4409.071367] Code: ff ff ff eb ad 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00
0f 1f 44 00 00 48 8b 57 08 48 8d 47 08 48 39 c2 74 25 53 48 8b 5f 08 <48> 8b 7b
f8 e8 80 7f fe ff 48 8b 13 48 8b 43 08 48 89 42 08 48 89
[ 4409.071368] RSP: 0018:ffffbbdf012b7e70 EFLAGS: 00010007
[ 4409.071370] RAX: ffff9719549270b0 RBX: 0000000000000000 RCX:
0000000000000000
[ 4409.071371] RDX: 0000000000000000 RSI: ffff97185d547250 RDI:
ffff9719549270a8
[ 4409.071372] RBP: ffff9719549270a8 R08: ffff9716473efec0 R09:
ffff9716473efed8
[ 4409.071373] R10: ffff971646cc3000 R11: ffff971646cc3000 R12:
0000000000000286
[ 4409.071374] R13: ffff9716473eebe0 R14: ffff9716ee901bc0 R15:
ffff9719549270a0
[ 4409.071375] FS:  0000000000000000(0000) GS:ffff97213fc80000(0000)
knlGS:0000000000000000
[ 4409.071377] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4409.071378] CR2: fffffffffffffff8 CR3: 0000000427e10006 CR4:
00000000001706e0
[ 4409.071379] note: uvd[191] exited with preempt_count 1
[ 4419.193226] [drm:amdgpu_job_timedout] *ERROR* ring uvd timeout, signaled
seq=14, emitted seq=14
[ 4419.193237] [drm:amdgpu_job_timedout] *ERROR* Process information: process
RDD Process pid 37880 thread firefox:cs0 pid 46445
[ 4419.193242] amdgpu 0000:05:00.0: amdgpu: GPU reset begin!
[ 4419.193305] ------------[ cut here ]------------
[ 4419.193307] WARNING: CPU: 18 PID: 45938 at kernel/kthread.c:596
kthread_park+0x6d/0x90
[ 4419.193312] Modules linked in: xt_conntrack nfnetlink xfrm_user xfrm_algo
xt_addrtype br_netfilter cmac rfcomm vboxnetadp(OE) vboxnetflt(OE)
iptable_mangle xt_CHECKSUM xt_tcpudp iptable_nat xt_comment xt_MASQUERADE
nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bridge stp llc overlay
iptable_filter vboxdrv(OE) bnep cpufreq_powersave zram binfmt_misc squashfs
snd_emu10k1_synth snd_hda_codec_realtek snd_emux_synth snd_seq_midi_emul
snd_seq_virmidi snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi
snd_hda_intel intel_rapl_msr snd_intel_dspcfg intel_rapl_common snd_emu10k1
snd_hda_codec snd_util_mem snd_ac97_codec snd_hda_core nls_iso8859_1 hp_wmi
nls_cp866 ac97_bus platform_profile sparse_keymap snd_hwdep wmi_bmof btusb
snd_pcm sb_edac btrtl x86_pkg_temp_thermal intel_powerclamp snd_seq_midi btbcm
snd_seq_midi_event btintel snd_rawmidi kvm_intel bluetooth input_leds snd_seq
kvm ecdh_generic snd_seq_device snd_timer irqbypass emu10k1_gp serio_raw snd
gameport ioatdma soundcore dca
[ 4419.193358]  wmi mac_hid xpad ff_memless coretemp mei_me mei hwmon_vid
i5500_temp msr ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq
zstd_compress libcrc32c hid_logitech_hidpp hid_logitech_dj hid_generic usbhid
hid crc32_pclmul ghash_clmulni_intel aesni_intel e1000e psmouse crypto_simd
cryptd ahci i2c_i801 libahci lpc_ich i2c_smbus [last unloaded: cpuid]
[ 4419.193380] CPU: 18 PID: 45938 Comm: kworker/18:1 Tainted: G      D    OE   
 5.16.8uksm #1
[ 4419.193383] Hardware name: Hewlett-Packard HP Z420 Workstation/1589, BIOS
J61 v03.96 10/29/2019
[ 4419.193384] Workqueue: events drm_sched_job_timedout
[ 4419.193388] RIP: 0010:kthread_park+0x6d/0x90
[ 4419.193391] Code: 20 e8 a7 50 dd 00 be 40 00 00 00 48 89 ef e8 7a 1d 01 00
48 85 c0 74 25 31 c0 5b 5d c3 0f 0b a8 04 48 8b 9d a0 05 00 00 74 b2 <0f> 0b b8
da ff ff ff 5b 5d c3 0f 0b b8 f0 ff ff ff eb dd 0f 0b eb
[ 4419.193394] RSP: 0018:ffffbbdf30497d10 EFLAGS: 00010202
[ 4419.193396] RAX: 000000000020804c RBX: ffff97164124c780 RCX:
0000000000000001
[ 4419.193397] RDX: 0000000000000000 RSI: ffff97185d547000 RDI:
ffff971646e38000
[ 4419.193399] RBP: ffff971646e38000 R08: 0000000000000000 R09:
ffff97213fcaab70
[ 4419.193400] R10: ffff971646e3c1e8 R11: ffff971646e3c1d8 R12:
ffff9716473eea68
[ 4419.193401] R13: 0000000000000060 R14: ffff971642540000 R15:
ffff9716473eebd0
[ 4419.193403] FS:  0000000000000000(0000) GS:ffff97213fc80000(0000)
knlGS:0000000000000000
[ 4419.193404] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4419.193406] CR2: 00007fad273687f0 CR3: 0000000427e10002 CR4:
00000000001706e0
[ 4419.193408] Call Trace:
[ 4419.193410]  <TASK>
[ 4419.193413]  drm_sched_stop+0x31/0x160
[ 4419.193416]  amdgpu_device_gpu_recover.cold+0xa34/0xa6c
[ 4419.193422]  amdgpu_job_timedout+0x145/0x170
[ 4419.193425]  drm_sched_job_timedout+0x63/0x100
[ 4419.193427]  process_one_work+0x1d8/0x3b0
[ 4419.193430]  worker_thread+0x4d/0x3d0
[ 4419.193431]  ? rescuer_thread+0x360/0x360
[ 4419.193433]  kthread+0x169/0x190
[ 4419.193436]  ? set_kthread_struct+0x40/0x40
[ 4419.193439]  ret_from_fork+0x1f/0x30
[ 4419.193444]  </TASK>
[ 4419.193445] ---[ end trace a6d18badbe55bb93 ]---

Also, no problems with 3D games.


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (64 preceding siblings ...)
  2022-02-11 12:23 ` bugzilla-daemon
@ 2022-02-24 23:40 ` bugzilla-daemon
  2022-02-25 14:20 ` bugzilla-daemon
                   ` (32 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2022-02-24 23:40 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

--- Comment #66 from Randune (randyk161@gmail.com) ---
So I've been running for about 2.5 weeks now using the amdgpu.runpm=0 kernel
parameter and I've had no crashes or freezes so far. I'm cautiously optimistic
that for me at least this may have solved the problem.  So far I haven't
noticed any side effects (performance degradation etc.).

I understand that amdgpu.runpm=0 is related to power management but I don't
know the specifics. Possibly Alex Deucher can chime in and specify exactly what
this parameter does?

See my previous comments for some context:
comment 35
comment 62
comment 63


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (65 preceding siblings ...)
  2022-02-24 23:40 ` bugzilla-daemon
@ 2022-02-25 14:20 ` bugzilla-daemon
  2022-05-05 15:19 ` bugzilla-daemon
                   ` (31 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2022-02-25 14:20 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

--- Comment #67 from Alex Deucher (alexdeucher@gmail.com) ---
(In reply to Randune from comment #66)
> 
> I understand that amdgpu.runpm=0 is related to power management but I don't
> know the specifics. Possibly Alex Deucher can chime in and specify exactly
> what this parameter does?

The runpm parameter allows you to disable runtime power management which powers
down dGPUs at runtime if they are not being used (e.g., hybrid graphics laptops
or desktop systems with multiple GPUs) to save power.  It does not affect
dynamic power management while the chip is powered up.  Disabling it will
increase idle power usage.
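
For anyone wanting to confirm whether the workaround is actually in effect,
here is a rough sketch that looks for amdgpu.* options on the kernel command
line. It assumes the option was passed on the command line (as in the
comments above) rather than through a modprobe configuration file, which this
check would not see. The function names are hypothetical.

```python
def amdgpu_params(cmdline):
    """Collect amdgpu.<name>=<value> options from a kernel command line."""
    params = {}
    for token in cmdline.split():
        if token.startswith("amdgpu.") and "=" in token:
            key, value = token[len("amdgpu."):].split("=", 1)
            params[key] = value
    return params


def runtime_pm_disabled(cmdline):
    """True if amdgpu.runpm=0 appears on the given command line."""
    return amdgpu_params(cmdline).get("runpm") == "0"
```

On a running system the input would typically be the contents of
/proc/cmdline, e.g. runtime_pm_disabled(open("/proc/cmdline").read()).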


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (66 preceding siblings ...)
  2022-02-25 14:20 ` bugzilla-daemon
@ 2022-05-05 15:19 ` bugzilla-daemon
  2022-05-05 19:14 ` bugzilla-daemon
                   ` (30 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2022-05-05 15:19 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

Danil (s48gs.w@gmail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |s48gs.w@gmail.com

--- Comment #68 from Danil (s48gs.w@gmail.com) ---
I had this problem with a Ryzen3 3200 CPU (Vega8 integrated) on an A320M-DVS
R4.0 motherboard.
microcode: CPU: patch_level=0x08108109
microcode: Microcode Update Driver: v2.2.

I had a scenario that triggered the freeze 100% of the time:
1. Play a video (in a web browser or video player; it should stay visible, so
don't hide the tab or minimize the window).
2. Open the shadertoy website (any shader; keep it rendering and keep the
window visible).
3. Open any OpenGL or Vulkan application (that uses the integrated GPU).
4. Start pressing the fullscreen/un-fullscreen button on the shadertoy shader
(~5 times is enough to trigger the bug; the system will gradually slow down
over the next 10-20 minutes until it freezes, which is visible on the
shadertoy FPS counter, so just wait).
... and freeze

I have used this PC for 2 years, and every Linux kernel had this "freeze" when
using the integrated GPU. The current kernel is openSUSE 5.17.4-1-default.
(My solution all this time was the obvious one: disable the integrated GPU in
the BIOS and use only the discrete one, and everything works.)

Today I checked the motherboard website,
https://asrock.com/MB/AMD/A320M-DVS%20R4.0/index.asp#BIOS, which lists BIOS
versions 7.00 and 7.10; I was on BIOS 4.00.
So I updated the BIOS to 7.00 and then 7.10 (current)... and everything works,
no more freezes.
So it was a firmware problem (at least for me) that was fixed by a BIOS update.


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (67 preceding siblings ...)
  2022-05-05 15:19 ` bugzilla-daemon
@ 2022-05-05 19:14 ` bugzilla-daemon
  2022-06-11 22:06 ` bugzilla-daemon
                   ` (29 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2022-05-05 19:14 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

--- Comment #69 from Danil (s48gs.w@gmail.com) ---
Edit: I got a freeze after using the PC for 4 hours. Before, 20 minutes was the
longest I could use the integrated GPU, so it looks like it is not completely
fixed, just somewhat improved (or I just got lucky)... I'm back to using the
discrete GPU.


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (68 preceding siblings ...)
  2022-05-05 19:14 ` bugzilla-daemon
@ 2022-06-11 22:06 ` bugzilla-daemon
  2022-06-13  1:20 ` bugzilla-daemon
                   ` (28 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2022-06-11 22:06 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

Martin von Wittich (martin.von.wittich@iserv.eu) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |martin.von.wittich@iserv.eu

--- Comment #70 from Martin von Wittich (martin.von.wittich@iserv.eu) ---
My Ubuntu 20.04 desktop is crashing several times per day due to this bug since
I've upgraded my computer from an old Intel Xeon to an AMD Ryzen 9 5900X on a
B550 mainboard. I've had the same AMD RX Vega 56 graphics card in both
computers, so I assume this is probably more related to the mainboard/CPU than
to the graphics card.

The crashes from today:

```
martin@martin ~ % grep amdgpu /var/log/syslog | grep ERROR | grep -v 'Failed to
initialize parser'
Jun 11 03:15:33 martin kernel: [21494.642889] [drm:amdgpu_job_timedout
[amdgpu]] *ERROR* ring gfx timeout, signaled seq=1750601, emitted seq=1750603
Jun 11 03:15:33 martin kernel: [21494.643055] [drm:amdgpu_job_timedout
[amdgpu]] *ERROR* Process information: process firefox pid 5037 thread
firefox:cs0 pid 5123
Jun 11 03:15:50 martin kernel: [21511.795007] [drm:amdgpu_job_timedout
[amdgpu]] *ERROR* ring gfx timeout, signaled seq=1750605, emitted seq=1750608
Jun 11 03:15:50 martin kernel: [21511.795174] [drm:amdgpu_job_timedout
[amdgpu]] *ERROR* Process information: process firefox pid 5037 thread
firefox:cs0 pid 5123
Jun 11 15:56:07 martin kernel: [ 1477.069969] [drm:amdgpu_job_timedout
[amdgpu]] *ERROR* ring gfx timeout, signaled seq=216293, emitted seq=216295
Jun 11 15:56:07 martin kernel: [ 1477.070140] [drm:amdgpu_job_timedout
[amdgpu]] *ERROR* Process information: process firefox pid 5237 thread
firefox:cs0 pid 5302
Jun 11 15:56:22 martin kernel: [ 1492.174077] [drm:amdgpu_job_timedout
[amdgpu]] *ERROR* ring gfx timeout, signaled seq=216297, emitted seq=216300
Jun 11 15:56:22 martin kernel: [ 1492.174248] [drm:amdgpu_job_timedout
[amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
Jun 11 16:03:28 martin kernel: [ 1918.161101] [drm:amdgpu_job_timedout
[amdgpu]] *ERROR* ring gfx timeout, signaled seq=264406, emitted seq=264408
Jun 11 16:03:28 martin kernel: [ 1918.161271] [drm:amdgpu_job_timedout
[amdgpu]] *ERROR* Process information: process firefox pid 10569 thread
firefox:cs0 pid 10633
Jun 11 16:03:49 martin kernel: [ 1938.385307] [drm:amdgpu_job_timedout
[amdgpu]] *ERROR* ring gfx timeout, signaled seq=264410, emitted seq=264413
Jun 11 16:03:49 martin kernel: [ 1938.385479] [drm:amdgpu_job_timedout
[amdgpu]] *ERROR* Process information: process firefox pid 10569 thread
firefox:cs0 pid 10633
Jun 11 23:28:12 martin kernel: [25491.854294] [drm:amdgpu_job_timedout
[amdgpu]] *ERROR* ring gfx timeout, signaled seq=2390985, emitted seq=2390987
Jun 11 23:28:12 martin kernel: [25491.854460] [drm:amdgpu_job_timedout
[amdgpu]] *ERROR* Process information: process firefox pid 4922 thread
firefox:cs0 pid 4989
Jun 11 23:28:28 martin kernel: [25507.982446] [drm:amdgpu_job_timedout
[amdgpu]] *ERROR* ring gfx timeout, signaled seq=2390989, emitted seq=2390992
Jun 11 23:28:28 martin kernel: [25507.982613] [drm:amdgpu_job_timedout
[amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
Jun 11 23:29:51 martin kernel: [25591.333483] amdgpu 0000:2d:00.0: amdgpu:     
 WALKER_ERROR: 0x0
Jun 11 23:29:51 martin kernel: [25591.333485] amdgpu 0000:2d:00.0: amdgpu:     
 MAPPING_ERROR: 0x0
Jun 11 23:30:01 martin kernel: [25601.412838] [drm:amdgpu_job_timedout
[amdgpu]] *ERROR* ring uvd_0 timeout, signaled seq=308, emitted seq=310
Jun 11 23:30:01 martin kernel: [25601.413009] [drm:amdgpu_job_timedout
[amdgpu]] *ERROR* Process information: process mpv pid 44110 thread mpv:cs0 pid
44122
Jun 11 23:30:16 martin kernel: [25616.014983] [drm:amdgpu_job_timedout
[amdgpu]] *ERROR* ring gfx timeout, signaled seq=2409182, emitted seq=2409185
Jun 11 23:30:16 martin kernel: [25616.015151] [drm:amdgpu_job_timedout
[amdgpu]] *ERROR* Process information: process firefox pid 42941 thread
firefox:cs0 pid 43005
```
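
A rough sketch for summarizing a log like the one above, counting timeouts per
ring and extracting the sequence numbers. It assumes the message format shown
in this thread ("ring <name> timeout, signaled seq=N, emitted seq=M") and
collapses whitespace first so that lines hard-wrapped by a mailer still match.
The function names are hypothetical.

```python
import re
from collections import Counter

TIMEOUT = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def ring_timeouts(log_text):
    """Yield (ring, signaled, emitted) for every timeout message found."""
    flat = " ".join(log_text.split())  # undo hard line wrapping
    for m in TIMEOUT.finditer(flat):
        yield m["ring"], int(m["signaled"]), int(m["emitted"])


def summarize(log_text):
    """Count timeout messages per ring name."""
    return Counter(ring for ring, _, _ in ring_timeouts(log_text))
```

The difference between the emitted and signaled sequence numbers gives a rough
idea of how many submissions were still outstanding when the timeout fired.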

When I upgraded my computer at the end of 2021, I had to switch from the
default Ubuntu 20.04 kernel `linux-image-generic` (5.4.0) to
`linux-image-generic-hwe-20.04` (5.11.0) because of some hardware issues with
the new computer (I don't remember what exactly didn't work, IIRC the network).

I'm not exactly sure when the crashes started, but I changed from
`linux-image-generic-hwe-20.04` (5.14) to `linux-image-oem-20.04d` (5.14) on
2022-04-30 in the hopes that that might resolve the issue, but unfortunately it
didn't help.

I tried the `amdgpu.runpm=0` workaround today which also didn't help.

I can also confirm that the attached video "5 second video clip that triggers a
crash" successfully triggers the crash on my system.

The other main thing that seems to trigger the crash is opening new tabs in
Firefox (not every new tab I open causes the crash, but when it crashes, it's
usually when I was trying to open a new tab).


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (69 preceding siblings ...)
  2022-06-11 22:06 ` bugzilla-daemon
@ 2022-06-13  1:20 ` bugzilla-daemon
  2022-06-20 12:03 ` bugzilla-daemon
                   ` (27 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2022-06-13  1:20 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

--- Comment #71 from Panagiotis Polychronis (panospolychronis@gmail.com) ---
(In reply to Martin von Wittich from comment #70)
> My Ubuntu 20.04 desktop is crashing several times per day due to this bug
> since I've upgraded my computer from an old Intel Xeon to an AMD Ryzen 9
> 5900X on a B550 mainboard. I've had the same AMD RX Vega 56 graphics card in
> both computers, so I assume this is probably more related to the
> mainboard/CPU than to the graphics card.
> 
> The crashes from today:
> 
> ```
> martin@martin ~ % grep amdgpu /var/log/syslog | grep ERROR | grep -v 'Failed
> to initialize parser'
> Jun 11 03:15:33 martin kernel: [21494.642889] [drm:amdgpu_job_timedout
> [amdgpu]] *ERROR* ring gfx timeout, signaled seq=1750601, emitted seq=1750603
> Jun 11 03:15:33 martin kernel: [21494.643055] [drm:amdgpu_job_timedout
> [amdgpu]] *ERROR* Process information: process firefox pid 5037 thread
> firefox:cs0 pid 5123
> Jun 11 03:15:50 martin kernel: [21511.795007] [drm:amdgpu_job_timedout
> [amdgpu]] *ERROR* ring gfx timeout, signaled seq=1750605, emitted seq=1750608
> Jun 11 03:15:50 martin kernel: [21511.795174] [drm:amdgpu_job_timedout
> [amdgpu]] *ERROR* Process information: process firefox pid 5037 thread
> firefox:cs0 pid 5123
> Jun 11 15:56:07 martin kernel: [ 1477.069969] [drm:amdgpu_job_timedout
> [amdgpu]] *ERROR* ring gfx timeout, signaled seq=216293, emitted seq=216295
> Jun 11 15:56:07 martin kernel: [ 1477.070140] [drm:amdgpu_job_timedout
> [amdgpu]] *ERROR* Process information: process firefox pid 5237 thread
> firefox:cs0 pid 5302
> Jun 11 15:56:22 martin kernel: [ 1492.174077] [drm:amdgpu_job_timedout
> [amdgpu]] *ERROR* ring gfx timeout, signaled seq=216297, emitted seq=216300
> Jun 11 15:56:22 martin kernel: [ 1492.174248] [drm:amdgpu_job_timedout
> [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
> Jun 11 16:03:28 martin kernel: [ 1918.161101] [drm:amdgpu_job_timedout
> [amdgpu]] *ERROR* ring gfx timeout, signaled seq=264406, emitted seq=264408
> Jun 11 16:03:28 martin kernel: [ 1918.161271] [drm:amdgpu_job_timedout
> [amdgpu]] *ERROR* Process information: process firefox pid 10569 thread
> firefox:cs0 pid 10633
> Jun 11 16:03:49 martin kernel: [ 1938.385307] [drm:amdgpu_job_timedout
> [amdgpu]] *ERROR* ring gfx timeout, signaled seq=264410, emitted seq=264413
> Jun 11 16:03:49 martin kernel: [ 1938.385479] [drm:amdgpu_job_timedout
> [amdgpu]] *ERROR* Process information: process firefox pid 10569 thread
> firefox:cs0 pid 10633
> Jun 11 23:28:12 martin kernel: [25491.854294] [drm:amdgpu_job_timedout
> [amdgpu]] *ERROR* ring gfx timeout, signaled seq=2390985, emitted seq=2390987
> Jun 11 23:28:12 martin kernel: [25491.854460] [drm:amdgpu_job_timedout
> [amdgpu]] *ERROR* Process information: process firefox pid 4922 thread
> firefox:cs0 pid 4989
> Jun 11 23:28:28 martin kernel: [25507.982446] [drm:amdgpu_job_timedout
> [amdgpu]] *ERROR* ring gfx timeout, signaled seq=2390989, emitted seq=2390992
> Jun 11 23:28:28 martin kernel: [25507.982613] [drm:amdgpu_job_timedout
> [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
> Jun 11 23:29:51 martin kernel: [25591.333483] amdgpu 0000:2d:00.0: amdgpu:  
> WALKER_ERROR: 0x0
> Jun 11 23:29:51 martin kernel: [25591.333485] amdgpu 0000:2d:00.0: amdgpu:  
> MAPPING_ERROR: 0x0
> Jun 11 23:30:01 martin kernel: [25601.412838] [drm:amdgpu_job_timedout
> [amdgpu]] *ERROR* ring uvd_0 timeout, signaled seq=308, emitted seq=310
> Jun 11 23:30:01 martin kernel: [25601.413009] [drm:amdgpu_job_timedout
> [amdgpu]] *ERROR* Process information: process mpv pid 44110 thread mpv:cs0
> pid 44122
> Jun 11 23:30:16 martin kernel: [25616.014983] [drm:amdgpu_job_timedout
> [amdgpu]] *ERROR* ring gfx timeout, signaled seq=2409182, emitted seq=2409185
> Jun 11 23:30:16 martin kernel: [25616.015151] [drm:amdgpu_job_timedout
> [amdgpu]] *ERROR* Process information: process firefox pid 42941 thread
> firefox:cs0 pid 43005
> ```
> 
> When I upgraded my computer at the end of 2021, I had to switch from the
> default Ubuntu 20.04 kernel `linux-image-generic` (5.4.0) to
> `linux-image-generic-hwe-20.04` (5.11.0) because of some hardware issues
> with the new computer (I don't remember what exactly didn't work, IIRC the
> network).
> 
> I'm not exactly sure when the crashes started, but I changed from
> `linux-image-generic-hwe-20.04` (5.14) to `linux-image-oem-20.04d` (5.14) on
> 2022-04-30 in the hopes that that might resolve the issue, but unfortunately
> it didn't help.
> 
> I tried the `amdgpu.runpm=0` workaround today which also didn't help.
> 
> I can also confirm that the attached video "5 second video clip that
> triggers a crash" successfully triggers the crash on my system.
> 
> The main other thing that seems to trigger the crash is to open new tabs in
> Firefox (in that not every new tab I open causes the crash, but when it
> crashes, it's usually when I was trying to open a new tab).

Did you try the latest Linux kernel? I had a lot of GPU lockups like this.
Also try these kernel parameters: "amdgpu.ppfeaturemask=0xffffbffb
amdgpu.noretry=0 amdgpu.lockup_timeout=0 amdgpu.gpu_recovery=1 amdgpu.audio=0
amdgpu.deep_color=1 amd_iommu=on iommu=pt" (you might also try
amdgpu.ppfeaturemask=0xfffd7fff or amdgpu.ppfeaturemask=0xffffffff).
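
The three amdgpu.ppfeaturemask values mentioned here differ only in which bits
are cleared. A minimal Python sketch (the helper name `cleared_bits` is made up
for illustration and is not part of any driver tooling) lists the cleared bit
positions. Which power feature each bit controls depends on the kernel version
and should be looked up in the amdgpu source; this example does not try to name
them.

```python
def cleared_bits(mask: int, width: int = 32) -> list:
    """Return the positions of the bits that are 0 in a `width`-bit mask."""
    return [bit for bit in range(width) if not (mask >> bit) & 1]


# Values taken from the comment above; the helper itself is only a sketch.
for value in (0xffffbffb, 0xfffd7fff, 0xffffffff):
    print(f"{value:#010x}: cleared bits {cleared_bits(value)}")
```

An empty list means every feature bit is set, which is the case for 0xffffffff.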


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (70 preceding siblings ...)
  2022-06-13  1:20 ` bugzilla-daemon
@ 2022-06-20 12:03 ` bugzilla-daemon
  2022-06-20 12:06 ` bugzilla-daemon
                   ` (26 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2022-06-20 12:03 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

--- Comment #72 from Martin von Wittich (martin.von.wittich@iserv.eu) ---
I can confirm that adding "amdgpu.dpm=0" to the kernel command line seems to
resolve this issue. I enabled that option on 2022-06-12 13:24, and my system
didn't crash at all from 2022-06-12 to 2022-06-14 (I was on vacation from
2022-06-15 on and haven't used my computer since).

I don't use Linux for gaming and therefore can't comment on how badly this
affects gaming performance, but I did notice that mpv could no longer play
1080p x264 video without stuttering when it defaults to --vo=gpu. Using another
--vo like sdl seems to be a viable workaround.

> Did you try the latest Linux kernel? I had a lot of GPU lockups like this. Also try these kernel parameters: "amdgpu.ppfeaturemask=0xffffbffb amdgpu.noretry=0 amdgpu.lockup_timeout=0 amdgpu.gpu_recovery=1 amdgpu.audio=0 amdgpu.deep_color=1 amd_iommu=on iommu=pt" (you might also try amdgpu.ppfeaturemask=0xfffd7fff or amdgpu.ppfeaturemask=0xffffffff).

I'll try these next.
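
To check that an option such as amdgpu.dpm=0 actually reached the kernel after
a reboot, the running command line can be read back from /proc/cmdline. The
following is a minimal Python sketch; the function names are hypothetical, and
it only reports what is on the command line, not whether the driver honored it.

```python
import shlex


def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in shlex.split(cmdline):
        # Split on the first "=" only, so "root=UUID=..." keeps its full value.
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def amdgpu_options(cmdline: str) -> dict:
    """Keep only the amdgpu.* module parameters."""
    return {k: v for k, v in parse_cmdline(cmdline).items()
            if k.startswith("amdgpu.")}


def read_amdgpu_options(path: str = "/proc/cmdline") -> dict:
    """Read the running kernel's command line and return its amdgpu.* options."""
    with open(path) as f:
        return amdgpu_options(f.read())
```

Calling `read_amdgpu_options()` on the affected machine should return a
dictionary containing `"amdgpu.dpm": "0"` if the option was picked up by the
boot loader.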


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (71 preceding siblings ...)
  2022-06-20 12:03 ` bugzilla-daemon
@ 2022-06-20 12:06 ` bugzilla-daemon
  2022-06-22 12:56 ` bugzilla-daemon
                   ` (25 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2022-06-20 12:06 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

--- Comment #73 from Martin von Wittich (martin.von.wittich@iserv.eu) ---
Sorry, forgot to mention in my last post and now can't edit: interestingly
enough, the attached video "5 second video clip that triggers a crash" still
successfully triggers the crash.

Seems to me like the root issue isn't actually in the dynamic power management
code, but somewhere else, and the DPM is just one of several things that can
trigger it?


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (72 preceding siblings ...)
  2022-06-20 12:06 ` bugzilla-daemon
@ 2022-06-22 12:56 ` bugzilla-daemon
  2022-06-23 10:04 ` bugzilla-daemon
                   ` (24 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2022-06-22 12:56 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

--- Comment #74 from Martin von Wittich (martin.von.wittich@iserv.eu) ---
> Did you try the latest Linux kernel? I had a lot of GPU lockups like this. Also try these kernel parameters: "amdgpu.ppfeaturemask=0xffffbffb amdgpu.noretry=0 amdgpu.lockup_timeout=0 amdgpu.gpu_recovery=1 amdgpu.audio=0 amdgpu.deep_color=1 amd_iommu=on iommu=pt" (you might also try amdgpu.ppfeaturemask=0xfffd7fff or amdgpu.ppfeaturemask=0xffffffff).

I can confirm that at least on the current Ubuntu linux-image-oem-20.04d
kernel, these options do not resolve the issue:

```
martin@martin ~ % uname -a
Linux martin 5.14.0-1042-oem #47-Ubuntu SMP Fri Jun 3 18:17:11 UTC 2022 x86_64
x86_64 x86_64 GNU/Linux
martin@martin ~ % cat /proc/cmdline 
BOOT_IMAGE=/vmlinuz-5.14.0-1042-oem
root=UUID=1bd000ac-1487-4457-be1a-5ea901ded9e9 ro
amdgpu.ppfeaturemask=0xffffbffb amdgpu.noretry=0 amdgpu.lockup_timeout=0
amdgpu.gpu_recovery=1 amdgpu.audio=0 amdgpu.deep_color=1 amd_iommu=on iommu=pt
quiet
martin@martin ~ % dmesg -T | grep 'ring gfx timeout'
[Mi Jun 22 14:48:07 2022] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx
timeout, signaled seq=1820983, emitted seq=1820985
[Mi Jun 22 14:48:18 2022] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx
timeout, signaled seq=1820987, emitted seq=1820990
```

I had enabled these options on 2022-06-20 14:14 UTC+2; this is the first crash
I've encountered since then.

I have no idea how to build the latest kernel and therefore haven't tested that
yet.

I'll now revert back to amdgpu.dpm=0.
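
The `ring gfx timeout` lines in the dmesg output above can be summarized with a
small script. This is a minimal Python sketch with hypothetical names. It
assumes the message format shown in this thread, and it assumes that the
difference between the emitted and signaled sequence numbers can be read as the
number of submitted fences that had not completed when the timeout fired.

```python
import re
from typing import NamedTuple


class RingTimeout(NamedTuple):
    ring: str
    signaled: int
    emitted: int

    @property
    def pending(self) -> int:
        # Assumption: emitted - signaled = fences submitted but not completed.
        return self.emitted - self.signaled


# \s+ between tokens tolerates the line wrapping seen in the pasted logs.
PATTERN = re.compile(
    r"ring\s+(\S+)\s+timeout,\s+signaled\s+seq=(\d+),\s+emitted\s+seq=(\d+)"
)


def find_ring_timeouts(log_text: str) -> list:
    """Return one RingTimeout per 'ring <name> timeout' message in the log."""
    return [RingTimeout(ring, int(sig), int(emi))
            for ring, sig, emi in PATTERN.findall(log_text)]
```

Feeding it the output of `dmesg` (or the relevant part of /var/log/syslog)
gives one entry per timeout, which makes it easier to compare how often each
ring hangs with different kernel options.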


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (73 preceding siblings ...)
  2022-06-22 12:56 ` bugzilla-daemon
@ 2022-06-23 10:04 ` bugzilla-daemon
  2022-06-23 10:26 ` bugzilla-daemon
                   ` (23 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2022-06-23 10:04 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

--- Comment #75 from Danil (s48gs.w@gmail.com) ---
> Did you try the latest Linux kernel? I had a lot of GPU lockups like
> this. Also try these kernel parameters: "amdgpu.ppfeaturemask=0xffffbffb
> amdgpu.noretry=0 amdgpu.lockup_timeout=0 amdgpu.gpu_recovery=1
> amdgpu.audio=0 amdgpu.deep_color=1 amd_iommu=on iommu=pt" (you might also
> try amdgpu.ppfeaturemask=0xfffd7fff or amdgpu.ppfeaturemask=0xffffffff).

I tried.

my kernel:
"Linux 5.17.4-1-default #1 SMP PREEMPT Wed Apr 20 07:43:03 UTC 2022 (75e9961)
x86_64 x86_64 x86_64 GNU/Linux"

(The video linked above was not able to freeze the integrated AMD GPU for me; I
mean earlier, when I tested without any kernel parameters.)

The result is surprising: no crash or freeze for 4+ hours already, even though
I launched lots of apps that used to cause freezes for me.

As I described above (https://bugzilla.kernel.org/show_bug.cgi?id=201957#c68),
for me this freeze happened only when I used OpenGL/Vulkan with video playing
in the background (everything on the integrated GPU). From the user's point of
view, when the bug triggered (randomly), everything slowly dropped to lower and
lower FPS: apps that had been running at 60 FPS in fullscreen dropped to 5 FPS,
and video also dropped to 5-10 FPS (the UI was still responsive), followed by a
freeze within the next few seconds or minutes.

Full kernel boot option now: "splash=silent quiet
amdgpu.ppfeaturemask=0xffffbffb amdgpu.noretry=0 amdgpu.lockup_timeout=0
amdgpu.gpu_recovery=1 amdgpu.audio=0 amdgpu.deep_color=1 amd_iommu=on iommu=pt
"

Now, after boot with these options, I see:

Just after boot, everything works (OpenGL/Vulkan acceleration by the integrated
GPU) with the expected performance.

After trying to "trigger the bug" (opening multiple OpenGL apps along with
Vulkan and WebGL and playing many videos), OpenGL and Vulkan drop to 20 FPS
(constant, even for a single triangle in fullscreen) and WebGL2 no longer works
in the web browser (even after a browser restart), but video still plays at
60 FPS with no lag, and the system UI does not lag either.

So it looks like GPU graphics acceleration just drops into a very low
performance mode, but everything else works fine. (Launching graphics apps
(native only) on the Nvidia GPU also works at 60 FPS as expected.)

Interesting: since the FPS dropped to 20, I can no longer launch "anything" in
Wine (any version, including Proton), although it was working after boot. I
launched a few apps after boot and checked them again once the GPU FPS had
dropped; Wine always crashes with:
"wine: Unhandled page fault on execute access to 00007F894E200460 at address
00007F894E200460 (thread 0070), starting debugger..."
(Not being able to use Wine is a big disadvantage.)


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (74 preceding siblings ...)
  2022-06-23 10:04 ` bugzilla-daemon
@ 2022-06-23 10:26 ` bugzilla-daemon
  2022-06-23 11:05 ` bugzilla-daemon
                   ` (22 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2022-06-23 10:26 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

--- Comment #76 from Danil (s48gs.w@gmail.com) ---
The Wine problem happened because the file
'/usr/share/vulkan/icd.d/nvidia_icd.json' was deleted. I have no idea how or
why this happened when the AMD GPU dropped its FPS (the file obviously exists
when I use just the Nvidia GPU with the integrated AMD GPU disabled).

So the fix for Wine is going to be:
"VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/radeon_icd.x86_64.json winecfg"

Super weird, but I think the Wine problem is fixed.


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (75 preceding siblings ...)
  2022-06-23 10:26 ` bugzilla-daemon
@ 2022-06-23 11:05 ` bugzilla-daemon
  2022-06-23 11:44 ` bugzilla-daemon
                   ` (21 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2022-06-23 11:05 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

--- Comment #77 from Danil (s48gs.w@gmail.com) ---
But even creating nvidia_icd.json with the following content:
{
    "file_format_version" : "1.0.0",
    "ICD": {
        "library_path": "/usr/lib64/libGLX_nvidia.so.0",
        "api_version" : "1.3.211"
    }
}

does not help Wine. Wine still crashes with the same error when trying to
use/initialize Nvidia, but I can use Nvidia outside of Wine from native apps
(and Vulkan works), so it must be related to the AMD GPU driver somehow. (This
was not happening before; it is the first time I have seen Wine crash this way,
including the previous times I tested the integrated AMD GPU.)

P.S. I have a second PC with the same AMD Vega 8 integrated GPU, and there it
works fine (it never crashed or froze even once). The other PC has a different
motherboard, which is why I originally thought it was a problem with the
motherboard, but the current "boot option" helps make the integrated GPU stable
on this PC.


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (76 preceding siblings ...)
  2022-06-23 11:05 ` bugzilla-daemon
@ 2022-06-23 11:44 ` bugzilla-daemon
  2022-06-23 22:12 ` bugzilla-daemon
                   ` (20 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2022-06-23 11:44 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

--- Comment #78 from Danil (s48gs.w@gmail.com) ---
(I made a small mistake in organizing my files; creating nvidia_icd.json with
the content listed above is enough to fix Wine for me. Everything works now.)
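
For anyone recreating such a manifest by hand, a quick sanity check can catch a
misplaced file or a wrong library path. The sketch below is only an
illustration with hypothetical function names. It checks just the three fields
that appear in the manifest quoted in this thread and is not a full validator
of the Vulkan loader's manifest format.

```python
import json
import os


def manifest_problems(manifest, exists=os.path.exists) -> list:
    """Return a list of problems in a parsed ICD manifest (empty if none found)."""
    if not isinstance(manifest, dict):
        return ["manifest is not a JSON object"]
    problems = []
    if "file_format_version" not in manifest:
        problems.append('missing "file_format_version"')
    icd = manifest.get("ICD")
    if not isinstance(icd, dict):
        problems.append('missing "ICD" object')
        return problems
    for key in ("library_path", "api_version"):
        if key not in icd:
            problems.append(f'missing "ICD.{key}"')
    lib = icd.get("library_path", "")
    # Only absolute paths are checked on disk; other forms are left alone.
    if os.path.isabs(lib) and not exists(lib):
        problems.append(f"library not found: {lib}")
    return problems


def check_icd_file(path: str) -> list:
    """Load a manifest file and report problems, including unreadable files."""
    try:
        with open(path) as f:
            return manifest_problems(json.load(f))
    except (OSError, json.JSONDecodeError) as exc:
        return [f"cannot load {path}: {exc}"]
```

For example, `check_icd_file("/usr/share/vulkan/icd.d/nvidia_icd.json")`
returns an empty list when the file parses and the referenced library exists.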


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (77 preceding siblings ...)
  2022-06-23 11:44 ` bugzilla-daemon
@ 2022-06-23 22:12 ` bugzilla-daemon
  2022-06-29  2:58 ` bugzilla-daemon
                   ` (19 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2022-06-23 22:12 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

--- Comment #79 from Danil (s48gs.w@gmail.com) ---
Updated to kernel 5.18.4-1-default #1 SMP PREEMPT_DYNAMIC Wed Jun 15 06:00:33
UTC 2022 (ed6345d) x86_64 x86_64 x86_64 GNU/Linux (the latest openSUSE kernel
for now).

It seems the freeze on my integrated AMD GPU is completely fixed, even without
the previous boot options (on 5.17 it froze without them). The integrated GPU
also no longer goes into "low performance mode forever" (as it did with the
boot options before); it keeps working for hours at max performance, without
the earlier slowdown.

... but now the Nvidia GPU no longer works from AMD (when the integrated GPU is
the main GPU). This is with the Nvidia 515.48.07 driver (the latest now), in
both X11 and Wayland. The Nvidia driver is correctly installed and the device
is visible (nvidia-smi works and vulkaninfo --summary lists the Nvidia GPU
correctly), but any application crashes when creating a Vulkan surface on the
Nvidia device. (Just tested: with the integrated AMD GPU disabled and booting
on Nvidia, everything works there, Vulkan etc.)

So fixing the integrated AMD GPU results in Nvidia no longer working... okay
(I'm back to using the discrete Nvidia GPU only).


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (78 preceding siblings ...)
  2022-06-23 22:12 ` bugzilla-daemon
@ 2022-06-29  2:58 ` bugzilla-daemon
  2022-07-14 10:17 ` bugzilla-daemon
                   ` (18 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2022-06-29  2:58 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

rafael castillo (jrch2k10@gmail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jrch2k10@gmail.com

--- Comment #80 from rafael castillo (jrch2k10@gmail.com) ---
Same issue here with the kernel below (and with the LTS kernel as well):

Linux archlinux 5.18.7-262-tkg-pds #1 TKG SMP PREEMPT_DYNAMIC Mon, 27 Jun 2022
15:50:06 +0000 x86_64 GNU/Linux

[11090.086287] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11090.086296] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11090.086302] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11090.195133] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11090.195139] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11090.195143] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11090.195150] [drm] Cannot get clockgating state when UVD is powergated.
[11090.195152] [drm] Cannot get clockgating state when VCE is powergated.
[11090.695288] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11090.699331] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11091.194893] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11091.194898] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11091.194901] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11091.194908] [drm] Cannot get clockgating state when UVD is powergated.
[11091.194909] [drm] Cannot get clockgating state when VCE is powergated.
[11091.695473] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11092.194965] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11092.194969] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11092.194973] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11092.194979] [drm] Cannot get clockgating state when UVD is powergated.
[11092.194980] [drm] Cannot get clockgating state when VCE is powergated.
[11092.695749] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11093.195046] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11093.195050] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11093.195053] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11093.195060] [drm] Cannot get clockgating state when UVD is powergated.
[11093.195061] [drm] Cannot get clockgating state when VCE is powergated.
[11093.695004] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11094.195065] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11094.195070] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11094.195074] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11094.195082] [drm] Cannot get clockgating state when UVD is powergated.
[11094.195083] [drm] Cannot get clockgating state when VCE is powergated.
[11094.695286] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11095.131026] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for
fences timed out!
[11095.195055] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11095.195061] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11095.195065] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11095.195071] [drm] Cannot get clockgating state when UVD is powergated.
[11095.195072] [drm] Cannot get clockgating state when VCE is powergated.
[11095.695232] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11096.195132] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11096.195137] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11096.195140] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11096.195146] [drm] Cannot get clockgating state when UVD is powergated.
[11096.195147] [drm] Cannot get clockgating state when VCE is powergated.
[11096.694900] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11097.195057] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11097.195061] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11097.195064] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11097.195070] [drm] Cannot get clockgating state when UVD is powergated.
[11097.195071] [drm] Cannot get clockgating state when VCE is powergated.
[11097.695156] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11098.195054] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11098.195058] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11098.195062] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11098.195068] [drm] Cannot get clockgating state when UVD is powergated.
[11098.195069] [drm] Cannot get clockgating state when VCE is powergated.
[11098.695226] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11099.195056] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11099.195060] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11099.195064] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11099.195070] [drm] Cannot get clockgating state when UVD is powergated.
[11099.195071] [drm] Cannot get clockgating state when VCE is powergated.
[11099.695224] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11100.175702] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
signaled seq=2678111, emitted seq=2678113
[11100.175937] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process ArcheAge.exe pid 702264 thread ArcheAge.e:cs0 pid 703382
[11100.176120] amdgpu 0000:02:00.0: amdgpu: GPU reset begin!
[11104.176155] amdgpu 0000:02:00.0: amdgpu: failed to suspend display audio
[11104.176290] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11104.176294] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11104.176296] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11104.176298] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11104.176299] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11104.176301] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11104.176303] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11104.176305] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11104.176307] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11104.176309] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11104.176311] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11104.176312] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11104.176314] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11104.176316] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11104.176318] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11104.176320] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11104.176321] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11104.176417] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11104.176420] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11104.176421] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11104.176423] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11104.176425] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11104.176427] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11118.768958] audit: type=1100 audit(1656469160.416:402): pid=707085 uid=0
auid=4294967295 ses=4294967295 msg='op=PAM:authentication
grantors=pam_shells,pam_faillock,pam_permit,pam_faillock acct="junior"
exe="/usr/bin/sshd" hostname=192.168.10.47 addr=192.168.10.47 terminal=ssh
res=success'
[11118.769433] audit: type=1101 audit(1656469160.416:403): pid=707085 uid=0
auid=4294967295 ses=4294967295 msg='op=PAM:accounting
grantors=pam_access,pam_unix,pam_permit,pam_time acct="junior"
exe="/usr/bin/sshd" hostname=192.168.10.47 addr=192.168.10.47 terminal=ssh
res=success'
[11118.769972] audit: type=1103 audit(1656469160.418:404): pid=707085 uid=0
auid=4294967295 ses=4294967295 msg='op=PAM:setcred
grantors=pam_shells,pam_faillock,pam_permit,pam_faillock acct="junior"
exe="/usr/bin/sshd" hostname=192.168.10.47 addr=192.168.10.47 terminal=ssh
res=success'
[11118.770029] audit: type=1006 audit(1656469160.418:405): pid=707085 uid=0
old-auid=4294967295 auid=1000 tty=(none) old-ses=4294967295 ses=5 res=1
[11118.770038] audit: type=1300 audit(1656469160.418:405): arch=c000003e
syscall=1 success=yes exit=4 a0=3 a1=7ffd3b3d22d0 a2=4 a3=7ffd3b3d1fe4 items=0
ppid=759 pid=707085 auid=1000 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0
fsgid=0 tty=(none) ses=5 comm="sshd" exe="/usr/bin/sshd" key=(null)
[11118.770040] audit: type=1327 audit(1656469160.418:405):
proctitle=737368643A206A756E696F72205B707269765D
[11118.785798] audit: type=1105 audit(1656469160.434:406): pid=707085 uid=0
auid=1000 ses=5 msg='op=PAM:session_open
grantors=pam_loginuid,pam_keyinit,pam_systemd_home,pam_limits,pam_unix,pam_permit,pam_mail,pam_systemd,pam_env
acct="junior" exe="/usr/bin/sshd" hostname=192.168.10.47 addr=192.168.10.47
terminal=ssh res=success'
[11118.786714] audit: type=1103 audit(1656469160.434:407): pid=707087 uid=0
auid=1000 ses=5 msg='op=PAM:setcred
grantors=pam_shells,pam_faillock,pam_permit,pam_faillock acct="junior"
exe="/usr/bin/sshd" hostname=192.168.10.47 addr=192.168.10.47 terminal=ssh
res=success'
[11124.189733] [drm:atom_op_jump [amdgpu]] *ERROR* atombios stuck in loop for
more than 20secs aborting
[11124.189930] [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios
stuck executing D718 (len 824, WS 0, PS 0) @ 0xD898
[11124.190079] [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios
stuck executing D5D2 (len 326, WS 0, PS 0) @ 0xD6C2
[11124.190230] [drm:dce110_link_encoder_disable_output [amdgpu]] *ERROR*
dce110_link_encoder_disable_output: Failed to execute VBIOS command table!
[11126.469943] audit: type=1101 audit(1656469168.118:408): pid=707219 uid=1000
auid=1000 ses=5 msg='op=PAM:accounting grantors=pam_unix,pam_permit,pam_time
acct="junior" exe="/usr/bin/sudo" hostname=? addr=? terminal=/dev/pts/0
res=success'
[11126.470552] audit: type=1110 audit(1656469168.118:409): pid=707219 uid=1000
auid=1000 ses=5 msg='op=PAM:setcred
grantors=pam_faillock,pam_permit,pam_env,pam_faillock acct="root"
exe="/usr/bin/sudo" hostname=? addr=? terminal=/dev/pts/0 res=success'
[11126.472793] audit: type=1105 audit(1656469168.120:410): pid=707219 uid=1000
auid=1000 ses=5 msg='op=PAM:session_open
grantors=pam_systemd_home,pam_limits,pam_unix,pam_permit acct="root"
exe="/usr/bin/sudo" hostname=? addr=? terminal=/dev/pts/0 res=success'
[11126.492151] audit: type=1106 audit(1656469168.139:411): pid=707219 uid=1000
auid=1000 ses=5 msg='op=PAM:session_close
grantors=pam_systemd_home,pam_limits,pam_unix,pam_permit acct="root"
exe="/usr/bin/sudo" hostname=? addr=? terminal=/dev/pts/0 res=success'
[11126.492202] audit: type=1104 audit(1656469168.139:412): pid=707219 uid=1000
auid=1000 ses=5 msg='op=PAM:setcred
grantors=pam_faillock,pam_permit,pam_env,pam_faillock acct="root"
exe="/usr/bin/sudo" hostname=? addr=? terminal=/dev/pts/0 res=success'
[11144.191100] [drm:atom_op_jump [amdgpu]] *ERROR* atombios stuck in loop for
more than 20secs aborting
[11144.191292] [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios
stuck executing C16E (len 62, WS 0, PS 0) @ 0xC18A
[11164.192468] [drm:atom_op_jump [amdgpu]] *ERROR* atombios stuck in loop for
more than 20secs aborting
[11164.192658] [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios
stuck executing B190 (len 1227, WS 8, PS 8) @ 0xB418
[11164.192828] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11164.192831] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11164.192833] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11164.201396] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend
of IP block <vce_v3_0> failed -110
[11164.216360] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11164.216364] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11164.216366] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11164.216368] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11164.216370] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11164.216371] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11164.216373] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11164.216375] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11164.216377] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11164.216378] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11164.436229] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11164.436234] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11164.436236] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11164.436238] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11164.436240] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11164.436241] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11164.436243] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11164.436246] amdgpu 0000:02:00.0: amdgpu: 
               last message was failed ret is 65535
[11164.436248] amdgpu: Failed to force to switch arbf0!
[11164.436249] amdgpu: [disable_dpm_tasks] Failed to disable DPM!
[11164.436250] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend
of IP block <powerplay> failed -22
[11164.546720] amdgpu 0000:02:00.0: [drm:amdgpu_ring_test_helper [amdgpu]]
*ERROR* ring kiq_2.1.0 test failed (-110)
[11164.546864] [drm:gfx_v8_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
[11164.767164] amdgpu: cp is busy, skip halt cp
[11164.877251] amdgpu: rlc is busy, skip halt rlc
[11164.988549] CPU: 2 PID: 705317 Comm: kworker/u48:4 Tainted: G           OE  
  5.18.7-262-tkg-pds #1 ab3a1701b6bb2d2603e5fe14656a947bbae77de2
[11164.988553] Hardware name: ATERMITER ZX-99EV3/ZX-99EV3, BIOS X99AT011
10/15/2020
[11164.988554] Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched]
[11164.988561] Call Trace:
[11164.988562]  <TASK>
[11164.988563]  dump_stack_lvl+0x48/0x5d
[11164.988570]  amdgpu_do_asic_reset+0x2a/0x470 [amdgpu
d2028a110b701082c428a38d2a7699ba96e2f894]
[11164.988790]  amdgpu_device_gpu_recover_imp.cold+0x537/0x8cc [amdgpu
d2028a110b701082c428a38d2a7699ba96e2f894]
[11164.989002]  amdgpu_job_timedout+0x18c/0x1c0 [amdgpu
d2028a110b701082c428a38d2a7699ba96e2f894]
[11164.989183]  drm_sched_job_timedout+0x76/0x100 [gpu_sched
ca892a3eb32539b04f830de75b342015ecf19774]
[11164.989188]  process_one_work+0x1c7/0x380
[11164.989192]  worker_thread+0x51/0x380
[11164.989195]  ? rescuer_thread+0x3a0/0x3a0
[11164.989197]  kthread+0xde/0x110
[11164.989200]  ? kthread_complete_and_exit+0x20/0x20
[11164.989203]  ret_from_fork+0x22/0x30
[11164.989208]  </TASK>
[11164.989212] amdgpu 0000:02:00.0: amdgpu: BACO reset
[drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:53:crtc-0] hw_done or
flip_done timed out
[11187.893035] radeon-profile[54935]: segfault at 0 ip 00007fe553eee6ef sp
00007ffc8035f9e0 error 4 in libQt5Core.so.5.15.5[7fe553e9f000+2d6000]
[11187.893049] Code: 38 64 48 8b 04 25 28 00 00 00 48 89 44 24 28 31 c0 e8 d5
98 ff ff 48 85 c0 0f 84 f2 3c fb ff 48 89 c3 4c 8d 68 50 48 8b 40 50 <49> 63 2c
24 3b 68 04 7d 78 8b 10 83 fa 01 76 26 8b 70 08 81 e6 ff

[drm:atom_op_jump [amdgpu]] *ERROR* atombios stuck in loop for more than 20secs
aborting
[11206.839405] [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios
stuck executing C16E (len 62, WS 0, PS 0) @ 0xC18A
[11206.839546] [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios
stuck executing AB18 (len 142, WS 0, PS 8) @ 0xAB33
[11206.839688] amdgpu 0000:02:00.0: amdgpu: asic atom init failed!
[11206.839725] amdgpu 0000:02:00.0: amdgpu: GPU reset(2) failed
[11206.839746] amdgpu 0000:02:00.0: amdgpu: GPU reset end with ret = -22
[11206.839748] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* GPU Recovery Failed:
-22

[11216.913239] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
signaled seq=2678113, emitted seq=2678113
[11216.913503] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process ArcheAge.exe pid 702264 thread ArcheAge.e:cs0 pid 703382
[11216.913700] amdgpu 0000:02:00.0: amdgpu: GPU reset begin!
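
The negative numbers in the log above (for example "failed -110" and "ret = -22") look like negated Linux errno values. The following is a minimal sketch for decoding them, not part of the original report. It assumes the usual asm-generic errno numbering on x86-64 Linux; the table, the regular expression and the name decode_errors are illustrative only.

```python
import re

# Assumption: Linux asm-generic errno numbering (x86-64). Only the values
# that appear in this thread are listed; anything else is reported as UNKNOWN.
LINUX_ERRNO = {
    22: ("EINVAL", "Invalid argument"),
    110: ("ETIMEDOUT", "Connection timed out"),
    125: ("ECANCELED", "Operation canceled"),
}

# Matches "failed -110", "failed (-110)" and "ret = -22"; \s* also tolerates
# the line wrapping introduced by the mail archive.
ERR_CODE = re.compile(r"(?:failed|ret =)\s*\(?-(\d+)\)?")


def decode_errors(log_text):
    """Return (code, name, description) for each negated errno in log_text."""
    found = []
    for match in ERR_CODE.finditer(log_text):
        code = int(match.group(1))
        name, text = LINUX_ERRNO.get(code, ("UNKNOWN", "not in table"))
        found.append((code, name, text))
    return found


sample = """[11164.201396] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend
of IP block <vce_v3_0> failed -110
[11164.436250] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend
of IP block <powerplay> failed -22
[11164.546720] amdgpu 0000:02:00.0: [drm:amdgpu_ring_test_helper [amdgpu]]
*ERROR* ring kiq_2.1.0 test failed (-110)
[11206.839746] amdgpu 0000:02:00.0: amdgpu: GPU reset end with ret = -22"""

for code, name, text in decode_errors(sample):
    print(f"-{code}: {name} ({text})")
```

Read this way, the suspend of vce_v3_0 and the KIQ ring test timed out, while the powerplay suspend and the final reset returned an invalid-argument error.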

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (79 preceding siblings ...)
  2022-06-29  2:58 ` bugzilla-daemon
@ 2022-07-14 10:17 ` bugzilla-daemon
  2022-07-17 10:28 ` bugzilla-daemon
                   ` (17 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2022-07-14 10:17 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

--- Comment #81 from Danil (s48gs.w@gmail.com) ---
Nvidia released the 515.57 drivers, which fix "Nvidia being broken when used as
second GPU in Linux" (my bug above).
The Nvidia GPU works again when the AMD GPU is the main one.


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (80 preceding siblings ...)
  2022-07-14 10:17 ` bugzilla-daemon
@ 2022-07-17 10:28 ` bugzilla-daemon
  2022-07-17 20:08 ` bugzilla-daemon
                   ` (16 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2022-07-17 10:28 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

--- Comment #82 from Danil (s48gs.w@gmail.com) ---
After using this PC for a few days with the AMD Vega 8 (integrated) as the main
GPU, I see no freezes at all. (In 2021 it froze every 10-20 minutes, so I had
to use the Nvidia card as the main GPU.)
(It works with and without the kernel boot option listed above.)

I use OpenSuse kernel 5.18.4-1-default (not going to update for some time,
because it works)

Maybe it is only fixed for "my motherboard+CPU combination"; my hardware:
Ryzen3 3200 CPU (Vega8 integrated) on A320M-DVS R4.0 motherboard. 
microcode: CPU: patch_level=0x08108109
microcode: Microcode Update Driver: v2.2.

Wayland and X11 both work, with Nvidia as the second GPU.
Wayland slows down (whole-UI performance drops to about 1-2 FPS) once after a
few hours of use, but this is fixed just by switching to a system terminal
(Ctrl+Alt+F1) and back; nothing crashes, and video apps and graphics keep
working.

Integrated GPU performance still goes down (randomly, after 2-6 hours of PC
use) and never recovers, but that is fine, since I have the Nvidia second GPU
for complex graphics. Vega 8 performance drops only in "complex shaders": FPS
falls from 60 fullscreen (1080p) to 10-20 on complex raymarching shaders. For
the system UI (Wayland/X11, GNOME 42) this is not noticeable, and video plays
at 60 FPS as expected. (Sleep mode also works; not every time, because of
Nvidia, but most of the time, the same as when Nvidia was the main GPU.)


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (81 preceding siblings ...)
  2022-07-17 10:28 ` bugzilla-daemon
@ 2022-07-17 20:08 ` bugzilla-daemon
  2022-08-11  2:59 ` bugzilla-daemon
                   ` (15 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2022-07-17 20:08 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

--- Comment #83 from Danil (s48gs.w@gmail.com) ---
Log from what I described above ("fixed just by switching to
system-terminal(ctrl+alt+f1)"): nothing crashes and even GPU apps keep working;
there is just a huge mouse+UI freeze, and switching to the F1 terminal and back
fixes it (Wayland).
Logs:

Jul 17 22:54:04 home-danil kernel: amdgpu 0000:07:00.0: amdgpu: Failed to send
Message 7.
Jul 17 22:54:09 home-danil kernel: amdgpu 0000:07:00.0: amdgpu: Failed to send
Message 7.
Jul 17 22:54:12 home-danil kernel: ------------[ cut here ]------------
Jul 17 22:54:12 home-danil kernel: WARNING: CPU: 1 PID: 1100 at
drivers/gpu/drm/amd/amdgpu/../display/dc/clk_mgr/dcn10/rv1_clk_mgr_vbios_smu.c:120
rv1_vbios_smu_send_msg_with_param+0xa3/0xb0 [amdgpu]
Jul 17 22:54:12 home-danil kernel: Modules linked in: dm_crypt essiv authenc
trusted asn1_encoder tee nvidia_uvm(POE) nvidia_modeset(POE) nvidia(POE)
snd_seq_dummy snd_hrtimer snd_seq snd_seq_device af_packet nft_fib_inet
nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6
nft_reject nft_ct nft_chain_nat nf_tables ebtable_nat ebtable_broute
ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw
iptable_security ip_set iscsi_ibft iscsi_boot_sysfs nfnetlink rfkill
ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bpfilter qrtr
vboxnetadp(O) vboxnetflt(O) vboxdrv(O) dmi_sysfs joydev intel_rapl_msr
intel_rapl_common snd_hda_codec_hdmi snd_hda_codec_realtek
snd_hda_codec_generic ledtrig_audio edac_mce_amd snd_hda_intel snd_intel_dspcfg
kvm_amd snd_intel_sdw_acpi snd_hda_codec r8169 pcspkr snd_hda_core kvm realtek
snd_hwdep snd_pcm wmi_bmof mdio_devres snd_timer
Jul 17 22:54:12 home-danil kernel:  libphy irqbypass snd soundcore efi_pstore
i2c_piix4 gpio_amdpt gpio_generic acpi_cpufreq k10temp tiny_power_button
nls_iso8859_1 squashfs nls_cp437 loop ext4 mbcache vfat jbd2 fat fuse configfs
ip_tables x_tables hid_generic usbhid uas usb_storage amdgpu crct10dif_pclmul
crc32_pclmul ghash_clmulni_intel drm_ttm_helper ttm iommu_v2 gpu_sched
i2c_algo_bit drm_dp_helper drm_kms_helper aesni_intel crypto_simd syscopyarea
sysfillrect sysimgblt fb_sys_fops cryptd drm cec xhci_pci xhci_pci_renesas
sp5100_tco ccp rc_core xhci_hcd usbcore wmi video button btrfs blake2b_generic
libcrc32c crc32c_intel xor raid6_pq sg dm_multipath dm_mod scsi_dh_rdac
scsi_dh_emc scsi_dh_alua msr efivarfs
Jul 17 22:54:12 home-danil kernel: CPU: 1 PID: 1100 Comm: systemd-logind
Tainted: P           OE     5.18.4-1-default #1 openSUSE Tumbleweed
59778fa2462c9ee971468464596d3fbe14e51d2e
Jul 17 22:54:12 home-danil kernel: Hardware name: To Be Filled By O.E.M.
A320M-DVS R4.0/A320M-DVS R4.0, BIOS P7.10 12/23/2021
Jul 17 22:54:12 home-danil kernel: RIP:
0010:rv1_vbios_smu_send_msg_with_param+0xa3/0xb0 [amdgpu]
Jul 17 22:54:12 home-danil kernel: Code: 62 01 00 e8 8f 4e f5 ff 85 c0 74 d8 83
f8 01 75 19 48 8b 7d 00 5b be 93 62 01 00 48 c7 c2 00 99 cd c0 5d 41 5c e9 6d
4e f5 ff <0f> 0b eb e3 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 81 c6 e7 03
Jul 17 22:54:12 home-danil kernel: RSP: 0018:ffff9f0a00b1f580 EFLAGS: 00010246
Jul 17 22:54:12 home-danil kernel: RAX: 00007570227d95d8 RBX: 0000000000000000
RCX: 0000000000000001
Jul 17 22:54:12 home-danil kernel: RDX: 0000000000009288 RSI: 0000000000008b82
RDI: 00007570227d0350
Jul 17 22:54:12 home-danil kernel: RBP: ffff8b0388bf3c00 R08: 0000000000002700
R09: 0000000000002700
Jul 17 22:54:12 home-danil kernel: R10: ffff9f0a00b1f630 R11: 0000000000000003
R12: 0000000000000097
Jul 17 22:54:12 home-danil kernel: R13: ffff8b0386ec98a0 R14: ffff8b0388bf3c00
R15: ffff8b03c04a0000
Jul 17 22:54:12 home-danil kernel: FS:  00007fb68308cb40(0000)
GS:ffff8b06c0a40000(0000) knlGS:0000000000000000
Jul 17 22:54:12 home-danil kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
0000000080050033
Jul 17 22:54:12 home-danil kernel: CR2: 00003e74003afe38 CR3: 000000018ef3c000
CR4: 00000000003506e0
Jul 17 22:54:12 home-danil kernel: Call Trace:
Jul 17 22:54:12 home-danil kernel:  <TASK>
Jul 17 22:54:12 home-danil kernel:  rv1_vbios_smu_set_dispclk+0x46/0xb0 [amdgpu
e7857b98c028928796f1e71af6f4284e57f7c0e3]
Jul 17 22:54:12 home-danil kernel:  rv1_update_clocks+0x254/0x500 [amdgpu
e7857b98c028928796f1e71af6f4284e57f7c0e3]
Jul 17 22:54:12 home-danil kernel:  dcn10_prepare_bandwidth+0x6b/0x130 [amdgpu
e7857b98c028928796f1e71af6f4284e57f7c0e3]
Jul 17 22:54:12 home-danil kernel:  dc_commit_updates_for_stream+0x1b69/0x1f90
[amdgpu e7857b98c028928796f1e71af6f4284e57f7c0e3]
Jul 17 22:54:12 home-danil kernel:  ? mutex_lock+0xe/0x30
Jul 17 22:54:12 home-danil kernel:  ? flush_workqueue+0x177/0x3a0
Jul 17 22:54:12 home-danil kernel:  amdgpu_dm_atomic_commit_tail+0x1627/0x2720
[amdgpu e7857b98c028928796f1e71af6f4284e57f7c0e3]
Jul 17 22:54:12 home-danil kernel:  ? ttm_resource_compat+0x23/0x50 [ttm
63072f655d2dc7ed260c9d980e7b7104612ede60]
Jul 17 22:54:12 home-danil kernel:  commit_tail+0x94/0x120 [drm_kms_helper
9e4d316863dffca879cbc8a3a12d452ad7e0a149]
Jul 17 22:54:12 home-danil kernel:  drm_atomic_helper_commit+0x10f/0x140
[drm_kms_helper 9e4d316863dffca879cbc8a3a12d452ad7e0a149]
Jul 17 22:54:12 home-danil kernel: 
drm_client_modeset_commit_atomic+0x1e4/0x220 [drm
93e548a999b532667e8d1d66f85cd72b61d212a3]
Jul 17 22:54:12 home-danil kernel:  drm_client_modeset_commit_locked+0x56/0x150
[drm 93e548a999b532667e8d1d66f85cd72b61d212a3]
Jul 17 22:54:12 home-danil kernel:  drm_fb_helper_set_par+0x78/0xd0
[drm_kms_helper 9e4d316863dffca879cbc8a3a12d452ad7e0a149]
Jul 17 22:54:12 home-danil kernel:  fb_set_var+0x19d/0x380
Jul 17 22:54:12 home-danil kernel:  ? update_load_avg+0x7e/0x730
Jul 17 22:54:12 home-danil kernel:  ? update_load_avg+0x7e/0x730
Jul 17 22:54:12 home-danil kernel:  fbcon_blank+0x206/0x2c0
Jul 17 22:54:12 home-danil kernel:  do_unblank_screen+0xa7/0x150
Jul 17 22:54:12 home-danil kernel:  complete_change_console+0x54/0x120
Jul 17 22:54:12 home-danil kernel:  vt_ioctl+0x12c8/0x13b0
Jul 17 22:54:12 home-danil kernel:  ? __x64_sys_ioctl+0x8d/0xc0
Jul 17 22:54:12 home-danil kernel:  tty_ioctl+0x283/0x860
Jul 17 22:54:12 home-danil kernel:  ? __sys_sendmsg+0x57/0xa0
Jul 17 22:54:12 home-danil kernel:  ? __seccomp_filter+0x314/0x4d0
Jul 17 22:54:12 home-danil kernel:  __x64_sys_ioctl+0x8d/0xc0
Jul 17 22:54:12 home-danil kernel:  do_syscall_64+0x5b/0x80
Jul 17 22:54:12 home-danil kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xae
Jul 17 22:54:12 home-danil kernel: RIP: 0033:0x7fb683be145f
Jul 17 22:54:12 home-danil kernel: Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60
c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00
00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00
Jul 17 22:54:12 home-danil kernel: RSP: 002b:00007ffd5c30c340 EFLAGS: 00000246
ORIG_RAX: 0000000000000010
Jul 17 22:54:12 home-danil kernel: RAX: ffffffffffffffda RBX: 0000000000000017
RCX: 00007fb683be145f
Jul 17 22:54:12 home-danil kernel: RDX: 0000000000000001 RSI: 0000000000005605
RDI: 0000000000000017
Jul 17 22:54:12 home-danil kernel: RBP: 0000000000000000 R08: 00007ffd5c30c340
R09: 000055c0f8a6f55e
Jul 17 22:54:12 home-danil kernel: R10: 00007ffd5c30c380 R11: 0000000000000246
R12: 000055c0f8a45430
Jul 17 22:54:12 home-danil kernel: R13: 00007ffd5c30c420 R14: 00007ffd5c30c418
R15: 0000000000000006
Jul 17 22:54:12 home-danil kernel:  </TASK>
Jul 17 22:54:12 home-danil kernel: ---[ end trace 0000000000000000 ]---
Jul 17 22:54:15 home-danil kernel: amdgpu 0000:07:00.0: amdgpu: Failed to send
Message 7.
Jul 17 22:54:15 home-danil kernel: rfkill: input handler enabled
Jul 17 22:54:20 home-danil systemd[1]: Started Getty on tty2.
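
A small sketch for tallying the repeated "Failed to send Message N." lines in a journal excerpt like the one above; it is not part of the original comment. The helper name count_failed_messages is hypothetical, and the code makes no assumption about what message 7 means to the SMU; it only counts occurrences per message number.

```python
import re
from collections import Counter

# \s+ tolerates the line wrapping introduced by the mail archive, where
# "Failed to send" and "Message 7." end up on separate lines.
FAILED_MSG = re.compile(r"Failed to send\s+Message\s+(\d+)")


def count_failed_messages(log_text):
    """Count 'Failed to send Message N' occurrences, keyed by N."""
    return Counter(int(m.group(1)) for m in FAILED_MSG.finditer(log_text))


sample = """Jul 17 22:54:04 home-danil kernel: amdgpu 0000:07:00.0: amdgpu: Failed to send
Message 7.
Jul 17 22:54:09 home-danil kernel: amdgpu 0000:07:00.0: amdgpu: Failed to send
Message 7.
Jul 17 22:54:15 home-danil kernel: amdgpu 0000:07:00.0: amdgpu: Failed to send
Message 7."""

for msg_id, count in sorted(count_failed_messages(sample).items()):
    print(f"Message {msg_id}: failed {count} time(s)")
```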


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (82 preceding siblings ...)
  2022-07-17 20:08 ` bugzilla-daemon
@ 2022-08-11  2:59 ` bugzilla-daemon
  2023-01-11  1:13 ` bugzilla-daemon
                   ` (14 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2022-08-11  2:59 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

david (291765088@qq.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |291765088@qq.com

--- Comment #84 from david (291765088@qq.com) ---
This is an AMD driver problem. You can contact me and I will give you the final
solution; email 1015501184@qq.com. Communication may be more efficient from
within China.


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (83 preceding siblings ...)
  2022-08-11  2:59 ` bugzilla-daemon
@ 2023-01-11  1:13 ` bugzilla-daemon
  2023-05-23 10:27 ` bugzilla-daemon
                   ` (13 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2023-01-11  1:13 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

Ralldi (hcarter1112@gmail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hcarter1112@gmail.com

--- Comment #85 from Ralldi (hcarter1112@gmail.com) ---
[67760.805903] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0
timeout, signaled seq=19820784, emitted seq=19820786
[67760.806285] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process valheim.x86_64 pid 464107 thread valheim.x8:cs0 pid 464109
[67760.806667] amdgpu 0000:0d:00.0: amdgpu: GPU reset begin!
[67761.257012] amdgpu 0000:0d:00.0: [drm:amdgpu_ring_test_helper [amdgpu]]
*ERROR* ring kiq_2.1.0 test failed (-110)
[67761.257232] [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KGQ disable failed
[67761.307862] [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:80:crtc-1]
hw_done or flip_done timed out
[67761.516374] [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx
[67761.542980] [drm] free PSP TMR buffer
[67761.587266] amdgpu 0000:0d:00.0: amdgpu: MODE1 reset
[67761.587269] amdgpu 0000:0d:00.0: amdgpu: GPU mode1 reset
[67761.587329] amdgpu 0000:0d:00.0: amdgpu: GPU smu mode1 reset
[67762.091974] amdgpu 0000:0d:00.0: amdgpu: GPU reset succeeded, trying to
resume
[67762.092156] [drm] PCIE GART of 512M enabled (table at 0x0000008000300000).
[67762.092219] [drm] VRAM is lost due to GPU reset!
[67762.092220] [drm] PSP is resuming...
[67762.168492] [drm] reserve 0xa00000 from 0x8001000000 for PSP TMR
[67762.269801] amdgpu 0000:0d:00.0: amdgpu: RAS: optional ras ta ucode is not
available
[67762.283510] amdgpu 0000:0d:00.0: amdgpu: SECUREDISPLAY: securedisplay ta
ucode is not available
[67762.283513] amdgpu 0000:0d:00.0: amdgpu: SMU is resuming...
[67762.283516] amdgpu 0000:0d:00.0: amdgpu: smu driver if version = 0x0000000e,
smu fw if version = 0x00000012, smu fw program = 0, version = 0x00413900
(65.57.0)
[67762.283519] amdgpu 0000:0d:00.0: amdgpu: SMU driver if version not matched
[67762.283549] amdgpu 0000:0d:00.0: amdgpu: use vbios provided pptable
[67762.343739] amdgpu 0000:0d:00.0: amdgpu: SMU is resumed successfully!
[67762.345104] [drm] DMUB hardware initialized: version=0x02020017
[67762.615558] [drm] kiq ring mec 2 pipe 1 q 0
[67762.618728] [drm] VCN decode and encode initialized successfully(under DPG
Mode).
[67762.618910] [drm] JPEG decode initialized successfully.
[67762.618918] amdgpu 0000:0d:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on
hub 0
[67762.618921] amdgpu 0000:0d:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1
on hub 0
[67762.618922] amdgpu 0000:0d:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4
on hub 0
[67762.618924] amdgpu 0000:0d:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5
on hub 0
[67762.618925] amdgpu 0000:0d:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6
on hub 0
[67762.618926] amdgpu 0000:0d:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7
on hub 0
[67762.618927] amdgpu 0000:0d:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8
on hub 0
[67762.618929] amdgpu 0000:0d:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9
on hub 0
[67762.618930] amdgpu 0000:0d:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10
on hub 0
[67762.618931] amdgpu 0000:0d:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11
on hub 0
[67762.618933] amdgpu 0000:0d:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on
hub 0
[67762.618934] amdgpu 0000:0d:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on
hub 0
[67762.618936] amdgpu 0000:0d:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on
hub 1
[67762.618937] amdgpu 0000:0d:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1
on hub 1
[67762.618938] amdgpu 0000:0d:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4
on hub 1
[67762.618940] amdgpu 0000:0d:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on
hub 1
[67762.622875] amdgpu 0000:0d:00.0: amdgpu: recover vram bo from shadow start
[67762.622989] amdgpu 0000:0d:00.0: amdgpu: recover vram bo from shadow done
[67762.622991] [drm] Skip scheduling IBs!
[67762.622993] [drm] Skip scheduling IBs!
[67762.623004] amdgpu 0000:0d:00.0: amdgpu: GPU reset(2) succeeded!
[67762.623027] [drm] Skip scheduling IBs!
[67762.623044] [drm] Skip scheduling IBs!
[67762.623052] [drm] Skip scheduling IBs!
[67762.623057] [drm] Skip scheduling IBs!
[67762.623058] [drm] Skip scheduling IBs!
[67762.623064] [drm] Skip scheduling IBs!
[67762.623067] [drm] Skip scheduling IBs!
[67762.623069] [drm] Skip scheduling IBs!
[67762.623071] [drm] Skip scheduling IBs!
[67762.623073] [drm] Skip scheduling IBs!
[67762.623076] [drm] Skip scheduling IBs!
[67762.623076] [drm] Skip scheduling IBs!
[67762.623080] [drm] Skip scheduling IBs!
[67762.623082] [drm] Skip scheduling IBs!
[67762.623083] [drm] Skip scheduling IBs!
[67762.623086] [drm] Skip scheduling IBs!
[67762.623086] [drm] Skip scheduling IBs!
[67762.623090] [drm] Skip scheduling IBs!
[67762.623091] [drm] Skip scheduling IBs!
[67762.623093] [drm] Skip scheduling IBs!
[67762.623096] [drm] Skip scheduling IBs!
[67762.623097] [drm] Skip scheduling IBs!
[67762.623100] [drm] Skip scheduling IBs!
[67762.623101] [drm] Skip scheduling IBs!
[67762.623104] [drm] Skip scheduling IBs!
[67762.623107] [drm] Skip scheduling IBs!
[67762.623107] [drm] Skip scheduling IBs!
[67762.623111] [drm] Skip scheduling IBs!
[67762.623112] [drm] Skip scheduling IBs!
[67762.623114] [drm] Skip scheduling IBs!
[67762.623117] [drm] Skip scheduling IBs!
[67762.623117] [drm] Skip scheduling IBs!
[67762.623121] [drm] Skip scheduling IBs!
[67762.623122] [drm] Skip scheduling IBs!
[67762.623124] [drm] Skip scheduling IBs!
[67762.623127] [drm] Skip scheduling IBs!
[67762.623127] [drm] Skip scheduling IBs!
[67762.623130] [drm] Skip scheduling IBs!
[67762.623132] [drm] Skip scheduling IBs!
[67762.623133] [drm] Skip scheduling IBs!
[67762.623136] [drm] Skip scheduling IBs!
[67762.623139] [drm] Skip scheduling IBs!
[67762.623143] [drm] Skip scheduling IBs!
[67762.623144] [drm] Skip scheduling IBs!
[67762.623148] [drm] Skip scheduling IBs!
[67762.623148] [drm] Skip scheduling IBs!
[67762.623152] [drm] Skip scheduling IBs!
[67762.623153] [drm] Skip scheduling IBs!
[67762.623157] [drm] Skip scheduling IBs!
[67762.623158] [drm] Skip scheduling IBs!
[67762.623161] [drm] Skip scheduling IBs!
[67762.623163] [drm] Skip scheduling IBs!
[67762.623166] [drm] Skip scheduling IBs!
[67762.623168] [drm] Skip scheduling IBs!
[67762.623170] [drm] Skip scheduling IBs!
[67762.623173] [drm] Skip scheduling IBs!
[67762.623174] [drm] Skip scheduling IBs!
[67762.623177] [drm] Skip scheduling IBs!
[67762.623178] [drm] Skip scheduling IBs!
[67762.623182] [drm] Skip scheduling IBs!
[67762.623182] [drm] Skip scheduling IBs!
[67762.623187] [drm] Skip scheduling IBs!
[67762.623187] [drm] Skip scheduling IBs!
[67762.623192] [drm] Skip scheduling IBs!
[67762.623192] [drm] Skip scheduling IBs!
[67762.623197] [drm] Skip scheduling IBs!
[67762.623197] [drm] Skip scheduling IBs!
[67762.623202] [drm] Skip scheduling IBs!
[67762.623202] [drm] Skip scheduling IBs!
[67762.623206] [drm] Skip scheduling IBs!
[67762.623207] [drm] Skip scheduling IBs!
[67762.623210] [drm] Skip scheduling IBs!
[67762.623212] [drm] Skip scheduling IBs!
[67762.623214] [drm] Skip scheduling IBs!
[67762.623216] [drm] Skip scheduling IBs!
[67762.623217] [drm] Skip scheduling IBs!
[67762.623221] [drm] Skip scheduling IBs!
[67762.623221] [drm] Skip scheduling IBs!
[67762.623225] [drm] Skip scheduling IBs!
[67762.623226] [drm] Skip scheduling IBs!
[67762.623230] [drm] Skip scheduling IBs!
[67762.623230] [drm] Skip scheduling IBs!
[67762.623233] [drm] Skip scheduling IBs!
[67762.623234] [drm] Skip scheduling IBs!
[67762.623236] [drm] Skip scheduling IBs!
[67762.623239] [drm] Skip scheduling IBs!
[67762.623243] [drm] Skip scheduling IBs!
[67762.623246] [drm] Skip scheduling IBs!
[67762.623250] [drm] Skip scheduling IBs!
[67762.623254] [drm] Skip scheduling IBs!
[67762.623257] [drm] Skip scheduling IBs!
[67762.623260] [drm] Skip scheduling IBs!
[67762.623263] [drm] Skip scheduling IBs!
[67762.623267] [drm] Skip scheduling IBs!
[67762.623270] [drm] Skip scheduling IBs!
[67762.623273] [drm] Skip scheduling IBs!
[67762.623277] [drm] Skip scheduling IBs!
[67762.623280] [drm] Skip scheduling IBs!
[67762.623284] [drm] Skip scheduling IBs!
[67762.623287] [drm] Skip scheduling IBs!
[67762.623290] [drm] Skip scheduling IBs!
[67762.623293] [drm] Skip scheduling IBs!
[67762.623298] [drm] Skip scheduling IBs!
[67762.623301] [drm] Skip scheduling IBs!
[67762.623305] [drm] Skip scheduling IBs!
[67762.623309] [drm] Skip scheduling IBs!
[67762.623312] [drm] Skip scheduling IBs!
[67762.623316] [drm] Skip scheduling IBs!
[67762.623319] [drm] Skip scheduling IBs!
[67762.623321] [drm] Skip scheduling IBs!
[67762.623324] [drm] Skip scheduling IBs!
[67762.623327] [drm] Skip scheduling IBs!
[67762.623331] [drm] Skip scheduling IBs!
[67762.623334] [drm] Skip scheduling IBs!
[67762.623337] [drm] Skip scheduling IBs!
[67762.623340] [drm] Skip scheduling IBs!
[67762.623343] [drm] Skip scheduling IBs!
[67762.623345] [drm] Skip scheduling IBs!
[67762.623349] [drm] Skip scheduling IBs!
[67762.623353] [drm] Skip scheduling IBs!
[67762.623356] [drm] Skip scheduling IBs!
[67762.623359] [drm] Skip scheduling IBs!
[67762.623362] [drm] Skip scheduling IBs!
[67762.623366] [drm] Skip scheduling IBs!
[67762.623369] [drm] Skip scheduling IBs!
[67762.623373] [drm] Skip scheduling IBs!
[67762.623376] [drm] Skip scheduling IBs!
[67762.623379] [drm] Skip scheduling IBs!
[67762.623382] [drm] Skip scheduling IBs!
[67762.623385] [drm] Skip scheduling IBs!
[67762.623388] [drm] Skip scheduling IBs!
[67762.623392] [drm] Skip scheduling IBs!
[67762.623395] [drm] Skip scheduling IBs!
[67762.623398] [drm] Skip scheduling IBs!
[67762.623401] [drm] Skip scheduling IBs!
[67762.623404] [drm] Skip scheduling IBs!
[67762.623407] [drm] Skip scheduling IBs!
[67762.623411] [drm] Skip scheduling IBs!
[67762.623414] [drm] Skip scheduling IBs!
[67762.623417] [drm] Skip scheduling IBs!
[67762.623420] [drm] Skip scheduling IBs!
[67762.623423] [drm] Skip scheduling IBs!
[67762.623426] [drm] Skip scheduling IBs!
[67762.623429] [drm] Skip scheduling IBs!
[67762.623433] [drm] Skip scheduling IBs!
[67762.623437] [drm] Skip scheduling IBs!
[67762.623440] [drm] Skip scheduling IBs!
[67762.623443] [drm] Skip scheduling IBs!
[67762.623446] [drm] Skip scheduling IBs!
[67762.623450] [drm] Skip scheduling IBs!
[67762.623453] [drm] Skip scheduling IBs!
[67762.623456] [drm] Skip scheduling IBs!
[67762.623460] [drm] Skip scheduling IBs!
[67762.623463] [drm] Skip scheduling IBs!
[67762.623466] [drm] Skip scheduling IBs!
[67762.623469] [drm] Skip scheduling IBs!
[67762.623473] [drm] Skip scheduling IBs!
[67762.623476] [drm] Skip scheduling IBs!
[67762.623479] [drm] Skip scheduling IBs!
[67762.623482] [drm] Skip scheduling IBs!
[67762.623485] [drm] Skip scheduling IBs!
[67762.623489] [drm] Skip scheduling IBs!
[67762.623492] [drm] Skip scheduling IBs!
[67762.623495] [drm] Skip scheduling IBs!
[67762.623498] [drm] Skip scheduling IBs!
[67762.623501] [drm] Skip scheduling IBs!
[67762.623505] [drm] Skip scheduling IBs!
[67762.623508] [drm] Skip scheduling IBs!
[67762.623511] [drm] Skip scheduling IBs!
[67762.623515] [drm] Skip scheduling IBs!
[67762.623518] [drm] Skip scheduling IBs!
[67762.623522] [drm] Skip scheduling IBs!
[67762.623525] [drm] Skip scheduling IBs!
[67762.623529] [drm] Skip scheduling IBs!
[67762.623533] [drm] Skip scheduling IBs!
[67762.623537] [drm] Skip scheduling IBs!
[67762.623541] [drm] Skip scheduling IBs!
[67762.623544] [drm] Skip scheduling IBs!
[67762.623546] amdgpu_cs_ioctl: 7 callbacks suppressed
[67762.623548] [drm] Skip scheduling IBs!
[67762.623553] [drm] Skip scheduling IBs!
[67762.623557] [drm] Skip scheduling IBs!
[67762.623560] [drm] Skip scheduling IBs!
[67762.623565] [drm] Skip scheduling IBs!
[67762.623568] [drm] Skip scheduling IBs!
[67762.623572] [drm] Skip scheduling IBs!
[67762.623575] [drm] Skip scheduling IBs!
[67762.623549] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[67762.636312] traps: xss-lock[2346] trap int3 ip:7f86599e4e51 sp:7ffc0f5bdc20
error:0 in libglib-2.0.so.0.7200.3[7f86599a8000+91000]
[67762.645640] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[67762.862396] qtile[2274]: segfault at 7fa29b6baae0 ip 00007fa29b6baae0 sp
00007fff9d25d758 error 14 in libgobject-2.0.so.0.7200.3[7fa29b70f000+e000]
[67762.862415] Code: Unable to access opcode bytes at RIP 0x7fa29b6baab6.
[67765.682610] rfkill: input handler disabled
[67766.056883] usb 4-2: current rate 16000 is different from the runtime rate
48000
[67766.120888] usb 4-2: current rate 16000 is different from the runtime rate
48000
[67766.184883] usb 4-2: current rate 16000 is different from the runtime rate
48000
[67774.117179] rfkill: input handler enabled
------------------------------------------------------------------------------------------
I am having this same issue. It happens with the following hardware, and only
while gaming. When I am doing anything else, everything is fine. I don't game
often, but it most commonly happens in Overwatch and Valheim, in case that
helps.
-----------------------------------------------------------------------------------------
OS: Nobara Linux 36 (Thirty Six) x86_64 
Kernel: 6.0.14-201.fsync.fc36.x86_64 
CPU: AMD Ryzen 5 3600 (12) @ 3.600GHz 
GPU: AMD 6700 XT
Memory: 5382MiB / 32002MiB 
MOBO: Asus Prime MA Wifi II

0d:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi
22 [Radeon RX 6700/6700 XT/6750 XT / 6800M] (rev c1) (prog-if 00 [VGA
controller])
        Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Device 0e36
        Flags: bus master, fast devsel, latency 0, IRQ 104, IOMMU group 18
        Memory at d0000000 (64-bit, prefetchable) [size=256M]
        Memory at e0000000 (64-bit, prefetchable) [size=2M]
        I/O ports at e000 [size=256]
        Memory at fc900000 (32-bit, non-prefetchable) [size=1M]
        Expansion ROM at 000c0000 [disabled] [size=128K]
        Capabilities: <access denied>
        Kernel driver in use: amdgpu
        Kernel modules: amdgpu
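
For comparing reports like the one above, here is a minimal sketch that pulls the ring name and the two fence sequence numbers out of "ring ... timeout" lines. It is not part of the original comment. The helper name parse_ring_timeouts and the "pending" field are illustrative; reading emitted minus signaled as the number of submitted but unfinished fences is an interpretation, not something the log states.

```python
import re

# \s+ tolerates the line wrapping introduced by the mail archive; \S+ accepts
# both the older "gfx" and the newer "gfx_0.0.0" ring names seen in this thread.
RING_TIMEOUT = re.compile(
    r"ring\s+(?P<ring>\S+)\s+timeout,\s+"
    r"signaled\s+seq=(?P<signaled>\d+),\s+emitted\s+seq=(?P<emitted>\d+)"
)


def parse_ring_timeouts(log_text):
    """Return one dict per ring timeout line found in log_text."""
    results = []
    for match in RING_TIMEOUT.finditer(log_text):
        signaled = int(match.group("signaled"))
        emitted = int(match.group("emitted"))
        results.append({
            "ring": match.group("ring"),
            "signaled": signaled,
            "emitted": emitted,
            # Assumed meaning: fences emitted but not yet signaled.
            "pending": emitted - signaled,
        })
    return results


sample = """[67760.805903] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0
timeout, signaled seq=19820784, emitted seq=19820786"""

for entry in parse_ring_timeouts(sample):
    print(entry)
```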


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (84 preceding siblings ...)
  2023-01-11  1:13 ` bugzilla-daemon
@ 2023-05-23 10:27 ` bugzilla-daemon
  2023-05-24  8:55 ` bugzilla-daemon
                   ` (12 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2023-05-23 10:27 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

Stuart Foster (smf-linux@virginmedia.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |smf-linux@virginmedia.com

--- Comment #86 from Stuart Foster (smf-linux@virginmedia.com) ---
Created attachment 304307
  --> https://bugzilla.kernel.org/attachment.cgi?id=304307&action=edit
Started testing kernel 6.4-rc3; got the same problem


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (85 preceding siblings ...)
  2023-05-23 10:27 ` bugzilla-daemon
@ 2023-05-24  8:55 ` bugzilla-daemon
  2023-08-15 12:33 ` bugzilla-daemon
                   ` (11 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2023-05-24  8:55 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

--- Comment #87 from Stuart Foster (smf-linux@virginmedia.com) ---
Is it worth the effort of bisecting this, given that it seems to affect a lot
of kernel versions?

thanks


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (86 preceding siblings ...)
  2023-05-24  8:55 ` bugzilla-daemon
@ 2023-08-15 12:33 ` bugzilla-daemon
  2023-08-24 15:52 ` bugzilla-daemon
                   ` (10 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2023-08-15 12:33 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

Timon Z. (kernel.org@timonz.de) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kernel.org@timonz.de

--- Comment #88 from Timon Z. (kernel.org@timonz.de) ---
Status = NEW after nearly 5 years?
I have the same problem


Aug 15 14:18:19 nb-tz kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=3442457, emitted seq=3442459
Aug 15 14:18:19 nb-tz kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process gnome-shell pid 2628 thread gnome-shel:cs0 pid 2679


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (87 preceding siblings ...)
  2023-08-15 12:33 ` bugzilla-daemon
@ 2023-08-24 15:52 ` bugzilla-daemon
  2023-09-21 22:38 ` bugzilla-daemon
                   ` (9 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2023-08-24 15:52 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

Priit O. (priit@ww.ee) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |priit@ww.ee

--- Comment #89 from Priit O. (priit@ww.ee) ---
AMD Vega 64 (vega10 chip)
kernel: 6.4.9

linux-firmware: 20230724

# The graphical session died and I had to log in again; the computer didn't
# reboot, though.
aug   20 02:11:06 Zen kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_low timeout, signaled seq=368426139, emitted seq=368426141
aug   20 02:11:06 Zen kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process firefox pid 414636 thread firefox:cs0 pid 414712

linux-firmware: 20230810 (upgraded it, although there were no "vega10" changes
in between)

# It just froze for about 30s and then got unstuck again.
aug   23 23:09:24 Zen kernel: [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:60:crtc-0] hw_done or flip_done timed out
aug   23 23:09:34 Zen kernel: [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:63:crtc-1] hw_done or flip_done timed out
aug   23 23:09:44 Zen kernel: [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:66:crtc-2] hw_done or flip_done timed out


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (88 preceding siblings ...)
  2023-08-24 15:52 ` bugzilla-daemon
@ 2023-09-21 22:38 ` bugzilla-daemon
  2023-09-23  1:52 ` bugzilla-daemon
                   ` (8 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2023-09-21 22:38 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

G OConnor (graham.oconnor@gmail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |graham.oconnor@gmail.com

--- Comment #90 from G OConnor (graham.oconnor@gmail.com) ---
AMD Ryzen 3700U APU (Vega 10)

This issue has recently started happening, mostly when firing up games or
graphically intensive tasks. One case of lockup during normal desktop use.

Worked fine on 6.4.X series (currently running on 6.4.12). However, all kernels
in the 6.5 series cause the following:

[  112.727138] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_low timeout, signaled seq=9861, emitted seq=9863
[  112.728214] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xwayland pid 919 thread Xwayland:cs0 pid 928
[  112.729270] amdgpu 0000:04:00.0: amdgpu: GPU reset begin!
[  112.885652] amdgpu 0000:04:00.0: amdgpu: MODE2 reset
[  112.885709] amdgpu 0000:04:00.0: amdgpu: GPU reset succeeded, trying to resume
[  112.886024] [drm] PCIE GART of 1024M enabled.
[  112.886027] [drm] PTB located at 0x000000F400A00000
[  112.886143] [drm] PSP is resuming...
[  112.906168] [drm] reserve 0x400000 from 0xf47fc00000 for PSP TMR
[  112.985033] amdgpu 0000:04:00.0: amdgpu: RAS: optional ras ta ucode is not available
[  112.992320] amdgpu 0000:04:00.0: amdgpu: RAP: optional rap ta ucode is not available
[  113.733685] [drm] kiq ring mec 2 pipe 1 q 0
[  113.998619] amdgpu 0000:04:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring gfx test failed (-110)
[  113.999249] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <gfx_v9_0> failed -110
[  113.999957] amdgpu 0000:04:00.0: amdgpu: GPU reset(2) failed
[  114.000006] amdgpu 0000:04:00.0: amdgpu: GPU reset end with ret = -110
[  114.000010] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* GPU Recovery Failed: -110
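
For anyone comparing reports in this thread, the "ring ... timeout" lines can be
pulled apart mechanically. The following is only a sketch: the names
RingTimeout and parse_ring_timeout are made up for illustration, and the reading
of "emitted minus signaled" as the number of submitted jobs that had not yet
completed is my assumption, not something stated in the logs. The pattern
tolerates the line wraps that mail clients insert.

```python
import re
from typing import NamedTuple, Optional


class RingTimeout(NamedTuple):
    """One 'ring <name> timeout' report (hypothetical helper type)."""
    ring: str
    signaled: int
    emitted: int

    @property
    def outstanding(self) -> int:
        # Assumption: the gap between emitted and signaled sequence numbers
        # is the number of jobs submitted but not yet completed.
        return self.emitted - self.signaled


# \s+ is used instead of a literal space so that wrapped log lines still match.
_TIMEOUT_RE = re.compile(
    r"ring\s+(?P<ring>\S+)\s+timeout,\s+"
    r"signaled\s+seq=(?P<signaled>\d+),\s+"
    r"emitted\s+seq=(?P<emitted>\d+)"
)


def parse_ring_timeout(text: str) -> Optional[RingTimeout]:
    """Return the first ring timeout found in text, or None if there is none."""
    m = _TIMEOUT_RE.search(text)
    if m is None:
        return None
    return RingTimeout(m["ring"], int(m["signaled"]), int(m["emitted"]))
```

Feeding it the first line of the log above gives ring "gfx_low" with two
outstanding sequence numbers; lines such as "ring gfx test failed" are not
matched because they contain no "timeout," part.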


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (89 preceding siblings ...)
  2023-09-21 22:38 ` bugzilla-daemon
@ 2023-09-23  1:52 ` bugzilla-daemon
  2023-09-30 10:25 ` bugzilla-daemon
                   ` (7 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2023-09-23  1:52 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

KC (kcohar@gmail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kcohar@gmail.com

--- Comment #91 from KC (kcohar@gmail.com) ---
I can confirm this bug

Experiencing it on an AMD Ryzen 5 3500U (Vega 8), Fedora 39 beta, kernel 6.5.2.
Also on Arch (kernel 6.5.2).
No problems on Fedora 38 (kernel 6.2.x).

In my case it happens frequently with normal desktop use on Fedora and Arch.

Sep 23 03:39:34 jackdaw kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_low timeout, signaled seq=10067, emitted seq=10069
Sep 23 03:39:34 jackdaw kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process nautilus pid 5981 thread nautilus:cs0 pid 6173
Sep 23 03:39:34 jackdaw kernel: amdgpu 0000:05:00.0: amdgpu: GPU reset begin!
Sep 23 03:39:34 jackdaw kernel: amdgpu 0000:05:00.0: amdgpu: MODE2 reset
Sep 23 03:39:34 jackdaw kernel: amdgpu 0000:05:00.0: amdgpu: GPU reset succeeded, trying to resume
Sep 23 03:39:34 jackdaw kernel: [drm] PCIE GART of 1024M enabled.
Sep 23 03:39:34 jackdaw kernel: [drm] PTB located at 0x000000F400A00000
Sep 23 03:39:34 jackdaw kernel: [drm] PSP is resuming...
Sep 23 03:39:34 jackdaw kernel: [drm] reserve 0x400000 from 0xf47fc00000 for PSP TMR
Sep 23 03:39:34 jackdaw kernel: amdgpu 0000:05:00.0: amdgpu: RAS: optional ras ta ucode is not available
Sep 23 03:39:34 jackdaw kernel: amdgpu 0000:05:00.0: amdgpu: RAP: optional rap ta ucode is not available
Sep 23 03:39:34 jackdaw kernel: [drm] kiq ring mec 2 pipe 1 q 0
Sep 23 03:39:35 jackdaw kernel: amdgpu 0000:05:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring gfx test failed (-110)
Sep 23 03:39:35 jackdaw kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <gfx_v9_0> failed -110
Sep 23 03:39:35 jackdaw kernel: amdgpu 0000:05:00.0: amdgpu: GPU reset(2) failed
Sep 23 03:39:35 jackdaw kernel: amdgpu 0000:05:00.0: amdgpu: GPU reset end with ret = -110
Sep 23 03:39:35 jackdaw kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* GPU Recovery Failed: -110
Sep 23 03:39:35 jackdaw kernel: [drm] Skip scheduling IBs!
Sep 23 03:39:45 jackdaw kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_high timeout, signaled seq=9114, emitted seq=9116
Sep 23 03:39:45 jackdaw kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process gnome-shell pid 2206 thread gnome-shel:cs0 pid 2258
Sep 23 03:39:45 jackdaw kernel: amdgpu 0000:05:00.0: amdgpu: GPU reset begin!


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (90 preceding siblings ...)
  2023-09-23  1:52 ` bugzilla-daemon
@ 2023-09-30 10:25 ` bugzilla-daemon
  2023-09-30 18:57 ` bugzilla-daemon
                   ` (6 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2023-09-30 10:25 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

Artem S. Tashkinov (aros@gmx.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |ANSWERED

--- Comment #92 from Artem S. Tashkinov (aros@gmx.com) ---
AMDGPU development is on its own bug tracker:

https://gitlab.freedesktop.org/drm/amd/-/issues

If you're still affected, check for existing bug reports and if there are none,
please repost over there.


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (91 preceding siblings ...)
  2023-09-30 10:25 ` bugzilla-daemon
@ 2023-09-30 18:57 ` bugzilla-daemon
  2023-09-30 19:08 ` bugzilla-daemon
                   ` (5 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2023-09-30 18:57 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

aspicer@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |aspicer@gmail.com

--- Comment #93 from aspicer@gmail.com ---
I have also been having this issue. It started occurring recently (last 2-3
months). No other changes.

Mostly lockups while gaming (yuzu), one lockup because of chrome.

I was able to fix this issue by switching from HDMI to DP or DVI.


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (92 preceding siblings ...)
  2023-09-30 18:57 ` bugzilla-daemon
@ 2023-09-30 19:08 ` bugzilla-daemon
  2023-09-30 19:35 ` bugzilla-daemon
                   ` (4 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2023-09-30 19:08 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

--- Comment #94 from KC (kcohar@gmail.com) ---
In my case the fix was adding amdgpu.mcbp=0 to the kernel parameters.
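
To double-check that the parameter really reached the running kernel, something
like the sketch below can be used. It is only an illustration: the function
name mcbp_from_cmdline is made up, and taking the last occurrence assumes that
a later entry on the command line overrides an earlier one.

```python
from typing import Optional


def mcbp_from_cmdline(cmdline: str) -> Optional[int]:
    """Return the amdgpu.mcbp value found on a kernel command line, or None.

    Assumption: if the parameter appears more than once, the last one wins.
    """
    value = None
    for token in cmdline.split():
        key, sep, raw = token.partition("=")
        if key == "amdgpu.mcbp" and sep:
            try:
                value = int(raw)
            except ValueError:
                # Ignore malformed entries such as "amdgpu.mcbp=off".
                continue
    return value


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line of the running kernel (Linux only)."""
    with open(path) as f:
        return f.read()
```

On an affected machine, mcbp_from_cmdline(read_cmdline()) should return 0 after
rebooting with the parameter added, and None if it was never picked up.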


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (93 preceding siblings ...)
  2023-09-30 19:08 ` bugzilla-daemon
@ 2023-09-30 19:35 ` bugzilla-daemon
  2023-09-30 19:47 ` bugzilla-daemon
                   ` (3 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2023-09-30 19:35 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

--- Comment #95 from aspicer@gmail.com ---
(In reply to KC from comment #94)

Did you have it set to 1 previously? If not, I'm not sure if that was the
silver bullet, because it looks like it defaults to 0.
https://dri.freedesktop.org/docs/drm/gpu/amdgpu.html

mcbp (int)

It is used to enable mid command buffer preemption. (0 = disabled (default), 1
= enabled)


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (94 preceding siblings ...)
  2023-09-30 19:35 ` bugzilla-daemon
@ 2023-09-30 19:47 ` bugzilla-daemon
  2023-10-21 14:29 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2023-09-30 19:47 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

--- Comment #96 from KC (kcohar@gmail.com) ---
The default is now -1.
https://unix.stackexchange.com/questions/756281/kernel-6-5-2-seems-to-have-amdgpu-crash-on-no-retry-page-fault
https://www.kernel.org/doc/html/v6.5/gpu/amdgpu/module-parameters.html

I set it to zero and I haven't had a single crash since (Fedora 39 beta,
Linux 6.5.5).
The change to this one parameter's default had made my system entirely unusable
(it would crash very quickly after booting).
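
A small sketch for seeing which value the loaded driver is actually using.
Assumptions, labeled as such: that the module exposes the parameter under
/sys/module/amdgpu/parameters/mcbp, and that -1 means "auto", which is how I
read the v6.5 module-parameter documentation linked above. The names
describe_mcbp and read_mcbp are made up for this example.

```python
from pathlib import Path
from typing import Optional

# Assumed location; it only exists if the driver exports the parameter.
MCBP_SYSFS = Path("/sys/module/amdgpu/parameters/mcbp")

# Meanings as I understand the v6.5 documentation (treat as an assumption).
MEANINGS = {
    -1: "auto (default as of 6.5)",
    0: "disabled",
    1: "enabled",
}


def describe_mcbp(value: int) -> str:
    """Translate a raw mcbp value into a short description."""
    return MEANINGS.get(value, f"unrecognized value {value}")


def read_mcbp(path: Path = MCBP_SYSFS) -> Optional[int]:
    """Return the current mcbp value, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None
```

If read_mcbp() returns -1 on a 6.5 kernel, the driver is choosing for itself,
which would explain why setting 0 explicitly changes behaviour even though
older documentation lists 0 as the default.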



* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (95 preceding siblings ...)
  2023-09-30 19:47 ` bugzilla-daemon
@ 2023-10-21 14:29 ` bugzilla-daemon
  2023-10-22 17:35 ` bugzilla-daemon
  2023-10-23 17:22 ` bugzilla-daemon
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2023-10-21 14:29 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

jeremy boyd (jer@jerdboyd.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jer@jerdboyd.com

--- Comment #97 from jeremy boyd (jer@jerdboyd.com) ---
Hello, I'm having this same issue with my ThinkPad Z16 laptop (Ryzen 6850H and
Radeon RX 6500M graphics card).

I do not use the laptop for gaming but for audio and video editing. I have not
had trouble with any video editing software, but I can easily reproduce the
issue by loading up Ardour or Mixbus32C and either leaving it alone or working.
After 15 minutes the screen freezes, although audio will continue for a time. At
this point Ardour or Mixbus will close and I can continue using the machine. If
I load up either program again it will fail again, usually within a couple of
minutes, and the whole laptop will freeze up until I press ctrl-alt-F2 to get to
a terminal prompt.

The issue always happens when I'm recording audio with an HDMI device attached,
and 90% of the time without HDMI.

I will try setting the kernel parameter amdgpu.mcbp=0 and report back.


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (96 preceding siblings ...)
  2023-10-21 14:29 ` bugzilla-daemon
@ 2023-10-22 17:35 ` bugzilla-daemon
  2023-10-23 17:22 ` bugzilla-daemon
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2023-10-22 17:35 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

--- Comment #98 from jeremy boyd (jer@jerdboyd.com) ---
(In reply to jeremy boyd from comment #97)

I can confirm that this did not solve my problem. I tested my system for
several hours with no issue and thought that perhaps it had been solved, but
while doing a LibreOffice presentation with my audio software running it
happened again. Here is the error from journalctl:

Oct 22 09:40:01 fedora kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=433823, emitted seq=433825
Oct 22 09:40:01 fedora kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 2189 thread Xorg:cs0 pid 2319
Oct 22 09:40:01 fedora kernel: amdgpu 0000:67:00.0: amdgpu: GPU reset begin!
Oct 22 09:40:02 fedora kernel: amdgpu 0000:67:00.0: amdgpu: MODE2 reset
Oct 22 09:40:02 fedora kernel: amdgpu 0000:67:00.0: amdgpu: GPU reset succeeded, trying to resume


* [Bug 201957] amdgpu: ring gfx timeout
  2018-12-11  4:52 [Bug 201957] New: amdgpu: ring gfx timeout bugzilla-daemon
                   ` (97 preceding siblings ...)
  2023-10-22 17:35 ` bugzilla-daemon
@ 2023-10-23 17:22 ` bugzilla-daemon
  98 siblings, 0 replies; 100+ messages in thread
From: bugzilla-daemon @ 2023-10-23 17:22 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=201957

Mario Limonciello (AMD) (mario.limonciello@amd.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mario.limonciello@amd.com

--- Comment #99 from Mario Limonciello (AMD) (mario.limonciello@amd.com) ---
(In reply to comment #98)

amdgpu.mcbp=0 will only help GFX9 products. For GFX10 this is a different
problem; please open an issue at the AMD GitLab.
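
For readers unsure which case they are in: some of the logs earlier in this
thread name the graphics IP block directly ("resume of IP block <gfx_v9_0>
failed -110"). The sketch below extracts that version from a log excerpt. It is
an illustration only; the function names are made up, and treating a
gfx_v9_* block name as equivalent to "GFX9 product" is my assumption. Logs
that never print the IP block name give no answer either way.

```python
import re
from typing import Optional

# Matches e.g. "IP block <gfx_v9_0>"; \s+ tolerates wrapped log lines.
_IP_BLOCK_RE = re.compile(r"IP\s+block\s+<gfx_v(\d+)_(\d+)>")


def gfx_major_from_log(text: str) -> Optional[int]:
    """Return the major version of the gfx IP block named in text, if any."""
    m = _IP_BLOCK_RE.search(text)
    return int(m.group(1)) if m else None


def mcbp_workaround_may_apply(text: str) -> Optional[bool]:
    """True for GFX9, False for other generations, None if the log is silent.

    Assumption: a gfx_v9_* block name identifies a GFX9 product.
    """
    major = gfx_major_from_log(text)
    if major is None:
        return None
    return major == 9
```

A result of None only means the excerpt did not include the IP block line; in
that case the hardware generation has to be looked up some other way before
deciding where to report.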


