* [Bug 204683] New: amdgpu: ring sdma0 timeout
@ 2019-08-24  9:28 bugzilla-daemon
  2019-08-25  8:44 ` [Bug 204683] " bugzilla-daemon
                   ` (12 more replies)
  0 siblings, 13 replies; 14+ messages in thread
From: bugzilla-daemon @ 2019-08-24  9:28 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=204683

            Bug ID: 204683
           Summary: amdgpu: ring sdma0 timeout
           Product: Drivers
           Version: 2.5
    Kernel Version: 5.3.0-rc5
          Hardware: x86-64
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: Video(DRI - non Intel)
          Assignee: drivers_video-dri@kernel-bugs.osdl.org
          Reporter: mh@familie-heinz.name
        Regression: No

Hi,

When playing some games I randomly (sometimes after 5 minutes, sometimes after
2 hours) get a blank screen. Sometimes audio still works, sometimes the whole
system locks up. I've seen this with Rise of the Tomb Raider and 7 Days to Die
so far.

I finally managed to sync the log files to disk and capture an error before the
whole thing locked up:

Aug 24 11:13:33 egalite kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring
sdma0 timeout, signaled seq=368056, emitted seq=368057
Aug 24 11:13:33 egalite kernel: [drm:drm_atomic_helper_wait_for_flip_done
[drm_kms_helper]] *ERROR* [CRTC:47:crtc-0] flip_done timed out
Aug 24 11:13:33 egalite kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR*
Process information: process 7DaysToDie.x86_ pid 8108 thread 7DaysToDie:cs0
Aug 24 11:13:33 egalite kernel: amdgpu 0000:0c:00.0: GPU reset begin!
Aug 24 11:13:33 egalite kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring
gfx timeout, but soft recovered

Only a hard reset made me recover from that.
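
The ring name and the signaled and emitted sequence numbers can be pulled out
of log lines like these with a short script. This is a minimal sketch and not
part of the original report: the function name `find_ring_timeouts` and the
regular expression are illustrative assumptions based only on the message
format shown above. The pattern tolerates the line wrapping introduced by the
mail.

```python
import re

# Matches e.g. "ring sdma0 timeout, signaled seq=368056, emitted seq=368057".
# \s+ is used between tokens so that a line break inside the message is fine.
RING_TIMEOUT = re.compile(
    r"ring\s+(?P<ring>\S+)\s+timeout,\s+signaled\s+seq=(?P<signaled>\d+),"
    r"\s+emitted\s+seq=(?P<emitted>\d+)"
)


def find_ring_timeouts(log_text):
    """Return a list of (ring, signaled_seq, emitted_seq) tuples."""
    return [
        (m.group("ring"), int(m.group("signaled")), int(m.group("emitted")))
        for m in RING_TIMEOUT.finditer(log_text)
    ]


# Sample copied from the report, including the mail's line wrap.
sample = (
    "Aug 24 11:13:33 egalite kernel: [drm:amdgpu_job_timedout [amdgpu]] "
    "*ERROR* ring\n"
    "sdma0 timeout, signaled seq=368056, emitted seq=368057\n"
    "Aug 24 11:13:33 egalite kernel: [drm:amdgpu_job_timedout [amdgpu]] "
    "*ERROR* ring\n"
    "gfx timeout, but soft recovered\n"
)

for ring, signaled, emitted in find_ring_timeouts(sample):
    print(ring, signaled, emitted, emitted - signaled)
```

In the excerpt above the two numbers differ by one, i.e. one emitted job had
not been signaled when the timeout fired. The "soft recovered" line carries no
sequence numbers, so this pattern skips it.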


This is with a self-built kernel 5.3.0-rc5. Also happens with 5.2.1.
Mesa: 19.1.4-1
GPU: Vega 56

Best
Matthias

-- 
You are receiving this mail because:
You are watching the assignee of the bug.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


* [Bug 204683] amdgpu: ring sdma0 timeout
  2019-08-24  9:28 [Bug 204683] New: amdgpu: ring sdma0 timeout bugzilla-daemon
@ 2019-08-25  8:44 ` bugzilla-daemon
  2019-08-26 15:25 ` bugzilla-daemon
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: bugzilla-daemon @ 2019-08-25  8:44 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=204683

--- Comment #1 from Matthias Heinz (mh@familie-heinz.name) ---
So I tried the oldest installed kernel (a 4.19) I could find and didn't have
any problems with it. This seems to be a regression.


* [Bug 204683] amdgpu: ring sdma0 timeout
  2019-08-24  9:28 [Bug 204683] New: amdgpu: ring sdma0 timeout bugzilla-daemon
  2019-08-25  8:44 ` [Bug 204683] " bugzilla-daemon
@ 2019-08-26 15:25 ` bugzilla-daemon
  2019-08-26 16:25 ` bugzilla-daemon
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: bugzilla-daemon @ 2019-08-26 15:25 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=204683

Alex Deucher (alexdeucher@gmail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |alexdeucher@gmail.com

--- Comment #2 from Alex Deucher (alexdeucher@gmail.com) ---
Can you bisect?


* [Bug 204683] amdgpu: ring sdma0 timeout
  2019-08-24  9:28 [Bug 204683] New: amdgpu: ring sdma0 timeout bugzilla-daemon
  2019-08-25  8:44 ` [Bug 204683] " bugzilla-daemon
  2019-08-26 15:25 ` bugzilla-daemon
@ 2019-08-26 16:25 ` bugzilla-daemon
  2019-08-28 19:40 ` bugzilla-daemon
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: bugzilla-daemon @ 2019-08-26 16:25 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=204683

--- Comment #3 from Matthias Heinz (mh@familie-heinz.name) ---
Already on it. It seems to be somewhere between 5.0.2 and 5.1.21.

This will take a while. It can take some hours to trigger it...


* [Bug 204683] amdgpu: ring sdma0 timeout
  2019-08-24  9:28 [Bug 204683] New: amdgpu: ring sdma0 timeout bugzilla-daemon
                   ` (2 preceding siblings ...)
  2019-08-26 16:25 ` bugzilla-daemon
@ 2019-08-28 19:40 ` bugzilla-daemon
  2019-08-28 19:41 ` bugzilla-daemon
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: bugzilla-daemon @ 2019-08-28 19:40 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=204683

--- Comment #4 from Matthias Heinz (mh@familie-heinz.name) ---
I still have 11 steps to go (as I mentioned, it's a pretty lengthy task), but I
got some more debug output before the system stopped working. Please see the
attached file, maybe it has some clues about what's going wrong.
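
Bisection halves the candidate range at each step, so the number of remaining
steps grows with the base-2 logarithm of the number of candidate commits. A
rough sketch of that arithmetic follows. The function names and the per-test
duration are hypothetical; the thread only says that a test can take some
hours.

```python
def bisect_steps(candidate_commits: int) -> int:
    """Worst-case number of kernels to test to isolate one commit."""
    if candidate_commits < 1:
        raise ValueError("need at least one candidate commit")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1 and avoids
    # floating-point rounding.
    return (candidate_commits - 1).bit_length()


def worst_case_hours(candidate_commits: int, hours_per_test: float) -> float:
    """Upper bound on testing time, ignoring build and reboot time."""
    return bisect_steps(candidate_commits) * hours_per_test


# 11 remaining steps correspond to a range of up to 2**11 = 2048 commits.
steps = bisect_steps(2048)
# hours_per_test=2.0 is a made-up figure for illustration only.
total = worst_case_hours(2048, hours_per_test=2.0)
print(steps, total)
```

The estimate makes clear why a reproducer that triggers within minutes matters
far more than shrinking the commit range by a small factor.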


* [Bug 204683] amdgpu: ring sdma0 timeout
  2019-08-24  9:28 [Bug 204683] New: amdgpu: ring sdma0 timeout bugzilla-daemon
                   ` (3 preceding siblings ...)
  2019-08-28 19:40 ` bugzilla-daemon
@ 2019-08-28 19:41 ` bugzilla-daemon
  2019-09-05 12:08 ` bugzilla-daemon
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: bugzilla-daemon @ 2019-08-28 19:41 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=204683

--- Comment #5 from Matthias Heinz (mh@familie-heinz.name) ---
Created attachment 284667
  --> https://bugzilla.kernel.org/attachment.cgi?id=284667&action=edit
Kernel trace


* [Bug 204683] amdgpu: ring sdma0 timeout
  2019-08-24  9:28 [Bug 204683] New: amdgpu: ring sdma0 timeout bugzilla-daemon
                   ` (4 preceding siblings ...)
  2019-08-28 19:41 ` bugzilla-daemon
@ 2019-09-05 12:08 ` bugzilla-daemon
  2019-09-06 20:22 ` bugzilla-daemon
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: bugzilla-daemon @ 2019-09-05 12:08 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=204683

--- Comment #6 from Matthias Heinz (mh@familie-heinz.name) ---
I had to switch to drm-next to do further bisecting and I think
634092b1b9f67bea23a87b77880df5e8012a411a is causing the problem.

I might be wrong though.


* [Bug 204683] amdgpu: ring sdma0 timeout
  2019-08-24  9:28 [Bug 204683] New: amdgpu: ring sdma0 timeout bugzilla-daemon
                   ` (5 preceding siblings ...)
  2019-09-05 12:08 ` bugzilla-daemon
@ 2019-09-06 20:22 ` bugzilla-daemon
  2019-09-07 13:44 ` bugzilla-daemon
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: bugzilla-daemon @ 2019-09-06 20:22 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=204683

--- Comment #7 from Soeren Grunewald (soeren.grunewald@gmx.net) ---
Created attachment 284869
  --> https://bugzilla.kernel.org/attachment.cgi?id=284869&action=edit
Kernel trace

It seems I have the same issue. I am running Fedora 30 with testing-updates
enabled. The GPU is a Sapphire Pulse RX 56.


* [Bug 204683] amdgpu: ring sdma0 timeout
  2019-08-24  9:28 [Bug 204683] New: amdgpu: ring sdma0 timeout bugzilla-daemon
                   ` (6 preceding siblings ...)
  2019-09-06 20:22 ` bugzilla-daemon
@ 2019-09-07 13:44 ` bugzilla-daemon
  2019-09-12 12:51 ` bugzilla-daemon
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: bugzilla-daemon @ 2019-09-07 13:44 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=204683

--- Comment #8 from Matthias Heinz (mh@familie-heinz.name) ---
I was wrong. 2c3cd66f4c66, which is the predecessor of 634092b1b9f6, just
crashed on me. Well, back to the drawing board... (eh, bisecting)


* [Bug 204683] amdgpu: ring sdma0 timeout
  2019-08-24  9:28 [Bug 204683] New: amdgpu: ring sdma0 timeout bugzilla-daemon
                   ` (7 preceding siblings ...)
  2019-09-07 13:44 ` bugzilla-daemon
@ 2019-09-12 12:51 ` bugzilla-daemon
  2019-09-12 22:18 ` bugzilla-daemon
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: bugzilla-daemon @ 2019-09-12 12:51 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=204683

--- Comment #9 from Matthias Heinz (mh@familie-heinz.name) ---
A small update.

I managed to go down even further. I'm currently at e6d2421343a7 in drm-next
and I see the following error:

Sep 12 14:32:44 egalite kernel: [drm:drm_atomic_helper_wait_for_flip_done
[drm_kms_helper]] *ERROR* [CRTC:47:crtc-0] flip_done timed out
Sep 12 14:32:44 egalite kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring
gfx timeout, signaled seq=1023042, emitted seq=1023043
Sep 12 14:32:44 egalite kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR*
Process information: process 7DaysToDie.x86_ pid 6696 thread 7DaysToDie:cs0 pid
6698

Now it looks a lot like #201957, but I have no problems with 5.0 and earlier
kernels. It started with 5.1. So I'm not sure how similar it is.


I have one last idea of what to do. The commit before e6d2421343a7 results in a
similar problem, but the display doesn't go blank and into standby. Only the
picture freezes and that's it.
I will try to find the commit that introduces this freeze and then see whether
the kernel at the commit before that one still has my main problem in it. If
not, I'll post the range; the error itself is probably hidden somewhere in
between. Otherwise testing is not possible, since it freezes pretty fast and
the ring timeout bug takes up to 45 minutes to appear.

Since I'm at my 40th kernel so far, any help or even a hint is highly
appreciated. (I could use a faster testing solution.)


* [Bug 204683] amdgpu: ring sdma0 timeout
  2019-08-24  9:28 [Bug 204683] New: amdgpu: ring sdma0 timeout bugzilla-daemon
                   ` (8 preceding siblings ...)
  2019-09-12 12:51 ` bugzilla-daemon
@ 2019-09-12 22:18 ` bugzilla-daemon
  2019-10-05  9:25 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: bugzilla-daemon @ 2019-09-12 22:18 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=204683

--- Comment #10 from Matthias Heinz (mh@familie-heinz.name) ---
Created attachment 284945
  --> https://bugzilla.kernel.org/attachment.cgi?id=284945&action=edit
Kernel trace

Update 2 for today.

With de00d253bc85, which is the predecessor of e6d2421343a7, I get this kernel
bug.

I have never seen this one after de00d253bc85, so my guess is that e6d2421343a7
fixes it partially.

I will now start a bisect between the last known good kernel and de00d253bc85
and try to figure out when this one was introduced. (Back to kernel baking
hell...)


* [Bug 204683] amdgpu: ring sdma0 timeout
  2019-08-24  9:28 [Bug 204683] New: amdgpu: ring sdma0 timeout bugzilla-daemon
                   ` (9 preceding siblings ...)
  2019-09-12 22:18 ` bugzilla-daemon
@ 2019-10-05  9:25 ` bugzilla-daemon
  2019-10-11 15:02 ` bugzilla-daemon
  2019-10-14 17:18 ` bugzilla-daemon
  12 siblings, 0 replies; 14+ messages in thread
From: bugzilla-daemon @ 2019-10-05  9:25 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=204683

--- Comment #11 from Matthias Heinz (mh@familie-heinz.name) ---
The bug is still present in Linux 5.3.

Also, I'm not done bisecting yet. But the older kernels seem to have nasty fs
bugs and I'm not sure if I'm really willing to put my data on the line for
this. Is there really no other way to figure out where this originates from?


* [Bug 204683] amdgpu: ring sdma0 timeout
  2019-08-24  9:28 [Bug 204683] New: amdgpu: ring sdma0 timeout bugzilla-daemon
                   ` (10 preceding siblings ...)
  2019-10-05  9:25 ` bugzilla-daemon
@ 2019-10-11 15:02 ` bugzilla-daemon
  2019-10-14 17:18 ` bugzilla-daemon
  12 siblings, 0 replies; 14+ messages in thread
From: bugzilla-daemon @ 2019-10-11 15:02 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=204683

--- Comment #12 from Matthias Heinz (mh@familie-heinz.name) ---
My last update, because I have no way to go forward from here on.

This bug seems to go back much further than I initially thought. I'm currently
at "drm-fixes-2018-08-31" in linux-drm and it's already in there, so it's
probably pretty old.

I can't use any older kernel, because I need Steam to run the games to test
this. But Steam won't work with anything older than 4.19.

BUT I found a game that almost instantly triggers this bug on startup:
Insurgency. 

Just start it and if that doesn't trigger it immediately, quit the game and
start it again. It can take two to three attempts, and joining a match helps
too, but each test takes less than 5 minutes.
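
The reproduction recipe above can be turned into a simple rule for marking a
kernel during a bisect. The following is only a sketch under stated
assumptions: `bisect_verdict` and its parameters are hypothetical names, the
default of three launches is taken from the "two to three times" in the text,
and because the machine may lock up hard, the log text would have to come from
a log that was synced to disk before the hang.

```python
import re

# Matches amdgpu ring timeout errors such as
# "*ERROR* ring sdma0 timeout, ..." even if the mail wrapped the line.
RING_TIMEOUT = re.compile(r"\*ERROR\*\s+ring\s+\S+\s+timeout")


def bisect_verdict(kernel_log: str, launches_done: int,
                   launches_needed: int = 3) -> str:
    """Classify the kernel under test as 'bad', 'good' or 'retry'."""
    if RING_TIMEOUT.search(kernel_log):
        # Any ring timeout marks this kernel as bad.
        return "bad"
    if launches_done >= launches_needed:
        # Enough clean launches: treat the kernel as good.
        return "good"
    # Not enough evidence yet: launch the game again.
    return "retry"


log = (
    "[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring\n"
    "gfx timeout, signaled seq=1023042, emitted seq=1023043\n"
)
print(bisect_verdict(log, launches_done=1))
print(bisect_verdict("", launches_done=1))
print(bisect_verdict("", launches_done=3))
```

A "good" verdict from this rule is only probabilistic, since the bug is
intermittent; raising `launches_needed` trades test time for confidence.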

So, please go ahead and fix this already, it's annoying.


* [Bug 204683] amdgpu: ring sdma0 timeout
  2019-08-24  9:28 [Bug 204683] New: amdgpu: ring sdma0 timeout bugzilla-daemon
                   ` (11 preceding siblings ...)
  2019-10-11 15:02 ` bugzilla-daemon
@ 2019-10-14 17:18 ` bugzilla-daemon
  12 siblings, 0 replies; 14+ messages in thread
From: bugzilla-daemon @ 2019-10-14 17:18 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=204683

Matthias Heinz (mh@familie-heinz.name) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |DUPLICATE

--- Comment #13 from Matthias Heinz (mh@familie-heinz.name) ---


*** This bug has been marked as a duplicate of bug 201957 ***

