* [Bug 111551] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout
From: bugzilla-daemon @ 2019-09-03 13:40 UTC (permalink / raw)
  To: dri-devel



https://bugs.freedesktop.org/show_bug.cgi?id=111551

            Bug ID: 111551
           Summary: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1
                    timeout
           Product: DRI
           Version: XOrg git
          Hardware: ARM
                OS: Linux (All)
            Status: NEW
          Severity: major
          Priority: not set
         Component: DRM/AMDgpu
          Assignee: dri-devel@lists.freedesktop.org
          Reporter: 78666679@qq.com

The amdgpu DRM driver (Polaris10, WX 5100) sometimes reports:

[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, signaled
seq=24423862, emitted seq=24423865

and many threads go into the disk sleep (D) state.
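
For anyone trying to confirm the second symptom, here is a rough sketch (not part of the original report) that lists threads currently in the D (uninterruptible disk sleep) state by reading /proc/<pid>/task/<tid>/stat. The function names are made up for illustration, and it assumes a Linux /proc filesystem:

```python
import glob
from typing import List, Tuple


def parse_stat_state(stat_line: str) -> Tuple[int, str, str]:
    """Return (id, comm, state) from one /proc/.../stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split around the first '(' and the last ')'.
    open_idx = stat_line.index("(")
    close_idx = stat_line.rindex(")")
    ident = int(stat_line[:open_idx].strip())
    comm = stat_line[open_idx + 1:close_idx]
    state = stat_line[close_idx + 1:].split()[0]
    return ident, comm, state


def threads_in_disk_sleep() -> List[Tuple[int, str]]:
    """Scan /proc and return (tid, comm) for every thread in state 'D'."""
    blocked = []
    for path in glob.glob("/proc/[0-9]*/task/[0-9]*/stat"):
        try:
            with open(path) as f:
                tid, comm, state = parse_stat_state(f.read())
        except (OSError, ValueError, IndexError):
            continue  # thread exited or line was unreadable; skip it
        if state == "D":
            blocked.append((tid, comm))
    return blocked


if __name__ == "__main__":
    for tid, comm in threads_in_disk_sleep():
        print(tid, comm)
```

Running it a few times while the ring timeout is being reported would show whether the same threads stay blocked.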


kernel version: 4.19.36

mesa: 18.3.6

-- 
You are receiving this mail because:
You are the assignee for the bug.

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

* [Bug 111551] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout
From: bugzilla-daemon @ 2019-09-03 13:42 UTC (permalink / raw)
  To: dri-devel



https://bugs.freedesktop.org/show_bug.cgi?id=111551

yanhua <78666679@qq.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |78666679@qq.com

--- Comment #1 from yanhua <78666679@qq.com> ---
Created attachment 145253
  --> https://bugs.freedesktop.org/attachment.cgi?id=145253&action=edit
dmesg output

Run "grep drm dmesg.txt" on the attachment; it shows the sdma1 ring timeouts.
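
As a possible alternative to a plain grep, the sketch below pulls the ring name and the two sequence numbers out of each timeout line and reports how far apart they are. It is only an illustration: the regular expression is modelled on the single message quoted in this bug, the helper name is invented, and reading the difference as "fences emitted but not yet signaled" is my interpretation rather than something stated in the report.

```python
import re
from typing import Dict, List

# Pattern modelled on the one message quoted in this bug; \s+ tolerates the
# line wrapping that mail clients add.
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout,\s+signaled\s+seq=(?P<signaled>\d+),"
    r"\s+emitted\s+seq=(?P<emitted>\d+)"
)


def parse_ring_timeouts(log_text: str) -> List[Dict[str, object]]:
    """Return one dict per ring timeout message found in log_text."""
    results = []
    for m in TIMEOUT_RE.finditer(log_text):
        signaled = int(m.group("signaled"))
        emitted = int(m.group("emitted"))
        results.append({
            "ring": m.group("ring"),
            "signaled": signaled,
            "emitted": emitted,
            # Assumed meaning: fences emitted but not yet signaled.
            "pending": emitted - signaled,
        })
    return results


if __name__ == "__main__":
    with open("dmesg.txt") as f:
        for entry in parse_ring_timeouts(f.read()):
            print(entry)
```

For the line in the original report this gives ring sdma1 with a gap of 3 between the emitted and signaled sequence numbers.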

* [Bug 111551] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout
From: bugzilla-daemon @ 2019-09-04  5:14 UTC (permalink / raw)
  To: dri-devel



https://bugs.freedesktop.org/show_bug.cgi?id=111551

--- Comment #2 from yanhua <78666679@qq.com> ---
Created attachment 145260
  --> https://bugs.freedesktop.org/attachment.cgi?id=145260&action=edit
Some messages in the previous dmesg.txt had been overwritten. The attached
dmesg-full.txt contains more information.

* [Bug 111551] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout
From: bugzilla-daemon @ 2019-09-04 11:45 UTC (permalink / raw)
  To: dri-devel



https://bugs.freedesktop.org/show_bug.cgi?id=111551

--- Comment #3 from Christian König <christian.koenig@amd.com> ---
As far as I can see this is a really large box with multiple GPUs installed.

The SDMA rarely locks up, especially not while executing page table updates. So
there is most likely something wrong with the hardware here.

Are you sure that the power supply is large enough for that system?

What system/platform is that? Could this be a coherency problem?

* [Bug 111551] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout
From: bugzilla-daemon @ 2019-09-04 12:26 UTC (permalink / raw)
  To: dri-devel



https://bugs.freedesktop.org/show_bug.cgi?id=111551

--- Comment #4 from yanhua <78666679@qq.com> ---
I have asked our hardware team; they have tested it and are sure there is no
power supply problem.


The system is arm64 with 64 cores, and there are three amdgpu cards on the
board.


gfx, sdma, and vce ring timeouts occur, but rarely. When a ring timeout occurs,
we can use umr, the tool supplied by AMD, to read the chip registers. Can we
determine the real cause from the register values?

As for the coherency problem you mentioned: if that were the cause, I think the
problem should occur more frequently. I'm not sure.

* [Bug 111551] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout
From: bugzilla-daemon @ 2019-09-04 12:35 UTC (permalink / raw)
  To: dri-devel



https://bugs.freedesktop.org/show_bug.cgi?id=111551

Christian König <christian.koenig@amd.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |INVALID
             Status|NEW                         |RESOLVED

--- Comment #5 from Christian König <christian.koenig@amd.com> ---
Until very recently, amdgpu was known not to work on arm64.

So it is not a surprise that this isn't working. Please switch to a newer
kernel and re-test.

Apart from that there isn't much we can do about it.

* [Bug 111551] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout
From: bugzilla-daemon @ 2019-09-04 12:50 UTC (permalink / raw)
  To: dri-devel



https://bugs.freedesktop.org/show_bug.cgi?id=111551

--- Comment #6 from yanhua <78666679@qq.com> ---
As far as I know, arm64 does not support WC (write-combined) memory, and we
have already set the WC flag the same way newer kernel versions do.
