From mboxrd@z Thu Jan 1 00:00:00 1970
From: bugzilla-daemon@freedesktop.org
Subject: [Bug 111551] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1
timeout
Date: Tue, 03 Sep 2019 13:40:26 +0000
Message-ID:
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============0336710374=="
Return-path:
Received: from culpepper.freedesktop.org (culpepper.freedesktop.org
[131.252.210.165])
by gabe.freedesktop.org (Postfix) with ESMTP id A065B8954A
for ; Tue, 3 Sep 2019 13:40:26 +0000 (UTC)
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel"
To: dri-devel@lists.freedesktop.org
List-Id: dri-devel@lists.freedesktop.org
--===============0336710374==
Content-Type: multipart/alternative; boundary="15675180261.9916ad1e.13852"
Content-Transfer-Encoding: 7bit
--15675180261.9916ad1e.13852
Date: Tue, 3 Sep 2019 13:40:26 +0000
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://bugs.freedesktop.org/
Auto-Submitted: auto-generated
https://bugs.freedesktop.org/show_bug.cgi?id=111551
Bug ID: 111551
Summary: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1
timeout
Product: DRI
Version: XOrg git
Hardware: ARM
OS: Linux (All)
Status: NEW
Severity: major
Priority: not set
Component: DRM/AMDgpu
Assignee: dri-devel@lists.freedesktop.org
Reporter: 78666679@qq.com
The amdgpu (Polaris10, WX 5100) DRM driver sometimes reports:
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, signaled
seq=24423862, emitted seq=24423865
and many threads end up in the disk sleep (D, uninterruptible) state.
kernel version: 4.19.36
mesa: 18.3.6
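For readers triaging similar reports, the timeout line above can be picked apart mechanically. The following is only a minimal sketch, not part of the original report: the helper name parse_ring_timeout and the "pending" field are hypothetical, and it assumes that "signaled" is the last fence sequence number the ring completed and "emitted" is the last one submitted, so their difference is the number of jobs still outstanding when the timeout fired.

```python
import re

# Matches the amdgpu job-timeout message quoted in this report.
# \s+ is used between tokens so a line wrapped by the mailer still matches.
TIMEOUT_RE = re.compile(
    r"ring\s+(?P<ring>\S+)\s+timeout,\s+signaled\s+seq=(?P<signaled>\d+),"
    r"\s+emitted\s+seq=(?P<emitted>\d+)"
)


def parse_ring_timeout(line):
    """Return ring name and sequence numbers from a timeout line, or None.

    'pending' is a hypothetical convenience field: emitted minus signaled,
    read here (as an assumption) as the number of fences not yet completed.
    """
    match = TIMEOUT_RE.search(line)
    if match is None:
        return None
    signaled = int(match.group("signaled"))
    emitted = int(match.group("emitted"))
    return {
        "ring": match.group("ring"),
        "signaled": signaled,
        "emitted": emitted,
        "pending": emitted - signaled,
    }


if __name__ == "__main__":
    sample = (
        "[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, "
        "signaled seq=24423862, emitted seq=24423865"
    )
    print(parse_ring_timeout(sample))
```

Under that reading, the line in this report would correspond to three jobs outstanding on sdma1 when the timeout was reported.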
-- 
You are receiving this mail because:
You are the assignee for the bug.
--15675180261.9916ad1e.13852--
--===============0336710374==
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Content-Disposition: inline
X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs
IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz
dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVs
--===============0336710374==--
From mboxrd@z Thu Jan 1 00:00:00 1970
From: bugzilla-daemon@freedesktop.org
Subject: [Bug 111551] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1
timeout
Date: Tue, 03 Sep 2019 13:42:45 +0000
Message-ID:
References:
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============0150849971=="
Return-path:
Received: from culpepper.freedesktop.org (culpepper.freedesktop.org
[131.252.210.165])
by gabe.freedesktop.org (Postfix) with ESMTP id 23D86892BD
for ; Tue, 3 Sep 2019 13:42:47 +0000 (UTC)
In-Reply-To:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel"
To: dri-devel@lists.freedesktop.org
List-Id: dri-devel@lists.freedesktop.org
--===============0150849971==
Content-Type: multipart/alternative; boundary="15675181660.2caeaebC.13850"
Content-Transfer-Encoding: 7bit
--15675181660.2caeaebC.13850
Date: Tue, 3 Sep 2019 13:42:46 +0000
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://bugs.freedesktop.org/
Auto-Submitted: auto-generated
https://bugs.freedesktop.org/show_bug.cgi?id=111551
yanhua <78666679@qq.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |78666679@qq.com
--- Comment #1 from yanhua <78666679@qq.com> ---
Created attachment 145253
--> https://bugs.freedesktop.org/attachment.cgi?id=145253&action=edit
dmesg output
Output of "grep drm dmesg.txt". It contains the sdma1 ring timeouts.
--15675181660.2caeaebC.13850--
--===============0150849971==--
From mboxrd@z Thu Jan 1 00:00:00 1970
From: bugzilla-daemon@freedesktop.org
Subject: [Bug 111551] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1
timeout
Date: Wed, 04 Sep 2019 05:14:56 +0000
Message-ID:
References:
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============0536380396=="
Return-path:
Received: from culpepper.freedesktop.org (culpepper.freedesktop.org
[IPv6:2610:10:20:722:a800:ff:fe98:4b55])
by gabe.freedesktop.org (Postfix) with ESMTP id 71C58899B5
for ; Wed, 4 Sep 2019 05:14:56 +0000 (UTC)
In-Reply-To:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel"
To: dri-devel@lists.freedesktop.org
List-Id: dri-devel@lists.freedesktop.org
--===============0536380396==
Content-Type: multipart/alternative; boundary="15675740961.6Bf0.25462"
Content-Transfer-Encoding: 7bit
--15675740961.6Bf0.25462
Date: Wed, 4 Sep 2019 05:14:56 +0000
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://bugs.freedesktop.org/
Auto-Submitted: auto-generated
https://bugs.freedesktop.org/show_bug.cgi?id=111551
--- Comment #2 from yanhua <78666679@qq.com> ---
Created attachment 145260
--> https://bugs.freedesktop.org/attachment.cgi?id=145260&action=edit
Some messages in the previous dmesg.txt had been overwritten. dmesg-full.txt
contains more information.
--15675740961.6Bf0.25462--
--===============0536380396==--
From mboxrd@z Thu Jan 1 00:00:00 1970
From: bugzilla-daemon@freedesktop.org
Subject: [Bug 111551] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1
timeout
Date: Wed, 04 Sep 2019 11:45:27 +0000
Message-ID:
References:
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============0984323435=="
Return-path:
Received: from culpepper.freedesktop.org (culpepper.freedesktop.org
[131.252.210.165])
by gabe.freedesktop.org (Postfix) with ESMTP id A89438989C
for ; Wed, 4 Sep 2019 11:45:27 +0000 (UTC)
In-Reply-To:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel"
To: dri-devel@lists.freedesktop.org
List-Id: dri-devel@lists.freedesktop.org
--===============0984323435==
Content-Type: multipart/alternative; boundary="15675975272.7d7D0.773"
Content-Transfer-Encoding: 7bit
--15675975272.7d7D0.773
Date: Wed, 4 Sep 2019 11:45:27 +0000
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://bugs.freedesktop.org/
Auto-Submitted: auto-generated
https://bugs.freedesktop.org/show_bug.cgi?id=111551
--- Comment #3 from Christian König ---
As far as I can see this is a really large box with multiple GPUs installed.
The SDMA rarely locks up, especially not while executing page table updates. So
there is most likely something wrong with the hardware here.
Are you sure that the power supply is large enough for that system?
What system/platform is that? Could this be a coherency problem?
--15675975272.7d7D0.773--
--===============0984323435==--
From mboxrd@z Thu Jan 1 00:00:00 1970
From: bugzilla-daemon@freedesktop.org
Subject: [Bug 111551] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1
timeout
Date: Wed, 04 Sep 2019 12:26:49 +0000
Message-ID:
References:
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============1592578243=="
Return-path:
Received: from culpepper.freedesktop.org (culpepper.freedesktop.org
[IPv6:2610:10:20:722:a800:ff:fe98:4b55])
by gabe.freedesktop.org (Postfix) with ESMTP id 8EA6A89483
for ; Wed, 4 Sep 2019 12:26:49 +0000 (UTC)
In-Reply-To:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel"
To: dri-devel@lists.freedesktop.org
List-Id: dri-devel@lists.freedesktop.org
--===============1592578243==
Content-Type: multipart/alternative; boundary="15676000091.99bfc20a.7378"
Content-Transfer-Encoding: 7bit
--15676000091.99bfc20a.7378
Date: Wed, 4 Sep 2019 12:26:49 +0000
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://bugs.freedesktop.org/
Auto-Submitted: auto-generated
https://bugs.freedesktop.org/show_bug.cgi?id=111551
--- Comment #4 from yanhua <78666679@qq.com> ---
I asked our hardware team. They have tested it and are sure there is no
power supply problem.
The system is arm64 with 64 cores, and there are three amdgpu cards on the
board.
We see gfx, sdma, and vce ring timeouts, but only rarely. When a ring
timeout occurs, we can use umr, the tool supplied by AMD, to read the chip
registers. Can the real cause be determined from the register values?
As for the coherency problem you mentioned: if that were the cause, I think
the problem would occur more frequently, but I'm not sure.
--15676000091.99bfc20a.7378--
--===============1592578243==--
From mboxrd@z Thu Jan 1 00:00:00 1970
From: bugzilla-daemon@freedesktop.org
Subject: [Bug 111551] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1
timeout
Date: Wed, 04 Sep 2019 12:35:52 +0000
Message-ID:
References:
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============1059602337=="
Return-path:
Received: from culpepper.freedesktop.org (culpepper.freedesktop.org
[IPv6:2610:10:20:722:a800:ff:fe98:4b55])
by gabe.freedesktop.org (Postfix) with ESMTP id 2E38989A32
for ; Wed, 4 Sep 2019 12:35:52 +0000 (UTC)
In-Reply-To:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel"
To: dri-devel@lists.freedesktop.org
List-Id: dri-devel@lists.freedesktop.org
--===============1059602337==
Content-Type: multipart/alternative; boundary="15676005520.18E1125.7896"
Content-Transfer-Encoding: 7bit
--15676005520.18E1125.7896
Date: Wed, 4 Sep 2019 12:35:52 +0000
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://bugs.freedesktop.org/
Auto-Submitted: auto-generated
https://bugs.freedesktop.org/show_bug.cgi?id=111551
Christian König changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |INVALID
Status|NEW |RESOLVED
--- Comment #5 from Christian König ---
Until very recently, amdgpu was known not to work on arm64.
So it is not a surprise that this isn't working. Please switch to a newer
kernel and re-test.
Apart from that there isn't much we can do about it.
--15676005520.18E1125.7896--
--===============1059602337==--
From mboxrd@z Thu Jan 1 00:00:00 1970
From: bugzilla-daemon@freedesktop.org
Subject: [Bug 111551] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1
timeout
Date: Wed, 04 Sep 2019 12:50:15 +0000
Message-ID:
References:
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============1387536037=="
Return-path:
Received: from culpepper.freedesktop.org (culpepper.freedesktop.org
[IPv6:2610:10:20:722:a800:ff:fe98:4b55])
by gabe.freedesktop.org (Postfix) with ESMTP id D6D6F899B0
for ; Wed, 4 Sep 2019 12:50:14 +0000 (UTC)
In-Reply-To:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel"
To: dri-devel@lists.freedesktop.org
List-Id: dri-devel@lists.freedesktop.org
--===============1387536037==
Content-Type: multipart/alternative; boundary="15676014141.44b8f.11419"
Content-Transfer-Encoding: 7bit
--15676014141.44b8f.11419
Date: Wed, 4 Sep 2019 12:50:14 +0000
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://bugs.freedesktop.org/
Auto-Submitted: auto-generated
https://bugs.freedesktop.org/show_bug.cgi?id=111551
--- Comment #6 from yanhua <78666679@qq.com> ---
As far as I know, arm64 does not support write-combined (WC) memory, and we
have already changed the WC flag the same way newer kernel versions do.
--15676014141.44b8f.11419--
--===============1387536037==--