From: bugzilla-daemon@freedesktop.org
To: dri-devel@lists.freedesktop.org
Subject: [Bug 111481] AMD Navi GPU frequent freezes on both Manjaro/Ubuntu with kernel 5.3 and mesa 19.2 -git/llvm9
Date: Tue, 10 Sep 2019 21:02:44 +0000

https://bugs.freedesktop.org/show_bug.cgi?id=111481

Comment #36 on bug 111481 from Sebastian Meyer
Created attachment 145326 (https://bugs.freedesktop.org/attachment.cgi?id=145326&action=edit)
umr output of sdma0/sdma1 after RotTR freeze

Applied the provided WIP patch to linux-mainline 5.3-rc8 and started Rise of the
Tomb Raider (RotTR) again to trigger a system freeze.
This time I also got ring timeouts on sdma0 and sdma1:

[  632.175837] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out or interrupted!
[  632.175973] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out or interrupted!
[  637.299049] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=313757, emitted seq=313759
[  637.299110] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process RiseOfTheTombRa pid 2584 thread RiseOfTheT:cs0 pid 2590
[  637.299111] [drm] GPU recovery disabled.
[  646.468871] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=278259, emitted seq=278263
[  646.468961] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, signaled seq=21116, emitted seq=21119
[  646.469052] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
[  646.469141] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process plasmashell pid 989 thread plasmashel:cs0 pid 1155
[  646.469141] [drm] GPU recovery disabled.
[  646.469142] [drm] GPU recovery disabled.
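Each ring timeout line reports a "signaled" and an "emitted" sequence number. Reading signaled seq as the last fence the GPU completed and emitted seq as the last fence the driver submitted to that ring, the difference is the number of submissions still outstanding when the timeout fired. The sketch below is a minimal, hypothetical helper (`pending_fences`, `LOG`, and `RING_TIMEOUT` are illustrative names, not part of amdgpu or umr). It assumes the dmesg line format shown above and ignores 32-bit sequence wraparound.

```python
import re

# Ring-timeout lines copied from the dmesg excerpt in this comment.
LOG = """\
[  637.299049] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=313757, emitted seq=313759
[  646.468871] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=278259, emitted seq=278263
[  646.468961] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, signaled seq=21116, emitted seq=21119
"""

# Matches "ring <name> timeout, signaled seq=<n>, emitted seq=<m>".
RING_TIMEOUT = re.compile(
    r"ring (\S+) timeout, signaled seq=(\d+), emitted seq=(\d+)"
)


def pending_fences(log: str) -> dict:
    """Map each timed-out ring name to (emitted seq - signaled seq).

    Assumption: the sequence counters did not wrap between the two values.
    """
    result = {}
    for match in RING_TIMEOUT.finditer(log):
        ring = match.group(1)
        signaled = int(match.group(2))
        emitted = int(match.group(3))
        result[ring] = emitted - signaled
    return result


if __name__ == "__main__":
    for ring, count in pending_fences(LOG).items():
        print(f"{ring}: {count} fence(s) emitted but not yet signaled")
```

On this excerpt it reports 2 outstanding fences on gfx_0.0.0, 4 on sdma0 and 3 on sdma1, i.e. all three rings had work queued that never completed.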

The stdout of `umr -R sdma0` and `umr -R sdma1` is attached to this post.
However, I also got a couple of stderr messages like "[ERROR]: No valid mapping
for 3@800000023f00", which I didn't include in the attached output.
