From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Sun, 20 Aug 2017 22:53:09 +0000 Message-ID: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============1033773675==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [IPv6:2610:10:20:722:a800:ff:fe98:4b55]) by gabe.freedesktop.org (Postfix) with ESMTP id 6A5286E045 for ; Sun, 20 Aug 2017 22:53:09 +0000 (UTC) List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============1033773675== Content-Type: multipart/alternative; boundary="15032695890.efBB.32196"; charset="UTF-8" --15032695890.efBB.32196 Date: Sun, 20 Aug 2017 22:53:09 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 Bug ID: 102322 Summary: System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! 
Product: DRI Version: DRI git Hardware: x86-64 (AMD64) OS: Linux (All) Status: NEW Severity: critical Priority: medium Component: DRM/AMDgpu Assignee: dri-devel@lists.freedesktop.org Reporter: jb5sgc1n.nya@20mm.eu I consistently experience complete system crashes when browsing web pages u= sing firefox for about 30 minutes, with the following dmesg output from the amdg= pu driver: [ 2330.720711] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeou= t, last signaled seq=3D40778, last emitted seq=3D40780 [ 2330.720768] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeou= t, last signaled seq=3D31305, last emitted seq=3D31306 [ 2330.720771] [drm] IP block:gmc_v8_0 is hung! [ 2330.720774] [drm] IP block:gmc_v8_0 is hung! [ 2330.720775] [drm] IP block:sdma_v3_0 is hung! [ 2330.720778] [drm] IP block:sdma_v3_0 is hung! (Above cited messages are the last to make it to a network-filesystem by running "dmesg -w" before the system stops to do anything.) I am running a kernel compiled from https://cgit.freedesktop.org/~agd5f/linux/log/?h=3Damd-staging-drm-next as = of "commit 94097b0f7f1bfa54b3b1f8b0d74bbd271a0564e4" (so the very latest as of today). My GPU is an RX 460. Notice that this bug may be the same symptom as reported in https://bugs.freedesktop.org/show_bug.cgi?id=3D98874 However, the system crashes for me occur usually while vertically scrolling through some (ordinary) web page. --=20 You are receiving this mail because: You are the assignee for the bug.= --15032695890.efBB.32196 Date: Sun, 20 Aug 2017 22:53:09 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated
Bug ID: 102322
Summary: System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
Product: DRI
Version: DRI git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: NEW
Severity: critical
Priority: medium
Component: DRM/AMDgpu
Assignee: dri-devel@lists.freedesktop.org
Reporter: jb5sgc1n.nya@20mm.eu

I consistently experience complete system crashes when browsing web pages using
firefox for about 30 minutes, with the following dmesg output from the amdgpu
driver:

[ 2330.720711] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, last signaled seq=40778, last emitted seq=40780
[ 2330.720768] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, last signaled seq=31305, last emitted seq=31306
[ 2330.720771] [drm] IP block:gmc_v8_0 is hung!
[ 2330.720774] [drm] IP block:gmc_v8_0 is hung!
[ 2330.720775] [drm] IP block:sdma_v3_0 is hung!
[ 2330.720778] [drm] IP block:sdma_v3_0 is hung!

(The messages cited above are the last ones to reach a network filesystem via
"dmesg -w" before the system stops responding.)
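
For readers comparing several such logs, the ring-timeout lines can be summarized with a small script. The following is a minimal sketch and not part of the original report: the function name `parse_ring_timeouts` and the `pending` field are hypothetical, and it assumes the difference between the emitted and signaled sequence numbers is the number of fences still outstanding on that ring.

```python
import re

# Matches the amdgpu_job_timedout lines quoted in this report.
# \s+ after "timeout," tolerates lines that were wrapped by the mailer.
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout,\s+"
    r"last signaled seq=(?P<signaled>\d+),\s+"
    r"last emitted seq=(?P<emitted>\d+)"
)


def parse_ring_timeouts(log_text):
    """Return one dict per ring-timeout message found in log_text."""
    results = []
    for match in TIMEOUT_RE.finditer(log_text):
        signaled = int(match.group("signaled"))
        emitted = int(match.group("emitted"))
        results.append({
            "ring": match.group("ring"),
            "signaled": signaled,
            "emitted": emitted,
            # Assumption: emitted minus signaled = fences not yet signaled.
            "pending": emitted - signaled,
        })
    return results


# The two timeout lines from the report above.
SAMPLE = (
    "[ 2330.720711] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 "
    "timeout, last signaled seq=40778, last emitted seq=40780\n"
    "[ 2330.720768] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 "
    "timeout, last signaled seq=31305, last emitted seq=31306\n"
)

if __name__ == "__main__":
    for entry in parse_ring_timeouts(SAMPLE):
        print(entry["ring"], "pending:", entry["pending"])
```

Under that assumption, the sample shows two outstanding fences on sdma0 and one on sdma1 at the moment of the hang.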

I am running a kernel compiled from
https://cgit.freedesktop.org/~agd5f/linux/log/?h=amd-staging-drm-next as of
"commit 94097b0f7f1bfa54b3b1f8b0d74bbd271a0564e4" (so the very latest as of
today).
My GPU is an RX 460.

Note that this bug may show the same symptom as reported in
https://bugs.freedesktop.org/show_bug.cgi?id=98874

However, for me the system crashes usually occur while scrolling vertically
through some (ordinary) web page.


You are receiving this mail because:
  • You are the assignee for the bug.
= --15032695890.efBB.32196-- --===============1033773675== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============1033773675==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Sun, 19 Nov 2017 16:40:30 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============1977814054==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [IPv6:2610:10:20:722:a800:ff:fe98:4b55]) by gabe.freedesktop.org (Postfix) with ESMTP id 06FF16E03C for ; Sun, 19 Nov 2017 16:40:30 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============1977814054== Content-Type: multipart/alternative; boundary="15111096290.aA07f.7901"; charset="UTF-8" --15111096290.aA07f.7901 Date: Sun, 19 Nov 2017 16:40:29 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #1 from dwagner --- Sadly, not only did this bug not attract any attention, it also still occur= s, and seemingly even more frequent than before, on current bleeding-edge kern= els from amd-staging-drm-next, and also with the now current Firefox 57 and the= now current versions of Xorg, Mesa etc. from Arch Linux. 
--=20 You are receiving this mail because: You are the assignee for the bug.= --15111096290.aA07f.7901 Date: Sun, 19 Nov 2017 16:40:29 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment #1 on bug 102322 from dwagner
Sadly, not only did this bug not attract any attention, it also still occurs,
and seemingly even more frequently than before, on current bleeding-edge kernels
from amd-staging-drm-next, and also with the now current Firefox 57 and the now
current versions of Xorg, Mesa etc. from Arch Linux.


= --15111096290.aA07f.7901-- --===============1977814054== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============1977814054==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Sat, 24 Feb 2018 18:36:55 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============2039658316==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [IPv6:2610:10:20:722:a800:ff:fe98:4b55]) by gabe.freedesktop.org (Postfix) with ESMTP id DF18F6E00A for ; Sat, 24 Feb 2018 18:36:54 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============2039658316== Content-Type: multipart/alternative; boundary="15194974140.FFcB4.24107" Content-Transfer-Encoding: 7bit --15194974140.FFcB4.24107 Date: Sat, 24 Feb 2018 18:36:54 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #2 from dwagner --- Just to mention this once again: These system crashes still occur, and way = too frequently to consider the amdgpu driver stable enough for professional use. 
Sample dmesg output from today: Feb 24 18:26:55 [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=3D5430589, last emitted seq=3D5430591 Feb 24 18:26:55 [drm] IP block:gmc_v8_0 is hung! Feb 24 18:26:55 [drm] IP block:gfx_v8_0 is hung! Feb 24 18:27:02 [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeo= ut, last signaled seq=3D185928, last emitted seq=3D185930 Feb 24 18:27:02 [drm] IP block:gmc_v8_0 is hung! Feb 24 18:27:02 [drm] IP block:gfx_v8_0 is hung! Feb 24 18:27:05 [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:43:crtc= -0] hw_done or flip_done timed out --=20 You are receiving this mail because: You are the assignee for the bug.= --15194974140.FFcB4.24107 Date: Sat, 24 Feb 2018 18:36:54 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment #2 on bug 102322 from dwagner
Just to mention this once again: These system crashes still occur, and way too
frequently to consider the amdgpu driver stable enough for professional use.
Sample dmesg output from today:

Feb 24 18:26:55 [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=5430589, last emitted seq=5430591
Feb 24 18:26:55 [drm] IP block:gmc_v8_0 is hung!
Feb 24 18:26:55 [drm] IP block:gfx_v8_0 is hung!
Feb 24 18:27:02 [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, last signaled seq=185928, last emitted seq=185930
Feb 24 18:27:02 [drm] IP block:gmc_v8_0 is hung!
Feb 24 18:27:02 [drm] IP block:gfx_v8_0 is hung!
Feb 24 18:27:05 [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:43:crtc-0] hw_done or flip_done timed out


= --15194974140.FFcB4.24107-- --===============2039658316== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============2039658316==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Sun, 03 Jun 2018 21:00:01 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============1413377485==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [131.252.210.165]) by gabe.freedesktop.org (Postfix) with ESMTP id 3CDAB6E388 for ; Sun, 3 Jun 2018 21:00:01 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============1413377485== Content-Type: multipart/alternative; boundary="15280596010.651AB80b.11556" Content-Transfer-Encoding: 7bit --15280596010.651AB80b.11556 Date: Sun, 3 Jun 2018 21:00:01 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #3 from dwagner --- Just for the record, others have reported similar symptoms - here is a rece= nt example: https://bugs.freedesktop.org/show_bug.cgi?id=3D106666 --=20 You are receiving this mail because: You are the assignee for the bug.= --15280596010.651AB80b.11556 Date: Sun, 3 Jun 2018 21:00:01 +0000 MIME-Version: 1.0 Content-Type: 
text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment #3 on bug 102322 from dwagner
Just for the record, others have reported similar symptoms - here is a recent
example: https://bugs.freedesktop.org/show_bug.cgi?id=106666


= --15280596010.651AB80b.11556-- --===============1413377485== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============1413377485==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Sun, 03 Jun 2018 21:02:41 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============0216497497==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [131.252.210.165]) by gabe.freedesktop.org (Postfix) with ESMTP id 661996E38E for ; Sun, 3 Jun 2018 21:02:41 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============0216497497== Content-Type: multipart/alternative; boundary="15280597611.9de572c.11762" Content-Transfer-Encoding: 7bit --15280597611.9de572c.11762 Date: Sun, 3 Jun 2018 21:02:41 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #4 from dwagner --- I was asked in https://www.phoronix.com/forums/forum/linux-graphics-x-org-drivers/open-sou= rce-amd-linux/1027705-amdgpu-on-linux-4-18-to-offer-greater-vega-power-savi= ngs-displayport-1-4-fixes?p=3D1027933#post1027933 to mention here that I have experienced this kind of bug only when using the "new" display code 
(amdgpu.dc=3D1). I cannot strictly rule out that it could also happen with dc=3D0, since I h= ave tried dc=3D0 only for short periods occasionally, but during those periods = I did not see this kind of crash. --=20 You are receiving this mail because: You are the assignee for the bug.= --15280597611.9de572c.11762 Date: Sun, 3 Jun 2018 21:02:41 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment #4 on bug 102322 from dwagner
I was asked in
https://www.phoronix.com/forums/forum/linux-graphics-x-org-drivers/open-source-amd-linux/1027705-amdgpu-on-linux-4-18-to-offer-greater-vega-power-savings-displayport-1-4-fixes?p=1027933#post1027933
to mention here that I have experienced this kind of bug only when using the
"new" display code (amdgpu.dc=1).

I cannot strictly rule out that it could also happen with dc=0, since I have
tried dc=0 only for short periods occasionally, but during those periods I did
not see this kind of crash.
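
When comparing runs with dc=1 and dc=0, it helps to confirm which value the loaded driver is actually using. A minimal sketch, assuming the usual sysfs layout in which readable module parameters appear under /sys/module/<module>/parameters/; the helper name `read_module_param` and its `sysfs_root` argument are hypothetical and only there to make the function testable:

```python
from pathlib import Path


def read_module_param(param, module="amdgpu", sysfs_root="/sys/module"):
    """Return the current value of a kernel module parameter, or None.

    None is returned when the module is not loaded, the parameter is not
    exported through sysfs, or the file cannot be read.
    """
    path = Path(sysfs_root) / module / "parameters" / param
    try:
        return path.read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    # "dc" is the parameter set by amdgpu.dc=... on the kernel command line.
    print("amdgpu.dc =", read_module_param("dc"))
```

Recording this value next to each crash log makes it easier to tell later which display code path was active.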


= --15280597611.9de572c.11762-- --===============0216497497== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============0216497497==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Mon, 25 Jun 2018 21:43:03 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============2096761811==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [IPv6:2610:10:20:722:a800:ff:fe98:4b55]) by gabe.freedesktop.org (Postfix) with ESMTP id A71EF6E0C4 for ; Mon, 25 Jun 2018 21:43:03 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============2096761811== Content-Type: multipart/alternative; boundary="15299629830.Ebcfa0.16861" Content-Transfer-Encoding: 7bit --15299629830.Ebcfa0.16861 Date: Mon, 25 Jun 2018 21:43:03 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #5 from dwagner --- Just for the record: To rule out my personally compiled kernels are somehow "more buggy than what others compile", I tried the current Arch-Linux-suppl= ied Linux 4.17.2-1-ARCH kernel. 
Survives about 5 minutes of Firefox-browsing between crashes with: Jun 20 00:01:11 ryzen kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ri= ng sdma0 timeout, last signaled seq=3D1895, last em> Jun 20 00:01:11 ryzen kernel: [drm] IP block:gmc_v8_0 is hung! (4.13.* did at least survive days.) --=20 You are receiving this mail because: You are the assignee for the bug.= --15299629830.Ebcfa0.16861 Date: Mon, 25 Jun 2018 21:43:03 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment #5 on bug 102322 from dwagner
Just for the record: To rule out that my personally compiled kernels are somehow
"more buggy than what others compile", I tried the current Arch-Linux-supplied
Linux 4.17.2-1-ARCH kernel.

It survives about 5 minutes of Firefox browsing between crashes, with:

Jun 20 00:01:11 ryzen kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, last signaled seq=1895, last em>
Jun 20 00:01:11 ryzen kernel: [drm] IP block:gmc_v8_0 is hung!

(4.13.* did at least survive days.)


= --15299629830.Ebcfa0.16861-- --===============2096761811== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============2096761811==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Mon, 25 Jun 2018 22:11:14 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============1490146048==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [IPv6:2610:10:20:722:a800:ff:fe98:4b55]) by gabe.freedesktop.org (Postfix) with ESMTP id EC8676E1BC for ; Mon, 25 Jun 2018 22:11:14 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============1490146048== Content-Type: multipart/alternative; boundary="15299646741.dC411.23401" Content-Transfer-Encoding: 7bit --15299646741.dC411.23401 Date: Mon, 25 Jun 2018 22:11:14 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #6 from Andrey Grodzovsky --- Verify you are using latest AMD firmware and up to date MESA/LLVM Firmware here (amdgpu folder) - https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/ Andrey --=20 You are receiving this mail because: You are the assignee for the bug.= --15299646741.dC411.23401 Date: 
Mon, 25 Jun 2018 22:11:14 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment #6 on bug 102322 from Andrey Grodzovsky
Verify you are using latest AMD firmware and up to date MESA/LLVM

Firmware here (amdgpu folder) -
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/

Andrey


= --15299646741.dC411.23401-- --===============1490146048== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============1490146048==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Mon, 25 Jun 2018 23:08:10 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============1727039950==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [IPv6:2610:10:20:722:a800:ff:fe98:4b55]) by gabe.freedesktop.org (Postfix) with ESMTP id 24C526E1CD for ; Mon, 25 Jun 2018 23:08:10 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============1727039950== Content-Type: multipart/alternative; boundary="15299680900.F814.5144" Content-Transfer-Encoding: 7bit --15299680900.F814.5144 Date: Mon, 25 Jun 2018 23:08:10 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #7 from dwagner --- (In reply to Andrey Grodzovsky from comment #6) > Verify you are using latest AMD firmware and up to date MESA/LLVM Firmware: pacman -Q linux-firmware linux-firmware 20180606.d114732-1 ll /usr/lib/firmware/amdgpu/vega10_vce.bin -rw-r--r-- 1 root root 165344 Jun 7 08:01 
/usr/lib/firmware/amdgpu/vega10_vce.bin MESA: pacman -Q mesa mesa 18.1.2-1 LLVM: pacman -Q llvm-libs llvm-libs 6.0.0-4 Is this new enough? BTW: In a forum somebody asked what the dmesg output on crash looked like i= f I enabled amdgpu.gpu_recovery=3D1 - the result is a few lines more of output,= but still a fatal system crash: Jun 26 00:50:09 ryzen kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ri= ng gfx timeout, last signaled seq=3D12277, last emitted seq=3D12279 Jun 26 00:50:09 ryzen kernel: [drm] IP block:gmc_v8_0 is hung! Jun 26 00:50:09 ryzen kernel: [drm] IP block:gfx_v8_0 is hung! Jun 26 00:50:09 ryzen kernel: amdgpu 0000:0a:00.0: GPU reset begin! Jun 26 00:50:15 ryzen kernel: [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:42:crtc-0] flip_done timed out Jun 26 00:50:15 ryzen kernel: [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:42:crtc-0] flip_done timed out Jun 26 00:50:25 ryzen kernel: [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:40:plane-4] flip_done timed out --=20 You are receiving this mail because: You are the assignee for the bug.= --15299680900.F814.5144 Date: Mon, 25 Jun 2018 23:08:10 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment #7 on bug 102322 from dwagner
(In reply to Andrey Grodzovsky from comment #6)
> Verify you are using latest AMD firmware and up to date MESA/LLVM

Firmware:

pacman -Q linux-firmware
linux-firmware 20180606.d114732-1

ll  /usr/lib/firmware/amdgpu/vega10_vce.bin
-rw-r--r-- 1 root root 165344 Jun  7 08:01 /usr/lib/firmware/amdgpu/vega10_vce.bin


MESA:

pacman -Q mesa
mesa 18.1.2-1


LLVM:
pacman -Q llvm-libs
llvm-libs 6.0.0-4

Is this new enough?


BTW: In a forum somebody asked what the dmesg output on crash looked like if I
enabled amdgpu.gpu_recovery=1 - the result is a few lines more of output, but
still a fatal system crash:

Jun 26 00:50:09 ryzen kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=12277, last emitted seq=12279
Jun 26 00:50:09 ryzen kernel: [drm] IP block:gmc_v8_0 is hung!
Jun 26 00:50:09 ryzen kernel: [drm] IP block:gfx_v8_0 is hung!
Jun 26 00:50:09 ryzen kernel: amdgpu 0000:0a:00.0: GPU reset begin!
Jun 26 00:50:15 ryzen kernel: [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:42:crtc-0] flip_done timed out
Jun 26 00:50:15 ryzen kernel: [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:42:crtc-0] flip_done timed out
Jun 26 00:50:25 ryzen kernel: [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:40:plane-4] flip_done timed out


= --15299680900.F814.5144-- --===============1727039950== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============1727039950==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Tue, 26 Jun 2018 15:20:45 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============1679670518==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [131.252.210.165]) by gabe.freedesktop.org (Postfix) with ESMTP id 514E86E5AA for ; Tue, 26 Jun 2018 15:20:45 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============1679670518== Content-Type: multipart/alternative; boundary="15300264450.b3E37c.31284" Content-Transfer-Encoding: 7bit --15300264450.b3E37c.31284 Date: Tue, 26 Jun 2018 15:20:45 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #8 from Andrey Grodzovsky --- (In reply to dwagner from comment #7) > (In reply to Andrey Grodzovsky from comment #6) > > Verify you are using latest AMD firmware and up to date MESA/LLVM >=20 > Firmware: >=20 > pacman -Q linux-firmware > linux-firmware 20180606.d114732-1 >=20 > ll /usr/lib/firmware/amdgpu/vega10_vce.bin > -rw-r--r-- 1 root 
root 165344 Jun 7 08:01 > /usr/lib/firmware/amdgpu/vega10_vce.bin >=20 >=20 > MESA: >=20 > pacman -Q mesa > mesa 18.1.2-1 >=20 >=20 > LLVM: > pacman -Q llvm-libs > llvm-libs 6.0.0-4 >=20 > Is this new enough? The kernel and MESA seems new enough, LLVM is 6 so maybe you should try 7. The firmware also looks pretty late but I still would advise to manually override all firmware files with files from here https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git= /tree/amdgpu Just backup your existing firmware/amdgpu folder for any case. >=20 >=20 > BTW: In a forum somebody asked what the dmesg output on crash looked like= if > I enabled amdgpu.gpu_recovery=3D1 - the result is a few lines more of out= put, > but still a fatal system crash: >=20 > Jun 26 00:50:09 ryzen kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* > ring gfx timeout, last signaled seq=3D12277, last emitted seq=3D12279 > Jun 26 00:50:09 ryzen kernel: [drm] IP block:gmc_v8_0 is hung! > Jun 26 00:50:09 ryzen kernel: [drm] IP block:gfx_v8_0 is hung! > Jun 26 00:50:09 ryzen kernel: amdgpu 0000:0a:00.0: GPU reset begin! 
> Jun 26 00:50:15 ryzen kernel: [drm:drm_atomic_helper_wait_for_flip_done > [drm_kms_helper]] *ERROR* [CRTC:42:crtc-0] flip_done timed out > Jun 26 00:50:15 ryzen kernel: [drm:drm_atomic_helper_wait_for_dependencies > [drm_kms_helper]] *ERROR* [CRTC:42:crtc-0] flip_done timed out > Jun 26 00:50:25 ryzen kernel: [drm:drm_atomic_helper_wait_for_dependencies > [drm_kms_helper]] *ERROR* [PLANE:40:plane-4] flip_done timed out It's a know issue, try the patch I attached to resolve the deadlock , but y= ou will probably experience other failures after that anyway.=20 Andrey --=20 You are receiving this mail because: You are the assignee for the bug.= --15300264450.b3E37c.31284 Date: Tue, 26 Jun 2018 15:20:45 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment #8 on bug 102322 from Andrey Grodzovsky
(In reply to dwagner from comment #7)
> (In reply to Andrey Grodzovsky from comment #6)
> > Verify you are using latest AMD firmware and up to date MESA/LLVM
>
> Firmware:
>
> pacman -Q linux-firmware
> linux-firmware 20180606.d114732-1
>
> ll  /usr/lib/firmware/amdgpu/vega10_vce.bin
> -rw-r--r-- 1 root root 165344 Jun  7 08:01 /usr/lib/firmware/amdgpu/vega10_vce.bin
>
>
> MESA:
>
> pacman -Q mesa
> mesa 18.1.2-1
>
>
> LLVM:
> pacman -Q llvm-libs
> llvm-libs 6.0.0-4
>
> Is this new enough?

The kernel and Mesa seem new enough. LLVM is at 6, so maybe you should try 7.
The firmware also looks fairly recent, but I would still advise manually
overwriting all firmware files with the files from here:
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/amdgpu
Just back up your existing firmware/amdgpu folder first, just in case.
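
For the backup step, a minimal sketch might look like the following. The function name and the timestamped folder naming are illustrative assumptions, not part of any tool mentioned in this thread.

```python
import shutil
import time
from pathlib import Path


def backup_directory(src, backup_root):
    """Copy src into a new timestamped folder under backup_root and return its path."""
    src, backup_root = Path(src), Path(backup_root)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = backup_root / f"{src.name}-backup-{stamp}"
    # copytree refuses to write into an existing destination, so an
    # earlier backup cannot be overwritten by accident.
    shutil.copytree(src, dest)
    return dest
```

With the path from the `ll` output quoted above, a call could be `backup_directory("/usr/lib/firmware/amdgpu", "/root")`, where `/root` is only an example location.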

>
>
> BTW: In a forum somebody asked what the dmesg output on crash looked like if
> I enabled amdgpu.gpu_recovery=1 - the result is a few lines more of output,
> but still a fatal system crash:
>
> Jun 26 00:50:09 ryzen kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR*
> ring gfx timeout, last signaled seq=12277, last emitted seq=12279
> Jun 26 00:50:09 ryzen kernel: [drm] IP block:gmc_v8_0 is hung!
> Jun 26 00:50:09 ryzen kernel: [drm] IP block:gfx_v8_0 is hung!
> Jun 26 00:50:09 ryzen kernel: amdgpu 0000:0a:00.0: GPU reset begin!
> Jun 26 00:50:15 ryzen kernel: [drm:drm_atomic_helper_wait_for_flip_done
> [drm_kms_helper]] *ERROR* [CRTC:42:crtc-0] flip_done timed out
> Jun 26 00:50:15 ryzen kernel: [drm:drm_atomic_helper_wait_for_dependencies
> [drm_kms_helper]] *ERROR* [CRTC:42:crtc-0] flip_done timed out
> Jun 26 00:50:25 ryzen kernel: [drm:drm_atomic_helper_wait_for_dependencies
> [drm_kms_helper]] *ERROR* [PLANE:40:plane-4] flip_done timed out

It's a known issue. Try the patch I attached to resolve the deadlock, but you
will probably experience other failures after that anyway.
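
When comparing several of these crash logs, it can help to pull the ring name and the fence sequence numbers out of the timeout messages. The following is only a sketch: the regular expression is modelled on the message text quoted above and assumes each message sits on one unwrapped line, and the function name is made up for illustration.

```python
import re

# Modelled on the amdgpu_job_timedout message as quoted in this thread.
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"last signaled seq=(?P<signaled>\d+), last emitted seq=(?P<emitted>\d+)"
)


def parse_ring_timeouts(log_text):
    """Return (ring, signaled, emitted, emitted - signaled) per timeout message."""
    results = []
    for match in TIMEOUT_RE.finditer(log_text):
        signaled = int(match["signaled"])
        emitted = int(match["emitted"])
        results.append((match["ring"], signaled, emitted, emitted - signaled))
    return results


sample = (
    "[drm:amdgpu_job_timedout [amdgpu]] *ERROR* "
    "ring gfx timeout, last signaled seq=12277, last emitted seq=12279"
)
print(parse_ring_timeouts(sample))
```

The last tuple element is simply the gap between the emitted and the signaled sequence number, that is, how many submitted fences had not signaled when the timeout fired.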

Andrey


= --15300264450.b3E37c.31284-- --===============1679670518== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============1679670518==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Tue, 26 Jun 2018 15:21:27 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============1017897290==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [IPv6:2610:10:20:722:a800:ff:fe98:4b55]) by gabe.freedesktop.org (Postfix) with ESMTP id D911F6E5AF for ; Tue, 26 Jun 2018 15:21:27 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============1017897290== Content-Type: multipart/alternative; boundary="15300264872.ef13be83f.31442" Content-Transfer-Encoding: 7bit --15300264872.ef13be83f.31442 Date: Tue, 26 Jun 2018 15:21:27 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #9 from Andrey Grodzovsky --- Created attachment 140345 --> https://bugs.freedesktop.org/attachment.cgi?id=3D140345&action=3Dedit Deadlock fix --=20 You are receiving this mail because: You are the assignee for the bug.= --15300264872.ef13be83f.31442 Date: Tue, 26 Jun 2018 15:21:27 +0000 MIME-Version: 1.0 
Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated


= --15300264872.ef13be83f.31442-- --===============1017897290== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============1017897290==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Tue, 26 Jun 2018 22:52:22 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============1806198737==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [131.252.210.165]) by gabe.freedesktop.org (Postfix) with ESMTP id 1A3A06E684 for ; Tue, 26 Jun 2018 22:52:22 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============1806198737== Content-Type: multipart/alternative; boundary="15300535420.c3114b8.21728" Content-Transfer-Encoding: 7bit --15300535420.c3114b8.21728 Date: Tue, 26 Jun 2018 22:52:22 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #10 from dwagner --- (In reply to Andrey Grodzovsky from comment #8) > The kernel and MESA seems new enough, LLVM is 6 so maybe you should try 7. 
LLVM 7 has not been released, and replacing LLVM 6 with the current subvers= ion head of LLVM 7 means to basically recompile and reinstall half of the opera= ting system (starting at radeonsi, then Xorg, then its dependencies...) I'm fine with using experimental new kernels to find a more stable amdgpu driver - but if a kernel driver crashes just because some user-space application (X11) utilizes a wrong compiler version at run time, then some = part of the driver design is very wrong.=20 > The firmware also looks pretty late but I still would advise to manually > override all firmware files with files from here > https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.g= it/ > tree/amdgpu I did a "diff -r" on the git files with the ones installed by Arch, they are all binary identical. > > Jun 26 00:50:25 ryzen kernel: [drm:drm_atomic_helper_wait_for_dependenc= ies > > [drm_kms_helper]] *ERROR* [PLANE:40:plane-4] flip_done timed out >=20 > It's a know issue, try the patch I attached to resolve the deadlock , but > you will probably experience other failures after that anyway.=20 Ok, thanks for the patch, will try this next time I compile a new kernel. --=20 You are receiving this mail because: You are the assignee for the bug.= --15300535420.c3114b8.21728 Date: Tue, 26 Jun 2018 22:52:22 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 10 on bug 102322 from dwagner
(In reply to Andrey Grodzovsky from comment #8)
> The kernel and MESA seems new enough, LLVM is 6 so maybe you should try 7.

LLVM 7 has not been released, and replacing LLVM 6 with the current subversion
head of LLVM 7 basically means recompiling and reinstalling half of the
operating system (starting at radeonsi, then Xorg, then its dependencies...)

I'm fine with using experimental new kernels to find a more stable amdgpu
driver - but if a kernel driver crashes just because some user-space
application (X11) utilizes a wrong compiler version at run time, then some part
of the driver design is very wrong.

> The firmware also looks pretty late but I still would advise to manually
> override all firmware files with files from here
> https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/amdgpu

I ran "diff -r" on the git files against the ones installed by Arch; they are
all binary identical.
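
For anyone who wants to repeat this check without "diff -r", here is a rough Python equivalent. It is a sketch only: the function name is invented, and it reads whole files into memory, which is fine for firmware blobs of this size.

```python
from pathlib import Path


def differing_files(dir_a, dir_b):
    """Return relative paths that exist on only one side or differ in content."""
    a, b = Path(dir_a), Path(dir_b)
    files_a = {p.relative_to(a) for p in a.rglob("*") if p.is_file()}
    files_b = {p.relative_to(b) for p in b.rglob("*") if p.is_file()}
    # Files present in only one of the two trees count as differences.
    diffs = set(files_a ^ files_b)
    for rel in files_a & files_b:
        # Byte-for-byte comparison of files present in both trees.
        if (a / rel).read_bytes() != (b / rel).read_bytes():
            diffs.add(rel)
    return sorted(diffs)
```

An empty result corresponds to "all binary identical"; the two arguments would be the `amdgpu` folder of a linux-firmware checkout and the installed `/usr/lib/firmware/amdgpu`.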

> > Jun 26 00:50:25 ryzen kernel: [drm:drm_atomic_helper_wait_for_dependencies
> > [drm_kms_helper]] *ERROR* [PLANE:40:plane-4] flip_done timed out
>
> It's a known issue, try the patch I attached to resolve the deadlock, but
> you will probably experience other failures after that anyway.

Ok, thanks for the patch, will try this next time I compile a new kernel.
        


= --15300535420.c3114b8.21728-- --===============1806198737== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============1806198737==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Wed, 27 Jun 2018 07:48:45 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============1610626640==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [IPv6:2610:10:20:722:a800:ff:fe98:4b55]) by gabe.freedesktop.org (Postfix) with ESMTP id 4640289CC4 for ; Wed, 27 Jun 2018 07:48:45 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============1610626640== Content-Type: multipart/alternative; boundary="15300857251.D12044B3c.31595" Content-Transfer-Encoding: 7bit --15300857251.D12044B3c.31595 Date: Wed, 27 Jun 2018 07:48:45 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #11 from Michel D=C3=A4nzer --- (In reply to Andrey Grodzovsky from comment #8) > The kernel and MESA seems new enough, LLVM is 6 so maybe you should try 7. LLVM 6 is fine. 
--=20 You are receiving this mail because: You are the assignee for the bug.= --15300857251.D12044B3c.31595 Date: Wed, 27 Jun 2018 07:48:45 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 11 on bug 102322 from Michel Dänzer
(In reply to Andrey Grodzovsky from comment #8)
> The kernel and MESA seems new enough, LLVM is 6 so maybe you should try 7.

LLVM 6 is fine.


= --15300857251.D12044B3c.31595-- --===============1610626640== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============1610626640==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Wed, 27 Jun 2018 13:53:37 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============1153255996==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [IPv6:2610:10:20:722:a800:ff:fe98:4b55]) by gabe.freedesktop.org (Postfix) with ESMTP id 718316E954 for ; Wed, 27 Jun 2018 13:53:37 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============1153255996== Content-Type: multipart/alternative; boundary="15301076171.D0C11.17877" Content-Transfer-Encoding: 7bit --15301076171.D0C11.17877 Date: Wed, 27 Jun 2018 13:53:37 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #12 from Andrey Grodzovsky --- (In reply to dwagner from comment #2) > Just to mention this once again: These system crashes still occur, and way > too frequently to consider the amdgpu driver stable enough for profession= al > use. 
Sample dmesg output from today: >=20 > Feb 24 18:26:55 [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeo= ut, > last signaled seq=3D5430589, last emitted seq=3D5430591 > Feb 24 18:26:55 [drm] IP block:gmc_v8_0 is hung! > Feb 24 18:26:55 [drm] IP block:gfx_v8_0 is hung! > Feb 24 18:27:02 [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 > timeout, last signaled seq=3D185928, last emitted seq=3D185930 > Feb 24 18:27:02 [drm] IP block:gmc_v8_0 is hung! > Feb 24 18:27:02 [drm] IP block:gfx_v8_0 is hung! > Feb 24 18:27:05 [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* > [CRTC:43:crtc-0] hw_done or flip_done timed out Can you load the kernel with grub command line amdgpu.vm_update_mode=3D3 to= force CPU VM update mode and see if this helps ? --=20 You are receiving this mail because: You are the assignee for the bug.= --15301076171.D0C11.17877 Date: Wed, 27 Jun 2018 13:53:37 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 12 on bug 102322 from Andrey Grodzovsky
(In reply to dwagner from comment #2)
> Just to mention this once again: These system crashes still occur, and way
> too frequently to consider the amdgpu driver stable enough for professional
> use. Sample dmesg output from today:
>
> Feb 24 18:26:55 [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
> last signaled seq=5430589, last emitted seq=5430591
> Feb 24 18:26:55 [drm] IP block:gmc_v8_0 is hung!
> Feb 24 18:26:55 [drm] IP block:gfx_v8_0 is hung!
> Feb 24 18:27:02 [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0
> timeout, last signaled seq=185928, last emitted seq=185930
> Feb 24 18:27:02 [drm] IP block:gmc_v8_0 is hung!
> Feb 24 18:27:02 [drm] IP block:gfx_v8_0 is hung!
> Feb 24 18:27:05 [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR*
> [CRTC:43:crtc-0] hw_done or flip_done timed out

Can you boot the kernel with amdgpu.vm_update_mode=3 on the grub command line,
to force CPU VM update mode, and see if this helps?
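
To confirm after a reboot that the option really reached the kernel, one can inspect the kernel command line (readable from /proc/cmdline on a running system). A small sketch, with an invented helper name and a made-up sample command line:

```python
def amdgpu_params(cmdline):
    """Extract amdgpu.<name>=<value> options from a kernel command line string."""
    params = {}
    for token in cmdline.split():
        if token.startswith("amdgpu.") and "=" in token:
            name, value = token[len("amdgpu."):].split("=", 1)
            params[name] = value
    return params


# Hypothetical command line, for illustration only.
sample = "BOOT_IMAGE=/vmlinuz-linux root=/dev/sda2 rw amdgpu.vm_update_mode=3"
print(amdgpu_params(sample))
```

On a real system the argument would be the content of /proc/cmdline instead of the sample string.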


= --15301076171.D0C11.17877-- --===============1153255996== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============1153255996==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Wed, 27 Jun 2018 23:15:48 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============0029992141==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [131.252.210.165]) by gabe.freedesktop.org (Postfix) with ESMTP id F30816E94E for ; Wed, 27 Jun 2018 23:15:48 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============0029992141== Content-Type: multipart/alternative; boundary="15301413480.E6c70.18100" Content-Transfer-Encoding: 7bit --15301413480.E6c70.18100 Date: Wed, 27 Jun 2018 23:15:48 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #13 from dwagner --- (In reply to Andrey Grodzovsky from comment #12) > Can you load the kernel with grub command line amdgpu.vm_update_mode=3D3 = to > force CPU VM update mode and see if this helps ? Sure. 
Too early yet to say "hurray", but at an uptime of one hour, currentl= y, 4.17.2 survived with amdgpu.vm_update_mode=3D3 already about 20 times longe= r than without that option before the first crash. One (probably just informal) message is emitted by the kernel: [ 19.319565] CPU update of VM recommended only for large BAR system Can you explain a little: What is a "large BAR system", and what does the vm_update_mode=3D3 option actually cause? Should I expect any weird side ef= fects to look for? BTW: Not a result of that option, but of the kernel version, seems to be the fact that the shader clock keeps at a pretty high frequency all the time - = even without any 3d or compute load, just displaying a quiet 4k/60Hz desktop ima= ge: cat pp_dpm_sclk 0: 214Mhz=20 1: 481Mhz=20 2: 760Mhz=20 3: 1020Mhz=20 4: 1102Mhz=20 5: 1138Mhz=20 6: 1180Mhz * 7: 1220Mhz=20 Much lower shader clocks are used only if I lower the refresh rate of the screen. Is there a reason why the shader clocks should stay high even in the absence of 3d/compute load? (I would have better understood if the minimum memory clock was depending on the refresh rate, but memory clock stays as low as with the older kernels.) --=20 You are receiving this mail because: You are the assignee for the bug.= --15301413480.E6c70.18100 Date: Wed, 27 Jun 2018 23:15:48 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 13 on bug 102322 from dwagner
(In reply to Andrey Grodzovsky from comment #12)
> Can you load the kernel with grub command line amdgpu.vm_update_mode=3 to
> force CPU VM update mode and see if this helps ?

Sure. It is too early to say "hurray", but at a current uptime of one hour,
4.17.2 with amdgpu.vm_update_mode=3 has already survived about 20 times longer
than it did without that option before the first crash.

One (probably just informational) message is emitted by the kernel:
[   19.319565] CPU update of VM recommended only for large BAR system

Can you explain a little: What is a "large BAR system", and what does the
vm_update_mode=3 option actually do? Are there any weird side effects I
should look out for?


BTW: Not a result of that option, but apparently of the kernel version: the
shader clock stays at a pretty high frequency all the time, even without any
3D or compute load, just displaying an idle 4k/60Hz desktop image:

cat pp_dpm_sclk
0: 214Mhz
1: 481Mhz
2: 760Mhz
3: 1020Mhz
4: 1102Mhz
5: 1138Mhz
6: 1180Mhz *
7: 1220Mhz

Much lower shader clocks are used only if I lower the refresh rate of the
screen. Is there a reason why the shader clocks should stay high even in the
absence of 3d/compute load?
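
To track this over time, the pp_dpm_sclk listing can be parsed and the active level (the line marked with `*`) logged periodically. The sketch below works on the text shown above; the function name is an assumption, and the file is typically found under the GPU's sysfs device directory (for example /sys/class/drm/card0/device/pp_dpm_sclk, where the card index may differ).

```python
import re

# "<level>: <clock>Mhz", optionally followed by "*" for the active level.
LEVEL_RE = re.compile(r"^\s*(\d+):\s*(\d+)\s*Mhz\s*(\*)?", re.IGNORECASE)


def parse_dpm_levels(text):
    """Return ([(level, mhz), ...], active_level) from pp_dpm_sclk-style text."""
    levels, active = [], None
    for line in text.splitlines():
        m = LEVEL_RE.match(line)
        if not m:
            continue
        level, mhz = int(m.group(1)), int(m.group(2))
        levels.append((level, mhz))
        if m.group(3):
            active = level
    return levels, active


sample = """0: 214Mhz
1: 481Mhz
2: 760Mhz
3: 1020Mhz
4: 1102Mhz
5: 1138Mhz
6: 1180Mhz *
7: 1220Mhz
"""
levels, active = parse_dpm_levels(sample)
print(active, dict(levels)[active])
```

For the listing above this reports level 6 at 1180 MHz as the active state.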

(I would have understood it better if the minimum memory clock depended on
the refresh rate, but the memory clock stays as low as with the older kernels.)


= --15301413480.E6c70.18100-- --===============0029992141== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============0029992141==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Thu, 28 Jun 2018 02:17:57 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============1162349487==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [131.252.210.165]) by gabe.freedesktop.org (Postfix) with ESMTP id 61F356EBBE for ; Thu, 28 Jun 2018 02:17:58 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============1162349487== Content-Type: multipart/alternative; boundary="15301522771.CdAA6Bc8.28142" Content-Transfer-Encoding: 7bit --15301522771.CdAA6Bc8.28142 Date: Thu, 28 Jun 2018 02:17:57 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #14 from Alex Deucher --- (In reply to dwagner from comment #13) >=20 > Much lower shader clocks are used only if I lower the refresh rate of the > screen. Is there a reason why the shader clocks should stay high even in = the > absence of 3d/compute load? 
>=20 Certain display requirements can cause the engine clock to be kept higher as well. --=20 You are receiving this mail because: You are the assignee for the bug.= --15301522771.CdAA6Bc8.28142 Date: Thu, 28 Jun 2018 02:17:57 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 14 on bug 102322 from Alex Deucher
(In reply to dwagner from comment #13)
>
> Much lower shader clocks are used only if I lower the refresh rate of the
> screen. Is there a reason why the shader clocks should stay high even in the
> absence of 3d/compute load?
>

Certain display requirements can cause the engine clock to be kept higher as
well.


= --15301522771.CdAA6Bc8.28142-- --===============1162349487== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============1162349487==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Thu, 28 Jun 2018 04:17:19 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============0306489572==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [IPv6:2610:10:20:722:a800:ff:fe98:4b55]) by gabe.freedesktop.org (Postfix) with ESMTP id BA6566E704 for ; Thu, 28 Jun 2018 04:17:19 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============0306489572== Content-Type: multipart/alternative; boundary="15301594392.DABD003B.877" Content-Transfer-Encoding: 7bit --15301594392.DABD003B.877 Date: Thu, 28 Jun 2018 04:17:19 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #15 from Andrey Grodzovsky --- (In reply to dwagner from comment #13) > (In reply to Andrey Grodzovsky from comment #12) > > Can you load the kernel with grub command line amdgpu.vm_update_mode=3D= 3 to > > force CPU VM update mode and see if this helps ? >=20 > Sure. 
Too early yet to say "hurray", but at an uptime of one hour, > currently, 4.17.2 survived with amdgpu.vm_update_mode=3D3 already about 20 > times longer than without that option before the first crash. >=20 > One (probably just informal) message is emitted by the kernel: > [ 19.319565] CPU update of VM recommended only for large BAR system >=20 > Can you explain a little: What is a "large BAR system", and what does the > vm_update_mode=3D3 option actually cause? Should I expect any weird side > effects to look for? I think it just means systems with large VRAM so it will require large BAR = for mapping. But I am not sure on that point. vm_update_mode=3D3 means GPUVM page tables update is done using CPU. By def= ault we do it using DMA engine on the ASIC. The log showed a hang in this engine= so I assumed there is something wrong with SDMA commands we submit. I assume more CPU utilization as a side effect and maybe slower rendering. >=20 >=20 > BTW: Not a result of that option, but of the kernel version, seems to be = the > fact that the shader clock keeps at a pretty high frequency all the time - > even without any 3d or compute load, just displaying a quiet 4k/60Hz desk= top > image: >=20 > cat pp_dpm_sclk > 0: 214Mhz=20 > 1: 481Mhz=20 > 2: 760Mhz=20 > 3: 1020Mhz=20 > 4: 1102Mhz=20 > 5: 1138Mhz=20 > 6: 1180Mhz * > 7: 1220Mhz=20 >=20 > Much lower shader clocks are used only if I lower the refresh rate of the > screen. Is there a reason why the shader clocks should stay high even in = the > absence of 3d/compute load? >=20 > (I would have better understood if the minimum memory clock was depending= on > the refresh rate, but memory clock stays as low as with the older kernels= .) 
--=20 You are receiving this mail because: You are the assignee for the bug.= --15301594392.DABD003B.877 Date: Thu, 28 Jun 2018 04:17:19 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 15 on bug 102322 from Andrey Grodzovsky
(In reply to dwagner from comment #13)
> (In reply to Andrey Grodzovsky from comment #12)
> > Can you load the kernel with grub command line amdgpu.vm_update_mode=3 to
> > force CPU VM update mode and see if this helps ?
>
> Sure. Too early yet to say "hurray", but at an uptime of one hour,
> currently, 4.17.2 survived with amdgpu.vm_update_mode=3 already about 20
> times longer than without that option before the first crash.
>
> One (probably just informal) message is emitted by the kernel:
> [   19.319565] CPU update of VM recommended only for large BAR system
>
> Can you explain a little: What is a "large BAR system", and what does the
> vm_update_mode=3 option actually cause? Should I expect any weird side
> effects to look for?

I think it just means systems with large VRAM, which require a large BAR for
mapping. But I am not sure on that point.
vm_update_mode=3 means the GPUVM page table updates are done by the CPU. By
default we do them using the DMA engine on the ASIC. The log showed a hang in
this engine, so I assumed there is something wrong with the SDMA commands we
submit.
I would expect more CPU utilization as a side effect, and maybe slower rendering.
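
As an illustration of what the value 3 selects: as far as I understand the amdgpu module parameter description, vm_update_mode is a bit mask (bit 0 for graphics VMs, bit 1 for compute VMs, -1 for the driver default). Treat that reading as an assumption and check the driver source for your kernel; the constant and function names below are invented for this sketch.

```python
# Assumed bit meanings of amdgpu.vm_update_mode (verify against your kernel).
USE_CPU_FOR_GFX = 1 << 0
USE_CPU_FOR_COMPUTE = 1 << 1


def describe_vm_update_mode(mode):
    """Say which kinds of VMs get their page tables updated by the CPU."""
    if mode < 0:
        return "driver default"
    gfx = "CPU" if mode & USE_CPU_FOR_GFX else "SDMA"
    compute = "CPU" if mode & USE_CPU_FOR_COMPUTE else "SDMA"
    return f"graphics VMs: {gfx}, compute VMs: {compute}"


for mode in (0, 1, 2, 3):
    print(mode, "->", describe_vm_update_mode(mode))
```

Under that assumption, mode 3 moves page table updates for both graphics and compute VMs off the SDMA engine, which matches the workaround described above.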

>
>
> BTW: Not a result of that option, but of the kernel version, seems to be the
> fact that the shader clock keeps at a pretty high frequency all the time -
> even without any 3d or compute load, just displaying a quiet 4k/60Hz desktop
> image:
>
> cat pp_dpm_sclk
> 0: 214Mhz
> 1: 481Mhz
> 2: 760Mhz
> 3: 1020Mhz
> 4: 1102Mhz
> 5: 1138Mhz
> 6: 1180Mhz *
> 7: 1220Mhz
>
> Much lower shader clocks are used only if I lower the refresh rate of the
> screen. Is there a reason why the shader clocks should stay high even in the
> absence of 3d/compute load?
>
> (I would have better understood if the minimum memory clock was depending on
> the refresh rate, but memory clock stays as low as with the older kernels.)


= --15301594392.DABD003B.877-- --===============0306489572== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============0306489572==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Thu, 28 Jun 2018 04:36:41 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============0066558637==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [131.252.210.165]) by gabe.freedesktop.org (Postfix) with ESMTP id A226E6EB94 for ; Thu, 28 Jun 2018 04:36:41 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============0066558637== Content-Type: multipart/alternative; boundary="15301606012.aCEa71.6400" Content-Transfer-Encoding: 7bit --15301606012.aCEa71.6400 Date: Thu, 28 Jun 2018 04:36:41 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #16 from Alex Deucher --- (In reply to Andrey Grodzovsky from comment #15) > I think it just means systems with large VRAM so it will require large BAR > for mapping. But I am not sure on that point. That's correct. 
the updates are done with the CPU rather than the GPU (SDM= A).=20 The default BAR size on most systems is usually 256MB for 32 bit compatibil= ity so the window for CPU access to vram (where the page tables live) is limite= d. --=20 You are receiving this mail because: You are the assignee for the bug.= --15301606012.aCEa71.6400 Date: Thu, 28 Jun 2018 04:36:41 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 16 on bug 102322 from Alex Deucher
(In reply to Andrey Grodzovsky from comment #15)
> I think it just means systems with large VRAM so it will require large BAR
> for mapping. But I am not sure on that point.

That's correct. The updates are done with the CPU rather than the GPU (SDMA).
The default BAR size on most systems is usually 256MB for 32-bit compatibility,
so the window for CPU access to VRAM (where the page tables live) is limited.
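
To see how large the BARs of a given card actually are, the sizes can be computed from the PCI `resource` file in sysfs, where each line holds a start address, an end address and flags in hex. This is a sketch with an invented function name and a made-up sample; on the reporter's machine the real file would presumably be /sys/bus/pci/devices/0000:0a:00.0/resource, going by the device address in the log.

```python
def bar_sizes(resource_text):
    """Return {index: size_in_bytes} for populated entries of a PCI resource file."""
    sizes = {}
    for index, line in enumerate(resource_text.splitlines()):
        fields = line.split()
        if len(fields) != 3:
            continue
        start, end = int(fields[0], 16), int(fields[1], 16)
        if end > start:  # unused entries are all zeros
            sizes[index] = end - start + 1
    return sizes


# Made-up sample containing a single 256 MiB region.
sample = (
    "0x00000000e0000000 0x00000000efffffff 0x000000000014220c\n"
    "0x0000000000000000 0x0000000000000000 0x0000000000000000\n"
)
for index, size in bar_sizes(sample).items():
    print(f"region {index}: {size // (1024 * 1024)} MiB")
```

A VRAM aperture of 256 MiB on a card with several GiB of memory would be the limited window described above.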


= --15301606012.aCEa71.6400-- --===============0066558637== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============0066558637==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Thu, 28 Jun 2018 10:33:22 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============0591171573==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [IPv6:2610:10:20:722:a800:ff:fe98:4b55]) by gabe.freedesktop.org (Postfix) with ESMTP id 4020D6ECD3 for ; Thu, 28 Jun 2018 10:33:22 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============0591171573== Content-Type: multipart/alternative; boundary="15301820021.973c.3907" Content-Transfer-Encoding: 7bit --15301820021.973c.3907 Date: Thu, 28 Jun 2018 10:33:22 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #17 from Andrey Grodzovsky --- (In reply to Alex Deucher from comment #16) > (In reply to Andrey Grodzovsky from comment #15) > > I think it just means systems with large VRAM so it will require large = BAR > > for mapping. But I am not sure on that point. >=20 > That's correct. 
the updates are done with the CPU rather than the GPU > (SDMA). The default BAR size on most systems is usually 256MB for 32 bit > compatibility so the window for CPU access to vram (where the page tables > live) is limited. Thanks Alex. dwagner, this is obviously just a work around and not a fix. It points to s= ome problem with SDMA packets, if you want to continue exploring we can try to = dump some fence traces and SDMA HW ring content to examine the latest packets be= fore the hang happened. --=20 You are receiving this mail because: You are the assignee for the bug.= --15301820021.973c.3907 Date: Thu, 28 Jun 2018 10:33:22 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 17 on bug 102322 from Andrey Grodzovsky
(In reply to Alex Deucher from comment #16)
> (In reply to Andrey Grodzovsky from comment #15)
> > I think it just means systems with large VRAM so it will require large BAR
> > for mapping. But I am not sure on that point.
>
> That's correct.  the updates are done with the CPU rather than the GPU
> (SDMA).  The default BAR size on most systems is usually 256MB for 32 bit
> compatibility so the window for CPU access to vram (where the page tables
> live) is limited.

Thanks Alex.
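
To see how large the CPU-visible window described in the quoted text is on a given machine, one can read the PCI region ranges the kernel exposes in sysfs. The following is a minimal sketch, assuming the usual Linux `/sys/bus/pci/devices/<address>/resource` layout (one `start end flags` line per region, in hex). The helper name `bar_sizes` and the sample values are illustrative only and are not taken from this report.

```python
def bar_sizes(resource_text):
    """Return the size in bytes of each populated region in a PCI 'resource' file."""
    sizes = []
    for line in resource_text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip anything that is not a "start end flags" line
        start, end, _flags = (int(field, 16) for field in fields)
        if start == 0 and end == 0:
            continue  # unused region
        sizes.append(end - start + 1)
    return sizes


# Made-up sample: one 256 MiB region followed by an unused one.
SAMPLE = (
    "0x00000000e0000000 0x00000000efffffff 0x000000000014220c\n"
    "0x0000000000000000 0x0000000000000000 0x0000000000000000\n"
)

if __name__ == "__main__":
    for size in bar_sizes(SAMPLE):
        print(f"{size // (1024 * 1024)} MiB")
```

On a live system the text would come from reading the `resource` file of the GPU's PCI device. Which of the listed regions is the VRAM aperture is not established here and would have to be checked separately.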

dwagner, this is obviously just a work around and not a fix. It points to some
problem with SDMA packets, if you want to continue exploring we can try to dump
some fence traces and SDMA HW ring content to examine the latest packets before
the hang happened.


= --15301820021.973c.3907-- --===============0591171573== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============0591171573==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Thu, 28 Jun 2018 19:56:46 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============1105465556==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [IPv6:2610:10:20:722:a800:ff:fe98:4b55]) by gabe.freedesktop.org (Postfix) with ESMTP id A89916EE99 for ; Thu, 28 Jun 2018 19:56:46 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============1105465556== Content-Type: multipart/alternative; boundary="15302158061.A79a.26489" Content-Transfer-Encoding: 7bit --15302158061.A79a.26489 Date: Thu, 28 Jun 2018 19:56:46 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #18 from dwagner --- The good news: So far no crashes during normal uptime with amdgpu.vm_update_mode=3D3 The bad news: System crashes immediately upon S3 resume (with messages quite different from the ones I saw with earlier S3-resume crashes) - I filed bug report https://bugs.freedesktop.org/show_bug.cgi?id=3D107065 on 
this. (In reply to Andrey Grodzovsky from comment #17) > dwagner, this is obviously just a work around and not a fix. It points to > some problem with SDMA packets, if you want to continue exploring we can = try > to dump some fence traces and SDMA HW ring content to examine the latest > packets before the hang happened. If you can include some debug output into "amd-staging-drm-next" that helps finding the root cause, I might be able to provide some output - if the ker= nel survives long enough after the crash to write the system journal - this has= not always been the case. --=20 You are receiving this mail because: You are the assignee for the bug.= --15302158061.A79a.26489 Date: Thu, 28 Jun 2018 19:56:46 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 18 on bug 102322 from dwagner
The good news: So far no crashes during normal uptime with
amdgpu.vm_update_mode=3
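
For readers unfamiliar with the parameter: `vm_update_mode` appears to be a bitmask selecting for which kinds of VMs the page tables are updated by the CPU instead of SDMA. The sketch below assumes bit 0 means graphics and bit 1 means compute, so that 3 means both. That mapping is an assumption based on the amdgpu module parameter description and is not stated in this thread; the function name is hypothetical.

```python
def describe_vm_update_mode(mode):
    """Describe which VM types use CPU page-table updates for a given mode value."""
    # Assumption: bit 0 = graphics VMs, bit 1 = compute VMs.
    users = []
    if mode & 0x1:
        users.append("graphics")
    if mode & 0x2:
        users.append("compute")
    if not users:
        return "page tables updated by SDMA for all VMs"
    return "page tables updated by CPU for: " + ", ".join(users)


if __name__ == "__main__":
    print(describe_vm_update_mode(3))
```

If the parameter is exported by the running kernel, the active value can usually be read from `/sys/module/amdgpu/parameters/vm_update_mode` and passed to the function as an integer.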

The bad news: System crashes immediately upon S3 resume (with messages quite
different from the ones I saw with earlier S3-resume crashes) - I filed bug
report https://bugs.freedesktop.org/show_bug.cgi?id=107065 on this.

(In reply to Andrey Grodzovsky from comment #17)
> dwagner, this is obviously just a work around and not a fix. It points to
> some problem with SDMA packets, if you want to continue exploring we can try
> to dump some fence traces and SDMA HW ring content to examine the latest
> packets before the hang happened.

If you can include some debug output into "amd-staging-drm-next" that helps
finding the root cause, I might be able to provide some output - if the kernel
survives long enough after the crash to write the system journal - this has not
always been the case.


= --15302158061.A79a.26489-- --===============1105465556== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============1105465556==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Thu, 28 Jun 2018 21:09:09 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============1963752947==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [IPv6:2610:10:20:722:a800:ff:fe98:4b55]) by gabe.freedesktop.org (Postfix) with ESMTP id 09CAA6E012 for ; Thu, 28 Jun 2018 21:09:09 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============1963752947== Content-Type: multipart/alternative; boundary="15302201480.2d5EDF30E.17675" Content-Transfer-Encoding: 7bit --15302201480.2d5EDF30E.17675 Date: Thu, 28 Jun 2018 21:09:08 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #19 from Andrey Grodzovsky --- Can you use addr2line or gdb with 'list' command to give the line number matching (In reply to dwagner from comment #18) > The good news: So far no crashes during normal uptime with > amdgpu.vm_update_mode=3D3 >=20 > The bad news: System crashes immediately upon S3 resume (with 
messages qu= ite > different from the ones I saw with earlier S3-resume crashes) - I filed b= ug > report https://bugs.freedesktop.org/show_bug.cgi?id=3D107065 on this. >=20 > (In reply to Andrey Grodzovsky from comment #17) > > dwagner, this is obviously just a work around and not a fix. It points = to > > some problem with SDMA packets, if you want to continue exploring we ca= n try > > to dump some fence traces and SDMA HW ring content to examine the latest > > packets before the hang happened. >=20 > If you can include some debug output into "amd-staging-drm-next" that hel= ps > finding the root cause, I might be able to provide some output - if the > kernel survives long enough after the crash to write the system journal - > this has not always been the case. No need to recompile, just need to see what is the content of SDMA ring buf= fer when the hang occurs. Clone and build our register analyzer from here - https://cgit.freedesktop.org/amd/umr/ and once the hang happens just run=20 sudo umr -lb sudo umr -R gfx[.] sudo umr -R sdma0[.] sudo umr -R sdma1[.] I will probably need more info later but let's try this first. --=20 You are receiving this mail because: You are the assignee for the bug.= --15302201480.2d5EDF30E.17675 Date: Thu, 28 Jun 2018 21:09:08 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 19 on bug 102322 from Andrey Grodzovsky
Can you use addr2line or gdb with 'list' command to give the line number
matching (In reply to dwagner from comment #18)
> The good news: So far no crashes during normal uptime with
> amdgpu.vm_update_mode=3
>
> The bad news: System crashes immediately upon S3 resume (with messages quite
> different from the ones I saw with earlier S3-resume crashes) - I filed bug
> report https://bugs.freedesktop.org/show_bug.cgi?id=107065 on this.
>
> (In reply to Andrey Grodzovsky from comment #17)
> > dwagner, this is obviously just a work around and not a fix. It points to
> > some problem with SDMA packets, if you want to continue exploring we can try
> > to dump some fence traces and SDMA HW ring content to examine the latest
> > packets before the hang happened.
>
> If you can include some debug output into "amd-staging-drm-next" that helps
> finding the root cause, I might be able to provide some output - if the
> kernel survives long enough after the crash to write the system journal -
> this has not always been the case.

No need to recompile, just need to see what is the content of SDMA ring buffer
when the hang occurs.

Clone and build our register analyzer from here -
https://cgit.freedesktop.org/amd/umr/ and once the hang happens just run

sudo umr -lb
sudo umr -R gfx[.]
sudo umr -R sdma0[.]
sudo umr -R sdma1[.]

I will probably need more info later but let's try this first.
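
Since the machine may not stay responsive for long after a hang, it can help to have the four commands above ready to run in one go, for example over SSH. The following is a minimal sketch, not part of umr itself: it only wraps the exact invocations quoted above, and the names `capture_umr`, `run_command` and the output file naming are hypothetical. It assumes `umr` is on the PATH and that the script is started with root privileges (the original commands use `sudo`).

```python
import subprocess
from pathlib import Path

# The four invocations quoted in the comment above, as argument lists so that
# the shell never tries to expand the "[.]" patterns.
UMR_COMMANDS = [
    ["umr", "-lb"],
    ["umr", "-R", "gfx[.]"],
    ["umr", "-R", "sdma0[.]"],
    ["umr", "-R", "sdma1[.]"],
]


def run_command(argv):
    """Run one command and return its combined stdout and stderr as text."""
    result = subprocess.run(
        argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    return result.stdout


def capture_umr(out_dir, runner=run_command):
    """Run each umr command and write its output to its own file in out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for index, argv in enumerate(UMR_COMMANDS):
        path = out_dir / f"umr_{index}.txt"
        # Record the command line first so each dump is self-describing.
        path.write_text("$ " + " ".join(argv) + "\n" + runner(argv))
        written.append(path)
    return written
```

Calling `capture_umr("/some/directory")` writes one file per command. Pointing the directory at storage that survives the crash, such as the network filesystem the reporter already uses for `dmesg -w`, would keep the dumps available after a forced reboot.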


= --15302201480.2d5EDF30E.17675-- --===============1963752947== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============1963752947==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Thu, 28 Jun 2018 22:56:03 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============0123080715==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [IPv6:2610:10:20:722:a800:ff:fe98:4b55]) by gabe.freedesktop.org (Postfix) with ESMTP id 8ED546EDCC for ; Thu, 28 Jun 2018 22:56:03 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============0123080715== Content-Type: multipart/alternative; boundary="15302265631.5f3F8A1.16683" Content-Transfer-Encoding: 7bit --15302265631.5f3F8A1.16683 Date: Thu, 28 Jun 2018 22:56:03 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #20 from dwagner --- (In reply to Andrey Grodzovsky from comment #19) > No need to recompile, just need to see what is the content of SDMA ring > buffer when the hang occurs. 
>=20 > Clone and build our register analyzer from here - > https://cgit.freedesktop.org/amd/umr/ and once the hang happens just run= =20 >=20 > sudo umr -lb > sudo umr -R gfx[.] > sudo umr -R sdma0[.] > sudo umr -R sdma1[.] >=20 > I will probably need more info later but let's try this first. How can I run "umr" on a crashed system? I guess those register values are retained over a press of the reset button / reboot? --=20 You are receiving this mail because: You are the assignee for the bug.= --15302265631.5f3F8A1.16683 Date: Thu, 28 Jun 2018 22:56:03 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 20 on bug 102322 from dwagner
(In reply to Andrey Grodzovsky from comment #19)
> No need to recompile, just need to see what is the content of SDMA ring
> buffer when the hang occurs.
>
> Clone and build our register analyzer from here -
> https://cgit.freedesktop.org/amd/umr/ and once the hang happens just run
>
> sudo umr -lb
> sudo umr -R gfx[.]
> sudo umr -R sdma0[.]
> sudo umr -R sdma1[.]
>
> I will probably need more info later but let's try this first.

How can I run "umr" on a crashed system? I guess those register values are
retained over a press of the reset button / reboot?


= --15302265631.5f3F8A1.16683-- --===============0123080715== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============0123080715==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Thu, 28 Jun 2018 22:57:21 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============0560790392==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [IPv6:2610:10:20:722:a800:ff:fe98:4b55]) by gabe.freedesktop.org (Postfix) with ESMTP id AA8476EE11 for ; Thu, 28 Jun 2018 22:57:21 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============0560790392== Content-Type: multipart/alternative; boundary="15302266411.3bde2.18990" Content-Transfer-Encoding: 7bit --15302266411.3bde2.18990 Date: Thu, 28 Jun 2018 22:57:21 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #21 from dwagner --- (I meant to write "I guess those register values are NOT retained over a reboot, right?") --=20 You are receiving this mail because: You are the assignee for the bug.= --15302266411.3bde2.18990 Date: Thu, 28 Jun 2018 22:57:21 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" 
Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 21 on bug 102322 from dwagner
(I meant to write "I guess those register values are NOT retained over a
reboot, right?")


= --15302266411.3bde2.18990-- --===============0560790392== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============0560790392==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Fri, 29 Jun 2018 00:10:03 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============1045635942==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [IPv6:2610:10:20:722:a800:ff:fe98:4b55]) by gabe.freedesktop.org (Postfix) with ESMTP id E39726EEE5 for ; Fri, 29 Jun 2018 00:10:02 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============1045635942== Content-Type: multipart/alternative; boundary="15302310021.CA92a.27477" Content-Transfer-Encoding: 7bit --15302310021.CA92a.27477 Date: Fri, 29 Jun 2018 00:10:02 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #22 from Andrey Grodzovsky --- (In reply to dwagner from comment #21) > (I meant to write "I guess those register values are NOT retained over a > reboot, right?") Yes, my assumption was that at least some times you still have SSH access to the system in those cases. 
--=20 You are receiving this mail because: You are the assignee for the bug.= --15302310021.CA92a.27477 Date: Fri, 29 Jun 2018 00:10:02 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 22 on bug 102322 from Andrey Grodzovsky
(In reply to dwagner from comment #21)
> (I meant to write "I guess those register values are NOT retained over a
> reboot, right?")

Yes, my assumption was that at least sometimes you still have SSH access to
the system in those cases.


= --15302310021.CA92a.27477-- --===============1045635942== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============1045635942==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Wed, 04 Jul 2018 23:03:36 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============2082978059==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [IPv6:2610:10:20:722:a800:ff:fe98:4b55]) by gabe.freedesktop.org (Postfix) with ESMTP id 57CFD6E772 for ; Wed, 4 Jul 2018 23:03:37 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============2082978059== Content-Type: multipart/alternative; boundary="15307454170.F4A896De1.31304" Content-Transfer-Encoding: 7bit --15307454170.F4A896De1.31304 Date: Wed, 4 Jul 2018 23:03:37 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #23 from dwagner --- Just for the record: At this point, I can say that with amggpu.vm_update_mode=3D3 4.17.2-ARCH runs at least for hours, not only the minutes it runs without this option before crashing. 
I cannot, however, say that above combination reaches the some-days-between-amdgpu-crashes uptimes that 4.13.x reached - in order to be able to test this, I would need S3 resumes to work, which is subject to bug report 107065. Without working S3 resumes, there is no way for me to test longer uptimes because amdgpu consistently crashes (in any version I know of) if I just let the system run but switch off the display, and I do not want to keep the connected 4k TV switched on all day and night. --=20 You are receiving this mail because: You are the assignee for the bug.= --15307454170.F4A896De1.31304 Date: Wed, 4 Jul 2018 23:03:37 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 23 on bug 102322 from dwagner
Just for the record: At this point, I can say that with
amdgpu.vm_update_mode=3 4.17.2-ARCH runs at least for hours,
not only the minutes it runs without this option before crashing.

I cannot, however, say that the above combination reaches the
some-days-between-amdgpu-crashes uptimes that 4.13.x reached -
in order to be able to test this, I would need S3 resumes to work,
which is the subject of bug report 107065.

Without working S3 resumes, there is no way for me to test longer
uptimes because amdgpu consistently crashes (in any version I know
of) if I just let the system run but switch off the display, and I do
not want to keep the connected 4k TV switched on all day and night.


= --15307454170.F4A896De1.31304-- --===============2082978059== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============2082978059==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Thu, 05 Jul 2018 13:59:56 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============0748250048==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [IPv6:2610:10:20:722:a800:ff:fe98:4b55]) by gabe.freedesktop.org (Postfix) with ESMTP id E30666EDC8 for ; Thu, 5 Jul 2018 13:59:56 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============0748250048== Content-Type: multipart/alternative; boundary="15307991962.177C3D.11925" Content-Transfer-Encoding: 7bit --15307991962.177C3D.11925 Date: Thu, 5 Jul 2018 13:59:56 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #24 from Michel D=C3=A4nzer --- Can you try bisecting between 4.13 and 4.17 to find where stability went downhill for you? 
--=20 You are receiving this mail because: You are the assignee for the bug.= --15307991962.177C3D.11925 Date: Thu, 5 Jul 2018 13:59:56 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 24 on bug 102322 from Michel Dänzer
Can you try bisecting between 4.13 and 4.17 to find where stability went
downhill for you?


= --15307991962.177C3D.11925-- --===============0748250048== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============0748250048==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Thu, 05 Jul 2018 23:32:43 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============1332824484==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [131.252.210.165]) by gabe.freedesktop.org (Postfix) with ESMTP id 778646E62D for ; Thu, 5 Jul 2018 23:32:43 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============1332824484== Content-Type: multipart/alternative; boundary="15308335631.9Aff4aDd7.8367" Content-Transfer-Encoding: 7bit --15308335631.9Aff4aDd7.8367 Date: Thu, 5 Jul 2018 23:32:43 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #25 from dwagner --- (In reply to Michel D=C3=A4nzer from comment #24) > Can you try bisecting between 4.13 and 4.17 to find where stability went > downhill for you? A bisect like that is not likely to converge in any reasonable time, given = the stochastic nature of those crashes. 
While the mean-time-between-driver-crashes is dramatically different, there will be occasions on which 4.13 will crash early enough to yield a false "b= ad", and there will be occasions on which 4.17 is lasting like the 20 minutes or= so to assume a false "good". What about the multitude of debug-options - isn't there one that could allow for some more insight on when/why the driver crashes? --=20 You are receiving this mail because: You are the assignee for the bug.= --15308335631.9Aff4aDd7.8367 Date: Thu, 5 Jul 2018 23:32:43 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 25 on bug 102322 from dwagner
(In reply to Michel Dänzer from comment #24)
> Can you try bisecting between 4.13 and 4.17 to find where stability went
> downhill for you?

A bisect like that is not likely to converge in any reasonable time, given the
stochastic nature of those crashes.

While the mean-time-between-driver-crashes is dramatically different, there
will be occasions on which 4.13 will crash early enough to yield a false "bad",
and there will be occasions on which 4.17 lasts the 20 minutes or so needed
to assume a false "good".

What about the multitude of debug-options - isn't there one that could allow
for some more insight on when/why the driver crashes?
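
The objection to bisecting can be made concrete with a little arithmetic. The sketch below assumes crashes arrive as a memoryless (exponential) process, which is a modelling assumption and not something measured in this report. The mean-time-between-crash figures, the test window and the step count in the example are placeholders, not data from the thread.

```python
import math


def p_crash_within(minutes, mtbf_minutes):
    """Probability of at least one crash within `minutes`, exponential model."""
    return 1.0 - math.exp(-minutes / mtbf_minutes)


def p_bisect_all_correct(steps, p_wrong_per_step):
    """Probability that every one of `steps` good/bad verdicts is correct."""
    return (1.0 - p_wrong_per_step) ** steps


if __name__ == "__main__":
    test_window = 60.0         # minutes each kernel is exercised (placeholder)
    mtbf_bad = 20.0            # placeholder for a "bad" kernel
    mtbf_good = 3 * 24 * 60.0  # placeholder for a "good" kernel (three days)
    steps = 13                 # placeholder number of bisect steps

    # A bad kernel that happens to survive the window is judged "good".
    false_good = 1.0 - p_crash_within(test_window, mtbf_bad)
    # A good kernel that happens to crash inside the window is judged "bad".
    false_bad = p_crash_within(test_window, mtbf_good)
    worst = max(false_good, false_bad)

    print(f"false good per step: {false_good:.3f}")
    print(f"false bad per step:  {false_bad:.3f}")
    print(f"all {steps} steps right:  {p_bisect_all_correct(steps, worst):.3f}")
```

A single wrong verdict sends a bisect down the wrong half, so the per-step error has to be small for the final result to be trustworthy. Under this model, lengthening the test window lowers the false-good rate but raises the false-bad rate.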


= --15308335631.9Aff4aDd7.8367-- --===============1332824484== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============1332824484==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Fri, 06 Jul 2018 23:20:20 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============1631067309==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [IPv6:2610:10:20:722:a800:ff:fe98:4b55]) by gabe.freedesktop.org (Postfix) with ESMTP id 2D65089079 for ; Fri, 6 Jul 2018 23:20:20 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============1631067309== Content-Type: multipart/alternative; boundary="15309192200.6Fc2dCe.19306" Content-Transfer-Encoding: 7bit --15309192200.6Fc2dCe.19306 Date: Fri, 6 Jul 2018 23:20:20 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #26 from dwagner --- Today for the first time I had a sudden "crash while just browsing with Firefox" while using the amggpu.vm_update_mode=3D3 parameter with the current-as-of-today amd-staging-drm-next (bb2e406ba66c2573b68e609e148cab57b1447095) with patch=20 https://bugs.freedesktop.org/attachment.cgi?id=3D140418 
applied on top. Different kernel messages than with previous crashed of this kind were emit= ted: Jul 07 01:08:20 ryzen kernel: amdgpu 0000:0a:00.0: GPU fault detected: 146 0x0c80440c Jul 07 01:08:20 ryzen kernel: amdgpu 0000:0a:00.0:=20=20 VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00100190 Jul 07 01:08:20 ryzen kernel: amdgpu 0000:0a:00.0:=20=20 VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0E04400C Jul 07 01:08:20 ryzen kernel: amdgpu 0000:0a:00.0: VM fault (0x0c, vmid 7, pasid 32768) at page 1048976, read from 'TC1' (0x54433100) (68) Jul 07 01:08:25 ryzen kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ri= ng gfx timeout, last signaled seq=3D75244, last emitted seq=3D75245 Jul 07 01:08:25 ryzen kernel: amdgpu 0000:0a:00.0: GPU reset begin! Hope this helps somehow. --=20 You are receiving this mail because: You are the assignee for the bug.= --15309192200.6Fc2dCe.19306 Date: Fri, 6 Jul 2018 23:20:20 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 26 on bug 102322 from dwagner
Today for the first time I had a sudden "crash while just browsing with
Firefox" while using the amdgpu.vm_update_mode=3 parameter with the
current-as-of-today amd-staging-drm-next
(bb2e406ba66c2573b68e609e148cab57b1447095) with patch
https://bugs.freedesktop.org/attachment.cgi?id=140418 applied on top.

Different kernel messages than with previous crashes of this kind were emitted:

Jul 07 01:08:20 ryzen kernel: amdgpu 0000:0a:00.0: GPU fault detected: 146 0x0c80440c
Jul 07 01:08:20 ryzen kernel: amdgpu 0000:0a:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00100190
Jul 07 01:08:20 ryzen kernel: amdgpu 0000:0a:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0E04400C
Jul 07 01:08:20 ryzen kernel: amdgpu 0000:0a:00.0: VM fault (0x0c, vmid 7, pasid 32768) at page 1048976, read from 'TC1' (0x54433100) (68)
Jul 07 01:08:25 ryzen kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=75244, last emitted seq=75245
Jul 07 01:08:25 ryzen kernel: amdgpu 0000:0a:00.0: GPU reset begin!
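
The numbers in the VM fault lines above are consistent with each other, which
can be cross-checked with a few lines of bash. This is only a sketch: the
function names are made up for illustration, and the bit positions used for
the status register (VMID in bits 25-28, memory client ID in bits 12-19) are
an assumption that happens to reproduce the values the kernel printed, not
something stated in this report.

```bash
#!/bin/bash
# Cross-check the values from the "VM fault" log lines quoted above.
# All function names here are hypothetical helpers, not part of any tool.

# Print a hex value as decimal (the log shows the fault page in both forms).
fault_page() {
    printf '%d\n' "$1"
}

# Extract a bit field: status_field VALUE SHIFT MASK
# The field layout passed in by the caller is an assumption (see lead-in).
status_field() {
    echo $(( ($1 >> $2) & $3 ))
}

# Read a 32-bit value as ASCII characters, most significant byte first,
# skipping NUL padding bytes.
client_name() {
    local value=$(( $1 )) pos byte name=""
    for pos in 24 16 8 0; do
        byte=$(( (value >> pos) & 0xff ))
        if (( byte != 0 )); then
            name+=$(printf "\\x$(printf '%02x' "$byte")")
        fi
    done
    printf '%s\n' "$name"
}

fault_page 0x00100190            # prints 1048976 (the "at page" value)
status_field 0x0E04400C 25 0xf   # prints 7  (matches "vmid 7")
status_field 0x0E04400C 12 0xff  # prints 68 (matches the trailing "(68)")
client_name 0x54433100           # prints TC1 (matches "read from 'TC1'")
```

So the fault address register, the page number, the VMID and the client name
in the four log lines all describe the same single fault.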

Hope this helps somehow.


= --15309192200.6Fc2dCe.19306-- --===============1631067309== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============1631067309==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Sat, 07 Jul 2018 08:36:28 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============2093848840==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [131.252.210.165]) by gabe.freedesktop.org (Postfix) with ESMTP id F0AC96E18D for ; Sat, 7 Jul 2018 08:36:27 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============2093848840== Content-Type: multipart/alternative; boundary="15309525871.E51DD.18069" Content-Transfer-Encoding: 7bit --15309525871.E51DD.18069 Date: Sat, 7 Jul 2018 08:36:27 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #27 from Michel D=C3=A4nzer --- (In reply to dwagner from comment #26) > Today for the first time I had a sudden "crash while just browsing with > Firefox" [...] That could be a Mesa issue, anyway it should probably be tracked separately from this report. 
--=20 You are receiving this mail because: You are the assignee for the bug.= --15309525871.E51DD.18069 Date: Sat, 7 Jul 2018 08:36:27 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 27 on bug 102322 from Michel Dänzer
(In reply to dwagner from comment #26)
> Today for the first time I had a sudden "crash while just browsing with
> Firefox" [...]

That could be a Mesa issue, anyway it should probably be tracked separately
from this report.


= --15309525871.E51DD.18069-- --===============2093848840== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============2093848840==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Sat, 07 Jul 2018 20:08:40 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============1276847513==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [131.252.210.165]) by gabe.freedesktop.org (Postfix) with ESMTP id 9B57A6E33B for ; Sat, 7 Jul 2018 20:08:40 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============1276847513== Content-Type: multipart/alternative; boundary="15309941201.bbb1.4485" Content-Transfer-Encoding: 7bit --15309941201.bbb1.4485 Date: Sat, 7 Jul 2018 20:08:40 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #28 from dwagner --- (In reply to Michel D=C3=A4nzer from comment #27) > That could be a Mesa issue, anyway it should probably be tracked separate= ly > from this report. 
Created separate bug report https://bugs.freedesktop.org/show_bug.cgi?id=3D= 107152 (If that is a Mesa issue, no more than user processes / X11 should have cra= shed - but not the kernel amdgpu driver... right?) --=20 You are receiving this mail because: You are the assignee for the bug.= --15309941201.bbb1.4485 Date: Sat, 7 Jul 2018 20:08:40 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 28 on bug 102322 from dwagner
(In reply to Michel Dänzer from comment #27)
> That could be a Mesa issue, anyway it should probably be tracked separately
> from this report.

Created separate bug report
https://bugs.freedesktop.org/show_bug.cgi?id=107152

(If that is a Mesa issue, no more than user processes / X11 should have
crashed - but not the kernel amdgpu driver... right?)


= --15309941201.bbb1.4485-- --===============1276847513== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============1276847513==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Mon, 09 Jul 2018 14:34:51 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============1743426101==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [IPv6:2610:10:20:722:a800:ff:fe98:4b55]) by gabe.freedesktop.org (Postfix) with ESMTP id 4782A6E4F7 for ; Mon, 9 Jul 2018 14:34:51 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============1743426101== Content-Type: multipart/alternative; boundary="15311468911.8d6408.23138" Content-Transfer-Encoding: 7bit --15311468911.8d6408.23138 Date: Mon, 9 Jul 2018 14:34:51 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #29 from Andrey Grodzovsky --- (In reply to dwagner from comment #28) > (In reply to Michel D=C3=A4nzer from comment #27) > > That could be a Mesa issue, anyway it should probably be tracked separa= tely > > from this report. 
>=20 > Created separate bug report > https://bugs.freedesktop.org/show_bug.cgi?id=3D107152 >=20 > (If that is a Mesa issue, no more than user processes / X11 should have > crashed - but not the kernel amdgpu driver... right?) Not exactly, MESA could create a bad request (faulty GPU address) which wou= ld lead to this. It can even be triggered on purpose using a debug flag from M= ESA. --=20 You are receiving this mail because: You are the assignee for the bug.= --15311468911.8d6408.23138 Date: Mon, 9 Jul 2018 14:34:51 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 29 on bug 102322 from Andrey Grodzovsky
(In reply to dwagner from comment #28)
> (In reply to Michel Dänzer from comment #27)
> > That could be a Mesa issue, anyway it should probably be tracked
> > separately from this report.
>
> Created separate bug report
> https://bugs.freedesktop.org/show_bug.cgi?id=107152
>
> (If that is a Mesa issue, no more than user processes / X11 should have
> crashed - but not the kernel amdgpu driver... right?)

Not exactly, MESA could create a bad request (faulty GPU address) which
would lead to this. It can even be triggered on purpose using a debug flag
from MESA.


= --15311468911.8d6408.23138-- --===============1743426101== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============1743426101==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Wed, 11 Jul 2018 22:32:41 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============0648847305==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [131.252.210.165]) by gabe.freedesktop.org (Postfix) with ESMTP id EC21C6EE2D for ; Wed, 11 Jul 2018 22:32:40 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============0648847305== Content-Type: multipart/alternative; boundary="15313483602.4f90AaA.25365" Content-Transfer-Encoding: 7bit --15313483602.4f90AaA.25365 Date: Wed, 11 Jul 2018 22:32:40 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #30 from dwagner --- (In reply to Andrey Grodzovsky from comment #29) > > (If that is a Mesa issue, no more than user processes / X11 should have > > crashed - but not the kernel amdgpu driver... right?) >=20 > Not exactly, MESA could create a bad request (faulty GPU address) which > would lead to this. 
It can even be triggered on purpose using a debug flag > from MESA. My understanding is that all parts of MESA run as user processes, outside of the kernel space. If such code is allowed to pass parameters into kernel functions that make the kernel crash, that would be a veritable security ho= le which attackers could exploit to stage at least denial-of-service attacks, = if not worse. --=20 You are receiving this mail because: You are the assignee for the bug.= --15313483602.4f90AaA.25365 Date: Wed, 11 Jul 2018 22:32:40 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 30 on bug 102322 from dwagner
(In reply to Andrey Grodzovsky from comment #29)
> > (If that is a Mesa issue, no more than user processes / X11 should have
> > crashed - but not the kernel amdgpu driver... right?)
>
> Not exactly, MESA could create a bad request (faulty GPU address) which
> would lead to this. It can even be triggered on purpose using a debug flag
> from MESA.

My understanding is that all parts of MESA run as user processes, outside of
the kernel space. If such code is allowed to pass parameters into kernel
functions that make the kernel crash, that would be a veritable security
hole which attackers could exploit to stage at least denial-of-service
attacks, if not worse.


= --15313483602.4f90AaA.25365-- --===============0648847305== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============0648847305==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Sun, 15 Jul 2018 08:56:58 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============0059962105==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [131.252.210.165]) by gabe.freedesktop.org (Postfix) with ESMTP id 57A376E05F for ; Sun, 15 Jul 2018 08:56:58 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============0059962105== Content-Type: multipart/alternative; boundary="15316450182.f6aAE83.30128" Content-Transfer-Encoding: 7bit --15316450182.f6aAE83.30128 Date: Sun, 15 Jul 2018 08:56:58 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #31 from Doctor --- I got that one too and was able to track the problem down a bit further. Ch= rome and video with the gpu enabled will blow it up too. Interesting I was able = to reproduce it consistantly with my rtl8188eu usb driver plug it in connect a= nd wpa_supplicant will cause it to explode. 
--=20 You are receiving this mail because: You are the assignee for the bug.= --15316450182.f6aAE83.30128 Date: Sun, 15 Jul 2018 08:56:58 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 31 on bug 102322 from Doctor
I got that one too and was able to track the problem down a bit further.
Chrome and video with the GPU enabled will blow it up too. Interestingly, I
was able to reproduce it consistently with my rtl8188eu USB driver: plug it
in, connect, and wpa_supplicant will cause it to explode.


= --15316450182.f6aAE83.30128-- --===============0059962105== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============0059962105==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Sun, 15 Jul 2018 09:03:01 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============1295312593==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [IPv6:2610:10:20:722:a800:ff:fe98:4b55]) by gabe.freedesktop.org (Postfix) with ESMTP id B47416E07C for ; Sun, 15 Jul 2018 09:03:01 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============1295312593== Content-Type: multipart/alternative; boundary="15316453812.1d862.29050" Content-Transfer-Encoding: 7bit --15316453812.1d862.29050 Date: Sun, 15 Jul 2018 09:03:01 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #32 from Doctor --- I ended up due to working on a live dev cd for codexl since all my machines= are memory based and use no magnetic media. Just cherry picking the code back to the last 4.16 and no problems Heres the working 4.16 . I chased this rabbit for awhile and it pops up like the dam wood chuck in caddie shack. 
Here is the latest as of 11 hours ago 4.19-wip https://github.com/tekcomm/linux-image-4.19-wip-generic Here is the latest as of 11 hours ago 4.16 version from three weeks ago wit= h no woodchucks https://github.com/tekcomm/linux-kernel-amdgpu-binaries --=20 You are receiving this mail because: You are the assignee for the bug.= --15316453812.1d862.29050 Date: Sun, 15 Jul 2018 09:03:01 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 32 on bug 102322 from Doctor
I ended up here due to working on a live dev CD for CodeXL, since all my
machines are memory based and use no magnetic media. Just cherry-picking the
code back to the last 4.16 and no problems. Here's the working 4.16. I chased
this rabbit for a while and it pops up like the damn woodchuck in Caddyshack.


Here is the latest as of 11 hours ago 4.19-wip
https://github.com/tekcomm/linux-image-4.19-wip-generic


Here is the latest as of 11 hours ago 4.16 version from three weeks ago with
no woodchucks
https://github.com/tekcomm/linux-kernel-amdgpu-binaries


= --15316453812.1d862.29050-- --===============1295312593== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============1295312593==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Sun, 15 Jul 2018 09:07:08 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============0212924524==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [131.252.210.165]) by gabe.freedesktop.org (Postfix) with ESMTP id B00776E07D for ; Sun, 15 Jul 2018 09:07:08 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============0212924524== Content-Type: multipart/alternative; boundary="15316456282.8cF0E2AF.2393" Content-Transfer-Encoding: 7bit --15316456282.8cF0E2AF.2393 Date: Sun, 15 Jul 2018 09:07:08 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #33 from Doctor --- I think it may be something as stupid as a var too. 
--=20 You are receiving this mail because: You are the assignee for the bug.= --15316456282.8cF0E2AF.2393 Date: Sun, 15 Jul 2018 09:07:08 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 33 on bug 102322 from Doctor
I think it may be something as stupid as a var too.


= --15316456282.8cF0E2AF.2393-- --===============0212924524== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============0212924524==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Sun, 15 Jul 2018 19:59:36 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============1668191109==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [IPv6:2610:10:20:722:a800:ff:fe98:4b55]) by gabe.freedesktop.org (Postfix) with ESMTP id 46E956E114 for ; Sun, 15 Jul 2018 19:59:36 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============1668191109== Content-Type: multipart/alternative; boundary="15316847762.bef545a63.21806" Content-Transfer-Encoding: 7bit --15316847762.bef545a63.21806 Date: Sun, 15 Jul 2018 19:59:36 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #34 from dwagner --- (In reply to Doctor from comment #32) > Just cherry picking the code > back to the last 4.16 and no problems Heres the working 4.16 . I chased > this rabbit for awhile and it pops up like the dam wood chuck in caddie > shack. 
>=20 > Here is the latest as of 11 hours ago 4.19-wip > https://github.com/tekcomm/linux-image-4.19-wip-generic I am not sure I understand what you are trying to tell us, here. The repository you linked does not seem to contain any relevant commits changing kernel source code. --=20 You are receiving this mail because: You are the assignee for the bug.= --15316847762.bef545a63.21806 Date: Sun, 15 Jul 2018 19:59:36 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 34 on bug 102322 from dwagner
(In reply to Doctor from comment #32)
> Just cherry-picking the code
> back to the last 4.16 and no problems. Here's the working 4.16. I chased
> this rabbit for a while and it pops up like the damn woodchuck in
> Caddyshack.
>
> Here is the latest as of 11 hours ago 4.19-wip
> https://github.com/tekcomm/linux-image-4.19-wip-generic

I am not sure I understand what you are trying to tell us, here.

The repository you linked does not seem to contain any relevant commits
changing kernel source code.


= --15316847762.bef545a63.21806-- --===============1668191109== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============1668191109==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Mon, 16 Jul 2018 14:06:32 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============2119969694==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [IPv6:2610:10:20:722:a800:ff:fe98:4b55]) by gabe.freedesktop.org (Postfix) with ESMTP id D02AF6E19F for ; Mon, 16 Jul 2018 14:06:31 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============2119969694== Content-Type: multipart/alternative; boundary="15317499911.Ed11eBB7e.5818" Content-Transfer-Encoding: 7bit --15317499911.Ed11eBB7e.5818 Date: Mon, 16 Jul 2018 14:06:31 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #35 from Andrey Grodzovsky --- (In reply to dwagner from comment #30) > (In reply to Andrey Grodzovsky from comment #29) > > > (If that is a Mesa issue, no more than user processes / X11 should ha= ve > > > crashed - but not the kernel amdgpu driver... right?) 
> >=20 > > Not exactly, MESA could create a bad request (faulty GPU address) which > > would lead to this. It can even be triggered on purpose using a debug f= lag > > from MESA. >=20 > My understanding is that all parts of MESA run as user processes, outside= of > the kernel space. If such code is allowed to pass parameters into kernel > functions that make the kernel crash, that would be a veritable security > hole which attackers could exploit to stage at least denial-of-service > attacks, if not worse. There is no impact on the kernlel, please note that this is a GPU page faul= t, not CPU page fault so the kernel keeps working normal, doesn't hang and workable. You might get black screen out of this and have to reboot the gra= phic card or maybe the entire system to recover but I don't see any system secur= ity and stability compromise here. --=20 You are receiving this mail because: You are the assignee for the bug.= --15317499911.Ed11eBB7e.5818 Date: Mon, 16 Jul 2018 14:06:31 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 35 on bug 102322 from Andrey Grodzovsky
(In reply to dwagner from comment #30)
> (In reply to Andrey Grodzovsky from comment #29)
> > > (If that is a Mesa issue, no more than user processes / X11 should
> > > have crashed - but not the kernel amdgpu driver... right?)
> >
> > Not exactly, MESA could create a bad request (faulty GPU address) which
> > would lead to this. It can even be triggered on purpose using a debug
> > flag from MESA.
>
> My understanding is that all parts of MESA run as user processes, outside
> of the kernel space. If such code is allowed to pass parameters into kernel
> functions that make the kernel crash, that would be a veritable security
> hole which attackers could exploit to stage at least denial-of-service
> attacks, if not worse.

There is no impact on the kernel. Please note that this is a GPU page fault,
not a CPU page fault, so the kernel keeps working normally; it doesn't hang
and remains usable. You might get a black screen out of this and have to
reboot the graphics card, or maybe the entire system, to recover, but I don't
see any system security or stability compromise here.


= --15317499911.Ed11eBB7e.5818-- --===============2119969694== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============2119969694==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Sun, 29 Jul 2018 10:02:00 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============0890603512==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [IPv6:2610:10:20:722:a800:ff:fe98:4b55]) by gabe.freedesktop.org (Postfix) with ESMTP id 721A46E11D for ; Sun, 29 Jul 2018 10:02:00 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============0890603512== Content-Type: multipart/alternative; boundary="15328585203.BCAE3.23186" Content-Transfer-Encoding: 7bit --15328585203.BCAE3.23186 Date: Sun, 29 Jul 2018 10:02:00 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 Roshless changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |roshless@gmail.com --- Comment #36 from Roshless --- *** Bug 107311 has been marked as a duplicate of this bug. 
*** --=20 You are receiving this mail because: You are the assignee for the bug.= --15328585203.BCAE3.23186 Date: Sun, 29 Jul 2018 10:02:00 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated Roshless changed bug 10232= 2
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |roshless@gmail.com

Comment # 36 on bug 102322 from Roshless
*** Bug 107311 has been marked as a duplicate of this bug. ***


= --15328585203.BCAE3.23186-- --===============0890603512== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============0890603512==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Wed, 08 Aug 2018 23:07:38 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============1186252450==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [131.252.210.165]) by gabe.freedesktop.org (Postfix) with ESMTP id 806086E614 for ; Wed, 8 Aug 2018 23:07:38 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============1186252450== Content-Type: multipart/alternative; boundary="15337696582.f595f250.4642" Content-Transfer-Encoding: 7bit --15337696582.f595f250.4642 Date: Wed, 8 Aug 2018 23:07:38 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #37 from dwagner --- In the related bug report (https://bugs.freedesktop.org/show_bug.cgi?id=3D1= 07152) I noticed that this bug can be triggered very reliably and quickly by playi= ng a video with a deliberately lowered frame rate: "mpv --no-correct-pts --fps=3D3 --ao=3Dnull some_arbitrary_video.webm" This led me to assume this bug might 
be caused by the dynamic power managem= ent, that often ramps performance up/down when a video is played at such a low f= rame rate. And indeed, I found this confirmed by many experiments: If I use a script l= ike > #!/bin/bash > cd /sys/class/drm/card0/device > echo manual >power_dpm_force_performance_level > # low > echo 0 >pp_dpm_mclk=20 > echo 0 >pp_dpm_sclk > # medium > #echo 1 >pp_dpm_mclk=20 > #echo 1 >pp_dpm_sclk > # high > #echo 1 >pp_dpm_mclk=20 > #echo 6 >pp_dpm_sclk to enforce just any performance level, then the crashes do not occur anymor= e - also with the "low frame rate video test". So it seems that the transition from one "dpm" performance level to another, with a certain probability, causes these crashes. And the more often the transitions occur, the sooner one will experience them. (BTW: For unknown reason, invoking "xrandr" or enabling a monitor after sle= ep causes the above settings to get lost, so one has to invoke above script again.) --=20 You are receiving this mail because: You are the assignee for the bug.= --15337696582.f595f250.4642 Date: Wed, 8 Aug 2018 23:07:38 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 37 on bug 102322 from dwagner
In the related bug report (https://bugs.freedesktop.org/show_bug.cgi?id=107152)
I noticed that this bug can be triggered very reliably and quickly by playing a
video with a deliberately lowered frame rate:
 "mpv --no-correct-pts --fps=3 --ao=null some_arbitrary_video.webm"

This led me to assume this bug might be caused by the dynamic power management,
that often ramps performance up/down when a video is played at such a low frame
rate.
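
One way to check how often the level actually changes during playback is to poll pp_dpm_sclk. The following is only a sketch: it assumes the usual amdgpu listing format in which the active level's line is marked with "*", and the function names, sample count and interval are illustrative, not taken from the report.

```shell
#!/bin/bash
# Sketch: count DPM level changes while the low-frame-rate video plays.
# Assumption: pp_dpm_sclk lists one level per line ("1: 481Mhz *") and
# marks the active one with '*'. Function names are hypothetical.

# active_level FILE: print the index of the level marked with '*'
active_level() {
    awk '/\*/ { sub(":", "", $1); print $1 }' "$1"
}

# count_transitions FILE SAMPLES INTERVAL: poll FILE and print how many
# times the active level differed between two consecutive samples
count_transitions() {
    local file="$1" samples="$2" interval="$3"
    local prev cur changes=0 i
    prev=$(active_level "$file")
    for ((i = 0; i < samples; i++)); do
        sleep "$interval"
        cur=$(active_level "$file")
        if [ "$cur" != "$prev" ]; then
            changes=$((changes + 1))
        fi
        prev=$cur
    done
    echo "$changes"
}
```

For example, "count_transitions /sys/class/drm/card0/device/pp_dpm_sclk 60 1" would sample once per second for a minute while mpv runs.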

And indeed, I found this confirmed by many experiments: If I use a script like
> #!/bin/bash
> cd /sys/class/drm/card0/device
> echo manual >power_dpm_force_performance_level
> # low
> echo 0 >pp_dpm_mclk
> echo 0 >pp_dpm_sclk
> # medium
> #echo 1 >pp_dpm_mclk
> #echo 1 >pp_dpm_sclk
> # high
> #echo 1 >pp_dpm_mclk
> #echo 6 >pp_dpm_sclk
to enforce just any performance level, then the crashes do not occur anymore -
also with the "low frame rate video test".

So it seems that the transition from one "dpm" performance level to another,
with a certain probability, causes these crashes. And the more often the
transitions occur, the sooner one will experience them.

(BTW: For an unknown reason, invoking "xrandr" or enabling a monitor after sleep
causes the above settings to get lost, so one has to invoke the above script
again.)
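
Since the settings get lost, the workaround could be re-applied automatically. This is a minimal sketch, not a tested fix: it assumes the loss shows up as power_dpm_force_performance_level no longer reading "manual" (the report does not confirm this), and DEV, the function names and the 5-second poll are illustrative.

```shell
#!/bin/bash
# Sketch: pin one DPM level and re-apply it whenever it is lost.
# DEV, pin_dpm_level, needs_repin and watch_dpm are hypothetical names.
DEV="${DEV:-/sys/class/drm/card0/device}"

# pin_dpm_level DIR MCLK_INDEX SCLK_INDEX: same writes as the script above
pin_dpm_level() {
    local dir="$1" mclk="$2" sclk="$3"
    echo manual > "$dir/power_dpm_force_performance_level" || return 1
    echo "$mclk" > "$dir/pp_dpm_mclk" || return 1
    echo "$sclk" > "$dir/pp_dpm_sclk" || return 1
}

# needs_repin DIR: succeeds when the forced level is no longer "manual"
needs_repin() {
    [ "$(cat "$1/power_dpm_force_performance_level" 2>/dev/null)" != "manual" ]
}

# watch_dpm MCLK_INDEX SCLK_INDEX: poll forever; must run as root
watch_dpm() {
    while true; do
        if needs_repin "$DEV"; then
            pin_dpm_level "$DEV" "$1" "$2"
        fi
        sleep 5
    done
}
```

Running "watch_dpm 0 0" as root would keep the "low" level from the script above in place across xrandr calls, under the stated assumption.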


You are receiving this mail because:
  • You are the assignee for the bug.
= --15337696582.f595f250.4642-- --===============1186252450== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============1186252450==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Thu, 09 Aug 2018 20:56:06 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============0665472732==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [131.252.210.165]) by gabe.freedesktop.org (Postfix) with ESMTP id 134826E82F for ; Thu, 9 Aug 2018 20:56:07 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============0665472732== Content-Type: multipart/alternative; boundary="15338481670.Fd1555FF.6075" Content-Transfer-Encoding: 7bit --15338481670.Fd1555FF.6075 Date: Thu, 9 Aug 2018 20:56:07 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #38 from dwagner --- *** Bug 107152 has been marked as a duplicate of this bug. 
*** --=20 You are receiving this mail because: You are the assignee for the bug.= --15338481670.Fd1555FF.6075 Date: Thu, 9 Aug 2018 20:56:07 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 38 on bug 102322 from dwagner
*** Bug 107152 has been marked as a duplicate of this bug. ***


You are receiving this mail because:
  • You are the assignee for the bug.
= --15338481670.Fd1555FF.6075-- --===============0665472732== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============0665472732==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Tue, 14 Aug 2018 21:27:41 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============1539135548==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [IPv6:2610:10:20:722:a800:ff:fe98:4b55]) by gabe.freedesktop.org (Postfix) with ESMTP id AFC616E2E1 for ; Tue, 14 Aug 2018 21:27:41 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============1539135548== Content-Type: multipart/alternative; boundary="15342820613.FBe893E2.27459" Content-Transfer-Encoding: 7bit --15342820613.FBe893E2.27459 Date: Tue, 14 Aug 2018 21:27:41 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #39 from Andrey Grodzovsky --- (In reply to dwagner from comment #37) > In the related bug report > (https://bugs.freedesktop.org/show_bug.cgi?id=3D107152) I noticed that th= is > bug can be triggered very reliably and quickly by playing a video with a > deliberately lowered frame rate: > "mpv --no-correct-pts 
--fps=3D3 --ao=3Dnull some_arbitrary_video.webm" >=20 > This led me to assume this bug might be caused by the dynamic power > management, that often ramps performance up/down when a video is played at > such a low frame rate. I tried exactly the same - reproduce with same card model and latest kernel= and run webm clip with mpv same way you did and it didn't happen.=20 >=20 > And indeed, I found this confirmed by many experiments: If I use a script > like > > #!/bin/bash > > cd /sys/class/drm/card0/device > > echo manual >power_dpm_force_performance_level > > # low > > echo 0 >pp_dpm_mclk=20 > > echo 0 >pp_dpm_sclk > > # medium > > #echo 1 >pp_dpm_mclk=20 > > #echo 1 >pp_dpm_sclk > > # high > > #echo 1 >pp_dpm_mclk=20 > > #echo 6 >pp_dpm_sclk > to enforce just any performance level, then the crashes do not occur anym= ore > - also with the "low frame rate video test". >=20 > So it seems that the transition from one "dpm" performance level to anoth= er, > with a certain probability, causes these crashes. And the more often the > transitions occur, the sooner one will experience them. >=20 > (BTW: For unknown reason, invoking "xrandr" or enabling a monitor after > sleep causes the above settings to get lost, so one has to invoke above > script again.) --=20 You are receiving this mail because: You are the assignee for the bug.= --15342820613.FBe893E2.27459 Date: Tue, 14 Aug 2018 21:27:41 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 39 on bug 102322 from Andrey Grodzovsky
(In reply to dwagner from comment #37)
> In the related bug report
> (https://bugs.freedesktop.org/show_bug.cgi?id=107152) I noticed that this
> bug can be triggered very reliably and quickly by playing a video with a
> deliberately lowered frame rate:
>  "mpv --no-correct-pts --fps=3 --ao=null some_arbitrary_video.webm"

>
> This led me to assume this bug might be caused by the dynamic power
> management, that often ramps performance up/down when a video is played at
> such a low frame rate.

I tried exactly the same - I attempted to reproduce with the same card model and
the latest kernel, running a webm clip with mpv the same way you did, and it
didn't happen.

>
> And indeed, I found this confirmed by many experiments: If I use a script
> like
> > #!/bin/bash
> > cd /sys/class/drm/card0/device
> > echo manual >power_dpm_force_performance_level
> > # low
> > echo 0 >pp_dpm_mclk
> > echo 0 >pp_dpm_sclk
> > # medium
> > #echo 1 >pp_dpm_mclk
> > #echo 1 >pp_dpm_sclk
> > # high
> > #echo 1 >pp_dpm_mclk
> > #echo 6 >pp_dpm_sclk
> to enforce just any performance level, then the crashes do not occur anymore
> - also with the "low frame rate video test".
>
> So it seems that the transition from one "dpm" performance level to another,
> with a certain probability, causes these crashes. And the more often the
> transitions occur, the sooner one will experience them.
>
> (BTW: For unknown reason, invoking "xrandr" or enabling a monitor after
> sleep causes the above settings to get lost, so one has to invoke above
> script again.)


You are receiving this mail because:
  • You are the assignee for the bug.
= --15342820613.FBe893E2.27459-- --===============1539135548== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============1539135548==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Wed, 15 Aug 2018 14:24:24 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============0222813462==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [131.252.210.165]) by gabe.freedesktop.org (Postfix) with ESMTP id 7E4C56E3A4 for ; Wed, 15 Aug 2018 14:24:24 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============0222813462== Content-Type: multipart/alternative; boundary="15343430644.0Cf9d5.11761" Content-Transfer-Encoding: 7bit --15343430644.0Cf9d5.11761 Date: Wed, 15 Aug 2018 14:24:24 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #40 from Andrey Grodzovsky --- Created attachment 141112 --> https://bugs.freedesktop.org/attachment.cgi?id=3D141112&action=3Dedit .config I uploaded my .config file - maybe something in your Kconfig flags makes th= is happen - you can try and rebuild latest kernel from Alex's repository using= my .config and see if you don't experience 
this anymore.=20 https://cgit.freedesktop.org/~agd5f/linux/log/?h=3Damd-staging-drm-next Other than that, since you system hard hangs so you can't do any postmortem dumps, you can at least provide output from events tracing though trace_pip= e to catch live logs on the fly. Maybe we can infer something from there... So again -=20 Load the system and before starting reproduce run the following trace comma= nd - sudo trace-cmd start -e dma_fence -e gpu_scheduler -e amdgpu -v -e "amdgpu:amdgpu_mm_rreg" -e "amdgpu:amdgpu_mm_wreg" -e "amdgpu:amdgpu_iv" then cd /sys/kernel/debug/tracing && cat trace_pipe When the problem happens just copy all the output from the terminal to a log file. Make sure your terminal app has largest possible buffer to catch ALL = the output. --=20 You are receiving this mail because: You are the assignee for the bug.= --15343430644.0Cf9d5.11761 Date: Wed, 15 Aug 2018 14:24:24 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 40 on bug 102322 from Andrey Grodzovsky
Created attachment 141112 [details]
.config

I uploaded my .config file - maybe something in your Kconfig flags makes this
happen - you can try to rebuild the latest kernel from Alex's repository using my
.config and see whether you still experience this.
https://cgit.freedesktop.org/~agd5f/linux/log/?h=amd-staging-drm-next

Other than that, since your system hard-hangs and you can't do any postmortem
dumps, you can at least provide output from event tracing through trace_pipe to
catch live logs on the fly. Maybe we can infer something from there...

So again -
Load the system and, before starting to reproduce, run the following trace
command -

sudo trace-cmd start -e dma_fence -e gpu_scheduler -e amdgpu -v -e "amdgpu:amdgpu_mm_rreg" -e "amdgpu:amdgpu_mm_wreg" -e "amdgpu:amdgpu_iv"

then cd /sys/kernel/debug/tracing && cat trace_pipe

When the problem happens just copy all the output from the terminal to a log
file. Make sure your terminal app has the largest possible buffer to catch ALL
the output.
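
As an alternative to copying from a terminal buffer, the trace could be streamed to a second machine so that the log survives the hard hang. The sketch below is only an illustration: it assumes SSH access to the affected machine and passwordless sudo there, and capture_trace, REMOTE_SHELL, the host name and the log file name are hypothetical. The trace-cmd options are the ones given above ("-v" excludes the events listed after it).

```shell
#!/bin/bash
# Sketch: run this on a second machine, not on the one that hangs.
# capture_trace and REMOTE_SHELL are hypothetical names.

# Event selection copied from the comment above; the events after -v
# are excluded from tracing.
TRACE_START='trace-cmd start -e dma_fence -e gpu_scheduler -e amdgpu -v -e "amdgpu:amdgpu_mm_rreg" -e "amdgpu:amdgpu_mm_wreg" -e "amdgpu:amdgpu_iv"'

# capture_trace HOST LOGFILE: start tracing on HOST, then stream its
# trace_pipe into LOGFILE on the local machine until the link drops
capture_trace() {
    local host="$1" log="$2"
    local rsh="${REMOTE_SHELL:-ssh}"
    "$rsh" "$host" "sudo $TRACE_START" || return 1
    "$rsh" "$host" "sudo cat /sys/kernel/debug/tracing/trace_pipe" > "$log"
}
```

A call such as "capture_trace user@gpu-box amdgpu-trace.log" (both names are placeholders) would be started before reproducing the crash; the local file then holds everything that arrived before the connection died.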


You are receiving this mail because:
  • You are the assignee for the bug.
= --15343430644.0Cf9d5.11761-- --===============0222813462== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============0222813462==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Wed, 15 Aug 2018 22:03:38 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============1318782486==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [IPv6:2610:10:20:722:a800:ff:fe98:4b55]) by gabe.freedesktop.org (Postfix) with ESMTP id 59DED6E427 for ; Wed, 15 Aug 2018 22:03:38 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============1318782486== Content-Type: multipart/alternative; boundary="15343706181.D2E6.7821" Content-Transfer-Encoding: 7bit --15343706181.D2E6.7821 Date: Wed, 15 Aug 2018 22:03:38 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #41 from dwagner --- (In reply to Andrey Grodzovsky from comment #40) > Created attachment 141112 [details] > .config >=20 > I uploaded my .config file - maybe something in your Kconfig flags makes > this happen - you can try and rebuild latest kernel from Alex's repository > using my .config and see if you don't experience 
this anymore.=20 > https://cgit.freedesktop.org/~agd5f/linux/log/?h=3Damd-staging-drm-next Did just that - but still the video test crashes after at most few minutes,= and does not crash with DPM turned off. So we can rule out our .config differen= ces (of which there are many). > Other than that, since you system hard hangs so you can't do any postmort= em > dumps, you can at least provide output from events tracing though trace_p= ipe > to catch live logs on the fly. Maybe we can infer something from there... >=20 > So again -=20 > Load the system and before starting reproduce run the following trace > command - >=20 > sudo trace-cmd start -e dma_fence -e gpu_scheduler -e amdgpu -v -e > "amdgpu:amdgpu_mm_rreg" -e "amdgpu:amdgpu_mm_wreg" -e "amdgpu:amdgpu_iv" >=20 > then cd /sys/kernel/debug/tracing && cat trace_pipe >=20 > When the problem happens just copy all the output from the terminal to a = log > file. Make sure your terminal app has largest possible buffer to catch ALL > the output. Will try that on next opportunity, probably tomorrow evening. --=20 You are receiving this mail because: You are the assignee for the bug.= --15343706181.D2E6.7821 Date: Wed, 15 Aug 2018 22:03:38 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 41 on bug 102322 from dwagner
(In reply to Andrey Grodzovsky from comment #40)
> Created attachment 141112 [details]
> .config
>
> I uploaded my .config file - maybe something in your Kconfig flags makes
> this happen - you can try and rebuild latest kernel from Alex's repository
> using my .config and see if you don't experience this anymore.
> https://cgit.freedesktop.org/~agd5f/linux/log/?h=amd-staging-drm-next

Did just that - but the video test still crashes after at most a few minutes, and
does not crash with DPM turned off. So we can rule out our .config differences
(of which there are many).

> Other than that, since you system hard hangs so you can't do any postmortem
> dumps, you can at least provide output from events tracing though trace_pipe
> to catch live logs on the fly. Maybe we can infer something from there...
>
> So again -
> Load the system and before starting reproduce run the following trace
> command -
>
> sudo trace-cmd start -e dma_fence -e gpu_scheduler -e amdgpu -v -e
> "amdgpu:amdgpu_mm_rreg" -e "amdgpu:amdgpu_mm_wreg" -e "amdgpu:amdgpu_iv"
>
> then cd /sys/kernel/debug/tracing && cat trace_pipe
>
> When the problem happens just copy all the output from the terminal to a log
> file. Make sure your terminal app has largest possible buffer to catch ALL
> the output.

Will try that on next opportunity, probably tomorrow evening.


You are receiving this mail because:
  • You are the assignee for the bug.
= --15343706181.D2E6.7821-- --===============1318782486== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============1318782486==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Thu, 16 Aug 2018 21:53:57 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============1879613841==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [IPv6:2610:10:20:722:a800:ff:fe98:4b55]) by gabe.freedesktop.org (Postfix) with ESMTP id B1EE36E543 for ; Thu, 16 Aug 2018 21:53:57 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============1879613841== Content-Type: multipart/alternative; boundary="15344564374.EfC3eE517.6382" Content-Transfer-Encoding: 7bit --15344564374.EfC3eE517.6382 Date: Thu, 16 Aug 2018 21:53:57 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #42 from dwagner --- Ok, did the proposed debugging session with trace-cmd, with output to a different PC over ssh. Using today's amd-staging-drm-next and btw., Arch updated the Xorg server earlier today. 
This time it took about 4 minutes until the video playback with 3 fps crash= ed - the symptom was the same (as in one-colored blank screen and a subsequent system crash), but this time the kernel and ssh session survived the crash = for some seconds, enough for me to also issue the earlier suggested "umr -O ver= bose -R gfx[.]" command after the amdgpu crash, so I can upload the output of th= at, too, but this was the last command executed, the system crashed completely while running it (so its output may be partial). Find attached dmesg, trace, and umr output. --=20 You are receiving this mail because: You are the assignee for the bug.= --15344564374.EfC3eE517.6382 Date: Thu, 16 Aug 2018 21:53:57 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 42 on bug 102322 from dwagner
Ok, did the proposed debugging session with trace-cmd, with output to a
different PC over ssh. Using today's amd-staging-drm-next and btw., Arch
updated the Xorg server earlier today.

This time it took about 4 minutes until the video playback with 3 fps crashed -
the symptom was the same (as in one-colored blank screen and a subsequent
system crash), but this time the kernel and ssh session survived the crash for
some seconds, enough for me to also issue the earlier suggested "umr -O verbose
-R gfx[.]" command after the amdgpu crash, so I can upload the output of that,
too. This was the last command executed, though: the system crashed completely
while running it (so its output may be partial).

Find attached dmesg, trace, and umr output.


You are receiving this mail because:
  • You are the assignee for the bug.
= --15344564374.EfC3eE517.6382-- --===============1879613841== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============1879613841==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Thu, 16 Aug 2018 21:55:49 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============0936722875==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [131.252.210.165]) by gabe.freedesktop.org (Postfix) with ESMTP id F018E6E545 for ; Thu, 16 Aug 2018 21:55:49 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============0936722875== Content-Type: multipart/alternative; boundary="15344565494.Ecabff.6489" Content-Transfer-Encoding: 7bit --15344565494.Ecabff.6489 Date: Thu, 16 Aug 2018 21:55:49 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #43 from dwagner --- Created attachment 141155 --> https://bugs.freedesktop.org/attachment.cgi?id=3D141155&action=3Dedit trace-cmd induced output during 3-fps-video replay and crash --=20 You are receiving this mail because: You are the assignee for the bug.= --15344565494.Ecabff.6489 Date: Thu, 16 Aug 2018 21:55:49 +0000 MIME-Version: 1.0 
Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 43 on bug 102322 from dwagner
Created attachment 141155 [details]
trace-cmd induced output during 3-fps-video replay and crash


You are receiving this mail because:
  • You are the assignee for the bug.
= --15344565494.Ecabff.6489-- --===============0936722875== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============0936722875==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Thu, 16 Aug 2018 21:56:38 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============1981205583==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [131.252.210.165]) by gabe.freedesktop.org (Postfix) with ESMTP id E6FBB6E544 for ; Thu, 16 Aug 2018 21:56:38 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============1981205583== Content-Type: multipart/alternative; boundary="15344565984.CfcBaBFB5.6400" Content-Transfer-Encoding: 7bit --15344565984.CfcBaBFB5.6400 Date: Thu, 16 Aug 2018 21:56:38 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #44 from dwagner --- Created attachment 141156 --> https://bugs.freedesktop.org/attachment.cgi?id=3D141156&action=3Dedit dmesg from boot to after the 3-fps-video test crash --=20 You are receiving this mail because: You are the assignee for the bug.= --15344565984.CfcBaBFB5.6400 Date: Thu, 16 Aug 2018 21:56:38 +0000 MIME-Version: 1.0 
Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 44 on bug 102322 from dwagner
Created attachment 141156 [details]
dmesg from boot to after the 3-fps-video test crash


You are receiving this mail because:
  • You are the assignee for the bug.
= --15344565984.CfcBaBFB5.6400-- --===============1981205583== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============1981205583==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Thu, 16 Aug 2018 21:57:19 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============0543110473==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [131.252.210.165]) by gabe.freedesktop.org (Postfix) with ESMTP id 5E4346E54F for ; Thu, 16 Aug 2018 21:57:19 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============0543110473== Content-Type: multipart/alternative; boundary="15344566394.b2bda.6400" Content-Transfer-Encoding: 7bit --15344566394.b2bda.6400 Date: Thu, 16 Aug 2018 21:57:19 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #45 from dwagner --- Created attachment 141157 --> https://bugs.freedesktop.org/attachment.cgi?id=3D141157&action=3Dedit output of umr command after 3-fps-video test crash --=20 You are receiving this mail because: You are the assignee for the bug.= --15344566394.b2bda.6400 Date: Thu, 16 Aug 2018 21:57:19 +0000 MIME-Version: 1.0 
Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 45 on bug 102322 from dwagner
Created attachment 141157 [details]
output of umr command after 3-fps-video test crash


You are receiving this mail because:
  • You are the assignee for the bug.
= --15344566394.b2bda.6400-- --===============0543110473== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============0543110473==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Thu, 16 Aug 2018 22:31:11 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============0028694923==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [IPv6:2610:10:20:722:a800:ff:fe98:4b55]) by gabe.freedesktop.org (Postfix) with ESMTP id B8DE36E55C for ; Thu, 16 Aug 2018 22:31:11 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============0028694923== Content-Type: multipart/alternative; boundary="15344586714.EE787.7812" Content-Transfer-Encoding: 7bit --15344586714.EE787.7812 Date: Thu, 16 Aug 2018 22:31:11 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #46 from Andrey Grodzovsky --- Thanks. 
--=20 You are receiving this mail because: You are the assignee for the bug.= --15344586714.EE787.7812 Date: Thu, 16 Aug 2018 22:31:11 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated


= --15344586714.EE787.7812-- --===============0028694923== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============0028694923==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Fri, 17 Aug 2018 21:25:08 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============1498877135==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [131.252.210.165]) by gabe.freedesktop.org (Postfix) with ESMTP id 7F0176E6E0 for ; Fri, 17 Aug 2018 21:25:08 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============1498877135== Content-Type: multipart/alternative; boundary="15345411083.32ED90fcF.23776" Content-Transfer-Encoding: 7bit --15345411083.32ED90fcF.23776 Date: Fri, 17 Aug 2018 21:25:08 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #47 from Andrey Grodzovsky --- Created attachment 141174 --> https://bugs.freedesktop.org/attachment.cgi?id=3D141174&action=3Dedit add_debug_info.patch A am attaching a basic debug patch, please try to apply it. It should give a bit more info in dmesg whe VM fault happens. 
I wasn't able to test it on my system, so it might be buggy or crash.

Reproduce again as before, with the cmd-trace running, and once the fault happens, if possible, quickly run

sudo umr -O halt_waves -wa

and only if you still have a running system after that, run

sudo umr -O verbose -R gfx[.]

The driver should be loaded with amdgpu.vm_fault_stop=2 set from grub.
Also check if adding amdgpu.vm_debug=1 makes the issue reproduce more quickly.

--=20 You are receiving this mail because: You are the assignee for the bug.= --15345411083.32ED90fcF.23776 Date: Fri, 17 Aug 2018 21:25:08 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment #47 on bug 102322 from Andrey Grodzovsky
Created attachment 141174
add_debug_info.patch

I am attaching a basic debug patch; please try to apply it. It should give a
bit more info in dmesg when a VM fault happens. I wasn't able to test it on my
system, so it might be buggy or crash.

Reproduce again as before, with the cmd-trace running, and once the fault
happens, if possible, quickly run

sudo umr -O halt_waves -wa

and only if you still have a running system after that, run

sudo umr -O verbose -R gfx[.]

The driver should be loaded with amdgpu.vm_fault_stop=2 set from grub.
Also check if adding amdgpu.vm_debug=1 makes the issue reproduce more quickly.
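
For reference, a minimal sketch of how one might confirm that the requested module parameters actually reached the kernel before trying to reproduce. The helper name has_kernel_param is made up for illustration. /proc/cmdline is the standard place to read the running kernel's command line; how the parameters get added to the boot entry (for example through GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub) depends on the distribution, so that part is an assumption.

```shell
# Hypothetical helper: succeed if the given kernel command line contains
# the exact parameter token (for example "amdgpu.vm_fault_stop=2").
# Padding both strings with spaces makes the match whole-token only, so
# "amdgpu.vm_fault_stop=20" does not count as "amdgpu.vm_fault_stop=2".
has_kernel_param() {
    cmdline=$1
    wanted=$2
    case " $cmdline " in
        *" $wanted "*) return 0 ;;
    esac
    return 1
}

# Usage on a live system (assumes the parameters were added at boot):
#   has_kernel_param "$(cat /proc/cmdline)" "amdgpu.vm_fault_stop=2" && echo "vm_fault_stop is set"
#   has_kernel_param "$(cat /proc/cmdline)" "amdgpu.vm_debug=1"      && echo "vm_debug is set"
```

Checking this first avoids spending a reproduction run on a boot where the options were silently dropped.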


= --15345411083.32ED90fcF.23776-- --===============1498877135== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============1498877135==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Sat, 18 Aug 2018 21:36:03 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============0652518633==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [131.252.210.165]) by gabe.freedesktop.org (Postfix) with ESMTP id CB81572324 for ; Sat, 18 Aug 2018 21:36:09 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============0652518633== Content-Type: multipart/alternative; boundary="15346281680.452BEB.26022" Content-Transfer-Encoding: 7bit --15346281680.452BEB.26022 Date: Sat, 18 Aug 2018 21:36:08 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #48 from dwagner --- (In reply to Andrey Grodzovsky from comment #47) > Created attachment 141174 [details] [review] > add_debug_info.patch >=20 > A am attaching a basic debug patch, please try to apply it. Done. 
> It should give a
> bit more info in dmesg when a VM fault happens.

Hmm - I could not see any additional output resulting from it.

> Reproduce again as before, with the cmd-trace running, and once the
> fault happens, if possible, quickly run
>
> sudo umr -O halt_waves -wa
>
> and only if you still have a running system after that, run
> sudo umr -O verbose -R gfx[.]
>
> The driver should be loaded with amdgpu.vm_fault_stop=2 set from grub.

Did that - will attach the script "gpu_debug3.sh" and its output. This time, dmesg and trace output are in the same file; if you want to look only at the dmesg part, "grep '^\[' gpu_debug_3.txt" will get it.

I reproduced the bug 4 times. On 2 occasions no error was emitted before crashing; the 2 other times both umr commands could still run. Since the error message looked the same, I'll attach the shorter file, where the crash occurred more quickly.

> Also check if adding amdgpu.vm_debug=1 makes the issue reproduce more quickly

I used that setting, but it did not seem to make a difference in how quickly the crash occurred - still "some seconds to some minutes".

--=20 You are receiving this mail because: You are the assignee for the bug.= --15346281680.452BEB.26022 Date: Sat, 18 Aug 2018 21:36:08 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment #48 on bug 102322 from dwagner
(In reply to Andrey Grodzovsky from comment #47)
> Created attachment 141174
> add_debug_info.patch
>
> I am attaching a basic debug patch; please try to apply it.

Done.

> It should give a
> bit more info in dmesg when a VM fault happens.

Hmm - I could not see any additional output resulting from it.

> Reproduce again as before, with the cmd-trace running, and once the
> fault happens, if possible, quickly run
>
> sudo umr -O halt_waves -wa
>
> and only if you still have a running system after that, run
> sudo umr -O verbose -R gfx[.]
>
> The driver should be loaded with amdgpu.vm_fault_stop=2 set from grub.

Did that - will attach the script "gpu_debug3.sh" and its output. This time,
dmesg and trace output are in the same file; if you want to look only at the
dmesg part, "grep '^\[' gpu_debug_3.txt" will get it.
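
A small sketch of that split, for anyone reading the combined file: kernel log lines start with a "[" timestamp prefix at column 0, while the trace events and the script's own messages do not. The function names here are invented for illustration; only the grep pattern comes from the comment above.

```shell
# Print only the kernel log (dmesg) lines of a combined capture file.
# These are the lines that begin with "[" at column 0.
dmesg_only() {
    grep '^\[' "$1"
}

# Print everything else: trace events, umr output and script messages.
non_dmesg() {
    grep -v '^\[' "$1"
}

# Usage (file name as mentioned in this comment):
#   dmesg_only gpu_debug_3.txt | less
#   non_dmesg  gpu_debug_3.txt | less
```

Trace lines are indented and start with the task name, so anchoring the pattern at the start of the line keeps the "[005]" CPU fields from matching.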

I reproduced the bug 4 times. On 2 occasions no error was emitted before
crashing; the 2 other times both umr commands could still run. Since the error
message looked the same, I'll attach the shorter file, where the crash occurred
more quickly.

> Also check if adding amdgpu.vm_debug=1 makes the issue reproduce more quickly

I used that setting, but it did not seem to make a difference in how quickly
the crash occurred - still "some seconds to some minutes".


= --15346281680.452BEB.26022-- --===============0652518633== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============0652518633==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Sat, 18 Aug 2018 21:37:20 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============2103256750==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [IPv6:2610:10:20:722:a800:ff:fe98:4b55]) by gabe.freedesktop.org (Postfix) with ESMTP id 26E7A72322 for ; Sat, 18 Aug 2018 21:37:20 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============2103256750== Content-Type: multipart/alternative; boundary="15346282401.52d9fe.26074" Content-Transfer-Encoding: 7bit --15346282401.52d9fe.26074 Date: Sat, 18 Aug 2018 21:37:20 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #49 from dwagner --- Created attachment 141189 --> https://bugs.freedesktop.org/attachment.cgi?id=3D141189&action=3Dedit script used to generate the gpu_debug_3.txt (when executed via ssh -t ...) 
--=20 You are receiving this mail because: You are the assignee for the bug.= --15346282401.52d9fe.26074 Date: Sat, 18 Aug 2018 21:37:20 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment #49 on bug 102322 from dwagner
Created attachment 141189
script used to generate the gpu_debug_3.txt (when executed via ssh -t ...)


= --15346282401.52d9fe.26074-- --===============2103256750== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============2103256750==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Sat, 18 Aug 2018 21:38:10 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============1202424556==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [IPv6:2610:10:20:722:a800:ff:fe98:4b55]) by gabe.freedesktop.org (Postfix) with ESMTP id 7C6C689938 for ; Sat, 18 Aug 2018 21:38:10 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============1202424556== Content-Type: multipart/alternative; boundary="15346282901.BDbCb72a.26131" Content-Transfer-Encoding: 7bit --15346282901.BDbCb72a.26131 Date: Sat, 18 Aug 2018 21:38:10 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #50 from dwagner --- Created attachment 141190 --> https://bugs.freedesktop.org/attachment.cgi?id=3D141190&action=3Dedit dmesg / trace / umr output from gpu_debug3.sh --=20 You are receiving this mail because: You are the assignee for the bug.= --15346282901.BDbCb72a.26131 Date: Sat, 18 Aug 2018 21:38:10 +0000 
MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment #50 on bug 102322 from dwagner
Created attachment 141190

dmesg / trace / umr output from gpu_debug3.sh


= --15346282901.BDbCb72a.26131-- --===============1202424556== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============1202424556==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Sat, 18 Aug 2018 21:40:01 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============0043394548==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [131.252.210.165]) by gabe.freedesktop.org (Postfix) with ESMTP id 2E63F72324 for ; Sat, 18 Aug 2018 21:40:01 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============0043394548== Content-Type: multipart/alternative; boundary="15346284012.DDeAd3.26252" Content-Transfer-Encoding: 7bit --15346284012.DDeAd3.26252 Date: Sat, 18 Aug 2018 21:40:01 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 dwagner changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #141190|0 |1 is obsolete| | --- Comment #51 from dwagner --- Created attachment 141191 --> https://bugs.freedesktop.org/attachment.cgi?id=3D141191&action=3Dedit xz-compressed output of gpu_debug3.sh - dmesg, trace, umr --=20 
You are receiving this mail because: You are the assignee for the bug.= --15346284012.DDeAd3.26252 Date: Sat, 18 Aug 2018 21:40:01 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated dwagner changed bug 10232= 2
What                          |Removed |Added
Attachment #141190 is obsolete|0       |1

Comment #51 on bug 102322 from dwagner
Created attachment 141191
xz-compressed output of gpu_debug3.sh - dmesg, trace, umr


= --15346284012.DDeAd3.26252-- --===============0043394548== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============0043394548==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Sat, 18 Aug 2018 21:43:23 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============2121463759==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [131.252.210.165]) by gabe.freedesktop.org (Postfix) with ESMTP id D2C3C89BFB for ; Sat, 18 Aug 2018 21:43:22 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============2121463759== Content-Type: multipart/alternative; boundary="15346286020.445A.26414" Content-Transfer-Encoding: 7bit --15346286020.445A.26414 Date: Sat, 18 Aug 2018 21:43:22 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #52 from dwagner --- One other experiment I made: I wrote a script to quickly toggle pp_dpm_mclk= and pp_dpm_sclk while playing a 3 fps video with power_dpm_force_performance_level=3Dmanual. Could not reproduce the crashes= that happen with power_dpm_force_performance_level=3Dauto this way. 
--=20 You are receiving this mail because: You are the assignee for the bug.= --15346286020.445A.26414 Date: Sat, 18 Aug 2018 21:43:22 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment #52 on bug 102322 from dwagner
One other experiment I made: I wrote a script to quickly toggle pp_dpm_mclk and
pp_dpm_sclk while playing a 3 fps video with
power_dpm_force_performance_level=manual. I could not reproduce the crashes that
happen with power_dpm_force_performance_level=auto this way.
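
The script itself was not attached, so the following is only a rough sketch of what such a toggle loop could look like, not the one actually used. It assumes the amdgpu sysfs files live under a device directory such as /sys/class/drm/card0/device, that DPM levels 0 and 1 both exist for sclk and mclk on the card, and that it runs as root; the function name, the choice of levels and the delay are all invented for illustration.

```shell
# Sketch only: alternate between two DPM levels for sclk and mclk while
# power_dpm_force_performance_level is "manual".
#   $1 = device directory (assumed, e.g. /sys/class/drm/card0/device)
#   $2 = number of switches to perform
#   $3 = delay in seconds between switches
toggle_dpm_levels() {
    dev_dir=$1
    iterations=$2
    delay=$3

    # Manual mode is required before individual levels can be forced.
    echo manual > "$dev_dir/power_dpm_force_performance_level" || return 1

    i=0
    while [ "$i" -lt "$iterations" ]; do
        level=$(( i % 2 ))          # 0, 1, 0, 1, ...
        echo "$level" > "$dev_dir/pp_dpm_sclk"
        echo "$level" > "$dev_dir/pp_dpm_mclk"
        sleep "$delay"
        i=$(( i + 1 ))
    done
}

# Usage (as root, while the video is playing):
#   toggle_dpm_levels /sys/class/drm/card0/device 1000 1
#   echo auto > /sys/class/drm/card0/device/power_dpm_force_performance_level
```

Writing "auto" back afterwards restores the default behaviour under which the crashes were observed.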


= --15346286020.445A.26414-- --===============2121463759== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============2121463759==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Mon, 20 Aug 2018 14:16:08 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============0801339438==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [IPv6:2610:10:20:722:a800:ff:fe98:4b55]) by gabe.freedesktop.org (Postfix) with ESMTP id EAEA46E277 for ; Mon, 20 Aug 2018 14:16:08 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============0801339438== Content-Type: multipart/alternative; boundary="15347745683.BF66a.14137" Content-Transfer-Encoding: 7bit --15347745683.BF66a.14137 Date: Mon, 20 Aug 2018 14:16:08 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #53 from Andrey Grodzovsky --- Created attachment 141198 --> https://bugs.freedesktop.org/attachment.cgi?id=3D141198&action=3Dedit add_debug_info2.patch Try this patch instead, i might be missing some prints in the first one. In the last log you attached I haven't seen any UMR dumps or GPU fault prin= ts in dmesg. 
The GPU fault has to be in the log to compare the faulty address against the debug prints in the patch. --=20 You are receiving this mail because: You are the assignee for the bug.= --15347745683.BF66a.14137 Date: Mon, 20 Aug 2018 14:16:08 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment #53 on bug 102322 from Andrey Grodzovsky
Created attachment 141198
add_debug_info2.patch

Try this patch instead; I might be missing some prints in the first one.
In the last log you attached I haven't seen any UMR dumps or GPU fault prints
in dmesg. The GPU fault has to be in the log to compare the faulty address
against the debug prints in the patch.
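
A quick way to tell whether a given capture is usable for that comparison is to search it for the fault address register name that appears elsewhere in this thread, VM_CONTEXT1_PROTECTION_FAULT_ADDR. This is a sketch under the assumption that the fault print contains that string verbatim; the function name is made up.

```shell
# Hypothetical helper: print the VM fault address lines from a saved log.
# grep exits non-zero when nothing matches, so the function's status also
# tells the caller whether this run recorded a fault at all.
vm_fault_lines() {
    grep 'VM_CONTEXT1_PROTECTION_FAULT_ADDR' "$1"
}

# Usage:
#   vm_fault_lines gpu_debug_3.txt || echo "no VM fault recorded in this run"
```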


= --15347745683.BF66a.14137-- --===============0801339438== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============0801339438==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Tue, 21 Aug 2018 08:41:52 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============0062121752==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [IPv6:2610:10:20:722:a800:ff:fe98:4b55]) by gabe.freedesktop.org (Postfix) with ESMTP id 3FD556E293 for ; Tue, 21 Aug 2018 08:41:53 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============0062121752== Content-Type: multipart/alternative; boundary="15348409131.6Ebc.15406" Content-Transfer-Encoding: 7bit --15348409131.6Ebc.15406 Date: Tue, 21 Aug 2018 08:41:53 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #54 from dwagner --- (In reply to Andrey Grodzovsky from comment #53) > Created attachment 141198 [details] [review] > add_debug_info2.patch >=20 > Try this patch instead, i might be missing some prints in the first one. Can try that this evening. 
> In the last log you attached I haven't seen any UMR dumps or GPU fault > prints in dmesg. THe GPU fault has to be in the log to compare the faulty > address against the debug prints in the patch. In above attached file "xz-compressed output of gpu_debug3.sh" there is umr output at the time of the crash (238 seconds after the reboot): ---------------------------------------------- ... mpv/vo-897 [005] .... 235.191542: dma_fence_wait_start: driver=3Ddrm_sched timeline=3Dgfx context=3D162 seqno=3D87 mpv/vo-897 [005] d... 235.191548: dma_fence_enable_signal: driver=3Ddrm_sched timeline=3Dgfx context=3D162 seqno=3D87 kworker/0:2-92 [000] .... 238.275988: dma_fence_signaled: driver=3Damdgpu timeline=3Dsdma1 context=3D11 seqno=3D210 kworker/0:2-92 [000] .... 238.276004: dma_fence_signaled: driver=3Damdgpu timeline=3Dsdma1 context=3D11 seqno=3D211 [ 238.180634] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeou= t, signaled seq=3D32624, emitted seq=3D32626 [ 238.180641] amdgpu 0000:0a:00.0: GPU reset begin! [ 238.180641] amdgpu 0000:0a:00.0: GPU reset begin! crash detected! executing umr -O halt_waves -wa No active waves! executing umr -O verbose -R gfx[.] 
polaris11.gfx.rptr =3D=3D 1792 polaris11.gfx.wptr =3D=3D 1792 polaris11.gfx.drv_wptr =3D=3D 1792 polaris11.gfx.ring[1761] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[1762] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[1763] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[1764] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[1765] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[1766] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[1767] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[1768] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[1769] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[1770] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[1771] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[1772] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[1773] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[1774] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[1775] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[1776] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[1777] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[1778] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[1779] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[1780] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[1781] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[1782] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[1783] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[1784] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[1785] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[1786] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[1787] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[1788] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[1789] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[1790] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[1791] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[1792] =3D=3D 0xc0032200 rwD=20 trying to get ADR from dmesg output for 'umr -O verbose -vm ...' trying to get VMID from dmesg output for 'umr -O verbose -vm ...' done after crash, flashing NUMLOCK LED. amdgpu_cs:0-799 [001] .... 
286.852838: amdgpu_bo_list_set: list=3D0000000099c16b5c, bo=3D000000001771c26f, bo_size=3D131072 amdgpu_cs:0-799 [001] .... 286.852846: amdgpu_bo_list_set: list=3D0000000099c16b5c, bo=3D0000000046bfd439, bo_size=3D131072 ... ---------------------------------------------- But sure, there were no "VM_CONTEXT1_PROTECTION_FAULT_ADDR" error messages = this time. Sometimes such are emitted, sometimes not. --=20 You are receiving this mail because: You are the assignee for the bug.= --15348409131.6Ebc.15406 Date: Tue, 21 Aug 2018 08:41:53 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment #54 on bug 102322 from dwagner
(In reply to Andrey Grodzovsky from comment #53)
> Created attachment 141198
> add_debug_info2.patch
>
> Try this patch instead; I might be missing some prints in the first one.

Can try that this evening.

> In the last log you attached I haven't seen any UMR dumps or GPU fault
> prints in dmesg. The GPU fault has to be in the log to compare the faulty
> address against the debug prints in the patch.

In the above attached file "xz-compressed output of gpu_debug3.sh" there is umr
output at the time of the crash (238 seconds after the reboot):

----------------------------------------------
...
          mpv/vo-897   [005] ....   235.191542: dma_fence_wait_start: driver=drm_sched timeline=gfx context=162 seqno=87
          mpv/vo-897   [005] d...   235.191548: dma_fence_enable_signal: driver=drm_sched timeline=gfx context=162 seqno=87
     kworker/0:2-92    [000] ....   238.275988: dma_fence_signaled: driver=amdgpu timeline=sdma1 context=11 seqno=210
     kworker/0:2-92    [000] ....   238.276004: dma_fence_signaled: driver=amdgpu timeline=sdma1 context=11 seqno=211
[  238.180634] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=32624, emitted seq=32626
[  238.180641] amdgpu 0000:0a:00.0: GPU reset begin!
[  238.180641] amdgpu 0000:0a:00.0: GPU reset begin!

crash detected!

executing umr -O halt_waves -wa
No active waves!


executing umr -O verbose -R gfx[.]

polaris11.gfx.rptr == 1792
polaris11.gfx.wptr == 1792
polaris11.gfx.drv_wptr == 1792
polaris11.gfx.ring[1761] == 0xffff1000    ...
polaris11.gfx.ring[1762] == 0xffff1000    ...
polaris11.gfx.ring[1763] == 0xffff1000    ...
polaris11.gfx.ring[1764] == 0xffff1000    ...
polaris11.gfx.ring[1765] == 0xffff1000    ...
polaris11.gfx.ring[1766] == 0xffff1000    ...
polaris11.gfx.ring[1767] == 0xffff1000    ...
polaris11.gfx.ring[1768] == 0xffff1000    ...
polaris11.gfx.ring[1769] == 0xffff1000    ...
polaris11.gfx.ring[1770] == 0xffff1000    ...
polaris11.gfx.ring[1771] == 0xffff1000    ...
polaris11.gfx.ring[1772] == 0xffff1000    ...
polaris11.gfx.ring[1773] == 0xffff1000    ...
polaris11.gfx.ring[1774] == 0xffff1000    ...
polaris11.gfx.ring[1775] == 0xffff1000    ...
polaris11.gfx.ring[1776] == 0xffff1000    ...
polaris11.gfx.ring[1777] == 0xffff1000    ...
polaris11.gfx.ring[1778] == 0xffff1000    ...
polaris11.gfx.ring[1779] == 0xffff1000    ...
polaris11.gfx.ring[1780] == 0xffff1000    ...
polaris11.gfx.ring[1781] == 0xffff1000    ...
polaris11.gfx.ring[1782] == 0xffff1000    ...
polaris11.gfx.ring[1783] == 0xffff1000    ...
polaris11.gfx.ring[1784] == 0xffff1000    ...
polaris11.gfx.ring[1785] == 0xffff1000    ...
polaris11.gfx.ring[1786] == 0xffff1000    ...
polaris11.gfx.ring[1787] == 0xffff1000    ...
polaris11.gfx.ring[1788] == 0xffff1000    ...
polaris11.gfx.ring[1789] == 0xffff1000    ...
polaris11.gfx.ring[1790] == 0xffff1000    ...
polaris11.gfx.ring[1791] == 0xffff1000    ...
polaris11.gfx.ring[1792] == 0xc0032200    rwD

trying to get ADR from dmesg output for 'umr -O verbose -vm ...'
trying to get VMID from dmesg output for 'umr -O verbose -vm ...'

done after crash, flashing NUMLOCK LED.
     amdgpu_cs:0-799   [001] ....   286.852838: amdgpu_bo_list_set: list=0000000099c16b5c, bo=000000001771c26f, bo_size=131072
     amdgpu_cs:0-799   [001] ....   286.852846: amdgpu_bo_list_set: list=0000000099c16b5c, bo=0000000046bfd439, bo_size=131072
...
----------------------------------------------
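
As a reading aid for the ring dump above, here is a sketch that splits a 32-bit ring word into the header fields used by type-3 command packets: packet type in bits 31:30, count in bits 29:16 and opcode in bits 15:8. The function name is invented, and the field layout is stated here as an assumption based on the PACKET3 macro in the amdgpu driver headers rather than on anything in this thread.

```shell
# Sketch: decode a ring dword as a PM4-style packet header.
# Assumed layout (type-3 packets): [31:30] type, [29:16] count, [15:8] opcode.
decode_pm4_header() {
    hdr=$(( $1 ))
    pkt_type=$(( (hdr >> 30) & 0x3 ))
    count=$(( (hdr >> 16) & 0x3fff ))
    opcode=$(( (hdr >> 8) & 0xff ))
    printf 'type=%d count=%d opcode=0x%02x\n' "$pkt_type" "$count" "$opcode"
}

# Usage with the two distinct words from the dump:
#   decode_pm4_header 0xffff1000
#   decode_pm4_header 0xc0032200
```

Under that layout 0xffff1000 decodes to opcode 0x10 with the maximum count, which to my understanding is the NOP padding word, and 0xc0032200 decodes to opcode 0x22 with count 3. With rptr, wptr and drv_wptr all at 1792, the gfx ring appears to have nothing pending, which would fit the timeout being reported on sdma0 rather than gfx.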

But sure, there were no "VM_CONTEXT1_PROTECTION_FAULT_ADDR" error messages this
time. Sometimes they are emitted, sometimes not.


= --15348409131.6Ebc.15406-- --===============0062121752== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============0062121752==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Tue, 21 Aug 2018 14:43:24 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============1360968901==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [131.252.210.165]) by gabe.freedesktop.org (Postfix) with ESMTP id A58776E1E5 for ; Tue, 21 Aug 2018 14:43:24 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============1360968901== Content-Type: multipart/alternative; boundary="15348626044.DA8A.2275" Content-Transfer-Encoding: 7bit --15348626044.DA8A.2275 Date: Tue, 21 Aug 2018 14:43:24 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #55 from Andrey Grodzovsky --- (In reply to dwagner from comment #54) > (In reply to Andrey Grodzovsky from comment #53) > > Created attachment 141198 [details] [review] [review] > > add_debug_info2.patch > >=20 > > Try this patch instead, i might be missing some prints in the first one. >=20 > Can try that this evening. 
>=20 > > In the last log you attached I haven't seen any UMR dumps or GPU fault > > prints in dmesg. THe GPU fault has to be in the log to compare the faul= ty > > address against the debug prints in the patch. >=20 > In above attached file "xz-compressed output of gpu_debug3.sh" there is u= mr > output at the time of the crash (238 seconds after the reboot): >=20 > ---------------------------------------------- > ... > mpv/vo-897 [005] .... 235.191542: dma_fence_wait_start: > driver=3Ddrm_sched timeline=3Dgfx context=3D162 seqno=3D87 > mpv/vo-897 [005] d... 235.191548: dma_fence_enable_signal: > driver=3Ddrm_sched timeline=3Dgfx context=3D162 seqno=3D87 > kworker/0:2-92 [000] .... 238.275988: dma_fence_signaled: > driver=3Damdgpu timeline=3Dsdma1 context=3D11 seqno=3D210 > kworker/0:2-92 [000] .... 238.276004: dma_fence_signaled: > driver=3Damdgpu timeline=3Dsdma1 context=3D11 seqno=3D211 > [ 238.180634] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 > timeout, signaled seq=3D32624, emitted seq=3D32626 > [ 238.180641] amdgpu 0000:0a:00.0: GPU reset begin! > [ 238.180641] amdgpu 0000:0a:00.0: GPU reset begin! >=20 > crash detected! >=20 > executing umr -O halt_waves -wa > No active waves! Did you use amdgpu.vm_fault_stop=3D2 parameter ? In case a fault happened t= hat should have froze GPUs compute units and hence the above command would prod= uce a lot of wave info. >=20 >=20 > executing umr -O verbose -R gfx[.] 
>=20 > polaris11.gfx.rptr =3D=3D 1792 > polaris11.gfx.wptr =3D=3D 1792 > polaris11.gfx.drv_wptr =3D=3D 1792 > polaris11.gfx.ring[1761] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[1762] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[1763] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[1764] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[1765] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[1766] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[1767] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[1768] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[1769] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[1770] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[1771] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[1772] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[1773] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[1774] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[1775] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[1776] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[1777] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[1778] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[1779] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[1780] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[1781] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[1782] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[1783] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[1784] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[1785] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[1786] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[1787] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[1788] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[1789] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[1790] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[1791] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[1792] =3D=3D 0xc0032200 rwD=20 >=20 > trying to get ADR from dmesg output for 'umr -O verbose -vm ...' > trying to get VMID from dmesg output for 'umr -O verbose -vm ...' >=20 > done after crash, flashing NUMLOCK LED. > amdgpu_cs:0-799 [001] .... 
286.852838: amdgpu_bo_list_set: > list=3D0000000099c16b5c, bo=3D000000001771c26f, bo_size=3D131072 > amdgpu_cs:0-799 [001] .... 286.852846: amdgpu_bo_list_set: > list=3D0000000099c16b5c, bo=3D0000000046bfd439, bo_size=3D131072 > ... > ---------------------------------------------- >=20 > But sure, there were no "VM_CONTEXT1_PROTECTION_FAULT_ADDR" error messages > this time. Sometimes such are emitted, sometimes not. --=20 You are receiving this mail because: You are the assignee for the bug.= --15348626044.DA8A.2275 Date: Tue, 21 Aug 2018 14:43:24 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 55 on bug 102322 from Andrey Grodzovsky
(In reply to dwagner from comment #54)
> (In reply to Andrey Grodzovsky from comment #53)
> > Created attachment 141198 [details] [review]
> > add_debug_info2.patch
> >
> > Try this patch instead, I might be missing some prints in the first one.
>
> Can try that this evening.
>
> > In the last log you attached I haven't seen any UMR dumps or GPU fault
> > prints in dmesg. The GPU fault has to be in the log to compare the faulty
> > address against the debug prints in the patch.
>
> In above attached file "xz-compressed output of gpu_debug3.sh" there is umr
> output at the time of the crash (238 seconds after the reboot):
>
> ----------------------------------------------
> ...
>           mpv/vo-897   [005] ....   235.191542: dma_fence_wait_start:
> driver=drm_sched timeline=gfx context=162 seqno=87
>           mpv/vo-897   [005] d...   235.191548: dma_fence_enable_signal:
> driver=drm_sched timeline=gfx context=162 seqno=87
>      kworker/0:2-92    [000] ....   238.275988: dma_fence_signaled:
> driver=amdgpu timeline=sdma1 context=11 seqno=210
>      kworker/0:2-92    [000] ....   238.276004: dma_fence_signaled:
> driver=amdgpu timeline=sdma1 context=11 seqno=211
> [  238.180634] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0
> timeout, signaled seq=32624, emitted seq=32626
> [  238.180641] amdgpu 0000:0a:00.0: GPU reset begin!
> [  238.180641] amdgpu 0000:0a:00.0: GPU reset begin!
>
> crash detected!
>
> executing umr -O halt_waves -wa
> No active waves!

Did you use the amdgpu.vm_fault_stop=2 parameter? In case a fault happened,
that should have frozen the GPU's compute units, and hence the above command
would have produced a lot of wave info.

>
>
> executing umr -O verbose -R gfx[.]
>
> polaris11.gfx.rptr == 1792
> polaris11.gfx.wptr == 1792
> polaris11.gfx.drv_wptr == 1792
> polaris11.gfx.ring[1761] == 0xffff1000    ...
> polaris11.gfx.ring[1762] == 0xffff1000    ...
> polaris11.gfx.ring[1763] == 0xffff1000    ...
> polaris11.gfx.ring[1764] == 0xffff1000    ...
> polaris11.gfx.ring[1765] == 0xffff1000    ...
> polaris11.gfx.ring[1766] == 0xffff1000    ...
> polaris11.gfx.ring[1767] == 0xffff1000    ...
> polaris11.gfx.ring[1768] == 0xffff1000    ...
> polaris11.gfx.ring[1769] == 0xffff1000    ...
> polaris11.gfx.ring[1770] == 0xffff1000    ...
> polaris11.gfx.ring[1771] == 0xffff1000    ...
> polaris11.gfx.ring[1772] == 0xffff1000    ...
> polaris11.gfx.ring[1773] == 0xffff1000    ...
> polaris11.gfx.ring[1774] == 0xffff1000    ...
> polaris11.gfx.ring[1775] == 0xffff1000    ...
> polaris11.gfx.ring[1776] == 0xffff1000    ...
> polaris11.gfx.ring[1777] == 0xffff1000    ...
> polaris11.gfx.ring[1778] == 0xffff1000    ...
> polaris11.gfx.ring[1779] == 0xffff1000    ...
> polaris11.gfx.ring[1780] == 0xffff1000    ...
> polaris11.gfx.ring[1781] == 0xffff1000    ...
> polaris11.gfx.ring[1782] == 0xffff1000    ...
> polaris11.gfx.ring[1783] == 0xffff1000    ...
> polaris11.gfx.ring[1784] == 0xffff1000    ...
> polaris11.gfx.ring[1785] == 0xffff1000    ...
> polaris11.gfx.ring[1786] == 0xffff1000    ...
> polaris11.gfx.ring[1787] == 0xffff1000    ...
> polaris11.gfx.ring[1788] == 0xffff1000    ...
> polaris11.gfx.ring[1789] == 0xffff1000    ...
> polaris11.gfx.ring[1790] == 0xffff1000    ...
> polaris11.gfx.ring[1791] == 0xffff1000    ...
> polaris11.gfx.ring[1792] == 0xc0032200    rwD
>
> trying to get ADR from dmesg output for 'umr -O verbose -vm ...'
> trying to get VMID from dmesg output for 'umr -O verbose -vm ...'
>
> done after crash, flashing NUMLOCK LED.
>      amdgpu_cs:0-799   [001] ....   286.852838: amdgpu_bo_list_set:
> list=0000000099c16b5c, bo=000000001771c26f, bo_size=131072
>      amdgpu_cs:0-799   [001] ....   286.852846: amdgpu_bo_list_set:
> list=0000000099c16b5c, bo=0000000046bfd439, bo_size=131072
> ...
> ----------------------------------------------
>
> But sure, there were no "VM_CONTEXT1_PROTECTION_FAULT_ADDR" error messages
> this time. Sometimes such are emitted, sometimes not.


= --15348626044.DA8A.2275-- --===============1360968901== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============1360968901==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Tue, 21 Aug 2018 21:16:52 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============1052334486==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [IPv6:2610:10:20:722:a800:ff:fe98:4b55]) by gabe.freedesktop.org (Postfix) with ESMTP id 7469789FE6 for ; Tue, 21 Aug 2018 21:16:52 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============1052334486== Content-Type: multipart/alternative; boundary="15348862121.DdADd.15550" Content-Transfer-Encoding: 7bit --15348862121.DdADd.15550 Date: Tue, 21 Aug 2018 21:16:52 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #56 from dwagner --- (In reply to Andrey Grodzovsky from comment #55) > > In above attached file "xz-compressed output of gpu_debug3.sh" there is= umr > > output at the time of the crash (238 seconds after the reboot): > >=20 > > ---------------------------------------------- > > ... > > mpv/vo-897 [005] .... 
235.191542: dma_fence_wait_start: > > driver=3Ddrm_sched timeline=3Dgfx context=3D162 seqno=3D87 > > mpv/vo-897 [005] d... 235.191548: dma_fence_enable_signal: > > driver=3Ddrm_sched timeline=3Dgfx context=3D162 seqno=3D87 > > kworker/0:2-92 [000] .... 238.275988: dma_fence_signaled: > > driver=3Damdgpu timeline=3Dsdma1 context=3D11 seqno=3D210 > > kworker/0:2-92 [000] .... 238.276004: dma_fence_signaled: > > driver=3Damdgpu timeline=3Dsdma1 context=3D11 seqno=3D211 > > [ 238.180634] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 > > timeout, signaled seq=3D32624, emitted seq=3D32626 > > [ 238.180641] amdgpu 0000:0a:00.0: GPU reset begin! > > [ 238.180641] amdgpu 0000:0a:00.0: GPU reset begin! > >=20 > > crash detected! > >=20 > > executing umr -O halt_waves -wa > > No active waves! >=20 > Did you use amdgpu.vm_fault_stop=3D2 parameter ? In case a fault happened= that > should have froze GPUs compute units and hence the above command would > produce a lot of wave info. Yes I did, as can be seen from the kernel command line at the very beginnin= g of the file I attached: [ 0.000000] Command line: BOOT_IMAGE=3D/vmlinuz-linux_amd root=3DUUID=3Db5d56e15-18f3-4783-af84-bbff3bbff3ef rw cryptdevice=3D/dev/nvme0n1p2:root:allow-discards libata.force=3D1.5 video= =3DDP-1:d video=3DDVI-D-1:d video=3DHDMI-A-1:1024x768 amdgpu.dc=3D1 amdgpu.vm_update_= mode=3D0 amdgpu.dpm=3D-1 amdgpu.ppfeaturemask=3D0xffffffff amdgpu.vm_fault_stop=3D2 amdgpu.vm_debug=3D1 Could the "amdgpu 0000:0a:00.0: GPU reset begin!" message indicate a proced= ure that discards whatever has been in thoses "waves" before? If yes, could amdgpu.gpu_recovery=3D0 prevent that from happening? --=20 You are receiving this mail because: You are the assignee for the bug.= --15348862121.DdADd.15550 Date: Tue, 21 Aug 2018 21:16:52 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 56 on bug 102322 from dwagner
(In reply to Andrey Grodzovsky from comment #55)
> > In above attached file "xz-compressed output of gpu_debug3.sh" there is umr
> > output at the time of the crash (238 seconds after the reboot):
> >
> > ----------------------------------------------
> > ...
> >           mpv/vo-897   [005] ....   235.191542: dma_fence_wait_start:
> > driver=drm_sched timeline=gfx context=162 seqno=87
> >           mpv/vo-897   [005] d...   235.191548: dma_fence_enable_signal:
> > driver=drm_sched timeline=gfx context=162 seqno=87
> >      kworker/0:2-92    [000] ....   238.275988: dma_fence_signaled:
> > driver=amdgpu timeline=sdma1 context=11 seqno=210
> >      kworker/0:2-92    [000] ....   238.276004: dma_fence_signaled:
> > driver=amdgpu timeline=sdma1 context=11 seqno=211
> > [  238.180634] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0
> > timeout, signaled seq=32624, emitted seq=32626
> > [  238.180641] amdgpu 0000:0a:00.0: GPU reset begin!
> > [  238.180641] amdgpu 0000:0a:00.0: GPU reset begin!
> >
> > crash detected!
> >
> > executing umr -O halt_waves -wa
> > No active waves!
>
> Did you use the amdgpu.vm_fault_stop=2 parameter? In case a fault happened,
> that should have frozen the GPU's compute units, and hence the above command
> would have produced a lot of wave info.

Yes, I did, as can be seen from the kernel command line at the very beginning
of the file I attached:
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-linux_amd
root=UUID=b5d56e15-18f3-4783-af84-bbff3bbff3ef rw
cryptdevice=/dev/nvme0n1p2:root:allow-discards libata.force=1.5 video=DP-1:d
video=DVI-D-1:d video=HDMI-A-1:1024x768 amdgpu.dc=1 amdgpu.vm_update_mode=0
amdgpu.dpm=-1 amdgpu.ppfeaturemask=0xffffffff amdgpu.vm_fault_stop=2
amdgpu.vm_debug=1

Could the "amdgpu 0000:0a:00.0: GPU reset begin!" message indicate a procedure
that discards whatever has been in those "waves" before? If yes, could
amdgpu.gpu_recovery=0 prevent that from happening?
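
Checking which amdgpu options were actually active on a given boot is easy to get wrong by eye, so a small script can help. The following is only a minimal sketch: the helper name `amdgpu_params` is hypothetical, and the command line is copied from the log above with the non-amdgpu options shortened.

```python
def amdgpu_params(cmdline):
    """Collect amdgpu.<name>=<value> options from a kernel command line string."""
    params = {}
    for token in cmdline.split():
        if token.startswith("amdgpu.") and "=" in token:
            key, value = token[len("amdgpu."):].split("=", 1)
            params[key] = value
    return params


# Command line taken from the quoted log; unrelated options are shortened.
cmdline = (
    "BOOT_IMAGE=/vmlinuz-linux_amd rw amdgpu.dc=1 amdgpu.vm_update_mode=0 "
    "amdgpu.dpm=-1 amdgpu.ppfeaturemask=0xffffffff amdgpu.vm_fault_stop=2 "
    "amdgpu.vm_debug=1"
)
params = amdgpu_params(cmdline)
print(params.get("vm_fault_stop"))  # the option asked about in comment #55
print(params.get("gpu_recovery"))   # None here: not set on this boot
```

On a running Linux system the same helper could be fed the contents of /proc/cmdline instead of a pasted string.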


= --15348862121.DdADd.15550-- --===============1052334486== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============1052334486==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Tue, 21 Aug 2018 21:29:48 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============0161517789==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [131.252.210.165]) by gabe.freedesktop.org (Postfix) with ESMTP id A2EDD6E16F for ; Tue, 21 Aug 2018 21:29:48 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============0161517789== Content-Type: multipart/alternative; boundary="15348869884.4CBC79.16641" Content-Transfer-Encoding: 7bit --15348869884.4CBC79.16641 Date: Tue, 21 Aug 2018 21:29:48 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #57 from Andrey Grodzovsky --- (In reply to dwagner from comment #56) > (In reply to Andrey Grodzovsky from comment #55) > > > In above attached file "xz-compressed output of gpu_debug3.sh" there = is umr > > > output at the time of the crash (238 seconds after the reboot): > > >=20 > > > ---------------------------------------------- > > > 
... > > > mpv/vo-897 [005] .... 235.191542: dma_fence_wait_start: > > > driver=3Ddrm_sched timeline=3Dgfx context=3D162 seqno=3D87 > > > mpv/vo-897 [005] d... 235.191548: dma_fence_enable_sign= al: > > > driver=3Ddrm_sched timeline=3Dgfx context=3D162 seqno=3D87 > > > kworker/0:2-92 [000] .... 238.275988: dma_fence_signaled: > > > driver=3Damdgpu timeline=3Dsdma1 context=3D11 seqno=3D210 > > > kworker/0:2-92 [000] .... 238.276004: dma_fence_signaled: > > > driver=3Damdgpu timeline=3Dsdma1 context=3D11 seqno=3D211 > > > [ 238.180634] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 > > > timeout, signaled seq=3D32624, emitted seq=3D32626 > > > [ 238.180641] amdgpu 0000:0a:00.0: GPU reset begin! > > > [ 238.180641] amdgpu 0000:0a:00.0: GPU reset begin! > > >=20 > > > crash detected! > > >=20 > > > executing umr -O halt_waves -wa > > > No active waves! > >=20 > > Did you use amdgpu.vm_fault_stop=3D2 parameter ? In case a fault happen= ed that > > should have froze GPUs compute units and hence the above command would > > produce a lot of wave info. >=20 > Yes I did, as can be seen from the kernel command line at the very beginn= ing > of the file I attached: > [ 0.000000] Command line: BOOT_IMAGE=3D/vmlinuz-linux_amd > root=3DUUID=3Db5d56e15-18f3-4783-af84-bbff3bbff3ef rw > cryptdevice=3D/dev/nvme0n1p2:root:allow-discards libata.force=3D1.5 video= =3DDP-1:d > video=3DDVI-D-1:d video=3DHDMI-A-1:1024x768 amdgpu.dc=3D1 amdgpu.vm_updat= e_mode=3D0 > amdgpu.dpm=3D-1 amdgpu.ppfeaturemask=3D0xffffffff amdgpu.vm_fault_stop=3D2 > amdgpu.vm_debug=3D1 >=20 > Could the "amdgpu 0000:0a:00.0: GPU reset begin!" message indicate a > procedure that discards whatever has been in thoses "waves" before? If ye= s, > could amdgpu.gpu_recovery=3D0 prevent that from happening? Yes, missed that one. No resets. 
--=20 You are receiving this mail because: You are the assignee for the bug.= --15348869884.4CBC79.16641 Date: Tue, 21 Aug 2018 21:29:48 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 57 on bug 102322 from Andrey Grodzovsky
(In reply to dwagner from comment #56)
> (In reply to Andrey Grodzovsky from comment #55)
> > > In above attached file "xz-compressed output of gpu_debug3.sh" there is umr
> > > output at the time of the crash (238 seconds after the reboot):
> > >
> > > ----------------------------------------------
> > > ...
> > >           mpv/vo-897   [005] ....   235.191542: dma_fence_wait_start:
> > > driver=drm_sched timeline=gfx context=162 seqno=87
> > >           mpv/vo-897   [005] d...   235.191548: dma_fence_enable_signal:
> > > driver=drm_sched timeline=gfx context=162 seqno=87
> > >      kworker/0:2-92    [000] ....   238.275988: dma_fence_signaled:
> > > driver=amdgpu timeline=sdma1 context=11 seqno=210
> > >      kworker/0:2-92    [000] ....   238.276004: dma_fence_signaled:
> > > driver=amdgpu timeline=sdma1 context=11 seqno=211
> > > [  238.180634] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0
> > > timeout, signaled seq=32624, emitted seq=32626
> > > [  238.180641] amdgpu 0000:0a:00.0: GPU reset begin!
> > > [  238.180641] amdgpu 0000:0a:00.0: GPU reset begin!
> > >
> > > crash detected!
> > >
> > > executing umr -O halt_waves -wa
> > > No active waves!
> >
> > Did you use the amdgpu.vm_fault_stop=2 parameter? In case a fault happened,
> > that should have frozen the GPU's compute units, and hence the above command
> > would have produced a lot of wave info.
>
> Yes, I did, as can be seen from the kernel command line at the very beginning
> of the file I attached:
> [    0.000000] Command line: BOOT_IMAGE=/vmlinuz-linux_amd
> root=UUID=b5d56e15-18f3-4783-af84-bbff3bbff3ef rw
> cryptdevice=/dev/nvme0n1p2:root:allow-discards libata.force=1.5 video=DP-1:d
> video=DVI-D-1:d video=HDMI-A-1:1024x768 amdgpu.dc=1 amdgpu.vm_update_mode=0
> amdgpu.dpm=-1 amdgpu.ppfeaturemask=0xffffffff amdgpu.vm_fault_stop=2
> amdgpu.vm_debug=1
>
> Could the "amdgpu 0000:0a:00.0: GPU reset begin!" message indicate a
> procedure that discards whatever has been in those "waves" before? If yes,
> could amdgpu.gpu_recovery=0 prevent that from happening?

Yes, I missed that one. Please test with no resets (amdgpu.gpu_recovery=0).


= --15348869884.4CBC79.16641-- --===============0161517789== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============0161517789==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Wed, 22 Aug 2018 00:24:35 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============2123262021==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [131.252.210.165]) by gabe.freedesktop.org (Postfix) with ESMTP id 71C8C89B55 for ; Wed, 22 Aug 2018 00:24:35 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============2123262021== Content-Type: multipart/alternative; boundary="15348974754.2Bb10afD.2211" Content-Transfer-Encoding: 7bit --15348974754.2Bb10afD.2211 Date: Wed, 22 Aug 2018 00:24:35 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #58 from dwagner --- Here comes another trace log, with your info2.patch applied. Something must have changed since the last test, as it took pretty long this time to reproduce the crash. 
Could that have been caused by https://cgit.freedesktop.org/~agd5f/linux/commit/drivers/gpu/drm/amd/amdgpu= /nbio_v7_4.c?h=3Damd-staging-drm-next&id=3Db385925f3922faca7435e50e31380bb2= 602fd6b8 now being part of the kernel? However, the latest trace you find attached below is not much different to = the last one, xzcat /tmp/gpu_debug5.txt.xz | grep '^\[' will tell you: [ 1510.023112] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeou= t, signaled seq=3D475104, emitted seq=3D475106 [ 1510.023117] [drm] GPU recovery disabled. amdgpu_cs:0-806 [012] .... 1787.493126: amdgpu_vm_bo_cs: soffs=3D00001001a0, eoffs=3D00001001b9, flags=3D70 amdgpu_cs:0-806 [012] .... 1787.493127: amdgpu_vm_bo_cs: soffs=3D0000100200, eoffs=3D00001021e0, flags=3D70 amdgpu_cs:0-806 [012] .... 1787.493127: amdgpu_vm_bo_cs: soffs=3D0000102200, eoffs=3D00001041e0, flags=3D70 amdgpu_cs:0-806 [012] .... 1787.493129: amdgpu_vm_bo_cs: soffs=3D000010c1e0, eoffs=3D000010c2e1, flags=3D70 amdgpu_cs:0-806 [012] .... 1787.493131: drm_sched_job: entity=3D00000000406345a7, id=3D10239, fence=3D000000007a120377, ring=3Dgfx= , job count:8, hw job count:0 And later in the file you can find: ------------------------------------------------------ crash detected! executing umr -O halt_waves -wa No active waves! executing umr -O verbose -R gfx[.] 
polaris11.gfx.rptr =3D=3D 512 polaris11.gfx.wptr =3D=3D 512 polaris11.gfx.drv_wptr =3D=3D 512 polaris11.gfx.ring[ 481] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[ 482] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[ 483] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[ 484] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[ 485] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[ 486] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[ 487] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[ 488] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[ 489] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[ 490] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[ 491] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[ 492] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[ 493] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[ 494] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[ 495] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[ 496] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[ 497] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[ 498] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[ 499] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[ 500] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[ 501] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[ 502] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[ 503] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[ 504] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[ 505] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[ 506] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[ 507] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[ 508] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[ 509] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[ 510] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[ 511] =3D=3D 0xffff1000 ...=20 polaris11.gfx.ring[ 512] =3D=3D 0xc0032200 rwD=20 trying to get ADR from dmesg output for 'umr -O verbose -vm ...' trying to get VMID from dmesg output for 'umr -O verbose -vm ...' done after crash. ------------------------------------------- So even without GPU reset, still no "waves". And the error message also does not state any VM fault address. 
--=20 You are receiving this mail because: You are the assignee for the bug.= --15348974754.2Bb10afD.2211 Date: Wed, 22 Aug 2018 00:24:35 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 58 on bug 102322 from dwagner
Here comes another trace log, with your add_debug_info2.patch applied.

Something must have changed since the last test, as it took quite a long time
to reproduce the crash this time. Could that have been caused by
https://cgit.freedesktop.org/~agd5f/linux/commit/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c?h=amd-staging-drm-next&id=b385925f3922faca7435e50e31380bb2602fd6b8
now being part of the kernel?

However, the latest trace, attached below, is not much different from the
last one; xzcat /tmp/gpu_debug5.txt.xz  | grep '^\[' will tell you:

[ 1510.023112] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout,
signaled seq=475104, emitted seq=475106
[ 1510.023117] [drm] GPU recovery disabled.

     amdgpu_cs:0-806   [012] ....  1787.493126: amdgpu_vm_bo_cs:
soffs=00001001a0, eoffs=00001001b9, flags=70
     amdgpu_cs:0-806   [012] ....  1787.493127: amdgpu_vm_bo_cs:
soffs=0000100200, eoffs=00001021e0, flags=70
     amdgpu_cs:0-806   [012] ....  1787.493127: amdgpu_vm_bo_cs:
soffs=0000102200, eoffs=00001041e0, flags=70
     amdgpu_cs:0-806   [012] ....  1787.493129: amdgpu_vm_bo_cs:
soffs=000010c1e0, eoffs=000010c2e1, flags=70
     amdgpu_cs:0-806   [012] ....  1787.493131: drm_sched_job:
entity=00000000406345a7, id=10239, fence=000000007a120377, ring=gfx, job
count:8, hw job count:0
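
The timeout message in the excerpt above carries two sequence numbers, and their difference shows how many fences had been emitted on the ring without being signaled. As a rough sketch (the helper name `pending_fences` is hypothetical, and the pattern is written only against the message wording seen in this bug report, so other kernel versions may need a different one), this could be pulled out of a log as follows:

```python
import re

# Matches the wording seen in this report, including the older
# "last signaled seq=..." form; whitespace may include a wrapped newline.
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout,\s+(?:last )?signaled seq=(?P<signaled>\d+),"
    r"\s+(?:last )?emitted seq=(?P<emitted>\d+)"
)


def pending_fences(log_text):
    """Return (ring, signaled, emitted, emitted - signaled) per timeout message."""
    results = []
    for m in TIMEOUT_RE.finditer(log_text):
        signaled, emitted = int(m["signaled"]), int(m["emitted"])
        results.append((m["ring"], signaled, emitted, emitted - signaled))
    return results


log = (
    "[ 1510.023112] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout,\n"
    "signaled seq=475104, emitted seq=475106\n"
)
for ring, signaled, emitted, pending in pending_fences(log):
    print(f"{ring}: {pending} fence(s) emitted but not signaled")
```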

And later in the file you can find:
------------------------------------------------------
crash detected!

executing umr -O halt_waves -wa
No active waves!

executing umr -O verbose -R gfx[.]

polaris11.gfx.rptr == 512
polaris11.gfx.wptr == 512
polaris11.gfx.drv_wptr == 512
polaris11.gfx.ring[ 481] == 0xffff1000    ...
polaris11.gfx.ring[ 482] == 0xffff1000    ...
polaris11.gfx.ring[ 483] == 0xffff1000    ...
polaris11.gfx.ring[ 484] == 0xffff1000    ...
polaris11.gfx.ring[ 485] == 0xffff1000    ...
polaris11.gfx.ring[ 486] == 0xffff1000    ...
polaris11.gfx.ring[ 487] == 0xffff1000    ...
polaris11.gfx.ring[ 488] == 0xffff1000    ...
polaris11.gfx.ring[ 489] == 0xffff1000    ...
polaris11.gfx.ring[ 490] == 0xffff1000    ...
polaris11.gfx.ring[ 491] == 0xffff1000    ...
polaris11.gfx.ring[ 492] == 0xffff1000    ...
polaris11.gfx.ring[ 493] == 0xffff1000    ...
polaris11.gfx.ring[ 494] == 0xffff1000    ...
polaris11.gfx.ring[ 495] == 0xffff1000    ...
polaris11.gfx.ring[ 496] == 0xffff1000    ...
polaris11.gfx.ring[ 497] == 0xffff1000    ...
polaris11.gfx.ring[ 498] == 0xffff1000    ...
polaris11.gfx.ring[ 499] == 0xffff1000    ...
polaris11.gfx.ring[ 500] == 0xffff1000    ...
polaris11.gfx.ring[ 501] == 0xffff1000    ...
polaris11.gfx.ring[ 502] == 0xffff1000    ...
polaris11.gfx.ring[ 503] == 0xffff1000    ...
polaris11.gfx.ring[ 504] == 0xffff1000    ...
polaris11.gfx.ring[ 505] == 0xffff1000    ...
polaris11.gfx.ring[ 506] == 0xffff1000    ...
polaris11.gfx.ring[ 507] == 0xffff1000    ...
polaris11.gfx.ring[ 508] == 0xffff1000    ...
polaris11.gfx.ring[ 509] == 0xffff1000    ...
polaris11.gfx.ring[ 510] == 0xffff1000    ...
polaris11.gfx.ring[ 511] == 0xffff1000    ...
polaris11.gfx.ring[ 512] == 0xc0032200    rwD


trying to get ADR from dmesg output for 'umr -O verbose -vm ...'
trying to get VMID from dmesg output for 'umr -O verbose -vm ...'

done after crash.
-------------------------------------------
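
In the dump above the gfx ring's rptr, wptr and drv_wptr are all equal. Under the usual ring-buffer convention (an assumption on my side, not something stated in this thread) that means the reader has consumed everything that was written, which may be worth keeping in mind because the timeout was reported on sdma0 while the dump covers gfx. A minimal sketch for checking this in a saved umr dump, with hypothetical helper names:

```python
import re

# Matches lines such as "polaris11.gfx.rptr == 512"; ring[...] entries are skipped.
POINTER_RE = re.compile(r"^\S+\.(rptr|wptr|drv_wptr) == (\d+)$", re.MULTILINE)


def ring_pointers(umr_text):
    """Extract the rptr / wptr / drv_wptr values from umr ring dump text."""
    return {name: int(value) for name, value in POINTER_RE.findall(umr_text)}


def ring_is_drained(pointers):
    """True when the read pointer has caught up with both write pointers."""
    return pointers["rptr"] == pointers["wptr"] == pointers["drv_wptr"]


dump = """\
polaris11.gfx.rptr == 512
polaris11.gfx.wptr == 512
polaris11.gfx.drv_wptr == 512
polaris11.gfx.ring[ 512] == 0xc0032200    rwD
"""
pointers = ring_pointers(dump)
print(pointers, ring_is_drained(pointers))
```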

So even without a GPU reset, there are still no "waves". And the error message
also does not state any VM fault address.


= --15348974754.2Bb10afD.2211-- --===============2123262021== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============2123262021==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Wed, 22 Aug 2018 00:26:06 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============0971939922==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [IPv6:2610:10:20:722:a800:ff:fe98:4b55]) by gabe.freedesktop.org (Postfix) with ESMTP id E09946E058 for ; Wed, 22 Aug 2018 00:26:06 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============0971939922== Content-Type: multipart/alternative; boundary="15348975665.775B.2470" Content-Transfer-Encoding: 7bit --15348975665.775B.2470 Date: Wed, 22 Aug 2018 00:26:06 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #59 from dwagner --- Created attachment 141228 --> https://bugs.freedesktop.org/attachment.cgi?id=3D141228&action=3Dedit latest crash trace output, without gpu_reset --=20 You are receiving this mail because: You are the assignee for the bug.= --15348975665.775B.2470 Date: Wed, 22 Aug 2018 00:26:06 +0000 MIME-Version: 1.0 
Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated


= --15348975665.775B.2470-- --===============0971939922== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============0971939922==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Wed, 22 Aug 2018 14:33:03 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============1933027203==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [IPv6:2610:10:20:722:a800:ff:fe98:4b55]) by gabe.freedesktop.org (Postfix) with ESMTP id 144706E1F7 for ; Wed, 22 Aug 2018 14:33:03 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============1933027203== Content-Type: multipart/alternative; boundary="15349483820.99a31a.9759" Content-Transfer-Encoding: 7bit --15349483820.99a31a.9759 Date: Wed, 22 Aug 2018 14:33:02 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #60 from Andrey Grodzovsky --- (In reply to dwagner from comment #58) > Here comes another trace log, with your info2.patch applied. >=20 > Something must have changed since the last test, as it took pretty long t= his > time to reproduce the crash. 
Could that have been caused by > https://cgit.freedesktop.org/~agd5f/linux/commit/drivers/gpu/drm/amd/amdg= pu/ > nbio_v7_4.c?h=3Damd-staging-drm- > next&id=3Db385925f3922faca7435e50e31380bb2602fd6b8 now being part of the > kernel? Don't think it's related. This code is more related to virtualization. >=20 > However, the latest trace you find attached below is not much different to > the last one, xzcat /tmp/gpu_debug5.txt.xz | grep '^\[' will tell you: >=20 > [ 1510.023112] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 > timeout, signaled seq=3D475104, emitted seq=3D475106 > [ 1510.023117] [drm] GPU recovery disabled. That just means you are again running with GPU VM update mode set to use SD= MA. Which is seen in you dmesg (amdgpu.vm_update_mode=3D0) , so are again experiencing the original issue of SDMA hang. Please use amdgpu.vm_update_mode=3D3 to get back to VM_FAULTs issue. >=20 > amdgpu_cs:0-806 [012] .... 1787.493126: amdgpu_vm_bo_cs: > soffs=3D00001001a0, eoffs=3D00001001b9, flags=3D70 > amdgpu_cs:0-806 [012] .... 1787.493127: amdgpu_vm_bo_cs: > soffs=3D0000100200, eoffs=3D00001021e0, flags=3D70 > amdgpu_cs:0-806 [012] .... 1787.493127: amdgpu_vm_bo_cs: > soffs=3D0000102200, eoffs=3D00001041e0, flags=3D70 > amdgpu_cs:0-806 [012] .... 1787.493129: amdgpu_vm_bo_cs: > soffs=3D000010c1e0, eoffs=3D000010c2e1, flags=3D70 > amdgpu_cs:0-806 [012] .... 1787.493131: drm_sched_job: > entity=3D00000000406345a7, id=3D10239, fence=3D000000007a120377, ring=3Dg= fx, job > count:8, hw job count:0 >=20 > And later in the file you can find: > ------------------------------------------------------ > crash detected! >=20 > executing umr -O halt_waves -wa > No active waves! >=20 > executing umr -O verbose -R gfx[.] 
>=20 > polaris11.gfx.rptr =3D=3D 512 > polaris11.gfx.wptr =3D=3D 512 > polaris11.gfx.drv_wptr =3D=3D 512 > polaris11.gfx.ring[ 481] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[ 482] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[ 483] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[ 484] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[ 485] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[ 486] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[ 487] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[ 488] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[ 489] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[ 490] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[ 491] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[ 492] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[ 493] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[ 494] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[ 495] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[ 496] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[ 497] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[ 498] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[ 499] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[ 500] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[ 501] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[ 502] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[ 503] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[ 504] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[ 505] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[ 506] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[ 507] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[ 508] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[ 509] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[ 510] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[ 511] =3D=3D 0xffff1000 ...=20 > polaris11.gfx.ring[ 512] =3D=3D 0xc0032200 rwD=20 >=20 >=20 > trying to get ADR from dmesg output for 'umr -O verbose -vm ...' > trying to get VMID from dmesg output for 'umr -O verbose -vm ...' >=20 > done after crash. 
> ------------------------------------------- >=20 > So even without GPU reset, still no "waves". And the error message also d= oes > not state any VM fault address. --=20 You are receiving this mail because: You are the assignee for the bug.= --15349483820.99a31a.9759 Date: Wed, 22 Aug 2018 14:33:02 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment #60 on bug 102322 from Andrey Grodzovsky
(In reply to dwagner from comment #58)
> Here comes another trace log, with your info2.patch applied.
>
> Something must have changed since the last test, as it took pretty long this
> time to reproduce the crash. Could that have been caused by
> https://cgit.freedesktop.org/~agd5f/linux/commit/drivers/gpu/drm/amd/amdgpu/
> nbio_v7_4.c?h=amd-staging-drm-
> next&id=b385925f3922faca7435e50e31380bb2602fd6b8 now being part of the
> kernel?

I don't think it's related. That code has more to do with virtualization.

>
> However, the latest trace you find attached below is not much different to
> the last one, xzcat /tmp/gpu_debug5.txt.xz  | grep '^\[' will tell you:
>
> [ 1510.023112] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0
> timeout, signaled seq=475104, emitted seq=475106
> [ 1510.023117] [drm] GPU recovery disabled.

That just means you are again running with the GPU VM update mode set to use SDMA,
which can be seen in your dmesg (amdgpu.vm_update_mode=0), so you are again
experiencing the original SDMA hang issue. Please use
amdgpu.vm_update_mode=3 to get back to the VM_FAULT issue.
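
As a rough aid for checking which mode a test boot actually used, the sketch below pulls amdgpu.vm_update_mode out of a kernel command line string. The function name is made up for illustration, and it only covers the case where the option is passed on the kernel command line (not through a modprobe options file). If the option appears more than once, this sketch simply reports the last occurrence.

```shell
#!/bin/sh
# Hypothetical helper: print the amdgpu.vm_update_mode value found in a
# kernel command line string, or "unset" if the option is not present.
vm_update_mode_from_cmdline() {
    mode=unset
    # Intentionally unquoted: split the command line into its arguments.
    for arg in $1; do
        case $arg in
            amdgpu.vm_update_mode=*) mode=${arg#amdgpu.vm_update_mode=} ;;
        esac
    done
    printf '%s\n' "$mode"
}

# On the running test system this would be called as:
#   vm_update_mode_from_cmdline "$(cat /proc/cmdline)"
```

Per the reply above, a result of 0 corresponds to the SDMA-based update path, while 3 is the setting requested for reproducing the VM fault.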

>
>      amdgpu_cs:0-806   [012] ....  1787.493126: amdgpu_vm_bo_cs:
> soffs=00001001a0, eoffs=00001001b9, flags=70
>      amdgpu_cs:0-806   [012] ....  1787.493127: amdgpu_vm_bo_cs:
> soffs=0000100200, eoffs=00001021e0, flags=70
>      amdgpu_cs:0-806   [012] ....  1787.493127: amdgpu_vm_bo_cs:
> soffs=0000102200, eoffs=00001041e0, flags=70
>      amdgpu_cs:0-806   [012] ....  1787.493129: amdgpu_vm_bo_cs:
> soffs=000010c1e0, eoffs=000010c2e1, flags=70
>      amdgpu_cs:0-806   [012] ....  1787.493131: drm_sched_job:
> entity=00000000406345a7, id=10239, fence=000000007a120377, ring=gfx, job
> count:8, hw job count:0
>
> And later in the file you can find:
> ------------------------------------------------------
> crash detected!
>
> executing umr -O halt_waves -wa
> No active waves!
>
> executing umr -O verbose -R gfx[.]
>
> polaris11.gfx.rptr == 512
> polaris11.gfx.wptr == 512
> polaris11.gfx.drv_wptr == 512
> polaris11.gfx.ring[ 481] == 0xffff1000    ...
> polaris11.gfx.ring[ 482] == 0xffff1000    ...
> polaris11.gfx.ring[ 483] == 0xffff1000    ...
> polaris11.gfx.ring[ 484] == 0xffff1000    ...
> polaris11.gfx.ring[ 485] == 0xffff1000    ...
> polaris11.gfx.ring[ 486] == 0xffff1000    ...
> polaris11.gfx.ring[ 487] == 0xffff1000    ...
> polaris11.gfx.ring[ 488] == 0xffff1000    ...
> polaris11.gfx.ring[ 489] == 0xffff1000    ...
> polaris11.gfx.ring[ 490] == 0xffff1000    ...
> polaris11.gfx.ring[ 491] == 0xffff1000    ...
> polaris11.gfx.ring[ 492] == 0xffff1000    ...
> polaris11.gfx.ring[ 493] == 0xffff1000    ...
> polaris11.gfx.ring[ 494] == 0xffff1000    ...
> polaris11.gfx.ring[ 495] == 0xffff1000    ...
> polaris11.gfx.ring[ 496] == 0xffff1000    ...
> polaris11.gfx.ring[ 497] == 0xffff1000    ...
> polaris11.gfx.ring[ 498] == 0xffff1000    ...
> polaris11.gfx.ring[ 499] == 0xffff1000    ...
> polaris11.gfx.ring[ 500] == 0xffff1000    ...
> polaris11.gfx.ring[ 501] == 0xffff1000    ...
> polaris11.gfx.ring[ 502] == 0xffff1000    ...
> polaris11.gfx.ring[ 503] == 0xffff1000    ...
> polaris11.gfx.ring[ 504] == 0xffff1000    ...
> polaris11.gfx.ring[ 505] == 0xffff1000    ...
> polaris11.gfx.ring[ 506] == 0xffff1000    ...
> polaris11.gfx.ring[ 507] == 0xffff1000    ...
> polaris11.gfx.ring[ 508] == 0xffff1000    ...
> polaris11.gfx.ring[ 509] == 0xffff1000    ...
> polaris11.gfx.ring[ 510] == 0xffff1000    ...
> polaris11.gfx.ring[ 511] == 0xffff1000    ...
> polaris11.gfx.ring[ 512] == 0xc0032200    rwD
>
>
> trying to get ADR from dmesg output for 'umr -O verbose -vm ...'
> trying to get VMID from dmesg output for 'umr -O verbose -vm ...'
>
> done after crash.
> -------------------------------------------
>
> So even without GPU reset, still no "waves". And the error message also does
> not state any VM fault address.


= --15349483820.99a31a.9759-- --===============1933027203== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============1933027203==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Wed, 22 Aug 2018 22:18:11 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============0621500812==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [131.252.210.165]) by gabe.freedesktop.org (Postfix) with ESMTP id 0D6DF6E45A for ; Wed, 22 Aug 2018 22:18:11 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============0621500812== Content-Type: multipart/alternative; boundary="15349762910.e4fA.2619" Content-Transfer-Encoding: 7bit --15349762910.e4fA.2619 Date: Wed, 22 Aug 2018 22:18:10 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #61 from dwagner --- > Please use amdgpu.vm_update_mode=3D3 to get back to VM_FAULTs issue. The "good" news is that reproduction of the crashes with 3-fps-video-replay= is very quick when using amdgpu.vm_update_mode=3D3. But the bad news is that I have not been able to get useful error output wh= en using vm_update_mode=3D3. 
At first I tried with also amdgpu.vm_debug=3D1, and with that in 10 crashes= not a single error output line was emitted to either the ssh channel or the system journal. I then tried with amdgpu.vm_debug=3D0, and while a few error lines output b= ecome logged, then, not quite anything useful - see also in attached example: [ 912.447139] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=3D12818, emitted seq=3D12819 [ 912.447145] [drm] GPU recovery disabled. These are the only lines indicating the error, not even the echo "crash detected!" after the "dmesg -w | tee /dev/tty | grep -m 1 -e "amdgpu.*GPU" -e "amdgpu.*ERROR" gets emitted, much less the theoretically following umr commands. What could I do to not let the kernel die so quickly when using amdgpu.vm_update_mode=3D3? --=20 You are receiving this mail because: You are the assignee for the bug.= --15349762910.e4fA.2619 Date: Wed, 22 Aug 2018 22:18:11 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment #61 on bug 102322 from dwagner
> Please use amdgpu.vm_update_mode=3 to get back to VM_FAULTs issue.

The "good" news is that reproducing the crashes with the 3-fps-video-replay test is
very quick when using amdgpu.vm_update_mode=3.

But the bad news is that I have not been able to get useful error output when
using vm_update_mode=3.

At first I also tried amdgpu.vm_debug=1, and with that, in 10 crashes not a
single line of error output was emitted to either the ssh channel or the system
journal.

I then tried amdgpu.vm_debug=0, and while a few error lines do get
logged then, they contain nothing really useful; see also the attached example:

[  912.447139] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
signaled seq=12818, emitted seq=12819
[  912.447145] [drm] GPU recovery disabled.

These are the only lines indicating the error. Not even the
 echo "crash detected!"
after the
 dmesg -w | tee /dev/tty | grep -m 1 -e "amdgpu.*GPU" -e "amdgpu.*ERROR"
gets emitted, much less the umr commands that would follow it.

What could I do to keep the kernel from dying so quickly when using
amdgpu.vm_update_mode=3?
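
For readers who want to reproduce the setup, the crash watcher referred to above can be sketched roughly as follows. This is a reconstruction from the commands quoted in this thread, not the actual script: the function name is hypothetical, and the real script may order or wrap the steps differently.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log lines from stdin and print the first
# one that matches either of the two patterns quoted in this thread, then
# stop reading (grep -m 1 exits after the first match).
first_amdgpu_error() {
    grep -m 1 -e 'amdgpu.*GPU' -e 'amdgpu.*ERROR'
}

# Intended wiring on the test machine (needs root and an installed umr):
#   dmesg -w | tee /dev/tty | first_amdgpu_error
#   echo "crash detected!"
#   umr -O halt_waves -wa
#   umr -O verbose -R 'gfx[.]'
```

Note that the "GPU recovery disabled." line on its own does not match either pattern, since it does not contain "amdgpu"; the match comes from the amdgpu_job_timedout ERROR line.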


= --15349762910.e4fA.2619-- --===============0621500812== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============0621500812==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Wed, 22 Aug 2018 22:18:49 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============0980563234==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [IPv6:2610:10:20:722:a800:ff:fe98:4b55]) by gabe.freedesktop.org (Postfix) with ESMTP id 4FF5C8985A for ; Wed, 22 Aug 2018 22:18:49 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============0980563234== Content-Type: multipart/alternative; boundary="15349763292.0AEAaf.2619" Content-Transfer-Encoding: 7bit --15349763292.0AEAaf.2619 Date: Wed, 22 Aug 2018 22:18:49 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #62 from dwagner --- Created attachment 141243 --> https://bugs.freedesktop.org/attachment.cgi?id=3D141243&action=3Dedit crash trace with amdgpu.vm_update_mode=3D3 --=20 You are receiving this mail because: You are the assignee for the bug.= --15349763292.0AEAaf.2619 Date: Wed, 22 Aug 2018 22:18:49 +0000 MIME-Version: 1.0 
Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment #62 on bug 102322 from dwagner
Created attachment 141243

crash trace with amdgpu.vm_update_mode=3


= --15349763292.0AEAaf.2619-- --===============0980563234== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============0980563234==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Wed, 19 Sep 2018 23:35:10 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============2123058253==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [131.252.210.165]) by gabe.freedesktop.org (Postfix) with ESMTP id 459D86E527 for ; Wed, 19 Sep 2018 23:35:12 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============2123058253== Content-Type: multipart/alternative; boundary="15374001121.4efE74.23799" Content-Transfer-Encoding: 7bit --15374001121.4efE74.23799 Date: Wed, 19 Sep 2018 23:35:12 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #63 from Anthony Ruhier --- FYI, I also had this bug under linux 4.17 and 4.18, but it seems to have be= en fixed in 4.19-rc3. The suspend/hibernate issue has also been fixed. 
--=20 You are receiving this mail because: You are the assignee for the bug.= --15374001121.4efE74.23799 Date: Wed, 19 Sep 2018 23:35:12 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment #63 on bug 102322 from Anthony Ruhier
FYI, I also had this bug under linux 4.17 and 4.18, but it seems to have been
fixed in 4.19-rc3. The suspend/hibernate issue has also been fixed.


= --15374001121.4efE74.23799-- --===============2123058253== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============2123058253==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Wed, 19 Sep 2018 23:35:42 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============1988630973==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [131.252.210.165]) by gabe.freedesktop.org (Postfix) with ESMTP id E5B316E53E for ; Wed, 19 Sep 2018 23:35:42 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============1988630973== Content-Type: multipart/alternative; boundary="15374001423.9b69f.22934" Content-Transfer-Encoding: 7bit --15374001423.9b69f.22934 Date: Wed, 19 Sep 2018 23:35:42 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #64 from Anthony Ruhier --- (In reply to Anthony Ruhier from comment #63) > FYI, I also had this bug under linux 4.17 and 4.18, but it seems to have > been fixed in 4.19-rc3. The suspend/hibernate issue has also been fixed. Forgot to say that I have a vega 64. 
--=20 You are receiving this mail because: You are the assignee for the bug.= --15374001423.9b69f.22934 Date: Wed, 19 Sep 2018 23:35:42 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment #64 on bug 102322 from Anthony Ruhier
(In reply to Anthony Ruhier from comment #63)
> FYI, I also had this bug under linux 4.17 and 4.18, but it seems to have
> been fixed in 4.19-rc3. The suspend/hibernate issue has also been fixed.

I forgot to mention that I have a Vega 64.


= --15374001423.9b69f.22934-- --===============1988630973== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============1988630973==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Sun, 23 Sep 2018 22:04:23 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============1223850988==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [131.252.210.165]) by gabe.freedesktop.org (Postfix) with ESMTP id 8F3BE6E174 for ; Sun, 23 Sep 2018 22:04:23 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============1223850988== Content-Type: multipart/alternative; boundary="15377402634.E0d9AA5Ab.25335" Content-Transfer-Encoding: 7bit --15377402634.E0d9AA5Ab.25335 Date: Sun, 23 Sep 2018 22:04:23 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #65 from dwagner --- (In reply to Anthony Ruhier from comment #63) > FYI, I also had this bug under linux 4.17 and 4.18, but it seems to have > been fixed in 4.19-rc3. The suspend/hibernate issue has also been fixed. 
Unluckily, I cannot confirm either observation: The current amd-staging-drm-next git head still crashes on me quickly, still well reproduceable with the 3-fps-video-replay test. And going into S3 suspend does not work for me with the current amd-staging-drm-next either. --=20 You are receiving this mail because: You are the assignee for the bug.= --15377402634.E0d9AA5Ab.25335 Date: Sun, 23 Sep 2018 22:04:23 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment #65 on bug 102322 from dwagner
(In reply to Anthony Ruhier from comment #63)
> FYI, I also had this bug under linux 4.17 and 4.18, but it seems to have
> been fixed in 4.19-rc3. The suspend/hibernate issue has also been fixed.

Unluckily, I cannot confirm either observation: The current
amd-staging-drm-next git head still crashes on me quickly, and this is still
easily reproducible with the 3-fps-video-replay test.

And going into S3 suspend does not work for me with the current
amd-staging-drm-next either.


= --15377402634.E0d9AA5Ab.25335-- --===============1223850988== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============1223850988==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Sun, 23 Sep 2018 23:42:52 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============0910041709==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [IPv6:2610:10:20:722:a800:ff:fe98:4b55]) by gabe.freedesktop.org (Postfix) with ESMTP id 5AA236E1C9 for ; Sun, 23 Sep 2018 23:42:52 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============0910041709== Content-Type: multipart/alternative; boundary="15377461725.e6c25331.11978" Content-Transfer-Encoding: 7bit --15377461725.e6c25331.11978 Date: Sun, 23 Sep 2018 23:42:52 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #66 from Anthony Ruhier --- (In reply to dwagner from comment #65) > (In reply to Anthony Ruhier from comment #63) > > FYI, I also had this bug under linux 4.17 and 4.18, but it seems to have > > been fixed in 4.19-rc3. The suspend/hibernate issue has also been fixed. 
>=20 > Unluckily, I cannot confirm either observation: The current > amd-staging-drm-next git head still crashes on me quickly, still well > reproduceable with the 3-fps-video-replay test. >=20 > And going into S3 suspend does not work for me with the current > amd-staging-drm-next either. Last time I tested, amd-staging-drm-next seemed to be based on 4.19-rc1, on which I had the issue too. I switched to vanilla 4.19-rc4 (now -rc5) and it= was fixed. --=20 You are receiving this mail because: You are the assignee for the bug.= --15377461725.e6c25331.11978 Date: Sun, 23 Sep 2018 23:42:52 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment #66 on bug 102322 from Anthony Ruhier
(In reply to dwagner from comment #65)
> (In reply to Anthony Ruhier from comment #63)
> > FYI, I also had this bug under linux 4.17 and 4.18, but it seems to have
> > been fixed in 4.19-rc3. The suspend/hibernate issue has also been fixed.
>
> Unluckily, I cannot confirm either observation: The current
> amd-staging-drm-next git head still crashes on me quickly, still well
> reproduceable with the 3-fps-video-replay test.
>
> And going into S3 suspend does not work for me with the current
> amd-staging-drm-next either.

Last time I tested, amd-staging-drm-next seemed to be based on 4.19-rc1, on
which I had the issue too. I switched to vanilla 4.19-rc4 (now -rc5) and it was
fixed.


= --15377461725.e6c25331.11978-- --===============0910041709== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============0910041709==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Tue, 25 Sep 2018 12:11:29 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============0442366989==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [131.252.210.165]) by gabe.freedesktop.org (Postfix) with ESMTP id 172236E3A1 for ; Tue, 25 Sep 2018 12:11:30 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============0442366989== Content-Type: multipart/alternative; boundary="15378774890.be3a274.8631" Content-Transfer-Encoding: 7bit --15378774890.be3a274.8631 Date: Tue, 25 Sep 2018 12:11:29 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #67 from Roshless --- Tried on 4.19-rc5, still crashes for me after about 2-3 days (of 6-12h use) --=20 You are receiving this mail because: You are the assignee for the bug.= --15378774890.be3a274.8631 Date: Tue, 25 Sep 2018 12:11:29 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: 
quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment #67 on bug 102322 from Roshless
Tried on 4.19-rc5, still crashes for me after about 2-3 days (of 6-12h use)


= --15378774890.be3a274.8631-- --===============0442366989== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============0442366989==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Wed, 14 Nov 2018 00:23:15 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============0095406626==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [131.252.210.165]) by gabe.freedesktop.org (Postfix) with ESMTP id 4A80F6E417 for ; Wed, 14 Nov 2018 00:23:16 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============0095406626== Content-Type: multipart/alternative; boundary="15421549963.481599B.6448" Content-Transfer-Encoding: 7bit --15421549963.481599B.6448 Date: Wed, 14 Nov 2018 00:23:16 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #68 from dwagner --- Tested today's current amd-staging-drm-next git head, to see if there has b= een any improvement over the last two months. The bad news: The 3-fps-video-replay test still crashes the driver reproduc= ably after few minutes, as long as the default automatic power management is act= ive. 
The mediocre news: At least it looks as if the linux kernel now survives the driver crash to some extent, I found messages in the journal like this: Nov 14 00:59:36 ryzen kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ri= ng sdma0 timeout, signaled seq=3D22008, emitted seq=3D22010 Nov 14 00:59:36 ryzen kernel: [drm] GPU recovery disabled. Nov 14 00:59:37 ryzen kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ri= ng sdma1 timeout, signaled seq=3D107, emitted seq=3D109 Nov 14 00:59:37 ryzen kernel: [drm] GPU recovery disabled. Nov 14 00:59:40 ryzen kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ri= ng sdma0 timeout, signaled seq=3D22008, emitted seq=3D22010 Nov 14 00:59:40 ryzen kernel: [drm] GPU recovery disabled. Nov 14 00:59:41 ryzen kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ri= ng sdma1 timeout, signaled seq=3D107, emitted seq=3D109 ... and so on repeating for several minutes after the screen went blank. Will test tomorrow if this means I can now collect the diagnostics outputs = that were asked for earlier. Some good news: S3 suspends/resumes are working fine right now. There are s= ome scary messages emitted upon resume, but they do not seem to have bad consequences: [ 281.465654] [drm:emulated_link_detect [amdgpu]] *ERROR* Failed to read E= DID [ 281.490719] [drm:emulated_link_detect [amdgpu]] *ERROR* Failed to read E= DID [ 282.006225] [drm] Fence fallback timer expired on ring sdma0 [ 282.512879] [drm] Fence fallback timer expired on ring sdma0 [ 282.556651] [drm] UVD and UVD ENC initialized successfully. [ 282.657771] [drm] VCE initialized successfully. --=20 You are receiving this mail because: You are the assignee for the bug.= --15421549963.481599B.6448 Date: Wed, 14 Nov 2018 00:23:16 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 68 on bug 102322 from dwagner
Tested today's current amd-staging-drm-next git head, to see if there has been
any improvement over the last two months.

The bad news: The 3-fps-video-replay test still crashes the driver reproducibly
after a few minutes, as long as the default automatic power management is active.

The mediocre news: At least it looks as if the linux kernel now survives the
driver crash to some extent, I found messages in the journal like this:

Nov 14 00:59:36 ryzen kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=22008, emitted seq=22010
Nov 14 00:59:36 ryzen kernel: [drm] GPU recovery disabled.
Nov 14 00:59:37 ryzen kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, signaled seq=107, emitted seq=109
Nov 14 00:59:37 ryzen kernel: [drm] GPU recovery disabled.
Nov 14 00:59:40 ryzen kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=22008, emitted seq=22010
Nov 14 00:59:40 ryzen kernel: [drm] GPU recovery disabled.
Nov 14 00:59:41 ryzen kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, signaled seq=107, emitted seq=109

... and so on repeating for several minutes after the screen went blank.
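
A minimal sketch, assuming the journal lines have been saved as decoded plain text, of how the repeating timeout reports quoted above could be summarised per ring. The function names are illustrative and not part of any amdgpu tooling; the optional "last " prefix only accommodates the older message wording quoted in the original report of this bug.

```python
import re

# Matches the "ring <name> timeout" messages quoted above. The optional
# "last " prefix covers the older wording from the original bug report.
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"(?:last )?signaled seq=(?P<signaled>\d+), "
    r"(?:last )?emitted seq=(?P<emitted>\d+)"
)


def parse_ring_timeouts(lines):
    """Return a (ring, signaled, emitted) tuple for every timeout line."""
    timeouts = []
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            timeouts.append(
                (
                    match.group("ring"),
                    int(match.group("signaled")),
                    int(match.group("emitted")),
                )
            )
    return timeouts


def pending_fences(timeouts):
    """Map each ring to emitted - signaled, using its last report."""
    return {ring: emitted - signaled for ring, signaled, emitted in timeouts}
```

Applied to the excerpt above, this would show that both sdma0 and sdma1 keep reporting the same sequence numbers, i.e. neither ring makes progress between reports.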

Will test tomorrow if this means I can now collect the diagnostics outputs that
were asked for earlier.

Some good news: S3 suspends/resumes are working fine right now. There are some
scary messages emitted upon resume, but they do not seem to have bad
consequences:

[  281.465654] [drm:emulated_link_detect [amdgpu]] *ERROR* Failed to read EDID
[  281.490719] [drm:emulated_link_detect [amdgpu]] *ERROR* Failed to read EDID
[  282.006225] [drm] Fence fallback timer expired on ring sdma0
[  282.512879] [drm] Fence fallback timer expired on ring sdma0
[  282.556651] [drm] UVD and UVD ENC initialized successfully.
[  282.657771] [drm] VCE initialized successfully.


= --15421549963.481599B.6448-- --===============0095406626== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============0095406626==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Thu, 15 Nov 2018 23:37:57 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============0654535591==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [131.252.210.165]) by gabe.freedesktop.org (Postfix) with ESMTP id 006676E532 for ; Thu, 15 Nov 2018 23:37:57 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============0654535591== Content-Type: multipart/alternative; boundary="15423250761.3B08.19827" Content-Transfer-Encoding: 7bit --15423250761.3B08.19827 Date: Thu, 15 Nov 2018 23:37:56 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #69 from dwagner --- As promised in above comment, today I ran my debug script "gpu_debug4.sh" to obtain the diagnostic output after the crash as requested above. This output is in attached "gpu_debug4_output.txt". Since the trace output, the "dmesg -w" output and stdout are written to the same file, they are roughly chronologic. 
If you want to look only at the dmesg-output, use > grep '^\[' gpu_debug4_output.txt (gpu_debug4.sh is a slight variation of earlier gpu_debug3.sh, just writing= to a local log file.) BTW: I ran the script multiple times, crashes occurred after 5 to 300 secon= ds, the diagnostic output always looked like in attached gpu_debug4_output.txt. --=20 You are receiving this mail because: You are the assignee for the bug.= --15423250761.3B08.19827 Date: Thu, 15 Nov 2018 23:37:56 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 69 on bug 102322 from dwagner
As promised in the above comment, today I ran my debug script "gpu_debug4.sh" to
obtain the diagnostic output after the crash as requested above.
This output is in the attached "gpu_debug4_output.txt".
Since the trace output, the "dmesg -w" output and stdout are written to the
same file, they are roughly chronological.

If you want to look only at the dmesg-output, use
> grep '^\[' gpu_debug4_output.txt

(gpu_debug4.sh is a slight variation of earlier gpu_debug3.sh, just writing to
a local log file.)
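
For anyone who prefers to post-process the combined log in a script, here is a rough Python equivalent of the grep filter above. It is only a sketch: it assumes, like the grep pattern does, that dmesg lines are exactly those starting with "[", and the function name is made up for illustration.

```python
def split_debug_log(lines):
    """Separate kernel-log lines (starting with '[') from all other lines.

    Mirrors grep '^\\[' on the combined log: the first list holds the
    dmesg lines, the second everything else (trace output, stdout).
    """
    dmesg_lines, other_lines = [], []
    for line in lines:
        if line.startswith("["):
            dmesg_lines.append(line)
        else:
            other_lines.append(line)
    return dmesg_lines, other_lines
```

Both lists keep the original order, so the roughly chronological ordering of the file is preserved within each group.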

BTW: I ran the script multiple times, crashes occurred after 5 to 300 seconds,
the diagnostic output always looked like that in the attached gpu_debug4_output.txt.


= --15423250761.3B08.19827-- --===============0654535591== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============0654535591==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Thu, 15 Nov 2018 23:38:29 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============0235321041==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [131.252.210.165]) by gabe.freedesktop.org (Postfix) with ESMTP id F2BC36E532 for ; Thu, 15 Nov 2018 23:38:28 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============0235321041== Content-Type: multipart/alternative; boundary="15423251080.d33D979.19355" Content-Transfer-Encoding: 7bit --15423251080.d33D979.19355 Date: Thu, 15 Nov 2018 23:38:28 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #70 from dwagner --- Created attachment 142483 --> https://bugs.freedesktop.org/attachment.cgi?id=3D142483&action=3Dedit test script --=20 You are receiving this mail because: You are the assignee for the bug.= --15423251080.d33D979.19355 Date: Thu, 15 Nov 2018 23:38:28 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" 
Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated


= --15423251080.d33D979.19355-- --===============0235321041== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============0235321041==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Thu, 15 Nov 2018 23:39:44 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============0519291687==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [IPv6:2610:10:20:722:a800:ff:fe98:4b55]) by gabe.freedesktop.org (Postfix) with ESMTP id 734AC6E6A8 for ; Thu, 15 Nov 2018 23:39:44 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============0519291687== Content-Type: multipart/alternative; boundary="15423251846.eb6281c.19978" Content-Transfer-Encoding: 7bit --15423251846.eb6281c.19978 Date: Thu, 15 Nov 2018 23:39:44 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #71 from dwagner --- Created attachment 142484 --> https://bugs.freedesktop.org/attachment.cgi?id=3D142484&action=3Dedit gpu_debug4_output.txt.gz --=20 You are receiving this mail because: You are the assignee for the bug.= --15423251846.eb6281c.19978 Date: Thu, 15 Nov 2018 23:39:44 +0000 MIME-Version: 1.0 
Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 71 on bug 102322 from dwagner
Created attachment 142484 [details]
gpu_debug4_output.txt.gz


= --15423251846.eb6281c.19978-- --===============0519291687== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============0519291687==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Mon, 17 Dec 2018 22:56:07 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============1803497117==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [IPv6:2610:10:20:722:a800:ff:fe98:4b55]) by gabe.freedesktop.org (Postfix) with ESMTP id F28486E732 for ; Mon, 17 Dec 2018 22:56:06 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============1803497117== Content-Type: multipart/alternative; boundary="15450873660.66ae8E5.18022" Content-Transfer-Encoding: 7bit --15450873660.66ae8E5.18022 Date: Mon, 17 Dec 2018 22:56:06 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #72 from dwagner --- Just for the record, since another month has passed: I can still reproduce = the crash with today's git head of amd-staging-drm-next within minutes. 
(Also u= sing the very latest firmware files from https://people.freedesktop.org/~agd5f/radeon_ucode/ ) --=20 You are receiving this mail because: You are the assignee for the bug.= --15450873660.66ae8E5.18022 Date: Mon, 17 Dec 2018 22:56:06 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 72 on bug 102322 from dwagner
Just for the record, since another month has passed: I can still reproduce the
crash with today's git head of amd-staging-drm-next within minutes. (Also using
the very latest firmware files from
https://people.freedesktop.org/~agd5f/radeon_ucode/ )


= --15450873660.66ae8E5.18022-- --===============1803497117== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============1803497117==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Sat, 22 Dec 2018 20:41:14 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============0987487681==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [IPv6:2610:10:20:722:a800:ff:fe98:4b55]) by gabe.freedesktop.org (Postfix) with ESMTP id 46A4B6E418 for ; Sat, 22 Dec 2018 20:41:15 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============0987487681== Content-Type: multipart/alternative; boundary="15455112753.fd05.29868" Content-Transfer-Encoding: 7bit --15455112753.fd05.29868 Date: Sat, 22 Dec 2018 20:41:15 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #73 from J=C4=81nis Jansons --- Someone suggested I buy Ryzen 2400G APU, but almost every time some network= lag happens while watching TV stream through Kodi and FPS of that video goes to= 0, display just freezes and you have to power cycle the computer. 
There is no space for external graphics card in my case and I don't want the increased power consumption, so at this point I'm just considering switch to Intel CPU. I have been following this case for 4 months now with hope that it would mo= ve forward a bit but it seems stuck. I can give additional dumps and test some patches if that would help but se= ems like others have given plenty of information on how to reproduce it. --=20 You are receiving this mail because: You are the assignee for the bug.= --15455112753.fd05.29868 Date: Sat, 22 Dec 2018 20:41:15 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 73 on bug 102322 from Jānis Jansons
Someone suggested I buy a Ryzen 2400G APU, but almost every time some network lag
happens while watching a TV stream through Kodi and the FPS of that video goes to 0,
the display just freezes and you have to power cycle the computer.
There is no space for an external graphics card in my case and I don't want the
increased power consumption, so at this point I'm just considering switching to
an Intel CPU.

I have been following this case for 4 months now with hope that it would move
forward a bit but it seems stuck.

I can give additional dumps and test some patches if that would help, but it seems
like others have given plenty of information on how to reproduce it.


= --15455112753.fd05.29868-- --===============0987487681== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============0987487681==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Mon, 24 Dec 2018 12:56:16 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============1330023839==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [IPv6:2610:10:20:722:a800:ff:fe98:4b55]) by gabe.freedesktop.org (Postfix) with ESMTP id DEBDA6E5D9 for ; Mon, 24 Dec 2018 12:56:16 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============1330023839== Content-Type: multipart/alternative; boundary="15456561765.BFB75aa91.16670" Content-Transfer-Encoding: 7bit --15456561765.BFB75aa91.16670 Date: Mon, 24 Dec 2018 12:56:16 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #74 from fin4478@hotmail.com --- The Firefox browser requires the pulseaudio driver. Use the Alsa audio and = the chrome/chromium browser. Disable hardware acceleration in browser settings. 
--=20 You are receiving this mail because: You are the assignee for the bug.= --15456561765.BFB75aa91.16670 Date: Mon, 24 Dec 2018 12:56:16 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 74 on bug 102322 from fin4478@hotmail.com
The Firefox browser requires the pulseaudio driver. Use the Alsa audio and the
chrome/chromium browser. Disable hardware acceleration in browser settings.


= --15456561765.BFB75aa91.16670-- --===============1330023839== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============1330023839==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Mon, 24 Dec 2018 14:49:24 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============0304399217==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [131.252.210.165]) by gabe.freedesktop.org (Postfix) with ESMTP id 4D3766E5E1 for ; Mon, 24 Dec 2018 14:49:24 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============0304399217== Content-Type: multipart/alternative; boundary="15456629644.3C7eA87a.21274" Content-Transfer-Encoding: 7bit --15456629644.3C7eA87a.21274 Date: Mon, 24 Dec 2018 14:49:24 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #75 from dwagner --- Audio is unrelated to this bug. 
In my reproduction scripts, I do not output= any audio at all.=20 The video-at-3-fps replay that I use for reproduction seems to just trigger= a certain pattern of the memory- and shader-clocks getting increased/decreased (with dynamic power management being enabled) that makes the occurrence of = this bug likely. Any other GPU-usage pattern that triggers a lot of memory/shader clock changes seems to also increase the crash likelihood - manual use of s= ome web-browser where GPU load spikes are caused a few times per second seems t= o be also a scenario where this bug is triggered now and then. --=20 You are receiving this mail because: You are the assignee for the bug.= --15456629644.3C7eA87a.21274 Date: Mon, 24 Dec 2018 14:49:24 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 75 on bug 102322 from dwagner
Audio is unrelated to this bug. In my reproduction scripts, I do not output any
audio at all.

The video-at-3-fps replay that I use for reproduction seems to just trigger a
certain pattern of the memory- and shader-clocks getting increased/decreased
(with dynamic power management being enabled) that makes the occurrence of this
bug likely. Any other GPU-usage pattern that triggers a lot of memory/shader
clock changes seems to also increase the crash likelihood - manual use of some
web-browser where GPU load spikes are caused a few times per second seems to be
also a scenario where this bug is triggered now and then.


= --15456629644.3C7eA87a.21274-- --===============0304399217== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============0304399217==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Sat, 19 Jan 2019 17:01:52 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============0946652444==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [131.252.210.165]) by gabe.freedesktop.org (Postfix) with ESMTP id CDD1B6E5C3 for ; Sat, 19 Jan 2019 17:01:52 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============0946652444== Content-Type: multipart/alternative; boundary="15479173123.a1ddf3D9.11191" Content-Transfer-Encoding: 7bit --15479173123.a1ddf3D9.11191 Date: Sat, 19 Jan 2019 17:01:52 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #76 from dwagner --- Just for the record, since another month has passed: I can still reproduce = the crash with today's git head of amd-staging-drm-next within minutes. 
As a bonus bug, with today's git head I also get unexplainable "minimal" me= mory and shader clock values - and a doubled power consumption (12W instead of 6= W) for my default 3840x2160 60Hz display mode in comparison to last month's drm-next of the day: > cd /sys/class/drm/card0/device > xrandr --output HDMI-A-0 --mode 3840x2160 --rate 30 > echo manual >power_dpm_force_performance_level > echo 0 >pp_dpm_mclk > echo 0 >pp_dpm_sclk > grep -H \\* pp_dpm_mclk pp_dpm_sclk pp_dpm_mclk:0: 300Mhz * pp_dpm_sclk:0: 214Mhz * > xrandr --output HDMI-A-0 --mode 3840x2160 --rate 50 > echo manual >power_dpm_force_performance_level > echo 0 >pp_dpm_mclk > echo 0 >pp_dpm_sclk > grep -H \\* pp_dpm_mclk pp_dpm_sclk pp_dpm_mclk:1: 1750Mhz * pp_dpm_sclk:1: 481Mhz * > xrandr --output HDMI-A-0 --mode 3840x2160 --rate 60 > echo manual >power_dpm_force_performance_level > echo 0 >pp_dpm_mclk > echo 0 >pp_dpm_sclk > grep -H \\* pp_dpm_mclk pp_dpm_sclk pp_dpm_mclk:0: 300Mhz * pp_dpm_sclk:6: 1180Mhz * But that power consumption issue is negligible in comparison to the show-stopping crashes that are the topic of this bug report. --=20 You are receiving this mail because: You are the assignee for the bug.= --15479173123.a1ddf3D9.11191 Date: Sat, 19 Jan 2019 17:01:52 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 76 on bug 102322 from dwagner
Just for the record, since another month has passed: I can still reproduce the
crash with today's git head of amd-staging-drm-next within minutes.

As a bonus bug, with today's git head I also get unexplainable "minimal" memory
and shader clock values - and a doubled power consumption (12W instead of 6W)
for my default 3840x2160 60Hz display mode in comparison to last month's
drm-next of the day:

> cd /sys/class/drm/card0/device

> xrandr --output HDMI-A-0 --mode 3840x2160 --rate 30
> echo manual >power_dpm_force_performance_level
> echo 0 >pp_dpm_mclk
> echo 0 >pp_dpm_sclk
> grep -H \\* pp_dpm_mclk pp_dpm_sclk
pp_dpm_mclk:0: 300Mhz *
pp_dpm_sclk:0: 214Mhz *

> xrandr --output HDMI-A-0 --mode 3840x2160 --rate 50
> echo manual >power_dpm_force_performance_level
> echo 0 >pp_dpm_mclk
> echo 0 >pp_dpm_sclk
> grep -H \\* pp_dpm_mclk pp_dpm_sclk
pp_dpm_mclk:1: 1750Mhz *
pp_dpm_sclk:1: 481Mhz *

> xrandr --output HDMI-A-0 --mode 3840x2160 --rate 60
> echo manual >power_dpm_force_performance_level
> echo 0 >pp_dpm_mclk
> echo 0 >pp_dpm_sclk
> grep -H \\* pp_dpm_mclk pp_dpm_sclk
pp_dpm_mclk:0: 300Mhz *
pp_dpm_sclk:6: 1180Mhz *

But that power consumption issue is negligible in comparison to the
show-stopping crashes that are the topic of this bug report.
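
To compare such readings across refresh rates without eyeballing them, the grep output shown above could be parsed as follows. This is only a sketch: it assumes the "file:level: NNNMhz *" line format exactly as quoted in this comment, and the helper names are hypothetical.

```python
import re

# One active-level line as printed by "grep -H \\* pp_dpm_mclk pp_dpm_sclk",
# e.g. "pp_dpm_sclk:6: 1180Mhz *" (format taken from the output above).
LEVEL_RE = re.compile(
    r"^(?P<file>pp_dpm_\w+):(?P<level>\d+):\s*(?P<mhz>\d+)Mhz\s*\*\s*$"
)


def active_levels(grep_output):
    """Map each pp_dpm file name to its active (level, MHz) pair."""
    levels = {}
    for line in grep_output.splitlines():
        match = LEVEL_RE.match(line.strip())
        if match:
            levels[match.group("file")] = (
                int(match.group("level")),
                int(match.group("mhz")),
            )
    return levels


def not_at_lowest_level(levels):
    """Return the files whose active level is not 0, the level requested above."""
    return sorted(name for name, (level, _mhz) in levels.items() if level != 0)
```

For the three readings above, this would flag nothing at 30 Hz, both clocks at 50 Hz, and only the shader clock at 60 Hz.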


= --15479173123.a1ddf3D9.11191-- --===============0946652444== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============0946652444==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Sat, 16 Feb 2019 15:06:38 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============1940764894==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [131.252.210.165]) by gabe.freedesktop.org (Postfix) with ESMTP id 0A12C89826 for ; Sat, 16 Feb 2019 15:06:39 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============1940764894== Content-Type: multipart/alternative; boundary="15503295986.B60347566.31591" Content-Transfer-Encoding: 7bit --15503295986.B60347566.31591 Date: Sat, 16 Feb 2019 15:06:38 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #77 from dwagner --- Since another month has passed: I can still reproduce the crash with today's git head of amd-staging-drm-next (and an up-to-date Arch Linux) within minu= tes by replaying a video at 3 fps. 
Additional new bonus bugs this time: - system consistently hangs at soft-reboots if X11 was started before - system crashes immediately upon X11 start if vm_update_mode=3D3 is used - system crashes if the HDMI-connected TV is shut off while screen blanking Again, the bonus bugs are either irrelevant in comparison to the instability this report is about or have been reported already by others. --=20 You are receiving this mail because: You are the assignee for the bug.= --15503295986.B60347566.31591 Date: Sat, 16 Feb 2019 15:06:38 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 77 on bug 102322 from dwagner
Since another month has passed: I can still reproduce the crash with today's
git head of amd-staging-drm-next (and an up-to-date Arch Linux) within minutes
by replaying a video at 3 fps.

Additional new bonus bugs this time:
- system consistently hangs at soft-reboots if X11 was started before
- system crashes immediately upon X11 start if vm_update_mode=3 is used
- system crashes if the HDMI-connected TV is shut off while screen blanking

Again, the bonus bugs are either irrelevant in comparison to the instability
this report is about or have been reported already by others.


= --15503295986.B60347566.31591-- --===============1940764894== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVs --===============1940764894==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Thu, 11 Apr 2019 06:40:13 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============1161088058==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [IPv6:2610:10:20:722:a800:ff:fe98:4b55]) by gabe.freedesktop.org (Postfix) with ESMTP id 58FCA897F6 for ; Thu, 11 Apr 2019 06:40:13 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============1161088058== Content-Type: multipart/alternative; boundary="15549648131.DbE19.26664" Content-Transfer-Encoding: 7bit --15549648131.DbE19.26664 Date: Thu, 11 Apr 2019 06:40:13 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #78 from Mauro Gaspari --- Hi, I am affected by similar issues too using AMDGPU drivers on linux, and I have opened another bug, before finding this. You can have a look at my findings and the workarounds I am applying. 
So fa= r I had good success with those, but I am interested in knowing your thoughts, recommendations, and feedback. Also if the bug I opened is a duplicate of this one, feel free to let me kn= ow and I will mark it as duplicate. https://bugs.freedesktop.org/show_bug.cgi?id=3D109955 Cheers Mauro --=20 You are receiving this mail because: You are the assignee for the bug.= --15549648131.DbE19.26664 Date: Thu, 11 Apr 2019 06:40:13 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 78 on bug 102322 from Mauro Gaspari
Hi, I am affected by similar issues too using AMDGPU drivers on linux, and I
have opened another bug, before finding this.
You can have a look at my findings and the workarounds I am applying. So far I
had good success with those, but I am interested in knowing your thoughts,
recommendations, and feedback.

Also if the bug I opened is a duplicate of this one, feel free to let me know
and I will mark it as duplicate.

https://bugs.freedesktop.org/show_bug.cgi?id=109955

Cheers
Mauro


= --15549648131.DbE19.26664-- --===============1161088058== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVs --===============1161088058==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Fri, 12 Apr 2019 22:11:37 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============1732097451==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [IPv6:2610:10:20:722:a800:ff:fe98:4b55]) by gabe.freedesktop.org (Postfix) with ESMTP id C8940899E7 for ; Fri, 12 Apr 2019 22:11:37 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============1732097451== Content-Type: multipart/alternative; boundary="155510709710.f4B560a4.21764" Content-Transfer-Encoding: 7bit --155510709710.f4B560a4.21764 Date: Fri, 12 Apr 2019 22:11:37 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #79 from Jaap Buurman --- I am also running into the same issue. I have two questions that might help tracking down why we are having issues, but not all people that are running= a Vega graphics card. 1) What is the output of the following command for you guys? 
cat /sys/class/drm/card0/device/vbios_version=20 I am running the following version: 113-D0500100-103 According to the techpowerup GPU bios database, this is a vega bios that was replaced two days (!) later by a new version. Perhaps issues were found that required another bios update? I might install Windows on a spare HDD and tr= y to flash my Vega to see if that changes anything. 2) Memory clocking is different for people running multiple monitors. Are you = guys also running multiple monitors by any chance? --=20 You are receiving this mail because: You are the assignee for the bug.= --155510709710.f4B560a4.21764 Date: Fri, 12 Apr 2019 22:11:37 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 79 on bug 102322 from Jaap Buurman
I am also running into the same issue. I have two questions that might help
tracking down why we are having issues, but not all people that are running a
Vega graphics card.

1)

What is the output of the following command for you guys?

cat /sys/class/drm/card0/device/vbios_version

I am running the following version:

113-D0500100-103

According to the techpowerup GPU bios database, this is a vega bios that was
replaced two days (!) later by a new version. Perhaps issues were found that
required another bios update? I might install Windows on a spare HDD and try to
flash my Vega to see if that changes anything.

2)

Memory clocking is different for people running multiple monitors. Are you guys
also running multiple monitors by any chance?
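
For anyone answering question 1 on a machine with more than one GPU, here is a
minimal sketch that prints the VBIOS version of every card rather than only
card0. It assumes the same sysfs layout as the command above; the function name
and the optional root argument are illustrative only.

```shell
# Print "cardN: <version>" for every DRM card that exposes vbios_version.
# The function name and the optional root argument are illustrative only.
print_vbios_versions() {
    root="${1:-/sys/class/drm}"
    for f in "$root"/card*/device/vbios_version; do
        # If nothing matches, the glob stays unexpanded; skip it.
        [ -r "$f" ] || continue
        card="${f#"$root"/}"   # strip the root prefix: card0/device/vbios_version
        card="${card%%/*}"     # keep the first path component: card0
        printf '%s: %s\n' "$card" "$(cat "$f")"
    done
}
```

Calling print_vbios_versions with no argument reads /sys/class/drm; connector
directories such as card0-HDMI-A-1 are skipped because they do not carry the
attribute.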


= --155510709710.f4B560a4.21764-- --===============1732097451== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVs --===============1732097451==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Fri, 12 Apr 2019 23:00:53 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============0915078502==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [IPv6:2610:10:20:722:a800:ff:fe98:4b55]) by gabe.freedesktop.org (Postfix) with ESMTP id CB30889A1A for ; Fri, 12 Apr 2019 23:00:53 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============0915078502== Content-Type: multipart/alternative; boundary="15551100535.2361CFbBA.8535" Content-Transfer-Encoding: 7bit --15551100535.2361CFbBA.8535 Date: Fri, 12 Apr 2019 23:00:53 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #80 from dwagner --- (In reply to Jaap Buurman from comment #79) > I am also running into the same issue. I have two questions that might he= lp > tracking down why we are having issues, but not all people that are runni= ng > a Vega graphics card. 
As you can see from my initial description, I'm running an RX460, which uses not a "Vega", but a "Polaris 11" AMD GPU. > What is the output of the following command for you guys? >=20 > cat /sys/class/drm/card0/device/vbios_version=20 "113-BAFFIN_PRO_1606" I have not heard of any update to this from the vendor - there is just some unofficial hacked version around (which I do not use) that is said to enable some switched-off CUs. > Memory clocking is different for people running multiple monitors. Are you > guys also running multiple monitors by any chance? No, I'm using just one 3840x2160 @ 60Hz HDMI display. --=20 You are receiving this mail because: You are the assignee for the bug.= --15551100535.2361CFbBA.8535 Date: Fri, 12 Apr 2019 23:00:53 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 80 on bug 102322 from dwagner
(In reply to Jaap Buurman from comment #79)
> I am also running into the same issue. I have two questions that might help
> tracking down why we are having issues, but not all people that are running
> a Vega graphics card.

As you can see from my initial description, I'm running an RX460, which uses
not a "Vega", but a "Polaris 11" AMD GPU.

> What is the output of the following command for you guys?
>
> cat /sys/class/drm/card0/device/vbios_version

"113-BAFFIN_PRO_1606"

I have not heard of any update to this from the vendor - there is just some
unofficial hacked version around (which I do not use) that is said to enable
some switched-off CUs.

> Memory clocking is different for people running multiple monitors. Are you
> guys also running multiple monitors by any chance?

No, I'm using just one 3840x2160 @ 60Hz HDMI display.


= --15551100535.2361CFbBA.8535-- --===============0915078502== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVs --===============0915078502==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Sat, 13 Apr 2019 13:27:53 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============0280736073==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [IPv6:2610:10:20:722:a800:ff:fe98:4b55]) by gabe.freedesktop.org (Postfix) with ESMTP id 194A7892D8 for ; Sat, 13 Apr 2019 13:27:54 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============0280736073== Content-Type: multipart/alternative; boundary="15551620740.449eC.10280" Content-Transfer-Encoding: 7bit --15551620740.449eC.10280 Date: Sat, 13 Apr 2019 13:27:54 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #81 from Jaap Buurman --- (In reply to Alex Deucher from comment #14) > (In reply to dwagner from comment #13) > >=20 > > Much lower shader clocks are used only if I lower the refresh rate of t= he > > screen. Is there a reason why the shader clocks should stay high even i= n the > > absence of 3d/compute load? 
> >=20 >=20 > Certain display requirements can cause the engine clock to be kept higher= as > well. In this bug report and another similar one (https://bugs.freedesktop.org/show_bug.cgi?id=3D109955), everybody having t= he issue seems to be using a setup that requires higher engine clocks in idle AFAIK. Either high refresh displays, or in my case, multiple monitors. Could this be part of the issue that seems to trigger this bug? I might be graspi= ng at straws here, but I have had this problem for as long as I have this Vega= 64 (bought at launch), while it is 100% stable under Windows 10 in the same se= tup. --=20 You are receiving this mail because: You are the assignee for the bug.= --15551620740.449eC.10280 Date: Sat, 13 Apr 2019 13:27:54 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 81 on bug 102322 from Jaap Buurman
(In reply to Alex Deucher from comment #14)
> (In reply to dwagner from comment #13)
> >
> > Much lower shader clocks are used only if I lower the refresh rate of the
> > screen. Is there a reason why the shader clocks should stay high even in the
> > absence of 3d/compute load?
> >
>
> Certain display requirements can cause the engine clock to be kept higher as
> well.

In this bug report and another similar one
(https://bugs.freedesktop.org/show_bug.cgi?id=109955), everybody having the
issue seems to be using a setup that requires higher engine clocks in idle
AFAIK. Either high refresh displays, or in my case, multiple monitors. Could
this be part of the issue that seems to trigger this bug? I might be grasping
at straws here, but I have had this problem for as long as I have this Vega64
(bought at launch), while it is 100% stable under Windows 10 in the same setup.
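
To compare idle clocks between setups, something like the following sketch
could be run on an otherwise idle desktop. It assumes the amdgpu sysfs files
pp_dpm_sclk and pp_dpm_mclk are present and that the active level is the line
ending in '*'; the function name and the optional device argument are made up
for illustration.

```shell
# Print the active DPM level of the engine (sclk) and memory (mclk) clocks.
# Assumes amdgpu's pp_dpm_* files, where the active level's line ends in '*'.
active_dpm_levels() {
    dev="${1:-/sys/class/drm/card0/device}"
    for name in pp_dpm_sclk pp_dpm_mclk; do
        [ -r "$dev/$name" ] || continue
        # Lines look like "1: 608Mhz *"; field 2 is the clock.
        awk -v n="$name" '/\*/ { print n ": " $2 }' "$dev/$name"
    done
}
```

Posting this output for a single 60Hz display next to the output for a
high-refresh or multi-monitor setup would show whether the clocks really stay
higher in idle.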


= --15551620740.449eC.10280-- --===============0280736073== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVs --===============0280736073==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Mon, 03 Jun 2019 20:03:49 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============1584309014==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [IPv6:2610:10:20:722:a800:ff:fe98:4b55]) by gabe.freedesktop.org (Postfix) with ESMTP id 0FFB589330 for ; Mon, 3 Jun 2019 20:03:49 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============1584309014== Content-Type: multipart/alternative; boundary="15595922280.6216.13865" Content-Transfer-Encoding: 7bit --15595922280.6216.13865 Date: Mon, 3 Jun 2019 20:03:48 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #82 from Matt Coffin --- I am also experiencing this issue. 
* Kernel: 5.1.3-arch2-1-ARCH * LLVM 8.0.0 * AMDVLK (dev branch pulled 20190602) * Mesa 19.0.4 * Card: XFX Radeon RX 590 I've seen this error, bug 105733, bug 105152, bug 107536, and bug 109955 all repeatable (which one each time appears to be non-deterministic) with the s= ame process. I just launch "House Flipper" from Steam (DX11 title), with DXVK 1.2.1, on either the mesa RADV or AMDVLK vulkan implementations. At 2560x1440 resolution (both 60Hz and 144Hz refresh rates), the crash(es) occur. At 1080p@60Hz, I get no crashes, but they come back if I disable v-s= ync and framerate limiting. I logged power consumption with `sensors | egrep '^power' | awk '{ print $1= " " $2; }'`, and found that the crash often occurs soon after the card hits its maximum power draw at around 190W. I don't have much experience debugging or developing software at the kernel/driver level, but I'm happy to help with providing information as I = go through the learning process here. I'll compile the amd-staging-drm-next ke= rnel later tonight and post some results and logs. Please let me know if there's more information I could provide that may be = of use here. Thanks for your hard work! --=20 You are receiving this mail because: You are the assignee for the bug.= --15595922280.6216.13865 Date: Mon, 3 Jun 2019 20:03:48 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 82 on bug 102322 from Matt Coffin
I am also experiencing this issue.

* Kernel: 5.1.3-arch2-1-ARCH
* LLVM 8.0.0
* AMDVLK (dev branch pulled 20190602)
* Mesa 19.0.4
* Card: XFX Radeon RX 590

I've seen this error, bug 105733, bug 105152, bug 107536, and bug 109955 all
repeatable (which one each time appears to be non-deterministic) with the same
process.

I just launch "House Flipper" from Steam (DX11 title), with DXVK 1.2.1, on
either the mesa RADV or AMDVLK vulkan implementations.

At 2560x1440 resolution (both 60Hz and 144Hz refresh rates), the crash(es)
occur. At 1080p@60Hz, I get no crashes, but they come back if I disable v-sync
and framerate limiting.

I logged power consumption with
`sensors | egrep '^power' | awk '{ print $1 " " $2; }'`, and found that the
crash often occurs soon after the card hits its maximum power draw at around
190W.

I don't have much experience debugging or developing software at the
kernel/driver level, but I'm happy to help with providing information as I go
through the learning process here. I'll compile the amd-staging-drm-next kernel
later tonight and post some results and logs.

Please let me know if there's more information I could provide that may be of
use here. Thanks for your hard work!
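
A variant of the power logging above that adds a timestamp to each sample, so
the last reading before a hang can be lined up with dmesg. This is only a
sketch: it assumes the card exposes a hwmon file named power1_average holding
microwatts (some kernels or ASICs may name it differently), and the function
name is illustrative.

```shell
# Print one sample as "<unix-time> <watts>W".
# $1: a hwmon power file whose value is in microwatts (assumed: power1_average).
sample_gpu_power() {
    uw=$(cat "$1") || return 1
    printf '%s %sW\n' "$(date +%s)" "$(( $uw / 1000000 ))"
}
```

It can be run in a loop such as
`while sample_gpu_power /sys/class/drm/card0/device/hwmon/hwmon*/power1_average; do sleep 1; done`,
with the output redirected to a file on another machine so the samples survive
the crash.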


= --15595922280.6216.13865-- --===============1584309014== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVs --===============1584309014==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Mon, 08 Jul 2019 07:51:29 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============1682675652==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [131.252.210.165]) by gabe.freedesktop.org (Postfix) with ESMTP id 81FD689C33 for ; Mon, 8 Jul 2019 07:51:29 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============1682675652== Content-Type: multipart/alternative; boundary="15625722891.a0cE.7767" Content-Transfer-Encoding: 7bit --15625722891.a0cE.7767 Date: Mon, 8 Jul 2019 07:51:29 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #83 from Wilko Bartels --- (In reply to Jaap Buurman from comment #81) > issue seems to be using a setup that requires higher engine clocks in idle > AFAIK. Either high refresh displays, or in my case, multiple monitors. Co= uld > this be part of the issue that seems to trigger this bug? 
I might be > grasping at straws here, but I have had this problem for as long as I have > this Vega64 (bought at launch), while it is 100% stable under Windows 10 = in > the same setup. This might be true. I was running i3 with xrandr set to 144hz when the free= ze scenario began (somewhat last mont, did not "game" much before). Than switc= hed to icewm to test and issue was gone. Later when i configured icewm to also = have proper xrandr setting issue comes back. I didnt know that could be related. Will test this tonight. --=20 You are receiving this mail because: You are the assignee for the bug.= --15625722891.a0cE.7767 Date: Mon, 8 Jul 2019 07:51:29 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 83 on bug 102322 from Wilko Bartels
(In reply to Jaap Buurman from comment #81)
> issue seems to be using a setup that requires higher engine clocks in idle
> AFAIK. Either high refresh displays, or in my case, multiple monitors. Could
> this be part of the issue that seems to trigger this bug? I might be
> grasping at straws here, but I have had this problem for as long as I have
> this Vega64 (bought at launch), while it is 100% stable under Windows 10 in
> the same setup.

This might be true. I was running i3 with xrandr set to 144hz when the freeze
scenario began (sometime last month, did not "game" much before). Then switched
to icewm to test and the issue was gone. Later, when I configured icewm to also
have the proper xrandr setting, the issue came back. I didn't know that could be
related. Will test this tonight.


= --15625722891.a0cE.7767-- --===============1682675652== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVs --===============1682675652==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Tue, 09 Jul 2019 07:38:25 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============1080029124==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [IPv6:2610:10:20:722:a800:ff:fe98:4b55]) by gabe.freedesktop.org (Postfix) with ESMTP id 4A02F6E061 for ; Tue, 9 Jul 2019 07:38:26 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============1080029124== Content-Type: multipart/alternative; boundary="15626579063.B7921ae9.30285" Content-Transfer-Encoding: 7bit --15626579063.B7921ae9.30285 Date: Tue, 9 Jul 2019 07:38:26 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #84 from Wilko Bartels --- (In reply to Wilko Bartels from comment #83) > (In reply to Jaap Buurman from comment #81) > > issue seems to be using a setup that requires higher engine clocks in i= dle > > AFAIK. Either high refresh displays, or in my case, multiple monitors. 
= Could > > this be part of the issue that seems to trigger this bug? I might be > > grasping at straws here, but I have had this problem for as long as I h= ave > > this Vega64 (bought at launch), while it is 100% stable under Windows 1= 0 in > > the same setup. >=20 > This might be true. I was running i3 with xrandr set to 144hz when the > freeze scenario began (somewhat last mont, did not "game" much before). T= han > switched to icewm to test and issue was gone. Later when i configured ice= wm > to also have proper xrandr setting issue comes back. I didnt know that co= uld > be related. Will test this tonight. nevermind. it crashed on 60hz as well (once) yesterday --=20 You are receiving this mail because: You are the assignee for the bug.= --15626579063.B7921ae9.30285 Date: Tue, 9 Jul 2019 07:38:26 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 84 on bug 102322 from Wilko Bartels
(In reply to Wilko Bartels from comment #83)
> (In reply to Jaap Buurman from comment #81)
> > issue seems to be using a setup that requires higher engine clocks in idle
> > AFAIK. Either high refresh displays, or in my case, multiple monitors. Could
> > this be part of the issue that seems to trigger this bug? I might be
> > grasping at straws here, but I have had this problem for as long as I have
> > this Vega64 (bought at launch), while it is 100% stable under Windows 10 in
> > the same setup.
>
> This might be true. I was running i3 with xrandr set to 144hz when the
> freeze scenario began (sometime last month, did not "game" much before). Then
> switched to icewm to test and the issue was gone. Later, when I configured
> icewm to also have the proper xrandr setting, the issue came back. I didn't
> know that could be related. Will test this tonight.

nevermind. it crashed on 60hz as well (once) yesterday


= --15626579063.B7921ae9.30285-- --===============1080029124== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVs --===============1080029124==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Tue, 09 Jul 2019 21:50:04 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============0929723337==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [IPv6:2610:10:20:722:a800:ff:fe98:4b55]) by gabe.freedesktop.org (Postfix) with ESMTP id 22AA689272 for ; Tue, 9 Jul 2019 21:50:05 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============0929723337== Content-Type: multipart/alternative; boundary="15627090051.4d8Ac9Ca6.22802" Content-Transfer-Encoding: 7bit --15627090051.4d8Ac9Ca6.22802 Date: Tue, 9 Jul 2019 21:50:05 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #85 from dwagner --- (In reply to Wilko Bartels from comment #84) > nevermind. it crashed on 60hz as well (once) yesterday It sure does. 
This bug is now about two years old, during which amdgpu has never been stable, got worse, and every contemporary kernel, whether "offic= ial" ones or ones compiled from git heads of development trees has this very problem, which I can reproduce within minutes. I've given up hoping for a fix. I'll buy an Intel Xe GPU as soon as it hits= the shelves. --=20 You are receiving this mail because: You are the assignee for the bug.= --15627090051.4d8Ac9Ca6.22802 Date: Tue, 9 Jul 2019 21:50:05 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 85 on bug 102322 from dwagner
(In reply to Wilko Bartels from comment #84)
> nevermind. it crashed on 60hz as well (once) yesterday

It sure does. This bug is now about two years old, during which amdgpu has
never been stable, got worse, and every contemporary kernel, whether "official"
ones or ones compiled from git heads of development trees has this very
problem, which I can reproduce within minutes.

I've given up hoping for a fix. I'll buy an Intel Xe GPU as soon as it hits the
shelves.


= --15627090051.4d8Ac9Ca6.22802-- --===============0929723337== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVs --===============0929723337==-- From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Sat, 07 Sep 2019 05:42:21 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============0765490935==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [IPv6:2610:10:20:722:a800:ff:fe98:4b55]) by gabe.freedesktop.org (Postfix) with ESMTP id 73FC489FE8 for ; Sat, 7 Sep 2019 05:42:22 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============0765490935== Content-Type: multipart/alternative; boundary="15678349425.AFF0.15036" Content-Transfer-Encoding: 7bit --15678349425.AFF0.15036 Date: Sat, 7 Sep 2019 05:42:22 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #86 from Paul Ezvan --- I was also impacted by this bug (amdgpu hangs on random conditions with sim= ilar messages as the one exposed) with any kernel/mesa version combination other than the ones on Debian Stretch (any other distro or using Mesa from backpo= rts would trigger those crashes). 
This was on a Ryzen 1700 platform with chipset B450. I had this issue with a RX480 and a RX560 (as I tried to replace the GPU in case it was faulty, I a= lso replace the motherboard). I was still impacted with Fedora 30 with recurring GPU hangs. Then I replac= ed the CPU/motherboard with a Core i7-9700k/Z390 platform. Since then I did not have a single GPU hang on Fedora 30. My hypothesis on this problem not being easily reproducible is that it would happen only on specific GPU/CPU combinations. --=20 You are receiving this mail because: You are the assignee for the bug.= --15678349425.AFF0.15036 Date: Sat, 7 Sep 2019 05:42:22 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 86 on bug 102322 from Paul Ezvan
I was also impacted by this bug (amdgpu hangs on random conditions with similar
messages as the one exposed) with any kernel/mesa version combination other
than the ones on Debian Stretch (any other distro or using Mesa from backports
would trigger those crashes).
This was on a Ryzen 1700 platform with chipset B450. I had this issue with a
RX480 and a RX560 (as I tried to replace the GPU in case it was faulty, I also
replaced the motherboard).

I was still impacted with Fedora 30 with recurring GPU hangs. Then I replaced
the CPU/motherboard with a Core i7-9700k/Z390 platform. Since then I did not
have a single GPU hang on Fedora 30.

My hypothesis on this problem not being easily reproducible is that it would
happen only on specific GPU/CPU combinations.


From mboxrd@z Thu Jan 1 00:00:00 1970
From: bugzilla-daemon@freedesktop.org
Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
Date: Thu, 12 Sep 2019 23:09:47 +0000
To: dri-devel@lists.freedesktop.org

https://bugs.freedesktop.org/show_bug.cgi?id=102322

Comment #87 on bug 102322 from dwagner
(In reply to Paul Ezvan from comment #86)
> My hypothesis on this problem not being easily reproducible is that it would
> happen only on specific GPU/CPU combinations.

... and at least a specific operating system (Linux) and a specific driver
(amdgpu with dc=1).

If your hypothesis was true - do you suggest everyone plagued by this bug just
buys a new main-board and an Intel CPU to evade it?

Since my Ryzen system is perfectly stable when used as a server, not displaying
anything but the text console, I'm inclined to rather keep my main-board and
CPU and just exchange the GPU for another brand that comes with stable drivers.


From mboxrd@z Thu Jan 1 00:00:00 1970
From: bugzilla-daemon@freedesktop.org
Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
Date: Wed, 25 Sep 2019 21:37:12 +0000
To: dri-devel@lists.freedesktop.org

https://bugs.freedesktop.org/show_bug.cgi?id=102322

Comment #88 on bug 102322 from jeroenimo
Found this thread while googling the error from the log.

AMD Ryzen 3600
Asrock B350 motherboard
ASrock RX560 Radeon GPU


Ubuntu and Xubuntu 18.04 and 19.04 both lock up, so they are not usable: a
black screen appears almost immediately after login, though ssh access is still
possible. Seems a newer kernel and mesa drivers. Sometimes it takes 5 min,
sometimes 2 seconds.

Linux Mint 19.2
Seems a lot more stable: so far only 1 lockup with black screen.

uname -a
Linux jeroenimo-amd 4.15.0-64-generic #73-Ubuntu SMP Thu Sep 12 13:16:13 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux


Last log from mint:

Sep 25 23:01:57 jeroenimo-amd kernel: [ 4980.207322] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:43:crtc-0] flip_done timed out
Sep 25 23:01:57 jeroenimo-amd kernel: [ 4980.207331] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:45:crtc-1] flip_done timed out
Sep 25 23:02:07 jeroenimo-amd kernel: [ 4990.451366] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:43:crtc-0] flip_done timed out

I suspect I'm in the same trouble as most.

Win 10 runs flawlessly, so it's really software.


From mboxrd@z Thu Jan 1 00:00:00 1970
From: bugzilla-daemon@freedesktop.org
Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
Date: Thu, 26 Sep 2019 08:35:24 +0000
To: dri-devel@lists.freedesktop.org

https://bugs.freedesktop.org/show_bug.cgi?id=102322

Comment #89 on bug 102322 from jeroenimo
I found a way to crash the system with glmark2.
It crashes it almost instantly.


From mboxrd@z Thu Jan 1 00:00:00 1970
From: bugzilla-daemon@freedesktop.org
Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
Date: Thu, 26 Sep 2019 12:29:04 +0000
To: dri-devel@lists.freedesktop.org

https://bugs.freedesktop.org/show_bug.cgi?id=102322

Comment #90 on bug 102322 from jeroenimo
I managed to run glmark2 without crashing the system by running the card
manually at the lowest frequency.

From a root shell:
echo manual > /sys/class/drm/card0/device/power_dpm_force_performance_level
echo 0 > /sys/class/drm/card0/device/pp_dpm_sclk

root@jeroenimo-amd:/home/jeroen# cat /sys/class/drm/card0/device/pp_dpm_sclk
0: 214Mhz *
1: 387Mhz
2: 843Mhz
3: 995Mhz
4: 1062Mhz
5: 1108Mhz
6: 1149Mhz
7: 1176Mhz
root@jeroenimo-amd:/home/jeroen#

If I go higher, e.g. 2: 843Mhz, I manage to crash it, although it takes a
while before it crashes.

When I force the card to anything above 4, I get an immediate crash without even
starting glmark2.
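
The two sysfs writes above can be wrapped in a small helper. This is only a
sketch: the function names force_sclk_level and active_sclk_level are made up
for illustration, the device directory defaults to the card0 path quoted above
and may differ on other systems, and it has to run as root to touch the real
sysfs files.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers around the sysfs files used in this comment.

# Pin the card to one sclk level, then print the resulting level table.
# $1 = level index, $2 = device directory (assumed default: card0).
force_sclk_level() {
    level="$1"
    dev="${2:-/sys/class/drm/card0/device}"
    # Same order as in the comment: select manual mode first, then the level.
    echo manual > "$dev/power_dpm_force_performance_level" || return 1
    echo "$level" > "$dev/pp_dpm_sclk" || return 1
    cat "$dev/pp_dpm_sclk"
}

# Print the index of the currently active level (the row marked with '*').
# $1 = device directory (assumed default: card0).
active_sclk_level() {
    dev="${1:-/sys/class/drm/card0/device}"
    awk '/\*/ { sub(/:/, "", $1); print $1 }' "$dev/pp_dpm_sclk"
}
```

For example, "force_sclk_level 0" repeats the setting that let glmark2 run
here, and "active_sclk_level" then reports which row of the table carries
the "*" marker.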

I hope this helps!


From mboxrd@z Thu Jan 1 00:00:00 1970
From: bugzilla-daemon@freedesktop.org
Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
Date: Tue, 19 Nov 2019 08:22:31 +0000
To: dri-devel@lists.freedesktop.org

https://bugs.freedesktop.org/show_bug.cgi?id=102322

Martin Peres changed bug 102322

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |MOVED

Comment #91 on bug 102322 from Martin Peres
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been
closed from further activity.

You can subscribe and participate further through the new bug through this link
to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/226.

