From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! Date: Tue, 26 Jun 2018 15:20:45 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============1679670518==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [131.252.210.165]) by gabe.freedesktop.org (Postfix) with ESMTP id 514E86E5AA for ; Tue, 26 Jun 2018 15:20:45 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============1679670518== Content-Type: multipart/alternative; boundary="15300264450.b3E37c.31284" Content-Transfer-Encoding: 7bit --15300264450.b3E37c.31284 Date: Tue, 26 Jun 2018 15:20:45 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D102322 --- Comment #8 from Andrey Grodzovsky --- (In reply to dwagner from comment #7) > (In reply to Andrey Grodzovsky from comment #6) > > Verify you are using latest AMD firmware and up to date MESA/LLVM >=20 > Firmware: >=20 > pacman -Q linux-firmware > linux-firmware 20180606.d114732-1 >=20 > ll /usr/lib/firmware/amdgpu/vega10_vce.bin > -rw-r--r-- 1 root root 165344 Jun 7 08:01 > /usr/lib/firmware/amdgpu/vega10_vce.bin >=20 >=20 > MESA: >=20 > pacman -Q mesa > mesa 18.1.2-1 >=20 >=20 > LLVM: > pacman -Q llvm-libs > llvm-libs 6.0.0-4 >=20 > Is this new enough? The kernel and MESA seems new enough, LLVM is 6 so maybe you should try 7. 
The firmware also looks pretty late but I still would advise to manually override all firmware files with files from here https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git= /tree/amdgpu Just backup your existing firmware/amdgpu folder for any case. >=20 >=20 > BTW: In a forum somebody asked what the dmesg output on crash looked like= if > I enabled amdgpu.gpu_recovery=3D1 - the result is a few lines more of out= put, > but still a fatal system crash: >=20 > Jun 26 00:50:09 ryzen kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* > ring gfx timeout, last signaled seq=3D12277, last emitted seq=3D12279 > Jun 26 00:50:09 ryzen kernel: [drm] IP block:gmc_v8_0 is hung! > Jun 26 00:50:09 ryzen kernel: [drm] IP block:gfx_v8_0 is hung! > Jun 26 00:50:09 ryzen kernel: amdgpu 0000:0a:00.0: GPU reset begin! > Jun 26 00:50:15 ryzen kernel: [drm:drm_atomic_helper_wait_for_flip_done > [drm_kms_helper]] *ERROR* [CRTC:42:crtc-0] flip_done timed out > Jun 26 00:50:15 ryzen kernel: [drm:drm_atomic_helper_wait_for_dependencies > [drm_kms_helper]] *ERROR* [CRTC:42:crtc-0] flip_done timed out > Jun 26 00:50:25 ryzen kernel: [drm:drm_atomic_helper_wait_for_dependencies > [drm_kms_helper]] *ERROR* [PLANE:40:plane-4] flip_done timed out It's a know issue, try the patch I attached to resolve the deadlock , but y= ou will probably experience other failures after that anyway.=20 Andrey --=20 You are receiving this mail because: You are the assignee for the bug.= --15300264450.b3E37c.31284 Date: Tue, 26 Jun 2018 15:20:45 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment #8 on bug 102322 from Andrey Grodzovsky
(In reply to dwagner from comment #7)
> (In reply to Andrey Grodzovsky from comment #6)
> > Verify you are using latest AMD firmware and up to date MESA/LLVM
>
> Firmware:
>
> pacman -Q linux-firmware
> linux-firmware 20180606.d114732-1
>
> ll  /usr/lib/firmware/amdgpu/vega10_vce.bin
> -rw-r--r-- 1 root root 165344 Jun  7 08:01 /usr/lib/firmware/amdgpu/vega10_vce.bin
>
>
> MESA:
>
> pacman -Q mesa
> mesa 18.1.2-1
>
>
> LLVM:
> pacman -Q llvm-libs
> llvm-libs 6.0.0-4
>
> Is this new enough?

The kernel and MESA seem new enough. LLVM is at 6, so maybe you should try 7.
The firmware also looks fairly recent, but I would still advise manually
overwriting all firmware files with the files from here:
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/amdgpu
Back up your existing firmware/amdgpu folder first, just in case.
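
A minimal sketch of the backup-and-overwrite step described above, in POSIX shell. The function name replace_amdgpu_firmware, the .bak suffix, and both directory arguments are illustrative assumptions, not part of any existing tool. It also assumes the firmware files are uncompressed .bin files, as in the listing quoted earlier.

```shell
# Sketch only: back up an amdgpu firmware directory, then overwrite its
# .bin files with the ones from a local checkout of linux-firmware.
# All names below are hypothetical; adjust paths for your system.
replace_amdgpu_firmware() {
    fw_dir=$1     # installed firmware, e.g. /usr/lib/firmware/amdgpu
    src_dir=$2    # amdgpu/ directory of a linux-firmware checkout
    backup_dir="${fw_dir}.bak"

    if [ ! -d "$fw_dir" ] || [ ! -d "$src_dir" ]; then
        echo "firmware or source directory is missing" >&2
        return 1
    fi

    # Refuse to clobber an earlier backup.
    if [ -e "$backup_dir" ]; then
        echo "backup already exists: $backup_dir" >&2
        return 1
    fi

    # Keep a full copy of the current files, preserving modes and timestamps.
    cp -a "$fw_dir" "$backup_dir" || return 1

    # Overwrite the installed files with the ones from the checkout.
    cp -f "$src_dir"/*.bin "$fw_dir"/
}
```

With a clone of the linux-firmware repository in ./linux-firmware, this could be run as root as `replace_amdgpu_firmware /usr/lib/firmware/amdgpu ./linux-firmware/amdgpu`. If the driver is loaded from the initramfs, that image may also need to be regenerated before the new files take effect.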

>
>
> BTW: In a forum somebody asked what the dmesg output on crash looked like if
> I enabled amdgpu.gpu_recovery=1 - the result is a few lines more of output,
> but still a fatal system crash:
>
> Jun 26 00:50:09 ryzen kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR*
> ring gfx timeout, last signaled seq=12277, last emitted seq=12279
> Jun 26 00:50:09 ryzen kernel: [drm] IP block:gmc_v8_0 is hung!
> Jun 26 00:50:09 ryzen kernel: [drm] IP block:gfx_v8_0 is hung!
> Jun 26 00:50:09 ryzen kernel: amdgpu 0000:0a:00.0: GPU reset begin!
> Jun 26 00:50:15 ryzen kernel: [drm:drm_atomic_helper_wait_for_flip_done
> [drm_kms_helper]] *ERROR* [CRTC:42:crtc-0] flip_done timed out
> Jun 26 00:50:15 ryzen kernel: [drm:drm_atomic_helper_wait_for_dependencies
> [drm_kms_helper]] *ERROR* [CRTC:42:crtc-0] flip_done timed out
> Jun 26 00:50:25 ryzen kernel: [drm:drm_atomic_helper_wait_for_dependencies
> [drm_kms_helper]] *ERROR* [PLANE:40:plane-4] flip_done timed out

It's a known issue. Try the patch I attached to resolve the deadlock, but you
will probably experience other failures after that anyway.
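
The timeout line in the quoted log reports two sequence numbers. As a rough sketch, assuming these are fence sequence numbers whose difference is the number of submissions still outstanding on the ring, the following POSIX shell function pulls them out of kernel log text. The function name pending_on_timeout and the output layout are made up for illustration.

```shell
# Sketch only: read kernel log text on stdin and, for every amdgpu ring
# timeout line, print "<ring> <signaled> <emitted> <emitted - signaled>".
# The function name and output format are hypothetical.
pending_on_timeout() {
    # Keep only the ring name and the two sequence numbers.
    sed -n 's/.*ring \([a-z0-9_.]*\) timeout, last signaled seq=\([0-9]*\), last emitted seq=\([0-9]*\).*/\1 \2 \3/p' |
    while read -r ring signaled emitted; do
        echo "$ring $signaled $emitted $((emitted - signaled))"
    done
}
```

For example, `journalctl -k | pending_on_timeout` would print one line per timeout found in the kernel log. For the entry quoted above, the difference is 12279 - 12277 = 2.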

Andrey


= --15300264450.b3E37c.31284-- --===============1679670518== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============1679670518==--