From mboxrd@z Thu Jan 1 00:00:00 1970 From: bugzilla-daemon@freedesktop.org Subject: [Bug 105251] [Vega10] GPU lockup on boot: VMC page fault Date: Fri, 01 Jun 2018 17:31:30 +0000 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============1626546139==" Return-path: Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [IPv6:2610:10:20:722:a800:ff:fe98:4b55]) by gabe.freedesktop.org (Postfix) with ESMTP id 6E3C96E74C for ; Fri, 1 Jun 2018 17:31:30 +0000 (UTC) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============1626546139== Content-Type: multipart/alternative; boundary="15278742903.acd08.27781" Content-Transfer-Encoding: 7bit --15278742903.acd08.27781 Date: Fri, 1 Jun 2018 17:31:30 +0000 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated https://bugs.freedesktop.org/show_bug.cgi?id=3D105251 --- Comment #6 from dxxf@volny.cz --- It seems I'm now affected by this bug too... 
Hardware: GPU: RX Vega 64 Liquid CPU: Ryzen R7 1800X Software: OS: OpenSUSE Tumbleweed Kernel: 4.17rc5 (from OpenSUSE Factory repos) Mesa: 18.1.0 (from OpenSUSE Tumbleweed repos) Kernel log - "journalctl -b -1 -r | grep amdgpu": May 31 20:38:04 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=3D2, last emitted seq=3D3 May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000 May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: at page 0x00000005000c0000 f= rom 27 May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: [gfxhub] VMC page fault (src_i= d:0 ring:222 vmid:1 pasid:32768) May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000 May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: at page 0x00000005000c0000 f= rom 27 May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: [gfxhub] VMC page fault (src_i= d:0 ring:222 vmid:1 pasid:32768) May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000 May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: at page 0x00000005000c0000 f= rom 27 May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: [gfxhub] VMC page fault (src_i= d:0 ring:222 vmid:1 pasid:32768) May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000 May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: at page 0x00000005000c0000 f= rom 27 May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: [gfxhub] VMC page fault (src_i= d:0 ring:222 vmid:1 pasid:32768) May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000 May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: at page 0x00000005000c0000 f= rom 27 May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: [gfxhub] VMC page fault (src_i= d:0 ring:222 vmid:1 pasid:32768) May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000 May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: at page 0x00000005000c0000 f= rom 27 May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: [gfxhub] VMC page fault (src_i= d:0 
ring:222 vmid:1 pasid:32768) May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000 May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: at page 0x00000005000c0000 f= rom 27 May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: [gfxhub] VMC page fault (src_i= d:0 ring:222 vmid:1 pasid:32768) May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000 May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: at page 0x00000005000c0000 f= rom 27 May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: [gfxhub] VMC page fault (src_i= d:0 ring:222 vmid:1 pasid:32768) May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000 May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: at page 0x00000005000c0000 f= rom 27 May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: [gfxhub] VMC page fault (src_i= d:0 ring:222 vmid:1 pasid:32768) May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x001013BD May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: at page 0x00000005000c0000 f= rom 27 May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: [gfxhub] VMC page fault (src_i= d:0 ring:222 vmid:1 pasid:32768) May 31 20:35:48 kernel: [drm] Initialized amdgpu 3.25.0 20150101 for 0000:0d:00.0 on minor 0 May 31 20:35:48 kernel: amdgpu 0000:0d:00.0: ring 17(vce2) uses VM inv eng = 11 on hub 1 May 31 20:35:48 kernel: amdgpu 0000:0d:00.0: ring 16(vce1) uses VM inv eng = 10 on hub 1 May 31 20:35:48 kernel: amdgpu 0000:0d:00.0: ring 15(vce0) uses VM inv eng = 9 on hub 1 May 31 20:35:48 kernel: amdgpu 0000:0d:00.0: ring 14(uvd_enc1) uses VM inv = eng 8 on hub 1 May 31 20:35:48 kernel: amdgpu 0000:0d:00.0: ring 13(uvd_enc0) uses VM inv = eng 7 on hub 1 May 31 20:35:48 kernel: amdgpu 0000:0d:00.0: ring 12(uvd) uses VM inv eng 6= on hub 1 May 31 20:35:48 kernel: amdgpu 0000:0d:00.0: ring 11(sdma1) uses VM inv eng= 5 on hub 1 May 31 20:35:48 kernel: amdgpu 0000:0d:00.0: ring 10(sdma0) uses VM inv eng= 4 on hub 1 May 31 20:35:48 kernel: amdgpu 0000:0d:00.0: ring 
9(kiq_2.1.0) uses VM inv = eng 13 on hub 0 May 31 20:35:48 kernel: amdgpu 0000:0d:00.0: ring 8(comp_1.3.1) uses VM inv= eng 12 on hub 0 May 31 20:35:48 kernel: amdgpu 0000:0d:00.0: ring 7(comp_1.2.1) uses VM inv= eng 11 on hub 0 May 31 20:35:48 kernel: amdgpu 0000:0d:00.0: ring 6(comp_1.1.1) uses VM inv= eng 10 on hub 0 May 31 20:35:48 kernel: amdgpu 0000:0d:00.0: ring 5(comp_1.0.1) uses VM inv= eng 9 on hub 0 May 31 20:35:48 kernel: amdgpu 0000:0d:00.0: ring 4(comp_1.3.0) uses VM inv= eng 8 on hub 0 May 31 20:35:48 kernel: amdgpu 0000:0d:00.0: ring 3(comp_1.2.0) uses VM inv= eng 7 on hub 0 May 31 20:35:48 kernel: amdgpu 0000:0d:00.0: ring 2(comp_1.1.0) uses VM inv= eng 6 on hub 0 May 31 20:35:48 kernel: amdgpu 0000:0d:00.0: ring 1(comp_1.0.0) uses VM inv= eng 5 on hub 0 May 31 20:35:48 kernel: amdgpu 0000:0d:00.0: ring 0(gfx) uses VM inv eng 4 = on hub 0 May 31 20:35:48 kernel: amdgpu 0000:0d:00.0: fb0: amdgpudrmfb frame buffer device May 31 20:35:48 kernel: fbcon: amdgpudrmfb (fb0) is primary device May 31 20:35:47 kernel: [drm] amdgpu: 8176M of GTT memory ready. May 31 20:35:47 kernel: [drm] amdgpu: 8176M of VRAM memory ready May 31 20:35:47 kernel: amdgpu 0000:0d:00.0: GTT: 512M 0x000000F600000000 - 0x000000F61FFFFFFF May 31 20:35:47 kernel: amdgpu 0000:0d:00.0: VRAM: 8176M 0x000000F400000000= - 0x000000F5FEFFFFFF (8176M used) May 31 20:35:47 kernel: [drm] add ip block number 6 May 31 20:35:47 kernel: amdgpu 0000:0d:00.0: enabling device (0006 -> 0007) May 31 20:35:47 kernel: fb: switching to amdgpudrmfb from EFI VGA May 31 20:35:47 kernel: [drm] amdgpu kernel modesetting enabled. 
VMC Page faults are now in the log always, but "amdgpu_job_timeout" is=20 persistent: May 31 20:38:04 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=3D2, last emitted seq=3D3 --=20 You are receiving this mail because: You are the assignee for the bug.= --15278742903.acd08.27781 Date: Fri, 1 Jun 2018 17:31:30 +0000 MIME-Version: 1.0 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://bugs.freedesktop.org/ Auto-Submitted: auto-generated

Comment # 6 on bug 105251 from dxxf@volny.cz
It seems I'm now affected by this bug too...

Hardware:
GPU: RX Vega 64 Liquid
CPU: Ryzen R7 1800X

Software:
OS: OpenSUSE Tumbleweed
Kernel: 4.17rc5 (from OpenSUSE Factory repos)
Mesa: 18.1.0 (from OpenSUSE Tumbleweed repos)

Kernel log - "journalctl -b -1 -r | grep amdgpu":
May 31 20:38:04 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=2, last emitted seq=3
May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
May 31 20:37:54 kernel: amdgpu 0000:0d:00.0:   at page 0x00000005000c0000 from 27
May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: [gfxhub] VMC page fault (src_id:0 ring:222 vmid:1 pasid:32768)
May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
May 31 20:37:54 kernel: amdgpu 0000:0d:00.0:   at page 0x00000005000c0000 from 27
May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: [gfxhub] VMC page fault (src_id:0 ring:222 vmid:1 pasid:32768)
May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
May 31 20:37:54 kernel: amdgpu 0000:0d:00.0:   at page 0x00000005000c0000 from 27
May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: [gfxhub] VMC page fault (src_id:0 ring:222 vmid:1 pasid:32768)
May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
May 31 20:37:54 kernel: amdgpu 0000:0d:00.0:   at page 0x00000005000c0000 from 27
May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: [gfxhub] VMC page fault (src_id:0 ring:222 vmid:1 pasid:32768)
May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
May 31 20:37:54 kernel: amdgpu 0000:0d:00.0:   at page 0x00000005000c0000 from 27
May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: [gfxhub] VMC page fault (src_id:0 ring:222 vmid:1 pasid:32768)
May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
May 31 20:37:54 kernel: amdgpu 0000:0d:00.0:   at page 0x00000005000c0000 from 27
May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: [gfxhub] VMC page fault (src_id:0 ring:222 vmid:1 pasid:32768)
May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
May 31 20:37:54 kernel: amdgpu 0000:0d:00.0:   at page 0x00000005000c0000 from 27
May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: [gfxhub] VMC page fault (src_id:0 ring:222 vmid:1 pasid:32768)
May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
May 31 20:37:54 kernel: amdgpu 0000:0d:00.0:   at page 0x00000005000c0000 from 27
May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: [gfxhub] VMC page fault (src_id:0 ring:222 vmid:1 pasid:32768)
May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
May 31 20:37:54 kernel: amdgpu 0000:0d:00.0:   at page 0x00000005000c0000 from 27
May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: [gfxhub] VMC page fault (src_id:0 ring:222 vmid:1 pasid:32768)
May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x001013BD
May 31 20:37:54 kernel: amdgpu 0000:0d:00.0:   at page 0x00000005000c0000 from 27
May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: [gfxhub] VMC page fault (src_id:0 ring:222 vmid:1 pasid:32768)
May 31 20:35:48 kernel: [drm] Initialized amdgpu 3.25.0 20150101 for 0000:0d:00.0 on minor 0
May 31 20:35:48 kernel: amdgpu 0000:0d:00.0: ring 17(vce2) uses VM inv eng 11 on hub 1
May 31 20:35:48 kernel: amdgpu 0000:0d:00.0: ring 16(vce1) uses VM inv eng 10 on hub 1
May 31 20:35:48 kernel: amdgpu 0000:0d:00.0: ring 15(vce0) uses VM inv eng 9 on hub 1
May 31 20:35:48 kernel: amdgpu 0000:0d:00.0: ring 14(uvd_enc1) uses VM inv eng 8 on hub 1
May 31 20:35:48 kernel: amdgpu 0000:0d:00.0: ring 13(uvd_enc0) uses VM inv eng 7 on hub 1
May 31 20:35:48 kernel: amdgpu 0000:0d:00.0: ring 12(uvd) uses VM inv eng 6 on hub 1
May 31 20:35:48 kernel: amdgpu 0000:0d:00.0: ring 11(sdma1) uses VM inv eng 5 on hub 1
May 31 20:35:48 kernel: amdgpu 0000:0d:00.0: ring 10(sdma0) uses VM inv eng 4 on hub 1
May 31 20:35:48 kernel: amdgpu 0000:0d:00.0: ring 9(kiq_2.1.0) uses VM inv eng 13 on hub 0
May 31 20:35:48 kernel: amdgpu 0000:0d:00.0: ring 8(comp_1.3.1) uses VM inv eng 12 on hub 0
May 31 20:35:48 kernel: amdgpu 0000:0d:00.0: ring 7(comp_1.2.1) uses VM inv eng 11 on hub 0
May 31 20:35:48 kernel: amdgpu 0000:0d:00.0: ring 6(comp_1.1.1) uses VM inv eng 10 on hub 0
May 31 20:35:48 kernel: amdgpu 0000:0d:00.0: ring 5(comp_1.0.1) uses VM inv eng 9 on hub 0
May 31 20:35:48 kernel: amdgpu 0000:0d:00.0: ring 4(comp_1.3.0) uses VM inv eng 8 on hub 0
May 31 20:35:48 kernel: amdgpu 0000:0d:00.0: ring 3(comp_1.2.0) uses VM inv eng 7 on hub 0
May 31 20:35:48 kernel: amdgpu 0000:0d:00.0: ring 2(comp_1.1.0) uses VM inv eng 6 on hub 0
May 31 20:35:48 kernel: amdgpu 0000:0d:00.0: ring 1(comp_1.0.0) uses VM inv eng 5 on hub 0
May 31 20:35:48 kernel: amdgpu 0000:0d:00.0: ring 0(gfx) uses VM inv eng 4 on hub 0
May 31 20:35:48 kernel: amdgpu 0000:0d:00.0: fb0: amdgpudrmfb frame buffer device
May 31 20:35:48 kernel: fbcon: amdgpudrmfb (fb0) is primary device
May 31 20:35:47 kernel: [drm] amdgpu: 8176M of GTT memory ready.
May 31 20:35:47 kernel: [drm] amdgpu: 8176M of VRAM memory ready
May 31 20:35:47 kernel: amdgpu 0000:0d:00.0: GTT: 512M 0x000000F600000000 - 0x000000F61FFFFFFF
May 31 20:35:47 kernel: amdgpu 0000:0d:00.0: VRAM: 8176M 0x000000F400000000 - 0x000000F5FEFFFFFF (8176M used)
May 31 20:35:47 kernel: [drm] add ip block number 6 <gfx_v9_0>
May 31 20:35:47 kernel: amdgpu 0000:0d:00.0: enabling device (0006 -> 0007)
May 31 20:35:47 kernel: fb: switching to amdgpudrmfb from EFI VGA
May 31 20:35:47 kernel: [drm] amdgpu kernel modesetting enabled.

VMC page faults are now always in the log, but "amdgpu_job_timedout" is persistent:
May 31 20:38:04 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=2, last emitted seq=3
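
For anyone comparing boots, here is a rough Python sketch (not part of the original report) that tallies the VMC page faults and ring timeouts in output like the "journalctl -b -1 -r | grep amdgpu" log above. The function name summarize, the SAMPLE list and the regular expressions are my own assumptions, written only against the line formats visible in this comment; other kernel versions may print these messages differently.

```python
import re
from collections import Counter

# Patterns are assumptions based only on the log lines quoted in this comment.
FAULT_RE = re.compile(
    r"VMC page fault \(src_id:(\d+) ring:(\d+) vmid:(\d+) pasid:(\d+)\)"
)
PAGE_RE = re.compile(r"at page (0x[0-9a-fA-F]+) from (\d+)")
STATUS_RE = re.compile(r"VM_L2_PROTECTION_FAULT_STATUS:(0x[0-9a-fA-F]+)")
TIMEOUT_RE = re.compile(
    r"ring (\w+) timeout, last signaled seq=(\d+), last emitted seq=(\d+)"
)


def summarize(lines):
    """Count fault and timeout messages in an iterable of log lines."""
    summary = {
        "faults": 0,            # number of "VMC page fault" lines
        "pages": Counter(),     # faulting page address -> occurrences
        "statuses": Counter(),  # raw status register value -> occurrences
        "timeouts": [],         # (ring name, last signaled, last emitted)
    }
    for line in lines:
        if FAULT_RE.search(line):
            summary["faults"] += 1
        page = PAGE_RE.search(line)
        if page:
            summary["pages"][page.group(1)] += 1
        status = STATUS_RE.search(line)
        if status:
            summary["statuses"][status.group(1)] += 1
        timeout = TIMEOUT_RE.search(line)
        if timeout:
            summary["timeouts"].append(
                (timeout.group(1), int(timeout.group(2)), int(timeout.group(3)))
            )
    return summary


# Four lines copied from the log in this comment, used as sample input.
SAMPLE = [
    "May 31 20:38:04 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* "
    "ring gfx timeout, last signaled seq=2, last emitted seq=3",
    "May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: "
    "VM_L2_PROTECTION_FAULT_STATUS:0x001013BD",
    "May 31 20:37:54 kernel: amdgpu 0000:0d:00.0:   "
    "at page 0x00000005000c0000 from 27",
    "May 31 20:37:54 kernel: amdgpu 0000:0d:00.0: [gfxhub] VMC page fault "
    "(src_id:0 ring:222 vmid:1 pasid:32768)",
]

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

To run it on a real log, the same function could be given sys.stdin instead of SAMPLE and fed from the journalctl pipe. It only counts messages and does not attempt to decode the status register bits.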


= --15278742903.acd08.27781-- --===============1626546139== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============1626546139==--