* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
@ 2017-08-20 22:53 bugzilla-daemon
  2017-11-19 16:40 ` bugzilla-daemon
                   ` (90 more replies)
  0 siblings, 91 replies; 92+ messages in thread
From: bugzilla-daemon @ 2017-08-20 22:53 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1845 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

            Bug ID: 102322
           Summary: System crashes after "[drm] IP block:gmc_v8_0 is
                    hung!" / [drm] IP block:sdma_v3_0 is hung!
           Product: DRI
           Version: DRI git
          Hardware: x86-64 (AMD64)
                OS: Linux (All)
            Status: NEW
          Severity: critical
          Priority: medium
         Component: DRM/AMDgpu
          Assignee: dri-devel@lists.freedesktop.org
          Reporter: jb5sgc1n.nya@20mm.eu

I consistently experience complete system crashes when browsing web pages using
firefox for about 30 minutes, with the following dmesg output from the amdgpu
driver:

[ 2330.720711] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout,
last signaled seq=40778, last emitted seq=40780
[ 2330.720768] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout,
last signaled seq=31305, last emitted seq=31306
[ 2330.720771] [drm] IP block:gmc_v8_0 is hung!
[ 2330.720774] [drm] IP block:gmc_v8_0 is hung!
[ 2330.720775] [drm] IP block:sdma_v3_0 is hung!
[ 2330.720778] [drm] IP block:sdma_v3_0 is hung!

(The messages cited above are the last ones that "dmesg -w" managed to write to
a network filesystem before the system stopped responding.)
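
For readers who want to pull the ring name and fence sequence numbers out of
logs like the one above, here is a minimal sketch. The regular expression is
an assumption based only on the message format shown in this report, and
reading "emitted minus signaled" as the number of outstanding submissions is
an interpretation, not something the driver prints.

```python
import re

# Matches the job-timeout message as it appears in this report. The message
# may be wrapped across two lines, so whitespace is matched loosely.
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout,\s+"
    r"last signaled seq=(?P<signaled>\d+),\s+"
    r"last emitted seq=(?P<emitted>\d+)"
)


def parse_timeouts(log_text):
    """Return (ring, signaled, emitted, outstanding) tuples found in log_text."""
    results = []
    for match in TIMEOUT_RE.finditer(log_text):
        signaled = int(match.group("signaled"))
        emitted = int(match.group("emitted"))
        results.append((match.group("ring"), signaled, emitted, emitted - signaled))
    return results


# Two lines copied from the report above.
sample = """\
[ 2330.720711] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout,
last signaled seq=40778, last emitted seq=40780
[ 2330.720768] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout,
last signaled seq=31305, last emitted seq=31306
"""

for ring, signaled, emitted, outstanding in parse_timeouts(sample):
    print(f"{ring}: signaled={signaled} emitted={emitted} outstanding={outstanding}")
```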

I am running a kernel compiled from
https://cgit.freedesktop.org/~agd5f/linux/log/?h=amd-staging-drm-next as of
"commit 94097b0f7f1bfa54b3b1f8b0d74bbd271a0564e4" (so the very latest as of
today).
My GPU is an RX 460.

Notice that this bug may be the same symptom as reported in
https://bugs.freedesktop.org/show_bug.cgi?id=98874

However, the system crashes for me occur usually while vertically scrolling
through some (ordinary) web page.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 3477 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
@ 2017-11-19 16:40 ` bugzilla-daemon
  2018-02-24 18:36 ` bugzilla-daemon
                   ` (89 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2017-11-19 16:40 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 483 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #1 from dwagner <jb5sgc1n.nya@20mm.eu> ---
Sadly, not only did this bug not attract any attention, it also still occurs,
and seemingly even more frequently than before, on current bleeding-edge kernels
from amd-staging-drm-next, and also with the now-current Firefox 57 and the
now-current versions of Xorg, Mesa etc. from Arch Linux.


^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
  2017-11-19 16:40 ` bugzilla-daemon
@ 2018-02-24 18:36 ` bugzilla-daemon
  2018-06-03 21:00 ` bugzilla-daemon
                   ` (88 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-02-24 18:36 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 956 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #2 from dwagner <jb5sgc1n.nya@20mm.eu> ---
Just to mention this once again: These system crashes still occur, and way too
frequently to consider the amdgpu driver stable enough for professional use.
Sample dmesg output from today:

Feb 24 18:26:55 [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
last signaled seq=5430589, last emitted seq=5430591
Feb 24 18:26:55 [drm] IP block:gmc_v8_0 is hung!
Feb 24 18:26:55 [drm] IP block:gfx_v8_0 is hung!
Feb 24 18:27:02 [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout,
last signaled seq=185928, last emitted seq=185930
Feb 24 18:27:02 [drm] IP block:gmc_v8_0 is hung!
Feb 24 18:27:02 [drm] IP block:gfx_v8_0 is hung!
Feb 24 18:27:05 [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:43:crtc-0]
hw_done or flip_done timed out


^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
  2017-11-19 16:40 ` bugzilla-daemon
  2018-02-24 18:36 ` bugzilla-daemon
@ 2018-06-03 21:00 ` bugzilla-daemon
  2018-06-03 21:02 ` bugzilla-daemon
                   ` (87 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-06-03 21:00 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 330 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #3 from dwagner <jb5sgc1n.nya@20mm.eu> ---
Just for the record, others have reported similar symptoms - here is a recent
example: https://bugs.freedesktop.org/show_bug.cgi?id=106666


^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (2 preceding siblings ...)
  2018-06-03 21:00 ` bugzilla-daemon
@ 2018-06-03 21:02 ` bugzilla-daemon
  2018-06-25 21:43 ` bugzilla-daemon
                   ` (86 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-06-03 21:02 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 706 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #4 from dwagner <jb5sgc1n.nya@20mm.eu> ---
I was asked in
https://www.phoronix.com/forums/forum/linux-graphics-x-org-drivers/open-source-amd-linux/1027705-amdgpu-on-linux-4-18-to-offer-greater-vega-power-savings-displayport-1-4-fixes?p=1027933#post1027933
to mention here that I have experienced this kind of bug only when using the
"new" display code (amdgpu.dc=1).

I cannot strictly rule out that it could also happen with dc=0, since I have
tried dc=0 only for short periods occasionally, but during those periods I did
not see this kind of crash.
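
To record which display code path and related options a given boot actually
used, one could read the module parameters back from sysfs. The sketch below
assumes that amdgpu exposes its parameters under
/sys/module/amdgpu/parameters, the usual location for readable module
parameters; whether each name is present depends on the kernel version.

```python
from pathlib import Path


def read_module_params(names, base="/sys/module/amdgpu/parameters"):
    """Return {name: value or None} for each requested module parameter."""
    values = {}
    for name in names:
        try:
            values[name] = (Path(base) / name).read_text().strip()
        except OSError:
            # Parameter not exposed, module not loaded, or no permission.
            values[name] = None
    return values


# Parameter names taken from this thread.
for name, value in read_module_params(["dc", "gpu_recovery", "vm_update_mode"]).items():
    shown = value if value is not None else "unavailable"
    print(f"amdgpu.{name} = {shown}")
```

Attaching this output to a crash report removes any doubt about whether
dc=0 or dc=1 was in effect.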


^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (3 preceding siblings ...)
  2018-06-03 21:02 ` bugzilla-daemon
@ 2018-06-25 21:43 ` bugzilla-daemon
  2018-06-25 22:11 ` bugzilla-daemon
                   ` (85 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-06-25 21:43 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 678 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #5 from dwagner <jb5sgc1n.nya@20mm.eu> ---
Just for the record: To rule out that my personally compiled kernels are somehow
"more buggy than what others compile", I tried the current Arch-Linux-supplied
Linux 4.17.2-1-ARCH kernel.

It survives about 5 minutes of Firefox browsing between crashes, which log:

Jun 20 00:01:11 ryzen kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring
sdma0 timeout, last signaled seq=1895, last em>
Jun 20 00:01:11 ryzen kernel: [drm] IP block:gmc_v8_0 is hung!

(4.13.* did at least survive days.)


^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (4 preceding siblings ...)
  2018-06-25 21:43 ` bugzilla-daemon
@ 2018-06-25 22:11 ` bugzilla-daemon
  2018-06-25 23:08 ` bugzilla-daemon
                   ` (84 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-06-25 22:11 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 395 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #6 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
Verify you are using latest AMD firmware and up to date MESA/LLVM

Firmware here  (amdgpu folder) -
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/

Andrey


^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (5 preceding siblings ...)
  2018-06-25 22:11 ` bugzilla-daemon
@ 2018-06-25 23:08 ` bugzilla-daemon
  2018-06-26 15:20 ` bugzilla-daemon
                   ` (83 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-06-25 23:08 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1584 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #7 from dwagner <jb5sgc1n.nya@20mm.eu> ---
(In reply to Andrey Grodzovsky from comment #6)
> Verify you are using latest AMD firmware and up to date MESA/LLVM

Firmware:

pacman -Q linux-firmware
linux-firmware 20180606.d114732-1

ll  /usr/lib/firmware/amdgpu/vega10_vce.bin
-rw-r--r-- 1 root root 165344 Jun  7 08:01
/usr/lib/firmware/amdgpu/vega10_vce.bin


MESA:

pacman -Q mesa
mesa 18.1.2-1


LLVM:
pacman -Q llvm-libs
llvm-libs 6.0.0-4

Is this new enough?


BTW: In a forum somebody asked what the dmesg output on crash looked like if I
enabled amdgpu.gpu_recovery=1 - the result is a few lines more of output, but
still a fatal system crash:

Jun 26 00:50:09 ryzen kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring
gfx timeout, last signaled seq=12277, last emitted seq=12279
Jun 26 00:50:09 ryzen kernel: [drm] IP block:gmc_v8_0 is hung!
Jun 26 00:50:09 ryzen kernel: [drm] IP block:gfx_v8_0 is hung!
Jun 26 00:50:09 ryzen kernel: amdgpu 0000:0a:00.0: GPU reset begin!
Jun 26 00:50:15 ryzen kernel: [drm:drm_atomic_helper_wait_for_flip_done
[drm_kms_helper]] *ERROR* [CRTC:42:crtc-0] flip_done timed out
Jun 26 00:50:15 ryzen kernel: [drm:drm_atomic_helper_wait_for_dependencies
[drm_kms_helper]] *ERROR* [CRTC:42:crtc-0] flip_done timed out
Jun 26 00:50:25 ryzen kernel: [drm:drm_atomic_helper_wait_for_dependencies
[drm_kms_helper]] *ERROR* [PLANE:40:plane-4] flip_done timed out


^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (6 preceding siblings ...)
  2018-06-25 23:08 ` bugzilla-daemon
@ 2018-06-26 15:20 ` bugzilla-daemon
  2018-06-26 15:21 ` bugzilla-daemon
                   ` (82 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-06-26 15:20 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 2229 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #8 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
(In reply to dwagner from comment #7)
> (In reply to Andrey Grodzovsky from comment #6)
> > Verify you are using latest AMD firmware and up to date MESA/LLVM
> 
> Firmware:
> 
> pacman -Q linux-firmware
> linux-firmware 20180606.d114732-1
> 
> ll  /usr/lib/firmware/amdgpu/vega10_vce.bin
> -rw-r--r-- 1 root root 165344 Jun  7 08:01
> /usr/lib/firmware/amdgpu/vega10_vce.bin
> 
> 
> MESA:
> 
> pacman -Q mesa
> mesa 18.1.2-1
> 
> 
> LLVM:
> pacman -Q llvm-libs
> llvm-libs 6.0.0-4
> 
> Is this new enough?

The kernel and Mesa seem new enough; LLVM is 6, so maybe you should try 7.
The firmware also looks pretty recent, but I would still advise manually
overwriting all firmware files with the files from here:
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/amdgpu
Back up your existing firmware/amdgpu folder first, just in case.

> 
> 
> BTW: In a forum somebody asked what the dmesg output on crash looked like if
> I enabled amdgpu.gpu_recovery=1 - the result is a few lines more of output,
> but still a fatal system crash:
> 
> Jun 26 00:50:09 ryzen kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR*
> ring gfx timeout, last signaled seq=12277, last emitted seq=12279
> Jun 26 00:50:09 ryzen kernel: [drm] IP block:gmc_v8_0 is hung!
> Jun 26 00:50:09 ryzen kernel: [drm] IP block:gfx_v8_0 is hung!
> Jun 26 00:50:09 ryzen kernel: amdgpu 0000:0a:00.0: GPU reset begin!
> Jun 26 00:50:15 ryzen kernel: [drm:drm_atomic_helper_wait_for_flip_done
> [drm_kms_helper]] *ERROR* [CRTC:42:crtc-0] flip_done timed out
> Jun 26 00:50:15 ryzen kernel: [drm:drm_atomic_helper_wait_for_dependencies
> [drm_kms_helper]] *ERROR* [CRTC:42:crtc-0] flip_done timed out
> Jun 26 00:50:25 ryzen kernel: [drm:drm_atomic_helper_wait_for_dependencies
> [drm_kms_helper]] *ERROR* [PLANE:40:plane-4] flip_done timed out

It's a known issue. Try the patch I attached to resolve the deadlock, but you
will probably experience other failures after that anyway.

Andrey


^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (7 preceding siblings ...)
  2018-06-26 15:20 ` bugzilla-daemon
@ 2018-06-26 15:21 ` bugzilla-daemon
  2018-06-26 22:52 ` bugzilla-daemon
                   ` (81 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-06-26 15:21 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 318 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #9 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
Created attachment 140345
  --> https://bugs.freedesktop.org/attachment.cgi?id=140345&action=edit
Deadlock fix


^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (8 preceding siblings ...)
  2018-06-26 15:21 ` bugzilla-daemon
@ 2018-06-26 22:52 ` bugzilla-daemon
  2018-06-27  7:48 ` bugzilla-daemon
                   ` (80 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-06-26 22:52 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1519 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #10 from dwagner <jb5sgc1n.nya@20mm.eu> ---
(In reply to Andrey Grodzovsky from comment #8)
> The kernel and MESA seems new enough, LLVM is 6 so maybe you should try 7.

LLVM 7 has not been released, and replacing LLVM 6 with the current Subversion
head of LLVM 7 basically means recompiling and reinstalling half of the operating
system (starting at radeonsi, then Xorg, then its dependencies...).

I'm fine with using experimental new kernels to find a more stable amdgpu
driver - but if a kernel driver crashes just because some user-space
application (X11) utilizes a wrong compiler version at run time, then some part
of the driver design is very wrong. 

> The firmware also looks pretty late but I still would advise to manually
> override all firmware files with files from here
> https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/
> tree/amdgpu

I did a "diff -r" on the git files with the ones installed by Arch, they are
all binary identical.
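
The same byte-for-byte check can be sketched in Python for anyone who wants
a list of exactly which files differ. The directory names in the usage note
are assumptions: the first is the installed path mentioned in this thread,
the second is a hypothetical local checkout of linux-firmware.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root):
    """Map each file's path (relative to root) to its digest."""
    root = Path(root)
    return {
        str(path.relative_to(root)): file_digest(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_trees(left, right):
    """Return (only_left, only_right, differing) lists of relative paths."""
    a, b = tree_digests(left), tree_digests(right)
    only_left = sorted(set(a) - set(b))
    only_right = sorted(set(b) - set(a))
    differing = sorted(name for name in set(a) & set(b) if a[name] != b[name])
    return only_left, only_right, differing
```

For example, compare_trees("/usr/lib/firmware/amdgpu", "linux-firmware/amdgpu")
returning three empty lists would correspond to the "binary identical" result
reported above.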

> > Jun 26 00:50:25 ryzen kernel: [drm:drm_atomic_helper_wait_for_dependencies
> > [drm_kms_helper]] *ERROR* [PLANE:40:plane-4] flip_done timed out
> 
> It's a known issue, try the patch I attached to resolve the deadlock, but
> you will probably experience other failures after that anyway.

Ok, thanks for the patch, will try this next time I compile a new kernel.


^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (9 preceding siblings ...)
  2018-06-26 22:52 ` bugzilla-daemon
@ 2018-06-27  7:48 ` bugzilla-daemon
  2018-06-27 13:53 ` bugzilla-daemon
                   ` (79 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-06-27  7:48 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 341 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #11 from Michel Dänzer <michel@daenzer.net> ---
(In reply to Andrey Grodzovsky from comment #8)
> The kernel and MESA seems new enough, LLVM is 6 so maybe you should try 7.

LLVM 6 is fine.


^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (10 preceding siblings ...)
  2018-06-27  7:48 ` bugzilla-daemon
@ 2018-06-27 13:53 ` bugzilla-daemon
  2018-06-27 23:15 ` bugzilla-daemon
                   ` (78 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-06-27 13:53 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1166 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #12 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
(In reply to dwagner from comment #2)
> Just to mention this once again: These system crashes still occur, and way
> too frequently to consider the amdgpu driver stable enough for professional
> use. Sample dmesg output from today:
> 
> Feb 24 18:26:55 [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
> last signaled seq=5430589, last emitted seq=5430591
> Feb 24 18:26:55 [drm] IP block:gmc_v8_0 is hung!
> Feb 24 18:26:55 [drm] IP block:gfx_v8_0 is hung!
> Feb 24 18:27:02 [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0
> timeout, last signaled seq=185928, last emitted seq=185930
> Feb 24 18:27:02 [drm] IP block:gmc_v8_0 is hung!
> Feb 24 18:27:02 [drm] IP block:gfx_v8_0 is hung!
> Feb 24 18:27:05 [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR*
> [CRTC:43:crtc-0] hw_done or flip_done timed out

Can you boot the kernel with amdgpu.vm_update_mode=3 on the GRUB kernel command
line to force CPU VM update mode, and see if this helps?


^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (11 preceding siblings ...)
  2018-06-27 13:53 ` bugzilla-daemon
@ 2018-06-27 23:15 ` bugzilla-daemon
  2018-06-28  2:17 ` bugzilla-daemon
                   ` (77 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-06-27 23:15 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1589 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #13 from dwagner <jb5sgc1n.nya@20mm.eu> ---
(In reply to Andrey Grodzovsky from comment #12)
> Can you load the kernel with grub command line amdgpu.vm_update_mode=3 to
> force CPU VM update mode and see if this helps ?

Sure. Too early yet to say "hurray", but at an uptime of one hour, currently,
4.17.2 survived with amdgpu.vm_update_mode=3 already about 20 times longer than
without that option before the first crash.

One (probably just informational) message is emitted by the kernel:
[   19.319565] CPU update of VM recommended only for large BAR system

Can you explain a little: What is a "large BAR system", and what does the
vm_update_mode=3 option actually cause? Should I expect any weird side effects
to look for?


BTW: Not a result of that option, but apparently of the kernel version, is the
fact that the shader clock stays at a pretty high frequency all the time, even
without any 3D or compute load, just displaying a quiet 4k/60Hz desktop image:

cat pp_dpm_sclk
0: 214Mhz 
1: 481Mhz 
2: 760Mhz 
3: 1020Mhz 
4: 1102Mhz 
5: 1138Mhz 
6: 1180Mhz *
7: 1220Mhz 

Much lower shader clocks are used only if I lower the refresh rate of the
screen. Is there a reason why the shader clocks should stay high even in the
absence of 3d/compute load?

(I would have understood it better if the minimum memory clock depended on
the refresh rate, but the memory clock stays as low as with the older kernels.)
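
To log the active shader clock level over time (for example while changing
the refresh rate), the pp_dpm_sclk output can be parsed as sketched below.
The parser assumes the "index: frequency" format shown above, with the active
level marked by an asterisk. The file normally lives in the GPU's sysfs
device directory, for instance /sys/class/drm/card0/device/pp_dpm_sclk; the
card index is an assumption.

```python
import re

# One level per line, e.g. "6: 1180Mhz *"; the asterisk marks the active level.
LEVEL_RE = re.compile(r"^\s*(\d+):\s*(\d+)\s*mhz\s*(\*)?\s*$", re.IGNORECASE)


def parse_dpm_levels(text):
    """Return ([(index, mhz), ...], active_index or None)."""
    levels, active = [], None
    for line in text.splitlines():
        match = LEVEL_RE.match(line)
        if not match:
            continue
        index, mhz = int(match.group(1)), int(match.group(2))
        levels.append((index, mhz))
        if match.group(3):
            active = index
    return levels, active


# Output copied from the comment above.
sample = """\
0: 214Mhz
1: 481Mhz
2: 760Mhz
3: 1020Mhz
4: 1102Mhz
5: 1138Mhz
6: 1180Mhz *
7: 1220Mhz
"""

levels, active = parse_dpm_levels(sample)
print(f"active level {active}: {dict(levels)[active]} MHz")
```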


^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (12 preceding siblings ...)
  2018-06-27 23:15 ` bugzilla-daemon
@ 2018-06-28  2:17 ` bugzilla-daemon
  2018-06-28  4:17 ` bugzilla-daemon
                   ` (76 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-06-28  2:17 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 519 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #14 from Alex Deucher <alexdeucher@gmail.com> ---
(In reply to dwagner from comment #13)
> 
> Much lower shader clocks are used only if I lower the refresh rate of the
> screen. Is there a reason why the shader clocks should stay high even in the
> absence of 3d/compute load?
> 

Certain display requirements can cause the engine clock to be kept higher as
well.


^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (13 preceding siblings ...)
  2018-06-28  2:17 ` bugzilla-daemon
@ 2018-06-28  4:17 ` bugzilla-daemon
  2018-06-28  4:36 ` bugzilla-daemon
                   ` (75 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-06-28  4:17 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 2147 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #15 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
(In reply to dwagner from comment #13)
> (In reply to Andrey Grodzovsky from comment #12)
> > Can you load the kernel with grub command line amdgpu.vm_update_mode=3 to
> > force CPU VM update mode and see if this helps ?
> 
> Sure. Too early yet to say "hurray", but at an uptime of one hour,
> currently, 4.17.2 survived with amdgpu.vm_update_mode=3 already about 20
> times longer than without that option before the first crash.
> 
> One (probably just informal) message is emitted by the kernel:
> [   19.319565] CPU update of VM recommended only for large BAR system
> 
> Can you explain a little: What is a "large BAR system", and what does the
> vm_update_mode=3 option actually cause? Should I expect any weird side
> effects to look for?

I think it just means systems with large VRAM, which therefore require a large
BAR for mapping. But I am not sure on that point.
vm_update_mode=3 means that GPUVM page table updates are done using the CPU. By
default we do them using the DMA engine on the ASIC. The log showed a hang in this
engine, so I assumed there is something wrong with the SDMA commands we submit.
As a side effect I would expect higher CPU utilization and maybe slower rendering.
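
As a rough illustration of why the value 3 was suggested: the amdgpu module
parameter documentation describes vm_update_mode as a small bitmask
(0 = never use the CPU, 1 = graphics only, 2 = compute only, 3 = both, with a
negative value meaning the driver default). The sketch below decodes it under
that assumption. The constant names are illustrative, so check the
documentation of the kernel actually in use.

```python
# Bit meanings assumed from the amdgpu module parameter documentation.
USE_CPU_FOR_GFX = 1 << 0      # CPU updates page tables for graphics VMs
USE_CPU_FOR_COMPUTE = 1 << 1  # CPU updates page tables for compute VMs


def describe_vm_update_mode(mode):
    """Return a human-readable description of a vm_update_mode value."""
    if mode < 0:
        return "driver default"
    users = []
    if mode & USE_CPU_FOR_GFX:
        users.append("graphics")
    if mode & USE_CPU_FOR_COMPUTE:
        users.append("compute")
    if not users:
        return "page tables updated by the GPU (SDMA)"
    return "page tables updated by the CPU for " + " and ".join(users)


for mode in (-1, 0, 1, 2, 3):
    print(f"vm_update_mode={mode}: {describe_vm_update_mode(mode)}")
```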

> 
> 
> BTW: Not a result of that option, but of the kernel version, seems to be the
> fact that the shader clock keeps at a pretty high frequency all the time -
> even without any 3d or compute load, just displaying a quiet 4k/60Hz desktop
> image:
> 
> cat pp_dpm_sclk
> 0: 214Mhz 
> 1: 481Mhz 
> 2: 760Mhz 
> 3: 1020Mhz 
> 4: 1102Mhz 
> 5: 1138Mhz 
> 6: 1180Mhz *
> 7: 1220Mhz 
> 
> Much lower shader clocks are used only if I lower the refresh rate of the
> screen. Is there a reason why the shader clocks should stay high even in the
> absence of 3d/compute load?
> 
> (I would have better understood if the minimum memory clock was depending on
> the refresh rate, but memory clock stays as low as with the older kernels.)


^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (14 preceding siblings ...)
  2018-06-28  4:17 ` bugzilla-daemon
@ 2018-06-28  4:36 ` bugzilla-daemon
  2018-06-28 10:33 ` bugzilla-daemon
                   ` (74 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-06-28  4:36 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 615 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #16 from Alex Deucher <alexdeucher@gmail.com> ---
(In reply to Andrey Grodzovsky from comment #15)
> I think it just means systems with large VRAM so it will require large BAR
> for mapping. But I am not sure on that point.

That's correct.  The updates are done with the CPU rather than the GPU (SDMA). 
The default BAR size on most systems is usually 256MB for 32 bit compatibility
so the window for CPU access to vram (where the page tables live) is limited.
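
One way to see how large the CPU-visible window is on a given machine is to look at the memory regions that lspci reports for the card. The sketch below only extracts the "[size=...]" values from lspci -v style text. The function name bar_sizes and the sample line are illustrative assumptions, not output taken from the reporter's system.

```shell
#!/bin/sh
# Hypothetical helper: list the sizes of the memory BARs found in
# "lspci -v" style output read from standard input.
bar_sizes() {
    sed -n 's/.*Memory at .*\[size=\([0-9]*[KMG]\)\].*/\1/p'
}

# Illustrative sample line in lspci's usual format (not from this report).
printf '%s\n' '	Memory at e0000000 (64-bit, prefetchable) [size=256M]' \
    | bar_sizes    # prints 256M
```

On a real system this would be used as "lspci -v -s <bus address> | bar_sizes"; a 256M entry corresponds to the default window described above.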


^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (15 preceding siblings ...)
  2018-06-28  4:36 ` bugzilla-daemon
@ 2018-06-28 10:33 ` bugzilla-daemon
  2018-06-28 19:56 ` bugzilla-daemon
                   ` (73 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-06-28 10:33 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 966 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #17 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
(In reply to Alex Deucher from comment #16)
> (In reply to Andrey Grodzovsky from comment #15)
> > I think it just means systems with large VRAM so it will require large BAR
> > for mapping. But I am not sure on that point.
> 
> That's correct.  the updates are done with the CPU rather than the GPU
> (SDMA).  The default BAR size on most systems is usually 256MB for 32 bit
> compatibility so the window for CPU access to vram (where the page tables
> live) is limited.

Thanks Alex.

dwagner, this is obviously just a workaround and not a fix. It points to some
problem with SDMA packets. If you want to continue exploring, we can try to
dump some fence traces and the SDMA HW ring content to examine the last
packets before the hang happened.


^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (16 preceding siblings ...)
  2018-06-28 10:33 ` bugzilla-daemon
@ 2018-06-28 19:56 ` bugzilla-daemon
  2018-06-28 21:09 ` bugzilla-daemon
                   ` (72 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-06-28 19:56 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1086 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #18 from dwagner <jb5sgc1n.nya@20mm.eu> ---
The good news: So far no crashes during normal uptime with
amdgpu.vm_update_mode=3

The bad news: System crashes immediately upon S3 resume (with messages quite
different from the ones I saw with earlier S3-resume crashes) - I filed bug
report https://bugs.freedesktop.org/show_bug.cgi?id=107065 on this.

(In reply to Andrey Grodzovsky from comment #17)
> dwagner, this is obviously just a work around and not a fix. It points to
> some problem with SDMA packets, if you want to continue exploring we can try
> to dump some fence traces and SDMA HW ring content to examine the latest
> packets before the hang happened.

If you can include some debug output into "amd-staging-drm-next" that helps
finding the root cause, I might be able to provide some output - if the kernel
survives long enough after the crash to write the system journal - this has not
always been the case.


^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (17 preceding siblings ...)
  2018-06-28 19:56 ` bugzilla-daemon
@ 2018-06-28 21:09 ` bugzilla-daemon
  2018-06-28 22:56 ` bugzilla-daemon
                   ` (71 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-06-28 21:09 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1637 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #19 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
Can you use addr2line or gdb with 'list' command to give the line number
matching

(In reply to dwagner from comment #18)
> The good news: So far no crashes during normal uptime with
> amdgpu.vm_update_mode=3
> 
> The bad news: System crashes immediately upon S3 resume (with messages quite
> different from the ones I saw with earlier S3-resume crashes) - I filed bug
> report https://bugs.freedesktop.org/show_bug.cgi?id=107065 on this.
> 
> (In reply to Andrey Grodzovsky from comment #17)
> > dwagner, this is obviously just a work around and not a fix. It points to
> > some problem with SDMA packets, if you want to continue exploring we can try
> > to dump some fence traces and SDMA HW ring content to examine the latest
> > packets before the hang happened.
> 
> If you can include some debug output into "amd-staging-drm-next" that helps
> finding the root cause, I might be able to provide some output - if the
> kernel survives long enough after the crash to write the system journal -
> this has not always been the case.

No need to recompile, just need to see what is the content of SDMA ring buffer
when the hang occurs.

Clone and build our register analyzer from here -
https://cgit.freedesktop.org/amd/umr/ and once the hang happens just run 

sudo umr -lb
sudo umr -R gfx[.]
sudo umr -R sdma0[.]
sudo umr -R sdma1[.]

I will probably need more info later but let's try this first.
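
Since the machine may not stay responsive for long after a hang, it can help to run the commands above in one go and write each result to a file. The following is a rough sketch of such a wrapper, not a tested procedure. The function name capture_umr, the UMR and SUDO override variables and the output file names are all made up for this example; only the umr invocations themselves come from the list above.

```shell
#!/bin/sh
# Hypothetical wrapper around the umr commands listed above: run each one
# and save its output to a file in the given directory.
# UMR and SUDO may be overridden (for example for a dry run).
UMR=${UMR:-umr}
SUDO=${SUDO-sudo}

capture_umr() {
    outdir=$1
    mkdir -p "$outdir" || return 1
    # Block listing first, then the three rings named in the instructions.
    $SUDO $UMR -lb > "$outdir/blocks.txt" 2>&1
    for ring in gfx sdma0 sdma1; do
        $SUDO $UMR -R "${ring}[.]" > "$outdir/ring-$ring.txt" 2>&1
    done
    sync    # flush the files to disk in case the system dies afterwards
}
```

A call such as "capture_umr /var/tmp/umr-dump" from a remote shell, right after the hang messages appear, would leave one file per command in that directory.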


^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (18 preceding siblings ...)
  2018-06-28 21:09 ` bugzilla-daemon
@ 2018-06-28 22:56 ` bugzilla-daemon
  2018-06-28 22:57 ` bugzilla-daemon
                   ` (70 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-06-28 22:56 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 772 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #20 from dwagner <jb5sgc1n.nya@20mm.eu> ---
(In reply to Andrey Grodzovsky from comment #19)
> No need to recompile, just need to see what is the content of SDMA ring
> buffer when the hang occurs.
> 
> Clone and build our register analyzer from here -
> https://cgit.freedesktop.org/amd/umr/ and once the hang happens just run 
> 
> sudo umr -lb
> sudo umr -R gfx[.]
> sudo umr -R sdma0[.]
> sudo umr -R sdma1[.]
> 
> I will probably need more info later but let's try this first.

How can I run "umr" on a crashed system? I guess those register values are
retained over a press of the reset button / reboot?


^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (19 preceding siblings ...)
  2018-06-28 22:56 ` bugzilla-daemon
@ 2018-06-28 22:57 ` bugzilla-daemon
  2018-06-29  0:10 ` bugzilla-daemon
                   ` (69 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-06-28 22:57 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 282 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #21 from dwagner <jb5sgc1n.nya@20mm.eu> ---
(I meant to write "I guess those register values are NOT retained over a
reboot, right?")


^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (20 preceding siblings ...)
  2018-06-28 22:57 ` bugzilla-daemon
@ 2018-06-29  0:10 ` bugzilla-daemon
  2018-07-04 23:03 ` bugzilla-daemon
                   ` (68 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-06-29  0:10 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 449 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #22 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
(In reply to dwagner from comment #21)
> (I meant to write "I guess those register values are NOT retained over a
> reboot, right?")

Yes, my assumption was that at least sometimes you still have SSH access to
the system in those cases.


^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (21 preceding siblings ...)
  2018-06-29  0:10 ` bugzilla-daemon
@ 2018-07-04 23:03 ` bugzilla-daemon
  2018-07-05 13:59 ` bugzilla-daemon
                   ` (67 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-07-04 23:03 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 886 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #23 from dwagner <jb5sgc1n.nya@20mm.eu> ---
Just for the record: At this point, I can say that with
amdgpu.vm_update_mode=3 4.17.2-ARCH runs at least for hours,
not only the minutes it runs without this option before crashing.

I cannot, however, say that above combination reaches the
some-days-between-amdgpu-crashes uptimes that 4.13.x reached -
in order to be able to test this, I would need S3 resumes to work,
which is subject to bug report 107065.

Without working S3 resumes, there is no way for me to test longer
uptimes because amdgpu consistently crashes (in any version I know
of) if I just let the system run but switch off the display, and I do
not want to keep the connected 4k TV switched on all day and night.


^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (22 preceding siblings ...)
  2018-07-04 23:03 ` bugzilla-daemon
@ 2018-07-05 13:59 ` bugzilla-daemon
  2018-07-05 23:32 ` bugzilla-daemon
                   ` (66 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-07-05 13:59 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 288 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #24 from Michel Dänzer <michel@daenzer.net> ---
Can you try bisecting between 4.13 and 4.17 to find where stability went
downhill for you?


^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (23 preceding siblings ...)
  2018-07-05 13:59 ` bugzilla-daemon
@ 2018-07-05 23:32 ` bugzilla-daemon
  2018-07-06 23:20 ` bugzilla-daemon
                   ` (65 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-07-05 23:32 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 855 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #25 from dwagner <jb5sgc1n.nya@20mm.eu> ---
(In reply to Michel Dänzer from comment #24)
> Can you try bisecting between 4.13 and 4.17 to find where stability went
> downhill for you?

A bisect like that is not likely to converge in any reasonable time, given the
stochastic nature of those crashes.

While the mean time between driver crashes is dramatically different, there
will be occasions on which 4.13 crashes early enough to yield a false "bad",
and there will be occasions on which 4.17 lasts the 20 minutes or so that it
takes to be judged a false "good".
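
To put a rough number on this argument, one can assume that crashes arrive at a constant random rate (an exponential model, which is an assumption made here and not something measured in this report). A kernel with a mean time between crashes of m then survives a test of length t with probability exp(-t/m). The helper name survive_prob and the example values below are illustrative only.

```shell
#!/bin/sh
# Hypothetical helper: probability that a kernel with mean time between
# crashes $2 survives a test run of length $1 (same time unit for both),
# assuming exponentially distributed crash times.
survive_prob() {
    awk -v t="$1" -v m="$2" 'BEGIN { printf "%.4f\n", exp(-t / m) }'
}

# Illustrative values: a 60-minute test of a kernel that crashes
# every 20 minutes on average.
survive_prob 60 20    # prints 0.0498
```

Under these assumed numbers, a single one-hour run would still misjudge a bad kernel as "good" about one time in twenty, and that error compounds over the many steps of a bisect.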

What about the multitude of debug-options - isn't there one that could allow
for some more insight on when/why the driver crashes?


^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (24 preceding siblings ...)
  2018-07-05 23:32 ` bugzilla-daemon
@ 2018-07-06 23:20 ` bugzilla-daemon
  2018-07-07  8:36 ` bugzilla-daemon
                   ` (64 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-07-06 23:20 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1257 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #26 from dwagner <jb5sgc1n.nya@20mm.eu> ---
Today for the first time I had a sudden "crash while just browsing with
Firefox" while using the amdgpu.vm_update_mode=3 parameter with the
current-as-of-today amd-staging-drm-next
(bb2e406ba66c2573b68e609e148cab57b1447095) with patch 
https://bugs.freedesktop.org/attachment.cgi?id=140418 applied on top.

Different kernel messages than with previous crashes of this kind were emitted:

Jul 07 01:08:20 ryzen kernel: amdgpu 0000:0a:00.0: GPU fault detected: 146 0x0c80440c
Jul 07 01:08:20 ryzen kernel: amdgpu 0000:0a:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00100190
Jul 07 01:08:20 ryzen kernel: amdgpu 0000:0a:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0E04400C
Jul 07 01:08:20 ryzen kernel: amdgpu 0000:0a:00.0: VM fault (0x0c, vmid 7, pasid 32768) at page 1048976, read from 'TC1' (0x54433100) (68)
Jul 07 01:08:25 ryzen kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=75244, last emitted seq=75245
Jul 07 01:08:25 ryzen kernel: amdgpu 0000:0a:00.0: GPU reset begin!

Hope this helps somehow.
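
Two values in the fault lines above can be cross-checked by hand: the address register value 0x00100190 is the page number 1048976 written in hexadecimal, and the client tag 0x54433100 spells "TC1" in ASCII. The sketch below does both conversions; the helper names fault_page and fault_client are invented for this illustration.

```shell
#!/bin/sh
# Hypothetical helpers for reading the VM fault lines above.

# Convert the hex value from VM_CONTEXT1_PROTECTION_FAULT_ADDR into the
# decimal page number shown in the "VM fault ... at page N" line.
fault_page() {
    echo $(( $1 ))
}

# Decode the 32-bit client tag (e.g. 0x54433100) into its ASCII letters,
# most significant byte first, skipping zero bytes.
fault_client() {
    v=$(( $1 ))
    for s in 24 16 8 0; do
        byte=$(( (v >> s) & 255 ))
        if [ "$byte" -ne 0 ]; then
            # Emit the byte via an octal escape, which POSIX printf accepts.
            printf "\\$(printf '%03o' "$byte")"
        fi
    done
    echo
}

fault_page 0x00100190      # prints 1048976
fault_client 0x54433100    # prints TC1
```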


^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (25 preceding siblings ...)
  2018-07-06 23:20 ` bugzilla-daemon
@ 2018-07-07  8:36 ` bugzilla-daemon
  2018-07-07 20:08 ` bugzilla-daemon
                   ` (63 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-07-07  8:36 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 426 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #27 from Michel Dänzer <michel@daenzer.net> ---
(In reply to dwagner from comment #26)
> Today for the first time I had a sudden "crash while just browsing with
> Firefox" [...]

That could be a Mesa issue, anyway it should probably be tracked separately
from this report.


^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (26 preceding siblings ...)
  2018-07-07  8:36 ` bugzilla-daemon
@ 2018-07-07 20:08 ` bugzilla-daemon
  2018-07-09 14:34 ` bugzilla-daemon
                   ` (62 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-07-07 20:08 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 550 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #28 from dwagner <jb5sgc1n.nya@20mm.eu> ---
(In reply to Michel Dänzer from comment #27)
> That could be a Mesa issue, anyway it should probably be tracked separately
> from this report.

Created separate bug report https://bugs.freedesktop.org/show_bug.cgi?id=107152

(If that is a Mesa issue, no more than user processes / X11 should have crashed
- but not the kernel amdgpu driver... right?)


^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (27 preceding siblings ...)
  2018-07-07 20:08 ` bugzilla-daemon
@ 2018-07-09 14:34 ` bugzilla-daemon
  2018-07-11 22:32 ` bugzilla-daemon
                   ` (61 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-07-09 14:34 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 786 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #29 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
(In reply to dwagner from comment #28)
> (In reply to Michel Dänzer from comment #27)
> > That could be a Mesa issue, anyway it should probably be tracked separately
> > from this report.
> 
> Created separate bug report
> https://bugs.freedesktop.org/show_bug.cgi?id=107152
> 
> (If that is a Mesa issue, no more than user processes / X11 should have
> crashed - but not the kernel amdgpu driver... right?)

Not exactly, MESA could create a bad request (faulty GPU address) which would
lead to this. It can even be triggered on purpose using a debug flag from MESA.


^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (28 preceding siblings ...)
  2018-07-09 14:34 ` bugzilla-daemon
@ 2018-07-11 22:32 ` bugzilla-daemon
  2018-07-15  8:56 ` bugzilla-daemon
                   ` (60 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-07-11 22:32 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 871 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #30 from dwagner <jb5sgc1n.nya@20mm.eu> ---
(In reply to Andrey Grodzovsky from comment #29)
> > (If that is a Mesa issue, no more than user processes / X11 should have
> > crashed - but not the kernel amdgpu driver... right?)
> 
> Not exactly, MESA could create a bad request (faulty GPU address) which
> would lead to this. It can even be triggered on purpose using a debug flag
> from MESA.

My understanding is that all parts of MESA run as user processes, outside of
the kernel space. If such code is allowed to pass parameters into kernel
functions that make the kernel crash, that would be a veritable security hole
which attackers could exploit to stage at least denial-of-service attacks, if
not worse.


^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (29 preceding siblings ...)
  2018-07-11 22:32 ` bugzilla-daemon
@ 2018-07-15  8:56 ` bugzilla-daemon
  2018-07-15  9:03 ` bugzilla-daemon
                   ` (59 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-07-15  8:56 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 469 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #31 from Doctor <tekcommnv@gmail.com> ---
I got that one too and was able to track the problem down a bit further. Chrome
and video with the GPU enabled will blow it up too. Interestingly, I was able
to reproduce it consistently with my rtl8188eu USB driver: plug it in, connect,
and wpa_supplicant will cause it to explode.


^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (30 preceding siblings ...)
  2018-07-15  8:56 ` bugzilla-daemon
@ 2018-07-15  9:03 ` bugzilla-daemon
  2018-07-15  9:07 ` bugzilla-daemon
                   ` (58 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-07-15  9:03 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 756 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #32 from Doctor <tekcommnv@gmail.com> ---
Because I am working on a live dev CD for CodeXL (all my machines are memory
based and use no magnetic media), I ended up just cherry-picking the code back
to the last 4.16, and there are no problems. Here's the working 4.16. I chased
this rabbit for a while and it pops up like the damn woodchuck in Caddyshack.


Here is the latest as of 11 hours ago 4.19-wip
https://github.com/tekcomm/linux-image-4.19-wip-generic


Here is the latest as of 11 hours ago 4.16 version from three weeks ago with no
woodchucks
https://github.com/tekcomm/linux-kernel-amdgpu-binaries


^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (31 preceding siblings ...)
  2018-07-15  9:03 ` bugzilla-daemon
@ 2018-07-15  9:07 ` bugzilla-daemon
  2018-07-15 19:59 ` bugzilla-daemon
                   ` (57 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-07-15  9:07 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 241 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #33 from Doctor <tekcommnv@gmail.com> ---
I think it may be something as stupid as a var too.


^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (32 preceding siblings ...)
  2018-07-15  9:07 ` bugzilla-daemon
@ 2018-07-15 19:59 ` bugzilla-daemon
  2018-07-16 14:06 ` bugzilla-daemon
                   ` (56 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-07-15 19:59 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 708 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #34 from dwagner <jb5sgc1n.nya@20mm.eu> ---
(In reply to Doctor from comment #32)
> Just cherry picking the code
> back to the  last 4.16 and no problems Heres the working 4.16 . I chased
> this rabbit for awhile and it pops up like the dam wood chuck in caddie
> shack.
> 
> Here is the latest as of 11 hours ago 4.19-wip
> https://github.com/tekcomm/linux-image-4.19-wip-generic

I am not sure I understand what you are trying to tell us, here.

The repository you linked does not seem to contain any relevant commits
changing kernel source code.


^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (33 preceding siblings ...)
  2018-07-15 19:59 ` bugzilla-daemon
@ 2018-07-16 14:06 ` bugzilla-daemon
  2018-07-29 10:02 ` bugzilla-daemon
                   ` (55 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-07-16 14:06 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1299 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #35 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
(In reply to dwagner from comment #30)
> (In reply to Andrey Grodzovsky from comment #29)
> > > (If that is a Mesa issue, no more than user processes / X11 should have
> > > crashed - but not the kernel amdgpu driver... right?)
> > 
> > Not exactly, MESA could create a bad request (faulty GPU address) which
> > would lead to this. It can even be triggered on purpose using a debug flag
> > from MESA.
> 
> My understanding is that all parts of MESA run as user processes, outside of
> the kernel space. If such code is allowed to pass parameters into kernel
> functions that make the kernel crash, that would be a veritable security
> hole which attackers could exploit to stage at least denial-of-service
> attacks, if not worse.

There is no impact on the kernel. Please note that this is a GPU page fault,
not a CPU page fault, so the kernel keeps working normally and does not hang.
You might get a black screen out of this and have to reboot the graphics
card, or maybe the entire system, to recover, but I don't see any system
security or stability compromise here.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (34 preceding siblings ...)
  2018-07-16 14:06 ` bugzilla-daemon
@ 2018-07-29 10:02 ` bugzilla-daemon
  2018-08-08 23:07 ` bugzilla-daemon
                   ` (54 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-07-29 10:02 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 500 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

Roshless <roshless@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |roshless@gmail.com

--- Comment #36 from Roshless <roshless@gmail.com> ---
*** Bug 107311 has been marked as a duplicate of this bug. ***

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (35 preceding siblings ...)
  2018-07-29 10:02 ` bugzilla-daemon
@ 2018-08-08 23:07 ` bugzilla-daemon
  2018-08-09 20:56 ` bugzilla-daemon
                   ` (53 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-08-08 23:07 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1495 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #37 from dwagner <jb5sgc1n.nya@20mm.eu> ---
In the related bug report (https://bugs.freedesktop.org/show_bug.cgi?id=107152)
I noticed that this bug can be triggered very reliably and quickly by playing a
video with a deliberately lowered frame rate:
 "mpv --no-correct-pts --fps=3 --ao=null some_arbitrary_video.webm"

This led me to assume this bug might be caused by the dynamic power management,
that often ramps performance up/down when a video is played at such a low frame
rate.

And indeed, I found this confirmed by many experiments: If I use a script like
> #!/bin/bash
> cd /sys/class/drm/card0/device
> echo manual >power_dpm_force_performance_level
> # low
> echo 0 >pp_dpm_mclk 
> echo 0 >pp_dpm_sclk
> # medium
> #echo 1 >pp_dpm_mclk 
> #echo 1 >pp_dpm_sclk
> # high
> #echo 1 >pp_dpm_mclk 
> #echo 6 >pp_dpm_sclk
to enforce just any performance level, then the crashes do not occur anymore -
also with the "low frame rate video test".

So it seems that the transition from one "dpm" performance level to another,
with a certain probability, causes these crashes. And the more often the
transitions occur, the sooner one will experience them.

(BTW: For unknown reason, invoking "xrandr" or enabling a monitor after sleep
causes the above settings to get lost, so one has to invoke above script
again.)
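
Since the forced settings can get lost, it may help to check what is currently
in effect before and after a test run. The following is a minimal sketch, not
part of the original script. It assumes that each line of pp_dpm_sclk and
pp_dpm_mclk has the form "N: <clock>Mhz" and that the active level is marked
with a trailing "*". The function names active_level and show_dpm_state are
made up for illustration.

```shell
#!/bin/sh
# Sketch: report the forced DPM mode and the currently active clock levels.
# Assumption: the active level in pp_dpm_sclk / pp_dpm_mclk ends with "*".

# active_level FILE -> prints the index of the line marked with "*"
active_level() {
    sed -n 's/^\([0-9][0-9]*\):.*\*[[:space:]]*$/\1/p' "$1"
}

# show_dpm_state [DEVICE_DIR] -> prints mode and active sclk/mclk indices
show_dpm_state() {
    dev="${1:-/sys/class/drm/card0/device}"
    echo "mode=$(cat "$dev/power_dpm_force_performance_level")"
    echo "sclk=$(active_level "$dev/pp_dpm_sclk")"
    echo "mclk=$(active_level "$dev/pp_dpm_mclk")"
}
```

Called without an argument, show_dpm_state reads /sys/class/drm/card0/device.
If "mode" is no longer "manual", the script above has to be run again.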

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (36 preceding siblings ...)
  2018-08-08 23:07 ` bugzilla-daemon
@ 2018-08-09 20:56 ` bugzilla-daemon
  2018-08-14 21:27 ` bugzilla-daemon
                   ` (52 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-08-09 20:56 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 254 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #38 from dwagner <jb5sgc1n.nya@20mm.eu> ---
*** Bug 107152 has been marked as a duplicate of this bug. ***

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (37 preceding siblings ...)
  2018-08-09 20:56 ` bugzilla-daemon
@ 2018-08-14 21:27 ` bugzilla-daemon
  2018-08-15 14:24 ` bugzilla-daemon
                   ` (51 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-08-14 21:27 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1771 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #39 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
(In reply to dwagner from comment #37)
> In the related bug report
> (https://bugs.freedesktop.org/show_bug.cgi?id=107152) I noticed that this
> bug can be triggered very reliably and quickly by playing a video with a
> deliberately lowered frame rate:
>  "mpv --no-correct-pts --fps=3 --ao=null some_arbitrary_video.webm"

> 
> This led me to assume this bug might be caused by the dynamic power
> management, that often ramps performance up/down when a video is played at
> such a low frame rate.

I tried exactly the same: I used the same card model and the latest kernel, and
ran a webm clip with mpv the same way you did, and it didn't happen.

> 
> And indeed, I found this confirmed by many experiments: If I use a script
> like
> > #!/bin/bash
> > cd /sys/class/drm/card0/device
> > echo manual >power_dpm_force_performance_level
> > # low
> > echo 0 >pp_dpm_mclk 
> > echo 0 >pp_dpm_sclk
> > # medium
> > #echo 1 >pp_dpm_mclk 
> > #echo 1 >pp_dpm_sclk
> > # high
> > #echo 1 >pp_dpm_mclk 
> > #echo 6 >pp_dpm_sclk
> to enforce just any performance level, then the crashes do not occur anymore
> - also with the "low frame rate video test".
> 
> So it seems that the transition from one "dpm" performance level to another,
> with a certain probability, causes these crashes. And the more often the
> transitions occur, the sooner one will experience them.
> 
> (BTW: For unknown reason, invoking "xrandr" or enabling a monitor after
> sleep causes the above settings to get lost, so one has to invoke above
> script again.)

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (38 preceding siblings ...)
  2018-08-14 21:27 ` bugzilla-daemon
@ 2018-08-15 14:24 ` bugzilla-daemon
  2018-08-15 22:03 ` bugzilla-daemon
                   ` (50 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-08-15 14:24 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1298 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #40 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
Created attachment 141112
  --> https://bugs.freedesktop.org/attachment.cgi?id=141112&action=edit
.config

I uploaded my .config file - maybe something in your Kconfig flags makes this
happen - you can try and rebuild latest kernel from Alex's repository using my
.config and see if you don't experience this anymore. 
https://cgit.freedesktop.org/~agd5f/linux/log/?h=amd-staging-drm-next

Other than that, since your system hard-hangs and you can't do any postmortem
dumps, you can at least provide output from event tracing through trace_pipe to
catch live logs on the fly. Maybe we can infer something from there...

So again -
Boot the system and, before starting to reproduce, run the following trace
command -

sudo trace-cmd start -e dma_fence -e gpu_scheduler -e amdgpu -v -e \
"amdgpu:amdgpu_mm_rreg" -e "amdgpu:amdgpu_mm_wreg" -e "amdgpu:amdgpu_iv"

then cd /sys/kernel/debug/tracing && cat trace_pipe

When the problem happens, just copy all the output from the terminal to a log
file. Make sure your terminal app has the largest possible scrollback buffer to
catch ALL the output.
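
Copying from the terminal by hand can be avoided by writing the stream to a
file as it arrives. The following is a minimal sketch, assuming it runs as root
on the affected machine after the trace-cmd command above. The function name
capture_trace and the log path in the usage note are illustrative only. Because
the machine hard-hangs, the log is more likely to survive if it lives on a
network filesystem, or if the command is run through ssh from a second machine
and the output is saved there.

```shell
#!/bin/sh
# Sketch: show trace_pipe live and append every line to a log file.

# capture_trace LOGFILE [TRACE_PIPE]
capture_trace() {
    log="$1"
    pipe="${2:-/sys/kernel/debug/tracing/trace_pipe}"
    # tee keeps the live view on the terminal and appends to the log
    tee -a "$log" < "$pipe"
}
```

A possible invocation would be "capture_trace /mnt/nfs/trace.log", where the
mount point is a placeholder for any location outside the crashing machine.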

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (39 preceding siblings ...)
  2018-08-15 14:24 ` bugzilla-daemon
@ 2018-08-15 22:03 ` bugzilla-daemon
  2018-08-16 21:53 ` bugzilla-daemon
                   ` (49 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-08-15 22:03 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1573 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #41 from dwagner <jb5sgc1n.nya@20mm.eu> ---
(In reply to Andrey Grodzovsky from comment #40)
> Created attachment 141112 [details]
> .config
> 
> I uploaded my .config file - maybe something in your Kconfig flags makes
> this happen - you can try and rebuild latest kernel from Alex's repository
> using my .config and see if you don't experience this anymore. 
> https://cgit.freedesktop.org/~agd5f/linux/log/?h=amd-staging-drm-next

Did just that - but the video test still crashes after at most a few minutes,
and does not crash with DPM turned off. So we can rule out our .config
differences (of which there are many).

> Other than that, since your system hard-hangs and you can't do any postmortem
> dumps, you can at least provide output from event tracing through trace_pipe
> to catch live logs on the fly. Maybe we can infer something from there...
> 
> So again - 
> Boot the system and, before starting to reproduce, run the following trace
> command -
> 
> sudo trace-cmd start -e dma_fence -e gpu_scheduler -e amdgpu -v -e
> "amdgpu:amdgpu_mm_rreg" -e "amdgpu:amdgpu_mm_wreg" -e "amdgpu:amdgpu_iv"
> 
> then cd /sys/kernel/debug/tracing && cat trace_pipe
> 
> When the problem happens just copy all the output from the terminal to a log
> file. Make sure your terminal app has largest possible buffer to catch ALL
> the output.

Will try that on next opportunity, probably tomorrow evening.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (40 preceding siblings ...)
  2018-08-15 22:03 ` bugzilla-daemon
@ 2018-08-16 21:53 ` bugzilla-daemon
  2018-08-16 21:55 ` bugzilla-daemon
                   ` (48 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-08-16 21:53 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 949 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #42 from dwagner <jb5sgc1n.nya@20mm.eu> ---
OK, I did the proposed debugging session with trace-cmd, with output sent to a
different PC over ssh, using today's amd-staging-drm-next. (BTW, Arch updated
the Xorg server earlier today.)

This time it took about 4 minutes until the 3 fps video playback crashed. The
symptom was the same (a one-colored blank screen and a subsequent system
crash), but this time the kernel and the ssh session survived the crash for
some seconds. That was enough for me to also issue the earlier suggested
"umr -O verbose -R gfx[.]" command after the amdgpu crash, so I can upload its
output, too. This was the last command executed, though: the system crashed
completely while running it, so its output may be partial.

Find attached dmesg, trace, and umr output.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (41 preceding siblings ...)
  2018-08-16 21:53 ` bugzilla-daemon
@ 2018-08-16 21:55 ` bugzilla-daemon
  2018-08-16 21:56 ` bugzilla-daemon
                   ` (47 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-08-16 21:55 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 352 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #43 from dwagner <jb5sgc1n.nya@20mm.eu> ---
Created attachment 141155
  --> https://bugs.freedesktop.org/attachment.cgi?id=141155&action=edit
trace-cmd induced output during 3-fps-video replay and crash

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (42 preceding siblings ...)
  2018-08-16 21:55 ` bugzilla-daemon
@ 2018-08-16 21:56 ` bugzilla-daemon
  2018-08-16 21:57 ` bugzilla-daemon
                   ` (46 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-08-16 21:56 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 343 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #44 from dwagner <jb5sgc1n.nya@20mm.eu> ---
Created attachment 141156
  --> https://bugs.freedesktop.org/attachment.cgi?id=141156&action=edit
dmesg from boot to after the 3-fps-video test crash

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (43 preceding siblings ...)
  2018-08-16 21:56 ` bugzilla-daemon
@ 2018-08-16 21:57 ` bugzilla-daemon
  2018-08-16 22:31 ` bugzilla-daemon
                   ` (45 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-08-16 21:57 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 342 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #45 from dwagner <jb5sgc1n.nya@20mm.eu> ---
Created attachment 141157
  --> https://bugs.freedesktop.org/attachment.cgi?id=141157&action=edit
output of umr command after 3-fps-video test crash

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (44 preceding siblings ...)
  2018-08-16 21:57 ` bugzilla-daemon
@ 2018-08-16 22:31 ` bugzilla-daemon
  2018-08-17 21:25 ` bugzilla-daemon
                   ` (44 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-08-16 22:31 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 214 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #46 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
Thanks.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (45 preceding siblings ...)
  2018-08-16 22:31 ` bugzilla-daemon
@ 2018-08-17 21:25 ` bugzilla-daemon
  2018-08-18 21:36 ` bugzilla-daemon
                   ` (43 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-08-17 21:25 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 911 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #47 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
Created attachment 141174
  --> https://bugs.freedesktop.org/attachment.cgi?id=141174&action=edit
add_debug_info.patch

I am attaching a basic debug patch; please try to apply it. It should give a
bit more info in dmesg when a VM fault happens. I wasn't able to test it on my
system, so it might be buggy or crash.

Reproduce again with trace-cmd like before, and once the fault happens, if
possible, quickly run

sudo umr -O halt_waves -wa

and only if you still have a running system after that, run
sudo umr -O verbose -R gfx[.]

The driver should be loaded with amdgpu.vm_fault_stop=2 set from GRUB.
Also check whether adding amdgpu.vm_debug=1 makes the issue reproduce more
quickly.
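
After rebooting, it can be worth confirming that the options actually reached
the kernel command line. The following is a small sketch for that check.
has_param is a hypothetical helper, and it only looks at /proc/cmdline, so
options set some other way (for example through modprobe configuration) would
not show up here.

```shell
#!/bin/sh
# Sketch: test whether an exact NAME=VALUE option is on the kernel command line.

# has_param NAME=VALUE [CMDLINE_FILE] -> exit status 0 if the option is present
has_param() {
    file="${2:-/proc/cmdline}"
    # one option per line, then match the whole line as a fixed string
    tr ' ' '\n' < "$file" | grep -qxF -- "$1"
}
```

For example, "has_param amdgpu.vm_fault_stop=2 && echo set" prints "set" only
when that exact option was passed at boot.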

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (46 preceding siblings ...)
  2018-08-17 21:25 ` bugzilla-daemon
@ 2018-08-18 21:36 ` bugzilla-daemon
  2018-08-18 21:37 ` bugzilla-daemon
                   ` (42 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-08-18 21:36 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1543 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #48 from dwagner <jb5sgc1n.nya@20mm.eu> ---
(In reply to Andrey Grodzovsky from comment #47)
> Created attachment 141174 [details] [review]
> add_debug_info.patch
> 
> I am attaching a basic debug patch; please try to apply it.

Done.

> It should give a
> bit more info in dmesg when a VM fault happens. 

Hmm - I could not see any additional output resulting from it.

> Reproduce again with trace-cmd like before, and once the fault happens, if
> possible, quickly run 
> 
> sudo umr -O halt_waves -wa
> 
> and only if you still have running system after that do the 
> sudo umr -O verbose -R gfx[.]
> 
> The driver should be loaded amdgpu.vm_fault_stop=2 from grub

Did that - will attach the script "gpu_debug3.sh" and its output - this time,
dmesg and trace output are in the same file, if you want to look only at the
dmesg part, "grep '^\[' gpu_debug_3.txt" will get it. 
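
To get both parts as separate files in one step, a sketch along these lines
could be used. It assumes, as the grep above does, that only dmesg lines start
with "[". split_log and the output file names are illustrative.

```shell
#!/bin/sh
# Sketch: split a combined log into dmesg lines and everything else.

# split_log COMBINED DMESG_OUT OTHER_OUT
split_log() {
    # lines starting with "[" are treated as dmesg output
    grep '^\[' "$1" > "$2" || true
    # everything else (trace and umr output) goes to the second file
    grep -v '^\[' "$1" > "$3" || true
}
```

A call such as "split_log gpu_debug_3.txt dmesg_part.txt trace_part.txt" would
then leave the original file untouched and write the two parts next to it.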

I reproduced the bug 4 times. On 2 occasions no error was emitted before
crashing; the 2 other times both umr commands could still run. Since the error
message looked the same, I'll attach the shorter file, where the crash occurred
more quickly.

> Also check if adding amdgpu.vm_debug=1 makes the issue reproduce more quickly

I used that setting, but it did not seem to make a difference for how quickly
the crash occurred - still "some seconds to some minutes".

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (47 preceding siblings ...)
  2018-08-18 21:36 ` bugzilla-daemon
@ 2018-08-18 21:37 ` bugzilla-daemon
  2018-08-18 21:38 ` bugzilla-daemon
                   ` (41 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-08-18 21:37 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 366 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #49 from dwagner <jb5sgc1n.nya@20mm.eu> ---
Created attachment 141189
  --> https://bugs.freedesktop.org/attachment.cgi?id=141189&action=edit
script used to generate the gpu_debug_3.txt (when executed via ssh -t ...)

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (48 preceding siblings ...)
  2018-08-18 21:37 ` bugzilla-daemon
@ 2018-08-18 21:38 ` bugzilla-daemon
  2018-08-18 21:40 ` bugzilla-daemon
                   ` (40 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-08-18 21:38 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 337 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #50 from dwagner <jb5sgc1n.nya@20mm.eu> ---
Created attachment 141190
  --> https://bugs.freedesktop.org/attachment.cgi?id=141190&action=edit
dmesg / trace / umr output from gpu_debug3.sh

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (49 preceding siblings ...)
  2018-08-18 21:38 ` bugzilla-daemon
@ 2018-08-18 21:40 ` bugzilla-daemon
  2018-08-18 21:43 ` bugzilla-daemon
                   ` (39 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-08-18 21:40 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 631 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

dwagner <jb5sgc1n.nya@20mm.eu> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #141190|0                           |1
        is obsolete|                            |

--- Comment #51 from dwagner <jb5sgc1n.nya@20mm.eu> ---
Created attachment 141191
  --> https://bugs.freedesktop.org/attachment.cgi?id=141191&action=edit
xz-compressed output of gpu_debug3.sh - dmesg, trace, umr

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 2288 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (50 preceding siblings ...)
  2018-08-18 21:40 ` bugzilla-daemon
@ 2018-08-18 21:43 ` bugzilla-daemon
  2018-08-20 14:16 ` bugzilla-daemon
                   ` (38 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-08-18 21:43 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 459 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #52 from dwagner <jb5sgc1n.nya@20mm.eu> ---
One other experiment I made: I wrote a script to quickly toggle pp_dpm_mclk and
pp_dpm_sclk while playing a 3 fps video with
power_dpm_force_performance_level=manual. This way, I could not reproduce the
crashes that happen with power_dpm_force_performance_level=auto.
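
The script itself was not attached. A hypothetical reconstruction of such a
toggle loop might look like the following. The level indices (0/0 for low, 1/6
for high) are taken from the script in comment #37 and are specific to that
card, while the function name, the round count, and the delay are assumptions.

```shell
#!/bin/sh
# Sketch: alternate between a low and a high DPM level in manual mode.
# Assumption: mclk/sclk indices 0/0 (low) and 1/6 (high), as in comment #37.

# toggle_dpm DEVICE_DIR ROUNDS [DELAY_SECONDS]
toggle_dpm() {
    dev="$1"
    rounds="$2"
    delay="${3:-0.1}"
    echo manual > "$dev/power_dpm_force_performance_level"
    i=0
    while [ "$i" -lt "$rounds" ]; do
        if [ $((i % 2)) -eq 0 ]; then
            # even rounds: low level
            echo 0 > "$dev/pp_dpm_mclk"
            echo 0 > "$dev/pp_dpm_sclk"
        else
            # odd rounds: high level
            echo 1 > "$dev/pp_dpm_mclk"
            echo 6 > "$dev/pp_dpm_sclk"
        fi
        i=$((i + 1))
        sleep "$delay"
    done
}
```

Run as root while the video plays, for example as
"toggle_dpm /sys/class/drm/card0/device 1000 0.05". A fractional delay relies
on a sleep implementation that accepts it, which is not guaranteed everywhere.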

^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (51 preceding siblings ...)
  2018-08-18 21:43 ` bugzilla-daemon
@ 2018-08-20 14:16 ` bugzilla-daemon
  2018-08-21  8:41 ` bugzilla-daemon
                   ` (37 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-08-20 14:16 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 599 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #53 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
Created attachment 141198
  --> https://bugs.freedesktop.org/attachment.cgi?id=141198&action=edit
add_debug_info2.patch

Try this patch instead; I might be missing some prints in the first one.
In the last log you attached I haven't seen any UMR dumps or GPU fault prints
in dmesg. The GPU fault has to be in the log to compare the faulty address
against the debug prints in the patch.


^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (52 preceding siblings ...)
  2018-08-20 14:16 ` bugzilla-daemon
@ 2018-08-21  8:41 ` bugzilla-daemon
  2018-08-21 14:43 ` bugzilla-daemon
                   ` (36 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-08-21  8:41 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 3900 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #54 from dwagner <jb5sgc1n.nya@20mm.eu> ---
(In reply to Andrey Grodzovsky from comment #53)
> Created attachment 141198
> add_debug_info2.patch
> 
> Try this patch instead; I might be missing some prints in the first one.

Can try that this evening.

> In the last log you attached I haven't seen any UMR dumps or GPU fault
> prints in dmesg. The GPU fault has to be in the log to compare the faulty
> address against the debug prints in the patch.

In the file attached above, "xz-compressed output of gpu_debug3.sh", there is
umr output from the time of the crash (238 seconds after the reboot):

----------------------------------------------
...
          mpv/vo-897   [005] ....   235.191542: dma_fence_wait_start:
driver=drm_sched timeline=gfx context=162 seqno=87
          mpv/vo-897   [005] d...   235.191548: dma_fence_enable_signal:
driver=drm_sched timeline=gfx context=162 seqno=87
     kworker/0:2-92    [000] ....   238.275988: dma_fence_signaled:
driver=amdgpu timeline=sdma1 context=11 seqno=210
     kworker/0:2-92    [000] ....   238.276004: dma_fence_signaled:
driver=amdgpu timeline=sdma1 context=11 seqno=211
[  238.180634] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout,
signaled seq=32624, emitted seq=32626
[  238.180641] amdgpu 0000:0a:00.0: GPU reset begin!
[  238.180641] amdgpu 0000:0a:00.0: GPU reset begin!

crash detected!

executing umr -O halt_waves -wa
No active waves!


executing umr -O verbose -R gfx[.]

polaris11.gfx.rptr == 1792
polaris11.gfx.wptr == 1792
polaris11.gfx.drv_wptr == 1792
polaris11.gfx.ring[1761] == 0xffff1000    ... 
polaris11.gfx.ring[1762] == 0xffff1000    ... 
polaris11.gfx.ring[1763] == 0xffff1000    ... 
polaris11.gfx.ring[1764] == 0xffff1000    ... 
polaris11.gfx.ring[1765] == 0xffff1000    ... 
polaris11.gfx.ring[1766] == 0xffff1000    ... 
polaris11.gfx.ring[1767] == 0xffff1000    ... 
polaris11.gfx.ring[1768] == 0xffff1000    ... 
polaris11.gfx.ring[1769] == 0xffff1000    ... 
polaris11.gfx.ring[1770] == 0xffff1000    ... 
polaris11.gfx.ring[1771] == 0xffff1000    ... 
polaris11.gfx.ring[1772] == 0xffff1000    ... 
polaris11.gfx.ring[1773] == 0xffff1000    ... 
polaris11.gfx.ring[1774] == 0xffff1000    ... 
polaris11.gfx.ring[1775] == 0xffff1000    ... 
polaris11.gfx.ring[1776] == 0xffff1000    ... 
polaris11.gfx.ring[1777] == 0xffff1000    ... 
polaris11.gfx.ring[1778] == 0xffff1000    ... 
polaris11.gfx.ring[1779] == 0xffff1000    ... 
polaris11.gfx.ring[1780] == 0xffff1000    ... 
polaris11.gfx.ring[1781] == 0xffff1000    ... 
polaris11.gfx.ring[1782] == 0xffff1000    ... 
polaris11.gfx.ring[1783] == 0xffff1000    ... 
polaris11.gfx.ring[1784] == 0xffff1000    ... 
polaris11.gfx.ring[1785] == 0xffff1000    ... 
polaris11.gfx.ring[1786] == 0xffff1000    ... 
polaris11.gfx.ring[1787] == 0xffff1000    ... 
polaris11.gfx.ring[1788] == 0xffff1000    ... 
polaris11.gfx.ring[1789] == 0xffff1000    ... 
polaris11.gfx.ring[1790] == 0xffff1000    ... 
polaris11.gfx.ring[1791] == 0xffff1000    ... 
polaris11.gfx.ring[1792] == 0xc0032200    rwD 

trying to get ADR from dmesg output for 'umr -O verbose -vm ...'
trying to get VMID from dmesg output for 'umr -O verbose -vm ...'

done after crash, flashing NUMLOCK LED.
     amdgpu_cs:0-799   [001] ....   286.852838: amdgpu_bo_list_set:
list=0000000099c16b5c, bo=000000001771c26f, bo_size=131072
     amdgpu_cs:0-799   [001] ....   286.852846: amdgpu_bo_list_set:
list=0000000099c16b5c, bo=0000000046bfd439, bo_size=131072
...
----------------------------------------------

But sure, there were no "VM_CONTEXT1_PROTECTION_FAULT_ADDR" error messages this
time. Sometimes such are emitted, sometimes not.


^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (53 preceding siblings ...)
  2018-08-21  8:41 ` bugzilla-daemon
@ 2018-08-21 14:43 ` bugzilla-daemon
  2018-08-21 21:16 ` bugzilla-daemon
                   ` (35 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-08-21 14:43 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 4321 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #55 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
(In reply to dwagner from comment #54)
> (In reply to Andrey Grodzovsky from comment #53)
> > Created attachment 141198
> > add_debug_info2.patch
> > 
> > Try this patch instead; I might be missing some prints in the first one.
> 
> Can try that this evening.
> 
> > In the last log you attached I haven't seen any UMR dumps or GPU fault
> > prints in dmesg. The GPU fault has to be in the log to compare the faulty
> > address against the debug prints in the patch.
> 
> In above attached file "xz-compressed output of gpu_debug3.sh" there is umr
> output at the time of the crash (238 seconds after the reboot):
> 
> ----------------------------------------------
> ...
>           mpv/vo-897   [005] ....   235.191542: dma_fence_wait_start:
> driver=drm_sched timeline=gfx context=162 seqno=87
>           mpv/vo-897   [005] d...   235.191548: dma_fence_enable_signal:
> driver=drm_sched timeline=gfx context=162 seqno=87
>      kworker/0:2-92    [000] ....   238.275988: dma_fence_signaled:
> driver=amdgpu timeline=sdma1 context=11 seqno=210
>      kworker/0:2-92    [000] ....   238.276004: dma_fence_signaled:
> driver=amdgpu timeline=sdma1 context=11 seqno=211
> [  238.180634] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0
> timeout, signaled seq=32624, emitted seq=32626
> [  238.180641] amdgpu 0000:0a:00.0: GPU reset begin!
> [  238.180641] amdgpu 0000:0a:00.0: GPU reset begin!
> 
> crash detected!
> 
> executing umr -O halt_waves -wa
> No active waves!

Did you use the amdgpu.vm_fault_stop=2 parameter? If a fault happened, that
should have frozen the GPU's compute units, and hence the above command would
produce a lot of wave info.

> 
> 
> executing umr -O verbose -R gfx[.]
> 
> polaris11.gfx.rptr == 1792
> polaris11.gfx.wptr == 1792
> polaris11.gfx.drv_wptr == 1792
> polaris11.gfx.ring[1761] == 0xffff1000    ... 
> polaris11.gfx.ring[1762] == 0xffff1000    ... 
> polaris11.gfx.ring[1763] == 0xffff1000    ... 
> polaris11.gfx.ring[1764] == 0xffff1000    ... 
> polaris11.gfx.ring[1765] == 0xffff1000    ... 
> polaris11.gfx.ring[1766] == 0xffff1000    ... 
> polaris11.gfx.ring[1767] == 0xffff1000    ... 
> polaris11.gfx.ring[1768] == 0xffff1000    ... 
> polaris11.gfx.ring[1769] == 0xffff1000    ... 
> polaris11.gfx.ring[1770] == 0xffff1000    ... 
> polaris11.gfx.ring[1771] == 0xffff1000    ... 
> polaris11.gfx.ring[1772] == 0xffff1000    ... 
> polaris11.gfx.ring[1773] == 0xffff1000    ... 
> polaris11.gfx.ring[1774] == 0xffff1000    ... 
> polaris11.gfx.ring[1775] == 0xffff1000    ... 
> polaris11.gfx.ring[1776] == 0xffff1000    ... 
> polaris11.gfx.ring[1777] == 0xffff1000    ... 
> polaris11.gfx.ring[1778] == 0xffff1000    ... 
> polaris11.gfx.ring[1779] == 0xffff1000    ... 
> polaris11.gfx.ring[1780] == 0xffff1000    ... 
> polaris11.gfx.ring[1781] == 0xffff1000    ... 
> polaris11.gfx.ring[1782] == 0xffff1000    ... 
> polaris11.gfx.ring[1783] == 0xffff1000    ... 
> polaris11.gfx.ring[1784] == 0xffff1000    ... 
> polaris11.gfx.ring[1785] == 0xffff1000    ... 
> polaris11.gfx.ring[1786] == 0xffff1000    ... 
> polaris11.gfx.ring[1787] == 0xffff1000    ... 
> polaris11.gfx.ring[1788] == 0xffff1000    ... 
> polaris11.gfx.ring[1789] == 0xffff1000    ... 
> polaris11.gfx.ring[1790] == 0xffff1000    ... 
> polaris11.gfx.ring[1791] == 0xffff1000    ... 
> polaris11.gfx.ring[1792] == 0xc0032200    rwD 
> 
> trying to get ADR from dmesg output for 'umr -O verbose -vm ...'
> trying to get VMID from dmesg output for 'umr -O verbose -vm ...'
> 
> done after crash, flashing NUMLOCK LED.
>      amdgpu_cs:0-799   [001] ....   286.852838: amdgpu_bo_list_set:
> list=0000000099c16b5c, bo=000000001771c26f, bo_size=131072
>      amdgpu_cs:0-799   [001] ....   286.852846: amdgpu_bo_list_set:
> list=0000000099c16b5c, bo=0000000046bfd439, bo_size=131072
> ...
> ----------------------------------------------
> 
> But sure, there were no "VM_CONTEXT1_PROTECTION_FAULT_ADDR" error messages
> this time. Sometimes such are emitted, sometimes not.


^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (54 preceding siblings ...)
  2018-08-21 14:43 ` bugzilla-daemon
@ 2018-08-21 21:16 ` bugzilla-daemon
  2018-08-21 21:29 ` bugzilla-daemon
                   ` (34 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-08-21 21:16 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 2168 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #56 from dwagner <jb5sgc1n.nya@20mm.eu> ---
(In reply to Andrey Grodzovsky from comment #55)
> > In above attached file "xz-compressed output of gpu_debug3.sh" there is umr
> > output at the time of the crash (238 seconds after the reboot):
> > 
> > ----------------------------------------------
> > ...
> >           mpv/vo-897   [005] ....   235.191542: dma_fence_wait_start:
> > driver=drm_sched timeline=gfx context=162 seqno=87
> >           mpv/vo-897   [005] d...   235.191548: dma_fence_enable_signal:
> > driver=drm_sched timeline=gfx context=162 seqno=87
> >      kworker/0:2-92    [000] ....   238.275988: dma_fence_signaled:
> > driver=amdgpu timeline=sdma1 context=11 seqno=210
> >      kworker/0:2-92    [000] ....   238.276004: dma_fence_signaled:
> > driver=amdgpu timeline=sdma1 context=11 seqno=211
> > [  238.180634] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0
> > timeout, signaled seq=32624, emitted seq=32626
> > [  238.180641] amdgpu 0000:0a:00.0: GPU reset begin!
> > [  238.180641] amdgpu 0000:0a:00.0: GPU reset begin!
> > 
> > crash detected!
> > 
> > executing umr -O halt_waves -wa
> > No active waves!
> 
> Did you use the amdgpu.vm_fault_stop=2 parameter? If a fault happened, that
> should have frozen the GPU's compute units, and hence the above command would
> produce a lot of wave info.

Yes I did, as can be seen from the kernel command line at the very beginning of
the file I attached:
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-linux_amd
root=UUID=b5d56e15-18f3-4783-af84-bbff3bbff3ef rw
cryptdevice=/dev/nvme0n1p2:root:allow-discards libata.force=1.5 video=DP-1:d
video=DVI-D-1:d video=HDMI-A-1:1024x768 amdgpu.dc=1 amdgpu.vm_update_mode=0
amdgpu.dpm=-1 amdgpu.ppfeaturemask=0xffffffff amdgpu.vm_fault_stop=2
amdgpu.vm_debug=1

Could the "amdgpu 0000:0a:00.0: GPU reset begin!" message indicate a procedure
that discards whatever has been in those "waves" before? If yes, could
amdgpu.gpu_recovery=0 prevent that from happening?
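
To confirm which amdgpu options a kernel was actually booted with (for example
whether amdgpu.gpu_recovery=0 is in effect after a reboot), the command line can
be filtered as sketched below. The function name amdgpu_params is hypothetical;
it reads /proc/cmdline unless another file is given.

```shell
# Hypothetical helper: list the amdgpu.* options found in a kernel command
# line, one per line. Reads /proc/cmdline by default; a different file can
# be passed as the first argument.
amdgpu_params() {
    tr ' ' '\n' < "${1:-/proc/cmdline}" | grep '^amdgpu\.'
}
```

Calling amdgpu_params without arguments prints only the amdgpu.* entries of the
running kernel's command line and returns non-zero if there are none.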


^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (55 preceding siblings ...)
  2018-08-21 21:16 ` bugzilla-daemon
@ 2018-08-21 21:29 ` bugzilla-daemon
  2018-08-22  0:24 ` bugzilla-daemon
                   ` (33 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-08-21 21:29 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 2339 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #57 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
(In reply to dwagner from comment #56)
> (In reply to Andrey Grodzovsky from comment #55)
> > > In above attached file "xz-compressed output of gpu_debug3.sh" there is umr
> > > output at the time of the crash (238 seconds after the reboot):
> > > 
> > > ----------------------------------------------
> > > ...
> > >           mpv/vo-897   [005] ....   235.191542: dma_fence_wait_start:
> > > driver=drm_sched timeline=gfx context=162 seqno=87
> > >           mpv/vo-897   [005] d...   235.191548: dma_fence_enable_signal:
> > > driver=drm_sched timeline=gfx context=162 seqno=87
> > >      kworker/0:2-92    [000] ....   238.275988: dma_fence_signaled:
> > > driver=amdgpu timeline=sdma1 context=11 seqno=210
> > >      kworker/0:2-92    [000] ....   238.276004: dma_fence_signaled:
> > > driver=amdgpu timeline=sdma1 context=11 seqno=211
> > > [  238.180634] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0
> > > timeout, signaled seq=32624, emitted seq=32626
> > > [  238.180641] amdgpu 0000:0a:00.0: GPU reset begin!
> > > [  238.180641] amdgpu 0000:0a:00.0: GPU reset begin!
> > > 
> > > crash detected!
> > > 
> > > executing umr -O halt_waves -wa
> > > No active waves!
> > 
> > Did you use the amdgpu.vm_fault_stop=2 parameter? If a fault happened, that
> > should have frozen the GPU's compute units, and hence the above command would
> > produce a lot of wave info.
> 
> Yes I did, as can be seen from the kernel command line at the very beginning
> of the file I attached:
> [    0.000000] Command line: BOOT_IMAGE=/vmlinuz-linux_amd
> root=UUID=b5d56e15-18f3-4783-af84-bbff3bbff3ef rw
> cryptdevice=/dev/nvme0n1p2:root:allow-discards libata.force=1.5 video=DP-1:d
> video=DVI-D-1:d video=HDMI-A-1:1024x768 amdgpu.dc=1 amdgpu.vm_update_mode=0
> amdgpu.dpm=-1 amdgpu.ppfeaturemask=0xffffffff amdgpu.vm_fault_stop=2
> amdgpu.vm_debug=1
> 
> Could the "amdgpu 0000:0a:00.0: GPU reset begin!" message indicate a
> procedure that discards whatever has been in those "waves" before? If yes,
> could amdgpu.gpu_recovery=0 prevent that from happening?

Yes, I missed that one. Please run with no resets.


^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (56 preceding siblings ...)
  2018-08-21 21:29 ` bugzilla-daemon
@ 2018-08-22  0:24 ` bugzilla-daemon
  2018-08-22  0:26 ` bugzilla-daemon
                   ` (32 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-08-22  0:24 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 3653 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #58 from dwagner <jb5sgc1n.nya@20mm.eu> ---
Here comes another trace log, with your info2.patch applied.

Something must have changed since the last test, as it took pretty long this
time to reproduce the crash. Could that have been caused by
https://cgit.freedesktop.org/~agd5f/linux/commit/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c?h=amd-staging-drm-next&id=b385925f3922faca7435e50e31380bb2602fd6b8
now being part of the kernel?

However, the latest trace, attached below, is not much different from the last
one; xzcat /tmp/gpu_debug5.txt.xz  | grep '^\[' will tell you:

[ 1510.023112] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout,
signaled seq=475104, emitted seq=475106
[ 1510.023117] [drm] GPU recovery disabled.

     amdgpu_cs:0-806   [012] ....  1787.493126: amdgpu_vm_bo_cs:
soffs=00001001a0, eoffs=00001001b9, flags=70
     amdgpu_cs:0-806   [012] ....  1787.493127: amdgpu_vm_bo_cs:
soffs=0000100200, eoffs=00001021e0, flags=70
     amdgpu_cs:0-806   [012] ....  1787.493127: amdgpu_vm_bo_cs:
soffs=0000102200, eoffs=00001041e0, flags=70
     amdgpu_cs:0-806   [012] ....  1787.493129: amdgpu_vm_bo_cs:
soffs=000010c1e0, eoffs=000010c2e1, flags=70
     amdgpu_cs:0-806   [012] ....  1787.493131: drm_sched_job:
entity=00000000406345a7, id=10239, fence=000000007a120377, ring=gfx, job
count:8, hw job count:0

And later in the file you can find:
------------------------------------------------------
crash detected!

executing umr -O halt_waves -wa
No active waves!

executing umr -O verbose -R gfx[.]

polaris11.gfx.rptr == 512
polaris11.gfx.wptr == 512
polaris11.gfx.drv_wptr == 512
polaris11.gfx.ring[ 481] == 0xffff1000    ... 
polaris11.gfx.ring[ 482] == 0xffff1000    ... 
polaris11.gfx.ring[ 483] == 0xffff1000    ... 
polaris11.gfx.ring[ 484] == 0xffff1000    ... 
polaris11.gfx.ring[ 485] == 0xffff1000    ... 
polaris11.gfx.ring[ 486] == 0xffff1000    ... 
polaris11.gfx.ring[ 487] == 0xffff1000    ... 
polaris11.gfx.ring[ 488] == 0xffff1000    ... 
polaris11.gfx.ring[ 489] == 0xffff1000    ... 
polaris11.gfx.ring[ 490] == 0xffff1000    ... 
polaris11.gfx.ring[ 491] == 0xffff1000    ... 
polaris11.gfx.ring[ 492] == 0xffff1000    ... 
polaris11.gfx.ring[ 493] == 0xffff1000    ... 
polaris11.gfx.ring[ 494] == 0xffff1000    ... 
polaris11.gfx.ring[ 495] == 0xffff1000    ... 
polaris11.gfx.ring[ 496] == 0xffff1000    ... 
polaris11.gfx.ring[ 497] == 0xffff1000    ... 
polaris11.gfx.ring[ 498] == 0xffff1000    ... 
polaris11.gfx.ring[ 499] == 0xffff1000    ... 
polaris11.gfx.ring[ 500] == 0xffff1000    ... 
polaris11.gfx.ring[ 501] == 0xffff1000    ... 
polaris11.gfx.ring[ 502] == 0xffff1000    ... 
polaris11.gfx.ring[ 503] == 0xffff1000    ... 
polaris11.gfx.ring[ 504] == 0xffff1000    ... 
polaris11.gfx.ring[ 505] == 0xffff1000    ... 
polaris11.gfx.ring[ 506] == 0xffff1000    ... 
polaris11.gfx.ring[ 507] == 0xffff1000    ... 
polaris11.gfx.ring[ 508] == 0xffff1000    ... 
polaris11.gfx.ring[ 509] == 0xffff1000    ... 
polaris11.gfx.ring[ 510] == 0xffff1000    ... 
polaris11.gfx.ring[ 511] == 0xffff1000    ... 
polaris11.gfx.ring[ 512] == 0xc0032200    rwD 


trying to get ADR from dmesg output for 'umr -O verbose -vm ...'
trying to get VMID from dmesg output for 'umr -O verbose -vm ...'

done after crash.
-------------------------------------------

So even without GPU reset, still no "waves". And the error message also does
not state any VM fault address.


^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (57 preceding siblings ...)
  2018-08-22  0:24 ` bugzilla-daemon
@ 2018-08-22  0:26 ` bugzilla-daemon
  2018-08-22 14:33 ` bugzilla-daemon
                   ` (31 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-08-22  0:26 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 336 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #59 from dwagner <jb5sgc1n.nya@20mm.eu> ---
Created attachment 141228
  --> https://bugs.freedesktop.org/attachment.cgi?id=141228&action=edit
latest crash trace output, without gpu_reset


^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (58 preceding siblings ...)
  2018-08-22  0:26 ` bugzilla-daemon
@ 2018-08-22 14:33 ` bugzilla-daemon
  2018-08-22 22:18 ` bugzilla-daemon
                   ` (30 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-08-22 14:33 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 4220 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #60 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
(In reply to dwagner from comment #58)
> Here comes another trace log, with your info2.patch applied.
> 
> Something must have changed since the last test, as it took pretty long this
> time to reproduce the crash. Could that have been caused by
> https://cgit.freedesktop.org/~agd5f/linux/commit/drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c?h=amd-staging-drm-next&id=b385925f3922faca7435e50e31380bb2602fd6b8
> now being part of the kernel?

Don't think it's related. This code is more related to virtualization.

> 
> However, the latest trace you find attached below is not much different to
> the last one, xzcat /tmp/gpu_debug5.txt.xz  | grep '^\[' will tell you:
> 
> [ 1510.023112] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0
> timeout, signaled seq=475104, emitted seq=475106
> [ 1510.023117] [drm] GPU recovery disabled.

That just means you are again running with the GPU VM update mode set to use
SDMA, as seen in your dmesg (amdgpu.vm_update_mode=0), so you are again
experiencing the original SDMA hang issue. Please use
amdgpu.vm_update_mode=3 to get back to the VM_FAULT issue.
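
As a quick reference for the values discussed here, the sketch below maps a
vm_update_mode setting to a description. The meanings of 0 and 3 follow from
this comment; those of -1, 1 and 2 are taken from the amdgpu module parameter
description as recalled and should be treated as assumptions. The function name
describe_vm_update_mode is hypothetical.

```shell
# Hypothetical helper: describe an amdgpu.vm_update_mode value.
# 0 and 3 are as discussed in this thread; -1, 1 and 2 are assumptions based
# on the module parameter description (bit 0 = graphics, bit 1 = compute).
describe_vm_update_mode() {
    case "$1" in
        -1) echo "driver default" ;;
        0)  echo "SDMA for all page table updates" ;;
        1)  echo "CPU for graphics, SDMA for compute" ;;
        2)  echo "CPU for compute, SDMA for graphics" ;;
        3)  echo "CPU for all page table updates" ;;
        *)  echo "unknown value"; return 1 ;;
    esac
}
```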

> 
>      amdgpu_cs:0-806   [012] ....  1787.493126: amdgpu_vm_bo_cs:
> soffs=00001001a0, eoffs=00001001b9, flags=70
>      amdgpu_cs:0-806   [012] ....  1787.493127: amdgpu_vm_bo_cs:
> soffs=0000100200, eoffs=00001021e0, flags=70
>      amdgpu_cs:0-806   [012] ....  1787.493127: amdgpu_vm_bo_cs:
> soffs=0000102200, eoffs=00001041e0, flags=70
>      amdgpu_cs:0-806   [012] ....  1787.493129: amdgpu_vm_bo_cs:
> soffs=000010c1e0, eoffs=000010c2e1, flags=70
>      amdgpu_cs:0-806   [012] ....  1787.493131: drm_sched_job:
> entity=00000000406345a7, id=10239, fence=000000007a120377, ring=gfx, job
> count:8, hw job count:0
> 
> And later in the file you can find:
> ------------------------------------------------------
> crash detected!
> 
> executing umr -O halt_waves -wa
> No active waves!
> 
> executing umr -O verbose -R gfx[.]
> 
> polaris11.gfx.rptr == 512
> polaris11.gfx.wptr == 512
> polaris11.gfx.drv_wptr == 512
> polaris11.gfx.ring[ 481] == 0xffff1000    ... 
> polaris11.gfx.ring[ 482] == 0xffff1000    ... 
> polaris11.gfx.ring[ 483] == 0xffff1000    ... 
> polaris11.gfx.ring[ 484] == 0xffff1000    ... 
> polaris11.gfx.ring[ 485] == 0xffff1000    ... 
> polaris11.gfx.ring[ 486] == 0xffff1000    ... 
> polaris11.gfx.ring[ 487] == 0xffff1000    ... 
> polaris11.gfx.ring[ 488] == 0xffff1000    ... 
> polaris11.gfx.ring[ 489] == 0xffff1000    ... 
> polaris11.gfx.ring[ 490] == 0xffff1000    ... 
> polaris11.gfx.ring[ 491] == 0xffff1000    ... 
> polaris11.gfx.ring[ 492] == 0xffff1000    ... 
> polaris11.gfx.ring[ 493] == 0xffff1000    ... 
> polaris11.gfx.ring[ 494] == 0xffff1000    ... 
> polaris11.gfx.ring[ 495] == 0xffff1000    ... 
> polaris11.gfx.ring[ 496] == 0xffff1000    ... 
> polaris11.gfx.ring[ 497] == 0xffff1000    ... 
> polaris11.gfx.ring[ 498] == 0xffff1000    ... 
> polaris11.gfx.ring[ 499] == 0xffff1000    ... 
> polaris11.gfx.ring[ 500] == 0xffff1000    ... 
> polaris11.gfx.ring[ 501] == 0xffff1000    ... 
> polaris11.gfx.ring[ 502] == 0xffff1000    ... 
> polaris11.gfx.ring[ 503] == 0xffff1000    ... 
> polaris11.gfx.ring[ 504] == 0xffff1000    ... 
> polaris11.gfx.ring[ 505] == 0xffff1000    ... 
> polaris11.gfx.ring[ 506] == 0xffff1000    ... 
> polaris11.gfx.ring[ 507] == 0xffff1000    ... 
> polaris11.gfx.ring[ 508] == 0xffff1000    ... 
> polaris11.gfx.ring[ 509] == 0xffff1000    ... 
> polaris11.gfx.ring[ 510] == 0xffff1000    ... 
> polaris11.gfx.ring[ 511] == 0xffff1000    ... 
> polaris11.gfx.ring[ 512] == 0xc0032200    rwD 
> 
> 
> trying to get ADR from dmesg output for 'umr -O verbose -vm ...'
> trying to get VMID from dmesg output for 'umr -O verbose -vm ...'
> 
> done after crash.
> -------------------------------------------
> 
> So even without GPU reset, still no "waves". And the error message also does
> not state any VM fault address.


^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (59 preceding siblings ...)
  2018-08-22 14:33 ` bugzilla-daemon
@ 2018-08-22 22:18 ` bugzilla-daemon
  2018-08-22 22:18 ` bugzilla-daemon
                   ` (29 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-08-22 22:18 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1319 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #61 from dwagner <jb5sgc1n.nya@20mm.eu> ---
> Please use amdgpu.vm_update_mode=3 to get back to VM_FAULTs issue.

The "good" news is that reproduction of the crashes with 3-fps-video-replay is
very quick when using amdgpu.vm_update_mode=3.

But the bad news is that I have not been able to get useful error output when
using vm_update_mode=3.

At first I also tried with amdgpu.vm_debug=1, and with that, in 10 crashes not
a single line of error output was emitted to either the ssh channel or the
system journal.

I then tried with amdgpu.vm_debug=0, and while a few lines of error output were
logged then, none of them were really useful - see also the attached example:

[  912.447139] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
signaled seq=12818, emitted seq=12819
[  912.447145] [drm] GPU recovery disabled.

These are the only lines indicating the error; not even the
 echo "crash detected!"
after the
 dmesg -w | tee /dev/tty | grep -m 1 -e "amdgpu.*GPU" -e "amdgpu.*ERROR"
gets emitted, much less the umr commands that should follow.

What could I do to keep the kernel from dying so quickly when using
amdgpu.vm_update_mode=3?
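
For readers following along, the crash-watch step quoted above can be sketched
as a small function. This is not the actual gpu_debug script: the function name
wait_for_crash and its optional argument are assumptions, and only the grep
patterns and the "crash detected!" message are taken from this comment.

```shell
# Hypothetical sketch of the crash-watch step.
# Reads log lines on stdin, copies them to the file given as $1 (default
# /dev/tty, as in the pipeline above), and prints "crash detected!" as soon
# as the first amdgpu error line appears. Returns non-zero if the input ends
# without a match.
wait_for_crash() {
    if tee "${1:-/dev/tty}" |
        grep -m 1 -e 'amdgpu.*GPU' -e 'amdgpu.*ERROR' > /dev/null
    then
        echo "crash detected!"
    else
        return 1
    fi
}
```

It would be used as: dmesg -w | wait_for_crash && umr -O halt_waves -wa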


^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (60 preceding siblings ...)
  2018-08-22 22:18 ` bugzilla-daemon
@ 2018-08-22 22:18 ` bugzilla-daemon
  2018-09-19 23:35 ` bugzilla-daemon
                   ` (28 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-08-22 22:18 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 332 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #62 from dwagner <jb5sgc1n.nya@20mm.eu> ---
Created attachment 141243
  --> https://bugs.freedesktop.org/attachment.cgi?id=141243&action=edit
crash trace with amdgpu.vm_update_mode=3


^ permalink raw reply	[flat|nested] 92+ messages in thread

* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (61 preceding siblings ...)
  2018-08-22 22:18 ` bugzilla-daemon
@ 2018-09-19 23:35 ` bugzilla-daemon
  2018-09-19 23:35 ` bugzilla-daemon
                   ` (27 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-09-19 23:35 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 345 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #63 from Anthony Ruhier <a_ruhier@hotmail.com> ---
FYI, I also had this bug under linux 4.17 and 4.18, but it seems to have been
fixed in 4.19-rc3. The suspend/hibernate issue has also been fixed.


* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (62 preceding siblings ...)
  2018-09-19 23:35 ` bugzilla-daemon
@ 2018-09-19 23:35 ` bugzilla-daemon
  2018-09-23 22:04 ` bugzilla-daemon
                   ` (26 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-09-19 23:35 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 436 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #64 from Anthony Ruhier <a_ruhier@hotmail.com> ---
(In reply to Anthony Ruhier from comment #63)
> FYI, I also had this bug under linux 4.17 and 4.18, but it seems to have
> been fixed in 4.19-rc3. The suspend/hibernate issue has also been fixed.

Forgot to say that I have a vega 64.


* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (63 preceding siblings ...)
  2018-09-19 23:35 ` bugzilla-daemon
@ 2018-09-23 22:04 ` bugzilla-daemon
  2018-09-23 23:42 ` bugzilla-daemon
                   ` (25 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-09-23 22:04 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 669 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #65 from dwagner <jb5sgc1n.nya@20mm.eu> ---
(In reply to Anthony Ruhier from comment #63)
> FYI, I also had this bug under linux 4.17 and 4.18, but it seems to have
> been fixed in 4.19-rc3. The suspend/hibernate issue has also been fixed.

Unfortunately, I cannot confirm either observation: the current
amd-staging-drm-next git head still crashes on me quickly, and the crash is
still easily reproducible with the 3-fps-video-replay test.

And going into S3 suspend does not work for me with the current
amd-staging-drm-next either.


* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (64 preceding siblings ...)
  2018-09-23 22:04 ` bugzilla-daemon
@ 2018-09-23 23:42 ` bugzilla-daemon
  2018-09-25 12:11 ` bugzilla-daemon
                   ` (24 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-09-23 23:42 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 904 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #66 from Anthony Ruhier <a_ruhier@hotmail.com> ---
(In reply to dwagner from comment #65)
> (In reply to Anthony Ruhier from comment #63)
> > FYI, I also had this bug under linux 4.17 and 4.18, but it seems to have
> > been fixed in 4.19-rc3. The suspend/hibernate issue has also been fixed.
> 
> Unluckily, I cannot confirm either observation: The current
> amd-staging-drm-next git head still crashes on me quickly, still well
> reproduceable with the 3-fps-video-replay test.
> 
> And going into S3 suspend does not work for me with the current
> amd-staging-drm-next either.

Last time I tested, amd-staging-drm-next seemed to be based on 4.19-rc1, on
which I had the issue too. I switched to vanilla 4.19-rc4 (now -rc5) and it was
fixed.


* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (65 preceding siblings ...)
  2018-09-23 23:42 ` bugzilla-daemon
@ 2018-09-25 12:11 ` bugzilla-daemon
  2018-11-14  0:23 ` bugzilla-daemon
                   ` (23 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-09-25 12:11 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 266 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #67 from Roshless <roshless@gmail.com> ---
Tried on 4.19-rc5, still crashes for me after about 2-3 days (of 6-12h use)


* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (66 preceding siblings ...)
  2018-09-25 12:11 ` bugzilla-daemon
@ 2018-11-14  0:23 ` bugzilla-daemon
  2018-11-15 23:37 ` bugzilla-daemon
                   ` (22 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-11-14  0:23 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 2095 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #68 from dwagner <jb5sgc1n.nya@20mm.eu> ---
Tested today's current amd-staging-drm-next git head, to see if there has been
any improvement over the last two months.

The bad news: The 3-fps-video-replay test still reproducibly crashes the driver
after a few minutes, as long as the default automatic power management is
active.

The mediocre news: At least it looks as if the Linux kernel now survives the
driver crash to some extent. I found messages like these in the journal:

Nov 14 00:59:36 ryzen kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring
sdma0 timeout, signaled seq=22008, emitted seq=22010
Nov 14 00:59:36 ryzen kernel: [drm] GPU recovery disabled.
Nov 14 00:59:37 ryzen kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring
sdma1 timeout, signaled seq=107, emitted seq=109
Nov 14 00:59:37 ryzen kernel: [drm] GPU recovery disabled.
Nov 14 00:59:40 ryzen kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring
sdma0 timeout, signaled seq=22008, emitted seq=22010
Nov 14 00:59:40 ryzen kernel: [drm] GPU recovery disabled.
Nov 14 00:59:41 ryzen kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring
sdma1 timeout, signaled seq=107, emitted seq=109

... and so on repeating for several minutes after the screen went blank.
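
To condense such repeated timeout messages, the ring name and the number of
emitted-but-not-yet-signaled jobs can be extracted from the log. A minimal
sketch, assuming each message sits on a single line (the wrapping above comes
from the mail format) and uses the "signaled seq=..., emitted seq=..." wording
shown here; the function name ring_backlog is hypothetical.

```shell
# Hypothetical helper: read kernel log lines on stdin and print
# "<ring> <emitted - signaled>" once per distinct ring/backlog pair.
ring_backlog() {
    sed -nE 's/.*ring ([A-Za-z0-9_.]+) timeout, signaled seq=([0-9]+), emitted seq=([0-9]+).*/\1 \2 \3/p' |
        awk '{ print $1, $3 - $2 }' |
        sort -u
}

# Possible use (assumption): journalctl -k | ring_backlog
```

For the messages above this would report a backlog of 2 for both sdma0 and
sdma1, which does not change between the repeats.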

I will test tomorrow whether this means I can now collect the diagnostic output
that was asked for earlier.

Some good news: S3 suspends/resumes are working fine right now. There are some
scary messages emitted upon resume, but they do not seem to have bad
consequences:

[  281.465654] [drm:emulated_link_detect [amdgpu]] *ERROR* Failed to read EDID
[  281.490719] [drm:emulated_link_detect [amdgpu]] *ERROR* Failed to read EDID
[  282.006225] [drm] Fence fallback timer expired on ring sdma0
[  282.512879] [drm] Fence fallback timer expired on ring sdma0
[  282.556651] [drm] UVD and UVD ENC initialized successfully.
[  282.657771] [drm] VCE initialized successfully.


* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (67 preceding siblings ...)
  2018-11-14  0:23 ` bugzilla-daemon
@ 2018-11-15 23:37 ` bugzilla-daemon
  2018-11-15 23:38 ` bugzilla-daemon
                   ` (21 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-11-15 23:37 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 856 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #69 from dwagner <jb5sgc1n.nya@20mm.eu> ---
As promised in the comment above, today I ran my debug script "gpu_debug4.sh"
to obtain the diagnostic output after the crash, as requested earlier.
This output is in the attached "gpu_debug4_output.txt".
Since the trace output, the "dmesg -w" output and stdout are written to the
same file, they appear in roughly chronological order.

If you want to look only at the dmesg-output, use
> grep '^\[' gpu_debug4_output.txt

(gpu_debug4.sh is a slight variation of earlier gpu_debug3.sh, just writing to
a local log file.)
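
The grep filter above can be paired with its complement to split the combined
log into two views. This is just a sketch with made-up function names, and it
assumes that only the dmesg lines start with "[" in that file:

```shell
# Hypothetical helpers for a combined log file given as first argument.
# Lines starting with "[" are taken to be dmesg output (timestamp prefix).
dmesg_lines() {
    grep '^\[' "$1"
}

# Everything else: trace output and the script's own stdout.
other_lines() {
    grep -v '^\[' "$1"
}

# Possible use (assumption): other_lines gpu_debug4_output.txt | less
```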

BTW: I ran the script multiple times. Crashes occurred after 5 to 300 seconds,
and the diagnostic output always looked like the one in the attached
gpu_debug4_output.txt.


* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (68 preceding siblings ...)
  2018-11-15 23:37 ` bugzilla-daemon
@ 2018-11-15 23:38 ` bugzilla-daemon
  2018-11-15 23:39 ` bugzilla-daemon
                   ` (20 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-11-15 23:38 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 303 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #70 from dwagner <jb5sgc1n.nya@20mm.eu> ---
Created attachment 142483
  --> https://bugs.freedesktop.org/attachment.cgi?id=142483&action=edit
test script


* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (69 preceding siblings ...)
  2018-11-15 23:38 ` bugzilla-daemon
@ 2018-11-15 23:39 ` bugzilla-daemon
  2018-12-17 22:56 ` bugzilla-daemon
                   ` (19 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-11-15 23:39 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 316 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #71 from dwagner <jb5sgc1n.nya@20mm.eu> ---
Created attachment 142484
  --> https://bugs.freedesktop.org/attachment.cgi?id=142484&action=edit
gpu_debug4_output.txt.gz


* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (70 preceding siblings ...)
  2018-11-15 23:39 ` bugzilla-daemon
@ 2018-12-17 22:56 ` bugzilla-daemon
  2018-12-22 20:41 ` bugzilla-daemon
                   ` (18 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-12-17 22:56 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 443 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #72 from dwagner <jb5sgc1n.nya@20mm.eu> ---
Just for the record, since another month has passed: I can still reproduce the
crash with today's git head of amd-staging-drm-next within minutes. (Also using
the very latest firmware files from
https://people.freedesktop.org/~agd5f/radeon_ucode/ )


* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (71 preceding siblings ...)
  2018-12-17 22:56 ` bugzilla-daemon
@ 2018-12-22 20:41 ` bugzilla-daemon
  2018-12-24 12:56 ` bugzilla-daemon
                   ` (17 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-12-22 20:41 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 856 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #73 from Jānis Jansons <janhouse@gmail.com> ---
Someone suggested I buy a Ryzen 2400G APU, but almost every time some network
lag happens while watching a TV stream through Kodi and the FPS of that video
goes to 0, the display just freezes and the computer has to be power cycled.
There is no space for an external graphics card in my case and I don't want the
increased power consumption, so at this point I'm considering switching to an
Intel CPU.

I have been following this case for 4 months now in the hope that it would move
forward a bit, but it seems stuck.

I can provide additional dumps and test some patches if that would help, but it
seems like others have given plenty of information on how to reproduce it.


* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (72 preceding siblings ...)
  2018-12-22 20:41 ` bugzilla-daemon
@ 2018-12-24 12:56 ` bugzilla-daemon
  2018-12-24 14:49 ` bugzilla-daemon
                   ` (16 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-12-24 12:56 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 336 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #74 from fin4478@hotmail.com ---
The Firefox browser requires the PulseAudio driver. Use ALSA audio and the
Chrome/Chromium browser. Disable hardware acceleration in browser settings.


* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (73 preceding siblings ...)
  2018-12-24 12:56 ` bugzilla-daemon
@ 2018-12-24 14:49 ` bugzilla-daemon
  2019-01-19 17:01 ` bugzilla-daemon
                   ` (15 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2018-12-24 14:49 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 825 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #75 from dwagner <jb5sgc1n.nya@20mm.eu> ---
Audio is unrelated to this bug. In my reproduction scripts, I do not output any
audio at all. 

The video-at-3-fps replay that I use for reproduction seems to just trigger a
certain pattern of memory and shader clock increases/decreases (with dynamic
power management enabled) that makes this bug likely to occur. Any other
GPU usage pattern that triggers many memory/shader clock changes also seems to
increase the crash likelihood. Manual use of a web browser, where GPU load
spikes occur a few times per second, is another scenario in which this bug is
triggered now and then.
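
One way to check whether a given workload really causes frequent clock changes
would be to sample the active clock periodically and count how often the value
differs from the previous sample. The following is only a sketch under that
assumption; the function name count_changes and the sampling loop are
illustrative, not something used in this report.

```shell
# Hypothetical helper: read one sampled value per line on stdin and print how
# many times the value differs from the one on the previous line.
count_changes() {
    awk 'NR > 1 && $0 != prev { n++ } { prev = $0 } END { print n + 0 }'
}

# Possible use (assumption): sample the active shader clock 100 times, using
# the "*" marker that the amdgpu pp_dpm_sclk sysfs file puts on the active level:
#   for i in $(seq 100); do
#       awk '/\*/ { print $2 }' /sys/class/drm/card0/device/pp_dpm_sclk
#       sleep 0.1
#   done | count_changes
```

Sampling from user space can miss short-lived transitions, so a count obtained
this way is a lower bound at best.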


* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (74 preceding siblings ...)
  2018-12-24 14:49 ` bugzilla-daemon
@ 2019-01-19 17:01 ` bugzilla-daemon
  2019-02-16 15:06 ` bugzilla-daemon
                   ` (14 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2019-01-19 17:01 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1494 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #76 from dwagner <jb5sgc1n.nya@20mm.eu> ---
Just for the record, since another month has passed: I can still reproduce the
crash with today's git head of amd-staging-drm-next within minutes.

As a bonus bug, with today's git head I also get inexplicable "minimal" memory
and shader clock values, and doubled power consumption (12W instead of 6W) in
my default 3840x2160 60Hz display mode, compared to last month's drm-next of
the day:

> cd /sys/class/drm/card0/device

> xrandr --output HDMI-A-0 --mode 3840x2160 --rate 30
> echo manual >power_dpm_force_performance_level
> echo 0 >pp_dpm_mclk
> echo 0 >pp_dpm_sclk
> grep -H \\* pp_dpm_mclk pp_dpm_sclk
pp_dpm_mclk:0: 300Mhz *
pp_dpm_sclk:0: 214Mhz *

> xrandr --output HDMI-A-0 --mode 3840x2160 --rate 50
> echo manual >power_dpm_force_performance_level
> echo 0 >pp_dpm_mclk
> echo 0 >pp_dpm_sclk
> grep -H \\* pp_dpm_mclk pp_dpm_sclk
pp_dpm_mclk:1: 1750Mhz *
pp_dpm_sclk:1: 481Mhz *

> xrandr --output HDMI-A-0 --mode 3840x2160 --rate 60
> echo manual >power_dpm_force_performance_level
> echo 0 >pp_dpm_mclk
> echo 0 >pp_dpm_sclk
> grep -H \\* pp_dpm_mclk pp_dpm_sclk
pp_dpm_mclk:0: 300Mhz *
pp_dpm_sclk:6: 1180Mhz *

But that power consumption issue is negligible in comparison to the
show-stopping crashes that are the topic of this bug report.
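
The grep step used above can be wrapped so that the active levels are printed
in a compact form after each mode switch. A minimal sketch: the function name
active_levels is made up, and the files are passed as arguments so that the
helper can also be tried on copies outside of /sys.

```shell
# Hypothetical helper: for each given pp_dpm_* file, print the level that is
# marked with "*" as "<file name>: level <index>, <clock>".
active_levels() {
    for f in "$@"; do
        awk -v name="${f##*/}" \
            '/\*/ { sub(":", "", $1); print name ": level " $1 ", " $2 }' "$f"
    done
}

# Possible use (assumption):
#   cd /sys/class/drm/card0/device && active_levels pp_dpm_mclk pp_dpm_sclk
```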


* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (75 preceding siblings ...)
  2019-01-19 17:01 ` bugzilla-daemon
@ 2019-02-16 15:06 ` bugzilla-daemon
  2019-04-11  6:40 ` bugzilla-daemon
                   ` (13 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2019-02-16 15:06 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 784 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #77 from dwagner <jb5sgc1n.nya@20mm.eu> ---
Since another month has passed: I can still reproduce the crash with today's
git head of amd-staging-drm-next (and an up-to-date Arch Linux) within minutes
by replaying a video at 3 fps.

Additional new bonus bugs this time:
- system consistently hangs at soft-reboots if X11 was started before
- system crashes immediately upon X11 start if vm_update_mode=3 is used
- system crashes if the HDMI-connected TV is shut off while screen blanking

Again, the bonus bugs are either irrelevant in comparison to the instability
this report is about or have been reported already by others.


* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (76 preceding siblings ...)
  2019-02-16 15:06 ` bugzilla-daemon
@ 2019-04-11  6:40 ` bugzilla-daemon
  2019-04-12 22:11 ` bugzilla-daemon
                   ` (12 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2019-04-11  6:40 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 692 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #78 from Mauro Gaspari <ilvipero@gmx.com> ---
Hi, I am affected by similar issues too when using the AMDGPU drivers on Linux,
and I opened another bug before finding this one.
You can have a look at my findings and the workarounds I am applying. So far I
have had good success with those, but I am interested in your thoughts,
recommendations, and feedback.

Also if the bug I opened is a duplicate of this one, feel free to let me know
and I will mark it as duplicate.

https://bugs.freedesktop.org/show_bug.cgi?id=109955

Cheers
Mauro


* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (77 preceding siblings ...)
  2019-04-11  6:40 ` bugzilla-daemon
@ 2019-04-12 22:11 ` bugzilla-daemon
  2019-04-12 23:00 ` bugzilla-daemon
                   ` (11 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2019-04-12 22:11 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 972 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #79 from Jaap Buurman <jaapbuurman@gmail.com> ---
I am also running into the same issue. I have two questions that might help
track down why some of us are having issues while not everybody running a Vega
graphics card is.

1)

What is the output of the following command for you guys?

cat /sys/class/drm/card0/device/vbios_version 

I am running the following version:

113-D0500100-103

According to the techpowerup GPU bios database, this is a vega bios that was
replaced two days (!) later by a new version. Perhaps issues were found that
required another bios update? I might install Windows on a spare HDD and try to
flash my Vega to see if that changes anything.

2)

Memory clocking is different for people running multiple monitors. Are you guys
also running multiple monitors by any chance?
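
To make the answers to both questions easy to collect, something like the
following could be run. It is only a sketch: the function names are invented,
the directory is a parameter (defaulting to /sys/class/drm) so the helpers can
be tried on a copy, and the monitor count relies on the per-connector "status"
files that the DRM sysfs interface provides.

```shell
# Hypothetical helper: print "<card>: <vbios version>" for every card below the
# given DRM sysfs directory.
vbios_versions() {
    drm="${1:-/sys/class/drm}"
    for f in "$drm"/card*/device/vbios_version; do
        # Skip entries without a readable file (including an unmatched glob).
        [ -r "$f" ] || continue
        card="${f%/device/vbios_version}"
        printf '%s: %s\n' "${card##*/}" "$(cat "$f")"
    done
}

# Hypothetical helper: count connectors whose status file reads "connected".
connected_outputs() {
    drm="${1:-/sys/class/drm}"
    cat "$drm"/card*-*/status 2>/dev/null | grep -c '^connected$'
}
```

Called without arguments, vbios_versions prints one line per card and
connected_outputs prints a single number.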


* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (78 preceding siblings ...)
  2019-04-12 22:11 ` bugzilla-daemon
@ 2019-04-12 23:00 ` bugzilla-daemon
  2019-04-13 13:27 ` bugzilla-daemon
                   ` (10 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2019-04-12 23:00 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1055 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #80 from dwagner <jb5sgc1n.nya@20mm.eu> ---
(In reply to Jaap Buurman from comment #79)
> I am also running into the same issue. I have two questions that might help
> tracking down why we are having issues, but not all people that are running
> a Vega graphics card.

As you can see from my initial description, I'm running an RX460, which uses
a "Polaris 11" AMD GPU, not a "Vega".

> What is the output of the following command for you guys?
> 
> cat /sys/class/drm/card0/device/vbios_version 

"113-BAFFIN_PRO_1606"

I have not heard of any update to this from the vendor - there is just some
unofficial hacked version around (which I do not use) that is said to enable
some switched-off CUs.

> Memory clocking is different for people running multiple monitors. Are you
> guys also running multiple monitors by any chance?

No, I'm using just one 3840x2160 @ 60Hz HDMI display.


* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (79 preceding siblings ...)
  2019-04-12 23:00 ` bugzilla-daemon
@ 2019-04-13 13:27 ` bugzilla-daemon
  2019-06-03 20:03 ` bugzilla-daemon
                   ` (9 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2019-04-13 13:27 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1098 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #81 from Jaap Buurman <jaapbuurman@gmail.com> ---
(In reply to Alex Deucher from comment #14)
> (In reply to dwagner from comment #13)
> > 
> > Much lower shader clocks are used only if I lower the refresh rate of the
> > screen. Is there a reason why the shader clocks should stay high even in the
> > absence of 3d/compute load?
> > 
> 
> Certain display requirements can cause the engine clock to be kept higher as
> well.

In this bug report and another similar one
(https://bugs.freedesktop.org/show_bug.cgi?id=109955), everybody having the
issue seems to be using a setup that requires higher engine clocks in idle
AFAIK. Either high refresh displays, or in my case, multiple monitors. Could
this be part of the issue that seems to trigger this bug? I might be grasping
at straws here, but I have had this problem for as long as I have this Vega64
(bought at launch), while it is 100% stable under Windows 10 in the same setup.


* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (80 preceding siblings ...)
  2019-04-13 13:27 ` bugzilla-daemon
@ 2019-06-03 20:03 ` bugzilla-daemon
  2019-07-08  7:51 ` bugzilla-daemon
                   ` (8 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2019-06-03 20:03 UTC (permalink / raw)
  To: dri-devel



https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #82 from Matt Coffin <mcoffin13@gmail.com> ---
I am also experiencing this issue.

* Kernel: 5.1.3-arch2-1-ARCH
* LLVM 8.0.0
* AMDVLK (dev branch pulled 20190602)
* Mesa 19.0.4
* Card: XFX Radeon RX 590

I can reproduce this error and also bug 105733, bug 105152, bug 107536, and bug
109955 with the same process; which one occurs on a given run appears to be
non-deterministic.

I just launch "House Flipper" from Steam (DX11 title), with DXVK 1.2.1, on
either the mesa RADV or AMDVLK vulkan implementations.

At 2560x1440 resolution (both 60Hz and 144Hz refresh rates), the crash(es)
occur. At 1080p@60Hz, I get no crashes, but they come back if I disable v-sync
and framerate limiting.

I logged power consumption with
`sensors | egrep '^power' | awk '{ print $1 " " $2; }'`
and found that the crash often occurs soon after the card hits its
maximum power draw at around 190W.
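
The logging pipeline above could be wrapped in a small reusable filter. This is
only a sketch under a few assumptions: `power_readings` is a made-up name, the
sample `sensors` line in the comment is illustrative rather than captured from
this system, and the label of the power reading may differ between lm-sensors
versions.

```shell
#!/bin/sh
# Reduce `sensors`-style text on stdin to "label value" pairs for power
# readings. Same filter as: egrep '^power' | awk '{ print $1 " " $2; }'
# Example input line (hypothetical): "power1:      190.00 W  (cap = 200.00 W)"
power_readings() {
    awk '/^power/ { print $1 " " $2 }'
}

# Hypothetical logging loop (commented out, not run here): sample once per
# second and prefix each reading with a Unix timestamp so that it can be
# lined up with dmesg output after a crash.
# while sleep 1; do
#     sensors | power_readings | while read -r line; do
#         printf '%s %s\n' "$(date +%s)" "$line"
#     done
# done >> power.log
```

Keeping the filter separate from the loop makes it possible to check the
parsing against saved `sensors` output without touching the hardware.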

I don't have much experience debugging or developing software at the
kernel/driver level, but I'm happy to help with providing information as I go
through the learning process here. I'll compile the amd-staging-drm-next kernel
later tonight and post some results and logs.

Please let me know if there's more information I could provide that may be of
use here. Thanks for your hard work!


* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (81 preceding siblings ...)
  2019-06-03 20:03 ` bugzilla-daemon
@ 2019-07-08  7:51 ` bugzilla-daemon
  2019-07-09  7:38 ` bugzilla-daemon
                   ` (7 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2019-07-08  7:51 UTC (permalink / raw)
  To: dri-devel



https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #83 from Wilko Bartels <me@jasondaigo.de> ---
(In reply to Jaap Buurman from comment #81)
> issue seems to be using a setup that requires higher engine clocks in idle
> AFAIK. Either high refresh displays, or in my case, multiple monitors. Could
> this be part of the issue that seems to trigger this bug? I might be
> grasping at straws here, but I have had this problem for as long as I have
> this Vega64 (bought at launch), while it is 100% stable under Windows 10 in
> the same setup.

This might be true. I was running i3 with xrandr set to 144Hz when the freeze
scenario began (sometime last month; I did not "game" much before). Then I
switched to icewm to test and the issue was gone. Later, when I configured icewm
to also have the proper xrandr setting, the issue came back. I didn't know that
could be related. Will test this tonight.


* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (82 preceding siblings ...)
  2019-07-08  7:51 ` bugzilla-daemon
@ 2019-07-09  7:38 ` bugzilla-daemon
  2019-07-09 21:50 ` bugzilla-daemon
                   ` (6 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2019-07-09  7:38 UTC (permalink / raw)
  To: dri-devel



https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #84 from Wilko Bartels <me@jasondaigo.de> ---
(In reply to Wilko Bartels from comment #83)
> (In reply to Jaap Buurman from comment #81)
> > issue seems to be using a setup that requires higher engine clocks in idle
> > AFAIK. Either high refresh displays, or in my case, multiple monitors. Could
> > this be part of the issue that seems to trigger this bug? I might be
> > grasping at straws here, but I have had this problem for as long as I have
> > this Vega64 (bought at launch), while it is 100% stable under Windows 10 in
> > the same setup.
> 
> This might be true. I was running i3 with xrandr set to 144Hz when the
> freeze scenario began (sometime last month; I did not "game" much before).
> Then I switched to icewm to test and the issue was gone. Later, when I
> configured icewm to also have the proper xrandr setting, the issue came
> back. I didn't know that could be related. Will test this tonight.

nevermind. it crashed on 60hz as well (once) yesterday


* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (83 preceding siblings ...)
  2019-07-09  7:38 ` bugzilla-daemon
@ 2019-07-09 21:50 ` bugzilla-daemon
  2019-09-07  5:42 ` bugzilla-daemon
                   ` (5 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2019-07-09 21:50 UTC (permalink / raw)
  To: dri-devel



https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #85 from dwagner <jb5sgc1n.nya@20mm.eu> ---
(In reply to Wilko Bartels from comment #84)
> nevermind. it crashed on 60hz as well (once) yesterday

It sure does. This bug is now about two years old. In that time amdgpu has
never been stable and has only gotten worse, and every contemporary kernel,
whether an "official" one or one compiled from the git heads of development
trees, has this very problem, which I can reproduce within minutes.

I've given up hoping for a fix. I'll buy an Intel Xe GPU as soon as it hits the
shelves.


* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (84 preceding siblings ...)
  2019-07-09 21:50 ` bugzilla-daemon
@ 2019-09-07  5:42 ` bugzilla-daemon
  2019-09-12 23:09 ` bugzilla-daemon
                   ` (4 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2019-09-07  5:42 UTC (permalink / raw)
  To: dri-devel



https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #86 from Paul Ezvan <paul@ezvan.fr> ---
I was also impacted by this bug (amdgpu hangs under random conditions with
messages similar to the ones reported here) with any kernel/Mesa version
combination other than the ones in Debian Stretch (any other distro, or using
Mesa from backports, would trigger those crashes).
This was on a Ryzen 1700 platform with a B450 chipset. I had this issue with an
RX480 and an RX560 (I tried replacing the GPU in case it was faulty, and I also
replaced the motherboard).

I was still impacted with Fedora 30 with recurring GPU hangs. Then I replaced
the CPU/motherboard with a Core i7-9700k/Z390 platform. Since then I did not
have a single GPU hang on Fedora 30.

My hypothesis on this problem not being easily reproducible is that it would
happen only on specific GPU/CPU combinations.


* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (85 preceding siblings ...)
  2019-09-07  5:42 ` bugzilla-daemon
@ 2019-09-12 23:09 ` bugzilla-daemon
  2019-09-25 21:37 ` bugzilla-daemon
                   ` (3 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2019-09-12 23:09 UTC (permalink / raw)
  To: dri-devel



https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #87 from dwagner <jb5sgc1n.nya@20mm.eu> ---
(In reply to Paul Ezvan from comment #86)
> My hypothesis on this problem not being easily reproducible is that it would
> happen only on specific GPU/CPU combinations.

... and at least a specific operating system (Linux) and a specific driver
(amdgpu with dc=1).

If your hypothesis were true, would you suggest that everyone plagued by this
bug just buy a new main-board and an Intel CPU to evade it?

Since my Ryzen system is perfectly stable when used as a server, not displaying
anything but the text console, I'm inclined to rather keep my main-board and
CPU and just exchange the GPU for another brand that comes with stable drivers.


* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (86 preceding siblings ...)
  2019-09-12 23:09 ` bugzilla-daemon
@ 2019-09-25 21:37 ` bugzilla-daemon
  2019-09-26  8:35 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2019-09-25 21:37 UTC (permalink / raw)
  To: dri-devel



https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #88 from jeroenimo <freedesktop@jeroenimo.nl> ---
Found this thread while googling the error from the log.

AMD Ryzen 3600
Asrock B350 motherboard
ASrock RX560 Radeon GPU


Ubuntu and Xubuntu 18.04 and 19.04 both lock up, so they are not usable: after
login the screen goes black almost immediately (sometimes after 5 minutes,
sometimes after 2 seconds). ssh access is still possible. These seem to have a
newer kernel and Mesa drivers.

Linux Mint 19.2
Seems a lot more stable; so far only 1 lockup with a black screen.

uname -a
Linux jeroenimo-amd 4.15.0-64-generic #73-Ubuntu SMP Thu Sep 12 13:16:13 UTC
2019 x86_64 x86_64 x86_64 GNU/Linux


Last log from mint:

Sep 25 23:01:57 jeroenimo-amd kernel: [ 4980.207322]
[drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR*
[CRTC:43:crtc-0] flip_done timed out
Sep 25 23:01:57 jeroenimo-amd kernel: [ 4980.207331]
[drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR*
[CRTC:45:crtc-1] flip_done timed out
Sep 25 23:02:07 jeroenimo-amd kernel: [ 4990.451366]
[drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR*
[CRTC:43:crtc-0] flip_done timed out

I suspect I'm in the same trouble as most.

Windows 10 runs flawlessly, so it's really software.


* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (87 preceding siblings ...)
  2019-09-25 21:37 ` bugzilla-daemon
@ 2019-09-26  8:35 ` bugzilla-daemon
  2019-09-26 12:29 ` bugzilla-daemon
  2019-11-19  8:22 ` bugzilla-daemon
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2019-09-26  8:35 UTC (permalink / raw)
  To: dri-devel



https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #89 from jeroenimo <freedesktop@jeroenimo.nl> ---
I found a way to crash the system: running glmark2 crashes it almost
instantly.


* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (88 preceding siblings ...)
  2019-09-26  8:35 ` bugzilla-daemon
@ 2019-09-26 12:29 ` bugzilla-daemon
  2019-11-19  8:22 ` bugzilla-daemon
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2019-09-26 12:29 UTC (permalink / raw)
  To: dri-devel



https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #90 from jeroenimo <freedesktop@jeroenimo.nl> ---
I managed to run glmark2 without crashing the system by manually forcing the
card to its lowest frequency.

From a root shell:
echo manual > /sys/class/drm/card0/device/power_dpm_force_performance_level
echo 0 > /sys/class/drm/card0/device/pp_dpm_sclk

root@jeroenimo-amd:/home/jeroen# cat /sys/class/drm/card0/device/pp_dpm_sclk 
0: 214Mhz *
1: 387Mhz 
2: 843Mhz 
3: 995Mhz 
4: 1062Mhz 
5: 1108Mhz 
6: 1149Mhz 
7: 1176Mhz 
root@jeroenimo-amd:/home/jeroen# 

If I go higher, e.g. to level 2 (843Mhz), I can still crash it, although it
takes a while before it crashes.

When I force the card to anything above level 4, I get an immediate crash
without even starting glmark2.
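
For anyone who wants to repeat this test, the two steps could be wrapped in
small shell functions. This is only a sketch: the function names are made up
for illustration, the sysfs paths are the ones quoted in this comment (card0
may be a different card on other systems), and writing to the real files needs
root.

```shell
#!/bin/sh
# Print the index and clock of the active level (the line marked with "*")
# from pp_dpm_sclk-style text. The file argument is optional.
active_sclk_level() {
    awk '/\*/ { sub(/:$/, "", $1); print $1, $2 }' \
        "${1:-/sys/class/drm/card0/device/pp_dpm_sclk}"
}

# force_sclk_level LEVEL [DEVICE_DIR]
# Switch DPM to manual control, then pin the shader clock to LEVEL.
# DEVICE_DIR defaults to the card0 path used in this comment.
force_sclk_level() {
    dev="${2:-/sys/class/drm/card0/device}"
    echo manual > "$dev/power_dpm_force_performance_level" &&
    echo "$1" > "$dev/pp_dpm_sclk"
}
```

With these defined, `force_sclk_level 0` followed by `active_sclk_level` would
correspond to the commands and output shown above. Passing a different
directory as the second argument allows a dry run against copies of the files.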

I hope this helps!


* [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!
  2017-08-20 22:53 [Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung! bugzilla-daemon
                   ` (89 preceding siblings ...)
  2019-09-26 12:29 ` bugzilla-daemon
@ 2019-11-19  8:22 ` bugzilla-daemon
  90 siblings, 0 replies; 92+ messages in thread
From: bugzilla-daemon @ 2019-11-19  8:22 UTC (permalink / raw)
  To: dri-devel



https://bugs.freedesktop.org/show_bug.cgi?id=102322

Martin Peres <martin.peres@free.fr> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |MOVED

--- Comment #91 from Martin Peres <martin.peres@free.fr> ---
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been
closed from further activity.

You can subscribe and participate further through the new bug through this link
to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/226.


