* [Bug 107152] GPU fault detected: 146 / VM_CONTEXT1_PROTECTION_FAULT / ring gfx timeout
From: bugzilla-daemon @ 2018-07-07 20:03 UTC
  To: dri-devel

https://bugs.freedesktop.org/show_bug.cgi?id=107152

            Bug ID: 107152
           Summary: GPU fault detected: 146 / VM_CONTEXT1_PROTECTION_FAULT
                    / ring gfx timeout
           Product: DRI
           Version: DRI git
          Hardware: x86-64 (AMD64)
                OS: Linux (All)
            Status: NEW
          Severity: normal
          Priority: medium
         Component: DRM/AMDgpu
          Assignee: dri-devel@lists.freedesktop.org
          Reporter: jb5sgc1n.nya@20mm.eu

While I was just browsing with Firefox, amdgpu and then the whole system
crashed on me, with the following messages emitted to the journal:

Jul 07 01:08:20 ryzen kernel: amdgpu 0000:0a:00.0: GPU fault detected: 146 0x0c80440c
Jul 07 01:08:20 ryzen kernel: amdgpu 0000:0a:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00100190
Jul 07 01:08:20 ryzen kernel: amdgpu 0000:0a:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0E04400C
Jul 07 01:08:20 ryzen kernel: amdgpu 0000:0a:00.0: VM fault (0x0c, vmid 7, pasid 32768) at page 1048976, read from 'TC1' (0x54433100) (68)
Jul 07 01:08:25 ryzen kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=75244, last emitted seq=75245
Jul 07 01:08:25 ryzen kernel: amdgpu 0000:0a:00.0: GPU reset begin!

Kernel version used: amd-staging-drm-next current as of commit
bb2e406ba66c2573b68e609e148cab57b1447095, with patch 
https://bugs.freedesktop.org/attachment.cgi?id=140418 applied on top of it.

Mesa version: 18.1.3-1 (current from Arch Linux)

(This report was separated from
https://bugs.freedesktop.org/show_bug.cgi?id=102322)

-- 
You are receiving this mail because:
You are the assignee for the bug.

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

* [Bug 107152] GPU fault detected: 146 / VM_CONTEXT1_PROTECTION_FAULT / ring gfx timeout
From: bugzilla-daemon @ 2018-07-07 20:04 UTC
  To: dri-devel

https://bugs.freedesktop.org/show_bug.cgi?id=107152

--- Comment #1 from dwagner <jb5sgc1n.nya@20mm.eu> ---
Created attachment 140497
  --> https://bugs.freedesktop.org/attachment.cgi?id=140497&action=edit
dmesg from booting until the GPU fault a few minutes later

* [Bug 107152] GPU fault detected: 146 / VM_CONTEXT1_PROTECTION_FAULT / ring gfx timeout
From: bugzilla-daemon @ 2018-07-07 20:05 UTC
  To: dri-devel

https://bugs.freedesktop.org/show_bug.cgi?id=107152

--- Comment #2 from dwagner <jb5sgc1n.nya@20mm.eu> ---
Created attachment 140498
  --> https://bugs.freedesktop.org/attachment.cgi?id=140498&action=edit
Xorg.log from the session ending in the "gpu fault"

* [Bug 107152] GPU fault detected: 146 / VM_CONTEXT1_PROTECTION_FAULT / ring gfx timeout
From: bugzilla-daemon @ 2018-07-30 21:12 UTC
  To: dri-devel

https://bugs.freedesktop.org/show_bug.cgi?id=107152

--- Comment #3 from krzysiek@cybulski.info ---
Hi,

I get this GPU hang once or twice a day, mostly when using PHPStorm (a
Java-based PHP editor).

The system is KDE Neon (Ubuntu 18.04 + latest KDE); I use the padoka PPA
(currently 1:18.2~git180730133900.0ea243d~b~padoka0).

GPU POLARIS11 0x1002:0x67EF 0x1458:0x230A 0xE5
[ 8004.993577] amdgpu 0000:01:00.0: GPU fault detected: 146 0x0000620c
[ 8004.993584] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
[ 8004.993587] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0806200C
[ 8004.993591] amdgpu 0000:01:00.0: VM fault (0x0c, vmid 4, pasid 32771) at page 0, read from 'CBC0' (0x43424330) (98)
[ 8263.966497] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=194416, last emitted seq=194418
[ 8263.966510] [drm] IP block:gfx_v8_0 is hung!
[ 8263.966562] amdgpu 0000:01:00.0: GPU reset begin!

If you need more info, please let me know.
Krzysiek

* [Bug 107152] GPU fault detected: 146 / VM_CONTEXT1_PROTECTION_FAULT / ring gfx timeout
From: bugzilla-daemon @ 2018-07-31 21:41 UTC
  To: dri-devel

https://bugs.freedesktop.org/show_bug.cgi?id=107152

--- Comment #4 from dwagner <jb5sgc1n.nya@20mm.eu> ---
I saw this kind of crash (still with the latest amd-staging-drm-next kernel)
three times in a row today, just by playing a specific video with mpv
immediately after rebooting and starting X11; each time the crash came before
the 10-minute video ended.
The video (which just shows a static cover image) can be obtained via:

youtube-dl -f 248+251 'https://www.youtube.com/watch?v=kYKE78Pcjog'

The log messages were just like those reported above; I guess the additional
"hw_done or flip_done timed out" message after "GPU reset begin!" is not
really relevant:

Jul 31 22:20:21 ryzen kernel: amdgpu 0000:0a:00.0: GPU fault detected: 147 0x0f580402 for process Xorg pid 793 thread amdgpu_cs:0 pid 794
Jul 31 22:20:21 ryzen kernel: amdgpu 0000:0a:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0010C3EB
Jul 31 22:20:21 ryzen kernel: amdgpu 0000:0a:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x02004002
Jul 31 22:20:21 ryzen kernel: amdgpu 0000:0a:00.0: VM fault (0x02, vmid 1, pasid 32768) at page 1098731, read from 'TC3' (0x54433300) (4)
Jul 31 22:20:21 ryzen kernel: amdgpu 0000:0a:00.0: GPU fault detected: 146 0x0c984424 for process Xorg pid 793 thread amdgpu_cs:0 pid 794
Jul 31 22:20:21 ryzen kernel: amdgpu 0000:0a:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00100193
Jul 31 22:20:21 ryzen kernel: amdgpu 0000:0a:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x04044024
Jul 31 22:20:21 ryzen kernel: amdgpu 0000:0a:00.0: VM fault (0x24, vmid 2, pasid 32768) at page 1048979, read from 'TC1' (0x54433100) (68)
Jul 31 22:20:25 ryzen kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=26570, emitted seq=26573
Jul 31 22:20:25 ryzen kernel: amdgpu 0000:0a:00.0: GPU reset begin!
Jul 31 22:20:35 ryzen kernel: [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:44:crtc-0] hw_done or flip_done timed out

* [Bug 107152] GPU fault detected: 146 / VM_CONTEXT1_PROTECTION_FAULT / ring gfx timeout
From: bugzilla-daemon @ 2018-07-31 21:45 UTC
  To: dri-devel

https://bugs.freedesktop.org/show_bug.cgi?id=107152

--- Comment #5 from dwagner <jb5sgc1n.nya@20mm.eu> ---
In case somebody wants to reproduce this with mpv, this is the content of the
~/.config/mpv/mpv.conf file in use:

audio-device='alsa/iec958:CARD=Generic,DEV=0'
audio-delay=0.2
fs=no
vo=gpu
gpu-api=auto
profile=gpu-hq
fbo-format=rgba16f
hwdec=no
video-align-y=-1
hidpi-window-scale=no
target-prim=bt.709
tone-mapping=hable
cache=65536
cache-initial=2048
cache-secs=20

So the amdgpu HDMI audio output was not used, and neither was hardware-
accelerated video decoding.

* [Bug 107152] GPU fault detected: 146 / VM_CONTEXT1_PROTECTION_FAULT / ring gfx timeout
From: bugzilla-daemon @ 2018-08-02 21:25 UTC
  To: dri-devel

https://bugs.freedesktop.org/show_bug.cgi?id=107152

--- Comment #6 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
dwagner, how quickly is this reproducible?
A wild guess: what if you boot the kernel with the IOMMU disabled? Add
iommu=off to the GRUB kernel command line.
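A quick way to check that a parameter such as iommu=off actually reached the
running kernel is to look for it in /proc/cmdline. The following is a minimal
sketch; has_kernel_param is a hypothetical helper written for illustration,
not an existing tool.

```bash
#!/bin/bash
# has_kernel_param PARAM [FILE]: succeed if PARAM (e.g. "iommu=off") appears
# as a whole word on the kernel command line. FILE defaults to /proc/cmdline;
# another file can be passed for testing. Hypothetical helper.
has_kernel_param() {
    local param=$1 file=${2:-/proc/cmdline} word
    local -a words
    # /proc/cmdline is a single line of space-separated parameters.
    read -r -a words < "$file"
    for word in "${words[@]}"; do
        # Quoted right-hand side: exact string match, no globbing.
        [[ $word == "$param" ]] && return 0
    done
    return 1
}
```

For example: has_kernel_param iommu=off && echo "IOMMU disabled on this boot".
With GRUB, the parameter is usually appended to GRUB_CMDLINE_LINUX_DEFAULT in
/etc/default/grub, after which the configuration is regenerated
(grub-mkconfig -o /boot/grub/grub.cfg on Arch Linux, update-grub on Ubuntu)
before rebooting.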

* [Bug 107152] GPU fault detected: 146 / VM_CONTEXT1_PROTECTION_FAULT / ring gfx timeout
From: bugzilla-daemon @ 2018-08-02 21:54 UTC
  To: dri-devel

https://bugs.freedesktop.org/show_bug.cgi?id=107152

--- Comment #7 from dwagner <jb5sgc1n.nya@20mm.eu> ---
(In reply to Andrey Grodzovsky from comment #6)
> dwanger, how quickly is this reproducible ?

Quite quickly with the above video playback test (which I will call the
"Othan" test, after the name of the song in the video): so far it has never
taken more than 10 minutes to get to the crash.

> A wild guess, what if you boot kernel with IOMMU disabled ? Add iommu=off to
> grub command line.

Tried this: no difference. Two attempts with the current amd-staging-drm-next,
one with vm_update_mode=0 and one with vm_update_mode=3; both crashed in under
a minute of replay.

Interestingly, the "Othan test" can even crash the 4.13 kernel more quickly
than the one or two days of uptime I usually get with that old kernel.

There isn't really anything special about the video other than it being
encoded at only 6 frames per second.

And by the way, the video replay crashes even with --vo=xv, i.e. without mpv
making use of OpenGL. Replay does not crash with --vo=null.

In contrast, when I replay videos at the usual 24 fps, they run much longer
without crashing.

* [Bug 107152] GPU fault detected: 146 / VM_CONTEXT1_PROTECTION_FAULT / ring gfx timeout
From: bugzilla-daemon @ 2018-08-03 16:54 UTC
  To: dri-devel

https://bugs.freedesktop.org/show_bug.cgi?id=107152

--- Comment #8 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
dwagner, I think you already have all the trace tools installed from previous
debug sessions, so this should be quick for you:

Update to the latest kernel from
https://cgit.freedesktop.org/~agd5f/linux/log/?h=amd-staging-drm-next

Boot the system and, before starting the reproducer, run the following trace
command:

sudo trace-cmd start -e dma_fence -e gpu_scheduler -e amdgpu -v \
    -e "amdgpu:amdgpu_mm_rreg" -e "amdgpu:amdgpu_mm_wreg" -e "amdgpu:amdgpu_iv"


After the VM fault has happened, extract the log from /sys/kernel/debug/tracing.
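In the trace-cmd command above, -v negates the -e options that follow it, so
every dma_fence, gpu_scheduler and amdgpu event is recorded except
amdgpu_mm_rreg, amdgpu_mm_wreg and amdgpu_iv. Since these faults can take the
whole system down, it may help to copy the buffer to disk as soon as the fault
shows up. A minimal sketch, assuming debugfs is mounted at /sys/kernel/debug;
save_trace, TRACING_DIR and the default output name are hypothetical:

```bash
#!/bin/bash
# save_trace [OUT]: copy the ftrace buffer that "trace-cmd start" has been
# filling to OUT (default: amdgpu_trace.txt) and flush it to disk.
# Hypothetical helper; TRACING_DIR may override the tracing directory.
save_trace() {
    local tracing=${TRACING_DIR:-/sys/kernel/debug/tracing}
    local out=${1:-amdgpu_trace.txt}
    # Reading "trace" (unlike "trace_pipe") does not consume the buffer.
    cat "$tracing/trace" > "$out" || return 1
    # Flush right away, in case the machine hangs shortly after the fault.
    sync
}
```

"trace-cmd extract" is an alternative that reads the same kernel buffer into a
trace.dat file.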

Also run:
sudo umr -O verbose -R gfx[.]
sudo umr -O halt_waves -wa

Now, let's say this is the crash in your log:

Jul 07 01:08:20 ryzen kernel: amdgpu 0000:0a:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00100190
Jul 07 01:08:20 ryzen kernel: amdgpu 0000:0a:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0E04400C
Jul 07 01:08:20 ryzen kernel: amdgpu 0000:0a:00.0: VM fault (0x0c, vmid 7, pasid 32768) at page 1048976, read from 'TC1' (0x54433100) (68)

Then run

umr -O verbose -vm 7@100190000 1

where 7 is the vmid value and 100190000 is the
VM_CONTEXT1_PROTECTION_FAULT_ADDR value with an extra '000' appended, which
converts the virtual page number into the actual virtual address (a left shift
by 12 bits, i.e. multiplying by the 4096-byte page size).
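The page-to-address conversion described above can be automated. The sketch
below takes the "VM fault (..., vmid N, ...) at page P" line from the kernel
log, where P is printed in decimal (1048976 is 0x100190), and prints the
corresponding umr command. umr_cmd_for_fault is a hypothetical helper, not
part of umr, and it assumes 4 KiB GPU pages, so the address is P shifted left
by 12 bits.

```bash
#!/bin/bash
# umr_cmd_for_fault LINE: print the umr command for an amdgpu VM fault line.
# Hypothetical helper; assumes 4 KiB pages (address = page << 12).
umr_cmd_for_fault() {
    local line=$1 vmid page
    # Capture the decimal vmid and page number with bash regex matching.
    if [[ $line =~ vmid\ ([0-9]+),.*at\ page\ ([0-9]+) ]]; then
        vmid=${BASH_REMATCH[1]}
        page=${BASH_REMATCH[2]}
        # 10# forces base 10; %x prints the shifted value as hex.
        printf 'umr -O verbose -vm %d@%x 1\n' "$vmid" $(( 10#$page << 12 ))
    else
        return 1
    fi
}
```

For example: umr_cmd_for_fault "$(journalctl -k | grep -m1 'VM fault')".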

I can then look at the log and also run it by our Mesa/LLVM experts to try to
figure out what's going on.

* [Bug 107152] GPU fault detected: 146 / VM_CONTEXT1_PROTECTION_FAULT / ring gfx timeout
From: bugzilla-daemon @ 2018-08-03 23:42 UTC
  To: dri-devel

https://bugs.freedesktop.org/show_bug.cgi?id=107152

--- Comment #9 from dwagner <jb5sgc1n.nya@20mm.eu> ---
Created attachment 140959
  --> https://bugs.freedesktop.org/attachment.cgi?id=140959&action=edit
test script that attempted to catch useful output after crashes - but failed

* [Bug 107152] GPU fault detected: 146 / VM_CONTEXT1_PROTECTION_FAULT / ring gfx timeout
From: bugzilla-daemon @ 2018-08-03 23:48 UTC
  To: dri-devel

https://bugs.freedesktop.org/show_bug.cgi?id=107152

--- Comment #10 from dwagner <jb5sgc1n.nya@20mm.eu> ---
(In reply to Andrey Grodzovsky from comment #8)
> dwanger, i think you already have all the trace tools installed from
> previous debug sessions so this should be quick for you
Yes, and I tried really hard (with the script attached above, run as "root" on
a text console while the "othan_test.sh" script played the video on the
screen) to catch any useful output, but that failed for the same reason I
mentioned in https://bugs.freedesktop.org/show_bug.cgi?id=102322#c20: the
system simply crashes too hard and too quickly to be able to do anything after
amdgpu.ko crashes. The output I get in gpu_result.txt stops at the "waiting
for the crash" line.

Only in about 1 out of 10 crashes does the syslog at least contain the error
messages from the amdgpu crash; in the other 90% of cases the same crash
occurs with no message recorded at all.

If there is any way to let other processes survive for a while after amdgpu
crashes, please let me know.

* [Bug 107152] GPU fault detected: 146 / VM_CONTEXT1_PROTECTION_FAULT / ring gfx timeout
From: bugzilla-daemon @ 2018-08-05 19:59 UTC
  To: dri-devel

https://bugs.freedesktop.org/show_bug.cgi?id=107152

--- Comment #11 from dwagner <jb5sgc1n.nya@20mm.eu> ---
I did some additional experiments to understand what is so special about the
"Othan" video that playing it causes amdgpu to crash relatively fast.

Since its only "odd" parameter is its 6 fps frame rate, I tried replaying
other videos, first at their normal rate (e.g. 24 fps), which did not cause
quick crashes, and then at an artificially lowered rate; indeed, that causes
fast crashing regardless of which video I play.

The frame rate that caused the quickest crashes seemed to be 3 fps; running
> mpv --no-correct-pts --fps=3 --ao=null some_arbitrary_video.webm
usually crashed amd-staging-drm-next within a minute for me.
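The low-frame-rate reproducer above can be wrapped in a loop so that it runs
repeatedly. This is a sketch only; run_low_fps_test is a hypothetical helper,
not the reporter's othan_test.sh. It relies on mpv's --fps option, which
overrides the frame rate only in --no-correct-pts mode, and keeps --ao=null so
no audio output is involved. If the GPU hang takes the machine down, the loop
never returns; the per-run message on stderr shows how far it got.

```bash
#!/bin/bash
# run_low_fps_test VIDEO [FPS] [MAX_RUNS]: play VIDEO with mpv at an
# artificially low frame rate up to MAX_RUNS times (defaults: 3 fps, 10 runs)
# and print how many playbacks completed successfully. Hypothetical helper.
run_low_fps_test() {
    local video=$1 fps=${2:-3} max_runs=${3:-10} run
    for (( run = 0; run < max_runs; run++ )); do
        echo "starting run $((run + 1)) at $fps fps" >&2
        # --fps only takes effect with --no-correct-pts;
        # --ao=null disables audio output, as in the command above.
        mpv --no-correct-pts --fps="$fps" --ao=null "$video" || break
    done
    echo "$run"
}
```

For example: run_low_fps_test some_arbitrary_video.webm 3 20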


Just a random thought: could the cause be some timed hysteresis in the GPU's
power management?
Is there a way to lock the GPU to a specific power level, to then check
whether those crashes still occur?

* [Bug 107152] GPU fault detected: 146 / VM_CONTEXT1_PROTECTION_FAULT / ring gfx timeout
From: bugzilla-daemon @ 2018-08-08 23:13 UTC
  To: dri-devel

https://bugs.freedesktop.org/show_bug.cgi?id=107152

--- Comment #12 from dwagner <jb5sgc1n.nya@20mm.eu> ---
Indeed, I found my theory confirmed by many experiments: If I use a script like
> #!/bin/bash
> cd /sys/class/drm/card0/device
> echo manual >power_dpm_force_performance_level
> # low
> echo 0 >pp_dpm_mclk 
> echo 0 >pp_dpm_sclk
> # medium
> #echo 1 >pp_dpm_mclk 
> #echo 1 >pp_dpm_sclk
> # high
> #echo 1 >pp_dpm_mclk 
> #echo 6 >pp_dpm_sclk
to force any one fixed performance level, the crashes no longer occur,
including in the "low frame rate video test".

So it seems that the transition from one "dpm" performance level to another,
with a certain probability, causes these crashes. And the more often the
transitions occur, the sooner one will experience them.
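To confirm that a forced level is really in effect, or to see how often the
level changes during the low-frame-rate test, the active level can be read
back from the same sysfs files: amdgpu lists one level per line in the form
"index: frequency" and marks the active one with "*". The sketch below assumes
that format; current_dpm_level and count_dpm_transitions are hypothetical
helpers. Writing auto to power_dpm_force_performance_level returns the GPU to
automatic level selection.

```bash
#!/bin/bash
# current_dpm_level FILE: print the index of the active level in a
# pp_dpm_sclk / pp_dpm_mclk style file, i.e. the line marked with '*'.
current_dpm_level() {
    awk -F: '/\*/ { print $1; exit }' "$1"
}

# count_dpm_transitions FILE SAMPLES INTERVAL: read FILE SAMPLES times,
# INTERVAL seconds apart, and print how often the active level changed.
count_dpm_transitions() {
    local file=$1 samples=$2 interval=$3
    local prev="" cur changes=0 i
    for (( i = 0; i < samples; i++ )); do
        cur=$(current_dpm_level "$file")
        if [[ -n $prev && $cur != "$prev" ]]; then
            changes=$((changes + 1))
        fi
        prev=$cur
        sleep "$interval"
    done
    echo "$changes"
}
```

For example, count_dpm_transitions /sys/class/drm/card0/device/pp_dpm_sclk
600 0.1 samples the shader clock level for about a minute.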

The dynamic power management issue can now be pursued in the original bug
report https://bugs.freedesktop.org/show_bug.cgi?id=102322 for the
vm_update_mode=0 case; there is probably not much sense in keeping this bug
report open just because errors also occur with vm_update_mode=3, just less
often.

* [Bug 107152] GPU fault detected: 146 / VM_CONTEXT1_PROTECTION_FAULT / ring gfx timeout
From: bugzilla-daemon @ 2018-08-09 16:25 UTC
  To: dri-devel

https://bugs.freedesktop.org/show_bug.cgi?id=107152

--- Comment #13 from Andrey Grodzovsky <andrey.grodzovsky@amd.com> ---
(In reply to dwagner from comment #12)
> Indeed, I found my theory confirmed by many experiments: If I use a script
> like
> > #!/bin/bash
> > cd /sys/class/drm/card0/device
> > echo manual >power_dpm_force_performance_level
> > # low
> > echo 0 >pp_dpm_mclk 
> > echo 0 >pp_dpm_sclk
> > # medium
> > #echo 1 >pp_dpm_mclk 
> > #echo 1 >pp_dpm_sclk
> > # high
> > #echo 1 >pp_dpm_mclk 
> > #echo 6 >pp_dpm_sclk
> to enforce just any performance level, then the crashes do not occur anymore
> - also with the "low frame rate video test".
> 
> So it seems that the transition from one "dpm" performance level to another,
> with a certain probability, causes these crashes. And the more often the
> transitions occur, the sooner one will experience them.
> 
> The dynamic power management issue can now be pursued with the original bug
> report https://bugs.freedesktop.org/show_bug.cgi?id=102322 for the
> vm_update_mode=0 case - there is probably not much sense in keeping this bug
> report open just because errors also occur with wm_update_mode=3, just less
> often.

Agreed.

* [Bug 107152] GPU fault detected: 146 / VM_CONTEXT1_PROTECTION_FAULT / ring gfx timeout
From: bugzilla-daemon @ 2018-08-09 20:56 UTC
  To: dri-devel

https://bugs.freedesktop.org/show_bug.cgi?id=107152

dwagner <jb5sgc1n.nya@20mm.eu> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |DUPLICATE
             Status|NEW                         |RESOLVED

--- Comment #14 from dwagner <jb5sgc1n.nya@20mm.eu> ---


*** This bug has been marked as a duplicate of bug 102322 ***

* [Bug 107152] GPU fault detected: 146 / VM_CONTEXT1_PROTECTION_FAULT / ring gfx timeout
From: bugzilla-daemon @ 2019-01-24  6:45 UTC
  To: dri-devel

https://bugs.freedesktop.org/show_bug.cgi?id=107152

--- Comment #15 from Ida Wallace <idawallace89@gmail.com> ---
Thanks for letting us know about the duplicate bug of GPU fault and System
crashes, so solution seekers can refer both references to understand the bug
and try to solve it easily.

Ida,
http://www.assignmenthelpfolks.com/
