* [Bug 107154] [drm] GPU recovery disabled.
@ 2018-07-08  9:24 bugzilla-daemon
  2018-07-08 19:00 ` bugzilla-daemon
                   ` (10 more replies)
  0 siblings, 11 replies; 12+ messages in thread
From: bugzilla-daemon @ 2018-07-08  9:24 UTC (permalink / raw)
  To: dri-devel


https://bugs.freedesktop.org/show_bug.cgi?id=107154

            Bug ID: 107154
           Summary: [drm] GPU recovery disabled.
           Product: DRI
           Version: unspecified
          Hardware: x86-64 (AMD64)
                OS: Linux (All)
            Status: NEW
          Severity: normal
          Priority: medium
         Component: DRM/AMDgpu
          Assignee: dri-devel@lists.freedesktop.org
          Reporter: freedesktop.org@nentwig.biz

Hi!

This is a surprisingly long-standing problem with an RX 460: it has been present
since 4.15 all the way up to 4.18 AMD staging DRM next [1].
After resuming from sleep (echo -n mem > /sys/power/state), amdgpu is dead
(always, reliably).
Here's what dmesg has to say about it:

[Sun Jul  8 11:01:17 2018] PM: suspend exit
[Sun Jul  8 11:01:19 2018] [drm:gfx_v8_0_ring_test_ib [amdgpu]] *ERROR* amdgpu: IB test timed out.
[Sun Jul  8 11:01:19 2018] [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* amdgpu: failed testing IB on GFX ring (-110).
[Sun Jul  8 11:01:19 2018] [drm:process_one_work] *ERROR* ib ring test failed (-110).
[Sun Jul  8 11:01:28 2018] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=864, last emitted seq=868
[Sun Jul  8 11:01:28 2018] [drm] GPU recovery disabled.
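
For reference when reading such logs: -110 is -ETIMEDOUT on Linux, and the gap between the two sequence numbers in the timeout line is the number of fences that were emitted but never signaled. A minimal Python sketch of that arithmetic follows; the helper name pending_fences is hypothetical, and the regex assumes the message format quoted above.

```python
import re

# Matches the amdgpu_job_timedout message format quoted above (assumed stable).
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout, last signaled seq=(?P<signaled>\d+), "
    r"last emitted seq=(?P<emitted>\d+)"
)

def pending_fences(line):
    """Return (ring, emitted - signaled) for a timeout line, else None."""
    m = TIMEOUT_RE.search(line)
    if m is None:
        return None
    return m.group("ring"), int(m.group("emitted")) - int(m.group("signaled"))

line = ("[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, "
        "last signaled seq=864, last emitted seq=868")
print(pending_fences(line))
```

For the line above this reports four outstanding fences on the gfx ring.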

From earlier versions:

[   42.802559] PM: suspend exit
[   42.824332] amdgpu 0000:41:00.0: GPU fault detected: 147 0x0bd84802
[   42.824338] amdgpu 0000:41:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0034F97B
[   42.824341] amdgpu 0000:41:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0C048002
[   42.824345] amdgpu 0000:41:00.0: VM fault (0x02, vmid 6) at page 3471739, read from 'TC0' (0x54433000) (72)
[   52.956306] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=1287, last emitted seq=1289
[   52.956316] [drm] IP block:gfx_v8_0 is hung!
[   52.956362] [drm] GPU recovery disabled.
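
The numbers in the VM fault lines are consistent with each other, which a short Python sketch can check. The fault address register is the page number printed in decimal, and the client tag is ASCII packed into a 32-bit word. The bit positions used for the status register below are an assumption (chosen because they reproduce the values the driver printed), not something stated in this report.

```python
FAULT_ADDR = 0x0034F97B
FAULT_STATUS = 0x0C048002
CLIENT_TAG = 0x54433000

def decode_status(status):
    # Assumed field layout; it reproduces "0x02, vmid 6 ... read ... (72)".
    return {
        "protections": status & 0xFF,
        "client_id": (status >> 12) & 0xFF,
        "is_write": bool((status >> 24) & 0x1),
        "vmid": (status >> 25) & 0xF,
    }

def decode_tag(tag):
    # Big-endian bytes, trailing NUL padding stripped.
    return tag.to_bytes(4, "big").rstrip(b"\x00").decode("ascii")

print(FAULT_ADDR)                   # page number in decimal
print(decode_status(FAULT_STATUS))  # protections, client id, r/w, vmid
print(decode_tag(CLIENT_TAG))       # memory client name
```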

I've also seen fault 146 but other than that it mostly looks the same. 4.14-lts
(with dc=0) works fine.

RX 460, Zenith Extreme, 1950x.

[1] Arch Linux AUR; this versioning is a bit confusing, it may actually already
be the 4.19 branch. Latest commit is 3838e387fd1eb17bfcf6ff7d443d931adb5cb41b.

-- 
You are receiving this mail because:
You are the assignee for the bug.

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug 107154] [drm] GPU recovery disabled.
  2018-07-08  9:24 [Bug 107154] [drm] GPU recovery disabled bugzilla-daemon
@ 2018-07-08 19:00 ` bugzilla-daemon
  2018-07-08 20:03 ` bugzilla-daemon
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: bugzilla-daemon @ 2018-07-08 19:00 UTC (permalink / raw)
  To: dri-devel


https://bugs.freedesktop.org/show_bug.cgi?id=107154

--- Comment #1 from dwagner <jb5sgc1n.nya@20mm.eu> ---
Indeed, crashes upon S3 resumes have been abundant with amdgpu.dc=1 for many
months now, and seemingly for more than one reason.

One bug, which I reported in August 2017 in
https://bugs.freedesktop.org/show_bug.cgi?id=102323, was fixed quickly.

The next S3 resume crash, which I reported in October 2017 in
https://bugs.freedesktop.org/show_bug.cgi?id=103277, stayed without
any resolution until April 2018, and the fix found in that report only works if
no "drm.edid_firmware=..." kernel command line option is used.

I reported another crash bug with S3 resumes for 4.17.2 kernels in
https://bugs.freedesktop.org/show_bug.cgi?id=107065 and then realized that 4.18
pre-releases exhibit the very same kind of crash immediately upon starting X11.
For this crash upon X11 startup, there is a patch in the bug report, but it
does not prevent the S3 resume crash.

I currently work around S3 resume crashes by switching to the console display
before entering S3 sleep - but this is really an awkward work-around.


* [Bug 107154] [drm] GPU recovery disabled.
  2018-07-08  9:24 [Bug 107154] [drm] GPU recovery disabled bugzilla-daemon
  2018-07-08 19:00 ` bugzilla-daemon
@ 2018-07-08 20:03 ` bugzilla-daemon
  2018-07-09  8:53 ` bugzilla-daemon
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: bugzilla-daemon @ 2018-07-08 20:03 UTC (permalink / raw)
  To: dri-devel


https://bugs.freedesktop.org/show_bug.cgi?id=107154

--- Comment #2 from freedesktop.org@nentwig.biz ---
(In reply to dwagner from comment #1)
> I currently work around S3 resume crashes by switching to the console
> display before entering S3 sleep - but this is really an awkward work-around.

Oh, that doesn't help either. It crashes the very moment I switch back to X.

And what's more, starting with 4.15, amdgpu.dc=0 doesn't appear to make any
difference.


* [Bug 107154] [drm] GPU recovery disabled.
  2018-07-08  9:24 [Bug 107154] [drm] GPU recovery disabled bugzilla-daemon
  2018-07-08 19:00 ` bugzilla-daemon
  2018-07-08 20:03 ` bugzilla-daemon
@ 2018-07-09  8:53 ` bugzilla-daemon
  2018-07-09 11:31 ` bugzilla-daemon
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: bugzilla-daemon @ 2018-07-09  8:53 UTC (permalink / raw)
  To: dri-devel


https://bugs.freedesktop.org/show_bug.cgi?id=107154

--- Comment #3 from Michel Dänzer <michel@daenzer.net> ---
Please attach the full dmesg output.

Can you bisect between 4.14 and 4.15?
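
Bisecting means repeatedly building and testing the kernel at the midpoint between a known-good and a known-bad commit, which needs roughly log2(N) test boots for N commits. The Python sketch below only illustrates that search; the commit list and the is_bad callback are hypothetical stand-ins for real kernel builds and suspend/resume tests, and in practice git bisect does this bookkeeping.

```python
def first_bad(commits, is_bad):
    """Return (index of first bad commit, number of tests run).

    Assumes commits[0] is good, commits[-1] is bad, and that once a
    commit is bad all later ones are bad too.
    """
    good, bad, tests = 0, len(commits) - 1, 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tests += 1
        if is_bad(commits[mid]):
            bad = mid
        else:
            good = mid
    return bad, tests

# Hypothetical history: 1000 commits, regression introduced at index 637.
commits = list(range(1000))
print(first_bad(commits, lambda c: c >= 637))
```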


* [Bug 107154] [drm] GPU recovery disabled.
  2018-07-08  9:24 [Bug 107154] [drm] GPU recovery disabled bugzilla-daemon
                   ` (2 preceding siblings ...)
  2018-07-09  8:53 ` bugzilla-daemon
@ 2018-07-09 11:31 ` bugzilla-daemon
  2018-07-09 16:03 ` bugzilla-daemon
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: bugzilla-daemon @ 2018-07-09 11:31 UTC (permalink / raw)
  To: dri-devel


https://bugs.freedesktop.org/show_bug.cgi?id=107154

--- Comment #4 from Christian König <ckoenig.leichtzumerken@gmail.com> ---
Do you have a full dmesg?


* [Bug 107154] [drm] GPU recovery disabled.
  2018-07-08  9:24 [Bug 107154] [drm] GPU recovery disabled bugzilla-daemon
                   ` (3 preceding siblings ...)
  2018-07-09 11:31 ` bugzilla-daemon
@ 2018-07-09 16:03 ` bugzilla-daemon
  2018-07-09 16:04 ` bugzilla-daemon
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: bugzilla-daemon @ 2018-07-09 16:03 UTC (permalink / raw)
  To: dri-devel


https://bugs.freedesktop.org/show_bug.cgi?id=107154

--- Comment #5 from freedesktop.org@nentwig.biz ---
Created attachment 140525
  --> https://bugs.freedesktop.org/attachment.cgi?id=140525&action=edit
dmesg amdgpu.dc=1

Booted with amdgpu.dc=1.


* [Bug 107154] [drm] GPU recovery disabled.
  2018-07-08  9:24 [Bug 107154] [drm] GPU recovery disabled bugzilla-daemon
                   ` (4 preceding siblings ...)
  2018-07-09 16:03 ` bugzilla-daemon
@ 2018-07-09 16:04 ` bugzilla-daemon
  2018-07-09 16:13 ` bugzilla-daemon
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: bugzilla-daemon @ 2018-07-09 16:04 UTC (permalink / raw)
  To: dri-devel


https://bugs.freedesktop.org/show_bug.cgi?id=107154

--- Comment #6 from freedesktop.org@nentwig.biz ---
Created attachment 140526
  --> https://bugs.freedesktop.org/attachment.cgi?id=140526&action=edit
dmesg /etc/modprobe.d/

Booted with amdgpu.dc=1 in /etc/modprobe.d/


* [Bug 107154] [drm] GPU recovery disabled.
  2018-07-08  9:24 [Bug 107154] [drm] GPU recovery disabled bugzilla-daemon
                   ` (5 preceding siblings ...)
  2018-07-09 16:04 ` bugzilla-daemon
@ 2018-07-09 16:13 ` bugzilla-daemon
  2018-07-09 16:29 ` bugzilla-daemon
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: bugzilla-daemon @ 2018-07-09 16:13 UTC (permalink / raw)
  To: dri-devel


https://bugs.freedesktop.org/show_bug.cgi?id=107154

--- Comment #7 from freedesktop.org@nentwig.biz ---
Sure, attached. AMD staging kernel. I don't know how to tell whether DC=1 is
really enabled, so I did two runs: one with amdgpu.dc=1 as boot parameter and
one with /etc/modprobe.d/ on top of that.
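
One way to check what the driver was given is to read the parameter back from sysfs, where loaded modules expose their readable parameters under /sys/module/<module>/parameters/. A minimal Python sketch, assuming the amdgpu dc parameter is exported there as readable; the helper name and the root argument are hypothetical, the latter only so the function can be exercised against a scratch directory. Note that this shows the requested value, not necessarily whether DC ended up driving the display.

```python
from pathlib import Path

def read_module_param(module, param, root="/sys/module"):
    """Return the parameter value as a string, or None if not exposed."""
    path = Path(root) / module / "parameters" / param
    try:
        return path.read_text().strip()
    except OSError:
        # Module not loaded, parameter not exported, or not readable.
        return None

print(read_module_param("amdgpu", "dc"))
```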

Procedure was the same both times:
- boot
- X login
- switch to console
- sleep, wakeup
- switch to X

The drm/amdgpu lines appear already in the console right after waking up, prior
to switching to X.

This time "only" X crashed (could still move the pointer); at times the
complete machine is dead, no switching to console and no SSH.

(As a side note: is it normal that waking up on Ryzen takes something on the
order of 10-30s? I'm used to split-second wakeups on Intel.)

HTH


* [Bug 107154] [drm] GPU recovery disabled.
  2018-07-08  9:24 [Bug 107154] [drm] GPU recovery disabled bugzilla-daemon
                   ` (6 preceding siblings ...)
  2018-07-09 16:13 ` bugzilla-daemon
@ 2018-07-09 16:29 ` bugzilla-daemon
  2018-07-10  7:04 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: bugzilla-daemon @ 2018-07-09 16:29 UTC (permalink / raw)
  To: dri-devel


https://bugs.freedesktop.org/show_bug.cgi?id=107154

--- Comment #8 from freedesktop.org@nentwig.biz ---
Created attachment 140528
  --> https://bugs.freedesktop.org/attachment.cgi?id=140528&action=edit
dmesg 4.14 LTS

Sorry, forgot about the requested 4.14 dmesg log. Attached as well.

This is: boot, login (to KDE this time), do stuff, remember, sleep, wakeup.


* [Bug 107154] [drm] GPU recovery disabled.
  2018-07-08  9:24 [Bug 107154] [drm] GPU recovery disabled bugzilla-daemon
                   ` (7 preceding siblings ...)
  2018-07-09 16:29 ` bugzilla-daemon
@ 2018-07-10  7:04 ` bugzilla-daemon
  2018-09-02 10:26 ` bugzilla-daemon
  2018-09-11 13:38 ` bugzilla-daemon
  10 siblings, 0 replies; 12+ messages in thread
From: bugzilla-daemon @ 2018-07-10  7:04 UTC (permalink / raw)
  To: dri-devel


https://bugs.freedesktop.org/show_bug.cgi?id=107154

Christian König <ckoenig.leichtzumerken@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #9 from Christian König <ckoenig.leichtzumerken@gmail.com> ---
Yeah, that is a known problem in the PCI subsystem. Will be fixed with 4.19 and
then backported to older kernels.


* [Bug 107154] [drm] GPU recovery disabled.
  2018-07-08  9:24 [Bug 107154] [drm] GPU recovery disabled bugzilla-daemon
                   ` (8 preceding siblings ...)
  2018-07-10  7:04 ` bugzilla-daemon
@ 2018-09-02 10:26 ` bugzilla-daemon
  2018-09-11 13:38 ` bugzilla-daemon
  10 siblings, 0 replies; 12+ messages in thread
From: bugzilla-daemon @ 2018-09-02 10:26 UTC (permalink / raw)
  To: dri-devel


https://bugs.freedesktop.org/show_bug.cgi?id=107154

--- Comment #10 from freedesktop.org@nentwig.biz ---
So, there's 4.19rc1-amd-next \o/

echo: write error: Device or resource busy

This started to happen with 4.18. dmesg:

[  171.245467] Freezing of tasks failed after 20.006 seconds (1 tasks refusing to freeze, wq_busy=0):
[  171.245484] systemd-udevd   D    0   700    615 0x80000124

So, is this something to report to fricking systemd?


Gee, really...?!


* [Bug 107154] [drm] GPU recovery disabled.
  2018-07-08  9:24 [Bug 107154] [drm] GPU recovery disabled bugzilla-daemon
                   ` (9 preceding siblings ...)
  2018-09-02 10:26 ` bugzilla-daemon
@ 2018-09-11 13:38 ` bugzilla-daemon
  10 siblings, 0 replies; 12+ messages in thread
From: bugzilla-daemon @ 2018-09-11 13:38 UTC (permalink / raw)
  To: dri-devel


https://bugs.freedesktop.org/show_bug.cgi?id=107154

--- Comment #11 from kyle.devir@mykolab.com ---
> systemd-udevd

This is not systemd's fault, but indicative of something hanging in kernel
land, which udevd ends up being blocked on.

I experienced this a few major kernel releases ago; it was resolved by the
next major version. I never did figure out what caused udevd to block... :/
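
The "D" after the task name in the freezer message is the uninterruptible-sleep state: the task is waiting inside the kernel and cannot be frozen until that wait completes. A minimal Python sketch for listing such tasks from /proc follows; the helper names are hypothetical, and the parsing relies on the /proc/<pid>/stat layout of "pid (comm) state ...".

```python
import os

def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm may itself contain spaces or parentheses, so split on the
    # first "(" and the last ")".
    lpar, rpar = text.index("("), text.rindex(")")
    pid = int(text[:lpar])
    comm = text[lpar + 1:rpar]
    state = text[rpar + 1:].split()[0]
    return pid, comm, state

def blocked_tasks(proc="/proc"):
    """Yield (pid, comm) for processes currently in state D."""
    if not os.path.isdir(proc):
        return
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or entry unreadable
        if state == "D":
            yield pid, comm

for pid, comm in blocked_tasks():
    print(pid, comm)
```

A task that stays in this list across several runs is a candidate for whatever the freezer is waiting on; where it is stuck inside the kernel has to be found by other means, for example a SysRq "w" dump of blocked tasks.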

