* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
@ 2019-05-14  5:55 bugzilla-daemon
  2019-05-14  5:55 ` bugzilla-daemon
                   ` (177 more replies)
  0 siblings, 178 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-05-14  5:55 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 908 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

            Bug ID: 110674
           Summary: Crashes / Resets From AMDGPU / Radeon VII
           Product: DRI
           Version: unspecified
          Hardware: x86-64 (AMD64)
                OS: Linux (All)
            Status: NEW
          Severity: major
          Priority: medium
         Component: DRM/AMDgpu
          Assignee: dri-devel@lists.freedesktop.org
          Reporter: chris@hodapp.email

Created attachment 144254
  --> https://bugs.freedesktop.org/attachment.cgi?id=144254&action=edit
Kernel Log

I'm getting frequent crashes and resets. They seem to occur most often right
after boot, right after login, and right after wake from standby.

See the attachments for more (recommend `less --raw` if working with the color
dmesg).

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 2309 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 179+ messages in thread

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
From: bugzilla-daemon @ 2019-05-14  5:55 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 588 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

Chris Hodapp <chris@hodapp.email> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #144254|0                           |1
        is obsolete|                            |

--- Comment #1 from Chris Hodapp <chris@hodapp.email> ---
Created attachment 144255
  --> https://bugs.freedesktop.org/attachment.cgi?id=144255&action=edit
dmesg.log

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
From: bugzilla-daemon @ 2019-05-14  5:56 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 309 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #2 from Chris Hodapp <chris@hodapp.email> ---
Created attachment 144256
  --> https://bugs.freedesktop.org/attachment.cgi?id=144256&action=edit
dmesg.color.log

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
From: bugzilla-daemon @ 2019-05-14  5:56 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 321 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #3 from Chris Hodapp <chris@hodapp.email> ---
Created attachment 144257
  --> https://bugs.freedesktop.org/attachment.cgi?id=144257&action=edit
display-manager.service.log

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
From: bugzilla-daemon @ 2019-05-14  9:04 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 427 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

Michel Dänzer <michel@daenzer.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #144255|text/x-log                  |text/plain
          mime type|                            |

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
From: bugzilla-daemon @ 2019-05-14  9:05 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 427 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

Michel Dänzer <michel@daenzer.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #144257|text/x-log                  |text/plain
          mime type|                            |

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
From: bugzilla-daemon @ 2019-05-14  9:20 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 425 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

Chris Hodapp <chris@hodapp.email> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #144256|text/x-log                  |text/plain
          mime type|                            |

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
From: bugzilla-daemon @ 2019-05-14  9:34 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 435 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #4 from Chris Hodapp <chris@hodapp.email> ---
Created attachment 144261
  --> https://bugs.freedesktop.org/attachment.cgi?id=144261&action=edit
display-manager.service.lastboot.log

Adding a copy of the display-manager.service log, filtered down to just the
content since the last boot.

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
From: bugzilla-daemon @ 2019-05-14 15:32 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 265 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #5 from Alex Deucher <alexdeucher@gmail.com> ---
Does appending idle=nomwait to the kernel command line in grub help?

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
From: bugzilla-daemon @ 2019-05-15  2:15 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 480 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #6 from Chris Hodapp <chris@hodapp.email> ---
I use systemd-boot but I doubt that matters very much here.

I tried adding idle=nomwait to the kernel command line but it seemed not to
affect the problem (I actually had a crash the very first time I tried adding
it). I'll attach a dmesg log just in case you want to double-check.
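
One way to double-check that the parameter actually reached the running kernel
is to look at /proc/cmdline after boot. A minimal Python sketch, assuming a
Linux system; the helper names and the sample command line are hypothetical and
only there for illustration:

```python
from pathlib import Path


def read_cmdline(path="/proc/cmdline"):
    """Return the command line the running kernel was booted with."""
    return Path(path).read_text().strip()


def has_kernel_param(cmdline, param):
    """True if `param` appears as a whole, whitespace-separated token."""
    return param in cmdline.split()


if __name__ == "__main__":
    # Hypothetical sample; on a real machine pass read_cmdline() instead.
    sample = "root=/dev/nvme0n1p2 rw idle=nomwait"
    print(has_kernel_param(sample, "idle=nomwait"))
```

Matching whole tokens rather than substrings avoids a false positive from a
longer parameter that merely contains the same text.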

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
From: bugzilla-daemon @ 2019-05-15  2:16 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 311 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #7 from Chris Hodapp <chris@hodapp.email> ---
Created attachment 144272
  --> https://bugs.freedesktop.org/attachment.cgi?id=144272&action=edit
dmesg.nomwait.log

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
From: bugzilla-daemon @ 2019-05-15  3:05 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1691 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #8 from Chris Hodapp <chris@hodapp.email> ---
I've actually found another crash which triggers pretty promptly whenever I
play (presumably-accelerated) YouTube videos. I'll attach dmesg and
display-manager.service logs for that crash here but I'm happy to file them as
a separate bug upon request.

I'm also going to go ahead and upload *another* set of logs from an event which
happens from time to time, where I get vaporwave-looking colored blocks mixed
with garbled past images (presumably hanging around in freed
memory). I didn't include this before because I mistakenly thought that the
dmesg output was the same as for the original less-visually-striking lock-ups
that I described up front. However, it's not clear that I was wrong about that.
Anyway, like I said, I'm going to post these logs too and, once again, I'm
happy to move them to a separate bug upon request.


I'll also describe my technique for capturing these logs, in case it matters:
When these crashes happen, my first response is to try and shift to a different
(text-mode) virtual console to capture the logs. When that works, I save off
the logs and then reboot. However, sometimes the crash is so bad that I'm not
able to switch to a text-mode virtual console, in which case I have to hard-cut
the power and capture the logs retroactively with `journalctl` and a negative
boot-number arg. I was able to capture the original log by switching virtual
consoles but both of these new ones were captured after a reboot with
`journalctl`.
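
For anyone reproducing this capture method, the retroactive step can be
sketched roughly as below. This is a minimal Python sketch, not the exact
commands used for the attached logs: the helper name `journal_cmd` is
hypothetical, and it only builds the journalctl argument list (using the
`--boot`, `--dmesg`, `--unit` and `--no-pager` options) without running it.

```python
import shlex


def journal_cmd(boot_offset=-1, unit=None, kernel_only=False):
    """Build a journalctl argv for the current or an earlier boot.

    boot_offset: 0 is the current boot, -1 the previous one, and so on.
    unit:        optional systemd unit to filter on.
    kernel_only: restrict output to kernel messages (dmesg-style).
    """
    if boot_offset > 0:
        # Restriction of this sketch: only "N boots ago" offsets are handled.
        raise ValueError("boot_offset must be 0 or negative")
    argv = ["journalctl", "--no-pager", f"--boot={boot_offset}"]
    if kernel_only:
        argv.append("--dmesg")
    if unit is not None:
        argv.append(f"--unit={unit}")
    return argv


if __name__ == "__main__":
    # Kernel log and display-manager log from the boot that crashed.
    print(shlex.join(journal_cmd(-1, kernel_only=True)))
    print(shlex.join(journal_cmd(-1, unit="display-manager.service")))
```

Earlier boots are only available if the journal is stored persistently;
otherwise the log is gone once the power has been cut.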

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
From: bugzilla-daemon @ 2019-05-15  3:09 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 329 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #9 from Chris Hodapp <chris@hodapp.email> ---
Created attachment 144273
  --> https://bugs.freedesktop.org/attachment.cgi?id=144273&action=edit
display-manager.service.youtube.log

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
From: bugzilla-daemon @ 2019-05-15  3:09 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 312 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #10 from Chris Hodapp <chris@hodapp.email> ---
Created attachment 144274
  --> https://bugs.freedesktop.org/attachment.cgi?id=144274&action=edit
dmesg.youtube.log

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
From: bugzilla-daemon @ 2019-05-15  3:10 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 332 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #11 from Chris Hodapp <chris@hodapp.email> ---
Created attachment 144275
  --> https://bugs.freedesktop.org/attachment.cgi?id=144275&action=edit
display-manager.service.vaporwave.log

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
From: bugzilla-daemon @ 2019-05-15  3:10 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 314 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #12 from Chris Hodapp <chris@hodapp.email> ---
Created attachment 144276
  --> https://bugs.freedesktop.org/attachment.cgi?id=144276&action=edit
dmesg.vaporwave.log

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
From: bugzilla-daemon @ 2019-05-19  9:36 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 453 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #13 from Chris Hodapp <chris@hodapp.email> ---
So! It turns out that things are stable with 5.0.X kernels (despite there still
being some amdgpu errors in the kernel log). It's slow going because the search
space is so big but I'm trying to figure out where in the commit history things
actually broke.

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
From: bugzilla-daemon @ 2019-05-19  9:39 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 424 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #14 from Hameer Abbasi <hameerabbasi@yahoo.com> ---
I have additional information to report: 5.1.3 fixes this somewhat, but not
completely. For example, login is mostly fine, but restarting from the login
screen causes crashes.

I also agree that things were fine on 5.0.x

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
From: bugzilla-daemon @ 2019-05-19 14:27 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1198 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #15 from Tom B <tom@r.je> ---
Have been running 5.0 since release without issue but upgraded this morning and
got crashes as described here within a few seconds of boot. 

5.1.3 also fixed it for me, however I am still seeing powerplay errors in
dmesg:

[    6.198409] amdgpu: [powerplay] Failed to send message 0x28, response 0x0
[    6.198411] amdgpu: [powerplay] [SetUclkToHightestDpmLevel] Set hard min uclk failed!
[    7.396661] amdgpu: [powerplay] Failed to send message 0x28, response 0x0
[    7.396662] amdgpu: [powerplay] Attempt to set Hard Min for DCEFCLK Failed!
[    8.587385] amdgpu: [powerplay] Failed to send message 0x28, response 0x0
[    8.587386] amdgpu: [powerplay] [SetHardMinFreq] Set hard min uclk failed!
[    9.779135] amdgpu: [powerplay] Failed to send message 0x26, response 0x0
[    9.779136] amdgpu: [powerplay] Failed to set soft min gfxclk !
[    9.779136] amdgpu: [powerplay] Failed to upload DPM Bootup Levels!


The GPU seems to be boosting as expected so I don't think there is any major
issue.
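
To compare logs like this across kernel versions, the "Failed to send message"
lines can be tallied per message ID. A minimal Python sketch, assuming the line
format shown above; the function name is hypothetical and the sample is copied
from the excerpt in this comment, not from a full log:

```python
import re
from collections import Counter

# Matches lines such as:
# [    6.198409] amdgpu: [powerplay] Failed to send message 0x28, response 0x0
SEND_FAIL = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+amdgpu: \[powerplay\] "
    r"Failed to send message (?P<msg>0x[0-9a-fA-F]+), "
    r"response (?P<resp>0x[0-9a-fA-F]+)"
)


def failed_messages(dmesg_text):
    """Count powerplay 'Failed to send message' lines per message ID."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = SEND_FAIL.match(line.strip())
        if match:
            counts[match.group("msg")] += 1
    return counts


SAMPLE = """\
[    6.198409] amdgpu: [powerplay] Failed to send message 0x28, response 0x0
[    7.396661] amdgpu: [powerplay] Failed to send message 0x28, response 0x0
[    8.587385] amdgpu: [powerplay] Failed to send message 0x28, response 0x0
[    9.779135] amdgpu: [powerplay] Failed to send message 0x26, response 0x0
[    9.779136] amdgpu: [powerplay] Failed to set soft min gfxclk !
"""

if __name__ == "__main__":
    for msg_id, count in failed_messages(SAMPLE).most_common():
        print(msg_id, count)
```

The sketch only counts occurrences; it makes no claim about what the
individual message IDs mean.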

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
From: bugzilla-daemon @ 2019-05-19 17:52 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 350 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #16 from Chris Hodapp <chris@hodapp.email> ---
Hrm, 5.1.3 does not truly fix things for me. Would you folks mind rebooting a
few times and then maybe playing a couple of YouTube videos and reporting back?

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
From: bugzilla-daemon @ 2019-05-19 20:30 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 492 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #17 from Hameer Abbasi <hameerabbasi@yahoo.com> ---
Hmm. 5.1.3 had issues for me too, on login and when launching Evolution (the
GNOME mail client). Seems the success was intermittent.

One additional piece of information: I have two 1440p 144 Hz Freesync displays
with audio... I'm not sure if anything about that is a contributing factor.

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
From: bugzilla-daemon @ 2019-05-19 20:53 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 475 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #18 from Hameer Abbasi <hameerabbasi@yahoo.com> ---
Hmm, I'm fairly certain at this point that the regression was introduced between
5.0.13 and 5.1.0. Those are the versions available in the Arch repos; I lack the
knowledge to build the kernel myself.

I restarted thrice on 5.0.13, no issue.
Restarted once on 5.1.0, there was an issue.

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
From: bugzilla-daemon @ 2019-05-19 22:04 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 510 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #19 from Tom B <tom@r.je> ---
I've just resumed from suspend (5.1.3). Had complete graphical corruption and a
frozen system. I couldn't switch TTY and had to do a hard reset.

The first two reboots froze; the third is working fine. YouTube was fine for my
30-second test, as was running unigine-heaven to put the GPU under load.


I'll attach my journal from after suspend.

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
From: bugzilla-daemon @ 2019-05-19 22:05 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 352 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #20 from Tom B <tom@r.je> ---
Created attachment 144303
  --> https://bugs.freedesktop.org/attachment.cgi?id=144303&action=edit
5.1.3 crash after resume

Journal output from suspend to crash on resume


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (22 preceding siblings ...)
  2019-05-19 22:05 ` bugzilla-daemon
@ 2019-05-19 22:14 ` bugzilla-daemon
  2019-05-19 22:19 ` bugzilla-daemon
                   ` (153 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-05-19 22:14 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 523 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #21 from Tom B <tom@r.je> ---
Ignore my last post, I just tried unigine-heaven again and it crashed
instantly. I don't know why it worked once. 

It took me 5 hard resets to be able to log in. It seems like if it lasts long
enough to log in then it's fine until the GPU is intermittently under load.

As such, it's probably worth mentioning that I'm using SDDM and KDE.


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (23 preceding siblings ...)
  2019-05-19 22:14 ` bugzilla-daemon
@ 2019-05-19 22:19 ` bugzilla-daemon
  2019-05-19 22:28 ` bugzilla-daemon
                   ` (152 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-05-19 22:19 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 701 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #22 from Tom B <tom@r.je> ---
I was just able to run heaven again without issue by setting high performance
mode.

Can anyone still get a crash after running this?

# echo high > /sys/class/drm/card0/device/power_dpm_force_performance_level
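For anyone repeating this test, here is a minimal sketch of the same idea as a small script. The helper names and the PERF_FILE override are made up for illustration; the sysfs path is the one from the command above, and writing to it needs root.

```shell
#!/bin/sh
# Sketch only: show_perf_level and force_perf_level are hypothetical helpers.
# PERF_FILE can be overridden, e.g. for card1 or for a dry run on a plain file.
PERF_FILE="${PERF_FILE:-/sys/class/drm/card0/device/power_dpm_force_performance_level}"

# Print the level currently in force.
show_perf_level() {
    cat "$PERF_FILE"
}

# Force a level; only the three basic levels are accepted in this sketch.
force_perf_level() {
    case "$1" in
        auto|low|high) ;;
        *) echo "unsupported level: $1" >&2; return 1 ;;
    esac
    printf '%s\n' "$1" > "$PERF_FILE"
}
```

Something like `force_perf_level high` before the test and `force_perf_level auto` afterwards should hand control back to the driver.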


> One additional piece of information: I have two 1440p 144 Hz Freesync displays with audio... I'm not sure if anything about that is a contributing factor.

I am running two 4k 60hz monitors without Freesync. The only common factor
there is that we are both running two displays. Both of mine are DisplayPort.


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (24 preceding siblings ...)
  2019-05-19 22:19 ` bugzilla-daemon
@ 2019-05-19 22:28 ` bugzilla-daemon
  2019-05-19 22:37 ` bugzilla-daemon
                   ` (151 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-05-19 22:28 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 989 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #23 from Chris Hodapp <chris@hodapp.email> ---
I've been building kernels like a fiend the last couple of days. Making matters
more difficult is the fact that most of the intermediate commits produce
kernels that get stuck waiting for devices to come up (something that went in
during the run-up to 5.1 disagrees with my machine but then got fixed for the
actual 5.1 release). No conclusive results yet, except that both 5.0 and 5.0.17
work (so I'm assuming the whole 5.0.X series works) but nothing in the 5.1
series has truly been stable for me.

It's interesting that you both have two monitors. I am also trying to run two
monitors over DisplayPort. For what it's worth, I'm also using KDE with SDDM,
though if KDE or SDDM is making the GPU reset then that is on the graphics
driver, not the userspace programs that are trying to use it.


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (25 preceding siblings ...)
  2019-05-19 22:28 ` bugzilla-daemon
@ 2019-05-19 22:37 ` bugzilla-daemon
  2019-05-19 23:02 ` bugzilla-daemon
                   ` (150 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-05-19 22:37 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 823 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #24 from Hameer Abbasi <hameerabbasi@yahoo.com> ---
> I restarted thrice on 5.0.13, no issue.

Scratch that... I get minor corruption even on 5.0.13 on accelerated video
playback during high CPU usage (such as compilation). The videos freeze, glitch
out, and sometimes there's graphical corruption.

> I am running two 4k 60hz monitors without Freesync. The only common factor there is that we are both running two displays. Both of mine are DisplayPort.

Not the only thing... We're both doing similar amounts of pixels per second, in
a sense, more or less. Mine are DisplayPort too, so that's also common for us.

I'm on GNOME desktop and the entire GNOME suite.


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (26 preceding siblings ...)
  2019-05-19 22:37 ` bugzilla-daemon
@ 2019-05-19 23:02 ` bugzilla-daemon
  2019-05-19 23:05 ` bugzilla-daemon
                   ` (149 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-05-19 23:02 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 2614 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #25 from Tom B <tom@r.je> ---
On 5.1.3 (and presumably all 5.1 kernels) I am seeing a strange power profile.

Can everyone else run sensors (after running sensors-detect if the amdgpu
device isn't showing)?

I'm seeing this:

amdgpu-pci-4400
Adapter: PCI adapter
vddgfx:       +1.11 V  
fan1:           0 RPM  (min =    0 RPM, max = 3850 RPM)
temp1:        +33.0°C  (crit = +118.0°C, hyst = -273.1°C)
power1:      135.00 W  (cap = 250.00 W)


Even at idle my GPU is running at 1100 mV (my default base voltage) and
constantly drawing 135 W.
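To make the numbers easier to compare between machines, a rough sketch that pulls only the amdgpu power reading out of the sensors output. The function name is made up, and it assumes the chip block starts with a line beginning "amdgpu-" as in the output above.

```shell
#!/bin/sh
# Sketch only: gpu_power_watts is a hypothetical helper name.
# Reads `sensors` output on stdin and prints the power1 value (in watts)
# from every chip block whose header line starts with "amdgpu-".
gpu_power_watts() {
    awk '/^amdgpu-/ { gpu = 1; next }
         /^$/       { gpu = 0 }
         gpu && $1 == "power1:" { print $2 }'
}

# Example (needs lm-sensors installed):
#   sensors | gpu_power_watts
```

Posting that one number at idle and under load would probably be enough to tell whether a card is stuck in the same state.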


My output of cat /sys/kernel/debug/dri/0/amdgpu_pm_info shows the same thing:

Clock Gating Flags Mask: 0x36974f
        Graphics Medium Grain Clock Gating: On
        Graphics Medium Grain memory Light Sleep: On
        Graphics Coarse Grain Clock Gating: On
        Graphics Coarse Grain memory Light Sleep: On
        Graphics Coarse Grain Tree Shader Clock Gating: Off
        Graphics Coarse Grain Tree Shader Light Sleep: Off
        Graphics Command Processor Light Sleep: On
        Graphics Run List Controller Light Sleep: Off
        Graphics 3D Coarse Grain Clock Gating: On
        Graphics 3D Coarse Grain memory Light Sleep: On
        Memory Controller Light Sleep: On
        Memory Controller Medium Grain Clock Gating: On
        System Direct Memory Access Light Sleep: On
        System Direct Memory Access Medium Grain Clock Gating: Off
        Bus Interface Medium Grain Clock Gating: Off
        Bus Interface Light Sleep: On
        Unified Video Decoder Medium Grain Clock Gating: Off
        Video Compression Engine Medium Grain Clock Gating: Off
        Host Data Path Light Sleep: On
        Host Data Path Medium Grain Clock Gating: Off
        Digital Right Management Medium Grain Clock Gating: Off
        Digital Right Management Light Sleep: On
        Rom Medium Grain Clock Gating: On
        Data Fabric Medium Grain Clock Gating: Off

GFX Clocks and Power:
        351 MHz (MCLK)
        0 MHz (SCLK)
        1373 MHz (PSTATE_SCLK)
        1001 MHz (PSTATE_MCLK)
        1106 mV (VDDGFX)
        135.0 W (average GPU)

GPU Temperature: 33 C
GPU Load: 0 %

SMC Feature Mask: 0x0000000000c0c002
UVD: Disabled

VCE: Disabled


It's locked at 135 W and 1106 mV. Are you guys seeing something similar?
Apologies for the multiple posts, but I'll post again in a second after running
unigine to see if it tries to boost before it crashes.


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (27 preceding siblings ...)
  2019-05-19 23:02 ` bugzilla-daemon
@ 2019-05-19 23:05 ` bugzilla-daemon
  2019-05-19 23:18 ` bugzilla-daemon
                   ` (148 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-05-19 23:05 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 679 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #26 from Tom B <tom@r.je> ---
OK, running unigine-heaven and watching /sys/kernel/debug/dri/0/amdgpu_pm_info,
the wattage and voltage never change. It also never boosts to 1800 MHz as it
should and sticks at 1373 MHz.

I should have mentioned in my last post that previously the card went down to
about 23w when idle.

I'm guessing the crash occurs when the GPU needs more than the 135w that it's
getting. 
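For watching those values during a benchmark, a rough sketch that samples the file at an interval and prints one line per sample. The helper names and the PM_INFO override are made up; the field layout is taken from the dump posted earlier, and reading debugfs needs root.

```shell
#!/bin/sh
# Sketch only: pm_sample and pm_watch are hypothetical helper names.
PM_INFO="${PM_INFO:-/sys/kernel/debug/dri/0/amdgpu_pm_info}"

# Print shader clock, core voltage and average power from one dump on stdin.
pm_sample() {
    awk '$3 == "(SCLK)"   { sclk = $1 }
         $3 == "(VDDGFX)" { mv = $1 }
         $3 == "(average" { w = $1 }
         END { printf "sclk=%sMHz vddgfx=%smV power=%sW\n", sclk, mv, w }'
}

# Take COUNT samples, INTERVAL seconds apart (defaults: 10 samples, 1 second).
pm_watch() {
    count="${1:-10}"
    interval="${2:-1}"
    i=0
    while [ "$i" -lt "$count" ]; do
        pm_sample < "$PM_INFO"
        i=$((i + 1))
        sleep "$interval"
    done
}
```

Running `pm_watch 60 1` in a second terminal while the benchmark starts should show whether the clock, voltage or power ever move.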

Chris Hodapp, did you come across any commits referencing power profiles? That
looks to be the cause of the issue.


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (28 preceding siblings ...)
  2019-05-19 23:05 ` bugzilla-daemon
@ 2019-05-19 23:18 ` bugzilla-daemon
  2019-05-19 23:49 ` bugzilla-daemon
                   ` (147 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-05-19 23:18 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 3711 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #27 from Tom B <tom@r.je> ---
Boost is definitely the problem.

Idle 5.0.13:

$ cat /sys/kernel/debug/dri/0/amdgpu_pm_info
Clock Gating Flags Mask: 0x36974f
        Graphics Medium Grain Clock Gating: On
        Graphics Medium Grain memory Light Sleep: On
        Graphics Coarse Grain Clock Gating: On
        Graphics Coarse Grain memory Light Sleep: On
        Graphics Coarse Grain Tree Shader Clock Gating: Off
        Graphics Coarse Grain Tree Shader Light Sleep: Off
        Graphics Command Processor Light Sleep: On
        Graphics Run List Controller Light Sleep: Off
        Graphics 3D Coarse Grain Clock Gating: On
        Graphics 3D Coarse Grain memory Light Sleep: On
        Memory Controller Light Sleep: On
        Memory Controller Medium Grain Clock Gating: On
        System Direct Memory Access Light Sleep: On
        System Direct Memory Access Medium Grain Clock Gating: Off
        Bus Interface Medium Grain Clock Gating: Off
        Bus Interface Light Sleep: On
        Unified Video Decoder Medium Grain Clock Gating: Off
        Video Compression Engine Medium Grain Clock Gating: Off
        Host Data Path Light Sleep: On
        Host Data Path Medium Grain Clock Gating: Off
        Digital Right Management Medium Grain Clock Gating: Off
        Digital Right Management Light Sleep: On
        Rom Medium Grain Clock Gating: On
        Data Fabric Medium Grain Clock Gating: Off

GFX Clocks and Power:
        351 MHz (MCLK)
        809 MHz (SCLK)
        1373 MHz (PSTATE_SCLK)
        1001 MHz (PSTATE_MCLK)
        737 mV (VDDGFX)
        23.0 W (average GPU)

GPU Temperature: 31 C
GPU Load: 0 %

SMC Feature Mask: 0x0000000019f0e3cf
UVD: Disabled

VCE: Disabled


Load 5.0.13:


Clock Gating Flags Mask: 0x36974f
        Graphics Medium Grain Clock Gating: On
        Graphics Medium Grain memory Light Sleep: On
        Graphics Coarse Grain Clock Gating: On
        Graphics Coarse Grain memory Light Sleep: On
        Graphics Coarse Grain Tree Shader Clock Gating: Off
        Graphics Coarse Grain Tree Shader Light Sleep: Off
        Graphics Command Processor Light Sleep: On
        Graphics Run List Controller Light Sleep: Off
        Graphics 3D Coarse Grain Clock Gating: On
        Graphics 3D Coarse Grain memory Light Sleep: On
        Memory Controller Light Sleep: On
        Memory Controller Medium Grain Clock Gating: On
        System Direct Memory Access Light Sleep: On
        System Direct Memory Access Medium Grain Clock Gating: Off
        Bus Interface Medium Grain Clock Gating: Off
        Bus Interface Light Sleep: On
        Unified Video Decoder Medium Grain Clock Gating: Off
        Video Compression Engine Medium Grain Clock Gating: Off
        Host Data Path Light Sleep: On
        Host Data Path Medium Grain Clock Gating: Off
        Digital Right Management Medium Grain Clock Gating: Off
        Digital Right Management Light Sleep: On
        Rom Medium Grain Clock Gating: On
        Data Fabric Medium Grain Clock Gating: Off

GFX Clocks and Power:
        1001 MHz (MCLK)
        1802 MHz (SCLK)
        1373 MHz (PSTATE_SCLK)
        1001 MHz (PSTATE_MCLK)
        1068 mV (VDDGFX)
        191.0 W (average GPU)

GPU Temperature: 63 C
GPU Load: 0 %

SMC Feature Mask: 0x0000000019f0e3cf
UVD: Disabled

VCE: Disabled



On 5.1 the same clocks, voltage and wattage are used all the time; it never
changes power states. On 5.0 it idles at 23 W with low clocks and boosts to
191 W at 1802 MHz.
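The dumps also differ in the "SMC Feature Mask" line (0x0000000019f0e3cf on 5.0.13 versus 0x0000000000c0c002 on 5.1.3). A small sketch for extracting the mask and seeing which bits differ; the helper names are made up, and this does not decode what each bit means, since that mapping isn't given anywhere in this thread.

```shell
#!/bin/sh
# Sketch only: smc_mask and mask_xor are hypothetical helper names.

# Pull the SMC feature mask out of an amdgpu_pm_info dump on stdin.
smc_mask() {
    awk '/^SMC Feature Mask:/ { print $4 }'
}

# Print the bits that differ between two masks, as 16 hex digits.
mask_xor() {
    printf '0x%016x\n' "$(( $1 ^ $2 ))"
}

# The two masks from the dumps in this thread (5.0.13 vs 5.1.3):
mask_xor 0x0000000019f0e3cf 0x0000000000c0c002
```

Someone who knows the SMU feature bit layout for Vega20 could use the result to check which features differ between the two kernels.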


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (29 preceding siblings ...)
  2019-05-19 23:18 ` bugzilla-daemon
@ 2019-05-19 23:49 ` bugzilla-daemon
  2019-05-21  7:38 ` bugzilla-daemon
                   ` (146 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-05-19 23:49 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 548 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #28 from Tom B <tom@r.je> ---
As a complete amateur looking at the commit history, this looks like a possible
culprit:
https://github.com/torvalds/linux/commit/e9c5b46e3c50f58403aeca6d6419b9235d2518b2#diff-db8ff8bb932e2ba3f89c9402f6856661

It has a block specifically for Vega20 and deals with power states. 

Though the related tag is 5.2-rc1 it's from back in January so that seems
unlikely.


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (30 preceding siblings ...)
  2019-05-19 23:49 ` bugzilla-daemon
@ 2019-05-21  7:38 ` bugzilla-daemon
  2019-05-21  8:11 ` bugzilla-daemon
                   ` (145 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-05-21  7:38 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 302 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #29 from Chris Hodapp <chris@hodapp.email> ---
Tom B, as far as I can tell, that commit didn't get merged in until relatively
recently and is not in 5.1.
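For reference, one way to check this kind of thing in a kernel checkout is git's ancestor test. A minimal sketch, with a made-up wrapper name; it assumes a clone of the mainline tree with the release tags fetched.

```shell
#!/bin/sh
# Sketch only: commit_in_tag is a hypothetical helper name.
# Succeeds if commit SHA is an ancestor of (i.e. contained in) TAG.
# REPO defaults to the current directory and must be a git checkout.
commit_in_tag() {
    sha="$1"
    tag="$2"
    repo="${3:-.}"
    git -C "$repo" merge-base --is-ancestor "$sha" "$tag"
}

# Example, run inside a clone of the mainline tree:
#   commit_in_tag e9c5b46e3c50f58403aeca6d6419b9235d2518b2 v5.1 \
#       && echo "in v5.1" || echo "not in v5.1"
```

`git tag --contains <sha>` gives the same answer from the other direction by listing every tag that includes the commit.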


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (31 preceding siblings ...)
  2019-05-21  7:38 ` bugzilla-daemon
@ 2019-05-21  8:11 ` bugzilla-daemon
  2019-05-21  9:42 ` bugzilla-daemon
                   ` (144 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-05-21  8:11 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1817 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #30 from Chris Hodapp <chris@hodapp.email> ---
Some interesting findings:

First, I think I may have identified the problematic commit (or at least the
most-problematic one): d1a3e239a6016f2bb42a91696056e223982e8538
(drm/amd/powerplay: drop the unnecessary uclk hard min setting). I eventually
gave up on doing a normal bisect since so many of the commits between 5.0 and
5.1 were non-viable. Instead, I made a list of all the commits that touched
vega20-related files. I then started repeatedly picking out the non-tested
commit with the most related-sounding message, checking out the v5.1 tag, and
reverting the commit in order to test it as the culprit. When I revert that
one, my system boots reliably. I still see 133.0 watts of power draw, though.
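In case anyone wants to repeat the procedure, a rough sketch of the two git steps. The helper names are made up, and the pathspec in the example is only a guess at what "vega20-related files" covers. The revert is left uncommitted so the tree can simply be built and booted.

```shell
#!/bin/sh
# Sketch only: list_candidates and revert_on_tag are hypothetical helper names.

# One-line summaries of non-merge commits between two tags touching PATHSPEC.
list_candidates() {
    repo="$1"; old="$2"; new="$3"; pathspec="$4"
    git -C "$repo" log --oneline --no-merges "$old..$new" -- "$pathspec"
}

# Check out TAG (detached) and revert SHA in the working tree without
# committing, leaving a tree that can be built as a test kernel.
revert_on_tag() {
    repo="$1"; tag="$2"; sha="$3"
    git -C "$repo" checkout -q --detach "$tag" &&
        git -C "$repo" revert --no-commit "$sha"
}

# Example, inside a kernel checkout (the pathspec is an assumption):
#   list_candidates . v5.0 v5.1 ':(glob)drivers/gpu/drm/amd/**/*vega20*'
#   revert_on_tag . v5.1 d1a3e239a6016f2bb42a91696056e223982e8538
```

Testing one revert at a time against the same tag keeps each result attributable to a single commit, which is the point of doing it this way when a normal bisect isn't viable.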

This brings me to the second thing: When looking through the commits, I noticed
that there were multiple commits that claim to prevent or reduce crashing in
high-resolution situations (one references 5k displays, another references 3+
4k displays). I want to note that we all seem to have relatively demanding
display setups: Hameer has two 144hz 1440p displays, Tom B has two 60hz 4k
displays, and I have two 120hz 4k displays. Putting these together I decided to
try unplugging one of my displays. Imagine my surprise when things booted
completely smoothly on a stock 5.1 kernel: glitch-free boot, *no powerplay
errors in the kernel log*, and 25 watts of power draw when usage is low. So I
think it is safe to say that one "workaround" is to unplug a monitor if you can
stand to work that way.

I actually have access to another Radeon VII so I may try running one per
monitor tomorrow.


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (32 preceding siblings ...)
  2019-05-21  8:11 ` bugzilla-daemon
@ 2019-05-21  9:42 ` bugzilla-daemon
  2019-05-30 16:15 ` bugzilla-daemon
                   ` (143 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-05-21  9:42 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 473 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #31 from Tom B <tom@r.je> ---
That's interesting because a single one of your 120hz 4k displays would require
the same bandwidth as both of my 60hz 4k displays together. That means the
issue is either related only to resolution and not bandwidth or it's something
to do with having two displays connected at the same time.
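The arithmetic behind that comparison, as a quick sketch. The function name is made up, and it counts active pixels per second only, ignoring blanking intervals and bit depth, so it is a rough proxy and not real link bandwidth.

```shell
#!/bin/sh
# Sketch only: pixel_rate is a hypothetical helper name.
# Active pixels per second for COUNT identical displays:
#   pixel_rate WIDTH HEIGHT REFRESH_HZ [COUNT]
pixel_rate() {
    echo $(( $1 * $2 * $3 * ${4:-1} ))
}

pixel_rate 3840 2160 120 1   # one 4k display at 120 Hz
pixel_rate 3840 2160 60 2    # two 4k displays at 60 Hz
pixel_rate 2560 1440 144 2   # two 1440p displays at 144 Hz
```

The first two come out identical, and the two 1440p 144 Hz displays land within about 7% of them, which matches the earlier remark about similar amounts of pixels per second.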


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (33 preceding siblings ...)
  2019-05-21  9:42 ` bugzilla-daemon
@ 2019-05-30 16:15 ` bugzilla-daemon
  2019-06-03 11:39 ` bugzilla-daemon
                   ` (142 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-05-30 16:15 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 6410 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #32 from Tom B <tom@r.je> ---
This is still an issue in 5.1.5. It seems slightly more stable but I'm still
getting the high power usage and no boost clocks. 

On a successful boot I see the following in dmesg:



[    3.628369] [drm] amdgpu: 16368M of VRAM memory ready
[    3.628371] [drm] amdgpu: 16368M of GTT memory ready.
[    3.629241] amdgpu 0000:44:00.0: Direct firmware load for amdgpu/vega20_ta.bin failed with error -2
[    3.629243] amdgpu 0000:44:00.0: psp v11.0: Failed to load firmware "amdgpu/vega20_ta.bin"
[    4.260631] fbcon: amdgpudrmfb (fb0) is primary device
[    4.376861] amdgpu 0000:44:00.0: fb0: amdgpudrmfb frame buffer device
[    4.410360] amdgpu 0000:44:00.0: ring gfx uses VM inv eng 0 on hub 0
[    4.410363] amdgpu 0000:44:00.0: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[    4.410365] amdgpu 0000:44:00.0: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[    4.410367] amdgpu 0000:44:00.0: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[    4.410369] amdgpu 0000:44:00.0: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[    4.410371] amdgpu 0000:44:00.0: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[    4.410372] amdgpu 0000:44:00.0: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[    4.410374] amdgpu 0000:44:00.0: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[    4.410376] amdgpu 0000:44:00.0: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[    4.410378] amdgpu 0000:44:00.0: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
[    4.410380] amdgpu 0000:44:00.0: ring sdma0 uses VM inv eng 0 on hub 1
[    4.410382] amdgpu 0000:44:00.0: ring page0 uses VM inv eng 1 on hub 1
[    4.410383] amdgpu 0000:44:00.0: ring sdma1 uses VM inv eng 4 on hub 1
[    4.410385] amdgpu 0000:44:00.0: ring page1 uses VM inv eng 5 on hub 1
[    4.410386] amdgpu 0000:44:00.0: ring uvd_0 uses VM inv eng 6 on hub 1
[    4.410388] amdgpu 0000:44:00.0: ring uvd_enc_0.0 uses VM inv eng 7 on hub 1
[    4.410390] amdgpu 0000:44:00.0: ring uvd_enc_0.1 uses VM inv eng 8 on hub 1
[    4.410391] amdgpu 0000:44:00.0: ring uvd_1 uses VM inv eng 9 on hub 1
[    4.410392] amdgpu 0000:44:00.0: ring uvd_enc_1.0 uses VM inv eng 10 on hub 1
[    4.410393] amdgpu 0000:44:00.0: ring uvd_enc_1.1 uses VM inv eng 11 on hub 1
[    4.410394] amdgpu 0000:44:00.0: ring vce0 uses VM inv eng 12 on hub 1
[    4.410396] amdgpu 0000:44:00.0: ring vce1 uses VM inv eng 13 on hub 1
[    4.410397] amdgpu 0000:44:00.0: ring vce2 uses VM inv eng 14 on hub 1
[    5.088344] [drm] Initialized amdgpu 3.30.0 20150101 for 0000:44:00.0 on minor 0
[    5.247245] amdgpu: [powerplay] Failed to send message 0x28, response 0x0
[    5.247247] amdgpu: [powerplay] [SetUclkToHightestDpmLevel] Set hard min uclk failed!
[    6.092850] amdgpu: [powerplay] Failed to send message 0x28, response 0x0
[    6.092851] amdgpu: [powerplay] Attempt to set Hard Min for DCEFCLK Failed!
[    6.939351] amdgpu: [powerplay] Failed to send message 0x28, response 0x0
[    6.939351] amdgpu: [powerplay] [SetHardMinFreq] Set hard min uclk failed!
[    7.784543] amdgpu: [powerplay] Failed to send message 0x26, response 0x0
[    7.784544] amdgpu: [powerplay] Failed to set soft min gfxclk !
[    7.784545] amdgpu: [powerplay] Failed to upload DPM Bootup Levels!
[    7.842345] amdgpu: [powerplay] Failed to send message 0x28, response 0xff
[    8.143759] amdgpu: [powerplay] Failed to send message 0x28, response 0xff
[    8.143761] amdgpu: [powerplay] Attempt to set Hard Min for DCEFCLK Failed!
[    8.159090] amdgpu: [powerplay] Failed to send message 0x26, response 0xff
[    8.159091] amdgpu: [powerplay] Failed to set soft min socclk!
[    8.159092] amdgpu: [powerplay] Failed to upload DPM Bootup Levels!
[    8.245063] amdgpu: [powerplay] Failed to send message 0x28, response 0xff
[    8.825759] amdgpu: [powerplay] Failed to send message 0x28, response 0xff
[    8.825760] amdgpu: [powerplay] Attempt to set Hard Min for DCEFCLK Failed!
[    8.825919] amdgpu: [powerplay] Failed to send message 0x26, response 0xff
[    8.825919] amdgpu: [powerplay] Failed to set soft min socclk!
[    8.825920] amdgpu: [powerplay] Failed to upload DPM Bootup Levels!
[    8.826116] amdgpu: [powerplay] Failed to send message 0x28, response 0xff
[    8.842518] amdgpu: [powerplay] Failed to send message 0x28, response 0xff
[    8.842519] amdgpu: [powerplay] Attempt to set Hard Min for DCEFCLK Failed!
[    8.842691] amdgpu: [powerplay] Failed to send message 0x26, response 0xff
[    8.842692] amdgpu: [powerplay] Failed to set soft min socclk!
[    8.842692] amdgpu: [powerplay] Failed to upload DPM Bootup Levels!
[    8.885751] amdgpu: [powerplay] Failed to send message 0x28, response 0xff
[    8.892421] amdgpu: [powerplay] Failed to send message 0x28, response 0xff
[    8.892422] amdgpu: [powerplay] Attempt to set Hard Min for DCEFCLK Failed!
[    8.892614] amdgpu: [powerplay] Failed to send message 0x26, response 0xff
[    8.892614] amdgpu: [powerplay] Failed to set soft min socclk!
[    8.892615] amdgpu: [powerplay] Failed to upload DPM Bootup Levels!
[    8.892741] amdgpu: [powerplay] Failed to send message 0x28, response 0xff
[    8.893595] amdgpu: [powerplay] Failed to send message 0x28, response 0xff
[    8.893732] amdgpu: [powerplay] Failed to send message 0x28, response 0xff
[    8.920997] amdgpu: [powerplay] Failed to send message 0x28, response 0xff
[    8.921135] amdgpu: [powerplay] Failed to send message 0x28, response 0xff
[    8.941712] amdgpu: [powerplay] Failed to send message 0x28, response 0xff
[    8.941834] amdgpu: [powerplay] Failed to send message 0x28, response 0xff
[    9.153837] amdgpu: [powerplay] Failed to send message 0x28, response 0xff
[    9.154359] amdgpu: [powerplay] Failed to send message 0x28, response 0xff
[    9.166532] amdgpu: [powerplay] Failed to send message 0x28, response 0xff
[    9.170008] amdgpu: [powerplay] Failed to send message 0x28, response 0xff
[    9.211796] amdgpu: [powerplay] Failed to send message 0x28, response 0xff
[    9.227359] amdgpu: [powerplay] Failed to send message 0x28, response 0xff
[   15.447508] amdgpu: [powerplay] Failed to send message 0x28, response 0xff
[   15.449293] amdgpu: [powerplay] Failed to send message 0x28, response 0xff
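To compare logs between boots without pasting all of this, a rough sketch that tallies the failed powerplay messages by message id. The function name is made up; the pattern is taken from the log lines above.

```shell
#!/bin/sh
# Sketch only: count_pp_failures is a hypothetical helper name.
# Reads kernel log text on stdin and prints "<message id> <count>" per line.
count_pp_failures() {
    awk '/\[powerplay\] Failed to send message/ {
             for (i = 1; i < NF; i++)
                 if ($i == "message") {
                     id = $(i + 1)
                     sub(/,$/, "", id)   # drop the trailing comma
                     n[id]++
                 }
         }
         END { for (id in n) print id, n[id] }' | sort
}

# Example:
#   dmesg | count_pp_failures
```

An empty result would suggest a clean boot as far as these messages go, like the one reported earlier with a single monitor plugged in.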


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (34 preceding siblings ...)
  2019-05-30 16:15 ` bugzilla-daemon
@ 2019-06-03 11:39 ` bugzilla-daemon
  2019-06-03 14:57 ` bugzilla-daemon
                   ` (141 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-06-03 11:39 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 267 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #33 from Tom B <tom@r.je> ---
Is this likely to be fixed in 5.2 or before? It's a showstopping bug for those
affected.


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (35 preceding siblings ...)
  2019-06-03 11:39 ` bugzilla-daemon
@ 2019-06-03 14:57 ` bugzilla-daemon
  2019-06-04  4:19 ` bugzilla-daemon
                   ` (140 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-06-03 14:57 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 343 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #34 from antonh@gmx.de ---
I think this affects not just Vega 20: Vega 10 has also been stuck on memclock
pstate 0 (167MHz) since kernel 5.1.

I assume this is related to fclk and defclk.
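
One way to confirm a stuck memory clock is to read the driver's DPM level table from sysfs. The following is a minimal sketch, not an official tool. It assumes the table lives at /sys/class/drm/card0/device/pp_dpm_mclk and that each line looks like "0: 167Mhz *", with the asterisk marking the active level. Both the path (the card index can differ) and the line format are assumptions to check on your own system.

```python
import re
from pathlib import Path

# Assumed location; the card index may differ on multi-GPU systems.
PP_DPM_MCLK = Path("/sys/class/drm/card0/device/pp_dpm_mclk")

# Assumed line format: "<level>: <freq>Mhz", with a trailing "*" on the active level.
LEVEL_RE = re.compile(r"^\s*(\d+):\s*(\d+)\s*mhz\s*(\*)?\s*$", re.IGNORECASE)


def parse_dpm_levels(text):
    """Return a list of (level, mhz, is_active) tuples from a pp_dpm_* table."""
    levels = []
    for line in text.splitlines():
        match = LEVEL_RE.match(line)
        if match:
            levels.append(
                (int(match.group(1)), int(match.group(2)), match.group(3) is not None)
            )
    return levels


def active_level(text):
    """Return (level, mhz) for the level marked active, or None if none is marked."""
    for level, mhz, is_active in parse_dpm_levels(text):
        if is_active:
            return level, mhz
    return None


if __name__ == "__main__":
    if PP_DPM_MCLK.exists():
        print(active_level(PP_DPM_MCLK.read_text()))
```

If the active level stays at 0 while the GPU is under load, that would match the "stuck on pstate 0" symptom described above.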


^ permalink raw reply	[flat|nested] 179+ messages in thread

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (36 preceding siblings ...)
  2019-06-03 14:57 ` bugzilla-daemon
@ 2019-06-04  4:19 ` bugzilla-daemon
  2019-06-04  4:21 ` bugzilla-daemon
                   ` (139 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-06-04  4:19 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 582 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #35 from sehellion@gmail.com ---
Vega20 is affected by these or similar bugs, too. On 5.0.x kernels the primary
monitor drops out. Starting with 5.1.x, the GPU hangs and resets right after
login to an X session or after DPMS kicks in. This is not fixed in 5.2-rc2 yet,
but yesterday I successfully booted and worked with two monitors. Problems
appeared only after idle time.
https://bugzilla.kernel.org/show_bug.cgi?id=203781


^ permalink raw reply	[flat|nested] 179+ messages in thread

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (37 preceding siblings ...)
  2019-06-04  4:19 ` bugzilla-daemon
@ 2019-06-04  4:21 ` bugzilla-daemon
  2019-06-15 16:58 ` bugzilla-daemon
                   ` (138 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-06-04  4:21 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 314 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #36 from sehellion@gmail.com ---
Created attachment 144438
  --> https://bugs.freedesktop.org/attachment.cgi?id=144438&action=edit
dmesg.log vega20 crash after idle


^ permalink raw reply	[flat|nested] 179+ messages in thread

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (38 preceding siblings ...)
  2019-06-04  4:21 ` bugzilla-daemon
@ 2019-06-15 16:58 ` bugzilla-daemon
  2019-06-15 16:59 ` bugzilla-daemon
                   ` (137 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-06-15 16:58 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 2922 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #37 from Tom B <tom@r.je> ---
5.1.9 makes this bug even worse. It now crashes as soon as the display server
is started.

Running sensors now gives an error:


ERROR: Can't get value of subfeature fan1_input: I/O error
ERROR: Can't get value of subfeature power1_average: I/O error
iwlwifi-virtual-0
Adapter: Virtual device
temp1:        +37.0°C  

k10temp-pci-00c3
Adapter: PCI adapter
Tdie:         +34.8°C  (high = +70.0°C)
Tctl:         +61.8°C  

amdgpu-pci-4400
Adapter: PCI adapter
vddgfx:       +0.74 V  
fan1:             N/A  (min =    0 RPM, max = 3850 RPM)
temp1:        +39.0°C  (crit = +118.0°C, hyst = -273.1°C)
power1:           N/A  (cap = 250.00 W)

k10temp-pci-00cb
Adapter: PCI adapter
Tdie:         +33.2°C  (high = +70.0°C)
Tctl:         +60.2°C  



I can't even see the wattage now. 

# cat /sys/kernel/debug/dri/0/amdgpu_pm_info

Clock Gating Flags Mask: 0x860200
        Graphics Medium Grain Clock Gating: Off
        Graphics Medium Grain memory Light Sleep: Off
        Graphics Coarse Grain Clock Gating: Off
        Graphics Coarse Grain memory Light Sleep: Off
        Graphics Coarse Grain Tree Shader Clock Gating: Off
        Graphics Coarse Grain Tree Shader Light Sleep: Off
        Graphics Command Processor Light Sleep: Off
        Graphics Run List Controller Light Sleep: Off
        Graphics 3D Coarse Grain Clock Gating: Off
        Graphics 3D Coarse Grain memory Light Sleep: Off
        Memory Controller Light Sleep: Off
        Memory Controller Medium Grain Clock Gating: On
        System Direct Memory Access Light Sleep: Off
        System Direct Memory Access Medium Grain Clock Gating: Off
        Bus Interface Medium Grain Clock Gating: Off
        Bus Interface Light Sleep: Off
        Unified Video Decoder Medium Grain Clock Gating: Off
        Video Compression Engine Medium Grain Clock Gating: Off
        Host Data Path Light Sleep: Off
        Host Data Path Medium Grain Clock Gating: Off
        Digital Right Management Medium Grain Clock Gating: Off
        Digital Right Management Light Sleep: On
        Rom Medium Grain Clock Gating: On
        Data Fabric Medium Grain Clock Gating: On

GFX Clocks and Power:
        1373 MHz (PSTATE_SCLK)
        1001 MHz (PSTATE_MCLK)
        737 mV (VDDGFX)

GPU Temperature: 39 C

UVD: Disabled

VCE: Disabled


No clocks or wattage! 
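
For comparing dumps like the one above between kernel versions, it can help to reduce the clock-gating section to just the features reported as "On". This is a rough sketch that assumes the layout shown in the paste: a "Clock Gating Flags Mask" header, one "Name: On/Off" entry per line, and a blank line ending the section. The function name is made up for illustration.

```python
def enabled_clock_gating(pm_info_text):
    """Return the names of clock-gating features reported as 'On'."""
    enabled = []
    in_section = False
    for line in pm_info_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Clock Gating Flags Mask"):
            in_section = True
            continue
        if in_section:
            if not stripped:
                break  # a blank line ends the clock-gating section
            name, sep, state = stripped.rpartition(": ")
            if sep and state == "On":
                enabled.append(name)
    return enabled
```

Feeding it the text of /sys/kernel/debug/dri/0/amdgpu_pm_info (readable as root) from two kernels and diffing the resulting lists shows which gating features changed state.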

I'm guessing 34d07ce3d6a120056e4763ae9a3db0d769ab7c63 "fix ring test failure
issue during s3 in vce 3.0 (V2)" is to blame as dmesg (attached in next post)
says


[   20.584937] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=25, emitted seq=27

It would be nice to see some acknowledgement from AMD on this.


^ permalink raw reply	[flat|nested] 179+ messages in thread

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (39 preceding siblings ...)
  2019-06-15 16:58 ` bugzilla-daemon
@ 2019-06-15 16:59 ` bugzilla-daemon
  2019-06-15 22:15 ` bugzilla-daemon
                   ` (136 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-06-15 16:59 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 289 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #38 from Tom B <tom@r.je> ---
Created attachment 144554
  --> https://bugs.freedesktop.org/attachment.cgi?id=144554&action=edit
5.1.9 dmesg


^ permalink raw reply	[flat|nested] 179+ messages in thread

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (40 preceding siblings ...)
  2019-06-15 16:59 ` bugzilla-daemon
@ 2019-06-15 22:15 ` bugzilla-daemon
  2019-06-16 16:05 ` bugzilla-daemon
                   ` (135 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-06-15 22:15 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 651 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #39 from Chris Hodapp <chris@hodapp.email> ---
The fact that amdgpu is getting less functional over time with this high-end
part _is_ definitely annoying, but let's all keep in mind that this is not an
official support channel from AMD, it's the issue tracker for an open source
project that AMD contribute to. AMD don't actually owe us anything through this
channel. Instead, the way to pressure them for concrete answers is actually to
choose an option from https://www.amd.com/en/support/contact.


^ permalink raw reply	[flat|nested] 179+ messages in thread

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (41 preceding siblings ...)
  2019-06-15 22:15 ` bugzilla-daemon
@ 2019-06-16 16:05 ` bugzilla-daemon
  2019-06-16 16:08 ` bugzilla-daemon
                   ` (134 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-06-16 16:05 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 428 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

Alex Deucher <alexdeucher@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #144438|text/x-log                  |text/plain
          mime type|                            |


^ permalink raw reply	[flat|nested] 179+ messages in thread

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (42 preceding siblings ...)
  2019-06-16 16:05 ` bugzilla-daemon
@ 2019-06-16 16:08 ` bugzilla-daemon
  2019-06-17 10:18 ` bugzilla-daemon
                   ` (133 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-06-16 16:08 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 283 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #40 from Alex Deucher <alexdeucher@gmail.com> ---
Please attach your full dmesg output.  Are you passing any parameters to the
driver?


^ permalink raw reply	[flat|nested] 179+ messages in thread

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (43 preceding siblings ...)
  2019-06-16 16:08 ` bugzilla-daemon
@ 2019-06-17 10:18 ` bugzilla-daemon
  2019-06-21 20:17 ` bugzilla-daemon
                   ` (132 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-06-17 10:18 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1270 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #41 from Tom B <tom@r.je> ---
Created attachment 144569
  --> https://bugs.freedesktop.org/attachment.cgi?id=144569&action=edit
5.1.9 full dmesg

Interestingly, I just reinstalled 5.1.9 and I'm not seeing the same immediate
crash. It may have been another package, as I tried three boots and all had the
same issue of crashing immediately on SDDM start. The only way I managed to get
dmesg output was by switching TTY immediately; switching back to tty1, where
SDDM was running, caused the immediate crash.

After reinstalling I'm getting the same issue as on earlier 5.1 kernels, where
it freezes the PC under load and is stuck in the same power state. Oddly, I'm
seeing a constant 137 W in 5.1.9 where I was getting 135 W in 5.1.3, though I
didn't test 5.1.3 multiple times; it might reach a wattage on boot and then
stick to it.

I have attached the full dmesg anyway.

> Are you passing any parameters to the driver?

I have nothing related to amdgpu in /etc/modprobe.d and my kernel command line
is:

[    0.364597] Kernel command line: BOOT_IMAGE=/vmlinuz-linux root=UUID=fc6ad741-d52d-47eb-b6a6-0026f27b29f3 rw quiet
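
The same check can be scripted. The sketch below is illustrative only: it pulls any "amdgpu.<name>=<value>" tokens out of a kernel command line string, and separately reads whatever the loaded module exposes under /sys/module/amdgpu/parameters. That sysfs path is the usual location for module parameters but is treated here as an assumption, and the function names are hypothetical.

```python
from pathlib import Path


def amdgpu_cmdline_params(cmdline):
    """Extract amdgpu.<name>=<value> tokens from a kernel command line string."""
    params = {}
    for token in cmdline.split():
        if token.startswith("amdgpu.") and "=" in token:
            name, value = token[len("amdgpu."):].split("=", 1)
            params[name] = value
    return params


def loaded_module_params(param_dir="/sys/module/amdgpu/parameters"):
    """Read the values the loaded module reports (assumed sysfs location)."""
    values = {}
    directory = Path(param_dir)
    if not directory.is_dir():
        return values
    for entry in sorted(directory.iterdir()):
        try:
            values[entry.name] = entry.read_text().strip()
        except OSError:
            values[entry.name] = None  # not readable with current privileges
    return values
```

Applied to the command line quoted above, amdgpu_cmdline_params returns an empty dict, which agrees with the statement that no driver parameters are being passed.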


^ permalink raw reply	[flat|nested] 179+ messages in thread

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (44 preceding siblings ...)
  2019-06-17 10:18 ` bugzilla-daemon
@ 2019-06-21 20:17 ` bugzilla-daemon
  2019-06-21 20:18 ` bugzilla-daemon
                   ` (131 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-06-21 20:17 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 418 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #42 from Matt Coffin <mcoffin13@gmail.com> ---
For what it's worth, I've experienced a bunch of issues similar to this with
OVERDRIVE enabled. You can try disabling it by setting the following in
modprobe.d or your kernel launch line

amdgpu.ppfeaturemask=0xfffdbfff
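
To see what that mask actually turns off, it helps to list the bits that are cleared relative to an all-ones 32-bit value. The bit positions below are plain arithmetic. The names attached to them are an assumption based on the PP_FEATURE_MASK enum in the kernel's amd_shared.h and should be verified against the kernel tree you are running.

```python
# Bit names are ASSUMED from enum PP_FEATURE_MASK in amd_shared.h;
# verify them against your own kernel source before relying on them.
ASSUMED_BIT_NAMES = {
    11: "PP_OD_FUZZY_FAN_CONTROL_MASK",
    14: "PP_OVERDRIVE_MASK",
    15: "PP_GFXOFF_MASK",
    17: "PP_STUTTER_MODE",
}


def cleared_bits(mask, width=32):
    """Return the positions of the bits that are 0 in `mask`."""
    return [bit for bit in range(width) if not (mask >> bit) & 1]


def describe_mask(mask):
    """Pair each cleared bit with its assumed feature name, if one is listed."""
    return [(bit, ASSUMED_BIT_NAMES.get(bit, "unknown")) for bit in cleared_bits(mask)]


print(cleared_bits(0xFFFDBFFF))  # → [14, 17]
```

So 0xfffdbfff is 0xffffffff with bits 14 (0x4000) and 17 (0x20000) cleared. If the assumed names are right, bit 14 is the OVERDRIVE feature mentioned above.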


^ permalink raw reply	[flat|nested] 179+ messages in thread

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (45 preceding siblings ...)
  2019-06-21 20:17 ` bugzilla-daemon
@ 2019-06-21 20:18 ` bugzilla-daemon
  2019-06-22  4:19 ` bugzilla-daemon
                   ` (130 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-06-21 20:18 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 692 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #43 from Matt Coffin <mcoffin13@gmail.com> ---
(In reply to Matt Coffin from comment #42)
> For what it's worth, I've experienced a bunch of issues similar to this with
> OVERDRIVE enabled. You can try disabling it by setting the following in
> modprobe.d or your kernel launch line
> 
> amdgpu.ppfeaturemask=0xfffdbfff

Also worth noting that I've found that using `fancontrol` creates a race
condition if you have the OD fuzzy fan control enabled, so try just maxing out
the fans via the sysfs hwmon interface instead just as a test.
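
A minimal sketch of that test, assuming the card's hwmon directory exposes the standard pwm1, pwm1_enable and (optionally) pwm1_max files, and that writing 1 to pwm1_enable selects manual control as in the generic hwmon sysfs ABI. The directory is passed in by the caller because the hwmon index varies; something like /sys/class/drm/card0/device/hwmon/hwmon*/ is typical but is an assumption here. Writing these files needs root.

```python
from pathlib import Path


def max_out_fan(hwmon_dir):
    """Switch pwm1 to manual control and set it to its maximum duty cycle.

    Returns the PWM value that was written.
    """
    hwmon = Path(hwmon_dir)
    pwm_max_file = hwmon / "pwm1_max"
    # Fall back to 255, the conventional hwmon PWM ceiling, if pwm1_max is absent.
    pwm_max = int(pwm_max_file.read_text()) if pwm_max_file.exists() else 255
    (hwmon / "pwm1_enable").write_text("1")  # 1 = manual control (hwmon ABI)
    (hwmon / "pwm1").write_text(str(pwm_max))
    return pwm_max
```

To hand control back afterwards, writing 2 to pwm1_enable should restore automatic fan control on drivers that follow the same convention.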


^ permalink raw reply	[flat|nested] 179+ messages in thread

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (46 preceding siblings ...)
  2019-06-21 20:18 ` bugzilla-daemon
@ 2019-06-22  4:19 ` bugzilla-daemon
  2019-06-22  4:20 ` bugzilla-daemon
                   ` (129 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-06-22  4:19 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 592 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #44 from sehellion@gmail.com ---
(In reply to Matt Coffin from comment #42)
> For what it's worth, I've experienced a bunch of issues similar to this with
> OVERDRIVE enabled. You can try disabling it by setting the following in
> modprobe.d or your kernel launch line
> 
> amdgpu.ppfeaturemask=0xfffdbfff


It doesn't seem that OVERDRIVE is the problem in this case. I will attach a
full dmesg log with amdgpu.ppfeaturemask=0xfffdbfff


^ permalink raw reply	[flat|nested] 179+ messages in thread

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (47 preceding siblings ...)
  2019-06-22  4:19 ` bugzilla-daemon
@ 2019-06-22  4:20 ` bugzilla-daemon
  2019-07-08 12:29 ` bugzilla-daemon
                   ` (128 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-06-22  4:20 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 336 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #45 from sehellion@gmail.com ---
Created attachment 144611
  --> https://bugs.freedesktop.org/attachment.cgi?id=144611&action=edit
5.2-rc2 full dmesg with amdgpu.ppfeaturemask=0xfffdbfff


^ permalink raw reply	[flat|nested] 179+ messages in thread

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (48 preceding siblings ...)
  2019-06-22  4:20 ` bugzilla-daemon
@ 2019-07-08 12:29 ` bugzilla-daemon
  2019-07-25  5:36 ` bugzilla-daemon
                   ` (127 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-07-08 12:29 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 479 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #46 from Tom B <tom@r.je> ---
Has anyone tested 5.3 yet? I noticed there are a lot of powerplay changes.

Since this bug messes up the card's power profile, how safe is testing new
kernels? Is there any danger of my card being damaged due to wrong voltages if
the powerplay code is as buggy or worse than it has been since 5.1?


^ permalink raw reply	[flat|nested] 179+ messages in thread

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (49 preceding siblings ...)
  2019-07-08 12:29 ` bugzilla-daemon
@ 2019-07-25  5:36 ` bugzilla-daemon
  2019-07-26  1:19 ` bugzilla-daemon
                   ` (126 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-07-25  5:36 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1004 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #47 from ReddestDream <reddestdream@gmail.com> ---
(In reply to Tom B from comment #46)
> Has anyone tested 5.3 yet? I noticed there are a lot of powerplay changes.
> 
> Since this bug messes up the card's power profile, how safe is testing new
> kernels? Is there any danger of my card being damaged due to wrong voltages
> if the powerplay code is as buggy or worse than it has been since 5.1?

I've tested 5.3-rc1 and no dice. I still get the PowerPlay "Failed to send
message" errors in dmesg when I have more than one monitor connected to the
Radeon VII. :(

My current workaround is to connect my second monitor to the iGPU before boot.
Then the PowerPlay errors do not happen. As long as I don't get the PowerPlay
errors in dmesg, graphics are stable. If the errors do appear, graphics will be
unstable. It's a pretty clear connection . . .
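
Since the PowerPlay errors are such a reliable indicator, a quick check after each boot is to count them in the kernel log. This is a sketch only: the regular expression is written against the exact message text quoted earlier in this thread ("amdgpu: [powerplay] Failed to send message 0x28, response 0xff"), and other kernel versions may word it differently.

```python
import re

# Pattern based on the log lines quoted in this thread; the wording may
# differ between kernel versions.
SMU_FAIL_RE = re.compile(
    r"amdgpu: \[powerplay\] Failed to send message "
    r"(0x[0-9a-fA-F]+), response (0x[0-9a-fA-F]+)"
)


def count_smu_failures(dmesg_text):
    """Count 'Failed to send message' lines, keyed by (message, response)."""
    counts = {}
    for match in SMU_FAIL_RE.finditer(dmesg_text):
        key = (match.group(1), match.group(2))
        counts[key] = counts.get(key, 0) + 1
    return counts
```

Pass it the saved output of dmesg. An empty result after boot would correspond to the "stable" case described above, while any entries suggest the session is likely to be unstable.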


^ permalink raw reply	[flat|nested] 179+ messages in thread

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (50 preceding siblings ...)
  2019-07-25  5:36 ` bugzilla-daemon
@ 2019-07-26  1:19 ` bugzilla-daemon
  2019-07-26  1:24 ` bugzilla-daemon
                   ` (125 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-07-26  1:19 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 349 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #48 from Anthony Rabbito <ted437@gmail.com> ---
I'm able to run dual monitors with one HDMI and one DP.

Running 3 monitors (2 DP 1 HDMI) at 1440p 144Hz causes all the issues noted
here. Linux 5.2.2


^ permalink raw reply	[flat|nested] 179+ messages in thread

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (51 preceding siblings ...)
  2019-07-26  1:19 ` bugzilla-daemon
@ 2019-07-26  1:24 ` bugzilla-daemon
  2019-07-26  3:19 ` bugzilla-daemon
                   ` (124 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-07-26  1:24 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 676 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #49 from Tom B <tom@r.je> ---
Unfortunately iGPU isn't an option for me as I don't have one. 

> I'm able to run dual monitors with one HDMI and one DP.

> Running 3 monitors (2 DP 1 HDMI) at 1440p 144Hz causes all the issues noted here. Linux 5.2.2

That's interesting, as I was originally using HDMI + DP but it caused its own
set of similar issues as reported here:
https://bugs.freedesktop.org/show_bug.cgi?id=110510 

I wonder whether 5.1+ reversed it so that HDMI+DP now works. I'll test it when
I get a chance.


^ permalink raw reply	[flat|nested] 179+ messages in thread

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (52 preceding siblings ...)
  2019-07-26  1:24 ` bugzilla-daemon
@ 2019-07-26  3:19 ` bugzilla-daemon
  2019-07-28  5:20 ` bugzilla-daemon
                   ` (123 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-07-26  3:19 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 514 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #50 from ReddestDream <reddestdream@gmail.com> ---
(In reply to Anthony Rabbito from comment #48)
> I'm able to run dual monitors with one HDMI and one DP.
> 
> Running 3 monitors (2 DP 1 HDMI) at 1440p 144Hz causes all the issues noted
> here. Linux 5.2.2

Hmm. That's very interesting. I have not tried HDMI. All my testing was done
with just 2 DP monitors.


^ permalink raw reply	[flat|nested] 179+ messages in thread

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (53 preceding siblings ...)
  2019-07-26  3:19 ` bugzilla-daemon
@ 2019-07-28  5:20 ` bugzilla-daemon
  2019-07-29 10:52 ` bugzilla-daemon
                   ` (122 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-07-28  5:20 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 471 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #51 from ReddestDream <reddestdream@gmail.com> ---
Also, just FYI, it does look like there are some fixes to display type
detection on AMD GPUs coming in 5.3-rc2. These might fix or at least improve
the multimonitor issue on Radeon VII:

https://github.com/torvalds/linux/commit/e2921f9f95f1c1355a39e54dc038ad95b6e032be


^ permalink raw reply	[flat|nested] 179+ messages in thread

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (54 preceding siblings ...)
  2019-07-28  5:20 ` bugzilla-daemon
@ 2019-07-29 10:52 ` bugzilla-daemon
  2019-07-29 19:25 ` bugzilla-daemon
                   ` (121 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-07-29 10:52 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 330 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #52 from Peter Hercek <phercek@gmail.com> ---
I'm getting hangs with kernels 5.2.3 (often) and 5.1.15 (less often).
Radeon VII with 3 monitors. Each monitor connected through DP.


^ permalink raw reply	[flat|nested] 179+ messages in thread

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (55 preceding siblings ...)
  2019-07-29 10:52 ` bugzilla-daemon
@ 2019-07-29 19:25 ` bugzilla-daemon
  2019-07-29 21:40 ` bugzilla-daemon
                   ` (120 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-07-29 19:25 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 366 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #53 from Anthony Rabbito <ted437@gmail.com> ---
Interesting, on 5.2.x with 2 monitors hooked up via HDMI and DP it behaves 75%
of the time with most issues coming from xinit or sleep. Hopefully 5.3 will
contain fixes


^ permalink raw reply	[flat|nested] 179+ messages in thread

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (56 preceding siblings ...)
  2019-07-29 19:25 ` bugzilla-daemon
@ 2019-07-29 21:40 ` bugzilla-daemon
  2019-07-31 15:37 ` bugzilla-daemon
                   ` (119 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-07-29 21:40 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1097 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #54 from ReddestDream <reddestdream@gmail.com> ---
(In reply to Peter Hercek from comment #52)
> I'm getting hangs-up with kernels 5.2.3 (often) and 5.1.15 (less often).
> Radeon VII with 3 monitors. Each monitor connected through DP.

I hear that 5.0.0.13 is from before this regression and should work without
issue if you are willing to downgrade:

https://bbs.archlinux.org/viewtopic.php?id=247733

(In reply to Anthony Rabbito from comment #53)
> Interesting, on 5.2.x with 2 monitors hooked up via HDMI and DP it behaves
> 75% of the time with most issues coming from xinit or sleep. Hopefully 5.3
> will contain fixes

Would be interesting if it turns out that using HDMI+DP fixes the issue. Not
that HDMI doesn't come with its own issues sometimes with color. I do have some
faith that 5.3 will fix it since AMDGPU is getting a lot of work for Navi. I
plan to try out 5.3-rc2 (or whatever mainline is at) sometime this week.


^ permalink raw reply	[flat|nested] 179+ messages in thread

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (57 preceding siblings ...)
  2019-07-29 21:40 ` bugzilla-daemon
@ 2019-07-31 15:37 ` bugzilla-daemon
  2019-07-31 17:09 ` bugzilla-daemon
                   ` (118 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-07-31 15:37 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1363 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #55 from Anthony Rabbito <ted437@gmail.com> ---
(In reply to ReddestDream from comment #54)
> (In reply to Peter Hercek from comment #52)
> > I'm getting hangs-up with kernels 5.2.3 (often) and 5.1.15 (less often).
> > Radeon VII with 3 monitors. Each monitor connected through DP.
> 
> I hear that 5.0.0.13 is from before this regression and should work without
> issue if you are willing to downgrade:
> 
> https://bbs.archlinux.org/viewtopic.php?id=247733
> 
> (In reply to Anthony Rabbito from comment #53)
> > Interesting, on 5.2.x with 2 monitors hooked up via HDMI and DP it behaves
> > 75% of the time with most issues coming from xinit or sleep. Hopefully 5.3
> > will contain fixes
> 
> Would be interesting if it turns out that using HDMI+DP fixes the issue. Not
> that HDMI doesn't come with its own issues sometimes with color. I do have
> some faith that 5.3 will fix it since AMDGPU is getting a lot of work for
> Navi. I plan to try out 5.3-rc2 (or whatever mainline is at) sometime this
> week.

I will check my package cache to see if I still have kernel 5.0.0.13; otherwise
I'll build it. I'll report back how it goes. I miss my third monitor.


^ permalink raw reply	[flat|nested] 179+ messages in thread

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (58 preceding siblings ...)
  2019-07-31 15:37 ` bugzilla-daemon
@ 2019-07-31 17:09 ` bugzilla-daemon
  2019-07-31 17:13 ` bugzilla-daemon
                   ` (117 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-07-31 17:09 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 554 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #56 from Peter Hercek <phercek@gmail.com> ---
I have been using 5.0.13 for 3 days. It works OK so far, but 3 days is too
little to tell. E.g. 5.1.15 first hung after about 5 days, but from then on it
always hung when I launched two YouTube videos just after login. I probably did
not launch YouTube videos that early in my session during my first days on
5.1.15. Kernel 5.0.13 can handle this situation.


^ permalink raw reply	[flat|nested] 179+ messages in thread

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (59 preceding siblings ...)
  2019-07-31 17:09 ` bugzilla-daemon
@ 2019-07-31 17:13 ` bugzilla-daemon
  2019-08-03 12:10 ` bugzilla-daemon
                   ` (116 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-07-31 17:13 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 689 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #57 from Tom B <tom@r.je> ---
5.0.13 works fine, I've been using it since I first encountered the problem.
5.1+ introduces this issue.

The way to tell whether it's working correctly is to run sensors and check the
power1 number. The bug causes the GPU to be stuck in a high power state (for me
135 W), whereas on previous kernels it idles at 23 W.

Alternatively, run cat /sys/kernel/debug/dri/0/amdgpu_pm_info, which will show
the same thing: it will be stuck at 1.1 V/135 W and the clocks will be maxed
rather than clocked down when idle.
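
A minimal sketch for watching the same number without sensors, assuming the
card exposes the hwmon attribute power1_average (which hwmon reports in
microwatts). The function name power_watts is made up for this example, and the
hwmon index in the path differs between machines:

```shell
# Hypothetical helper: print an hwmon power reading in whole watts.
# hwmon power attributes are in microwatts, so divide by 1,000,000.
power_watts() {
    # $1: path to a power1_average file (assumption: adjust the hwmon index)
    local uw
    uw=$(cat "$1") || return 1
    echo "$(( uw / 1000000 )) W"
}
```

Calling power_watts /sys/class/hwmon/hwmon1/power1_average once a second while
the desktop is idle should show whether the reading ever drops from the stuck
value.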


^ permalink raw reply	[flat|nested] 179+ messages in thread

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (60 preceding siblings ...)
  2019-07-31 17:13 ` bugzilla-daemon
@ 2019-08-03 12:10 ` bugzilla-daemon
  2019-08-03 12:31 ` bugzilla-daemon
                   ` (115 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-03 12:10 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 972 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #58 from Peter Hercek <phercek@gmail.com> ---
It is probably not related to changes from 5.0 to 5.1.
I have had the hang with 5.0.13 as well as with 4.20.11.
It may just be less common with older kernels.

In my case, it is triggered mostly by playing a video stream in parallel with
some other activity. My logs with 5.1 and 5.2 kernels look just like Chris'
log: first amdgpu_job_timedout, then an attempt to reset the GPU, followed by an
endless stream of parser initialization failures.

I did not check the logs with older kernels, but it all looked the same at the
user level. The video subsystem is hung. The rest of the machine (e.g. an
ssh session) works OK.

My /sys/class/hwmon/hwmon1/power1_average reported normal values around 25 W
after the hang. I'm not seeing unusually high power values like Tom B.


^ permalink raw reply	[flat|nested] 179+ messages in thread

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (61 preceding siblings ...)
  2019-08-03 12:10 ` bugzilla-daemon
@ 2019-08-03 12:31 ` bugzilla-daemon
  2019-08-03 13:35 ` bugzilla-daemon
                   ` (114 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-03 12:31 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 349 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #59 from Tom B <tom@r.je> ---
@Peter Hercek, do you see the wattage/voltage change at all? For me it's stuck
at 135 W; perhaps it hits a power state and then can't change, and for you it's
stuck at 25 W.


^ permalink raw reply	[flat|nested] 179+ messages in thread

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (62 preceding siblings ...)
  2019-08-03 12:31 ` bugzilla-daemon
@ 2019-08-03 13:35 ` bugzilla-daemon
  2019-08-08 14:37 ` bugzilla-daemon
                   ` (113 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-03 13:35 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 396 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #60 from Peter Hercek <phercek@gmail.com> ---
The power value changes from 24 W to about 75 W (when I tried xonotic). I
checked the power value twice after the hang. It was 25 W in both cases. It
does not change after the video subsystem hangs.


^ permalink raw reply	[flat|nested] 179+ messages in thread

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (63 preceding siblings ...)
  2019-08-03 13:35 ` bugzilla-daemon
@ 2019-08-08 14:37 ` bugzilla-daemon
  2019-08-10 12:10 ` bugzilla-daemon
                   ` (112 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-08 14:37 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 340 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #61 from ReddestDream <reddestdream@gmail.com> ---
This issue is still not fixed with 5.3-rc3, at least not with two DisplayPort
monitors.

I am not able to test with DP+HDMI configuration.


^ permalink raw reply	[flat|nested] 179+ messages in thread

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (64 preceding siblings ...)
  2019-08-08 14:37 ` bugzilla-daemon
@ 2019-08-10 12:10 ` bugzilla-daemon
  2019-08-10 13:02 ` bugzilla-daemon
                   ` (111 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-10 12:10 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 720 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #62 from Peter Hercek <phercek@gmail.com> ---
OK, I started using the 5.2.5 kernel after my last hang with 4.20.11. It
worked fine for 1 week. I'm trying 5.2.7 now.

It is possible something was fixed in 5.2.5 because there was one commit which
seemed related (drm/amdgpu: Reserve shared fence for eviction fence
dd68722c427d5b33420dce0ed0c44b4881e0a416). But there are reasons to think I was
just lucky for the week: the commit seems to relate to some VM support and I
have had crashes without VM use, and ReddestDream reported the problem in
5.3-rc3 as well.


^ permalink raw reply	[flat|nested] 179+ messages in thread

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (65 preceding siblings ...)
  2019-08-10 12:10 ` bugzilla-daemon
@ 2019-08-10 13:02 ` bugzilla-daemon
  2019-08-10 13:14 ` bugzilla-daemon
                   ` (110 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-10 13:02 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 530 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #63 from Tom B <tom@r.je> ---
I've just done some testing with 5.2.7.

- I still get the constant 135 W/1.1 V power state and crashing with DP+DP.

- HDMI+DP works, but this was my original setup when I got the VII.
Unfortunately, I get random flickering and black screens on the HDMI monitor
every 3-5 minutes, as described in
https://bugs.freedesktop.org/show_bug.cgi?id=110510


^ permalink raw reply	[flat|nested] 179+ messages in thread

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (66 preceding siblings ...)
  2019-08-10 13:02 ` bugzilla-daemon
@ 2019-08-10 13:14 ` bugzilla-daemon
  2019-08-10 13:15 ` bugzilla-daemon
                   ` (109 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-10 13:14 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1057 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #64 from Tom B <tom@r.je> ---
Scratch that, I just rebooted with HDMI+DP and it froze as soon as SDDM
started. I was eventually able to switch TTY and the voltages looked correct
(it had clocked down), but I was never able to log in to KDE as SDDM was frozen.
Restarting SDDM allowed me to enter my password, but it froze as soon as I
logged in. Not that HDMI is an optimal solution anyway, as I get the flickering
and I've tried 3 different cables.

Back to 5.0.13, which works mostly fine. I do get a crash very occasionally: the
machine will appear to wake up from sleep with a black screen and a cursor.
It's very rare, once a week or so, and only when the sleep/resume cycle has been
run multiple times.

Peter Hercek mentioned virtual machines, so I tried with IOMMU enabled and
disabled in the BIOS. It didn't make any difference, but I thought it was worth
reporting to save others time trying it.


^ permalink raw reply	[flat|nested] 179+ messages in thread

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (67 preceding siblings ...)
  2019-08-10 13:14 ` bugzilla-daemon
@ 2019-08-10 13:15 ` bugzilla-daemon
  2019-08-10 13:29 ` bugzilla-daemon
                   ` (108 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-10 13:15 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 425 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #65 from Tom B <tom@r.je> ---
Created attachment 145018
  --> https://bugs.freedesktop.org/attachment.cgi?id=145018&action=edit
5.2.7  full dmesg

Full dmesg from 5.2.7 with 2x DisplayPort monitors. The error that keeps
repeating is:

*ERROR* Failed to initialize parser -125!


^ permalink raw reply	[flat|nested] 179+ messages in thread

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (68 preceding siblings ...)
  2019-08-10 13:15 ` bugzilla-daemon
@ 2019-08-10 13:29 ` bugzilla-daemon
  2019-08-10 16:39 ` bugzilla-daemon
                   ` (107 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-10 13:29 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 358 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #66 from Tom B <tom@r.je> ---
One thing I haven't mentioned is that I don't have a GPU fan installed, as my
VII is water cooled. It's unlikely, but perhaps this explains why my card
behaves differently from others.


^ permalink raw reply	[flat|nested] 179+ messages in thread

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (69 preceding siblings ...)
  2019-08-10 13:29 ` bugzilla-daemon
@ 2019-08-10 16:39 ` bugzilla-daemon
  2019-08-10 19:00 ` bugzilla-daemon
                   ` (106 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-10 16:39 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 2146 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #67 from Tom B <tom@r.je> ---
I had a look around at similar bugs and came across this:

https://bugs.freedesktop.org/show_bug.cgi?id=110822

It's for a 580, not a VII, but the problems started at 5.1 and it gives a
similar powerplay-related crash.

The suggested fix there is to revert ad51c46eec739c18be24178a30b47801b10e0357.

I just tried this and after 4 reboots I can report it has two effects:

1. I don't have any crashing at all and my card boosts GPU clocks, voltages and
wattages. I can run unigine-heaven for several minutes without the system
freezing.

2. The memory is forced to 351 MHz, limiting performance.

If I run 

cat /sys/class/drm/card0/device/pp_dpm_mclk 

it shows:

0: 351Mhz *
1: 801Mhz 
2: 1001Mhz 


Which looks correct for idle, but it never, even under load, boosts to the next
memory clock. It also can't be set manually:


echo manual > /sys/class/drm/card0/device/power_dpm_force_performance_level
echo 2 >  /sys/class/drm/card0/device/pp_dpm_mclk
-bash: echo: write error: Invalid argument
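
To watch whether the memory clock ever leaves level 0, the active entry (the
line marked with *) can be pulled out of that file. A small sketch: active_mclk
is a made-up name, and the card0 default path is an assumption that may differ
on multi-GPU systems:

```shell
# Hypothetical helper: print the clock of the level that pp_dpm_mclk marks
# as active, i.e. the second field of the line containing "*".
active_mclk() {
    # $1: path to pp_dpm_mclk (defaults to card0, an assumption)
    awk '/\*/ { print $2 }' "${1:-/sys/class/drm/card0/device/pp_dpm_mclk}"
}
```

Running it in a loop (for example, while sleep 1; do active_mclk; done) during
a benchmark would show whether the driver ever selects a higher level.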


While this isn't a proper fix, it does give us some valuable insight. If anyone
wants to run at 351 MHz memory with a stable card and 2 screens, they can. It
would be nice if someone could verify my findings, as my card seems to behave
differently from others for some reason.

This bug may be related to https://bugs.freedesktop.org/show_bug.cgi?id=110822
or, alternatively, the crash may occur when the memory clock changes
(which might mean it's related to
https://bugs.freedesktop.org/show_bug.cgi?id=102646 as there are issues with
memory clock changes there). There seem to be several powerplay-related issues
which may have the same root cause.


I'm now going to:

1. Revert to the stock kernel and set the mclk to 1001 manually before starting
SDDM and see if the crash occurs.

2. See if I can manage to get stability with the mclk stuck at 1001 MHz, as
this would be an acceptable compromise, even if not ideal.


^ permalink raw reply	[flat|nested] 179+ messages in thread

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (70 preceding siblings ...)
  2019-08-10 16:39 ` bugzilla-daemon
@ 2019-08-10 19:00 ` bugzilla-daemon
  2019-08-11  1:15 ` bugzilla-daemon
                   ` (105 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-10 19:00 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1652 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #68 from Tom B <tom@r.je> ---
Apologies for the multiple replies/emails. I think I must just have got lucky.
It worked several boots (in a row) and now only works very occasionally. I
think it was just coincidence that it worked a few times after I installed that
kernel, sorry guys.

During my tests with 5.2.7 I have noticed something interesting about the
wattage though. It will indeed get stuck on a specific wattage: I've had 33,
24, 45, 133 and 134 W, and on several wattages there is some fluctuation, e.g.
33-34 W.

Higher wattages are significantly more stable: 133 W lasts quite a while before
it crashes, 33 W crashes instantly. I'm assuming this is because the card just
doesn't have enough power to do what's required.

When the wattage gets stuck, if you force the performance mode:

# echo high > /sys/class/drm/card0/device/power_dpm_force_performance_level

it confuses the driver and sensors then shows

ERROR: Can't get value of subfeature power1_average: I/O error

This happens even though sensors worked until the power state was set manually.
There doesn't seem to be a way to get it back to a state where sensors shows
the wattage, other than rebooting.


The inconsistent nature of this bug and the fact that it sometimes doesn't
appear suggests a race condition. I'd assume something else on the system
happens before or after amdgpu is expecting.

Is there any way to delay loading the amdgpu driver and manually loading it
after everything else?
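
One possible way to try this (a sketch under assumptions, not something
confirmed in this thread) is to blacklist the module so it is not auto-loaded
at boot, then load it by hand later. The helper name blacklist_amdgpu and the
file name amdgpu-delay.conf are made up for this example:

```shell
# Hypothetical helper: write a modprobe.d entry that stops amdgpu from being
# auto-loaded at boot. A blacklist entry does not block loading it by hand.
blacklist_amdgpu() {
    # $1: modprobe.d directory (defaults to /etc/modprobe.d, which needs root)
    printf 'blacklist amdgpu\n' > "${1:-/etc/modprobe.d}/amdgpu-delay.conf"
}
```

After a reboot, running modprobe amdgpu as root from a text console or an ssh
session would load the driver once everything else is up. If the module is also
packed into the initramfs, that image may need regenerating for the blacklist
to take effect; how to do that depends on the distribution.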


^ permalink raw reply	[flat|nested] 179+ messages in thread

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (71 preceding siblings ...)
  2019-08-10 19:00 ` bugzilla-daemon
@ 2019-08-11  1:15 ` bugzilla-daemon
  2019-08-11 15:26 ` bugzilla-daemon
                   ` (104 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-11  1:15 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 2537 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #69 from ReddestDream <reddestdream@gmail.com> ---
> The inconsistent nature of this bug and the fact that it sometimes doesn't
> appear suggests a race condition. I'd assume something else on the system
> happens before or after amdgpu is expecting.
>
> Is there any way to delay loading the amdgpu driver and manually loading it
> after everything else?

Based on all the data you (Tom B) and others have provided as well as my own
tests, my current suspicion is that there is a bug in the display mode/type
detection and enumeration, leading to the driver losing state consistency and
eventually contact entirely with the hardware.

I think the clock dysregulation and excessive voltage/wattage are symptoms of
the underlying disease rather than the cause. If something is wrong between
what the driver thinks the hardware state is and what the hardware state
actually is, it's only a matter of time before this inconsistency leads to
dysregulation, instability, and crashing. For this reason, I'm not convinced
there is any better workaround than "just use one monitor." Pushing up the
clocks only seems to at best prolong the inevitable. :(

I'm also not convinced there is one commit in particular to point to here.
Rather it was probably in the restructuring of something between 5.0 and 5.1
that it became fundamentally broken while it was always somewhat flawed before.

Unfortunately, Radeon VII probably isn't really being tested by kernel
developers anymore and it's likely that multimonitor with this card on Linux
was never fully tested at all. It also seems like AMD's kernel development has
moved on to Navi and that the upcoming new Vega card, Arcturus, won't have
display outs at all, so work on that can't fix this issue.

As this card is fairly uncommon and expensive, the only real hope for a fix
seems to be to get the card into the hands of someone who has the skill to fix
graphics drivers and a willingness/need to test multimonitor.

Perhaps someone like gnif who has been able to solve the infamous Vega Reset
Bug on Vega 10 cards might be able to fix it. It's likely he will encounter our
issue while testing Radeon VII with Looking Glass and such. Someone has already
offered to lend him a Radeon VII as he states in the video, so there's some
hope that his work will lead to a solution.

https://www.youtube.com/watch?v=1ShkjXoG0O0


^ permalink raw reply	[flat|nested] 179+ messages in thread

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (72 preceding siblings ...)
  2019-08-11  1:15 ` bugzilla-daemon
@ 2019-08-11 15:26 ` bugzilla-daemon
  2019-08-11 17:00 ` bugzilla-daemon
                   ` (103 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-11 15:26 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 4275 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #70 from Tom B <tom@r.je> ---
> Based on all the data you (Tom B) and others have provided as well as my own
> tests, my current suspicion is that there is a bug in the display mode/type
> detection and enumeration, leading to the driver losing state consistency and
> eventually contact entirely with the hardware.

I looked through the commits and the code trying to find anything that dealt
with multiple displays as that seems to be the trigger but couldn't find
anything that looked promising.

It's probably worth noting what I tried/found even though I was unsuccessful as
it may help someone. I'm fairly sure that the problem must be this file:
https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/amd/powerplay/vega20_ppt.c
There is a variable called NumOfDisplays and related code.  Maybe someone who
understands driver development can point me in the right direction:

Line 2049 seems promising.

        smu_send_smc_msg_with_param(smu, SMU_MSG_NumOfDisplays, 0);
        ret = vega20_set_uclk_to_highest_dpm_level(smu,
                                                   &dpm_table->mem_table);
        if (ret)
                pr_err("Failed to set uclk to highest dpm level");

Although that error message is not displayed in dmesg, this function deals with
multiple displays and the power levels. Unfortunately, I cannot find
documentation for the driver code. What does smu_send_smc_msg_with_param do?
Here the last argument is 0, but in the next function,
vega20_display_config_changed, the final argument is the number of displays:

        smu_send_smc_msg_with_param(smu,
                                    SMU_MSG_NumOfDisplays,
                                    smu->display_config->num_display);

The next point of interest is line 2091. I don't think it's the cause of the
bug but:

        disable_mclk_switching = ((1 < smu->display_config->num_display) &&
                                  !smu->display_config->multi_monitor_in_sync) ||
                                 vblank_too_short;

disable_mclk_switching is set if the number of displays is more than 1 and
"multi_monitor_in_sync" (whatever that is, possibly mirrored displays?) is
false, or if "vblank_too_short" is true. I don't believe this is a problem
because the code has existed since January, presumably for the February
release, but perhaps the contents of the variables have changed so this code
runs differently.

I only mention this because it's the only point in the code I found where it
does something different if more than one display is connected. 
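
To make the condition easier to reason about, here is the same boolean logic
as the C expression quoted above, written as a small shell function. This is
only an illustration of the expression, not driver code, and the function name
mclk_switching_disabled is made up:

```shell
# Mirrors the quoted C expression; prints 1 when mclk switching would be
# disabled and 0 otherwise.
# $1: num_display, $2: multi_monitor_in_sync (0 or 1), $3: vblank_too_short (0 or 1)
mclk_switching_disabled() {
    if { [ "$1" -gt 1 ] && [ "$2" -eq 0 ]; } || [ "$3" -eq 1 ]; then
        echo 1
    else
        echo 0
    fi
}
```

With two displays that are not in sync (mclk_switching_disabled 2 0 0) the
result is 1, while a single display with a long enough vblank
(mclk_switching_disabled 1 0 0) gives 0.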

My questions for the driver devs:

1. Why is smu_send_smc_msg_with_param called with zero in the function
vega20_pre_display_config_changed but the number of displays in the next
function?
2. Is num_display an index (so 0 is actually the first display and we're
assuming 1 display in index 0) or is it actually 0, no displays?
3. Is there any way to see which code appears in which kernel version? The tags
are definitely incorrect: the first commit for that file,
https://github.com/torvalds/linux/commit/74e07f9d3b77034cd1546617afce1d014a68d1ca#diff-2575675126169f3c0c971db736852af9
says 5.2 but was made in December last year, so I can't imagine this file isn't
used.



However, as a customer this is very frustrating. I bought the VII instead of an
nvidia card because AMD were supporting open source drivers.

As it stands:

- The AMDGPU driver worked for 4 months after the VII's release and now we've
had nearly the same amount of time where it hasn't worked with the latest
kernel.
- The AMDGPU-Pro driver only supports Ubuntu. I've never managed to get it to
run successfully on Arch, and the latest version only supports the RX 5700
cards anyway.

I emailed AMD technical support about this bug over a month ago and never got a
reply.

The VII appears to be completely unsupported other than the initial driver
release when the card came out. I'll be going back to nvidia next time and
although I had intended to keep the VII for several years it looks like that
won't be possible as I can't run an old kernel forever.


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (73 preceding siblings ...)
  2019-08-11 15:26 ` bugzilla-daemon
@ 2019-08-11 17:00 ` bugzilla-daemon
  2019-08-11 18:43 ` bugzilla-daemon
                   ` (102 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-11 17:00 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1465 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #71 from Sylvain BERTRAND <sylvain.bertrand@gmail.com> ---
On Sun, Aug 11, 2019 at 01:15:48AM +0000, bugzilla-daemon@freedesktop.org
wrote:
> I think the clock dysregulation and excessive voltage/wattage are symptoms of

Is there a way to configure the smu block to keep the memory clock at its max
with the appropriate power/voltage? If the smu block does configure some of the
vram arbiter block priority, could we tell it to keep the dc[en]x at max
priority and ignore display vram watermarks? (Due to the realtime requirement
of monitor data transmission, I still don't understand the existence of
watermarks in the first place; I would need data which proves me wrong.)

On my AMD TAHITI XT, the memory clock seems to be locked to the max (only 1
full hd 144Hz monitor). I recall dce6 has a fancy inner-blocks configuration: I
simplified it in my custom driver (something about availability of display
clocks and memory bandwidth). Maybe the smu breaks while doing clock/power
management because of this dc[en]x "fancy" inner-blocks configuration.

Additionally, I have never heard of 2 displays which would be driven by a common
display block and be in sync. Is the sync dependent on the monitors and not
the display block? What am I missing? The nasty displayport mst thingy?
I would always set this to false.


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (74 preceding siblings ...)
  2019-08-11 17:00 ` bugzilla-daemon
@ 2019-08-11 18:43 ` bugzilla-daemon
  2019-08-11 18:45 ` bugzilla-daemon
                   ` (101 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-11 18:43 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 3875 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #72 from Tom B <tom@r.je> ---
> The nasty displayport mst thingy? I would always set this to false.

I don't believe mst is being used here; it's two monitors, each with its own
cable.


Here's some additional investigation.

"[SetUclkToHightestDpmLevel] Set hard min uclk failed!" appears as one of the
first errors in dmesg. This is from vega20_hwmgr.c:3354 and is triggered by:


                PP_ASSERT_WITH_CODE(!(ret = smum_send_msg_to_smc_with_parameter(hwmgr,
                                PPSMC_MSG_SetHardMinByFreq,
                                (PPCLK_UCLK << 16 ) | dpm_table->dpm_state.hard_min_level)),
                                "[SetUclkToHightestDpmLevel] Set hard min uclk failed!",
                                return ret);
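
The parameter in the call above combines two values in one 32-bit word: the
clock identifier shifted into the upper 16 bits and hard_min_level in the lower
16 bits. The following is a small sketch of that packing only. The helper names
are hypothetical, SKETCH_PPCLK_UCLK is a placeholder rather than the real value
from the driver headers, and treating the frequency as MHz is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder clock identifier. The real PPCLK_UCLK value is defined in the
 * driver headers and may differ. */
#define SKETCH_PPCLK_UCLK 5u

/* Pack a clock id (upper 16 bits) and a frequency (lower 16 bits) the same
 * way the quoted expression does: (clk << 16) | freq. */
static uint32_t pack_hard_min_param(uint32_t clk_id, uint32_t freq)
{
	assert(clk_id <= 0xFFFFu); /* must fit in the upper half */
	assert(freq <= 0xFFFFu);   /* must fit in the lower half */
	return (clk_id << 16) | freq;
}

/* Recover the two fields again, for inspection while debugging. */
static uint32_t param_clk_id(uint32_t param)
{
	return param >> 16;
}

static uint32_t param_freq(uint32_t param)
{
	return param & 0xFFFFu;
}
```

Unpacking the parameter this way in a debug print would show whether the clock
id or the frequency half is unexpected.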




hard_min_level is adjusted if disable_mclk_switching is set on line 3497.


        disable_mclk_switching = ((1 < hwmgr->display_config->num_display) &&
                           !hwmgr->display_config->multi_monitor_in_sync) ||
                            vblank_too_short;


        /* Hardmin is dependent on displayconfig */
        if (disable_mclk_switching) {
                dpm_table->dpm_state.hard_min_level = dpm_table->dpm_levels[dpm_table->count - 1].value;
                for (i = 0; i < data->mclk_latency_table.count - 1; i++) {
                        if (data->mclk_latency_table.entries[i].latency <= latency) {
                                if (dpm_table->dpm_levels[i].value >= (hwmgr->display_config->min_mem_set_clock / 100)) {
                                        dpm_table->dpm_state.hard_min_level = dpm_table->dpm_levels[i].value;
                                        break;
                                }
                        }
                }
        }


Interestingly, this also checks for the presence of multiple displays so we at
least have a connection between the code, error message and cause of the bug
(multiple displays). As a very crude test, I tried forcing it on and compiling
with

disable_mclk_switching = true;

No difference, so I also tried:

disable_mclk_switching = false;

Again, it didn't help. I will note that this code is identical in 5.0.13, so my
test was really only checking for an incorrect value being set elsewhere in
hwmgr->display_config->multi_monitor_in_sync or
hwmgr->display_config->num_display. In 5.0.13 I do get mclk boosting: it idles
at 351 MHz and boosts to 1001 MHz, so I don't think that forcing the memory to
max clock all the time is the correct solution.
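
The hard-min selection loop quoted above can be exercised in isolation. This is
a simplified sketch, not the driver code: the struct and function names are
hypothetical, the minimum memory clock is passed in already converted (the
driver divides min_mem_set_clock by 100), and the loop bound is written as
i + 1 < latency_count to avoid unsigned wrap-around when the table is empty.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_LEVELS 8

/* Hypothetical flattened view of the tables the quoted loop reads. */
struct sketch_tables {
	uint32_t count;                        /* number of valid DPM levels  */
	uint32_t level_value[SKETCH_MAX_LEVELS]; /* dpm_levels[i].value       */
	uint32_t latency_count;                /* mclk_latency_table.count    */
	uint32_t latency[SKETCH_MAX_LEVELS];   /* entries[i].latency          */
};

/* Start from the highest level, then prefer the first lower level whose
 * latency is acceptable and whose clock still meets the display minimum. */
static uint32_t pick_hard_min(const struct sketch_tables *t,
			      uint32_t max_latency, uint32_t min_mem_clock)
{
	uint32_t hard_min;
	uint32_t i;

	assert(t != NULL && t->count > 0 && t->count <= SKETCH_MAX_LEVELS);
	assert(t->latency_count <= t->count);

	hard_min = t->level_value[t->count - 1];

	/* As in the quoted loop, the last latency entry is never examined. */
	for (i = 0; i + 1 < t->latency_count; i++) {
		if (t->latency[i] <= max_latency &&
		    t->level_value[i] >= min_mem_clock) {
			hard_min = t->level_value[i];
			break;
		}
	}
	return hard_min;
}
```

If no lower level qualifies, the result stays at the highest level, which would
be consistent with hard_min_level being the top memory clock whenever
disable_mclk_switching is set and the display minimum is high.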


I also diff'd vega20_hwmgr.c from 5.0.13 and 5.2.7 (I'll attach it). Here are a
few things I noticed:


In vega20_init_smc_table, this line has been added in this commit:
https://github.com/torvalds/linux/commit/f5e79735cab448981e245a41ee6cbebf0e334f61

+       data->vbios_boot_state.fclock = boot_up_values.ulFClk;

I don't know what fclock is, but this was never set in 5.0.13.


in vega20_setup_default_dpm_tables:

@@ -710,8 +729,10 @@ static int vega20_setup_default_dpm_tables(struct pp_hwmgr *hwmgr)
                PP_ASSERT_WITH_CODE(!ret,
                                "[SetupDefaultDpmTable] failed to get fclk dpm levels!",
                                return ret);
-       } else
-               dpm_table->count = 0;
+       } else {
+               dpm_table->count = 1;
+               dpm_table->dpm_levels[0].value = data->vbios_boot_state.fclock / 100;
+       }


In 5.0.13, dpm_table->count is set to 0; in 5.2.7 it is set to 1 and a dpm
level is added based on fclock. fclock appears throughout as a new addition. I
don't think this is the cause, but the addition of fclock may be worth exploring.
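
For comparison, the two versions of the else branch can be written side by side
as a sketch. The struct and function names here are hypothetical, and the
condition that selects this branch is not visible in the quoted hunk, so only
the effect on the table is modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_LEVELS 8

/* Hypothetical, reduced version of a single-clock DPM table. */
struct sketch_dpm_table {
	uint32_t count;
	uint32_t level_value[SKETCH_MAX_LEVELS];
};

/* 5.0.13 behaviour in the else branch: the fclk table is left empty. */
static void fclk_fallback_5_0(struct sketch_dpm_table *t)
{
	assert(t != NULL);
	t->count = 0;
}

/* 5.2.7 behaviour in the else branch: one level, derived from the VBIOS
 * boot value divided by 100 as in the diff. */
static void fclk_fallback_5_2(struct sketch_dpm_table *t, uint32_t boot_fclock)
{
	assert(t != NULL);
	t->count = 1;
	t->level_value[0] = boot_fclock / 100;
}
```

The practical difference is that later code reading level count - 1 of this
table has a valid entry in 5.2.7 but none in 5.0.13. Whether any such reader is
involved in this bug is not established here.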


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (75 preceding siblings ...)
  2019-08-11 18:43 ` bugzilla-daemon
@ 2019-08-11 18:45 ` bugzilla-daemon
  2019-08-11 22:31 ` bugzilla-daemon
                   ` (100 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-11 18:45 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 321 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #73 from Tom B <tom@r.je> ---
Created attachment 145026
  --> https://bugs.freedesktop.org/attachment.cgi?id=145026&action=edit
diff of vega20_hwmgr.c from 5.0.13 to 5.2.7


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (76 preceding siblings ...)
  2019-08-11 18:45 ` bugzilla-daemon
@ 2019-08-11 22:31 ` bugzilla-daemon
  2019-08-11 23:44 ` bugzilla-daemon
                   ` (99 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-11 22:31 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1006 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #74 from Sylvain BERTRAND <sylvain.bertrand@gmail.com> ---
Forcing the memory clock and voltage is not enough: the dc[en]x memory requests
should also be given the highest priority in the arbiter block. I don't recall
how it interacts with the dc[en]x watermarks, but they should be "disabled" or
"maxed out". Basically, whatever the 3D/compute/(vcn|vce/uvd) load, the dc[en]x
will always come first (due to the realtime nature of display data transmission
to monitors). Oh and of course, the smu/smc should not manage the dc[en]x. Very
probably, there are some smc/smu commands to do that.

If the GPU did not crash with dpm disabled as a whole, the proper way to
proceed would be to start from there and step by step add dpm features and see
when it starts crashing. It's not a small task since dpm code paths may be
scattered all over the code.


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (77 preceding siblings ...)
  2019-08-11 22:31 ` bugzilla-daemon
@ 2019-08-11 23:44 ` bugzilla-daemon
  2019-08-12  3:12 ` bugzilla-daemon
                   ` (98 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-11 23:44 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 2212 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #75 from ReddestDream <reddestdream@gmail.com> ---
>Here's some additional investigation.

>[SetUclkToHightestDpmLevel] Set hard min uclk failed! Appears as one of the first errors in dmesg. This is from vega20_hwmgr.c:3354 and triggered by:

I agree that [SetUclkToHightestDpmLevel] is probably the key to all this as it
always seems to be the first thing that fails after dysregulation occurs. The
"Failed to send message 0x28, response 0x0" errors show that the driver is
sending wrong or at least wrongly timed commands to the GPU that eventually
cascade into complete failure.

>Again, it didn't help. I will note that this code is identical in 5.0.13 

I have also been unable to find changed code since 5.0 that could be directly
connected to display detect/init/enumeration issues on Radeon VII/Vega 20. This
is why I've come to suspect the error is triggered indirectly in a way that
will probably not be obvious and by code that was likely flawed from the
beginning of Radeon VII/Vega 20 support.

This is also why I was hopeful that 5.3-rc2 would fix this issue since it has
commits that do seem to affect display detection on AMD GPUs. Alas, it did not.
:(

>If the GPU did not crash with dpm disabled as a whole, the proper way to
>proceed would be to start from there and step by step add dpm features and see
>when it starts crashing. It's not a small task since dpm code paths may be
>scattered all over the code.

Unfortunately, it does look like going through and slowly disabling features
and/or bisecting might be the only way to find how this issue got started. At
least if we could narrow it down, we might be in better shape. :/

I must admit I don't have much experience with graphics drivers and when I tell
other people about this issue, they immediately want to blame X or Mesa until I
explain that I can get these errors w/o starting any graphics at all. lol.

In any case, I really appreciate your testing Tom B. And any advice you might
have on debugging, Sylvain BERTRAND, is greatly appreciated. :)


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (78 preceding siblings ...)
  2019-08-11 23:44 ` bugzilla-daemon
@ 2019-08-12  3:12 ` bugzilla-daemon
  2019-08-12  3:29 ` bugzilla-daemon
                   ` (97 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-12  3:12 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 509 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #76 from Sylvain BERTRAND <sylvain.bertrand@gmail.com> ---
> Unfortunately, it does look like going through and slowing disabling features
> and/or bisecting might be the only way to find how this issue got started. At
> least if we could narrow it down, we might be in better shape. :/

I guess, you are good for a bisection if you have a "working" kernel.


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (79 preceding siblings ...)
  2019-08-12  3:12 ` bugzilla-daemon
@ 2019-08-12  3:29 ` bugzilla-daemon
  2019-08-12  5:18 ` bugzilla-daemon
                   ` (96 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-12  3:29 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 512 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #77 from ReddestDream <reddestdream@gmail.com> ---
>I guess, you are good for a bisection if you have a "working" kernel.

Thing is, based on everything here, I'm not convinced that 5.0.13 has 0 issues.
Only that it seems to have fewer issues. But yeah. I don't see anywhere else to
go but bisection from 5.0.13 to 5.1. That should at least find something . . .


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (80 preceding siblings ...)
  2019-08-12  3:29 ` bugzilla-daemon
@ 2019-08-12  5:18 ` bugzilla-daemon
  2019-08-12  5:58 ` bugzilla-daemon
                   ` (95 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-12  5:18 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 714 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #78 from Chris Hodapp <chris@hodapp.email> ---
> I don't see anywhere else to go but bisection from 5.0.13 to 5.1. That should at least find something . . .

I tried something like that before, but a huge portion of the commits in that
range won't build kernels that can boot (at least on my system). I ended up
resorting to reverting individual vega20-affecting commits out of 5.1.
See my results far above in the thread (though someone else willing to spend
more time doing a deeper analysis of the code could probably take my approach
much further).


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (81 preceding siblings ...)
  2019-08-12  5:18 ` bugzilla-daemon
@ 2019-08-12  5:58 ` bugzilla-daemon
  2019-08-12 13:21 ` bugzilla-daemon
                   ` (94 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-12  5:58 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 2244 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #79 from ReddestDream <reddestdream@gmail.com> ---
>I tried something like that before but a huge portion of the commits in that range won't build kernels that can boot (at least on my system).

It's interesting that you found d1a3e239a6016f2bb42a91696056e223982e8538 to
improve the issue:

https://github.com/torvalds/linux/commit/d1a3e239a6016f2bb42a91696056e223982e8538#diff-0bc07842bc28283d64ffa6dd2ed716de

From Tom B.'s and my review of the code, it seems very likely that somehow a
failure to set a hard minimum properly is at the heart of the issue.

>This brings me to the second thing: When looking through the commits, I noticed that there were multiple commits that claim to prevent or reduce crashing in high-resolution situations (one references 5k displays, another references 3+ 4k displays).

Yeah. I have 2 4K displays as well. But I don't think it should really be
straining the card. These commits are probably overzealous for Radeon VII.
Rather it could be that at least part of the issue, especially the excessive
power draw at idle, is just due to these commits artificially setting minimums
very high. In fact, that could be why it's stable at all with just one monitor,
since the code to set the minimums up is only being triggered when there are
more monitors connected.

I'd suspect a boottime configuration issue too, but others have reported
instability even when the monitors are hotplugged later on. So, it seems like
maybe the monitor detect might at least partially be okay, but the
follow-through with raising the clock minimums is broken. I suspect the issue
is in the code calculating the minimum to set, so the driver gets stuck trying
to send incomplete/incorrect values to the card.

https://bbs.archlinux.org/viewtopic.php?id=247733

It does make me wonder if it's worth testing like 2 simple 1080p 60 Hz
displays. Maybe that wouldn't trigger this issue. Not that that would really be
of use to me. But it might help distinguish between just monitor detect
generally being broken and "high monitor load" being broken . . .


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (82 preceding siblings ...)
  2019-08-12  5:58 ` bugzilla-daemon
@ 2019-08-12 13:21 ` bugzilla-daemon
  2019-08-12 14:34 ` bugzilla-daemon
                   ` (93 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-12 13:21 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 2348 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #80 from Tom B <tom@r.je> ---
> I tried something like that before but a huge portion of the commits in that range won't build kernels that can boot (at least on my system). I ended up resorting to trying reverting individual vega20-affecting  commits out of 5.1. See my results far above in the thread (though someone else willing to spend more time doing a deeper analysis of the code could probably take my approach much further).

That's why my focus has been finding places in the code where something
different happens based on the number of displays. Though this may be a futile
avenue of exploration, as it could just be an issue of additional memory
bandwidth requirements or even something that should be done differently with 2
displays but isn't.

> It does make me wonder if it's worth testing like 2 simple 1080p 60 Hz displays. Maybe that wouldn't trigger this issue. Not that that would really be of use to me. But it might help distinguish between just monitor detect generally being broken and "high monitor load" being broken . . .

This would be an interesting test, but I think 1080p 60 Hz monitors with
DisplayPort are fairly uncommon and I don't have any to test with. My guess is
that anyone with a Radeon VII, a high-end card with 16 GB VRAM, is likely to
have a high-end display, which could equally explain why there are no reports
here of people running 1080p 60 Hz displays.

My next test is going to be logging dpm_table->dpm_state.hard_min_level on line
3354 (just before it's sent to the smc) on both 5.0.13 and 5.2.7 to see if the
same hard_min_level value is sent to the smc on both kernels. This will at
least let us know whether it's something that's incorrectly setting
hard_min_level or something that prevents the smc accepting the value. My hunch
from my previous tests is that it's the latter but I'll try it and report back.

I know nothing about driver development, so I have no idea how this stuff
should work. I can only compare the differences between 5.0.13 and later kernels.

Anyway, thanks everyone for your input. Any information, even on things that
you tried and didn't work, is valuable as it can help us narrow down the
problem.


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (83 preceding siblings ...)
  2019-08-12 13:21 ` bugzilla-daemon
@ 2019-08-12 14:34 ` bugzilla-daemon
  2019-08-12 15:34 ` bugzilla-daemon
                   ` (92 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-12 14:34 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 3083 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #81 from Tom B <tom@r.je> ---
Created attachment 145038
  --> https://bugs.freedesktop.org/attachment.cgi?id=145038&action=edit
5.2.7 dmesg with hard_min_level logged

As mentioned in the previous post, I started logging the value of
hard_min_level. I hadn't realised that vega20_set_uclk_to_highest_dpm_level
would be called so many times.

Here's what I found: The value of hard_min_level is 1001 in both 5.0.13 and
5.2.7, so the issue is not the value from the dpm table. The dpm table is
probably correct. Something prevents the smc from accepting the value sent via
smum_send_msg_to_smc_with_parameter.

However, what is interesting is that it doesn't always fail.


[    4.082105] amdgpu: [powerplay] hard_min_level: 1001
[    4.372684] [drm] Initialized amdgpu 3.32.0 20150101 for 0000:44:00.0 on minor 0
[    4.517204] amdgpu: [powerplay] Failed to send message 0x28, response 0x0
[    4.517205] amdgpu: [powerplay] [SetUclkToHightestDpmLevel] Set hard min uclk failed!


Each hard_min_level line in the log is from
vega20_set_uclk_to_highest_dpm_level and there are multiple calls to it, which
don't fail, before the card is initialised.


This is from 5.2.7:

[    3.698907] amdgpu 0000:44:00.0: ring vce2 uses VM inv eng 14 on hub 1
[    4.082105] amdgpu: [powerplay] hard_min_level: 1001
[    4.372684] [drm] Initialized amdgpu 3.32.0 20150101 for 0000:44:00.0 on minor 0
[    4.517204] amdgpu: [powerplay] Failed to send message 0x28, response 0x0
[    4.517205] amdgpu: [powerplay] [SetUclkToHightestDpmLevel] Set hard min uclk failed!
[    5.361482] amdgpu: [powerplay] Failed to send message 0x28, response 0x0


And the same from 5.0.13:

[    3.352380] amdgpu 0000:44:00.0: ring vce2 uses VM inv eng 14 on hub 1
[    3.722422] amdgpu: [powerplay] hard_min_level: 1001
[    3.766269] amdgpu: [powerplay] hard_min_level: 1001
[    4.029679] [drm] Initialized amdgpu 3.27.0 20150101 for 0000:44:00.0 on minor 0


There are a couple of things here:

1. vega20_set_fclk_to_highest_dpm_level is called twice between the "ring vce2"
line and "Initialized"

2. My patched code looks like this:

                pr_err("hard_min_level: %d\n",
                                        dpm_table->dpm_state.hard_min_level);

                PP_ASSERT_WITH_CODE(!(ret = smum_send_msg_to_smc_with_parameter(hwmgr,
                                PPSMC_MSG_SetHardMinByFreq,
                                (PPCLK_UCLK << 16 ) | dpm_table->dpm_state.hard_min_level)),
                                "[SetUclkToHightestDpmLevel] Set hard min uclk failed!",
                                return ret);

Yet the log shows:

- My debug line 
- Initialized amdgpu 3.32.0 20150101 for 0000:44:00.0 on minor 0
- [SetUclkToHightestDpmLevel] Set hard min uclk failed!

So initialization is happening between (and possibly a result of) sending the
message and getting the response.
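
One way to read "response 0x0" is sketched below, under the assumption that
the driver clears a response register, writes the message, and then polls
until the firmware writes a non-zero result or a timeout expires. This has not
been checked against the smumgr source in this thread, and every name in the
sketch is hypothetical. In the 5.2.7 log above, the failure is printed about
0.435 s after the debug line (4.517204 vs 4.082105), which would fit a wait
that ran out, though other explanations are possible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_RESULT_OK 0x1u

/* Hypothetical mailbox shared between driver and firmware. */
struct sketch_mailbox {
	uint32_t msg;
	uint32_t param;
	uint32_t resp;
	/* Simulation only: the firmware answers once this many polls have
	 * happened. 0 means it never answers. */
	unsigned int answers_after;
};

/* Simulated firmware side, called once per poll. */
static void sketch_firmware_tick(struct sketch_mailbox *mb,
				 unsigned int polls_done)
{
	if (mb->answers_after != 0 && polls_done >= mb->answers_after)
		mb->resp = SKETCH_RESULT_OK;
}

/* Clear the response, post the message, then poll up to max_polls times.
 * Returns the response register; it is still 0 if nothing answered. */
static uint32_t sketch_send_msg(struct sketch_mailbox *mb, uint32_t msg,
				uint32_t param, unsigned int max_polls)
{
	unsigned int i;

	assert(mb != NULL);
	mb->resp = 0;
	mb->param = param;
	mb->msg = msg;

	for (i = 1; i <= max_polls; i++) {
		sketch_firmware_tick(mb, i);
		if (mb->resp != 0)
			break;
	}
	return mb->resp;
}
```

Under this model a response of 0x0 means the firmware did not reply in time,
rather than that it replied with an explicit rejection, so any other log line
emitted during the wait would appear between the debug print and the failure.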


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (84 preceding siblings ...)
  2019-08-12 14:34 ` bugzilla-daemon
@ 2019-08-12 15:34 ` bugzilla-daemon
  2019-08-12 15:42 ` bugzilla-daemon
                   ` (91 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-12 15:34 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1010 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #82 from Tom B <tom@r.je> ---
In addition, I will note that the file vega20_baco.c has been added in 5.1 

details: https://www.phoronix.com/scan.php?page=news_item&px=AMD-Vega-12-BACO


commit:
https://github.com/torvalds/linux/commit/0c5ccf14f50431d0196b96025c878ae9f45676a9#diff-c2d82e6f1326b5b4e0a09c9cb42cbcc2 


This seems like quite a large change, and requires a special "workaround" for
Vega 20. Unfortunately, it appears to be part of a larger restructuring of the
driver, so I cannot just revert that single commit.

I mention this because part of the problem I am seeing is with the wrong
wattage. I wonder whether BACO wrongly tries to turn off a part of the card
that is required for a secondary monitor and as such puts the card in an
invalid state.

I'm going to see if I can disable/revert BACO entirely to at least rule it out.


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (85 preceding siblings ...)
  2019-08-12 15:34 ` bugzilla-daemon
@ 2019-08-12 15:42 ` bugzilla-daemon
  2019-08-12 15:53 ` bugzilla-daemon
                   ` (90 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-12 15:42 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 2566 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #83 from ReddestDream <reddestdream@gmail.com> ---
> Here's what I found: The value of hard_min_level is 1001 in both 5.0.13 and 5.2.7 so the issue is not the value from the dpm table. The dpm table is probably correct. 

Fantastic! Glad you tested this. I had suspected the hard_min_level was bogus
and that's why it was failing. Card was rejecting the bogus value. Glad to know
that's not the case.

> However, what is interesting is that it doesn't always fail.

Yeah. I've had boots where I have my 2 4K DP monitors in and I don't get a
powerplay error on boot. In fact, it can go a bit and seem stable. But then the
powerplay errors suddenly (not related to some high load on the card) start
showing up again and the graphics become unstable. Similarly, others have
reported that on hotplugging a second monitor after boot, the powerplay errors
will start showing up.

So, maybe there is a timing problem involved with sending the message. It's
generally a question of when rather than if it's going to fail.

> 1. vega20_set_fclk_to_highest_dpm_level is called twice between the "ring vce2" line and "Initialized"

Is it always called twice? Even on 5.2.7? Because it looks like it might get
called two times right before "Initialized" on 5.0.13 but then only once on
5.2.7 before "Initialized" kicks in. Maybe "Initialized" is interrupting on
5.2.7 but not on 5.0.13. It's possible that Initialization of the card is
messing up values that powerplay needs to read off the card or making the card
unavailable for receiving messages or something . . .

> So initialization is happening between (and possibly a result of) sending the message and getting the response

Yeah. Something is definitely happening while
vega20_set_uclk_to_highest_dpm_level is running . . . Not 100% sure that's
really problematic tho . . .  But it could be an atomicity issue. Need to
figure out exactly what is generating the line "[drm] Initialized amdgpu
3.27.0 20150101 for 0000:44:00.0 on minor 0." Looks like it's coming from the
drm core rather than amdgpu specifically.

> I'm going to see if I can disable/revert BACO entirely to at least rule it out.

I thought BACO was reverted for Vega 20 here:

https://github.com/torvalds/linux/commit/7db329e57b90ddebcb58fc88eedbb3082d22a957#diff-8a4d25be8ad5d9c3ff27bb54b678dab2

Your commit seems to have been introduced in 5.2-rc1, not 5.1.


^ permalink raw reply	[flat|nested] 179+ messages in thread

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (86 preceding siblings ...)
  2019-08-12 15:42 ` bugzilla-daemon
@ 2019-08-12 15:53 ` bugzilla-daemon
  2019-08-12 15:56 ` bugzilla-daemon
                   ` (89 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-12 15:53 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 478 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #84 from ReddestDream <reddestdream@gmail.com> ---
>Need to figure out what exactly what is generating the line "[drm] Initialized amdgpu 3.27.0 20150101 for 0000:44:00.0 on minor 0."

That "Initialized amdgpu" message seems to be coming from here:

https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/drm_drv.c#L994


^ permalink raw reply	[flat|nested] 179+ messages in thread

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (87 preceding siblings ...)
  2019-08-12 15:53 ` bugzilla-daemon
@ 2019-08-12 15:56 ` bugzilla-daemon
  2019-08-12 16:32 ` bugzilla-daemon
                   ` (88 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-12 15:56 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1140 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #85 from Tom B <tom@r.je> ---

> Yeah. I've had boots where I have my 2 4K DP monitors in and I don't get powerplay error on boot. In fact, it can go a bit and seem stable.

In addition to that, vega20_set_fclk_to_highest_dpm_level is called several
times before the card is initialized, and it works even on 5.2.7. Something
happens during or just before the initialization stage that stops
smum_send_msg_to_smc_with_parameter from accepting 1001 as a valid value, as it
does until that point.

I think you're right about BACO. It was worth looking at, so I applied a quick
hack to ensure it's disabled:

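/* Debug hack: ignore every BACO state change request. */
int vega20_baco_set_state(struct pp_hwmgr *hwmgr, enum BACO_STATE state)
{
        return 0;
}

/* Debug hack: always report that BACO is not supported. */
int vega20_baco_get_capability(struct pp_hwmgr *hwmgr, bool *cap)
{
        *cap = false;
        return 0;
}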

No difference, I still get the errors and wrong wattage so unless BACO is
somehow on by default and only turned off in the proper version of this code,
we can rule it out.


^ permalink raw reply	[flat|nested] 179+ messages in thread

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (88 preceding siblings ...)
  2019-08-12 15:56 ` bugzilla-daemon
@ 2019-08-12 16:32 ` bugzilla-daemon
  2019-08-12 16:38 ` bugzilla-daemon
                   ` (87 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-12 16:32 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1103 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #86 from ReddestDream <reddestdream@gmail.com> ---
>In addition to that, vega20_set_fclk_to_highest_dpm_level is called several times before the card is initialized and even on 5.2.7 works. Something happens during or just before the initialization stage that stops smum_send_msg_to_smc_with_parameter accepting 1001 as a valid value, as it does until that point.

Could be we've got a race condition between the powerplay setup and amdgpu
handing off the card to drm_dev_register to advertise it for normal use.

drm_dev_register is responsible for the "[drm] Initialized" message:

https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/drm_drv.c#L994

And it seems like amdgpu calls it here:

https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c#L1054

Odd that it's doing this if powerplay still has more work to do. And that might
be why vega20_set_uclk_to_highest_dpm_level fails that last time.


^ permalink raw reply	[flat|nested] 179+ messages in thread

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (89 preceding siblings ...)
  2019-08-12 16:32 ` bugzilla-daemon
@ 2019-08-12 16:38 ` bugzilla-daemon
  2019-08-12 16:47 ` bugzilla-daemon
                   ` (86 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-12 16:38 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1158 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #87 from Tom B <tom@r.je> ---
> Could be we've got a race condition between the powerplay setup and amdgpu
handing off the card to drm_dev_register to advertise it for normal use.

The question then becomes: Why doesn't the race condition happen with only one
screen? Perhaps it's a matter of speed. With a single display, the driver can
detect the displays, read/parse the EDID data, and initialize in time. But then
that doesn't explain why the crash still occurs if you boot with one
DisplayPort monitor and attach another after X is running.

One thing I've been trying to work out is what the difference between
vega20_ppt.c and vega20_hwmgr.c is, as they both contain slightly different or
identical versions of the same functions. It looks like the functions in
vega20_hwmgr.c take precedence, but it's strange to see this duplication, and
both files are worked on in the commit history.

Take a look at vega20_set_uclk_to_highest_dpm_level and
vega20_apply_clocks_adjust_rules in both for examples.


^ permalink raw reply	[flat|nested] 179+ messages in thread

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (90 preceding siblings ...)
  2019-08-12 16:38 ` bugzilla-daemon
@ 2019-08-12 16:47 ` bugzilla-daemon
  2019-08-12 16:57 ` bugzilla-daemon
                   ` (85 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-12 16:47 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1526 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #88 from ReddestDream <reddestdream@gmail.com> ---
>The question then becomes: Why doesn't the race condition happen with only one screen? Perhaps it's a matter of speed. With a single display, the driver detect the displays, read/parse the EDID data, initialize in time. But then that doesn't explain why the crash still occurs if you boot with one DisplayPort monitor and attach another after X is running.

I do suspect it's a matter of speed and complexity when you have more monitors.
Also maybe the clock it tries to set (the value of hard_min_level) is different
if you only have one monitor and somehow that takes more time (resetting it
away from some default).

I do wonder if maybe in:

"[SetUclkToHightestDpmLevel] Set hard min uclk failed!",
                                return ret);

It should return -EINVAL instead. Maybe then it would reset and try again
instead of just ignoring it and continuing with initialization anyway, leading
to instability.
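
A minimal sketch of what "try again" could look like, purely as an
illustration. This is user-space C, not driver code: send_with_retry, send_fn
and flaky_send are hypothetical names that do not exist in amdgpu, and the
sketch assumes the sender returns 0 on success and a negative errno on failure.

```c
#include <errno.h>

/* Hypothetical sender: returns 0 on success or a negative errno on failure. */
typedef int (*send_fn)(void *ctx, unsigned int param);

/*
 * Call send() up to max_tries times. Returns 0 as soon as one attempt
 * succeeds, otherwise the error from the last attempt, so the caller can
 * decide to abort instead of carrying on.
 */
static int send_with_retry(send_fn send, void *ctx, unsigned int param,
                           int max_tries)
{
        int ret = -EINVAL; /* reported when max_tries <= 0 */
        int i;

        for (i = 0; i < max_tries; i++) {
                ret = send(ctx, param);
                if (ret == 0)
                        break;
        }
        return ret;
}

/* Test double: fails with -EIO while the counter behind ctx is positive. */
static int flaky_send(void *ctx, unsigned int param)
{
        int *failures_left = ctx;

        (void)param;
        if (*failures_left > 0) {
                (*failures_left)--;
                return -EIO;
        }
        return 0;
}
```

With a sender that fails twice and then succeeds, three tries are enough. A
sender that keeps failing still hands the last error back to the caller, which
is the part that would let initialization stop rather than continue.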

>One thing I've been trying to work out is the difference between vega21_ppt.c and   vega20_hwmgr.c is, as they both contain slightly different or identical versions of the same functions. It looks like the functions in vega20_hwmgr.c  take precedence but it's strange to see this duplication and both files are worked on in the commit history.

Hmm. That is interesting. I'll take a look.


^ permalink raw reply	[flat|nested] 179+ messages in thread

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (91 preceding siblings ...)
  2019-08-12 16:47 ` bugzilla-daemon
@ 2019-08-12 16:57 ` bugzilla-daemon
  2019-08-12 17:40 ` bugzilla-daemon
                   ` (84 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-12 16:57 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1269 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #89 from Tom B <tom@r.je> ---
> It should return -EINVAL instead. Maybe then it would reset and try again instead of just ignoring it and continuing with initialization anyway, leading to instability.

If you look at vega20_send_msg_to_smc_with_parameter: 

static int vega20_send_msg_to_smc_with_parameter(struct pp_hwmgr *hwmgr,
                uint16_t msg, uint32_t parameter)
{
        struct amdgpu_device *adev = hwmgr->adev;
        int ret = 0;

        vega20_wait_for_response(hwmgr);

        WREG32_SOC15(MP1, 0, mmMP1_SMN_C2PMSG_90, 0);

        WREG32_SOC15(MP1, 0, mmMP1_SMN_C2PMSG_82, parameter);

        vega20_send_msg_to_smc_without_waiting(hwmgr, msg);

        ret = vega20_wait_for_response(hwmgr);
        if (ret != PPSMC_Result_OK)
                pr_err("Failed to send message 0x%x, response 0x%x\n",
                       msg, ret);

        return (ret == PPSMC_Result_OK) ? 0 : -EIO;
}


It returns 0 on success and -EIO on failure, which is then in turn returned
from vega20_set_fclk_to_highest_dpm_level. Where did you see the check/retry on
EINVAL? Perhaps -EIO should be -EINVAL?


^ permalink raw reply	[flat|nested] 179+ messages in thread

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (92 preceding siblings ...)
  2019-08-12 16:57 ` bugzilla-daemon
@ 2019-08-12 17:40 ` bugzilla-daemon
  2019-08-12 18:37 ` bugzilla-daemon
                   ` (83 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-12 17:40 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1711 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #90 from Tom B <tom@r.je> ---
I'm not sure this is helpful but I managed to somewhat test the race condition
theory.

If you follow the callstack:

vega20_set_fclk_to_highest_dpm_level -> smum_send_msg_to_smc_with_parameter ->
vega20_send_msg_to_smc_with_parameter -> vega20_wait_for_response ->
phm_wait_for_register_unequal you find this code in smu_helper.c:

int phm_wait_on_register(struct pp_hwmgr *hwmgr, uint32_t index,
                         uint32_t value, uint32_t mask)
{
        uint32_t i;
        uint32_t cur_value;

        if (hwmgr == NULL || hwmgr->device == NULL) {
                pr_err("Invalid Hardware Manager!");
                return -EINVAL;
        }

        for (i = 0; i < hwmgr->usec_timeout; i++) {
                cur_value = cgs_read_register(hwmgr->device, index);
                if ((cur_value & mask) == (value & mask))
                        break;
                udelay(1);
        }

        /* timeout means wrong logic*/
        if (i == hwmgr->usec_timeout)
                return -1;
        return 0;
}


The timeout there is interesting. I increased it.


for (i = 0; i < hwmgr->usec_timeout*10; i++) {
                cur_value = cgs_read_register(hwmgr->device, index);
                if ((cur_value & mask) == (value & mask))
                        break;
                udelay(1);
        }


The PC takes significantly longer to boot (10 or so seconds when it's usually
instant) and the error still occurs. So I'm not sure it's just a matter of
waiting.
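
To illustrate why a longer timeout alone may not help, here is a user-space
model of the polling loop above. It is only a sketch: poll_register, slow_read
and stuck_read are hypothetical stand-ins (the real loop reads a hardware
register and calls udelay(1) between reads, which this model leaves out).

```c
#include <stdint.h>

/* Hypothetical stand-in for a register read; not a kernel API. */
typedef uint32_t (*read_reg_fn)(void *ctx);

/*
 * Poll until (reg & mask) == (value & mask) or max_polls reads were made.
 * Returns the number of reads used on success, or -1 on timeout.
 */
static long poll_register(read_reg_fn read_reg, void *ctx,
                          uint32_t value, uint32_t mask, long max_polls)
{
        long i;

        for (i = 0; i < max_polls; i++) {
                if ((read_reg(ctx) & mask) == (value & mask))
                        return i + 1;
        }
        return -1;
}

/* Models a slow device: reads 0 a fixed number of times, then reads 1. */
struct slow_reg {
        long not_ready_reads;
};

static uint32_t slow_read(void *ctx)
{
        struct slow_reg *reg = ctx;

        if (reg->not_ready_reads > 0) {
                reg->not_ready_reads--;
                return 0;
        }
        return 1;
}

/* Models a device that stopped responding: the value never changes. */
static uint32_t stuck_read(void *ctx)
{
        (void)ctx;
        return 0;
}
```

A slow device benefits from a larger poll limit, but a stuck one times out at
any limit, so multiplying the limit by ten only makes each failure take ten
times longer. That would be consistent with the slower boot and the unchanged
error.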


^ permalink raw reply	[flat|nested] 179+ messages in thread

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (93 preceding siblings ...)
  2019-08-12 17:40 ` bugzilla-daemon
@ 2019-08-12 18:37 ` bugzilla-daemon
  2019-08-13  3:15 ` bugzilla-daemon
                   ` (82 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-12 18:37 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1620 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #91 from ReddestDream <reddestdream@gmail.com> ---
>It returns 0 on success and -EIO on failure, which is then in turn returned from vega20_set_fclk_to_highest_dpm_leve. Where did you see the check/retry on EINVAL? Perhaps -EIO should be -EINVAL?

I didn't find check/retry code. It was more just a thought that maybe we could
keep vega20_set_uclk_to_highest_dpm_level from just returning despite the error
and allowing further initialization to proceed. Even if it crashed, that might
even be helpful, since it's not clear whether it's the initialization
(drm_dev_register) or something else that is silent in the logs that is
changing something and causing vega20_set_uclk_to_highest_dpm_level to fail
where we know it succeeded so many times before.

>I'm not sure this is helpful but I managed to somewhat test the race condition theory.

If there is a race, I'm not sure it's in the time the driver waits for the
hardware registers to respond and/or the value to set. But it's still
enlightening.

At this point it seems more likely that something else we aren't seeing in the
logs is breaking vega20_set_uclk_to_highest_dpm_level in the last moments (it's
unlikely to be the dpm_state.hard_min_level value). The failure falls through,
drm_dev_register runs, and the initialization message prints. amdgpu doesn't
consider the "[SetUclkToHightestDpmLevel] Set hard min uclk failed!" to be a
significant enough error to stop initialization. But maybe it should . . .
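
As a rough illustration of "stop initialization on error", here is a user-space
sketch of an init sequence that aborts at the first failing step. Everything in
it (run_init_steps, init_step_fn, step_ok, step_fail) is hypothetical and is
not how amdgpu structures its init path.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical init step: returns 0 on success or a negative errno. */
typedef int (*init_step_fn)(void);

/*
 * Run the steps in order and stop at the first failure. On failure the
 * index of the failing step is stored in *failed_at (if non-NULL) and the
 * step's error code is returned; later steps are not run.
 */
static int run_init_steps(const init_step_fn *steps, size_t n,
                          size_t *failed_at)
{
        size_t i;

        for (i = 0; i < n; i++) {
                int ret = steps[i]();

                if (ret != 0) {
                        if (failed_at)
                                *failed_at = i;
                        return ret;
                }
        }
        return 0;
}

/* Test doubles that count how many steps actually ran. */
static int steps_run;

static int step_ok(void)
{
        steps_run++;
        return 0;
}

static int step_fail(void)
{
        steps_run++;
        return -EIO;
}
```

The design choice shown is simply that a failed step is reported with its
position and nothing after it runs, which would make it obvious in the logs
which step broke first.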


^ permalink raw reply	[flat|nested] 179+ messages in thread

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (94 preceding siblings ...)
  2019-08-12 18:37 ` bugzilla-daemon
@ 2019-08-13  3:15 ` bugzilla-daemon
  2019-08-13  3:33 ` bugzilla-daemon
                   ` (81 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-13  3:15 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1206 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #92 from ReddestDream <reddestdream@gmail.com> ---
>If you follow the callstack:

I've been thinking all this over. The only thing unfortunately that really
sticks out at me still is how Chris Hodapp says that reverting this commit:

https://github.com/torvalds/linux/commit/d1a3e239a6016f2bb42a91696056e223982e8538#diff-0bc07842bc28283d64ffa6dd2ed716de

Seems to improve things. Considering that we now know from Tom B.'s work that
dpm_state.hard_min_level is apparently calculated correctly and stable the
entire time, it doesn't make sense that reverting this commit could fix
anything. 

The code seems very similar to what we see in
vega20_notify_smc_display_config_after_ps_adjustment near where we get the
"[SetHardMinFreq] Set hard min uclk failed!" message. Maybe this
smum_send_msg_to_smc_with_parameter call gets through where others fail because
of the formatting or something?

Thanks again Tom B. for all your testing. I'd like to do some tests of my own,
but time's just not permitting for me ATM. Hoping to be more free next weekend.
:/


^ permalink raw reply	[flat|nested] 179+ messages in thread

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (95 preceding siblings ...)
  2019-08-13  3:15 ` bugzilla-daemon
@ 2019-08-13  3:33 ` bugzilla-daemon
  2019-08-13 13:05 ` bugzilla-daemon
                   ` (80 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-13  3:33 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 335 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #93 from Chris Hodapp <chris@hodapp.email> ---
Note: It might be good for someone else to double-check my conclusion before
too much stock is put into it. Scientific method and all that.


^ permalink raw reply	[flat|nested] 179+ messages in thread

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (96 preceding siblings ...)
  2019-08-13  3:33 ` bugzilla-daemon
@ 2019-08-13 13:05 ` bugzilla-daemon
  2019-08-13 13:35 ` bugzilla-daemon
                   ` (79 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-13 13:05 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 2228 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #94 from Tom B <tom@r.je> ---
Reverting d1a3e239a6016f2bb42a91696056e223982e8538 didn't fix it for me. But
that commit may give some insight because it is related to uclk which is the
first error we get.

I also tried globally increasing usec_timeout as it's used in a few places
(patch below). This makes the PC take about a minute to boot up, so clearly the
GPU is in an invalid state before these timeouts are hit, and then each
subsequent call to smum_send_msg_to_smc_with_parameter causes a delay because
each call times out. Whatever happens puts the card into a state that it can't
recover from.

The next step is to try to find where vega20_set_uclk_to_highest_dpm_level is
called from and see what happens just before the call to this function.



diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index f4ac632a87b2..9b878c74b17e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2418,7 +2418,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
        adev->pdev = pdev;
        adev->flags = flags;
        adev->asic_type = flags & AMD_ASIC_MASK;
-       adev->usec_timeout = AMDGPU_MAX_USEC_TIMEOUT;
+       adev->usec_timeout = AMDGPU_MAX_USEC_TIMEOUT*10;
        if (amdgpu_emu_mode == 1)
                adev->usec_timeout *= 2;
        adev->gmc.gart_size = 512 * 1024 * 1024;
diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c b/drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c
index a7e8340baf90..a6b2bc4277ef 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c
@@ -84,7 +84,7 @@ int hwmgr_early_init(struct pp_hwmgr *hwmgr)
        if (!hwmgr)
                return -EINVAL;

-       hwmgr->usec_timeout = AMD_MAX_USEC_TIMEOUT;
+       hwmgr->usec_timeout = AMD_MAX_USEC_TIMEOUT*10;
        hwmgr->pp_table_version = PP_TABLE_V1;
        hwmgr->dpm_level = AMD_DPM_FORCED_LEVEL_AUTO;
        hwmgr->request_dpm_level = AMD_DPM_FORCED_LEVEL_AUTO;


^ permalink raw reply related	[flat|nested] 179+ messages in thread

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (97 preceding siblings ...)
  2019-08-13 13:05 ` bugzilla-daemon
@ 2019-08-13 13:35 ` bugzilla-daemon
  2019-08-13 15:20 ` bugzilla-daemon
                   ` (78 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-13 13:35 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 608 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #95 from Tom B <tom@r.je> ---
So here's something interesting. In 5.0.13 there is no function
vega20_display_config_changed.  This function issues
smu_send_smc_msg_with_param(smu, SMU_MSG_NumOfDisplays, 0);

In fact, in 5.0.13 there is no reference at all to SMU_MSG_NumOfDisplays
anywhere in the amdgpu driver. 

Which means the way that the number of displays is configured has changed since
5.0.13, or it was done there with a hardcoded value instead of a constant.


^ permalink raw reply	[flat|nested] 179+ messages in thread

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (98 preceding siblings ...)
  2019-08-13 13:35 ` bugzilla-daemon
@ 2019-08-13 15:20 ` bugzilla-daemon
  2019-08-13 17:11 ` bugzilla-daemon
                   ` (77 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-13 15:20 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 3995 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #96 from Tom B <tom@r.je> ---
Created attachment 145047
  --> https://bugs.freedesktop.org/attachment.cgi?id=145047&action=edit
logging anywhere the number of screens is set

Again, no closer to a fix but another thing to rule out. In addition to
SMU_MSG_NumOfDisplays, PPSMC_MSG_NumOfDisplays is also used.

I put a debug message anywhere PPSMC_MSG_NumOfDisplays or SMU_MSG_NumOfDisplays
is set and put else blocks in places where it may have been set:

        if ((data->water_marks_bitmap & WaterMarksExist) &&
            data->smu_features[GNLD_DPM_DCEFCLK].supported &&
            data->smu_features[GNLD_DPM_SOCCLK].supported) {
                pr_err("vega20_display_configuration_changed_task setting PPSMC_MSG_NumOfDisplays to %d\n",
                       hwmgr->display_config->num_display);

                result = smum_send_msg_to_smc_with_parameter(hwmgr,
                        PPSMC_MSG_NumOfDisplays,
                        hwmgr->display_config->num_display);
        } else {
                pr_err("vega20_display_configuration_changed_task not setting PPSMC_MSG_NumOfDisplays\n");
        }

        return result;
}


Here's what I found:

- The functions dealing with screens in vega20_ppt.c
(vega20_display_config_changed, vega20_pre_display_config_changed) are never
used and can be ignored for our further tests.

- The line:

result = smum_send_msg_to_smc_with_parameter(hwmgr,
PPSMC_MSG_NumOfDisplays, hwmgr->display_config->num_display);

is never executed. It always triggers the else block, so PPSMC_MSG_NumOfDisplays
is never set using num_display.

- The same thing happens in 5.0.13. When I saw the above result I had hoped
that the problem was that smum_send_msg_to_smc_with_parameter(hwmgr,
PPSMC_MSG_NumOfDisplays, hwmgr->display_config->num_display); was never called
with the correct number of displays. Unfortunately the behaviour is the same on
5.0.13: PPSMC_MSG_NumOfDisplays is only ever set to zero in both versions of
the kernel.


Unfortunately this doesn't get us any closer.


The instruction is sent a lot more in 5.0.13 though. 

5.0.13:

[    3.475471] amdgpu 0000:44:00.0: ring vce1 uses VM inv eng 13 on hub 1
[    3.475472] amdgpu 0000:44:00.0: ring vce2 uses VM inv eng 14 on hub 1
[    3.475508] amdgpu: [powerplay] vega20_display_configuration_changed_task not setting PPSMC_MSG_NumOfDisplays
[    3.794037] amdgpu: [powerplay] vega20_pre_display_configuration_changed_task setting PPSMC_MSG_NumOfDisplays to 0
[    3.800180] amdgpu: [powerplay] vega20_display_configuration_changed_task not setting PPSMC_MSG_NumOfDisplays
[    3.833502] amdgpu: [powerplay] vega20_pre_display_configuration_changed_task setting PPSMC_MSG_NumOfDisplays to 0
[    3.833647] amdgpu: [powerplay] vega20_display_configuration_changed_task not setting PPSMC_MSG_NumOfDisplays
[    4.153232] [drm] Initialized amdgpu 3.27.0 20150101 for 0000:44:00.0 on minor 0
[    4.664044] amdgpu: [powerplay] vega20_pre_display_configuration_changed_task setting PPSMC_MSG_NumOfDisplays to 0


5.2.7:

[    3.711028] amdgpu 0000:44:00.0: ring vce1 uses VM inv eng 13 on hub 1
[    3.711028] amdgpu 0000:44:00.0: ring vce2 uses VM inv eng 14 on hub 1
[    4.086310] amdgpu: [powerplay] vega20_pre_display_configuration_changed_task setting PPSMC_MSG_NumOfDisplays to 0
[    4.385470] [drm] Initialized amdgpu 3.32.0 20150101 for 0000:44:00.0 on minor 0
[    4.522398] amdgpu: [powerplay] Failed to send message 0x28, response 0x0

Notice that the display configuration tasks log five times between the ring
lines and the initialization line in 5.0.13
(vega20_pre_display_configuration_changed_task twice,
vega20_display_configuration_changed_task three times) and only once in 5.2.7.

This might not mean anything, but it could be another clue that initialization
is happening before the card is really ready.
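
For anyone who wants to repeat this comparison on their own logs, here is a
small sketch of how such counts could be tallied. It is plain user-space C with
hypothetical helper names (line_contains, count_before_marker), and it assumes
an unwrapped log where each message sits on a single line.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if the first len bytes of line contain needle, else 0. */
static int line_contains(const char *line, size_t len, const char *needle)
{
        size_t nlen = strlen(needle);
        size_t i;

        if (nlen > len)
                return 0;
        for (i = 0; i + nlen <= len; i++) {
                if (memcmp(line + i, needle, nlen) == 0)
                        return 1;
        }
        return 0;
}

/*
 * Count the lines in log that contain needle and come before the first
 * line that contains stop_marker.
 */
static int count_before_marker(const char *log, const char *needle,
                               const char *stop_marker)
{
        int count = 0;

        while (*log) {
                const char *end = strchr(log, '\n');
                size_t len = end ? (size_t)(end - log) : strlen(log);

                if (line_contains(log, len, stop_marker))
                        break;
                if (line_contains(log, len, needle))
                        count++;
                log += len;
                if (*log == '\n')
                        log++;
        }
        return count;
}
```

Calling it once per kernel with the same needle and "Initialized amdgpu" as the
stop marker gives directly comparable numbers, and a narrower needle separates
the pre_display task from the other one.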


^ permalink raw reply	[flat|nested] 179+ messages in thread

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (99 preceding siblings ...)
  2019-08-13 15:20 ` bugzilla-daemon
@ 2019-08-13 17:11 ` bugzilla-daemon
  2019-08-13 18:33 ` bugzilla-daemon
                   ` (76 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-13 17:11 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 875 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #97 from Tom B <tom@r.je> ---
I've been investigating this:

https://github.com/torvalds/linux/commit/94ed6d0cfdb867be9bf05f03d682980bce5d0036

Because vega20 doesn't export display_configuration_change, it jumps to the
newly added else block and calls smu_display_configuration_change. This didn't
happen in 5.0.13. It's not the cause of this bug, as I commented it out and it
still breaks.
I'll also note that pp_display_cfg->display_count is correct at this point: it
shows 2 for me with 2 screens connected. But why doesn't vega20 export
display_configuration_change? It has display_config_changed, and I can't find
where that's called from, so I wonder if display_config_changed should be
called at this point.
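
To make the control flow concrete, here is a minimal standalone sketch of the
"call the hook if it is exported, otherwise fall back" pattern described above.
All struct and function names are hypothetical stand-ins written for this
example; it is not the code from the linked commit.

```c
#include <stddef.h>

/* Hypothetical stand-in for the display configuration passed around. */
struct display_cfg_mock {
    int display_count;
};

/* Hypothetical stand-in for a powerplay-style function table. */
struct pp_funcs_mock {
    int (*display_configuration_change)(const struct display_cfg_mock *cfg);
};

enum path_taken { PATH_EXPORTED_HOOK, PATH_SMU_FALLBACK };

/* Example hook, as an ASIC that exports the callback would provide. */
int legacy_change_mock(const struct display_cfg_mock *cfg)
{
    (void)cfg;
    return 0;
}

/* Stand-in for the fallback taken when the hook is not exported. */
static int smu_fallback_mock(const struct display_cfg_mock *cfg)
{
    (void)cfg;
    return 0;
}

/* Call the exported hook when present, otherwise use the fallback. */
enum path_taken apply_display_cfg(const struct pp_funcs_mock *funcs,
                                  const struct display_cfg_mock *cfg)
{
    if (funcs != NULL && funcs->display_configuration_change != NULL) {
        funcs->display_configuration_change(cfg);
        return PATH_EXPORTED_HOOK;
    }
    smu_fallback_mock(cfg);
    return PATH_SMU_FALLBACK;
}
```

With a table whose hook pointer is NULL (the vega20 situation as described
here), apply_display_cfg ends up on the fallback path.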


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (100 preceding siblings ...)
  2019-08-13 17:11 ` bugzilla-daemon
@ 2019-08-13 18:33 ` bugzilla-daemon
  2019-08-14 15:44 ` bugzilla-daemon
                   ` (75 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-13 18:33 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 606 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #98 from Sylvain BERTRAND <sylvain.bertrand@gmail.com> ---
> The code seems very similar to what we see in
> vega20_notify_smc_display_config_after_ps_adjustment near where we get the "
> [SetHardMinFreq] Set hard min uclk failed!" Maybe this
> smum_send_msg_to_smc_with_parameter get through where others fail because of
> the formatting or something?

It seems there is a patch from AMD about SMU v11 and this smc/smu command.
I may be wrong though.


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (101 preceding siblings ...)
  2019-08-13 18:33 ` bugzilla-daemon
@ 2019-08-14 15:44 ` bugzilla-daemon
  2019-08-14 17:30 ` bugzilla-daemon
                   ` (74 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-14 15:44 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1475 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #99 from Tom B <tom@r.je> ---
Created attachment 145062
  --> https://bugs.freedesktop.org/attachment.cgi?id=145062&action=edit
a list of commits 5.0.13 - 5.1.0

Attached is a list of all amdgpu and powerplay commits from 5.0.13 - 5.1.0. 

I have tried reverting the following, which looked like the most likely culprits:

919a94d8101ebc29868940b580fe9e9811b7dc86 drm/amdgpu: fix CPDMA hang in PRT mode
for VEGA20

f7b1844bacecca96dd8d813675e4d8adec02cd66 drm/amdgpu: Update gc golden setting
for vega family

d25689760b747287c6ca03cfe0729da63e0717f4 drm/amdgpu/display:
drm/amdgpu/display: Keep malloc ref to MST port  -- A change to the way
displayport connectors are handled, looked promising.

db64a2f43c1bc22c5ff2d22606000b8c3587d0ec drm/amd/powerplay: fix possible hang
with 3+ 4K monitors


I also looked at that last one in detail as it seems very close to this bug.
Nothing in the code looks for 3+ monitors or even 4k. It only actually looks
for > 1 monitor.

Although it's based on disable_mclk_switching, I also tried forcing
disable_fclk_switching to true and false; neither had any effect. The result is
that mclk would be calculated based on screens but fclk would be forced on/off.
It didn't help, but I can't help thinking that this commit is a little too
close to this issue to be irrelevant.
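
As a rough model of the behaviour described above (a check for more than one
display, plus the forced true/false experiment), here is a small standalone
sketch. The enum and function are invented for illustration, and the real
driver condition may include further terms that this model leaves out.

```c
#include <stdbool.h>

/* Hypothetical override used to model the "force to true/false" experiment. */
enum force_mode { FORCE_NONE, FORCE_TRUE, FORCE_FALSE };

/*
 * Model of the decision as described in this comment: switching is disabled
 * as soon as more than one display is active, unless an override is set.
 * Nothing here looks at 3+ monitors or at 4K, which is the point being made.
 */
bool disable_clk_switching(unsigned int num_display, enum force_mode force)
{
    if (force == FORCE_TRUE)
        return true;
    if (force == FORCE_FALSE)
        return false;
    return num_display > 1;
}
```

In this model two and three displays are treated identically, which matches
the observation that the check is really "> 1 monitor".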


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (102 preceding siblings ...)
  2019-08-14 15:44 ` bugzilla-daemon
@ 2019-08-14 17:30 ` bugzilla-daemon
  2019-08-16  5:58 ` bugzilla-daemon
                   ` (73 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-14 17:30 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 7251 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #100 from Tom B <tom@r.je> ---
I've been trying to work backwards to find the place where screens get
initialised and eventually call vega20_pre_display_configuration_changed_task.

vega20_pre_display_configuration_changed_task is exported as
pp_hwmgr_func::display_config_changed

Which is called from hardwaremanager.c:phm_pre_display_configuration_changed

phm_pre_display_configuration_changed is called from
hwmgr.c:hwmgr_handle_task:

        switch (task_id) {
        case AMD_PP_TASK_DISPLAY_CONFIG_CHANGE:
                ret = phm_pre_display_configuration_changed(hwmgr);


hwmgr_handle_task is reached through pp_dpm_dispatch_tasks, which is exported
as amd_pm_funcs::dispatch_tasks and called from amdgpu_dpm_dispatch_task, which
is called in amdgpu_pm.c:


void amdgpu_pm_compute_clocks(struct amdgpu_device *adev)
{
        int i = 0;

        if (!adev->pm.dpm_enabled)
                return;

        if (adev->mode_info.num_crtc)
                amdgpu_display_bandwidth_update(adev);

        for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
                struct amdgpu_ring *ring = adev->rings[i];
                if (ring && ring->sched.ready)
                        amdgpu_fence_wait_empty(ring);
        }

        if (is_support_sw_smu(adev)) {
                struct smu_context *smu = &adev->smu;
                struct smu_dpm_context *smu_dpm = &adev->smu.smu_dpm;
                mutex_lock(&(smu->mutex));
                smu_handle_task(&adev->smu,
                                smu_dpm->dpm_level,
                                AMD_PP_TASK_DISPLAY_CONFIG_CHANGE);
                mutex_unlock(&(smu->mutex));
        } else {
                if (adev->powerplay.pp_funcs->dispatch_tasks) {
                        if (!amdgpu_device_has_dc_support(adev)) {
                                mutex_lock(&adev->pm.mutex);
                                amdgpu_dpm_get_active_displays(adev);
                                adev->pm.pm_display_cfg.num_display = adev->pm.dpm.new_active_crtc_count;
                                adev->pm.pm_display_cfg.vrefresh = amdgpu_dpm_get_vrefresh(adev);
                                adev->pm.pm_display_cfg.min_vblank_time = amdgpu_dpm_get_vblank_time(adev);
                                /* we have issues with mclk switching with refresh rates over 120 hz on the non-DC code. */
                                if (adev->pm.pm_display_cfg.vrefresh > 120)
                                        adev->pm.pm_display_cfg.min_vblank_time = 0;
                                if (adev->powerplay.pp_funcs->display_configuration_change)
                                        adev->powerplay.pp_funcs->display_configuration_change(
                                                        adev->powerplay.pp_handle,
                                                        &adev->pm.pm_display_cfg);
                                mutex_unlock(&adev->pm.mutex);
                        }
                        amdgpu_dpm_dispatch_task(adev, AMD_PP_TASK_DISPLAY_CONFIG_CHANGE, NULL);
                } else {
                        mutex_lock(&adev->pm.mutex);
                        amdgpu_dpm_get_active_displays(adev);
                        amdgpu_dpm_change_power_state_locked(adev);
                        mutex_unlock(&adev->pm.mutex);
                }
        }
}


This is the only place I can see AMD_PP_TASK_DISPLAY_CONFIG_CHANGE being called
from, which eventually is where vega20_pre_display_configuration_changed_task
gets called.

Presumably the code:

        for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
                struct amdgpu_ring *ring = adev->rings[i];
                if (ring && ring->sched.ready)
                        amdgpu_fence_wait_empty(ring);
        }



is what generates 


[    3.683718] amdgpu 0000:44:00.0: ring gfx uses VM inv eng 0 on hub 0
[    3.683719] amdgpu 0000:44:00.0: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[    3.683720] amdgpu 0000:44:00.0: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[    3.683720] amdgpu 0000:44:00.0: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[    3.683721] amdgpu 0000:44:00.0: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[    3.683722] amdgpu 0000:44:00.0: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[    3.683722] amdgpu 0000:44:00.0: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[    3.683723] amdgpu 0000:44:00.0: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[    3.683724] amdgpu 0000:44:00.0: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[    3.683724] amdgpu 0000:44:00.0: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
[    3.683725] amdgpu 0000:44:00.0: ring sdma0 uses VM inv eng 0 on hub 1
[    3.683726] amdgpu 0000:44:00.0: ring page0 uses VM inv eng 1 on hub 1
[    3.683726] amdgpu 0000:44:00.0: ring sdma1 uses VM inv eng 4 on hub 1
[    3.683727] amdgpu 0000:44:00.0: ring page1 uses VM inv eng 5 on hub 1
[    3.683728] amdgpu 0000:44:00.0: ring uvd_0 uses VM inv eng 6 on hub 1
[    3.683728] amdgpu 0000:44:00.0: ring uvd_enc_0.0 uses VM inv eng 7 on hub 1
[    3.683729] amdgpu 0000:44:00.0: ring uvd_enc_0.1 uses VM inv eng 8 on hub 1
[    3.683730] amdgpu 0000:44:00.0: ring uvd_1 uses VM inv eng 9 on hub 1
[    3.683730] amdgpu 0000:44:00.0: ring uvd_enc_1.0 uses VM inv eng 10 on hub 1
[    3.683731] amdgpu 0000:44:00.0: ring uvd_enc_1.1 uses VM inv eng 11 on hub 1
[    3.683731] amdgpu 0000:44:00.0: ring vce0 uses VM inv eng 12 on hub 1
[    3.683732] amdgpu 0000:44:00.0: ring vce1 uses VM inv eng 13 on hub 1
[    3.683733] amdgpu 0000:44:00.0: ring vce2 uses VM inv eng 14 on hub 1

in dmesg. I'll add a pr_err() to verify this. If so, it means our issue is
introduced somewhere between that for loop and amdgpu_dpm_dispatch_task in this
function.


amdgpu_pm_compute_clocks is called from
amdgpu_dm_pp_smu.c:dm_pp_apply_display_requirements which is called in
dce_clk_mgr.c in two places: dce_pplib_apply_display_requirements and
dce11_pplib_apply_display_requirements. I don't know which is used for the VII,
I'll add some logging to verify.

But here's something that may be relevant to this bug. In
dce11_pplib_apply_display_requirements there's a check for the number of
displays:


        /* TODO: is this still applicable?*/
        if (pp_display_cfg->display_count == 1) {
                const struct dc_crtc_timing *timing =
                        &context->streams[0]->timing;

                pp_display_cfg->crtc_index =
                        pp_display_cfg->disp_configs[0].pipe_idx;
                pp_display_cfg->line_time_in_us = timing->h_total * 10000 /
                        timing->pix_clk_100hz;
        }


So there's something that is different when more than one display is connected.
That's as far as I got walking backwards through the code. I'll note that this
was also present in 5.0.1, but it could be that something is relying on
crtc_index or line_time_in_us, which wasn't checked previously, as these values
only appear to be set if there is a single display.
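
To show what that snippet computes, here is a standalone sketch of the
arithmetic. Since pix_clk_100hz is the pixel clock in units of 100 Hz, one
line takes h_total / (pix_clk_100hz * 100) seconds, which is
h_total * 10000 / pix_clk_100hz microseconds. The struct and helper names are
made up for this example, and the timings in the note below are the standard
1080p60 and 4K60 figures, not values read from the affected machines.

```c
#include <stdint.h>

/* Hypothetical stand-in holding only the fields discussed above. */
struct pp_cfg_mock {
    int display_count;
    uint32_t line_time_in_us; /* stays 0 unless exactly one display is active */
};

/* Line time in microseconds, using the same integer arithmetic as the snippet. */
uint32_t line_time_in_us(uint32_t h_total, uint32_t pix_clk_100hz)
{
    if (pix_clk_100hz == 0)
        return 0; /* avoid dividing by zero on an unset clock */
    return (uint32_t)((uint64_t)h_total * 10000u / pix_clk_100hz);
}

/* Mirror of the quoted check: the field is only filled for a single display. */
void fill_single_display_fields(struct pp_cfg_mock *cfg, uint32_t h_total,
                                uint32_t pix_clk_100hz)
{
    if (cfg->display_count == 1)
        cfg->line_time_in_us = line_time_in_us(h_total, pix_clk_100hz);
}
```

For 1080p60 (h_total 2200, 148.5 MHz, so pix_clk_100hz = 1485000) this gives
14, and for 4K60 (h_total 4400, 594 MHz) it gives 7. With display_count == 2
the field is left at whatever it held before, which is the behaviour
questioned above.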


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (103 preceding siblings ...)
  2019-08-14 17:30 ` bugzilla-daemon
@ 2019-08-16  5:58 ` bugzilla-daemon
  2019-08-16 10:10 ` bugzilla-daemon
                   ` (72 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-16  5:58 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1218 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #101 from ReddestDream <reddestdream@gmail.com> ---
Grasping at straws a bit here, but it occurred to me that maybe Linux kernel
testing on Radeon VII was done on an early VBIOS that didn't have full UEFI
support yet. We know that AMD had to issue a VBIOS update for Radeon VII to fix
UEFI support shortly after the launch. So maybe enabling the CSM/Legacy Support
in the BIOS, which does impact early GPU initialization, might have some effect
on the multimonitor problem? Something I plan to test, but I wanted to share
the idea in case someone else has a chance first.

>This might not mean anything, but it could be another clue that initialization is happening before the card is really ready.

Also, I considered that both of my monitors have audio out support. I wonder if
audio initialization might be the missing piece to the puzzle, the thing that
interrupts/changes the state of the card and prevents
smu_send_smc_msg_with_param from working where it did before. I know that in
the past with previous AMD cards, display audio has been buggy . . .


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (104 preceding siblings ...)
  2019-08-16  5:58 ` bugzilla-daemon
@ 2019-08-16 10:10 ` bugzilla-daemon
  2019-08-16 10:35 ` bugzilla-daemon
                   ` (71 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-16 10:10 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1105 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #102 from Tom B <tom@r.je> ---
> Grasping at straws a bit here, but it occurred to me that maybe Linux kernel testing on Radeon VII was done on an early VBIOS that didn't have full UEFI support yet. We know that AMD had to issue a VBIOS update for Radeon VII to fix UEFI support shortly after the launch. So maybe enabling the CSM/Legacy Support in the BIOS, which does impact early GPU initialization, might have some effect on the multimonitor problem? Something I plan to test, but I wanted to share the idea in case someone else has a chance first.

Unfortunately I had already tried that. I tried the following BIOS options:

CSM on/off
IOMMU on/off
PCIE speed 16x/4x (the only options my motherboard allowed for some reason)

Having said that, I didn't try booting using grub in BIOS mode as I didn't
want to change my partition table, so it's possible that although I had used
CSM, it was only legacy support and still booting in UEFI mode.


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (105 preceding siblings ...)
  2019-08-16 10:10 ` bugzilla-daemon
@ 2019-08-16 10:35 ` bugzilla-daemon
  2019-08-16 10:41 ` bugzilla-daemon
                   ` (70 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-16 10:35 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 740 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #103 from Peter Hercek <phercek@gmail.com> ---
I boot in BIOS mode and I'm still getting these errors, though they are rare in
my case with the "better" kernels (around once a week).

Just a note: there were tearing errors in the Windows drivers for the Radeon VII
too. One of the reasons for it was different refresh rates on different
monitors. They recommended setting all refresh rates to 60 Hz or a multiple of
it until it is fixed. In my case that is not completely possible (one monitor
supports 60 Hz, but the other two monitors support only 59.95 Hz), so I have a
slight difference in the frequencies.


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (106 preceding siblings ...)
  2019-08-16 10:35 ` bugzilla-daemon
@ 2019-08-16 10:41 ` bugzilla-daemon
  2019-08-16 13:10 ` bugzilla-daemon
                   ` (69 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-16 10:41 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 973 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #104 from Tom B <tom@r.je> ---
I did get very similar crashing when I was running HDMI + DP at different
refresh rates ( see https://bugs.freedesktop.org/show_bug.cgi?id=110510 ). I
switched to DP + DP because HDMI+DP wasn't stable, it could be related.

The tl;dr from that bug report, and this was on 5.0.9:

- HDMI alone at 60hz works but the screen flickers off every 3-5 minutes
- HDMI alone works at 59.9hz without any flickering
- HDMI 60hz + DP 60hz works, but the HDMI screen flickers off every 3-5 minutes
- HDMI 59.94hz + DP 60hz freezes the PC instantly.

Unfortunately my monitors don't support DisplayPort at 59.94hz, so I couldn't
test that combination, which I think would have worked.

Still, it does tell us that these could be related and the issue could be
syncing between the two displays.
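
To put a number on how far two slightly different refresh rates drift apart,
here is a small arithmetic sketch. Two displays running at f1 and f2 Hz slip
by one full frame relative to each other every 1 / |f1 - f2| seconds. The
function name is made up, and this only illustrates the drift itself; it says
nothing about whether the drift is what freezes the card.

```c
/*
 * Seconds until two free-running displays have drifted apart by one frame.
 * Returns a negative value when the rates are identical (no drift at all).
 */
double seconds_per_frame_slip(double hz_a, double hz_b)
{
    double diff = hz_a > hz_b ? hz_a - hz_b : hz_b - hz_a;

    if (diff == 0.0)
        return -1.0;
    return 1.0 / diff;
}
```

For 60 Hz against 59.94 Hz that is roughly one frame every 16.7 seconds, and
for 60 Hz against 59.95 Hz roughly one frame every 20 seconds.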


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (107 preceding siblings ...)
  2019-08-16 10:41 ` bugzilla-daemon
@ 2019-08-16 13:10 ` bugzilla-daemon
  2019-08-16 13:18 ` bugzilla-daemon
                   ` (68 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-16 13:10 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 792 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #105 from Tom B <tom@r.je> ---
> Also, I considered that both of my monitors have audio out support. I wonder if audio initialization might be the missing piece to the puzzle, the thing that interrupts/changes the state of the card and prevents smu_send_smc_msg_with_param from working where it did before. I know that in the past with previous AMD cards, display audio has been buggy . 

I just tried setting amdgpu.audio=0 and it didn't help. It doesn't rule
out audio entirely, though: the audio backend is probably still used as part of
the connection to the monitor, and I'd imagine the option just prevents the card
appearing as an output device.


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (108 preceding siblings ...)
  2019-08-16 13:10 ` bugzilla-daemon
@ 2019-08-16 13:18 ` bugzilla-daemon
  2019-08-16 14:17 ` bugzilla-daemon
                   ` (67 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-16 13:18 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 635 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #106 from Tom B <tom@r.je> ---
Booting with amdgpu.dpm=0 on 5.2.7 works.

Performance is poor and as expected I cannot get any information about power
states because /sys/kernel/debug/dri/0/amdgpu_pm_info doesn't exist. I'm
guessing it runs at minimum clocks as I get ~10-17fps in unigine-heaven instead
of ~60-100. 

It is a DPM issue of some kind so although my earlier tests showed that
hard_min_level was set correctly, it still could be an issue elsewhere in the
DPM table.


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (109 preceding siblings ...)
  2019-08-16 13:18 ` bugzilla-daemon
@ 2019-08-16 14:17 ` bugzilla-daemon
  2019-08-16 21:06 ` bugzilla-daemon
                   ` (66 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-16 14:17 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 481 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #107 from ReddestDream <reddestdream@gmail.com> ---
> Booting with amdgpu.dpm=0 on 5.2.7 works.

> It is a DPM issue of some kind so although my earlier tests showed that hard_min_level was set correctly, it still could be an issue elsewhere in the DPM table.

Great news! At least now we have a better place to investigate . . .


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (110 preceding siblings ...)
  2019-08-16 14:17 ` bugzilla-daemon
@ 2019-08-16 21:06 ` bugzilla-daemon
  2019-08-16 22:14 ` bugzilla-daemon
                   ` (65 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-16 21:06 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 588 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #108 from ReddestDream <reddestdream@gmail.com> ---
> Booting with amdgpu.dpm=0 on 5.2.7 works.

Tom B., did you try booting with amdgpu.dpm=1 or amdgpu.dpm=2 (default is
generally -1 for automatic)? Seems like one of those might enable the new
experimental SW SMU v11 feature on Vega20 . . .

https://dri.freedesktop.org/docs/drm/gpu/amdgpu.html

https://lists.freedesktop.org/archives/amd-gfx/2019-January/030788.html?print=anzwix


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (111 preceding siblings ...)
  2019-08-16 21:06 ` bugzilla-daemon
@ 2019-08-16 22:14 ` bugzilla-daemon
  2019-08-16 23:19 ` bugzilla-daemon
                   ` (64 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-16 22:14 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1887 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #109 from Tom B <tom@r.je> ---
Created attachment 145080
  --> https://bugs.freedesktop.org/attachment.cgi?id=145080&action=edit
dmesg with amdgpu.dpm=2

> Tom B., did you try booting with amdgpu.dpm=1 or amdgpu.dpm=2 (default is generally -1 for automatic)? Seems like one of those might enable the new experimental SW SMU v11 feature on Vega20 . . .

Now that is interesting. dpm=-1 is the same as the default, and the default is 1
(enabled), so dpm=1 is what we've been using all along. But dpm=2 and the patch
you linked to are interesting.

I tried it, it didn't help the crashing issue and I was stuck at 30w. As soon
as I started sddm the system froze. I've attached my dmesg from amdgpu.dpm=2
boot. It doesn't fix the issue but it does help answer a few questions I had:


1. The functions in vega20_ppt.c are used with this new patch so that answers
my question from earlier, that's what this file is for and why it contains
similar/identical functions.

2. It explains the difference I found in comment 97: This commit
https://github.com/torvalds/linux/commit/94ed6d0cfdb867be9bf05f03d682980bce5d0036
has the new else block for smu_display_configuration_change which we now know
is the software version of this function.


More importantly, though, knowing that enabling DPM causes the crash, this
tells us either:

A) The bug is present in both versions of the vega20 code: vega20_hwmgr.c and
vega20_ppt.c or..

B) The card reaches an invalid state before DPM is initialised and the card is
fine until it receives a DPM change.

Given that two different versions of the code produce the same result, my hunch
is that the problem is B. The card is not in a state where it's able to receive
power changes.
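
To summarise the parameter values as they behaved in this thread, here is a
small lookup sketch. The enum and function names are invented for the example,
and the mapping only restates what was observed above (-1 behaving like 1,
2 selecting the vega20_ppt.c path); it is not taken from the driver source.

```c
/* Hypothetical labels for the code paths discussed in this thread. */
enum dpm_path { DPM_DISABLED, DPM_HWMGR, DPM_SW_SMU, DPM_INVALID };

/* Map an amdgpu.dpm value to the path it selected in the tests above. */
enum dpm_path resolve_dpm_path(int dpm)
{
    switch (dpm) {
    case 0:
        return DPM_DISABLED;
    case -1: /* auto: behaved like 1 on this setup */
    case 1:
        return DPM_HWMGR;
    case 2:
        return DPM_SW_SMU;
    default:
        return DPM_INVALID;
    }
}

/* Source file that implements each path, as identified in this thread. */
const char *dpm_path_source_file(enum dpm_path path)
{
    switch (path) {
    case DPM_HWMGR:
        return "vega20_hwmgr.c";
    case DPM_SW_SMU:
        return "vega20_ppt.c";
    default:
        return "";
    }
}
```

Both DPM_HWMGR and DPM_SW_SMU ended in the same crash here, while
DPM_DISABLED did not.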


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (112 preceding siblings ...)
  2019-08-16 22:14 ` bugzilla-daemon
@ 2019-08-16 23:19 ` bugzilla-daemon
  2019-08-17  1:47 ` bugzilla-daemon
                   ` (63 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-16 23:19 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 2182 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #110 from ReddestDream <reddestdream@gmail.com> ---
> 1. The functions in vega20_ppt.c are used with this new patch so that answers my question from earlier, that's what this file is for and why it contains similar/identical functions.

I was hoping this was the case as the duplicated functions were confusing me
too. Glad we got this figured out! :)

> I tried it, it didn't help the crashing issue and I was stuck at 30w. As soon as I started sddm the system froze. I've attached my dmesg from amdgpu.dpm=2 boot. It doesn't fix the issue but it does help answer a few questions I had:

This is disappointing tho. I was hoping that setting amdgpu.dpm=2 would use the
more "actively developed" path and that would fix the issue. :/

> Given that two different versions of the code produce the same result, my hunch is that the problem is B. The card is not in a state where it's able to receive power changes.

I tend to agree, but it's still not clear why or how the card ends up in a bad
state when commands to it via smu_send_smc_msg_with_param seem to just suddenly
stop working. And given the amount of same/similar functions in vega20_hwmgr.c
and vega20_ppt.c it's hard to rule out A entirely.

Since amdgpu.dpm=0 resolves the issue (albeit at the cost of being stuck at
minimum clocks inherited from the VBIOS/GOP/UEFI/firmware), it seems that the
card is starting out in a reasonable state and then being thrown into a bad
state later by bad driver code. And that code is part of the DPM (Dynamic Power
Management) system. We are pretty confident that dpm_state.hard_min_level is
stable the whole time, so that's probably not what's throwing the card into a
bad state. But perhaps another value in the DPM table is . . . 

It doesn't make intuitive sense that the soft min/max values would be
problematic, since they are presumably "more flexible," but it's possible that
they get calculated out of spec or something. Logging them should be possible
in the same way dpm_state.hard_min_level was logged.


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (113 preceding siblings ...)
  2019-08-16 23:19 ` bugzilla-daemon
@ 2019-08-17  1:47 ` bugzilla-daemon
  2019-08-17  2:15 ` bugzilla-daemon
                   ` (62 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-17  1:47 UTC (permalink / raw)
  To: dri-devel


--- Comment #111 from ReddestDream <reddestdream@gmail.com> ---
A few other ideas to ponder:

1. Looking into DPM, I found this commit for 5.1-rc1 that looks interesting:

https://github.com/torvalds/linux/commit/7ca881a8651bdeffd99ba8e0010160f9bf60673e

Looks like it exposes a "ppfeatures" interface on Vega 10 and later GPUs,
including some code for Vega 20.

2. I also found two interesting commits that pertain to "doorbell" register
initialization on Vega 20. Also from 5.1-rc1. Might be related to setting up
the GPU ASICs. I must admit I'm not exactly sure what these do . . .

https://github.com/torvalds/linux/commit/fd4855409f6ebe015406cd2b2ffa4fee4cd1f4a7

https://github.com/torvalds/linux/commit/828845b7c86c5338f6ca02aaaaf4b525718f31b2


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (114 preceding siblings ...)
  2019-08-17  1:47 ` bugzilla-daemon
@ 2019-08-17  2:15 ` bugzilla-daemon
  2019-08-17  2:37 ` bugzilla-daemon
                   ` (61 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-17  2:15 UTC (permalink / raw)
  To: dri-devel


--- Comment #112 from ReddestDream <reddestdream@gmail.com> ---
More ideas:

3. Looking through the crash in sehellion's comment 45:

gfx_v9_0_ring_test_ring+0x19e/0x230 [amdgpu]
amdgpu_ring_test_helper+0x1e/0x90 [amdgpu]
gfx_v9_0_hw_fini+0x299/0x690 [amdgpu]
amdgpu_device_ip_suspend_phase2+0x6c/0xa0 [amdgpu]
amdgpu_device_ip_suspend+0x44/0x80 [amdgpu]
amdgpu_device_pre_asic_reset+0x1ef/0x204 [amdgpu]
amdgpu_device_gpu_recover+0x7b/0x7a3 [amdgpu]
amdgpu_job_timedout+0xfc/0x120 [amdgpu]

We see gfx_v9_0_ring_test_ring and gfx_v9_0_hw_fini which both come from:

https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c

There's a 5.1-rc1 commit in this file pertaining to a "wave ID mismatch" that
could cause deadlocks.

https://github.com/torvalds/linux/commit/41cca166cc57e75e94d888595a428d23a3bf4e36

Along with updated "golden values" for Vega in 5.1-rc1:

https://github.com/torvalds/linux/commit/919a94d8101ebc29868940b580fe9e9811b7dc86

https://github.com/torvalds/linux/commit/f7b1844bacecca96dd8d813675e4d8adec02cd66


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (115 preceding siblings ...)
  2019-08-17  2:15 ` bugzilla-daemon
@ 2019-08-17  2:37 ` bugzilla-daemon
  2019-08-17  3:16 ` bugzilla-daemon
                   ` (60 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-17  2:37 UTC (permalink / raw)
  To: dri-devel


--- Comment #113 from ReddestDream <reddestdream@gmail.com> ---
4. 

> Given that two different versions of the code produce the same result, my hunch is that the problem is B. The card is not in a state where it's able to receive power changes.

Something to consider: In pretty much all the dmesg logs we see, amdgpu
attempts to reset the GPU, sometimes successfully, and yet it still can't
properly message the GPU afterward and we see the same sequence of failures
starting with "amdgpu: [powerplay] Failed to send message 0x28, response 0x0
amdgpu: [powerplay] [SetUclkToHightestDpmLevel] Set hard min uclk failed!"

Eventually we start to see: "[drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to
initialize parser -125!"

This comes from:

https://github.com/torvalds/linux/commits/master/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c

I'm not sure what the -125 error code indicates. My guess is -ECANCELED
(Operation Canceled), since ECANCELED is defined as 125 in:

https://github.com/torvalds/linux/blob/master/include/uapi/asm-generic/errno.h


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (116 preceding siblings ...)
  2019-08-17  2:37 ` bugzilla-daemon
@ 2019-08-17  3:16 ` bugzilla-daemon
  2019-08-17 13:37 ` bugzilla-daemon
                   ` (59 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-17  3:16 UTC (permalink / raw)
  To: dri-devel


--- Comment #114 from ReddestDream <reddestdream@gmail.com> ---
5. Tom B., it is probably worth getting a full dmesg with your two monitors
plugged in on a relatively new 5.2.x kernel, using at least: amdgpu.dc_log=1
drm.debug=0x1e log_buf_len=2M

And anything else you might think of. Just to try to get more debug info. Thx!


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (117 preceding siblings ...)
  2019-08-17  3:16 ` bugzilla-daemon
@ 2019-08-17 13:37 ` bugzilla-daemon
  2019-08-25 20:46 ` bugzilla-daemon
                   ` (58 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-17 13:37 UTC (permalink / raw)
  To: dri-devel


--- Comment #115 from Tom B <tom@r.je> ---
I should have noted it earlier, but I had already tried reverting both "golden
values" commits. I've no idea what they do, but reverting them didn't fix this
crash.

One thing that would be insightful would be logging every call to
smum_send_msg_to_smc_with_parameter and printing out message/parameter:

int smum_send_msg_to_smc_with_parameter(struct pp_hwmgr *hwmgr,
                                        uint16_t msg, uint32_t parameter)
{

This would cause a very busy log but we could see the last successful message
that was sent and with the same log in 5.0.13 see if there are any obvious
differences. It might be that the previous message causes the invalid state so
knowing what that is could lead us towards the solution.

I don't think I have time to try it today but if anyone is recompiling the code
adding

pr_err("msg: %d / parameter: %d\n", msg, parameter); 

to this function in smumgr.c would be a useful addition.

Also, if anyone wants to try recompiling, here's a quick guide for Arch:

1. Get the kernel sources using asp as described here:
https://wiki.archlinux.org/index.php/Kernel/Arch_Build_System then navigate to
the created linux/repos/core-x86_64 directory.

2. You will need to run makepkg -s once to get it to download the sources.

3. You can set the kernel version in PKGBUILD: e.g. _srcver=5.2.7-arch1 or
_srcver=5.0.13-arch1

4. If you want to revert one or more commits, put the reverts in the prepare()
block before local src:

  echo "$_kernelname" > localversion.20-pkgname

  git revert db64a2f43c1bc22c5ff2d22606000b8c3587d0ec --no-edit
  git revert f5e79735cab448981e245a41ee6cbebf0e334f61 --no-edit

  local src

It will open your editor, if you don't want to use vi set:


5. For making changes to the code you need to make a patch. Open the
src/archlinux-linux directory. The files you're interested in are in
drivers/gpu/drm/amd/powerplay, most likely hwmgr/vega20_hwmgr.c. Make your
changes to the code. You can't just re-run makepkg, as it checks out the
original version of the code. After making changes, navigate to the
archlinux-linux directory and run git diff > ../../vii.patch

6. Add your patch to PKGBUILD source: 

source=(
  "$_srcname::git+https://git.archlinux.org/linux.git?signed#tag=v$_srcver"
  config         # the main kernel config file
  60-linux.hook  # pacman hook for depmod
  90-linux.hook  # pacman hook for initramfs regeneration
  linux.preset   # standard config files for mkinitcpio ramdisk
  vii.patch
)

7. I've been cheating with makepkg and getting it to skip hash checks as
otherwise you have to generate the sha256sums for each patch you create. This
is an extra step that only slows down testing. To compile/install run makepkg
-si --skipinteg

Because of the way makepkg works, it keeps the compiled code in the src
directory. That means that although the first compile will take a few minutes,
subsequent compiles will be a lot faster as it'll probably only be recompiling
vega20_hwmgr.c


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (118 preceding siblings ...)
  2019-08-17 13:37 ` bugzilla-daemon
@ 2019-08-25 20:46 ` bugzilla-daemon
  2019-08-25 20:47 ` bugzilla-daemon
                   ` (57 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-25 20:46 UTC (permalink / raw)
  To: dri-devel


--- Comment #116 from ReddestDream <reddestdream@gmail.com> ---
Created attachment 145153
  --> https://bugs.freedesktop.org/attachment.cgi?id=145153&action=edit
dmesgAMD2Monitors

I've been doing a few tests. I looked into and compiled 5.3-rc5 along with
these patches, but nothing seemed to resolve our multimonitor issue. :/

https://phoronix.com/scan.php?page=news_item&px=AMDGPU-Multi-Monitor-vRAM-Clock

I've also gotten some dmesg output with 5.2.9 with amdgpu.dc_log=1
drm.debug=0x1e log_buf_len=2M. Turns out that amdgpu.dc_log=1 does nothing on
this kernel, but I didn't know this when I ran the tests. The interesting added
data appears to be coming from drm.debug=0x1e.

I have two (physically) identical LG 24UD58-B 4K60 monitors connected via DP.
One test was done with both monitors connected to Radeon VII, and the other was
done using my stable Intel+Radeon VII setup where one monitor is connected to
Radeon VII and the other is connected to the Intel iGPU (HD 630, also via DP at
4K60).

These dmesg dumps were taken with all DMs/DEs/Graphics disabled in order to
limit interference. The system was booted to a text commandline at native
resolution.

Since 5.3 isn't changing anything, I plan to do a recompile of 5.2.9 (or 5.2.10
if it's out for Arch) with the smum_send_msg_to_smc_with_parameter patch
suggested by Tom B.


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (119 preceding siblings ...)
  2019-08-25 20:46 ` bugzilla-daemon
@ 2019-08-25 20:47 ` bugzilla-daemon
  2019-08-25 23:01 ` bugzilla-daemon
                   ` (56 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-25 20:47 UTC (permalink / raw)
  To: dri-devel


--- Comment #117 from ReddestDream <reddestdream@gmail.com> ---
Created attachment 145154
  --> https://bugs.freedesktop.org/attachment.cgi?id=145154&action=edit
AMDInteliGPUBoot

Also find my stable Intel iGPU + AMD Graphics config dmesg here.


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (120 preceding siblings ...)
  2019-08-25 20:47 ` bugzilla-daemon
@ 2019-08-25 23:01 ` bugzilla-daemon
  2019-08-26  3:20 ` bugzilla-daemon
                   ` (55 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-25 23:01 UTC (permalink / raw)
  To: dri-devel


--- Comment #118 from ReddestDream <reddestdream@gmail.com> ---
So, this is a crazy idea, but ironically I think it might be getting closer to
the truth.

Tom B. attempted reverting ad51c46eec739c18be24178a30b47801b10e0357, which was
known to cause some issue with an RX 580. He found that doing so fixed the
multimonitor crash but locked the card to the lowest possible memory speed,
which really isn't acceptable.

Perhaps our issue is connected to insufficient or improperly calculated
PCIe bandwidth/speed. Speed mismatches can and will cause messages to not go
through to the peripheral. It's also well-known that Radeon VII was originally
a PCIe 4.0 card that AMD locked down to 3.0 speeds . . .

What if when using multiple monitors and/or higher clock speeds Radeon VII uses
more bandwidth than Linux expects, causing the loss of communication?

Something else I plan to investigate.


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (121 preceding siblings ...)
  2019-08-25 23:01 ` bugzilla-daemon
@ 2019-08-26  3:20 ` bugzilla-daemon
  2019-08-26  3:21 ` bugzilla-daemon
                   ` (54 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-26  3:20 UTC (permalink / raw)
  To: dri-devel


--- Comment #119 from ReddestDream <reddestdream@gmail.com> ---
Created attachment 145158
  --> https://bugs.freedesktop.org/attachment.cgi?id=145158&action=edit
DebugAMD2Monitors

>I don't think I have time to try it today but if anyone is recompiling the code adding
>pr_err("msg: %d / parameter: %d\n", msg, parameter); 
>to this function in smumgr.c would be a useful addition.


So, I've done just this. I also added a speed/width check to
amdgpu_device_get_min_pci_speed_width in amdgpu_device.c to check the values of
cur_speed and cur_width.

I ran two checks with 5.2.9, one with two monitors on Radeon VII and another
with my stable setup of one monitor each on Radeon VII and the Intel iGPU.

Please find them attached.

Thanks so much for all your help!


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (122 preceding siblings ...)
  2019-08-26  3:20 ` bugzilla-daemon
@ 2019-08-26  3:21 ` bugzilla-daemon
  2019-08-26  3:47 ` bugzilla-daemon
                   ` (53 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-26  3:21 UTC (permalink / raw)
  To: dri-devel


--- Comment #120 from ReddestDream <reddestdream@gmail.com> ---
Created attachment 145159
  --> https://bugs.freedesktop.org/attachment.cgi?id=145159&action=edit
DebugAMDiGPU

Also here is the AMD + iGPU one.


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (123 preceding siblings ...)
  2019-08-26  3:21 ` bugzilla-daemon
@ 2019-08-26  3:47 ` bugzilla-daemon
  2019-08-27 21:56 ` bugzilla-daemon
                   ` (52 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-26  3:47 UTC (permalink / raw)
  To: dri-devel


--- Comment #121 from ReddestDream <reddestdream@gmail.com> ---
Some observations:

1. Nothing at all seems to be up with cur_speed and cur_width. They get set
several times in a row in both runs, but the values are all the same in both.

2. I can't really see anything up with msg/parameter either. When I compare
them to each other nothing seems particularly wacky. And we also have an
instance in my AMD+iGPU run where we see msg/parameter after "[drm] Initialized
amdgpu", so the theory that all messages have to be sent before Initialization
is complete must be wrong.

Now the real question is if we can decode what these msg/parameter values mean.
But it looks more likely to me that vega20_hwmgr.c and vega20_ppt.c are just
bugged somewhere (probably in the same way since they seem to be alternate
versions of each other) and that the rest of the amdgpu code is (relatively)
fine.

I'm thinking we'll have to go through and knock out/debug pretty much
everything in those files until we figure out where the breakage is. That's
about 3000-4000 lines of code in each of those two files tho. So any thoughts
anyone has about where we should start would be helpful. My focus will probably
be on UCLK (since it seems to break first), SCLK (since it gets set to 0 MHz
when there's multiple displays), DCEFCLK, and basically anything else that
smells like it might control the memory clock and/or be affected by multiple
monitors.

Thanks!


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (124 preceding siblings ...)
  2019-08-26  3:47 ` bugzilla-daemon
@ 2019-08-27 21:56 ` bugzilla-daemon
  2019-08-31  0:11 ` bugzilla-daemon
                   ` (51 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-27 21:56 UTC (permalink / raw)
  To: dri-devel


--- Comment #122 from ReddestDream <reddestdream@gmail.com> ---
Tested 5.3-rc6. Still has the same issues. Only it's maybe actually worse
because I lose display completely when I use amdgpu.dpm=2 w/Radeon VII
multimonitor on 5.3-rc6, whereas on 5.2.9 I just got same/similar errors to
default.

I'm working on a kernel fork of 5.3-rc6 where I'm reverting various things and
adding things in from Vega 10/12 and Navi to see if it helps. Haven't compiled
and tested it yet, but since I know 5.3-rc6 itself compiles, boots, and
demonstrates the issue, I guess it's a good base until 5.3 releases.

https://github.com/ReddestDream/linux

Any ideas anyone has are appreciated.

For now I actually find that amdgpu.dpm=0 with both 4K monitors on Radeon VII
allows for much snappier generic desktop than my previous setup with AMD+iGPU.
It's amazing how well this card runs 4K displays w/o any proper memory clock
management at all. I'm sure the gaming performance would be pretty bad tho, but
I have Windows for that for now . . .


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (125 preceding siblings ...)
  2019-08-27 21:56 ` bugzilla-daemon
@ 2019-08-31  0:11 ` bugzilla-daemon
  2019-09-03 16:46 ` bugzilla-daemon
                   ` (50 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-08-31  0:11 UTC (permalink / raw)
  To: dri-devel


--- Comment #123 from ReddestDream <reddestdream@gmail.com> ---
A few interesting fixes that touch vega20_hwmgr.c have rolled in from
drm-fixes:

The first is likely the most interesting for our issues, as it touches
min/maxes (tho only the soft ones it seems). The other two are related to SMU
versions.

https://github.com/torvalds/linux/commit/83e09d5bddbee749fc83063890244397896a1971

https://github.com/torvalds/linux/commit/21649c0b6b7899f4fa3099c46d3d027f60b107ec

https://github.com/torvalds/linux/commit/23b7f6c41d4717b1638eca47e09d7e99fc7b9fd9

I haven't tested them out yet, but it does give me some hope that someone is
still looking at Vega 20/Radeon VII . . .


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (126 preceding siblings ...)
  2019-08-31  0:11 ` bugzilla-daemon
@ 2019-09-03 16:46 ` bugzilla-daemon
  2019-09-18  9:52 ` bugzilla-daemon
                   ` (49 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-09-03 16:46 UTC (permalink / raw)
  To: dri-devel


--- Comment #124 from ReddestDream <reddestdream@gmail.com> ---
Created attachment 145254
  --> https://bugs.freedesktop.org/attachment.cgi?id=145254&action=edit
Dmesg 5.3-rc7 w/ Two monitors

This issue is still not fixed on 5.3-rc7. I guess we will probably have to wait
until 5.4 (the next LTS) before more people take a look at this issue. :(


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (127 preceding siblings ...)
  2019-09-03 16:46 ` bugzilla-daemon
@ 2019-09-18  9:52 ` bugzilla-daemon
  2019-09-18 11:36 ` bugzilla-daemon
                   ` (48 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-09-18  9:52 UTC (permalink / raw)
  To: dri-devel


--- Comment #125 from Adrian Brown <aide.brown@googlemail.com> ---
I am also getting frequent crashes with a Radeon VII on Kubuntu 19.10 (kernel
5.0.0-29-generic). I see there is some discussion in this thread about it
possibly being related to multiple monitors. But I don't think that's the case.
I have a single monitor but it is old with only a dual link DVI connection. So
I am using DisplayPort on the GPU but connected to an active adapter to convert
DP to a dual link DVI connection (my monitor is a Dell 3007WFP running at
2560x1600).

I often get crashes soon after boot. They tend to happen in clusters so it
crashes a few times, then stays stable for a short time and then crashes again.
I don't get these crashes on the same system when dual booted into Windows 10
so the hardware itself seems good. 

One thing worth mentioning is that on Windows 10 I occasionally get a black
screen and the monitor goes off for a couple of seconds. It then comes back to
life. Apparently this is not uncommon and the suspicion in the Windows
community is that AMD drivers sometimes crash but Windows recovers (I never had
this with my Vega 64, only with the Radeon VII). It most likely is a completely
different issue of course, but thought it worth mentioning.

Still hoping for a fix at some point. Also happy to help test any fix.


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (128 preceding siblings ...)
  2019-09-18  9:52 ` bugzilla-daemon
@ 2019-09-18 11:36 ` bugzilla-daemon
  2019-09-20 19:12 ` bugzilla-daemon
                   ` (47 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-09-18 11:36 UTC (permalink / raw)
  To: dri-devel


--- Comment #126 from ReddestDream <reddestdream@gmail.com> ---
@Adrian Brown Your Linux issue is potentially related to the active adapter.
Have you tried w/o it?

On Windows, the flickering on/around login, at least for me, has been mostly
resolved by using the latest AMD driver + Windows 10 1903 and all the recent
updates. There was a Windows update about a month ago that resolved a lot of
flickering issues by fixing a bug in Windows's 10-bit color support.

Also, if you are using Ubuntu, it might be worth downgrading to 18.04.3 so that
you can use the Radeon Software for Linux Driver:

https://www.amd.com/en/support/graphics/amd-radeon-2nd-generation-vega/amd-radeon-2nd-generation-vega/amd-radeon-vii

Currently, I hear that using AMD's driver + a supported distro is the best way
to get stability out of Radeon VII. And it's something I will probably end up
trying myself if there's no resolution to the issues forthcoming with 5.4,
which will be the new LTS.


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (129 preceding siblings ...)
  2019-09-18 11:36 ` bugzilla-daemon
@ 2019-09-20 19:12 ` bugzilla-daemon
  2019-09-20 19:13 ` bugzilla-daemon
                   ` (46 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-09-20 19:12 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 466 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #127 from Alex Deucher <alexdeucher@gmail.com> ---
(In reply to Tom B from comment #15)
> Have been running 5.0 since release without issue but upgraded this morning
> and got crashes as described here within a few seconds of boot. 
>

Can you bisect between 5.0 and 5.1 and see what commit caused the regression?
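
For readers unfamiliar with bisecting: `git bisect` runs a binary search over
the commits between a known-good and a known-bad revision, so a range of N
commits needs roughly log2(N) build-and-boot cycles. The sketch below only
simulates that search; the commit names and the `is_bad` predicate are
hypothetical stand-ins for building and booting each kernel, and it assumes
every commit after the regression is also bad.

```python
from typing import Callable, Sequence, Tuple


def first_bad_commit(
    commits: Sequence[str], is_bad: Callable[[str], bool]
) -> Tuple[str, int]:
    """Return the first bad commit and the number of tests performed.

    Assumes commits are ordered oldest to newest, commits[0] is known good,
    commits[-1] is known bad, and badness never reverts once introduced.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    tests = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tests += 1
        if is_bad(commits[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return commits[hi], tests


# Hypothetical history: 1000 commits, regression introduced at index 637.
history = [f"commit-{i:04d}" for i in range(1000)]
culprit, tests = first_bad_commit(
    history, lambda name: int(name.split("-")[1]) >= 637
)
print(culprit, tests)
```

With a real kernel tree the same search is driven by `git bisect start`,
`git bisect good v5.0` and `git bisect bad v5.1`, marking each tested build
as good or bad.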


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (130 preceding siblings ...)
  2019-09-20 19:12 ` bugzilla-daemon
@ 2019-09-20 19:13 ` bugzilla-daemon
  2019-09-21 15:02 ` bugzilla-daemon
                   ` (45 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-09-20 19:13 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 435 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #128 from Alex Deucher <alexdeucher@gmail.com> ---
Do these patches help?
https://cgit.freedesktop.org/~agd5f/linux/commit/?h=drm-fixes&id=c46e5df4ac898108da66a880c4e18f69c74f6c1b
https://cgit.freedesktop.org/~agd5f/linux/commit/?h=drm-fixes&id=c02d6a161395dfc0c2fdabb9e976a229017288d8


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (131 preceding siblings ...)
  2019-09-20 19:13 ` bugzilla-daemon
@ 2019-09-21 15:02 ` bugzilla-daemon
  2019-09-21 15:12 ` bugzilla-daemon
                   ` (44 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-09-21 15:02 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 2134 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #129 from Tom B <tom@r.je> ---
Thank you Alex! That has fixed it! The card is now correctly setting its
voltages and clocks. I applied the patch to 5.3.1.

However, I've noticed a few very minor problems that are probably worth
reporting.

1. I still get this in dmesg:


[    6.307005] amdgpu: [powerplay] Failed to send message 0x28, response 0x0
[    6.307006] amdgpu: [powerplay] [SetHardMinFreq] Set hard min uclk failed!
[    9.225192] amdgpu 0000:44:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on sdma0 (-110).
[   10.238621] amdgpu 0000:44:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on page0 (-110).
[   10.532004] amdgpu: [powerplay] Failed to send message 0x26, response 0x0
[   10.532005] amdgpu: [powerplay] Failed to set soft min gfxclk !
[   10.532006] amdgpu: [powerplay] Failed to upload DPM Bootup Levels!


This doesn't really matter, though. Earlier in the thread we focused on this
error because `Set hard min uclk failed!` looked like the cause of the
problem; evidently it isn't.

2. This repeats indefinitely in dmesg:

[  332.575747] [drm] schedsdma0 is not ready, skipping
[  332.582657] [drm] schedsdma0 is not ready, skipping
[  332.582864] [drm] schedsdma0 is not ready, skipping
[  332.708848] [drm] schedsdma0 is not ready, skipping
[  332.715975] [drm] schedsdma0 is not ready, skipping
[  332.716229] [drm] schedsdma0 is not ready, skipping
[  332.756987] [drm] schedsdma0 is not ready, skipping
[  332.763970] [drm] schedsdma0 is not ready, skipping
[  332.764169] [drm] schedsdma0 is not ready, skipping


As you can see, this gets written to dmesg dozens of times a second. This
might be because the patches are intended for 5.4?

3. The lowest wattage now seems to be 33 W rather than 23 W, which means
increased idle power usage and temperatures. This isn't really a problem, but
I thought it was worth mentioning, and it is a fair tradeoff for stability.
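
To put a number on how often the `schedsdma0 is not ready` line from item 2
repeats, the dmesg timestamps can be bucketed by second. This is a minimal
sketch, assuming the default `[seconds.microseconds]` dmesg prefix; the
function name `messages_per_second` is made up for illustration, and the
sample is the excerpt quoted above.

```python
import re
from collections import Counter

# Matches the "[  332.575747] " uptime prefix that dmesg prints by default.
TIMESTAMP = re.compile(r"^\[\s*(\d+)\.(\d+)\]\s+(.*)$")


def messages_per_second(log_text: str, needle: str) -> Counter:
    """Count lines containing `needle`, bucketed by whole second of uptime."""
    counts: Counter = Counter()
    for line in log_text.splitlines():
        match = TIMESTAMP.match(line)
        if match and needle in match.group(3):
            counts[int(match.group(1))] += 1
    return counts


SAMPLE = """\
[  332.575747] [drm] schedsdma0 is not ready, skipping
[  332.582657] [drm] schedsdma0 is not ready, skipping
[  332.582864] [drm] schedsdma0 is not ready, skipping
[  332.708848] [drm] schedsdma0 is not ready, skipping
[  332.715975] [drm] schedsdma0 is not ready, skipping
[  332.716229] [drm] schedsdma0 is not ready, skipping
[  332.756987] [drm] schedsdma0 is not ready, skipping
[  332.763970] [drm] schedsdma0 is not ready, skipping
[  332.764169] [drm] schedsdma0 is not ready, skipping
"""

for second, count in sorted(messages_per_second(SAMPLE, "schedsdma0 is not ready").items()):
    print(f"uptime {second}s: {count} messages")
```

The nine sample lines all fall within a single second of uptime; a full log
would give one bucket per second.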


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (132 preceding siblings ...)
  2019-09-21 15:02 ` bugzilla-daemon
@ 2019-09-21 15:12 ` bugzilla-daemon
  2019-09-21 15:25 ` bugzilla-daemon
                   ` (43 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-09-21 15:12 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 681 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #130 from Anthony Rabbito <ted437@gmail.com> ---
(In reply to Alex Deucher from comment #128)
> Do these patches help?
> https://cgit.freedesktop.org/~agd5f/linux/commit/?h=drm-
> fixes&id=c46e5df4ac898108da66a880c4e18f69c74f6c1b
> https://cgit.freedesktop.org/~agd5f/linux/commit/?h=drm-
> fixes&id=c02d6a161395dfc0c2fdabb9e976a229017288d8

I will try to apply these patches in a few hours. Though I must say that in
5.3 things have been much better. Not perfect, and I haven't tried triple
monitor yet, but definitely an improvement.


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (133 preceding siblings ...)
  2019-09-21 15:12 ` bugzilla-daemon
@ 2019-09-21 15:25 ` bugzilla-daemon
  2019-09-21 15:38 ` bugzilla-daemon
                   ` (42 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-09-21 15:25 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 401 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #131 from Tom B <tom@r.je> ---
In addition to my previous comment: the indefinitely repeating
`[drm] schedsdma0 is not ready, skipping` message stops after a
suspend/resume. The machine suspends and resumes correctly, and once it has
resumed the messages no longer appear.


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (134 preceding siblings ...)
  2019-09-21 15:25 ` bugzilla-daemon
@ 2019-09-21 15:38 ` bugzilla-daemon
  2019-09-21 15:57 ` bugzilla-daemon
                   ` (41 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-09-21 15:38 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 568 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #132 from Anthony Rabbito <ted437@gmail.com> ---
Created attachment 145458
  --> https://bugs.freedesktop.org/attachment.cgi?id=145458&action=edit
linux-mainline5.3 dmesg without patches

Here's my current dmesg with two out of three monitors running without the
patches Alex provided. I'm currently compiling the kernel with his patches to
look at the differences and see if I can get my third monitor to boot up.


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (135 preceding siblings ...)
  2019-09-21 15:38 ` bugzilla-daemon
@ 2019-09-21 15:57 ` bugzilla-daemon
  2019-09-21 15:59 ` bugzilla-daemon
                   ` (40 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-09-21 15:57 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 413 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #133 from Anthony Rabbito <ted437@gmail.com> ---
Created attachment 145459
  --> https://bugs.freedesktop.org/attachment.cgi?id=145459&action=edit
dsmeg log with Alex's patches

Here's my dmesg with Alex's patches. Going to mess around and see what I can
find.


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (136 preceding siblings ...)
  2019-09-21 15:57 ` bugzilla-daemon
@ 2019-09-21 15:59 ` bugzilla-daemon
  2019-09-21 19:54 ` bugzilla-daemon
                   ` (39 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-09-21 15:59 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 264 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #134 from Anthony Rabbito <ted437@gmail.com> ---
Wow! All three of my monitors are working again at 2560x1440 @ 144Hz.


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (137 preceding siblings ...)
  2019-09-21 15:59 ` bugzilla-daemon
@ 2019-09-21 19:54 ` bugzilla-daemon
  2019-09-21 20:04 ` bugzilla-daemon
                   ` (38 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-09-21 19:54 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 369 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #135 from Adrian Brown <aide.brown@googlemail.com> ---
@reddestdream Thanks. I don't think the active adapter is the problem, as it
works perfectly with my Vega 64. However, I will try 18.04 and AMD's driver as
suggested.


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (138 preceding siblings ...)
  2019-09-21 19:54 ` bugzilla-daemon
@ 2019-09-21 20:04 ` bugzilla-daemon
  2019-09-22 21:36 ` bugzilla-daemon
                   ` (37 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-09-21 20:04 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1627 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #136 from tom91136@gmail.com ---
I've been following this thread for a while now, as I just got three 4K 60Hz
monitors connected to the three DP ports on my Radeon VII.
I'm getting the exact same errors discussed in this report, with matching
dmesg output.

I've applied the patches to Fedora 31's 5.3.0-3 kernel and everything now works
perfectly!

Just a few notes:

* Idle power draw before the patch was 22W in lm_sensors; now it reads 28W,
which makes sense as the memory is now properly clocked. This also loosely
matches @Tom B's results.

* I did not get the repeated `[drm] schedsdma0 is not ready, skipping` in
dmesg. However, it is still possible to trigger a freeze by toggling DPMS:

    xset dpms force off

Resulting in:

[  155.431068] amdgpu: [powerplay] Failed to send message 0x28, response 0x0
[  155.431070] amdgpu: [powerplay] [SetHardMinFreq] Set hard min uclk failed!
[  161.334003] amdgpu: [powerplay] Failed to send message 0x26, response 0x0
[  161.334004] amdgpu: [powerplay] Failed to set soft min gfxclk !
[  161.334005] amdgpu: [powerplay] Failed to upload DPM Bootup Levels!
[  164.622060] amdgpu: [powerplay] Failed to send message 0x28, response 0x0
[  164.622062] amdgpu: [powerplay] [SetHardMinFreq] Set hard min uclk failed!


Previously, without the patch, the machine would hang. With the patch, the
displays freeze for a few seconds and then power off. Mouse movement correctly
turns all screens back on and everything returns to normal.
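
To compare runs before and after a DPMS toggle, the powerplay failures in the
log can be tallied by message ID. This is only a sketch built on the log
format quoted above: `failed_messages` is a hypothetical helper, and nothing
is assumed here about what IDs 0x28 and 0x26 mean.

```python
import re
from collections import Counter

# Pattern taken from the "Failed to send message 0x28, response 0x0" lines.
FAILED_MESSAGE = re.compile(
    r"Failed to send message (0x[0-9a-fA-F]+), response (0x[0-9a-fA-F]+)"
)


def failed_messages(log_text: str) -> Counter:
    """Count (message id, response) pairs reported in the log text."""
    return Counter(FAILED_MESSAGE.findall(log_text))


SAMPLE = """\
[  155.431068] amdgpu: [powerplay] Failed to send message 0x28, response 0x0
[  155.431070] amdgpu: [powerplay] [SetHardMinFreq] Set hard min uclk failed!
[  161.334003] amdgpu: [powerplay] Failed to send message 0x26, response 0x0
[  161.334004] amdgpu: [powerplay] Failed to set soft min gfxclk !
[  161.334005] amdgpu: [powerplay] Failed to upload DPM Bootup Levels!
[  164.622060] amdgpu: [powerplay] Failed to send message 0x28, response 0x0
[  164.622062] amdgpu: [powerplay] [SetHardMinFreq] Set hard min uclk failed!
"""

for (message_id, response), count in failed_messages(SAMPLE).most_common():
    print(f"message {message_id} (response {response}): {count} failure(s)")
```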


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (139 preceding siblings ...)
  2019-09-21 20:04 ` bugzilla-daemon
@ 2019-09-22 21:36 ` bugzilla-daemon
  2019-09-22 21:38 ` bugzilla-daemon
                   ` (36 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-09-22 21:36 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1209 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #137 from sehellion@gmail.com ---
(In reply to Alex Deucher from comment #128)
> Do these patches help?
> https://cgit.freedesktop.org/~agd5f/linux/commit/?h=drm-
> fixes&id=c46e5df4ac898108da66a880c4e18f69c74f6c1b
> https://cgit.freedesktop.org/~agd5f/linux/commit/?h=drm-
> fixes&id=c02d6a161395dfc0c2fdabb9e976a229017288d8

Yes, these patches fix the problem. 

amdgpu: [powerplay] Failed to send message 0x28, response 0x0
amdgpu: [powerplay] [SetHardMinFreq] Set hard min uclk failed!
amdgpu 0000:03:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on sdma0 (-110).
amdgpu 0000:03:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on page0 (-110).
amdgpu: [powerplay] Failed to send message 0x26, response 0x0
amdgpu: [powerplay] Failed to set soft min gfxclk !
amdgpu: [powerplay] Failed to upload DPM Bootup Levels!
amdgpu 0000:03:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on sdma1 (-110).
[drm:process_one_work] *ERROR* ib ring test failed (-110).

In general, the system is stable.


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (140 preceding siblings ...)
  2019-09-22 21:36 ` bugzilla-daemon
@ 2019-09-22 21:38 ` bugzilla-daemon
  2019-09-23  4:09 ` bugzilla-daemon
                   ` (35 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-09-22 21:38 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 325 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #138 from sehellion@gmail.com ---
Created attachment 145461
  --> https://bugs.freedesktop.org/attachment.cgi?id=145461&action=edit
5.3.1 with Alex's patches and dual monitors


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (141 preceding siblings ...)
  2019-09-22 21:38 ` bugzilla-daemon
@ 2019-09-23  4:09 ` bugzilla-daemon
  2019-09-23  4:11 ` bugzilla-daemon
                   ` (34 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-09-23  4:09 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 420 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #139 from sehellion@gmail.com ---
Today, when trying to wake up the monitors, the system crashed again. 

WARNING: CPU: 4 PID: 32 at drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_link_dp.c:1720 decide_link_settings+0xe0/0x2a0 [amdgpu]

The full dmesg log has been updated.


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (142 preceding siblings ...)
  2019-09-23  4:09 ` bugzilla-daemon
@ 2019-09-23  4:11 ` bugzilla-daemon
  2019-09-23 14:19 ` bugzilla-daemon
                   ` (33 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-09-23  4:11 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 603 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

sehellion@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #145461|0                           |1
        is obsolete|                            |

--- Comment #140 from sehellion@gmail.com ---
Created attachment 145463
  --> https://bugs.freedesktop.org/attachment.cgi?id=145463&action=edit
5.3.1 with Alex's patches and dual monitors, crash


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (143 preceding siblings ...)
  2019-09-23  4:11 ` bugzilla-daemon
@ 2019-09-23 14:19 ` bugzilla-daemon
  2019-09-23 14:20 ` bugzilla-daemon
                   ` (32 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-09-23 14:19 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 428 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

Alex Deucher <alexdeucher@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #145463|text/x-log                  |text/plain
          mime type|                            |


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (144 preceding siblings ...)
  2019-09-23 14:19 ` bugzilla-daemon
@ 2019-09-23 14:20 ` bugzilla-daemon
  2019-09-23 15:40 ` bugzilla-daemon
                   ` (31 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-09-23 14:20 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 377 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #141 from Alex Deucher <alexdeucher@gmail.com> ---
(In reply to sehellion from comment #140)
> Created attachment 145463 [details]
> 5.3.1 with Alex's patches and dual monitors, crash

That's not a crash, it's just a warning.


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (145 preceding siblings ...)
  2019-09-23 14:20 ` bugzilla-daemon
@ 2019-09-23 15:40 ` bugzilla-daemon
  2019-09-23 15:43 ` bugzilla-daemon
                   ` (30 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-09-23 15:40 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 717 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #142 from sehellion@gmail.com ---
(In reply to Alex Deucher from comment #141)
> (In reply to sehellion from comment #140)
> > Created attachment 145463 [details]
> > 5.3.1 with Alex's patches and dual monitors, crash
> 
> That's not a crash, it's just a warning.

But the system hangs afterwards. Today it happened twice. When I try to resume
work, the monitors turn on, then the secondary shows that there is no signal
and the primary shows a black screen. But perhaps this is not related to this
bug. I can connect via ssh and collect logs when this happens, if necessary.


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (146 preceding siblings ...)
  2019-09-23 15:40 ` bugzilla-daemon
@ 2019-09-23 15:43 ` bugzilla-daemon
  2019-09-23 16:04 ` bugzilla-daemon
                   ` (29 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-09-23 15:43 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 415 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #143 from Tom B <tom@r.je> ---
I'm not sure how KDE handles monitor power behind the scenes but I have an
uptime of 2 days now since applying the patches and with KDE I've let it turn
off the monitors at least 6 or 7 times and suspend/resume 3 times without
issue.


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (147 preceding siblings ...)
  2019-09-23 15:43 ` bugzilla-daemon
@ 2019-09-23 16:04 ` bugzilla-daemon
  2019-09-24  9:44 ` bugzilla-daemon
                   ` (28 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-09-23 16:04 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 382 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #144 from sehellion@gmail.com ---
I also think this is strange. Since yesterday, the monitors have turned off
and on many times without any problems. Most likely it's connected with
something else, but I don't know where to look.

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #1.2: Type: text/html, Size: 1131 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 179+ messages in thread

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (148 preceding siblings ...)
  2019-09-23 16:04 ` bugzilla-daemon
@ 2019-09-24  9:44 ` bugzilla-daemon
  2019-09-27 14:46 ` bugzilla-daemon
                   ` (27 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-09-24  9:44 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 275 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #145 from tom91136@gmail.com ---
@Alex any plans for the patches to be merged for 5.4 or even backported to 5.3
at some point?


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (149 preceding siblings ...)
  2019-09-24  9:44 ` bugzilla-daemon
@ 2019-09-27 14:46 ` bugzilla-daemon
  2019-09-27 15:12 ` bugzilla-daemon
                   ` (26 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-09-27 14:46 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 409 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #146 from Alex Deucher <alexdeucher@gmail.com> ---
(In reply to tom91136 from comment #145)
> @Alex any plans for the patches to be merged for 5.4 or even backported to
> 5.3 at some point?

Already merged to 5.4.  I'll take a look at older kernels as well.


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (150 preceding siblings ...)
  2019-09-27 14:46 ` bugzilla-daemon
@ 2019-09-27 15:12 ` bugzilla-daemon
  2019-09-27 15:13 ` bugzilla-daemon
                   ` (25 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-09-27 15:12 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 322 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #147 from ReddestDream <reddestdream@gmail.com> ---
> Already merged to 5.4.  I'll take a look at older kernels as well.

@Alex Deucher Thanks so much for all your help! :)


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (151 preceding siblings ...)
  2019-09-27 15:12 ` bugzilla-daemon
@ 2019-09-27 15:13 ` bugzilla-daemon
  2019-09-29 19:25 ` bugzilla-daemon
                   ` (24 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-09-27 15:13 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 609 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

Anthony Rabbito <ted437@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #148 from Anthony Rabbito <ted437@gmail.com> ---
Everyone's contribution is very much appreciated! I can finally go back to
using my workstation. Alex, thank you.


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (152 preceding siblings ...)
  2019-09-27 15:13 ` bugzilla-daemon
@ 2019-09-29 19:25 ` bugzilla-daemon
  2019-09-29 19:28 ` bugzilla-daemon
                   ` (23 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-09-29 19:25 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 865 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

linedot@xcpp.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |linedot@xcpp.org

--- Comment #149 from linedot@xcpp.org ---
Created attachment 145581
  --> https://bugs.freedesktop.org/attachment.cgi?id=145581&action=edit
5.3.1 plus Alex's patches, kde wayland crash, then kde xorg crash

This issue is not fixed for me with Alex's patches.

I use only a single monitor via DP. Running a patched 5.3.1 kernel. Attached is
a dmesg log: First a wayland KDE session crashes, I kill all user processes and
restart sddm and start a KDE Xorg session, which later also crashes.


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (153 preceding siblings ...)
  2019-09-29 19:25 ` bugzilla-daemon
@ 2019-09-29 19:28 ` bugzilla-daemon
  2019-09-29 19:30 ` bugzilla-daemon
                   ` (22 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-09-29 19:28 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 635 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

linedot@xcpp.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #145581|0                           |1
        is obsolete|                            |

--- Comment #150 from linedot@xcpp.org ---
Created attachment 145582
  --> https://bugs.freedesktop.org/attachment.cgi?id=145582&action=edit
5.3.1 patched, wayland crash

Sorry, the file got messed up, here is the wayland crash


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (154 preceding siblings ...)
  2019-09-29 19:28 ` bugzilla-daemon
@ 2019-09-29 19:30 ` bugzilla-daemon
  2019-09-30 20:20 ` bugzilla-daemon
                   ` (21 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-09-29 19:30 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 357 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #151 from linedot@xcpp.org ---
Created attachment 145583
  --> https://bugs.freedesktop.org/attachment.cgi?id=145583&action=edit
5.3.1 patched, xorg crash

And here is a dmesg of just an X session crashing


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (155 preceding siblings ...)
  2019-09-29 19:30 ` bugzilla-daemon
@ 2019-09-30 20:20 ` bugzilla-daemon
  2019-10-01 23:44 ` bugzilla-daemon
                   ` (20 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-09-30 20:20 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 532 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #152 from ReddestDream <reddestdream@gmail.com> ---
Kernel 5.4-rc1, the first kernel version that includes the Vega 20 patches
noted by Alex Deucher, is now out and in linux-mainline on Arch Linux AUR. :)

I plan to do some testing of this version over the next few days, and it might
be worth it for people who are still having issues to confirm on this version
as well. Thanks!


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (156 preceding siblings ...)
  2019-09-30 20:20 ` bugzilla-daemon
@ 2019-10-01 23:44 ` bugzilla-daemon
  2019-10-03  6:54 ` bugzilla-daemon
                   ` (19 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-10-01 23:44 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 306 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #153 from ReddestDream <reddestdream@gmail.com> ---
Just FYI, it appears that kernel 5.3.2 does not have the Vega 20 fix commits
that Alex Deucher mentioned.


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (157 preceding siblings ...)
  2019-10-01 23:44 ` bugzilla-daemon
@ 2019-10-03  6:54 ` bugzilla-daemon
  2019-10-04 12:43 ` bugzilla-daemon
                   ` (18 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-10-03  6:54 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 384 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #154 from linedot@xcpp.org ---
Created attachment 145623
  --> https://bugs.freedesktop.org/attachment.cgi?id=145623&action=edit
5.4.0-rc1 hangup

dmesg with 5.4.0-rc1.

System freezes and becomes unresponsive to input like before


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (158 preceding siblings ...)
  2019-10-03  6:54 ` bugzilla-daemon
@ 2019-10-04 12:43 ` bugzilla-daemon
  2019-10-06 14:16 ` bugzilla-daemon
                   ` (17 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-10-04 12:43 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1278 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #155 from ReddestDream <reddestdream@gmail.com> ---
So, I've done some tests with 5.4-rc1 and it seems like I'm getting similar
results to linedot@xcpp.org and sehellion@gmail.com. I'm using GNOME with
Wayland (which works fine with only 1 display). Sometimes it works for a while.
Sometimes I can't see the mouse cursor. Sometimes I get glitches all over the
screen containing pieces and parts of previous framebuffers. But, I mean, it's
better than 5.3 was, which was so bad I never could see anything and I would
get stuck on a black screen. At least on 5.4-rc1 I've been able to manually switch
to a virtual console and reboot rather than force a reboot with the power
button.

Still hoping for some fix for this, but it's become less important to me as
further improvements to GNOME and Mesa have made the Radeon VII + iGPU setup
I've been using run smoother. I've also discovered further issues on Windows
regarding the high memory clock when using multiple monitors with Radeon VII,
and it's been affecting performance there too. I'm considering just sticking
with 1 monitor only for this machine/card. lol


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (159 preceding siblings ...)
  2019-10-04 12:43 ` bugzilla-daemon
@ 2019-10-06 14:16 ` bugzilla-daemon
  2019-10-06 16:39 ` bugzilla-daemon
                   ` (16 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-10-06 14:16 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 741 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #156 from Tom B <tom@r.je> ---
This is strange because with a patched 5.3.1, I have perfect stability. An
uptime of over a week and no issues. Are you saying that the issue comes back
in 5.4? Hopefully not as Linux 5.4 + Mesa 19.3 looks to have a nice performance
bump on the VII. 

With the patches, do you see the card boosting correctly? Do the wattage,
voltage and clocks change under load? Asking an obvious question here, but is
the crash temperature related? Maybe the patches increase power and overheat.
If so, it might explain why I'm not affected as my card is water cooled.


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (160 preceding siblings ...)
  2019-10-06 14:16 ` bugzilla-daemon
@ 2019-10-06 16:39 ` bugzilla-daemon
  2019-10-06 17:06 ` bugzilla-daemon
                   ` (15 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-10-06 16:39 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 332 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #157 from ReddestDream <reddestdream@gmail.com> ---
@Tom B. Well, some good news. Kernel 5.3.4 should have the patches for Radeon
VII included now. I'll do some more tests on that ...
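
For anyone unsure whether their kernel is new enough, here is a rough sketch of the version check implied by the reports so far. The thresholds are assumptions taken only from this thread (5.3.2 reported without the fixes, 5.3.4 and 5.4-rc1 reported with them, 5.3.3 not reported either way), and the function names are made up for illustration. Distro kernels may carry backports, so a "no" here is only a hint.

```python
import re

# Assumption: these bounds come only from reports in this thread,
# not from the kernel changelogs.
FIRST_REPORTED_WITH_FIX = (5, 3, 4)
LAST_REPORTED_WITHOUT_FIX = (5, 3, 2)


def parse_release(release):
    """Extract (major, minor, patch) from a string such as '5.3.6-arch1-1'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    # A missing patch level (e.g. '5.4-rc1') is treated as 0.
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def has_vega20_fix(release):
    """Return 'yes', 'no' or 'unknown' according to the thread's reports."""
    version = parse_release(release)
    if version >= FIRST_REPORTED_WITH_FIX:
        return "yes"
    if version <= LAST_REPORTED_WITHOUT_FIX:
        return "no"
    return "unknown"
```

On a running system the release string can be taken from `uname -r` or Python's `platform.release()`.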


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (161 preceding siblings ...)
  2019-10-06 16:39 ` bugzilla-daemon
@ 2019-10-06 17:06 ` bugzilla-daemon
  2019-10-06 17:07 ` bugzilla-daemon
                   ` (14 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-10-06 17:06 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 840 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #158 from ReddestDream <reddestdream@gmail.com> ---
More good news. It seems that 5.3.4 does work for me and doesn't (at least
immediately since I'm typing this from there right now) fall apart into a
glitchy mess.

I'm still not really sure of the complete stability of things tho because we do
still see our old friend: "amdgpu: [powerplay] Failed to send message 0x28,
response 0x0, amdgpu: [powerplay] [SetHardMinFreq] Set hard min uclk failed!"
in dmesg. So, AFAICT, there's still something wrong. It's just more stable than
it was before.

But yeah. This is the first time since I've gotten this card that I've been
able to boot to a DE w/o crashing and w/o disabling dpm. :)


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (162 preceding siblings ...)
  2019-10-06 17:06 ` bugzilla-daemon
@ 2019-10-06 17:07 ` bugzilla-daemon
  2019-10-10 12:50 ` bugzilla-daemon
                   ` (13 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-10-06 17:07 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 387 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #159 from ReddestDream <reddestdream@gmail.com> ---
Oh. Also,

cat /sys/kernel/debug/dri/0/amdgpu_pm_info

Now seems to work on 5.3.4 with more than one monitor in. It doesn't report
nonsense values like 0 watts like it did before. :)
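
If anyone wants to check this automatically, a minimal sketch follows. It assumes the power lines in amdgpu_pm_info look like "<number> W (<label>)"; that layout is an assumption and may differ between kernel versions. The helper names are hypothetical, and reading debugfs normally needs root.

```python
import re

# Path taken from the command above.
PM_INFO_PATH = "/sys/kernel/debug/dri/0/amdgpu_pm_info"

# Assumed line shape: "<number> W (<label>)", e.g. "23.5 W (average GPU)".
POWER_LINE = re.compile(r"^\s*([0-9]+(?:\.[0-9]+)?)\s*W\s*\((.+?)\)\s*$")


def power_readings(text):
    """Map each power label found in the text to its value in watts."""
    readings = {}
    for line in text.splitlines():
        match = POWER_LINE.match(line)
        if match:
            readings[match.group(2)] = float(match.group(1))
    return readings


def looks_bogus(readings):
    """True when no power value was found or every value is zero."""
    return not readings or all(value == 0.0 for value in readings.values())


def read_power(path=PM_INFO_PATH):
    """Read the debugfs file and return its power readings."""
    with open(path, encoding="utf-8") as handle:
        return power_readings(handle.read())
```

Calling `looks_bogus(read_power())` as root would flag the "0 watts" case described above.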


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (163 preceding siblings ...)
  2019-10-06 17:07 ` bugzilla-daemon
@ 2019-10-10 12:50 ` bugzilla-daemon
  2019-10-12 23:34 ` bugzilla-daemon
                   ` (12 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-10-10 12:50 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 312 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #160 from ReddestDream <reddestdream@gmail.com> ---
Well, today I had a hard freeze using more than one display with Radeon VII.
Back to Radeon VII + iGPU . . . :(


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (164 preceding siblings ...)
  2019-10-10 12:50 ` bugzilla-daemon
@ 2019-10-12 23:34 ` bugzilla-daemon
  2019-10-14  9:15 ` bugzilla-daemon
                   ` (11 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-10-12 23:34 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1747 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #161 from Gargoyle <g@rgoyle.com> ---
Hi there. I've been trying to solve some lockups and pauses with my system and
have just read this entire thread. 

The good news is that I am another Radeon VII owner having the same problems
and I am willing to do whatever I can to help.

My current situation is:

- I'm running dual 2560x1440@60Hz via DisplayPort.

- I am running the beta of Ubuntu 19.10 (Linux ryzen1910 5.3.0-18-generic
#19-Ubuntu SMP Tue Oct 8 20:14:06 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux).

- I don't push the R:VII at all under Linux. I boot into Windows 10 to play
games.

- I have disabled IOMMU in BIOS/EFI. With IOMMU enabled things are MUCH worse.

- My system is mostly stable. If the displays blank, sometimes after waking
them I get the 15-30 second freeze. Then the "amdgpu [powerplay] Failed..."
messages and then everything continues ok. I can semi-reliably recreate this by
using the "xset dpms force off" command someone posted earlier. I've not
managed to find any kind of pattern yet, but 8 out of 10 times running that
command and then waking the system with a keypress/mouse click will cause the
freeze.

- I use X11 and not Wayland. Not sure that is significant, but with Ubuntu
19.10 it seems Wayland is started temporarily and then stopped during boot /
starting gdm. If I enable IOMMU my GDM login screen will be completely corrupt.
However, if I press enter (to select my user) and enter my password, my X11
gnome session starts, although there are LOTS of pauses, warnings and errors
all over the place in "journalctl -f".


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (165 preceding siblings ...)
  2019-10-12 23:34 ` bugzilla-daemon
@ 2019-10-14  9:15 ` bugzilla-daemon
  2019-10-14 10:39 ` bugzilla-daemon
                   ` (10 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-10-14  9:15 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 892 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #162 from linedot@xcpp.org ---
Created attachment 145730
  --> https://bugs.freedesktop.org/attachment.cgi?id=145730&action=edit
Freeze/Black screen/Crash on 5.3.6

Apologies, I have been on vacation and thus away from my main system.

Attached is the dmesg log of another crash with kernel version 5.3.6. Here is a
description of what the crash looked like:
1) Successfully booted up to login manager
2) Logged into a graphical session
3) Shortly after, the screen freezes
4) Screen flashes to black (~5-10 sec)
5) Screen flashes back to the frozen desktop (~5-10 sec)
6) Screen goes black (not off), no response to input, switching to tty doesn't
work. I was able to ssh into the machine from a laptop and get the dmesg
output.


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (166 preceding siblings ...)
  2019-10-14  9:15 ` bugzilla-daemon
@ 2019-10-14 10:39 ` bugzilla-daemon
  2019-10-14 11:37 ` bugzilla-daemon
                   ` (9 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-10-14 10:39 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 325 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #163 from Tom B <tom@r.je> ---
Gargoyle, linedot, can you confirm whether this crash is with both patches
applied?

I'm still on 5.3.1 patched and haven't had a single crash.


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (167 preceding siblings ...)
  2019-10-14 10:39 ` bugzilla-daemon
@ 2019-10-14 11:37 ` bugzilla-daemon
  2019-10-14 17:05 ` bugzilla-daemon
                   ` (8 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-10-14 11:37 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 612 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #164 from linedot@xcpp.org ---
(In reply to Tom B from comment #163)
> Gargoyle, linedot, can you confirm whether this crash is with both patches
> applied?
> 
> I'm still on 5.3.1 patched and haven't had a single crash.

For 5.3.1 I built the kernel with the arch build system, manually added lines
to the PKGBUILD to apply the two patches, and saw them being applied in the
log.

For 5.3.6 I've checked that the patches are already applied.


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (168 preceding siblings ...)
  2019-10-14 11:37 ` bugzilla-daemon
@ 2019-10-14 17:05 ` bugzilla-daemon
  2019-10-19 17:35 ` bugzilla-daemon
                   ` (7 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-10-14 17:05 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 993 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #165 from Tom B <tom@r.je> ---
I just tried 5.3.5 (which is the latest in the arch repo) and it's working fine
for me.

I do have an issue on Wayland. If the screen turns off, Wayland crashes and I
have to hard reset. The log shows 

Oct 14 17:48:56 desktop kernel: amdgpu: [powerplay] [SetHardMinFreq] Set hard min uclk failed!
Oct 14 17:49:02 desktop kernel: amdgpu: [powerplay] Failed to send message 0x26, response 0x0
Oct 14 17:49:02 desktop kernel: amdgpu: [powerplay] Failed to set soft min gfxclk !
Oct 14 17:49:02 desktop kernel: amdgpu: [powerplay] Failed to upload DPM Bootup Levels!


But, this also shows on boot so I'm not sure it's a problem and it seems to be
wayland that segfaults, not an issue with amdgpu. 

I do still get `kernel: [drm] schedsdma0 is not ready, skipping` repeating
forever in my journal.
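
To compare logs between reporters, something like the following sketch could tally the powerplay failures quoted above. It assumes each journal or dmesg entry sits on a single unwrapped line and contains "amdgpu: [powerplay]" exactly as in the excerpt; the function name is hypothetical.

```python
import re
from collections import Counter

# Everything after the "[powerplay]" tag, as seen in the log excerpt above.
POWERPLAY = re.compile(r"amdgpu: \[powerplay\] (.+)$")
# SMU message failures carry a message id and a response code.
MESSAGE_ID = re.compile(
    r"Failed to send message (0x[0-9a-fA-F]+), response (0x[0-9a-fA-F]+)"
)


def summarize_powerplay(log_text):
    """Return (other_messages, failed_message_ids) as two Counters."""
    messages = Counter()
    failed_ids = Counter()
    for line in log_text.splitlines():
        match = POWERPLAY.search(line)
        if not match:
            continue
        text = match.group(1).strip()
        sent = MESSAGE_ID.search(text)
        if sent:
            # Count failures per message id, e.g. "0x26" or "0x28".
            failed_ids[sent.group(1).lower()] += 1
        else:
            messages[text] += 1
    return messages, failed_ids
```

Feeding it the output of `journalctl -k` or `dmesg` would show whether the same message ids (0x26 here, 0x28 in earlier comments) keep failing.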


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (169 preceding siblings ...)
  2019-10-14 17:05 ` bugzilla-daemon
@ 2019-10-19 17:35 ` bugzilla-daemon
  2019-10-20 18:27 ` bugzilla-daemon
                   ` (6 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-10-19 17:35 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 668 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #166 from Peter Hercek <phercek@gmail.com> ---
I tried 5.3.6-arch1-1 on Arch Linux with 3 DP monitors. Based on the comment
from linedot@xcpp.org, it should contain the patch.

I got the crash after 4 days of use. It looks the same as before:
ring sdma0 timeout, gpu reset (allegedly successful), many skipped IBs, and
failure to initialize parser for ever.

The situation looked like this from my experience: with each new kernel the
error got worse and worse; 5.3.6 improved it a lot, but it is still not fixed.


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (170 preceding siblings ...)
  2019-10-19 17:35 ` bugzilla-daemon
@ 2019-10-20 18:27 ` bugzilla-daemon
  2019-10-21  8:11 ` bugzilla-daemon
                   ` (5 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-10-20 18:27 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 663 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #167 from Alex Deucher <alexdeucher@gmail.com> ---
(In reply to Peter Hercek from comment #166)
> I got the crash after 4 days of use. It looks the same as before:
> ring sdma0 timeout, gpu reset (allegedly successful), many skipped IBs, and
> failure to initialize parser for ever.

The parser error just means you need to restart your desktop environment.  At
the moment no desktop managers properly handle GPU resets (recreate their
context and buffers) so you need to restart your desktop to get it back.


* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (171 preceding siblings ...)
  2019-10-20 18:27 ` bugzilla-daemon
@ 2019-10-21  8:11 ` bugzilla-daemon
  2019-11-10 16:36 ` bugzilla-daemon
                   ` (4 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-10-21  8:11 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 655 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #168 from linedot@xcpp.org ---
Created attachment 145784
  --> https://bugs.freedesktop.org/attachment.cgi?id=145784&action=edit
5.3.7: Fence fallback timer expired on ring <x>

Here is a freeze that went a bit differently.
This time the system froze without any blinking, and the log is full of
messages like:

[ 2940.919451] [drm] Fence fallback timer expired on ring page1

This is on 5.3.7-arch1-1

(Also, unlike the others here, I'm using only a single monitor, connected
through DP.)
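
For anyone triaging a log like the one attached, a minimal sketch of counting
these messages per ring follows. The function name and the sample string are
illustrative assumptions; only the message text itself comes from the log line
quoted above.

```python
import re
from collections import Counter

# Matches the message quoted above; the ring name is the token after "ring".
FENCE_RE = re.compile(r"\[drm\] Fence fallback timer expired on ring (\S+)")


def count_fence_timeouts(log_text):
    """Return a Counter mapping ring name -> number of fallback-timer messages."""
    return Counter(m.group(1) for m in FENCE_RE.finditer(log_text))


# Sample input built from the single line quoted in this comment.
sample = "[ 2940.919451] [drm] Fence fallback timer expired on ring page1\n"

for ring, hits in count_fence_timeouts(sample).items():
    print(ring, hits)
```

Feeding it the saved output of `dmesg` or `journalctl -k` instead of `sample`
would show whether the timeouts are confined to one ring or spread over several.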

^ permalink raw reply	[flat|nested] 179+ messages in thread

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (172 preceding siblings ...)
  2019-10-21  8:11 ` bugzilla-daemon
@ 2019-11-10 16:36 ` bugzilla-daemon
  2019-11-10 17:45 ` bugzilla-daemon
                   ` (3 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-11-10 16:36 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 678 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #169 from picard12@live.de ---
I am using a Radeon VII with Arch Linux, a 1440p 144 Hz monitor, and a 4K 60 Hz
monitor. I had crashes similar to the others reported here whenever I ran the
1440p monitor at 144 Hz; at 60 Hz it was stable. This behavior persisted from
kernel 5.0 through 5.3 and only stopped when I started using kernel 5.4.0
(5.4.0-rc6-mainline right now). Now I can run it at 144 Hz without crashes.

The driver still isn't working that well, as games seem very stuttery, but at
least it doesn't crash anymore.

^ permalink raw reply	[flat|nested] 179+ messages in thread

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (173 preceding siblings ...)
  2019-11-10 16:36 ` bugzilla-daemon
@ 2019-11-10 17:45 ` bugzilla-daemon
  2019-11-26 12:03 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-11-10 17:45 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 5621 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #170 from Peter Hercek <phercek@gmail.com> ---
Maybe this helps, since there is a stack trace. The GUI stopped responding, so
I shut the machine down over ssh. The kernel then crashed during shutdown on
5.3.6-arch1-1-ARCH even with amdgpu.dpm=0, which is the option that is supposed
to work. This kernel has both the patch and amdgpu.dpm=0.

Nov 04 17:38:58 phnm kernel: ------------[ cut here ]------------
Nov 04 17:38:58 phnm kernel: WARNING: CPU: 6 PID: 640 at
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:5804
amdgpu_dm_atomic_commit_tail.cold+0x82/0xed [amdgpu]
Nov 04 17:38:58 phnm kernel: Modules linked in: fuse xt_CHECKSUM xt_MASQUERADE
xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat
iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4
libcrc32c ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter tun
bridge cfg80211 rfkill 8021q garp mrp stp llc intel_rapl_msr intel_rapl_common
amdgpu x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel hid_microsoft
radeon mousedev input_leds joydev ff_memless kvm gpu_sched
snd_hda_codec_realtek snd_hda_codec_generic i2c_algo_bit irqbypass
ledtrig_audio ttm crct10dif_pclmul snd_hda_intel crc32_pclmul hid_generic
ghash_clmulni_intel cdc_acm drm_kms_helper snd_hda_codec aesni_intel usbhid
iTCO_wdt iTCO_vendor_support snd_hda_core wmi_bmof aes_x86_64 hid crypto_simd
cryptd mxm_wmi snd_hwdep glue_helper drm intel_cstate snd_pcm agpgart r8169
syscopyarea intel_uncore sysfillrect realtek sysimgblt snd_timer pcspkr
i2c_i801 fb_sys_fops e1000e intel_rapl_perf
Nov 04 17:38:58 phnm kernel:  mei_me snd libphy mei soundcore lpc_ich wmi evdev
mac_hid sg ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2
crc32c_intel firewire_ohci xhci_pci xhci_hcd firewire_core ehci_pci crc_itu_t
ehci_hcd sr_mod cdrom sd_mod ahci libahci libata scsi_mod
Nov 04 17:38:58 phnm kernel: CPU: 6 PID: 640 Comm: Xorg Not tainted
5.3.6-arch1-1-ARCH #1
Nov 04 17:38:58 phnm kernel: Hardware name: System manufacturer System Product
Name/P9X79, BIOS 4502 10/15/2013
Nov 04 17:38:58 phnm kernel: RIP:
0010:amdgpu_dm_atomic_commit_tail.cold+0x82/0xed [amdgpu]
Nov 04 17:38:58 phnm kernel: Code: c7 c7 08 1e db c0 e8 0f 59 a0 db 0f 0b 41 83
7c 24 08 00 0f 85 92 ff f1 ff e9 ad ff f1 ff 48 c7 c7 08 1e db c0 e8 f0 58 a0
db <0f> 0b e9 32 f5 f1 ff 48 8b 85 00 fd ff ff 4c 89 f2 48 c7 c6 0d 0f
Nov 04 17:38:58 phnm kernel: RSP: 0018:ffffa98c410475a0 EFLAGS: 00010046
Nov 04 17:38:58 phnm kernel: RAX: 0000000000000024 RBX: ffff894125e06000 RCX:
0000000000000000
Nov 04 17:38:58 phnm kernel: RDX: 0000000000000000 RSI: 0000000000000003 RDI:
00000000ffffffff
Nov 04 17:38:58 phnm kernel: RBP: ffffa98c410478c0 R08: 000016b622fb648e R09:
ffffffff9deb3254
Nov 04 17:38:58 phnm kernel: R10: 0000000000000616 R11: 000000000001d890 R12:
0000000000000286
Nov 04 17:38:58 phnm kernel: R13: ffff8940f30b0400 R14: ffff894129c20000 R15:
ffff894075ba6a00
Nov 04 17:38:58 phnm kernel: FS:  00007fbf9c35c500(0000)
GS:ffff89413fb80000(0000) knlGS:0000000000000000
Nov 04 17:38:58 phnm kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 04 17:38:58 phnm kernel: CR2: 0000559991d31420 CR3: 000000082a644002 CR4:
00000000000606e0
Nov 04 17:38:58 phnm kernel: Call Trace:
Nov 04 17:38:58 phnm kernel:  ? commit_tail+0x3c/0x70 [drm_kms_helper]
Nov 04 17:38:58 phnm kernel:  commit_tail+0x3c/0x70 [drm_kms_helper]
Nov 04 17:38:58 phnm kernel:  drm_atomic_helper_commit+0x108/0x110
[drm_kms_helper]
Nov 04 17:38:58 phnm kernel:  drm_client_modeset_commit_atomic+0x1e8/0x200
[drm]
Nov 04 17:38:58 phnm kernel:  drm_client_modeset_commit_force+0x50/0x150 [drm]
Nov 04 17:38:58 phnm kernel:  drm_fb_helper_pan_display+0xc2/0x200
[drm_kms_helper]
Nov 04 17:38:58 phnm kernel:  fb_pan_display+0x83/0x100
Nov 04 17:38:58 phnm kernel:  fb_set_var+0x1e8/0x3d0
Nov 04 17:38:58 phnm kernel:  fbcon_blank+0x1dd/0x290
Nov 04 17:38:58 phnm kernel:  do_unblank_screen+0x98/0x130
Nov 04 17:38:58 phnm kernel:  vt_ioctl+0xeff/0x1290
Nov 04 17:38:58 phnm kernel:  tty_ioctl+0x37b/0x900
Nov 04 17:38:58 phnm kernel:  ? preempt_count_add+0x68/0xa0
Nov 04 17:38:58 phnm kernel:  do_vfs_ioctl+0x43d/0x6c0
Nov 04 17:38:58 phnm kernel:  ? syscall_trace_enter+0x1f2/0x2e0
Nov 04 17:38:58 phnm kernel:  ksys_ioctl+0x5e/0x90
Nov 04 17:38:58 phnm kernel:  __x64_sys_ioctl+0x16/0x20
Nov 04 17:38:58 phnm kernel:  do_syscall_64+0x5f/0x1c0
Nov 04 17:38:58 phnm kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Nov 04 17:38:58 phnm kernel: RIP: 0033:0x7fbf9d7b425b
Nov 04 17:38:58 phnm kernel: Code: 0f 1e fa 48 8b 05 25 9c 0c 00 64 c7 00 26 00
00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f
05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d f5 9b 0c 00 f7 d8 64 89 01 48
Nov 04 17:38:58 phnm kernel: RSP: 002b:00007ffe21162798 EFLAGS: 00000246
ORIG_RAX: 0000000000000010
Nov 04 17:38:58 phnm kernel: RAX: ffffffffffffffda RBX: 000055d93ebf5180 RCX:
00007fbf9d7b425b
Nov 04 17:38:58 phnm kernel: RDX: 0000000000000000 RSI: 0000000000004b3a RDI:
000000000000000c
Nov 04 17:38:58 phnm kernel: RBP: 0000000000000000 R08: 0000000000000000 R09:
0000000000000007
Nov 04 17:38:58 phnm kernel: R10: fffffffffffff4b4 R11: 0000000000000246 R12:
ffffffffffffffff
Nov 04 17:38:58 phnm kernel: R13: 000055d93ebfa4a0 R14: 00007ffe21162968 R15:
0000000000000000
Nov 04 17:38:58 phnm kernel: ---[ end trace 40ade9cecd96ffc0 ]---

^ permalink raw reply	[flat|nested] 179+ messages in thread

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (174 preceding siblings ...)
  2019-11-10 17:45 ` bugzilla-daemon
@ 2019-11-26 12:03 ` bugzilla-daemon
  2019-11-26 14:14 ` bugzilla-daemon
  2019-11-26 23:13 ` bugzilla-daemon
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-11-26 12:03 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 721 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #171 from linedot@xcpp.org ---
Created attachment 146026
  --> https://bugs.freedesktop.org/attachment.cgi?id=146026&action=edit
5.4.0-arch1-1 GPU initialization fails

With kernel version 5.4.0-arch1-1 the GPU can flat out no longer be
initialized.

My system is now completely unusable with the current kernel.

Does this specifically mean anything?
[   15.575361] amdgpu: [powerplay] smu driver if version = 0x00000013, smu fw
if version = 0x00000012, smu fw version = 0x00282d00 (40.45.0)
[   15.575362] amdgpu: [powerplay] SMU driver if version not matched
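
As a reading aid for the two lines above, here is a minimal sketch of how the
packed firmware version relates to the dotted form printed next to it. The
byte layout (major, minor, debug in the three low bytes) is an assumption
inferred from the log line itself, and the function name is hypothetical.

```python
def decode_smu_fw_version(raw):
    """Split a packed 32-bit value into (major, minor, debug) bytes.

    Assumed layout, inferred from the log line: bits 23..16 are the major
    number, bits 15..8 the minor number, bits 7..0 the debug number.
    """
    major = (raw >> 16) & 0xFF
    minor = (raw >> 8) & 0xFF
    debug = raw & 0xFF
    return major, minor, debug


# Values copied from the log lines quoted above.
driver_if_version = 0x00000013
fw_if_version = 0x00000012
fw_version = 0x00282D00

print(decode_smu_fw_version(fw_version))   # (40, 45, 0)
print(driver_if_version == fw_if_version)  # False
```

Under that assumption 0x00282d00 decodes to 40.45.0, matching the log, and the
second message reflects that the driver expects interface version 0x13 while
the firmware reports 0x12.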

^ permalink raw reply	[flat|nested] 179+ messages in thread

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (175 preceding siblings ...)
  2019-11-26 12:03 ` bugzilla-daemon
@ 2019-11-26 14:14 ` bugzilla-daemon
  2019-11-26 23:13 ` bugzilla-daemon
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-11-26 14:14 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 428 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

Alex Deucher <alexdeucher@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #146026|text/x-log                  |text/plain
          mime type|                            |

^ permalink raw reply	[flat|nested] 179+ messages in thread

* [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
  2019-05-14  5:55 [Bug 110674] Crashes / Resets From AMDGPU / Radeon VII bugzilla-daemon
                   ` (176 preceding siblings ...)
  2019-11-26 14:14 ` bugzilla-daemon
@ 2019-11-26 23:13 ` bugzilla-daemon
  177 siblings, 0 replies; 179+ messages in thread
From: bugzilla-daemon @ 2019-11-26 23:13 UTC (permalink / raw)
  To: dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 267 bytes --]

https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #172 from linedot@xcpp.org ---
I had dpm=2 set as a module option. The GPU initialization failure does not
occur without dpm=2.
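
For others who want to check whether a stale option is still in effect, a
minimal sketch of reading the live value follows. It assumes the parameter is
exposed read-only under /sys/module/amdgpu/parameters/dpm on the running
kernel; the helper name is hypothetical.

```python
from pathlib import Path


def read_module_param(module, param, sysfs_root="/sys/module"):
    """Return the current value of a module parameter as a string.

    Returns None when the module is not loaded or the parameter is not
    exposed in sysfs (assumption: parameters live under
    <sysfs_root>/<module>/parameters/<param>).
    """
    path = Path(sysfs_root) / module / "parameters" / param
    try:
        return path.read_text().strip()
    except OSError:
        return None


# Prints the value in effect, or None if it cannot be read.
print(read_module_param("amdgpu", "dpm"))
```

If this prints 2 even though no option is given on the kernel command line, the
value is probably coming from a file under /etc/modprobe.d.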

^ permalink raw reply	[flat|nested] 179+ messages in thread
