* [Bug 109649] [raven] gfx ring timeout when running clover apps
From: bugzilla-daemon @ 2019-02-15 23:02 UTC
  To: dri-devel

https://bugs.freedesktop.org/show_bug.cgi?id=109649

            Bug ID: 109649
           Summary: [raven] gfx ring timeout when running clover apps
           Product: DRI
           Version: unspecified
          Hardware: x86-64 (AMD64)
                OS: Linux (All)
            Status: NEW
          Severity: normal
          Priority: medium
         Component: DRM/AMDgpu
          Assignee: dri-devel@lists.freedesktop.org
          Reporter: jv356@scarletmail.rutgers.edu

This is a regression in 4.20.x; the same userspace works fine on 4.19.
I could bisect, but this is my main machine, so I can't easily dedicate the
time. Any hints would be appreciated.
The kernel is booted with iommu=soft. Enabling the IOMMU fully hangs the machine
at boot, and disabling the IOMMU disables the Wi-Fi.

Dmesg:
> [  702.207054] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=1340, emitted seq=1342
> [  702.207061] [drm] GPU recovery disabled.

lspci -nn:
05:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Device [1002:15dd] (rev c4)

It's a ThinkPad E485 laptop with:
AMD Ryzen 5 2500U with Radeon Vega Mobile Gfx (family: 0x17, model: 0x11, stepping: 0x0)

-- 
You are receiving this mail because:
You are the assignee for the bug.

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

* [Bug 109649] [bisected][raven] gfx ring timeout when running clover apps
From: bugzilla-daemon @ 2019-02-28  9:25 UTC
  To: dri-devel

https://bugs.freedesktop.org/show_bug.cgi?id=109649

Jan Vesely <jv356@scarletmail.rutgers.edu> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[raven] gfx ring timeout    |[bisected][raven] gfx ring
                   |when running clover apps    |timeout when running clover
                   |                            |apps
                 CC|                            |christian.koenig@amd.com

--- Comment #1 from Jan Vesely <jv356@scarletmail.rutgers.edu> ---
Bisection shows that the first bad commit is:
commit 09b6f25b55d9c66af7302e1f09ad90aa5b1dfbcb (HEAD, refs/bisect/bad)
Author: Christian König <christian.koenig@amd.com>
Date:   Wed Aug 15 14:04:47 2018 +0200

    drm/amdgpu: fix VM size reporting on Raven

    Raven doesn't have an VCE block and so also no buggy VCE firmware.

    Signed-off-by: Christian König <christian.koenig@amd.com>
    Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
    Reviewed-by: Huang Rui <ray.huang@amd.com>
    Acked-by: Chunming Zhou <david1.zhou@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Perhaps there is another buggy firmware or hardware limitation?

# cat /sys/kernel/debug/dri/0/amdgpu_firmware_info 
VCE feature version: 0, firmware version: 0x00000000
UVD feature version: 0, firmware version: 0x00000000
MC feature version: 0, firmware version: 0x00000000
ME feature version: 40, firmware version: 0x00000099
PFP feature version: 40, firmware version: 0x000000ae
CE feature version: 40, firmware version: 0x0000004d
RLC feature version: 1, firmware version: 0x0000d237
RLC SRLC feature version: 1, firmware version: 0x00000001
RLC SRLG feature version: 1, firmware version: 0x00000001
RLC SRLS feature version: 1, firmware version: 0x00000001
MEC feature version: 40, firmware version: 0x0000018b
MEC2 feature version: 40, firmware version: 0x0000018b
SOS feature version: 0, firmware version: 0x00000000
ASD feature version: 0, firmware version: 0x0017ba78
SMC feature version: 0, firmware version: 0x00001e49
SDMA0 feature version: 41, firmware version: 0x000000a9
VCN feature version: 0, firmware version: 0x01004912
VBIOS version: 113-RAVEN-106

* [Bug 109649] [bisected][raven] gfx ring timeout when running clover apps
From: bugzilla-daemon @ 2019-02-28  9:36 UTC
  To: dri-devel

https://bugs.freedesktop.org/show_bug.cgi?id=109649

--- Comment #2 from Jan Vesely <jv356@scarletmail.rutgers.edu> ---
I've confirmed that reverting the change on top of 4.20.13 fixes the issue.

* [Bug 109649] [bisected][raven] gfx ring timeout when running clover apps
From: bugzilla-daemon @ 2019-02-28 16:03 UTC
  To: dri-devel

https://bugs.freedesktop.org/show_bug.cgi?id=109649

--- Comment #3 from Jan Vesely <jv356@scarletmail.rutgers.edu> ---
The bug is still present in 5.0.0-rc8.

* [Bug 109649] [bisected][raven] gfx ring timeout when running clover apps
From: bugzilla-daemon @ 2019-03-08  2:51 UTC
  To: dri-devel

https://bugs.freedesktop.org/show_bug.cgi?id=109649

--- Comment #4 from Jan Vesely <jv356@scarletmail.rutgers.edu> ---
The issue appears fixed with new firmware, but now the laptop won't suspend.

# cat /sys/kernel/debug/dri/0/amdgpu_firmware_info
VCE feature version: 0, firmware version: 0x00000000
UVD feature version: 0, firmware version: 0x00000000
MC feature version: 0, firmware version: 0x00000000
ME feature version: 40, firmware version: 0x00000099
PFP feature version: 40, firmware version: 0x000000ae
CE feature version: 40, firmware version: 0x0000004d
RLC feature version: 1, firmware version: 0x0000d237
RLC SRLC feature version: 1, firmware version: 0x00000001
RLC SRLG feature version: 1, firmware version: 0x00000001
RLC SRLS feature version: 1, firmware version: 0x00000001
MEC feature version: 40, firmware version: 0x0000018b
MEC2 feature version: 40, firmware version: 0x0000018b
SOS feature version: 0, firmware version: 0x00000000
ASD feature version: 0, firmware version: 0x0017ba78
SMC feature version: 0, firmware version: 0x00001e49
SDMA0 feature version: 41, firmware version: 0x000000a9
VCN feature version: 0, firmware version: 0x01004912
DMCU feature version: 0, firmware version: 0x00000001
VBIOS version: 113-RAVEN-106

* [Bug 109649] [bisected][raven] gfx ring timeout when running clover apps
From: bugzilla-daemon @ 2019-03-08  4:01 UTC
  To: dri-devel

https://bugs.freedesktop.org/show_bug.cgi?id=109649

--- Comment #5 from Jan Vesely <jv356@scarletmail.rutgers.edu> ---
Since debugfs (amdgpu_firmware_info) does not show any firmware difference, here is the change in the firmware files:
$ diff old_fw new_fw 
8,9c8
- e2ddb912bf242e3b1b4219b36a19bff7  /lib/firmware/amdgpu/raven2_rlc.bin
- 27168d5b60ef396926a2aa0e2da00a97  /lib/firmware/amdgpu/raven2_sdma1.bin
---
+ 4ac07f88b9c4aa4fe026be87cb16ceda  /lib/firmware/amdgpu/raven2_rlc.bin


(In reply to Jan Vesely from comment #4)
> The issue appears fixed with new firmware, but now the laptop won't suspend.

The same workaround as before fixes the suspend/resume issue.

drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c:709
+                      vm_size = min(vm_size, 1ULL << 40);

* [Bug 109649] [bisected][raven] gfx ring timeout when running clover apps
From: bugzilla-daemon @ 2019-03-11  3:41 UTC
  To: dri-devel

https://bugs.freedesktop.org/show_bug.cgi?id=109649

--- Comment #6 from Jan Vesely <jv356@scarletmail.rutgers.edu> ---
I managed to get the IOMMU working by passing
"amd_iommu=pt ivrs_ioapic[32]=00:14.0" on the kernel command line.
Now it's back to square one: all clover kernels hang the GPU unless I limit the
VM size with 'vm_size = min(vm_size, 1ULL << 40);'. Otherwise the machine works
(including 3D graphics and suspend/resume).

* [Bug 109649] [bisected][raven] gfx ring timeout when running clover apps
From: bugzilla-daemon @ 2019-05-07  6:01 UTC
  To: dri-devel

https://bugs.freedesktop.org/show_bug.cgi?id=109649

--- Comment #7 from Jan Vesely <jv356@scarletmail.rutgers.edu> ---
The workaround is still necessary in kernel 5.1.0. The failure mode is a bit
different: only the application hangs, not the entire machine.

* [Bug 109649] [bisected][raven] gfx ring timeout when running clover apps
From: bugzilla-daemon @ 2019-11-19  9:13 UTC
  To: dri-devel

https://bugs.freedesktop.org/show_bug.cgi?id=109649

Martin Peres <martin.peres@free.fr> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |MOVED
             Status|NEW                         |RESOLVED

--- Comment #8 from Martin Peres <martin.peres@free.fr> ---
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been
closed from further activity.

You can subscribe and participate further through the new bug through this link
to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/698.
