dri-devel.lists.freedesktop.org archive mirror
* [Bug 206141] New: VCE UVD ring test failed -110
@ 2020-01-09  9:18 bugzilla-daemon
  2020-01-11 22:02 ` [Bug 206141] " bugzilla-daemon
                   ` (9 more replies)
  0 siblings, 10 replies; 11+ messages in thread
From: bugzilla-daemon @ 2020-01-09  9:18 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=206141

            Bug ID: 206141
           Summary: VCE UVD ring test failed -110
           Product: Drivers
           Version: 2.5
    Kernel Version: 5.4.6
          Hardware: x86-64
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: low
          Priority: P1
         Component: Video(DRI - non Intel)
          Assignee: drivers_video-dri@kernel-bugs.osdl.org
          Reporter: janpieter.sollie@dommel.be
        Regression: No

Created attachment 286705
  --> https://bugzilla.kernel.org/attachment.cgi?id=286705&action=edit
part of DMESG that looks relevant

While booting my PC, amdgpu reports that the UVD and VCE IB ring tests fail
on a Fiji GPU (R9 Nano).
The error code -110 corresponds to -ETIMEDOUT, i.e. a timeout.
Maybe the R9 Nano needs more time to initialize the UVD decoder in UEFI mode?
-------------------
[    7.270335] amdgpu 0000:0a:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR*
IB test failed on uvd (-110).
[    8.063987] Generic FE-GE Realtek PHY r8169-500:00: attached PHY driver
[Generic FE-GE Realtek PHY] (mii_bus:phy_addr=r8169-500:00, irq=IGNORE)
[    8.211306] r8169 0000:05:00.0 eth0: Link is Down
[    8.400079] amdgpu 0000:0a:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR*
IB test failed on vce0 (-110).
[    8.400084] [drm:process_one_work] *ERROR* ib ring test failed (-110).
--------------------
Please note that this is not critical to me at all: I only use this card for
OpenCL calculations via ROCm, so I don't depend on UVD or VCE, but I still
think it's worth a look.
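
For readers unfamiliar with kernel return codes: the -110 in the log above is a
negated errno value. What follows is a minimal Python sketch, assuming Linux
errno numbering and log lines that have been re-joined (the mail wraps them).
The helper name parse_ib_failures is hypothetical and not part of any kernel
tool; it only extracts the ring name and code from each "IB test failed" line.

```python
import errno
import re

# Matches e.g. "IB test failed on uvd (-110)."
IB_FAIL_RE = re.compile(r"IB test failed on (\w+) \((-?\d+)\)")


def parse_ib_failures(log_text):
    """Return a list of (ring, code, errno_name) tuples found in log_text."""
    failures = []
    for ring, code in IB_FAIL_RE.findall(log_text):
        value = int(code)
        # Kernel functions return negated errno values; look up the symbolic
        # name for the positive number (platform-dependent numbering).
        name = errno.errorcode.get(-value, "UNKNOWN")
        failures.append((ring, value, name))
    return failures


# Two lines from the report above, re-joined onto single lines.
SAMPLE = (
    "amdgpu 0000:0a:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* "
    "IB test failed on uvd (-110).\n"
    "amdgpu 0000:0a:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* "
    "IB test failed on vce0 (-110).\n"
)

if __name__ == "__main__":
    for ring, code, name in parse_ib_failures(SAMPLE):
        print(f"{ring}: {code} ({name})")
```

The symbolic name comes from Python's errno table for the machine running the
script, so it only matches the kernel's meaning when run on Linux.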

-- 
You are receiving this mail because:
You are watching the assignee of the bug.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


* [Bug 206141] VCE UVD ring test failed -110
  2020-01-09  9:18 [Bug 206141] New: VCE UVD ring test failed -110 bugzilla-daemon
@ 2020-01-11 22:02 ` bugzilla-daemon
  2020-01-11 22:03 ` bugzilla-daemon
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: bugzilla-daemon @ 2020-01-11 22:02 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=206141

--- Comment #1 from Janpieter Sollie (janpieter.sollie@dommel.be) ---
Tried to increase the timeout value (timeout << 1), but it did not help.
Moving on ...


* [Bug 206141] VCE UVD ring test failed -110
  2020-01-09  9:18 [Bug 206141] New: VCE UVD ring test failed -110 bugzilla-daemon
  2020-01-11 22:02 ` [Bug 206141] " bugzilla-daemon
@ 2020-01-11 22:03 ` bugzilla-daemon
  2020-01-13 13:08 ` bugzilla-daemon
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: bugzilla-daemon @ 2020-01-11 22:03 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=206141

Janpieter Sollie (janpieter.sollie@dommel.be) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Kernel Version|5.4.6                       |5.4.10


* [Bug 206141] VCE UVD ring test failed -110
  2020-01-09  9:18 [Bug 206141] New: VCE UVD ring test failed -110 bugzilla-daemon
  2020-01-11 22:02 ` [Bug 206141] " bugzilla-daemon
  2020-01-11 22:03 ` bugzilla-daemon
@ 2020-01-13 13:08 ` bugzilla-daemon
  2020-01-13 16:39 ` bugzilla-daemon
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: bugzilla-daemon @ 2020-01-13 13:08 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=206141

--- Comment #2 from Janpieter Sollie (janpieter.sollie@dommel.be) ---
Tried patching ALL functions where ETIMEDOUT is specified (in
drivers/gpu/drm/amd/amdgpu/) to use timeout << 2, but it made no difference.
Am I looking at the wrong function here?


* [Bug 206141] VCE UVD ring test failed -110
  2020-01-09  9:18 [Bug 206141] New: VCE UVD ring test failed -110 bugzilla-daemon
                   ` (2 preceding siblings ...)
  2020-01-13 13:08 ` bugzilla-daemon
@ 2020-01-13 16:39 ` bugzilla-daemon
  2020-01-14  6:23 ` bugzilla-daemon
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: bugzilla-daemon @ 2020-01-13 16:39 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=206141

Alex Deucher (alexdeucher@gmail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |alexdeucher@gmail.com

--- Comment #3 from Alex Deucher (alexdeucher@gmail.com) ---
The relevant function is amdgpu_ib_ring_tests(). However, if the relevant
engines are in some bad state, increasing the timeout won't help.


* [Bug 206141] VCE UVD ring test failed -110
  2020-01-09  9:18 [Bug 206141] New: VCE UVD ring test failed -110 bugzilla-daemon
                   ` (3 preceding siblings ...)
  2020-01-13 16:39 ` bugzilla-daemon
@ 2020-01-14  6:23 ` bugzilla-daemon
  2020-01-14 19:27 ` bugzilla-daemon
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: bugzilla-daemon @ 2020-01-14  6:23 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=206141

--- Comment #4 from Janpieter Sollie (janpieter.sollie@dommel.be) ---
Hi Alex,
Thank you for the feedback. I tried that one as well.
Since I already multiplied the timeout by 4, I guess it is the bad state
then.
Is there any way I could reset the state before the ring tests begin?
FYI: I have tried different firmware revisions already, and some of them seem
to affect the situation: sometimes UVD fails, sometimes VCE fails, but none of
them works for both.


* [Bug 206141] VCE UVD ring test failed -110
  2020-01-09  9:18 [Bug 206141] New: VCE UVD ring test failed -110 bugzilla-daemon
                   ` (4 preceding siblings ...)
  2020-01-14  6:23 ` bugzilla-daemon
@ 2020-01-14 19:27 ` bugzilla-daemon
  2020-01-14 19:50 ` bugzilla-daemon
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: bugzilla-daemon @ 2020-01-14 19:27 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=206141

Thong Thai (thong.thai@amd.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |thong.thai@amd.com

--- Comment #5 from Thong Thai (thong.thai@amd.com) ---
Hi Janpieter,

I was unsuccessful in trying to recreate your issue. 
- Running Linux stable, 5.4.10, verified the IB tests are running
- Same video card, same VBIOS, same firmware
- Tried with/without a display connected
- Tried with/without rocm-dkms

Trying to see if I'm missing anything else: what motherboard / CPU are you
using? And are you using any special kernel parameters?

---

[    4.702561] [drm] initializing kernel modesetting (FIJI 0x1002:0x7300
0x1002:0x0B36 0xCA).
[    4.702569] [drm] register mmio base: 0xFCF00000
[    4.702569] [drm] register mmio size: 262144
[    4.702577] [drm] add ip block number 0 <vi_common>
[    4.702578] [drm] add ip block number 1 <gmc_v8_0>
[    4.702578] [drm] add ip block number 2 <tonga_ih>
[    4.702579] [drm] add ip block number 3 <gfx_v8_0>
[    4.702580] [drm] add ip block number 4 <sdma_v3_0>
[    4.702580] [drm] add ip block number 5 <powerplay>
[    4.702581] [drm] add ip block number 6 <dm>
[    4.702582] [drm] add ip block number 7 <uvd_v6_0>
[    4.702582] [drm] add ip block number 8 <vce_v3_0>
[    4.702758] amdgpu 0000:07:00.0: No more image in the PCI ROM
[    4.702777] ATOM BIOS: 113-C8820200-107
[    4.702789] [drm] UVD is enabled in physical mode
[    4.702789] [drm] VCE enabled in physical mode
[    4.702811] [drm] vm size is 64 GB, 2 levels, block size is 10-bit, fragment
size is 9-bit
[    4.702818] amdgpu 0000:07:00.0: VRAM: 4096M 0x000000F400000000 -
0x000000F4FFFFFFFF (4096M used)
[    4.702819] amdgpu 0000:07:00.0: GART: 1024M 0x000000FF00000000 -
0x000000FF3FFFFFFF
[    4.702823] [drm] Detected VRAM RAM=4096M, BAR=256M
[    4.702824] [drm] RAM width 512bits HBM
[    4.702876] [TTM] Zone  kernel: Available graphics memory: 8201336 KiB
[    4.702877] [TTM] Zone   dma32: Available graphics memory: 2097152 KiB
[    4.702877] [TTM] Initializing pool allocator
[    4.702880] [TTM] Initializing DMA pool allocator
[    4.702907] [drm] amdgpu: 4096M of VRAM memory ready
[    4.702909] [drm] amdgpu: 4096M of GTT memory ready.
[    4.702925] [drm] GART: num cpu pages 262144, num gpu pages 262144
[    4.702988] [drm] PCIE GART of 1024M enabled (table at 0x000000F4001D5000).
[    4.743132] [drm] Chained IB support enabled!
[    4.764245] amdgpu: [powerplay] hwmgr_sw_init smu backed is fiji_smu
[    4.768880] [drm] Found UVD firmware Version: 1.91 Family ID: 12
[    4.768885] [drm] UVD ENC is disabled
[    4.772092] [drm] Found VCE firmware Version: 55.2 Binary ID: 3
[    4.837386] [drm] dce110_link_encoder_construct: Failed to get
encoder_cap_info from VBIOS with error code 4!
[    4.848088] [drm] Display Core initialized with v3.2.48!
[    4.848823] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    4.848824] [drm] Driver supports precise vblank timestamp query.
[    4.874937] [drm] UVD initialized successfully.
[    4.974949] [drm] VCE initialized successfully.
[    4.976370] [drm] Cannot find any crtc or sizes
[    4.978116] [drm] Initialized amdgpu 3.35.0 20150101 for 0000:07:00.0 on
minor 0


* [Bug 206141] VCE UVD ring test failed -110
  2020-01-09  9:18 [Bug 206141] New: VCE UVD ring test failed -110 bugzilla-daemon
                   ` (5 preceding siblings ...)
  2020-01-14 19:27 ` bugzilla-daemon
@ 2020-01-14 19:50 ` bugzilla-daemon
  2020-01-17 16:34 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: bugzilla-daemon @ 2020-01-14 19:50 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=206141

--- Comment #6 from Janpieter Sollie (janpieter.sollie@dommel.be) ---
Created attachment 286813
  --> https://bugzilla.kernel.org/attachment.cgi?id=286813&action=edit
Kernel. Config

Hi Thong,
I boot via efibootmgr, so there are no kernel arguments from a bootloader, but
there are a few in the kernel config (attached here).
Hope it helps!
Also, I ordered another R9 Nano to rule out the possibility of hardware
failure. It will be available in a few days. I'll keep you updated.


* [Bug 206141] VCE UVD ring test failed -110
  2020-01-09  9:18 [Bug 206141] New: VCE UVD ring test failed -110 bugzilla-daemon
                   ` (6 preceding siblings ...)
  2020-01-14 19:50 ` bugzilla-daemon
@ 2020-01-17 16:34 ` bugzilla-daemon
  2020-01-17 21:16 ` bugzilla-daemon
  2020-01-18  6:40 ` bugzilla-daemon
  9 siblings, 0 replies; 11+ messages in thread
From: bugzilla-daemon @ 2020-01-17 16:34 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=206141

--- Comment #7 from Janpieter Sollie (janpieter.sollie@dommel.be) ---
Created attachment 286863
  --> https://bugzilla.kernel.org/attachment.cgi?id=286863&action=edit
dmesg with 2 GPUs

OK, so this definitely looks like a HW failure.
I also tried copying the FW from the working GPU to the broken GPU, but it did
not help.
Is it possible to disable the UVD/VCE engine on the original GPU?
I mean, it's not used anyway, so I might as well disable it completely to avoid
these errors.


* [Bug 206141] VCE UVD ring test failed -110
  2020-01-09  9:18 [Bug 206141] New: VCE UVD ring test failed -110 bugzilla-daemon
                   ` (7 preceding siblings ...)
  2020-01-17 16:34 ` bugzilla-daemon
@ 2020-01-17 21:16 ` bugzilla-daemon
  2020-01-18  6:40 ` bugzilla-daemon
  9 siblings, 0 replies; 11+ messages in thread
From: bugzilla-daemon @ 2020-01-17 21:16 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=206141

--- Comment #8 from Alex Deucher (alexdeucher@gmail.com) ---
(In reply to Janpieter Sollie from comment #7)
> Is it possible to disable the UVD/VCE engine on the original GPU?
> I mean, it's not used anyway, so I might as well disable it completely to
> avoid these errors.

Yes.  Set the amdgpu.ip_block_mask module parameter on the kernel command line
in grub.  Each bit refers to an IP block.  From your log:

[    3.987749] [drm] add ip block number 0 <vi_common>
[    3.987750] [drm] add ip block number 1 <gmc_v8_0>
[    3.987751] [drm] add ip block number 2 <tonga_ih>
[    3.987752] [drm] add ip block number 3 <gfx_v8_0>
[    3.987753] [drm] add ip block number 4 <sdma_v3_0>
[    3.987753] [drm] add ip block number 5 <powerplay>
[    3.987754] [drm] add ip block number 6 <dm>
[    3.987755] [drm] add ip block number 7 <uvd_v6_0>
[    3.987755] [drm] add ip block number 8 <vce_v3_0>

Bits 7 and 8 are for UVD and VCE, so you can append
amdgpu.ip_block_mask=0x7f to enable only blocks 0-6.
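
The mask arithmetic above can be sketched as follows. This is a minimal Python
illustration, not a driver tool: it assumes the "add ip block number" lines
have been copied from dmesg, and the function name ip_block_mask and its
disabled_prefixes argument are hypothetical names chosen for this example.

```python
import re

# Matches e.g. "[drm] add ip block number 7 <uvd_v6_0>"
IP_BLOCK_RE = re.compile(r"add ip block number (\d+) <(\w+)>")


def ip_block_mask(log_text, disabled_prefixes):
    """Build a bitmask with one bit set per IP block that should stay enabled."""
    mask = 0
    for number, name in IP_BLOCK_RE.findall(log_text):
        if name.startswith(tuple(disabled_prefixes)):
            continue  # leave this block's bit cleared so it is not enabled
        mask |= 1 << int(number)
    return mask


# The IP block list quoted in the comment above.
LOG = """\
[    3.987749] [drm] add ip block number 0 <vi_common>
[    3.987750] [drm] add ip block number 1 <gmc_v8_0>
[    3.987751] [drm] add ip block number 2 <tonga_ih>
[    3.987752] [drm] add ip block number 3 <gfx_v8_0>
[    3.987753] [drm] add ip block number 4 <sdma_v3_0>
[    3.987753] [drm] add ip block number 5 <powerplay>
[    3.987754] [drm] add ip block number 6 <dm>
[    3.987755] [drm] add ip block number 7 <uvd_v6_0>
[    3.987755] [drm] add ip block number 8 <vce_v3_0>
"""

if __name__ == "__main__":
    print(hex(ip_block_mask(LOG, ["uvd", "vce"])))
```

With the nine blocks listed in this log, clearing the bits for the blocks whose
names start with "uvd" and "vce" leaves bits 0-6 set, which is the 0x7f value
given above. Block numbering can differ between GPUs and kernel versions, so
the list should be taken from the dmesg of the machine in question.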


* [Bug 206141] VCE UVD ring test failed -110
  2020-01-09  9:18 [Bug 206141] New: VCE UVD ring test failed -110 bugzilla-daemon
                   ` (8 preceding siblings ...)
  2020-01-17 21:16 ` bugzilla-daemon
@ 2020-01-18  6:40 ` bugzilla-daemon
  9 siblings, 0 replies; 11+ messages in thread
From: bugzilla-daemon @ 2020-01-18  6:40 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=206141

Janpieter Sollie (janpieter.sollie@dommel.be) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |INVALID

--- Comment #9 from Janpieter Sollie (janpieter.sollie@dommel.be) ---
Thank you Alex! You helped the environment by making it unnecessary to trash a
GPU :p.
Anyway, this bug is definitely solved, thank you all! If there's anything I
could do to pay back for the support, let me know.
Janpieter

