* [Bug 217690] New: consistent amdgpu failures on Lenovo ThinkPad Z16: "[drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR* wait_for_completion_timeout timeout!"
@ 2023-07-21  2:33 bugzilla-daemon
  2023-07-21 12:00 ` [Bug 217690] " bugzilla-daemon
  0 siblings, 1 reply; 2+ messages in thread
From: bugzilla-daemon @ 2023-07-21  2:33 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=217690

            Bug ID: 217690
           Summary: consistent amdgpu failures on Lenovo ThinkPad Z16:
                    "[drm:amdgpu_dm_process_dmub_aux_transfer_sync
                    [amdgpu]] *ERROR* wait_for_completion_timeout
                    timeout!"
           Product: Drivers
           Version: 2.5
          Hardware: AMD
                OS: Linux
            Status: NEW
          Severity: high
          Priority: P3
         Component: Video(DRI - non Intel)
          Assignee: drivers_video-dri@kernel-bugs.osdl.org
          Reporter: mhall@mhcomputing.net
        Regression: No

I am using the following system:

System Information
        Manufacturer: LENOVO
        Product Name: 21D4000HUS
        Version: ThinkPad Z16 Gen 1
        Serial Number: PF3XPEBD
        UUID: 59e137b9-fc54-11ec-80f2-6c2408eab813
        Wake-up Type: Power Switch
        SKU Number: LENOVO_MT_21D4_BU_Think_FM_ThinkPad Z16 Gen 1
        Family: ThinkPad Z16 Gen 1

It has the following graphics controller:

Advanced Micro Devices, Inc. [AMD/ATI] Navi 24 [Radeon RX 6400/6500 XT/6500M]
(rev c3)

I see a variety of AMD GPU errors on kernels 6.2.0, 6.2.16, and 6.4.3. I have
not found any kernel version that is fully stable, so I am not sure how to
narrow this down or work around it without expert advice.

Here are the errors from Linux version 6.4.3-060403-generic (kernel@kathleen)
(x86_64-linux-gnu-gcc-12 (Ubuntu 12.3.0-1ubuntu1) 12.3.0, GNU ld (GNU Binutils
for Ubuntu) 2.40) #202307110536 SMP PREEMPT_DYNAMIC Tue Jul 11 05:43:58 UTC
2023.

When the errors occur, X / Wayland freezes. From what I can see, the problem
happens more often under Wayland than under X. Sometimes it is possible to
switch to a TTY and recover some work, but at other times all local
interaction with the machine locks up and it is reachable only via SSH. The
machine does not seem to be fully crippled, but it usually fails to complete
the normal shutdown process once the endless timeout message loop shown below
begins.

I am not 100% sure what information would be most helpful for debugging this,
but let me know and I will provide whatever is needed ASAP.

2023-07-14T13:12:00.727393-07:00 mhall-xps-01 kernel: [601793.214603]
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled
seq=9336145, emitted seq=9336147
2023-07-14T13:12:00.727411-07:00 mhall-xps-01 kernel: [601793.215114]
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process
kwin_wayland pid 2368 thread kwin_wayla:cs0 pid 2378
2023-07-14T13:12:00.727413-07:00 mhall-xps-01 kernel: [601793.215552] amdgpu
0000:67:00.0: amdgpu: GPU reset begin!
2023-07-14T13:12:01.439388-07:00 mhall-xps-01 kernel: [601793.927226] amdgpu
0000:67:00.0: amdgpu: MODE2 reset
2023-07-14T13:12:01.452637-07:00 mhall-xps-01 kernel: [601793.936646] amdgpu
0000:67:00.0: amdgpu: GPU reset succeeded, trying to resume
2023-07-14T13:12:01.452643-07:00 mhall-xps-01 kernel: [601793.936835] [drm]
PCIE GART of 1024M enabled (table at 0x000000F41FC00000).
2023-07-14T13:12:01.455390-07:00 mhall-xps-01 kernel: [601793.941616] [drm] PSP
is resuming...
2023-07-14T13:12:01.475649-07:00 mhall-xps-01 kernel: [601793.963877] [drm]
reserve 0xa00000 from 0xf41e000000 for PSP TMR
2023-07-14T13:12:01.799694-07:00 mhall-xps-01 kernel: [601794.288086] amdgpu
0000:67:00.0: amdgpu: RAS: optional ras ta ucode is not available
2023-07-14T13:12:01.811380-07:00 mhall-xps-01 kernel: [601794.300087] amdgpu
0000:67:00.0: amdgpu: RAP: optional rap ta ucode is not available
2023-07-14T13:12:01.811389-07:00 mhall-xps-01 kernel: [601794.300089] amdgpu
0000:67:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
2023-07-14T13:12:01.811390-07:00 mhall-xps-01 kernel: [601794.300094] amdgpu
0000:67:00.0: amdgpu: SMU is resuming...
2023-07-14T13:12:01.815388-07:00 mhall-xps-01 kernel: [601794.301356] amdgpu
0000:67:00.0: amdgpu: SMU is resumed successfully!
2023-07-14T13:12:01.815403-07:00 mhall-xps-01 kernel: [601794.303468] [drm]
DMUB hardware initialized: version=0x0400002E
2023-07-14T13:12:02.839400-07:00 mhall-xps-01 kernel: [601794.311151] [drm]
Watermarks table not configured properly by SMU
2023-07-14T13:12:02.839419-07:00 mhall-xps-01 kernel: [601795.324614] [drm] kiq
ring mec 2 pipe 1 q 0
2023-07-14T13:12:02.843825-07:00 mhall-xps-01 kernel: [601795.330387] [drm] VCN
decode and encode initialized successfully(under DPG Mode).
2023-07-14T13:12:02.843842-07:00 mhall-xps-01 kernel: [601795.330870] [drm]
JPEG decode initialized successfully.
2023-07-14T13:12:02.843845-07:00 mhall-xps-01 kernel: [601795.330877] amdgpu
0000:67:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
2023-07-14T13:12:02.843846-07:00 mhall-xps-01 kernel: [601795.330882] amdgpu
0000:67:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
2023-07-14T13:12:02.843848-07:00 mhall-xps-01 kernel: [601795.330886] amdgpu
0000:67:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
2023-07-14T13:12:02.843849-07:00 mhall-xps-01 kernel: [601795.330888] amdgpu
0000:67:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
2023-07-14T13:12:02.843851-07:00 mhall-xps-01 kernel: [601795.330889] amdgpu
0000:67:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
2023-07-14T13:12:02.843852-07:00 mhall-xps-01 kernel: [601795.330891] amdgpu
0000:67:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
2023-07-14T13:12:02.843852-07:00 mhall-xps-01 kernel: [601795.330893] amdgpu
0000:67:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
2023-07-14T13:12:02.843854-07:00 mhall-xps-01 kernel: [601795.330895] amdgpu
0000:67:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
2023-07-14T13:12:02.843855-07:00 mhall-xps-01 kernel: [601795.330896] amdgpu
0000:67:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
2023-07-14T13:12:02.843856-07:00 mhall-xps-01 kernel: [601795.330898] amdgpu
0000:67:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
2023-07-14T13:12:02.843857-07:00 mhall-xps-01 kernel: [601795.330900] amdgpu
0000:67:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
2023-07-14T13:12:02.843858-07:00 mhall-xps-01 kernel: [601795.330902] amdgpu
0000:67:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1
2023-07-14T13:12:02.843860-07:00 mhall-xps-01 kernel: [601795.330904] amdgpu
0000:67:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 1
2023-07-14T13:12:02.843860-07:00 mhall-xps-01 kernel: [601795.330906] amdgpu
0000:67:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 1
2023-07-14T13:12:02.843861-07:00 mhall-xps-01 kernel: [601795.330908] amdgpu
0000:67:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 1
2023-07-14T13:12:02.849015-07:00 mhall-xps-01 kernel: [601795.337869] amdgpu
0000:67:00.0: amdgpu: recover vram bo from shadow start
2023-07-14T13:12:02.849027-07:00 mhall-xps-01 kernel: [601795.337873] amdgpu
0000:67:00.0: amdgpu: recover vram bo from shadow done
2023-07-14T13:12:02.849029-07:00 mhall-xps-01 kernel: [601795.337922] [drm]
Skip scheduling IBs!
2023-07-14T13:12:02.849039-07:00 mhall-xps-01 kernel: [601795.337938] amdgpu
0000:67:00.0: amdgpu: GPU reset(2) succeeded!
2023-07-14T13:12:02.849041-07:00 mhall-xps-01 kernel: [601795.337940] [drm]
Skip scheduling IBs!
2023-07-14T13:12:03.659407-07:00 mhall-xps-01 kernel: [601796.144908] [drm]
PCIE GART of 512M enabled (table at 0x00000080FEB00000).
2023-07-14T13:12:03.659436-07:00 mhall-xps-01 kernel: [601796.144933] [drm] PSP
is resuming...
2023-07-14T13:12:03.735392-07:00 mhall-xps-01 kernel: [601796.221134] [drm]
reserve 0xa00000 from 0x80fd000000 for PSP TMR
2023-07-14T13:12:03.835392-07:00 mhall-xps-01 kernel: [601796.320820] amdgpu
0000:03:00.0: amdgpu: RAS: optional ras ta ucode is not available
2023-07-14T13:12:03.847400-07:00 mhall-xps-01 kernel: [601796.335924] amdgpu
0000:03:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
2023-07-14T13:12:03.847414-07:00 mhall-xps-01 kernel: [601796.335931] amdgpu
0000:03:00.0: amdgpu: SMU is resuming...
2023-07-14T13:12:03.847418-07:00 mhall-xps-01 kernel: [601796.335938] amdgpu
0000:03:00.0: amdgpu: smu driver if version = 0x0000000d, smu fw if version =
0x0000000f, smu fw program = 0, version = 0x00491f00 (73.31.0)
2023-07-14T13:12:03.847422-07:00 mhall-xps-01 kernel: [601796.335944] amdgpu
0000:03:00.0: amdgpu: SMU driver if version not matched
2023-07-14T13:12:03.847424-07:00 mhall-xps-01 kernel: [601796.335987] amdgpu
0000:03:00.0: amdgpu: use vbios provided pptable
2023-07-14T13:12:03.887360-07:00 mhall-xps-01 kernel: [601796.374699] amdgpu
0000:03:00.0: amdgpu: SMU is resumed successfully!
2023-07-14T13:12:03.887364-07:00 mhall-xps-01 kernel: [601796.375979] [drm]
DMUB hardware initialized: version=0x02020017
2023-07-14T13:12:03.891378-07:00 mhall-xps-01 kernel: [601796.378784] [drm] kiq
ring mec 2 pipe 1 q 0
2023-07-14T13:12:03.895371-07:00 mhall-xps-01 kernel: [601796.382127] [drm] VCN
decode and encode initialized successfully(under DPG Mode).
2023-07-14T13:12:03.895382-07:00 mhall-xps-01 kernel: [601796.382143] amdgpu
0000:03:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
2023-07-14T13:12:03.895384-07:00 mhall-xps-01 kernel: [601796.382145] amdgpu
0000:03:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
2023-07-14T13:12:03.895384-07:00 mhall-xps-01 kernel: [601796.382146] amdgpu
0000:03:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
2023-07-14T13:12:03.895385-07:00 mhall-xps-01 kernel: [601796.382147] amdgpu
0000:03:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
2023-07-14T13:12:03.895406-07:00 mhall-xps-01 kernel: [601796.382147] amdgpu
0000:03:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
2023-07-14T13:12:03.895408-07:00 mhall-xps-01 kernel: [601796.382148] amdgpu
0000:03:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
2023-07-14T13:12:03.895409-07:00 mhall-xps-01 kernel: [601796.382148] amdgpu
0000:03:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
2023-07-14T13:12:03.895410-07:00 mhall-xps-01 kernel: [601796.382149] amdgpu
0000:03:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
2023-07-14T13:12:03.895411-07:00 mhall-xps-01 kernel: [601796.382150] amdgpu
0000:03:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
2023-07-14T13:12:03.895412-07:00 mhall-xps-01 kernel: [601796.382151] amdgpu
0000:03:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
2023-07-14T13:12:03.895413-07:00 mhall-xps-01 kernel: [601796.382151] amdgpu
0000:03:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
2023-07-14T13:12:03.895415-07:00 mhall-xps-01 kernel: [601796.382152] amdgpu
0000:03:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1
2023-07-14T13:12:30.415435-07:00 mhall-xps-01 kernel: [601812.660226]
[drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR*
wait_for_completion_timeout timeout!
2023-07-14T13:12:34.415365-07:00 mhall-xps-01 kernel: [601822.899679]
[drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR*
wait_for_completion_timeout timeout!
2023-07-14T13:12:34.415376-07:00 mhall-xps-01 kernel: [601826.899898] [drm]
PCIE GART of 512M enabled (table at 0x00000080FEB00000).
2023-07-14T13:12:34.415378-07:00 mhall-xps-01 kernel: [601826.899922] [drm] PSP
is resuming...
2023-07-14T13:12:34.491417-07:00 mhall-xps-01 kernel: [601826.976455] [drm]
reserve 0xa00000 from 0x80fd000000 for PSP TMR
2023-07-14T13:12:34.591378-07:00 mhall-xps-01 kernel: [601827.075900] amdgpu
0000:03:00.0: amdgpu: RAS: optional ras ta ucode is not available
2023-07-14T13:12:34.603398-07:00 mhall-xps-01 kernel: [601827.091423] amdgpu
0000:03:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
2023-07-14T13:12:34.603421-07:00 mhall-xps-01 kernel: [601827.091433] amdgpu
0000:03:00.0: amdgpu: SMU is resuming...
2023-07-14T13:12:34.603422-07:00 mhall-xps-01 kernel: [601827.091439] amdgpu
0000:03:00.0: amdgpu: smu driver if version = 0x0000000d, smu fw if version =
0x0000000f, smu fw program = 0, version = 0x00491f00 (73.31.0)
2023-07-14T13:12:34.603425-07:00 mhall-xps-01 kernel: [601827.091444] amdgpu
0000:03:00.0: amdgpu: SMU driver if version not matched
2023-07-14T13:12:34.603426-07:00 mhall-xps-01 kernel: [601827.091474] amdgpu
0000:03:00.0: amdgpu: use vbios provided pptable
2023-07-14T13:12:34.643366-07:00 mhall-xps-01 kernel: [601827.130769] amdgpu
0000:03:00.0: amdgpu: SMU is resumed successfully!
2023-07-14T13:12:34.647347-07:00 mhall-xps-01 kernel: [601827.132081] [drm]
DMUB hardware initialized: version=0x02020017
2023-07-14T13:12:34.647351-07:00 mhall-xps-01 kernel: [601827.135330] [drm] kiq
ring mec 2 pipe 1 q 0
2023-07-14T13:12:34.651354-07:00 mhall-xps-01 kernel: [601827.138683] [drm] VCN
decode and encode initialized successfully(under DPG Mode).
2023-07-14T13:12:34.651357-07:00 mhall-xps-01 kernel: [601827.138709] amdgpu
0000:03:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
2023-07-14T13:12:34.651359-07:00 mhall-xps-01 kernel: [601827.138713] amdgpu
0000:03:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
2023-07-14T13:12:34.651360-07:00 mhall-xps-01 kernel: [601827.138715] amdgpu
0000:03:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
2023-07-14T13:12:34.651361-07:00 mhall-xps-01 kernel: [601827.138717] amdgpu
0000:03:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
2023-07-14T13:12:34.651362-07:00 mhall-xps-01 kernel: [601827.138719] amdgpu
0000:03:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
2023-07-14T13:12:34.651363-07:00 mhall-xps-01 kernel: [601827.138721] amdgpu
0000:03:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
2023-07-14T13:12:34.651365-07:00 mhall-xps-01 kernel: [601827.138723] amdgpu
0000:03:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
2023-07-14T13:12:34.651365-07:00 mhall-xps-01 kernel: [601827.138724] amdgpu
0000:03:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
2023-07-14T13:12:34.651366-07:00 mhall-xps-01 kernel: [601827.138726] amdgpu
0000:03:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
2023-07-14T13:12:34.651367-07:00 mhall-xps-01 kernel: [601827.138727] amdgpu
0000:03:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
2023-07-14T13:12:34.651368-07:00 mhall-xps-01 kernel: [601827.138729] amdgpu
0000:03:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
2023-07-14T13:12:34.651377-07:00 mhall-xps-01 kernel: [601827.138731] amdgpu
0000:03:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1
2023-07-14T13:12:50.895678-07:00 mhall-xps-01 kernel: [601833.139440]
[drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR*
wait_for_completion_timeout timeout!
2023-07-14T13:13:01.135668-07:00 mhall-xps-01 kernel: [601843.379388]
[drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR*
wait_for_completion_timeout timeout!
2023-07-14T13:13:11.375612-07:00 mhall-xps-01 kernel: [601853.618947]
[drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR*
wait_for_completion_timeout timeout!
2023-07-14T13:13:21.615870-07:00 mhall-xps-01 kernel: [601863.858670]
[drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR*
wait_for_completion_timeout timeout!
2023-07-14T13:13:31.855713-07:00 mhall-xps-01 kernel: [601874.098488]
[drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR*
wait_for_completion_timeout timeout!
... infinite repeat until reboot ...
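
One rough way to characterize the loop is to measure the spacing between the
repeated timeout messages using the kernel timestamps in square brackets. In
the excerpt above, the last five messages are roughly 10.24 seconds apart. The
Python sketch below is illustrative only and was not part of the original
report: the function names join_records and timeout_gaps are hypothetical, and
it assumes the log keeps the wrapped syslog layout shown above, where each
record starts with an ISO-8601 timestamp and may continue on following lines.

```python
import re

# A syslog record starts with an ISO-8601 timestamp; wrapped continuation
# lines do not. (Assumption based on the excerpt in this report.)
RECORD_START = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}")
# Kernel monotonic timestamp, e.g. "kernel: [601812.660226]".
KTIME = re.compile(r"kernel: \[\s*(\d+\.\d+)\]")
# Function name that appears in the repeated error message.
MARKER = "amdgpu_dm_process_dmub_aux_transfer_sync"


def join_records(lines):
    """Rejoin wrapped log lines into one string per syslog record."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if RECORD_START.match(line) or not records:
            records.append(line)
        else:
            records[-1] += " " + line
    return records


def timeout_gaps(lines):
    """Return the seconds between consecutive AUX-transfer timeout records."""
    stamps = []
    for record in join_records(lines):
        match = KTIME.search(record)
        if match and MARKER in record:
            stamps.append(float(match.group(1)))
    return [round(b - a, 3) for a, b in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python3 timeout_gaps.py kern.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        print(timeout_gaps(handle))
```

A steady gap would suggest a fixed retry or timeout period rather than random
failures, which may be worth mentioning when refiling the report, although
that interpretation is only a guess.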

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.


* [Bug 217690] consistent amdgpu failures on Lenovo ThinkPad Z16: "[drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR* wait_for_completion_timeout timeout!"
  2023-07-21  2:33 [Bug 217690] New: consistent amdgpu failures on Lenovo ThinkPad Z16: "[drm:amdgpu_dm_process_dmub_aux_transfer_sync [amdgpu]] *ERROR* wait_for_completion_timeout timeout!" bugzilla-daemon
@ 2023-07-21 12:00 ` bugzilla-daemon
  0 siblings, 0 replies; 2+ messages in thread
From: bugzilla-daemon @ 2023-07-21 12:00 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=217690

Artem S. Tashkinov (aros@gmx.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |ANSWERED

--- Comment #1 from Artem S. Tashkinov (aros@gmx.com) ---
Please take it here instead: https://gitlab.freedesktop.org/drm/amd/-/issues

