regressions.lists.linux.dev archive mirror
* kernel 5.15.1: AMD RX 6700 XT - Fails to resume after screen blank
@ 2021-11-10 16:27 Mark Boddington
  2021-11-10 17:38 ` Greg KH
  2021-11-11  6:09 ` Thorsten Leemhuis
  0 siblings, 2 replies; 10+ messages in thread
From: Mark Boddington @ 2021-11-10 16:27 UTC (permalink / raw)
  To: stable; +Cc: regressions

Hi all,

I run the mainline Linux kernel on Ubuntu 20.04, built from 
https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.15.1/

There appears to be a regression in 5.15.1 which causes the GPU to fail 
to resume after power saving.

Could it be this change??:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_debugfs.c?h=v5.15.1&id=8af3a335b5531ca3df0920b1cca43e456cd110ad

The errors I see when the card tries to resume the DP output are:

Nov 10 09:37:55 katana rtkit-daemon[2577]: Supervising 3 threads of 1 
processes of 1 users.
Nov 10 09:37:55 katana rtkit-daemon[2577]: Successfully made thread 
12215 of process 2820 owned by '10000' RT at priority 5.
Nov 10 09:37:55 katana rtkit-daemon[2577]: Supervising 4 threads of 1 
processes of 1 users.
Nov 10 09:37:55 katana kernel: [ 3296.643206] [drm:dc_dmub_srv_wait_idle 
[amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Nov 10 09:38:01 katana kernel: [ 3302.722202] snd_hda_intel 
0000:0d:00.1: refused to change power state from D0 to D3hot
Nov 10 09:38:01 katana kernel: [ 3302.988931] amdgpu 0000:0d:00.0: 
amdgpu: Failed to export SMU metrics table!
Nov 10 09:38:06 katana kernel: [ 3308.243937] amdgpu 0000:0d:00.0: 
amdgpu: SMU: I'm not done with your previous command!
Nov 10 09:38:06 katana kernel: [ 3308.243941] amdgpu 0000:0d:00.0: 
amdgpu: Failed to export SMU metrics table!
Nov 10 09:38:06 katana kernel: [ 3308.320003] [drm:amdgpu_job_timedout 
[amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=250622, emitted 
seq=250624
Nov 10 09:38:06 katana kernel: [ 3308.320185] [drm:amdgpu_job_timedout 
[amdgpu]] *ERROR* Process information: process Xorg pid 2203 thread 
Xorg:cs0 pid 2355
Nov 10 09:38:06 katana kernel: [ 3308.320334] amdgpu 0000:0d:00.0: 
amdgpu: GPU reset begin!
Nov 10 09:38:12 katana kernel: [ 3313.469702] amdgpu 0000:0d:00.0: 
amdgpu: SMU: I'm not done with your previous command!
Nov 10 09:38:12 katana kernel: [ 3313.469707] amdgpu 0000:0d:00.0: 
amdgpu: Failed to export SMU metrics table!
Nov 10 09:38:17 katana kernel: [ 3318.899866] amdgpu 0000:0d:00.0: 
amdgpu: SMU: I'm not done with your previous command!
Nov 10 09:38:17 katana kernel: [ 3318.899871] amdgpu 0000:0d:00.0: 
amdgpu: Failed to disable gfxoff!
Nov 10 09:38:18 katana kernel: [ 3319.514045] [drm:dc_dmub_srv_wait_idle 
[amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Nov 10 09:38:20 katana kernel: [ 3322.195318] [drm:dc_dmub_srv_wait_idle 
[amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Nov 10 09:38:30 katana kernel: [ 3331.866060] amdgpu 0000:0d:00.0: 
[drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test 
failed (-110)
Nov 10 09:38:30 katana kernel: [ 3331.866199] [drm:gfx_v10_0_hw_fini 
[amdgpu]] *ERROR* KGQ disable failed
Nov 10 09:38:30 katana kernel: [ 3332.187330] amdgpu 0000:0d:00.0: 
[drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test 
failed (-110)
Nov 10 09:38:30 katana kernel: [ 3332.187465] [drm:gfx_v10_0_hw_fini 
[amdgpu]] *ERROR* KCQ disable failed
Nov 10 09:38:36 katana kernel: [ 3337.614265] amdgpu 0000:0d:00.0: 
amdgpu: SMU: I'm not done with your previous command!
Nov 10 09:38:36 katana kernel: [ 3337.614269] amdgpu 0000:0d:00.0: 
amdgpu: Failed to disable smu features.
Nov 10 09:38:36 katana kernel: [ 3337.614273] amdgpu 0000:0d:00.0: 
amdgpu: Fail to disable dpm features!
Nov 10 09:38:36 katana kernel: [ 3337.614274] 
[drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP 
block <smu> failed -62
Nov 10 09:38:36 katana kernel: [ 3337.625941] [drm] free PSP TMR buffer
Nov 10 09:38:37 katana kernel: [ 3338.724759] [drm] psp gfx command 
DESTROY_TMR(0x7) failed and response status is (0x80000306)
Nov 10 09:38:37 katana kernel: [ 3338.745744] amdgpu 0000:0d:00.0: 
amdgpu: MODE1 reset
Nov 10 09:38:37 katana kernel: [ 3338.745748] amdgpu 0000:0d:00.0: 
amdgpu: GPU mode1 reset
Nov 10 09:38:37 katana kernel: [ 3338.745832] amdgpu 0000:0d:00.0: 
amdgpu: GPU smu mode1 reset
Nov 10 09:38:42 katana kernel: [ 3344.061148] amdgpu 0000:0d:00.0: 
amdgpu: SMU: I'm not done with your previous command!
Nov 10 09:38:42 katana kernel: [ 3344.061151] amdgpu 0000:0d:00.0: 
amdgpu: GPU mode1 reset failed
Nov 10 09:38:42 katana kernel: [ 3344.061265] amdgpu 0000:0d:00.0: 
amdgpu: ASIC reset failed with error, -62 for drm dev, 0000:0d:00.0
Nov 10 09:38:53 katana kernel: [ 3355.141401] amdgpu 0000:0d:00.0: 
amdgpu: GPU reset succeeded, trying to resume
Nov 10 09:38:53 katana kernel: [ 3355.141674] [drm] PCIE GART of 512M 
enabled (table at 0x0000008000300000).
Nov 10 09:38:53 katana kernel: [ 3355.141709] [drm] VRAM is lost due to 
GPU reset!
Nov 10 09:38:53 katana kernel: [ 3355.142685] [drm] PSP is resuming...
Nov 10 09:38:54 katana kernel: [ 3356.258540] [drm] failed to load ucode 
SMC(0x18)
Nov 10 09:38:54 katana kernel: [ 3356.258567] [drm] psp gfx command 
LOAD_IP_FW(0x6) failed and response status is (0x80000306)
Nov 10 09:38:54 katana kernel: [ 3356.258572] [drm] reserve 0xa00000 
from 0x82fe000000 for PSP TMR
Nov 10 09:38:55 katana kernel: [ 3356.503720] amdgpu 0000:0d:00.0: 
amdgpu: RAS: optional ras ta ucode is not available
Nov 10 09:38:55 katana kernel: [ 3356.517290] amdgpu 0000:0d:00.0: 
amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
Nov 10 09:38:55 katana kernel: [ 3356.517293] amdgpu 0000:0d:00.0: 
amdgpu: SMU is resuming...
Nov 10 09:39:00 katana kernel: [ 3361.828868] amdgpu 0000:0d:00.0: 
amdgpu: SMU: I'm not done with your previous command!
Nov 10 09:39:00 katana kernel: [ 3361.828871] amdgpu 0000:0d:00.0: 
amdgpu: Failed to SetDriverDramAddr!
Nov 10 09:39:00 katana kernel: [ 3361.828873] amdgpu 0000:0d:00.0: 
amdgpu: Failed to setup smc hw!
Nov 10 09:39:00 katana kernel: [ 3361.828874] 
[drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block 
<smu> failed -62
Nov 10 09:39:00 katana kernel: [ 3361.829025] amdgpu 0000:0d:00.0: 
amdgpu: GPU reset(2) failed
Nov 10 09:39:00 katana kernel: [ 3361.831396] amdgpu 0000:0d:00.0: 
amdgpu: GPU reset end with ret = -62
Nov 10 09:39:00 katana kernel: [ 3361.848123] snd_hda_intel 
0000:0d:00.1: refused to change power state from D0 to D3hot
Nov 10 09:39:10 katana kernel: [ 3372.062062] [drm:amdgpu_job_timedout 
[amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=250624, emitted 
seq=250624
Nov 10 09:39:10 katana kernel: [ 3372.062243] [drm:amdgpu_job_timedout 
[amdgpu]] *ERROR* Process information: process Xorg pid 2203 thread 
Xorg:cs0 pid 2355
Nov 10 09:39:10 katana kernel: [ 3372.062395] amdgpu 0000:0d:00.0: 
amdgpu: GPU reset begin!
Nov 10 09:41:13 katana systemd[1]: Starting Ubuntu Advantage Timer for 
running repeated jobs...
Nov 10 09:41:13 katana systemd[1]: ua-timer.service: Succeeded.
Nov 10 09:41:13 katana systemd[1]: Finished Ubuntu Advantage Timer for 
running repeated jobs.
Nov 10 09:41:23 katana kernel: [ 3505.167372] INFO: task 
kworker/11:2:284 blocked for more than 120 seconds.
Nov 10 09:41:23 katana kernel: [ 3505.167377]       Not tainted 
5.15.1-051501-generic #202111071208-Ubuntu
Nov 10 09:41:23 katana kernel: [ 3505.167379] "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 10 09:41:23 katana kernel: [ 3505.167380] task:kworker/11:2 state:D 
stack:    0 pid:  284 ppid:     2 flags:0x00004000
Nov 10 09:41:23 katana kernel: [ 3505.167384] Workqueue: events 
drm_sched_job_timedout [gpu_sched]
Nov 10 09:41:23 katana kernel: [ 3505.167392] Call Trace:
Nov 10 09:41:23 katana kernel: [ 3505.167394] __schedule+0x2b6/0x7e0
Nov 10 09:41:23 katana kernel: [ 3505.167398]  schedule+0x4e/0xb0
Nov 10 09:41:23 katana kernel: [ 3505.167400] schedule_timeout+0x202/0x290
Nov 10 09:41:23 katana kernel: [ 3505.167402]  ? 
raw_spin_rq_lock_nested.constprop.0+0x10/0x20
Nov 10 09:41:23 katana kernel: [ 3505.167405] 
dma_fence_default_wait+0x174/0x200
Nov 10 09:41:23 katana kernel: [ 3505.167409]  ? 
dma_fence_release+0x140/0x140
Nov 10 09:41:23 katana kernel: [ 3505.167410] 
dma_fence_wait_timeout+0xb7/0xd0
Nov 10 09:41:23 katana kernel: [ 3505.167412] drm_sched_stop+0xf7/0x170 
[gpu_sched]
Nov 10 09:41:23 katana kernel: [ 3505.167417] 
amdgpu_device_gpu_recover.cold+0xabd/0xad3 [amdgpu]
Nov 10 09:41:23 katana kernel: [ 3505.167569]  ? 
amdgpu_job_timedout+0xf5/0x170 [amdgpu]
Nov 10 09:41:23 katana kernel: [ 3505.167698] 
amdgpu_job_timedout+0x14f/0x170 [amdgpu]
Nov 10 09:41:23 katana kernel: [ 3505.167811] 
drm_sched_job_timedout+0x76/0xf0 [gpu_sched]
Nov 10 09:41:23 katana kernel: [ 3505.167814] process_one_work+0x22b/0x3d0
Nov 10 09:41:23 katana kernel: [ 3505.167816] worker_thread+0x4d/0x3f0
Nov 10 09:41:23 katana kernel: [ 3505.167818]  ? 
process_one_work+0x3d0/0x3d0
Nov 10 09:41:23 katana kernel: [ 3505.167820]  kthread+0x12a/0x150
Nov 10 09:41:23 katana kernel: [ 3505.167821]  ? 
set_kthread_struct+0x40/0x40
Nov 10 09:41:23 katana kernel: [ 3505.167822] ret_from_fork+0x22/0x30
Nov 10 09:41:23 katana kernel: [ 3505.167896] INFO: task chrome:sh1:3814 
blocked for more than 120 seconds.
Nov 10 09:41:23 katana kernel: [ 3505.167898]       Not tainted 
5.15.1-051501-generic #202111071208-Ubuntu
Nov 10 09:41:23 katana kernel: [ 3505.167899] "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 10 09:41:23 katana kernel: [ 3505.167900] task:chrome:sh1 state:D 
stack:    0 pid: 3814 ppid:  3747 flags:0x00004002
Nov 10 09:41:23 katana kernel: [ 3505.167902] Call Trace:
Nov 10 09:41:23 katana kernel: [ 3505.167903] __schedule+0x2b6/0x7e0
Nov 10 09:41:23 katana kernel: [ 3505.167905]  schedule+0x4e/0xb0
Nov 10 09:41:23 katana kernel: [ 3505.167906] schedule_timeout+0x202/0x290
Nov 10 09:41:23 katana kernel: [ 3505.167908] 
dma_fence_default_wait+0x174/0x200
Nov 10 09:41:23 katana kernel: [ 3505.167910]  ? 
dma_fence_release+0x140/0x140
Nov 10 09:41:23 katana kernel: [ 3505.167912] 
dma_fence_wait_timeout+0xb7/0xd0
Nov 10 09:41:23 katana kernel: [ 3505.167913] 
drm_sched_entity_fini+0xd8/0x220 [gpu_sched]
Nov 10 09:41:23 katana kernel: [ 3505.167917] 
amdgpu_ctx_mgr_entity_fini+0xa6/0xf0 [amdgpu]
Nov 10 09:41:23 katana kernel: [ 3505.168020] 
amdgpu_ctx_mgr_fini+0x32/0xc0 [amdgpu]
Nov 10 09:41:23 katana kernel: [ 3505.168119] 
amdgpu_driver_postclose_kms+0x16e/0x240 [amdgpu]
Nov 10 09:41:23 katana kernel: [ 3505.168216]  ? idr_destroy+0x7f/0xc0
Nov 10 09:41:23 katana kernel: [ 3505.168219] 
drm_file_free.part.0+0x1e5/0x250 [drm]
Nov 10 09:41:23 katana kernel: [ 3505.168235] 
drm_close_helper.isra.0+0x65/0x70 [drm]
Nov 10 09:41:23 katana kernel: [ 3505.168249]  drm_release+0x6e/0xf0 [drm]
Nov 10 09:41:23 katana kernel: [ 3505.168263]  __fput+0x9f/0x260
Nov 10 09:41:23 katana kernel: [ 3505.168266]  ____fput+0xe/0x10
Nov 10 09:41:23 katana kernel: [ 3505.168267] task_work_run+0x70/0xb0
Nov 10 09:41:23 katana kernel: [ 3505.168269]  do_exit+0x367/0xad0
Nov 10 09:41:23 katana kernel: [ 3505.168271] do_group_exit+0x43/0xb0
Nov 10 09:41:23 katana kernel: [ 3505.168272] get_signal+0x171/0x890
Nov 10 09:41:23 katana kernel: [ 3505.168274]  ? do_futex+0x1b6/0x820
Nov 10 09:41:23 katana kernel: [ 3505.168277] 
arch_do_signal_or_restart+0xf3/0x290
Nov 10 09:41:23 katana kernel: [ 3505.168280] 
exit_to_user_mode_prepare+0x12c/0x1c0
Nov 10 09:41:23 katana kernel: [ 3505.168281] 
syscall_exit_to_user_mode+0x27/0x50
Nov 10 09:41:23 katana kernel: [ 3505.168284] do_syscall_64+0x69/0xc0
Nov 10 09:41:23 katana kernel: [ 3505.168285]  ? switch_fpu_return+0x56/0xc0
Nov 10 09:41:23 katana kernel: [ 3505.168288]  ? 
exit_to_user_mode_prepare+0x98/0x1c0
Nov 10 09:41:23 katana kernel: [ 3505.168289]  ? 
syscall_exit_to_user_mode+0x27/0x50
Nov 10 09:41:23 katana kernel: [ 3505.168291]  ? do_syscall_64+0x69/0xc0
Nov 10 09:41:23 katana kernel: [ 3505.168292]  ? do_syscall_64+0x69/0xc0
Nov 10 09:41:23 katana kernel: [ 3505.168293]  ? 
syscall_exit_to_user_mode+0x27/0x50
Nov 10 09:41:23 katana kernel: [ 3505.168294]  ? __x64_sys_close+0x12/0x40
Nov 10 09:41:23 katana kernel: [ 3505.168297]  ? do_syscall_64+0x69/0xc0
Nov 10 09:41:23 katana kernel: [ 3505.168298] 
entry_SYSCALL_64_after_hwframe+0x44/0xae
Nov 10 09:41:23 katana kernel: [ 3505.168300] RIP: 0033:0x7f0c2e2d5376
Nov 10 09:41:23 katana kernel: [ 3505.168301] RSP: 002b:00007f0c1c84b580 
EFLAGS: 00000282 ORIG_RAX: 00000000000000ca
Nov 10 09:41:23 katana kernel: [ 3505.168303] RAX: fffffffffffffe00 RBX: 
0000000000000000 RCX: 00007f0c2e2d5376
Nov 10 09:41:23 katana kernel: [ 3505.168304] RDX: 0000000000000000 RSI: 
0000000000000080 RDI: 00002bc8000dae24
Nov 10 09:41:23 katana kernel: [ 3505.168305] RBP: 00002bc8000dadf8 R08: 
0000000000000000 R09: 0000000000000004
Nov 10 09:41:23 katana kernel: [ 3505.168306] R10: 0000000000000000 R11: 
0000000000000282 R12: 00002bc8000dae1c
Nov 10 09:41:23 katana kernel: [ 3505.168306] R13: 00002bc8000dadd0 R14: 
00007f0c1c84b5c0 R15: 00002bc8000dae24
Nov 10 09:43:24 katana kernel: [ 3625.995050] INFO: task 
kworker/11:2:284 blocked for more than 241 seconds.
Nov 10 09:43:24 katana kernel: [ 3625.995057]       Not tainted 
5.15.1-051501-generic #202111071208-Ubuntu
Nov 10 09:43:24 katana kernel: [ 3625.995059] "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 10 09:43:24 katana kernel: [ 3625.995060] task:kworker/11:2 state:D 
stack:    0 pid:  284 ppid:     2 flags:0x00004000
Nov 10 09:43:24 katana kernel: [ 3625.995065] Workqueue: events 
drm_sched_job_timedout [gpu_sched]
Nov 10 09:43:24 katana kernel: [ 3625.995073] Call Trace:
Nov 10 09:43:24 katana kernel: [ 3625.995076] __schedule+0x2b6/0x7e0
Nov 10 09:43:24 katana kernel: [ 3625.995081]  schedule+0x4e/0xb0
Nov 10 09:43:24 katana kernel: [ 3625.995083] schedule_timeout+0x202/0x290
Nov 10 09:43:24 katana kernel: [ 3625.995085]  ? 
raw_spin_rq_lock_nested.constprop.0+0x10/0x20
Nov 10 09:43:24 katana kernel: [ 3625.995090] 
dma_fence_default_wait+0x174/0x200
Nov 10 09:43:24 katana kernel: [ 3625.995093]  ? 
dma_fence_release+0x140/0x140
Nov 10 09:43:24 katana kernel: [ 3625.995095] 
dma_fence_wait_timeout+0xb7/0xd0
Nov 10 09:43:24 katana kernel: [ 3625.995097] drm_sched_stop+0xf7/0x170 
[gpu_sched]
Nov 10 09:43:24 katana kernel: [ 3625.995103] 
amdgpu_device_gpu_recover.cold+0xabd/0xad3 [amdgpu]
Nov 10 09:43:24 katana kernel: [ 3625.995299]  ? 
amdgpu_job_timedout+0xf5/0x170 [amdgpu]
Nov 10 09:43:24 katana kernel: [ 3625.995465] 
amdgpu_job_timedout+0x14f/0x170 [amdgpu]
Nov 10 09:43:24 katana kernel: [ 3625.995614] 
drm_sched_job_timedout+0x76/0xf0 [gpu_sched]
Nov 10 09:43:24 katana kernel: [ 3625.995618] process_one_work+0x22b/0x3d0
Nov 10 09:43:24 katana kernel: [ 3625.995621] worker_thread+0x4d/0x3f0
Nov 10 09:43:24 katana kernel: [ 3625.995624]  ? 
process_one_work+0x3d0/0x3d0
Nov 10 09:43:24 katana kernel: [ 3625.995626]  kthread+0x12a/0x150
Nov 10 09:43:24 katana kernel: [ 3625.995627]  ? 
set_kthread_struct+0x40/0x40
Nov 10 09:43:24 katana kernel: [ 3625.995629] ret_from_fork+0x22/0x30
Nov 10 09:43:24 katana kernel: [ 3625.995667] INFO: task 
InputThread:2548 blocked for more than 120 seconds.
Nov 10 09:43:24 katana kernel: [ 3625.995669]       Not tainted 
5.15.1-051501-generic #202111071208-Ubuntu
Nov 10 09:43:24 katana kernel: [ 3625.995671] "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 10 09:43:24 katana kernel: [ 3625.995672] task:InputThread state:D 
stack:    0 pid: 2548 ppid:  2105 flags:0x00000000
Nov 10 09:43:24 katana kernel: [ 3625.995674] Call Trace:
Nov 10 09:43:24 katana kernel: [ 3625.995676] __schedule+0x2b6/0x7e0
Nov 10 09:43:24 katana kernel: [ 3625.995678]  schedule+0x4e/0xb0
Nov 10 09:43:24 katana kernel: [ 3625.995680] 
schedule_preempt_disabled+0xe/0x10
Nov 10 09:43:24 katana kernel: [ 3625.995681] 
__mutex_lock.isra.0+0x208/0x470
Nov 10 09:43:24 katana kernel: [ 3625.995684] 
__mutex_lock_slowpath+0x13/0x20
Nov 10 09:43:24 katana kernel: [ 3625.995686]  mutex_lock+0x32/0x40
Nov 10 09:43:24 katana kernel: [ 3625.995688] 
handle_cursor_update.isra.0+0x197/0x320 [amdgpu]
Nov 10 09:43:24 katana kernel: [ 3625.995865] 
dm_plane_atomic_async_update+0xc4/0x100 [amdgpu]
Nov 10 09:43:24 katana kernel: [ 3625.996030] 
drm_atomic_helper_async_commit+0x6f/0x110 [drm_kms_helper]
Nov 10 09:43:24 katana kernel: [ 3625.996046] 
drm_atomic_helper_commit+0xf4/0x150 [drm_kms_helper]
Nov 10 09:43:24 katana kernel: [ 3625.996059] 
drm_atomic_commit+0x4a/0x50 [drm]
Nov 10 09:43:24 katana kernel: [ 3625.996084] 
drm_atomic_helper_update_plane+0xe7/0x140 [drm_kms_helper]
Nov 10 09:43:24 katana kernel: [ 3625.996098] 
__setplane_atomic+0xcc/0x110 [drm]
Nov 10 09:43:24 katana kernel: [ 3625.996121] 
drm_mode_cursor_universal+0x13e/0x260 [drm]
Nov 10 09:43:24 katana kernel: [ 3625.996141] 
drm_mode_cursor_common+0xef/0x220 [drm]
Nov 10 09:43:24 katana kernel: [ 3625.996160]  ? 
drm_mode_setplane+0x340/0x340 [drm]
Nov 10 09:43:24 katana kernel: [ 3625.996179] 
drm_mode_cursor_ioctl+0x4a/0x60 [drm]
Nov 10 09:43:24 katana kernel: [ 3625.996198] drm_ioctl_kernel+0xae/0xf0 
[drm]
Nov 10 09:43:24 katana kernel: [ 3625.996218]  drm_ioctl+0x25f/0x420 [drm]
Nov 10 09:43:24 katana kernel: [ 3625.996237]  ? 
drm_mode_setplane+0x340/0x340 [drm]
Nov 10 09:43:24 katana kernel: [ 3625.996256]  ? aa_file_perm+0x11d/0x470
Nov 10 09:43:24 katana kernel: [ 3625.996260] amdgpu_drm_ioctl+0x4e/0x80 
[amdgpu]
Nov 10 09:43:24 katana kernel: [ 3625.996384] __x64_sys_ioctl+0x91/0xc0
Nov 10 09:43:24 katana kernel: [ 3625.996387] do_syscall_64+0x5c/0xc0
Nov 10 09:43:24 katana kernel: [ 3625.996390]  ? vfs_read+0xa0/0x1a0
Nov 10 09:43:24 katana kernel: [ 3625.996393]  ? 
exit_to_user_mode_prepare+0x3d/0x1c0
Nov 10 09:43:24 katana kernel: [ 3625.996395]  ? ksys_read+0xce/0xe0
Nov 10 09:43:24 katana kernel: [ 3625.996398]  ? 
syscall_exit_to_user_mode+0x27/0x50
Nov 10 09:43:24 katana kernel: [ 3625.996400]  ? __x64_sys_read+0x1a/0x20
Nov 10 09:43:24 katana kernel: [ 3625.996403]  ? do_syscall_64+0x69/0xc0
Nov 10 09:43:24 katana kernel: [ 3625.996404]  ? 
syscall_exit_to_user_mode+0x27/0x50
Nov 10 09:43:24 katana kernel: [ 3625.996406]  ? do_syscall_64+0x69/0xc0
Nov 10 09:43:24 katana kernel: [ 3625.996407]  ? do_syscall_64+0x69/0xc0
Nov 10 09:43:24 katana kernel: [ 3625.996409] 
entry_SYSCALL_64_after_hwframe+0x44/0xae
Nov 10 09:43:24 katana kernel: [ 3625.996411] RIP: 0033:0x7f547252750b
Nov 10 09:43:24 katana kernel: [ 3625.996413] RSP: 002b:00007f5404dfb318 
EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Nov 10 09:43:24 katana kernel: [ 3625.996415] RAX: ffffffffffffffda RBX: 
00007f5404dfb350 RCX: 00007f547252750b
Nov 10 09:43:24 katana kernel: [ 3625.996417] RDX: 00007f5404dfb350 RSI: 
00000000c01c64a3 RDI: 000000000000000d
Nov 10 09:43:24 katana kernel: [ 3625.996418] RBP: 00000000c01c64a3 R08: 
000000000000069b R09: 0000000000000001
Nov 10 09:43:24 katana kernel: [ 3625.996419] R10: 0000000000000000 R11: 
0000000000000246 R12: 000055886a8123a0
Nov 10 09:43:24 katana kernel: [ 3625.996420] R13: 000000000000000d R14: 
00000000000006a2 R15: 0000000000000431
Nov 10 09:43:24 katana kernel: [ 3625.996462] INFO: task chrome:sh1:3814 
blocked for more than 241 seconds.
Nov 10 09:43:24 katana kernel: [ 3625.996464]       Not tainted 
5.15.1-051501-generic #202111071208-Ubuntu
Nov 10 09:43:24 katana kernel: [ 3625.996465] "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 10 09:43:24 katana kernel: [ 3625.996466] task:chrome:sh1 state:D 
stack:    0 pid: 3814 ppid:  3747 flags:0x00004002
Nov 10 09:43:24 katana kernel: [ 3625.996468] Call Trace:
Nov 10 09:43:24 katana kernel: [ 3625.996470] __schedule+0x2b6/0x7e0
Nov 10 09:43:24 katana kernel: [ 3625.996472]  schedule+0x4e/0xb0
Nov 10 09:43:24 katana kernel: [ 3625.996474] schedule_timeout+0x202/0x290
Nov 10 09:43:24 katana kernel: [ 3625.996476] 
dma_fence_default_wait+0x174/0x200
Nov 10 09:43:24 katana kernel: [ 3625.996479]  ? 
dma_fence_release+0x140/0x140
Nov 10 09:43:24 katana kernel: [ 3625.996481] 
dma_fence_wait_timeout+0xb7/0xd0
Nov 10 09:43:24 katana kernel: [ 3625.996483] 
drm_sched_entity_fini+0xd8/0x220 [gpu_sched]
Nov 10 09:43:24 katana kernel: [ 3625.996487] 
amdgpu_ctx_mgr_entity_fini+0xa6/0xf0 [amdgpu]
Nov 10 09:43:24 katana kernel: [ 3625.996620] 
amdgpu_ctx_mgr_fini+0x32/0xc0 [amdgpu]
Nov 10 09:43:24 katana kernel: [ 3625.996749] 
amdgpu_driver_postclose_kms+0x16e/0x240 [amdgpu]
Nov 10 09:43:24 katana kernel: [ 3625.996872]  ? idr_destroy+0x7f/0xc0
Nov 10 09:43:24 katana kernel: [ 3625.996876] 
drm_file_free.part.0+0x1e5/0x250 [drm]
Nov 10 09:43:24 katana kernel: [ 3625.996895] 
drm_close_helper.isra.0+0x65/0x70 [drm]
Nov 10 09:43:24 katana kernel: [ 3625.996913]  drm_release+0x6e/0xf0 [drm]
Nov 10 09:43:24 katana kernel: [ 3625.996931]  __fput+0x9f/0x260
Nov 10 09:43:24 katana kernel: [ 3625.996933]  ____fput+0xe/0x10
Nov 10 09:43:24 katana kernel: [ 3625.996935] task_work_run+0x70/0xb0
Nov 10 09:43:24 katana kernel: [ 3625.996937]  do_exit+0x367/0xad0
Nov 10 09:43:24 katana kernel: [ 3625.996940] do_group_exit+0x43/0xb0
Nov 10 09:43:24 katana kernel: [ 3625.996941] get_signal+0x171/0x890
Nov 10 09:43:24 katana kernel: [ 3625.996944]  ? do_futex+0x1b6/0x820
Nov 10 09:43:24 katana kernel: [ 3625.996947] 
arch_do_signal_or_restart+0xf3/0x290
Nov 10 09:43:24 katana kernel: [ 3625.996951] 
exit_to_user_mode_prepare+0x12c/0x1c0
Nov 10 09:43:24 katana kernel: [ 3625.996952] 
syscall_exit_to_user_mode+0x27/0x50
Nov 10 09:43:24 katana kernel: [ 3625.996955] do_syscall_64+0x69/0xc0
Nov 10 09:43:24 katana kernel: [ 3625.996956]  ? switch_fpu_return+0x56/0xc0
Nov 10 09:43:24 katana kernel: [ 3625.996960]  ? 
exit_to_user_mode_prepare+0x98/0x1c0
Nov 10 09:43:24 katana kernel: [ 3625.996961]  ? 
syscall_exit_to_user_mode+0x27/0x50
Nov 10 09:43:24 katana kernel: [ 3625.996963]  ? do_syscall_64+0x69/0xc0
Nov 10 09:43:24 katana kernel: [ 3625.996965]  ? do_syscall_64+0x69/0xc0
Nov 10 09:43:24 katana kernel: [ 3625.996966]  ? 
syscall_exit_to_user_mode+0x27/0x50
Nov 10 09:43:24 katana kernel: [ 3625.996968]  ? __x64_sys_close+0x12/0x40
Nov 10 09:43:24 katana kernel: [ 3625.996970]  ? do_syscall_64+0x69/0xc0
Nov 10 09:43:24 katana kernel: [ 3625.996971] 
entry_SYSCALL_64_after_hwframe+0x44/0xae
Nov 10 09:43:24 katana kernel: [ 3625.996973] RIP: 0033:0x7f0c2e2d5376
Nov 10 09:43:24 katana kernel: [ 3625.996975] RSP: 002b:00007f0c1c84b580 
EFLAGS: 00000282 ORIG_RAX: 00000000000000ca
Nov 10 09:43:24 katana kernel: [ 3625.996976] RAX: fffffffffffffe00 RBX: 
0000000000000000 RCX: 00007f0c2e2d5376
Nov 10 09:43:24 katana kernel: [ 3625.996977] RDX: 0000000000000000 RSI: 
0000000000000080 RDI: 00002bc8000dae24
Nov 10 09:43:24 katana kernel: [ 3625.996978] RBP: 00002bc8000dadf8 R08: 
0000000000000000 R09: 0000000000000004
Nov 10 09:43:24 katana kernel: [ 3625.996979] R10: 0000000000000000 R11: 
0000000000000282 R12: 00002bc8000dae1c
Nov 10 09:43:24 katana kernel: [ 3625.996980] R13: 00002bc8000dadd0 R14: 
00007f0c1c84b5c0 R15: 00002bc8000dae24
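
The excerpt above was hard-wrapped by the mail client, which makes the error lines awkward to grep. A minimal sketch for rejoining the wrapped lines and keeping only the amdgpu failures follows. It assumes every real record starts with a "Mon DD HH:MM:SS host" stamp, and the helper names rejoin and amdgpu_errors are made up for illustration.

```python
import re

# Syslog records in the excerpt start with a "Mon DD HH:MM:SS host " stamp;
# any other line is treated as the wrapped tail of the previous record.
STAMP = re.compile(r"^[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} \S+ ")


def rejoin(lines):
    """Merge mail-wrapped continuation lines back into full log records."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if STAMP.match(line) or not records:
            records.append(line)
        else:
            # Assumption: the wrap happened at a space, so rejoin with one.
            records[-1] = records[-1].rstrip() + " " + line.strip()
    return records


def amdgpu_errors(records):
    """Keep records flagged *ERROR* or carrying the SMU busy message."""
    needles = ("*ERROR*", "SMU: I'm not done with your previous command!")
    return [r for r in records if any(n in r for n in needles)]


sample = [
    "Nov 10 09:37:55 katana kernel: [ 3296.643206] [drm:dc_dmub_srv_wait_idle ",
    "[amdgpu]] *ERROR* Error waiting for DMUB idle: status=3",
    "Nov 10 09:38:01 katana kernel: [ 3302.988931] amdgpu 0000:0d:00.0: ",
    "amdgpu: Failed to export SMU metrics table!",
]
records = rejoin(sample)
errors = amdgpu_errors(records)
for record in errors:
    print(record)
```

Run against the whole journal, this should put the first DMUB wait failure and the repeated SMU timeouts next to each other, ahead of the later hung-task traces.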

Cheers,

Mark




* Re: kernel 5.15.1: AMD RX 6700 XT - Fails to resume after screen blank
  2021-11-10 16:27 kernel 5.15.1: AMD RX 6700 XT - Fails to resume after screen blank Mark Boddington
@ 2021-11-10 17:38 ` Greg KH
  2021-11-10 18:53   ` Mark Boddington
  2021-11-10 23:02   ` Mark Boddington
  2021-11-11  6:09 ` Thorsten Leemhuis
  1 sibling, 2 replies; 10+ messages in thread
From: Greg KH @ 2021-11-10 17:38 UTC (permalink / raw)
  To: Mark Boddington; +Cc: stable, regressions

On Wed, Nov 10, 2021 at 04:27:39PM +0000, Mark Boddington wrote:
> Hi all,
> 
> I run the mainline Linux kernel on Ubuntu 20.04, built from
> https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.15.1/
> 
> There appears to be a regression in 5.15.1 which causes the GPU to fail to
> resume after power saving.
> 
> Could it be this change??:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_debugfs.c?h=v5.15.1&id=8af3a335b5531ca3df0920b1cca43e456cd110ad

If you revert it, does it solve the problem for you?

If not, what kernel version did work for you with this hardware?

thanks,

greg k-h


* Re: kernel 5.15.1: AMD RX 6700 XT - Fails to resume after screen blank
  2021-11-10 17:38 ` Greg KH
@ 2021-11-10 18:53   ` Mark Boddington
  2021-11-10 23:02   ` Mark Boddington
  1 sibling, 0 replies; 10+ messages in thread
From: Mark Boddington @ 2021-11-10 18:53 UTC (permalink / raw)
  To: Greg KH; +Cc: stable, regressions

Hi Greg,

I can try reverting that patch and check.

This hardware has been stable with all kernels from around 5.11 up to 
and including 5.15.
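
With 5.15 as the last known-good release and 5.15.1 as the first bad one, the range is small enough to bisect. A rough sketch of the commands involved, assuming a clone of the stable tree that carries the v5.15 and v5.15.1 tags. The helper name bisect_commands is hypothetical, and it only prints the commands rather than running them.

```python
def bisect_commands(good="v5.15", bad="v5.15.1"):
    """Return the git commands that would start a bisect between two tags."""
    return [
        ["git", "bisect", "start"],
        ["git", "bisect", "bad", bad],
        ["git", "bisect", "good", good],
    ]


# Print the commands so they can be reviewed before being run by hand.
for argv in bisect_commands():
    print(" ".join(argv))
```

Each bisect step still needs a kernel build, a reboot, and a screen-blank cycle before it can be marked good or bad.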

Cheers,

Mark

On 10/11/2021 17:38, Greg KH wrote:
> On Wed, Nov 10, 2021 at 04:27:39PM +0000, Mark Boddington wrote:
>> Hi all,
>>
>> I run the mainline Linux kernel on Ubuntu 20.04, built from
>> https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.15.1/
>>
>> There appears to be a regression in 5.15.1 which causes the GPU to fail to
>> resume after power saving.
>>
>> Could it be this change??:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_debugfs.c?h=v5.15.1&id=8af3a335b5531ca3df0920b1cca43e456cd110ad
> If you revert it, does it solve the problem for you?
>
> If not, what kernel version did work for you with this hardware?
>
> thanks,
>
> greg k-h


* Re: kernel 5.15.1: AMD RX 6700 XT - Fails to resume after screen blank
  2021-11-10 17:38 ` Greg KH
  2021-11-10 18:53   ` Mark Boddington
@ 2021-11-10 23:02   ` Mark Boddington
  2021-11-10 23:11     ` Mark Boddington
  1 sibling, 1 reply; 10+ messages in thread
From: Mark Boddington @ 2021-11-10 23:02 UTC (permalink / raw)
  To: Greg KH; +Cc: stable, regressions

I think I've found the problem.

The amdgpu_amdkfd_resume_iommu(adev) call has been moved around a few 
times recently, but in 5.15.1 it's been removed completely.

I think reverting this patch fixes the issue: 
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c?h=linux-5.15.y&id=f17dca0ab3f38b19c0f1b935f417f62d4a528723

See also: 
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c?h=linux-5.15.y&id=714d9e4574d54596973ee3b0624ee4a16264d700
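
The commit ids are buried in the id= query parameter of these cgit links. A small sketch for pulling them out, using only the Python standard library. The function name commit_id is made up for illustration.

```python
from urllib.parse import parse_qs, urlparse


def commit_id(url):
    """Return the id= query parameter of a cgit commit URL, or None."""
    values = parse_qs(urlparse(url).query).get("id")
    return values[0] if values else None


url = (
    "https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/"
    "drivers/gpu/drm/amd/amdgpu/amdgpu_device.c"
    "?h=linux-5.15.y&id=f17dca0ab3f38b19c0f1b935f417f62d4a528723"
)
print(commit_id(url))
```

The resulting hash is what git show or git revert expects in a local clone.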

Cheers,

Mark

On 10/11/2021 17:38, Greg KH wrote:
> On Wed, Nov 10, 2021 at 04:27:39PM +0000, Mark Boddington wrote:
>> Hi all,
>>
>> I run the mainline Linux kernel on Ubuntu 20.04, built from
>> https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.15.1/
>>
>> There appears to be a regression in 5.15.1 which causes the GPU to fail to
>> resume after power saving.
>>
>> Could it be this change??:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_debugfs.c?h=v5.15.1&id=8af3a335b5531ca3df0920b1cca43e456cd110ad
> If you revert it, does it solve the problem for you?
>
> If not, what kernel version did work for you with this hardware?
>
> thanks,
>
> greg k-h


* Re: kernel 5.15.1: AMD RX 6700 XT - Fails to resume after screen blank
  2021-11-10 23:02   ` Mark Boddington
@ 2021-11-10 23:11     ` Mark Boddington
  2021-11-12 16:04       ` Thorsten Leemhuis
  0 siblings, 1 reply; 10+ messages in thread
From: Mark Boddington @ 2021-11-10 23:11 UTC (permalink / raw)
  To: Greg KH; +Cc: stable, regressions

And also 
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c?h=linux-5.15.y&id=f02abeb0779700c308e661a412451b38962b8a0b

Maybe if the function is called during resume() without being called 
during init(), bad things happen???

Cheers

On 10/11/2021 23:02, Mark Boddington wrote:
> I think I've found the problem.
>
> The amdgpu_amdkfd_resume_iommu(adev) call has been moved around a few 
> times recently, but in 5.15.1 it's been removed completely.
>
> I think reverting this patch fixes the issue: 
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c?h=linux-5.15.y&id=f17dca0ab3f38b19c0f1b935f417f62d4a528723
>
> See also: 
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c?h=linux-5.15.y&id=714d9e4574d54596973ee3b0624ee4a16264d700
>
> Cheers,
>
> Mark
>
> On 10/11/2021 17:38, Greg KH wrote:
>> On Wed, Nov 10, 2021 at 04:27:39PM +0000, Mark Boddington wrote:
>>> Hi all,
>>>
>>> I run the mainline Linux kernel on Ubuntu 20.04, built from
>>> https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.15.1/
>>>
>>> There appears to be a regression in 5.15.1 which causes the GPU to 
>>> fail to
>>> resume after power saving.
>>>
>>> Could it be this change??:
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_debugfs.c?h=v5.15.1&id=8af3a335b5531ca3df0920b1cca43e456cd110ad 
>>>
>> If you revert it, does it solve the problem for you?
>>
>> If not, what kernel version did work for you with this hardware?
>>
>> thanks,
>>
>> greg k-h


* Re: kernel 5.15.1: AMD RX 6700 XT - Fails to resume after screen blank
  2021-11-10 16:27 kernel 5.15.1: AMD RX 6700 XT - Fails to resume after screen blank Mark Boddington
  2021-11-10 17:38 ` Greg KH
@ 2021-11-11  6:09 ` Thorsten Leemhuis
  2021-11-25 11:11   ` Thorsten Leemhuis
  1 sibling, 1 reply; 10+ messages in thread
From: Thorsten Leemhuis @ 2021-11-11  6:09 UTC (permalink / raw)
  To: Mark Boddington, regressions

On 10.11.21 17:27, Mark Boddington wrote:
> 
> I run the mainline Linux kernel on Ubuntu 20.04, built from
> https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.15.1/
> 
> There appears to be a regression in 5.15.1 which causes the GPU to fail
> to resume after power saving.
> 
> Could it be this change??:

Mark, thx for CCing the regression list. To be sure this issue doesn't
fall through the cracks unnoticed, I'm adding it to regzbot, the Linux
kernel regression tracking bot:

#regzbot ^introduced f17dca0ab3f38b19c0f1b935f417f62d4a528723
#regzbot ignore-activity
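
For illustration, the two command lines above can be picked out of a mail body with a few lines of Python. This is only a sketch: the pattern is an assumption about the "#regzbot <command> [argument]" shape seen in this mail, not regzbot's own parser, and the name regzbot_commands is hypothetical.

```python
import re

# Hypothetical pattern: a line starting with "#regzbot", then a command word,
# then an optional argument. This is not regzbot's own parser.
COMMAND = re.compile(r"^#regzbot\s+(\S+)(?:\s+(.*\S))?\s*$")


def regzbot_commands(body):
    """Return (command, argument) pairs found in a mail body."""
    found = []
    for line in body.splitlines():
        match = COMMAND.match(line)
        if match:
            found.append((match.group(1), match.group(2)))
    return found


body = """\
#regzbot ^introduced f17dca0ab3f38b19c0f1b935f417f62d4a528723
#regzbot ignore-activity
"""
commands = regzbot_commands(body)
for command, argument in commands:
    print(command, argument)
```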

FYI: I removed everyone else and the other lists from the To or CC to
avoid noise, as this mail is meaningless for them.

Ciao, Thorsten, your Linux kernel regression tracker.

P.S.: If you want to know more about regzbot, check out its
web-interface, the getting started guide, and/or the reference documentation:

https://linux-regtracking.leemhuis.info/regzbot/
https://gitlab.com/knurd42/regzbot/-/blob/main/docs/getting_started.md
https://gitlab.com/knurd42/regzbot/-/blob/main/docs/reference.md

But note, regzbot is doing its first field-testing now and thus still
has some bugs. Adding this regression will help me to find them, hence
feel free to ignore it and any errors in the web-ui.


* Re: kernel 5.15.1: AMD RX 6700 XT - Fails to resume after screen blank
  2021-11-10 23:11     ` Mark Boddington
@ 2021-11-12 16:04       ` Thorsten Leemhuis
  2021-11-12 17:24         ` Mark Boddington
  0 siblings, 1 reply; 10+ messages in thread
From: Thorsten Leemhuis @ 2021-11-12 16:04 UTC (permalink / raw)
  To: Mark Boddington, Greg KH; +Cc: stable, regressions

Hi Mark. Replying inline
(https://en.wikipedia.org/wiki/Posting_style#Interleaved_style ), as
that's the norm and kinda expected on Linux kernel mailing lists:

On 11.11.21 00:11, Mark Boddington wrote:
> And also
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c?h=linux-5.15.y&id=f02abeb0779700c308e661a412451b38962b8a0b
> 
> Maybe if the function is called during resume() without being called
> during init(), bad things happen???

Have you tried to revert any of the patches you suspect to cause this
and see if things improve? And BTW: did 5.15 (aka 5.15.0) work? Or was
some progress to resolve this made already somewhere else and I just
missed it?

Ciao, Thorsten (with his Linux kernel regression tracker hat on)

#regzbot poke

> On 10/11/2021 23:02, Mark Boddington wrote:
>> I think I've found the problem.
>>
>> The amdgpu_amdkfd_resume_iommu(adev) call has been moved around a few
>> times recently, but in 5.15.1 it's been removed completely.
>>
>> I think reverting this patch fixes the issue:
>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c?h=linux-5.15.y&id=f17dca0ab3f38b19c0f1b935f417f62d4a528723
>>
>>
>> See also:
>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c?h=linux-5.15.y&id=714d9e4574d54596973ee3b0624ee4a16264d700
>>
>>
>> Cheers,
>>
>> Mark
>>
>> On 10/11/2021 17:38, Greg KH wrote:
>>> On Wed, Nov 10, 2021 at 04:27:39PM +0000, Mark Boddington wrote:
>>>> Hi all,
>>>>
>>>> I run the mainline Linux kernel on Ubuntu 20.04, built from
>>>> https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.15.1/
>>>>
>>>> There appears to be a regression in 5.15.1 which causes the GPU to
>>>> fail to
>>>> resume after power saving.
>>>>
>>>> Could it be this change??:
>>>>
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_debugfs.c?h=v5.15.1&id=8af3a335b5531ca3df0920b1cca43e456cd110ad
>>>>
>>> If you revert it, does it solve the problem for you?
>>>
>>> If not, what kernel version did work for you with this hardware?
>>>
>>> thanks,
>>>
>>> greg k-h
> 
> 

* Re: kernel 5.15.1: AMD RX 6700 XT - Fails to resume after screen blank
  2021-11-12 16:04       ` Thorsten Leemhuis
@ 2021-11-12 17:24         ` Mark Boddington
  2021-11-13  8:15           ` Thorsten Leemhuis
  0 siblings, 1 reply; 10+ messages in thread
From: Mark Boddington @ 2021-11-12 17:24 UTC (permalink / raw)
  To: Thorsten Leemhuis, Greg KH; +Cc: stable, regressions

Hi,

On 12/11/2021 16:04, Thorsten Leemhuis wrote:
> Hi Mark. Replying inline
> (https://en.wikipedia.org/wiki/Posting_style#Interleaved_style ), as
> that's the norm and kinda expected on Linux kernel mailing lists:
>
> On 11.11.21 00:11, Mark Boddington wrote:
>> And also
>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c?h=linux-5.15.y&id=f02abeb0779700c308e661a412451b38962b8a0b
>>
>> Maybe if the function is called during resume() without being called
>> during init(), bad things happen???
> Have you tried to revert any of the patches you suspect to cause this
> and see if things improve? And BTW: did 5.15 (aka 5.15.0) work? Or was
> some progress to resolve this made already somewhere else and I just
> missed it?
>
> Ciao, Thorsten (with his Linux kernel regression tracker hat on)
>
> #regzbot poke

I tried reverting both, but they didn't improve the situation.

I also had the deadlock happen on 5.15 yesterday, so the last stable 
kernel I have used is 5.14.15. I can try the latest 5.14.x if that will 
help?

Cheers,

Mark


* Re: kernel 5.15.1: AMD RX 6700 XT - Fails to resume after screen blank
  2021-11-12 17:24         ` Mark Boddington
@ 2021-11-13  8:15           ` Thorsten Leemhuis
  0 siblings, 0 replies; 10+ messages in thread
From: Thorsten Leemhuis @ 2021-11-13  8:15 UTC (permalink / raw)
  To: Mark Boddington, Greg KH; +Cc: stable, regressions

On 12.11.21 18:24, Mark Boddington wrote:
> On 12/11/2021 16:04, Thorsten Leemhuis wrote:
> [...]
>> On 11.11.21 00:11, Mark Boddington wrote:
>>> And also
>>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c?h=linux-5.15.y&id=f02abeb0779700c308e661a412451b38962b8a0b
>>>
>>>
>>> Maybe if the function is called during resume() without being called
>>> during init(), bad things happen???
>> Have you tried to revert any of the patches you suspect to cause this
>> and see if things improve? And BTW: did 5.15 (aka 5.15.0) work? Or was
>> some progress to resolve this made already somewhere else and I just
>> missed it?
>>
>> Ciao, Thorsten (with his Linux kernel regression tracker hat on)
>>
>> #regzbot poke
> 
> I tried reverting both, but they didn't improve the situation.
> 
> I also had the deadlock happen on 5.15 yesterday, so the last stable
> kernel I have used is 5.14.15. I can try the latest 5.14.x if that will
> help?

You can give it a shot, maybe the problem shows up there now. But I
doubt it, as in situations like this the change that's causing the
problem likely was introduced in mainline (v5.14..v5.15) and not in
stable (v5.15..v5.15.1).

I'd suggest you do the following: install and run the latest 5.14.y
release, just to be sure (and to have a properly working kernel installed).

In parallel check if someone reported such a problem already to the
developers of the driver in question. This document explains how to find
their mailing list or bug tracker archive to check:
https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html
It also explains many other aspects around searching for existing
reports and reporting Linux kernel issues that might be of help.

If you don't find any existing reports, report your problem anew with
the regression mailing list in CC (no need to CC me or the stable list).
Maybe a developer then has an idea what might cause this and can point
you in some direction to confirm. But in the end in situations like this
you'll likely need to bisect the problem using a git bisection (see the
reporting issues document linked above). So consider doing that before
reporting, it's not as hard and time-consuming as many people think.
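
To illustrate why a bisection takes fewer test builds than many people
expect, here is a minimal sketch in Python. It is not how git implements
bisection (real history is a graph, not a flat list); the commit count
and the position of the bad commit are made-up values for illustration
only, and `bisect_first_bad` is a hypothetical helper name.

```python
def bisect_first_bad(commits, is_bad):
    """Return (index of first bad commit, number of tests run).

    Assumes commits[0] is known good, commits[-1] is known bad, and
    every commit after the first bad one is bad as well.
    """
    good, bad = 0, len(commits) - 1
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tests += 1  # one kernel build + boot + resume test in practice
        if is_bad(commits[mid]):
            bad = mid
        else:
            good = mid
    return bad, tests


# Hypothetical linear history of 15,000 commits; the regression is
# assumed to enter at index 9001 (both numbers are invented).
commits = list(range(15000))
first_bad, tests = bisect_first_bad(commits, lambda c: c >= 9001)
print(first_bad, tests)
```

Each test halves the remaining range, so even a range of 15,000 commits
needs at most 14 test builds in this simplified model.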

Note: As a Linux kernel regression tracker I'm getting a lot of reports
on my table and can only look briefly into them. Due to that I sometimes
will get things wrong and thus might give bad advice. I hope that's not
the case here. But if you think I got something wrong, don't hesitate to
tell me about that. It's in both our interests to prevent you from
going down the wrong rabbit hole.

Ciao, Thorsten (carrying his Linux kernel regression tracker hat)

P.S.: Feel free to ignore the following lines, they are for regzbot, my
Linux kernel regression tracking bot:

#regzbot introduced v5.14..v5.15

* Re: kernel 5.15.1: AMD RX 6700 XT - Fails to resume after screen blank
  2021-11-11  6:09 ` Thorsten Leemhuis
@ 2021-11-25 11:11   ` Thorsten Leemhuis
  0 siblings, 0 replies; 10+ messages in thread
From: Thorsten Leemhuis @ 2021-11-25 11:11 UTC (permalink / raw)
  To: Mark Boddington, regressions



On 11.11.21 07:09, Thorsten Leemhuis wrote:
> On 10.11.21 17:27, Mark Boddington wrote:
>>
>> I run the mainline Linux kernel on Ubuntu 20.04, built from
>> https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.15.1/
>>
>> There appears to be a regression in 5.15.1 which causes the GPU to fail
>> to resume after power saving.
>>
>> Could it be this change??:
> 
> Mark, thx for CCing the regression list. To be sure this issue doesn't
> fall through the cracks unnoticed, I'm adding it to regzbot, the Linux
> kernel regression tracking bot:
> 
> #regzbot ^introduced f17dca0ab3f38b19c0f1b935f417f62d4a528723
> #regzbot ignore-activity
> 
> FYI: I removed everyone else and the other lists from the To or CC to
> avoid noise, as this mail is meaningless for them.
> 
> Ciao, Thorsten, your Linux kernel regression tracker.
> 
> P.S.: If you want to know more about regzbot, check out its
> web-interface, the getting started guide, and/or the reference documentation:
> 
> https://linux-regtracking.leemhuis.info/regzbot/
> https://gitlab.com/knurd42/regzbot/-/blob/main/docs/getting_started.md
> https://gitlab.com/knurd42/regzbot/-/blob/main/docs/reference.md
> 
> But note, regzbot is doing its first field-testing now and thus still
> has some bugs. Adding this regression will help me to find them, hence
> feel free to ignore it and any errors in the web-ui.

#regzbot dup-of:
https://lore.kernel.org/all/99797fb7-76eb-9d86-ad2f-591243eca404@badpenguin.co.uk/
#regzbot ignore-activity

end of thread, other threads:[~2021-11-25 11:11 UTC | newest]

Thread overview: 10+ messages
2021-11-10 16:27 kernel 5.15.1: AMD RX 6700 XT - Fails to resume after screen blank Mark Boddington
2021-11-10 17:38 ` Greg KH
2021-11-10 18:53   ` Mark Boddington
2021-11-10 23:02   ` Mark Boddington
2021-11-10 23:11     ` Mark Boddington
2021-11-12 16:04       ` Thorsten Leemhuis
2021-11-12 17:24         ` Mark Boddington
2021-11-13  8:15           ` Thorsten Leemhuis
2021-11-11  6:09 ` Thorsten Leemhuis
2021-11-25 11:11   ` Thorsten Leemhuis

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).