kernel 5.15.x: AMD RX 6700 XT

* kernel 5.15.x: AMD RX 6700 XT - Fails to resume after screen blank
From: Mark Boddington @ 2021-11-24 19:14 UTC (permalink / raw)
  To: amd-gfx; +Cc: regressions

Hi all,

TL;DR: git bisection points to commit
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v5.15.4&id=61d861cf478576d85d6032f864360a34b26084b1 
as the cause of a failure when changing power state after idle.

Since 5.15.0 my GPU has intermittently failed to resume after entering 
power saving (screen blank). The kernel log shows errors like these:

Nov 18 09:52:19 katana kernel: [ 4921.669813] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Nov 18 09:52:21 katana kernel: [ 4923.667803] snd_hda_intel 0000:0d:00.1: refused to change power state from D0 to D3hot
Nov 18 09:52:26 katana kernel: [ 4928.622234] amdgpu 0000:0d:00.0: amdgpu: Failed to export SMU metrics table!
Nov 18 09:52:31 katana kernel: [ 4933.371814] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Nov 18 09:52:31 katana kernel: [ 4933.650854] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Nov 18 09:52:32 katana kernel: [ 4933.921708] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Nov 18 09:52:32 katana kernel: [ 4933.940249] amdgpu 0000:0d:00.0: amdgpu: SMU: I'm not done with your previous command!
Nov 18 09:52:32 katana kernel: [ 4933.940254] amdgpu 0000:0d:00.0: amdgpu: Failed to export SMU metrics table!
Nov 18 09:52:32 katana kernel: [ 4934.192236] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Nov 18 09:52:32 katana kernel: [ 4934.463213] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Nov 18 09:52:33 katana kernel: [ 4934.736895] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Nov 18 09:52:33 katana kernel: [ 4935.007928] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Nov 18 09:52:33 katana kernel: [ 4935.279063] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Nov 18 09:52:33 katana kernel: [ 4935.550243] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Nov 18 09:52:34 katana kernel: [ 4935.824034] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Nov 18 09:52:34 katana kernel: [ 4936.095158] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Nov 18 09:52:34 katana kernel: [ 4936.366210] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Nov 18 09:52:34 katana kernel: [ 4936.629193] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Nov 18 09:52:35 katana kernel: [ 4936.886333] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Nov 18 09:52:35 katana kernel: [ 4937.140815] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Nov 18 09:52:35 katana kernel: [ 4937.395341] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Nov 18 09:52:35 katana kernel: [ 4937.649885] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Nov 18 09:52:36 katana kernel: [ 4937.906944] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Nov 18 09:52:36 katana kernel: [ 4938.162866] [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3

This eventually leads to processes crashing and to the system locking 
up during shutdown.
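
In case it helps with triaging similar logs, here is a rough sketch of 
how a kernel log excerpt like the one above could be summarised by 
message. It is only an illustration: the names ENTRY, unwrap and tally 
and the regular expression are assumptions made for this example, not 
part of any existing tool. It assumes syslog-style "kernel:" lines and 
rejoins entries that a mail client wrapped onto a second line.

```python
import re
from collections import Counter

# Assumed syslog layout: "Nov 18 09:52:19 katana kernel: [ 4921.669813] message"
ENTRY = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\skernel:\s"
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s(?P<msg>.*)$"
)


def unwrap(lines):
    """Rejoin log entries that were wrapped onto a continuation line."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if ENTRY.match(line) or not entries:
            # A timestamped line starts a new entry.
            entries.append(line)
        else:
            # Anything else is treated as the tail of the previous entry.
            entries[-1] += " " + line
    return entries


def tally(lines):
    """Count how often each kernel message text occurs."""
    counts = Counter()
    for entry in unwrap(lines):
        match = ENTRY.match(entry)
        if match:
            counts[match.group("msg")] += 1
    return counts
```

Something like tally(open("kern.log")).most_common() (the file name is 
hypothetical) would then list the messages from most to least frequent, 
which makes it easier to see that the DMUB idle error dominates.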

A git bisection has isolated the following patch as the cause.

commit 8f0284f190e6a0aa09015090568c03f18288231a (refs/bisect/bad)
Merge: 5bea1c8ce673 61d861cf4785
Author: Dave Airlie <airlied@redhat.com>
Date:   Mon Aug 30 09:06:01 2021 +1000

     Merge tag 'amd-drm-next-5.15-2021-08-27' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

     amd-drm-next-5.15-2021-08-27:

     amdgpu:
     - PLL fix for SI
     - Misc code cleanups
     - RAS fixes
     - PSP cleanups
     - Polaris UVD/VCE suspend fixes
     - aldebaran fixes
     - DCN3.x mclk fixes

     amdkfd:
     - CWSR fixes for arcturus and aldebaran
     - SVM fixes

     Signed-off-by: Dave Airlie <airlied@redhat.com>
     From: Alex Deucher <alexander.deucher@amd.com>
     Link: 
https://patchwork.freedesktop.org/patch/msgid/20210827192336.4649-1-alexander.deucher@amd.com

commit 61d861cf478576d85d6032f864360a34b26084b1 (HEAD)
Author: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Date:   Wed May 13 11:58:50 2020 -0400

     drm/amd/display: Move AllowDRAMSelfRefreshOrDRAMClockChangeInVblank to bounding box

     [Why]
     This is a global parameter, not a per pipe parameter and it's useful
     for experimenting with the prefetch schedule to be adjustable from
     the SOC bb.

     [How]
     Add a parameter to the SOC bb, default is the existing policy for
     all DCN. Fill it in when filling SOC bb parameters.

     Revert the policy to use MinDCFClk at the same time since that's not
     going to give us P-State in most cases on the spreadsheet.

     Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1403
     Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
     Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
     Tested-by: Daniel Wheeler <Daniel.Wheeler@amd.com>
     Acked-by: Alex Deucher <alexander.deucher@amd.com>
     Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

I have been running 5.15.4 with 61d861cf478576d85d6032f864360a34b26084b1 
reverted for a few hours, through multiple periods of power saving, and 
the problem has not recurred so far.

Cheers,

Mark
