Hi Paul,

On 4/14/2022 2:52 AM, Paul Menzel wrote:
[Cc: -kernel test robot <lkp@intel.com>]

Dear Alex, dear Richard,


On 13.04.22 at 15:00, Alex Deucher wrote:
On Wed, Apr 13, 2022 at 3:43 AM Paul Menzel wrote:

Thank you for sending out v4.

On 12.04.22 at 23:50, Richard Gong wrote:
Active State Power Management (ASPM) feature is enabled since kernel 5.14.
There are some AMD GFX cards (such as WX3200 and RX640) that won't work
with ASPM-enabled Intel Alder Lake based systems. Using these GFX cards as
video/display output, Intel Alder Lake based systems will hang during
suspend/resume.

It is still not clear to me what “hang during suspend/resume” means. I guess
suspending works fine? During resume (S3 or S0ix?), where does it hang?
Is the system still functional, with only display problems?
The system freezes after suspend/resume.

The issue was initially reported on one system (Dell Precision 3660 with
BIOS version 0.14.81), but was later confirmed to affect at least 4 Alder
Lake based systems.

Add extra check to disable ASPM on Intel Alder Lake based systems.

Fixes: 0064b0ce85bb ("drm/amd/pm: enable ASPM by default")
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1885
Reported-by: kernel test robot <lkp@intel.com>

This tag is a little confusing. Maybe clarify that it was for an issue
in a previous patch iteration?

I described that in the v3 change list below, which corrected the build error with the W=1 option.

It is not a good idea to add that description to the commit message, which is why I put it in the v3 change list instead.


Signed-off-by: Richard Gong <richard.gong@amd.com>
---
v4: s/CONFIG_X86_64/CONFIG_X86
      enhanced check logic
v3: s/intel_core_asom_chk/aspm_support_quirk_check
      correct build error with W=1 option
v2: correct commit description
      move the check from chip family to problematic platform
---
   drivers/gpu/drm/amd/amdgpu/vi.c | 17 ++++++++++++++++-
   1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/vi.c b/drivers/gpu/drm/amd/amdgpu/vi.c
index 039b90cdc3bc..b33e0a9bee65 100644
--- a/drivers/gpu/drm/amd/amdgpu/vi.c
+++ b/drivers/gpu/drm/amd/amdgpu/vi.c
@@ -81,6 +81,10 @@
   #include "mxgpu_vi.h"
   #include "amdgpu_dm.h"

+#if IS_ENABLED(CONFIG_X86)
+#include <asm/intel-family.h>
+#endif
+
   #define ixPCIE_LC_L1_PM_SUBSTATE    0x100100C6
   #define PCIE_LC_L1_PM_SUBSTATE__LC_L1_SUBSTATES_OVERRIDE_EN_MASK    0x00000001L
   #define PCIE_LC_L1_PM_SUBSTATE__LC_PCI_PM_L1_2_OVERRIDE_MASK        0x00000002L
@@ -1134,13 +1138,24 @@ static void vi_enable_aspm(struct amdgpu_device *adev)
               WREG32_PCIE(ixPCIE_LC_CNTL, data);
   }

+static bool aspm_support_quirk_check(void)
+{
+     if (IS_ENABLED(CONFIG_X86)) {
+             struct cpuinfo_x86 *c = &cpu_data(0);
+
+             return !(c->x86 == 6 && c->x86_model == INTEL_FAM6_ALDERLAKE);
+     }
+
+     return true;
+}
+
   static void vi_program_aspm(struct amdgpu_device *adev)
   {
       u32 data, data1, orig;
       bool bL1SS = false;
       bool bClkReqSupport = true;

-     if (!amdgpu_device_should_use_aspm(adev))
+     if (!amdgpu_device_should_use_aspm(adev) || !aspm_support_quirk_check())
               return;

Can users still forcefully enable ASPM with the parameter `amdgpu.aspm`?

As Mario mentioned in a separate reply, we can't forcefully enable ASPM with the parameter 'amdgpu.aspm'.
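
To make the point above concrete, here is a minimal userspace sketch of the gating logic in the hunk. It is not driver code: the `sketch_` names and the two-field struct are hypothetical stand-ins for `struct cpuinfo_x86`, and `should_use_aspm` stands for the result of `amdgpu_device_should_use_aspm(adev)`, which is assumed to already reflect `amdgpu.aspm`. The only value taken from the kernel is the model number 0x97 for INTEL_FAM6_ALDERLAKE in <asm/intel-family.h>.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value of INTEL_FAM6_ALDERLAKE in <asm/intel-family.h>. */
#define SKETCH_INTEL_FAM6_ALDERLAKE 0x97

/* Hypothetical stand-in for the two struct cpuinfo_x86 fields the patch reads. */
struct sketch_cpuinfo {
	uint8_t family;	/* corresponds to c->x86 */
	uint8_t model;	/* corresponds to c->x86_model */
};

/* Mirrors aspm_support_quirk_check(): false means ASPM must not be programmed. */
static bool sketch_quirk_check(const struct sketch_cpuinfo *c)
{
	return !(c->family == 6 && c->model == SKETCH_INTEL_FAM6_ALDERLAKE);
}

/*
 * Mirrors the early return in vi_program_aspm(): both conditions must hold.
 * Because the quirk check is a separate term, a should_use_aspm value of true
 * cannot override it.
 */
static bool sketch_will_program_aspm(bool should_use_aspm,
				     const struct sketch_cpuinfo *c)
{
	return should_use_aspm && sketch_quirk_check(c);
}
```

With family 6 and model 0x97, sketch_will_program_aspm() returns false regardless of the first argument, which is why the module parameter cannot force ASPM on for the affected platform in this version of the patch.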

       if (adev->flags & AMD_IS_APU ||

If I remember correctly, there were also newer cards, where ASPM worked
with Intel Alder Lake, right? Can only the problematic generations for
WX3200 and RX640 be excluded from ASPM?

This patch only disables it for the generation that was problematic.

Could that please be made clear in the commit message summary, and message?

Are you OK with the commit message below?

Active State Power Management (ASPM) feature is enabled since kernel 5.14.
There are some AMD GFX cards (such as WX3200 and RX640) that won't work
with ASPM-enabled Intel Alder Lake based systems. Using these GFX cards as
video/display output, Intel Alder Lake based systems will freeze after
suspend/resume.

The issue was initially reported on one system (Dell Precision 3660 with
BIOS version 0.14.81), but was later confirmed to affect at least 4 Alder
Lake based systems.

Add extra check to disable ASPM on Intel Alder Lake based systems with
problematic generation GFX cards.


Loosely related, is there a public (or internal) issue to analyze how to get ASPM working for VI generation devices with Intel Alder Lake?

As Alex mentioned, we need support from Intel. We don't have any update on that.


Regards,

Richard



Kind regards,

Paul