linux-kernel.vger.kernel.org archive mirror
* [PATCHv4] drm/amdgpu: disable ASPM on Intel Alder Lake based systems
@ 2022-04-12 21:50 Richard Gong
  2022-04-13  4:29 ` Lazar, Lijo
                   ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Richard Gong @ 2022-04-12 21:50 UTC (permalink / raw)
  To: alexander.deucher, christian.koenig, xinhui.pan, airlied, daniel
  Cc: amd-gfx, dri-devel, linux-kernel, mario.limonciello,
	richard.gong, kernel test robot

The Active State Power Management (ASPM) feature has been enabled by default
since kernel 5.14. Some AMD GFX cards (such as the WX3200 and RX640) do not
work on Intel Alder Lake based systems when ASPM is enabled: with one of these
cards driving the video/display output, the system hangs during
suspend/resume.

The issue was initially reported on one system (Dell Precision 3660 with
BIOS version 0.14.81), but was later confirmed to affect at least 4 Alder
Lake based systems.

Add an extra check to disable ASPM on Intel Alder Lake based systems.

Fixes: 0064b0ce85bb ("drm/amd/pm: enable ASPM by default")
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1885
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Richard Gong <richard.gong@amd.com>
---
v4: s/CONFIG_X86_64/CONFIG_X86
    enhanced check logic
v3: s/intel_core_asom_chk/aspm_support_quirk_check
    correct build error with W=1 option
v2: correct commit description
    move the check from chip family to problematic platform
---
 drivers/gpu/drm/amd/amdgpu/vi.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/vi.c b/drivers/gpu/drm/amd/amdgpu/vi.c
index 039b90cdc3bc..b33e0a9bee65 100644
--- a/drivers/gpu/drm/amd/amdgpu/vi.c
+++ b/drivers/gpu/drm/amd/amdgpu/vi.c
@@ -81,6 +81,10 @@
 #include "mxgpu_vi.h"
 #include "amdgpu_dm.h"
 
+#if IS_ENABLED(CONFIG_X86)
+#include <asm/intel-family.h>
+#endif
+
 #define ixPCIE_LC_L1_PM_SUBSTATE	0x100100C6
 #define PCIE_LC_L1_PM_SUBSTATE__LC_L1_SUBSTATES_OVERRIDE_EN_MASK	0x00000001L
 #define PCIE_LC_L1_PM_SUBSTATE__LC_PCI_PM_L1_2_OVERRIDE_MASK	0x00000002L
@@ -1134,13 +1138,24 @@ static void vi_enable_aspm(struct amdgpu_device *adev)
 		WREG32_PCIE(ixPCIE_LC_CNTL, data);
 }
 
+static bool aspm_support_quirk_check(void)
+{
+	if (IS_ENABLED(CONFIG_X86)) {
+		struct cpuinfo_x86 *c = &cpu_data(0);
+
+		return !(c->x86 == 6 && c->x86_model == INTEL_FAM6_ALDERLAKE);
+	}
+
+	return true;
+}
+
 static void vi_program_aspm(struct amdgpu_device *adev)
 {
 	u32 data, data1, orig;
 	bool bL1SS = false;
 	bool bClkReqSupport = true;
 
-	if (!amdgpu_device_should_use_aspm(adev))
+	if (!amdgpu_device_should_use_aspm(adev) || !aspm_support_quirk_check())
 		return;
 
 	if (adev->flags & AMD_IS_APU ||
-- 
2.25.1



* Re: [PATCHv4] drm/amdgpu: disable ASPM on Intel Alder Lake based systems
  2022-04-12 21:50 [PATCHv4] drm/amdgpu: disable ASPM on Intel Alder Lake based systems Richard Gong
@ 2022-04-13  4:29 ` Lazar, Lijo
  2022-04-13  7:43 ` Paul Menzel
  2022-04-13 15:40 ` Nathan Chancellor
  2 siblings, 0 replies; 25+ messages in thread
From: Lazar, Lijo @ 2022-04-13  4:29 UTC (permalink / raw)
  To: Richard Gong, alexander.deucher, christian.koenig, xinhui.pan,
	airlied, daniel
  Cc: amd-gfx, kernel test robot, linux-kernel, dri-devel, mario.limonciello



On 4/13/2022 3:20 AM, Richard Gong wrote:
> Active State Power Management (ASPM) feature is enabled since kernel 5.14.
> There are some AMD GFX cards (such as WX3200 and RX640) that won't work
> with ASPM-enabled Intel Alder Lake based systems. Using these GFX cards as
> video/display output, Intel Alder Lake based systems will hang during
> suspend/resume.
> 
> The issue was initially reported on one system (Dell Precision 3660 with
> BIOS version 0.14.81), but was later confirmed to affect at least 4 Alder
> Lake based systems.
> 
> Add extra check to disable ASPM on Intel Alder Lake based systems.
> 
> Fixes: 0064b0ce85bb ("drm/amd/pm: enable ASPM by default")
> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1885
> Reported-by: kernel test robot <lkp@intel.com>
> Signed-off-by: Richard Gong <richard.gong@amd.com>

Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>

Thanks,
Lijo

> ---
> v4: s/CONFIG_X86_64/CONFIG_X86
>      enhanced check logic
> v3: s/intel_core_asom_chk/aspm_support_quirk_check
>      correct build error with W=1 option
> v2: correct commit description
>      move the check from chip family to problematic platform
> ---
>   drivers/gpu/drm/amd/amdgpu/vi.c | 17 ++++++++++++++++-
>   1 file changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/vi.c b/drivers/gpu/drm/amd/amdgpu/vi.c
> index 039b90cdc3bc..b33e0a9bee65 100644
> --- a/drivers/gpu/drm/amd/amdgpu/vi.c
> +++ b/drivers/gpu/drm/amd/amdgpu/vi.c
> @@ -81,6 +81,10 @@
>   #include "mxgpu_vi.h"
>   #include "amdgpu_dm.h"
>   
> +#if IS_ENABLED(CONFIG_X86)
> +#include <asm/intel-family.h>
> +#endif
> +
>   #define ixPCIE_LC_L1_PM_SUBSTATE	0x100100C6
>   #define PCIE_LC_L1_PM_SUBSTATE__LC_L1_SUBSTATES_OVERRIDE_EN_MASK	0x00000001L
>   #define PCIE_LC_L1_PM_SUBSTATE__LC_PCI_PM_L1_2_OVERRIDE_MASK	0x00000002L
> @@ -1134,13 +1138,24 @@ static void vi_enable_aspm(struct amdgpu_device *adev)
>   		WREG32_PCIE(ixPCIE_LC_CNTL, data);
>   }
>   
> +static bool aspm_support_quirk_check(void)
> +{
> +	if (IS_ENABLED(CONFIG_X86)) {
> +		struct cpuinfo_x86 *c = &cpu_data(0);
> +
> +		return !(c->x86 == 6 && c->x86_model == INTEL_FAM6_ALDERLAKE);
> +	}
> +
> +	return true;
> +}
> +
>   static void vi_program_aspm(struct amdgpu_device *adev)
>   {
>   	u32 data, data1, orig;
>   	bool bL1SS = false;
>   	bool bClkReqSupport = true;
>   
> -	if (!amdgpu_device_should_use_aspm(adev))
> +	if (!amdgpu_device_should_use_aspm(adev) || !aspm_support_quirk_check())
>   		return;
>   
>   	if (adev->flags & AMD_IS_APU ||
> 


* Re: [PATCHv4] drm/amdgpu: disable ASPM on Intel Alder Lake based systems
  2022-04-12 21:50 [PATCHv4] drm/amdgpu: disable ASPM on Intel Alder Lake based systems Richard Gong
  2022-04-13  4:29 ` Lazar, Lijo
@ 2022-04-13  7:43 ` Paul Menzel
  2022-04-13 13:00   ` Alex Deucher
  2022-04-13 15:40 ` Nathan Chancellor
  2 siblings, 1 reply; 25+ messages in thread
From: Paul Menzel @ 2022-04-13  7:43 UTC (permalink / raw)
  To: Richard Gong
  Cc: alexander.deucher, christian.koenig, xinhui.pan, airlied, daniel,
	amd-gfx, kernel test robot, linux-kernel, dri-devel,
	mario.limonciello

Dear Richard,


Thank you for sending out v4.

Am 12.04.22 um 23:50 schrieb Richard Gong:
> Active State Power Management (ASPM) feature is enabled since kernel 5.14.
> There are some AMD GFX cards (such as WX3200 and RX640) that won't work
> with ASPM-enabled Intel Alder Lake based systems. Using these GFX cards as
> video/display output, Intel Alder Lake based systems will hang during
> suspend/resume.

It is still not clear to me what “hang during suspend/resume” means. I guess
suspending works fine? During resume (from S3 or S0ix?), where does it hang?
Or is the system functional, with only display problems?

> The issue was initially reported on one system (Dell Precision 3660 with
> BIOS version 0.14.81), but was later confirmed to affect at least 4 Alder
> Lake based systems.
> 
> Add extra check to disable ASPM on Intel Alder Lake based systems.
> 
> Fixes: 0064b0ce85bb ("drm/amd/pm: enable ASPM by default")
> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1885
> Reported-by: kernel test robot <lkp@intel.com>

This tag is a little confusing. Maybe clarify that it was for an issue 
in a previous patch iteration?

> Signed-off-by: Richard Gong <richard.gong@amd.com>
> ---
> v4: s/CONFIG_X86_64/CONFIG_X86
>      enhanced check logic
> v3: s/intel_core_asom_chk/aspm_support_quirk_check
>      correct build error with W=1 option
> v2: correct commit description
>      move the check from chip family to problematic platform
> ---
>   drivers/gpu/drm/amd/amdgpu/vi.c | 17 ++++++++++++++++-
>   1 file changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/vi.c b/drivers/gpu/drm/amd/amdgpu/vi.c
> index 039b90cdc3bc..b33e0a9bee65 100644
> --- a/drivers/gpu/drm/amd/amdgpu/vi.c
> +++ b/drivers/gpu/drm/amd/amdgpu/vi.c
> @@ -81,6 +81,10 @@
>   #include "mxgpu_vi.h"
>   #include "amdgpu_dm.h"
>   
> +#if IS_ENABLED(CONFIG_X86)
> +#include <asm/intel-family.h>
> +#endif
> +
>   #define ixPCIE_LC_L1_PM_SUBSTATE	0x100100C6
>   #define PCIE_LC_L1_PM_SUBSTATE__LC_L1_SUBSTATES_OVERRIDE_EN_MASK	0x00000001L
>   #define PCIE_LC_L1_PM_SUBSTATE__LC_PCI_PM_L1_2_OVERRIDE_MASK	0x00000002L
> @@ -1134,13 +1138,24 @@ static void vi_enable_aspm(struct amdgpu_device *adev)
>   		WREG32_PCIE(ixPCIE_LC_CNTL, data);
>   }
>   
> +static bool aspm_support_quirk_check(void)
> +{
> +	if (IS_ENABLED(CONFIG_X86)) {
> +		struct cpuinfo_x86 *c = &cpu_data(0);
> +
> +		return !(c->x86 == 6 && c->x86_model == INTEL_FAM6_ALDERLAKE);
> +	}
> +
> +	return true;
> +}
> +
>   static void vi_program_aspm(struct amdgpu_device *adev)
>   {
>   	u32 data, data1, orig;
>   	bool bL1SS = false;
>   	bool bClkReqSupport = true;
>   
> -	if (!amdgpu_device_should_use_aspm(adev))
> +	if (!amdgpu_device_should_use_aspm(adev) || !aspm_support_quirk_check())
>   		return;

Can users still forcefully enable ASPM with the parameter `amdgpu.aspm`?

>   
>   	if (adev->flags & AMD_IS_APU ||

If I remember correctly, there were also newer cards, where ASPM worked 
with Intel Alder Lake, right? Can only the problematic generations for 
WX3200 and RX640 be excluded from ASPM?


Kind regards,

Paul


* Re: [PATCHv4] drm/amdgpu: disable ASPM on Intel Alder Lake based systems
  2022-04-13  7:43 ` Paul Menzel
@ 2022-04-13 13:00   ` Alex Deucher
  2022-04-13 13:28     ` Limonciello, Mario
  2022-04-14  7:52     ` Paul Menzel
  0 siblings, 2 replies; 25+ messages in thread
From: Alex Deucher @ 2022-04-13 13:00 UTC (permalink / raw)
  To: Paul Menzel
  Cc: Richard Gong, kernel test robot, Dave Airlie, xinhui pan, LKML,
	amd-gfx list, Mailing list - DRI developers, Daniel Vetter,
	Deucher, Alexander, Christian Koenig, Limonciello, Mario

On Wed, Apr 13, 2022 at 3:43 AM Paul Menzel <pmenzel@molgen.mpg.de> wrote:
>
> Dear Richard,
>
>
> Thank you for sending out v4.
>
> Am 12.04.22 um 23:50 schrieb Richard Gong:
> > Active State Power Management (ASPM) feature is enabled since kernel 5.14.
> > There are some AMD GFX cards (such as WX3200 and RX640) that won't work
> > with ASPM-enabled Intel Alder Lake based systems. Using these GFX cards as
> > video/display output, Intel Alder Lake based systems will hang during
> > suspend/resume.
>
> I am still not clear, what “hang during suspend/resume” means. I guess
> suspending works fine? During resume (S3 or S0ix?), where does it hang?
> The system is functional, but there are only display problems?
>
> > The issue was initially reported on one system (Dell Precision 3660 with
> > BIOS version 0.14.81), but was later confirmed to affect at least 4 Alder
> > Lake based systems.
> >
> > Add extra check to disable ASPM on Intel Alder Lake based systems.
> >
> > Fixes: 0064b0ce85bb ("drm/amd/pm: enable ASPM by default")
> > Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1885
> > Reported-by: kernel test robot <lkp@intel.com>
>
> This tag is a little confusing. Maybe clarify that it was for an issue
> in a previous patch iteration?
>
> > Signed-off-by: Richard Gong <richard.gong@amd.com>
> > ---
> > v4: s/CONFIG_X86_64/CONFIG_X86
> >      enhanced check logic
> > v3: s/intel_core_asom_chk/aspm_support_quirk_check
> >      correct build error with W=1 option
> > v2: correct commit description
> >      move the check from chip family to problematic platform
> > ---
> >   drivers/gpu/drm/amd/amdgpu/vi.c | 17 ++++++++++++++++-
> >   1 file changed, 16 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/vi.c b/drivers/gpu/drm/amd/amdgpu/vi.c
> > index 039b90cdc3bc..b33e0a9bee65 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/vi.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/vi.c
> > @@ -81,6 +81,10 @@
> >   #include "mxgpu_vi.h"
> >   #include "amdgpu_dm.h"
> >
> > +#if IS_ENABLED(CONFIG_X86)
> > +#include <asm/intel-family.h>
> > +#endif
> > +
> >   #define ixPCIE_LC_L1_PM_SUBSTATE    0x100100C6
> >   #define PCIE_LC_L1_PM_SUBSTATE__LC_L1_SUBSTATES_OVERRIDE_EN_MASK    0x00000001L
> >   #define PCIE_LC_L1_PM_SUBSTATE__LC_PCI_PM_L1_2_OVERRIDE_MASK        0x00000002L
> > @@ -1134,13 +1138,24 @@ static void vi_enable_aspm(struct amdgpu_device *adev)
> >               WREG32_PCIE(ixPCIE_LC_CNTL, data);
> >   }
> >
> > +static bool aspm_support_quirk_check(void)
> > +{
> > +     if (IS_ENABLED(CONFIG_X86)) {
> > +             struct cpuinfo_x86 *c = &cpu_data(0);
> > +
> > +             return !(c->x86 == 6 && c->x86_model == INTEL_FAM6_ALDERLAKE);
> > +     }
> > +
> > +     return true;
> > +}
> > +
> >   static void vi_program_aspm(struct amdgpu_device *adev)
> >   {
> >       u32 data, data1, orig;
> >       bool bL1SS = false;
> >       bool bClkReqSupport = true;
> >
> > -     if (!amdgpu_device_should_use_aspm(adev))
> > +     if (!amdgpu_device_should_use_aspm(adev) || !aspm_support_quirk_check())
> >               return;
>
> Can users still forcefully enable ASPM with the parameter `amdgpu.aspm`?
>
> >
> >       if (adev->flags & AMD_IS_APU ||
>
> If I remember correctly, there were also newer cards, where ASPM worked
> with Intel Alder Lake, right? Can only the problematic generations for
> WX3200 and RX640 be excluded from ASPM?

This patch only disables it for the generation that was problematic.

Alex

>
>
> Kind regards,
>
> Paul


* RE: [PATCHv4] drm/amdgpu: disable ASPM on Intel Alder Lake based systems
  2022-04-13 13:00   ` Alex Deucher
@ 2022-04-13 13:28     ` Limonciello, Mario
  2022-04-14  7:52     ` Paul Menzel
  1 sibling, 0 replies; 25+ messages in thread
From: Limonciello, Mario @ 2022-04-13 13:28 UTC (permalink / raw)
  To: Alex Deucher, Paul Menzel
  Cc: Gong, Richard, kernel test robot, Dave Airlie, Pan, Xinhui, LKML,
	amd-gfx list, Mailing list - DRI developers, Daniel Vetter,
	Deucher, Alexander, Koenig, Christian

[Public]

 
> On Wed, Apr 13, 2022 at 3:43 AM Paul Menzel <pmenzel@molgen.mpg.de> wrote:
> >
> > Dear Richard,
> >
> >
> > Thank you for sending out v4.
> >
> > Am 12.04.22 um 23:50 schrieb Richard Gong:
> > > Active State Power Management (ASPM) feature is enabled since kernel 5.14.
> > > There are some AMD GFX cards (such as WX3200 and RX640) that won't work
> > > with ASPM-enabled Intel Alder Lake based systems. Using these GFX cards as
> > > video/display output, Intel Alder Lake based systems will hang during
> > > suspend/resume.
> >
> > I am still not clear, what "hang during suspend/resume" means. I guess
> > suspending works fine? During resume (S3 or S0ix?), where does it hang?
> > The system is functional, but there are only display problems?

I believe Intel would need to identify the state of the SoC to determine where
the PCIe problem actually occurs: on the way down (suspend) or on the way up
(resume).

As he said in the commit message, it results in a hang.

> >
> > > The issue was initially reported on one system (Dell Precision 3660 with
> > > BIOS version 0.14.81), but was later confirmed to affect at least 4 Alder
> > > Lake based systems.
> > >
> > > Add extra check to disable ASPM on Intel Alder Lake based systems.
> > >
> > > Fixes: 0064b0ce85bb ("drm/amd/pm: enable ASPM by default")
> > > Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1885
> > > Reported-by: kernel test robot <lkp@intel.com>
> >
> > This tag is a little confusing. Maybe clarify that it was for an issue
> > in a previous patch iteration?
> >
> > > Signed-off-by: Richard Gong <richard.gong@amd.com>
> > > ---
> > > v4: s/CONFIG_X86_64/CONFIG_X86
> > >      enhanced check logic
> > > v3: s/intel_core_asom_chk/aspm_support_quirk_check
> > >      correct build error with W=1 option
> > > v2: correct commit description
> > >      move the check from chip family to problematic platform
> > > ---
> > >   drivers/gpu/drm/amd/amdgpu/vi.c | 17 ++++++++++++++++-
> > >   1 file changed, 16 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/vi.c b/drivers/gpu/drm/amd/amdgpu/vi.c
> > > index 039b90cdc3bc..b33e0a9bee65 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/vi.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/vi.c
> > > @@ -81,6 +81,10 @@
> > >   #include "mxgpu_vi.h"
> > >   #include "amdgpu_dm.h"
> > >
> > > +#if IS_ENABLED(CONFIG_X86)
> > > +#include <asm/intel-family.h>
> > > +#endif
> > > +
> > >   #define ixPCIE_LC_L1_PM_SUBSTATE    0x100100C6
> > >   #define PCIE_LC_L1_PM_SUBSTATE__LC_L1_SUBSTATES_OVERRIDE_EN_MASK    0x00000001L
> > >   #define PCIE_LC_L1_PM_SUBSTATE__LC_PCI_PM_L1_2_OVERRIDE_MASK        0x00000002L
> > > @@ -1134,13 +1138,24 @@ static void vi_enable_aspm(struct amdgpu_device *adev)
> > >               WREG32_PCIE(ixPCIE_LC_CNTL, data);
> > >   }
> > >
> > > +static bool aspm_support_quirk_check(void)
> > > +{
> > > +     if (IS_ENABLED(CONFIG_X86)) {
> > > +             struct cpuinfo_x86 *c = &cpu_data(0);
> > > +
> > > +             return !(c->x86 == 6 && c->x86_model == INTEL_FAM6_ALDERLAKE);

Don't you need to check x86_vendor as well? Although it is extremely unlikely,
without an x86_vendor check nothing stops another x86 manufacturer from having
a design with the same family and model # as INTEL_FAM6_ALDERLAKE.
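
To make the point concrete, here is a minimal userspace sketch of a check
that also compares the vendor. It is not the patch's code: struct cpu_id, the
VENDOR_* values and MODEL_ALDERLAKE are hypothetical stand-ins for the
kernel's struct cpuinfo_x86, X86_VENDOR_* and INTEL_FAM6_ALDERLAKE, picked
only for illustration.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; the numeric values are assumptions for the demo. */
enum { VENDOR_INTEL = 0, VENDOR_AMD = 2 };
#define MODEL_ALDERLAKE 0x97

struct cpu_id {
	unsigned char vendor;
	unsigned char family;
	unsigned char model;
};

/* Returns true when ASPM may be programmed, false on the quirked CPU. */
static bool aspm_allowed_on_cpu(const struct cpu_id *c)
{
	/* Family and model numbers are only meaningful per vendor. */
	bool is_alder_lake = c->vendor == VENDOR_INTEL &&
			     c->family == 6 &&
			     c->model == MODEL_ALDERLAKE;

	return !is_alder_lake;
}
```

With the vendor compared first, a non-Intel part that happened to report the
same family and model would not be caught by the quirk.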

> > > +     }
> > > +
> > > +     return true;
> > > +}
> > > +
> > >   static void vi_program_aspm(struct amdgpu_device *adev)
> > >   {
> > >       u32 data, data1, orig;
> > >       bool bL1SS = false;
> > >       bool bClkReqSupport = true;
> > >
> > > -     if (!amdgpu_device_should_use_aspm(adev))
> > > +     if (!amdgpu_device_should_use_aspm(adev) || !aspm_support_quirk_check())
> > >               return;
> >
> > Can users still forcefully enable ASPM with the parameter `amdgpu.aspm`?

amdgpu.aspm is module wide, not just for one card. That is, it covers all AMD GPU cards
in the system. If it's set to 1, or if pcie_aspm_enabled() returns true, ASPM will be enabled
for other GPUs besides these.

It would be possible to move this quirk check into amdgpu_device_should_use_aspm()
and only match this combination when amdgpu.aspm is set to -1 (the default), something
like this:

--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -1335,6 +1335,8 @@ bool amdgpu_device_should_use_aspm(struct amdgpu_device *adev)
        default:
                return false;
        }
+       if (amdgpu_device_is_quirked_for_aspm(adev))
+               return false;
        return pcie_aspm_enabled(adev->pdev);
 }
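
Spelled out as a standalone sketch, the intended behaviour would be roughly
as follows. The function and its three parameters are hypothetical stand-ins
for the amdgpu.aspm value, the result of the proposed
amdgpu_device_is_quirked_for_aspm() and the result of pcie_aspm_enabled();
treating 0 as "off" is an assumption, since the hunk above only shows the
tail of the switch.

```c
#include <stdbool.h>

/*
 * Userspace model of the decision only, not the driver function.
 * aspm_param: -1 = auto (default), 1 = forced on, 0 = assumed off.
 */
static bool should_use_aspm(int aspm_param, bool quirked, bool pcie_aspm_on)
{
	switch (aspm_param) {
	case -1:	/* auto: continue to the platform checks below */
		break;
	case 1:		/* forced on by the user, the quirk is not consulted */
		return true;
	default:	/* 0 or any unexpected value: off */
		return false;
	}

	/* Only reached in auto mode, so the quirk never overrides the user. */
	if (quirked)
		return false;

	return pcie_aspm_on;
}
```

With this shape, amdgpu.aspm=1 would still force ASPM on for a quirked
system, while the default of -1 would skip it there.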


> >
> > >
> > >       if (adev->flags & AMD_IS_APU ||
> >
> > If I remember correctly, there were also newer cards, where ASPM worked
> > with Intel Alder Lake, right? Can only the problematic generations for
> > WX3200 and RX640 be excluded from ASPM?
> 
> This patch only disables it for the generation that was problematic.
> 
> Alex
> 
> >
> >
> > Kind regards,
> >
> > Paul


* Re: [PATCHv4] drm/amdgpu: disable ASPM on Intel Alder Lake based systems
  2022-04-12 21:50 [PATCHv4] drm/amdgpu: disable ASPM on Intel Alder Lake based systems Richard Gong
  2022-04-13  4:29 ` Lazar, Lijo
  2022-04-13  7:43 ` Paul Menzel
@ 2022-04-13 15:40 ` Nathan Chancellor
  2022-04-19 21:08   ` Gong, Richard
  2 siblings, 1 reply; 25+ messages in thread
From: Nathan Chancellor @ 2022-04-13 15:40 UTC (permalink / raw)
  To: Richard Gong
  Cc: alexander.deucher, christian.koenig, xinhui.pan, airlied, daniel,
	amd-gfx, dri-devel, linux-kernel, mario.limonciello,
	kernel test robot

Hi Richard,

On Tue, Apr 12, 2022 at 04:50:00PM -0500, Richard Gong wrote:
> Active State Power Management (ASPM) feature is enabled since kernel 5.14.
> There are some AMD GFX cards (such as WX3200 and RX640) that won't work
> with ASPM-enabled Intel Alder Lake based systems. Using these GFX cards as
> video/display output, Intel Alder Lake based systems will hang during
> suspend/resume.
> 
> The issue was initially reported on one system (Dell Precision 3660 with
> BIOS version 0.14.81), but was later confirmed to affect at least 4 Alder
> Lake based systems.
> 
> Add extra check to disable ASPM on Intel Alder Lake based systems.
> 
> Fixes: 0064b0ce85bb ("drm/amd/pm: enable ASPM by default")
> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1885
> Reported-by: kernel test robot <lkp@intel.com>
> Signed-off-by: Richard Gong <richard.gong@amd.com>
> ---
> v4: s/CONFIG_X86_64/CONFIG_X86
>     enhanced check logic
> v3: s/intel_core_asom_chk/aspm_support_quirk_check
>     correct build error with W=1 option
> v2: correct commit description
>     move the check from chip family to problematic platform
> ---
>  drivers/gpu/drm/amd/amdgpu/vi.c | 17 ++++++++++++++++-
>  1 file changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/vi.c b/drivers/gpu/drm/amd/amdgpu/vi.c
> index 039b90cdc3bc..b33e0a9bee65 100644
> --- a/drivers/gpu/drm/amd/amdgpu/vi.c
> +++ b/drivers/gpu/drm/amd/amdgpu/vi.c
> @@ -81,6 +81,10 @@
>  #include "mxgpu_vi.h"
>  #include "amdgpu_dm.h"
>  
> +#if IS_ENABLED(CONFIG_X86)
> +#include <asm/intel-family.h>
> +#endif
> +
>  #define ixPCIE_LC_L1_PM_SUBSTATE	0x100100C6
>  #define PCIE_LC_L1_PM_SUBSTATE__LC_L1_SUBSTATES_OVERRIDE_EN_MASK	0x00000001L
>  #define PCIE_LC_L1_PM_SUBSTATE__LC_PCI_PM_L1_2_OVERRIDE_MASK	0x00000002L
> @@ -1134,13 +1138,24 @@ static void vi_enable_aspm(struct amdgpu_device *adev)
>  		WREG32_PCIE(ixPCIE_LC_CNTL, data);
>  }
>  
> +static bool aspm_support_quirk_check(void)
> +{
> +	if (IS_ENABLED(CONFIG_X86)) {
> +		struct cpuinfo_x86 *c = &cpu_data(0);
> +
> +		return !(c->x86 == 6 && c->x86_model == INTEL_FAM6_ALDERLAKE);
> +	}

I have not seen this reported by a bot, sorry if it is a duplicate. This
breaks non-x86 builds (arm64 allmodconfig for example):

drivers/gpu/drm/amd/amdgpu/vi.c:1144:28: error: implicit declaration of function 'cpu_data' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
                struct cpuinfo_x86 *c = &cpu_data(0);
                                         ^
drivers/gpu/drm/amd/amdgpu/vi.c:1144:27: error: cannot take the address of an rvalue of type 'int'
                struct cpuinfo_x86 *c = &cpu_data(0);
                                        ^~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/vi.c:1146:13: error: incomplete definition of type 'struct cpuinfo_x86'
                return !(c->x86 == 6 && c->x86_model == INTEL_FAM6_ALDERLAKE);
                         ~^
drivers/gpu/drm/amd/amdgpu/vi.c:1144:10: note: forward declaration of 'struct cpuinfo_x86'
                struct cpuinfo_x86 *c = &cpu_data(0);
                       ^
drivers/gpu/drm/amd/amdgpu/vi.c:1146:28: error: incomplete definition of type 'struct cpuinfo_x86'
                return !(c->x86 == 6 && c->x86_model == INTEL_FAM6_ALDERLAKE);
                                        ~^
drivers/gpu/drm/amd/amdgpu/vi.c:1144:10: note: forward declaration of 'struct cpuinfo_x86'
                struct cpuinfo_x86 *c = &cpu_data(0);
                       ^
drivers/gpu/drm/amd/amdgpu/vi.c:1146:43: error: use of undeclared identifier 'INTEL_FAM6_ALDERLAKE'
                return !(c->x86 == 6 && c->x86_model == INTEL_FAM6_ALDERLAKE);
                                                        ^
5 errors generated.

'struct cpuinfo_x86' is only defined for CONFIG_X86, so this section
needs to be guarded with the preprocessor, which is how it was done in v2.
Please go back to that.
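
For illustration, a minimal userspace sketch of the guarded form. The body of
a plain `if (IS_ENABLED(...))` is still parsed and type-checked on every
architecture, whereas a preprocessor guard removes it entirely. DEMO_X86,
struct demo_cpuinfo, demo_boot_cpu and DEMO_MODEL_ALDERLAKE are hypothetical
stand-ins for CONFIG_X86, struct cpuinfo_x86, cpu_data(0) and
INTEL_FAM6_ALDERLAKE; this is not the v2 code.

```c
#include <stdbool.h>

/*
 * Define DEMO_X86 to build the x86-only branch; leave it undefined to
 * mimic a build for another architecture such as arm64.
 */
#ifdef DEMO_X86
struct demo_cpuinfo {
	unsigned char family;
	unsigned char model;
};
#define DEMO_MODEL_ALDERLAKE 0x97
static struct demo_cpuinfo demo_boot_cpu = { 6, DEMO_MODEL_ALDERLAKE };
#endif

static bool aspm_support_quirk_check(void)
{
#ifdef DEMO_X86
	/* Only compiled when the type and constant above exist. */
	const struct demo_cpuinfo *c = &demo_boot_cpu;

	return !(c->family == 6 && c->model == DEMO_MODEL_ALDERLAKE);
#else
	/* No x86-only names are referenced here, so other arches build. */
	return true;
#endif
}
```

Built without DEMO_X86, the function reduces to `return true;` and none of
the x86-only identifiers are ever seen by the compiler.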

Cheers,
Nathan


* Re: [PATCHv4] drm/amdgpu: disable ASPM on Intel Alder Lake based systems
  2022-04-13 13:00   ` Alex Deucher
  2022-04-13 13:28     ` Limonciello, Mario
@ 2022-04-14  7:52     ` Paul Menzel
  2022-04-14 13:11       ` Alex Deucher
       [not found]       ` <94fd858d-1792-9c05-b5c6-1b028427687d@amd.com>
  1 sibling, 2 replies; 25+ messages in thread
From: Paul Menzel @ 2022-04-14  7:52 UTC (permalink / raw)
  To: Alex Deucher, Richard Gong
  Cc: Dave Airlie, xinhui pan, LKML, amd-gfx, dri-devel, Daniel Vetter,
	Alexander Deucher, Christian König, Mario Limonciello

[Cc: -kernel test robot <lkp@intel.com>]

Dear Alex, dear Richard,


Am 13.04.22 um 15:00 schrieb Alex Deucher:
> On Wed, Apr 13, 2022 at 3:43 AM Paul Menzel wrote:

>> Thank you for sending out v4.
>>
>> Am 12.04.22 um 23:50 schrieb Richard Gong:
>>> Active State Power Management (ASPM) feature is enabled since kernel 5.14.
>>> There are some AMD GFX cards (such as WX3200 and RX640) that won't work
>>> with ASPM-enabled Intel Alder Lake based systems. Using these GFX cards as
>>> video/display output, Intel Alder Lake based systems will hang during
>>> suspend/resume.
>>
>> I am still not clear, what “hang during suspend/resume” means. I guess
>> suspending works fine? During resume (S3 or S0ix?), where does it hang?
>> The system is functional, but there are only display problems?
>>
>>> The issue was initially reported on one system (Dell Precision 3660 with
>>> BIOS version 0.14.81), but was later confirmed to affect at least 4 Alder
>>> Lake based systems.
>>>
>>> Add extra check to disable ASPM on Intel Alder Lake based systems.
>>>
>>> Fixes: 0064b0ce85bb ("drm/amd/pm: enable ASPM by default")
>>> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1885
>>> Reported-by: kernel test robot <lkp@intel.com>
>>
>> This tag is a little confusing. Maybe clarify that it was for an issue
>> in a previous patch iteration?
>>
>>> Signed-off-by: Richard Gong <richard.gong@amd.com>
>>> ---
>>> v4: s/CONFIG_X86_64/CONFIG_X86
>>>       enhanced check logic
>>> v3: s/intel_core_asom_chk/aspm_support_quirk_check
>>>       correct build error with W=1 option
>>> v2: correct commit description
>>>       move the check from chip family to problematic platform
>>> ---
>>>    drivers/gpu/drm/amd/amdgpu/vi.c | 17 ++++++++++++++++-
>>>    1 file changed, 16 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/vi.c b/drivers/gpu/drm/amd/amdgpu/vi.c
>>> index 039b90cdc3bc..b33e0a9bee65 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/vi.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/vi.c
>>> @@ -81,6 +81,10 @@
>>>    #include "mxgpu_vi.h"
>>>    #include "amdgpu_dm.h"
>>>
>>> +#if IS_ENABLED(CONFIG_X86)
>>> +#include <asm/intel-family.h>
>>> +#endif
>>> +
>>>    #define ixPCIE_LC_L1_PM_SUBSTATE    0x100100C6
>>>    #define PCIE_LC_L1_PM_SUBSTATE__LC_L1_SUBSTATES_OVERRIDE_EN_MASK    0x00000001L
>>>    #define PCIE_LC_L1_PM_SUBSTATE__LC_PCI_PM_L1_2_OVERRIDE_MASK        0x00000002L
>>> @@ -1134,13 +1138,24 @@ static void vi_enable_aspm(struct amdgpu_device *adev)
>>>                WREG32_PCIE(ixPCIE_LC_CNTL, data);
>>>    }
>>>
>>> +static bool aspm_support_quirk_check(void)
>>> +{
>>> +     if (IS_ENABLED(CONFIG_X86)) {
>>> +             struct cpuinfo_x86 *c = &cpu_data(0);
>>> +
>>> +             return !(c->x86 == 6 && c->x86_model == INTEL_FAM6_ALDERLAKE);
>>> +     }
>>> +
>>> +     return true;
>>> +}
>>> +
>>>    static void vi_program_aspm(struct amdgpu_device *adev)
>>>    {
>>>        u32 data, data1, orig;
>>>        bool bL1SS = false;
>>>        bool bClkReqSupport = true;
>>>
>>> -     if (!amdgpu_device_should_use_aspm(adev))
>>> +     if (!amdgpu_device_should_use_aspm(adev) || !aspm_support_quirk_check())
>>>                return;
>>
>> Can users still forcefully enable ASPM with the parameter `amdgpu.aspm`?
>>
>>>
>>>        if (adev->flags & AMD_IS_APU ||
>>
>> If I remember correctly, there were also newer cards, where ASPM worked
>> with Intel Alder Lake, right? Can only the problematic generations for
>> WX3200 and RX640 be excluded from ASPM?
> 
> This patch only disables it for the generation that was problematic.

Could that please be made clear in the commit message summary, and message?

Loosely related, is there a public (or internal) issue to analyze how to
get ASPM working for VI generation devices with Intel Alder Lake?


Kind regards,

Paul


* Re: [PATCHv4] drm/amdgpu: disable ASPM on Intel Alder Lake based systems
  2022-04-14  7:52     ` Paul Menzel
@ 2022-04-14 13:11       ` Alex Deucher
       [not found]       ` <94fd858d-1792-9c05-b5c6-1b028427687d@amd.com>
  1 sibling, 0 replies; 25+ messages in thread
From: Alex Deucher @ 2022-04-14 13:11 UTC (permalink / raw)
  To: Paul Menzel
  Cc: Richard Gong, Dave Airlie, xinhui pan, LKML, amd-gfx list,
	Mailing list - DRI developers, Daniel Vetter, Alexander Deucher,
	Christian König, Mario Limonciello

On Thu, Apr 14, 2022 at 3:52 AM Paul Menzel <pmenzel@molgen.mpg.de> wrote:
>
> [Cc: -kernel test robot <lkp@intel.com>]
>
> Dear Alex, dear Richard,
>
>
> Am 13.04.22 um 15:00 schrieb Alex Deucher:
> > On Wed, Apr 13, 2022 at 3:43 AM Paul Menzel wrote:
>
> >> Thank you for sending out v4.
> >>
> >> Am 12.04.22 um 23:50 schrieb Richard Gong:
> >>> Active State Power Management (ASPM) feature is enabled since kernel 5.14.
> >>> There are some AMD GFX cards (such as WX3200 and RX640) that won't work
> >>> with ASPM-enabled Intel Alder Lake based systems. Using these GFX cards as
> >>> video/display output, Intel Alder Lake based systems will hang during
> >>> suspend/resume.
> >>
> >> I am still not clear, what “hang during suspend/resume” means. I guess
> >> suspending works fine? During resume (S3 or S0ix?), where does it hang?
> >> The system is functional, but there are only display problems?
> >>
> >>> The issue was initially reported on one system (Dell Precision 3660 with
> >>> BIOS version 0.14.81), but was later confirmed to affect at least 4 Alder
> >>> Lake based systems.
> >>>
> >>> Add extra check to disable ASPM on Intel Alder Lake based systems.
> >>>
> >>> Fixes: 0064b0ce85bb ("drm/amd/pm: enable ASPM by default")
> >>> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1885
> >>> Reported-by: kernel test robot <lkp@intel.com>
> >>
> >> This tag is a little confusing. Maybe clarify that it was for an issue
> >> in a previous patch iteration?
> >>
> >>> Signed-off-by: Richard Gong <richard.gong@amd.com>
> >>> ---
> >>> v4: s/CONFIG_X86_64/CONFIG_X86
> >>>       enhanced check logic
> >>> v3: s/intel_core_asom_chk/aspm_support_quirk_check
> >>>       correct build error with W=1 option
> >>> v2: correct commit description
> >>>       move the check from chip family to problematic platform
> >>> ---
> >>>    drivers/gpu/drm/amd/amdgpu/vi.c | 17 ++++++++++++++++-
> >>>    1 file changed, 16 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/vi.c b/drivers/gpu/drm/amd/amdgpu/vi.c
> >>> index 039b90cdc3bc..b33e0a9bee65 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/vi.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/vi.c
> >>> @@ -81,6 +81,10 @@
> >>>    #include "mxgpu_vi.h"
> >>>    #include "amdgpu_dm.h"
> >>>
> >>> +#if IS_ENABLED(CONFIG_X86)
> >>> +#include <asm/intel-family.h>
> >>> +#endif
> >>> +
> >>>    #define ixPCIE_LC_L1_PM_SUBSTATE    0x100100C6
> >>>    #define PCIE_LC_L1_PM_SUBSTATE__LC_L1_SUBSTATES_OVERRIDE_EN_MASK    0x00000001L
> >>>    #define PCIE_LC_L1_PM_SUBSTATE__LC_PCI_PM_L1_2_OVERRIDE_MASK        0x00000002L
> >>> @@ -1134,13 +1138,24 @@ static void vi_enable_aspm(struct amdgpu_device *adev)
> >>>                WREG32_PCIE(ixPCIE_LC_CNTL, data);
> >>>    }
> >>>
> >>> +static bool aspm_support_quirk_check(void)
> >>> +{
> >>> +     if (IS_ENABLED(CONFIG_X86)) {
> >>> +             struct cpuinfo_x86 *c = &cpu_data(0);
> >>> +
> >>> +             return !(c->x86 == 6 && c->x86_model == INTEL_FAM6_ALDERLAKE);
> >>> +     }
> >>> +
> >>> +     return true;
> >>> +}
> >>> +
> >>>    static void vi_program_aspm(struct amdgpu_device *adev)
> >>>    {
> >>>        u32 data, data1, orig;
> >>>        bool bL1SS = false;
> >>>        bool bClkReqSupport = true;
> >>>
> >>> -     if (!amdgpu_device_should_use_aspm(adev))
> >>> +     if (!amdgpu_device_should_use_aspm(adev) || !aspm_support_quirk_check())
> >>>                return;
> >>
> >> Can users still forcefully enable ASPM with the parameter `amdgpu.aspm`?
> >>
> >>>
> >>>        if (adev->flags & AMD_IS_APU ||
> >>
> >> If I remember correctly, there were also newer cards, where ASPM worked
> >> with Intel Alder Lake, right? Can only the problematic generations for
> >> WX3200 and RX640 be excluded from ASPM?
> >
> > This patch only disables it for the generation that was problematic.
>
> Could that please be made clear in the commit message summary, and message?

Sure.  Richard, please add that this only disables ASPM on VI parts
when in an Alder Lake system.

>
> Loosely related, is there a public (or internal issue) to analyze how to
> get ASPM working for VI generation devices with Intel Alder Lake?

We'd need support from Intel.  I'm not sure where things currently stand.

Alex


* Re: [PATCHv4] drm/amdgpu: disable ASPM on Intel Alder Lake based systems
  2022-04-13 15:40 ` Nathan Chancellor
@ 2022-04-19 21:08   ` Gong, Richard
  0 siblings, 0 replies; 25+ messages in thread
From: Gong, Richard @ 2022-04-19 21:08 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: alexander.deucher, christian.koenig, xinhui.pan, airlied, daniel,
	amd-gfx, dri-devel, linux-kernel, mario.limonciello,
	kernel test robot

Hi Nathan,

On 4/13/2022 10:40 AM, Nathan Chancellor wrote:
> Hi Richard,
>
> On Tue, Apr 12, 2022 at 04:50:00PM -0500, Richard Gong wrote:
>> Active State Power Management (ASPM) feature is enabled since kernel 5.14.
>> There are some AMD GFX cards (such as WX3200 and RX640) that won't work
>> with ASPM-enabled Intel Alder Lake based systems. Using these GFX cards as
>> video/display output, Intel Alder Lake based systems will hang during
>> suspend/resume.
>>
>> The issue was initially reported on one system (Dell Precision 3660 with
>> BIOS version 0.14.81), but was later confirmed to affect at least 4 Alder
>> Lake based systems.
>>
>> Add extra check to disable ASPM on Intel Alder Lake based systems.
>>
>> Fixes: 0064b0ce85bb ("drm/amd/pm: enable ASPM by default")
>> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1885
>> Reported-by: kernel test robot <lkp@intel.com>
>> Signed-off-by: Richard Gong <richard.gong@amd.com>
>> ---
>> v4: s/CONFIG_X86_64/CONFIG_X86
>>      enhanced check logic
>> v3: s/intel_core_asom_chk/aspm_support_quirk_check
>>      correct build error with W=1 option
>> v2: correct commit description
>>      move the check from chip family to problematic platform
>> ---
>>   drivers/gpu/drm/amd/amdgpu/vi.c | 17 ++++++++++++++++-
>>   1 file changed, 16 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/vi.c b/drivers/gpu/drm/amd/amdgpu/vi.c
>> index 039b90cdc3bc..b33e0a9bee65 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/vi.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/vi.c
>> @@ -81,6 +81,10 @@
>>   #include "mxgpu_vi.h"
>>   #include "amdgpu_dm.h"
>>   
>> +#if IS_ENABLED(CONFIG_X86)
>> +#include <asm/intel-family.h>
>> +#endif
>> +
>>   #define ixPCIE_LC_L1_PM_SUBSTATE	0x100100C6
>>   #define PCIE_LC_L1_PM_SUBSTATE__LC_L1_SUBSTATES_OVERRIDE_EN_MASK	0x00000001L
>>   #define PCIE_LC_L1_PM_SUBSTATE__LC_PCI_PM_L1_2_OVERRIDE_MASK	0x00000002L
>> @@ -1134,13 +1138,24 @@ static void vi_enable_aspm(struct amdgpu_device *adev)
>>   		WREG32_PCIE(ixPCIE_LC_CNTL, data);
>>   }
>>   
>> +static bool aspm_support_quirk_check(void)
>> +{
>> +	if (IS_ENABLED(CONFIG_X86)) {
>> +		struct cpuinfo_x86 *c = &cpu_data(0);
>> +
>> +		return !(c->x86 == 6 && c->x86_model == INTEL_FAM6_ALDERLAKE);
>> +	}
> I have not seen this reported by a bot, sorry if it is a duplicate. This
> breaks non-x86 builds (arm64 allmodconfig for example):
>
> drivers/gpu/drm/amd/amdgpu/vi.c:1144:28: error: implicit declaration of function 'cpu_data' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
>                  struct cpuinfo_x86 *c = &cpu_data(0);
>                                           ^
> drivers/gpu/drm/amd/amdgpu/vi.c:1144:27: error: cannot take the address of an rvalue of type 'int'
>                  struct cpuinfo_x86 *c = &cpu_data(0);
>                                          ^~~~~~~~~~~~
> drivers/gpu/drm/amd/amdgpu/vi.c:1146:13: error: incomplete definition of type 'struct cpuinfo_x86'
>                  return !(c->x86 == 6 && c->x86_model == INTEL_FAM6_ALDERLAKE);
>                           ~^
> drivers/gpu/drm/amd/amdgpu/vi.c:1144:10: note: forward declaration of 'struct cpuinfo_x86'
>                  struct cpuinfo_x86 *c = &cpu_data(0);
>                         ^
> drivers/gpu/drm/amd/amdgpu/vi.c:1146:28: error: incomplete definition of type 'struct cpuinfo_x86'
>                  return !(c->x86 == 6 && c->x86_model == INTEL_FAM6_ALDERLAKE);
>                                          ~^
> drivers/gpu/drm/amd/amdgpu/vi.c:1144:10: note: forward declaration of 'struct cpuinfo_x86'
>                  struct cpuinfo_x86 *c = &cpu_data(0);
>                         ^
> drivers/gpu/drm/amd/amdgpu/vi.c:1146:43: error: use of undeclared identifier 'INTEL_FAM6_ALDERLAKE'
>                  return !(c->x86 == 6 && c->x86_model == INTEL_FAM6_ALDERLAKE);
>                                                          ^
> 5 errors generated.
>
> 'struct cpuinfo_x86' is only defined for CONFIG_X86 so this section
> needs to guarded with the preprocessor, which is how it was done in v2.
> Please go back to that.

Thanks, I will do that.
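
For reference, a minimal userspace sketch of the preprocessor-guarded shape
suggested above. It is not the actual v2 or v5 code. The names
DEMO_CONFIG_X86, struct demo_cpuinfo, demo_boot_cpu and DEMO_MODEL_ALDERLAKE
are hypothetical stand-ins for CONFIG_X86, struct cpuinfo_x86, cpu_data(0)
and INTEL_FAM6_ALDERLAKE. The point it illustrates: the body of
`if (IS_ENABLED(...))` is still parsed and type-checked even when it is dead
code, whereas an `#if` block is removed before the compiler sees it, so
x86-only types never reach a non-x86 build.

```c
#include <stdbool.h>

/* Hypothetical stand-in for CONFIG_X86: set to 0 to mimic an arm64 build. */
#define DEMO_CONFIG_X86 1

#if DEMO_CONFIG_X86
/* Stand-ins for struct cpuinfo_x86 and INTEL_FAM6_ALDERLAKE (x86 only). */
struct demo_cpuinfo {
	unsigned int family;
	unsigned int model;
};
#define DEMO_MODEL_ALDERLAKE 0x97

/* Pretend the boot CPU is an Alder Lake part (assumption for the demo). */
static const struct demo_cpuinfo demo_boot_cpu = { 6, DEMO_MODEL_ALDERLAKE };
#endif

/* Returns true when ASPM may be programmed, false when the quirk applies. */
static bool aspm_support_quirk_check(void)
{
#if DEMO_CONFIG_X86
	const struct demo_cpuinfo *c = &demo_boot_cpu;

	return !(c->family == 6 && c->model == DEMO_MODEL_ALDERLAKE);
#else
	/* The x86-only types above are never seen by the compiler here. */
	return true;
#endif
}
```

With DEMO_CONFIG_X86 set to 0, the function collapses to `return true;` and
nothing x86-specific is referenced, which is the property the arm64
allmodconfig build needs.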

Regards,

Richard

>
> Cheers,
> Nathan


* Re: [PATCHv4] drm/amdgpu: disable ASPM on Intel Alder Lake based systems
       [not found]       ` <94fd858d-1792-9c05-b5c6-1b028427687d@amd.com>
@ 2022-04-20 20:29         ` Paul Menzel
  2022-04-20 20:40           ` Alex Deucher
  2022-04-21  1:12           ` Gong, Richard
  0 siblings, 2 replies; 25+ messages in thread
From: Paul Menzel @ 2022-04-20 20:29 UTC (permalink / raw)
  To: Richard Gong
  Cc: Alex Deucher, Dave Airlie, Xinhui Pan, LKML, dri-devel, amd-gfx,
	Daniel Vetter, Alexander Deucher, Christian König,
	Mario Limonciello

Dear Richard,


Am 19.04.22 um 23:46 schrieb Gong, Richard:

> On 4/14/2022 2:52 AM, Paul Menzel wrote:
>> [Cc: -kernel test robot <lkp@intel.com>]

[…]

>> Am 13.04.22 um 15:00 schrieb Alex Deucher:
>>> On Wed, Apr 13, 2022 at 3:43 AM Paul Menzel wrote:
>>
>>>> Thank you for sending out v4.
>>>>
>>>> Am 12.04.22 um 23:50 schrieb Richard Gong:
>>>>> Active State Power Management (ASPM) feature is enabled since 
>>>>> kernel 5.14.
>>>>> There are some AMD GFX cards (such as WX3200 and RX640) that won't 
>>>>> work
>>>>> with ASPM-enabled Intel Alder Lake based systems. Using these GFX 
>>>>> cards as
>>>>> video/display output, Intel Alder Lake based systems will hang during
>>>>> suspend/resume.

[Your email program wraps lines in cited text for some reason, making 
the citation harder to read.]

>>>>
>>>> I am still not clear, what “hang during suspend/resume” means. I guess
>>>> suspending works fine? During resume (S3 or S0ix?), where does it hang?
>>>> The system is functional, but there are only display problems?
> System freeze after suspend/resume.

But do you still see certain messages? At what point exactly does it
freeze? In the bug report, you posted Linux messages.

>>>>> The issue was initially reported on one system (Dell Precision 3660 
>>>>> with
>>>>> BIOS version 0.14.81), but was later confirmed to affect at least 4 
>>>>> Alder
>>>>> Lake based systems.
>>>>>
>>>>> Add extra check to disable ASPM on Intel Alder Lake based systems.
>>>>>
>>>>> Fixes: 0064b0ce85bb ("drm/amd/pm: enable ASPM by default")
>>>>> Link: 
>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fdrm%2Famd%2F-%2Fissues%2F1885&amp;data=04%7C01%7Crichard.gong%40amd.com%7Ce7febed5d6a441c3a58008da1debb99c%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637855195670542145%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=7cEnE%2BSM9e5IGFxSLloCLtCOxovBpaPz0Ns0Ta2vVlc%3D&amp;reserved=0

Thank you Microsoft Outlook for keeping us safe. :(

>>>>>
>>>>> Reported-by: kernel test robot <lkp@intel.com>
>>>>
>>>> This tag is a little confusing. Maybe clarify that it was for an issue
>>>> in a previous patch iteration?
> 
> I did describe in change-list version 3 below, which corrected the build 
> error with W=1 option.
> 
> It is not good idea to add the description for that to the commit 
> message, this is why I add descriptions on change-list version 3.

Do as you wish, but the current style is confusing, and readers of the
commit are going to think that the kernel test robot reported the
problem with AMD VI ASICs and Intel Alder Lake systems.

>>>>
>>>>> Signed-off-by: Richard Gong <richard.gong@amd.com>
>>>>> ---
>>>>> v4: s/CONFIG_X86_64/CONFIG_X86
>>>>>       enhanced check logic
>>>>> v3: s/intel_core_asom_chk/aspm_support_quirk_check
>>>>>       correct build error with W=1 option
>>>>> v2: correct commit description
>>>>>       move the check from chip family to problematic platform
>>>>> ---
>>>>>    drivers/gpu/drm/amd/amdgpu/vi.c | 17 ++++++++++++++++-
>>>>>    1 file changed, 16 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/vi.c 
>>>>> b/drivers/gpu/drm/amd/amdgpu/vi.c
>>>>> index 039b90cdc3bc..b33e0a9bee65 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/vi.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/vi.c
>>>>> @@ -81,6 +81,10 @@
>>>>>    #include "mxgpu_vi.h"
>>>>>    #include "amdgpu_dm.h"
>>>>>
>>>>> +#if IS_ENABLED(CONFIG_X86)
>>>>> +#include <asm/intel-family.h>
>>>>> +#endif
>>>>> +
>>>>>    #define ixPCIE_LC_L1_PM_SUBSTATE    0x100100C6
>>>>>    #define PCIE_LC_L1_PM_SUBSTATE__LC_L1_SUBSTATES_OVERRIDE_EN_MASK 
>>>>> 0x00000001L
>>>>>    #define PCIE_LC_L1_PM_SUBSTATE__LC_PCI_PM_L1_2_OVERRIDE_MASK 
>>>>> 0x00000002L
>>>>> @@ -1134,13 +1138,24 @@ static void vi_enable_aspm(struct 
>>>>> amdgpu_device *adev)
>>>>>                WREG32_PCIE(ixPCIE_LC_CNTL, data);
>>>>>    }
>>>>>
>>>>> +static bool aspm_support_quirk_check(void)
>>>>> +{
>>>>> +     if (IS_ENABLED(CONFIG_X86)) {
>>>>> +             struct cpuinfo_x86 *c = &cpu_data(0);
>>>>> +
>>>>> +             return !(c->x86 == 6 && c->x86_model == 
>>>>> INTEL_FAM6_ALDERLAKE);
>>>>> +     }
>>>>> +
>>>>> +     return true;
>>>>> +}
>>>>> +
>>>>>    static void vi_program_aspm(struct amdgpu_device *adev)
>>>>>    {
>>>>>        u32 data, data1, orig;
>>>>>        bool bL1SS = false;
>>>>>        bool bClkReqSupport = true;
>>>>>
>>>>> -     if (!amdgpu_device_should_use_aspm(adev))
>>>>> +     if (!amdgpu_device_should_use_aspm(adev) || 
>>>>> !aspm_support_quirk_check())
>>>>>                return;
>>>>
>>>> Can users still forcefully enable ASPM with the parameter 
>>>> `amdgpu.aspm`?
>>>>
> As Mario mentioned in a separate reply, we can't forcefully enable ASPM 
> with the parameter 'amdgpu.aspm'.

That would be a regression on systems where ASPM used to work. Hmm. I
guess you could say there are no such systems.

>>>>>
>>>>>        if (adev->flags & AMD_IS_APU ||
>>>>
>>>> If I remember correctly, there were also newer cards, where ASPM worked
>>>> with Intel Alder Lake, right? Can only the problematic generations for
>>>> WX3200 and RX640 be excluded from ASPM?
>>>
>>> This patch only disables it for the generation that was problematic.
>>
>> Could that please be made clear in the commit message summary, and 
>> message?
> 
> Are you ok with the commit messages below?

Please change the commit message summary. Maybe:

drm/amdgpu: VI: Disable ASPM on Intel Alder Lake based systems

> Active State Power Management (ASPM) feature is enabled since kernel 5.14.
> 
> There are some AMD GFX cards (such as WX3200 and RX640) that won't work
> with ASPM-enabled Intel Alder Lake based systems. Using these GFX cards as
> video/display output, Intel Alder Lake based systems will freeze after
> suspend/resume.

Something like:

On Intel Alder Lake based systems using ASPM with AMD GFX Volcanic 
Islands (VI) cards, like WX3200 and RX640, graphics don’t initialize 
when resuming from S0ix(?).


> The issue was initially reported on one system (Dell Precision 3660 with
> BIOS version 0.14.81), but was later confirmed to affect at least 4 Alder
> Lake based systems.

Which ones?

> Add extra check to disable ASPM on Intel Alder Lake based systems with
> problematic generation GFX cards.

… with the problematic Volcanic Islands GFX cards.

>>
>> Loosely related, is there a public (or internal issue) to analyze how 
>> to get ASPM working for VI generation devices with Intel Alder Lake?
> 
> As Alex mentioned, we need support from Intel. We don't have any update 
> on that.

It’d be great to get that fixed properly.

One last thing, and please don’t hate me: does Linux log that ASPM is
disabled?


Kind regards,

Paul


* Re: [PATCHv4] drm/amdgpu: disable ASPM on Intel Alder Lake based systems
  2022-04-20 20:29         ` Paul Menzel
@ 2022-04-20 20:40           ` Alex Deucher
  2022-04-20 20:48             ` Paul Menzel
  2022-04-21  1:12           ` Gong, Richard
  1 sibling, 1 reply; 25+ messages in thread
From: Alex Deucher @ 2022-04-20 20:40 UTC (permalink / raw)
  To: Paul Menzel
  Cc: Richard Gong, Dave Airlie, Xinhui Pan, LKML,
	Maling list - DRI developers, amd-gfx list, Daniel Vetter,
	Alexander Deucher, Christian König, Mario Limonciello

On Wed, Apr 20, 2022 at 4:29 PM Paul Menzel <pmenzel@molgen.mpg.de> wrote:
>
> Dear Richard,
>
>
> Am 19.04.22 um 23:46 schrieb Gong, Richard:
>
> > On 4/14/2022 2:52 AM, Paul Menzel wrote:
> >> [Cc: -kernel test robot <lkp@intel.com>]
>
> […]
>
> >> Am 13.04.22 um 15:00 schrieb Alex Deucher:
> >>> On Wed, Apr 13, 2022 at 3:43 AM Paul Menzel wrote:
> >>
> >>>> Thank you for sending out v4.
> >>>>
> >>>> Am 12.04.22 um 23:50 schrieb Richard Gong:
> >>>>> Active State Power Management (ASPM) feature is enabled since
> >>>>> kernel 5.14.
> >>>>> There are some AMD GFX cards (such as WX3200 and RX640) that won't
> >>>>> work
> >>>>> with ASPM-enabled Intel Alder Lake based systems. Using these GFX
> >>>>> cards as
> >>>>> video/display output, Intel Alder Lake based systems will hang during
> >>>>> suspend/resume.
>
> [Your email program wraps lines in cited text for some reason, making
> the citation harder to read.]
>
> >>>>
> >>>> I am still not clear, what “hang during suspend/resume” means. I guess
> >>>> suspending works fine? During resume (S3 or S0ix?), where does it hang?
> >>>> The system is functional, but there are only display problems?
> > System freeze after suspend/resume.
>
> But you see certain messages still? At what point does it freeze
> exactly? In the bug report you posted Linux messages.
>
> >>>>> The issue was initially reported on one system (Dell Precision 3660
> >>>>> with
> >>>>> BIOS version 0.14.81), but was later confirmed to affect at least 4
> >>>>> Alder
> >>>>> Lake based systems.
> >>>>>
> >>>>> Add extra check to disable ASPM on Intel Alder Lake based systems.
> >>>>>
> >>>>> Fixes: 0064b0ce85bb ("drm/amd/pm: enable ASPM by default")
> >>>>> Link:
> >>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fdrm%2Famd%2F-%2Fissues%2F1885&amp;data=04%7C01%7Crichard.gong%40amd.com%7Ce7febed5d6a441c3a58008da1debb99c%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637855195670542145%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=7cEnE%2BSM9e5IGFxSLloCLtCOxovBpaPz0Ns0Ta2vVlc%3D&amp;reserved=0
>
> Thank you Microsoft Outlook for keeping us safe. :(
>
> >>>>>
> >>>>> Reported-by: kernel test robot <lkp@intel.com>
> >>>>
> >>>> This tag is a little confusing. Maybe clarify that it was for an issue
> >>>> in a previous patch iteration?
> >
> > I did describe in change-list version 3 below, which corrected the build
> > error with W=1 option.
> >
> > It is not good idea to add the description for that to the commit
> > message, this is why I add descriptions on change-list version 3.
>
> Do as you wish, but the current style is confusing, and readers of the
> commit are going to think, the kernel test robot reported the problem
> with AMD VI ASICs and Intel Alder Lake systems.
>
> >>>>
> >>>>> Signed-off-by: Richard Gong <richard.gong@amd.com>
> >>>>> ---
> >>>>> v4: s/CONFIG_X86_64/CONFIG_X86
> >>>>>       enhanced check logic
> >>>>> v3: s/intel_core_asom_chk/aspm_support_quirk_check
> >>>>>       correct build error with W=1 option
> >>>>> v2: correct commit description
> >>>>>       move the check from chip family to problematic platform
> >>>>> ---
> >>>>>    drivers/gpu/drm/amd/amdgpu/vi.c | 17 ++++++++++++++++-
> >>>>>    1 file changed, 16 insertions(+), 1 deletion(-)
> >>>>>
> >>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/vi.c
> >>>>> b/drivers/gpu/drm/amd/amdgpu/vi.c
> >>>>> index 039b90cdc3bc..b33e0a9bee65 100644
> >>>>> --- a/drivers/gpu/drm/amd/amdgpu/vi.c
> >>>>> +++ b/drivers/gpu/drm/amd/amdgpu/vi.c
> >>>>> @@ -81,6 +81,10 @@
> >>>>>    #include "mxgpu_vi.h"
> >>>>>    #include "amdgpu_dm.h"
> >>>>>
> >>>>> +#if IS_ENABLED(CONFIG_X86)
> >>>>> +#include <asm/intel-family.h>
> >>>>> +#endif
> >>>>> +
> >>>>>    #define ixPCIE_LC_L1_PM_SUBSTATE    0x100100C6
> >>>>>    #define PCIE_LC_L1_PM_SUBSTATE__LC_L1_SUBSTATES_OVERRIDE_EN_MASK
> >>>>> 0x00000001L
> >>>>>    #define PCIE_LC_L1_PM_SUBSTATE__LC_PCI_PM_L1_2_OVERRIDE_MASK
> >>>>> 0x00000002L
> >>>>> @@ -1134,13 +1138,24 @@ static void vi_enable_aspm(struct
> >>>>> amdgpu_device *adev)
> >>>>>                WREG32_PCIE(ixPCIE_LC_CNTL, data);
> >>>>>    }
> >>>>>
> >>>>> +static bool aspm_support_quirk_check(void)
> >>>>> +{
> >>>>> +     if (IS_ENABLED(CONFIG_X86)) {
> >>>>> +             struct cpuinfo_x86 *c = &cpu_data(0);
> >>>>> +
> >>>>> +             return !(c->x86 == 6 && c->x86_model ==
> >>>>> INTEL_FAM6_ALDERLAKE);
> >>>>> +     }
> >>>>> +
> >>>>> +     return true;
> >>>>> +}
> >>>>> +
> >>>>>    static void vi_program_aspm(struct amdgpu_device *adev)
> >>>>>    {
> >>>>>        u32 data, data1, orig;
> >>>>>        bool bL1SS = false;
> >>>>>        bool bClkReqSupport = true;
> >>>>>
> >>>>> -     if (!amdgpu_device_should_use_aspm(adev))
> >>>>> +     if (!amdgpu_device_should_use_aspm(adev) ||
> >>>>> !aspm_support_quirk_check())
> >>>>>                return;
> >>>>
> >>>> Can users still forcefully enable ASPM with the parameter
> >>>> `amdgpu.aspm`?
> >>>>
> > As Mario mentioned in a separate reply, we can't forcefully enable ASPM
> > with the parameter 'amdgpu.aspm'.
>
> That would be a regression on systems where ASPM used to work. Hmm. I
> guess, you could say, there are no such systems.
>
> >>>>>
> >>>>>        if (adev->flags & AMD_IS_APU ||
> >>>>
> >>>> If I remember correctly, there were also newer cards, where ASPM worked
> >>>> with Intel Alder Lake, right? Can only the problematic generations for
> >>>> WX3200 and RX640 be excluded from ASPM?
> >>>
> >>> This patch only disables it for the generation that was problematic.
> >>
> >> Could that please be made clear in the commit message summary, and
> >> message?
> >
> > Are you ok with the commit messages below?
>
> Please change the commit message summary. Maybe:
>
> drm/amdgpu: VI: Disable ASPM on Intel Alder Lake based systems
>
> > Active State Power Management (ASPM) feature is enabled since kernel 5.14.
> >
> > There are some AMD GFX cards (such as WX3200 and RX640) that won't work
> > with ASPM-enabled Intel Alder Lake based systems. Using these GFX cards as
> > video/display output, Intel Alder Lake based systems will freeze after
> > suspend/resume.
>
> Something like:
>
> On Intel Alder Lake based systems using ASPM with AMD GFX Volcanic
> Islands (VI) cards, like WX3200 and RX640, graphics don’t initialize
> when resuming from S0ix(?).
>
>
> > The issue was initially reported on one system (Dell Precision 3660 with
> > BIOS version 0.14.81), but was later confirmed to affect at least 4 Alder
> > Lake based systems.
>
> Which ones?
>
> > Add extra check to disable ASPM on Intel Alder Lake based systems with
> > problematic generation GFX cards.
>
> … with the problematic Volcanic Islands GFX cards.
>
> >>
> >> Loosely related, is there a public (or internal issue) to analyze how
> >> to get ASPM working for VI generation devices with Intel Alder Lake?
> >
> > As Alex mentioned, we need support from Intel. We don't have any update
> > on that.
>
> It’d be great to get that fixed properly.
>
> Last thing, please don’t hate me, does Linux log, that ASPM is disabled?

I'm not sure what gets logged at the platform level with respect to
ASPM.  Whether or not the driver enables ASPM is tied to whether ASPM
is allowed at the platform level, so if the platform indicates that
ASPM is not supported, the driver won't enable it.  The driver does not
log whether ASPM is enabled or not, if that is what you are asking.  As
to whether or not it should, it comes down to how much stuff is worth
indicating in the log.  The driver is already pretty chatty by driver
standards.

Alex


* Re: [PATCHv4] drm/amdgpu: disable ASPM on Intel Alder Lake based systems
  2022-04-20 20:40           ` Alex Deucher
@ 2022-04-20 20:48             ` Paul Menzel
  2022-04-20 20:56               ` Gong, Richard
  0 siblings, 1 reply; 25+ messages in thread
From: Paul Menzel @ 2022-04-20 20:48 UTC (permalink / raw)
  To: Alex Deucher
  Cc: Dave Airlie, Richard Gong, Xinhui Pan, LKML, amd-gfx list,
	Maling list - DRI developers, Daniel Vetter, Alexander Deucher,
	Christian König, Mario Limonciello

Dear Alex,


Am 20.04.22 um 22:40 schrieb Alex Deucher:
> On Wed, Apr 20, 2022 at 4:29 PM Paul Menzel <pmenzel@molgen.mpg.de> wrote:

>> Am 19.04.22 um 23:46 schrieb Gong, Richard:
>>
>>> On 4/14/2022 2:52 AM, Paul Menzel wrote:
>>>> [Cc: -kernel test robot <lkp@intel.com>]
>>
>> […]
>>
>>>> Am 13.04.22 um 15:00 schrieb Alex Deucher:
>>>>> On Wed, Apr 13, 2022 at 3:43 AM Paul Menzel wrote:
>>>>
>>>>>> Thank you for sending out v4.
>>>>>>
>>>>>> Am 12.04.22 um 23:50 schrieb Richard Gong:
>>>>>>> Active State Power Management (ASPM) feature is enabled since
>>>>>>> kernel 5.14.
>>>>>>> There are some AMD GFX cards (such as WX3200 and RX640) that won't
>>>>>>> work
>>>>>>> with ASPM-enabled Intel Alder Lake based systems. Using these GFX
>>>>>>> cards as
>>>>>>> video/display output, Intel Alder Lake based systems will hang during
>>>>>>> suspend/resume.
>>
>> [Your email program wraps lines in cited text for some reason, making
>> the citation harder to read.]
>>
>>>>>>
>>>>>> I am still not clear, what “hang during suspend/resume” means. I guess
>>>>>> suspending works fine? During resume (S3 or S0ix?), where does it hang?
>>>>>> The system is functional, but there are only display problems?
>>> System freeze after suspend/resume.
>>
>> But you see certain messages still? At what point does it freeze
>> exactly? In the bug report you posted Linux messages.
>>
>>>>>>> The issue was initially reported on one system (Dell Precision 3660
>>>>>>> with
>>>>>>> BIOS version 0.14.81), but was later confirmed to affect at least 4
>>>>>>> Alder
>>>>>>> Lake based systems.
>>>>>>>
>>>>>>> Add extra check to disable ASPM on Intel Alder Lake based systems.
>>>>>>>
>>>>>>> Fixes: 0064b0ce85bb ("drm/amd/pm: enable ASPM by default")
>>>>>>> Link:
>>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fdrm%2Famd%2F-%2Fissues%2F1885&amp;data=04%7C01%7Crichard.gong%40amd.com%7Ce7febed5d6a441c3a58008da1debb99c%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637855195670542145%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=7cEnE%2BSM9e5IGFxSLloCLtCOxovBpaPz0Ns0Ta2vVlc%3D&amp;reserved=0
>>
>> Thank you Microsoft Outlook for keeping us safe. :(
>>
>>>>>>>
>>>>>>> Reported-by: kernel test robot <lkp@intel.com>
>>>>>>
>>>>>> This tag is a little confusing. Maybe clarify that it was for an issue
>>>>>> in a previous patch iteration?
>>>
>>> I did describe in change-list version 3 below, which corrected the build
>>> error with W=1 option.
>>>
>>> It is not good idea to add the description for that to the commit
>>> message, this is why I add descriptions on change-list version 3.
>>
>> Do as you wish, but the current style is confusing, and readers of the
>> commit are going to think, the kernel test robot reported the problem
>> with AMD VI ASICs and Intel Alder Lake systems.
>>
>>>>>>
>>>>>>> Signed-off-by: Richard Gong <richard.gong@amd.com>
>>>>>>> ---
>>>>>>> v4: s/CONFIG_X86_64/CONFIG_X86
>>>>>>>        enhanced check logic
>>>>>>> v3: s/intel_core_asom_chk/aspm_support_quirk_check
>>>>>>>        correct build error with W=1 option
>>>>>>> v2: correct commit description
>>>>>>>        move the check from chip family to problematic platform
>>>>>>> ---
>>>>>>>     drivers/gpu/drm/amd/amdgpu/vi.c | 17 ++++++++++++++++-
>>>>>>>     1 file changed, 16 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/vi.c
>>>>>>> b/drivers/gpu/drm/amd/amdgpu/vi.c
>>>>>>> index 039b90cdc3bc..b33e0a9bee65 100644
>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/vi.c
>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/vi.c
>>>>>>> @@ -81,6 +81,10 @@
>>>>>>>     #include "mxgpu_vi.h"
>>>>>>>     #include "amdgpu_dm.h"
>>>>>>>
>>>>>>> +#if IS_ENABLED(CONFIG_X86)
>>>>>>> +#include <asm/intel-family.h>
>>>>>>> +#endif
>>>>>>> +
>>>>>>>     #define ixPCIE_LC_L1_PM_SUBSTATE    0x100100C6
>>>>>>>     #define PCIE_LC_L1_PM_SUBSTATE__LC_L1_SUBSTATES_OVERRIDE_EN_MASK
>>>>>>> 0x00000001L
>>>>>>>     #define PCIE_LC_L1_PM_SUBSTATE__LC_PCI_PM_L1_2_OVERRIDE_MASK
>>>>>>> 0x00000002L
>>>>>>> @@ -1134,13 +1138,24 @@ static void vi_enable_aspm(struct
>>>>>>> amdgpu_device *adev)
>>>>>>>                 WREG32_PCIE(ixPCIE_LC_CNTL, data);
>>>>>>>     }
>>>>>>>
>>>>>>> +static bool aspm_support_quirk_check(void)
>>>>>>> +{
>>>>>>> +     if (IS_ENABLED(CONFIG_X86)) {
>>>>>>> +             struct cpuinfo_x86 *c = &cpu_data(0);
>>>>>>> +
>>>>>>> +             return !(c->x86 == 6 && c->x86_model ==
>>>>>>> INTEL_FAM6_ALDERLAKE);
>>>>>>> +     }
>>>>>>> +
>>>>>>> +     return true;
>>>>>>> +}
>>>>>>> +
>>>>>>>     static void vi_program_aspm(struct amdgpu_device *adev)
>>>>>>>     {
>>>>>>>         u32 data, data1, orig;
>>>>>>>         bool bL1SS = false;
>>>>>>>         bool bClkReqSupport = true;
>>>>>>>
>>>>>>> -     if (!amdgpu_device_should_use_aspm(adev))
>>>>>>> +     if (!amdgpu_device_should_use_aspm(adev) ||
>>>>>>> !aspm_support_quirk_check())
>>>>>>>                 return;
>>>>>>
>>>>>> Can users still forcefully enable ASPM with the parameter
>>>>>> `amdgpu.aspm`?
>>>>>>
>>> As Mario mentioned in a separate reply, we can't forcefully enable ASPM
>>> with the parameter 'amdgpu.aspm'.
>>
>> That would be a regression on systems where ASPM used to work. Hmm. I
>> guess, you could say, there are no such systems.
>>
>>>>>>>
>>>>>>>         if (adev->flags & AMD_IS_APU ||
>>>>>>
>>>>>> If I remember correctly, there were also newer cards, where ASPM worked
>>>>>> with Intel Alder Lake, right? Can only the problematic generations for
>>>>>> WX3200 and RX640 be excluded from ASPM?
>>>>>
>>>>> This patch only disables it for the generation that was problematic.
>>>>
>>>> Could that please be made clear in the commit message summary, and
>>>> message?
>>>
>>> Are you ok with the commit messages below?
>>
>> Please change the commit message summary. Maybe:
>>
>> drm/amdgpu: VI: Disable ASPM on Intel Alder Lake based systems
>>
>>> Active State Power Management (ASPM) feature is enabled since kernel 5.14.
>>>
>>> There are some AMD GFX cards (such as WX3200 and RX640) that won't work
>>> with ASPM-enabled Intel Alder Lake based systems. Using these GFX cards as
>>> video/display output, Intel Alder Lake based systems will freeze after
>>> suspend/resume.
>>
>> Something like:
>>
>> On Intel Alder Lake based systems using ASPM with AMD GFX Volcanic
>> Islands (VI) cards, like WX3200 and RX640, graphics don’t initialize
>> when resuming from S0ix(?).
>>
>>
>>> The issue was initially reported on one system (Dell Precision 3660 with
>>> BIOS version 0.14.81), but was later confirmed to affect at least 4 Alder
>>> Lake based systems.
>>
>> Which ones?
>>
>>> Add extra check to disable ASPM on Intel Alder Lake based systems with
>>> problematic generation GFX cards.
>>
>> … with the problematic Volcanic Islands GFX cards.
>>
>>>>
>>>> Loosely related, is there a public (or internal issue) to analyze how
>>>> to get ASPM working for VI generation devices with Intel Alder Lake?
>>>
>>> As Alex mentioned, we need support from Intel. We don't have any update
>>> on that.
>>
>> It’d be great to get that fixed properly.
>>
>> Last thing, please don’t hate me, does Linux log, that ASPM is disabled?
> 
> I'm not sure what gets logged at the platform level with respect to
> ASPM, but whether or not the driver enables ASPM is tied to whether
> ASPM is allowed at the platform level or not so if the platform
> indicates that ASPM is not supported, the driver won't enable it.  The
> driver does not log whether ASPM is enabled or not if that is what you
> are asking.  As to whether or not it should, it comes down to how much
> stuff is worth indicating in the log.  The driver is already pretty
> chatty by driver standards.

I specifically mean that Linux should log the quirks it applies. (As a
normal user, I’d also expect ASPM to work nowadays, so a message that
it’s disabled would help a lot.)
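
To make the request concrete, here is a rough userspace sketch of what
"log the quirk" could look like. It is an illustration only, not amdgpu
code: demo_should_program_aspm() and its message text are hypothetical, and
fprintf() stands in for whatever the driver would really use (presumably
something like dev_info()).

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: decide whether ASPM should be programmed and, when
 * the platform quirk is the reason it is not, say so in the log.
 */
static bool demo_should_program_aspm(bool platform_allows_aspm,
				     bool quirk_check_passed, FILE *log)
{
	/* Platform-level policy already forbids ASPM: nothing new to report. */
	if (!platform_allows_aspm)
		return false;

	/* The quirk overrides an otherwise working setup, so tell the user. */
	if (!quirk_check_passed) {
		fprintf(log, "demo: ASPM disabled by platform quirk\n");
		return false;
	}

	return true;
}
```

Only the quirk path logs anything, so the extra output would appear once per
affected device rather than on every system.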


Kind regards,

Paul


* Re: [PATCHv4] drm/amdgpu: disable ASPM on Intel Alder Lake based systems
  2022-04-20 20:48             ` Paul Menzel
@ 2022-04-20 20:56               ` Gong, Richard
  2022-04-20 21:02                 ` Paul Menzel
  0 siblings, 1 reply; 25+ messages in thread
From: Gong, Richard @ 2022-04-20 20:56 UTC (permalink / raw)
  To: Paul Menzel, Alex Deucher
  Cc: Dave Airlie, Xinhui Pan, LKML, amd-gfx list,
	Maling list - DRI developers, Daniel Vetter, Alexander Deucher,
	Christian König, Mario Limonciello

Hi Paul,

On 4/20/2022 3:48 PM, Paul Menzel wrote:
> Dear Alex,
>
>
> Am 20.04.22 um 22:40 schrieb Alex Deucher:
>> On Wed, Apr 20, 2022 at 4:29 PM Paul Menzel <pmenzel@molgen.mpg.de> 
>> wrote:
>
>>> Am 19.04.22 um 23:46 schrieb Gong, Richard:
>>>
>>>> On 4/14/2022 2:52 AM, Paul Menzel wrote:
>>>>> [Cc: -kernel test robot <lkp@intel.com>]
>>>
>>> […]
>>>
>>>>> Am 13.04.22 um 15:00 schrieb Alex Deucher:
>>>>>> On Wed, Apr 13, 2022 at 3:43 AM Paul Menzel wrote:
>>>>>
>>>>>>> Thank you for sending out v4.
>>>>>>>
>>>>>>> Am 12.04.22 um 23:50 schrieb Richard Gong:
>>>>>>>> Active State Power Management (ASPM) feature is enabled since
>>>>>>>> kernel 5.14.
>>>>>>>> There are some AMD GFX cards (such as WX3200 and RX640) that won't
>>>>>>>> work
>>>>>>>> with ASPM-enabled Intel Alder Lake based systems. Using these GFX
>>>>>>>> cards as
>>>>>>>> video/display output, Intel Alder Lake based systems will hang 
>>>>>>>> during
>>>>>>>> suspend/resume.
>>>
>>> [Your email program wraps lines in cited text for some reason, making
>>> the citation harder to read.]
>>>
>>>>>>>
>>>>>>> I am still not clear, what “hang during suspend/resume” means. I 
>>>>>>> guess
>>>>>>> suspending works fine? During resume (S3 or S0ix?), where does 
>>>>>>> it hang?
>>>>>>> The system is functional, but there are only display problems?
>>>> System freeze after suspend/resume.
>>>
>>> But you see certain messages still? At what point does it freeze
>>> exactly? In the bug report you posted Linux messages.
>>>
>>>>>>>> The issue was initially reported on one system (Dell Precision 
>>>>>>>> 3660
>>>>>>>> with
>>>>>>>> BIOS version 0.14.81), but was later confirmed to affect at 
>>>>>>>> least 4
>>>>>>>> Alder
>>>>>>>> Lake based systems.
>>>>>>>>
>>>>>>>> Add extra check to disable ASPM on Intel Alder Lake based systems.
>>>>>>>>
>>>>>>>> Fixes: 0064b0ce85bb ("drm/amd/pm: enable ASPM by default")
>>>>>>>> Link:
>>>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fdrm%2Famd%2F-%2Fissues%2F1885&amp;data=05%7C01%7Crichard.gong%40amd.com%7C487aaa63098b462e146a08da230f2319%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637860845178176835%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=3IVldn05qNa2XVp1Lu58SriS8k9mk4U9K9p3F3IYPe0%3D&amp;reserved=0 
>>>>>>>>
>>>
>>> Thank you Microsoft Outlook for keeping us safe. :(
>>>
>>>>>>>>
>>>>>>>> Reported-by: kernel test robot <lkp@intel.com>
>>>>>>>
>>>>>>> This tag is a little confusing. Maybe clarify that it was for an 
>>>>>>> issue
>>>>>>> in a previous patch iteration?
>>>>
>>>> I did describe in change-list version 3 below, which corrected the 
>>>> build
>>>> error with W=1 option.
>>>>
>>>> It is not a good idea to add the description for that to the commit
>>>> message, which is why I added it to the version 3 change list.
>>>
>>> Do as you wish, but the current style is confusing, and readers of the
>>> commit are going to think, the kernel test robot reported the problem
>>> with AMD VI ASICs and Intel Alder Lake systems.
>>>
>>>>>>>
>>>>>>>> Signed-off-by: Richard Gong <richard.gong@amd.com>
>>>>>>>> ---
>>>>>>>> v4: s/CONFIG_X86_64/CONFIG_X86
>>>>>>>>        enhanced check logic
>>>>>>>> v3: s/intel_core_asom_chk/aspm_support_quirk_check
>>>>>>>>        correct build error with W=1 option
>>>>>>>> v2: correct commit description
>>>>>>>>        move the check from chip family to problematic platform
>>>>>>>> ---
>>>>>>>>     drivers/gpu/drm/amd/amdgpu/vi.c | 17 ++++++++++++++++-
>>>>>>>>     1 file changed, 16 insertions(+), 1 deletion(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/vi.c
>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/vi.c
>>>>>>>> index 039b90cdc3bc..b33e0a9bee65 100644
>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/vi.c
>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/vi.c
>>>>>>>> @@ -81,6 +81,10 @@
>>>>>>>>     #include "mxgpu_vi.h"
>>>>>>>>     #include "amdgpu_dm.h"
>>>>>>>>
>>>>>>>> +#if IS_ENABLED(CONFIG_X86)
>>>>>>>> +#include <asm/intel-family.h>
>>>>>>>> +#endif
>>>>>>>> +
>>>>>>>>     #define ixPCIE_LC_L1_PM_SUBSTATE    0x100100C6
>>>>>>>>     #define 
>>>>>>>> PCIE_LC_L1_PM_SUBSTATE__LC_L1_SUBSTATES_OVERRIDE_EN_MASK
>>>>>>>> 0x00000001L
>>>>>>>>     #define PCIE_LC_L1_PM_SUBSTATE__LC_PCI_PM_L1_2_OVERRIDE_MASK
>>>>>>>> 0x00000002L
>>>>>>>> @@ -1134,13 +1138,24 @@ static void vi_enable_aspm(struct
>>>>>>>> amdgpu_device *adev)
>>>>>>>>                 WREG32_PCIE(ixPCIE_LC_CNTL, data);
>>>>>>>>     }
>>>>>>>>
>>>>>>>> +static bool aspm_support_quirk_check(void)
>>>>>>>> +{
>>>>>>>> +     if (IS_ENABLED(CONFIG_X86)) {
>>>>>>>> +             struct cpuinfo_x86 *c = &cpu_data(0);
>>>>>>>> +
>>>>>>>> +             return !(c->x86 == 6 && c->x86_model ==
>>>>>>>> INTEL_FAM6_ALDERLAKE);
>>>>>>>> +     }
>>>>>>>> +
>>>>>>>> +     return true;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>>     static void vi_program_aspm(struct amdgpu_device *adev)
>>>>>>>>     {
>>>>>>>>         u32 data, data1, orig;
>>>>>>>>         bool bL1SS = false;
>>>>>>>>         bool bClkReqSupport = true;
>>>>>>>>
>>>>>>>> -     if (!amdgpu_device_should_use_aspm(adev))
>>>>>>>> +     if (!amdgpu_device_should_use_aspm(adev) ||
>>>>>>>> !aspm_support_quirk_check())
>>>>>>>>                 return;
>>>>>>>
>>>>>>> Can users still forcefully enable ASPM with the parameter
>>>>>>> `amdgpu.aspm`?
>>>>>>>
>>>> As Mario mentioned in a separate reply, we can't forcefully enable 
>>>> ASPM
>>>> with the parameter 'amdgpu.aspm'.
>>>
>>> That would be a regression on systems where ASPM used to work. Hmm. I
>>> guess, you could say, there are no such systems.
>>>
>>>>>>>>
>>>>>>>>         if (adev->flags & AMD_IS_APU ||
>>>>>>>
>>>>>>> If I remember correctly, there were also newer cards, where ASPM 
>>>>>>> worked
>>>>>>> with Intel Alder Lake, right? Can only the problematic 
>>>>>>> generations for
>>>>>>> WX3200 and RX640 be excluded from ASPM?
>>>>>>
>>>>>> This patch only disables it for the generation that was
>>>>>> problematic.
>>>>>
>>>>> Could that please be made clear in the commit message summary, and
>>>>> message?
>>>>
>>>> Are you ok with the commit messages below?
>>>
>>> Please change the commit message summary. Maybe:
>>>
>>> drm/amdgpu: VI: Disable ASPM on Intel Alder Lake based systems
>>>
>>>> Active State Power Management (ASPM) feature is enabled since 
>>>> kernel 5.14.
>>>>
>>>> There are some AMD GFX cards (such as WX3200 and RX640) that won't 
>>>> work
>>>> with ASPM-enabled Intel Alder Lake based systems. Using these GFX 
>>>> cards as
>>>> video/display output, Intel Alder Lake based systems will freeze after
>>>> suspend/resume.
>>>
>>> Something like:
>>>
>>> On Intel Alder Lake based systems using ASPM with AMD GFX Volcanic
>>> Islands (VI) cards, like WX3200 and RX640, graphics don’t initialize
>>> when resuming from S0ix(?).
>>>
>>>
>>>> The issue was initially reported on one system (Dell Precision 3660 
>>>> with
>>>> BIOS version 0.14.81), but was later confirmed to affect at least 4 
>>>> Alder
>>>> Lake based systems.
>>>
>>> Which ones?
>>>
>>>> Add extra check to disable ASPM on Intel Alder Lake based systems with
>>>> problematic generation GFX cards.
>>>
>>> … with the problematic Volcanic Islands GFX cards.
>>>
>>>>>
>>>>> Loosely related, is there a public (or internal issue) to analyze how
>>>>> to get ASPM working for VI generation devices with Intel Alder Lake?
>>>>
>>>> As Alex mentioned, we need support from Intel. We don't have any 
>>>> update
>>>> on that.
>>>
>>> It’d be great to get that fixed properly.
>>>
>>> Last thing, please don’t hate me, does Linux log, that ASPM is 
>>> disabled?
>>
>> I'm not sure what gets logged at the platform level with respect to
>> ASPM, but whether or not the driver enables ASPM is tied to whether
>> ASPM is allowed at the platform level or not so if the platform
>> indicates that ASPM is not supported, the driver won't enable it.  The
>> driver does not log whether ASPM is enabled or not if that is what you
>> are asking.  As to whether or not it should, it comes down to how much
>> stuff is worth indicating in the log.  The driver is already pretty
>> chatty by driver standards.
>
> I specifically mean that Linux should log the quirks it applies. (As a
> normal user, I’d also expect ASPM to work nowadays, so a message that
> it’s disabled would help a lot.)

As a general rule, we shouldn't generate additional log messages unless
something went wrong with the system.

>
>
> Kind regards,
>
> Paul

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv4] drm/amdgpu: disable ASPM on Intel Alder Lake based systems
  2022-04-20 20:56               ` Gong, Richard
@ 2022-04-20 21:02                 ` Paul Menzel
  2022-04-20 21:12                   ` Gong, Richard
  2022-04-20 21:13                   ` Alex Deucher
  0 siblings, 2 replies; 25+ messages in thread
From: Paul Menzel @ 2022-04-20 21:02 UTC (permalink / raw)
  To: Richard Gong
  Cc: Alex Deucher, Dave Airlie, Xinhui Pan, LKML, dri-devel, amd-gfx,
	Daniel Vetter, Alexander Deucher, Christian König,
	Mario Limonciello

Dear Richard,


Am 20.04.22 um 22:56 schrieb Gong, Richard:

> On 4/20/2022 3:48 PM, Paul Menzel wrote:

>> Am 20.04.22 um 22:40 schrieb Alex Deucher:
>>> On Wed, Apr 20, 2022 at 4:29 PM Paul Menzel <pmenzel@molgen.mpg.de> 
>>> wrote:
>>
>>>> Am 19.04.22 um 23:46 schrieb Gong, Richard:
>>>>
>>>>> On 4/14/2022 2:52 AM, Paul Menzel wrote:
>>>>>> [Cc: -kernel test robot <lkp@intel.com>]
>>>>
>>>> […]
>>>>
>>>>>> Am 13.04.22 um 15:00 schrieb Alex Deucher:
>>>>>>> On Wed, Apr 13, 2022 at 3:43 AM Paul Menzel wrote:
>>>>>>
>>>>>>>> Thank you for sending out v4.
>>>>>>>>
>>>>>>>> Am 12.04.22 um 23:50 schrieb Richard Gong:
>>>>>>>>> Active State Power Management (ASPM) feature is enabled since
>>>>>>>>> kernel 5.14.
>>>>>>>>> There are some AMD GFX cards (such as WX3200 and RX640) that won't
>>>>>>>>> work
>>>>>>>>> with ASPM-enabled Intel Alder Lake based systems. Using these GFX
>>>>>>>>> cards as
>>>>>>>>> video/display output, Intel Alder Lake based systems will hang 
>>>>>>>>> during
>>>>>>>>> suspend/resume.
>>>>
>>>> [Your email program wraps lines in cited text for some reason, making
>>>> the citation harder to read.]
>>>>
>>>>>>>>
>>>>>>>> I am still not clear, what “hang during suspend/resume” means. I 
>>>>>>>> guess
>>>>>>>> suspending works fine? During resume (S3 or S0ix?), where does 
>>>>>>>> it hang?
>>>>>>>> The system is functional, but there are only display problems?
>>>>> System freeze after suspend/resume.
>>>>
>>>> But you see certain messages still? At what point does it freeze
>>>> exactly? In the bug report you posted Linux messages.
>>>>
>>>>>>>>> The issue was initially reported on one system (Dell Precision 
>>>>>>>>> 3660
>>>>>>>>> with
>>>>>>>>> BIOS version 0.14.81), but was later confirmed to affect at 
>>>>>>>>> least 4
>>>>>>>>> Alder
>>>>>>>>> Lake based systems.
>>>>>>>>>
>>>>>>>>> Add extra check to disable ASPM on Intel Alder Lake based systems.
>>>>>>>>>
>>>>>>>>> Fixes: 0064b0ce85bb ("drm/amd/pm: enable ASPM by default")
>>>>>>>>> Link:
>>>>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fdrm%2Famd%2F-%2Fissues%2F1885&amp;data=05%7C01%7Crichard.gong%40amd.com%7C487aaa63098b462e146a08da230f2319%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637860845178176835%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=3IVldn05qNa2XVp1Lu58SriS8k9mk4U9K9p3F3IYPe0%3D&amp;reserved=0 
>>>>>>>>>
>>>>
>>>> Thank you Microsoft Outlook for keeping us safe. :(
>>>>
>>>>>>>>>
>>>>>>>>> Reported-by: kernel test robot <lkp@intel.com>
>>>>>>>>
>>>>>>>> This tag is a little confusing. Maybe clarify that it was for an 
>>>>>>>> issue
>>>>>>>> in a previous patch iteration?
>>>>>
>>>>> I did describe in change-list version 3 below, which corrected the 
>>>>> build
>>>>> error with W=1 option.
>>>>>
>>>>> It is not a good idea to add the description for that to the commit
>>>>> message, which is why I added it to the version 3 change list.
>>>>
>>>> Do as you wish, but the current style is confusing, and readers of the
>>>> commit are going to think, the kernel test robot reported the problem
>>>> with AMD VI ASICs and Intel Alder Lake systems.
>>>>
>>>>>>>>
>>>>>>>>> Signed-off-by: Richard Gong <richard.gong@amd.com>
>>>>>>>>> ---
>>>>>>>>> v4: s/CONFIG_X86_64/CONFIG_X86
>>>>>>>>>        enhanced check logic
>>>>>>>>> v3: s/intel_core_asom_chk/aspm_support_quirk_check
>>>>>>>>>        correct build error with W=1 option
>>>>>>>>> v2: correct commit description
>>>>>>>>>        move the check from chip family to problematic platform
>>>>>>>>> ---
>>>>>>>>>     drivers/gpu/drm/amd/amdgpu/vi.c | 17 ++++++++++++++++-
>>>>>>>>>     1 file changed, 16 insertions(+), 1 deletion(-)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/vi.c
>>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/vi.c
>>>>>>>>> index 039b90cdc3bc..b33e0a9bee65 100644
>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/vi.c
>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/vi.c
>>>>>>>>> @@ -81,6 +81,10 @@
>>>>>>>>>     #include "mxgpu_vi.h"
>>>>>>>>>     #include "amdgpu_dm.h"
>>>>>>>>>
>>>>>>>>> +#if IS_ENABLED(CONFIG_X86)
>>>>>>>>> +#include <asm/intel-family.h>
>>>>>>>>> +#endif
>>>>>>>>> +
>>>>>>>>>     #define ixPCIE_LC_L1_PM_SUBSTATE    0x100100C6
>>>>>>>>>     #define 
>>>>>>>>> PCIE_LC_L1_PM_SUBSTATE__LC_L1_SUBSTATES_OVERRIDE_EN_MASK
>>>>>>>>> 0x00000001L
>>>>>>>>>     #define PCIE_LC_L1_PM_SUBSTATE__LC_PCI_PM_L1_2_OVERRIDE_MASK
>>>>>>>>> 0x00000002L
>>>>>>>>> @@ -1134,13 +1138,24 @@ static void vi_enable_aspm(struct
>>>>>>>>> amdgpu_device *adev)
>>>>>>>>>                 WREG32_PCIE(ixPCIE_LC_CNTL, data);
>>>>>>>>>     }
>>>>>>>>>
>>>>>>>>> +static bool aspm_support_quirk_check(void)
>>>>>>>>> +{
>>>>>>>>> +     if (IS_ENABLED(CONFIG_X86)) {
>>>>>>>>> +             struct cpuinfo_x86 *c = &cpu_data(0);
>>>>>>>>> +
>>>>>>>>> +             return !(c->x86 == 6 && c->x86_model ==
>>>>>>>>> INTEL_FAM6_ALDERLAKE);
>>>>>>>>> +     }
>>>>>>>>> +
>>>>>>>>> +     return true;
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>>     static void vi_program_aspm(struct amdgpu_device *adev)
>>>>>>>>>     {
>>>>>>>>>         u32 data, data1, orig;
>>>>>>>>>         bool bL1SS = false;
>>>>>>>>>         bool bClkReqSupport = true;
>>>>>>>>>
>>>>>>>>> -     if (!amdgpu_device_should_use_aspm(adev))
>>>>>>>>> +     if (!amdgpu_device_should_use_aspm(adev) ||
>>>>>>>>> !aspm_support_quirk_check())
>>>>>>>>>                 return;
>>>>>>>>
>>>>>>>> Can users still forcefully enable ASPM with the parameter
>>>>>>>> `amdgpu.aspm`?
>>>>>>>>
>>>>> As Mario mentioned in a separate reply, we can't forcefully enable 
>>>>> ASPM
>>>>> with the parameter 'amdgpu.aspm'.
>>>>
>>>> That would be a regression on systems where ASPM used to work. Hmm. I
>>>> guess, you could say, there are no such systems.
>>>>
>>>>>>>>>
>>>>>>>>>         if (adev->flags & AMD_IS_APU ||
>>>>>>>>
>>>>>>>> If I remember correctly, there were also newer cards, where ASPM 
>>>>>>>> worked
>>>>>>>> with Intel Alder Lake, right? Can only the problematic 
>>>>>>>> generations for
>>>>>>>> WX3200 and RX640 be excluded from ASPM?
>>>>>>>
>>>>>>> This patch only disables it for the generation that was
>>>>>>> problematic.
>>>>>>
>>>>>> Could that please be made clear in the commit message summary, and
>>>>>> message?
>>>>>
>>>>> Are you ok with the commit messages below?
>>>>
>>>> Please change the commit message summary. Maybe:
>>>>
>>>> drm/amdgpu: VI: Disable ASPM on Intel Alder Lake based systems
>>>>
>>>>> Active State Power Management (ASPM) feature is enabled since 
>>>>> kernel 5.14.
>>>>>
>>>>> There are some AMD GFX cards (such as WX3200 and RX640) that won't 
>>>>> work
>>>>> with ASPM-enabled Intel Alder Lake based systems. Using these GFX 
>>>>> cards as
>>>>> video/display output, Intel Alder Lake based systems will freeze after
>>>>> suspend/resume.
>>>>
>>>> Something like:
>>>>
>>>> On Intel Alder Lake based systems using ASPM with AMD GFX Volcanic
>>>> Islands (VI) cards, like WX3200 and RX640, graphics don’t initialize
>>>> when resuming from S0ix(?).
>>>>
>>>>
>>>>> The issue was initially reported on one system (Dell Precision 3660 
>>>>> with
>>>>> BIOS version 0.14.81), but was later confirmed to affect at least 4 
>>>>> Alder
>>>>> Lake based systems.
>>>>
>>>> Which ones?
>>>>
>>>>> Add extra check to disable ASPM on Intel Alder Lake based systems with
>>>>> problematic generation GFX cards.
>>>>
>>>> … with the problematic Volcanic Islands GFX cards.
>>>>
>>>>>>
>>>>>> Loosely related, is there a public (or internal issue) to analyze how
>>>>>> to get ASPM working for VI generation devices with Intel Alder Lake?
>>>>>
>>>>> As Alex mentioned, we need support from Intel. We don't have any 
>>>>> update
>>>>> on that.
>>>>
>>>> It’d be great to get that fixed properly.
>>>>
>>>> Last thing, please don’t hate me, does Linux log, that ASPM is 
>>>> disabled?
>>>
>>> I'm not sure what gets logged at the platform level with respect to
>>> ASPM, but whether or not the driver enables ASPM is tied to whether
>>> ASPM is allowed at the platform level or not so if the platform
>>> indicates that ASPM is not supported, the driver won't enable it.  The
>>> driver does not log whether ASPM is enabled or not if that is what you
>>> are asking.  As to whether or not it should, it comes down to how much
>>> stuff is worth indicating in the log.  The driver is already pretty
>>> chatty by driver standards.
>>
>> I specifically mean that Linux should log the quirks it applies. (As a
>> normal user, I’d also expect ASPM to work nowadays, so a message that
>> it’s disabled would help a lot.)
> 
> As a general rule, we shouldn't generate additional log messages unless
> something went wrong with the system.

Please run `dmesg` and see that your statement is false. That’s what log
levels are for, and in your case, it would be at least error level.
Also, I claim that something did indeed go wrong, because a quirk had to
be applied. So please add a notice-level log message saying that ASPM
gets disabled:

Disable ASPM on Alder Lake with Volcanic Islands card due to resume 
problems. System energy consumption might be higher than expected.


Kind regards,

Paul
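
The request above can be sketched in user-space C. This is only an
illustration under stated assumptions, not the driver's code: struct
cpu_id and aspm_quirk_check_logged() are hypothetical names, the buffer
stands in for a kernel dev_notice()/dev_warn() call, and the model value
0x97 is assumed to match INTEL_FAM6_ALDERLAKE in asm/intel-family.h.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the two cpuinfo_x86 fields the patch reads. */
struct cpu_id {
	unsigned int family;	/* c->x86 in the patch */
	unsigned int model;	/* c->x86_model in the patch */
};

/* Assumed value of INTEL_FAM6_ALDERLAKE; verify against asm/intel-family.h. */
#define ALDERLAKE_MODEL 0x97

/*
 * Returns true if ASPM may be programmed on this CPU. When the quirk
 * applies, the function returns false and writes the proposed notice into
 * msg so the caller can log it. In the driver, a dev_notice() or
 * dev_warn() call would take the place of the buffer.
 */
static bool aspm_quirk_check_logged(const struct cpu_id *c, char *msg, size_t len)
{
	if (len > 0)
		msg[0] = '\0';

	if (c->family == 6 && c->model == ALDERLAKE_MODEL) {
		snprintf(msg, len,
			 "Disable ASPM on Alder Lake with Volcanic Islands card "
			 "due to resume problems. System energy consumption "
			 "might be higher than expected.");
		return false;
	}

	return true;
}
```

A caller would print msg only when the function returns false, so the
message appears once per affected device and nothing is logged on
systems where the quirk does not apply.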

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv4] drm/amdgpu: disable ASPM on Intel Alder Lake based systems
  2022-04-20 21:02                 ` Paul Menzel
@ 2022-04-20 21:12                   ` Gong, Richard
  2022-04-20 21:15                     ` Alex Deucher
  2022-04-20 21:13                   ` Alex Deucher
  1 sibling, 1 reply; 25+ messages in thread
From: Gong, Richard @ 2022-04-20 21:12 UTC (permalink / raw)
  To: Paul Menzel
  Cc: Alex Deucher, Dave Airlie, Xinhui Pan, LKML, dri-devel, amd-gfx,
	Daniel Vetter, Alexander Deucher, Christian König,
	Mario Limonciello


On 4/20/2022 4:02 PM, Paul Menzel wrote:
> Dear Richard,
>
>
> Am 20.04.22 um 22:56 schrieb Gong, Richard:
>
>> On 4/20/2022 3:48 PM, Paul Menzel wrote:
>
>>> Am 20.04.22 um 22:40 schrieb Alex Deucher:
>>>> On Wed, Apr 20, 2022 at 4:29 PM Paul Menzel <pmenzel@molgen.mpg.de> 
>>>> wrote:
>>>
>>>>> Am 19.04.22 um 23:46 schrieb Gong, Richard:
>>>>>
>>>>>> On 4/14/2022 2:52 AM, Paul Menzel wrote:
>>>>>>> [Cc: -kernel test robot <lkp@intel.com>]
>>>>>
>>>>> […]
>>>>>
>>>>>>> Am 13.04.22 um 15:00 schrieb Alex Deucher:
>>>>>>>> On Wed, Apr 13, 2022 at 3:43 AM Paul Menzel wrote:
>>>>>>>
>>>>>>>>> Thank you for sending out v4.
>>>>>>>>>
>>>>>>>>> Am 12.04.22 um 23:50 schrieb Richard Gong:
>>>>>>>>>> Active State Power Management (ASPM) feature is enabled since
>>>>>>>>>> kernel 5.14.
>>>>>>>>>> There are some AMD GFX cards (such as WX3200 and RX640) that 
>>>>>>>>>> won't
>>>>>>>>>> work
>>>>>>>>>> with ASPM-enabled Intel Alder Lake based systems. Using these 
>>>>>>>>>> GFX
>>>>>>>>>> cards as
>>>>>>>>>> video/display output, Intel Alder Lake based systems will 
>>>>>>>>>> hang during
>>>>>>>>>> suspend/resume.
>>>>>
>>>>> [Your email program wraps lines in cited text for some reason, making
>>>>> the citation harder to read.]
>>>>>
>>>>>>>>>
>>>>>>>>> I am still not clear, what “hang during suspend/resume” means. 
>>>>>>>>> I guess
>>>>>>>>> suspending works fine? During resume (S3 or S0ix?), where does 
>>>>>>>>> it hang?
>>>>>>>>> The system is functional, but there are only display problems?
>>>>>> System freeze after suspend/resume.
>>>>>
>>>>> But you see certain messages still? At what point does it freeze
>>>>> exactly? In the bug report you posted Linux messages.
>>>>>
>>>>>>>>>> The issue was initially reported on one system (Dell 
>>>>>>>>>> Precision 3660
>>>>>>>>>> with
>>>>>>>>>> BIOS version 0.14.81), but was later confirmed to affect at 
>>>>>>>>>> least 4
>>>>>>>>>> Alder
>>>>>>>>>> Lake based systems.
>>>>>>>>>>
>>>>>>>>>> Add extra check to disable ASPM on Intel Alder Lake based 
>>>>>>>>>> systems.
>>>>>>>>>>
>>>>>>>>>> Fixes: 0064b0ce85bb ("drm/amd/pm: enable ASPM by default")
>>>>>>>>>> Link:
>>>>>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fdrm%2Famd%2F-%2Fissues%2F1885&amp;data=05%7C01%7Crichard.gong%40amd.com%7C509e0378edcf477605a708da231114f0%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637860853537880384%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=SoXDKGHUiiQN4rcL7FpCotouWFt0kkAbcHyO3esfNlE%3D&amp;reserved=0 
>>>>>>>>>>
>>>>>
>>>>> Thank you Microsoft Outlook for keeping us safe. :(
>>>>>
>>>>>>>>>>
>>>>>>>>>> Reported-by: kernel test robot <lkp@intel.com>
>>>>>>>>>
>>>>>>>>> This tag is a little confusing. Maybe clarify that it was for 
>>>>>>>>> an issue
>>>>>>>>> in a previous patch iteration?
>>>>>>
>>>>>> I did describe in change-list version 3 below, which corrected 
>>>>>> the build
>>>>>> error with W=1 option.
>>>>>>
>>>>>> It is not a good idea to add the description for that to the commit
>>>>>> message, which is why I added it to the version 3 change list.
>>>>>
>>>>> Do as you wish, but the current style is confusing, and readers of 
>>>>> the
>>>>> commit are going to think, the kernel test robot reported the problem
>>>>> with AMD VI ASICs and Intel Alder Lake systems.
>>>>>
>>>>>>>>>
>>>>>>>>>> Signed-off-by: Richard Gong <richard.gong@amd.com>
>>>>>>>>>> ---
>>>>>>>>>> v4: s/CONFIG_X86_64/CONFIG_X86
>>>>>>>>>>        enhanced check logic
>>>>>>>>>> v3: s/intel_core_asom_chk/aspm_support_quirk_check
>>>>>>>>>>        correct build error with W=1 option
>>>>>>>>>> v2: correct commit description
>>>>>>>>>>        move the check from chip family to problematic platform
>>>>>>>>>> ---
>>>>>>>>>>     drivers/gpu/drm/amd/amdgpu/vi.c | 17 ++++++++++++++++-
>>>>>>>>>>     1 file changed, 16 insertions(+), 1 deletion(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/vi.c
>>>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/vi.c
>>>>>>>>>> index 039b90cdc3bc..b33e0a9bee65 100644
>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/vi.c
>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/vi.c
>>>>>>>>>> @@ -81,6 +81,10 @@
>>>>>>>>>>     #include "mxgpu_vi.h"
>>>>>>>>>>     #include "amdgpu_dm.h"
>>>>>>>>>>
>>>>>>>>>> +#if IS_ENABLED(CONFIG_X86)
>>>>>>>>>> +#include <asm/intel-family.h>
>>>>>>>>>> +#endif
>>>>>>>>>> +
>>>>>>>>>>     #define ixPCIE_LC_L1_PM_SUBSTATE 0x100100C6
>>>>>>>>>>     #define 
>>>>>>>>>> PCIE_LC_L1_PM_SUBSTATE__LC_L1_SUBSTATES_OVERRIDE_EN_MASK
>>>>>>>>>> 0x00000001L
>>>>>>>>>>     #define PCIE_LC_L1_PM_SUBSTATE__LC_PCI_PM_L1_2_OVERRIDE_MASK
>>>>>>>>>> 0x00000002L
>>>>>>>>>> @@ -1134,13 +1138,24 @@ static void vi_enable_aspm(struct
>>>>>>>>>> amdgpu_device *adev)
>>>>>>>>>>                 WREG32_PCIE(ixPCIE_LC_CNTL, data);
>>>>>>>>>>     }
>>>>>>>>>>
>>>>>>>>>> +static bool aspm_support_quirk_check(void)
>>>>>>>>>> +{
>>>>>>>>>> +     if (IS_ENABLED(CONFIG_X86)) {
>>>>>>>>>> +             struct cpuinfo_x86 *c = &cpu_data(0);
>>>>>>>>>> +
>>>>>>>>>> +             return !(c->x86 == 6 && c->x86_model ==
>>>>>>>>>> INTEL_FAM6_ALDERLAKE);
>>>>>>>>>> +     }
>>>>>>>>>> +
>>>>>>>>>> +     return true;
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>>>     static void vi_program_aspm(struct amdgpu_device *adev)
>>>>>>>>>>     {
>>>>>>>>>>         u32 data, data1, orig;
>>>>>>>>>>         bool bL1SS = false;
>>>>>>>>>>         bool bClkReqSupport = true;
>>>>>>>>>>
>>>>>>>>>> -     if (!amdgpu_device_should_use_aspm(adev))
>>>>>>>>>> +     if (!amdgpu_device_should_use_aspm(adev) ||
>>>>>>>>>> !aspm_support_quirk_check())
>>>>>>>>>>                 return;
>>>>>>>>>
>>>>>>>>> Can users still forcefully enable ASPM with the parameter
>>>>>>>>> `amdgpu.aspm`?
>>>>>>>>>
>>>>>> As Mario mentioned in a separate reply, we can't forcefully 
>>>>>> enable ASPM
>>>>>> with the parameter 'amdgpu.aspm'.
>>>>>
>>>>> That would be a regression on systems where ASPM used to work. Hmm. I
>>>>> guess, you could say, there are no such systems.
>>>>>
>>>>>>>>>>
>>>>>>>>>>         if (adev->flags & AMD_IS_APU ||
>>>>>>>>>
>>>>>>>>> If I remember correctly, there were also newer cards, where 
>>>>>>>>> ASPM worked
>>>>>>>>> with Intel Alder Lake, right? Can only the problematic 
>>>>>>>>> generations for
>>>>>>>>> WX3200 and RX640 be excluded from ASPM?
>>>>>>>>
>>>>>>>> This patch only disables it for the generation that was
>>>>>>>> problematic.
>>>>>>>
>>>>>>> Could that please be made clear in the commit message summary, and
>>>>>>> message?
>>>>>>
>>>>>> Are you ok with the commit messages below?
>>>>>
>>>>> Please change the commit message summary. Maybe:
>>>>>
>>>>> drm/amdgpu: VI: Disable ASPM on Intel Alder Lake based systems
>>>>>
>>>>>> Active State Power Management (ASPM) feature is enabled since 
>>>>>> kernel 5.14.
>>>>>>
>>>>>> There are some AMD GFX cards (such as WX3200 and RX640) that 
>>>>>> won't work
>>>>>> with ASPM-enabled Intel Alder Lake based systems. Using these GFX 
>>>>>> cards as
>>>>>> video/display output, Intel Alder Lake based systems will freeze 
>>>>>> after
>>>>>> suspend/resume.
>>>>>
>>>>> Something like:
>>>>>
>>>>> On Intel Alder Lake based systems using ASPM with AMD GFX Volcanic
>>>>> Islands (VI) cards, like WX3200 and RX640, graphics don’t initialize
>>>>> when resuming from S0ix(?).
>>>>>
>>>>>
>>>>>> The issue was initially reported on one system (Dell Precision 
>>>>>> 3660 with
>>>>>> BIOS version 0.14.81), but was later confirmed to affect at least 
>>>>>> 4 Alder
>>>>>> Lake based systems.
>>>>>
>>>>> Which ones?
>>>>>
>>>>>> Add extra check to disable ASPM on Intel Alder Lake based systems 
>>>>>> with
>>>>>> problematic generation GFX cards.
>>>>>
>>>>> … with the problematic Volcanic Islands GFX cards.
>>>>>
>>>>>>>
>>>>>>> Loosely related, is there a public (or internal issue) to 
>>>>>>> analyze how
>>>>>>> to get ASPM working for VI generation devices with Intel Alder 
>>>>>>> Lake?
>>>>>>
>>>>>> As Alex mentioned, we need support from Intel. We don't have any 
>>>>>> update
>>>>>> on that.
>>>>>
>>>>> It’d be great to get that fixed properly.
>>>>>
>>>>> Last thing, please don’t hate me, does Linux log, that ASPM is 
>>>>> disabled?
>>>>
>>>> I'm not sure what gets logged at the platform level with respect to
>>>> ASPM, but whether or not the driver enables ASPM is tied to whether
>>>> ASPM is allowed at the platform level or not so if the platform
>>>> indicates that ASPM is not supported, the driver won't enable it.  The
>>>> driver does not log whether ASPM is enabled or not if that is what you
>>>> are asking.  As to whether or not it should, it comes down to how much
>>>> stuff is worth indicating in the log.  The driver is already pretty
>>>> chatty by driver standards.
>>>
>>> I specifically mean that Linux should log the quirks it applies. (As a
>>> normal user, I’d also expect ASPM to work nowadays, so a message
>>> that it’s disabled would help a lot.)
>>
>> As a general rule, we shouldn't generate additional log messages unless
>> something went wrong with the system.
>
> Please run `dmesg` and see that your statement is false. That’s what
> log levels are for, and in your case, it would be at least error
> level. Also, I claim that something did indeed go wrong, because a quirk
> had to be applied. So please add a notice-level log message saying that
> ASPM gets disabled:

From my previous experience with upstream, the maintainers simply don't
like adding logs unless absolutely needed.

I can add a pr_warn or dev_warn, but I can't guarantee that maintainers 
will take that in my case.

>
> Disable ASPM on Alder Lake with Volcanic Islands card due to resume 
> problems. System energy consumption might be higher than expected.
>
>
> Kind regards,
>
> Paul

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv4] drm/amdgpu: disable ASPM on Intel Alder Lake based systems
  2022-04-20 21:02                 ` Paul Menzel
  2022-04-20 21:12                   ` Gong, Richard
@ 2022-04-20 21:13                   ` Alex Deucher
  2022-04-20 21:16                     ` Limonciello, Mario
  1 sibling, 1 reply; 25+ messages in thread
From: Alex Deucher @ 2022-04-20 21:13 UTC (permalink / raw)
  To: Paul Menzel
  Cc: Richard Gong, Dave Airlie, Xinhui Pan, LKML,
	Mailing list - DRI developers, amd-gfx list, Daniel Vetter,
	Alexander Deucher, Christian König, Mario Limonciello

On Wed, Apr 20, 2022 at 5:02 PM Paul Menzel <pmenzel@molgen.mpg.de> wrote:
>
> Dear Richard,
>
>
> Am 20.04.22 um 22:56 schrieb Gong, Richard:
>
> > On 4/20/2022 3:48 PM, Paul Menzel wrote:
>
> >> Am 20.04.22 um 22:40 schrieb Alex Deucher:
> >>> On Wed, Apr 20, 2022 at 4:29 PM Paul Menzel <pmenzel@molgen.mpg.de>
> >>> wrote:
> >>
> >>>> Am 19.04.22 um 23:46 schrieb Gong, Richard:
> >>>>
> >>>>> On 4/14/2022 2:52 AM, Paul Menzel wrote:
> >>>>>> [Cc: -kernel test robot <lkp@intel.com>]
> >>>>
> >>>> […]
> >>>>
> >>>>>> Am 13.04.22 um 15:00 schrieb Alex Deucher:
> >>>>>>> On Wed, Apr 13, 2022 at 3:43 AM Paul Menzel wrote:
> >>>>>>
> >>>>>>>> Thank you for sending out v4.
> >>>>>>>>
> >>>>>>>> Am 12.04.22 um 23:50 schrieb Richard Gong:
> >>>>>>>>> Active State Power Management (ASPM) feature is enabled since
> >>>>>>>>> kernel 5.14.
> >>>>>>>>> There are some AMD GFX cards (such as WX3200 and RX640) that won't
> >>>>>>>>> work
> >>>>>>>>> with ASPM-enabled Intel Alder Lake based systems. Using these GFX
> >>>>>>>>> cards as
> >>>>>>>>> video/display output, Intel Alder Lake based systems will hang
> >>>>>>>>> during
> >>>>>>>>> suspend/resume.
> >>>>
> >>>> [Your email program wraps lines in cited text for some reason, making
> >>>> the citation harder to read.]
> >>>>
> >>>>>>>>
> >>>>>>>> I am still not clear, what “hang during suspend/resume” means. I
> >>>>>>>> guess
> >>>>>>>> suspending works fine? During resume (S3 or S0ix?), where does
> >>>>>>>> it hang?
> >>>>>>>> The system is functional, but there are only display problems?
> >>>>> System freeze after suspend/resume.
> >>>>
> >>>> But you see certain messages still? At what point does it freeze
> >>>> exactly? In the bug report you posted Linux messages.
> >>>>
> >>>>>>>>> The issue was initially reported on one system (Dell Precision
> >>>>>>>>> 3660
> >>>>>>>>> with
> >>>>>>>>> BIOS version 0.14.81), but was later confirmed to affect at
> >>>>>>>>> least 4
> >>>>>>>>> Alder
> >>>>>>>>> Lake based systems.
> >>>>>>>>>
> >>>>>>>>> Add extra check to disable ASPM on Intel Alder Lake based systems.
> >>>>>>>>>
> >>>>>>>>> Fixes: 0064b0ce85bb ("drm/amd/pm: enable ASPM by default")
> >>>>>>>>> Link:
> >>>>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fdrm%2Famd%2F-%2Fissues%2F1885&amp;data=05%7C01%7Crichard.gong%40amd.com%7C487aaa63098b462e146a08da230f2319%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637860845178176835%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=3IVldn05qNa2XVp1Lu58SriS8k9mk4U9K9p3F3IYPe0%3D&amp;reserved=0
> >>>>>>>>>
> >>>>
> >>>> Thank you Microsoft Outlook for keeping us safe. :(
> >>>>
> >>>>>>>>>
> >>>>>>>>> Reported-by: kernel test robot <lkp@intel.com>
> >>>>>>>>
> >>>>>>>> This tag is a little confusing. Maybe clarify that it was for an
> >>>>>>>> issue
> >>>>>>>> in a previous patch iteration?
> >>>>>
> >>>>> I did describe in change-list version 3 below, which corrected the
> >>>>> build
> >>>>> error with W=1 option.
> >>>>>
> >>>>> It is not good idea to add the description for that to the commit
> >>>>> message, this is why I add descriptions on change-list version 3.
> >>>>
> >>>> Do as you wish, but the current style is confusing, and readers of the
> >>>> commit are going to think, the kernel test robot reported the problem
> >>>> with AMD VI ASICs and Intel Alder Lake systems.
> >>>>
> >>>>>>>>
> >>>>>>>>> Signed-off-by: Richard Gong <richard.gong@amd.com>
> >>>>>>>>> ---
> >>>>>>>>> v4: s/CONFIG_X86_64/CONFIG_X86
> >>>>>>>>>        enhanced check logic
> >>>>>>>>> v3: s/intel_core_asom_chk/aspm_support_quirk_check
> >>>>>>>>>        correct build error with W=1 option
> >>>>>>>>> v2: correct commit description
> >>>>>>>>>        move the check from chip family to problematic platform
> >>>>>>>>> ---
> >>>>>>>>>     drivers/gpu/drm/amd/amdgpu/vi.c | 17 ++++++++++++++++-
> >>>>>>>>>     1 file changed, 16 insertions(+), 1 deletion(-)
> >>>>>>>>>
> >>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/vi.c
> >>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/vi.c
> >>>>>>>>> index 039b90cdc3bc..b33e0a9bee65 100644
> >>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/vi.c
> >>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/vi.c
> >>>>>>>>> @@ -81,6 +81,10 @@
> >>>>>>>>>     #include "mxgpu_vi.h"
> >>>>>>>>>     #include "amdgpu_dm.h"
> >>>>>>>>>
> >>>>>>>>> +#if IS_ENABLED(CONFIG_X86)
> >>>>>>>>> +#include <asm/intel-family.h>
> >>>>>>>>> +#endif
> >>>>>>>>> +
> >>>>>>>>>     #define ixPCIE_LC_L1_PM_SUBSTATE    0x100100C6
> >>>>>>>>>     #define
> >>>>>>>>> PCIE_LC_L1_PM_SUBSTATE__LC_L1_SUBSTATES_OVERRIDE_EN_MASK
> >>>>>>>>> 0x00000001L
> >>>>>>>>>     #define PCIE_LC_L1_PM_SUBSTATE__LC_PCI_PM_L1_2_OVERRIDE_MASK
> >>>>>>>>> 0x00000002L
> >>>>>>>>> @@ -1134,13 +1138,24 @@ static void vi_enable_aspm(struct
> >>>>>>>>> amdgpu_device *adev)
> >>>>>>>>>                 WREG32_PCIE(ixPCIE_LC_CNTL, data);
> >>>>>>>>>     }
> >>>>>>>>>
> >>>>>>>>> +static bool aspm_support_quirk_check(void)
> >>>>>>>>> +{
> >>>>>>>>> +     if (IS_ENABLED(CONFIG_X86)) {
> >>>>>>>>> +             struct cpuinfo_x86 *c = &cpu_data(0);
> >>>>>>>>> +
> >>>>>>>>> +             return !(c->x86 == 6 && c->x86_model ==
> >>>>>>>>> INTEL_FAM6_ALDERLAKE);
> >>>>>>>>> +     }
> >>>>>>>>> +
> >>>>>>>>> +     return true;
> >>>>>>>>> +}
> >>>>>>>>> +
> >>>>>>>>>     static void vi_program_aspm(struct amdgpu_device *adev)
> >>>>>>>>>     {
> >>>>>>>>>         u32 data, data1, orig;
> >>>>>>>>>         bool bL1SS = false;
> >>>>>>>>>         bool bClkReqSupport = true;
> >>>>>>>>>
> >>>>>>>>> -     if (!amdgpu_device_should_use_aspm(adev))
> >>>>>>>>> +     if (!amdgpu_device_should_use_aspm(adev) ||
> >>>>>>>>> !aspm_support_quirk_check())
> >>>>>>>>>                 return;
> >>>>>>>>
> >>>>>>>> Can users still forcefully enable ASPM with the parameter
> >>>>>>>> `amdgpu.aspm`?
> >>>>>>>>
> >>>>> As Mario mentioned in a separate reply, we can't forcefully enable
> >>>>> ASPM
> >>>>> with the parameter 'amdgpu.aspm'.
> >>>>
> >>>> That would be a regression on systems where ASPM used to work. Hmm. I
> >>>> guess, you could say, there are no such systems.
> >>>>
> >>>>>>>>>
> >>>>>>>>>         if (adev->flags & AMD_IS_APU ||
> >>>>>>>>
> >>>>>>>> If I remember correctly, there were also newer cards, where ASPM
> >>>>>>>> worked
> >>>>>>>> with Intel Alder Lake, right? Can only the problematic
> >>>>>>>> generations for
> >>>>>>>> WX3200 and RX640 be excluded from ASPM?
> >>>>>>>
> >>>>>>> This patch only disables it for the generation that was
> >>>>>>> problematic.
> >>>>>>
> >>>>>> Could that please be made clear in the commit message summary, and
> >>>>>> message?
> >>>>>
> >>>>> Are you ok with the commit messages below?
> >>>>
> >>>> Please change the commit message summary. Maybe:
> >>>>
> >>>> drm/amdgpu: VI: Disable ASPM on Intel Alder Lake based systems
> >>>>
> >>>>> Active State Power Management (ASPM) feature is enabled since
> >>>>> kernel 5.14.
> >>>>>
> >>>>> There are some AMD GFX cards (such as WX3200 and RX640) that won't
> >>>>> work
> >>>>> with ASPM-enabled Intel Alder Lake based systems. Using these GFX
> >>>>> cards as
> >>>>> video/display output, Intel Alder Lake based systems will freeze after
> >>>>> suspend/resume.
> >>>>
> >>>> Something like:
> >>>>
> >>>> On Intel Alder Lake based systems using ASPM with AMD GFX Volcanic
> >>>> Islands (VI) cards, like WX3200 and RX640, graphics don’t initialize
> >>>> when resuming from S0ix(?).
> >>>>
> >>>>
> >>>>> The issue was initially reported on one system (Dell Precision 3660
> >>>>> with
> >>>>> BIOS version 0.14.81), but was later confirmed to affect at least 4
> >>>>> Alder
> >>>>> Lake based systems.
> >>>>
> >>>> Which ones?
> >>>>
> >>>>> Add extra check to disable ASPM on Intel Alder Lake based systems with
> >>>>> problematic generation GFX cards.
> >>>>
> >>>> … with the problematic Volcanic Islands GFX cards.
> >>>>
> >>>>>>
> >>>>>> Loosely related, is there a public (or internal issue) to analyze how
> >>>>>> to get ASPM working for VI generation devices with Intel Alder Lake?
> >>>>>
> >>>>> As Alex mentioned, we need support from Intel. We don't have any
> >>>>> update
> >>>>> on that.
> >>>>
> >>>> It’d be great to get that fixed properly.
> >>>>
> >>>> Last thing, please don’t hate me, does Linux log, that ASPM is
> >>>> disabled?
> >>>
> >>> I'm not sure what gets logged at the platform level with respect to
> >>> ASPM, but whether or not the driver enables ASPM is tied to whether
> >>> ASPM is allowed at the platform level or not so if the platform
> >>> indicates that ASPM is not supported, the driver won't enable it.  The
> >>> driver does not log whether ASPM is enabled or not if that is what you
> >>> are asking.  As to whether or not it should, it comes down to how much
> >>> stuff is worth indicating in the log.  The driver is already pretty
> >>> chatty by driver standards.
> >>
> >> I specifically mean, Linux should log the quirks it applies. (As a
> >> normal user, I’d also expect ASPM to work nowadays, so a message, that
> >> it’s disabled would help a lot.)
> >
> > In general rule we shouldn't generate additional log unless something
> > went wrong with the system.
>
> Please run `dmesg` and see that your statement is false. That’s what log
> levels are for, and in your case, it would be at least error level.
> Also, I claim, something indeed went wrong, because a quirk had to be
> applied. So please add a notice log level, that ASPM gets disabled:
>
> Disable ASPM on Alder Lake with Volcanic Islands card due to resume
> problems. System energy consumption might be higher than expected.

ASPM does not save that much power.  I doubt you could really measure
it effectively without dedicated equipment.  Adding too many of these
types of messages just leads to lots of useless bug reports.  Users
see the message and file bugs.

Alex

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv4] drm/amdgpu: disable ASPM on Intel Alder Lake based systems
  2022-04-20 21:12                   ` Gong, Richard
@ 2022-04-20 21:15                     ` Alex Deucher
  0 siblings, 0 replies; 25+ messages in thread
From: Alex Deucher @ 2022-04-20 21:15 UTC (permalink / raw)
  To: Gong, Richard
  Cc: Paul Menzel, Dave Airlie, Xinhui Pan, LKML,
	Maling list - DRI developers, amd-gfx list, Daniel Vetter,
	Alexander Deucher, Christian König, Mario Limonciello

On Wed, Apr 20, 2022 at 5:13 PM Gong, Richard <richard.gong@amd.com> wrote:
>
>
> On 4/20/2022 4:02 PM, Paul Menzel wrote:
> > Dear Richard,
> >
> >
> > Am 20.04.22 um 22:56 schrieb Gong, Richard:
> >
> >> On 4/20/2022 3:48 PM, Paul Menzel wrote:
> >
> >>> Am 20.04.22 um 22:40 schrieb Alex Deucher:
> >>>> On Wed, Apr 20, 2022 at 4:29 PM Paul Menzel <pmenzel@molgen.mpg.de>
> >>>> wrote:
> >>>
> >>>>> Am 19.04.22 um 23:46 schrieb Gong, Richard:
> >>>>>
> >>>>>> On 4/14/2022 2:52 AM, Paul Menzel wrote:
> >>>>>>> [Cc: -kernel test robot <lkp@intel.com>]
> >>>>>
> >>>>> […]
> >>>>>
> >>>>>>> Am 13.04.22 um 15:00 schrieb Alex Deucher:
> >>>>>>>> On Wed, Apr 13, 2022 at 3:43 AM Paul Menzel wrote:
> >>>>>>>
> >>>>>>>>> Thank you for sending out v4.
> >>>>>>>>>
> >>>>>>>>> Am 12.04.22 um 23:50 schrieb Richard Gong:
> >>>>>>>>>> Active State Power Management (ASPM) feature is enabled since
> >>>>>>>>>> kernel 5.14.
> >>>>>>>>>> There are some AMD GFX cards (such as WX3200 and RX640) that
> >>>>>>>>>> won't
> >>>>>>>>>> work
> >>>>>>>>>> with ASPM-enabled Intel Alder Lake based systems. Using these
> >>>>>>>>>> GFX
> >>>>>>>>>> cards as
> >>>>>>>>>> video/display output, Intel Alder Lake based systems will
> >>>>>>>>>> hang during
> >>>>>>>>>> suspend/resume.
> >>>>>
> >>>>> [Your email program wraps lines in cited text for some reason, making
> >>>>> the citation harder to read.]
> >>>>>
> >>>>>>>>>
> >>>>>>>>> I am still not clear, what “hang during suspend/resume” means.
> >>>>>>>>> I guess
> >>>>>>>>> suspending works fine? During resume (S3 or S0ix?), where does
> >>>>>>>>> it hang?
> >>>>>>>>> The system is functional, but there are only display problems?
> >>>>>> System freeze after suspend/resume.
> >>>>>
> >>>>> But you see certain messages still? At what point does it freeze
> >>>>> exactly? In the bug report you posted Linux messages.
> >>>>>
> >>>>>>>>>> The issue was initially reported on one system (Dell
> >>>>>>>>>> Precision 3660
> >>>>>>>>>> with
> >>>>>>>>>> BIOS version 0.14.81), but was later confirmed to affect at
> >>>>>>>>>> least 4
> >>>>>>>>>> Alder
> >>>>>>>>>> Lake based systems.
> >>>>>>>>>>
> >>>>>>>>>> Add extra check to disable ASPM on Intel Alder Lake based
> >>>>>>>>>> systems.
> >>>>>>>>>>
> >>>>>>>>>> Fixes: 0064b0ce85bb ("drm/amd/pm: enable ASPM by default")
> >>>>>>>>>> Link:
> >>>>>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fdrm%2Famd%2F-%2Fissues%2F1885&amp;data=05%7C01%7Crichard.gong%40amd.com%7C509e0378edcf477605a708da231114f0%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637860853537880384%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=SoXDKGHUiiQN4rcL7FpCotouWFt0kkAbcHyO3esfNlE%3D&amp;reserved=0
> >>>>>>>>>>
> >>>>>
> >>>>> Thank you Microsoft Outlook for keeping us safe. :(
> >>>>>
> >>>>>>>>>>
> >>>>>>>>>> Reported-by: kernel test robot <lkp@intel.com>
> >>>>>>>>>
> >>>>>>>>> This tag is a little confusing. Maybe clarify that it was for
> >>>>>>>>> an issue
> >>>>>>>>> in a previous patch iteration?
> >>>>>>
> >>>>>> I did describe in change-list version 3 below, which corrected
> >>>>>> the build
> >>>>>> error with W=1 option.
> >>>>>>
> >>>>>> It is not good idea to add the description for that to the commit
> >>>>>> message, this is why I add descriptions on change-list version 3.
> >>>>>
> >>>>> Do as you wish, but the current style is confusing, and readers of
> >>>>> the
> >>>>> commit are going to think, the kernel test robot reported the problem
> >>>>> with AMD VI ASICs and Intel Alder Lake systems.
> >>>>>
> >>>>>>>>>
> >>>>>>>>>> Signed-off-by: Richard Gong <richard.gong@amd.com>
> >>>>>>>>>> ---
> >>>>>>>>>> v4: s/CONFIG_X86_64/CONFIG_X86
> >>>>>>>>>>        enhanced check logic
> >>>>>>>>>> v3: s/intel_core_asom_chk/aspm_support_quirk_check
> >>>>>>>>>>        correct build error with W=1 option
> >>>>>>>>>> v2: correct commit description
> >>>>>>>>>>        move the check from chip family to problematic platform
> >>>>>>>>>> ---
> >>>>>>>>>>     drivers/gpu/drm/amd/amdgpu/vi.c | 17 ++++++++++++++++-
> >>>>>>>>>>     1 file changed, 16 insertions(+), 1 deletion(-)
> >>>>>>>>>>
> >>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/vi.c
> >>>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/vi.c
> >>>>>>>>>> index 039b90cdc3bc..b33e0a9bee65 100644
> >>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/vi.c
> >>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/vi.c
> >>>>>>>>>> @@ -81,6 +81,10 @@
> >>>>>>>>>>     #include "mxgpu_vi.h"
> >>>>>>>>>>     #include "amdgpu_dm.h"
> >>>>>>>>>>
> >>>>>>>>>> +#if IS_ENABLED(CONFIG_X86)
> >>>>>>>>>> +#include <asm/intel-family.h>
> >>>>>>>>>> +#endif
> >>>>>>>>>> +
> >>>>>>>>>>     #define ixPCIE_LC_L1_PM_SUBSTATE 0x100100C6
> >>>>>>>>>>     #define
> >>>>>>>>>> PCIE_LC_L1_PM_SUBSTATE__LC_L1_SUBSTATES_OVERRIDE_EN_MASK
> >>>>>>>>>> 0x00000001L
> >>>>>>>>>>     #define PCIE_LC_L1_PM_SUBSTATE__LC_PCI_PM_L1_2_OVERRIDE_MASK
> >>>>>>>>>> 0x00000002L
> >>>>>>>>>> @@ -1134,13 +1138,24 @@ static void vi_enable_aspm(struct
> >>>>>>>>>> amdgpu_device *adev)
> >>>>>>>>>>                 WREG32_PCIE(ixPCIE_LC_CNTL, data);
> >>>>>>>>>>     }
> >>>>>>>>>>
> >>>>>>>>>> +static bool aspm_support_quirk_check(void)
> >>>>>>>>>> +{
> >>>>>>>>>> +     if (IS_ENABLED(CONFIG_X86)) {
> >>>>>>>>>> +             struct cpuinfo_x86 *c = &cpu_data(0);
> >>>>>>>>>> +
> >>>>>>>>>> +             return !(c->x86 == 6 && c->x86_model ==
> >>>>>>>>>> INTEL_FAM6_ALDERLAKE);
> >>>>>>>>>> +     }
> >>>>>>>>>> +
> >>>>>>>>>> +     return true;
> >>>>>>>>>> +}
> >>>>>>>>>> +
> >>>>>>>>>>     static void vi_program_aspm(struct amdgpu_device *adev)
> >>>>>>>>>>     {
> >>>>>>>>>>         u32 data, data1, orig;
> >>>>>>>>>>         bool bL1SS = false;
> >>>>>>>>>>         bool bClkReqSupport = true;
> >>>>>>>>>>
> >>>>>>>>>> -     if (!amdgpu_device_should_use_aspm(adev))
> >>>>>>>>>> +     if (!amdgpu_device_should_use_aspm(adev) ||
> >>>>>>>>>> !aspm_support_quirk_check())
> >>>>>>>>>>                 return;
> >>>>>>>>>
> >>>>>>>>> Can users still forcefully enable ASPM with the parameter
> >>>>>>>>> `amdgpu.aspm`?
> >>>>>>>>>
> >>>>>> As Mario mentioned in a separate reply, we can't forcefully
> >>>>>> enable ASPM
> >>>>>> with the parameter 'amdgpu.aspm'.
> >>>>>
> >>>>> That would be a regression on systems where ASPM used to work. Hmm. I
> >>>>> guess, you could say, there are no such systems.
> >>>>>
> >>>>>>>>>>
> >>>>>>>>>>         if (adev->flags & AMD_IS_APU ||
> >>>>>>>>>
> >>>>>>>>> If I remember correctly, there were also newer cards, where
> >>>>>>>>> ASPM worked
> >>>>>>>>> with Intel Alder Lake, right? Can only the problematic
> >>>>>>>>> generations for
> >>>>>>>>> WX3200 and RX640 be excluded from ASPM?
> >>>>>>>>
> >>>>>>>> This patch only disables it for the generation that was
> >>>>>>>> problematic.
> >>>>>>>
> >>>>>>> Could that please be made clear in the commit message summary, and
> >>>>>>> message?
> >>>>>>
> >>>>>> Are you ok with the commit messages below?
> >>>>>
> >>>>> Please change the commit message summary. Maybe:
> >>>>>
> >>>>> drm/amdgpu: VI: Disable ASPM on Intel Alder Lake based systems
> >>>>>
> >>>>>> Active State Power Management (ASPM) feature is enabled since
> >>>>>> kernel 5.14.
> >>>>>>
> >>>>>> There are some AMD GFX cards (such as WX3200 and RX640) that
> >>>>>> won't work
> >>>>>> with ASPM-enabled Intel Alder Lake based systems. Using these GFX
> >>>>>> cards as
> >>>>>> video/display output, Intel Alder Lake based systems will freeze
> >>>>>> after
> >>>>>> suspend/resume.
> >>>>>
> >>>>> Something like:
> >>>>>
> >>>>> On Intel Alder Lake based systems using ASPM with AMD GFX Volcanic
> >>>>> Islands (VI) cards, like WX3200 and RX640, graphics don’t initialize
> >>>>> when resuming from S0ix(?).
> >>>>>
> >>>>>
> >>>>>> The issue was initially reported on one system (Dell Precision
> >>>>>> 3660 with
> >>>>>> BIOS version 0.14.81), but was later confirmed to affect at least
> >>>>>> 4 Alder
> >>>>>> Lake based systems.
> >>>>>
> >>>>> Which ones?
> >>>>>
> >>>>>> Add extra check to disable ASPM on Intel Alder Lake based systems
> >>>>>> with
> >>>>>> problematic generation GFX cards.
> >>>>>
> >>>>> … with the problematic Volcanic Islands GFX cards.
> >>>>>
> >>>>>>>
> >>>>>>> Loosely related, is there a public (or internal issue) to
> >>>>>>> analyze how
> >>>>>>> to get ASPM working for VI generation devices with Intel Alder
> >>>>>>> Lake?
> >>>>>>
> >>>>>> As Alex mentioned, we need support from Intel. We don't have any
> >>>>>> update
> >>>>>> on that.
> >>>>>
> >>>>> It’d be great to get that fixed properly.
> >>>>>
> >>>>> Last thing, please don’t hate me, does Linux log, that ASPM is
> >>>>> disabled?
> >>>>
> >>>> I'm not sure what gets logged at the platform level with respect to
> >>>> ASPM, but whether or not the driver enables ASPM is tied to whether
> >>>> ASPM is allowed at the platform level or not so if the platform
> >>>> indicates that ASPM is not supported, the driver won't enable it.  The
> >>>> driver does not log whether ASPM is enabled or not if that is what you
> >>>> are asking.  As to whether or not it should, it comes down to how much
> >>>> stuff is worth indicating in the log.  The driver is already pretty
> >>>> chatty by driver standards.
> >>>
> >>> I specifically mean, Linux should log the quirks it applies. (As a
> >>> normal user, I’d also expect ASPM to work nowadays, so a message,
> >>> that it’s disabled would help a lot.)
> >>
> >> In general rule we shouldn't generate additional log unless something
> >> went wrong with the system.
> >
> > Please run `dmesg` and see that your statement is false. That’s what
> > log levels are for, and in your case, it would be at least error
> > level. Also, I claim, something indeed went wrong, because a quirk had
> > to be applied. So please add a notice log level, that ASPM gets disabled:
>
> From my previous experience with upstream, the maintainers simply don't
> like adding logs unless absolutely needed.
>
> I can add a pr_warn or dev_warn, but I can't guarantee that maintainers
> will take that in my case.

Certainly don't make it a warning.

Alex

>
> >
> > Disable ASPM on Alder Lake with Volcanic Islands card due to resume
> > problems. System energy consumption might be higher than expected.
> >
> >
> > Kind regards,
> >
> > Paul

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCHv4] drm/amdgpu: disable ASPM on Intel Alder Lake based systems
  2022-04-20 21:13                   ` Alex Deucher
@ 2022-04-20 21:16                     ` Limonciello, Mario
  0 siblings, 0 replies; 25+ messages in thread
From: Limonciello, Mario @ 2022-04-20 21:16 UTC (permalink / raw)
  To: Alex Deucher, Paul Menzel
  Cc: Gong, Richard, Dave Airlie, Pan, Xinhui, LKML,
	Maling list - DRI developers, amd-gfx list, Daniel Vetter,
	Deucher, Alexander, Koenig, Christian

[Public]



> -----Original Message-----
> From: Alex Deucher <alexdeucher@gmail.com>
> Sent: Wednesday, April 20, 2022 16:14
> To: Paul Menzel <pmenzel@molgen.mpg.de>
> Cc: Gong, Richard <Richard.Gong@amd.com>; Dave Airlie <airlied@linux.ie>;
> Pan, Xinhui <Xinhui.Pan@amd.com>; LKML <linux-kernel@vger.kernel.org>;
> Maling list - DRI developers <dri-devel@lists.freedesktop.org>; amd-gfx list
> <amd-gfx@lists.freedesktop.org>; Daniel Vetter <daniel@ffwll.ch>; Deucher,
> Alexander <Alexander.Deucher@amd.com>; Koenig, Christian
> <Christian.Koenig@amd.com>; Limonciello, Mario
> <Mario.Limonciello@amd.com>
> Subject: Re: [PATCHv4] drm/amdgpu: disable ASPM on Intel Alder Lake based
> systems
> 
> On Wed, Apr 20, 2022 at 5:02 PM Paul Menzel <pmenzel@molgen.mpg.de>
> wrote:
> >
> > Dear Richard,
> >
> >
> > Am 20.04.22 um 22:56 schrieb Gong, Richard:
> >
> > > On 4/20/2022 3:48 PM, Paul Menzel wrote:
> >
> > >> Am 20.04.22 um 22:40 schrieb Alex Deucher:
> > >>> On Wed, Apr 20, 2022 at 4:29 PM Paul Menzel
> <pmenzel@molgen.mpg.de>
> > >>> wrote:
> > >>
> > >>>> Am 19.04.22 um 23:46 schrieb Gong, Richard:
> > >>>>
> > >>>>> On 4/14/2022 2:52 AM, Paul Menzel wrote:
> > >>>>>> [Cc: -kernel test robot <lkp@intel.com>]
> > >>>>
> > >>>> [...]
> > >>>>
> > >>>>>> Am 13.04.22 um 15:00 schrieb Alex Deucher:
> > >>>>>>> On Wed, Apr 13, 2022 at 3:43 AM Paul Menzel wrote:
> > >>>>>>
> > >>>>>>>> Thank you for sending out v4.
> > >>>>>>>>
> > >>>>>>>> Am 12.04.22 um 23:50 schrieb Richard Gong:
> > >>>>>>>>> Active State Power Management (ASPM) feature is enabled since
> > >>>>>>>>> kernel 5.14.
> > >>>>>>>>> There are some AMD GFX cards (such as WX3200 and RX640) that
> won't
> > >>>>>>>>> work
> > >>>>>>>>> with ASPM-enabled Intel Alder Lake based systems. Using these
> GFX
> > >>>>>>>>> cards as
> > >>>>>>>>> video/display output, Intel Alder Lake based systems will hang
> > >>>>>>>>> during
> > >>>>>>>>> suspend/resume.
> > >>>>
> > >>>> [Your email program wraps lines in cited text for some reason, making
> > >>>> the citation harder to read.]
> > >>>>
> > >>>>>>>>
> > >>>>>>>> I am still not clear, what "hang during suspend/resume" means. I
> > >>>>>>>> guess
> > >>>>>>>> suspending works fine? During resume (S3 or S0ix?), where does
> > >>>>>>>> it hang?
> > >>>>>>>> The system is functional, but there are only display problems?
> > >>>>> System freeze after suspend/resume.
> > >>>>
> > >>>> But you see certain messages still? At what point does it freeze
> > >>>> exactly? In the bug report you posted Linux messages.
> > >>>>
> > >>>>>>>>> The issue was initially reported on one system (Dell Precision
> > >>>>>>>>> 3660
> > >>>>>>>>> with
> > >>>>>>>>> BIOS version 0.14.81), but was later confirmed to affect at
> > >>>>>>>>> least 4
> > >>>>>>>>> Alder
> > >>>>>>>>> Lake based systems.
> > >>>>>>>>>
> > >>>>>>>>> Add extra check to disable ASPM on Intel Alder Lake based
> systems.
> > >>>>>>>>>
> > >>>>>>>>> Fixes: 0064b0ce85bb ("drm/amd/pm: enable ASPM by default")
> > >>>>>>>>> Link:
> > >>>>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fdrm%2Famd%2F-%2Fissues%2F1885&amp;data=05%7C01%7Cmario.limonciello%40amd.com%7Ce74863210c324bc6fda608da2312b506%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637860860514174025%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=NUGXlybuH3volccVuN%2BGQ0kXwsOfCqM%2FwqHL6%2F%2FGYUc%3D&amp;reserved=0
> > >>>>>>>>>
> > >>>>
> > >>>> Thank you Microsoft Outlook for keeping us safe. :(
> > >>>>
> > >>>>>>>>>
> > >>>>>>>>> Reported-by: kernel test robot <lkp@intel.com>
> > >>>>>>>>
> > >>>>>>>> This tag is a little confusing. Maybe clarify that it was for an
> > >>>>>>>> issue
> > >>>>>>>> in a previous patch iteration?
> > >>>>>
> > >>>>> I did describe in change-list version 3 below, which corrected the
> > >>>>> build
> > >>>>> error with W=1 option.
> > >>>>>
> > >>>>> It is not good idea to add the description for that to the commit
> > >>>>> message, this is why I add descriptions on change-list version 3.
> > >>>>
> > >>>> Do as you wish, but the current style is confusing, and readers of the
> > >>>> commit are going to think, the kernel test robot reported the problem
> > >>>> with AMD VI ASICs and Intel Alder Lake systems.
> > >>>>
> > >>>>>>>>
> > >>>>>>>>> Signed-off-by: Richard Gong <richard.gong@amd.com>
> > >>>>>>>>> ---
> > >>>>>>>>> v4: s/CONFIG_X86_64/CONFIG_X86
> > >>>>>>>>>        enhanced check logic
> > >>>>>>>>> v3: s/intel_core_asom_chk/aspm_support_quirk_check
> > >>>>>>>>>        correct build error with W=1 option
> > >>>>>>>>> v2: correct commit description
> > >>>>>>>>>        move the check from chip family to problematic platform
> > >>>>>>>>> ---
> > >>>>>>>>>     drivers/gpu/drm/amd/amdgpu/vi.c | 17 ++++++++++++++++-
> > >>>>>>>>>     1 file changed, 16 insertions(+), 1 deletion(-)
> > >>>>>>>>>
> > >>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/vi.c
> > >>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/vi.c
> > >>>>>>>>> index 039b90cdc3bc..b33e0a9bee65 100644
> > >>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/vi.c
> > >>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/vi.c
> > >>>>>>>>> @@ -81,6 +81,10 @@
> > >>>>>>>>>     #include "mxgpu_vi.h"
> > >>>>>>>>>     #include "amdgpu_dm.h"
> > >>>>>>>>>
> > >>>>>>>>> +#if IS_ENABLED(CONFIG_X86)
> > >>>>>>>>> +#include <asm/intel-family.h>
> > >>>>>>>>> +#endif
> > >>>>>>>>> +
> > >>>>>>>>>     #define ixPCIE_LC_L1_PM_SUBSTATE    0x100100C6
> > >>>>>>>>>     #define
> > >>>>>>>>>
> PCIE_LC_L1_PM_SUBSTATE__LC_L1_SUBSTATES_OVERRIDE_EN_MASK
> > >>>>>>>>> 0x00000001L
> > >>>>>>>>>     #define
> PCIE_LC_L1_PM_SUBSTATE__LC_PCI_PM_L1_2_OVERRIDE_MASK
> > >>>>>>>>> 0x00000002L
> > >>>>>>>>> @@ -1134,13 +1138,24 @@ static void vi_enable_aspm(struct
> > >>>>>>>>> amdgpu_device *adev)
> > >>>>>>>>>                 WREG32_PCIE(ixPCIE_LC_CNTL, data);
> > >>>>>>>>>     }
> > >>>>>>>>>
> > >>>>>>>>> +static bool aspm_support_quirk_check(void)
> > >>>>>>>>> +{
> > >>>>>>>>> +     if (IS_ENABLED(CONFIG_X86)) {
> > >>>>>>>>> +             struct cpuinfo_x86 *c = &cpu_data(0);
> > >>>>>>>>> +
> > >>>>>>>>> +             return !(c->x86 == 6 && c->x86_model ==
> > >>>>>>>>> INTEL_FAM6_ALDERLAKE);
> > >>>>>>>>> +     }
> > >>>>>>>>> +
> > >>>>>>>>> +     return true;
> > >>>>>>>>> +}
> > >>>>>>>>> +
> > >>>>>>>>>     static void vi_program_aspm(struct amdgpu_device *adev)
> > >>>>>>>>>     {
> > >>>>>>>>>         u32 data, data1, orig;
> > >>>>>>>>>         bool bL1SS = false;
> > >>>>>>>>>         bool bClkReqSupport = true;
> > >>>>>>>>>
> > >>>>>>>>> -     if (!amdgpu_device_should_use_aspm(adev))
> > >>>>>>>>> +     if (!amdgpu_device_should_use_aspm(adev) ||
> > >>>>>>>>> !aspm_support_quirk_check())
> > >>>>>>>>>                 return;
> > >>>>>>>>
> > >>>>>>>> Can users still forcefully enable ASPM with the parameter
> > >>>>>>>> `amdgpu.aspm`?
> > >>>>>>>>
> > >>>>> As Mario mentioned in a separate reply, we can't forcefully enable
> > >>>>> ASPM
> > >>>>> with the parameter 'amdgpu.aspm'.
> > >>>>
> > >>>> That would be a regression on systems where ASPM used to work. Hmm. I
> > >>>> guess, you could say, there are no such systems.
> > >>>>
> > >>>>>>>>>
> > >>>>>>>>>         if (adev->flags & AMD_IS_APU ||
> > >>>>>>>>
> > >>>>>>>> If I remember correctly, there were also newer cards, where ASPM
> > >>>>>>>> worked
> > >>>>>>>> with Intel Alder Lake, right? Can only the problematic
> > >>>>>>>> generations for
> > >>>>>>>> WX3200 and RX640 be excluded from ASPM?
> > >>>>>>>
> > >>>>>>> This patch only disables it for the generation that was
> > >>>>>>> problematic.
> > >>>>>>
> > >>>>>> Could that please be made clear in the commit message summary, and
> > >>>>>> message?
> > >>>>>
> > >>>>> Are you ok with the commit messages below?
> > >>>>
> > >>>> Please change the commit message summary. Maybe:
> > >>>>
> > >>>> drm/amdgpu: VI: Disable ASPM on Intel Alder Lake based systems
> > >>>>
> > >>>>> Active State Power Management (ASPM) feature is enabled since
> > >>>>> kernel 5.14.
> > >>>>>
> > >>>>> There are some AMD GFX cards (such as WX3200 and RX640) that won't
> > >>>>> work
> > >>>>> with ASPM-enabled Intel Alder Lake based systems. Using these GFX
> > >>>>> cards as
> > >>>>> video/display output, Intel Alder Lake based systems will freeze after
> > >>>>> suspend/resume.
> > >>>>
> > >>>> Something like:
> > >>>>
> > >>>> On Intel Alder Lake based systems using ASPM with AMD GFX Volcanic
> > >>>> Islands (VI) cards, like WX3200 and RX640, graphics don't initialize
> > >>>> when resuming from S0ix(?).
> > >>>>
> > >>>>
> > >>>>> The issue was initially reported on one system (Dell Precision 3660
> > >>>>> with
> > >>>>> BIOS version 0.14.81), but was later confirmed to affect at least 4
> > >>>>> Alder
> > >>>>> Lake based systems.
> > >>>>
> > >>>> Which ones?
> > >>>>
> > >>>>> Add extra check to disable ASPM on Intel Alder Lake based systems with
> > >>>>> problematic generation GFX cards.
> > >>>>
> > >>>> ... with the problematic Volcanic Islands GFX cards.
> > >>>>
> > >>>>>>
> > >>>>>> Loosely related, is there a public (or internal issue) to analyze how
> > >>>>>> to get ASPM working for VI generation devices with Intel Alder Lake?
> > >>>>>
> > >>>>> As Alex mentioned, we need support from Intel. We don't have any
> > >>>>> update
> > >>>>> on that.
> > >>>>
> > >>>> It'd be great to get that fixed properly.
> > >>>>
> > >>>> Last thing, please don't hate me, does Linux log, that ASPM is
> > >>>> disabled?
> > >>>
> > >>> I'm not sure what gets logged at the platform level with respect to
> > >>> ASPM, but whether or not the driver enables ASPM is tied to whether
> > >>> ASPM is allowed at the platform level or not so if the platform
> > >>> indicates that ASPM is not supported, the driver won't enable it.  The
> > >>> driver does not log whether ASPM is enabled or not if that is what you
> > >>> are asking.  As to whether or not it should, it comes down to how much
> > >>> stuff is worth indicating in the log.  The driver is already pretty
> > >>> chatty by driver standards.
> > >>
> > >> I specifically mean, Linux should log the quirks it applies. (As a
> > >> normal user, I'd also expect ASPM to work nowadays, so a message, that
> > >> it's disabled would help a lot.)
> > >
> > > In general rule we shouldn't generate additional log unless something
> > > went wrong with the system.
> >
> > Please run `dmesg` and see that your statement is false. That's what log
> > levels are for, and in your case, it would be at least error level.
> > Also, I claim, something indeed went wrong, because a quirk had to be
> > applied. So please add a notice log level, that ASPM gets disabled:
> >
> > Disable ASPM on Alder Lake with Volcanic Islands card due to resume
> > problems. System energy consumption might be higher than expected.
> 
> ASPM does not save that much power.  I doubt you could really measure
> it effectively without dedicated equipment.  Adding too many of these
> types of messages just leads to lots of useless bug reports.  Users
> see the message and file bugs.

IMO warn- and error-level messages definitely lead to bug reports.  I've seen
plenty of reports filed, even from Paul, saying that the levels are wrong.

*If* there was a message added it should be info or notice.
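
For illustration only, here is a rough userspace sketch of what "log the
quirk at notice level" could look like. It is not driver code and not part
of the patch: struct cpu_id, aspm_quirk_allows() and aspm_quirk_message()
are hypothetical names, the message text is made up, and 0x97 is assumed to
be the value of INTEL_FAM6_ALDERLAKE from <asm/intel-family.h>. The enum
just mirrors the numeric printk levels (KERN_ERR through KERN_INFO).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Numeric values of the kernel's printk levels KERN_ERR .. KERN_INFO. */
enum log_level { LOG_ERR = 3, LOG_WARNING = 4, LOG_NOTICE = 5, LOG_INFO = 6 };

/* Hypothetical stand-in for the two cpuinfo_x86 fields the patch reads. */
struct cpu_id {
	unsigned int family;
	unsigned int model;
};

/* Assumed value of INTEL_FAM6_ALDERLAKE; check intel-family.h. */
#define MODEL_ALDERLAKE 0x97

/* Same condition as aspm_support_quirk_check(): false means keep ASPM off. */
static bool aspm_quirk_allows(const struct cpu_id *cpu)
{
	assert(cpu != NULL);
	return !(cpu->family == 6 && cpu->model == MODEL_ALDERLAKE);
}

/*
 * Returns the text to log when the quirk applies and stores the level in
 * *level; returns NULL (and leaves *level alone) when nothing is logged.
 */
static const char *aspm_quirk_message(const struct cpu_id *cpu,
				      enum log_level *level)
{
	if (aspm_quirk_allows(cpu))
		return NULL;

	*level = LOG_NOTICE;	/* notice, not warning or error */
	return "ASPM disabled on Intel Alder Lake for this GPU generation";
}
```

In the driver itself the equivalent would presumably be a single
dev_notice() or dev_info() call at the point where vi_program_aspm()
returns early, but whether to add one at all is the open question above.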

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv4] drm/amdgpu: disable ASPM on Intel Alder Lake based systems
  2022-04-20 20:29         ` Paul Menzel
  2022-04-20 20:40           ` Alex Deucher
@ 2022-04-21  1:12           ` Gong, Richard
  2022-04-21  5:35             ` Paul Menzel
  1 sibling, 1 reply; 25+ messages in thread
From: Gong, Richard @ 2022-04-21  1:12 UTC (permalink / raw)
  To: Paul Menzel
  Cc: Alex Deucher, Dave Airlie, Xinhui Pan, LKML, dri-devel, amd-gfx,
	Daniel Vetter, Alexander Deucher, Christian König,
	Mario Limonciello

Hi Paul,

On 4/20/2022 3:29 PM, Paul Menzel wrote:
> Dear Richard,
>
>
> Am 19.04.22 um 23:46 schrieb Gong, Richard:
>
>> On 4/14/2022 2:52 AM, Paul Menzel wrote:
>>> [Cc: -kernel test robot <lkp@intel.com>]
>
> […]
>
>>> Am 13.04.22 um 15:00 schrieb Alex Deucher:
>>>> On Wed, Apr 13, 2022 at 3:43 AM Paul Menzel wrote:
>>>
>>>>> Thank you for sending out v4.
>>>>>
>>>>> Am 12.04.22 um 23:50 schrieb Richard Gong:
>>>>>> Active State Power Management (ASPM) feature is enabled since 
>>>>>> kernel 5.14.
>>>>>> There are some AMD GFX cards (such as WX3200 and RX640) that 
>>>>>> won't work
>>>>>> with ASPM-enabled Intel Alder Lake based systems. Using these GFX 
>>>>>> cards as
>>>>>> video/display output, Intel Alder Lake based systems will hang 
>>>>>> during
>>>>>> suspend/resume.
>
> [Your email program wraps lines in cited text for some reason, making 
> the citation harder to read.]
>
Not sure why. I am using Mozilla Thunderbird for email; I am not using MS 
Outlook for upstream email.
>>>>>
>>>>> I am still not clear, what “hang during suspend/resume” means. I 
>>>>> guess
>>>>> suspending works fine? During resume (S3 or S0ix?), where does it 
>>>>> hang?
>>>>> The system is functional, but there are only display problems?
>> System freeze after suspend/resume.
>
> But you see certain messages still? At what point does it freeze 
> exactly? In the bug report you posted Linux messages.

No, the system freezes, and users have to power-cycle it to recover.

>
>>>>>> The issue was initially reported on one system (Dell Precision 
>>>>>> 3660 with
>>>>>> BIOS version 0.14.81), but was later confirmed to affect at least 
>>>>>> 4 Alder
>>>>>> Lake based systems.
>>>>>>
>>>>>> Add extra check to disable ASPM on Intel Alder Lake based systems.
>>>>>>
>>>>>> Fixes: 0064b0ce85bb ("drm/amd/pm: enable ASPM by default")
>>>>>> Link: 
>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fdrm%2Famd%2F-%2Fissues%2F1885&amp;data=05%7C01%7Crichard.gong%40amd.com%7Cce01de048c61456174ff08da230c750d%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637860833680922036%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=vqhh3dTc%2FgBt7GrP9hKppWlrFy2F7DaivkNEuGekl0g%3D&amp;reserved=0
>
> Thank you Microsoft Outlook for keeping us safe. :(
I am not using MS Outlook for the email exchanges.
>
>>>>>>
>>>>>> Reported-by: kernel test robot <lkp@intel.com>
>>>>>
>>>>> This tag is a little confusing. Maybe clarify that it was for an 
>>>>> issue
>>>>> in a previous patch iteration?
>>
>> I did describe in change-list version 3 below, which corrected the 
>> build error with W=1 option.
>>
>> It is not good idea to add the description for that to the commit 
>> message, this is why I add descriptions on change-list version 3.
>
> Do as you wish, but the current style is confusing, and readers of the 
> commit are going to think, the kernel test robot reported the problem 
> with AMD VI ASICs and Intel Alder Lake systems.
>
>>>>>
>>>>>> Signed-off-by: Richard Gong <richard.gong@amd.com>
>>>>>> ---
>>>>>> v4: s/CONFIG_X86_64/CONFIG_X86
>>>>>>       enhanced check logic
>>>>>> v3: s/intel_core_asom_chk/aspm_support_quirk_check
>>>>>>       correct build error with W=1 option
>>>>>> v2: correct commit description
>>>>>>       move the check from chip family to problematic platform
>>>>>> ---
>>>>>>    drivers/gpu/drm/amd/amdgpu/vi.c | 17 ++++++++++++++++-
>>>>>>    1 file changed, 16 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/vi.c 
>>>>>> b/drivers/gpu/drm/amd/amdgpu/vi.c
>>>>>> index 039b90cdc3bc..b33e0a9bee65 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/vi.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/vi.c
>>>>>> @@ -81,6 +81,10 @@
>>>>>>    #include "mxgpu_vi.h"
>>>>>>    #include "amdgpu_dm.h"
>>>>>>
>>>>>> +#if IS_ENABLED(CONFIG_X86)
>>>>>> +#include <asm/intel-family.h>
>>>>>> +#endif
>>>>>> +
>>>>>>    #define ixPCIE_LC_L1_PM_SUBSTATE    0x100100C6
>>>>>>    #define 
>>>>>> PCIE_LC_L1_PM_SUBSTATE__LC_L1_SUBSTATES_OVERRIDE_EN_MASK 0x00000001L
>>>>>>    #define PCIE_LC_L1_PM_SUBSTATE__LC_PCI_PM_L1_2_OVERRIDE_MASK 
>>>>>> 0x00000002L
>>>>>> @@ -1134,13 +1138,24 @@ static void vi_enable_aspm(struct 
>>>>>> amdgpu_device *adev)
>>>>>>                WREG32_PCIE(ixPCIE_LC_CNTL, data);
>>>>>>    }
>>>>>>
>>>>>> +static bool aspm_support_quirk_check(void)
>>>>>> +{
>>>>>> +     if (IS_ENABLED(CONFIG_X86)) {
>>>>>> +             struct cpuinfo_x86 *c = &cpu_data(0);
>>>>>> +
>>>>>> +             return !(c->x86 == 6 && c->x86_model == 
>>>>>> INTEL_FAM6_ALDERLAKE);
>>>>>> +     }
>>>>>> +
>>>>>> +     return true;
>>>>>> +}
>>>>>> +
>>>>>>    static void vi_program_aspm(struct amdgpu_device *adev)
>>>>>>    {
>>>>>>        u32 data, data1, orig;
>>>>>>        bool bL1SS = false;
>>>>>>        bool bClkReqSupport = true;
>>>>>>
>>>>>> -     if (!amdgpu_device_should_use_aspm(adev))
>>>>>> +     if (!amdgpu_device_should_use_aspm(adev) || 
>>>>>> !aspm_support_quirk_check())
>>>>>>                return;
>>>>>
>>>>> Can users still forcefully enable ASPM with the parameter 
>>>>> `amdgpu.aspm`?
>>>>>
>> As Mario mentioned in a separate reply, we can't forcefully enable 
>> ASPM with the parameter 'amdgpu.aspm'.
>
> That would be a regression on systems where ASPM used to work. Hmm. I 
> guess, you could say, there are no such systems.
>
>>>>>>
>>>>>>        if (adev->flags & AMD_IS_APU ||
>>>>>
>>>>> If I remember correctly, there were also newer cards, where ASPM 
>>>>> worked
>>>>> with Intel Alder Lake, right? Can only the problematic generations 
>>>>> for
>>>>> WX3200 and RX640 be excluded from ASPM?
>>>>
>>>> This patch only disables it for the generation that was problematic.
>>>
>>> Could that please be made clear in the commit message summary, and 
>>> message?
>>
>> Are you ok with the commit messages below?
>
> Please change the commit message summary. Maybe:
>
> drm/amdgpu: VI: Disable ASPM on Intel Alder Lake based systems
>
>> Active State Power Management (ASPM) feature is enabled since kernel 
>> 5.14.
>>
>> There are some AMD GFX cards (such as WX3200 and RX640) that won't work
>> with ASPM-enabled Intel Alder Lake based systems. Using these GFX 
>> cards as
>> video/display output, Intel Alder Lake based systems will freeze after
>> suspend/resume.
>
> Something like:
>
> On Intel Alder Lake based systems using ASPM with AMD GFX Volcanic 
> Islands (VI) cards, like WX3200 and RX640, graphics don’t initialize 
> when resuming from S0ix(?).
>
>
>> The issue was initially reported on one system (Dell Precision 3660 with
>> BIOS version 0.14.81), but was later confirmed to affect at least 4 
>> Alder
>> Lake based systems.
>
> Which ones?
Those are pre-production Alder Lake based OEM systems.
>
>> Add extra check to disable ASPM on Intel Alder Lake based systems with
>> problematic generation GFX cards.
>
> … with the problematic Volcanic Islands GFX cards.
>
>>>
>>> Loosely related, is there a public (or internal issue) to analyze 
>>> how to get ASPM working for VI generation devices with Intel Alder 
>>> Lake?
>>
>> As Alex mentioned, we need support from Intel. We don't have any 
>> update on that.
>
> It’d be great to get that fixed properly.
>
> Last thing, please don’t hate me, does Linux log, that ASPM is disabled?
>
>
> Kind regards,
>
> Paul

^ permalink raw reply	[flat|nested] 25+ messages in thread
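
An illustrative sketch (not part of the posted patch): the decision
discussed in the message above, modelled as a small user-space C file.
The struct cpu_id type, the MODEL_ALDERLAKE constant, the
should_program_aspm() helper and the log text are all hypothetical names
made up for this sketch; the v4 patch itself reads cpu_data(0) inside the
kernel and prints nothing. The value 0x97 is what INTEL_FAM6_ALDERLAKE is
believed to expand to in <asm/intel-family.h>; please check the header
before relying on it. The sketch shows two points raised in the review:
the quirk wins even when ASPM is otherwise requested, and a one-line
message would make the disablement visible in the log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the two cpuinfo_x86 fields the patch reads. */
struct cpu_id {
	unsigned int family;
	unsigned int model;
};

/* Assumed value of INTEL_FAM6_ALDERLAKE; verify against intel-family.h. */
#define MODEL_ALDERLAKE 0x97

/* Same predicate as the patch: false only on a family 6 Alder Lake CPU. */
static bool aspm_quirk_allows(const struct cpu_id *cpu)
{
	assert(cpu != NULL);
	return !(cpu->family == 6 && cpu->model == MODEL_ALDERLAKE);
}

/*
 * Hypothetical wrapper: driver_wants_aspm models the result of the
 * existing amdgpu_device_should_use_aspm() check. The quirk is tested
 * afterwards, so requesting ASPM cannot override it.
 */
static bool should_program_aspm(const struct cpu_id *cpu, bool driver_wants_aspm)
{
	if (!driver_wants_aspm)
		return false;

	if (!aspm_quirk_allows(cpu)) {
		/* Hypothetical log line; the posted patch stays silent. */
		fprintf(stderr, "ASPM left disabled: Alder Lake host quirk\n");
		return false;
	}

	return true;
}
```

Calling should_program_aspm() with family 6 and model MODEL_ALDERLAKE
returns false even when the second argument is true, which matches the
answer given above that ASPM cannot be forced on once the quirk applies.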

* Re: [PATCHv4] drm/amdgpu: disable ASPM on Intel Alder Lake based systems
  2022-04-21  1:12           ` Gong, Richard
@ 2022-04-21  5:35             ` Paul Menzel
  2022-04-26 13:53               ` Gong, Richard
  0 siblings, 1 reply; 25+ messages in thread
From: Paul Menzel @ 2022-04-21  5:35 UTC (permalink / raw)
  To: Richard Gong
  Cc: Dave Airlie, Xinhui Pan, LKML, amd-gfx, Alexander Deucher,
	dri-devel, Daniel Vetter, Alex Deucher, Christian König,
	Mario Limonciello

Dear Richard,


Am 21.04.22 um 03:12 schrieb Gong, Richard:

> On 4/20/2022 3:29 PM, Paul Menzel wrote:

>> Am 19.04.22 um 23:46 schrieb Gong, Richard:
>>
>>> On 4/14/2022 2:52 AM, Paul Menzel wrote:
>>>> [Cc: -kernel test robot <lkp@intel.com>]
>>
>> […]
>>
>>>> Am 13.04.22 um 15:00 schrieb Alex Deucher:
>>>>> On Wed, Apr 13, 2022 at 3:43 AM Paul Menzel wrote:
>>>>
>>>>>> Thank you for sending out v4.
>>>>>>
>>>>>> Am 12.04.22 um 23:50 schrieb Richard Gong:
>>>>>>> Active State Power Management (ASPM) feature is enabled since 
>>>>>>> kernel 5.14.
>>>>>>> There are some AMD GFX cards (such as WX3200 and RX640) that 
>>>>>>> won't work
>>>>>>> with ASPM-enabled Intel Alder Lake based systems. Using these GFX 
>>>>>>> cards as
>>>>>>> video/display output, Intel Alder Lake based systems will hang 
>>>>>>> during
>>>>>>> suspend/resume.
>>
>> [Your email program wraps lines in cited text for some reason, making 
>> the citation harder to read.]
>>
> Not sure why; I am using Mozilla Thunderbird for email. I am not using MS 
> Outlook for upstream email.

Strange. No idea if there were bugs in Mozilla Thunderbird 91.2.0, 
released over half a year ago. The current version is 91.8.1. [1]

>>>>>> I am still not clear, what “hang during suspend/resume” means. I 
>>>>>> guess
>>>>>> suspending works fine? During resume (S3 or S0ix?), where does it 
>>>>>> hang?
>>>>>> The system is functional, but there are only display problems?
>>> System freeze after suspend/resume.
>>
>> But you see certain messages still? At what point does it freeze 
>> exactly? In the bug report you posted Linux messages.
> 
> No, the system freezes, and users then have to power-cycle it to recover.

Then I misread the issue? Did you capture the messages over serial log then?

>>>>>>> The issue was initially reported on one system (Dell Precision 
>>>>>>> 3660 with
>>>>>>> BIOS version 0.14.81), but was later confirmed to affect at least 
>>>>>>> 4 Alder
>>>>>>> Lake based systems.
>>>>>>>
>>>>>>> Add extra check to disable ASPM on Intel Alder Lake based systems.
>>>>>>>
>>>>>>> Fixes: 0064b0ce85bb ("drm/amd/pm: enable ASPM by default")
>>>>>>> Link: 
>>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fdrm%2Famd%2F-%2Fissues%2F1885&amp;data=05%7C01%7Crichard.gong%40amd.com%7Cce01de048c61456174ff08da230c750d%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637860833680922036%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=vqhh3dTc%2FgBt7GrP9hKppWlrFy2F7DaivkNEuGekl0g%3D&amp;reserved=0 
>>>>>>>
>>
>> Thank you Microsoft Outlook for keeping us safe. :(
> I am not using MS Outlook for the email exchanges.

I guess it’s not the client but the Microsoft email service (Exchange? no 
idea) adding these protection links. (Making it even harder for users to 
actually verify the domain. No idea who comes up with these ideas, or why 
customers actually accept them.)

>>>>>>>
>>>>>>> Reported-by: kernel test robot <lkp@intel.com>
>>>>>>
>>>>>> This tag is a little confusing. Maybe clarify that it was for an 
>>>>>> issue
>>>>>> in a previous patch iteration?
>>>
>>> I did describe in change-list version 3 below, which corrected the 
>>> build error with W=1 option.
>>>
>>> It is not good idea to add the description for that to the commit 
>>> message, this is why I add descriptions on change-list version 3.
>>
>> Do as you wish, but the current style is confusing, and readers of the 
>> commit are going to think, the kernel test robot reported the problem 
>> with AMD VI ASICs and Intel Alder Lake systems.
>>
>>>>>>
>>>>>>> Signed-off-by: Richard Gong <richard.gong@amd.com>
>>>>>>> ---
>>>>>>> v4: s/CONFIG_X86_64/CONFIG_X86
>>>>>>>       enhanced check logic
>>>>>>> v3: s/intel_core_asom_chk/aspm_support_quirk_check
>>>>>>>       correct build error with W=1 option
>>>>>>> v2: correct commit description
>>>>>>>       move the check from chip family to problematic platform
>>>>>>> ---
>>>>>>>    drivers/gpu/drm/amd/amdgpu/vi.c | 17 ++++++++++++++++-
>>>>>>>    1 file changed, 16 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/vi.c 
>>>>>>> b/drivers/gpu/drm/amd/amdgpu/vi.c
>>>>>>> index 039b90cdc3bc..b33e0a9bee65 100644
>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/vi.c
>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/vi.c
>>>>>>> @@ -81,6 +81,10 @@
>>>>>>>    #include "mxgpu_vi.h"
>>>>>>>    #include "amdgpu_dm.h"
>>>>>>>
>>>>>>> +#if IS_ENABLED(CONFIG_X86)
>>>>>>> +#include <asm/intel-family.h>
>>>>>>> +#endif
>>>>>>> +
>>>>>>>    #define ixPCIE_LC_L1_PM_SUBSTATE    0x100100C6
>>>>>>>    #define 
>>>>>>> PCIE_LC_L1_PM_SUBSTATE__LC_L1_SUBSTATES_OVERRIDE_EN_MASK 0x00000001L
>>>>>>>    #define PCIE_LC_L1_PM_SUBSTATE__LC_PCI_PM_L1_2_OVERRIDE_MASK 
>>>>>>> 0x00000002L
>>>>>>> @@ -1134,13 +1138,24 @@ static void vi_enable_aspm(struct 
>>>>>>> amdgpu_device *adev)
>>>>>>>                WREG32_PCIE(ixPCIE_LC_CNTL, data);
>>>>>>>    }
>>>>>>>
>>>>>>> +static bool aspm_support_quirk_check(void)
>>>>>>> +{
>>>>>>> +     if (IS_ENABLED(CONFIG_X86)) {
>>>>>>> +             struct cpuinfo_x86 *c = &cpu_data(0);
>>>>>>> +
>>>>>>> +             return !(c->x86 == 6 && c->x86_model == 
>>>>>>> INTEL_FAM6_ALDERLAKE);
>>>>>>> +     }
>>>>>>> +
>>>>>>> +     return true;
>>>>>>> +}
>>>>>>> +
>>>>>>>    static void vi_program_aspm(struct amdgpu_device *adev)
>>>>>>>    {
>>>>>>>        u32 data, data1, orig;
>>>>>>>        bool bL1SS = false;
>>>>>>>        bool bClkReqSupport = true;
>>>>>>>
>>>>>>> -     if (!amdgpu_device_should_use_aspm(adev))
>>>>>>> +     if (!amdgpu_device_should_use_aspm(adev) || 
>>>>>>> !aspm_support_quirk_check())
>>>>>>>                return;
>>>>>>
>>>>>> Can users still forcefully enable ASPM with the parameter 
>>>>>> `amdgpu.aspm`?
>>>>>>
>>> As Mario mentioned in a separate reply, we can't forcefully enable 
>>> ASPM with the parameter 'amdgpu.aspm'.
>>
>> That would be a regression on systems where ASPM used to work. Hmm. I 
>> guess, you could say, there are no such systems.
>>
>>>>>>>
>>>>>>>        if (adev->flags & AMD_IS_APU ||
>>>>>>
>>>>>> If I remember correctly, there were also newer cards, where ASPM 
>>>>>> worked
>>>>>> with Intel Alder Lake, right? Can only the problematic generations 
>>>>>> for
>>>>>> WX3200 and RX640 be excluded from ASPM?
>>>>>
>>>>> This patch only disables it for the generation that was problematic.
>>>>
>>>> Could that please be made clear in the commit message summary, and 
>>>> message?
>>>
>>> Are you ok with the commit messages below?
>>
>> Please change the commit message summary. Maybe:
>>
>> drm/amdgpu: VI: Disable ASPM on Intel Alder Lake based systems
>>
>>> Active State Power Management (ASPM) feature is enabled since kernel 
>>> 5.14.
>>>
>>> There are some AMD GFX cards (such as WX3200 and RX640) that won't work
>>> with ASPM-enabled Intel Alder Lake based systems. Using these GFX 
>>> cards as
>>> video/display output, Intel Alder Lake based systems will freeze after
>>> suspend/resume.
>>
>> Something like:
>>
>> On Intel Alder Lake based systems using ASPM with AMD GFX Volcanic 
>> Islands (VI) cards, like WX3200 and RX640, graphics don’t initialize 
>> when resuming from S0ix(?).
>>
>>
>>> The issue was initially reported on one system (Dell Precision 3660 with
>>> BIOS version 0.14.81), but was later confirmed to affect at least 4 
>>> Alder
>>> Lake based systems.
>>
>> Which ones?
> Those are pre-production Alder Lake based OEM systems.

Just write that then: at least four pre-production Alder Lake based systems.

>>> Add extra check to disable ASPM on Intel Alder Lake based systems with
>>> problematic generation GFX cards.
>>
>> … with the problematic Volcanic Islands GFX cards.
>>
>>>>
>>>> Loosely related, is there a public (or internal issue) to analyze 
>>>> how to get ASPM working for VI generation devices with Intel Alder 
>>>> Lake?
>>>
>>> As Alex mentioned, we need support from Intel. We don't have any 
>>> update on that.
>>
>> It’d be great to get that fixed properly.
>>
>> Last thing, please don’t hate me, does Linux log, that ASPM is disabled?


Kind regards,

Paul


[1]: https://www.thunderbird.net/en-US/thunderbird/releases/

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv4] drm/amdgpu: disable ASPM on Intel Alder Lake based systems
  2022-04-21  5:35             ` Paul Menzel
@ 2022-04-26 13:53               ` Gong, Richard
  2022-05-01  7:08                 ` Paul Menzel
  0 siblings, 1 reply; 25+ messages in thread
From: Gong, Richard @ 2022-04-26 13:53 UTC (permalink / raw)
  To: Paul Menzel
  Cc: Dave Airlie, Xinhui Pan, LKML, amd-gfx, Alexander Deucher,
	dri-devel, Daniel Vetter, Alex Deucher, Christian König,
	Mario Limonciello

Hi Paul,

On 4/21/2022 12:35 AM, Paul Menzel wrote:
> Dear Richard,
>
>
> Am 21.04.22 um 03:12 schrieb Gong, Richard:
>
>> On 4/20/2022 3:29 PM, Paul Menzel wrote:
>
>>> Am 19.04.22 um 23:46 schrieb Gong, Richard:
>>>
>>>> On 4/14/2022 2:52 AM, Paul Menzel wrote:
>>>>> [Cc: -kernel test robot <lkp@intel.com>]
>>>
>>> […]
>>>
>>>>> Am 13.04.22 um 15:00 schrieb Alex Deucher:
>>>>>> On Wed, Apr 13, 2022 at 3:43 AM Paul Menzel wrote:
>>>>>
>>>>>>> Thank you for sending out v4.
>>>>>>>
>>>>>>> Am 12.04.22 um 23:50 schrieb Richard Gong:
>>>>>>>> Active State Power Management (ASPM) feature is enabled since 
>>>>>>>> kernel 5.14.
>>>>>>>> There are some AMD GFX cards (such as WX3200 and RX640) that 
>>>>>>>> won't work
>>>>>>>> with ASPM-enabled Intel Alder Lake based systems. Using these 
>>>>>>>> GFX cards as
>>>>>>>> video/display output, Intel Alder Lake based systems will hang 
>>>>>>>> during
>>>>>>>> suspend/resume.
>>>
>>> [Your email program wraps lines in cited text for some reason, 
>>> making the citation harder to read.]
>>>
>> Not sure why; I am using Mozilla Thunderbird for email. I am not using 
>> MS Outlook for upstream email.
>
> Strange. No idea if there were bugs in Mozilla Thunderbird 91.2.0, 
> released over half a year ago. The current version is 91.8.1. [1]
>
>>>>>>> I am still not clear, what “hang during suspend/resume” means. I 
>>>>>>> guess
>>>>>>> suspending works fine? During resume (S3 or S0ix?), where does 
>>>>>>> it hang?
>>>>>>> The system is functional, but there are only display problems?
>>>> System freeze after suspend/resume.
>>>
>>> But you see certain messages still? At what point does it freeze 
>>> exactly? In the bug report you posted Linux messages.
>>
>> No, the system freezes, and users then have to power-cycle it to recover.
>
> Then I misread the issue? Did you capture the messages over serial log 
> then?

I think so. We captured dmesg log.

As mentioned earlier, we need support from Intel on how to get ASPM working 
for VI generation on Intel Alder Lake, but we don't know where things 
currently stand.

>
>>>>>>>> The issue was initially reported on one system (Dell Precision 
>>>>>>>> 3660 with
>>>>>>>> BIOS version 0.14.81), but was later confirmed to affect at 
>>>>>>>> least 4 Alder
>>>>>>>> Lake based systems.
>>>>>>>>
>>>>>>>> Add extra check to disable ASPM on Intel Alder Lake based systems.
>>>>>>>>
>>>>>>>> Fixes: 0064b0ce85bb ("drm/amd/pm: enable ASPM by default")
>>>>>>>> Link: 
>>>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fdrm%2Famd%2F-%2Fissues%2F1885&amp;data=05%7C01%7Crichard.gong%40amd.com%7C5990a9e58af0438b80c308da2358d216%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637861161666341691%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=Ka3NSmXuyji%2F%2FRdH319aFk9ya5UytU8lq3FhiuMd%2FcU%3D&amp;reserved=0 
>>>>>>>>
>>>
>>> Thank you Microsoft Outlook for keeping us safe. :(
>> I am not using MS Outlook for the email exchanges.
>
> I guess it’s not the client but the Microsoft email service 
> (Exchange? no idea) adding these protection links. (Making it even 
> harder for users to actually verify the domain. No idea who comes up 
> with these ideas, or why customers actually accept them.)
>
>>>>>>>>
>>>>>>>> Reported-by: kernel test robot <lkp@intel.com>
>>>>>>>
>>>>>>> This tag is a little confusing. Maybe clarify that it was for an 
>>>>>>> issue
>>>>>>> in a previous patch iteration?
>>>>
>>>> I did describe in change-list version 3 below, which corrected the 
>>>> build error with W=1 option.
>>>>
>>>> It is not good idea to add the description for that to the commit 
>>>> message, this is why I add descriptions on change-list version 3.
>>>
>>> Do as you wish, but the current style is confusing, and readers of 
>>> the commit are going to think, the kernel test robot reported the 
>>> problem with AMD VI ASICs and Intel Alder Lake systems.
>>>
>>>>>>>
>>>>>>>> Signed-off-by: Richard Gong <richard.gong@amd.com>
>>>>>>>> ---
>>>>>>>> v4: s/CONFIG_X86_64/CONFIG_X86
>>>>>>>>       enhanced check logic
>>>>>>>> v3: s/intel_core_asom_chk/aspm_support_quirk_check
>>>>>>>>       correct build error with W=1 option
>>>>>>>> v2: correct commit description
>>>>>>>>       move the check from chip family to problematic platform
>>>>>>>> ---
>>>>>>>>    drivers/gpu/drm/amd/amdgpu/vi.c | 17 ++++++++++++++++-
>>>>>>>>    1 file changed, 16 insertions(+), 1 deletion(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/vi.c 
>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/vi.c
>>>>>>>> index 039b90cdc3bc..b33e0a9bee65 100644
>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/vi.c
>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/vi.c
>>>>>>>> @@ -81,6 +81,10 @@
>>>>>>>>    #include "mxgpu_vi.h"
>>>>>>>>    #include "amdgpu_dm.h"
>>>>>>>>
>>>>>>>> +#if IS_ENABLED(CONFIG_X86)
>>>>>>>> +#include <asm/intel-family.h>
>>>>>>>> +#endif
>>>>>>>> +
>>>>>>>>    #define ixPCIE_LC_L1_PM_SUBSTATE    0x100100C6
>>>>>>>>    #define 
>>>>>>>> PCIE_LC_L1_PM_SUBSTATE__LC_L1_SUBSTATES_OVERRIDE_EN_MASK 
>>>>>>>> 0x00000001L
>>>>>>>>    #define PCIE_LC_L1_PM_SUBSTATE__LC_PCI_PM_L1_2_OVERRIDE_MASK 
>>>>>>>> 0x00000002L
>>>>>>>> @@ -1134,13 +1138,24 @@ static void vi_enable_aspm(struct 
>>>>>>>> amdgpu_device *adev)
>>>>>>>>                WREG32_PCIE(ixPCIE_LC_CNTL, data);
>>>>>>>>    }
>>>>>>>>
>>>>>>>> +static bool aspm_support_quirk_check(void)
>>>>>>>> +{
>>>>>>>> +     if (IS_ENABLED(CONFIG_X86)) {
>>>>>>>> +             struct cpuinfo_x86 *c = &cpu_data(0);
>>>>>>>> +
>>>>>>>> +             return !(c->x86 == 6 && c->x86_model == 
>>>>>>>> INTEL_FAM6_ALDERLAKE);
>>>>>>>> +     }
>>>>>>>> +
>>>>>>>> +     return true;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>>    static void vi_program_aspm(struct amdgpu_device *adev)
>>>>>>>>    {
>>>>>>>>        u32 data, data1, orig;
>>>>>>>>        bool bL1SS = false;
>>>>>>>>        bool bClkReqSupport = true;
>>>>>>>>
>>>>>>>> -     if (!amdgpu_device_should_use_aspm(adev))
>>>>>>>> +     if (!amdgpu_device_should_use_aspm(adev) || 
>>>>>>>> !aspm_support_quirk_check())
>>>>>>>>                return;
>>>>>>>
>>>>>>> Can users still forcefully enable ASPM with the parameter 
>>>>>>> `amdgpu.aspm`?
>>>>>>>
>>>> As Mario mentioned in a separate reply, we can't forcefully enable 
>>>> ASPM with the parameter 'amdgpu.aspm'.
>>>
>>> That would be a regression on systems where ASPM used to work. Hmm. 
>>> I guess, you could say, there are no such systems.
>>>
>>>>>>>>
>>>>>>>>        if (adev->flags & AMD_IS_APU ||
>>>>>>>
>>>>>>> If I remember correctly, there were also newer cards, where ASPM 
>>>>>>> worked
>>>>>>> with Intel Alder Lake, right? Can only the problematic 
>>>>>>> generations for
>>>>>>> WX3200 and RX640 be excluded from ASPM?
>>>>>>
>>>>>> This patch only disables it for the generation that was 
>>>>>> problematic.
>>>>>
>>>>> Could that please be made clear in the commit message summary, and 
>>>>> message?
>>>>
>>>> Are you ok with the commit messages below?
>>>
>>> Please change the commit message summary. Maybe:
>>>
>>> drm/amdgpu: VI: Disable ASPM on Intel Alder Lake based systems
>>>
>>>> Active State Power Management (ASPM) feature is enabled since 
>>>> kernel 5.14.
>>>>
>>>> There are some AMD GFX cards (such as WX3200 and RX640) that won't 
>>>> work
>>>> with ASPM-enabled Intel Alder Lake based systems. Using these GFX 
>>>> cards as
>>>> video/display output, Intel Alder Lake based systems will freeze after
>>>> suspend/resume.
>>>
>>> Something like:
>>>
>>> On Intel Alder Lake based systems using ASPM with AMD GFX Volcanic 
>>> Islands (VI) cards, like WX3200 and RX640, graphics don’t initialize 
>>> when resuming from S0ix(?).
>>>
>>>
>>>> The issue was initially reported on one system (Dell Precision 3660 
>>>> with
>>>> BIOS version 0.14.81), but was later confirmed to affect at least 4 
>>>> Alder
>>>> Lake based systems.
>>>
>>> Which ones?
>> Those are pre-production Alder Lake based OEM systems.
>
> Just write that then: at least four pre-production Alder Lake based 
> systems.
>
>>>> Add extra check to disable ASPM on Intel Alder Lake based systems with
>>>> problematic generation GFX cards.
>>>
>>> … with the problematic Volcanic Islands GFX cards.
>>>
>>>>>
>>>>> Loosely related, is there a public (or internal issue) to analyze 
>>>>> how to get ASPM working for VI generation devices with Intel Alder 
>>>>> Lake?
>>>>
>>>> As Alex mentioned, we need support from Intel. We don't have any 
>>>> update on that.
>>>
>>> It’d be great to get that fixed properly.
>>>
>>> Last thing, please don’t hate me, does Linux log, that ASPM is 
>>> disabled?
>
>
> Kind regards,
>
> Paul
>
>
> [1]: 
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.thunderbird.net%2Fen-US%2Fthunderbird%2Freleases%2F&amp;data=05%7C01%7Crichard.gong%40amd.com%7C5990a9e58af0438b80c308da2358d216%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637861161666341691%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=HYlNdeVKxSWQto%2BWGAoUc5etFwhdlyTUoox71SQjCtY%3D&amp;reserved=0

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv4] drm/amdgpu: disable ASPM on Intel Alder Lake based systems
  2022-04-26 13:53               ` Gong, Richard
@ 2022-05-01  7:08                 ` Paul Menzel
  2022-05-02 14:56                   ` Gong, Richard
  2022-05-03 12:25                   ` Daniel Stone
  0 siblings, 2 replies; 25+ messages in thread
From: Paul Menzel @ 2022-05-01  7:08 UTC (permalink / raw)
  To: Richard Gong
  Cc: Dave Airlie, Xinhui Pan, LKML, dri-devel, Alex Deucher, amd-gfx,
	Daniel Vetter, Alexander Deucher, Christian König,
	Mario Limonciello

Dear Richard,


Sorry for the late reply.

Am 26.04.22 um 15:53 schrieb Gong, Richard:

> On 4/21/2022 12:35 AM, Paul Menzel wrote:

>> Am 21.04.22 um 03:12 schrieb Gong, Richard:
>>
>>> On 4/20/2022 3:29 PM, Paul Menzel wrote:
>>
>>>> Am 19.04.22 um 23:46 schrieb Gong, Richard:
>>>>
>>>>> On 4/14/2022 2:52 AM, Paul Menzel wrote:
>>>>>> [Cc: -kernel test robot <lkp@intel.com>]
>>>>
>>>> […]
>>>>
>>>>>> Am 13.04.22 um 15:00 schrieb Alex Deucher:
>>>>>>> On Wed, Apr 13, 2022 at 3:43 AM Paul Menzel wrote:
>>>>>>
>>>>>>>> Thank you for sending out v4.
>>>>>>>>
>>>>>>>> Am 12.04.22 um 23:50 schrieb Richard Gong:

[…]

>>>>>>>> I am still not clear, what “hang during suspend/resume” means. I 
>>>>>>>> guess
>>>>>>>> suspending works fine? During resume (S3 or S0ix?), where does 
>>>>>>>> it hang?
>>>>>>>> The system is functional, but there are only display problems?
>>>>> System freeze after suspend/resume.
>>>>
>>>> But you see certain messages still? At what point does it freeze 
>>>> exactly? In the bug report you posted Linux messages.
>>>
>>> No, the system freezes, and users then have to power-cycle it to recover.
>>
>> Then I misread the issue? Did you capture the messages over serial log 
>> then?
> 
> I think so. We captured dmesg log.

Then the (whole) system did *not* freeze, if you could still log in 
(maybe over network) and execute `dmesg`. Please also paste the 
amdgpu(?) error logs in the commit message.

> As mentioned earlier, we need support from Intel on how to get ASPM working 
> for VI generation on Intel Alder Lake, but we don't know where things 
> currently stand.

Who is working on this, and knows?


Kind regards,

Paul

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv4] drm/amdgpu: disable ASPM on Intel Alder Lake based systems
  2022-05-01  7:08                 ` Paul Menzel
@ 2022-05-02 14:56                   ` Gong, Richard
  2022-05-03 12:25                   ` Daniel Stone
  1 sibling, 0 replies; 25+ messages in thread
From: Gong, Richard @ 2022-05-02 14:56 UTC (permalink / raw)
  To: Paul Menzel
  Cc: Dave Airlie, Xinhui Pan, LKML, dri-devel, Alex Deucher, amd-gfx,
	Daniel Vetter, Alexander Deucher, Christian König,
	Mario Limonciello

Hi Paul,

On 5/1/2022 2:08 AM, Paul Menzel wrote:
> Dear Richard,
>
>
> Sorry for the late reply.
>
> Am 26.04.22 um 15:53 schrieb Gong, Richard:
>
>> On 4/21/2022 12:35 AM, Paul Menzel wrote:
>
>>> Am 21.04.22 um 03:12 schrieb Gong, Richard:
>>>
>>>> On 4/20/2022 3:29 PM, Paul Menzel wrote:
>>>
>>>>> Am 19.04.22 um 23:46 schrieb Gong, Richard:
>>>>>
>>>>>> On 4/14/2022 2:52 AM, Paul Menzel wrote:
>>>>>>> [Cc: -kernel test robot <lkp@intel.com>]
>>>>>
>>>>> […]
>>>>>
>>>>>>> Am 13.04.22 um 15:00 schrieb Alex Deucher:
>>>>>>>> On Wed, Apr 13, 2022 at 3:43 AM Paul Menzel wrote:
>>>>>>>
>>>>>>>>> Thank you for sending out v4.
>>>>>>>>>
>>>>>>>>> Am 12.04.22 um 23:50 schrieb Richard Gong:
>
> […]
>
>>>>>>>>> I am still not clear, what “hang during suspend/resume” means. 
>>>>>>>>> I guess
>>>>>>>>> suspending works fine? During resume (S3 or S0ix?), where does 
>>>>>>>>> it hang?
>>>>>>>>> The system is functional, but there are only display problems?
>>>>>> System freeze after suspend/resume.
>>>>>
>>>>> But you see certain messages still? At what point does it freeze 
>>>>> exactly? In the bug report you posted Linux messages.
>>>>
>>>> No, the system freezes, and users then have to power-cycle it to recover.
>>>
>>> Then I misread the issue? Did you capture the messages over serial 
>>> log then?
>>
>> I think so. We captured dmesg log.

A correction: my previous 'dmesg log' description was not accurate.

I was referring to the kernel log captured via 'journalctl' after 
power-cycling the system. I should have said 'kernel log' rather than 
'dmesg log' to avoid confusion.

>
> Then the (whole) system did *not* freeze, if you could still log in 
> (maybe over network) and execute `dmesg`. Please also paste the 
> amdgpu(?) error logs in the commit message.

As mentioned in my reply before last, the user has to power-cycle the 
system to reset it.

When the issue occurred, keyboard and mouse didn't work. 'dmesg' and ssh 
didn't work either.

>
>> As mentioned earlier, we need support from Intel on how to get ASPM 
>> working for VI generation on Intel Alder Lake, but we don't know 
>> where things currently stand.
>
> Who is working on this, and knows?

I have no idea.


>
> Kind regards,
>
> Paul

Regards,

Richard


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv4] drm/amdgpu: disable ASPM on Intel Alder Lake based systems
  2022-05-01  7:08                 ` Paul Menzel
  2022-05-02 14:56                   ` Gong, Richard
@ 2022-05-03 12:25                   ` Daniel Stone
  2022-05-03 12:44                     ` Paul Menzel
  1 sibling, 1 reply; 25+ messages in thread
From: Daniel Stone @ 2022-05-03 12:25 UTC (permalink / raw)
  To: Paul Menzel
  Cc: Richard Gong, Dave Airlie, Xinhui Pan, LKML, amd-gfx,
	Alexander Deucher, dri-devel, Christian König,
	Mario Limonciello

On Sun, 1 May 2022 at 08:08, Paul Menzel <pmenzel@molgen.mpg.de> wrote:
> Am 26.04.22 um 15:53 schrieb Gong, Richard:
> > I think so. We captured dmesg log.
>
> Then the (whole) system did *not* freeze, if you could still log in
> (maybe over network) and execute `dmesg`. Please also paste the
> amdgpu(?) error logs in the commit message.
>
> > As mentioned earlier, we need support from Intel on how to get ASPM working
> > for VI generation on Intel Alder Lake, but we don't know where things
> > currently stand.
>
> Who is working on this, and knows?

This has gone beyond the point of a reasonable request. The amount of
detail you're demanding is completely unnecessary.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv4] drm/amdgpu: disable ASPM on Intel Alder Lake based systems
  2022-05-03 12:25                   ` Daniel Stone
@ 2022-05-03 12:44                     ` Paul Menzel
  0 siblings, 0 replies; 25+ messages in thread
From: Paul Menzel @ 2022-05-03 12:44 UTC (permalink / raw)
  To: Daniel Stone
  Cc: Dave Airlie, Richard Gong, Xinhui Pan, LKML, amd-gfx, dri-devel,
	Alexander Deucher, Christian König, Mario Limonciello

Dear Daniel,


Am 03.05.22 um 14:25 schrieb Daniel Stone:
> On Sun, 1 May 2022 at 08:08, Paul Menzel <pmenzel@molgen.mpg.de> wrote:
>> Am 26.04.22 um 15:53 schrieb Gong, Richard:
>>> I think so. We captured dmesg log.
>>
>> Then the (whole) system did *not* freeze, if you could still log in
>> (maybe over network) and execute `dmesg`. Please also paste the
>> amdgpu(?) error logs in the commit message.
>>
>>> As mentioned earlier, we need support from Intel on how to get ASPM working
>>> for VI generation on Intel Alder Lake, but we don't know where things
>>> currently stand.
>>
>> Who is working on this, and knows?
> 
> This has gone beyond the point of a reasonable request. The amount of
> detail you're demanding is completely unnecessary.

If a quirk is introduced that possibly leads to higher power consumption, 
especially on systems nobody has access to yet, then asking for the detail 
of where the system hangs/freezes is not unreasonable at all.

In the Linux logs from the issue there are messages like

     [   58.101385] Freezing of tasks failed after 20.003 seconds (4 tasks refusing to freeze, wq_busy=0):

     [   78.278403] Freezing of tasks failed after 20.008 seconds (4 tasks refusing to freeze, wq_busy=0):

and it looks like several suspend/resume cycles were done.

I see a lot of commit messages across the whole Linux kernel where this
level of detail is provided (by default).

The second question was not for the commit message, but just for
documentation purposes, for when the problem is going to be fixed properly.
And it looks like (at least publicly) nobody is analyzing the root cause,
and once the quirk lands, nobody is going to feel the pressure to work on
it, as everyone’s plates are full.


Kind regards,

Paul


end of thread, other threads:[~2022-05-03 12:45 UTC | newest]

Thread overview: 25+ messages
2022-04-12 21:50 [PATCHv4] drm/amdgpu: disable ASPM on Intel Alder Lake based systems Richard Gong
2022-04-13  4:29 ` Lazar, Lijo
2022-04-13  7:43 ` Paul Menzel
2022-04-13 13:00   ` Alex Deucher
2022-04-13 13:28     ` Limonciello, Mario
2022-04-14  7:52     ` Paul Menzel
2022-04-14 13:11       ` Alex Deucher
     [not found]       ` <94fd858d-1792-9c05-b5c6-1b028427687d@amd.com>
2022-04-20 20:29         ` Paul Menzel
2022-04-20 20:40           ` Alex Deucher
2022-04-20 20:48             ` Paul Menzel
2022-04-20 20:56               ` Gong, Richard
2022-04-20 21:02                 ` Paul Menzel
2022-04-20 21:12                   ` Gong, Richard
2022-04-20 21:15                     ` Alex Deucher
2022-04-20 21:13                   ` Alex Deucher
2022-04-20 21:16                     ` Limonciello, Mario
2022-04-21  1:12           ` Gong, Richard
2022-04-21  5:35             ` Paul Menzel
2022-04-26 13:53               ` Gong, Richard
2022-05-01  7:08                 ` Paul Menzel
2022-05-02 14:56                   ` Gong, Richard
2022-05-03 12:25                   ` Daniel Stone
2022-05-03 12:44                     ` Paul Menzel
2022-04-13 15:40 ` Nathan Chancellor
2022-04-19 21:08   ` Gong, Richard
