All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Gong, Richard" <richard.gong@amd.com>
To: Paul Menzel <pmenzel@molgen.mpg.de>
Cc: "Alex Deucher" <alexdeucher@gmail.com>,
	"Dave Airlie" <airlied@linux.ie>,
	"Xinhui Pan" <xinhui.pan@amd.com>,
	LKML <linux-kernel@vger.kernel.org>,
	dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org,
	"Daniel Vetter" <daniel@ffwll.ch>,
	"Alexander Deucher" <alexander.deucher@amd.com>,
	"Christian König" <christian.koenig@amd.com>,
	"Mario Limonciello" <mario.limonciello@amd.com>
Subject: Re: [PATCHv4] drm/amdgpu: disable ASPM on Intel Alder Lake based systems
Date: Wed, 20 Apr 2022 16:12:55 -0500	[thread overview]
Message-ID: <fefcb57e-cc28-d9c1-2fad-997ed3856a88@amd.com> (raw)
In-Reply-To: <462dce7d-73fc-ea8c-0a8a-5e8722ee1967@molgen.mpg.de>


On 4/20/2022 4:02 PM, Paul Menzel wrote:
> Dear Richard,
>
>
> Am 20.04.22 um 22:56 schrieb Gong, Richard:
>
>> On 4/20/2022 3:48 PM, Paul Menzel wrote:
>
>>> Am 20.04.22 um 22:40 schrieb Alex Deucher:
>>>> On Wed, Apr 20, 2022 at 4:29 PM Paul Menzel <pmenzel@molgen.mpg.de> 
>>>> wrote:
>>>
>>>>> Am 19.04.22 um 23:46 schrieb Gong, Richard:
>>>>>
>>>>>> On 4/14/2022 2:52 AM, Paul Menzel wrote:
>>>>>>> [Cc: -kernel test robot <lkp@intel.com>]
>>>>>
>>>>> […]
>>>>>
>>>>>>> Am 13.04.22 um 15:00 schrieb Alex Deucher:
>>>>>>>> On Wed, Apr 13, 2022 at 3:43 AM Paul Menzel wrote:
>>>>>>>
>>>>>>>>> Thank you for sending out v4.
>>>>>>>>>
>>>>>>>>> Am 12.04.22 um 23:50 schrieb Richard Gong:
>>>>>>>>>> Active State Power Management (ASPM) feature is enabled since
>>>>>>>>>> kernel 5.14.
>>>>>>>>>> There are some AMD GFX cards (such as WX3200 and RX640) that 
>>>>>>>>>> won't
>>>>>>>>>> work
>>>>>>>>>> with ASPM-enabled Intel Alder Lake based systems. Using these 
>>>>>>>>>> GFX
>>>>>>>>>> cards as
>>>>>>>>>> video/display output, Intel Alder Lake based systems will 
>>>>>>>>>> hang during
>>>>>>>>>> suspend/resume.
>>>>>
>>>>> [Your email program wraps lines in cited text for some reason, making
>>>>> the citation harder to read.]
>>>>>
>>>>>>>>>
>>>>>>>>> I am still not clear, what “hang during suspend/resume” means. 
>>>>>>>>> I guess
>>>>>>>>> suspending works fine? During resume (S3 or S0ix?), where does 
>>>>>>>>> it hang?
>>>>>>>>> The system is functional, but there are only display problems?
>>>>>> System freeze after suspend/resume.
>>>>>
>>>>> But you see certain messages still? At what point does it freeze
>>>>> exactly? In the bug report you posted Linux messages.
>>>>>
>>>>>>>>>> The issue was initially reported on one system (Dell 
>>>>>>>>>> Precision 3660
>>>>>>>>>> with
>>>>>>>>>> BIOS version 0.14.81), but was later confirmed to affect at 
>>>>>>>>>> least 4
>>>>>>>>>> Alder
>>>>>>>>>> Lake based systems.
>>>>>>>>>>
>>>>>>>>>> Add extra check to disable ASPM on Intel Alder Lake based 
>>>>>>>>>> systems.
>>>>>>>>>>
>>>>>>>>>> Fixes: 0064b0ce85bb ("drm/amd/pm: enable ASPM by default")
>>>>>>>>>> Link:
>>>>>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fdrm%2Famd%2F-%2Fissues%2F1885&amp;data=05%7C01%7Crichard.gong%40amd.com%7C509e0378edcf477605a708da231114f0%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637860853537880384%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=SoXDKGHUiiQN4rcL7FpCotouWFt0kkAbcHyO3esfNlE%3D&amp;reserved=0 
>>>>>>>>>>
>>>>>
>>>>> Thank you Microsoft Outlook for keeping us safe. :(
>>>>>
>>>>>>>>>>
>>>>>>>>>> Reported-by: kernel test robot <lkp@intel.com>
>>>>>>>>>
>>>>>>>>> This tag is a little confusing. Maybe clarify that it was for 
>>>>>>>>> an issue
>>>>>>>>> in a previous patch iteration?
>>>>>>
>>>>>> I did describe in change-list version 3 below, which corrected 
>>>>>> the build
>>>>>> error with W=1 option.
>>>>>>
>>>>>> It is not good idea to add the description for that to the commit
>>>>>> message, this is why I add descriptions on change-list version 3.
>>>>>
>>>>> Do as you wish, but the current style is confusing, and readers of 
>>>>> the
>>>>> commit are going to think, the kernel test robot reported the problem
>>>>> with AMD VI ASICs and Intel Alder Lake systems.
>>>>>
>>>>>>>>>
>>>>>>>>>> Signed-off-by: Richard Gong <richard.gong@amd.com>
>>>>>>>>>> ---
>>>>>>>>>> v4: s/CONFIG_X86_64/CONFIG_X86
>>>>>>>>>>        enhanced check logic
>>>>>>>>>> v3: s/intel_core_asom_chk/aspm_support_quirk_check
>>>>>>>>>>        correct build error with W=1 option
>>>>>>>>>> v2: correct commit description
>>>>>>>>>>        move the check from chip family to problematic platform
>>>>>>>>>> ---
>>>>>>>>>>     drivers/gpu/drm/amd/amdgpu/vi.c | 17 ++++++++++++++++-
>>>>>>>>>>     1 file changed, 16 insertions(+), 1 deletion(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/vi.c
>>>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/vi.c
>>>>>>>>>> index 039b90cdc3bc..b33e0a9bee65 100644
>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/vi.c
>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/vi.c
>>>>>>>>>> @@ -81,6 +81,10 @@
>>>>>>>>>>     #include "mxgpu_vi.h"
>>>>>>>>>>     #include "amdgpu_dm.h"
>>>>>>>>>>
>>>>>>>>>> +#if IS_ENABLED(CONFIG_X86)
>>>>>>>>>> +#include <asm/intel-family.h>
>>>>>>>>>> +#endif
>>>>>>>>>> +
>>>>>>>>>>     #define ixPCIE_LC_L1_PM_SUBSTATE 0x100100C6
>>>>>>>>>>     #define 
>>>>>>>>>> PCIE_LC_L1_PM_SUBSTATE__LC_L1_SUBSTATES_OVERRIDE_EN_MASK
>>>>>>>>>> 0x00000001L
>>>>>>>>>>     #define PCIE_LC_L1_PM_SUBSTATE__LC_PCI_PM_L1_2_OVERRIDE_MASK
>>>>>>>>>> 0x00000002L
>>>>>>>>>> @@ -1134,13 +1138,24 @@ static void vi_enable_aspm(struct
>>>>>>>>>> amdgpu_device *adev)
>>>>>>>>>>                 WREG32_PCIE(ixPCIE_LC_CNTL, data);
>>>>>>>>>>     }
>>>>>>>>>>
>>>>>>>>>> +static bool aspm_support_quirk_check(void)
>>>>>>>>>> +{
>>>>>>>>>> +     if (IS_ENABLED(CONFIG_X86)) {
>>>>>>>>>> +             struct cpuinfo_x86 *c = &cpu_data(0);
>>>>>>>>>> +
>>>>>>>>>> +             return !(c->x86 == 6 && c->x86_model ==
>>>>>>>>>> INTEL_FAM6_ALDERLAKE);
>>>>>>>>>> +     }
>>>>>>>>>> +
>>>>>>>>>> +     return true;
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>>>     static void vi_program_aspm(struct amdgpu_device *adev)
>>>>>>>>>>     {
>>>>>>>>>>         u32 data, data1, orig;
>>>>>>>>>>         bool bL1SS = false;
>>>>>>>>>>         bool bClkReqSupport = true;
>>>>>>>>>>
>>>>>>>>>> -     if (!amdgpu_device_should_use_aspm(adev))
>>>>>>>>>> +     if (!amdgpu_device_should_use_aspm(adev) ||
>>>>>>>>>> !aspm_support_quirk_check())
>>>>>>>>>>                 return;
>>>>>>>>>
>>>>>>>>> Can users still forcefully enable ASPM with the parameter
>>>>>>>>> `amdgpu.aspm`?
>>>>>>>>>
>>>>>> As Mario mentioned in a separate reply, we can't forcefully 
>>>>>> enable ASPM
>>>>>> with the parameter 'amdgpu.aspm'.
>>>>>
>>>>> That would be a regression on systems where ASPM used to work. Hmm. I
>>>>> guess, you could say, there are no such systems.
>>>>>
>>>>>>>>>>
>>>>>>>>>>         if (adev->flags & AMD_IS_APU ||
>>>>>>>>>
>>>>>>>>> If I remember correctly, there were also newer cards, where 
>>>>>>>>> ASPM worked
>>>>>>>>> with Intel Alder Lake, right? Can only the problematic 
>>>>>>>>> generations for
>>>>>>>>> WX3200 and RX640 be excluded from ASPM?
>>>>>>>>
>>>>>>>> This patch only disables it for the generatioaon that was 
>>>>>>>> problematic.
>>>>>>>
>>>>>>> Could that please be made clear in the commit message summary, and
>>>>>>> message?
>>>>>>
>>>>>> Are you ok with the commit messages below?
>>>>>
>>>>> Please change the commit message summary. Maybe:
>>>>>
>>>>> drm/amdgpu: VI: Disable ASPM on Intel Alder Lake based systems
>>>>>
>>>>>> Active State Power Management (ASPM) feature is enabled since 
>>>>>> kernel 5.14.
>>>>>>
>>>>>> There are some AMD GFX cards (such as WX3200 and RX640) that 
>>>>>> won't work
>>>>>> with ASPM-enabled Intel Alder Lake based systems. Using these GFX 
>>>>>> cards as
>>>>>> video/display output, Intel Alder Lake based systems will freeze 
>>>>>> after
>>>>>> suspend/resume.
>>>>>
>>>>> Something like:
>>>>>
>>>>> On Intel Alder Lake based systems using ASPM with AMD GFX Volcanic
>>>>> Islands (VI) cards, like WX3200 and RX640, graphics don’t initialize
>>>>> when resuming from S0ix(?).
>>>>>
>>>>>
>>>>>> The issue was initially reported on one system (Dell Precision 
>>>>>> 3660 with
>>>>>> BIOS version 0.14.81), but was later confirmed to affect at least 
>>>>>> 4 Alder
>>>>>> Lake based systems.
>>>>>
>>>>> Which ones?
>>>>>
>>>>>> Add extra check to disable ASPM on Intel Alder Lake based systems 
>>>>>> with
>>>>>> problematic generation GFX cards.
>>>>>
>>>>> … with the problematic Volcanic Islands GFX cards.
>>>>>
>>>>>>>
>>>>>>> Loosely related, is there a public (or internal issue) to 
>>>>>>> analyze how
>>>>>>> to get ASPM working for VI generation devices with Intel Alder 
>>>>>>> Lake?
>>>>>>
>>>>>> As Alex mentioned, we need support from Intel. We don't have any 
>>>>>> update
>>>>>> on that.
>>>>>
>>>>> It’d be great to get that fixed properly.
>>>>>
>>>>> Last thing, please don’t hate me, does Linux log, that ASPM is 
>>>>> disabled?
>>>>
>>>> I'm not sure what gets logged at the platform level with respect to
>>>> ASPM, but whether or not the driver enables ASPM is tied to whether
>>>> ASPM is allowed at the platform level or not so if the platform
>>>> indicates that ASPM is not supported, the driver won't enable it.  The
>>>> driver does not log whether ASPM is enabled or not if that is what you
>>>> are asking.  As to whether or not it should, it comes down to how much
>>>> stuff is worth indiciating in the log.  The driver is already pretty
>>>> chatty by driver standards.
>>>
>>> I specifically mean, Linux should log the quirks it applies. (As a 
>>> normal user, I’d also expect ASPM to work nowadays, so a message, 
>>> that it’s disabled would help a lot.)
>>
>> In general rule we shouldn't generate additional log unless something 
>> went wrong with the system.
>
> Please run `dmesg` and see that your statement is false. That’s what 
> log levels are for, and in your case, it would be at least error 
> level. Also, I claim, something indeed went wrong, because a quirk had 
> to be applied. So please add a notice log level, that ASPM gets disabled:

 From my previous experience with upstream, the maintainers simply don't 
like adding logs unless absolutely need.

I can add a pr_warn or dev_warn, but I can't guarantee that maintainers 
will take that in my case.

>
> Disable ASPM on Alder Lake with Volcanic Islands card due to resume 
> problems. System energy consumption might be higher than expected.
>
>
> Kind regards,
>
> Paul

WARNING: multiple messages have this Message-ID (diff)
From: "Gong, Richard" <richard.gong@amd.com>
To: Paul Menzel <pmenzel@molgen.mpg.de>
Cc: "Dave Airlie" <airlied@linux.ie>,
	"Xinhui Pan" <xinhui.pan@amd.com>,
	LKML <linux-kernel@vger.kernel.org>,
	amd-gfx@lists.freedesktop.org,
	"Alexander Deucher" <alexander.deucher@amd.com>,
	dri-devel@lists.freedesktop.org,
	"Christian König" <christian.koenig@amd.com>,
	"Mario Limonciello" <mario.limonciello@amd.com>
Subject: Re: [PATCHv4] drm/amdgpu: disable ASPM on Intel Alder Lake based systems
Date: Wed, 20 Apr 2022 16:12:55 -0500	[thread overview]
Message-ID: <fefcb57e-cc28-d9c1-2fad-997ed3856a88@amd.com> (raw)
In-Reply-To: <462dce7d-73fc-ea8c-0a8a-5e8722ee1967@molgen.mpg.de>


On 4/20/2022 4:02 PM, Paul Menzel wrote:
> Dear Richard,
>
>
> Am 20.04.22 um 22:56 schrieb Gong, Richard:
>
>> On 4/20/2022 3:48 PM, Paul Menzel wrote:
>
>>> Am 20.04.22 um 22:40 schrieb Alex Deucher:
>>>> On Wed, Apr 20, 2022 at 4:29 PM Paul Menzel <pmenzel@molgen.mpg.de> 
>>>> wrote:
>>>
>>>>> Am 19.04.22 um 23:46 schrieb Gong, Richard:
>>>>>
>>>>>> On 4/14/2022 2:52 AM, Paul Menzel wrote:
>>>>>>> [Cc: -kernel test robot <lkp@intel.com>]
>>>>>
>>>>> […]
>>>>>
>>>>>>> Am 13.04.22 um 15:00 schrieb Alex Deucher:
>>>>>>>> On Wed, Apr 13, 2022 at 3:43 AM Paul Menzel wrote:
>>>>>>>
>>>>>>>>> Thank you for sending out v4.
>>>>>>>>>
>>>>>>>>> Am 12.04.22 um 23:50 schrieb Richard Gong:
>>>>>>>>>> Active State Power Management (ASPM) feature is enabled since
>>>>>>>>>> kernel 5.14.
>>>>>>>>>> There are some AMD GFX cards (such as WX3200 and RX640) that 
>>>>>>>>>> won't
>>>>>>>>>> work
>>>>>>>>>> with ASPM-enabled Intel Alder Lake based systems. Using these 
>>>>>>>>>> GFX
>>>>>>>>>> cards as
>>>>>>>>>> video/display output, Intel Alder Lake based systems will 
>>>>>>>>>> hang during
>>>>>>>>>> suspend/resume.
>>>>>
>>>>> [Your email program wraps lines in cited text for some reason, making
>>>>> the citation harder to read.]
>>>>>
>>>>>>>>>
>>>>>>>>> I am still not clear, what “hang during suspend/resume” means. 
>>>>>>>>> I guess
>>>>>>>>> suspending works fine? During resume (S3 or S0ix?), where does 
>>>>>>>>> it hang?
>>>>>>>>> The system is functional, but there are only display problems?
>>>>>> System freeze after suspend/resume.
>>>>>
>>>>> But you see certain messages still? At what point does it freeze
>>>>> exactly? In the bug report you posted Linux messages.
>>>>>
>>>>>>>>>> The issue was initially reported on one system (Dell 
>>>>>>>>>> Precision 3660
>>>>>>>>>> with
>>>>>>>>>> BIOS version 0.14.81), but was later confirmed to affect at 
>>>>>>>>>> least 4
>>>>>>>>>> Alder
>>>>>>>>>> Lake based systems.
>>>>>>>>>>
>>>>>>>>>> Add extra check to disable ASPM on Intel Alder Lake based 
>>>>>>>>>> systems.
>>>>>>>>>>
>>>>>>>>>> Fixes: 0064b0ce85bb ("drm/amd/pm: enable ASPM by default")
>>>>>>>>>> Link:
>>>>>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fdrm%2Famd%2F-%2Fissues%2F1885&amp;data=05%7C01%7Crichard.gong%40amd.com%7C509e0378edcf477605a708da231114f0%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637860853537880384%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=SoXDKGHUiiQN4rcL7FpCotouWFt0kkAbcHyO3esfNlE%3D&amp;reserved=0 
>>>>>>>>>>
>>>>>
>>>>> Thank you Microsoft Outlook for keeping us safe. :(
>>>>>
>>>>>>>>>>
>>>>>>>>>> Reported-by: kernel test robot <lkp@intel.com>
>>>>>>>>>
>>>>>>>>> This tag is a little confusing. Maybe clarify that it was for 
>>>>>>>>> an issue
>>>>>>>>> in a previous patch iteration?
>>>>>>
>>>>>> I did describe in change-list version 3 below, which corrected 
>>>>>> the build
>>>>>> error with W=1 option.
>>>>>>
>>>>>> It is not good idea to add the description for that to the commit
>>>>>> message, this is why I add descriptions on change-list version 3.
>>>>>
>>>>> Do as you wish, but the current style is confusing, and readers of 
>>>>> the
>>>>> commit are going to think, the kernel test robot reported the problem
>>>>> with AMD VI ASICs and Intel Alder Lake systems.
>>>>>
>>>>>>>>>
>>>>>>>>>> Signed-off-by: Richard Gong <richard.gong@amd.com>
>>>>>>>>>> ---
>>>>>>>>>> v4: s/CONFIG_X86_64/CONFIG_X86
>>>>>>>>>>        enhanced check logic
>>>>>>>>>> v3: s/intel_core_asom_chk/aspm_support_quirk_check
>>>>>>>>>>        correct build error with W=1 option
>>>>>>>>>> v2: correct commit description
>>>>>>>>>>        move the check from chip family to problematic platform
>>>>>>>>>> ---
>>>>>>>>>>     drivers/gpu/drm/amd/amdgpu/vi.c | 17 ++++++++++++++++-
>>>>>>>>>>     1 file changed, 16 insertions(+), 1 deletion(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/vi.c
>>>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/vi.c
>>>>>>>>>> index 039b90cdc3bc..b33e0a9bee65 100644
>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/vi.c
>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/vi.c
>>>>>>>>>> @@ -81,6 +81,10 @@
>>>>>>>>>>     #include "mxgpu_vi.h"
>>>>>>>>>>     #include "amdgpu_dm.h"
>>>>>>>>>>
>>>>>>>>>> +#if IS_ENABLED(CONFIG_X86)
>>>>>>>>>> +#include <asm/intel-family.h>
>>>>>>>>>> +#endif
>>>>>>>>>> +
>>>>>>>>>>     #define ixPCIE_LC_L1_PM_SUBSTATE 0x100100C6
>>>>>>>>>>     #define 
>>>>>>>>>> PCIE_LC_L1_PM_SUBSTATE__LC_L1_SUBSTATES_OVERRIDE_EN_MASK
>>>>>>>>>> 0x00000001L
>>>>>>>>>>     #define PCIE_LC_L1_PM_SUBSTATE__LC_PCI_PM_L1_2_OVERRIDE_MASK
>>>>>>>>>> 0x00000002L
>>>>>>>>>> @@ -1134,13 +1138,24 @@ static void vi_enable_aspm(struct
>>>>>>>>>> amdgpu_device *adev)
>>>>>>>>>>                 WREG32_PCIE(ixPCIE_LC_CNTL, data);
>>>>>>>>>>     }
>>>>>>>>>>
>>>>>>>>>> +static bool aspm_support_quirk_check(void)
>>>>>>>>>> +{
>>>>>>>>>> +     if (IS_ENABLED(CONFIG_X86)) {
>>>>>>>>>> +             struct cpuinfo_x86 *c = &cpu_data(0);
>>>>>>>>>> +
>>>>>>>>>> +             return !(c->x86 == 6 && c->x86_model ==
>>>>>>>>>> INTEL_FAM6_ALDERLAKE);
>>>>>>>>>> +     }
>>>>>>>>>> +
>>>>>>>>>> +     return true;
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>>>     static void vi_program_aspm(struct amdgpu_device *adev)
>>>>>>>>>>     {
>>>>>>>>>>         u32 data, data1, orig;
>>>>>>>>>>         bool bL1SS = false;
>>>>>>>>>>         bool bClkReqSupport = true;
>>>>>>>>>>
>>>>>>>>>> -     if (!amdgpu_device_should_use_aspm(adev))
>>>>>>>>>> +     if (!amdgpu_device_should_use_aspm(adev) ||
>>>>>>>>>> !aspm_support_quirk_check())
>>>>>>>>>>                 return;
>>>>>>>>>
>>>>>>>>> Can users still forcefully enable ASPM with the parameter
>>>>>>>>> `amdgpu.aspm`?
>>>>>>>>>
>>>>>> As Mario mentioned in a separate reply, we can't forcefully 
>>>>>> enable ASPM
>>>>>> with the parameter 'amdgpu.aspm'.
>>>>>
>>>>> That would be a regression on systems where ASPM used to work. Hmm. I
>>>>> guess, you could say, there are no such systems.
>>>>>
>>>>>>>>>>
>>>>>>>>>>         if (adev->flags & AMD_IS_APU ||
>>>>>>>>>
>>>>>>>>> If I remember correctly, there were also newer cards, where 
>>>>>>>>> ASPM worked
>>>>>>>>> with Intel Alder Lake, right? Can only the problematic 
>>>>>>>>> generations for
>>>>>>>>> WX3200 and RX640 be excluded from ASPM?
>>>>>>>>
>>>>>>>> This patch only disables it for the generatioaon that was 
>>>>>>>> problematic.
>>>>>>>
>>>>>>> Could that please be made clear in the commit message summary, and
>>>>>>> message?
>>>>>>
>>>>>> Are you ok with the commit messages below?
>>>>>
>>>>> Please change the commit message summary. Maybe:
>>>>>
>>>>> drm/amdgpu: VI: Disable ASPM on Intel Alder Lake based systems
>>>>>
>>>>>> Active State Power Management (ASPM) feature is enabled since 
>>>>>> kernel 5.14.
>>>>>>
>>>>>> There are some AMD GFX cards (such as WX3200 and RX640) that 
>>>>>> won't work
>>>>>> with ASPM-enabled Intel Alder Lake based systems. Using these GFX 
>>>>>> cards as
>>>>>> video/display output, Intel Alder Lake based systems will freeze 
>>>>>> after
>>>>>> suspend/resume.
>>>>>
>>>>> Something like:
>>>>>
>>>>> On Intel Alder Lake based systems using ASPM with AMD GFX Volcanic
>>>>> Islands (VI) cards, like WX3200 and RX640, graphics don’t initialize
>>>>> when resuming from S0ix(?).
>>>>>
>>>>>
>>>>>> The issue was initially reported on one system (Dell Precision 
>>>>>> 3660 with
>>>>>> BIOS version 0.14.81), but was later confirmed to affect at least 
>>>>>> 4 Alder
>>>>>> Lake based systems.
>>>>>
>>>>> Which ones?
>>>>>
>>>>>> Add extra check to disable ASPM on Intel Alder Lake based systems 
>>>>>> with
>>>>>> problematic generation GFX cards.
>>>>>
>>>>> … with the problematic Volcanic Islands GFX cards.
>>>>>
>>>>>>>
>>>>>>> Loosely related, is there a public (or internal issue) to 
>>>>>>> analyze how
>>>>>>> to get ASPM working for VI generation devices with Intel Alder 
>>>>>>> Lake?
>>>>>>
>>>>>> As Alex mentioned, we need support from Intel. We don't have any 
>>>>>> update
>>>>>> on that.
>>>>>
>>>>> It’d be great to get that fixed properly.
>>>>>
>>>>> Last thing, please don’t hate me, does Linux log, that ASPM is 
>>>>> disabled?
>>>>
>>>> I'm not sure what gets logged at the platform level with respect to
>>>> ASPM, but whether or not the driver enables ASPM is tied to whether
>>>> ASPM is allowed at the platform level or not so if the platform
>>>> indicates that ASPM is not supported, the driver won't enable it.  The
>>>> driver does not log whether ASPM is enabled or not if that is what you
>>>> are asking.  As to whether or not it should, it comes down to how much
>>>> stuff is worth indiciating in the log.  The driver is already pretty
>>>> chatty by driver standards.
>>>
>>> I specifically mean, Linux should log the quirks it applies. (As a 
>>> normal user, I’d also expect ASPM to work nowadays, so a message, 
>>> that it’s disabled would help a lot.)
>>
>> In general rule we shouldn't generate additional log unless something 
>> went wrong with the system.
>
> Please run `dmesg` and see that your statement is false. That’s what 
> log levels are for, and in your case, it would be at least error 
> level. Also, I claim, something indeed went wrong, because a quirk had 
> to be applied. So please add a notice log level, that ASPM gets disabled:

 From my previous experience with upstream, the maintainers simply don't 
like adding logs unless absolutely need.

I can add a pr_warn or dev_warn, but I can't guarantee that maintainers 
will take that in my case.

>
> Disable ASPM on Alder Lake with Volcanic Islands card due to resume 
> problems. System energy consumption might be higher than expected.
>
>
> Kind regards,
>
> Paul

WARNING: multiple messages have this Message-ID (diff)
From: "Gong, Richard" <richard.gong@amd.com>
To: Paul Menzel <pmenzel@molgen.mpg.de>
Cc: "Dave Airlie" <airlied@linux.ie>,
	"Xinhui Pan" <xinhui.pan@amd.com>,
	LKML <linux-kernel@vger.kernel.org>,
	amd-gfx@lists.freedesktop.org,
	"Alexander Deucher" <alexander.deucher@amd.com>,
	dri-devel@lists.freedesktop.org,
	"Daniel Vetter" <daniel@ffwll.ch>,
	"Alex Deucher" <alexdeucher@gmail.com>,
	"Christian König" <christian.koenig@amd.com>,
	"Mario Limonciello" <mario.limonciello@amd.com>
Subject: Re: [PATCHv4] drm/amdgpu: disable ASPM on Intel Alder Lake based systems
Date: Wed, 20 Apr 2022 16:12:55 -0500	[thread overview]
Message-ID: <fefcb57e-cc28-d9c1-2fad-997ed3856a88@amd.com> (raw)
In-Reply-To: <462dce7d-73fc-ea8c-0a8a-5e8722ee1967@molgen.mpg.de>


On 4/20/2022 4:02 PM, Paul Menzel wrote:
> Dear Richard,
>
>
> Am 20.04.22 um 22:56 schrieb Gong, Richard:
>
>> On 4/20/2022 3:48 PM, Paul Menzel wrote:
>
>>> Am 20.04.22 um 22:40 schrieb Alex Deucher:
>>>> On Wed, Apr 20, 2022 at 4:29 PM Paul Menzel <pmenzel@molgen.mpg.de> 
>>>> wrote:
>>>
>>>>> Am 19.04.22 um 23:46 schrieb Gong, Richard:
>>>>>
>>>>>> On 4/14/2022 2:52 AM, Paul Menzel wrote:
>>>>>>> [Cc: -kernel test robot <lkp@intel.com>]
>>>>>
>>>>> […]
>>>>>
>>>>>>> Am 13.04.22 um 15:00 schrieb Alex Deucher:
>>>>>>>> On Wed, Apr 13, 2022 at 3:43 AM Paul Menzel wrote:
>>>>>>>
>>>>>>>>> Thank you for sending out v4.
>>>>>>>>>
>>>>>>>>> Am 12.04.22 um 23:50 schrieb Richard Gong:
>>>>>>>>>> Active State Power Management (ASPM) feature is enabled since
>>>>>>>>>> kernel 5.14.
>>>>>>>>>> There are some AMD GFX cards (such as WX3200 and RX640) that 
>>>>>>>>>> won't
>>>>>>>>>> work
>>>>>>>>>> with ASPM-enabled Intel Alder Lake based systems. Using these 
>>>>>>>>>> GFX
>>>>>>>>>> cards as
>>>>>>>>>> video/display output, Intel Alder Lake based systems will 
>>>>>>>>>> hang during
>>>>>>>>>> suspend/resume.
>>>>>
>>>>> [Your email program wraps lines in cited text for some reason, making
>>>>> the citation harder to read.]
>>>>>
>>>>>>>>>
>>>>>>>>> I am still not clear, what “hang during suspend/resume” means. 
>>>>>>>>> I guess
>>>>>>>>> suspending works fine? During resume (S3 or S0ix?), where does 
>>>>>>>>> it hang?
>>>>>>>>> The system is functional, but there are only display problems?
>>>>>> System freeze after suspend/resume.
>>>>>
>>>>> But you see certain messages still? At what point does it freeze
>>>>> exactly? In the bug report you posted Linux messages.
>>>>>
>>>>>>>>>> The issue was initially reported on one system (Dell 
>>>>>>>>>> Precision 3660
>>>>>>>>>> with
>>>>>>>>>> BIOS version 0.14.81), but was later confirmed to affect at 
>>>>>>>>>> least 4
>>>>>>>>>> Alder
>>>>>>>>>> Lake based systems.
>>>>>>>>>>
>>>>>>>>>> Add extra check to disable ASPM on Intel Alder Lake based 
>>>>>>>>>> systems.
>>>>>>>>>>
>>>>>>>>>> Fixes: 0064b0ce85bb ("drm/amd/pm: enable ASPM by default")
>>>>>>>>>> Link:
>>>>>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fdrm%2Famd%2F-%2Fissues%2F1885&amp;data=05%7C01%7Crichard.gong%40amd.com%7C509e0378edcf477605a708da231114f0%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637860853537880384%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=SoXDKGHUiiQN4rcL7FpCotouWFt0kkAbcHyO3esfNlE%3D&amp;reserved=0 
>>>>>>>>>>
>>>>>
>>>>> Thank you Microsoft Outlook for keeping us safe. :(
>>>>>
>>>>>>>>>>
>>>>>>>>>> Reported-by: kernel test robot <lkp@intel.com>
>>>>>>>>>
>>>>>>>>> This tag is a little confusing. Maybe clarify that it was for 
>>>>>>>>> an issue
>>>>>>>>> in a previous patch iteration?
>>>>>>
>>>>>> I did describe in change-list version 3 below, which corrected 
>>>>>> the build
>>>>>> error with W=1 option.
>>>>>>
>>>>>> It is not good idea to add the description for that to the commit
>>>>>> message, this is why I add descriptions on change-list version 3.
>>>>>
>>>>> Do as you wish, but the current style is confusing, and readers of 
>>>>> the
>>>>> commit are going to think, the kernel test robot reported the problem
>>>>> with AMD VI ASICs and Intel Alder Lake systems.
>>>>>
>>>>>>>>>
>>>>>>>>>> Signed-off-by: Richard Gong <richard.gong@amd.com>
>>>>>>>>>> ---
>>>>>>>>>> v4: s/CONFIG_X86_64/CONFIG_X86
>>>>>>>>>>        enhanced check logic
>>>>>>>>>> v3: s/intel_core_asom_chk/aspm_support_quirk_check
>>>>>>>>>>        correct build error with W=1 option
>>>>>>>>>> v2: correct commit description
>>>>>>>>>>        move the check from chip family to problematic platform
>>>>>>>>>> ---
>>>>>>>>>>     drivers/gpu/drm/amd/amdgpu/vi.c | 17 ++++++++++++++++-
>>>>>>>>>>     1 file changed, 16 insertions(+), 1 deletion(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/vi.c
>>>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/vi.c
>>>>>>>>>> index 039b90cdc3bc..b33e0a9bee65 100644
>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/vi.c
>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/vi.c
>>>>>>>>>> @@ -81,6 +81,10 @@
>>>>>>>>>>     #include "mxgpu_vi.h"
>>>>>>>>>>     #include "amdgpu_dm.h"
>>>>>>>>>>
>>>>>>>>>> +#if IS_ENABLED(CONFIG_X86)
>>>>>>>>>> +#include <asm/intel-family.h>
>>>>>>>>>> +#endif
>>>>>>>>>> +
>>>>>>>>>>     #define ixPCIE_LC_L1_PM_SUBSTATE 0x100100C6
>>>>>>>>>>     #define 
>>>>>>>>>> PCIE_LC_L1_PM_SUBSTATE__LC_L1_SUBSTATES_OVERRIDE_EN_MASK
>>>>>>>>>> 0x00000001L
>>>>>>>>>>     #define PCIE_LC_L1_PM_SUBSTATE__LC_PCI_PM_L1_2_OVERRIDE_MASK
>>>>>>>>>> 0x00000002L
>>>>>>>>>> @@ -1134,13 +1138,24 @@ static void vi_enable_aspm(struct
>>>>>>>>>> amdgpu_device *adev)
>>>>>>>>>>                 WREG32_PCIE(ixPCIE_LC_CNTL, data);
>>>>>>>>>>     }
>>>>>>>>>>
>>>>>>>>>> +static bool aspm_support_quirk_check(void)
>>>>>>>>>> +{
>>>>>>>>>> +     if (IS_ENABLED(CONFIG_X86)) {
>>>>>>>>>> +             struct cpuinfo_x86 *c = &cpu_data(0);
>>>>>>>>>> +
>>>>>>>>>> +             return !(c->x86 == 6 && c->x86_model ==
>>>>>>>>>> INTEL_FAM6_ALDERLAKE);
>>>>>>>>>> +     }
>>>>>>>>>> +
>>>>>>>>>> +     return true;
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>>>     static void vi_program_aspm(struct amdgpu_device *adev)
>>>>>>>>>>     {
>>>>>>>>>>         u32 data, data1, orig;
>>>>>>>>>>         bool bL1SS = false;
>>>>>>>>>>         bool bClkReqSupport = true;
>>>>>>>>>>
>>>>>>>>>> -     if (!amdgpu_device_should_use_aspm(adev))
>>>>>>>>>> +     if (!amdgpu_device_should_use_aspm(adev) ||
>>>>>>>>>> !aspm_support_quirk_check())
>>>>>>>>>>                 return;
>>>>>>>>>
>>>>>>>>> Can users still forcefully enable ASPM with the parameter
>>>>>>>>> `amdgpu.aspm`?
>>>>>>>>>
>>>>>> As Mario mentioned in a separate reply, we can't forcefully 
>>>>>> enable ASPM
>>>>>> with the parameter 'amdgpu.aspm'.
>>>>>
>>>>> That would be a regression on systems where ASPM used to work. Hmm. I
>>>>> guess, you could say, there are no such systems.
>>>>>
>>>>>>>>>>
>>>>>>>>>>         if (adev->flags & AMD_IS_APU ||
>>>>>>>>>
>>>>>>>>> If I remember correctly, there were also newer cards, where 
>>>>>>>>> ASPM worked
>>>>>>>>> with Intel Alder Lake, right? Can only the problematic 
>>>>>>>>> generations for
>>>>>>>>> WX3200 and RX640 be excluded from ASPM?
>>>>>>>>
>>>>>>>> This patch only disables it for the generatioaon that was 
>>>>>>>> problematic.
>>>>>>>
>>>>>>> Could that please be made clear in the commit message summary, and
>>>>>>> message?
>>>>>>
>>>>>> Are you ok with the commit messages below?
>>>>>
>>>>> Please change the commit message summary. Maybe:
>>>>>
>>>>> drm/amdgpu: VI: Disable ASPM on Intel Alder Lake based systems
>>>>>
>>>>>> Active State Power Management (ASPM) feature is enabled since 
>>>>>> kernel 5.14.
>>>>>>
>>>>>> There are some AMD GFX cards (such as WX3200 and RX640) that 
>>>>>> won't work
>>>>>> with ASPM-enabled Intel Alder Lake based systems. Using these GFX 
>>>>>> cards as
>>>>>> video/display output, Intel Alder Lake based systems will freeze 
>>>>>> after
>>>>>> suspend/resume.
>>>>>
>>>>> Something like:
>>>>>
>>>>> On Intel Alder Lake based systems using ASPM with AMD GFX Volcanic
>>>>> Islands (VI) cards, like WX3200 and RX640, graphics don’t initialize
>>>>> when resuming from S0ix(?).
>>>>>
>>>>>
>>>>>> The issue was initially reported on one system (Dell Precision 
>>>>>> 3660 with
>>>>>> BIOS version 0.14.81), but was later confirmed to affect at least 
>>>>>> 4 Alder
>>>>>> Lake based systems.
>>>>>
>>>>> Which ones?
>>>>>
>>>>>> Add extra check to disable ASPM on Intel Alder Lake based systems 
>>>>>> with
>>>>>> problematic generation GFX cards.
>>>>>
>>>>> … with the problematic Volcanic Islands GFX cards.
>>>>>
>>>>>>>
>>>>>>> Loosely related, is there a public (or internal issue) to 
>>>>>>> analyze how
>>>>>>> to get ASPM working for VI generation devices with Intel Alder 
>>>>>>> Lake?
>>>>>>
>>>>>> As Alex mentioned, we need support from Intel. We don't have any 
>>>>>> update
>>>>>> on that.
>>>>>
>>>>> It’d be great to get that fixed properly.
>>>>>
>>>>> Last thing, please don’t hate me, does Linux log, that ASPM is 
>>>>> disabled?
>>>>
>>>> I'm not sure what gets logged at the platform level with respect to
>>>> ASPM, but whether or not the driver enables ASPM is tied to whether
>>>> ASPM is allowed at the platform level or not so if the platform
>>>> indicates that ASPM is not supported, the driver won't enable it.  The
>>>> driver does not log whether ASPM is enabled or not if that is what you
>>>> are asking.  As to whether or not it should, it comes down to how much
>>>> stuff is worth indiciating in the log.  The driver is already pretty
>>>> chatty by driver standards.
>>>
>>> I specifically mean, Linux should log the quirks it applies. (As a 
>>> normal user, I’d also expect ASPM to work nowadays, so a message, 
>>> that it’s disabled would help a lot.)
>>
>> In general rule we shouldn't generate additional log unless something 
>> went wrong with the system.
>
> Please run `dmesg` and see that your statement is false. That’s what 
> log levels are for, and in your case, it would be at least error 
> level. Also, I claim, something indeed went wrong, because a quirk had 
> to be applied. So please add a notice log level, that ASPM gets disabled:

 From my previous experience with upstream, the maintainers simply don't 
like adding logs unless absolutely need.

I can add a pr_warn or dev_warn, but I can't guarantee that maintainers 
will take that in my case.

>
> Disable ASPM on Alder Lake with Volcanic Islands card due to resume 
> problems. System energy consumption might be higher than expected.
>
>
> Kind regards,
>
> Paul

  reply	other threads:[~2022-04-20 21:13 UTC|newest]

Thread overview: 73+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-12 21:50 [PATCHv4] drm/amdgpu: disable ASPM on Intel Alder Lake based systems Richard Gong
2022-04-12 21:50 ` Richard Gong
2022-04-13  4:29 ` Lazar, Lijo
2022-04-13  4:29   ` Lazar, Lijo
2022-04-13  7:43 ` Paul Menzel
2022-04-13  7:43   ` Paul Menzel
2022-04-13  7:43   ` Paul Menzel
2022-04-13 13:00   ` Alex Deucher
2022-04-13 13:00     ` Alex Deucher
2022-04-13 13:00     ` Alex Deucher
2022-04-13 13:28     ` Limonciello, Mario
2022-04-13 13:28       ` Limonciello, Mario
2022-04-13 13:28       ` Limonciello, Mario
2022-04-14  7:52     ` Paul Menzel
2022-04-14  7:52       ` Paul Menzel
2022-04-14  7:52       ` Paul Menzel
2022-04-14 13:11       ` Alex Deucher
2022-04-14 13:11         ` Alex Deucher
2022-04-14 13:11         ` Alex Deucher
2022-04-19 21:46       ` Gong, Richard
2022-04-19 21:46         ` Gong, Richard
2022-04-20 20:29         ` Paul Menzel
2022-04-20 20:29           ` Paul Menzel
2022-04-20 20:29           ` Paul Menzel
2022-04-20 20:40           ` Alex Deucher
2022-04-20 20:40             ` Alex Deucher
2022-04-20 20:40             ` Alex Deucher
2022-04-20 20:48             ` Paul Menzel
2022-04-20 20:48               ` Paul Menzel
2022-04-20 20:48               ` Paul Menzel
2022-04-20 20:56               ` Gong, Richard
2022-04-20 20:56                 ` Gong, Richard
2022-04-20 20:56                 ` Gong, Richard
2022-04-20 21:02                 ` Paul Menzel
2022-04-20 21:02                   ` Paul Menzel
2022-04-20 21:02                   ` Paul Menzel
2022-04-20 21:12                   ` Gong, Richard [this message]
2022-04-20 21:12                     ` Gong, Richard
2022-04-20 21:12                     ` Gong, Richard
2022-04-20 21:15                     ` Alex Deucher
2022-04-20 21:15                       ` Alex Deucher
2022-04-20 21:15                       ` Alex Deucher
2022-04-20 21:13                   ` Alex Deucher
2022-04-20 21:13                     ` Alex Deucher
2022-04-20 21:13                     ` Alex Deucher
2022-04-20 21:16                     ` Limonciello, Mario
2022-04-20 21:16                       ` Limonciello, Mario
2022-04-20 21:16                       ` Limonciello, Mario
2022-04-21  1:12           ` Gong, Richard
2022-04-21  1:12             ` Gong, Richard
2022-04-21  1:12             ` Gong, Richard
2022-04-21  5:35             ` Paul Menzel
2022-04-21  5:35               ` Paul Menzel
2022-04-21  5:35               ` Paul Menzel
2022-04-26 13:53               ` Gong, Richard
2022-04-26 13:53                 ` Gong, Richard
2022-04-26 13:53                 ` Gong, Richard
2022-05-01  7:08                 ` Paul Menzel
2022-05-01  7:08                   ` Paul Menzel
2022-05-01  7:08                   ` Paul Menzel
2022-05-02 14:56                   ` Gong, Richard
2022-05-02 14:56                     ` Gong, Richard
2022-05-02 14:56                     ` Gong, Richard
2022-05-03 12:25                   ` Daniel Stone
2022-05-03 12:25                     ` Daniel Stone
2022-05-03 12:44                     ` Paul Menzel
2022-05-03 12:44                       ` Paul Menzel
2022-04-13 15:40 ` Nathan Chancellor
2022-04-13 15:40   ` Nathan Chancellor
2022-04-13 15:40   ` Nathan Chancellor
2022-04-19 21:08   ` Gong, Richard
2022-04-19 21:08     ` Gong, Richard
2022-04-19 21:08     ` Gong, Richard

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=fefcb57e-cc28-d9c1-2fad-997ed3856a88@amd.com \
    --to=richard.gong@amd.com \
    --cc=airlied@linux.ie \
    --cc=alexander.deucher@amd.com \
    --cc=alexdeucher@gmail.com \
    --cc=amd-gfx@lists.freedesktop.org \
    --cc=christian.koenig@amd.com \
    --cc=daniel@ffwll.ch \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mario.limonciello@amd.com \
    --cc=pmenzel@molgen.mpg.de \
    --cc=xinhui.pan@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.