dri-devel.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/2] drm/amdgpu: Reset GPU on S0ix when device supports BOCO
@ 2023-03-29  9:59 Kai-Heng Feng
  2023-03-29  9:59 ` [PATCH 2/2] drm/amdgpu: Remove ASPM workaround on VI and NV Kai-Heng Feng
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Kai-Heng Feng @ 2023-03-29  9:59 UTC (permalink / raw)
  To: alexander.deucher, christian.koenig, Xinhui.Pan
  Cc: Jingyu Wang, Andrey Grodzovsky, Lijo Lazar, dri-devel,
	Michel Dänzer, YiPeng Chai, Mario Limonciello, Guchun Chen,
	Rafael J. Wysocki, amd-gfx, Jiansong Chen, Kenneth Feng,
	Tim Huang, Bokun Zhang, Hans de Goede, Maxime Ripard, Evan Quan,
	Somalapuram Amaranath, linux-kernel, Kai-Heng Feng,
	Hawking Zhang

When the power is lost due to ACPI power resources being turned off, the
driver should reset the GPU so it can work anew.

First, _PR3 support of the hierarchy needs to be found correctly. Since
the GPU on some discrete GFX cards is behind a PCIe switch, checking the
_PR3 on downstream port alone is not enough, as the _PR3 can associate
to the root port above the PCIe switch.

Once the _PR3 is found and BOCO support is correctly marked, use that
information to inform the GPU should be reset. This solves an issue that
system freeze on a Intel ADL desktop that uses S0ix for sleep and D3cold
is supported for the GFX slot.

Fixes: 0064b0ce85bb ("drm/amd/pm: enable ASPM by default")
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1885
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2458
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c   |  3 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  7 ++++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    | 12 +++++-------
 3 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
index 60b1857f469e..407456ac0e84 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
@@ -987,6 +987,9 @@ bool amdgpu_acpi_should_gpu_reset(struct amdgpu_device *adev)
 	if (amdgpu_sriov_vf(adev))
 		return false;
 
+	if (amdgpu_device_supports_boco(adev_to_drm(adev)))
+		return true;
+
 #if IS_ENABLED(CONFIG_SUSPEND)
 	return pm_suspend_target_state != PM_SUSPEND_TO_IDLE;
 #else
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index f5658359ff5c..d56b7a2bafa6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2181,7 +2181,12 @@ static int amdgpu_device_ip_early_init(struct amdgpu_device *adev)
 
 	if (!(adev->flags & AMD_IS_APU)) {
 		parent = pci_upstream_bridge(adev->pdev);
-		adev->has_pr3 = parent ? pci_pr3_present(parent) : false;
+		do {
+			if (pci_pr3_present(parent)) {
+				adev->has_pr3 = true;
+				break;
+			}
+		} while ((parent = pci_upstream_bridge(parent)));
 	}
 
 	amdgpu_amdkfd_device_probe(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index ba5def374368..5d81fcac4b0a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -2415,10 +2415,11 @@ static int amdgpu_pmops_suspend(struct device *dev)
 	struct drm_device *drm_dev = dev_get_drvdata(dev);
 	struct amdgpu_device *adev = drm_to_adev(drm_dev);
 
-	if (amdgpu_acpi_is_s0ix_active(adev))
-		adev->in_s0ix = true;
-	else if (amdgpu_acpi_is_s3_active(adev))
+	if (amdgpu_acpi_is_s3_active(adev) ||
+	    amdgpu_device_supports_boco(drm_dev))
 		adev->in_s3 = true;
+	else if (amdgpu_acpi_is_s0ix_active(adev))
+		adev->in_s0ix = true;
 	if (!adev->in_s0ix && !adev->in_s3)
 		return 0;
 	return amdgpu_device_suspend(drm_dev, true);
@@ -2449,10 +2450,7 @@ static int amdgpu_pmops_resume(struct device *dev)
 		adev->no_hw_access = true;
 
 	r = amdgpu_device_resume(drm_dev, true);
-	if (amdgpu_acpi_is_s0ix_active(adev))
-		adev->in_s0ix = false;
-	else
-		adev->in_s3 = false;
+	adev->in_s0ix = adev->in_s3 = false;
 	return r;
 }
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/2] drm/amdgpu: Remove ASPM workaround on VI and NV
  2023-03-29  9:59 [PATCH 1/2] drm/amdgpu: Reset GPU on S0ix when device supports BOCO Kai-Heng Feng
@ 2023-03-29  9:59 ` Kai-Heng Feng
  2023-03-29 13:08   ` Gong, Richard
  2023-03-29 13:21 ` [PATCH 1/2] drm/amdgpu: Reset GPU on S0ix when device supports BOCO Alex Deucher
  2023-03-29 13:23 ` Mario Limonciello
  2 siblings, 1 reply; 11+ messages in thread
From: Kai-Heng Feng @ 2023-03-29  9:59 UTC (permalink / raw)
  To: alexander.deucher, christian.koenig, Xinhui.Pan
  Cc: Victor Zhao, dri-devel, Kai-Heng Feng, Mario Limonciello,
	Likun Gao, Guchun Chen, amd-gfx, Veerabadhran Gopalakrishnan,
	Andrey Grodzovsky, Lijo Lazar, Bokun Zhang, Evan Quan, Jack Xiao,
	Richard Gong, Felix Kuehling, Somalapuram Amaranath,
	linux-kernel, YiPeng Chai, Hawking Zhang

Since the original issue is resolved by a new fix, the ASPM workaround
can be dropped.

Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  1 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 15 ---------------
 drivers/gpu/drm/amd/amdgpu/nv.c            |  2 +-
 drivers/gpu/drm/amd/amdgpu/vi.c            |  2 +-
 4 files changed, 2 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 8cf2cc50b3de..a19a6489b117 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1248,7 +1248,6 @@ void amdgpu_device_pci_config_reset(struct amdgpu_device *adev);
 int amdgpu_device_pci_reset(struct amdgpu_device *adev);
 bool amdgpu_device_need_post(struct amdgpu_device *adev);
 bool amdgpu_device_should_use_aspm(struct amdgpu_device *adev);
-bool amdgpu_device_aspm_support_quirk(void);
 
 void amdgpu_cs_report_moved_bytes(struct amdgpu_device *adev, u64 num_bytes,
 				  u64 num_vis_bytes);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index d56b7a2bafa6..0cacace2d6c2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -81,10 +81,6 @@
 
 #include <drm/drm_drv.h>
 
-#if IS_ENABLED(CONFIG_X86)
-#include <asm/intel-family.h>
-#endif
-
 MODULE_FIRMWARE("amdgpu/vega10_gpu_info.bin");
 MODULE_FIRMWARE("amdgpu/vega12_gpu_info.bin");
 MODULE_FIRMWARE("amdgpu/raven_gpu_info.bin");
@@ -1377,17 +1373,6 @@ bool amdgpu_device_should_use_aspm(struct amdgpu_device *adev)
 	return pcie_aspm_enabled(adev->pdev);
 }
 
-bool amdgpu_device_aspm_support_quirk(void)
-{
-#if IS_ENABLED(CONFIG_X86)
-	struct cpuinfo_x86 *c = &cpu_data(0);
-
-	return !(c->x86 == 6 && c->x86_model == INTEL_FAM6_ALDERLAKE);
-#else
-	return true;
-#endif
-}
-
 /* if we get transitioned to only one device, take VGA back */
 /**
  * amdgpu_device_vga_set_decode - enable/disable vga decode
diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
index 47420b403871..15f3c6745ea9 100644
--- a/drivers/gpu/drm/amd/amdgpu/nv.c
+++ b/drivers/gpu/drm/amd/amdgpu/nv.c
@@ -522,7 +522,7 @@ static int nv_set_vce_clocks(struct amdgpu_device *adev, u32 evclk, u32 ecclk)
 
 static void nv_program_aspm(struct amdgpu_device *adev)
 {
-	if (!amdgpu_device_should_use_aspm(adev) || !amdgpu_device_aspm_support_quirk())
+	if (!amdgpu_device_should_use_aspm(adev))
 		return;
 
 	if (!(adev->flags & AMD_IS_APU) &&
diff --git a/drivers/gpu/drm/amd/amdgpu/vi.c b/drivers/gpu/drm/amd/amdgpu/vi.c
index 531f173ade2d..81dcb1148a60 100644
--- a/drivers/gpu/drm/amd/amdgpu/vi.c
+++ b/drivers/gpu/drm/amd/amdgpu/vi.c
@@ -1122,7 +1122,7 @@ static void vi_program_aspm(struct amdgpu_device *adev)
 	bool bL1SS = false;
 	bool bClkReqSupport = true;
 
-	if (!amdgpu_device_should_use_aspm(adev) || !amdgpu_device_aspm_support_quirk())
+	if (!amdgpu_device_should_use_aspm(adev))
 		return;
 
 	if (adev->flags & AMD_IS_APU ||
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] drm/amdgpu: Remove ASPM workaround on VI and NV
  2023-03-29  9:59 ` [PATCH 2/2] drm/amdgpu: Remove ASPM workaround on VI and NV Kai-Heng Feng
@ 2023-03-29 13:08   ` Gong, Richard
  2023-03-29 13:18     ` Mario Limonciello
  0 siblings, 1 reply; 11+ messages in thread
From: Gong, Richard @ 2023-03-29 13:08 UTC (permalink / raw)
  To: Kai-Heng Feng, alexander.deucher, christian.koenig, Xinhui.Pan
  Cc: Andrey Grodzovsky, Jack Xiao, Somalapuram Amaranath, dri-devel,
	Guchun Chen, Bokun Zhang, Felix Kuehling, Victor Zhao,
	linux-kernel, amd-gfx, Lijo Lazar, YiPeng Chai,
	Mario Limonciello, Veerabadhran Gopalakrishnan, Likun Gao,
	Evan Quan, Hawking Zhang


On 3/29/2023 4:59 AM, Kai-Heng Feng wrote:
> Since the original issue is resolved by a new fix, the ASPM workaround
> can be dropped.
What is the new fix? Can you elaborate more or add the new fix commit here?
>
> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  1 -
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 15 ---------------
>   drivers/gpu/drm/amd/amdgpu/nv.c            |  2 +-
>   drivers/gpu/drm/amd/amdgpu/vi.c            |  2 +-
>   4 files changed, 2 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 8cf2cc50b3de..a19a6489b117 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -1248,7 +1248,6 @@ void amdgpu_device_pci_config_reset(struct amdgpu_device *adev);
>   int amdgpu_device_pci_reset(struct amdgpu_device *adev);
>   bool amdgpu_device_need_post(struct amdgpu_device *adev);
>   bool amdgpu_device_should_use_aspm(struct amdgpu_device *adev);
> -bool amdgpu_device_aspm_support_quirk(void);
>   
>   void amdgpu_cs_report_moved_bytes(struct amdgpu_device *adev, u64 num_bytes,
>   				  u64 num_vis_bytes);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index d56b7a2bafa6..0cacace2d6c2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -81,10 +81,6 @@
>   
>   #include <drm/drm_drv.h>
>   
> -#if IS_ENABLED(CONFIG_X86)
> -#include <asm/intel-family.h>
> -#endif
> -
>   MODULE_FIRMWARE("amdgpu/vega10_gpu_info.bin");
>   MODULE_FIRMWARE("amdgpu/vega12_gpu_info.bin");
>   MODULE_FIRMWARE("amdgpu/raven_gpu_info.bin");
> @@ -1377,17 +1373,6 @@ bool amdgpu_device_should_use_aspm(struct amdgpu_device *adev)
>   	return pcie_aspm_enabled(adev->pdev);
>   }
>   
> -bool amdgpu_device_aspm_support_quirk(void)
> -{
> -#if IS_ENABLED(CONFIG_X86)
> -	struct cpuinfo_x86 *c = &cpu_data(0);
> -
> -	return !(c->x86 == 6 && c->x86_model == INTEL_FAM6_ALDERLAKE);
> -#else
> -	return true;
> -#endif
> -}
> -
>   /* if we get transitioned to only one device, take VGA back */
>   /**
>    * amdgpu_device_vga_set_decode - enable/disable vga decode
> diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
> index 47420b403871..15f3c6745ea9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/nv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/nv.c
> @@ -522,7 +522,7 @@ static int nv_set_vce_clocks(struct amdgpu_device *adev, u32 evclk, u32 ecclk)
>   
>   static void nv_program_aspm(struct amdgpu_device *adev)
>   {
> -	if (!amdgpu_device_should_use_aspm(adev) || !amdgpu_device_aspm_support_quirk())
> +	if (!amdgpu_device_should_use_aspm(adev))
>   		return;
>   
>   	if (!(adev->flags & AMD_IS_APU) &&
> diff --git a/drivers/gpu/drm/amd/amdgpu/vi.c b/drivers/gpu/drm/amd/amdgpu/vi.c
> index 531f173ade2d..81dcb1148a60 100644
> --- a/drivers/gpu/drm/amd/amdgpu/vi.c
> +++ b/drivers/gpu/drm/amd/amdgpu/vi.c
> @@ -1122,7 +1122,7 @@ static void vi_program_aspm(struct amdgpu_device *adev)
>   	bool bL1SS = false;
>   	bool bClkReqSupport = true;
>   
> -	if (!amdgpu_device_should_use_aspm(adev) || !amdgpu_device_aspm_support_quirk())
> +	if (!amdgpu_device_should_use_aspm(adev))
>   		return;
>   
>   	if (adev->flags & AMD_IS_APU ||

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] drm/amdgpu: Remove ASPM workaround on VI and NV
  2023-03-29 13:08   ` Gong, Richard
@ 2023-03-29 13:18     ` Mario Limonciello
  0 siblings, 0 replies; 11+ messages in thread
From: Mario Limonciello @ 2023-03-29 13:18 UTC (permalink / raw)
  To: Gong, Richard, Kai-Heng Feng, alexander.deucher,
	christian.koenig, Xinhui.Pan
  Cc: Andrey Grodzovsky, Jack Xiao, Somalapuram Amaranath, Guchun Chen,
	Bokun Zhang, Felix Kuehling, Victor Zhao, linux-kernel, amd-gfx,
	Lijo Lazar, YiPeng Chai, dri-devel, Veerabadhran Gopalakrishnan,
	Likun Gao, Evan Quan, Hawking Zhang


On 3/29/23 08:08, Gong, Richard wrote:
>
> On 3/29/2023 4:59 AM, Kai-Heng Feng wrote:
>> Since the original issue is resolved by a new fix, the ASPM workaround
>> can be dropped.
> What is the new fix? Can you elaborate more or add the new fix commit 
> here?

It's his first patch in the series, but yes I agree it should be 
mentioned in this

message explicitly what about that means this can be dropped.

>>
>> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  1 -
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 15 ---------------
>>   drivers/gpu/drm/amd/amdgpu/nv.c            |  2 +-
>>   drivers/gpu/drm/amd/amdgpu/vi.c            |  2 +-
>>   4 files changed, 2 insertions(+), 18 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> index 8cf2cc50b3de..a19a6489b117 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> @@ -1248,7 +1248,6 @@ void amdgpu_device_pci_config_reset(struct 
>> amdgpu_device *adev);
>>   int amdgpu_device_pci_reset(struct amdgpu_device *adev);
>>   bool amdgpu_device_need_post(struct amdgpu_device *adev);
>>   bool amdgpu_device_should_use_aspm(struct amdgpu_device *adev);
>> -bool amdgpu_device_aspm_support_quirk(void);
>>     void amdgpu_cs_report_moved_bytes(struct amdgpu_device *adev, u64 
>> num_bytes,
>>                     u64 num_vis_bytes);
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index d56b7a2bafa6..0cacace2d6c2 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -81,10 +81,6 @@
>>     #include <drm/drm_drv.h>
>>   -#if IS_ENABLED(CONFIG_X86)
>> -#include <asm/intel-family.h>
>> -#endif
>> -
>>   MODULE_FIRMWARE("amdgpu/vega10_gpu_info.bin");
>>   MODULE_FIRMWARE("amdgpu/vega12_gpu_info.bin");
>>   MODULE_FIRMWARE("amdgpu/raven_gpu_info.bin");
>> @@ -1377,17 +1373,6 @@ bool amdgpu_device_should_use_aspm(struct 
>> amdgpu_device *adev)
>>       return pcie_aspm_enabled(adev->pdev);
>>   }
>>   -bool amdgpu_device_aspm_support_quirk(void)
>> -{
>> -#if IS_ENABLED(CONFIG_X86)
>> -    struct cpuinfo_x86 *c = &cpu_data(0);
>> -
>> -    return !(c->x86 == 6 && c->x86_model == INTEL_FAM6_ALDERLAKE);
>> -#else
>> -    return true;
>> -#endif
>> -}
>> -
>>   /* if we get transitioned to only one device, take VGA back */
>>   /**
>>    * amdgpu_device_vga_set_decode - enable/disable vga decode
>> diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c 
>> b/drivers/gpu/drm/amd/amdgpu/nv.c
>> index 47420b403871..15f3c6745ea9 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/nv.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/nv.c
>> @@ -522,7 +522,7 @@ static int nv_set_vce_clocks(struct amdgpu_device 
>> *adev, u32 evclk, u32 ecclk)
>>     static void nv_program_aspm(struct amdgpu_device *adev)
>>   {
>> -    if (!amdgpu_device_should_use_aspm(adev) || 
>> !amdgpu_device_aspm_support_quirk())
>> +    if (!amdgpu_device_should_use_aspm(adev))
>>           return;
>>         if (!(adev->flags & AMD_IS_APU) &&
>> diff --git a/drivers/gpu/drm/amd/amdgpu/vi.c 
>> b/drivers/gpu/drm/amd/amdgpu/vi.c
>> index 531f173ade2d..81dcb1148a60 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/vi.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/vi.c
>> @@ -1122,7 +1122,7 @@ static void vi_program_aspm(struct 
>> amdgpu_device *adev)
>>       bool bL1SS = false;
>>       bool bClkReqSupport = true;
>>   -    if (!amdgpu_device_should_use_aspm(adev) || 
>> !amdgpu_device_aspm_support_quirk())
>> +    if (!amdgpu_device_should_use_aspm(adev))
>>           return;
>>         if (adev->flags & AMD_IS_APU ||

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] drm/amdgpu: Reset GPU on S0ix when device supports BOCO
  2023-03-29  9:59 [PATCH 1/2] drm/amdgpu: Reset GPU on S0ix when device supports BOCO Kai-Heng Feng
  2023-03-29  9:59 ` [PATCH 2/2] drm/amdgpu: Remove ASPM workaround on VI and NV Kai-Heng Feng
@ 2023-03-29 13:21 ` Alex Deucher
  2023-03-30  0:49   ` Kai-Heng Feng
  2023-03-29 13:23 ` Mario Limonciello
  2 siblings, 1 reply; 11+ messages in thread
From: Alex Deucher @ 2023-03-29 13:21 UTC (permalink / raw)
  To: Kai-Heng Feng
  Cc: Jingyu Wang, Somalapuram Amaranath, Tim Huang, Lijo Lazar,
	dri-devel, Michel Dänzer, YiPeng Chai, Mario Limonciello,
	Guchun Chen, Rafael J. Wysocki, amd-gfx, Jiansong Chen,
	Kenneth Feng, Andrey Grodzovsky, Bokun Zhang, Hans de Goede,
	Maxime Ripard, Evan Quan, Xinhui.Pan, linux-kernel,
	alexander.deucher, christian.koenig, Hawking Zhang

On Wed, Mar 29, 2023 at 6:00 AM Kai-Heng Feng
<kai.heng.feng@canonical.com> wrote:
>
> When the power is lost due to ACPI power resources being turned off, the
> driver should reset the GPU so it can work anew.
>
> First, _PR3 support of the hierarchy needs to be found correctly. Since
> the GPU on some discrete GFX cards is behind a PCIe switch, checking the
> _PR3 on downstream port alone is not enough, as the _PR3 can associate
> to the root port above the PCIe switch.
>
> Once the _PR3 is found and BOCO support is correctly marked, use that
> information to inform the GPU should be reset. This solves an issue that
> system freeze on a Intel ADL desktop that uses S0ix for sleep and D3cold
> is supported for the GFX slot.

I don't think we need to reset the GPU.  If the power is turned off, a
reset shouldn't be necessary. The reset is only necessary when the
power is not turned off to put the GPU into a known good state.  It
should be in that state already if the power is turn off.  It sounds
like the device is not actually getting powered off.

Alex

>
> Fixes: 0064b0ce85bb ("drm/amd/pm: enable ASPM by default")
> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1885
> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2458
> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c   |  3 +++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  7 ++++++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    | 12 +++++-------
>  3 files changed, 14 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> index 60b1857f469e..407456ac0e84 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> @@ -987,6 +987,9 @@ bool amdgpu_acpi_should_gpu_reset(struct amdgpu_device *adev)
>         if (amdgpu_sriov_vf(adev))
>                 return false;
>
> +       if (amdgpu_device_supports_boco(adev_to_drm(adev)))
> +               return true;
> +
>  #if IS_ENABLED(CONFIG_SUSPEND)
>         return pm_suspend_target_state != PM_SUSPEND_TO_IDLE;
>  #else
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index f5658359ff5c..d56b7a2bafa6 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -2181,7 +2181,12 @@ static int amdgpu_device_ip_early_init(struct amdgpu_device *adev)
>
>         if (!(adev->flags & AMD_IS_APU)) {
>                 parent = pci_upstream_bridge(adev->pdev);
> -               adev->has_pr3 = parent ? pci_pr3_present(parent) : false;
> +               do {
> +                       if (pci_pr3_present(parent)) {
> +                               adev->has_pr3 = true;
> +                               break;
> +                       }
> +               } while ((parent = pci_upstream_bridge(parent)));
>         }
>
>         amdgpu_amdkfd_device_probe(adev);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index ba5def374368..5d81fcac4b0a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -2415,10 +2415,11 @@ static int amdgpu_pmops_suspend(struct device *dev)
>         struct drm_device *drm_dev = dev_get_drvdata(dev);
>         struct amdgpu_device *adev = drm_to_adev(drm_dev);
>
> -       if (amdgpu_acpi_is_s0ix_active(adev))
> -               adev->in_s0ix = true;
> -       else if (amdgpu_acpi_is_s3_active(adev))
> +       if (amdgpu_acpi_is_s3_active(adev) ||
> +           amdgpu_device_supports_boco(drm_dev))
>                 adev->in_s3 = true;
> +       else if (amdgpu_acpi_is_s0ix_active(adev))
> +               adev->in_s0ix = true;
>         if (!adev->in_s0ix && !adev->in_s3)
>                 return 0;
>         return amdgpu_device_suspend(drm_dev, true);
> @@ -2449,10 +2450,7 @@ static int amdgpu_pmops_resume(struct device *dev)
>                 adev->no_hw_access = true;
>
>         r = amdgpu_device_resume(drm_dev, true);
> -       if (amdgpu_acpi_is_s0ix_active(adev))
> -               adev->in_s0ix = false;
> -       else
> -               adev->in_s3 = false;
> +       adev->in_s0ix = adev->in_s3 = false;
>         return r;
>  }
>
> --
> 2.34.1
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] drm/amdgpu: Reset GPU on S0ix when device supports BOCO
  2023-03-29  9:59 [PATCH 1/2] drm/amdgpu: Reset GPU on S0ix when device supports BOCO Kai-Heng Feng
  2023-03-29  9:59 ` [PATCH 2/2] drm/amdgpu: Remove ASPM workaround on VI and NV Kai-Heng Feng
  2023-03-29 13:21 ` [PATCH 1/2] drm/amdgpu: Reset GPU on S0ix when device supports BOCO Alex Deucher
@ 2023-03-29 13:23 ` Mario Limonciello
  2023-03-30  3:36   ` Kai-Heng Feng
  2 siblings, 1 reply; 11+ messages in thread
From: Mario Limonciello @ 2023-03-29 13:23 UTC (permalink / raw)
  To: Kai-Heng Feng, alexander.deucher, christian.koenig, Xinhui.Pan
  Cc: Tim Huang, Jingyu Wang, Lijo Lazar, Guchun Chen,
	Andrey Grodzovsky, Somalapuram Amaranath, Bokun Zhang, dri-devel,
	Rafael J. Wysocki, linux-kernel, amd-gfx, Hans de Goede,
	Michel Dänzer, YiPeng Chai, Maxime Ripard, Jiansong Chen,
	Evan Quan, Kenneth Feng, Hawking Zhang


On 3/29/23 04:59, Kai-Heng Feng wrote:
> When the power is lost due to ACPI power resources being turned off, the
> driver should reset the GPU so it can work anew.
>
> First, _PR3 support of the hierarchy needs to be found correctly. Since
> the GPU on some discrete GFX cards is behind a PCIe switch, checking the
> _PR3 on downstream port alone is not enough, as the _PR3 can associate
> to the root port above the PCIe switch.

I think this should be split into two commits:

* One of them to look at _PR3 further up in hierarchy to fix indication
for BOCO support.

* One to adjust policy for whether to reset


> Once the _PR3 is found and BOCO support is correctly marked, use that
> information to inform the GPU should be reset. This solves an issue that
> system freeze on a Intel ADL desktop that uses S0ix for sleep and D3cold
> is supported for the GFX slot.

I'm worried this is still papering over an underlying issue with L0s
handling on ALD + Navi1x/Navi2x.

Also, what about runtime suspend?  If you unplug the monitor from this
dGPU and interact with it over SSH it should go into runtime suspend.

Is it working properly for that case now?

>
> Fixes: 0064b0ce85bb ("drm/amd/pm: enable ASPM by default")
> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1885
> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2458
> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c   |  3 +++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  7 ++++++-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    | 12 +++++-------
>   3 files changed, 14 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> index 60b1857f469e..407456ac0e84 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> @@ -987,6 +987,9 @@ bool amdgpu_acpi_should_gpu_reset(struct amdgpu_device *adev)
>   	if (amdgpu_sriov_vf(adev))
>   		return false;
>   
> +	if (amdgpu_device_supports_boco(adev_to_drm(adev)))
> +		return true;
> +
>   #if IS_ENABLED(CONFIG_SUSPEND)
>   	return pm_suspend_target_state != PM_SUSPEND_TO_IDLE;
>   #else
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index f5658359ff5c..d56b7a2bafa6 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -2181,7 +2181,12 @@ static int amdgpu_device_ip_early_init(struct amdgpu_device *adev)
>   
>   	if (!(adev->flags & AMD_IS_APU)) {
>   		parent = pci_upstream_bridge(adev->pdev);
> -		adev->has_pr3 = parent ? pci_pr3_present(parent) : false;
> +		do {
> +			if (pci_pr3_present(parent)) {
> +				adev->has_pr3 = true;
> +				break;
> +			}
> +		} while ((parent = pci_upstream_bridge(parent)));
>   	}
>   
>   	amdgpu_amdkfd_device_probe(adev);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index ba5def374368..5d81fcac4b0a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -2415,10 +2415,11 @@ static int amdgpu_pmops_suspend(struct device *dev)
>   	struct drm_device *drm_dev = dev_get_drvdata(dev);
>   	struct amdgpu_device *adev = drm_to_adev(drm_dev);
>   
> -	if (amdgpu_acpi_is_s0ix_active(adev))
> -		adev->in_s0ix = true;
> -	else if (amdgpu_acpi_is_s3_active(adev))
> +	if (amdgpu_acpi_is_s3_active(adev) ||
> +	    amdgpu_device_supports_boco(drm_dev))
>   		adev->in_s3 = true;
> +	else if (amdgpu_acpi_is_s0ix_active(adev))
> +		adev->in_s0ix = true;
>   	if (!adev->in_s0ix && !adev->in_s3)
>   		return 0;
>   	return amdgpu_device_suspend(drm_dev, true);
> @@ -2449,10 +2450,7 @@ static int amdgpu_pmops_resume(struct device *dev)
>   		adev->no_hw_access = true;
>   
>   	r = amdgpu_device_resume(drm_dev, true);
> -	if (amdgpu_acpi_is_s0ix_active(adev))
> -		adev->in_s0ix = false;
> -	else
> -		adev->in_s3 = false;
> +	adev->in_s0ix = adev->in_s3 = false;
>   	return r;
>   }
>   

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] drm/amdgpu: Reset GPU on S0ix when device supports BOCO
  2023-03-29 13:21 ` [PATCH 1/2] drm/amdgpu: Reset GPU on S0ix when device supports BOCO Alex Deucher
@ 2023-03-30  0:49   ` Kai-Heng Feng
  2023-03-30  2:08     ` Alex Deucher
  0 siblings, 1 reply; 11+ messages in thread
From: Kai-Heng Feng @ 2023-03-30  0:49 UTC (permalink / raw)
  To: Alex Deucher
  Cc: Jingyu Wang, Somalapuram Amaranath, Tim Huang, Lijo Lazar,
	dri-devel, Michel Dänzer, YiPeng Chai, Mario Limonciello,
	Guchun Chen, Rafael J. Wysocki, amd-gfx, Jiansong Chen,
	Kenneth Feng, Andrey Grodzovsky, Bokun Zhang, Hans de Goede,
	Maxime Ripard, Evan Quan, Xinhui.Pan, linux-kernel,
	alexander.deucher, christian.koenig, Hawking Zhang

On Wed, Mar 29, 2023 at 9:21 PM Alex Deucher <alexdeucher@gmail.com> wrote:
>
> On Wed, Mar 29, 2023 at 6:00 AM Kai-Heng Feng
> <kai.heng.feng@canonical.com> wrote:
> >
> > When the power is lost due to ACPI power resources being turned off, the
> > driver should reset the GPU so it can work anew.
> >
> > First, _PR3 support of the hierarchy needs to be found correctly. Since
> > the GPU on some discrete GFX cards is behind a PCIe switch, checking the
> > _PR3 on downstream port alone is not enough, as the _PR3 can associate
> > to the root port above the PCIe switch.
> >
> > Once the _PR3 is found and BOCO support is correctly marked, use that
> > information to inform the GPU should be reset. This solves an issue that
> > system freeze on a Intel ADL desktop that uses S0ix for sleep and D3cold
> > is supported for the GFX slot.
>
> I don't think we need to reset the GPU.  If the power is turned off, a
> reset shouldn't be necessary. The reset is only necessary when the
> power is not turned off to put the GPU into a known good state.  It
> should be in that state already if the power is turn off.  It sounds
> like the device is not actually getting powered off.

I had the impression that the GPU gets reset because S3 turned the
power rail off.

So the actual intention for GPU reset is because S3 doesn't guarantee
the power is being turned off?

Kai-Heng

>
> Alex
>
> >
> > Fixes: 0064b0ce85bb ("drm/amd/pm: enable ASPM by default")
> > Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1885
> > Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2458
> > Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c   |  3 +++
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  7 ++++++-
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    | 12 +++++-------
> >  3 files changed, 14 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> > index 60b1857f469e..407456ac0e84 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> > @@ -987,6 +987,9 @@ bool amdgpu_acpi_should_gpu_reset(struct amdgpu_device *adev)
> >         if (amdgpu_sriov_vf(adev))
> >                 return false;
> >
> > +       if (amdgpu_device_supports_boco(adev_to_drm(adev)))
> > +               return true;
> > +
> >  #if IS_ENABLED(CONFIG_SUSPEND)
> >         return pm_suspend_target_state != PM_SUSPEND_TO_IDLE;
> >  #else
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > index f5658359ff5c..d56b7a2bafa6 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > @@ -2181,7 +2181,12 @@ static int amdgpu_device_ip_early_init(struct amdgpu_device *adev)
> >
> >         if (!(adev->flags & AMD_IS_APU)) {
> >                 parent = pci_upstream_bridge(adev->pdev);
> > -               adev->has_pr3 = parent ? pci_pr3_present(parent) : false;
> > +               do {
> > +                       if (pci_pr3_present(parent)) {
> > +                               adev->has_pr3 = true;
> > +                               break;
> > +                       }
> > +               } while ((parent = pci_upstream_bridge(parent)));
> >         }
> >
> >         amdgpu_amdkfd_device_probe(adev);
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > index ba5def374368..5d81fcac4b0a 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > @@ -2415,10 +2415,11 @@ static int amdgpu_pmops_suspend(struct device *dev)
> >         struct drm_device *drm_dev = dev_get_drvdata(dev);
> >         struct amdgpu_device *adev = drm_to_adev(drm_dev);
> >
> > -       if (amdgpu_acpi_is_s0ix_active(adev))
> > -               adev->in_s0ix = true;
> > -       else if (amdgpu_acpi_is_s3_active(adev))
> > +       if (amdgpu_acpi_is_s3_active(adev) ||
> > +           amdgpu_device_supports_boco(drm_dev))
> >                 adev->in_s3 = true;
> > +       else if (amdgpu_acpi_is_s0ix_active(adev))
> > +               adev->in_s0ix = true;
> >         if (!adev->in_s0ix && !adev->in_s3)
> >                 return 0;
> >         return amdgpu_device_suspend(drm_dev, true);
> > @@ -2449,10 +2450,7 @@ static int amdgpu_pmops_resume(struct device *dev)
> >                 adev->no_hw_access = true;
> >
> >         r = amdgpu_device_resume(drm_dev, true);
> > -       if (amdgpu_acpi_is_s0ix_active(adev))
> > -               adev->in_s0ix = false;
> > -       else
> > -               adev->in_s3 = false;
> > +       adev->in_s0ix = adev->in_s3 = false;
> >         return r;
> >  }
> >
> > --
> > 2.34.1
> >

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] drm/amdgpu: Reset GPU on S0ix when device supports BOCO
  2023-03-30  0:49   ` Kai-Heng Feng
@ 2023-03-30  2:08     ` Alex Deucher
  0 siblings, 0 replies; 11+ messages in thread
From: Alex Deucher @ 2023-03-30  2:08 UTC (permalink / raw)
  To: Kai-Heng Feng
  Cc: Jingyu Wang, Somalapuram Amaranath, Tim Huang, Lijo Lazar,
	dri-devel, Michel Dänzer, YiPeng Chai, Mario Limonciello,
	Guchun Chen, Rafael J. Wysocki, amd-gfx, Jiansong Chen,
	Kenneth Feng, Andrey Grodzovsky, Bokun Zhang, Hans de Goede,
	Maxime Ripard, Evan Quan, Xinhui.Pan, linux-kernel,
	alexander.deucher, christian.koenig, Hawking Zhang

On Wed, Mar 29, 2023 at 8:49 PM Kai-Heng Feng
<kai.heng.feng@canonical.com> wrote:
>
> On Wed, Mar 29, 2023 at 9:21 PM Alex Deucher <alexdeucher@gmail.com> wrote:
> >
> > On Wed, Mar 29, 2023 at 6:00 AM Kai-Heng Feng
> > <kai.heng.feng@canonical.com> wrote:
> > >
> > > When the power is lost due to ACPI power resources being turned off, the
> > > driver should reset the GPU so it can work anew.
> > >
> > > First, _PR3 support of the hierarchy needs to be found correctly. Since
> > > the GPU on some discrete GFX cards is behind a PCIe switch, checking the
> > > _PR3 on downstream port alone is not enough, as the _PR3 can associate
> > > to the root port above the PCIe switch.
> > >
> > > Once the _PR3 is found and BOCO support is correctly marked, use that
> > > information to inform the GPU should be reset. This solves an issue that
> > > system freeze on a Intel ADL desktop that uses S0ix for sleep and D3cold
> > > is supported for the GFX slot.
> >
> > I don't think we need to reset the GPU.  If the power is turned off, a
> > reset shouldn't be necessary. The reset is only necessary when the
> > power is not turned off to put the GPU into a known good state.  It
> > should be in that state already if the power is turn off.  It sounds
> > like the device is not actually getting powered off.
>
> I had the impression that the GPU gets reset because S3 turned the
> power rail off.
>
> So the actual intention for GPU reset is because S3 doesn't guarantee
> the power is being turned off?

For S4, the reset in freeze is there because once the boot kernel
transitions to the hibernated kernel, we need the reset to bring the
GPU back to a known state.  On dGPUs at least there are some engines
that can only be initialized once and then require a reset to be
initialized again.  The one in suspend was originally there to deal
with aborted suspends where we'd need to reset the GPU for the same
reason as S4.  However, it no longer really serves much purpose since
it got moved to noirq and it could probably be dropped.

Alex


>
> Kai-Heng
>
> >
> > Alex
> >
> > >
> > > Fixes: 0064b0ce85bb ("drm/amd/pm: enable ASPM by default")
> > > Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1885
> > > Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2458
> > > Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> > > ---
> > >  drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c   |  3 +++
> > >  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  7 ++++++-
> > >  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    | 12 +++++-------
> > >  3 files changed, 14 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> > > index 60b1857f469e..407456ac0e84 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> > > @@ -987,6 +987,9 @@ bool amdgpu_acpi_should_gpu_reset(struct amdgpu_device *adev)
> > >         if (amdgpu_sriov_vf(adev))
> > >                 return false;
> > >
> > > +       if (amdgpu_device_supports_boco(adev_to_drm(adev)))
> > > +               return true;
> > > +
> > >  #if IS_ENABLED(CONFIG_SUSPEND)
> > >         return pm_suspend_target_state != PM_SUSPEND_TO_IDLE;
> > >  #else
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > index f5658359ff5c..d56b7a2bafa6 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > @@ -2181,7 +2181,12 @@ static int amdgpu_device_ip_early_init(struct amdgpu_device *adev)
> > >
> > >         if (!(adev->flags & AMD_IS_APU)) {
> > >                 parent = pci_upstream_bridge(adev->pdev);
> > > -               adev->has_pr3 = parent ? pci_pr3_present(parent) : false;
> > > +               do {
> > > +                       if (pci_pr3_present(parent)) {
> > > +                               adev->has_pr3 = true;
> > > +                               break;
> > > +                       }
> > > +               } while ((parent = pci_upstream_bridge(parent)));
> > >         }
> > >
> > >         amdgpu_amdkfd_device_probe(adev);
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > index ba5def374368..5d81fcac4b0a 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > @@ -2415,10 +2415,11 @@ static int amdgpu_pmops_suspend(struct device *dev)
> > >         struct drm_device *drm_dev = dev_get_drvdata(dev);
> > >         struct amdgpu_device *adev = drm_to_adev(drm_dev);
> > >
> > > -       if (amdgpu_acpi_is_s0ix_active(adev))
> > > -               adev->in_s0ix = true;
> > > -       else if (amdgpu_acpi_is_s3_active(adev))
> > > +       if (amdgpu_acpi_is_s3_active(adev) ||
> > > +           amdgpu_device_supports_boco(drm_dev))
> > >                 adev->in_s3 = true;
> > > +       else if (amdgpu_acpi_is_s0ix_active(adev))
> > > +               adev->in_s0ix = true;
> > >         if (!adev->in_s0ix && !adev->in_s3)
> > >                 return 0;
> > >         return amdgpu_device_suspend(drm_dev, true);
> > > @@ -2449,10 +2450,7 @@ static int amdgpu_pmops_resume(struct device *dev)
> > >                 adev->no_hw_access = true;
> > >
> > >         r = amdgpu_device_resume(drm_dev, true);
> > > -       if (amdgpu_acpi_is_s0ix_active(adev))
> > > -               adev->in_s0ix = false;
> > > -       else
> > > -               adev->in_s3 = false;
> > > +       adev->in_s0ix = adev->in_s3 = false;
> > >         return r;
> > >  }
> > >
> > > --
> > > 2.34.1
> > >

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] drm/amdgpu: Reset GPU on S0ix when device supports BOCO
  2023-03-29 13:23 ` Mario Limonciello
@ 2023-03-30  3:36   ` Kai-Heng Feng
  2023-03-30  4:20     ` Mario Limonciello
  2023-03-30 13:22     ` Alex Deucher
  0 siblings, 2 replies; 11+ messages in thread
From: Kai-Heng Feng @ 2023-03-30  3:36 UTC (permalink / raw)
  To: Mario Limonciello
  Cc: Jingyu Wang, Xinhui.Pan, Andrey Grodzovsky, Lijo Lazar,
	dri-devel, Michel Dänzer, YiPeng Chai, Guchun Chen,
	Rafael J. Wysocki, amd-gfx, Jiansong Chen, Kenneth Feng,
	Tim Huang, Bokun Zhang, Hans de Goede, Maxime Ripard, Evan Quan,
	Somalapuram Amaranath, linux-kernel, alexander.deucher,
	christian.koenig, Hawking Zhang

On Wed, Mar 29, 2023 at 9:23 PM Mario Limonciello
<mario.limonciello@amd.com> wrote:
>
>
> On 3/29/23 04:59, Kai-Heng Feng wrote:
> > When the power is lost due to ACPI power resources being turned off, the
> > driver should reset the GPU so it can work anew.
> >
> > First, _PR3 support of the hierarchy needs to be found correctly. Since
> > the GPU on some discrete GFX cards is behind a PCIe switch, checking the
> > _PR3 on downstream port alone is not enough, as the _PR3 can associate
> > to the root port above the PCIe switch.
>
> I think this should be split into two commits:
>
> * One of them to look at _PR3 further up in hierarchy to fix indication
> for BOCO support.

Yes, this part can be split up.

>
> * One to adjust policy for whether to reset

IIUC, the GPU only needs to be reset when the power status isn't certain?

Assuming power resources in _PR3 are really disabled, GPU is already
reset by itself. That means reset shouldn't be necessary for D3cold,
am I understanding it correctly?

However, this is a desktop plugged with GFX card that has external
power, does that assumption still stand? Perform resetting on D3cold
can cover this scenario.

>
>
> > Once the _PR3 is found and BOCO support is correctly marked, use that
> > information to inform the GPU should be reset. This solves an issue that
> > system freeze on a Intel ADL desktop that uses S0ix for sleep and D3cold
> > is supported for the GFX slot.
>
> I'm worried this is still papering over an underlying issue with L0s
> handling on ALD + Navi1x/Navi2x.

Is it possible to get the ASIC's ASPM parameter under Windows? Knowing
the difference can be useful.

>
> Also, what about runtime suspend?  If you unplug the monitor from this
> dGPU and interact with it over SSH it should go into runtime suspend.
>
> Is it working properly for that case now?

Thanks for the tip. Runtime resume doesn't work at all:
[ 1087.601631] pcieport 0000:00:01.0: power state changed by ACPI to D0
[ 1087.613820] pcieport 0000:00:01.0: restoring config space at offset
0x2c (was 0x43, writing 0x43)
[ 1087.613835] pcieport 0000:00:01.0: restoring config space at offset
0x28 (was 0x41, writing 0x41)
[ 1087.613841] pcieport 0000:00:01.0: restoring config space at offset
0x24 (was 0xfff10001, writing 0xfff10001)
[ 1087.613978] pcieport 0000:00:01.0: PME# disabled
[ 1087.613984] pcieport 0000:00:01.0: waiting 100 ms for downstream
link, after activation
[ 1089.330956] pcieport 0000:01:00.0: not ready 1023ms after resume; giving up
[ 1089.373036] pcieport 0000:01:00.0: Unable to change power state
from D3cold to D0, device inaccessible

After a short while the whole system froze.

So the upstream port of GFX's PCIe switch cannot be powered on again.

Kai-Heng

>
> >
> > Fixes: 0064b0ce85bb ("drm/amd/pm: enable ASPM by default")
> > Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1885
> > Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2458
> > Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c   |  3 +++
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  7 ++++++-
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    | 12 +++++-------
> >   3 files changed, 14 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> > index 60b1857f469e..407456ac0e84 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> > @@ -987,6 +987,9 @@ bool amdgpu_acpi_should_gpu_reset(struct amdgpu_device *adev)
> >       if (amdgpu_sriov_vf(adev))
> >               return false;
> >
> > +     if (amdgpu_device_supports_boco(adev_to_drm(adev)))
> > +             return true;
> > +
> >   #if IS_ENABLED(CONFIG_SUSPEND)
> >       return pm_suspend_target_state != PM_SUSPEND_TO_IDLE;
> >   #else
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > index f5658359ff5c..d56b7a2bafa6 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > @@ -2181,7 +2181,12 @@ static int amdgpu_device_ip_early_init(struct amdgpu_device *adev)
> >
> >       if (!(adev->flags & AMD_IS_APU)) {
> >               parent = pci_upstream_bridge(adev->pdev);
> > -             adev->has_pr3 = parent ? pci_pr3_present(parent) : false;
> > +             do {
> > +                     if (pci_pr3_present(parent)) {
> > +                             adev->has_pr3 = true;
> > +                             break;
> > +                     }
> > +             } while ((parent = pci_upstream_bridge(parent)));
> >       }
> >
> >       amdgpu_amdkfd_device_probe(adev);
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > index ba5def374368..5d81fcac4b0a 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > @@ -2415,10 +2415,11 @@ static int amdgpu_pmops_suspend(struct device *dev)
> >       struct drm_device *drm_dev = dev_get_drvdata(dev);
> >       struct amdgpu_device *adev = drm_to_adev(drm_dev);
> >
> > -     if (amdgpu_acpi_is_s0ix_active(adev))
> > -             adev->in_s0ix = true;
> > -     else if (amdgpu_acpi_is_s3_active(adev))
> > +     if (amdgpu_acpi_is_s3_active(adev) ||
> > +         amdgpu_device_supports_boco(drm_dev))
> >               adev->in_s3 = true;
> > +     else if (amdgpu_acpi_is_s0ix_active(adev))
> > +             adev->in_s0ix = true;
> >       if (!adev->in_s0ix && !adev->in_s3)
> >               return 0;
> >       return amdgpu_device_suspend(drm_dev, true);
> > @@ -2449,10 +2450,7 @@ static int amdgpu_pmops_resume(struct device *dev)
> >               adev->no_hw_access = true;
> >
> >       r = amdgpu_device_resume(drm_dev, true);
> > -     if (amdgpu_acpi_is_s0ix_active(adev))
> > -             adev->in_s0ix = false;
> > -     else
> > -             adev->in_s3 = false;
> > +     adev->in_s0ix = adev->in_s3 = false;
> >       return r;
> >   }
> >

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] drm/amdgpu: Reset GPU on S0ix when device supports BOCO
  2023-03-30  3:36   ` Kai-Heng Feng
@ 2023-03-30  4:20     ` Mario Limonciello
  2023-03-30 13:22     ` Alex Deucher
  1 sibling, 0 replies; 11+ messages in thread
From: Mario Limonciello @ 2023-03-30  4:20 UTC (permalink / raw)
  To: Kai-Heng Feng, Evan Quan
  Cc: Jingyu Wang, Xinhui.Pan, Andrey Grodzovsky, Lijo Lazar,
	dri-devel, Michel Dänzer, YiPeng Chai, Guchun Chen,
	Rafael J. Wysocki, amd-gfx, Jiansong Chen, Kenneth Feng,
	Tim Huang, Bokun Zhang, Hans de Goede, Maxime Ripard,
	Somalapuram Amaranath, linux-kernel, alexander.deucher,
	christian.koenig, Hawking Zhang


On 3/29/23 22:36, Kai-Heng Feng wrote:
> On Wed, Mar 29, 2023 at 9:23 PM Mario Limonciello
> <mario.limonciello@amd.com> wrote:
>>
>> On 3/29/23 04:59, Kai-Heng Feng wrote:
>>> When the power is lost due to ACPI power resources being turned off, the
>>> driver should reset the GPU so it can work anew.
>>>
>>> First, _PR3 support of the hierarchy needs to be found correctly. Since
>>> the GPU on some discrete GFX cards is behind a PCIe switch, checking the
>>> _PR3 on downstream port alone is not enough, as the _PR3 can associate
>>> to the root port above the PCIe switch.
>> I think this should be split into two commits:
>>
>> * One of them to look at _PR3 further up in hierarchy to fix indication
>> for BOCO support.
> Yes, this part can be split up.
>
>> * One to adjust policy for whether to reset
> IIUC, the GPU only needs to be reset when the power status isn't certain?
Yeah; think of reset as a particular case that all hardware isn't 
initialized.
> Assuming power resources in _PR3 are really disabled, GPU is already
> reset by itself. That means reset shouldn't be necessary for D3cold,
> am I understanding it correctly?
Can we see the acpidump for this system?
> However, this is a desktop plugged with GFX card that has external
> power, does that assumption still stand? Perform resetting on D3cold
> can cover this scenario.

Are you sure it's going to D3cold?  Or is it D3hot?  What does _S0W have 
in this design?

>>
>>> Once the _PR3 is found and BOCO support is correctly marked, use that
>>> information to inform the GPU should be reset. This solves an issue that
>>> system freeze on a Intel ADL desktop that uses S0ix for sleep and D3cold
>>> is supported for the GFX slot.
>> I'm worried this is still papering over an underlying issue with L0s
>> handling on ALD + Navi1x/Navi2x.
> Is it possible to get the ASIC's ASPM parameter under Windows? Knowing
> the difference can be useful.
Evan is in discussion with Windows guys about this issue too.
>
>> Also, what about runtime suspend?  If you unplug the monitor from this
>> dGPU and interact with it over SSH it should go into runtime suspend.
>>
>> Is it working properly for that case now?
> Thanks for the tip. Runtime resume doesn't work at all:
> [ 1087.601631] pcieport 0000:00:01.0: power state changed by ACPI to D0
> [ 1087.613820] pcieport 0000:00:01.0: restoring config space at offset
> 0x2c (was 0x43, writing 0x43)
> [ 1087.613835] pcieport 0000:00:01.0: restoring config space at offset
> 0x28 (was 0x41, writing 0x41)
> [ 1087.613841] pcieport 0000:00:01.0: restoring config space at offset
> 0x24 (was 0xfff10001, writing 0xfff10001)
> [ 1087.613978] pcieport 0000:00:01.0: PME# disabled
> [ 1087.613984] pcieport 0000:00:01.0: waiting 100 ms for downstream
> link, after activation
> [ 1089.330956] pcieport 0000:01:00.0: not ready 1023ms after resume; giving up
> [ 1089.373036] pcieport 0000:01:00.0: Unable to change power state
> from D3cold to D0, device inaccessible
>
> After a short while the whole system froze.
>
> So the upstream port of GFX's PCIe switch cannot be powered on again.

What is the state of the kernel patches while doing this test?
Specifically does this happen without amdgpu.aspm set?  Or this happens
no matter what?

>
> Kai-Heng
>
>>> Fixes: 0064b0ce85bb ("drm/amd/pm: enable ASPM by default")
>>> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1885
>>> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2458
>>> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
>>> ---
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c   |  3 +++
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  7 ++++++-
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    | 12 +++++-------
>>>    3 files changed, 14 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
>>> index 60b1857f469e..407456ac0e84 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
>>> @@ -987,6 +987,9 @@ bool amdgpu_acpi_should_gpu_reset(struct amdgpu_device *adev)
>>>        if (amdgpu_sriov_vf(adev))
>>>                return false;
>>>
>>> +     if (amdgpu_device_supports_boco(adev_to_drm(adev)))
>>> +             return true;
>>> +
>>>    #if IS_ENABLED(CONFIG_SUSPEND)
>>>        return pm_suspend_target_state != PM_SUSPEND_TO_IDLE;
>>>    #else
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> index f5658359ff5c..d56b7a2bafa6 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> @@ -2181,7 +2181,12 @@ static int amdgpu_device_ip_early_init(struct amdgpu_device *adev)
>>>
>>>        if (!(adev->flags & AMD_IS_APU)) {
>>>                parent = pci_upstream_bridge(adev->pdev);
>>> -             adev->has_pr3 = parent ? pci_pr3_present(parent) : false;
>>> +             do {
>>> +                     if (pci_pr3_present(parent)) {
>>> +                             adev->has_pr3 = true;
>>> +                             break;
>>> +                     }
>>> +             } while ((parent = pci_upstream_bridge(parent)));
>>>        }
>>>
>>>        amdgpu_amdkfd_device_probe(adev);
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>> index ba5def374368..5d81fcac4b0a 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>> @@ -2415,10 +2415,11 @@ static int amdgpu_pmops_suspend(struct device *dev)
>>>        struct drm_device *drm_dev = dev_get_drvdata(dev);
>>>        struct amdgpu_device *adev = drm_to_adev(drm_dev);
>>>
>>> -     if (amdgpu_acpi_is_s0ix_active(adev))
>>> -             adev->in_s0ix = true;
>>> -     else if (amdgpu_acpi_is_s3_active(adev))
>>> +     if (amdgpu_acpi_is_s3_active(adev) ||
>>> +         amdgpu_device_supports_boco(drm_dev))
>>>                adev->in_s3 = true;
>>> +     else if (amdgpu_acpi_is_s0ix_active(adev))
>>> +             adev->in_s0ix = true;
>>>        if (!adev->in_s0ix && !adev->in_s3)
>>>                return 0;
>>>        return amdgpu_device_suspend(drm_dev, true);
>>> @@ -2449,10 +2450,7 @@ static int amdgpu_pmops_resume(struct device *dev)
>>>                adev->no_hw_access = true;
>>>
>>>        r = amdgpu_device_resume(drm_dev, true);
>>> -     if (amdgpu_acpi_is_s0ix_active(adev))
>>> -             adev->in_s0ix = false;
>>> -     else
>>> -             adev->in_s3 = false;
>>> +     adev->in_s0ix = adev->in_s3 = false;
>>>        return r;
>>>    }
>>>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] drm/amdgpu: Reset GPU on S0ix when device supports BOCO
  2023-03-30  3:36   ` Kai-Heng Feng
  2023-03-30  4:20     ` Mario Limonciello
@ 2023-03-30 13:22     ` Alex Deucher
  1 sibling, 0 replies; 11+ messages in thread
From: Alex Deucher @ 2023-03-30 13:22 UTC (permalink / raw)
  To: Kai-Heng Feng
  Cc: Jingyu Wang, Somalapuram Amaranath, Tim Huang, Lijo Lazar,
	dri-devel, Michel Dänzer, YiPeng Chai, Mario Limonciello,
	Guchun Chen, Rafael J. Wysocki, amd-gfx, Jiansong Chen,
	Kenneth Feng, Andrey Grodzovsky, Bokun Zhang, Hans de Goede,
	Maxime Ripard, Evan Quan, Xinhui.Pan, linux-kernel,
	alexander.deucher, christian.koenig, Hawking Zhang

On Wed, Mar 29, 2023 at 11:36 PM Kai-Heng Feng
<kai.heng.feng@canonical.com> wrote:
>
> On Wed, Mar 29, 2023 at 9:23 PM Mario Limonciello
> <mario.limonciello@amd.com> wrote:
> >
> >
> > On 3/29/23 04:59, Kai-Heng Feng wrote:
> > > When the power is lost due to ACPI power resources being turned off, the
> > > driver should reset the GPU so it can work anew.
> > >
> > > First, _PR3 support of the hierarchy needs to be found correctly. Since
> > > the GPU on some discrete GFX cards is behind a PCIe switch, checking the
> > > _PR3 on downstream port alone is not enough, as the _PR3 can associate
> > > to the root port above the PCIe switch.
> >
> > I think this should be split into two commits:
> >
> > * One of them to look at _PR3 further up in hierarchy to fix indication
> > for BOCO support.
>
> Yes, this part can be split up.
>
> >
> > * One to adjust policy for whether to reset
>
> IIUC, the GPU only needs to be reset when the power status isn't certain?
>
> Assuming power resources in _PR3 are really disabled, GPU is already
> reset by itself. That means reset shouldn't be necessary for D3cold,
> am I understanding it correctly?

Right, if D3cold actually works, then no reset is necessary.

>
> However, this is a desktop plugged with GFX card that has external
> power, does that assumption still stand? Perform resetting on D3cold
> can cover this scenario.

BOCO is generally only available on laptops and all-in-one systems
where the dGPU is integrated into the platform.  Power to the dGPU is
controlled by a GPIO which is toggled by the ACPI _PR3 method for the
device.  There is an ATPX method on all platforms which support BOCO.
Since this is an AIB in a desktop system, I doubt it actually supports
D3Cold.  For desktop systems, we have what we call BACO where the
driver controls power to everything on the GPU except the bus
interface.  In the BACO case, we can turn off the GPU, but the device
still shows up on the PCI bus.  For BOCO, the device is completely
powered down and disappears from the PCI bus.

Alex

>
> >
> >
> > > Once the _PR3 is found and BOCO support is correctly marked, use that
> > > information to inform the GPU should be reset. This solves an issue that
> > > system freeze on a Intel ADL desktop that uses S0ix for sleep and D3cold
> > > is supported for the GFX slot.
> >
> > I'm worried this is still papering over an underlying issue with L0s
> > handling on ALD + Navi1x/Navi2x.
>
> Is it possible to get the ASIC's ASPM parameter under Windows? Knowing
> the difference can be useful.
>
> >
> > Also, what about runtime suspend?  If you unplug the monitor from this
> > dGPU and interact with it over SSH it should go into runtime suspend.
> >
> > Is it working properly for that case now?
>
> Thanks for the tip. Runtime resume doesn't work at all:
> [ 1087.601631] pcieport 0000:00:01.0: power state changed by ACPI to D0
> [ 1087.613820] pcieport 0000:00:01.0: restoring config space at offset
> 0x2c (was 0x43, writing 0x43)
> [ 1087.613835] pcieport 0000:00:01.0: restoring config space at offset
> 0x28 (was 0x41, writing 0x41)
> [ 1087.613841] pcieport 0000:00:01.0: restoring config space at offset
> 0x24 (was 0xfff10001, writing 0xfff10001)
> [ 1087.613978] pcieport 0000:00:01.0: PME# disabled
> [ 1087.613984] pcieport 0000:00:01.0: waiting 100 ms for downstream
> link, after activation
> [ 1089.330956] pcieport 0000:01:00.0: not ready 1023ms after resume; giving up
> [ 1089.373036] pcieport 0000:01:00.0: Unable to change power state
> from D3cold to D0, device inaccessible
>
> After a short while the whole system froze.
>
> So the upstream port of GFX's PCIe switch cannot be powered on again.
>
> Kai-Heng
>
> >
> > >
> > > Fixes: 0064b0ce85bb ("drm/amd/pm: enable ASPM by default")
> > > Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1885
> > > Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2458
> > > Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> > > ---
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c   |  3 +++
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  7 ++++++-
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    | 12 +++++-------
> > >   3 files changed, 14 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> > > index 60b1857f469e..407456ac0e84 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> > > @@ -987,6 +987,9 @@ bool amdgpu_acpi_should_gpu_reset(struct amdgpu_device *adev)
> > >       if (amdgpu_sriov_vf(adev))
> > >               return false;
> > >
> > > +     if (amdgpu_device_supports_boco(adev_to_drm(adev)))
> > > +             return true;
> > > +
> > >   #if IS_ENABLED(CONFIG_SUSPEND)
> > >       return pm_suspend_target_state != PM_SUSPEND_TO_IDLE;
> > >   #else
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > index f5658359ff5c..d56b7a2bafa6 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > @@ -2181,7 +2181,12 @@ static int amdgpu_device_ip_early_init(struct amdgpu_device *adev)
> > >
> > >       if (!(adev->flags & AMD_IS_APU)) {
> > >               parent = pci_upstream_bridge(adev->pdev);
> > > -             adev->has_pr3 = parent ? pci_pr3_present(parent) : false;
> > > +             do {
> > > +                     if (pci_pr3_present(parent)) {
> > > +                             adev->has_pr3 = true;
> > > +                             break;
> > > +                     }
> > > +             } while ((parent = pci_upstream_bridge(parent)));
> > >       }
> > >
> > >       amdgpu_amdkfd_device_probe(adev);
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > index ba5def374368..5d81fcac4b0a 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > @@ -2415,10 +2415,11 @@ static int amdgpu_pmops_suspend(struct device *dev)
> > >       struct drm_device *drm_dev = dev_get_drvdata(dev);
> > >       struct amdgpu_device *adev = drm_to_adev(drm_dev);
> > >
> > > -     if (amdgpu_acpi_is_s0ix_active(adev))
> > > -             adev->in_s0ix = true;
> > > -     else if (amdgpu_acpi_is_s3_active(adev))
> > > +     if (amdgpu_acpi_is_s3_active(adev) ||
> > > +         amdgpu_device_supports_boco(drm_dev))
> > >               adev->in_s3 = true;
> > > +     else if (amdgpu_acpi_is_s0ix_active(adev))
> > > +             adev->in_s0ix = true;
> > >       if (!adev->in_s0ix && !adev->in_s3)
> > >               return 0;
> > >       return amdgpu_device_suspend(drm_dev, true);
> > > @@ -2449,10 +2450,7 @@ static int amdgpu_pmops_resume(struct device *dev)
> > >               adev->no_hw_access = true;
> > >
> > >       r = amdgpu_device_resume(drm_dev, true);
> > > -     if (amdgpu_acpi_is_s0ix_active(adev))
> > > -             adev->in_s0ix = false;
> > > -     else
> > > -             adev->in_s3 = false;
> > > +     adev->in_s0ix = adev->in_s3 = false;
> > >       return r;
> > >   }
> > >

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2023-03-30 13:22 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-29  9:59 [PATCH 1/2] drm/amdgpu: Reset GPU on S0ix when device supports BOCO Kai-Heng Feng
2023-03-29  9:59 ` [PATCH 2/2] drm/amdgpu: Remove ASPM workaround on VI and NV Kai-Heng Feng
2023-03-29 13:08   ` Gong, Richard
2023-03-29 13:18     ` Mario Limonciello
2023-03-29 13:21 ` [PATCH 1/2] drm/amdgpu: Reset GPU on S0ix when device supports BOCO Alex Deucher
2023-03-30  0:49   ` Kai-Heng Feng
2023-03-30  2:08     ` Alex Deucher
2023-03-29 13:23 ` Mario Limonciello
2023-03-30  3:36   ` Kai-Heng Feng
2023-03-30  4:20     ` Mario Limonciello
2023-03-30 13:22     ` Alex Deucher

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).