* [PATCH v2] drm/amdgpu: reset asic after system-wide suspend aborted (v2)
@ 2021-11-24 12:43 Prike Liang
  2021-11-24 13:30 ` Lazar, Lijo
  0 siblings, 1 reply; 3+ messages in thread
From: Prike Liang @ 2021-11-24 12:43 UTC
  To: amd-gfx; +Cc: Alexander.Deucher, Prike Liang, ray.huang

Reset the ASIC when an Sx suspend is aborted after amdgpu has already
suspended, so that AMDGPU comes back in a clean reset state and the device
is not re-initialized improperly. For now, always do the ASIC reset in
amdgpu resume until the PM abort case can be sorted out.

v2: Remove the incomplete PM abort flag and add a GPU hive check for the
GPU reset.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 7d4115d..3fcd90d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3983,6 +3983,14 @@ int amdgpu_device_resume(struct drm_device *dev, bool fbcon)
 	if (adev->in_s0ix)
 		amdgpu_gfx_state_change_set(adev, sGpuChangeState_D0Entry);
 
+	/* TODO: To keep an unconditional ASIC reset from adding resume latency,
+	 * sort out which cases really need an ASIC reset during resume.
+	 * For the known issue where system suspend is aborted after the AMDGPU
+	 * suspend, it may be possible to detect this case by checking struct
+	 * suspend_stats, which would first need to be exported.
+	 */
+	if (adev->gmc.xgmi.num_physical_nodes <= 1)
+		amdgpu_asic_reset(adev);
 	/* post card */
 	if (amdgpu_device_need_post(adev)) {
 		r = amdgpu_device_asic_init(adev);
-- 
2.7.4



* Re: [PATCH v2] drm/amdgpu: reset asic after system-wide suspend aborted (v2)
  2021-11-24 12:43 [PATCH v2] drm/amdgpu: reset asic after system-wide suspend aborted (v2) Prike Liang
@ 2021-11-24 13:30 ` Lazar, Lijo
  2021-11-25  4:58   ` Liang, Prike
  0 siblings, 1 reply; 3+ messages in thread
From: Lazar, Lijo @ 2021-11-24 13:30 UTC
  To: Prike Liang, amd-gfx; +Cc: Alexander.Deucher, ray.huang



On 11/24/2021 6:13 PM, Prike Liang wrote:
> Reset the ASIC when an Sx suspend is aborted after amdgpu has already
> suspended, so that AMDGPU comes back in a clean reset state and the device
> is not re-initialized improperly. For now, always do the ASIC reset in
> amdgpu resume until the PM abort case can be sorted out.
> 
> v2: Remove the incomplete PM abort flag and add a GPU hive check for the
> GPU reset.
> 
> Signed-off-by: Prike Liang <Prike.Liang@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 ++++++++
>   1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 7d4115d..3fcd90d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -3983,6 +3983,14 @@ int amdgpu_device_resume(struct drm_device *dev, bool fbcon)
>   	if (adev->in_s0ix)
>   		amdgpu_gfx_state_change_set(adev, sGpuChangeState_D0Entry);
>   
> +	/* TODO: To keep an unconditional ASIC reset from adding resume latency,
> +	 * sort out which cases really need an ASIC reset during resume.
> +	 * For the known issue where system suspend is aborted after the AMDGPU
> +	 * suspend, it may be possible to detect this case by checking struct
> +	 * suspend_stats, which would first need to be exported.
> +	 */
> +	if (adev->gmc.xgmi.num_physical_nodes <= 1)
> +		amdgpu_asic_reset(adev);

Newer dGPUs depend on PMFW to do the reset, and PMFW is not loaded at
this point. For some, a mini FW is available that could technically
handle a reset, and some of the older ones depend on PSP. I strongly
suggest checking all such cases before doing a reset here.

Or, the safest option at this point could be to do the reset only for APUs.

Thanks,
Lijo

>   	/* post card */
>   	if (amdgpu_device_need_post(adev)) {
>   		r = amdgpu_device_asic_init(adev);
> 


* RE: [PATCH v2] drm/amdgpu: reset asic after system-wide suspend aborted (v2)
  2021-11-24 13:30 ` Lazar, Lijo
@ 2021-11-25  4:58   ` Liang, Prike
  0 siblings, 0 replies; 3+ messages in thread
From: Liang, Prike @ 2021-11-25  4:58 UTC
  To: Lazar, Lijo, amd-gfx; +Cc: Deucher, Alexander, Huang, Ray

[Public]

> -----Original Message-----
> From: Lazar, Lijo <Lijo.Lazar@amd.com>
> Sent: Wednesday, November 24, 2021 9:30 PM
> To: Liang, Prike <Prike.Liang@amd.com>; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Huang, Ray
> <Ray.Huang@amd.com>
> Subject: Re: [PATCH v2] drm/amdgpu: reset asic after system-wide suspend
> aborted (v2)
>
>
>
> On 11/24/2021 6:13 PM, Prike Liang wrote:
> > Reset the ASIC when an Sx suspend is aborted after amdgpu has already
> > suspended, so that AMDGPU comes back in a clean reset state and the
> > device is not re-initialized improperly. For now, always do the ASIC
> > reset in amdgpu resume until the PM abort case can be sorted out.
> >
> > v2: Remove the incomplete PM abort flag and add a GPU hive check for
> > the GPU reset.
> >
> > Signed-off-by: Prike Liang <Prike.Liang@amd.com>
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 ++++++++
> >   1 file changed, 8 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > index 7d4115d..3fcd90d 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > @@ -3983,6 +3983,14 @@ int amdgpu_device_resume(struct drm_device *dev, bool fbcon)
> >     if (adev->in_s0ix)
> >             amdgpu_gfx_state_change_set(adev, sGpuChangeState_D0Entry);
> >
> > +   /* TODO: To keep an unconditional ASIC reset from adding resume latency,
> > +    * sort out which cases really need an ASIC reset during resume.
> > +    * For the known issue where system suspend is aborted after the AMDGPU
> > +    * suspend, it may be possible to detect this case by checking struct
> > +    * suspend_stats, which would first need to be exported.
> > +    */
> > +   if (adev->gmc.xgmi.num_physical_nodes <= 1)
> > +           amdgpu_asic_reset(adev);
>
> Newer dGPUs depend on PMFW to do the reset, and PMFW is not loaded at
> this point. For some, a mini FW is available that could technically
> handle a reset, and some of the older ones depend on PSP. I strongly
> suggest checking all such cases before doing a reset here.
>
> Or, the safest option at this point could be to do the reset only for APUs.
>
> Thanks,
> Lijo
>
Thanks for the input. Sorting out the reset method for the many dGPUs may
take a lot of effort, so for now let's handle only APUs first.

> >     /* post card */
> >     if (amdgpu_device_need_post(adev)) {
> >             r = amdgpu_device_asic_init(adev);
> >
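
As an illustration of the direction agreed on in this thread, here is a
minimal, self-contained sketch of an APU-only gate for the resume-time
reset. It is not the driver code and not necessarily what a later revision
of the patch does: `struct fake_adev`, `FAKE_IS_APU`, and
`should_reset_on_resume()` are hypothetical stand-ins (the real driver marks
APUs with the `AMD_IS_APU` bit in `adev->flags`), and the hive check only
mirrors the `num_physical_nodes <= 1` condition from the v2 patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the AMD_IS_APU chip flag; the value is arbitrary. */
#define FAKE_IS_APU (1u << 0)

/* Hypothetical, reduced model of the device state consulted on resume. */
struct fake_adev {
	uint32_t flags;              /* chip flags, e.g. FAKE_IS_APU */
	unsigned int xgmi_num_nodes; /* models gmc.xgmi.num_physical_nodes */
};

/*
 * Decide whether to reset the ASIC at the start of resume.
 *
 * Assumptions, taken from the discussion rather than from the driver:
 *  - devices in an XGMI hive (more than one physical node) are skipped,
 *    as in the v2 patch;
 *  - dGPUs are skipped because their reset may rely on firmware (PMFW,
 *    mini FW, or PSP) that is not available this early in resume;
 *  - APUs are the only case treated as safe to reset here.
 */
static bool should_reset_on_resume(const struct fake_adev *adev)
{
	if (adev->xgmi_num_nodes > 1)
		return false;

	return (adev->flags & FAKE_IS_APU) != 0;
}
```

With this gate, a call site shaped like the one in the patch would invoke
the reset only when `should_reset_on_resume()` returns true, which leaves
dGPU resume behavior unchanged.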
