* [PATCH] drm/amdgpu: support new mode-1 reset interface
@ 2021-11-16  7:23 Tao Zhou
  2021-11-16  7:40 ` Zhang, Hawking
  2021-11-16  7:44 ` Lazar, Lijo
  0 siblings, 2 replies; 5+ messages in thread
From: Tao Zhou @ 2021-11-16  7:23 UTC
  To: amd-gfx, hawking.zhang, john.clements, stanley.yang, equan; +Cc: Tao Zhou

If the GPU reset is triggered by a RAS fatal error, report this to the
SMU in the mode-1 reset message.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
---
 .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c    | 21 ++++++++++++++++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
index 35145db6eedf..6f3d064a8232 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
@@ -1426,16 +1426,31 @@ int smu_v13_0_set_azalia_d3_pme(struct smu_context *smu)
 
 int smu_v13_0_mode1_reset(struct smu_context *smu)
 {
-	u32 smu_version;
+	u32 smu_version, fatal_err, param;
 	int ret = 0;
+	struct amdgpu_device *adev = smu->adev;
+	struct amdgpu_ras *ras = amdgpu_ras_get_context(adev);
+
+	fatal_err = 0;
+	param = SMU_RESET_MODE_1;
+
 	/*
 	* PM FW support SMU_MSG_GfxDeviceDriverReset from 68.07
 	*/
 	smu_cmn_get_smc_version(smu, NULL, &smu_version);
 	if (smu_version < 0x00440700)
 		ret = smu_cmn_send_smc_msg(smu, SMU_MSG_Mode1Reset, NULL);
-	else
-		ret = smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_GfxDeviceDriverReset, SMU_RESET_MODE_1, NULL);
+	else {
+		/* fatal error triggered by ras, PMFW supports the flag
+		   from 68.44.0 */
+		if ((smu_version >= 0x00442c00) && ras &&
+		    atomic_read(&ras->in_recovery))
+			fatal_err = 1;
+
+		param |= (fatal_err << 16);
+		ret = smu_cmn_send_smc_msg_with_param(smu,
+					SMU_MSG_GfxDeviceDriverReset, param, NULL);
+	}
 
 	if (!ret)
 		msleep(SMU13_MODE1_RESET_WAIT_TIME_IN_MS);
-- 
2.17.1


* RE: [PATCH] drm/amdgpu: support new mode-1 reset interface
  2021-11-16  7:23 [PATCH] drm/amdgpu: support new mode-1 reset interface Tao Zhou
@ 2021-11-16  7:40 ` Zhang, Hawking
  2021-11-16  7:44 ` Lazar, Lijo
  1 sibling, 0 replies; 5+ messages in thread
From: Zhang, Hawking @ 2021-11-16  7:40 UTC
  To: Zhou1, Tao, amd-gfx, Clements, John, Yang, Stanley, Quan, Evan

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>

Regards,
Hawking


* Re: [PATCH] drm/amdgpu: support new mode-1 reset interface
  2021-11-16  7:23 [PATCH] drm/amdgpu: support new mode-1 reset interface Tao Zhou
  2021-11-16  7:40 ` Zhang, Hawking
@ 2021-11-16  7:44 ` Lazar, Lijo
  2021-11-16  8:47   ` Zhou1, Tao
  1 sibling, 1 reply; 5+ messages in thread
From: Lazar, Lijo @ 2021-11-16  7:44 UTC
  To: Tao Zhou, amd-gfx, hawking.zhang, john.clements, stanley.yang, equan



On 11/16/2021 12:53 PM, Tao Zhou wrote:
> If gpu reset is triggered by ras fatal error, tell it to smu in mode-1
> reset message.
> 
> Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
> ---
>   .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c    | 21 ++++++++++++++++---
>   1 file changed, 18 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
> index 35145db6eedf..6f3d064a8232 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
> @@ -1426,16 +1426,31 @@ int smu_v13_0_set_azalia_d3_pme(struct smu_context *smu)
>   
>   int smu_v13_0_mode1_reset(struct smu_context *smu)
>   {
> -	u32 smu_version;
> +	u32 smu_version, fatal_err, param;
>   	int ret = 0;
> +	struct amdgpu_device *adev = smu->adev;
> +	struct amdgpu_ras *ras = amdgpu_ras_get_context(adev);
> +
> +	fatal_err = 0;
> +	param = SMU_RESET_MODE_1;
> +
>   	/*
>   	* PM FW support SMU_MSG_GfxDeviceDriverReset from 68.07
>   	*/
>   	smu_cmn_get_smc_version(smu, NULL, &smu_version);
>   	if (smu_version < 0x00440700)
>   		ret = smu_cmn_send_smc_msg(smu, SMU_MSG_Mode1Reset, NULL);
> -	else
> -		ret = smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_GfxDeviceDriverReset, SMU_RESET_MODE_1, NULL);
> +	else {
> +		/* fatal error triggered by ras, PMFW supports the flag
> +		   from 68.44.0 */
> +		if ((smu_version >= 0x00442c00) && ras &&
> +		    atomic_read(&ras->in_recovery))
> +			fatal_err = 1;
> +

Judging by the PMFW version, this looks specific to Aldebaran. Since
there is a version check as well, the implementation needs to be moved
to aldebaran_ppt.c.

Thanks,
Lijo

> +		param |= (fatal_err << 16);
> +		ret = smu_cmn_send_smc_msg_with_param(smu,
> +					SMU_MSG_GfxDeviceDriverReset, param, NULL);
> +	}
>   
>   	if (!ret)
>   		msleep(SMU13_MODE1_RESET_WAIT_TIME_IN_MS);
> 


* RE: [PATCH] drm/amdgpu: support new mode-1 reset interface
  2021-11-16  7:44 ` Lazar, Lijo
@ 2021-11-16  8:47   ` Zhou1, Tao
  2021-11-16  8:56     ` Lazar, Lijo
  0 siblings, 1 reply; 5+ messages in thread
From: Zhou1, Tao @ 2021-11-16  8:47 UTC
  To: Lazar, Lijo, amd-gfx, Zhang, Hawking, Clements, John, Yang,
	Stanley, Quan, Evan

Hi Lijo,

Your concern is reasonable, but smu_v13_0_mode1_reset is currently used only by ALDEBARAN. I assume the PMFW of future SMU v13 ASICs will follow this design; if it does not, we can move the implementation into xxx_ppt.c.

Regards,
Tao



* Re: [PATCH] drm/amdgpu: support new mode-1 reset interface
  2021-11-16  8:47   ` Zhou1, Tao
@ 2021-11-16  8:56     ` Lazar, Lijo
  0 siblings, 0 replies; 5+ messages in thread
From: Lazar, Lijo @ 2021-11-16  8:56 UTC
  To: Zhou1, Tao, amd-gfx, Zhang, Hawking, Clements, John, Yang,
	Stanley, Quan, Evan



On 11/16/2021 2:17 PM, Zhou1, Tao wrote:
> [AMD Official Use Only]
> 
> Hi Lijo,
> 
> Your concern is reasonable, but in fact smu_v13_0_mode1_reset is used only by ALDEBARAN currently. I assume the PMFW of new smu v13 ASIC in the future will follow this design, otherwise we could move the implementation into xxx_ppt.c.
> 

Actually, this is meant to be common logic for SMU13-based ASICs. A
version check in a common file is not maintainable. I see there was
already a version check here before; even that one is not proper :)

It is better to do it properly when support is added than to plan on
refactoring for future ASICs.

Thanks,
Lijo


