All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] drm/amd/pm: Send message when resp status is 0xFC
@ 2022-02-25  4:21 Lijo Lazar
  2022-02-25  5:36 ` Quan, Evan
  0 siblings, 1 reply; 11+ messages in thread
From: Lijo Lazar @ 2022-02-25  4:21 UTC (permalink / raw)
  To: amd-gfx; +Cc: Alexander.Deucher, Evan.Quan, KevinYang.Wang, Hawking.Zhang

When PMFW is really busy, it will respond with 0xFC. However, it doesn't
change the response register state when it becomes free. Driver should
retry and proceed to send message if the response status is 0xFC.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
---
 drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
index 590a6ed12d54..92161b9d8c1a 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
@@ -297,7 +297,6 @@ int smu_cmn_send_msg_without_waiting(struct smu_context *smu,
 	reg = __smu_cmn_poll_stat(smu);
 	res = __smu_cmn_reg2errno(smu, reg);
 	if (reg == SMU_RESP_NONE ||
-	    reg == SMU_RESP_BUSY_OTHER ||
 	    res == -EREMOTEIO)
 		goto Out;
 	__smu_cmn_send_msg(smu, msg_index, param);
@@ -391,7 +390,6 @@ int smu_cmn_send_smc_msg_with_param(struct smu_context *smu,
 	reg = __smu_cmn_poll_stat(smu);
 	res = __smu_cmn_reg2errno(smu, reg);
 	if (reg == SMU_RESP_NONE ||
-	    reg == SMU_RESP_BUSY_OTHER ||
 	    res == -EREMOTEIO) {
 		__smu_cmn_reg_print_error(smu, reg, index, param, msg);
 		goto Out;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* RE: [PATCH] drm/amd/pm: Send message when resp status is 0xFC
  2022-02-25  4:21 [PATCH] drm/amd/pm: Send message when resp status is 0xFC Lijo Lazar
@ 2022-02-25  5:36 ` Quan, Evan
  2022-02-25  5:46   ` Lazar, Lijo
  0 siblings, 1 reply; 11+ messages in thread
From: Quan, Evan @ 2022-02-25  5:36 UTC (permalink / raw)
  To: Lazar, Lijo, amd-gfx
  Cc: Deucher, Alexander, Wang, Yang(Kevin), Zhang, Hawking

[AMD Official Use Only]

This may introduce some problems for two callers scenarios. That is the 2nd one will still proceed even if the 1st one was already blocked.
Maybe the logics here should be performed by the caller. That is the caller can perform something like issuing the same message again without prerequisites check on PMFW busy.
Or we can just update the smu_cmn_send_smc_msg APIs to give it another try on PMFW busy.

BR
Evan
> -----Original Message-----
> From: Lazar, Lijo <Lijo.Lazar@amd.com>
> Sent: Friday, February 25, 2022 12:22 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Zhang, Hawking <Hawking.Zhang@amd.com>; Deucher, Alexander
> <Alexander.Deucher@amd.com>; Wang, Yang(Kevin)
> <KevinYang.Wang@amd.com>; Quan, Evan <Evan.Quan@amd.com>
> Subject: [PATCH] drm/amd/pm: Send message when resp status is 0xFC
> 
> When PMFW is really busy, it will respond with 0xFC. However, it doesn't
> change the response register state when it becomes free. Driver should
> retry and proceed to send message if the response status is 0xFC.
> 
> Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
> ---
>  drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> index 590a6ed12d54..92161b9d8c1a 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> @@ -297,7 +297,6 @@ int smu_cmn_send_msg_without_waiting(struct
> smu_context *smu,
>  	reg = __smu_cmn_poll_stat(smu);
>  	res = __smu_cmn_reg2errno(smu, reg);
>  	if (reg == SMU_RESP_NONE ||
> -	    reg == SMU_RESP_BUSY_OTHER ||
>  	    res == -EREMOTEIO)
>  		goto Out;
>  	__smu_cmn_send_msg(smu, msg_index, param);
> @@ -391,7 +390,6 @@ int smu_cmn_send_smc_msg_with_param(struct
> smu_context *smu,
>  	reg = __smu_cmn_poll_stat(smu);
>  	res = __smu_cmn_reg2errno(smu, reg);
>  	if (reg == SMU_RESP_NONE ||
> -	    reg == SMU_RESP_BUSY_OTHER ||
>  	    res == -EREMOTEIO) {
>  		__smu_cmn_reg_print_error(smu, reg, index, param, msg);
>  		goto Out;
> --
> 2.25.1

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [PATCH] drm/amd/pm: Send message when resp status is 0xFC
  2022-02-25  5:36 ` Quan, Evan
@ 2022-02-25  5:46   ` Lazar, Lijo
  2022-02-25  5:55     ` Quan, Evan
  0 siblings, 1 reply; 11+ messages in thread
From: Lazar, Lijo @ 2022-02-25  5:46 UTC (permalink / raw)
  To: Quan, Evan, amd-gfx; +Cc: Deucher, Alexander, Wang, Yang(Kevin), Zhang, Hawking

[AMD Official Use Only]

> That is the caller can perform something like issuing the same message again without prerequisites check on PMFW busy 

This patch expects this method. Caller may try to resend message again. As part of message sending, driver first checks response register. Current logic blocks sending any message if it sees 0xFC in response register, this patch is to address that. 

Thanks,
Lijo

-----Original Message-----
From: Quan, Evan <Evan.Quan@amd.com> 
Sent: Friday, February 25, 2022 11:07 AM
To: Lazar, Lijo <Lijo.Lazar@amd.com>; amd-gfx@lists.freedesktop.org
Cc: Zhang, Hawking <Hawking.Zhang@amd.com>; Deucher, Alexander <Alexander.Deucher@amd.com>; Wang, Yang(Kevin) <KevinYang.Wang@amd.com>
Subject: RE: [PATCH] drm/amd/pm: Send message when resp status is 0xFC

[AMD Official Use Only]

This may introduce some problems for two callers scenarios. That is the 2nd one will still proceed even if the 1st one was already blocked.
Maybe the logics here should be performed by the caller. That is the caller can perform something like issuing the same message again without prerequisites check on PMFW busy.
Or we can just update the smu_cmn_send_smc_msg APIs to give it another try on PMFW busy.

BR
Evan
> -----Original Message-----
> From: Lazar, Lijo <Lijo.Lazar@amd.com>
> Sent: Friday, February 25, 2022 12:22 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Zhang, Hawking <Hawking.Zhang@amd.com>; Deucher, Alexander 
> <Alexander.Deucher@amd.com>; Wang, Yang(Kevin) 
> <KevinYang.Wang@amd.com>; Quan, Evan <Evan.Quan@amd.com>
> Subject: [PATCH] drm/amd/pm: Send message when resp status is 0xFC
> 
> When PMFW is really busy, it will respond with 0xFC. However, it 
> doesn't change the response register state when it becomes free. 
> Driver should retry and proceed to send message if the response status is 0xFC.
> 
> Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
> ---
>  drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> index 590a6ed12d54..92161b9d8c1a 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> @@ -297,7 +297,6 @@ int smu_cmn_send_msg_without_waiting(struct
> smu_context *smu,
>  	reg = __smu_cmn_poll_stat(smu);
>  	res = __smu_cmn_reg2errno(smu, reg);
>  	if (reg == SMU_RESP_NONE ||
> -	    reg == SMU_RESP_BUSY_OTHER ||
>  	    res == -EREMOTEIO)
>  		goto Out;
>  	__smu_cmn_send_msg(smu, msg_index, param); @@ -391,7 +390,6 @@ int 
> smu_cmn_send_smc_msg_with_param(struct
> smu_context *smu,
>  	reg = __smu_cmn_poll_stat(smu);
>  	res = __smu_cmn_reg2errno(smu, reg);
>  	if (reg == SMU_RESP_NONE ||
> -	    reg == SMU_RESP_BUSY_OTHER ||
>  	    res == -EREMOTEIO) {
>  		__smu_cmn_reg_print_error(smu, reg, index, param, msg);
>  		goto Out;
> --
> 2.25.1

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [PATCH] drm/amd/pm: Send message when resp status is 0xFC
  2022-02-25  5:46   ` Lazar, Lijo
@ 2022-02-25  5:55     ` Quan, Evan
  2022-02-25  6:02       ` Lazar, Lijo
  0 siblings, 1 reply; 11+ messages in thread
From: Quan, Evan @ 2022-02-25  5:55 UTC (permalink / raw)
  To: Lazar, Lijo, amd-gfx
  Cc: Deucher, Alexander, Wang, Yang(Kevin), Zhang, Hawking

[AMD Official Use Only]



> -----Original Message-----
> From: Lazar, Lijo <Lijo.Lazar@amd.com>
> Sent: Friday, February 25, 2022 1:47 PM
> To: Quan, Evan <Evan.Quan@amd.com>; amd-gfx@lists.freedesktop.org
> Cc: Zhang, Hawking <Hawking.Zhang@amd.com>; Deucher, Alexander
> <Alexander.Deucher@amd.com>; Wang, Yang(Kevin)
> <KevinYang.Wang@amd.com>
> Subject: RE: [PATCH] drm/amd/pm: Send message when resp status is 0xFC
> 
> [AMD Official Use Only]
> 
> > That is the caller can perform something like issuing the same message
> > again without prerequisites check on PMFW busy
> 
> This patch expects this method. Caller may try to resend message again. As
> part of message sending, driver first checks response register. Current logic
> blocks sending any message if it sees 0xFC in response register, this patch is
> to address that.
[Quan, Evan] Yes, I know. But the caller here could be another one. I mean there may be another caller stepped in.

BR
Evan
> 
> Thanks,
> Lijo
> 
> -----Original Message-----
> From: Quan, Evan <Evan.Quan@amd.com>
> Sent: Friday, February 25, 2022 11:07 AM
> To: Lazar, Lijo <Lijo.Lazar@amd.com>; amd-gfx@lists.freedesktop.org
> Cc: Zhang, Hawking <Hawking.Zhang@amd.com>; Deucher, Alexander
> <Alexander.Deucher@amd.com>; Wang, Yang(Kevin)
> <KevinYang.Wang@amd.com>
> Subject: RE: [PATCH] drm/amd/pm: Send message when resp status is 0xFC
> 
> [AMD Official Use Only]
> 
> This may introduce some problems for two callers scenarios. That is the 2nd
> one will still proceed even if the 1st one was already blocked.
> Maybe the logics here should be performed by the caller. That is the caller
> can perform something like issuing the same message again without
> prerequisites check on PMFW busy.
> Or we can just update the smu_cmn_send_smc_msg APIs to give it another
> try on PMFW busy.
> 
> BR
> Evan
> > -----Original Message-----
> > From: Lazar, Lijo <Lijo.Lazar@amd.com>
> > Sent: Friday, February 25, 2022 12:22 PM
> > To: amd-gfx@lists.freedesktop.org
> > Cc: Zhang, Hawking <Hawking.Zhang@amd.com>; Deucher, Alexander
> > <Alexander.Deucher@amd.com>; Wang, Yang(Kevin)
> > <KevinYang.Wang@amd.com>; Quan, Evan <Evan.Quan@amd.com>
> > Subject: [PATCH] drm/amd/pm: Send message when resp status is 0xFC
> >
> > When PMFW is really busy, it will respond with 0xFC. However, it
> > doesn't change the response register state when it becomes free.
> > Driver should retry and proceed to send message if the response status is
> 0xFC.
> >
> > Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
> > ---
> >  drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 2 --
> >  1 file changed, 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> > b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> > index 590a6ed12d54..92161b9d8c1a 100644
> > --- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> > +++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> > @@ -297,7 +297,6 @@ int smu_cmn_send_msg_without_waiting(struct
> > smu_context *smu,
> >  	reg = __smu_cmn_poll_stat(smu);
> >  	res = __smu_cmn_reg2errno(smu, reg);
> >  	if (reg == SMU_RESP_NONE ||
> > -	    reg == SMU_RESP_BUSY_OTHER ||
> >  	    res == -EREMOTEIO)
> >  		goto Out;
> >  	__smu_cmn_send_msg(smu, msg_index, param); @@ -391,7 +390,6
> @@ int
> > smu_cmn_send_smc_msg_with_param(struct
> > smu_context *smu,
> >  	reg = __smu_cmn_poll_stat(smu);
> >  	res = __smu_cmn_reg2errno(smu, reg);
> >  	if (reg == SMU_RESP_NONE ||
> > -	    reg == SMU_RESP_BUSY_OTHER ||
> >  	    res == -EREMOTEIO) {
> >  		__smu_cmn_reg_print_error(smu, reg, index, param, msg);
> >  		goto Out;
> > --
> > 2.25.1

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] drm/amd/pm: Send message when resp status is 0xFC
  2022-02-25  5:55     ` Quan, Evan
@ 2022-02-25  6:02       ` Lazar, Lijo
  2022-02-25  7:32         ` Quan, Evan
  0 siblings, 1 reply; 11+ messages in thread
From: Lazar, Lijo @ 2022-02-25  6:02 UTC (permalink / raw)
  To: Quan, Evan, amd-gfx; +Cc: Deucher, Alexander, Wang, Yang(Kevin), Zhang, Hawking



On 2/25/2022 11:25 AM, Quan, Evan wrote:
> [AMD Official Use Only]
> 
> 
> 
>> -----Original Message-----
>> From: Lazar, Lijo <Lijo.Lazar@amd.com>
>> Sent: Friday, February 25, 2022 1:47 PM
>> To: Quan, Evan <Evan.Quan@amd.com>; amd-gfx@lists.freedesktop.org
>> Cc: Zhang, Hawking <Hawking.Zhang@amd.com>; Deucher, Alexander
>> <Alexander.Deucher@amd.com>; Wang, Yang(Kevin)
>> <KevinYang.Wang@amd.com>
>> Subject: RE: [PATCH] drm/amd/pm: Send message when resp status is 0xFC
>>
>> [AMD Official Use Only]
>>
>>> That is the caller can perform something like issuing the same message
>>> again without prerequisites check on PMFW busy
>>
>> This patch expects this method. Caller may try to resend message again. As
>> part of message sending, driver first checks response register. Current logic
>> blocks sending any message if it sees 0xFC in response register, this patch is
>> to address that.
> [Quan, Evan] Yes, I know. But the caller here could be another one. I mean there may be another caller stepped in.
> 

That shouldn't cause an issue to the second caller if it got message 
mutex. The second caller also should be able to send message if PMFW got 
free by that time. The first caller can retry when it gets back the 
message mutex. FW doesn't maintain any state for 0xFC response. Any 
other message may be sent after that. If driver keeps the state based on 
two callers, that is a logic problem in driver. I don't think we have 
any flow like that.

Basically, 0xFC is not valid pre-condition check for sending any 
message. As per PMFW team - it only means that PMFW was busy when a 
previous message was sent and PMFW won't change the response status when 
it becomes free.

Thanks,
Lijo

> BR
> Evan
>>
>> Thanks,
>> Lijo
>>
>> -----Original Message-----
>> From: Quan, Evan <Evan.Quan@amd.com>
>> Sent: Friday, February 25, 2022 11:07 AM
>> To: Lazar, Lijo <Lijo.Lazar@amd.com>; amd-gfx@lists.freedesktop.org
>> Cc: Zhang, Hawking <Hawking.Zhang@amd.com>; Deucher, Alexander
>> <Alexander.Deucher@amd.com>; Wang, Yang(Kevin)
>> <KevinYang.Wang@amd.com>
>> Subject: RE: [PATCH] drm/amd/pm: Send message when resp status is 0xFC
>>
>> [AMD Official Use Only]
>>
>> This may introduce some problems for two callers scenarios. That is the 2nd
>> one will still proceed even if the 1st one was already blocked.
>> Maybe the logics here should be performed by the caller. That is the caller
>> can perform something like issuing the same message again without
>> prerequisites check on PMFW busy.
>> Or we can just update the smu_cmn_send_smc_msg APIs to give it another
>> try on PMFW busy.
>>
>> BR
>> Evan
>>> -----Original Message-----
>>> From: Lazar, Lijo <Lijo.Lazar@amd.com>
>>> Sent: Friday, February 25, 2022 12:22 PM
>>> To: amd-gfx@lists.freedesktop.org
>>> Cc: Zhang, Hawking <Hawking.Zhang@amd.com>; Deucher, Alexander
>>> <Alexander.Deucher@amd.com>; Wang, Yang(Kevin)
>>> <KevinYang.Wang@amd.com>; Quan, Evan <Evan.Quan@amd.com>
>>> Subject: [PATCH] drm/amd/pm: Send message when resp status is 0xFC
>>>
>>> When PMFW is really busy, it will respond with 0xFC. However, it
>>> doesn't change the response register state when it becomes free.
>>> Driver should retry and proceed to send message if the response status is
>> 0xFC.
>>>
>>> Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
>>> ---
>>>   drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 2 --
>>>   1 file changed, 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
>>> b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
>>> index 590a6ed12d54..92161b9d8c1a 100644
>>> --- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
>>> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
>>> @@ -297,7 +297,6 @@ int smu_cmn_send_msg_without_waiting(struct
>>> smu_context *smu,
>>>   	reg = __smu_cmn_poll_stat(smu);
>>>   	res = __smu_cmn_reg2errno(smu, reg);
>>>   	if (reg == SMU_RESP_NONE ||
>>> -	    reg == SMU_RESP_BUSY_OTHER ||
>>>   	    res == -EREMOTEIO)
>>>   		goto Out;
>>>   	__smu_cmn_send_msg(smu, msg_index, param); @@ -391,7 +390,6
>> @@ int
>>> smu_cmn_send_smc_msg_with_param(struct
>>> smu_context *smu,
>>>   	reg = __smu_cmn_poll_stat(smu);
>>>   	res = __smu_cmn_reg2errno(smu, reg);
>>>   	if (reg == SMU_RESP_NONE ||
>>> -	    reg == SMU_RESP_BUSY_OTHER ||
>>>   	    res == -EREMOTEIO) {
>>>   		__smu_cmn_reg_print_error(smu, reg, index, param, msg);
>>>   		goto Out;
>>> --
>>> 2.25.1

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [PATCH] drm/amd/pm: Send message when resp status is 0xFC
  2022-02-25  6:02       ` Lazar, Lijo
@ 2022-02-25  7:32         ` Quan, Evan
  2022-02-25  7:43           ` Lazar, Lijo
  0 siblings, 1 reply; 11+ messages in thread
From: Quan, Evan @ 2022-02-25  7:32 UTC (permalink / raw)
  To: Lazar, Lijo, amd-gfx
  Cc: Deucher, Alexander, Wang, Yang(Kevin), Zhang, Hawking

[AMD Official Use Only]



> -----Original Message-----
> From: Lazar, Lijo <Lijo.Lazar@amd.com>
> Sent: Friday, February 25, 2022 2:03 PM
> To: Quan, Evan <Evan.Quan@amd.com>; amd-gfx@lists.freedesktop.org
> Cc: Zhang, Hawking <Hawking.Zhang@amd.com>; Deucher, Alexander
> <Alexander.Deucher@amd.com>; Wang, Yang(Kevin)
> <KevinYang.Wang@amd.com>
> Subject: Re: [PATCH] drm/amd/pm: Send message when resp status is 0xFC
> 
> 
> 
> On 2/25/2022 11:25 AM, Quan, Evan wrote:
> > [AMD Official Use Only]
> >
> >
> >
> >> -----Original Message-----
> >> From: Lazar, Lijo <Lijo.Lazar@amd.com>
> >> Sent: Friday, February 25, 2022 1:47 PM
> >> To: Quan, Evan <Evan.Quan@amd.com>; amd-gfx@lists.freedesktop.org
> >> Cc: Zhang, Hawking <Hawking.Zhang@amd.com>; Deucher, Alexander
> >> <Alexander.Deucher@amd.com>; Wang, Yang(Kevin)
> >> <KevinYang.Wang@amd.com>
> >> Subject: RE: [PATCH] drm/amd/pm: Send message when resp status is
> >> 0xFC
> >>
> >> [AMD Official Use Only]
> >>
> >>> That is the caller can perform something like issuing the same
> >>> message again without prerequisites check on PMFW busy
> >>
> >> This patch expects this method. Caller may try to resend message
> >> again. As part of message sending, driver first checks response
> >> register. Current logic blocks sending any message if it sees 0xFC in
> >> response register, this patch is to address that.
> > [Quan, Evan] Yes, I know. But the caller here could be another one. I mean
> there may be another caller stepped in.
> >
> 
> That shouldn't cause an issue to the second caller if it got message mutex.
> The second caller also should be able to send message if PMFW got free by
> that time. The first caller can retry when it gets back the message mutex. FW
> doesn't maintain any state for 0xFC response. Any other message may be
> sent after that. If driver keeps the state based on two callers, that is a logic
> problem in driver. I don't think we have any flow like that.
[Quan, Evan] Yeah, but there may be some case that messages issued by the two callers have dependence.
That means the message issued by the 2nd caller should be only after the 1st one.
The one i can think of is "EnableAllSmuFeatures" message should be after "SetAllowedFeatures" message.
Although that should not cause any problem, I'm not sure whether there is other similar case.

What I suggest is something like below. We just do it again in smu_cmn_send_smc_msg_with_param() on PMFW busy.

int smu_cmn_send_smc_msg_with_param(struct smu_context *smu,
				    enum smu_message_type msg,
				    uint32_t param,
				    uint32_t *read_arg)
{
...
...
	mutex_lock(&smu->message_lock);
	reg = __smu_cmn_poll_stat(smu);
	res = __smu_cmn_reg2errno(smu, reg);
	if (reg == SMU_RESP_NONE ||
	    reg == SMU_RESP_BUSY_OTHER ||
	    res == -EREMOTEIO) {
		__smu_cmn_reg_print_error(smu, reg, index, param, msg);
		goto Out;
	}
+retry:
	__smu_cmn_send_msg(smu, (uint16_t) index, param);
	reg = __smu_cmn_poll_stat(smu);
	res = __smu_cmn_reg2errno(smu, reg);
+	if (reg == SMU_RESP_BUSY_OTHER) {
+		mdelay(1);
+		goto retry;
+	}
...
...
}

BR
Evan
> 
> Basically, 0xFC is not valid pre-condition check for sending any message. As
> per PMFW team - it only means that PMFW was busy when a previous
> message was sent and PMFW won't change the response status when it
> becomes free.
> 
> Thanks,
> Lijo
> 
> > BR
> > Evan
> >>
> >> Thanks,
> >> Lijo
> >>
> >> -----Original Message-----
> >> From: Quan, Evan <Evan.Quan@amd.com>
> >> Sent: Friday, February 25, 2022 11:07 AM
> >> To: Lazar, Lijo <Lijo.Lazar@amd.com>; amd-gfx@lists.freedesktop.org
> >> Cc: Zhang, Hawking <Hawking.Zhang@amd.com>; Deucher, Alexander
> >> <Alexander.Deucher@amd.com>; Wang, Yang(Kevin)
> >> <KevinYang.Wang@amd.com>
> >> Subject: RE: [PATCH] drm/amd/pm: Send message when resp status is
> >> 0xFC
> >>
> >> [AMD Official Use Only]
> >>
> >> This may introduce some problems for two callers scenarios. That is
> >> the 2nd one will still proceed even if the 1st one was already blocked.
> >> Maybe the logics here should be performed by the caller. That is the
> >> caller can perform something like issuing the same message again
> >> without prerequisites check on PMFW busy.
> >> Or we can just update the smu_cmn_send_smc_msg APIs to give it
> >> another try on PMFW busy.
> >>
> >> BR
> >> Evan
> >>> -----Original Message-----
> >>> From: Lazar, Lijo <Lijo.Lazar@amd.com>
> >>> Sent: Friday, February 25, 2022 12:22 PM
> >>> To: amd-gfx@lists.freedesktop.org
> >>> Cc: Zhang, Hawking <Hawking.Zhang@amd.com>; Deucher, Alexander
> >>> <Alexander.Deucher@amd.com>; Wang, Yang(Kevin)
> >>> <KevinYang.Wang@amd.com>; Quan, Evan <Evan.Quan@amd.com>
> >>> Subject: [PATCH] drm/amd/pm: Send message when resp status is 0xFC
> >>>
> >>> When PMFW is really busy, it will respond with 0xFC. However, it
> >>> doesn't change the response register state when it becomes free.
> >>> Driver should retry and proceed to send message if the response
> >>> status is
> >> 0xFC.
> >>>
> >>> Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
> >>> ---
> >>>   drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 2 --
> >>>   1 file changed, 2 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> >>> b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> >>> index 590a6ed12d54..92161b9d8c1a 100644
> >>> --- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> >>> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> >>> @@ -297,7 +297,6 @@ int smu_cmn_send_msg_without_waiting(struct
> >>> smu_context *smu,
> >>>   	reg = __smu_cmn_poll_stat(smu);
> >>>   	res = __smu_cmn_reg2errno(smu, reg);
> >>>   	if (reg == SMU_RESP_NONE ||
> >>> -	    reg == SMU_RESP_BUSY_OTHER ||
> >>>   	    res == -EREMOTEIO)
> >>>   		goto Out;
> >>>   	__smu_cmn_send_msg(smu, msg_index, param); @@ -391,7 +390,6
> >> @@ int
> >>> smu_cmn_send_smc_msg_with_param(struct
> >>> smu_context *smu,
> >>>   	reg = __smu_cmn_poll_stat(smu);
> >>>   	res = __smu_cmn_reg2errno(smu, reg);
> >>>   	if (reg == SMU_RESP_NONE ||
> >>> -	    reg == SMU_RESP_BUSY_OTHER ||
> >>>   	    res == -EREMOTEIO) {
> >>>   		__smu_cmn_reg_print_error(smu, reg, index, param, msg);
> >>>   		goto Out;
> >>> --
> >>> 2.25.1

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] drm/amd/pm: Send message when resp status is 0xFC
  2022-02-25  7:32         ` Quan, Evan
@ 2022-02-25  7:43           ` Lazar, Lijo
  2022-02-25 13:03             ` Quan, Evan
  0 siblings, 1 reply; 11+ messages in thread
From: Lazar, Lijo @ 2022-02-25  7:43 UTC (permalink / raw)
  To: Quan, Evan, amd-gfx; +Cc: Deucher, Alexander, Wang, Yang(Kevin), Zhang, Hawking



On 2/25/2022 1:02 PM, Quan, Evan wrote:
> [AMD Official Use Only]
> 
> 
> 
>> -----Original Message-----
>> From: Lazar, Lijo <Lijo.Lazar@amd.com>
>> Sent: Friday, February 25, 2022 2:03 PM
>> To: Quan, Evan <Evan.Quan@amd.com>; amd-gfx@lists.freedesktop.org
>> Cc: Zhang, Hawking <Hawking.Zhang@amd.com>; Deucher, Alexander
>> <Alexander.Deucher@amd.com>; Wang, Yang(Kevin)
>> <KevinYang.Wang@amd.com>
>> Subject: Re: [PATCH] drm/amd/pm: Send message when resp status is 0xFC
>>
>>
>>
>> On 2/25/2022 11:25 AM, Quan, Evan wrote:
>>> [AMD Official Use Only]
>>>
>>>
>>>
>>>> -----Original Message-----
>>>> From: Lazar, Lijo <Lijo.Lazar@amd.com>
>>>> Sent: Friday, February 25, 2022 1:47 PM
>>>> To: Quan, Evan <Evan.Quan@amd.com>; amd-gfx@lists.freedesktop.org
>>>> Cc: Zhang, Hawking <Hawking.Zhang@amd.com>; Deucher, Alexander
>>>> <Alexander.Deucher@amd.com>; Wang, Yang(Kevin)
>>>> <KevinYang.Wang@amd.com>
>>>> Subject: RE: [PATCH] drm/amd/pm: Send message when resp status is
>>>> 0xFC
>>>>
>>>> [AMD Official Use Only]
>>>>
>>>>> That is the caller can perform something like issuing the same
>>>>> message again without prerequisites check on PMFW busy
>>>>
>>>> This patch expects this method. Caller may try to resend message
>>>> again. As part of message sending, driver first checks response
>>>> register. Current logic blocks sending any message if it sees 0xFC in
>>>> response register, this patch is to address that.
>>> [Quan, Evan] Yes, I know. But the caller here could be another one. I mean
>> there may be another caller stepped in.
>>>
>>
>> That shouldn't cause an issue to the second caller if it got message mutex.
>> The second caller also should be able to send message if PMFW got free by
>> that time. The first caller can retry when it gets back the message mutex. FW
>> doesn't maintain any state for 0xFC response. Any other message may be
>> sent after that. If driver keeps the state based on two callers, that is a logic
>> problem in driver. I don't think we have any flow like that.
> [Quan, Evan] Yeah, but there may be some case that messages issued by the two callers have dependence.
> That means the message issued by the 2nd caller should be only after the 1st one.
> The one i can think of is "EnableAllSmuFeatures" message should be after "SetAllowedFeatures" message.
> Although that should not cause any problem, I'm not sure whether there is other similar case.
> 
> What I suggest is something like below. We just do it again in smu_cmn_send_smc_msg_with_param() on PMFW busy.
> 
> int smu_cmn_send_smc_msg_with_param(struct smu_context *smu,
> 				    enum smu_message_type msg,
> 				    uint32_t param,
> 				    uint32_t *read_arg)
> {
> ...
> ...
> 	mutex_lock(&smu->message_lock);
> 	reg = __smu_cmn_poll_stat(smu);
> 	res = __smu_cmn_reg2errno(smu, reg);
> 	if (reg == SMU_RESP_NONE ||
> 	    reg == SMU_RESP_BUSY_OTHER ||
> 	    res == -EREMOTEIO) {
> 		__smu_cmn_reg_print_error(smu, reg, index, param, msg);
> 		goto Out;
> 	}
> +retry:
> 	__smu_cmn_send_msg(smu, (uint16_t) index, param);
> 	reg = __smu_cmn_poll_stat(smu);
> 	res = __smu_cmn_reg2errno(smu, reg);
> +	if (reg == SMU_RESP_BUSY_OTHER) {
> +		mdelay(1);
> +		goto retry;
> +	}

I suppose the retry option should be left to caller. Regardless of retry 
or not, the patch is still valid.

Example situation -

rocm-smi is trying to get metrics and another app is trying set 
performance profile. If metrics fetch fail and even retry of metrics 
fetch fail after some loops, rocm-smi is free to fetch the metrics again 
after say 5s. That also shouldn't prevent the second app to send 
performance profile message and that app also may retry the same later.

Thanks,
Lijo

> ...
> ...
> }
> 
> BR
> Evan
>>
>> Basically, 0xFC is not valid pre-condition check for sending any message. As
>> per PMFW team - it only means that PMFW was busy when a previous
>> message was sent and PMFW won't change the response status when it
>> becomes free.
>>
>> Thanks,
>> Lijo
>>
>>> BR
>>> Evan
>>>>
>>>> Thanks,
>>>> Lijo
>>>>
>>>> -----Original Message-----
>>>> From: Quan, Evan <Evan.Quan@amd.com>
>>>> Sent: Friday, February 25, 2022 11:07 AM
>>>> To: Lazar, Lijo <Lijo.Lazar@amd.com>; amd-gfx@lists.freedesktop.org
>>>> Cc: Zhang, Hawking <Hawking.Zhang@amd.com>; Deucher, Alexander
>>>> <Alexander.Deucher@amd.com>; Wang, Yang(Kevin)
>>>> <KevinYang.Wang@amd.com>
>>>> Subject: RE: [PATCH] drm/amd/pm: Send message when resp status is
>>>> 0xFC
>>>>
>>>> [AMD Official Use Only]
>>>>
>>>> This may introduce some problems for two callers scenarios. That is
>>>> the 2nd one will still proceed even if the 1st one was already blocked.
>>>> Maybe the logics here should be performed by the caller. That is the
>>>> caller can perform something like issuing the same message again
>>>> without prerequisites check on PMFW busy.
>>>> Or we can just update the smu_cmn_send_smc_msg APIs to give it
>>>> another try on PMFW busy.
>>>>
>>>> BR
>>>> Evan
>>>>> -----Original Message-----
>>>>> From: Lazar, Lijo <Lijo.Lazar@amd.com>
>>>>> Sent: Friday, February 25, 2022 12:22 PM
>>>>> To: amd-gfx@lists.freedesktop.org
>>>>> Cc: Zhang, Hawking <Hawking.Zhang@amd.com>; Deucher, Alexander
>>>>> <Alexander.Deucher@amd.com>; Wang, Yang(Kevin)
>>>>> <KevinYang.Wang@amd.com>; Quan, Evan <Evan.Quan@amd.com>
>>>>> Subject: [PATCH] drm/amd/pm: Send message when resp status is 0xFC
>>>>>
>>>>> When PMFW is really busy, it will respond with 0xFC. However, it
>>>>> doesn't change the response register state when it becomes free.
>>>>> Driver should retry and proceed to send message if the response
>>>>> status is
>>>> 0xFC.
>>>>>
>>>>> Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
>>>>> ---
>>>>>    drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 2 --
>>>>>    1 file changed, 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
>>>>> b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
>>>>> index 590a6ed12d54..92161b9d8c1a 100644
>>>>> --- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
>>>>> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
>>>>> @@ -297,7 +297,6 @@ int smu_cmn_send_msg_without_waiting(struct
>>>>> smu_context *smu,
>>>>>    	reg = __smu_cmn_poll_stat(smu);
>>>>>    	res = __smu_cmn_reg2errno(smu, reg);
>>>>>    	if (reg == SMU_RESP_NONE ||
>>>>> -	    reg == SMU_RESP_BUSY_OTHER ||
>>>>>    	    res == -EREMOTEIO)
>>>>>    		goto Out;
>>>>>    	__smu_cmn_send_msg(smu, msg_index, param); @@ -391,7 +390,6
>>>> @@ int
>>>>> smu_cmn_send_smc_msg_with_param(struct
>>>>> smu_context *smu,
>>>>>    	reg = __smu_cmn_poll_stat(smu);
>>>>>    	res = __smu_cmn_reg2errno(smu, reg);
>>>>>    	if (reg == SMU_RESP_NONE ||
>>>>> -	    reg == SMU_RESP_BUSY_OTHER ||
>>>>>    	    res == -EREMOTEIO) {
>>>>>    		__smu_cmn_reg_print_error(smu, reg, index, param, msg);
>>>>>    		goto Out;
>>>>> --
>>>>> 2.25.1

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [PATCH] drm/amd/pm: Send message when resp status is 0xFC
  2022-02-25  7:43           ` Lazar, Lijo
@ 2022-02-25 13:03             ` Quan, Evan
  2022-02-25 13:59               ` Lazar, Lijo
  0 siblings, 1 reply; 11+ messages in thread
From: Quan, Evan @ 2022-02-25 13:03 UTC (permalink / raw)
  To: Lazar, Lijo, amd-gfx
  Cc: Deucher, Alexander, Wang, Yang(Kevin), Zhang, Hawking

[AMD Official Use Only]



> -----Original Message-----
> From: Lazar, Lijo <Lijo.Lazar@amd.com>
> Sent: Friday, February 25, 2022 3:43 PM
> To: Quan, Evan <Evan.Quan@amd.com>; amd-gfx@lists.freedesktop.org
> Cc: Zhang, Hawking <Hawking.Zhang@amd.com>; Deucher, Alexander
> <Alexander.Deucher@amd.com>; Wang, Yang(Kevin)
> <KevinYang.Wang@amd.com>
> Subject: Re: [PATCH] drm/amd/pm: Send message when resp status is 0xFC
> 
> 
> 
> On 2/25/2022 1:02 PM, Quan, Evan wrote:
> > [AMD Official Use Only]
> >
> >
> >
> >> -----Original Message-----
> >> From: Lazar, Lijo <Lijo.Lazar@amd.com>
> >> Sent: Friday, February 25, 2022 2:03 PM
> >> To: Quan, Evan <Evan.Quan@amd.com>; amd-gfx@lists.freedesktop.org
> >> Cc: Zhang, Hawking <Hawking.Zhang@amd.com>; Deucher, Alexander
> >> <Alexander.Deucher@amd.com>; Wang, Yang(Kevin)
> >> <KevinYang.Wang@amd.com>
> >> Subject: Re: [PATCH] drm/amd/pm: Send message when resp status is
> >> 0xFC
> >>
> >>
> >>
> >> On 2/25/2022 11:25 AM, Quan, Evan wrote:
> >>> [AMD Official Use Only]
> >>>
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: Lazar, Lijo <Lijo.Lazar@amd.com>
> >>>> Sent: Friday, February 25, 2022 1:47 PM
> >>>> To: Quan, Evan <Evan.Quan@amd.com>; amd-
> gfx@lists.freedesktop.org
> >>>> Cc: Zhang, Hawking <Hawking.Zhang@amd.com>; Deucher, Alexander
> >>>> <Alexander.Deucher@amd.com>; Wang, Yang(Kevin)
> >>>> <KevinYang.Wang@amd.com>
> >>>> Subject: RE: [PATCH] drm/amd/pm: Send message when resp status is
> >>>> 0xFC
> >>>>
> >>>> [AMD Official Use Only]
> >>>>
> >>>>> That is the caller can perform something like issuing the same
> >>>>> message again without prerequisites check on PMFW busy
> >>>>
> >>>> This patch expects this method. Caller may try to resend message
> >>>> again. As part of message sending, driver first checks response
> >>>> register. Current logic blocks sending any message if it sees 0xFC
> >>>> in response register, this patch is to address that.
> >>> [Quan, Evan] Yes, I know. But the caller here could be another one.
> >>> I mean
> >> there may be another caller stepped in.
> >>>
> >>
> >> That shouldn't cause an issue to the second caller if it got message mutex.
> >> The second caller also should be able to send message if PMFW got
> >> free by that time. The first caller can retry when it gets back the
> >> message mutex. FW doesn't maintain any state for 0xFC response. Any
> >> other message may be sent after that. If driver keeps the state based
> >> on two callers, that is a logic problem in driver. I don't think we have any
> flow like that.
> > [Quan, Evan] Yeah, but there may be some case that messages issued by
> the two callers have dependence.
> > That means the message issued by the 2nd caller should be only after the
> 1st one.
> > The one i can think of is "EnableAllSmuFeatures" message should be after
> "SetAllowedFeatures" message.
> > Although that should not cause any problem, I'm not sure whether there is
> other similar case.
> >
> > What I suggest is something like below. We just do it again in
> smu_cmn_send_smc_msg_with_param() on PMFW busy.
> >
> > int smu_cmn_send_smc_msg_with_param(struct smu_context *smu,
> > 				    enum smu_message_type msg,
> > 				    uint32_t param,
> > 				    uint32_t *read_arg)
> > {
> > ...
> > ...
> > 	mutex_lock(&smu->message_lock);
> > 	reg = __smu_cmn_poll_stat(smu);
> > 	res = __smu_cmn_reg2errno(smu, reg);
> > 	if (reg == SMU_RESP_NONE ||
> > 	    reg == SMU_RESP_BUSY_OTHER ||
> > 	    res == -EREMOTEIO) {
> > 		__smu_cmn_reg_print_error(smu, reg, index, param, msg);
> > 		goto Out;
> > 	}
> > +retry:
> > 	__smu_cmn_send_msg(smu, (uint16_t) index, param);
> > 	reg = __smu_cmn_poll_stat(smu);
> > 	res = __smu_cmn_reg2errno(smu, reg);
> > +	if (reg == SMU_RESP_BUSY_OTHER) {
> > +		mdelay(1);
> > +		goto retry;
> > +	}
> 
> I suppose the retry option should be left to caller. Regardless of retry or not,
> the patch is still valid.
> 
> Example situation -
> 
> rocm-smi is trying to get metrics and another app is trying set performance
> profile. If metrics fetch fail and even retry of metrics fetch fail after some
> loops, rocm-smi is free to fetch the metrics again after say 5s. That also
> shouldn't prevent the second app to send performance profile message and
> that app also may retry the same later.
[Quan, Evan] I have no doubt the second app may still be able to send performance profile message.
However, the metrics data retrieved by rocm-smi will be different under such scenario.
That is after performance profile setting the clock frequencies may go up.
If the first app is sensitive to that(suppose it wants to compare the clock frequencies before and after performance profile setting), that will be a problem.

I reconsider this a bit. Can we add one more parameter for smu_cmn_send_smc_msg_with_param()?
That parameter can tell what the caller wants(retry or abort) under PMFW busy scenario.

BR
Evan
> 
> Thanks,
> Lijo
> 
> > ...
> > ...
> > }
> >
> > BR
> > Evan
> >>
> >> Basically, 0xFC is not valid pre-condition check for sending any
> >> message. As per PMFW team - it only means that PMFW was busy when a
> >> previous message was sent and PMFW won't change the response status
> >> when it becomes free.
> >>
> >> Thanks,
> >> Lijo
> >>
> >>> BR
> >>> Evan
> >>>>
> >>>> Thanks,
> >>>> Lijo
> >>>>
> >>>> -----Original Message-----
> >>>> From: Quan, Evan <Evan.Quan@amd.com>
> >>>> Sent: Friday, February 25, 2022 11:07 AM
> >>>> To: Lazar, Lijo <Lijo.Lazar@amd.com>; amd-gfx@lists.freedesktop.org
> >>>> Cc: Zhang, Hawking <Hawking.Zhang@amd.com>; Deucher, Alexander
> >>>> <Alexander.Deucher@amd.com>; Wang, Yang(Kevin)
> >>>> <KevinYang.Wang@amd.com>
> >>>> Subject: RE: [PATCH] drm/amd/pm: Send message when resp status is
> >>>> 0xFC
> >>>>
> >>>> [AMD Official Use Only]
> >>>>
> >>>> This may introduce some problems for two callers scenarios. That is
> >>>> the 2nd one will still proceed even if the 1st one was already blocked.
> >>>> Maybe the logics here should be performed by the caller. That is
> >>>> the caller can perform something like issuing the same message
> >>>> again without prerequisites check on PMFW busy.
> >>>> Or we can just update the smu_cmn_send_smc_msg APIs to give it
> >>>> another try on PMFW busy.
> >>>>
> >>>> BR
> >>>> Evan
> >>>>> -----Original Message-----
> >>>>> From: Lazar, Lijo <Lijo.Lazar@amd.com>
> >>>>> Sent: Friday, February 25, 2022 12:22 PM
> >>>>> To: amd-gfx@lists.freedesktop.org
> >>>>> Cc: Zhang, Hawking <Hawking.Zhang@amd.com>; Deucher,
> Alexander
> >>>>> <Alexander.Deucher@amd.com>; Wang, Yang(Kevin)
> >>>>> <KevinYang.Wang@amd.com>; Quan, Evan <Evan.Quan@amd.com>
> >>>>> Subject: [PATCH] drm/amd/pm: Send message when resp status is
> 0xFC
> >>>>>
> >>>>> When PMFW is really busy, it will respond with 0xFC. However, it
> >>>>> doesn't change the response register state when it becomes free.
> >>>>> Driver should retry and proceed to send message if the response
> >>>>> status is
> >>>> 0xFC.
> >>>>>
> >>>>> Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
> >>>>> ---
> >>>>>    drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 2 --
> >>>>>    1 file changed, 2 deletions(-)
> >>>>>
> >>>>> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> >>>>> b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> >>>>> index 590a6ed12d54..92161b9d8c1a 100644
> >>>>> --- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> >>>>> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> >>>>> @@ -297,7 +297,6 @@ int
> smu_cmn_send_msg_without_waiting(struct
> >>>>> smu_context *smu,
> >>>>>    	reg = __smu_cmn_poll_stat(smu);
> >>>>>    	res = __smu_cmn_reg2errno(smu, reg);
> >>>>>    	if (reg == SMU_RESP_NONE ||
> >>>>> -	    reg == SMU_RESP_BUSY_OTHER ||
> >>>>>    	    res == -EREMOTEIO)
> >>>>>    		goto Out;
> >>>>>    	__smu_cmn_send_msg(smu, msg_index, param); @@ -
> 391,7 +390,6
> >>>> @@ int
> >>>>> smu_cmn_send_smc_msg_with_param(struct
> >>>>> smu_context *smu,
> >>>>>    	reg = __smu_cmn_poll_stat(smu);
> >>>>>    	res = __smu_cmn_reg2errno(smu, reg);
> >>>>>    	if (reg == SMU_RESP_NONE ||
> >>>>> -	    reg == SMU_RESP_BUSY_OTHER ||
> >>>>>    	    res == -EREMOTEIO) {
> >>>>>    		__smu_cmn_reg_print_error(smu, reg, index, param,
> msg);
> >>>>>    		goto Out;
> >>>>> --
> >>>>> 2.25.1

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] drm/amd/pm: Send message when resp status is 0xFC
  2022-02-25 13:03             ` Quan, Evan
@ 2022-02-25 13:59               ` Lazar, Lijo
  2022-03-08  5:58                 ` Lazar, Lijo
  0 siblings, 1 reply; 11+ messages in thread
From: Lazar, Lijo @ 2022-02-25 13:59 UTC (permalink / raw)
  To: Quan, Evan, amd-gfx; +Cc: Deucher, Alexander, Wang, Yang(Kevin), Zhang, Hawking



On 2/25/2022 6:33 PM, Quan, Evan wrote:
> [AMD Official Use Only]
> 
> 
> 
>> -----Original Message-----
>> From: Lazar, Lijo <Lijo.Lazar@amd.com>
>> Sent: Friday, February 25, 2022 3:43 PM
>> To: Quan, Evan <Evan.Quan@amd.com>; amd-gfx@lists.freedesktop.org
>> Cc: Zhang, Hawking <Hawking.Zhang@amd.com>; Deucher, Alexander
>> <Alexander.Deucher@amd.com>; Wang, Yang(Kevin)
>> <KevinYang.Wang@amd.com>
>> Subject: Re: [PATCH] drm/amd/pm: Send message when resp status is 0xFC
>>
>>
>>
>> On 2/25/2022 1:02 PM, Quan, Evan wrote:
>>> [AMD Official Use Only]
>>>
>>>
>>>
>>>> -----Original Message-----
>>>> From: Lazar, Lijo <Lijo.Lazar@amd.com>
>>>> Sent: Friday, February 25, 2022 2:03 PM
>>>> To: Quan, Evan <Evan.Quan@amd.com>; amd-gfx@lists.freedesktop.org
>>>> Cc: Zhang, Hawking <Hawking.Zhang@amd.com>; Deucher, Alexander
>>>> <Alexander.Deucher@amd.com>; Wang, Yang(Kevin)
>>>> <KevinYang.Wang@amd.com>
>>>> Subject: Re: [PATCH] drm/amd/pm: Send message when resp status is
>>>> 0xFC
>>>>
>>>>
>>>>
>>>> On 2/25/2022 11:25 AM, Quan, Evan wrote:
>>>>> [AMD Official Use Only]
>>>>>
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Lazar, Lijo <Lijo.Lazar@amd.com>
>>>>>> Sent: Friday, February 25, 2022 1:47 PM
>>>>>> To: Quan, Evan <Evan.Quan@amd.com>; amd-
>> gfx@lists.freedesktop.org
>>>>>> Cc: Zhang, Hawking <Hawking.Zhang@amd.com>; Deucher, Alexander
>>>>>> <Alexander.Deucher@amd.com>; Wang, Yang(Kevin)
>>>>>> <KevinYang.Wang@amd.com>
>>>>>> Subject: RE: [PATCH] drm/amd/pm: Send message when resp status is
>>>>>> 0xFC
>>>>>>
>>>>>> [AMD Official Use Only]
>>>>>>
>>>>>>> That is the caller can perform something like issuing the same
>>>>>>> message again without prerequisites check on PMFW busy
>>>>>>
>>>>>> This patch expects this method. Caller may try to resend message
>>>>>> again. As part of message sending, driver first checks response
>>>>>> register. Current logic blocks sending any message if it sees 0xFC
>>>>>> in response register, this patch is to address that.
>>>>> [Quan, Evan] Yes, I know. But the caller here could be another one.
>>>>> I mean
>>>> there may be another caller stepped in.
>>>>>
>>>>
>>>> That shouldn't cause an issue to the second caller if it got message mutex.
>>>> The second caller also should be able to send message if PMFW got
>>>> free by that time. The first caller can retry when it gets back the
>>>> message mutex. FW doesn't maintain any state for 0xFC response. Any
>>>> other message may be sent after that. If driver keeps the state based
>>>> on two callers, that is a logic problem in driver. I don't think we have any
>> flow like that.
>>> [Quan, Evan] Yeah, but there may be some case that messages issued by
>> the two callers have dependence.
>>> That means the message issued by the 2nd caller should be only after the
>> 1st one.
>>> The one i can think of is "EnableAllSmuFeatures" message should be after
>> "SetAllowedFeatures" message.
>>> Although that should not cause any problem, I'm not sure whether there is
>> other similar case.
>>>
>>> What I suggest is something like below. We just do it again in
>> smu_cmn_send_smc_msg_with_param() on PMFW busy.
>>>
>>> int smu_cmn_send_smc_msg_with_param(struct smu_context *smu,
>>> 				    enum smu_message_type msg,
>>> 				    uint32_t param,
>>> 				    uint32_t *read_arg)
>>> {
>>> ...
>>> ...
>>> 	mutex_lock(&smu->message_lock);
>>> 	reg = __smu_cmn_poll_stat(smu);
>>> 	res = __smu_cmn_reg2errno(smu, reg);
>>> 	if (reg == SMU_RESP_NONE ||
>>> 	    reg == SMU_RESP_BUSY_OTHER ||
>>> 	    res == -EREMOTEIO) {
>>> 		__smu_cmn_reg_print_error(smu, reg, index, param, msg);
>>> 		goto Out;
>>> 	}
>>> +retry:
>>> 	__smu_cmn_send_msg(smu, (uint16_t) index, param);
>>> 	reg = __smu_cmn_poll_stat(smu);
>>> 	res = __smu_cmn_reg2errno(smu, reg);
>>> +	if (reg == SMU_RESP_BUSY_OTHER) {
>>> +		mdelay(1);
>>> +		goto retry;
>>> +	}
>>
>> I suppose the retry option should be left to caller. Regardless of retry or not,
>> the patch is still valid.
>>
>> Example situation -
>>
>> rocm-smi is trying to get metrics and another app is trying set performance
>> profile. If metrics fetch fail and even retry of metrics fetch fail after some
>> loops, rocm-smi is free to fetch the metrics again after say 5s. That also
>> shouldn't prevent the second app to send performance profile message and
>> that app also may retry the same later.
> [Quan, Evan] I have no doubt the second app may still be able to send performance profile message.
> However, the metrics data retrieved by rocm-smi will be different under such scenario > That is after performance profile setting the clock frequencies may 
go up.
> If the first app is sensitive to that(suppose it wants to compare the clock frequencies before and after performance profile setting), that will be a problem

Ah, my intention was to tell about a different use case, probably a bad 
example.

What I intended is -

rocm-smi is running periodically collecting metrics data and another app 
which doesn't bother about the background data collection trying to do 
something else. Thus a metrics fetch fail shouldn't prevent the other 
app from changing performance profile or whatever it intended to do. 
Also, rocm-smi may be able to fetch data during the next polling 
interval when the PMFW gets relatively free.
> 
> I reconsider this a bit. Can we add one more parameter for smu_cmn_send_smc_msg_with_param()?
> That parameter can tell what the caller wants(retry or abort) under PMFW busy scenario.

There is one complication to this. The same message could return 0xFC 
under different conditions, and it may be fine for situation A but not 
for situation B. For ex: during driver load or init after a reset, we 
don't expect PMFW to be busy.Driver may send message to get metrics data 
to check say 'smart shift is enabled or not'. A failure with 0xFC is not 
a good sign that time and we should abort there.

The same message sent during runtime (like rocm-smi data collection) may 
fail with 0xFC due to workload conditions.

Adding another parameter will complicate things as we need to check 
every condition of message usage. In the present case, a message failure 
with 0xFC is treated as failure during init and we don't proceed. For a 
runtime use, apps always have a choice whether to retry the same API or not.

Thanks,
Lijo

> 
> BR
> Evan
>>
>> Thanks,
>> Lijo
>>
>>> ...
>>> ...
>>> }
>>>
>>> BR
>>> Evan
>>>>
>>>> Basically, 0xFC is not valid pre-condition check for sending any
>>>> message. As per PMFW team - it only means that PMFW was busy when a
>>>> previous message was sent and PMFW won't change the response status
>>>> when it becomes free.
>>>>
>>>> Thanks,
>>>> Lijo
>>>>
>>>>> BR
>>>>> Evan
>>>>>>
>>>>>> Thanks,
>>>>>> Lijo
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Quan, Evan <Evan.Quan@amd.com>
>>>>>> Sent: Friday, February 25, 2022 11:07 AM
>>>>>> To: Lazar, Lijo <Lijo.Lazar@amd.com>; amd-gfx@lists.freedesktop.org
>>>>>> Cc: Zhang, Hawking <Hawking.Zhang@amd.com>; Deucher, Alexander
>>>>>> <Alexander.Deucher@amd.com>; Wang, Yang(Kevin)
>>>>>> <KevinYang.Wang@amd.com>
>>>>>> Subject: RE: [PATCH] drm/amd/pm: Send message when resp status is
>>>>>> 0xFC
>>>>>>
>>>>>> [AMD Official Use Only]
>>>>>>
>>>>>> This may introduce some problems for two callers scenarios. That is
>>>>>> the 2nd one will still proceed even if the 1st one was already blocked.
>>>>>> Maybe the logics here should be performed by the caller. That is
>>>>>> the caller can perform something like issuing the same message
>>>>>> again without prerequisites check on PMFW busy.
>>>>>> Or we can just update the smu_cmn_send_smc_msg APIs to give it
>>>>>> another try on PMFW busy.
>>>>>>
>>>>>> BR
>>>>>> Evan
>>>>>>> -----Original Message-----
>>>>>>> From: Lazar, Lijo <Lijo.Lazar@amd.com>
>>>>>>> Sent: Friday, February 25, 2022 12:22 PM
>>>>>>> To: amd-gfx@lists.freedesktop.org
>>>>>>> Cc: Zhang, Hawking <Hawking.Zhang@amd.com>; Deucher,
>> Alexander
>>>>>>> <Alexander.Deucher@amd.com>; Wang, Yang(Kevin)
>>>>>>> <KevinYang.Wang@amd.com>; Quan, Evan <Evan.Quan@amd.com>
>>>>>>> Subject: [PATCH] drm/amd/pm: Send message when resp status is
>> 0xFC
>>>>>>>
>>>>>>> When PMFW is really busy, it will respond with 0xFC. However, it
>>>>>>> doesn't change the response register state when it becomes free.
>>>>>>> Driver should retry and proceed to send message if the response
>>>>>>> status is
>>>>>> 0xFC.
>>>>>>>
>>>>>>> Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
>>>>>>> ---
>>>>>>>     drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 2 --
>>>>>>>     1 file changed, 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
>>>>>>> b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
>>>>>>> index 590a6ed12d54..92161b9d8c1a 100644
>>>>>>> --- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
>>>>>>> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
>>>>>>> @@ -297,7 +297,6 @@ int
>> smu_cmn_send_msg_without_waiting(struct
>>>>>>> smu_context *smu,
>>>>>>>     	reg = __smu_cmn_poll_stat(smu);
>>>>>>>     	res = __smu_cmn_reg2errno(smu, reg);
>>>>>>>     	if (reg == SMU_RESP_NONE ||
>>>>>>> -	    reg == SMU_RESP_BUSY_OTHER ||
>>>>>>>     	    res == -EREMOTEIO)
>>>>>>>     		goto Out;
>>>>>>>     	__smu_cmn_send_msg(smu, msg_index, param); @@ -
>> 391,7 +390,6
>>>>>> @@ int
>>>>>>> smu_cmn_send_smc_msg_with_param(struct
>>>>>>> smu_context *smu,
>>>>>>>     	reg = __smu_cmn_poll_stat(smu);
>>>>>>>     	res = __smu_cmn_reg2errno(smu, reg);
>>>>>>>     	if (reg == SMU_RESP_NONE ||
>>>>>>> -	    reg == SMU_RESP_BUSY_OTHER ||
>>>>>>>     	    res == -EREMOTEIO) {
>>>>>>>     		__smu_cmn_reg_print_error(smu, reg, index, param,
>> msg);
>>>>>>>     		goto Out;
>>>>>>> --
>>>>>>> 2.25.1

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [PATCH] drm/amd/pm: Send message when resp status is 0xFC
  2022-02-25 13:59               ` Lazar, Lijo
@ 2022-03-08  5:58                 ` Lazar, Lijo
  2022-03-08  6:56                   ` Quan, Evan
  0 siblings, 1 reply; 11+ messages in thread
From: Lazar, Lijo @ 2022-03-08  5:58 UTC (permalink / raw)
  To: Quan, Evan, amd-gfx; +Cc: Deucher, Alexander, Wang, Yang(Kevin), Zhang, Hawking

[Public]

Hi Evan,

Any comments on this?

Thanks,
Lijo

-----Original Message-----
From: Lazar, Lijo 
Sent: Friday, February 25, 2022 7:30 PM
To: Quan, Evan <Evan.Quan@amd.com>; amd-gfx@lists.freedesktop.org
Cc: Zhang, Hawking <Hawking.Zhang@amd.com>; Deucher, Alexander <Alexander.Deucher@amd.com>; Wang, Yang(Kevin) <KevinYang.Wang@amd.com>
Subject: Re: [PATCH] drm/amd/pm: Send message when resp status is 0xFC



On 2/25/2022 6:33 PM, Quan, Evan wrote:
> [AMD Official Use Only]
> 
> 
> 
>> -----Original Message-----
>> From: Lazar, Lijo <Lijo.Lazar@amd.com>
>> Sent: Friday, February 25, 2022 3:43 PM
>> To: Quan, Evan <Evan.Quan@amd.com>; amd-gfx@lists.freedesktop.org
>> Cc: Zhang, Hawking <Hawking.Zhang@amd.com>; Deucher, Alexander 
>> <Alexander.Deucher@amd.com>; Wang, Yang(Kevin) 
>> <KevinYang.Wang@amd.com>
>> Subject: Re: [PATCH] drm/amd/pm: Send message when resp status is 
>> 0xFC
>>
>>
>>
>> On 2/25/2022 1:02 PM, Quan, Evan wrote:
>>> [AMD Official Use Only]
>>>
>>>
>>>
>>>> -----Original Message-----
>>>> From: Lazar, Lijo <Lijo.Lazar@amd.com>
>>>> Sent: Friday, February 25, 2022 2:03 PM
>>>> To: Quan, Evan <Evan.Quan@amd.com>; amd-gfx@lists.freedesktop.org
>>>> Cc: Zhang, Hawking <Hawking.Zhang@amd.com>; Deucher, Alexander 
>>>> <Alexander.Deucher@amd.com>; Wang, Yang(Kevin) 
>>>> <KevinYang.Wang@amd.com>
>>>> Subject: Re: [PATCH] drm/amd/pm: Send message when resp status is 
>>>> 0xFC
>>>>
>>>>
>>>>
>>>> On 2/25/2022 11:25 AM, Quan, Evan wrote:
>>>>> [AMD Official Use Only]
>>>>>
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Lazar, Lijo <Lijo.Lazar@amd.com>
>>>>>> Sent: Friday, February 25, 2022 1:47 PM
>>>>>> To: Quan, Evan <Evan.Quan@amd.com>; amd-
>> gfx@lists.freedesktop.org
>>>>>> Cc: Zhang, Hawking <Hawking.Zhang@amd.com>; Deucher, Alexander 
>>>>>> <Alexander.Deucher@amd.com>; Wang, Yang(Kevin) 
>>>>>> <KevinYang.Wang@amd.com>
>>>>>> Subject: RE: [PATCH] drm/amd/pm: Send message when resp status is 
>>>>>> 0xFC
>>>>>>
>>>>>> [AMD Official Use Only]
>>>>>>
>>>>>>> That is the caller can perform something like issuing the same 
>>>>>>> message again without prerequisites check on PMFW busy
>>>>>>
>>>>>> This patch expects this method. Caller may try to resend message 
>>>>>> again. As part of message sending, driver first checks response 
>>>>>> register. Current logic blocks sending any message if it sees 
>>>>>> 0xFC in response register, this patch is to address that.
>>>>> [Quan, Evan] Yes, I know. But the caller here could be another one.
>>>>> I mean
>>>> there may be another caller stepped in.
>>>>>
>>>>
>>>> That shouldn't cause an issue to the second caller if it got message mutex.
>>>> The second caller also should be able to send message if PMFW got 
>>>> free by that time. The first caller can retry when it gets back the 
>>>> message mutex. FW doesn't maintain any state for 0xFC response. Any 
>>>> other message may be sent after that. If driver keeps the state 
>>>> based on two callers, that is a logic problem in driver. I don't 
>>>> think we have any
>> flow like that.
>>> [Quan, Evan] Yeah, but there may be some case that messages issued 
>>> by
>> the two callers have dependence.
>>> That means the message issued by the 2nd caller should be only after 
>>> the
>> 1st one.
>>> The one i can think of is "EnableAllSmuFeatures" message should be 
>>> after
>> "SetAllowedFeatures" message.
>>> Although that should not cause any problem, I'm not sure whether 
>>> there is
>> other similar case.
>>>
>>> What I suggest is something like below. We just do it again in
>> smu_cmn_send_smc_msg_with_param() on PMFW busy.
>>>
>>> int smu_cmn_send_smc_msg_with_param(struct smu_context *smu,
>>> 				    enum smu_message_type msg,
>>> 				    uint32_t param,
>>> 				    uint32_t *read_arg)
>>> {
>>> ...
>>> ...
>>> 	mutex_lock(&smu->message_lock);
>>> 	reg = __smu_cmn_poll_stat(smu);
>>> 	res = __smu_cmn_reg2errno(smu, reg);
>>> 	if (reg == SMU_RESP_NONE ||
>>> 	    reg == SMU_RESP_BUSY_OTHER ||
>>> 	    res == -EREMOTEIO) {
>>> 		__smu_cmn_reg_print_error(smu, reg, index, param, msg);
>>> 		goto Out;
>>> 	}
>>> +retry:
>>> 	__smu_cmn_send_msg(smu, (uint16_t) index, param);
>>> 	reg = __smu_cmn_poll_stat(smu);
>>> 	res = __smu_cmn_reg2errno(smu, reg);
>>> +	if (reg == SMU_RESP_BUSY_OTHER) {
>>> +		mdelay(1);
>>> +		goto retry;
>>> +	}
>>
>> I suppose the retry option should be left to caller. Regardless of 
>> retry or not, the patch is still valid.
>>
>> Example situation -
>>
>> rocm-smi is trying to get metrics and another app is trying set 
>> performance profile. If metrics fetch fail and even retry of metrics 
>> fetch fail after some loops, rocm-smi is free to fetch the metrics 
>> again after say 5s. That also shouldn't prevent the second app to 
>> send performance profile message and that app also may retry the same later.
> [Quan, Evan] I have no doubt the second app may still be able to send performance profile message.
> However, the metrics data retrieved by rocm-smi will be different 
> under such scenario > That is after performance profile setting the 
> clock frequencies may
go up.
> If the first app is sensitive to that(suppose it wants to compare the 
> clock frequencies before and after performance profile setting), that 
> will be a problem

Ah, my intention was to tell about a different use case, probably a bad example.

What I intended is -

rocm-smi is running periodically collecting metrics data and another app which doesn't bother about the background data collection trying to do something else. Thus a metrics fetch fail shouldn't prevent the other app from changing performance profile or whatever it intended to do. 
Also, rocm-smi may be able to fetch data during the next polling interval when the PMFW gets relatively free.
> 
> I reconsider this a bit. Can we add one more parameter for smu_cmn_send_smc_msg_with_param()?
> That parameter can tell what the caller wants(retry or abort) under PMFW busy scenario.

There is one complication to this. The same message could return 0xFC under different conditions, and it may be fine for situation A but not for situation B. For ex: during driver load or init after a reset, we don't expect PMFW to be busy.Driver may send message to get metrics data to check say 'smart shift is enabled or not'. A failure with 0xFC is not a good sign that time and we should abort there.

The same message sent during runtime (like rocm-smi data collection) may fail with 0xFC due to workload conditions.

Adding another parameter will complicate things as we need to check every condition of message usage. In the present case, a message failure with 0xFC is treated as failure during init and we don't proceed. For a runtime use, apps always have a choice whether to retry the same API or not.

Thanks,
Lijo

> 
> BR
> Evan
>>
>> Thanks,
>> Lijo
>>
>>> ...
>>> ...
>>> }
>>>
>>> BR
>>> Evan
>>>>
>>>> Basically, 0xFC is not valid pre-condition check for sending any 
>>>> message. As per PMFW team - it only means that PMFW was busy when a 
>>>> previous message was sent and PMFW won't change the response status 
>>>> when it becomes free.
>>>>
>>>> Thanks,
>>>> Lijo
>>>>
>>>>> BR
>>>>> Evan
>>>>>>
>>>>>> Thanks,
>>>>>> Lijo
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Quan, Evan <Evan.Quan@amd.com>
>>>>>> Sent: Friday, February 25, 2022 11:07 AM
>>>>>> To: Lazar, Lijo <Lijo.Lazar@amd.com>; 
>>>>>> amd-gfx@lists.freedesktop.org
>>>>>> Cc: Zhang, Hawking <Hawking.Zhang@amd.com>; Deucher, Alexander 
>>>>>> <Alexander.Deucher@amd.com>; Wang, Yang(Kevin) 
>>>>>> <KevinYang.Wang@amd.com>
>>>>>> Subject: RE: [PATCH] drm/amd/pm: Send message when resp status is 
>>>>>> 0xFC
>>>>>>
>>>>>> [AMD Official Use Only]
>>>>>>
>>>>>> This may introduce some problems for two callers scenarios. That 
>>>>>> is the 2nd one will still proceed even if the 1st one was already blocked.
>>>>>> Maybe the logics here should be performed by the caller. That is 
>>>>>> the caller can perform something like issuing the same message 
>>>>>> again without prerequisites check on PMFW busy.
>>>>>> Or we can just update the smu_cmn_send_smc_msg APIs to give it 
>>>>>> another try on PMFW busy.
>>>>>>
>>>>>> BR
>>>>>> Evan
>>>>>>> -----Original Message-----
>>>>>>> From: Lazar, Lijo <Lijo.Lazar@amd.com>
>>>>>>> Sent: Friday, February 25, 2022 12:22 PM
>>>>>>> To: amd-gfx@lists.freedesktop.org
>>>>>>> Cc: Zhang, Hawking <Hawking.Zhang@amd.com>; Deucher,
>> Alexander
>>>>>>> <Alexander.Deucher@amd.com>; Wang, Yang(Kevin) 
>>>>>>> <KevinYang.Wang@amd.com>; Quan, Evan <Evan.Quan@amd.com>
>>>>>>> Subject: [PATCH] drm/amd/pm: Send message when resp status is
>> 0xFC
>>>>>>>
>>>>>>> When PMFW is really busy, it will respond with 0xFC. However, it 
>>>>>>> doesn't change the response register state when it becomes free.
>>>>>>> Driver should retry and proceed to send message if the response 
>>>>>>> status is
>>>>>> 0xFC.
>>>>>>>
>>>>>>> Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
>>>>>>> ---
>>>>>>>     drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 2 --
>>>>>>>     1 file changed, 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
>>>>>>> b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
>>>>>>> index 590a6ed12d54..92161b9d8c1a 100644
>>>>>>> --- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
>>>>>>> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
>>>>>>> @@ -297,7 +297,6 @@ int
>> smu_cmn_send_msg_without_waiting(struct
>>>>>>> smu_context *smu,
>>>>>>>     	reg = __smu_cmn_poll_stat(smu);
>>>>>>>     	res = __smu_cmn_reg2errno(smu, reg);
>>>>>>>     	if (reg == SMU_RESP_NONE ||
>>>>>>> -	    reg == SMU_RESP_BUSY_OTHER ||
>>>>>>>     	    res == -EREMOTEIO)
>>>>>>>     		goto Out;
>>>>>>>     	__smu_cmn_send_msg(smu, msg_index, param); @@ -
>> 391,7 +390,6
>>>>>> @@ int
>>>>>>> smu_cmn_send_smc_msg_with_param(struct
>>>>>>> smu_context *smu,
>>>>>>>     	reg = __smu_cmn_poll_stat(smu);
>>>>>>>     	res = __smu_cmn_reg2errno(smu, reg);
>>>>>>>     	if (reg == SMU_RESP_NONE ||
>>>>>>> -	    reg == SMU_RESP_BUSY_OTHER ||
>>>>>>>     	    res == -EREMOTEIO) {
>>>>>>>     		__smu_cmn_reg_print_error(smu, reg, index, param,
>> msg);
>>>>>>>     		goto Out;
>>>>>>> --
>>>>>>> 2.25.1

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [PATCH] drm/amd/pm: Send message when resp status is 0xFC
  2022-03-08  5:58                 ` Lazar, Lijo
@ 2022-03-08  6:56                   ` Quan, Evan
  0 siblings, 0 replies; 11+ messages in thread
From: Quan, Evan @ 2022-03-08  6:56 UTC (permalink / raw)
  To: Lazar, Lijo, amd-gfx
  Cc: Deucher, Alexander, Wang, Yang(Kevin), Zhang, Hawking

[Public]

Sorry for missing this. Busy with new asic bringup.
Let's adopt this for now and we can revise it later if we do see some corner case.
Reviewed-by: Evan Quan <evan.quan@amd.com>

Thanks,
Evan
> -----Original Message-----
> From: Lazar, Lijo <Lijo.Lazar@amd.com>
> Sent: Tuesday, March 8, 2022 1:58 PM
> To: Quan, Evan <Evan.Quan@amd.com>; amd-gfx@lists.freedesktop.org
> Cc: Zhang, Hawking <Hawking.Zhang@amd.com>; Deucher, Alexander
> <Alexander.Deucher@amd.com>; Wang, Yang(Kevin)
> <KevinYang.Wang@amd.com>
> Subject: RE: [PATCH] drm/amd/pm: Send message when resp status is 0xFC
> 
> [Public]
> 
> Hi Evan,
> 
> Any comments on this?
> 
> Thanks,
> Lijo
> 
> -----Original Message-----
> From: Lazar, Lijo
> Sent: Friday, February 25, 2022 7:30 PM
> To: Quan, Evan <Evan.Quan@amd.com>; amd-gfx@lists.freedesktop.org
> Cc: Zhang, Hawking <Hawking.Zhang@amd.com>; Deucher, Alexander
> <Alexander.Deucher@amd.com>; Wang, Yang(Kevin)
> <KevinYang.Wang@amd.com>
> Subject: Re: [PATCH] drm/amd/pm: Send message when resp status is 0xFC
> 
> 
> 
> On 2/25/2022 6:33 PM, Quan, Evan wrote:
> > [AMD Official Use Only]
> >
> >
> >
> >> -----Original Message-----
> >> From: Lazar, Lijo <Lijo.Lazar@amd.com>
> >> Sent: Friday, February 25, 2022 3:43 PM
> >> To: Quan, Evan <Evan.Quan@amd.com>; amd-gfx@lists.freedesktop.org
> >> Cc: Zhang, Hawking <Hawking.Zhang@amd.com>; Deucher, Alexander
> >> <Alexander.Deucher@amd.com>; Wang, Yang(Kevin)
> >> <KevinYang.Wang@amd.com>
> >> Subject: Re: [PATCH] drm/amd/pm: Send message when resp status is
> >> 0xFC
> >>
> >>
> >>
> >> On 2/25/2022 1:02 PM, Quan, Evan wrote:
> >>> [AMD Official Use Only]
> >>>
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: Lazar, Lijo <Lijo.Lazar@amd.com>
> >>>> Sent: Friday, February 25, 2022 2:03 PM
> >>>> To: Quan, Evan <Evan.Quan@amd.com>; amd-
> gfx@lists.freedesktop.org
> >>>> Cc: Zhang, Hawking <Hawking.Zhang@amd.com>; Deucher, Alexander
> >>>> <Alexander.Deucher@amd.com>; Wang, Yang(Kevin)
> >>>> <KevinYang.Wang@amd.com>
> >>>> Subject: Re: [PATCH] drm/amd/pm: Send message when resp status is
> >>>> 0xFC
> >>>>
> >>>>
> >>>>
> >>>> On 2/25/2022 11:25 AM, Quan, Evan wrote:
> >>>>> [AMD Official Use Only]
> >>>>>
> >>>>>
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Lazar, Lijo <Lijo.Lazar@amd.com>
> >>>>>> Sent: Friday, February 25, 2022 1:47 PM
> >>>>>> To: Quan, Evan <Evan.Quan@amd.com>; amd-
> >> gfx@lists.freedesktop.org
> >>>>>> Cc: Zhang, Hawking <Hawking.Zhang@amd.com>; Deucher,
> Alexander
> >>>>>> <Alexander.Deucher@amd.com>; Wang, Yang(Kevin)
> >>>>>> <KevinYang.Wang@amd.com>
> >>>>>> Subject: RE: [PATCH] drm/amd/pm: Send message when resp status
> is
> >>>>>> 0xFC
> >>>>>>
> >>>>>> [AMD Official Use Only]
> >>>>>>
> >>>>>>> That is the caller can perform something like issuing the same
> >>>>>>> message again without prerequisites check on PMFW busy
> >>>>>>
> >>>>>> This patch expects this method. Caller may try to resend message
> >>>>>> again. As part of message sending, driver first checks response
> >>>>>> register. Current logic blocks sending any message if it sees
> >>>>>> 0xFC in response register, this patch is to address that.
> >>>>> [Quan, Evan] Yes, I know. But the caller here could be another one.
> >>>>> I mean
> >>>> there may be another caller stepped in.
> >>>>>
> >>>>
> >>>> That shouldn't cause an issue to the second caller if it got message
> mutex.
> >>>> The second caller also should be able to send message if PMFW got
> >>>> free by that time. The first caller can retry when it gets back the
> >>>> message mutex. FW doesn't maintain any state for 0xFC response. Any
> >>>> other message may be sent after that. If driver keeps the state
> >>>> based on two callers, that is a logic problem in driver. I don't
> >>>> think we have any
> >> flow like that.
> >>> [Quan, Evan] Yeah, but there may be some case that messages issued
> >>> by
> >> the two callers have dependence.
> >>> That means the message issued by the 2nd caller should be only after
> >>> the
> >> 1st one.
> >>> The one i can think of is "EnableAllSmuFeatures" message should be
> >>> after
> >> "SetAllowedFeatures" message.
> >>> Although that should not cause any problem, I'm not sure whether
> >>> there is
> >> other similar case.
> >>>
> >>> What I suggest is something like below. We just do it again in
> >> smu_cmn_send_smc_msg_with_param() on PMFW busy.
> >>>
> >>> int smu_cmn_send_smc_msg_with_param(struct smu_context *smu,
> >>> 				    enum smu_message_type msg,
> >>> 				    uint32_t param,
> >>> 				    uint32_t *read_arg)
> >>> {
> >>> ...
> >>> ...
> >>> 	mutex_lock(&smu->message_lock);
> >>> 	reg = __smu_cmn_poll_stat(smu);
> >>> 	res = __smu_cmn_reg2errno(smu, reg);
> >>> 	if (reg == SMU_RESP_NONE ||
> >>> 	    reg == SMU_RESP_BUSY_OTHER ||
> >>> 	    res == -EREMOTEIO) {
> >>> 		__smu_cmn_reg_print_error(smu, reg, index, param, msg);
> >>> 		goto Out;
> >>> 	}
> >>> +retry:
> >>> 	__smu_cmn_send_msg(smu, (uint16_t) index, param);
> >>> 	reg = __smu_cmn_poll_stat(smu);
> >>> 	res = __smu_cmn_reg2errno(smu, reg);
> >>> +	if (reg == SMU_RESP_BUSY_OTHER) {
> >>> +		mdelay(1);
> >>> +		goto retry;
> >>> +	}
> >>
> >> I suppose the retry option should be left to caller. Regardless of
> >> retry or not, the patch is still valid.
> >>
> >> Example situation -
> >>
> >> rocm-smi is trying to get metrics and another app is trying set
> >> performance profile. If metrics fetch fail and even retry of metrics
> >> fetch fail after some loops, rocm-smi is free to fetch the metrics
> >> again after say 5s. That also shouldn't prevent the second app to
> >> send performance profile message and that app also may retry the same
> later.
> > [Quan, Evan] I have no doubt the second app may still be able to send
> performance profile message.
> > However, the metrics data retrieved by rocm-smi will be different
> > under such scenario > That is after performance profile setting the
> > clock frequencies may
> go up.
> > If the first app is sensitive to that(suppose it wants to compare the
> > clock frequencies before and after performance profile setting), that
> > will be a problem
> 
> Ah, my intention was to tell about a different use case, probably a bad
> example.
> 
> What I intended is -
> 
> rocm-smi is running periodically collecting metrics data and another app
> which doesn't bother about the background data collection trying to do
> something else. Thus a metrics fetch fail shouldn't prevent the other app
> from changing performance profile or whatever it intended to do.
> Also, rocm-smi may be able to fetch data during the next polling interval
> when the PMFW gets relatively free.
> >
> > I reconsider this a bit. Can we add one more parameter for
> smu_cmn_send_smc_msg_with_param()?
> > That parameter can tell what the caller wants(retry or abort) under PMFW
> busy scenario.
> 
> There is one complication to this. The same message could return 0xFC under
> different conditions, and it may be fine for situation A but not for situation B.
> For ex: during driver load or init after a reset, we don't expect PMFW to be
> busy.Driver may send message to get metrics data to check say 'smart shift is
> enabled or not'. A failure with 0xFC is not a good sign that time and we
> should abort there.
> 
> The same message sent during runtime (like rocm-smi data collection) may
> fail with 0xFC due to workload conditions.
> 
> Adding another parameter will complicate things as we need to check every
> condition of message usage. In the present case, a message failure with 0xFC
> is treated as failure during init and we don't proceed. For a runtime use, apps
> always have a choice whether to retry the same API or not.
> 
> Thanks,
> Lijo
> 
> >
> > BR
> > Evan
> >>
> >> Thanks,
> >> Lijo
> >>
> >>> ...
> >>> ...
> >>> }
> >>>
> >>> BR
> >>> Evan
> >>>>
> >>>> Basically, 0xFC is not valid pre-condition check for sending any
> >>>> message. As per PMFW team - it only means that PMFW was busy
> when a
> >>>> previous message was sent and PMFW won't change the response
> status
> >>>> when it becomes free.
> >>>>
> >>>> Thanks,
> >>>> Lijo
> >>>>
> >>>>> BR
> >>>>> Evan
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Lijo
> >>>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Quan, Evan <Evan.Quan@amd.com>
> >>>>>> Sent: Friday, February 25, 2022 11:07 AM
> >>>>>> To: Lazar, Lijo <Lijo.Lazar@amd.com>;
> >>>>>> amd-gfx@lists.freedesktop.org
> >>>>>> Cc: Zhang, Hawking <Hawking.Zhang@amd.com>; Deucher,
> Alexander
> >>>>>> <Alexander.Deucher@amd.com>; Wang, Yang(Kevin)
> >>>>>> <KevinYang.Wang@amd.com>
> >>>>>> Subject: RE: [PATCH] drm/amd/pm: Send message when resp status
> is
> >>>>>> 0xFC
> >>>>>>
> >>>>>> [AMD Official Use Only]
> >>>>>>
> >>>>>> This may introduce some problems for two callers scenarios. That
> >>>>>> is the 2nd one will still proceed even if the 1st one was already
> blocked.
> >>>>>> Maybe the logics here should be performed by the caller. That is
> >>>>>> the caller can perform something like issuing the same message
> >>>>>> again without prerequisites check on PMFW busy.
> >>>>>> Or we can just update the smu_cmn_send_smc_msg APIs to give it
> >>>>>> another try on PMFW busy.
> >>>>>>
> >>>>>> BR
> >>>>>> Evan
> >>>>>>> -----Original Message-----
> >>>>>>> From: Lazar, Lijo <Lijo.Lazar@amd.com>
> >>>>>>> Sent: Friday, February 25, 2022 12:22 PM
> >>>>>>> To: amd-gfx@lists.freedesktop.org
> >>>>>>> Cc: Zhang, Hawking <Hawking.Zhang@amd.com>; Deucher,
> >> Alexander
> >>>>>>> <Alexander.Deucher@amd.com>; Wang, Yang(Kevin)
> >>>>>>> <KevinYang.Wang@amd.com>; Quan, Evan <Evan.Quan@amd.com>
> >>>>>>> Subject: [PATCH] drm/amd/pm: Send message when resp status is
> >> 0xFC
> >>>>>>>
> >>>>>>> When PMFW is really busy, it will respond with 0xFC. However, it
> >>>>>>> doesn't change the response register state when it becomes free.
> >>>>>>> Driver should retry and proceed to send message if the response
> >>>>>>> status is
> >>>>>> 0xFC.
> >>>>>>>
> >>>>>>> Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
> >>>>>>> ---
> >>>>>>>     drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 2 --
> >>>>>>>     1 file changed, 2 deletions(-)
> >>>>>>>
> >>>>>>> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> >>>>>>> b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> >>>>>>> index 590a6ed12d54..92161b9d8c1a 100644
> >>>>>>> --- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> >>>>>>> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> >>>>>>> @@ -297,7 +297,6 @@ int
> >> smu_cmn_send_msg_without_waiting(struct
> >>>>>>> smu_context *smu,
> >>>>>>>     	reg = __smu_cmn_poll_stat(smu);
> >>>>>>>     	res = __smu_cmn_reg2errno(smu, reg);
> >>>>>>>     	if (reg == SMU_RESP_NONE ||
> >>>>>>> -	    reg == SMU_RESP_BUSY_OTHER ||
> >>>>>>>     	    res == -EREMOTEIO)
> >>>>>>>     		goto Out;
> >>>>>>>     	__smu_cmn_send_msg(smu, msg_index, param); @@ -
> >> 391,7 +390,6
> >>>>>> @@ int
> >>>>>>> smu_cmn_send_smc_msg_with_param(struct
> >>>>>>> smu_context *smu,
> >>>>>>>     	reg = __smu_cmn_poll_stat(smu);
> >>>>>>>     	res = __smu_cmn_reg2errno(smu, reg);
> >>>>>>>     	if (reg == SMU_RESP_NONE ||
> >>>>>>> -	    reg == SMU_RESP_BUSY_OTHER ||
> >>>>>>>     	    res == -EREMOTEIO) {
> >>>>>>>     		__smu_cmn_reg_print_error(smu, reg, index, param,
> >> msg);
> >>>>>>>     		goto Out;
> >>>>>>> --
> >>>>>>> 2.25.1

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2022-03-08  6:57 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-25  4:21 [PATCH] drm/amd/pm: Send message when resp status is 0xFC Lijo Lazar
2022-02-25  5:36 ` Quan, Evan
2022-02-25  5:46   ` Lazar, Lijo
2022-02-25  5:55     ` Quan, Evan
2022-02-25  6:02       ` Lazar, Lijo
2022-02-25  7:32         ` Quan, Evan
2022-02-25  7:43           ` Lazar, Lijo
2022-02-25 13:03             ` Quan, Evan
2022-02-25 13:59               ` Lazar, Lijo
2022-03-08  5:58                 ` Lazar, Lijo
2022-03-08  6:56                   ` Quan, Evan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.