All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/1] amdgpu/pm: Clarify documentation of error handling in send_smc_mesg
@ 2022-04-12  4:08 Darren Powell
  2022-04-12  4:19 ` Lazar, Lijo
  2022-04-12 21:09 ` Luben Tuikov
  0 siblings, 2 replies; 10+ messages in thread
From: Darren Powell @ 2022-04-12  4:08 UTC (permalink / raw)
  To: amd-gfx
  Cc: luben.tuikov, evan.quan, Darren Powell, andrey.grodzovsky, pmenzel

Contrary to the smu_cmn_send_smc_msg_with_param documentation, two
cases exist where messages are silently dropped with no error returned
to the caller. These cases occur in unusual situations where either:
 1. the message target is a virtual GPU, or
 2. a PCI recovery is underway and the HW is not yet in sync with the SW

For more details see
 commit 4ea5081c82c4 ("drm/amd/powerplay: enable SMC message filter")
 commit bf36b52e781d ("drm/amdgpu: Avoid accessing HW when suspending SW state")

(v2)
  Reworked with suggestions from Luben & Paul

Signed-off-by: Darren Powell <darren.powell@amd.com>
---
 drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
index b8d0c70ff668..8008ae5508e6 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
@@ -356,9 +356,11 @@ int smu_cmn_wait_for_response(struct smu_context *smu)
  * completion of the command, and return back a value from the SMU in
  * @read_arg pointer.
  *
- * Return 0 on success, -errno on error, if we weren't able to send
- * the message or if the message completed with some kind of
- * error. See __smu_cmn_reg2errno() for details of the -errno.
+ * Return 0 on success, -errno when a problem is encountered sending
+ * message or receiving reply. If there is a PCI bus recovery or
+ * the destination is a virtual GPU, the message is simply dropped and
+ * success is also returned.
+ * See __smu_cmn_reg2errno() for details of the -errno.
  *
  * If we weren't able to send the message to the SMU, we also print
  * the error to the standard log.

base-commit: 4585c45a6a66cb17cc97f4370457503746e540b7
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/1] amdgpu/pm: Clarify documentation of error handling in send_smc_mesg
  2022-04-12  4:08 [PATCH 1/1] amdgpu/pm: Clarify documentation of error handling in send_smc_mesg Darren Powell
@ 2022-04-12  4:19 ` Lazar, Lijo
  2022-04-13  2:48   ` Powell, Darren
  2022-04-12 21:09 ` Luben Tuikov
  1 sibling, 1 reply; 10+ messages in thread
From: Lazar, Lijo @ 2022-04-12  4:19 UTC (permalink / raw)
  To: Darren Powell, amd-gfx
  Cc: andrey.grodzovsky, luben.tuikov, evan.quan, pmenzel



On 4/12/2022 9:38 AM, Darren Powell wrote:
> Contrary to the smu_cmn_send_smc_msg_with_param documentation, two
> cases exist where messages are silently dropped with no error returned
> to the caller. These cases occur in unusual situations where either:
>   1. the message target is a virtual GPU, or

This is not fully correct - only messages which are not valid for 
virtual GPU are dropped, not all.

Thanks,
Lijo

>   2. a PCI recovery is underway and the HW is not yet in sync with the SW
> 
> For more details see
>   commit 4ea5081c82c4 ("drm/amd/powerplay: enable SMC message filter")
>   commit bf36b52e781d ("drm/amdgpu: Avoid accessing HW when suspending SW state")
> 
> (v2)
>    Reworked with suggestions from Luben & Paul
> 
> Signed-off-by: Darren Powell <darren.powell@amd.com>
> ---
>   drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 8 +++++---
>   1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> index b8d0c70ff668..8008ae5508e6 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> @@ -356,9 +356,11 @@ int smu_cmn_wait_for_response(struct smu_context *smu)
>    * completion of the command, and return back a value from the SMU in
>    * @read_arg pointer.
>    *
> - * Return 0 on success, -errno on error, if we weren't able to send
> - * the message or if the message completed with some kind of
> - * error. See __smu_cmn_reg2errno() for details of the -errno.
> + * Return 0 on success, -errno when a problem is encountered sending
> + * message or receiving reply. If there is a PCI bus recovery or
> + * the destination is a virtual GPU, the message is simply dropped and
> + * success is also returned.
> + * See __smu_cmn_reg2errno() for details of the -errno.
>    *
>    * If we weren't able to send the message to the SMU, we also print
>    * the error to the standard log.
> 
> base-commit: 4585c45a6a66cb17cc97f4370457503746e540b7
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/1] amdgpu/pm: Clarify documentation of error handling in send_smc_mesg
  2022-04-12  4:08 [PATCH 1/1] amdgpu/pm: Clarify documentation of error handling in send_smc_mesg Darren Powell
  2022-04-12  4:19 ` Lazar, Lijo
@ 2022-04-12 21:09 ` Luben Tuikov
  2022-04-13  2:50   ` Powell, Darren
  1 sibling, 1 reply; 10+ messages in thread
From: Luben Tuikov @ 2022-04-12 21:09 UTC (permalink / raw)
  To: Darren Powell, amd-gfx; +Cc: pmenzel, evan.quan, andrey.grodzovsky

I suppose I didn't quite register this on a first read:

On 2022-04-12 00:08, Darren Powell wrote:
> Contrary to the smu_cmn_send_smc_msg_with_param documentation, two

I'd just say

  Clarify the documentation to also mention that we drop
  messages and return success in the following two cases:
     1. ...

That "Contrary to the ..." sounds a bit harsh.

Regards,
Luben

> cases exist where messages are silently dropped with no error returned
> to the caller. These cases occur in unusual situations where either:
>  1. the message target is a virtual GPU, or
>  2. a PCI recovery is underway and the HW is not yet in sync with the SW
> 
> For more details see
>  commit 4ea5081c82c4 ("drm/amd/powerplay: enable SMC message filter")
>  commit bf36b52e781d ("drm/amdgpu: Avoid accessing HW when suspending SW state")
> 
> (v2)
>   Reworked with suggestions from Luben & Paul
> 
> Signed-off-by: Darren Powell <darren.powell@amd.com>
> ---
>  drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> index b8d0c70ff668..8008ae5508e6 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> @@ -356,9 +356,11 @@ int smu_cmn_wait_for_response(struct smu_context *smu)
>   * completion of the command, and return back a value from the SMU in
>   * @read_arg pointer.
>   *
> - * Return 0 on success, -errno on error, if we weren't able to send
> - * the message or if the message completed with some kind of
> - * error. See __smu_cmn_reg2errno() for details of the -errno.
> + * Return 0 on success, -errno when a problem is encountered sending
> + * message or receiving reply. If there is a PCI bus recovery or
> + * the destination is a virtual GPU, the message is simply dropped and
> + * success is also returned.
> + * See __smu_cmn_reg2errno() for details of the -errno.
>   *
>   * If we weren't able to send the message to the SMU, we also print
>   * the error to the standard log.
> 
> base-commit: 4585c45a6a66cb17cc97f4370457503746e540b7

Regards,
-- 
Luben

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/1] amdgpu/pm: Clarify documentation of error handling in send_smc_mesg
  2022-04-12  4:19 ` Lazar, Lijo
@ 2022-04-13  2:48   ` Powell, Darren
  0 siblings, 0 replies; 10+ messages in thread
From: Powell, Darren @ 2022-04-13  2:48 UTC (permalink / raw)
  To: Lazar, Lijo, amd-gfx
  Cc: Grodzovsky, Andrey, Tuikov, Luben, Quan, Evan, pmenzel

[-- Attachment #1: Type: text/plain, Size: 2749 bytes --]

[AMD Official Use Only]

I needed to dig further down to find the message map, I had been looking back in mailing list looking for clarification but hadn't found anything.
Will reword
Thanks
Darren
________________________________
From: Lazar, Lijo <Lijo.Lazar@amd.com>
Sent: Tuesday, April 12, 2022 12:19 AM
To: Powell, Darren <Darren.Powell@amd.com>; amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>
Cc: Tuikov, Luben <Luben.Tuikov@amd.com>; Quan, Evan <Evan.Quan@amd.com>; Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>; pmenzel@molgen.mpg.de <pmenzel@molgen.mpg.de>
Subject: Re: [PATCH 1/1] amdgpu/pm: Clarify documentation of error handling in send_smc_mesg



On 4/12/2022 9:38 AM, Darren Powell wrote:
> Contrary to the smu_cmn_send_smc_msg_with_param documentation, two
> cases exist where messages are silently dropped with no error returned
> to the caller. These cases occur in unusual situations where either:
>   1. the message target is a virtual GPU, or

This is not fully correct - only messages which are not valid for
virtual GPU are dropped, not all.

Thanks,
Lijo

>   2. a PCI recovery is underway and the HW is not yet in sync with the SW
>
> For more details see
>   commit 4ea5081c82c4 ("drm/amd/powerplay: enable SMC message filter")
>   commit bf36b52e781d ("drm/amdgpu: Avoid accessing HW when suspending SW state")
>
> (v2)
>    Reworked with suggestions from Luben & Paul
>
> Signed-off-by: Darren Powell <darren.powell@amd.com>
> ---
>   drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 8 +++++---
>   1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> index b8d0c70ff668..8008ae5508e6 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> @@ -356,9 +356,11 @@ int smu_cmn_wait_for_response(struct smu_context *smu)
>    * completion of the command, and return back a value from the SMU in
>    * @read_arg pointer.
>    *
> - * Return 0 on success, -errno on error, if we weren't able to send
> - * the message or if the message completed with some kind of
> - * error. See __smu_cmn_reg2errno() for details of the -errno.
> + * Return 0 on success, -errno when a problem is encountered sending
> + * message or receiving reply. If there is a PCI bus recovery or
> + * the destination is a virtual GPU, the message is simply dropped and
> + * success is also returned.
> + * See __smu_cmn_reg2errno() for details of the -errno.
>    *
>    * If we weren't able to send the message to the SMU, we also print
>    * the error to the standard log.
>
> base-commit: 4585c45a6a66cb17cc97f4370457503746e540b7
>

[-- Attachment #2: Type: text/html, Size: 4465 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/1] amdgpu/pm: Clarify documentation of error handling in send_smc_mesg
  2022-04-12 21:09 ` Luben Tuikov
@ 2022-04-13  2:50   ` Powell, Darren
  0 siblings, 0 replies; 10+ messages in thread
From: Powell, Darren @ 2022-04-13  2:50 UTC (permalink / raw)
  To: Tuikov, Luben, amd-gfx; +Cc: pmenzel, Quan, Evan, Grodzovsky, Andrey

[-- Attachment #1: Type: text/plain, Size: 2798 bytes --]

[AMD Official Use Only]

Yes, it looks like I was a little snippy writing that intro, will lighten the grammar.
Thanks
Darren
________________________________
From: Tuikov, Luben <Luben.Tuikov@amd.com>
Sent: Tuesday, April 12, 2022 5:09 PM
To: Powell, Darren <Darren.Powell@amd.com>; amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>
Cc: Quan, Evan <Evan.Quan@amd.com>; pmenzel@molgen.mpg.de <pmenzel@molgen.mpg.de>; Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
Subject: Re: [PATCH 1/1] amdgpu/pm: Clarify documentation of error handling in send_smc_mesg

I suppose I didn't quite register this on a first read:

On 2022-04-12 00:08, Darren Powell wrote:
> Contrary to the smu_cmn_send_smc_msg_with_param documentation, two

I'd just say

  Clarify the documentation to also mention that we drop
  messages and return success in the following two cases:
     1. ...

That "Contrary to the ..." sounds a bit harsh.

Regards,
Luben

> cases exist where messages are silently dropped with no error returned
> to the caller. These cases occur in unusual situations where either:
>  1. the message target is a virtual GPU, or
>  2. a PCI recovery is underway and the HW is not yet in sync with the SW
>
> For more details see
>  commit 4ea5081c82c4 ("drm/amd/powerplay: enable SMC message filter")
>  commit bf36b52e781d ("drm/amdgpu: Avoid accessing HW when suspending SW state")
>
> (v2)
>   Reworked with suggestions from Luben & Paul
>
> Signed-off-by: Darren Powell <darren.powell@amd.com>
> ---
>  drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> index b8d0c70ff668..8008ae5508e6 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> @@ -356,9 +356,11 @@ int smu_cmn_wait_for_response(struct smu_context *smu)
>   * completion of the command, and return back a value from the SMU in
>   * @read_arg pointer.
>   *
> - * Return 0 on success, -errno on error, if we weren't able to send
> - * the message or if the message completed with some kind of
> - * error. See __smu_cmn_reg2errno() for details of the -errno.
> + * Return 0 on success, -errno when a problem is encountered sending
> + * message or receiving reply. If there is a PCI bus recovery or
> + * the destination is a virtual GPU, the message is simply dropped and
> + * success is also returned.
> + * See __smu_cmn_reg2errno() for details of the -errno.
>   *
>   * If we weren't able to send the message to the SMU, we also print
>   * the error to the standard log.
>
> base-commit: 4585c45a6a66cb17cc97f4370457503746e540b7

Regards,
--
Luben

[-- Attachment #2: Type: text/html, Size: 4400 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/1] amdgpu/pm: Clarify Documentation of error handling in send_smc_mesg
  2022-04-08  6:29 ` Paul Menzel
@ 2022-04-08 20:55   ` Powell, Darren
  0 siblings, 0 replies; 10+ messages in thread
From: Powell, Darren @ 2022-04-08 20:55 UTC (permalink / raw)
  To: Paul Menzel; +Cc: Grodzovsky, Andrey, Tuikov, Luben, Quan, Evan, amd-gfx

[-- Attachment #1: Type: text/plain, Size: 2861 bytes --]

[AMD Official Use Only]

will respin and incorporate your suggestions
Thanks,
Darren
________________________________
From: Paul Menzel <pmenzel@molgen.mpg.de>
Sent: Friday, April 8, 2022 2:29 AM
To: Powell, Darren <Darren.Powell@amd.com>
Cc: amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>; Tuikov, Luben <Luben.Tuikov@amd.com>; Quan, Evan <Evan.Quan@amd.com>; wenhui.sheng@amd.com <wenhui.sheng@amd.com>; Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
Subject: Re: [PATCH 1/1] amdgpu/pm: Clarify Documentation of error handling in send_smc_mesg

Dear Darren,


Thank you for your patch.

Am 08.04.22 um 04:26 schrieb Darren Powell:
>   Contrary to the smu_cmn_send_smc_msg_with_param documentation, two
>   cases exist where messages are silently dropped with no error returned
>   to the caller. These cases occur in unusual situations where either:
>    1. the caller is a virtual GPU, or
>    2. a PCI recovery is underway and the HW is not yet in sync with the SW
>
>   For more details see
>    commit 4ea5081c82c4 ("drm/amd/powerplay: enable SMC message filter")
>    commit bf36b52e781d ("drm/amdgpu: Avoid accessing HW when suspending SW state")

Please remove the indentation. If you re-rolled the patch, you could
also spell *documentation* lowercase in the commit message summary.

> Signed-off-by: Darren Powell <darren.powell@amd.com>
> ---
>   drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 7 +++++--
>   1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> index b8d0c70ff668..b1bd1990c88b 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> @@ -356,12 +356,15 @@ int smu_cmn_wait_for_response(struct smu_context *smu)
>    * completion of the command, and return back a value from the SMU in
>    * @read_arg pointer.
>    *
> - * Return 0 on success, -errno on error, if we weren't able to send
> + * Return 0 on success, or if the message is dropped.
> + * On error, -errno is returned if we weren't able to send
>    * the message or if the message completed with some kind of
>    * error. See __smu_cmn_reg2errno() for details of the -errno.
>    *
>    * If we weren't able to send the message to the SMU, we also print
> - * the error to the standard log.
> + * the error to the standard log. Dropped messages can be caused
> + * due to PCI slot recovery or attempting to send from a virtual GPU,
> + * and do not print an error.
>    *
>    * Command completion status is printed only if the -errno is
>    * -EREMOTEIO, indicating that the SMU returned back an
>
> base-commit: 4585c45a6a66cb17cc97f4370457503746e540b7

The diff looks good – despite Mozilla Thunderbird quoting it strangely.


Kind regards,

Paul

[-- Attachment #2: Type: text/html, Size: 4484 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/1] amdgpu/pm: Clarify Documentation of error handling in send_smc_mesg
  2022-04-08 13:33 ` Luben Tuikov
@ 2022-04-08 20:54   ` Powell, Darren
  0 siblings, 0 replies; 10+ messages in thread
From: Powell, Darren @ 2022-04-08 20:54 UTC (permalink / raw)
  To: Tuikov, Luben, amd-gfx; +Cc: Grodzovsky, Andrey, Quan, Evan

[-- Attachment #1: Type: text/plain, Size: 3068 bytes --]

[AMD Official Use Only]

inline

________________________________
From: Tuikov, Luben <Luben.Tuikov@amd.com>
Sent: Friday, April 8, 2022 9:33 AM
To: Powell, Darren <Darren.Powell@amd.com>; amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>
Cc: Quan, Evan <Evan.Quan@amd.com>; Wenhui.Sheng@amd.com <Wenhui.Sheng@amd.com>; Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
Subject: Re: [PATCH 1/1] amdgpu/pm: Clarify Documentation of error handling in send_smc_mesg

I'd add who and how is the message dropped, and also mention that we're unable
to recognize a dropped message.

On 2022-04-07 22:26, Darren Powell wrote:
>  Contrary to the smu_cmn_send_smc_msg_with_param documentation, two
>  cases exist where messages are silently dropped with no error returned
>  to the caller. These cases occur in unusual situations where either:
>   1. the caller is a virtual GPU, or

The caller? Isn't this code executed on a CPU sending to the SMU (which lives on a GPU)?
[DP] Great point, will fix

>   2. a PCI recovery is underway and the HW is not yet in sync with the SW
>
>  For more details see
>   commit 4ea5081c82c4 ("drm/amd/powerplay: enable SMC message filter")
>   commit bf36b52e781d ("drm/amdgpu: Avoid accessing HW when suspending SW state")
>
> Signed-off-by: Darren Powell <darren.powell@amd.com>
> ---
>  drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> index b8d0c70ff668..b1bd1990c88b 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> @@ -356,12 +356,15 @@ int smu_cmn_wait_for_response(struct smu_context *smu)
>   * completion of the command, and return back a value from the SMU in
>   * @read_arg pointer.
>   *
> - * Return 0 on success, -errno on error, if we weren't able to send
> + * Return 0 on success, or if the message is dropped.
> + * On error, -errno is returned if we weren't able to send

Something like this:

  Return 0 on success, -errno on error. If the message was dropped
  due to PCI bus recovery or sending to a virtual GPU, we're unable
  to detect this and success is also returned.

>   * the message or if the message completed with some kind of
>   * error. See __smu_cmn_reg2errno() for details of the -errno.
>   *
>   * If we weren't able to send the message to the SMU, we also print
> - * the error to the standard log.
> + * the error to the standard log. Dropped messages can be caused
> + * due to PCI slot recovery or attempting to send from a virtual GPU,
> + * and do not print an error.

This is a moot point with the clarification I suggested above and I'd remove that.
[DP] sounds more succinct, will address in v2

>   *
>   * Command completion status is printed only if the -errno is
>   * -EREMOTEIO, indicating that the SMU returned back an
>
> base-commit: 4585c45a6a66cb17cc97f4370457503746e540b7

Regards,
--
Luben

[-- Attachment #2: Type: text/html, Size: 4664 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/1] amdgpu/pm: Clarify Documentation of error handling in send_smc_mesg
  2022-04-08  2:26 [PATCH 1/1] amdgpu/pm: Clarify Documentation " Darren Powell
  2022-04-08  6:29 ` Paul Menzel
@ 2022-04-08 13:33 ` Luben Tuikov
  2022-04-08 20:54   ` Powell, Darren
  1 sibling, 1 reply; 10+ messages in thread
From: Luben Tuikov @ 2022-04-08 13:33 UTC (permalink / raw)
  To: Darren Powell, amd-gfx; +Cc: andrey.grodzovsky, evan.quan, Wenhui.Sheng

I'd add who and how is the message dropped, and also mention that we're unable
to recognize a dropped message.

On 2022-04-07 22:26, Darren Powell wrote:
>  Contrary to the smu_cmn_send_smc_msg_with_param documentation, two
>  cases exist where messages are silently dropped with no error returned
>  to the caller. These cases occur in unusual situations where either:
>   1. the caller is a virtual GPU, or

The caller? Isn't this code executed on a CPU sending to the SMU (which lives on a GPU)?

>   2. a PCI recovery is underway and the HW is not yet in sync with the SW
> 
>  For more details see
>   commit 4ea5081c82c4 ("drm/amd/powerplay: enable SMC message filter")
>   commit bf36b52e781d ("drm/amdgpu: Avoid accessing HW when suspending SW state")
> 
> Signed-off-by: Darren Powell <darren.powell@amd.com>
> ---
>  drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> index b8d0c70ff668..b1bd1990c88b 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> @@ -356,12 +356,15 @@ int smu_cmn_wait_for_response(struct smu_context *smu)
>   * completion of the command, and return back a value from the SMU in
>   * @read_arg pointer.
>   *
> - * Return 0 on success, -errno on error, if we weren't able to send
> + * Return 0 on success, or if the message is dropped.
> + * On error, -errno is returned if we weren't able to send

Something like this:

  Return 0 on success, -errno on error. If the message was dropped
  due to PCI bus recovery or sending to a virtual GPU, we're unable
  to detect this and success is also returned.

>   * the message or if the message completed with some kind of
>   * error. See __smu_cmn_reg2errno() for details of the -errno.
>   *
>   * If we weren't able to send the message to the SMU, we also print
> - * the error to the standard log.
> + * the error to the standard log. Dropped messages can be caused
> + * due to PCI slot recovery or attempting to send from a virtual GPU,
> + * and do not print an error.

This is a moot point with the clarification I suggested above and I'd remove that.

>   *
>   * Command completion status is printed only if the -errno is
>   * -EREMOTEIO, indicating that the SMU returned back an
> 
> base-commit: 4585c45a6a66cb17cc97f4370457503746e540b7

Regards,
-- 
Luben

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/1] amdgpu/pm: Clarify Documentation of error handling in send_smc_mesg
  2022-04-08  2:26 [PATCH 1/1] amdgpu/pm: Clarify Documentation " Darren Powell
@ 2022-04-08  6:29 ` Paul Menzel
  2022-04-08 20:55   ` Powell, Darren
  2022-04-08 13:33 ` Luben Tuikov
  1 sibling, 1 reply; 10+ messages in thread
From: Paul Menzel @ 2022-04-08  6:29 UTC (permalink / raw)
  To: Darren Powell
  Cc: andrey.grodzovsky, luben.tuikov, evan.quan, amd-gfx, wenhui.sheng

Dear Darren,


Thank you for your patch.

Am 08.04.22 um 04:26 schrieb Darren Powell:
>   Contrary to the smu_cmn_send_smc_msg_with_param documentation, two
>   cases exist where messages are silently dropped with no error returned
>   to the caller. These cases occur in unusual situations where either:
>    1. the caller is a virtual GPU, or
>    2. a PCI recovery is underway and the HW is not yet in sync with the SW
> 
>   For more details see
>    commit 4ea5081c82c4 ("drm/amd/powerplay: enable SMC message filter")
>    commit bf36b52e781d ("drm/amdgpu: Avoid accessing HW when suspending SW state")

Please remove the indentation. If you re-rolled the patch, you could 
also spell *documentation* lowercase in the commit message summary.

> Signed-off-by: Darren Powell <darren.powell@amd.com>
> ---
>   drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 7 +++++--
>   1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> index b8d0c70ff668..b1bd1990c88b 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
> @@ -356,12 +356,15 @@ int smu_cmn_wait_for_response(struct smu_context *smu)
>    * completion of the command, and return back a value from the SMU in
>    * @read_arg pointer.
>    *
> - * Return 0 on success, -errno on error, if we weren't able to send
> + * Return 0 on success, or if the message is dropped.
> + * On error, -errno is returned if we weren't able to send
>    * the message or if the message completed with some kind of
>    * error. See __smu_cmn_reg2errno() for details of the -errno.
>    *
>    * If we weren't able to send the message to the SMU, we also print
> - * the error to the standard log.
> + * the error to the standard log. Dropped messages can be caused
> + * due to PCI slot recovery or attempting to send from a virtual GPU,
> + * and do not print an error.
>    *
>    * Command completion status is printed only if the -errno is
>    * -EREMOTEIO, indicating that the SMU returned back an
> 
> base-commit: 4585c45a6a66cb17cc97f4370457503746e540b7

The diff looks good – despite Mozilla Thunderbird quoting it strangely.


Kind regards,

Paul

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/1] amdgpu/pm: Clarify Documentation of error handling in send_smc_mesg
@ 2022-04-08  2:26 Darren Powell
  2022-04-08  6:29 ` Paul Menzel
  2022-04-08 13:33 ` Luben Tuikov
  0 siblings, 2 replies; 10+ messages in thread
From: Darren Powell @ 2022-04-08  2:26 UTC (permalink / raw)
  To: amd-gfx
  Cc: luben.tuikov, evan.quan, Darren Powell, Wenhui.Sheng, andrey.grodzovsky

 Contrary to the smu_cmn_send_smc_msg_with_param documentation, two
 cases exist where messages are silently dropped with no error returned
 to the caller. These cases occur in unusual situations where either:
  1. the caller is a virtual GPU, or
  2. a PCI recovery is underway and the HW is not yet in sync with the SW

 For more details see
  commit 4ea5081c82c4 ("drm/amd/powerplay: enable SMC message filter")
  commit bf36b52e781d ("drm/amdgpu: Avoid accessing HW when suspending SW state")

Signed-off-by: Darren Powell <darren.powell@amd.com>
---
 drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
index b8d0c70ff668..b1bd1990c88b 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
@@ -356,12 +356,15 @@ int smu_cmn_wait_for_response(struct smu_context *smu)
  * completion of the command, and return back a value from the SMU in
  * @read_arg pointer.
  *
- * Return 0 on success, -errno on error, if we weren't able to send
+ * Return 0 on success, or if the message is dropped.
+ * On error, -errno is returned if we weren't able to send
  * the message or if the message completed with some kind of
  * error. See __smu_cmn_reg2errno() for details of the -errno.
  *
  * If we weren't able to send the message to the SMU, we also print
- * the error to the standard log.
+ * the error to the standard log. Dropped messages can be caused
+ * due to PCI slot recovery or attempting to send from a virtual GPU,
+ * and do not print an error.
  *
  * Command completion status is printed only if the -errno is
  * -EREMOTEIO, indicating that the SMU returned back an

base-commit: 4585c45a6a66cb17cc97f4370457503746e540b7
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2022-04-13  2:50 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-12  4:08 [PATCH 1/1] amdgpu/pm: Clarify documentation of error handling in send_smc_mesg Darren Powell
2022-04-12  4:19 ` Lazar, Lijo
2022-04-13  2:48   ` Powell, Darren
2022-04-12 21:09 ` Luben Tuikov
2022-04-13  2:50   ` Powell, Darren
  -- strict thread matches above, loose matches on Subject: below --
2022-04-08  2:26 [PATCH 1/1] amdgpu/pm: Clarify Documentation " Darren Powell
2022-04-08  6:29 ` Paul Menzel
2022-04-08 20:55   ` Powell, Darren
2022-04-08 13:33 ` Luben Tuikov
2022-04-08 20:54   ` Powell, Darren

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.