All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled
@ 2021-12-10 11:41 chen gong
  2021-12-10 12:25 ` Lazar, Lijo
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: chen gong @ 2021-12-10 11:41 UTC (permalink / raw)
  To: amd-gfx; +Cc: Alexander.Deucher, James.Zhu, leo.liu, evan.quan, chen gong

Play a video on the raven (or PCO, raven2) platform, and then do the S3
test. When resume, the following error will be reported:

amdgpu 0000:02:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring
vcn_dec test failed (-110)
[drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block
<vcn_v1_0> failed -110
amdgpu 0000:02:00.0: amdgpu: amdgpu_device_ip_resume failed (-110).
PM: dpm_run_callback(): pci_pm_resume+0x0/0x90 returns -110

[why]
When playing the video: The power state flag of the vcn block is set to
POWER_STATE_ON.

When doing suspend: There is no change to the power state flag of the
vcn block, it is still POWER_STATE_ON.

When doing resume: Need to open the power gate of the vcn block and set
the power state flag of the VCN block to POWER_STATE_ON.
But at this time, the power state flag of the vcn block is already
POWER_STATE_ON. The power status flag check in the "8f2cdef drm/amd/pm:
avoid duplicate powergate/ungate setting" patch will return the
amdgpu_dpm_set_powergating_by_smu function directly.
As a result, the gate of the power was not opened, causing the
subsequent ring test to fail.

[how]
In the suspend function of the vcn block, explicitly change the power
state flag of the vcn block to POWER_STATE_OFF.

Signed-off-by: chen gong <curry.gong@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
index d54d720..d73676b 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
@@ -246,6 +246,13 @@ static int vcn_v1_0_suspend(void *handle)
 {
 	int r;
 	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+	bool cancel_success;
+
+	cancel_success = cancel_delayed_work_sync(&adev->vcn.idle_work);
+	if (cancel_success) {
+		if (adev->pm.dpm_enabled)
+			amdgpu_dpm_enable_uvd(adev, false);
+	}
 
 	r = vcn_v1_0_hw_fini(adev);
 	if (r)
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled
  2021-12-10 11:41 [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled chen gong
@ 2021-12-10 12:25 ` Lazar, Lijo
  2021-12-10 16:19   ` Quan, Evan
  2021-12-10 16:06 ` Quan, Evan
  2021-12-10 21:07 ` James Zhu
  2 siblings, 1 reply; 14+ messages in thread
From: Lazar, Lijo @ 2021-12-10 12:25 UTC (permalink / raw)
  To: chen gong, amd-gfx; +Cc: Alexander.Deucher, James.Zhu, leo.liu, evan.quan



On 12/10/2021 5:11 PM, chen gong wrote:
> Play a video on the raven (or PCO, raven2) platform, and then do the S3
> test. When resume, the following error will be reported:
> 
> amdgpu 0000:02:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring
> vcn_dec test failed (-110)
> [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block
> <vcn_v1_0> failed -110
> amdgpu 0000:02:00.0: amdgpu: amdgpu_device_ip_resume failed (-110).
> PM: dpm_run_callback(): pci_pm_resume+0x0/0x90 returns -110
> 
> [why]
> When playing the video: The power state flag of the vcn block is set to
> POWER_STATE_ON.
> 
> When doing suspend: There is no change to the power state flag of the
> vcn block, it is still POWER_STATE_ON.
> 
> When doing resume: Need to open the power gate of the vcn block and set
> the power state flag of the VCN block to POWER_STATE_ON.
> But at this time, the power state flag of the vcn block is already
> POWER_STATE_ON. The power status flag check in the "8f2cdef drm/amd/pm:
> avoid duplicate powergate/ungate setting" patch will return the
> amdgpu_dpm_set_powergating_by_smu function directly.
> As a result, the gate of the power was not opened, causing the
> subsequent ring test to fail.
> 
> [how]
> In the suspend function of the vcn block, explicitly change the power
> state flag of the vcn block to POWER_STATE_OFF.
> 
> Signed-off-by: chen gong <curry.gong@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c | 7 +++++++
>   1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
> index d54d720..d73676b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
> @@ -246,6 +246,13 @@ static int vcn_v1_0_suspend(void *handle)
>   {
>   	int r;
>   	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> +	bool cancel_success;
> +
> +	cancel_success = cancel_delayed_work_sync(&adev->vcn.idle_work);
> +	if (cancel_success) {
> +		if (adev->pm.dpm_enabled)
> +			amdgpu_dpm_enable_uvd(adev, false);
> +	}
>   

Probably this is a common issue. Can you try moving this to 
amdgpu_vcn_suspend?

if (adev->pm.dpm_enabled)
    amdgpu_device_ip_set_powergating_state(adev, 
AMD_IP_BLOCK_TYPE_VCN,AMD_PG_STATE_GATE);

Call this after cancel_delayed_work_sync. Shouldn't have any effect if 
idle work already put it in PG state. Evan, what do you think?

Thanks,
Lijo

>   	r = vcn_v1_0_hw_fini(adev);
>   	if (r)
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled
  2021-12-10 11:41 [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled chen gong
  2021-12-10 12:25 ` Lazar, Lijo
@ 2021-12-10 16:06 ` Quan, Evan
  2021-12-10 19:50   ` Alex Deucher
  2021-12-10 21:07 ` James Zhu
  2 siblings, 1 reply; 14+ messages in thread
From: Quan, Evan @ 2021-12-10 16:06 UTC (permalink / raw)
  To: Gong, Curry, amd-gfx, Deucher, Alexander
  Cc: Zhu, James, Liu, Leo, Gong, Curry

[AMD Official Use Only]

Hi Curry,

Some nitpicks below. With them fixed, the patch is reviewed-by: Evan Quan <evan.quan@amd.com>

@Deucher, Alexander this should be able address the issue reported by https://gitlab.freedesktop.org/drm/amd/-/issues/1828. Can you help to confirm this?

BR
Evan
> -----Original Message-----
> From: chen gong <curry.gong@amd.com>
> Sent: Friday, December 10, 2021 7:42 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Liu, Leo <Leo.Liu@amd.com>; Zhu, James <James.Zhu@amd.com>; Quan,
> Evan <Evan.Quan@amd.com>; Deucher, Alexander
> <Alexander.Deucher@amd.com>; Gong, Curry <Curry.Gong@amd.com>
> Subject: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended,
> powergating is explicitly enabled
> 
> Play a video on the raven (or PCO, raven2) platform, and then do the S3
> test. When resume, the following error will be reported:
> 
> amdgpu 0000:02:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR*
> ring
> vcn_dec test failed (-110)
> [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP
> block
> <vcn_v1_0> failed -110
> amdgpu 0000:02:00.0: amdgpu: amdgpu_device_ip_resume failed (-110).
> PM: dpm_run_callback(): pci_pm_resume+0x0/0x90 returns -110
> 
> [why]
> When playing the video: The power state flag of the vcn block is set to
> POWER_STATE_ON.
> 
> When doing suspend: There is no change to the power state flag of the
> vcn block, it is still POWER_STATE_ON.
> 
> When doing resume: Need to open the power gate of the vcn block and set
> the power state flag of the VCN block to POWER_STATE_ON.
> But at this time, the power state flag of the vcn block is already
> POWER_STATE_ON. The power status flag check in the "8f2cdef
> drm/amd/pm:
> avoid duplicate powergate/ungate setting" patch will return the
> amdgpu_dpm_set_powergating_by_smu function directly.
> As a result, the gate of the power was not opened, causing the
> subsequent ring test to fail.
Better to update this as some descriptions  below:
adev-> vcn.idle_work will be triggered when the VCN ring idle for 1S. And we rely on it to do the VCN gate(power down).
However, if the VCN ring is still using on suspend kicked, there will be no chance for adev->vcn.idle_work to do the VCN gate.
That will make driver wrongly treat VCN as ungated(power on) and prevent further VCN ungate(power on) operation(which is actually needed) on resume.
> 
> [how]
> In the suspend function of the vcn block, explicitly change the power
> state flag of the vcn block to POWER_STATE_OFF.
Maybe some descriptions as below is better:
Manually do the VCN gate(power down) in the suspend routine if adev->vcn.idle_work does not(have no chance) do that. 
> 
> Signed-off-by: chen gong <curry.gong@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
> b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
> index d54d720..d73676b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
> @@ -246,6 +246,13 @@ static int vcn_v1_0_suspend(void *handle)
>  {
>  	int r;
>  	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> +	bool cancel_success;
This seems not a good naming since the cancel always succeed. The difference is whether the idle_work get actually executed.
 Better to rename it as "idle_work_unexecuted" or just "vcn_not_gated"?
> +
> +	cancel_success = cancel_delayed_work_sync(&adev-
> >vcn.idle_work);
> +	if (cancel_success) {
> +		if (adev->pm.dpm_enabled)
> +			amdgpu_dpm_enable_uvd(adev, false);
> +	}
> 
>  	r = vcn_v1_0_hw_fini(adev);
>  	if (r)
> --
> 2.7.4

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled
  2021-12-10 12:25 ` Lazar, Lijo
@ 2021-12-10 16:19   ` Quan, Evan
  2021-12-10 21:17     ` James Zhu
  0 siblings, 1 reply; 14+ messages in thread
From: Quan, Evan @ 2021-12-10 16:19 UTC (permalink / raw)
  To: Lazar, Lijo, Gong, Curry, amd-gfx
  Cc: Deucher, Alexander, Zhu, James, Liu, Leo

[AMD Official Use Only]



> -----Original Message-----
> From: Lazar, Lijo <Lijo.Lazar@amd.com>
> Sent: Friday, December 10, 2021 8:25 PM
> To: Gong, Curry <Curry.Gong@amd.com>; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Zhu, James
> <James.Zhu@amd.com>; Liu, Leo <Leo.Liu@amd.com>; Quan, Evan
> <Evan.Quan@amd.com>
> Subject: Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended,
> powergating is explicitly enabled
> 
> 
> 
> On 12/10/2021 5:11 PM, chen gong wrote:
> > Play a video on the raven (or PCO, raven2) platform, and then do the
> > S3 test. When resume, the following error will be reported:
> >
> > amdgpu 0000:02:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR*
> > ring vcn_dec test failed (-110)
> > [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of
> IP
> > block <vcn_v1_0> failed -110 amdgpu 0000:02:00.0: amdgpu:
> > amdgpu_device_ip_resume failed (-110).
> > PM: dpm_run_callback(): pci_pm_resume+0x0/0x90 returns -110
> >
> > [why]
> > When playing the video: The power state flag of the vcn block is set
> > to POWER_STATE_ON.
> >
> > When doing suspend: There is no change to the power state flag of the
> > vcn block, it is still POWER_STATE_ON.
> >
> > When doing resume: Need to open the power gate of the vcn block and
> > set the power state flag of the VCN block to POWER_STATE_ON.
> > But at this time, the power state flag of the vcn block is already
> > POWER_STATE_ON. The power status flag check in the "8f2cdef
> drm/amd/pm:
> > avoid duplicate powergate/ungate setting" patch will return the
> > amdgpu_dpm_set_powergating_by_smu function directly.
> > As a result, the gate of the power was not opened, causing the
> > subsequent ring test to fail.
> >
> > [how]
> > In the suspend function of the vcn block, explicitly change the power
> > state flag of the vcn block to POWER_STATE_OFF.
> >
> > Signed-off-by: chen gong <curry.gong@amd.com>
> > ---
> >   drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c | 7 +++++++
> >   1 file changed, 7 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
> > b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
> > index d54d720..d73676b 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
> > @@ -246,6 +246,13 @@ static int vcn_v1_0_suspend(void *handle)
> >   {
> >   	int r;
> >   	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> > +	bool cancel_success;
> > +
> > +	cancel_success = cancel_delayed_work_sync(&adev-
> >vcn.idle_work);
> > +	if (cancel_success) {
> > +		if (adev->pm.dpm_enabled)
> > +			amdgpu_dpm_enable_uvd(adev, false);
> > +	}
> >
> 
> Probably this is a common issue. Can you try moving this to
> amdgpu_vcn_suspend?
[Quan, Evan] Yes, amdgpu_vcn_suspend seems a more proper place. However, the vcn code is not consistent.
For vcn v2 and later, they already do the manual gate operation in their suspend routine(like vcn_v2_0_stop).
So, it is actually only vcn_v1_0.c suffers this issue.
> 
> if (adev->pm.dpm_enabled)
>     amdgpu_device_ip_set_powergating_state(adev,
> AMD_IP_BLOCK_TYPE_VCN,AMD_PG_STATE_GATE);
> 
> Call this after cancel_delayed_work_sync. Shouldn't have any effect if idle
> work already put it in PG state. Evan, what do you think?
[Quan, Evan] Should not for now. But I'm considering to raise the dev_dbg prompt in amdgpu_dpm_set_powergating_by_smu for double gate/ungate to dev_info or dev_warn.
Hopefully that can help to capture some potential issues like this. So, better to keep the check for the return value of cancel_delayed_work_sync here.

BR
Evan
> 
> Thanks,
> Lijo
> 
> >   	r = vcn_v1_0_hw_fini(adev);
> >   	if (r)
> >

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled
  2021-12-10 16:06 ` Quan, Evan
@ 2021-12-10 19:50   ` Alex Deucher
  0 siblings, 0 replies; 14+ messages in thread
From: Alex Deucher @ 2021-12-10 19:50 UTC (permalink / raw)
  To: Quan, Evan; +Cc: Deucher, Alexander, Zhu, James, Liu, Leo, amd-gfx, Gong, Curry

On Fri, Dec 10, 2021 at 11:06 AM Quan, Evan <Evan.Quan@amd.com> wrote:
>
> [AMD Official Use Only]
>
> Hi Curry,
>
> Some nitpicks below. With them fixed, the patch is reviewed-by: Evan Quan <evan.quan@amd.com>
>
> @Deucher, Alexander this should be able address the issue reported by https://gitlab.freedesktop.org/drm/amd/-/issues/1828. Can you help to confirm this?

Yes, the patch works according to the reporter.

Alex

>
> BR
> Evan
> > -----Original Message-----
> > From: chen gong <curry.gong@amd.com>
> > Sent: Friday, December 10, 2021 7:42 PM
> > To: amd-gfx@lists.freedesktop.org
> > Cc: Liu, Leo <Leo.Liu@amd.com>; Zhu, James <James.Zhu@amd.com>; Quan,
> > Evan <Evan.Quan@amd.com>; Deucher, Alexander
> > <Alexander.Deucher@amd.com>; Gong, Curry <Curry.Gong@amd.com>
> > Subject: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended,
> > powergating is explicitly enabled
> >
> > Play a video on the raven (or PCO, raven2) platform, and then do the S3
> > test. When resume, the following error will be reported:
> >
> > amdgpu 0000:02:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR*
> > ring
> > vcn_dec test failed (-110)
> > [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP
> > block
> > <vcn_v1_0> failed -110
> > amdgpu 0000:02:00.0: amdgpu: amdgpu_device_ip_resume failed (-110).
> > PM: dpm_run_callback(): pci_pm_resume+0x0/0x90 returns -110
> >
> > [why]
> > When playing the video: The power state flag of the vcn block is set to
> > POWER_STATE_ON.
> >
> > When doing suspend: There is no change to the power state flag of the
> > vcn block, it is still POWER_STATE_ON.
> >
> > When doing resume: Need to open the power gate of the vcn block and set
> > the power state flag of the VCN block to POWER_STATE_ON.
> > But at this time, the power state flag of the vcn block is already
> > POWER_STATE_ON. The power status flag check in the "8f2cdef
> > drm/amd/pm:
> > avoid duplicate powergate/ungate setting" patch will return the
> > amdgpu_dpm_set_powergating_by_smu function directly.
> > As a result, the gate of the power was not opened, causing the
> > subsequent ring test to fail.
> Better to update this as some descriptions  below:
> adev-> vcn.idle_work will be triggered when the VCN ring idle for 1S. And we rely on it to do the VCN gate(power down).
> However, if the VCN ring is still using on suspend kicked, there will be no chance for adev->vcn.idle_work to do the VCN gate.
> That will make driver wrongly treat VCN as ungated(power on) and prevent further VCN ungate(power on) operation(which is actually needed) on resume.
> >
> > [how]
> > In the suspend function of the vcn block, explicitly change the power
> > state flag of the vcn block to POWER_STATE_OFF.
> Maybe some descriptions as below is better:
> Manually do the VCN gate(power down) in the suspend routine if adev->vcn.idle_work does not(have no chance) do that.
> >
> > Signed-off-by: chen gong <curry.gong@amd.com>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
> > b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
> > index d54d720..d73676b 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
> > @@ -246,6 +246,13 @@ static int vcn_v1_0_suspend(void *handle)
> >  {
> >       int r;
> >       struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> > +     bool cancel_success;
> This seems not a good naming since the cancel always succeed. The difference is whether the idle_work get actually executed.
>  Better to rename it as "idle_work_unexecuted" or just "vcn_not_gated"?
> > +
> > +     cancel_success = cancel_delayed_work_sync(&adev-
> > >vcn.idle_work);
> > +     if (cancel_success) {
> > +             if (adev->pm.dpm_enabled)
> > +                     amdgpu_dpm_enable_uvd(adev, false);
> > +     }
> >
> >       r = vcn_v1_0_hw_fini(adev);
> >       if (r)
> > --
> > 2.7.4

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled
  2021-12-10 11:41 [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled chen gong
  2021-12-10 12:25 ` Lazar, Lijo
  2021-12-10 16:06 ` Quan, Evan
@ 2021-12-10 21:07 ` James Zhu
  2021-12-13  8:55   ` Gong, Curry
  2 siblings, 1 reply; 14+ messages in thread
From: James Zhu @ 2021-12-10 21:07 UTC (permalink / raw)
  To: chen gong, amd-gfx; +Cc: Alexander.Deucher, James.Zhu, leo.liu, evan.quan

[-- Attachment #1: Type: text/plain, Size: 2263 bytes --]

On 2021-12-10 6:41 a.m., chen gong wrote:
> Play a video on the raven (or PCO, raven2) platform, and then do the S3
> test. When resume, the following error will be reported:
>
> amdgpu 0000:02:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring
> vcn_dec test failed (-110)
> [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block
> <vcn_v1_0> failed -110
> amdgpu 0000:02:00.0: amdgpu: amdgpu_device_ip_resume failed (-110).
> PM: dpm_run_callback(): pci_pm_resume+0x0/0x90 returns -110
>
> [why]
> When playing the video: The power state flag of the vcn block is set to
> POWER_STATE_ON.
>
> When doing suspend: There is no change to the power state flag of the
> vcn block, it is still POWER_STATE_ON.
>
> When doing resume: Need to open the power gate of the vcn block and set
> the power state flag of the VCN block to POWER_STATE_ON.
> But at this time, the power state flag of the vcn block is already
> POWER_STATE_ON. The power status flag check in the "8f2cdef drm/amd/pm:
> avoid duplicate powergate/ungate setting" patch will return the
> amdgpu_dpm_set_powergating_by_smu function directly.
> As a result, the gate of the power was not opened, causing the
> subsequent ring test to fail.
>
> [how]
> In the suspend function of the vcn block, explicitly change the power
> state flag of the vcn block to POWER_STATE_OFF.
>
> Signed-off-by: chen gong<curry.gong@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c | 7 +++++++
>   1 file changed, 7 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
> index d54d720..d73676b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
> @@ -246,6 +246,13 @@ static int vcn_v1_0_suspend(void *handle)
>   {
>   	int r;
>   	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> +	bool cancel_success;
> +
> +	cancel_success = cancel_delayed_work_sync(&adev->vcn.idle_work);

[JZ] Can you refer to vcn_v3_0_stop , and add 
amdgpu_dpm_enable_uvd(adev, false); to the end of vcn_v1_0_stop?

See if it also can help.

> +	if (cancel_success) {
> +		if (adev->pm.dpm_enabled)
> +			amdgpu_dpm_enable_uvd(adev, false);
> +	}
>   
>   	r = vcn_v1_0_hw_fini(adev);
>   	if (r)

[-- Attachment #2: Type: text/html, Size: 2905 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled
  2021-12-10 16:19   ` Quan, Evan
@ 2021-12-10 21:17     ` James Zhu
  0 siblings, 0 replies; 14+ messages in thread
From: James Zhu @ 2021-12-10 21:17 UTC (permalink / raw)
  To: Quan, Evan, Lazar, Lijo, Gong, Curry, amd-gfx
  Cc: Deucher, Alexander, Zhu, James, Liu, Leo

[-- Attachment #1: Type: text/plain, Size: 3851 bytes --]


On 2021-12-10 11:19 a.m., Quan, Evan wrote:
> [AMD Official Use Only]
>
>
>
>> -----Original Message-----
>> From: Lazar, Lijo<Lijo.Lazar@amd.com>
>> Sent: Friday, December 10, 2021 8:25 PM
>> To: Gong, Curry<Curry.Gong@amd.com>;amd-gfx@lists.freedesktop.org
>> Cc: Deucher, Alexander<Alexander.Deucher@amd.com>; Zhu, James
>> <James.Zhu@amd.com>; Liu, Leo<Leo.Liu@amd.com>; Quan, Evan
>> <Evan.Quan@amd.com>
>> Subject: Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended,
>> powergating is explicitly enabled
>>
>>
>>
>> On 12/10/2021 5:11 PM, chen gong wrote:
>>> Play a video on the raven (or PCO, raven2) platform, and then do the
>>> S3 test. When resume, the following error will be reported:
>>>
>>> amdgpu 0000:02:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR*
>>> ring vcn_dec test failed (-110)
>>> [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of
>> IP
>>> block <vcn_v1_0> failed -110 amdgpu 0000:02:00.0: amdgpu:
>>> amdgpu_device_ip_resume failed (-110).
>>> PM: dpm_run_callback(): pci_pm_resume+0x0/0x90 returns -110
>>>
>>> [why]
>>> When playing the video: The power state flag of the vcn block is set
>>> to POWER_STATE_ON.
>>>
>>> When doing suspend: There is no change to the power state flag of the
>>> vcn block, it is still POWER_STATE_ON.
>>>
>>> When doing resume: Need to open the power gate of the vcn block and
>>> set the power state flag of the VCN block to POWER_STATE_ON.
>>> But at this time, the power state flag of the vcn block is already
>>> POWER_STATE_ON. The power status flag check in the "8f2cdef
>> drm/amd/pm:
>>> avoid duplicate powergate/ungate setting" patch will return the
>>> amdgpu_dpm_set_powergating_by_smu function directly.
>>> As a result, the gate of the power was not opened, causing the
>>> subsequent ring test to fail.
>>>
>>> [how]
>>> In the suspend function of the vcn block, explicitly change the power
>>> state flag of the vcn block to POWER_STATE_OFF.
>>>
>>> Signed-off-by: chen gong<curry.gong@amd.com>
>>> ---
>>>    drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c | 7 +++++++
>>>    1 file changed, 7 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
>>> b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
>>> index d54d720..d73676b 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
>>> @@ -246,6 +246,13 @@ static int vcn_v1_0_suspend(void *handle)
>>>    {
>>>    	int r;
>>>    	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>>> +	bool cancel_success;
>>> +
>>> +	cancel_success = cancel_delayed_work_sync(&adev-
>>> vcn.idle_work);
>>> +	if (cancel_success) {
>>> +		if (adev->pm.dpm_enabled)
>>> +			amdgpu_dpm_enable_uvd(adev, false);
>>> +	}
>>>
>> Probably this is a common issue. Can you try moving this to
>> amdgpu_vcn_suspend?
> [Quan, Evan] Yes, amdgpu_vcn_suspend seems a more proper place. However, the vcn code is not consistent.
> For vcn v2 and later, they already do the manual gate operation in their suspend routine(like vcn_v2_0_stop).
> So, it is actually only vcn_v1_0.c suffers this issue.
[JZ] Then what is concern for vcn1 not doing the same thing as 
vcn_v2_0_stop?
>> if (adev->pm.dpm_enabled)
>>      amdgpu_device_ip_set_powergating_state(adev,
>> AMD_IP_BLOCK_TYPE_VCN,AMD_PG_STATE_GATE);
>>
>> Call this after cancel_delayed_work_sync. Shouldn't have any effect if idle
>> work already put it in PG state. Evan, what do you think?
> [Quan, Evan] Should not for now. But I'm considering to raise the dev_dbg prompt in amdgpu_dpm_set_powergating_by_smu for double gate/ungate to dev_info or dev_warn.
> Hopefully that can help to capture some potential issues like this. So, better to keep the check for the return value of cancel_delayed_work_sync here.
>
> BR
> Evan
>> Thanks,
>> Lijo
>>
>>>    	r = vcn_v1_0_hw_fini(adev);
>>>    	if (r)
>>>

[-- Attachment #2: Type: text/html, Size: 5769 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled
  2021-12-10 21:07 ` James Zhu
@ 2021-12-13  8:55   ` Gong, Curry
  2021-12-13 10:09     ` Quan, Evan
  2021-12-13 13:39     ` James Zhu
  0 siblings, 2 replies; 14+ messages in thread
From: Gong, Curry @ 2021-12-13  8:55 UTC (permalink / raw)
  To: Zhu, James, amd-gfx; +Cc: Deucher, Alexander, Quan, Evan, Liu, Leo

[-- Attachment #1: Type: text/plain, Size: 6416 bytes --]

[AMD Official Use Only]

Hi James:

With the following patch, an error will be reported when the driver is loaded
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
@@ -1202,6 +1204,9 @@ static int vcn_v1_0_stop(struct amdgpu_device *adev)
        else
                r = vcn_v1_0_stop_spg_mode(adev);

+       if (adev->pm.dpm_enabled)
+               amdgpu_dpm_enable_uvd(adev, false);
+
        return r;
}


$ dmesg
[  363.181081] INFO: task kworker/3:2:223 blocked for more than 120 seconds.
[  363.181150]       Tainted: G           OE     5.11.0-41-generic #45~20.04.1-Ubuntu
[  363.181208] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  363.181266] task:kworker/3:2     state:D stack:    0 pid:  223 ppid:     2 flags:0x00004000
[  363.181276] Workqueue: events vcn_v1_0_idle_work_handler [amdgpu]
[  363.181612] Call Trace:
[  363.181618]  __schedule+0x44c/0x8a0
[  363.181627]  schedule+0x4f/0xc0
[  363.181631]  schedule_preempt_disabled+0xe/0x10
[  363.181636]  __mutex_lock.isra.0+0x183/0x4d0
[  363.181643]  __mutex_lock_slowpath+0x13/0x20
[  363.181648]  mutex_lock+0x32/0x40
[  363.181652]  amdgpu_dpm_set_powergating_by_smu+0x9c/0x180 [amdgpu]
[  363.182055]  amdgpu_dpm_enable_uvd+0x38/0x110 [amdgpu]
[  363.182454]  vcn_v1_0_set_powergating_state+0x2e7e/0x3cf0 [amdgpu]
[  363.182776]  amdgpu_device_ip_set_powergating_state+0x6c/0xc0 [amdgpu]
[  363.183028]  smu10_powergate_vcn+0x2a/0x80 [amdgpu]
[  363.183361]  pp_set_powergating_by_smu+0xc5/0x2b0 [amdgpu]
[  363.183699]  amdgpu_dpm_set_powergating_by_smu+0xb6/0x180 [amdgpu]
[  363.184040]  amdgpu_dpm_enable_uvd+0x38/0x110 [amdgpu]
[  363.184391]  vcn_v1_0_idle_work_handler+0xe1/0x130 [amdgpu]
[  363.184667]  process_one_work+0x220/0x3c0
[  363.184674]  worker_thread+0x4d/0x3f0
[  363.184677]  ? process_one_work+0x3c0/0x3c0
[  363.184680]  kthread+0x12b/0x150
[  363.184685]  ? set_kthread_struct+0x40/0x40
[  363.184690]  ret_from_fork+0x22/0x30
[  363.184699] INFO: task kworker/2:2:233 blocked for more than 120 seconds.
[  363.184739]       Tainted: G           OE     5.11.0-41-generic #45~20.04.1-Ubuntu
[  363.184782] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  363.184825] task:kworker/2:2     state:D stack:    0 pid:  233 ppid:     2 flags:0x00004000
[  363.184831] Workqueue: events amdgpu_device_delayed_init_work_handler [amdgpu]
[  363.185085] Call Trace:
[  363.185087]  __schedule+0x44c/0x8a0
[  363.185092]  schedule+0x4f/0xc0
[  363.185095]  schedule_timeout+0x202/0x290
[  363.185099]  ? sched_clock_cpu+0x11/0xb0
[  363.185105]  wait_for_completion+0x94/0x100
[  363.185110]  __flush_work+0x12a/0x1e0
[  363.185113]  ? worker_detach_from_pool+0xc0/0xc0
[  363.185119]  __cancel_work_timer+0x10e/0x190
[  363.185123]  cancel_delayed_work_sync+0x13/0x20
[  363.185126]  vcn_v1_0_ring_begin_use+0x20/0x70 [amdgpu]
[  363.185401]  amdgpu_ring_alloc+0x48/0x60 [amdgpu]
[  363.185640]  amdgpu_ib_schedule+0x493/0x600 [amdgpu]
[  363.185884]  amdgpu_job_submit_direct+0x3c/0xd0 [amdgpu]
[  363.186186]  amdgpu_vcn_dec_send_msg+0x105/0x210 [amdgpu]
[  363.186460]  amdgpu_vcn_dec_ring_test_ib+0x69/0x110 [amdgpu]
[  363.186734]  amdgpu_ib_ring_tests+0xf5/0x160 [amdgpu]
[  363.186978]  amdgpu_device_delayed_init_work_handler+0x15/0x30 [amdgpu]
[  363.187206]  process_one_work+0x220/0x3c0
[  363.187210]  worker_thread+0x4d/0x3f0
[  363.187214]  ? process_one_work+0x3c0/0x3c0
[  363.187217]  kthread+0x12b/0x150
[  363.187221]  ? set_kthread_struct+0x40/0x40
[  363.187226]  ret_from_fork+0x22/0x30


BR
Curry Gong
From: Zhu, James <James.Zhu@amd.com>
Sent: Saturday, December 11, 2021 5:07 AM
To: Gong, Curry <Curry.Gong@amd.com>; amd-gfx@lists.freedesktop.org
Cc: Liu, Leo <Leo.Liu@amd.com>; Zhu, James <James.Zhu@amd.com>; Quan, Evan <Evan.Quan@amd.com>; Deucher, Alexander <Alexander.Deucher@amd.com>
Subject: Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled

On 2021-12-10 6:41 a.m., chen gong wrote:

Play a video on the raven (or PCO, raven2) platform, and then do the S3

test. When resume, the following error will be reported:



amdgpu 0000:02:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring

vcn_dec test failed (-110)

[drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block

<vcn_v1_0> failed -110

amdgpu 0000:02:00.0: amdgpu: amdgpu_device_ip_resume failed (-110).

PM: dpm_run_callback(): pci_pm_resume+0x0/0x90 returns -110



[why]

When playing the video: The power state flag of the vcn block is set to

POWER_STATE_ON.



When doing suspend: There is no change to the power state flag of the

vcn block, it is still POWER_STATE_ON.



When doing resume: Need to open the power gate of the vcn block and set

the power state flag of the VCN block to POWER_STATE_ON.

But at this time, the power state flag of the vcn block is already

POWER_STATE_ON. The power status flag check in the "8f2cdef drm/amd/pm:

avoid duplicate powergate/ungate setting" patch will return the

amdgpu_dpm_set_powergating_by_smu function directly.

As a result, the gate of the power was not opened, causing the

subsequent ring test to fail.



[how]

In the suspend function of the vcn block, explicitly change the power

state flag of the vcn block to POWER_STATE_OFF.



Signed-off-by: chen gong <curry.gong@amd.com><mailto:curry.gong@amd.com>

---

 drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c | 7 +++++++

 1 file changed, 7 insertions(+)



diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c

index d54d720..d73676b 100644

--- a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c

+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c

@@ -246,6 +246,13 @@ static int vcn_v1_0_suspend(void *handle)

 {

  int r;

  struct amdgpu_device *adev = (struct amdgpu_device *)handle;

+ bool cancel_success;

+

+ cancel_success = cancel_delayed_work_sync(&adev->vcn.idle_work);

[JZ] Can you refer to vcn_v3_0_stop , and add amdgpu_dpm_enable_uvd(adev, false); to the end of vcn_v1_0_stop?

See if it also can help.



+ if (cancel_success) {

+        if (adev->pm.dpm_enabled)

+                amdgpu_dpm_enable_uvd(adev, false);

+ }



  r = vcn_v1_0_hw_fini(adev);

  if (r)

[-- Attachment #2: Type: text/html, Size: 20235 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled
  2021-12-13  8:55   ` Gong, Curry
@ 2021-12-13 10:09     ` Quan, Evan
  2021-12-13 13:39     ` James Zhu
  1 sibling, 0 replies; 14+ messages in thread
From: Quan, Evan @ 2021-12-13 10:09 UTC (permalink / raw)
  To: Gong, Curry, Zhu, James, amd-gfx; +Cc: Deucher, Alexander, Liu, Leo

[-- Attachment #1: Type: text/plain, Size: 7417 bytes --]

[AMD Official Use Only]

Checked the log paste below with Curry. The way to add this fix in vcn_v1_0_stop is not workable.
As it will induce a circle calling(below) and lead to dead lock.
VCN ring begin use -> amdgpu_dpm_enable_uvd -> acquire the smu_lock -> smu10_powergate_vcn -> amdgpu_device_ip_set_powergating_state -> vcn_v1_0_stop -> amdgpu_dpm_enable_uvd -> try to acquire the smu_lock again -> dead lock

BR
Evan
From: Gong, Curry <Curry.Gong@amd.com>
Sent: Monday, December 13, 2021 4:56 PM
To: Zhu, James <James.Zhu@amd.com>; amd-gfx@lists.freedesktop.org
Cc: Liu, Leo <Leo.Liu@amd.com>; Quan, Evan <Evan.Quan@amd.com>; Deucher, Alexander <Alexander.Deucher@amd.com>
Subject: RE: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled


[AMD Official Use Only]

Hi James:

With the following patch, an error will be reported when the driver is loaded
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
@@ -1202,6 +1204,9 @@ static int vcn_v1_0_stop(struct amdgpu_device *adev)
        else
                r = vcn_v1_0_stop_spg_mode(adev);

+       if (adev->pm.dpm_enabled)
+               amdgpu_dpm_enable_uvd(adev, false);
+
        return r;
}


$ dmesg
[  363.181081] INFO: task kworker/3:2:223 blocked for more than 120 seconds.
[  363.181150]       Tainted: G           OE     5.11.0-41-generic #45~20.04.1-Ubuntu
[  363.181208] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  363.181266] task:kworker/3:2     state:D stack:    0 pid:  223 ppid:     2 flags:0x00004000
[  363.181276] Workqueue: events vcn_v1_0_idle_work_handler [amdgpu]
[  363.181612] Call Trace:
[  363.181618]  __schedule+0x44c/0x8a0
[  363.181627]  schedule+0x4f/0xc0
[  363.181631]  schedule_preempt_disabled+0xe/0x10
[  363.181636]  __mutex_lock.isra.0+0x183/0x4d0
[  363.181643]  __mutex_lock_slowpath+0x13/0x20
[  363.181648]  mutex_lock+0x32/0x40
[  363.181652]  amdgpu_dpm_set_powergating_by_smu+0x9c/0x180 [amdgpu]
[  363.182055]  amdgpu_dpm_enable_uvd+0x38/0x110 [amdgpu]
[  363.182454]  vcn_v1_0_set_powergating_state+0x2e7e/0x3cf0 [amdgpu]
[  363.182776]  amdgpu_device_ip_set_powergating_state+0x6c/0xc0 [amdgpu]
[  363.183028]  smu10_powergate_vcn+0x2a/0x80 [amdgpu]
[  363.183361]  pp_set_powergating_by_smu+0xc5/0x2b0 [amdgpu]
[  363.183699]  amdgpu_dpm_set_powergating_by_smu+0xb6/0x180 [amdgpu]
[  363.184040]  amdgpu_dpm_enable_uvd+0x38/0x110 [amdgpu]
[  363.184391]  vcn_v1_0_idle_work_handler+0xe1/0x130 [amdgpu]
[  363.184667]  process_one_work+0x220/0x3c0
[  363.184674]  worker_thread+0x4d/0x3f0
[  363.184677]  ? process_one_work+0x3c0/0x3c0
[  363.184680]  kthread+0x12b/0x150
[  363.184685]  ? set_kthread_struct+0x40/0x40
[  363.184690]  ret_from_fork+0x22/0x30
[  363.184699] INFO: task kworker/2:2:233 blocked for more than 120 seconds.
[  363.184739]       Tainted: G           OE     5.11.0-41-generic #45~20.04.1-Ubuntu
[  363.184782] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  363.184825] task:kworker/2:2     state:D stack:    0 pid:  233 ppid:     2 flags:0x00004000
[  363.184831] Workqueue: events amdgpu_device_delayed_init_work_handler [amdgpu]
[  363.185085] Call Trace:
[  363.185087]  __schedule+0x44c/0x8a0
[  363.185092]  schedule+0x4f/0xc0
[  363.185095]  schedule_timeout+0x202/0x290
[  363.185099]  ? sched_clock_cpu+0x11/0xb0
[  363.185105]  wait_for_completion+0x94/0x100
[  363.185110]  __flush_work+0x12a/0x1e0
[  363.185113]  ? worker_detach_from_pool+0xc0/0xc0
[  363.185119]  __cancel_work_timer+0x10e/0x190
[  363.185123]  cancel_delayed_work_sync+0x13/0x20
[  363.185126]  vcn_v1_0_ring_begin_use+0x20/0x70 [amdgpu]
[  363.185401]  amdgpu_ring_alloc+0x48/0x60 [amdgpu]
[  363.185640]  amdgpu_ib_schedule+0x493/0x600 [amdgpu]
[  363.185884]  amdgpu_job_submit_direct+0x3c/0xd0 [amdgpu]
[  363.186186]  amdgpu_vcn_dec_send_msg+0x105/0x210 [amdgpu]
[  363.186460]  amdgpu_vcn_dec_ring_test_ib+0x69/0x110 [amdgpu]
[  363.186734]  amdgpu_ib_ring_tests+0xf5/0x160 [amdgpu]
[  363.186978]  amdgpu_device_delayed_init_work_handler+0x15/0x30 [amdgpu]
[  363.187206]  process_one_work+0x220/0x3c0
[  363.187210]  worker_thread+0x4d/0x3f0
[  363.187214]  ? process_one_work+0x3c0/0x3c0
[  363.187217]  kthread+0x12b/0x150
[  363.187221]  ? set_kthread_struct+0x40/0x40
[  363.187226]  ret_from_fork+0x22/0x30


BR
Curry Gong
From: Zhu, James <James.Zhu@amd.com<mailto:James.Zhu@amd.com>>
Sent: Saturday, December 11, 2021 5:07 AM
To: Gong, Curry <Curry.Gong@amd.com<mailto:Curry.Gong@amd.com>>; amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>
Cc: Liu, Leo <Leo.Liu@amd.com<mailto:Leo.Liu@amd.com>>; Zhu, James <James.Zhu@amd.com<mailto:James.Zhu@amd.com>>; Quan, Evan <Evan.Quan@amd.com<mailto:Evan.Quan@amd.com>>; Deucher, Alexander <Alexander.Deucher@amd.com<mailto:Alexander.Deucher@amd.com>>
Subject: Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled

On 2021-12-10 6:41 a.m., chen gong wrote:

Play a video on the raven (or PCO, raven2) platform, and then do the S3

test. When resume, the following error will be reported:



amdgpu 0000:02:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring

vcn_dec test failed (-110)

[drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block

<vcn_v1_0> failed -110

amdgpu 0000:02:00.0: amdgpu: amdgpu_device_ip_resume failed (-110).

PM: dpm_run_callback(): pci_pm_resume+0x0/0x90 returns -110



[why]

When playing the video: The power state flag of the vcn block is set to

POWER_STATE_ON.



When doing suspend: There is no change to the power state flag of the

vcn block, it is still POWER_STATE_ON.



When doing resume: Need to open the power gate of the vcn block and set

the power state flag of the VCN block to POWER_STATE_ON.

But at this time, the power state flag of the vcn block is already

POWER_STATE_ON. The power status flag check in the "8f2cdef drm/amd/pm:

avoid duplicate powergate/ungate setting" patch will return the

amdgpu_dpm_set_powergating_by_smu function directly.

As a result, the gate of the power was not opened, causing the

subsequent ring test to fail.



[how]

In the suspend function of the vcn block, explicitly change the power

state flag of the vcn block to POWER_STATE_OFF.



Signed-off-by: chen gong <curry.gong@amd.com><mailto:curry.gong@amd.com>

---

 drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c | 7 +++++++

 1 file changed, 7 insertions(+)



diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c

index d54d720..d73676b 100644

--- a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c

+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c

@@ -246,6 +246,13 @@ static int vcn_v1_0_suspend(void *handle)

 {

  int r;

  struct amdgpu_device *adev = (struct amdgpu_device *)handle;

+ bool cancel_success;

+

+ cancel_success = cancel_delayed_work_sync(&adev->vcn.idle_work);

[JZ] Can you refer to vcn_v3_0_stop , and add amdgpu_dpm_enable_uvd(adev, false); to the end of vcn_v1_0_stop?

See if it also can help.



+ if (cancel_success) {

+        if (adev->pm.dpm_enabled)

+                amdgpu_dpm_enable_uvd(adev, false);

+ }



  r = vcn_v1_0_hw_fini(adev);

  if (r)

[-- Attachment #2: Type: text/html, Size: 22340 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled
  2021-12-13  8:55   ` Gong, Curry
  2021-12-13 10:09     ` Quan, Evan
@ 2021-12-13 13:39     ` James Zhu
  2021-12-14  5:59       ` Quan, Evan
  1 sibling, 1 reply; 14+ messages in thread
From: James Zhu @ 2021-12-13 13:39 UTC (permalink / raw)
  To: Gong, Curry, Zhu, James, amd-gfx; +Cc: Deucher, Alexander, Quan, Evan, Liu, Leo

[-- Attachment #1: Type: text/plain, Size: 7596 bytes --]

Hi Curry, Evan,

It seems vcn1.0 power gate sequence are special.

if the original solution is thread safe, then the following solution 
will be thread safe also.

static int vcn_v1_0_hw_fini(void *handle)
{
     struct amdgpu_device *adev = (struct amdgpu_device *)handle;

     cancel_delayed_work_sync(&adev->vcn.idle_work);

     if ((adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) ||
         (adev->vcn.cur_state != AMD_PG_STATE_GATE &&
          RREG32_SOC15(VCN, 0, mmUVD_STATUS))) {
+        if (adev->pm.dpm_enabled)
+            amdgpu_dpm_enable_uvd(adev, false);
+        else
+            vcn_v1_0_set_powergating_state(adev, AMD_PG_STATE_GATE);
     }

Best Regards!

James

On 2021-12-13 3:55 a.m., Gong, Curry wrote:
>
> [AMD Official Use Only]
>
>
> Hi James:
>
> With the following patch, an error will be reported when the driver is 
> loaded
>
> +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
>
> @@ -1202,6 +1204,9 @@ static int vcn_v1_0_stop(struct amdgpu_device *adev)
>
> else
>
> r = vcn_v1_0_stop_spg_mode(adev);
>
> c
>
> return r;
>
> }
>
> $ dmesg
>
> [ 363.181081] INFO: task kworker/3:2:223 blocked for more than 120 
> seconds.
>
> [ 363.181150]       Tainted: G           OE 5.11.0-41-generic 
> #45~20.04.1-Ubuntu
>
> [ 363.181208] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
> disables this message.
>
> [ 363.181266] task:kworker/3:2     state:D stack:    0 pid: 223 
> ppid:     2 flags:0x00004000
>
> [ 363.181276] Workqueue: events vcn_v1_0_idle_work_handler [amdgpu]
>
> [ 363.181612] Call Trace:
>
> [ 363.181618]  __schedule+0x44c/0x8a0
>
> [ 363.181627]  schedule+0x4f/0xc0
>
> [ 363.181631]  schedule_preempt_disabled+0xe/0x10
>
> [ 363.181636]  __mutex_lock.isra.0+0x183/0x4d0
>
> [ 363.181643]  __mutex_lock_slowpath+0x13/0x20
>
> [ 363.181648]  mutex_lock+0x32/0x40
>
> [ 363.181652]  amdgpu_dpm_set_powergating_by_smu+0x9c/0x180 [amdgpu]
>
> [ 363.182055]  amdgpu_dpm_enable_uvd+0x38/0x110 [amdgpu]
>
> [ 363.182454]  vcn_v1_0_set_powergating_state+0x2e7e/0x3cf0 [amdgpu]
>
> [ 363.182776] amdgpu_device_ip_set_powergating_state+0x6c/0xc0 [amdgpu]
>
> [ 363.183028]  smu10_powergate_vcn+0x2a/0x80 [amdgpu]
>
> [ 363.183361]  pp_set_powergating_by_smu+0xc5/0x2b0 [amdgpu]
>
> [ 363.183699]  amdgpu_dpm_set_powergating_by_smu+0xb6/0x180 [amdgpu]
>
> [ 363.184040]  amdgpu_dpm_enable_uvd+0x38/0x110 [amdgpu]
>
> [ 363.184391]  vcn_v1_0_idle_work_handler+0xe1/0x130 [amdgpu]
>
> [ 363.184667]  process_one_work+0x220/0x3c0
>
> [ 363.184674]  worker_thread+0x4d/0x3f0
>
> [ 363.184677]  ? process_one_work+0x3c0/0x3c0
>
> [ 363.184680]  kthread+0x12b/0x150
>
> [ 363.184685]  ? set_kthread_struct+0x40/0x40
>
> [ 363.184690]  ret_from_fork+0x22/0x30
>
> [ 363.184699] INFO: task kworker/2:2:233 blocked for more than 120 
> seconds.
>
> [ 363.184739]       Tainted: G           OE 5.11.0-41-generic 
> #45~20.04.1-Ubuntu
>
> [ 363.184782] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
> disables this message.
>
> [ 363.184825] task:kworker/2:2     state:D stack:    0 pid: 233 
> ppid:     2 flags:0x00004000
>
> [ 363.184831] Workqueue: events 
> amdgpu_device_delayed_init_work_handler [amdgpu]
>
> [ 363.185085] Call Trace:
>
> [ 363.185087]  __schedule+0x44c/0x8a0
>
> [ 363.185092]  schedule+0x4f/0xc0
>
> [ 363.185095]  schedule_timeout+0x202/0x290
>
> [ 363.185099]  ? sched_clock_cpu+0x11/0xb0
>
> [ 363.185105]  wait_for_completion+0x94/0x100
>
> [ 363.185110]  __flush_work+0x12a/0x1e0
>
> [ 363.185113]  ? worker_detach_from_pool+0xc0/0xc0
>
> [ 363.185119]  __cancel_work_timer+0x10e/0x190
>
> [ 363.185123]  cancel_delayed_work_sync+0x13/0x20
>
> [ 363.185126]  vcn_v1_0_ring_begin_use+0x20/0x70 [amdgpu]
>
> [ 363.185401]  amdgpu_ring_alloc+0x48/0x60 [amdgpu]
>
> [ 363.185640]  amdgpu_ib_schedule+0x493/0x600 [amdgpu]
>
> [ 363.185884]  amdgpu_job_submit_direct+0x3c/0xd0 [amdgpu]
>
> [ 363.186186]  amdgpu_vcn_dec_send_msg+0x105/0x210 [amdgpu]
>
> [ 363.186460]  amdgpu_vcn_dec_ring_test_ib+0x69/0x110 [amdgpu]
>
> [ 363.186734]  amdgpu_ib_ring_tests+0xf5/0x160 [amdgpu]
>
> [ 363.186978] amdgpu_device_delayed_init_work_handler+0x15/0x30 [amdgpu]
>
> [ 363.187206]  process_one_work+0x220/0x3c0
>
> [ 363.187210]  worker_thread+0x4d/0x3f0
>
> [ 363.187214]  ? process_one_work+0x3c0/0x3c0
>
> [ 363.187217]  kthread+0x12b/0x150
>
> [ 363.187221]  ? set_kthread_struct+0x40/0x40
>
> [ 363.187226]  ret_from_fork+0x22/0x30
>
> BR
>
> Curry Gong
>
> *From:* Zhu, James <James.Zhu@amd.com>
> *Sent:* Saturday, December 11, 2021 5:07 AM
> *To:* Gong, Curry <Curry.Gong@amd.com>; amd-gfx@lists.freedesktop.org
> *Cc:* Liu, Leo <Leo.Liu@amd.com>; Zhu, James <James.Zhu@amd.com>; 
> Quan, Evan <Evan.Quan@amd.com>; Deucher, Alexander 
> <Alexander.Deucher@amd.com>
> *Subject:* Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is 
> suspended, powergating is explicitly enabled
>
> On 2021-12-10 6:41 a.m., chen gong wrote:
>
>     Play a video on the raven (or PCO, raven2) platform, and then do the S3
>
>     test. When resume, the following error will be reported:
>
>     amdgpu 0000:02:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring
>
>     vcn_dec test failed (-110)
>
>     [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block
>
>     <vcn_v1_0> failed -110
>
>     amdgpu 0000:02:00.0: amdgpu: amdgpu_device_ip_resume failed (-110).
>
>     PM: dpm_run_callback(): pci_pm_resume+0x0/0x90 returns -110
>
>     [why]
>
>     When playing the video: The power state flag of the vcn block is set to
>
>     POWER_STATE_ON.
>
>     When doing suspend: There is no change to the power state flag of the
>
>     vcn block, it is still POWER_STATE_ON.
>
>     When doing resume: Need to open the power gate of the vcn block and set
>
>     the power state flag of the VCN block to POWER_STATE_ON.
>
>     But at this time, the power state flag of the vcn block is already
>
>     POWER_STATE_ON. The power status flag check in the "8f2cdef drm/amd/pm:
>
>     avoid duplicate powergate/ungate setting" patch will return the
>
>     amdgpu_dpm_set_powergating_by_smu function directly.
>
>     As a result, the gate of the power was not opened, causing the
>
>     subsequent ring test to fail.
>
>     [how]
>
>     In the suspend function of the vcn block, explicitly change the power
>
>     state flag of the vcn block to POWER_STATE_OFF.
>
>     Signed-off-by: chen gong<curry.gong@amd.com>  <mailto:curry.gong@amd.com>
>
>     ---
>
>       drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c | 7 +++++++
>
>       1 file changed, 7 insertions(+)
>
>     diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
>
>     index d54d720..d73676b 100644
>
>     --- a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
>
>     +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
>
>     @@ -246,6 +246,13 @@ static int vcn_v1_0_suspend(void *handle)
>
>       {
>
>        int r;
>
>        struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>
>     + bool cancel_success;
>
>     +
>
>     + cancel_success = cancel_delayed_work_sync(&adev->vcn.idle_work);
>
> [JZ] Can you refer to vcn_v3_0_stop , and add 
> amdgpu_dpm_enable_uvd(adev, false); to the end of vcn_v1_0_stop?
>
> See if it also can help.
>
>     + if (cancel_success) {
>
>     +        if (adev->pm.dpm_enabled)
>
>     +                amdgpu_dpm_enable_uvd(adev, false);
>
>     + }
>
>       
>
>        r = vcn_v1_0_hw_fini(adev);
>
>        if (r)
>

[-- Attachment #2: Type: text/html, Size: 24129 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled
  2021-12-13 13:39     ` James Zhu
@ 2021-12-14  5:59       ` Quan, Evan
  2021-12-16 15:38         ` Alex Deucher
  0 siblings, 1 reply; 14+ messages in thread
From: Quan, Evan @ 2021-12-14  5:59 UTC (permalink / raw)
  To: Zhu, James, Gong, Curry, amd-gfx; +Cc: Deucher, Alexander, Liu, Leo

[-- Attachment #1: Type: text/plain, Size: 8139 bytes --]

[AMD Official Use Only]



From: Zhu, James <James.Zhu@amd.com>
Sent: Monday, December 13, 2021 9:39 PM
To: Gong, Curry <Curry.Gong@amd.com>; Zhu, James <James.Zhu@amd.com>; amd-gfx@lists.freedesktop.org
Cc: Liu, Leo <Leo.Liu@amd.com>; Quan, Evan <Evan.Quan@amd.com>; Deucher, Alexander <Alexander.Deucher@amd.com>
Subject: Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled


Hi Curry, Evan,

It seems vcn1.0 power gate sequence are special.

if the original solution is thread safe, then the following solution will be thread safe also.
static int vcn_v1_0_hw_fini(void *handle)
{
    struct amdgpu_device *adev = (struct amdgpu_device *)handle;

    cancel_delayed_work_sync(&adev->vcn.idle_work);

    if ((adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) ||
        (adev->vcn.cur_state != AMD_PG_STATE_GATE &&
         RREG32_SOC15(VCN, 0, mmUVD_STATUS))) {
+        if (adev->pm.dpm_enabled)
+            amdgpu_dpm_enable_uvd(adev, false);
+        else
+            vcn_v1_0_set_powergating_state(adev, AMD_PG_STATE_GATE);
[Quan, Evan] Considering adev->pm.dpm_enabled is always true, "vcn_v1_0_set_powergating_state(adev, AMD_PG_STATE_GATE); " will become dead code.
Also, the vcn_v1_0_hw_fini is also used in other scenarios(except the suspend/resume discussed here). So, it seems quite risky compared with Curry's original patch.
Before we can come up with a better idea, I would suggest to land Curry's original patch as a quick fix.

BR
Evan

    }

Best Regards!

James
On 2021-12-13 3:55 a.m., Gong, Curry wrote:

[AMD Official Use Only]

Hi James:

With the following patch, an error will be reported when the driver is loaded
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
@@ -1202,6 +1204,9 @@ static int vcn_v1_0_stop(struct amdgpu_device *adev)
        else
                r = vcn_v1_0_stop_spg_mode(adev);

c
        return r;
}


$ dmesg
[  363.181081] INFO: task kworker/3:2:223 blocked for more than 120 seconds.
[  363.181150]       Tainted: G           OE     5.11.0-41-generic #45~20.04.1-Ubuntu
[  363.181208] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  363.181266] task:kworker/3:2     state:D stack:    0 pid:  223 ppid:     2 flags:0x00004000
[  363.181276] Workqueue: events vcn_v1_0_idle_work_handler [amdgpu]
[  363.181612] Call Trace:
[  363.181618]  __schedule+0x44c/0x8a0
[  363.181627]  schedule+0x4f/0xc0
[  363.181631]  schedule_preempt_disabled+0xe/0x10
[  363.181636]  __mutex_lock.isra.0+0x183/0x4d0
[  363.181643]  __mutex_lock_slowpath+0x13/0x20
[  363.181648]  mutex_lock+0x32/0x40
[  363.181652]  amdgpu_dpm_set_powergating_by_smu+0x9c/0x180 [amdgpu]
[  363.182055]  amdgpu_dpm_enable_uvd+0x38/0x110 [amdgpu]
[  363.182454]  vcn_v1_0_set_powergating_state+0x2e7e/0x3cf0 [amdgpu]
[  363.182776]  amdgpu_device_ip_set_powergating_state+0x6c/0xc0 [amdgpu]
[  363.183028]  smu10_powergate_vcn+0x2a/0x80 [amdgpu]
[  363.183361]  pp_set_powergating_by_smu+0xc5/0x2b0 [amdgpu]
[  363.183699]  amdgpu_dpm_set_powergating_by_smu+0xb6/0x180 [amdgpu]
[  363.184040]  amdgpu_dpm_enable_uvd+0x38/0x110 [amdgpu]
[  363.184391]  vcn_v1_0_idle_work_handler+0xe1/0x130 [amdgpu]
[  363.184667]  process_one_work+0x220/0x3c0
[  363.184674]  worker_thread+0x4d/0x3f0
[  363.184677]  ? process_one_work+0x3c0/0x3c0
[  363.184680]  kthread+0x12b/0x150
[  363.184685]  ? set_kthread_struct+0x40/0x40
[  363.184690]  ret_from_fork+0x22/0x30
[  363.184699] INFO: task kworker/2:2:233 blocked for more than 120 seconds.
[  363.184739]       Tainted: G           OE     5.11.0-41-generic #45~20.04.1-Ubuntu
[  363.184782] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  363.184825] task:kworker/2:2     state:D stack:    0 pid:  233 ppid:     2 flags:0x00004000
[  363.184831] Workqueue: events amdgpu_device_delayed_init_work_handler [amdgpu]
[  363.185085] Call Trace:
[  363.185087]  __schedule+0x44c/0x8a0
[  363.185092]  schedule+0x4f/0xc0
[  363.185095]  schedule_timeout+0x202/0x290
[  363.185099]  ? sched_clock_cpu+0x11/0xb0
[  363.185105]  wait_for_completion+0x94/0x100
[  363.185110]  __flush_work+0x12a/0x1e0
[  363.185113]  ? worker_detach_from_pool+0xc0/0xc0
[  363.185119]  __cancel_work_timer+0x10e/0x190
[  363.185123]  cancel_delayed_work_sync+0x13/0x20
[  363.185126]  vcn_v1_0_ring_begin_use+0x20/0x70 [amdgpu]
[  363.185401]  amdgpu_ring_alloc+0x48/0x60 [amdgpu]
[  363.185640]  amdgpu_ib_schedule+0x493/0x600 [amdgpu]
[  363.185884]  amdgpu_job_submit_direct+0x3c/0xd0 [amdgpu]
[  363.186186]  amdgpu_vcn_dec_send_msg+0x105/0x210 [amdgpu]
[  363.186460]  amdgpu_vcn_dec_ring_test_ib+0x69/0x110 [amdgpu]
[  363.186734]  amdgpu_ib_ring_tests+0xf5/0x160 [amdgpu]
[  363.186978]  amdgpu_device_delayed_init_work_handler+0x15/0x30 [amdgpu]
[  363.187206]  process_one_work+0x220/0x3c0
[  363.187210]  worker_thread+0x4d/0x3f0
[  363.187214]  ? process_one_work+0x3c0/0x3c0
[  363.187217]  kthread+0x12b/0x150
[  363.187221]  ? set_kthread_struct+0x40/0x40
[  363.187226]  ret_from_fork+0x22/0x30


BR
Curry Gong
From: Zhu, James <James.Zhu@amd.com><mailto:James.Zhu@amd.com>
Sent: Saturday, December 11, 2021 5:07 AM
To: Gong, Curry <Curry.Gong@amd.com><mailto:Curry.Gong@amd.com>; amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>
Cc: Liu, Leo <Leo.Liu@amd.com><mailto:Leo.Liu@amd.com>; Zhu, James <James.Zhu@amd.com><mailto:James.Zhu@amd.com>; Quan, Evan <Evan.Quan@amd.com><mailto:Evan.Quan@amd.com>; Deucher, Alexander <Alexander.Deucher@amd.com><mailto:Alexander.Deucher@amd.com>
Subject: Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled

On 2021-12-10 6:41 a.m., chen gong wrote:

Play a video on the raven (or PCO, raven2) platform, and then do the S3

test. When resume, the following error will be reported:



amdgpu 0000:02:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring

vcn_dec test failed (-110)

[drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block

<vcn_v1_0> failed -110

amdgpu 0000:02:00.0: amdgpu: amdgpu_device_ip_resume failed (-110).

PM: dpm_run_callback(): pci_pm_resume+0x0/0x90 returns -110



[why]

When playing the video: The power state flag of the vcn block is set to

POWER_STATE_ON.



When doing suspend: There is no change to the power state flag of the

vcn block, it is still POWER_STATE_ON.



When doing resume: Need to open the power gate of the vcn block and set

the power state flag of the VCN block to POWER_STATE_ON.

But at this time, the power state flag of the vcn block is already

POWER_STATE_ON. The power status flag check in the "8f2cdef drm/amd/pm:

avoid duplicate powergate/ungate setting" patch will return the

amdgpu_dpm_set_powergating_by_smu function directly.

As a result, the gate of the power was not opened, causing the

subsequent ring test to fail.



[how]

In the suspend function of the vcn block, explicitly change the power

state flag of the vcn block to POWER_STATE_OFF.



Signed-off-by: chen gong <curry.gong@amd.com><mailto:curry.gong@amd.com>

---

 drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c | 7 +++++++

 1 file changed, 7 insertions(+)



diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c

index d54d720..d73676b 100644

--- a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c

+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c

@@ -246,6 +246,13 @@ static int vcn_v1_0_suspend(void *handle)

 {

  int r;

  struct amdgpu_device *adev = (struct amdgpu_device *)handle;

+ bool cancel_success;

+

+ cancel_success = cancel_delayed_work_sync(&adev->vcn.idle_work);

[JZ] Can you refer to vcn_v3_0_stop , and add amdgpu_dpm_enable_uvd(adev, false); to the end of vcn_v1_0_stop?

See if it also can help.



+ if (cancel_success) {

+        if (adev->pm.dpm_enabled)

+                amdgpu_dpm_enable_uvd(adev, false);

+ }



  r = vcn_v1_0_hw_fini(adev);

  if (r)

[-- Attachment #2: Type: text/html, Size: 23366 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled
  2021-12-14  5:59       ` Quan, Evan
@ 2021-12-16 15:38         ` Alex Deucher
  2021-12-17  1:43           ` Quan, Evan
  0 siblings, 1 reply; 14+ messages in thread
From: Alex Deucher @ 2021-12-16 15:38 UTC (permalink / raw)
  To: Quan, Evan; +Cc: Deucher, Alexander, Zhu, James, Liu, Leo, amd-gfx, Gong, Curry

FWIW, it looks like all versions of VCN need the same fix.  There have
been reports of suspend failing when VCN is in use on other newer APUs
as well.

Alex

On Tue, Dec 14, 2021 at 12:59 AM Quan, Evan <Evan.Quan@amd.com> wrote:
>
> [AMD Official Use Only]
>
>
>
>
>
>
>
> From: Zhu, James <James.Zhu@amd.com>
> Sent: Monday, December 13, 2021 9:39 PM
> To: Gong, Curry <Curry.Gong@amd.com>; Zhu, James <James.Zhu@amd.com>; amd-gfx@lists.freedesktop.org
> Cc: Liu, Leo <Leo.Liu@amd.com>; Quan, Evan <Evan.Quan@amd.com>; Deucher, Alexander <Alexander.Deucher@amd.com>
> Subject: Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled
>
>
>
> Hi Curry, Evan,
>
> It seems vcn1.0 power gate sequence are special.
>
> if the original solution is thread safe, then the following solution will be thread safe also.
>
> static int vcn_v1_0_hw_fini(void *handle)
> {
>     struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>
>     cancel_delayed_work_sync(&adev->vcn.idle_work);
>
>     if ((adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) ||
>         (adev->vcn.cur_state != AMD_PG_STATE_GATE &&
>          RREG32_SOC15(VCN, 0, mmUVD_STATUS))) {
> +        if (adev->pm.dpm_enabled)
> +            amdgpu_dpm_enable_uvd(adev, false);
> +        else
> +            vcn_v1_0_set_powergating_state(adev, AMD_PG_STATE_GATE);
>
> [Quan, Evan] Considering adev->pm.dpm_enabled is always true, “vcn_v1_0_set_powergating_state(adev, AMD_PG_STATE_GATE); “ will become dead code.
>
> Also, the vcn_v1_0_hw_fini is also used in other scenarios(except the suspend/resume discussed here). So, it seems quite risky compared with Curry’s original patch.
>
> Before we can come up with a better idea, I would suggest to land Curry’s original patch as a quick fix.
>
>
>
> BR
>
> Evan
>
>
>     }
>
> Best Regards!
>
> James
>
> On 2021-12-13 3:55 a.m., Gong, Curry wrote:
>
> [AMD Official Use Only]
>
>
>
> Hi James:
>
>
>
> With the following patch, an error will be reported when the driver is loaded
>
> +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
>
> @@ -1202,6 +1204,9 @@ static int vcn_v1_0_stop(struct amdgpu_device *adev)
>
>         else
>
>                 r = vcn_v1_0_stop_spg_mode(adev);
>
>
>
> c
>
>         return r;
>
> }
>
>
>
>
>
> $ dmesg
>
> [  363.181081] INFO: task kworker/3:2:223 blocked for more than 120 seconds.
>
> [  363.181150]       Tainted: G           OE     5.11.0-41-generic #45~20.04.1-Ubuntu
>
> [  363.181208] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>
> [  363.181266] task:kworker/3:2     state:D stack:    0 pid:  223 ppid:     2 flags:0x00004000
>
> [  363.181276] Workqueue: events vcn_v1_0_idle_work_handler [amdgpu]
>
> [  363.181612] Call Trace:
>
> [  363.181618]  __schedule+0x44c/0x8a0
>
> [  363.181627]  schedule+0x4f/0xc0
>
> [  363.181631]  schedule_preempt_disabled+0xe/0x10
>
> [  363.181636]  __mutex_lock.isra.0+0x183/0x4d0
>
> [  363.181643]  __mutex_lock_slowpath+0x13/0x20
>
> [  363.181648]  mutex_lock+0x32/0x40
>
> [  363.181652]  amdgpu_dpm_set_powergating_by_smu+0x9c/0x180 [amdgpu]
>
> [  363.182055]  amdgpu_dpm_enable_uvd+0x38/0x110 [amdgpu]
>
> [  363.182454]  vcn_v1_0_set_powergating_state+0x2e7e/0x3cf0 [amdgpu]
>
> [  363.182776]  amdgpu_device_ip_set_powergating_state+0x6c/0xc0 [amdgpu]
>
> [  363.183028]  smu10_powergate_vcn+0x2a/0x80 [amdgpu]
>
> [  363.183361]  pp_set_powergating_by_smu+0xc5/0x2b0 [amdgpu]
>
> [  363.183699]  amdgpu_dpm_set_powergating_by_smu+0xb6/0x180 [amdgpu]
>
> [  363.184040]  amdgpu_dpm_enable_uvd+0x38/0x110 [amdgpu]
>
> [  363.184391]  vcn_v1_0_idle_work_handler+0xe1/0x130 [amdgpu]
>
> [  363.184667]  process_one_work+0x220/0x3c0
>
> [  363.184674]  worker_thread+0x4d/0x3f0
>
> [  363.184677]  ? process_one_work+0x3c0/0x3c0
>
> [  363.184680]  kthread+0x12b/0x150
>
> [  363.184685]  ? set_kthread_struct+0x40/0x40
>
> [  363.184690]  ret_from_fork+0x22/0x30
>
> [  363.184699] INFO: task kworker/2:2:233 blocked for more than 120 seconds.
>
> [  363.184739]       Tainted: G           OE     5.11.0-41-generic #45~20.04.1-Ubuntu
>
> [  363.184782] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>
> [  363.184825] task:kworker/2:2     state:D stack:    0 pid:  233 ppid:     2 flags:0x00004000
>
> [  363.184831] Workqueue: events amdgpu_device_delayed_init_work_handler [amdgpu]
>
> [  363.185085] Call Trace:
>
> [  363.185087]  __schedule+0x44c/0x8a0
>
> [  363.185092]  schedule+0x4f/0xc0
>
> [  363.185095]  schedule_timeout+0x202/0x290
>
> [  363.185099]  ? sched_clock_cpu+0x11/0xb0
>
> [  363.185105]  wait_for_completion+0x94/0x100
>
> [  363.185110]  __flush_work+0x12a/0x1e0
>
> [  363.185113]  ? worker_detach_from_pool+0xc0/0xc0
>
> [  363.185119]  __cancel_work_timer+0x10e/0x190
>
> [  363.185123]  cancel_delayed_work_sync+0x13/0x20
>
> [  363.185126]  vcn_v1_0_ring_begin_use+0x20/0x70 [amdgpu]
>
> [  363.185401]  amdgpu_ring_alloc+0x48/0x60 [amdgpu]
>
> [  363.185640]  amdgpu_ib_schedule+0x493/0x600 [amdgpu]
>
> [  363.185884]  amdgpu_job_submit_direct+0x3c/0xd0 [amdgpu]
>
> [  363.186186]  amdgpu_vcn_dec_send_msg+0x105/0x210 [amdgpu]
>
> [  363.186460]  amdgpu_vcn_dec_ring_test_ib+0x69/0x110 [amdgpu]
>
> [  363.186734]  amdgpu_ib_ring_tests+0xf5/0x160 [amdgpu]
>
> [  363.186978]  amdgpu_device_delayed_init_work_handler+0x15/0x30 [amdgpu]
>
> [  363.187206]  process_one_work+0x220/0x3c0
>
> [  363.187210]  worker_thread+0x4d/0x3f0
>
> [  363.187214]  ? process_one_work+0x3c0/0x3c0
>
> [  363.187217]  kthread+0x12b/0x150
>
> [  363.187221]  ? set_kthread_struct+0x40/0x40
>
> [  363.187226]  ret_from_fork+0x22/0x30
>
>
>
>
>
> BR
>
> Curry Gong
>
> From: Zhu, James <James.Zhu@amd.com>
> Sent: Saturday, December 11, 2021 5:07 AM
> To: Gong, Curry <Curry.Gong@amd.com>; amd-gfx@lists.freedesktop.org
> Cc: Liu, Leo <Leo.Liu@amd.com>; Zhu, James <James.Zhu@amd.com>; Quan, Evan <Evan.Quan@amd.com>; Deucher, Alexander <Alexander.Deucher@amd.com>
> Subject: Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled
>
>
>
> On 2021-12-10 6:41 a.m., chen gong wrote:
>
> Play a video on the raven (or PCO, raven2) platform, and then do the S3
>
> test. When resume, the following error will be reported:
>
>
>
> amdgpu 0000:02:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring
>
> vcn_dec test failed (-110)
>
> [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block
>
> <vcn_v1_0> failed -110
>
> amdgpu 0000:02:00.0: amdgpu: amdgpu_device_ip_resume failed (-110).
>
> PM: dpm_run_callback(): pci_pm_resume+0x0/0x90 returns -110
>
>
>
> [why]
>
> When playing the video: The power state flag of the vcn block is set to
>
> POWER_STATE_ON.
>
>
>
> When doing suspend: There is no change to the power state flag of the
>
> vcn block, it is still POWER_STATE_ON.
>
>
>
> When doing resume: Need to open the power gate of the vcn block and set
>
> the power state flag of the VCN block to POWER_STATE_ON.
>
> But at this time, the power state flag of the vcn block is already
>
> POWER_STATE_ON. The power status flag check in the "8f2cdef drm/amd/pm:
>
> avoid duplicate powergate/ungate setting" patch will return the
>
> amdgpu_dpm_set_powergating_by_smu function directly.
>
> As a result, the gate of the power was not opened, causing the
>
> subsequent ring test to fail.
>
>
>
> [how]
>
> In the suspend function of the vcn block, explicitly change the power
>
> state flag of the vcn block to POWER_STATE_OFF.
>
>
>
> Signed-off-by: chen gong <curry.gong@amd.com>
>
> ---
>
>  drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c | 7 +++++++
>
>  1 file changed, 7 insertions(+)
>
>
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
>
> index d54d720..d73676b 100644
>
> --- a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
>
> +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
>
> @@ -246,6 +246,13 @@ static int vcn_v1_0_suspend(void *handle)
>
>  {
>
>   int r;
>
>   struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>
> + bool cancel_success;
>
> +
>
> + cancel_success = cancel_delayed_work_sync(&adev->vcn.idle_work);
>
> [JZ] Can you refer to vcn_v3_0_stop , and add amdgpu_dpm_enable_uvd(adev, false); to the end of vcn_v1_0_stop?
>
> See if it also can help.
>
>
>
> + if (cancel_success) {
>
> +        if (adev->pm.dpm_enabled)
>
> +                amdgpu_dpm_enable_uvd(adev, false);
>
> + }
>
>
>
>   r = vcn_v1_0_hw_fini(adev);
>
>   if (r)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled
  2021-12-16 15:38         ` Alex Deucher
@ 2021-12-17  1:43           ` Quan, Evan
  2021-12-17  2:02             ` Alex Deucher
  0 siblings, 1 reply; 14+ messages in thread
From: Quan, Evan @ 2021-12-17  1:43 UTC (permalink / raw)
  To: Alex Deucher
  Cc: Deucher, Alexander, Zhu, James, Liu, Leo, amd-gfx, Gong, Curry

[AMD Official Use Only]

Hi Alex,

Per our checking, vcn_v2 and vcn_v3 already have the dpm disablement(below) in their ->suspend routine which should prevent them from the issue here.
	if (adev->pm.dpm_enabled)
		amdgpu_dpm_enable_uvd(adev, false);
So, maybe it's a different story with those newer APUs. Can you forward the bug reports to me?

Thanks,
Evan
> -----Original Message-----
> From: Alex Deucher <alexdeucher@gmail.com>
> Sent: Thursday, December 16, 2021 11:38 PM
> To: Quan, Evan <Evan.Quan@amd.com>
> Cc: Zhu, James <James.Zhu@amd.com>; Gong, Curry
> <Curry.Gong@amd.com>; amd-gfx@lists.freedesktop.org; Deucher,
> Alexander <Alexander.Deucher@amd.com>; Liu, Leo <Leo.Liu@amd.com>
> Subject: Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended,
> powergating is explicitly enabled
> 
> FWIW, it looks like all versions of VCN need the same fix.  There have been
> reports of suspend failing when VCN is in use on other newer APUs as well.
> 
> Alex
> 
> On Tue, Dec 14, 2021 at 12:59 AM Quan, Evan <Evan.Quan@amd.com> wrote:
> >
> > [AMD Official Use Only]
> >
> >
> >
> >
> >
> >
> >
> > From: Zhu, James <James.Zhu@amd.com>
> > Sent: Monday, December 13, 2021 9:39 PM
> > To: Gong, Curry <Curry.Gong@amd.com>; Zhu, James
> <James.Zhu@amd.com>;
> > amd-gfx@lists.freedesktop.org
> > Cc: Liu, Leo <Leo.Liu@amd.com>; Quan, Evan <Evan.Quan@amd.com>;
> > Deucher, Alexander <Alexander.Deucher@amd.com>
> > Subject: Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended,
> > powergating is explicitly enabled
> >
> >
> >
> > Hi Curry, Evan,
> >
> > It seems vcn1.0 power gate sequence are special.
> >
> > if the original solution is thread safe, then the following solution will be
> thread safe also.
> >
> > static int vcn_v1_0_hw_fini(void *handle) {
> >     struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> >
> >     cancel_delayed_work_sync(&adev->vcn.idle_work);
> >
> >     if ((adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) ||
> >         (adev->vcn.cur_state != AMD_PG_STATE_GATE &&
> >          RREG32_SOC15(VCN, 0, mmUVD_STATUS))) {
> > +        if (adev->pm.dpm_enabled)
> > +            amdgpu_dpm_enable_uvd(adev, false);
> > +        else
> > +            vcn_v1_0_set_powergating_state(adev, AMD_PG_STATE_GATE);
> >
> > [Quan, Evan] Considering adev->pm.dpm_enabled is always true,
> “vcn_v1_0_set_powergating_state(adev, AMD_PG_STATE_GATE); “ will
> become dead code.
> >
> > Also, the vcn_v1_0_hw_fini is also used in other scenarios(except the
> suspend/resume discussed here). So, it seems quite risky compared with
> Curry’s original patch.
> >
> > Before we can come up with a better idea, I would suggest to land Curry’s
> original patch as a quick fix.
> >
> >
> >
> > BR
> >
> > Evan
> >
> >
> >     }
> >
> > Best Regards!
> >
> > James
> >
> > On 2021-12-13 3:55 a.m., Gong, Curry wrote:
> >
> > [AMD Official Use Only]
> >
> >
> >
> > Hi James:
> >
> >
> >
> > With the following patch, an error will be reported when the driver is
> > loaded
> >
> > +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
> >
> > @@ -1202,6 +1204,9 @@ static int vcn_v1_0_stop(struct amdgpu_device
> > *adev)
> >
> >         else
> >
> >                 r = vcn_v1_0_stop_spg_mode(adev);
> >
> >
> >
> > c
> >
> >         return r;
> >
> > }
> >
> >
> >
> >
> >
> > $ dmesg
> >
> > [  363.181081] INFO: task kworker/3:2:223 blocked for more than 120
> seconds.
> >
> > [  363.181150]       Tainted: G           OE     5.11.0-41-generic #45~20.04.1-
> Ubuntu
> >
> > [  363.181208] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> >
> > [  363.181266] task:kworker/3:2     state:D stack:    0 pid:  223 ppid:     2
> flags:0x00004000
> >
> > [  363.181276] Workqueue: events vcn_v1_0_idle_work_handler [amdgpu]
> >
> > [  363.181612] Call Trace:
> >
> > [  363.181618]  __schedule+0x44c/0x8a0
> >
> > [  363.181627]  schedule+0x4f/0xc0
> >
> > [  363.181631]  schedule_preempt_disabled+0xe/0x10
> >
> > [  363.181636]  __mutex_lock.isra.0+0x183/0x4d0
> >
> > [  363.181643]  __mutex_lock_slowpath+0x13/0x20
> >
> > [  363.181648]  mutex_lock+0x32/0x40
> >
> > [  363.181652]  amdgpu_dpm_set_powergating_by_smu+0x9c/0x180
> [amdgpu]
> >
> > [  363.182055]  amdgpu_dpm_enable_uvd+0x38/0x110 [amdgpu]
> >
> > [  363.182454]  vcn_v1_0_set_powergating_state+0x2e7e/0x3cf0 [amdgpu]
> >
> > [  363.182776]  amdgpu_device_ip_set_powergating_state+0x6c/0xc0
> > [amdgpu]
> >
> > [  363.183028]  smu10_powergate_vcn+0x2a/0x80 [amdgpu]
> >
> > [  363.183361]  pp_set_powergating_by_smu+0xc5/0x2b0 [amdgpu]
> >
> > [  363.183699]  amdgpu_dpm_set_powergating_by_smu+0xb6/0x180
> [amdgpu]
> >
> > [  363.184040]  amdgpu_dpm_enable_uvd+0x38/0x110 [amdgpu]
> >
> > [  363.184391]  vcn_v1_0_idle_work_handler+0xe1/0x130 [amdgpu]
> >
> > [  363.184667]  process_one_work+0x220/0x3c0
> >
> > [  363.184674]  worker_thread+0x4d/0x3f0
> >
> > [  363.184677]  ? process_one_work+0x3c0/0x3c0
> >
> > [  363.184680]  kthread+0x12b/0x150
> >
> > [  363.184685]  ? set_kthread_struct+0x40/0x40
> >
> > [  363.184690]  ret_from_fork+0x22/0x30
> >
> > [  363.184699] INFO: task kworker/2:2:233 blocked for more than 120
> seconds.
> >
> > [  363.184739]       Tainted: G           OE     5.11.0-41-generic #45~20.04.1-
> Ubuntu
> >
> > [  363.184782] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> >
> > [  363.184825] task:kworker/2:2     state:D stack:    0 pid:  233 ppid:     2
> flags:0x00004000
> >
> > [  363.184831] Workqueue: events
> > amdgpu_device_delayed_init_work_handler [amdgpu]
> >
> > [  363.185085] Call Trace:
> >
> > [  363.185087]  __schedule+0x44c/0x8a0
> >
> > [  363.185092]  schedule+0x4f/0xc0
> >
> > [  363.185095]  schedule_timeout+0x202/0x290
> >
> > [  363.185099]  ? sched_clock_cpu+0x11/0xb0
> >
> > [  363.185105]  wait_for_completion+0x94/0x100
> >
> > [  363.185110]  __flush_work+0x12a/0x1e0
> >
> > [  363.185113]  ? worker_detach_from_pool+0xc0/0xc0
> >
> > [  363.185119]  __cancel_work_timer+0x10e/0x190
> >
> > [  363.185123]  cancel_delayed_work_sync+0x13/0x20
> >
> > [  363.185126]  vcn_v1_0_ring_begin_use+0x20/0x70 [amdgpu]
> >
> > [  363.185401]  amdgpu_ring_alloc+0x48/0x60 [amdgpu]
> >
> > [  363.185640]  amdgpu_ib_schedule+0x493/0x600 [amdgpu]
> >
> > [  363.185884]  amdgpu_job_submit_direct+0x3c/0xd0 [amdgpu]
> >
> > [  363.186186]  amdgpu_vcn_dec_send_msg+0x105/0x210 [amdgpu]
> >
> > [  363.186460]  amdgpu_vcn_dec_ring_test_ib+0x69/0x110 [amdgpu]
> >
> > [  363.186734]  amdgpu_ib_ring_tests+0xf5/0x160 [amdgpu]
> >
> > [  363.186978]  amdgpu_device_delayed_init_work_handler+0x15/0x30
> > [amdgpu]
> >
> > [  363.187206]  process_one_work+0x220/0x3c0
> >
> > [  363.187210]  worker_thread+0x4d/0x3f0
> >
> > [  363.187214]  ? process_one_work+0x3c0/0x3c0
> >
> > [  363.187217]  kthread+0x12b/0x150
> >
> > [  363.187221]  ? set_kthread_struct+0x40/0x40
> >
> > [  363.187226]  ret_from_fork+0x22/0x30
> >
> >
> >
> >
> >
> > BR
> >
> > Curry Gong
> >
> > From: Zhu, James <James.Zhu@amd.com>
> > Sent: Saturday, December 11, 2021 5:07 AM
> > To: Gong, Curry <Curry.Gong@amd.com>; amd-gfx@lists.freedesktop.org
> > Cc: Liu, Leo <Leo.Liu@amd.com>; Zhu, James <James.Zhu@amd.com>;
> Quan,
> > Evan <Evan.Quan@amd.com>; Deucher, Alexander
> > <Alexander.Deucher@amd.com>
> > Subject: Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended,
> > powergating is explicitly enabled
> >
> >
> >
> > On 2021-12-10 6:41 a.m., chen gong wrote:
> >
> > Play a video on the raven (or PCO, raven2) platform, and then do the
> > S3
> >
> > test. When resume, the following error will be reported:
> >
> >
> >
> > amdgpu 0000:02:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR*
> > ring
> >
> > vcn_dec test failed (-110)
> >
> > [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of
> IP
> > block
> >
> > <vcn_v1_0> failed -110
> >
> > amdgpu 0000:02:00.0: amdgpu: amdgpu_device_ip_resume failed (-110).
> >
> > PM: dpm_run_callback(): pci_pm_resume+0x0/0x90 returns -110
> >
> >
> >
> > [why]
> >
> > When playing the video: The power state flag of the vcn block is set
> > to
> >
> > POWER_STATE_ON.
> >
> >
> >
> > When doing suspend: There is no change to the power state flag of the
> >
> > vcn block, it is still POWER_STATE_ON.
> >
> >
> >
> > When doing resume: Need to open the power gate of the vcn block and
> > set
> >
> > the power state flag of the VCN block to POWER_STATE_ON.
> >
> > But at this time, the power state flag of the vcn block is already
> >
> > POWER_STATE_ON. The power status flag check in the "8f2cdef
> drm/amd/pm:
> >
> > avoid duplicate powergate/ungate setting" patch will return the
> >
> > amdgpu_dpm_set_powergating_by_smu function directly.
> >
> > As a result, the gate of the power was not opened, causing the
> >
> > subsequent ring test to fail.
> >
> >
> >
> > [how]
> >
> > In the suspend function of the vcn block, explicitly change the power
> >
> > state flag of the vcn block to POWER_STATE_OFF.
> >
> >
> >
> > Signed-off-by: chen gong <curry.gong@amd.com>
> >
> > ---
> >
> >  drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c | 7 +++++++
> >
> >  1 file changed, 7 insertions(+)
> >
> >
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
> > b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
> >
> > index d54d720..d73676b 100644
> >
> > --- a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
> >
> > +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
> >
> > @@ -246,6 +246,13 @@ static int vcn_v1_0_suspend(void *handle)
> >
> >  {
> >
> >   int r;
> >
> >   struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> >
> > + bool cancel_success;
> >
> > +
> >
> > + cancel_success = cancel_delayed_work_sync(&adev->vcn.idle_work);
> >
> > [JZ] Can you refer to vcn_v3_0_stop , and add
> amdgpu_dpm_enable_uvd(adev, false); to the end of vcn_v1_0_stop?
> >
> > See if it also can help.
> >
> >
> >
> > + if (cancel_success) {
> >
> > +        if (adev->pm.dpm_enabled)
> >
> > +                amdgpu_dpm_enable_uvd(adev, false);
> >
> > + }
> >
> >
> >
> >   r = vcn_v1_0_hw_fini(adev);
> >
> >   if (r)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled
  2021-12-17  1:43           ` Quan, Evan
@ 2021-12-17  2:02             ` Alex Deucher
  0 siblings, 0 replies; 14+ messages in thread
From: Alex Deucher @ 2021-12-17  2:02 UTC (permalink / raw)
  To: Quan, Evan; +Cc: Deucher, Alexander, Zhu, James, Liu, Leo, amd-gfx, Gong, Curry

On Thu, Dec 16, 2021 at 8:43 PM Quan, Evan <Evan.Quan@amd.com> wrote:
>
> [AMD Official Use Only]
>
> Hi Alex,
>
> Per our checking, vcn_v2 and vcn_v3 already have the dpm disablement(below) in their ->suspend routine which should prevent them from the issue here.
>         if (adev->pm.dpm_enabled)
>                 amdgpu_dpm_enable_uvd(adev, false);
> So, maybe it's a different story with those newer APUs. Can you forward the bug reports to me?

https://gitlab.freedesktop.org/drm/amd/-/issues/1712#note_1192187

Alex

>
> Thanks,
> Evan
> > -----Original Message-----
> > From: Alex Deucher <alexdeucher@gmail.com>
> > Sent: Thursday, December 16, 2021 11:38 PM
> > To: Quan, Evan <Evan.Quan@amd.com>
> > Cc: Zhu, James <James.Zhu@amd.com>; Gong, Curry
> > <Curry.Gong@amd.com>; amd-gfx@lists.freedesktop.org; Deucher,
> > Alexander <Alexander.Deucher@amd.com>; Liu, Leo <Leo.Liu@amd.com>
> > Subject: Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended,
> > powergating is explicitly enabled
> >
> > FWIW, it looks like all versions of VCN need the same fix.  There have been
> > reports of suspend failing when VCN is in use on other newer APUs as well.
> >
> > Alex
> >
> > On Tue, Dec 14, 2021 at 12:59 AM Quan, Evan <Evan.Quan@amd.com> wrote:
> > >
> > > [AMD Official Use Only]
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > From: Zhu, James <James.Zhu@amd.com>
> > > Sent: Monday, December 13, 2021 9:39 PM
> > > To: Gong, Curry <Curry.Gong@amd.com>; Zhu, James
> > <James.Zhu@amd.com>;
> > > amd-gfx@lists.freedesktop.org
> > > Cc: Liu, Leo <Leo.Liu@amd.com>; Quan, Evan <Evan.Quan@amd.com>;
> > > Deucher, Alexander <Alexander.Deucher@amd.com>
> > > Subject: Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended,
> > > powergating is explicitly enabled
> > >
> > >
> > >
> > > Hi Curry, Evan,
> > >
> > > It seems vcn1.0 power gate sequence are special.
> > >
> > > if the original solution is thread safe, then the following solution will be
> > thread safe also.
> > >
> > > static int vcn_v1_0_hw_fini(void *handle) {
> > >     struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> > >
> > >     cancel_delayed_work_sync(&adev->vcn.idle_work);
> > >
> > >     if ((adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) ||
> > >         (adev->vcn.cur_state != AMD_PG_STATE_GATE &&
> > >          RREG32_SOC15(VCN, 0, mmUVD_STATUS))) {
> > > +        if (adev->pm.dpm_enabled)
> > > +            amdgpu_dpm_enable_uvd(adev, false);
> > > +        else
> > > +            vcn_v1_0_set_powergating_state(adev, AMD_PG_STATE_GATE);
> > >
> > > [Quan, Evan] Considering adev->pm.dpm_enabled is always true,
> > “vcn_v1_0_set_powergating_state(adev, AMD_PG_STATE_GATE); “ will
> > become dead code.
> > >
> > > Also, the vcn_v1_0_hw_fini is also used in other scenarios(except the
> > suspend/resume discussed here). So, it seems quite risky compared with
> > Curry’s original patch.
> > >
> > > Before we can come up with a better idea, I would suggest to land Curry’s
> > original patch as a quick fix.
> > >
> > >
> > >
> > > BR
> > >
> > > Evan
> > >
> > >
> > >     }
> > >
> > > Best Regards!
> > >
> > > James
> > >
> > > On 2021-12-13 3:55 a.m., Gong, Curry wrote:
> > >
> > > [AMD Official Use Only]
> > >
> > >
> > >
> > > Hi James:
> > >
> > >
> > >
> > > With the following patch, an error will be reported when the driver is
> > > loaded
> > >
> > > +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
> > >
> > > @@ -1202,6 +1204,9 @@ static int vcn_v1_0_stop(struct amdgpu_device
> > > *adev)
> > >
> > >         else
> > >
> > >                 r = vcn_v1_0_stop_spg_mode(adev);
> > >
> > >
> > >
> > > c
> > >
> > >         return r;
> > >
> > > }
> > >
> > >
> > >
> > >
> > >
> > > $ dmesg
> > >
> > > [  363.181081] INFO: task kworker/3:2:223 blocked for more than 120
> > seconds.
> > >
> > > [  363.181150]       Tainted: G           OE     5.11.0-41-generic #45~20.04.1-
> > Ubuntu
> > >
> > > [  363.181208] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > disables this message.
> > >
> > > [  363.181266] task:kworker/3:2     state:D stack:    0 pid:  223 ppid:     2
> > flags:0x00004000
> > >
> > > [  363.181276] Workqueue: events vcn_v1_0_idle_work_handler [amdgpu]
> > >
> > > [  363.181612] Call Trace:
> > >
> > > [  363.181618]  __schedule+0x44c/0x8a0
> > >
> > > [  363.181627]  schedule+0x4f/0xc0
> > >
> > > [  363.181631]  schedule_preempt_disabled+0xe/0x10
> > >
> > > [  363.181636]  __mutex_lock.isra.0+0x183/0x4d0
> > >
> > > [  363.181643]  __mutex_lock_slowpath+0x13/0x20
> > >
> > > [  363.181648]  mutex_lock+0x32/0x40
> > >
> > > [  363.181652]  amdgpu_dpm_set_powergating_by_smu+0x9c/0x180
> > [amdgpu]
> > >
> > > [  363.182055]  amdgpu_dpm_enable_uvd+0x38/0x110 [amdgpu]
> > >
> > > [  363.182454]  vcn_v1_0_set_powergating_state+0x2e7e/0x3cf0 [amdgpu]
> > >
> > > [  363.182776]  amdgpu_device_ip_set_powergating_state+0x6c/0xc0
> > > [amdgpu]
> > >
> > > [  363.183028]  smu10_powergate_vcn+0x2a/0x80 [amdgpu]
> > >
> > > [  363.183361]  pp_set_powergating_by_smu+0xc5/0x2b0 [amdgpu]
> > >
> > > [  363.183699]  amdgpu_dpm_set_powergating_by_smu+0xb6/0x180
> > [amdgpu]
> > >
> > > [  363.184040]  amdgpu_dpm_enable_uvd+0x38/0x110 [amdgpu]
> > >
> > > [  363.184391]  vcn_v1_0_idle_work_handler+0xe1/0x130 [amdgpu]
> > >
> > > [  363.184667]  process_one_work+0x220/0x3c0
> > >
> > > [  363.184674]  worker_thread+0x4d/0x3f0
> > >
> > > [  363.184677]  ? process_one_work+0x3c0/0x3c0
> > >
> > > [  363.184680]  kthread+0x12b/0x150
> > >
> > > [  363.184685]  ? set_kthread_struct+0x40/0x40
> > >
> > > [  363.184690]  ret_from_fork+0x22/0x30
> > >
> > > [  363.184699] INFO: task kworker/2:2:233 blocked for more than 120
> > seconds.
> > >
> > > [  363.184739]       Tainted: G           OE     5.11.0-41-generic #45~20.04.1-
> > Ubuntu
> > >
> > > [  363.184782] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > disables this message.
> > >
> > > [  363.184825] task:kworker/2:2     state:D stack:    0 pid:  233 ppid:     2
> > flags:0x00004000
> > >
> > > [  363.184831] Workqueue: events
> > > amdgpu_device_delayed_init_work_handler [amdgpu]
> > >
> > > [  363.185085] Call Trace:
> > >
> > > [  363.185087]  __schedule+0x44c/0x8a0
> > >
> > > [  363.185092]  schedule+0x4f/0xc0
> > >
> > > [  363.185095]  schedule_timeout+0x202/0x290
> > >
> > > [  363.185099]  ? sched_clock_cpu+0x11/0xb0
> > >
> > > [  363.185105]  wait_for_completion+0x94/0x100
> > >
> > > [  363.185110]  __flush_work+0x12a/0x1e0
> > >
> > > [  363.185113]  ? worker_detach_from_pool+0xc0/0xc0
> > >
> > > [  363.185119]  __cancel_work_timer+0x10e/0x190
> > >
> > > [  363.185123]  cancel_delayed_work_sync+0x13/0x20
> > >
> > > [  363.185126]  vcn_v1_0_ring_begin_use+0x20/0x70 [amdgpu]
> > >
> > > [  363.185401]  amdgpu_ring_alloc+0x48/0x60 [amdgpu]
> > >
> > > [  363.185640]  amdgpu_ib_schedule+0x493/0x600 [amdgpu]
> > >
> > > [  363.185884]  amdgpu_job_submit_direct+0x3c/0xd0 [amdgpu]
> > >
> > > [  363.186186]  amdgpu_vcn_dec_send_msg+0x105/0x210 [amdgpu]
> > >
> > > [  363.186460]  amdgpu_vcn_dec_ring_test_ib+0x69/0x110 [amdgpu]
> > >
> > > [  363.186734]  amdgpu_ib_ring_tests+0xf5/0x160 [amdgpu]
> > >
> > > [  363.186978]  amdgpu_device_delayed_init_work_handler+0x15/0x30
> > > [amdgpu]
> > >
> > > [  363.187206]  process_one_work+0x220/0x3c0
> > >
> > > [  363.187210]  worker_thread+0x4d/0x3f0
> > >
> > > [  363.187214]  ? process_one_work+0x3c0/0x3c0
> > >
> > > [  363.187217]  kthread+0x12b/0x150
> > >
> > > [  363.187221]  ? set_kthread_struct+0x40/0x40
> > >
> > > [  363.187226]  ret_from_fork+0x22/0x30
> > >
> > >
> > >
> > >
> > >
> > > BR
> > >
> > > Curry Gong
> > >
> > > From: Zhu, James <James.Zhu@amd.com>
> > > Sent: Saturday, December 11, 2021 5:07 AM
> > > To: Gong, Curry <Curry.Gong@amd.com>; amd-gfx@lists.freedesktop.org
> > > Cc: Liu, Leo <Leo.Liu@amd.com>; Zhu, James <James.Zhu@amd.com>;
> > Quan,
> > > Evan <Evan.Quan@amd.com>; Deucher, Alexander
> > > <Alexander.Deucher@amd.com>
> > > Subject: Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended,
> > > powergating is explicitly enabled
> > >
> > >
> > >
> > > On 2021-12-10 6:41 a.m., chen gong wrote:
> > >
> > > Play a video on the raven (or PCO, raven2) platform, and then do the
> > > S3
> > >
> > > test. When resume, the following error will be reported:
> > >
> > >
> > >
> > > amdgpu 0000:02:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR*
> > > ring
> > >
> > > vcn_dec test failed (-110)
> > >
> > > [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of
> > IP
> > > block
> > >
> > > <vcn_v1_0> failed -110
> > >
> > > amdgpu 0000:02:00.0: amdgpu: amdgpu_device_ip_resume failed (-110).
> > >
> > > PM: dpm_run_callback(): pci_pm_resume+0x0/0x90 returns -110
> > >
> > >
> > >
> > > [why]
> > >
> > > When playing the video: The power state flag of the vcn block is set
> > > to
> > >
> > > POWER_STATE_ON.
> > >
> > >
> > >
> > > When doing suspend: There is no change to the power state flag of the
> > >
> > > vcn block, it is still POWER_STATE_ON.
> > >
> > >
> > >
> > > When doing resume: Need to open the power gate of the vcn block and
> > > set
> > >
> > > the power state flag of the VCN block to POWER_STATE_ON.
> > >
> > > But at this time, the power state flag of the vcn block is already
> > >
> > > POWER_STATE_ON. The power status flag check in the "8f2cdef
> > drm/amd/pm:
> > >
> > > avoid duplicate powergate/ungate setting" patch will return the
> > >
> > > amdgpu_dpm_set_powergating_by_smu function directly.
> > >
> > > As a result, the gate of the power was not opened, causing the
> > >
> > > subsequent ring test to fail.
> > >
> > >
> > >
> > > [how]
> > >
> > > In the suspend function of the vcn block, explicitly change the power
> > >
> > > state flag of the vcn block to POWER_STATE_OFF.
> > >
> > >
> > >
> > > Signed-off-by: chen gong <curry.gong@amd.com>
> > >
> > > ---
> > >
> > >  drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c | 7 +++++++
> > >
> > >  1 file changed, 7 insertions(+)
> > >
> > >
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
> > > b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
> > >
> > > index d54d720..d73676b 100644
> > >
> > > --- a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
> > >
> > > +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
> > >
> > > @@ -246,6 +246,13 @@ static int vcn_v1_0_suspend(void *handle)
> > >
> > >  {
> > >
> > >   int r;
> > >
> > >   struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> > >
> > > + bool cancel_success;
> > >
> > > +
> > >
> > > + cancel_success = cancel_delayed_work_sync(&adev->vcn.idle_work);
> > >
> > > [JZ] Can you refer to vcn_v3_0_stop , and add
> > amdgpu_dpm_enable_uvd(adev, false); to the end of vcn_v1_0_stop?
> > >
> > > See if it also can help.
> > >
> > >
> > >
> > > + if (cancel_success) {
> > >
> > > +        if (adev->pm.dpm_enabled)
> > >
> > > +                amdgpu_dpm_enable_uvd(adev, false);
> > >
> > > + }
> > >
> > >
> > >
> > >   r = vcn_v1_0_hw_fini(adev);
> > >
> > >   if (r)

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2021-12-17  2:03 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-10 11:41 [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled chen gong
2021-12-10 12:25 ` Lazar, Lijo
2021-12-10 16:19   ` Quan, Evan
2021-12-10 21:17     ` James Zhu
2021-12-10 16:06 ` Quan, Evan
2021-12-10 19:50   ` Alex Deucher
2021-12-10 21:07 ` James Zhu
2021-12-13  8:55   ` Gong, Curry
2021-12-13 10:09     ` Quan, Evan
2021-12-13 13:39     ` James Zhu
2021-12-14  5:59       ` Quan, Evan
2021-12-16 15:38         ` Alex Deucher
2021-12-17  1:43           ` Quan, Evan
2021-12-17  2:02             ` Alex Deucher

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.