* [PATCH] drm/amdgpu/vcn1: add cancel_delayed_work_sync before power gate
@ 2021-05-18 15:11 James Zhu
  2021-05-18 15:23 ` Christian König
  0 siblings, 1 reply; 9+ messages in thread
From: James Zhu @ 2021-05-18 15:11 UTC (permalink / raw)
  To: amd-gfx; +Cc: jamesz

Add cancel_delayed_work_sync before setting the power gating state
to avoid a race condition when power gating.

Signed-off-by: James Zhu <James.Zhu@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
index 0c1beef..6c5c083 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
@@ -230,10 +230,27 @@ static int vcn_v1_0_hw_init(void *handle)
 static int vcn_v1_0_hw_fini(void *handle)
 {
 	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+	struct amdgpu_ring *ring;
+	int i;
+
+	ring = &adev->vcn.inst->ring_dec;
+	ring->sched.ready = false;
+
+	for (i = 0; i < adev->vcn.num_enc_rings; ++i) {
+		ring = &adev->vcn.inst->ring_enc[i];
+		ring->sched.ready = false;
+	}
+
+	ring = &adev->jpeg.inst->ring_dec;
+	ring->sched.ready = false;
+
+	cancel_delayed_work_sync(&adev->vcn.idle_work);
 
 	if ((adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) ||
-		RREG32_SOC15(VCN, 0, mmUVD_STATUS))
+		(adev->vcn.cur_state != AMD_PG_STATE_GATE &&
+		 RREG32_SOC15(VCN, 0, mmUVD_STATUS))) {
 		vcn_v1_0_set_powergating_state(adev, AMD_PG_STATE_GATE);
+	}
 
 	return 0;
 }
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
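
A minimal sketch of the ordering the patch description relies on, written as a single-threaded userspace model rather than kernel code. Every name in it (`model_vcn`, `model_timer_fire`, and so on) is hypothetical, and the idle handler only counts runs that happen after gating; the real `cancel_delayed_work_sync()` additionally waits for a handler that is already running, which this model does not attempt to show.

```c
#include <assert.h>
#include <stdbool.h>

/* All names below are hypothetical stand-ins, not amdgpu or workqueue APIs. */

enum power_state { POWER_ON, POWER_GATED };

struct model_vcn;

/* Simplified delayed work item: a callback plus a "still queued" flag. */
struct model_delayed_work {
	bool pending;
	void (*fn)(struct model_vcn *vcn);
};

struct model_vcn {
	enum power_state power;
	struct model_delayed_work idle_work;
	int late_accesses;	/* handler runs that happened after gating */
};

/* Idle handler: in this model it only records whether it ran too late. */
static void model_idle_handler(struct model_vcn *vcn)
{
	if (vcn->power == POWER_GATED)
		vcn->late_accesses++;
}

void model_vcn_init(struct model_vcn *vcn)
{
	vcn->power = POWER_ON;
	vcn->idle_work.pending = true;	/* idle work was queued earlier */
	vcn->idle_work.fn = model_idle_handler;
	vcn->late_accesses = 0;
}

/* Models the effect of a synchronous cancel: nothing stays queued. */
void model_cancel_delayed_work_sync(struct model_delayed_work *work)
{
	work->pending = false;
}

/* Models the timer expiring at some later point. */
void model_timer_fire(struct model_vcn *vcn)
{
	if (vcn->idle_work.pending) {
		vcn->idle_work.pending = false;
		vcn->idle_work.fn(vcn);
	}
}

/* Teardown, with or without cancelling the pending idle work first. */
void model_hw_fini(struct model_vcn *vcn, bool cancel_first)
{
	if (cancel_first)
		model_cancel_delayed_work_sync(&vcn->idle_work);
	vcn->power = POWER_GATED;
}
```

Calling `model_hw_fini(&vcn, false)` followed by `model_timer_fire(&vcn)` leaves `late_accesses` at 1, while passing `true` keeps it at 0, which is the effect the added cancel call is meant to have.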

* Re: [PATCH] drm/amdgpu/vcn1: add cancel_delayed_work_sync before power gate
  2021-05-18 15:11 [PATCH] drm/amdgpu/vcn1: add cancel_delayed_work_sync before power gate James Zhu
@ 2021-05-18 15:23 ` Christian König
  2021-05-18 15:45   ` James Zhu
  0 siblings, 1 reply; 9+ messages in thread
From: Christian König @ 2021-05-18 15:23 UTC (permalink / raw)
  To: James Zhu, amd-gfx; +Cc: jamesz

On 18.05.21 at 17:11, James Zhu wrote:
> Add cancel_delayed_work_sync before setting the power gating state
> to avoid a race condition when power gating.
>
> Signed-off-by: James Zhu <James.Zhu@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c | 19 ++++++++++++++++++-
>   1 file changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
> index 0c1beef..6c5c083 100644
> --- a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
> @@ -230,10 +230,27 @@ static int vcn_v1_0_hw_init(void *handle)
>   static int vcn_v1_0_hw_fini(void *handle)
>   {
>   	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> +	struct amdgpu_ring *ring;
> +	int i;
> +
> +	ring = &adev->vcn.inst->ring_dec;
> +	ring->sched.ready = false;
> +
> +	for (i = 0; i < adev->vcn.num_enc_rings; ++i) {
> +		ring = &adev->vcn.inst->ring_enc[i];
> +		ring->sched.ready = false;
> +	}
> +
> +	ring = &adev->jpeg.inst->ring_dec;
> +	ring->sched.ready = false;

Thinking more about that, this is a really big NAK. The scheduler threads
must stay ready during a reset.

This is controlled by the upper layer and shouldn't be messed with in 
the hardware specific backend at all.

I've removed all of those a couple of years ago.

Regards,
Christian.

> +
> +	cancel_delayed_work_sync(&adev->vcn.idle_work);
>   
>   	if ((adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) ||
> -		RREG32_SOC15(VCN, 0, mmUVD_STATUS))
> +		(adev->vcn.cur_state != AMD_PG_STATE_GATE &&
> +		 RREG32_SOC15(VCN, 0, mmUVD_STATUS))) {
>   		vcn_v1_0_set_powergating_state(adev, AMD_PG_STATE_GATE);
> +	}
>   
>   	return 0;
>   }

* Re: [PATCH] drm/amdgpu/vcn1: add cancel_delayed_work_sync before power gate
  2021-05-18 15:23 ` Christian König
@ 2021-05-18 15:45   ` James Zhu
  2021-05-18 15:54     ` Christian König
  0 siblings, 1 reply; 9+ messages in thread
From: James Zhu @ 2021-05-18 15:45 UTC (permalink / raw)
  To: Christian König, James Zhu, amd-gfx


On 2021-05-18 11:23 a.m., Christian König wrote:
> On 18.05.21 at 17:11, James Zhu wrote:
>> Add cancel_delayed_work_sync before setting the power gating state
>> to avoid a race condition when power gating.
>>
>> Signed-off-by: James Zhu <James.Zhu@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c | 19 ++++++++++++++++++-
>>   1 file changed, 18 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c 
>> b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
>> index 0c1beef..6c5c083 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
>> @@ -230,10 +230,27 @@ static int vcn_v1_0_hw_init(void *handle)
>>   static int vcn_v1_0_hw_fini(void *handle)
>>   {
>>       struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>> +    struct amdgpu_ring *ring;
>> +    int i;
>> +
>> +    ring = &adev->vcn.inst->ring_dec;
>> +    ring->sched.ready = false;
>> +
>> +    for (i = 0; i < adev->vcn.num_enc_rings; ++i) {
>> +        ring = &adev->vcn.inst->ring_enc[i];
>> +        ring->sched.ready = false;
>> +    }
>> +
>> +    ring = &adev->jpeg.inst->ring_dec;
>> +    ring->sched.ready = false;
>
> Thinking more about that, this is a really big NAK. The scheduler
> threads must stay ready during a reset.
>
> This is controlled by the upper layer and shouldn't be messed with in 
> the hardware specific backend at all.

[JZ] I ported this from the current vcn3 hw_fini. I just want to make sure
that no new jobs will be scheduled after the suspend process starts.
It may be redundant, since the scheduler may already be suspended. I can remove
those if you are sure there is no side effect.

> I've removed all of those a couple of years ago.
>
> Regards,
> Christian.
>
>> +
>> +    cancel_delayed_work_sync(&adev->vcn.idle_work);
>>         if ((adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) ||
>> -        RREG32_SOC15(VCN, 0, mmUVD_STATUS))
>> +        (adev->vcn.cur_state != AMD_PG_STATE_GATE &&
>> +         RREG32_SOC15(VCN, 0, mmUVD_STATUS))) {
>>           vcn_v1_0_set_powergating_state(adev, AMD_PG_STATE_GATE);
>> +    }
>>         return 0;
>>   }
>

* Re: [PATCH] drm/amdgpu/vcn1: add cancel_delayed_work_sync before power gate
  2021-05-18 15:45   ` James Zhu
@ 2021-05-18 15:54     ` Christian König
  2021-05-18 15:59       ` James Zhu
  0 siblings, 1 reply; 9+ messages in thread
From: Christian König @ 2021-05-18 15:54 UTC (permalink / raw)
  To: James Zhu, James Zhu, amd-gfx



On 18.05.21 at 17:45, James Zhu wrote:
>
> On 2021-05-18 11:23 a.m., Christian König wrote:
>> On 18.05.21 at 17:11, James Zhu wrote:
>>> Add cancel_delayed_work_sync before setting the power gating state
>>> to avoid a race condition when power gating.
>>>
>>> Signed-off-by: James Zhu <James.Zhu@amd.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c | 19 ++++++++++++++++++-
>>>   1 file changed, 18 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c 
>>> b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
>>> index 0c1beef..6c5c083 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
>>> @@ -230,10 +230,27 @@ static int vcn_v1_0_hw_init(void *handle)
>>>   static int vcn_v1_0_hw_fini(void *handle)
>>>   {
>>>       struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>>> +    struct amdgpu_ring *ring;
>>> +    int i;
>>> +
>>> +    ring = &adev->vcn.inst->ring_dec;
>>> +    ring->sched.ready = false;
>>> +
>>> +    for (i = 0; i < adev->vcn.num_enc_rings; ++i) {
>>> +        ring = &adev->vcn.inst->ring_enc[i];
>>> +        ring->sched.ready = false;
>>> +    }
>>> +
>>> +    ring = &adev->jpeg.inst->ring_dec;
>>> +    ring->sched.ready = false;
>>
>> Thinking more about that, this is a really big NAK. The scheduler
>> threads must stay ready during a reset.
>>
>> This is controlled by the upper layer and shouldn't be messed with in 
>> the hardware specific backend at all.
>
> [JZ] I ported this from the current vcn3 hw_fini. I just want to make sure
> that no new jobs will be scheduled after the suspend process starts.
> It may be redundant, since the scheduler may already be suspended. I can
> remove those if you are sure there is no side effect.

Well, we *must* remove those. This flag controls whether the hardware engine
can be used for command submission and is only set to true/false
during initial driver load.

If you change it to false during hw_fini the engine won't work correctly 
any more after GPU reset or resume.

If you have any idea how to document that fact then please speak up, 
cause we had this problem a couple of times now.

Just send out a patch fixing various other occasions of that.

Regards,
Christian.

>
>> I've removed all of those a couple of years ago.
>>
>> Regards,
>> Christian.
>>
>>> +
>>> +    cancel_delayed_work_sync(&adev->vcn.idle_work);
>>>         if ((adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) ||
>>> -        RREG32_SOC15(VCN, 0, mmUVD_STATUS))
>>> +        (adev->vcn.cur_state != AMD_PG_STATE_GATE &&
>>> +         RREG32_SOC15(VCN, 0, mmUVD_STATUS))) {
>>>           vcn_v1_0_set_powergating_state(adev, AMD_PG_STATE_GATE);
>>> +    }
>>>         return 0;
>>>   }
>>

* Re: [PATCH] drm/amdgpu/vcn1: add cancel_delayed_work_sync before power gate
  2021-05-18 15:54     ` Christian König
@ 2021-05-18 15:59       ` James Zhu
  2021-05-18 16:36         ` Christian König
  0 siblings, 1 reply; 9+ messages in thread
From: James Zhu @ 2021-05-18 15:59 UTC (permalink / raw)
  To: Christian König, James Zhu, amd-gfx


On 2021-05-18 11:54 a.m., Christian König wrote:
>
>
> On 18.05.21 at 17:45, James Zhu wrote:
>>
>> On 2021-05-18 11:23 a.m., Christian König wrote:
>>> On 18.05.21 at 17:11, James Zhu wrote:
>>>> Add cancel_delayed_work_sync before setting the power gating state
>>>> to avoid a race condition when power gating.
>>>>
>>>> Signed-off-by: James Zhu <James.Zhu@amd.com>
>>>> ---
>>>>   drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c | 19 ++++++++++++++++++-
>>>>   1 file changed, 18 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c 
>>>> b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
>>>> index 0c1beef..6c5c083 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
>>>> @@ -230,10 +230,27 @@ static int vcn_v1_0_hw_init(void *handle)
>>>>   static int vcn_v1_0_hw_fini(void *handle)
>>>>   {
>>>>       struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>>>> +    struct amdgpu_ring *ring;
>>>> +    int i;
>>>> +
>>>> +    ring = &adev->vcn.inst->ring_dec;
>>>> +    ring->sched.ready = false;
>>>> +
>>>> +    for (i = 0; i < adev->vcn.num_enc_rings; ++i) {
>>>> +        ring = &adev->vcn.inst->ring_enc[i];
>>>> +        ring->sched.ready = false;
>>>> +    }
>>>> +
>>>> +    ring = &adev->jpeg.inst->ring_dec;
>>>> +    ring->sched.ready = false;
>>>
>>> Thinking more about that, this is a really big NAK. The scheduler
>>> threads must stay ready during a reset.
>>>
>>> This is controlled by the upper layer and shouldn't be messed with 
>>> in the hardware specific backend at all.
>>
>> [JZ] I ported this from the current vcn3 hw_fini. I just want to make sure
>> that no new jobs will be scheduled after the suspend process starts.
>> It may be redundant, since the scheduler may already be suspended. I can
>> remove those if you are sure there is no side effect.
>
> Well, we *must* remove those. This flag controls whether the hardware
> engine can be used for command submission and is only set to
> true/false during initial driver load.
>
> If you change it to false during hw_fini the engine won't work 
> correctly any more after GPU reset or resume.
[JZ] If I recall correctly, hw_init will be called every time after
GPU reset or suspend/resume.
>
> If you have any idea how to document that fact then please speak up, 
> cause we had this problem a couple of times now.
>
> Just send out a patch fixing various other occasions of that.
>
> Regards,
> Christian.
>
>>
>>> I've removed all of those a couple of years ago.
>>>
>>> Regards,
>>> Christian.
>>>
>>>> +
>>>> +    cancel_delayed_work_sync(&adev->vcn.idle_work);
>>>>         if ((adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) ||
>>>> -        RREG32_SOC15(VCN, 0, mmUVD_STATUS))
>>>> +        (adev->vcn.cur_state != AMD_PG_STATE_GATE &&
>>>> +         RREG32_SOC15(VCN, 0, mmUVD_STATUS))) {
>>>>           vcn_v1_0_set_powergating_state(adev, AMD_PG_STATE_GATE);
>>>> +    }
>>>>         return 0;
>>>>   }
>>>
>

* Re: [PATCH] drm/amdgpu/vcn1: add cancel_delayed_work_sync before power gate
  2021-05-18 15:59       ` James Zhu
@ 2021-05-18 16:36         ` Christian König
  2021-05-18 17:04           ` James Zhu
  0 siblings, 1 reply; 9+ messages in thread
From: Christian König @ 2021-05-18 16:36 UTC (permalink / raw)
  To: James Zhu, James Zhu, amd-gfx

On 18.05.21 at 17:59, James Zhu wrote:
>
> On 2021-05-18 11:54 a.m., Christian König wrote:
>>
>>
>> On 18.05.21 at 17:45, James Zhu wrote:
>>>
>>> On 2021-05-18 11:23 a.m., Christian König wrote:
>>>> On 18.05.21 at 17:11, James Zhu wrote:
>>>>> Add cancel_delayed_work_sync before setting the power gating state
>>>>> to avoid a race condition when power gating.
>>>>>
>>>>> Signed-off-by: James Zhu <James.Zhu@amd.com>
>>>>> ---
>>>>>   drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c | 19 ++++++++++++++++++-
>>>>>   1 file changed, 18 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c 
>>>>> b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
>>>>> index 0c1beef..6c5c083 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
>>>>> @@ -230,10 +230,27 @@ static int vcn_v1_0_hw_init(void *handle)
>>>>>   static int vcn_v1_0_hw_fini(void *handle)
>>>>>   {
>>>>>       struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>>>>> +    struct amdgpu_ring *ring;
>>>>> +    int i;
>>>>> +
>>>>> +    ring = &adev->vcn.inst->ring_dec;
>>>>> +    ring->sched.ready = false;
>>>>> +
>>>>> +    for (i = 0; i < adev->vcn.num_enc_rings; ++i) {
>>>>> +        ring = &adev->vcn.inst->ring_enc[i];
>>>>> +        ring->sched.ready = false;
>>>>> +    }
>>>>> +
>>>>> +    ring = &adev->jpeg.inst->ring_dec;
>>>>> +    ring->sched.ready = false;
>>>>
>>>> Thinking more about that, this is a really big NAK. The scheduler
>>>> threads must stay ready during a reset.
>>>>
>>>> This is controlled by the upper layer and shouldn't be messed with 
>>>> in the hardware specific backend at all.
>>>
>>> [JZ] I ported this from the current vcn3 hw_fini. I just want to make sure
>>> that no new jobs will be scheduled after the suspend process starts.
>>> It may be redundant, since the scheduler may already be suspended. I can
>>> remove those if you are sure there is no side effect.
>>
>> Well, we *must* remove those. This flag controls whether the hardware
>> engine can be used for command submission and is only set to
>> true/false during initial driver load.
>>
>> If you change it to false during hw_fini the engine won't work 
>> correctly any more after GPU reset or resume.
> [JZ] If I recall correctly, hw_init will be called every time
> after GPU reset or suspend/resume.

Yes that's correct.

But before that and during GPU reset the ready flag is then false for a 
short period of time which would result in userspace applications 
crashing when they try to submit something.

The flag essentially says that userspace can submit jobs to the 
scheduler. Processing of those jobs is of course only started after the 
hardware is re-initialized, but pushing jobs down the pipe is still 
perfectly valid in that situation.

Christian.

>>
>> If you have any idea how to document that fact then please speak up, 
>> cause we had this problem a couple of times now.
>>
>> Just send out a patch fixing various other occasions of that.
>>
>> Regards,
>> Christian.
>>
>>>
>>>> I've removed all of those a couple of years ago.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>> +
>>>>> + cancel_delayed_work_sync(&adev->vcn.idle_work);
>>>>>         if ((adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) ||
>>>>> -        RREG32_SOC15(VCN, 0, mmUVD_STATUS))
>>>>> +        (adev->vcn.cur_state != AMD_PG_STATE_GATE &&
>>>>> +         RREG32_SOC15(VCN, 0, mmUVD_STATUS))) {
>>>>>           vcn_v1_0_set_powergating_state(adev, AMD_PG_STATE_GATE);
>>>>> +    }
>>>>>         return 0;
>>>>>   }
>>>>
>>
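
The behaviour described in this reply can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: the struct, the function names, the queue depth, and the `-ENODEV` error code are all hypothetical and chosen for the model, and `model_hw_init()` is assumed to mark the ring ready again after bring-up. The point is the window between teardown and re-initialization.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model; none of these names exist in amdgpu or the DRM scheduler. */
#define MODEL_QUEUE_DEPTH 8

struct model_sched {
	bool ready;		/* may userspace push jobs at all? */
	bool hw_running;	/* is the hardware currently initialized? */
	int queued;		/* jobs accepted but not yet processed */
	int done;		/* jobs processed by the hardware */
};

/* Initial driver load: marks the ring usable for submission. */
void model_driver_load(struct model_sched *s)
{
	s->ready = true;
	s->hw_running = true;
	s->queued = 0;
	s->done = 0;
}

/* Submission path: rejects the job when the ring is not marked ready. */
int model_submit(struct model_sched *s)
{
	if (!s->ready)
		return -ENODEV;	/* error code picked arbitrarily for the model */
	if (s->queued == MODEL_QUEUE_DEPTH)
		return -ENOSPC;
	s->queued++;
	return 0;
}

/* Hardware teardown for reset or suspend; optionally clears `ready`. */
void model_hw_fini(struct model_sched *s, bool clear_ready)
{
	s->hw_running = false;
	if (clear_ready)
		s->ready = false;
}

/* Hardware bring-up after reset or resume (assumed to restore `ready`). */
void model_hw_init(struct model_sched *s)
{
	s->hw_running = true;
	s->ready = true;
}

/* Drain queued jobs; only makes progress while the hardware is running. */
void model_run_jobs(struct model_sched *s)
{
	if (!s->hw_running)
		return;
	s->done += s->queued;
	s->queued = 0;
}
```

With `clear_ready` set, a `model_submit()` issued between `model_hw_fini()` and `model_hw_init()` fails; with it unset, the job is accepted, waits in the queue, and is processed once the hardware is back.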

* Re: [PATCH] drm/amdgpu/vcn1: add cancel_delayed_work_sync before power gate
  2021-05-18 16:36         ` Christian König
@ 2021-05-18 17:04           ` James Zhu
  2021-05-18 18:06             ` Christian König
  0 siblings, 1 reply; 9+ messages in thread
From: James Zhu @ 2021-05-18 17:04 UTC (permalink / raw)
  To: Christian König, James Zhu, amd-gfx


On 2021-05-18 12:36 p.m., Christian König wrote:
> On 18.05.21 at 17:59, James Zhu wrote:
>>
>> On 2021-05-18 11:54 a.m., Christian König wrote:
>>>
>>>
>>> On 18.05.21 at 17:45, James Zhu wrote:
>>>>
>>>> On 2021-05-18 11:23 a.m., Christian König wrote:
>>>>> On 18.05.21 at 17:11, James Zhu wrote:
>>>>>> Add cancel_delayed_work_sync before setting the power gating state
>>>>>> to avoid a race condition when power gating.
>>>>>>
>>>>>> Signed-off-by: James Zhu <James.Zhu@amd.com>
>>>>>> ---
>>>>>>   drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c | 19 ++++++++++++++++++-
>>>>>>   1 file changed, 18 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c 
>>>>>> b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
>>>>>> index 0c1beef..6c5c083 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
>>>>>> @@ -230,10 +230,27 @@ static int vcn_v1_0_hw_init(void *handle)
>>>>>>   static int vcn_v1_0_hw_fini(void *handle)
>>>>>>   {
>>>>>>       struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>>>>>> +    struct amdgpu_ring *ring;
>>>>>> +    int i;
>>>>>> +
>>>>>> +    ring = &adev->vcn.inst->ring_dec;
>>>>>> +    ring->sched.ready = false;
>>>>>> +
>>>>>> +    for (i = 0; i < adev->vcn.num_enc_rings; ++i) {
>>>>>> +        ring = &adev->vcn.inst->ring_enc[i];
>>>>>> +        ring->sched.ready = false;
>>>>>> +    }
>>>>>> +
>>>>>> +    ring = &adev->jpeg.inst->ring_dec;
>>>>>> +    ring->sched.ready = false;
>>>>>
>>>>> Thinking more about that, this is a really big NAK. The scheduler
>>>>> threads must stay ready during a reset.
>>>>>
>>>>> This is controlled by the upper layer and shouldn't be messed with 
>>>>> in the hardware specific backend at all.
>>>>
>>>> [JZ] I ported this from the current vcn3 hw_fini. I just want to make sure
>>>> that no new jobs will be scheduled after the suspend process starts.
>>>> It may be redundant, since the scheduler may already be suspended. I can
>>>> remove those if you are sure there is no side effect.
>>>
>>> Well, we *must* remove those. This flag controls whether the hardware
>>> engine can be used for command submission and is only set to
>>> true/false during initial driver load.
>>>
>>> If you change it to false during hw_fini the engine won't work 
>>> correctly any more after GPU reset or resume.
>> [JZ] If I recall correctly, hw_init will be called every time
>> after GPU reset or suspend/resume.
>
> Yes that's correct.
>
> But before that and during GPU reset the ready flag is then false for 
> a short period of time which would result in userspace applications 
> crashing when they try to submit something.
[JZ] Applications should handle a failed submission without crashing.
Maybe the driver should return -EAGAIN to ask the application to submit
the job later when the GPU is under reset/suspend-resume.
>
> The flag essentially says that userspace can submit jobs to the 
> scheduler. Processing of those jobs is of course only started after 
> the hardware is re-initialized, but pushing jobs down the pipe is 
> still perfectly valid in that situation.
[JZ] I am wondering whether it is required to stop scheduling new jobs
before the BO is saved.
>
> Christian.
>
>>>
>>> If you have any idea how to document that fact then please speak up, 
>>> cause we had this problem a couple of times now.
>>>
>>> Just send out a patch fixing various other occasions of that.
>>>
>>> Regards,
>>> Christian.
>>>
>>>>
>>>>> I've removed all of those a couple of years ago.
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>>> +
>>>>>> + cancel_delayed_work_sync(&adev->vcn.idle_work);
>>>>>>         if ((adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) ||
>>>>>> -        RREG32_SOC15(VCN, 0, mmUVD_STATUS))
>>>>>> +        (adev->vcn.cur_state != AMD_PG_STATE_GATE &&
>>>>>> +         RREG32_SOC15(VCN, 0, mmUVD_STATUS))) {
>>>>>>           vcn_v1_0_set_powergating_state(adev, AMD_PG_STATE_GATE);
>>>>>> +    }
>>>>>>         return 0;
>>>>>>   }
>>>>>
>>>
>

* Re: [PATCH] drm/amdgpu/vcn1: add cancel_delayed_work_sync before power gate
  2021-05-18 17:04           ` James Zhu
@ 2021-05-18 18:06             ` Christian König
  2021-05-18 18:18               ` James Zhu
  0 siblings, 1 reply; 9+ messages in thread
From: Christian König @ 2021-05-18 18:06 UTC (permalink / raw)
  To: James Zhu, James Zhu, amd-gfx

On 18.05.21 at 19:04, James Zhu wrote:
>
> On 2021-05-18 12:36 p.m., Christian König wrote:
>> On 18.05.21 at 17:59, James Zhu wrote:
>>>
>>> On 2021-05-18 11:54 a.m., Christian König wrote:
>>>>
>>>>
>>>> On 18.05.21 at 17:45, James Zhu wrote:
>>>>>
>>>>> On 2021-05-18 11:23 a.m., Christian König wrote:
>>>>>> On 18.05.21 at 17:11, James Zhu wrote:
>>>>>>> Add cancel_delayed_work_sync before setting the power gating state
>>>>>>> to avoid a race condition when power gating.
>>>>>>>
>>>>>>> Signed-off-by: James Zhu <James.Zhu@amd.com>
>>>>>>> ---
>>>>>>>   drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c | 19 ++++++++++++++++++-
>>>>>>>   1 file changed, 18 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c 
>>>>>>> b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
>>>>>>> index 0c1beef..6c5c083 100644
>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
>>>>>>> @@ -230,10 +230,27 @@ static int vcn_v1_0_hw_init(void *handle)
>>>>>>>   static int vcn_v1_0_hw_fini(void *handle)
>>>>>>>   {
>>>>>>>       struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>>>>>>> +    struct amdgpu_ring *ring;
>>>>>>> +    int i;
>>>>>>> +
>>>>>>> +    ring = &adev->vcn.inst->ring_dec;
>>>>>>> +    ring->sched.ready = false;
>>>>>>> +
>>>>>>> +    for (i = 0; i < adev->vcn.num_enc_rings; ++i) {
>>>>>>> +        ring = &adev->vcn.inst->ring_enc[i];
>>>>>>> +        ring->sched.ready = false;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    ring = &adev->jpeg.inst->ring_dec;
>>>>>>> +    ring->sched.ready = false;
>>>>>>
>>>>>> Thinking more about that, this is a really big NAK. The scheduler
>>>>>> threads must stay ready during a reset.
>>>>>>
>>>>>> This is controlled by the upper layer and shouldn't be messed 
>>>>>> with in the hardware specific backend at all.
>>>>>
>>>>> [JZ] I ported this from the current vcn3 hw_fini. I just want to make sure
>>>>> that no new jobs will be scheduled after the suspend process starts.
>>>>> It may be redundant, since the scheduler may already be suspended. I can
>>>>> remove those if you are sure there is no side effect.
>>>>
>>>> Well, we *must* remove those. This flag controls whether the hardware
>>>> engine can be used for command submission and is only set to
>>>> true/false during initial driver load.
>>>>
>>>> If you change it to false during hw_fini the engine won't work 
>>>> correctly any more after GPU reset or resume.
>>> [JZ] If I recall correctly, hw_init will be called every time
>>> after GPU reset or suspend/resume.
>>
>> Yes that's correct.
>>
>> But before that and during GPU reset the ready flag is then false for 
>> a short period of time which would result in userspace applications 
>> crashing when they try to submit something.
> [JZ] Applications should handle a failed submission without crashing.
> Maybe the driver should return -EAGAIN to ask the application to submit
> the job later when the GPU is under reset/suspend-resume.

No, by design the driver should always be able to accept jobs, except for the
case when the hardware is unrecoverably broken.

This is how we have implemented userspace already.

>> The flag essentially says that userspace can submit jobs to the 
>> scheduler. Processing of those jobs is of course only started after 
>> the hardware is re-initialized, but pushing jobs down the pipe is 
>> still perfectly valid in that situation.
> [JZ] I am wondering whether it is required to stop scheduling new jobs
> before the BO is saved.

Yes, that is guaranteed. The hardware backend doesn't need to worry 
about this in hw_fini() or otherwise we have a bug.

Christian.

>>
>> Christian.
>>
>>>>
>>>> If you have any idea how to document that fact then please speak 
>>>> up, cause we had this problem a couple of times now.
>>>>
>>>> Just send out a patch fixing various other occasions of that.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>>
>>>>>> I've removed all of those a couple of years ago.
>>>>>>
>>>>>> Regards,
>>>>>> Christian.
>>>>>>
>>>>>>> +
>>>>>>> + cancel_delayed_work_sync(&adev->vcn.idle_work);
>>>>>>>         if ((adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) ||
>>>>>>> -        RREG32_SOC15(VCN, 0, mmUVD_STATUS))
>>>>>>> +        (adev->vcn.cur_state != AMD_PG_STATE_GATE &&
>>>>>>> +         RREG32_SOC15(VCN, 0, mmUVD_STATUS))) {
>>>>>>>           vcn_v1_0_set_powergating_state(adev, AMD_PG_STATE_GATE);
>>>>>>> +    }
>>>>>>>         return 0;
>>>>>>>   }
>>>>>>
>>>>
>>

* Re: [PATCH] drm/amdgpu/vcn1: add cancel_delayed_work_sync before power gate
  2021-05-18 18:06             ` Christian König
@ 2021-05-18 18:18               ` James Zhu
  0 siblings, 0 replies; 9+ messages in thread
From: James Zhu @ 2021-05-18 18:18 UTC (permalink / raw)
  To: Christian König, James Zhu, amd-gfx


On 2021-05-18 2:06 p.m., Christian König wrote:
> On 18.05.21 at 19:04, James Zhu wrote:
>>
>> On 2021-05-18 12:36 p.m., Christian König wrote:
>>> On 18.05.21 at 17:59, James Zhu wrote:
>>>>
>>>> On 2021-05-18 11:54 a.m., Christian König wrote:
>>>>>
>>>>>
>>>>> On 18.05.21 at 17:45, James Zhu wrote:
>>>>>>
>>>>>> On 2021-05-18 11:23 a.m., Christian König wrote:
>>>>>>> On 18.05.21 at 17:11, James Zhu wrote:
>>>>>>>> Add cancel_delayed_work_sync before setting the power gating state
>>>>>>>> to avoid a race condition when power gating.
>>>>>>>>
>>>>>>>> Signed-off-by: James Zhu <James.Zhu@amd.com>
>>>>>>>> ---
>>>>>>>>   drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c | 19 ++++++++++++++++++-
>>>>>>>>   1 file changed, 18 insertions(+), 1 deletion(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c 
>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
>>>>>>>> index 0c1beef..6c5c083 100644
>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
>>>>>>>> @@ -230,10 +230,27 @@ static int vcn_v1_0_hw_init(void *handle)
>>>>>>>>   static int vcn_v1_0_hw_fini(void *handle)
>>>>>>>>   {
>>>>>>>>       struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>>>>>>>> +    struct amdgpu_ring *ring;
>>>>>>>> +    int i;
>>>>>>>> +
>>>>>>>> +    ring = &adev->vcn.inst->ring_dec;
>>>>>>>> +    ring->sched.ready = false;
>>>>>>>> +
>>>>>>>> +    for (i = 0; i < adev->vcn.num_enc_rings; ++i) {
>>>>>>>> +        ring = &adev->vcn.inst->ring_enc[i];
>>>>>>>> +        ring->sched.ready = false;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    ring = &adev->jpeg.inst->ring_dec;
>>>>>>>> +    ring->sched.ready = false;
>>>>>>>
>>>>>>> Thinking more about that, this is a really big NAK. The scheduler
>>>>>>> threads must stay ready during a reset.
>>>>>>>
>>>>>>> This is controlled by the upper layer and shouldn't be messed 
>>>>>>> with in the hardware specific backend at all.
>>>>>>
>>>>>> [JZ] I ported this from the current vcn3 hw_fini. I just want to make sure
>>>>>> that no new jobs will be scheduled after the suspend process starts.
>>>>>> It may be redundant, since the scheduler may already be suspended. I can
>>>>>> remove those if you are sure there is no side effect.
>>>>>
>>>>> Well, we *must* remove those. This flag controls whether the hardware
>>>>> engine can be used for command submission and is only set to
>>>>> true/false during initial driver load.
>>>>>
>>>>> If you change it to false during hw_fini the engine won't work 
>>>>> correctly any more after GPU reset or resume.
>>>> [JZ] If I recall correctly, hw_init will be called every time
>>>> after GPU reset or suspend/resume.
>>>
>>> Yes that's correct.
>>>
>>> But before that and during GPU reset the ready flag is then false 
>>> for a short period of time which would result in userspace 
>>> applications crashing when they try to submit something.
>> [JZ] Applications should handle a failed submission without crashing.
>> Maybe the driver should return -EAGAIN to ask the application to submit
>> the job later when the GPU is under reset/suspend-resume.
>
> No, by design the driver should always be able to accept jobs, except for
> the case when the hardware is unrecoverably broken.
>
> This is how we have implemented userspace already.
[JZ] I will submit new patches without ring->sched.ready in hw_fini after
testing. Thanks!
>
>>> The flag essentially says that userspace can submit jobs to the 
>>> scheduler. Processing of those jobs is of course only started after 
>>> the hardware is re-initialized, but pushing jobs down the pipe is 
>>> still perfectly valid in that situation.
>> [JZ] I am wondering whether it is required to stop scheduling new jobs
>> before the BO is saved.
>
> Yes, that is guaranteed. The hardware backend doesn't need to worry 
> about this in hw_fini() or otherwise we have a bug.
>
> Christian.
>
>>>
>>> Christian.
>>>
>>>>>
>>>>> If you have any idea how to document that fact then please speak 
>>>>> up, cause we had this problem a couple of times now.
>>>>>
>>>>> Just send out a patch fixing various other occasions of that.
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>>>
>>>>>>> I've removed all of those a couple of years ago.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Christian.
>>>>>>>
>>>>>>>> +
>>>>>>>> + cancel_delayed_work_sync(&adev->vcn.idle_work);
>>>>>>>>         if ((adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) ||
>>>>>>>> -        RREG32_SOC15(VCN, 0, mmUVD_STATUS))
>>>>>>>> +        (adev->vcn.cur_state != AMD_PG_STATE_GATE &&
>>>>>>>> +         RREG32_SOC15(VCN, 0, mmUVD_STATUS))) {
>>>>>>>>           vcn_v1_0_set_powergating_state(adev, AMD_PG_STATE_GATE);
>>>>>>>> +    }
>>>>>>>>         return 0;
>>>>>>>>   }
>>>>>>>
>>>>>
>>>
>
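
For reference, a rough userspace mock of how hw_fini might look once the ring->sched.ready writes are dropped, as agreed above. This is not the respun patch: the `mock_` types and helpers are hypothetical stand-ins for the amdgpu structures, the register read, and the workqueue call, and the gating helper is assumed to clear the status value.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the amdgpu types and helpers used in the patch. */
enum mock_pg_state { MOCK_PG_STATE_UNGATE, MOCK_PG_STATE_GATE };
#define MOCK_PG_SUPPORT_VCN_DPG 0x1u

struct mock_device {
	unsigned int pg_flags;
	enum mock_pg_state cur_state;
	unsigned int uvd_status;	/* stands in for the mmUVD_STATUS register */
	bool idle_work_pending;
	bool ring_ready;		/* stands in for ring->sched.ready */
	int gate_calls;
};

/* Stands in for cancel_delayed_work_sync(&adev->vcn.idle_work). */
static void mock_cancel_idle_work(struct mock_device *dev)
{
	dev->idle_work_pending = false;
}

/* Stands in for vcn_v1_0_set_powergating_state(). */
static void mock_set_powergating_state(struct mock_device *dev,
				       enum mock_pg_state state)
{
	dev->cur_state = state;
	if (state == MOCK_PG_STATE_GATE)
		dev->uvd_status = 0;	/* assumption: gating clears the status */
	dev->gate_calls++;
}

int mock_vcn_hw_fini(struct mock_device *dev)
{
	/* Stop the idle work first so it cannot run during or after gating. */
	mock_cancel_idle_work(dev);

	/* Same condition as the patch: skip gating if already gated. */
	if ((dev->pg_flags & MOCK_PG_SUPPORT_VCN_DPG) ||
	    (dev->cur_state != MOCK_PG_STATE_GATE && dev->uvd_status))
		mock_set_powergating_state(dev, MOCK_PG_STATE_GATE);

	/* dev->ring_ready is deliberately left untouched. */
	return 0;
}
```

Calling `mock_vcn_hw_fini()` twice on a device without the DPG flag gates it only once, and `ring_ready` keeps whatever value it had on entry.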