* [PATCH] drm/scheduler: don't update last scheduled fence in TDR
@ 2018-04-25  8:39 Pixel Ding
       [not found] ` <1524645589-16162-1-git-send-email-Pixel.Ding-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Pixel Ding @ 2018-04-25  8:39 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Pixel Ding

The current sequence in the scheduler thread is:
1. update the last scheduled fence
2. job begins (added to the mirror list)
3. job finishes (removed from the mirror list)
4. back to 1

Since we update the last scheduled fence before the job joins the
mirror list, the jobs in the mirror list have already passed the last
scheduled fence. TDR only re-runs the jobs in the mirror list, so we
should not update the last scheduled fences there.

Signed-off-by: Pixel Ding <Pixel.Ding@amd.com>
---
 drivers/gpu/drm/scheduler/gpu_scheduler.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
index 088ff2b..1f1dd70 100644
--- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
+++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
@@ -575,9 +575,6 @@ void drm_sched_job_recovery(struct drm_gpu_scheduler *sched)
 		fence = sched->ops->run_job(s_job);
 		atomic_inc(&sched->hw_rq_count);
 
-		dma_fence_put(s_job->entity->last_scheduled);
-		s_job->entity->last_scheduled = dma_fence_get(&s_fence->finished);
-
 		if (fence) {
 			s_fence->parent = dma_fence_get(fence);
 			r = dma_fence_add_callback(fence, &s_fence->cb,
-- 
2.7.4
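
For reference, the sequence described above corresponds roughly to the
sketch below. This is illustrative only: the names follow the upstream drm
scheduler of that period, next_ready_job() is a made-up placeholder, and the
body is heavily abbreviated, so it is not the actual drm_sched_main():

    /* Sketch of the normal scheduler-thread ordering: last_scheduled is
     * updated before the job joins the ring mirror list, so every job
     * already on the mirror list has passed last_scheduled.
     */
    while (!kthread_should_stop()) {
            struct drm_sched_job *s_job = next_ready_job(sched);
            struct drm_sched_fence *s_fence = s_job->s_fence;
            struct dma_fence *fence;

            /* 1. update the last scheduled fence */
            dma_fence_put(s_job->entity->last_scheduled);
            s_job->entity->last_scheduled = dma_fence_get(&s_fence->finished);

            /* 2. job begin: add the job to the ring mirror list */
            drm_sched_job_begin(s_job);

            /* hand the job to the hardware ring */
            fence = sched->ops->run_job(s_job);

            /* 3. job finish: callbacks on 'fence' remove the job from the
             *    mirror list once the hardware signals; 4. back to 1.
             */
    }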

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


* Re: [PATCH] drm/scheduler: don't update last scheduled fence in TDR
       [not found] ` <1524645589-16162-1-git-send-email-Pixel.Ding-5C7GfCeVMHo@public.gmane.org>
@ 2018-04-26  3:47   ` Ding, Pixel
       [not found]     ` <6A08CC55-19DC-46E1-9741-24CE89C49763-5C7GfCeVMHo@public.gmane.org>
  2018-04-26  3:48   ` Liu, Monk
  1 sibling, 1 reply; 6+ messages in thread
From: Ding, Pixel @ 2018-04-26  3:47 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Liu, Monk

Hi Monk,

Please review it. Thanks.

— 
Sincerely Yours,
Pixel


On 2018/4/25, 4:39 PM, "Pixel Ding" <Pixel.Ding@amd.com> wrote:

    The current sequence in the scheduler thread is:
    1. update the last scheduled fence
    2. job begins (added to the mirror list)
    3. job finishes (removed from the mirror list)
    4. back to 1
    
    Since we update the last scheduled fence before the job joins the
    mirror list, the jobs in the mirror list have already passed the last
    scheduled fence. TDR only re-runs the jobs in the mirror list, so we
    should not update the last scheduled fences there.
    
    Signed-off-by: Pixel Ding <Pixel.Ding@amd.com>
    ---
     drivers/gpu/drm/scheduler/gpu_scheduler.c | 3 ---
     1 file changed, 3 deletions(-)
    
    diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
    index 088ff2b..1f1dd70 100644
    --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
    +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
    @@ -575,9 +575,6 @@ void drm_sched_job_recovery(struct drm_gpu_scheduler *sched)
     		fence = sched->ops->run_job(s_job);
     		atomic_inc(&sched->hw_rq_count);
     
    -		dma_fence_put(s_job->entity->last_scheduled);
    -		s_job->entity->last_scheduled = dma_fence_get(&s_fence->finished);
    -
     		if (fence) {
     			s_fence->parent = dma_fence_get(fence);
     			r = dma_fence_add_callback(fence, &s_fence->cb,
    -- 
    2.7.4
    
    

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


* RE: [PATCH] drm/scheduler: don't update last scheduled fence in TDR
       [not found] ` <1524645589-16162-1-git-send-email-Pixel.Ding-5C7GfCeVMHo@public.gmane.org>
  2018-04-26  3:47   ` Ding, Pixel
@ 2018-04-26  3:48   ` Liu, Monk
  1 sibling, 0 replies; 6+ messages in thread
From: Liu, Monk @ 2018-04-26  3:48 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Ding, Pixel

Reviewed-by : Monk Liu <monk.liu@amd.com>

-----Original Message-----
From: amd-gfx [mailto:amd-gfx-bounces@lists.freedesktop.org] On Behalf Of Pixel Ding
Sent: 2018-04-25 16:40
To: amd-gfx@lists.freedesktop.org
Cc: Ding, Pixel <Pixel.Ding@amd.com>
Subject: [PATCH] drm/scheduler: don't update last scheduled fence in TDR

The current sequence in the scheduler thread is:
1. update the last scheduled fence
2. job begins (added to the mirror list)
3. job finishes (removed from the mirror list)
4. back to 1

Since we update the last scheduled fence before the job joins the mirror list, the jobs in the mirror list have already passed the last scheduled fence. TDR only re-runs the jobs in the mirror list, so we should not update the last scheduled fences there.

Signed-off-by: Pixel Ding <Pixel.Ding@amd.com>
---
 drivers/gpu/drm/scheduler/gpu_scheduler.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
index 088ff2b..1f1dd70 100644
--- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
+++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
@@ -575,9 +575,6 @@ void drm_sched_job_recovery(struct drm_gpu_scheduler *sched)
 		fence = sched->ops->run_job(s_job);
 		atomic_inc(&sched->hw_rq_count);
 
-		dma_fence_put(s_job->entity->last_scheduled);
-		s_job->entity->last_scheduled = dma_fence_get(&s_fence->finished);
-
 		if (fence) {
 			s_fence->parent = dma_fence_get(fence);
 			r = dma_fence_add_callback(fence, &s_fence->cb,
--
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


* Re: [PATCH] drm/scheduler: don't update last scheduled fence in TDR
       [not found]     ` <6A08CC55-19DC-46E1-9741-24CE89C49763-5C7GfCeVMHo@public.gmane.org>
@ 2018-04-26  4:23       ` zhoucm1
       [not found]         ` <2c43a4bf-dc5a-69e2-1d9f-02a4019d9d27-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: zhoucm1 @ 2018-04-26  4:23 UTC (permalink / raw)
  To: Ding, Pixel, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Liu, Monk

NAK on it.

First of all, without this patch, does it cause any issue?

Second,

entity->last_scheduled represents the last submitted job.

This change breaks that meaning and doesn't work, e.g.:

1. The mirror list has jobA and jobB; assuming they belong to the same
entity, entity->last_scheduled is jobB->finished.

2. When you do recovery, jobA is re-submitted first; if last_scheduled
isn't updated, entity->last_scheduled is still jobB->finished.

3. If the process is killed, drm_sched_entity_cleanup will register
drm_sched_entity_kill_jobs_cb on jobB->finished, but jobB isn't
submitted at all.

So the change isn't necessary at all.
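
To make step 3 concrete, this is roughly what the entity cleanup path does
for jobs still queued when the process dies. It is a simplified sketch that
assumes the field and callback names of the upstream code at the time; it is
not a verbatim excerpt:

    /* Sketch: each job still on the entity's queue gets its finished fence
     * marked with an error, and the "kill jobs" callback is registered on
     * whatever entity->last_scheduled currently points to.  If
     * last_scheduled is not advanced while TDR re-submits the mirror list,
     * it may still be the finished fence of a job that was never re-run.
     */
    while ((job = to_drm_sched_job(spsc_queue_pop(&entity->job_queue)))) {
            struct drm_sched_fence *s_fence = job->s_fence;

            drm_sched_fence_scheduled(s_fence);
            dma_fence_set_error(&s_fence->finished, -ESRCH);

            r = dma_fence_add_callback(entity->last_scheduled, &job->finish_cb,
                                       drm_sched_entity_kill_jobs_cb);
            if (r == -ENOENT)
                    drm_sched_entity_kill_jobs_cb(NULL, &job->finish_cb);
    }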


Regards,

David Zhou


On 2018-04-26 11:47, Ding, Pixel wrote:
> Hi Monk,
>
> Please review it. Thanks.
>
> —
> Sincerely Yours,
> Pixel
>
>
> On 2018/4/25, 4:39 PM, "Pixel Ding" <Pixel.Ding@amd.com> wrote:
>
>      The current sequence in the scheduler thread is:
>      1. update the last scheduled fence
>      2. job begins (added to the mirror list)
>      3. job finishes (removed from the mirror list)
>      4. back to 1
>      
>      Since we update the last scheduled fence before the job joins the
>      mirror list, the jobs in the mirror list have already passed the last
>      scheduled fence. TDR only re-runs the jobs in the mirror list, so we
>      should not update the last scheduled fences there.
>      
>      Signed-off-by: Pixel Ding <Pixel.Ding@amd.com>
>      ---
>       drivers/gpu/drm/scheduler/gpu_scheduler.c | 3 ---
>       1 file changed, 3 deletions(-)
>      
>      diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
>      index 088ff2b..1f1dd70 100644
>      --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
>      +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
>      @@ -575,9 +575,6 @@ void drm_sched_job_recovery(struct drm_gpu_scheduler *sched)
>       		fence = sched->ops->run_job(s_job);
>       		atomic_inc(&sched->hw_rq_count);
>       
>      -		dma_fence_put(s_job->entity->last_scheduled);
>      -		s_job->entity->last_scheduled = dma_fence_get(&s_fence->finished);
>      -
>       		if (fence) {
>       			s_fence->parent = dma_fence_get(fence);
>       			r = dma_fence_add_callback(fence, &s_fence->cb,
>      --
>      2.7.4
>      
>      
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


* Re: [PATCH] drm/scheduler: don't update last scheduled fence in TDR
       [not found]         ` <2c43a4bf-dc5a-69e2-1d9f-02a4019d9d27-5C7GfCeVMHo@public.gmane.org>
@ 2018-05-03 12:09           ` Christian König
  2018-05-07  1:00           ` Ding, Pixel
  1 sibling, 0 replies; 6+ messages in thread
From: Christian König @ 2018-05-03 12:09 UTC (permalink / raw)
  To: zhoucm1, Ding, Pixel, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	Liu, Monk

That change is indeed fixing a problem. When drm_sched_job_recovery() is
called, s_job->entity should already be NULL.

And even when the process is killed, we should still either re-submit the
jobs or set the error.
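
Roughly along these lines in the recovery loop; this is only a sketch of the
idea, with a made-up entity_is_gone condition, not the actual fix:

    /* Sketch: when the owning process has gone away, propagate an error on
     * the job's finished fence instead of silently dropping the job, so
     * anything waiting on that fence still gets signalled; otherwise re-run
     * the job as usual.
     */
    if (entity_is_gone)
            dma_fence_set_error(&s_fence->finished, -ESRCH);
    else
            fence = sched->ops->run_job(s_job);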

BTW: There is another bug in that function: guilty_context needs to be
kept alive over the course of the loop.

Going to submit a patch for that.

Regards,
Christian.

On 2018-04-26 06:23, zhoucm1 wrote:
> NAK on it.
>
> First of all, without this patch, does it cause any issue?
>
> Second,
>
> entity->last_scheduled represents the last submitted job.
>
> This change breaks that meaning and doesn't work, e.g.:
>
> 1. The mirror list has jobA and jobB; assuming they belong to the same
> entity, entity->last_scheduled is jobB->finished.
>
> 2. When you do recovery, jobA is re-submitted first; if last_scheduled
> isn't updated, entity->last_scheduled is still jobB->finished.
>
> 3. If the process is killed, drm_sched_entity_cleanup will register
> drm_sched_entity_kill_jobs_cb on jobB->finished, but jobB isn't
> submitted at all.
>
> So the change isn't necessary at all.
>
>
> Regards,
>
> David Zhou
>
>
> On 2018-04-26 11:47, Ding, Pixel wrote:
>> Hi Monk,
>>
>> Please review it. Thanks.
>>
>> —
>> Sincerely Yours,
>> Pixel
>>
>>
>> On 2018/4/25, 4:39 PM, "Pixel Ding" <Pixel.Ding@amd.com> wrote:
>>
>>      The current sequence in the scheduler thread is:
>>      1. update the last scheduled fence
>>      2. job begins (added to the mirror list)
>>      3. job finishes (removed from the mirror list)
>>      4. back to 1
>>
>>      Since we update the last scheduled fence before the job joins the
>>      mirror list, the jobs in the mirror list have already passed the last
>>      scheduled fence. TDR only re-runs the jobs in the mirror list, so we
>>      should not update the last scheduled fences there.
>>
>>      Signed-off-by: Pixel Ding <Pixel.Ding@amd.com>
>>      ---
>>       drivers/gpu/drm/scheduler/gpu_scheduler.c | 3 ---
>>       1 file changed, 3 deletions(-)
>>
>>      diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
>>      index 088ff2b..1f1dd70 100644
>>      --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
>>      +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
>>      @@ -575,9 +575,6 @@ void drm_sched_job_recovery(struct drm_gpu_scheduler *sched)
>>      		fence = sched->ops->run_job(s_job);
>>      		atomic_inc(&sched->hw_rq_count);
>>      
>>      -		dma_fence_put(s_job->entity->last_scheduled);
>>      -		s_job->entity->last_scheduled = dma_fence_get(&s_fence->finished);
>>      -
>>      		if (fence) {
>>      			s_fence->parent = dma_fence_get(fence);
>>      			r = dma_fence_add_callback(fence, &s_fence->cb,
>>      --
>>      2.7.4
>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


* Re: [PATCH] drm/scheduler: don't update last scheduled fence in TDR
       [not found]         ` <2c43a4bf-dc5a-69e2-1d9f-02a4019d9d27-5C7GfCeVMHo@public.gmane.org>
  2018-05-03 12:09           ` Christian König
@ 2018-05-07  1:00           ` Ding, Pixel
  1 sibling, 0 replies; 6+ messages in thread
From: Ding, Pixel @ 2018-05-07  1:00 UTC (permalink / raw)
  To: Zhou, David(ChunMing),
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Liu, Monk

    3. If the process is killed, drm_sched_entity_cleanup will register
    drm_sched_entity_kill_jobs_cb on jobB->finished, but jobB isn't
    submitted at all.

I think that point is a misunderstanding. Killing the process doesn't stop jobB from being submitted.

— 
Sincerely Yours,
Pixel


On 2018/4/26, 12:23 PM, "amd-gfx on behalf of zhoucm1" <amd-gfx-bounces@lists.freedesktop.org on behalf of zhoucm1@amd.com> wrote:

    NAK on it.
    
    First of all, without this patch, does it cause any issue?
    
    Second,
    
    entity->last_scheduled represents the last submitted job.
    
    This change breaks that meaning and doesn't work, e.g.:
    
    1. The mirror list has jobA and jobB; assuming they belong to the same
    entity, entity->last_scheduled is jobB->finished.
    
    2. When you do recovery, jobA is re-submitted first; if last_scheduled
    isn't updated, entity->last_scheduled is still jobB->finished.
    
    3. If the process is killed, drm_sched_entity_cleanup will register
    drm_sched_entity_kill_jobs_cb on jobB->finished, but jobB isn't
    submitted at all.
    
    So the change isn't necessary at all.
    
    
    Regards,
    
    David Zhou
    
    
    On 2018-04-26 11:47, Ding, Pixel wrote:
    > Hi Monk,
    >
    > Please review it. Thanks.
    >
    > —
    > Sincerely Yours,
    > Pixel
    >
    >
    > On 2018/4/25, 4:39 PM, "Pixel Ding" <Pixel.Ding@amd.com> wrote:
    >
    >      The current sequence in the scheduler thread is:
    >      1. update the last scheduled fence
    >      2. job begins (added to the mirror list)
    >      3. job finishes (removed from the mirror list)
    >      4. back to 1
    >      
    >      Since we update the last scheduled fence before the job joins the
    >      mirror list, the jobs in the mirror list have already passed the last
    >      scheduled fence. TDR only re-runs the jobs in the mirror list, so we
    >      should not update the last scheduled fences there.
    >      
    >      Signed-off-by: Pixel Ding <Pixel.Ding@amd.com>
    >      ---
    >       drivers/gpu/drm/scheduler/gpu_scheduler.c | 3 ---
    >       1 file changed, 3 deletions(-)
    >      
    >      diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
    >      index 088ff2b..1f1dd70 100644
    >      --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
    >      +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
    >      @@ -575,9 +575,6 @@ void drm_sched_job_recovery(struct drm_gpu_scheduler *sched)
    >       		fence = sched->ops->run_job(s_job);
    >       		atomic_inc(&sched->hw_rq_count);
    >       
    >      -		dma_fence_put(s_job->entity->last_scheduled);
    >      -		s_job->entity->last_scheduled = dma_fence_get(&s_fence->finished);
    >      -
    >       		if (fence) {
    >       			s_fence->parent = dma_fence_get(fence);
    >       			r = dma_fence_add_callback(fence, &s_fence->cb,
    >      --
    >      2.7.4
    >      
    >      
    >
    > _______________________________________________
    > amd-gfx mailing list
    > amd-gfx@lists.freedesktop.org
    > https://lists.freedesktop.org/mailman/listinfo/amd-gfx
    
    _______________________________________________
    amd-gfx mailing list
    amd-gfx@lists.freedesktop.org
    https://lists.freedesktop.org/mailman/listinfo/amd-gfx
    

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


end of thread, other threads:[~2018-05-07  1:00 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-25  8:39 [PATCH] drm/scheduler: don't update last scheduled fence in TDR Pixel Ding
     [not found] ` <1524645589-16162-1-git-send-email-Pixel.Ding-5C7GfCeVMHo@public.gmane.org>
2018-04-26  3:47   ` Ding, Pixel
     [not found]     ` <6A08CC55-19DC-46E1-9741-24CE89C49763-5C7GfCeVMHo@public.gmane.org>
2018-04-26  4:23       ` zhoucm1
     [not found]         ` <2c43a4bf-dc5a-69e2-1d9f-02a4019d9d27-5C7GfCeVMHo@public.gmane.org>
2018-05-03 12:09           ` Christian König
2018-05-07  1:00           ` Ding, Pixel
2018-04-26  3:48   ` Liu, Monk
