Yes, I think so as well. Andrey can you push this? Christian. Am 15.09.21 um 00:59 schrieb Grodzovsky, Andrey: > AFAIK this one is independent. > > Christian, can you confirm ? > > Andrey > ------------------------------------------------------------------------ > *From:* amd-gfx on behalf of > Alex Deucher > *Sent:* 14 September 2021 15:33 > *To:* Christian König > *Cc:* Liu, Monk ; amd-gfx list > ; Maling list - DRI developers > > *Subject:* Re: [PATCH 1/2] drm/sched: fix the bug of time out > calculation(v4) > Was this fix independent of the other discussions?  Should this be > applied to drm-misc? > > Alex > > On Wed, Sep 1, 2021 at 4:42 PM Alex Deucher wrote: > > > > On Wed, Sep 1, 2021 at 2:50 AM Christian König > > wrote: > > > > > > Am 01.09.21 um 02:46 schrieb Monk Liu: > > > > issue: > > > > in cleanup_job the cancle_delayed_work will cancel a TO timer > > > > even the its corresponding job is still running. > > > > > > > > fix: > > > > do not cancel the timer in cleanup_job, instead do the cancelling > > > > only when the heading job is signaled, and if there is a "next" job > > > > we start_timeout again. > > > > > > > > v2: > > > > further cleanup the logic, and do the TDR timer cancelling if > the signaled job > > > > is the last one in its scheduler. > > > > > > > > v3: > > > > change the issue description > > > > remove the cancel_delayed_work in the begining of the cleanup_job > > > > recover the implement of drm_sched_job_begin. > > > > > > > > v4: > > > > remove the kthread_should_park() checking in cleanup_job routine, > > > > we should cleanup the signaled job asap > > > > > > > > TODO: > > > > 1)introduce pause/resume scheduler in job_timeout to serial the > handling > > > > of scheduler and job_timeout. > > > > 2)drop the bad job's del and insert in scheduler due to above > serialization > > > > (no race issue anymore with the serialization) > > > > > > > > tested-by: jingwen > > > > Signed-off-by: Monk Liu > > > > > > Reviewed-by: Christian König > > > > > > > Are you planning to push this to drm-misc? > > > > Alex > > > > > > > > --- > > > >   drivers/gpu/drm/scheduler/sched_main.c | 26 > +++++++++----------------- > > > >   1 file changed, 9 insertions(+), 17 deletions(-) > > > > > > > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c > b/drivers/gpu/drm/scheduler/sched_main.c > > > > index a2a9536..3e0bbc7 100644 > > > > --- a/drivers/gpu/drm/scheduler/sched_main.c > > > > +++ b/drivers/gpu/drm/scheduler/sched_main.c > > > > @@ -676,15 +676,6 @@ drm_sched_get_cleanup_job(struct > drm_gpu_scheduler *sched) > > > >   { > > > >       struct drm_sched_job *job, *next; > > > > > > > > -     /* > > > > -      * Don't destroy jobs while the timeout worker is running  > OR thread > > > > -      * is being parked and hence assumed to not touch pending_list > > > > -      */ > > > > -     if ((sched->timeout != MAX_SCHEDULE_TIMEOUT && > > > > - !cancel_delayed_work(&sched->work_tdr)) || > > > > -         kthread_should_park()) > > > > -             return NULL; > > > > - > > > > spin_lock(&sched->job_list_lock); > > > > > > > >       job = list_first_entry_or_null(&sched->pending_list, > > > > @@ -693,17 +684,21 @@ drm_sched_get_cleanup_job(struct > drm_gpu_scheduler *sched) > > > >       if (job && dma_fence_is_signaled(&job->s_fence->finished)) { > > > >               /* remove job from pending_list */ > > > > list_del_init(&job->list); > > > > + > > > > +             /* cancel this job's TO timer */ > > > > + cancel_delayed_work(&sched->work_tdr); > > > >               /* make the scheduled timestamp more accurate */ > > > >               next = list_first_entry_or_null(&sched->pending_list, > > > > typeof(*next), list); > > > > -             if (next) > > > > + > > > > +             if (next) { > > > > next->s_fence->scheduled.timestamp = > > > > job->s_fence->finished.timestamp; > > > > - > > > > +                     /* start TO timer for next job */ > > > > + drm_sched_start_timeout(sched); > > > > +             } > > > >       } else { > > > >               job = NULL; > > > > -             /* queue timeout for next job */ > > > > - drm_sched_start_timeout(sched); > > > >       } > > > > > > > > spin_unlock(&sched->job_list_lock); > > > > @@ -791,11 +786,8 @@ static int drm_sched_main(void *param) > > > > (entity = drm_sched_select_entity(sched))) || > > > > kthread_should_stop()); > > > > > > > > -             if (cleanup_job) { > > > > +             if (cleanup_job) > > > > sched->ops->free_job(cleanup_job); > > > > -                     /* queue timeout for next job */ > > > > - drm_sched_start_timeout(sched); > > > > -             } > > > > > > > >               if (!entity) > > > >                       continue; > > >