* [PATCH] Revert "drm/scheduler: Avoid accessing freed bad job."
@ 2021-08-17  4:28 Jingwen Chen
  2021-08-17  6:35 ` Liu, Monk
  2021-08-17 13:37 ` Andrey Grodzovsky
  0 siblings, 2 replies; 6+ messages in thread
From: Jingwen Chen @ 2021-08-17  4:28 UTC (permalink / raw)
  To: amd-gfx; +Cc: monk.liu, christian.koenig, Andrey.Grodzovsky, Jingwen Chen

[Why]
For a bailing job, this commit will delete it from the pending list, so
the bailing job will never have a chance to be resubmitted, even in
advance TDR mode.

[How]
After embedding the hw_fence into amdgpu_job is done, the race condition
that this commit tries to work around is completely solved. So revert
this commit.
This reverts commit 135517d3565b48f4def3b1b82008bc17eb5d1c90.

Signed-off-by: Jingwen Chen <Jingwen.Chen2@amd.com>
---
 drivers/gpu/drm/scheduler/sched_main.c | 27 --------------------------
 1 file changed, 27 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index a2a953693b45..31d1176d939f 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -317,21 +317,10 @@ static void drm_sched_job_timedout(struct work_struct *work)
 	enum drm_gpu_sched_stat status = DRM_GPU_SCHED_STAT_NOMINAL;

 	sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
-
-	/* Protects against concurrent deletion in drm_sched_get_cleanup_job */
-	spin_lock(&sched->job_list_lock);
 	job = list_first_entry_or_null(&sched->pending_list,
 				       struct drm_sched_job, list);

 	if (job) {
-		/*
-		 * Remove the bad job so it cannot be freed by concurrent
-		 * drm_sched_cleanup_jobs. It will be reinserted back after sched->thread
-		 * is parked at which point it's safe.
-		 */
-		list_del_init(&job->list);
-		spin_unlock(&sched->job_list_lock);
-
 		status = job->sched->ops->timedout_job(job);

 		/*
@@ -342,8 +331,6 @@ static void drm_sched_job_timedout(struct work_struct *work)
 			job->sched->ops->free_job(job);
 			sched->free_guilty = false;
 		}
-	} else {
-		spin_unlock(&sched->job_list_lock);
 	}

 	if (status != DRM_GPU_SCHED_STAT_ENODEV) {
@@ -392,20 +379,6 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)

 	kthread_park(sched->thread);

-	/*
-	 * Reinsert back the bad job here - now it's safe as
-	 * drm_sched_get_cleanup_job cannot race against us and release the
-	 * bad job at this point - we parked (waited for) any in progress
-	 * (earlier) cleanups and drm_sched_get_cleanup_job will not be called
-	 * now until the scheduler thread is unparked.
-	 */
-	if (bad && bad->sched == sched)
-		/*
-		 * Add at the head of the queue to reflect it was the earliest
-		 * job extracted.
-		 */
-		list_add(&bad->list, &sched->pending_list);
-
 	/*
 	 * Iterate the job list from later to earlier one and either deactive
 	 * their HW callbacks or remove them from pending list if they already
--
2.25.1
* RE: [PATCH] Revert "drm/scheduler: Avoid accessing freed bad job."
  2021-08-17  4:28 [PATCH] Revert "drm/scheduler: Avoid accessing freed bad job." Jingwen Chen
@ 2021-08-17  6:35 ` Liu, Monk
  2021-08-17 13:37 ` Andrey Grodzovsky
  1 sibling, 0 replies; 6+ messages in thread
From: Liu, Monk @ 2021-08-17  6:35 UTC (permalink / raw)
  To: Chen, JingWen, amd-gfx
  Cc: Koenig, Christian, Grodzovsky, Andrey, Chen, JingWen

[AMD Official Use Only]

Reviewed-by: Monk Liu <monk.liu@amd.com>

Thanks

------------------------------------------
Monk Liu | Cloud-GPU Core team
------------------------------------------

-----Original Message-----
From: Jingwen Chen <Jingwen.Chen2@amd.com>
Sent: Tuesday, August 17, 2021 12:28 PM
To: amd-gfx@lists.freedesktop.org
Cc: Liu, Monk <Monk.Liu@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>; Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>; Chen, JingWen <JingWen.Chen2@amd.com>
Subject: [PATCH] Revert "drm/scheduler: Avoid accessing freed bad job."

[Why]
For a bailing job, this commit will delete it from the pending list, so
the bailing job will never have a chance to be resubmitted, even in
advance TDR mode.

[How]
After embedding the hw_fence into amdgpu_job is done, the race condition
that this commit tries to work around is completely solved. So revert
this commit.
This reverts commit 135517d3565b48f4def3b1b82008bc17eb5d1c90.
Signed-off-by: Jingwen Chen <Jingwen.Chen2@amd.com>
---
 drivers/gpu/drm/scheduler/sched_main.c | 27 --------------------------
 1 file changed, 27 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index a2a953693b45..31d1176d939f 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -317,21 +317,10 @@ static void drm_sched_job_timedout(struct work_struct *work)
 	enum drm_gpu_sched_stat status = DRM_GPU_SCHED_STAT_NOMINAL;

 	sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
-
-	/* Protects against concurrent deletion in drm_sched_get_cleanup_job */
-	spin_lock(&sched->job_list_lock);
 	job = list_first_entry_or_null(&sched->pending_list,
 				       struct drm_sched_job, list);

 	if (job) {
-		/*
-		 * Remove the bad job so it cannot be freed by concurrent
-		 * drm_sched_cleanup_jobs. It will be reinserted back after sched->thread
-		 * is parked at which point it's safe.
-		 */
-		list_del_init(&job->list);
-		spin_unlock(&sched->job_list_lock);
-
 		status = job->sched->ops->timedout_job(job);

 		/*
@@ -342,8 +331,6 @@ static void drm_sched_job_timedout(struct work_struct *work)
 			job->sched->ops->free_job(job);
 			sched->free_guilty = false;
 		}
-	} else {
-		spin_unlock(&sched->job_list_lock);
 	}

 	if (status != DRM_GPU_SCHED_STAT_ENODEV) {
@@ -392,20 +379,6 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad)

 	kthread_park(sched->thread);

-	/*
-	 * Reinsert back the bad job here - now it's safe as
-	 * drm_sched_get_cleanup_job cannot race against us and release the
-	 * bad job at this point - we parked (waited for) any in progress
-	 * (earlier) cleanups and drm_sched_get_cleanup_job will not be called
-	 * now until the scheduler thread is unparked.
-	 */
-	if (bad && bad->sched == sched)
-		/*
-		 * Add at the head of the queue to reflect it was the earliest
-		 * job extracted.
-		 */
-		list_add(&bad->list, &sched->pending_list);
-
 	/*
 	 * Iterate the job list from later to earlier one and either deactive
 	 * their HW callbacks or remove them from pending list if they already
--
2.25.1
* Re: [PATCH] Revert "drm/scheduler: Avoid accessing freed bad job."
  2021-08-17  4:28 [PATCH] Revert "drm/scheduler: Avoid accessing freed bad job." Jingwen Chen
  2021-08-17  6:35 ` Liu, Monk
@ 2021-08-17 13:37 ` Andrey Grodzovsky
  2021-08-17 13:43   ` Christian König
  1 sibling, 1 reply; 6+ messages in thread
From: Andrey Grodzovsky @ 2021-08-17 13:37 UTC (permalink / raw)
  To: Jingwen Chen, amd-gfx; +Cc: monk.liu, christian.koenig

On 2021-08-17 12:28 a.m., Jingwen Chen wrote:
> [Why]
> For a bailing job, this commit will delete it from the pending list, so
> the bailing job will never have a chance to be resubmitted, even in
> advance TDR mode.
>
> [How]
> After embedding the hw_fence into amdgpu_job is done, the race condition
> that this commit tries to work around is completely solved. So revert
> this commit.
> This reverts commit 135517d3565b48f4def3b1b82008bc17eb5d1c90.


Can you elaborate please how this solves the race?

As far as I can see, with this patch reverted, drm_sched_job_timedout
gets a pointer to the next job to process in the timed-out handler, and
immediately afterwards this job may actually finish and its fence be
signaled; this in turn triggers drm_sched_get_cleanup_job, which fetches
this job and returns to drm_sched_main, which in turn calls the free_job
callback -> ... -> amdgpu_fence_free, freeing the job from the HW
dma_fence release callback. After that you proceed with a freed job in
the timed-out handler.

If you could take the HW fence reference in drm_sched_job_timedout
before starting processing, then yes, I think it would work.
Andrey > > Signed-off-by: Jingwen Chen <Jingwen.Chen2@amd.com> > --- > drivers/gpu/drm/scheduler/sched_main.c | 27 -------------------------- > 1 file changed, 27 deletions(-) > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c > index a2a953693b45..31d1176d939f 100644 > --- a/drivers/gpu/drm/scheduler/sched_main.c > +++ b/drivers/gpu/drm/scheduler/sched_main.c > @@ -317,21 +317,10 @@ static void drm_sched_job_timedout(struct work_struct *work) > enum drm_gpu_sched_stat status = DRM_GPU_SCHED_STAT_NOMINAL; > > sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work); > - > - /* Protects against concurrent deletion in drm_sched_get_cleanup_job */ > - spin_lock(&sched->job_list_lock); > job = list_first_entry_or_null(&sched->pending_list, > struct drm_sched_job, list); > > if (job) { > - /* > - * Remove the bad job so it cannot be freed by concurrent > - * drm_sched_cleanup_jobs. It will be reinserted back after sched->thread > - * is parked at which point it's safe. > - */ > - list_del_init(&job->list); > - spin_unlock(&sched->job_list_lock); > - > status = job->sched->ops->timedout_job(job); > > /* > @@ -342,8 +331,6 @@ static void drm_sched_job_timedout(struct work_struct *work) > job->sched->ops->free_job(job); > sched->free_guilty = false; > } > - } else { > - spin_unlock(&sched->job_list_lock); > } > > if (status != DRM_GPU_SCHED_STAT_ENODEV) { > @@ -392,20 +379,6 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job *bad) > > kthread_park(sched->thread); > > - /* > - * Reinsert back the bad job here - now it's safe as > - * drm_sched_get_cleanup_job cannot race against us and release the > - * bad job at this point - we parked (waited for) any in progress > - * (earlier) cleanups and drm_sched_get_cleanup_job will not be called > - * now until the scheduler thread is unparked. 
> - */ > - if (bad && bad->sched == sched) > - /* > - * Add at the head of the queue to reflect it was the earliest > - * job extracted. > - */ > - list_add(&bad->list, &sched->pending_list); > - > /* > * Iterate the job list from later to earlier one and either deactive > * their HW callbacks or remove them from pending list if they already ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] Revert "drm/scheduler: Avoid accessing freed bad job."
  2021-08-17 13:37 ` Andrey Grodzovsky
@ 2021-08-17 13:43   ` Christian König
  2021-08-18  8:08     ` Jingwen Chen
  0 siblings, 1 reply; 6+ messages in thread
From: Christian König @ 2021-08-17 13:43 UTC (permalink / raw)
  To: Andrey Grodzovsky, Jingwen Chen, amd-gfx; +Cc: monk.liu

Am 17.08.21 um 15:37 schrieb Andrey Grodzovsky:
> On 2021-08-17 12:28 a.m., Jingwen Chen wrote:
>> [Why]
>> For a bailing job, this commit will delete it from the pending list, so
>> the bailing job will never have a chance to be resubmitted, even in
>> advance TDR mode.
>>
>> [How]
>> After embedding the hw_fence into amdgpu_job is done, the race condition
>> that this commit tries to work around is completely solved. So revert
>> this commit.
>> This reverts commit 135517d3565b48f4def3b1b82008bc17eb5d1c90.
>
>
> Can you elaborate please how this solves the race?
> As far as I can see, with this patch reverted, drm_sched_job_timedout
> gets a pointer to the next job to process in the timed-out handler, and
> immediately afterwards this job may actually finish and its fence be
> signaled; this in turn triggers drm_sched_get_cleanup_job, which fetches
> this job and returns to drm_sched_main, which in turn calls the free_job
> callback -> ... -> amdgpu_fence_free, freeing the job from the HW
> dma_fence release callback. After that you proceed with a freed job in
> the timed-out handler.
>
> If you could take the HW fence reference in drm_sched_job_timedout
> before starting processing, then yes, I think it would work.

Yes, precisely that's what I had in mind as well, and it seems to be
missing from this patch.

Regards,
Christian.
> > Andrey > > >> >> Signed-off-by: Jingwen Chen <Jingwen.Chen2@amd.com> >> --- >> drivers/gpu/drm/scheduler/sched_main.c | 27 -------------------------- >> 1 file changed, 27 deletions(-) >> >> diff --git a/drivers/gpu/drm/scheduler/sched_main.c >> b/drivers/gpu/drm/scheduler/sched_main.c >> index a2a953693b45..31d1176d939f 100644 >> --- a/drivers/gpu/drm/scheduler/sched_main.c >> +++ b/drivers/gpu/drm/scheduler/sched_main.c >> @@ -317,21 +317,10 @@ static void drm_sched_job_timedout(struct >> work_struct *work) >> enum drm_gpu_sched_stat status = DRM_GPU_SCHED_STAT_NOMINAL; >> sched = container_of(work, struct drm_gpu_scheduler, >> work_tdr.work); >> - >> - /* Protects against concurrent deletion in >> drm_sched_get_cleanup_job */ >> - spin_lock(&sched->job_list_lock); >> job = list_first_entry_or_null(&sched->pending_list, >> struct drm_sched_job, list); >> if (job) { >> - /* >> - * Remove the bad job so it cannot be freed by concurrent >> - * drm_sched_cleanup_jobs. It will be reinserted back after >> sched->thread >> - * is parked at which point it's safe. >> - */ >> - list_del_init(&job->list); >> - spin_unlock(&sched->job_list_lock); >> - >> status = job->sched->ops->timedout_job(job); >> /* >> @@ -342,8 +331,6 @@ static void drm_sched_job_timedout(struct >> work_struct *work) >> job->sched->ops->free_job(job); >> sched->free_guilty = false; >> } >> - } else { >> - spin_unlock(&sched->job_list_lock); >> } >> if (status != DRM_GPU_SCHED_STAT_ENODEV) { >> @@ -392,20 +379,6 @@ void drm_sched_stop(struct drm_gpu_scheduler >> *sched, struct drm_sched_job *bad) >> kthread_park(sched->thread); >> - /* >> - * Reinsert back the bad job here - now it's safe as >> - * drm_sched_get_cleanup_job cannot race against us and release the >> - * bad job at this point - we parked (waited for) any in progress >> - * (earlier) cleanups and drm_sched_get_cleanup_job will not be >> called >> - * now until the scheduler thread is unparked. 
>> - */ >> - if (bad && bad->sched == sched) >> - /* >> - * Add at the head of the queue to reflect it was the earliest >> - * job extracted. >> - */ >> - list_add(&bad->list, &sched->pending_list); >> - >> /* >> * Iterate the job list from later to earlier one and either >> deactive >> * their HW callbacks or remove them from pending list if they >> already ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] Revert "drm/scheduler: Avoid accessing freed bad job."
  2021-08-17 13:43 ` Christian König
@ 2021-08-18  8:08   ` Jingwen Chen
  2021-08-18 10:39     ` Jingwen Chen
  0 siblings, 1 reply; 6+ messages in thread
From: Jingwen Chen @ 2021-08-18  8:08 UTC (permalink / raw)
  To: Christian König, Andrey Grodzovsky, amd-gfx; +Cc: monk.liu

On Tue Aug 17, 2021 at 03:43:58PM +0200, Christian König wrote:
>
>
> Am 17.08.21 um 15:37 schrieb Andrey Grodzovsky:
> > On 2021-08-17 12:28 a.m., Jingwen Chen wrote:
> > > [Why]
> > > For a bailing job, this commit will delete it from the pending list, so
> > > the bailing job will never have a chance to be resubmitted, even in
> > > advance TDR mode.
> > >
> > > [How]
> > > After embedding the hw_fence into amdgpu_job is done, the race condition
> > > that this commit tries to work around is completely solved. So revert
> > > this commit.
> > > This reverts commit 135517d3565b48f4def3b1b82008bc17eb5d1c90.
> >
> >
> > Can you elaborate please how this solves the race?
> > As far as I see and with this patch reverted, in drm_sched_job_timedout
> > you get a pointer to the next job to process in the timed-out handler;
> > immediately afterwards this job actually finishes and its fence is
> > signaled, which in turn triggers drm_sched_get_cleanup_job, which
> > fetches this job and returns to

Hi Andrey,

If drm_sched_job_timedout is triggered first, drm_sched_get_cleanup_job
will return NULL while the timeout worker is running, according to this
code:

	if ((sched->timeout != MAX_SCHEDULE_TIMEOUT &&
	    !cancel_delayed_work(&sched->work_tdr)) ||
	    kthread_should_park())
		return NULL;

But yes, a dma_fence_get(job->s_fence->parent) is needed before
processing the timed-out job. When the bad job is signaled just before
processing, amdgpu_fence_free can be triggered by the dma_fence_put() of
the HW fence.
And if I'm understanding this race condition correctly, the spin_lock is still needed here to avoid the drm_sched_get_cleanup_job get the spin_lock first and then enter the tdr work. > > drm_sched_main > > which in turn call free_job callabck->...->amdgpu_fence_free which frees > > the job > > from the HW dma_fence release callback. After that you proceed with a > > freed job > > in timed out handler. > > > > If you could take the HW fence reference from drm_sched_job_timedout > > before > > starting processing then yes, I think it would work. > > Yes, precisely that's what I had in mind as well and seems to be missing > from this patch. > > Regards, > Christian. > > > > > Andrey > > > > > > > > > > Signed-off-by: Jingwen Chen <Jingwen.Chen2@amd.com> > > > --- > > > drivers/gpu/drm/scheduler/sched_main.c | 27 -------------------------- > > > 1 file changed, 27 deletions(-) > > > > > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c > > > b/drivers/gpu/drm/scheduler/sched_main.c > > > index a2a953693b45..31d1176d939f 100644 > > > --- a/drivers/gpu/drm/scheduler/sched_main.c > > > +++ b/drivers/gpu/drm/scheduler/sched_main.c > > > @@ -317,21 +317,10 @@ static void drm_sched_job_timedout(struct > > > work_struct *work) > > > enum drm_gpu_sched_stat status = DRM_GPU_SCHED_STAT_NOMINAL; > > > sched = container_of(work, struct drm_gpu_scheduler, > > > work_tdr.work); > > > - > > > - /* Protects against concurrent deletion in > > > drm_sched_get_cleanup_job */ > > > - spin_lock(&sched->job_list_lock); > > > job = list_first_entry_or_null(&sched->pending_list, > > > struct drm_sched_job, list); > > > if (job) { > > > - /* > > > - * Remove the bad job so it cannot be freed by concurrent > > > - * drm_sched_cleanup_jobs. It will be reinserted back after > > > sched->thread > > > - * is parked at which point it's safe. 
> > > - */ > > > - list_del_init(&job->list); > > > - spin_unlock(&sched->job_list_lock); > > > - > > > status = job->sched->ops->timedout_job(job); > > > /* > > > @@ -342,8 +331,6 @@ static void drm_sched_job_timedout(struct > > > work_struct *work) > > > job->sched->ops->free_job(job); > > > sched->free_guilty = false; > > > } > > > - } else { > > > - spin_unlock(&sched->job_list_lock); > > > } > > > if (status != DRM_GPU_SCHED_STAT_ENODEV) { > > > @@ -392,20 +379,6 @@ void drm_sched_stop(struct drm_gpu_scheduler > > > *sched, struct drm_sched_job *bad) > > > kthread_park(sched->thread); > > > - /* > > > - * Reinsert back the bad job here - now it's safe as > > > - * drm_sched_get_cleanup_job cannot race against us and release the > > > - * bad job at this point - we parked (waited for) any in progress > > > - * (earlier) cleanups and drm_sched_get_cleanup_job will not be > > > called > > > - * now until the scheduler thread is unparked. > > > - */ > > > - if (bad && bad->sched == sched) > > > - /* > > > - * Add at the head of the queue to reflect it was the earliest > > > - * job extracted. > > > - */ > > > - list_add(&bad->list, &sched->pending_list); > > > - > > > /* > > > * Iterate the job list from later to earlier one and either > > > deactive > > > * their HW callbacks or remove them from pending list if they > > > already > ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] Revert "drm/scheduler: Avoid accessing freed bad job." 2021-08-18 8:08 ` Jingwen Chen @ 2021-08-18 10:39 ` Jingwen Chen 0 siblings, 0 replies; 6+ messages in thread From: Jingwen Chen @ 2021-08-18 10:39 UTC (permalink / raw) To: Christian König, Andrey Grodzovsky, amd-gfx; +Cc: monk.liu Sorry, just get what you mean, will submit a v2 patch. On Wed Aug 18, 2021 at 04:08:37PM +0800, Jingwen Chen wrote: > On Tue Aug 17, 2021 at 03:43:58PM +0200, Christian König wrote: > > > > > > Am 17.08.21 um 15:37 schrieb Andrey Grodzovsky: > > > On 2021-08-17 12:28 a.m., Jingwen Chen wrote: > > > > [Why] > > > > for bailing job, this commit will delete it from pending list thus the > > > > bailing job will never have a chance to be resubmitted even in advance > > > > tdr mode. > > > > > > > > [How] > > > > after embeded hw_fence into amdgpu_job is done, the race condition that > > > > this commit tries to work around is completely solved.So revert this > > > > commit. > > > > This reverts commit 135517d3565b48f4def3b1b82008bc17eb5d1c90. > > > > > > > > > Can you elaborate please how this solves the race ? > > > As far as I see and with this patch reverted, in drm_sched_job_timedout > > > you get a pointer to next job to process in timed out handler, > > > immediately > > > next this job is actually finished and it's fence signaled, this in turn > > > triggers > > > drm_sched_get_cleanup_job which fetches this job and returns to > Hi Andrey, > > if drm_sched_job_timedout is triggered first, drm_sched_get_cleanup_job will return > NULL when the timeout worker is running according to this code: > if ((sched->timeout != MAX_SCHEDULE_TIMEOUT && > !cancel_delayed_work(&sched->work_tdr)) || > kthread_should_park()) > return NULL; > > But yes a dma_fence_get(job->s_fence->parent) is needed before > processing timedout_job. When the bad job is signaled just before > processing, the amdgpu_fence_free can be triggered by the dma_fence_put() of HW fence. 
> And if I'm understanding this race condition correctly, the spin_lock is > still needed here to avoid the drm_sched_get_cleanup_job get the > spin_lock first and then enter the tdr work. > > > drm_sched_main > > > which in turn call free_job callabck->...->amdgpu_fence_free which frees > > > the job > > > from the HW dma_fence release callback. After that you proceed with a > > > freed job > > > in timed out handler. > > > > > > If you could take the HW fence reference from drm_sched_job_timedout > > > before > > > starting processing then yes, I think it would work. > > > > Yes, precisely that's what I had in mind as well and seems to be missing > > from this patch. > > > > Regards, > > Christian. > > > > > > > > Andrey > > > > > > > > > > > > > > Signed-off-by: Jingwen Chen <Jingwen.Chen2@amd.com> > > > > --- > > > > drivers/gpu/drm/scheduler/sched_main.c | 27 -------------------------- > > > > 1 file changed, 27 deletions(-) > > > > > > > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c > > > > b/drivers/gpu/drm/scheduler/sched_main.c > > > > index a2a953693b45..31d1176d939f 100644 > > > > --- a/drivers/gpu/drm/scheduler/sched_main.c > > > > +++ b/drivers/gpu/drm/scheduler/sched_main.c > > > > @@ -317,21 +317,10 @@ static void drm_sched_job_timedout(struct > > > > work_struct *work) > > > > enum drm_gpu_sched_stat status = DRM_GPU_SCHED_STAT_NOMINAL; > > > > sched = container_of(work, struct drm_gpu_scheduler, > > > > work_tdr.work); > > > > - > > > > - /* Protects against concurrent deletion in > > > > drm_sched_get_cleanup_job */ > > > > - spin_lock(&sched->job_list_lock); > > > > job = list_first_entry_or_null(&sched->pending_list, > > > > struct drm_sched_job, list); > > > > if (job) { > > > > - /* > > > > - * Remove the bad job so it cannot be freed by concurrent > > > > - * drm_sched_cleanup_jobs. It will be reinserted back after > > > > sched->thread > > > > - * is parked at which point it's safe. 
> > > > - */ > > > > - list_del_init(&job->list); > > > > - spin_unlock(&sched->job_list_lock); > > > > - > > > > status = job->sched->ops->timedout_job(job); > > > > /* > > > > @@ -342,8 +331,6 @@ static void drm_sched_job_timedout(struct > > > > work_struct *work) > > > > job->sched->ops->free_job(job); > > > > sched->free_guilty = false; > > > > } > > > > - } else { > > > > - spin_unlock(&sched->job_list_lock); > > > > } > > > > if (status != DRM_GPU_SCHED_STAT_ENODEV) { > > > > @@ -392,20 +379,6 @@ void drm_sched_stop(struct drm_gpu_scheduler > > > > *sched, struct drm_sched_job *bad) > > > > kthread_park(sched->thread); > > > > - /* > > > > - * Reinsert back the bad job here - now it's safe as > > > > - * drm_sched_get_cleanup_job cannot race against us and release the > > > > - * bad job at this point - we parked (waited for) any in progress > > > > - * (earlier) cleanups and drm_sched_get_cleanup_job will not be > > > > called > > > > - * now until the scheduler thread is unparked. > > > > - */ > > > > - if (bad && bad->sched == sched) > > > > - /* > > > > - * Add at the head of the queue to reflect it was the earliest > > > > - * job extracted. > > > > - */ > > > > - list_add(&bad->list, &sched->pending_list); > > > > - > > > > /* > > > > * Iterate the job list from later to earlier one and either > > > > deactive > > > > * their HW callbacks or remove them from pending list if they > > > > already > > ^ permalink raw reply [flat|nested] 6+ messages in thread