From: "Christian König" <ckoenig.leichtzumerken@gmail.com>
To: Andrey Grodzovsky <andrey.grodzovsky@amd.com>,
	dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH v2] drm/scheduler: Fix hang when sched_entity released
Date: Thu, 18 Feb 2021 09:07:01 +0100
Message-ID: <bc2c5ce4-a641-8a5e-bd7b-11174c883e99@gmail.com>
In-Reply-To: <1613599181-9492-1-git-send-email-andrey.grodzovsky@amd.com>

On 17.02.21 at 22:59, Andrey Grodzovsky wrote:
> Problem: If scheduler is already stopped by the time sched_entity
> is released and entity's job_queue not empty I encountred
> a hang in drm_sched_entity_flush. This is because drm_sched_entity_is_idle
> never becomes false.
>
> Fix: In drm_sched_fini detach all sched_entities from the
> scheduler's run queues. This will satisfy drm_sched_entity_is_idle.
> Also wakeup all those processes stuck in sched_entity flushing
> as the scheduler main thread which wakes them up is stopped by now.
>
> v2:
> Reverse order of drm_sched_rq_remove_entity and marking
> s_entity as stopped to prevent reinserion back to rq due
> to race.
>
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> ---
>  drivers/gpu/drm/scheduler/sched_main.c | 31 +++++++++++++++++++++++++++++++
>  1 file changed, 31 insertions(+)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 908b0b5..c6b7947 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -897,9 +897,40 @@ EXPORT_SYMBOL(drm_sched_init);
>   */
>  void drm_sched_fini(struct drm_gpu_scheduler *sched)
>  {
> +	int i;
> +	struct drm_sched_entity *s_entity;

BTW: Please order that so that i is declared last.

>  	if (sched->thread)
>  		kthread_stop(sched->thread);
>
> +	/* Detach all sched_entites from this scheduler once it's stopped */
> +	for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) {
> +		struct drm_sched_rq *rq = &sched->sched_rq[i];
> +
> +		if (!rq)
> +			continue;
> +
> +		/* Loop this way because rq->lock is taken in drm_sched_rq_remove_entity */
> +		spin_lock(&rq->lock);
> +		while ((s_entity = list_first_entry_or_null(&rq->entities,
> +							    struct drm_sched_entity,
> +							    list))) {
> +			spin_unlock(&rq->lock);
> +
> +			/* Prevent reinsertion and remove */
> +			spin_lock(&s_entity->rq_lock);
> +			s_entity->stopped = true;
> +			drm_sched_rq_remove_entity(rq, s_entity);
> +			spin_unlock(&s_entity->rq_lock);

Well this spin_unlock/lock dance here doesn't look correct at all now.

Christian.

> +
> +			spin_lock(&rq->lock);
> +		}
> +		spin_unlock(&rq->lock);
> +
> +	}
> +
> +	/* Wakeup everyone stuck in drm_sched_entity_flush for this scheduler */
> +	wake_up_all(&sched->job_scheduled);
> +
>  	/* Confirm no work left behind accessing device structures */
>  	cancel_delayed_work_sync(&sched->work_tdr);
>

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
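For readers following the locking comment above, the shape of the quoted loop can be reproduced in user space. This is only a rough sketch under stated assumptions: pthread mutexes stand in for the kernel spinlocks, and `struct entity`, `struct runqueue`, `rq_remove_entity` and `drain_runqueue` are hypothetical names, not the scheduler's types or API. It shows the drop-and-retake ordering of the two locks and marks the point where the entity is referenced with neither lock held; it does not claim to reproduce the specific problem the reviewer has in mind.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the entity and run queue in the patch. */
struct entity {
	pthread_mutex_t lock;	/* plays the role of s_entity->rq_lock */
	bool stopped;
	struct entity *next;	/* run-queue membership (singly linked) */
};

struct runqueue {
	pthread_mutex_t lock;	/* plays the role of rq->lock */
	struct entity *head;
};

/* Takes rq->lock itself, which is why the caller has to drop it first. */
static void rq_remove_entity(struct runqueue *rq, struct entity *e)
{
	struct entity **pp;

	pthread_mutex_lock(&rq->lock);
	for (pp = &rq->head; *pp; pp = &(*pp)->next) {
		if (*pp == e) {
			*pp = e->next;
			e->next = NULL;
			break;
		}
	}
	pthread_mutex_unlock(&rq->lock);
}

/* Detach every entity from rq; returns how many were detached. */
static int drain_runqueue(struct runqueue *rq)
{
	struct entity *e;
	int detached = 0;

	pthread_mutex_lock(&rq->lock);
	while ((e = rq->head) != NULL) {
		/* rq->lock must be released: rq_remove_entity() takes it. */
		pthread_mutex_unlock(&rq->lock);

		/* Window: e is used here while neither lock is held. */
		pthread_mutex_lock(&e->lock);
		e->stopped = true;	/* mark first to block reinsertion */
		rq_remove_entity(rq, e);
		pthread_mutex_unlock(&e->lock);
		detached++;

		pthread_mutex_lock(&rq->lock);
	}
	pthread_mutex_unlock(&rq->lock);
	return detached;
}
```

In a single-threaded run the loop simply empties the queue. The interesting case is a second thread acting on the same entity inside the marked window, which this sketch deliberately leaves unprotected, as the quoted loop does.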