* [PATCH] drm/sced: Add FIFO policy for scheduler rq @ 2022-08-22 20:09 Andrey Grodzovsky 2022-08-23 12:15 ` Christian König ` (2 more replies) 0 siblings, 3 replies; 14+ messages in thread From: Andrey Grodzovsky @ 2022-08-22 20:09 UTC (permalink / raw) To: dri-devel; +Cc: ckoenig.leichtzumerken, Li Yunxiang, luben.tuikov, amd-gfx Poblem: Given many entities competing for same rq on same scheduler an uncceptabliy long wait time for some jobs waiting stuck in rq before being picked up are observed (seen using GPUVis). The issue is due to Round Robin policy used by scheduler to pick up the next entity for execution. Under stress of many entities and long job queus within entity some jobs could be stack for very long time in it's entity's queue before being popped from the queue and executed while for other entites with samller job queues a job might execute ealier even though that job arrived later then the job in the long queue. Fix: Add FIFO selection policy to entites in RQ, chose next enitity on rq in such order that if job on one entity arrived ealrier then job on another entity the first job will start executing ealier regardless of the length of the entity's job queue. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Tested-by: Li Yunxiang (Teddy) <Yunxiang.Li@amd.com> --- drivers/gpu/drm/scheduler/sched_entity.c | 2 + drivers/gpu/drm/scheduler/sched_main.c | 65 ++++++++++++++++++++++-- include/drm/gpu_scheduler.h | 8 +++ 3 files changed, 71 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c index 6b25b2f4f5a3..3bb7f69306ef 100644 --- a/drivers/gpu/drm/scheduler/sched_entity.c +++ b/drivers/gpu/drm/scheduler/sched_entity.c @@ -507,6 +507,8 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) atomic_inc(entity->rq->sched->score); WRITE_ONCE(entity->last_user, current->group_leader); first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node); + sched_job->submit_ts = ktime_get(); + /* first job wakes up scheduler */ if (first) { diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index 68317d3a7a27..c123aa120d06 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -59,6 +59,19 @@ #define CREATE_TRACE_POINTS #include "gpu_scheduler_trace.h" + + +int drm_sched_policy = -1; + +/** + * DOC: sched_policy (int) + * Used to override default entites scheduling policy in a run queue. + */ +MODULE_PARM_DESC(sched_policy, + "specify schedule policy for entites on a runqueue (-1 = auto(default) value, 0 = Round Robin,1 = use FIFO"); +module_param_named(sched_policy, drm_sched_policy, int, 0444); + + #define to_drm_sched_job(sched_job) \ container_of((sched_job), struct drm_sched_job, queue_node) @@ -120,14 +133,16 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq, } /** - * drm_sched_rq_select_entity - Select an entity which could provide a job to run + * drm_sched_rq_select_entity_rr - Select an entity which could provide a job to run * * @rq: scheduler run queue to check. * - * Try to find a ready entity, returns NULL if none found. + * Try to find a ready entity, in round robin manner. + * + * Returns NULL if none found. */ static struct drm_sched_entity * -drm_sched_rq_select_entity(struct drm_sched_rq *rq) +drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq) { struct drm_sched_entity *entity; @@ -163,6 +178,45 @@ drm_sched_rq_select_entity(struct drm_sched_rq *rq) return NULL; } +/** + * drm_sched_rq_select_entity_fifo - Select an entity which could provide a job to run + * + * @rq: scheduler run queue to check. + * + * Try to find a ready entity, based on FIFO order of jobs arrivals. + * + * Returns NULL if none found. + */ +static struct drm_sched_entity * +drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq) +{ + struct drm_sched_entity *tmp, *entity = NULL; + ktime_t oldest_ts = KTIME_MAX; + struct drm_sched_job *sched_job; + + spin_lock(&rq->lock); + + list_for_each_entry(tmp, &rq->entities, list) { + + if (drm_sched_entity_is_ready(tmp)) { + sched_job = to_drm_sched_job(spsc_queue_peek(&tmp->job_queue)); + + if (ktime_before(sched_job->submit_ts, oldest_ts)) { + oldest_ts = sched_job->submit_ts; + entity = tmp; + } + } + } + + if (entity) { + rq->current_entity = entity; + reinit_completion(&entity->entity_idle); + } + + spin_unlock(&rq->lock); + return entity; +} + /** * drm_sched_job_done - complete a job * @s_job: pointer to the job which is done @@ -804,7 +858,10 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched) /* Kernel run queue has higher priority than normal run queue*/ for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) { - entity = drm_sched_rq_select_entity(&sched->sched_rq[i]); + entity = drm_sched_policy != 1 ? + drm_sched_rq_select_entity_rr(&sched->sched_rq[i]) : + drm_sched_rq_select_entity_fifo(&sched->sched_rq[i]); + if (entity) break; } diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h index addb135eeea6..95865881bfcf 100644 --- a/include/drm/gpu_scheduler.h +++ b/include/drm/gpu_scheduler.h @@ -314,6 +314,14 @@ struct drm_sched_job { /** @last_dependency: tracks @dependencies as they signal */ unsigned long last_dependency; + + /** + * @submit_ts: + * + * Marks job submit time + */ + ktime_t submit_ts; + }; static inline bool drm_sched_invalidate_job(struct drm_sched_job *s_job, -- 2.25.1 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH] drm/sced: Add FIFO policy for scheduler rq 2022-08-22 20:09 [PATCH] drm/sced: Add FIFO policy for scheduler rq Andrey Grodzovsky @ 2022-08-23 12:15 ` Christian König 2022-08-23 15:15 ` Andrey Grodzovsky 2022-08-23 16:58 ` Luben Tuikov 2022-08-24 8:29 ` Michel Dänzer 2 siblings, 1 reply; 14+ messages in thread From: Christian König @ 2022-08-23 12:15 UTC (permalink / raw) To: Andrey Grodzovsky, dri-devel; +Cc: Li Yunxiang, luben.tuikov, amd-gfx Am 22.08.22 um 22:09 schrieb Andrey Grodzovsky: > Poblem: Given many entities competing for same rq on > same scheduler an uncceptabliy long wait time for some > jobs waiting stuck in rq before being picked up are > observed (seen using GPUVis). > The issue is due to Round Robin policy used by scheduler > to pick up the next entity for execution. Under stress > of many entities and long job queus within entity some > jobs could be stack for very long time in it's entity's > queue before being popped from the queue and executed > while for other entites with samller job queues a job > might execute ealier even though that job arrived later > then the job in the long queue. > > Fix: > Add FIFO selection policy to entites in RQ, chose next enitity > on rq in such order that if job on one entity arrived > ealrier then job on another entity the first job will start > executing ealier regardless of the length of the entity's job > queue. > > Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> > Tested-by: Li Yunxiang (Teddy) <Yunxiang.Li@amd.com> > --- > drivers/gpu/drm/scheduler/sched_entity.c | 2 + > drivers/gpu/drm/scheduler/sched_main.c | 65 ++++++++++++++++++++++-- > include/drm/gpu_scheduler.h | 8 +++ > 3 files changed, 71 insertions(+), 4 deletions(-) > > diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c > index 6b25b2f4f5a3..3bb7f69306ef 100644 > --- a/drivers/gpu/drm/scheduler/sched_entity.c > +++ b/drivers/gpu/drm/scheduler/sched_entity.c > @@ -507,6 +507,8 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) > atomic_inc(entity->rq->sched->score); > WRITE_ONCE(entity->last_user, current->group_leader); > first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node); > + sched_job->submit_ts = ktime_get(); > + > > /* first job wakes up scheduler */ > if (first) { > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c > index 68317d3a7a27..c123aa120d06 100644 > --- a/drivers/gpu/drm/scheduler/sched_main.c > +++ b/drivers/gpu/drm/scheduler/sched_main.c > @@ -59,6 +59,19 @@ > #define CREATE_TRACE_POINTS > #include "gpu_scheduler_trace.h" > > + > + > +int drm_sched_policy = -1; > + > +/** > + * DOC: sched_policy (int) > + * Used to override default entites scheduling policy in a run queue. > + */ > +MODULE_PARM_DESC(sched_policy, > + "specify schedule policy for entites on a runqueue (-1 = auto(default) value, 0 = Round Robin,1 = use FIFO"); Well we don't really have an autodetect at the moment, so I would drop that. > +module_param_named(sched_policy, drm_sched_policy, int, 0444); > + > + > #define to_drm_sched_job(sched_job) \ > container_of((sched_job), struct drm_sched_job, queue_node) > > @@ -120,14 +133,16 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq, > } > > /** > - * drm_sched_rq_select_entity - Select an entity which could provide a job to run > + * drm_sched_rq_select_entity_rr - Select an entity which could provide a job to run > * > * @rq: scheduler run queue to check. > * > - * Try to find a ready entity, returns NULL if none found. > + * Try to find a ready entity, in round robin manner. > + * > + * Returns NULL if none found. > */ > static struct drm_sched_entity * > -drm_sched_rq_select_entity(struct drm_sched_rq *rq) > +drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq) > { > struct drm_sched_entity *entity; > > @@ -163,6 +178,45 @@ drm_sched_rq_select_entity(struct drm_sched_rq *rq) > return NULL; > } > > +/** > + * drm_sched_rq_select_entity_fifo - Select an entity which could provide a job to run > + * > + * @rq: scheduler run queue to check. > + * > + * Try to find a ready entity, based on FIFO order of jobs arrivals. > + * > + * Returns NULL if none found. > + */ > +static struct drm_sched_entity * > +drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq) > +{ > + struct drm_sched_entity *tmp, *entity = NULL; > + ktime_t oldest_ts = KTIME_MAX; > + struct drm_sched_job *sched_job; > + > + spin_lock(&rq->lock); > + > + list_for_each_entry(tmp, &rq->entities, list) { > + > + if (drm_sched_entity_is_ready(tmp)) { > + sched_job = to_drm_sched_job(spsc_queue_peek(&tmp->job_queue)); > + > + if (ktime_before(sched_job->submit_ts, oldest_ts)) { > + oldest_ts = sched_job->submit_ts; > + entity = tmp; > + } > + } > + } > + > + if (entity) { > + rq->current_entity = entity; > + reinit_completion(&entity->entity_idle); > + } That should probably be a separate function or at least outside of this here. Apart from that totally straight forward implementation. Any idea how much extra overhead that is? Regards, Christian. > + > + spin_unlock(&rq->lock); > + return entity; > +} > + > /** > * drm_sched_job_done - complete a job > * @s_job: pointer to the job which is done > @@ -804,7 +858,10 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched) > > /* Kernel run queue has higher priority than normal run queue*/ > for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) { > - entity = drm_sched_rq_select_entity(&sched->sched_rq[i]); > + entity = drm_sched_policy != 1 ? > + drm_sched_rq_select_entity_rr(&sched->sched_rq[i]) : > + drm_sched_rq_select_entity_fifo(&sched->sched_rq[i]); > + > if (entity) > break; > } > diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h > index addb135eeea6..95865881bfcf 100644 > --- a/include/drm/gpu_scheduler.h > +++ b/include/drm/gpu_scheduler.h > @@ -314,6 +314,14 @@ struct drm_sched_job { > > /** @last_dependency: tracks @dependencies as they signal */ > unsigned long last_dependency; > + > + /** > + * @submit_ts: > + * > + * Marks job submit time > + */ > + ktime_t submit_ts; > + > }; > > static inline bool drm_sched_invalidate_job(struct drm_sched_job *s_job, ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] drm/sced: Add FIFO policy for scheduler rq 2022-08-23 12:15 ` Christian König @ 2022-08-23 15:15 ` Andrey Grodzovsky 0 siblings, 0 replies; 14+ messages in thread From: Andrey Grodzovsky @ 2022-08-23 15:15 UTC (permalink / raw) To: Christian König, dri-devel; +Cc: Li Yunxiang, luben.tuikov, amd-gfx On 2022-08-23 08:15, Christian König wrote: > > > Am 22.08.22 um 22:09 schrieb Andrey Grodzovsky: >> Poblem: Given many entities competing for same rq on >> same scheduler an uncceptabliy long wait time for some >> jobs waiting stuck in rq before being picked up are >> observed (seen using GPUVis). >> The issue is due to Round Robin policy used by scheduler >> to pick up the next entity for execution. Under stress >> of many entities and long job queus within entity some >> jobs could be stack for very long time in it's entity's >> queue before being popped from the queue and executed >> while for other entites with samller job queues a job >> might execute ealier even though that job arrived later >> then the job in the long queue. >> >> Fix: >> Add FIFO selection policy to entites in RQ, chose next enitity >> on rq in such order that if job on one entity arrived >> ealrier then job on another entity the first job will start >> executing ealier regardless of the length of the entity's job >> queue. >> >> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> >> Tested-by: Li Yunxiang (Teddy) <Yunxiang.Li@amd.com> >> --- >> drivers/gpu/drm/scheduler/sched_entity.c | 2 + >> drivers/gpu/drm/scheduler/sched_main.c | 65 ++++++++++++++++++++++-- >> include/drm/gpu_scheduler.h | 8 +++ >> 3 files changed, 71 insertions(+), 4 deletions(-) >> >> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c >> b/drivers/gpu/drm/scheduler/sched_entity.c >> index 6b25b2f4f5a3..3bb7f69306ef 100644 >> --- a/drivers/gpu/drm/scheduler/sched_entity.c >> +++ b/drivers/gpu/drm/scheduler/sched_entity.c >> @@ -507,6 +507,8 @@ void drm_sched_entity_push_job(struct >> drm_sched_job *sched_job) >> atomic_inc(entity->rq->sched->score); >> WRITE_ONCE(entity->last_user, current->group_leader); >> first = spsc_queue_push(&entity->job_queue, >> &sched_job->queue_node); >> + sched_job->submit_ts = ktime_get(); >> + >> /* first job wakes up scheduler */ >> if (first) { >> diff --git a/drivers/gpu/drm/scheduler/sched_main.c >> b/drivers/gpu/drm/scheduler/sched_main.c >> index 68317d3a7a27..c123aa120d06 100644 >> --- a/drivers/gpu/drm/scheduler/sched_main.c >> +++ b/drivers/gpu/drm/scheduler/sched_main.c >> @@ -59,6 +59,19 @@ >> #define CREATE_TRACE_POINTS >> #include "gpu_scheduler_trace.h" >> + >> + >> +int drm_sched_policy = -1; >> + >> +/** >> + * DOC: sched_policy (int) >> + * Used to override default entites scheduling policy in a run queue. >> + */ >> +MODULE_PARM_DESC(sched_policy, >> + "specify schedule policy for entites on a runqueue (-1 = >> auto(default) value, 0 = Round Robin,1 = use FIFO"); > > Well we don't really have an autodetect at the moment, so I would drop > that. > >> +module_param_named(sched_policy, drm_sched_policy, int, 0444); >> + >> + >> #define to_drm_sched_job(sched_job) \ >> container_of((sched_job), struct drm_sched_job, queue_node) >> @@ -120,14 +133,16 @@ void drm_sched_rq_remove_entity(struct >> drm_sched_rq *rq, >> } >> /** >> - * drm_sched_rq_select_entity - Select an entity which could provide >> a job to run >> + * drm_sched_rq_select_entity_rr - Select an entity which could >> provide a job to run >> * >> * @rq: scheduler run queue to check. >> * >> - * Try to find a ready entity, returns NULL if none found. >> + * Try to find a ready entity, in round robin manner. >> + * >> + * Returns NULL if none found. >> */ >> static struct drm_sched_entity * >> -drm_sched_rq_select_entity(struct drm_sched_rq *rq) >> +drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq) >> { >> struct drm_sched_entity *entity; >> @@ -163,6 +178,45 @@ drm_sched_rq_select_entity(struct drm_sched_rq >> *rq) >> return NULL; >> } >> +/** >> + * drm_sched_rq_select_entity_fifo - Select an entity which could >> provide a job to run >> + * >> + * @rq: scheduler run queue to check. >> + * >> + * Try to find a ready entity, based on FIFO order of jobs arrivals. >> + * >> + * Returns NULL if none found. >> + */ >> +static struct drm_sched_entity * >> +drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq) >> +{ >> + struct drm_sched_entity *tmp, *entity = NULL; >> + ktime_t oldest_ts = KTIME_MAX; >> + struct drm_sched_job *sched_job; >> + >> + spin_lock(&rq->lock); >> + >> + list_for_each_entry(tmp, &rq->entities, list) { >> + >> + if (drm_sched_entity_is_ready(tmp)) { >> + sched_job = >> to_drm_sched_job(spsc_queue_peek(&tmp->job_queue)); >> + >> + if (ktime_before(sched_job->submit_ts, oldest_ts)) { >> + oldest_ts = sched_job->submit_ts; >> + entity = tmp; >> + } >> + } >> + } >> + >> + if (entity) { >> + rq->current_entity = entity; >> + reinit_completion(&entity->entity_idle); >> + } > > That should probably be a separate function or at least outside of > this here. > > Apart from that totally straight forward implementation. Any idea how > much extra overhead that is? > > Regards, > Christian. Well, memory wise you have the extra long for each job struct for the time stamp, and then for each next job extraction you have to iterate the entire rq to find the next entity with oldest job so always linear in number of entitles. Today the worst case is also O(# entities) in case none of them are ready but usually it's not the case. BTW, we could also add some adaptive logic where if you identify that for particular entity jobs are spending too much time (need to define a threshold) in the SW queue waiting you dynamically switch to FIFO policy and when the delays go down (or number of entities drops) you go back to RR. Andrey > >> + >> + spin_unlock(&rq->lock); >> + return entity; >> +} >> + >> /** >> * drm_sched_job_done - complete a job >> * @s_job: pointer to the job which is done >> @@ -804,7 +858,10 @@ drm_sched_select_entity(struct drm_gpu_scheduler >> *sched) >> /* Kernel run queue has higher priority than normal run queue*/ >> for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= >> DRM_SCHED_PRIORITY_MIN; i--) { >> - entity = drm_sched_rq_select_entity(&sched->sched_rq[i]); >> + entity = drm_sched_policy != 1 ? >> + drm_sched_rq_select_entity_rr(&sched->sched_rq[i]) : >> + drm_sched_rq_select_entity_fifo(&sched->sched_rq[i]); >> + >> if (entity) >> break; >> } >> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h >> index addb135eeea6..95865881bfcf 100644 >> --- a/include/drm/gpu_scheduler.h >> +++ b/include/drm/gpu_scheduler.h >> @@ -314,6 +314,14 @@ struct drm_sched_job { >> /** @last_dependency: tracks @dependencies as they signal */ >> unsigned long last_dependency; >> + >> + /** >> + * @submit_ts: >> + * >> + * Marks job submit time >> + */ >> + ktime_t submit_ts; >> + >> }; >> static inline bool drm_sched_invalidate_job(struct drm_sched_job >> *s_job, > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] drm/sced: Add FIFO policy for scheduler rq 2022-08-22 20:09 [PATCH] drm/sced: Add FIFO policy for scheduler rq Andrey Grodzovsky 2022-08-23 12:15 ` Christian König @ 2022-08-23 16:58 ` Luben Tuikov 2022-08-23 18:13 ` Andrey Grodzovsky 2022-08-24 8:29 ` Michel Dänzer 2 siblings, 1 reply; 14+ messages in thread From: Luben Tuikov @ 2022-08-23 16:58 UTC (permalink / raw) To: Andrey Grodzovsky, dri-devel; +Cc: ckoenig.leichtzumerken, Li Yunxiang, amd-gfx Inlined: On 2022-08-22 16:09, Andrey Grodzovsky wrote: > Poblem: Given many entities competing for same rq on ^Problem > same scheduler an uncceptabliy long wait time for some ^unacceptably > jobs waiting stuck in rq before being picked up are > observed (seen using GPUVis). > The issue is due to Round Robin policy used by scheduler > to pick up the next entity for execution. Under stress > of many entities and long job queus within entity some ^queues > jobs could be stack for very long time in it's entity's > queue before being popped from the queue and executed > while for other entites with samller job queues a job ^entities; smaller > might execute ealier even though that job arrived later ^earlier > then the job in the long queue. > > Fix: > Add FIFO selection policy to entites in RQ, chose next enitity > on rq in such order that if job on one entity arrived > ealrier then job on another entity the first job will start > executing ealier regardless of the length of the entity's job > queue. > > Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> > Tested-by: Li Yunxiang (Teddy) <Yunxiang.Li@amd.com> > --- > drivers/gpu/drm/scheduler/sched_entity.c | 2 + > drivers/gpu/drm/scheduler/sched_main.c | 65 ++++++++++++++++++++++-- > include/drm/gpu_scheduler.h | 8 +++ > 3 files changed, 71 insertions(+), 4 deletions(-) > > diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c > index 6b25b2f4f5a3..3bb7f69306ef 100644 > --- a/drivers/gpu/drm/scheduler/sched_entity.c > +++ b/drivers/gpu/drm/scheduler/sched_entity.c > @@ -507,6 +507,8 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) > atomic_inc(entity->rq->sched->score); > WRITE_ONCE(entity->last_user, current->group_leader); > first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node); > + sched_job->submit_ts = ktime_get(); > + > > /* first job wakes up scheduler */ > if (first) { > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c > index 68317d3a7a27..c123aa120d06 100644 > --- a/drivers/gpu/drm/scheduler/sched_main.c > +++ b/drivers/gpu/drm/scheduler/sched_main.c > @@ -59,6 +59,19 @@ > #define CREATE_TRACE_POINTS > #include "gpu_scheduler_trace.h" > > + > + > +int drm_sched_policy = -1; > + > +/** > + * DOC: sched_policy (int) > + * Used to override default entites scheduling policy in a run queue. > + */ > +MODULE_PARM_DESC(sched_policy, > + "specify schedule policy for entites on a runqueue (-1 = auto(default) value, 0 = Round Robin,1 = use FIFO"); > +module_param_named(sched_policy, drm_sched_policy, int, 0444); As per Christian's comments, you can drop the "auto" and perhaps leave one as the default, say the RR. I do think it is beneficial to have a module parameter control the scheduling policy, as shown above. > + > + > #define to_drm_sched_job(sched_job) \ > container_of((sched_job), struct drm_sched_job, queue_node) > > @@ -120,14 +133,16 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq, > } > > /** > - * drm_sched_rq_select_entity - Select an entity which could provide a job to run > + * drm_sched_rq_select_entity_rr - Select an entity which could provide a job to run > * > * @rq: scheduler run queue to check. > * > - * Try to find a ready entity, returns NULL if none found. > + * Try to find a ready entity, in round robin manner. > + * > + * Returns NULL if none found. > */ > static struct drm_sched_entity * > -drm_sched_rq_select_entity(struct drm_sched_rq *rq) > +drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq) > { > struct drm_sched_entity *entity; > > @@ -163,6 +178,45 @@ drm_sched_rq_select_entity(struct drm_sched_rq *rq) > return NULL; > } > > +/** > + * drm_sched_rq_select_entity_fifo - Select an entity which could provide a job to run > + * > + * @rq: scheduler run queue to check. > + * > + * Try to find a ready entity, based on FIFO order of jobs arrivals. > + * > + * Returns NULL if none found. > + */ > +static struct drm_sched_entity * > +drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq) > +{ > + struct drm_sched_entity *tmp, *entity = NULL; > + ktime_t oldest_ts = KTIME_MAX; > + struct drm_sched_job *sched_job; > + > + spin_lock(&rq->lock); > + > + list_for_each_entry(tmp, &rq->entities, list) { > + > + if (drm_sched_entity_is_ready(tmp)) { > + sched_job = to_drm_sched_job(spsc_queue_peek(&tmp->job_queue)); > + > + if (ktime_before(sched_job->submit_ts, oldest_ts)) { > + oldest_ts = sched_job->submit_ts; > + entity = tmp; > + } > + } > + } Here I think we need an O(1) lookup of the next job to pick out to run. I see a number of optimizations, for instance keeping the current/oldest timestamp in the rq struct itself, or better yet keeping the next job to pick out to run at a head of list (a la timer wheel implementation). For suck an optimization to work, you'd prep things up on job insertion, rather than on job removal/pick to run. I'm also surprised that there is no job transition from one queue to another, as it is picked to run next--for the above optimizations to implemented, you'd want a state transition from (state) queue to queue. Regards, Luben > + > + if (entity) { > + rq->current_entity = entity; > + reinit_completion(&entity->entity_idle); > + } > + > + spin_unlock(&rq->lock); > + return entity; > +} > + > /** > * drm_sched_job_done - complete a job > * @s_job: pointer to the job which is done > @@ -804,7 +858,10 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched) > > /* Kernel run queue has higher priority than normal run queue*/ > for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) { > - entity = drm_sched_rq_select_entity(&sched->sched_rq[i]); > + entity = drm_sched_policy != 1 ? > + drm_sched_rq_select_entity_rr(&sched->sched_rq[i]) : > + drm_sched_rq_select_entity_fifo(&sched->sched_rq[i]); > + > if (entity) > break; > } > diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h > index addb135eeea6..95865881bfcf 100644 > --- a/include/drm/gpu_scheduler.h > +++ b/include/drm/gpu_scheduler.h > @@ -314,6 +314,14 @@ struct drm_sched_job { > > /** @last_dependency: tracks @dependencies as they signal */ > unsigned long last_dependency; > + > + /** > + * @submit_ts: > + * > + * Marks job submit time > + */ > + ktime_t submit_ts; > + > }; > > static inline bool drm_sched_invalidate_job(struct drm_sched_job *s_job, ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] drm/sced: Add FIFO policy for scheduler rq 2022-08-23 16:58 ` Luben Tuikov @ 2022-08-23 18:13 ` Andrey Grodzovsky 2022-08-23 18:30 ` Luben Tuikov 0 siblings, 1 reply; 14+ messages in thread From: Andrey Grodzovsky @ 2022-08-23 18:13 UTC (permalink / raw) To: Luben Tuikov, dri-devel; +Cc: ckoenig.leichtzumerken, Li Yunxiang, amd-gfx On 2022-08-23 12:58, Luben Tuikov wrote: > Inlined: > > On 2022-08-22 16:09, Andrey Grodzovsky wrote: >> Poblem: Given many entities competing for same rq on > ^Problem > >> same scheduler an uncceptabliy long wait time for some > ^unacceptably > >> jobs waiting stuck in rq before being picked up are >> observed (seen using GPUVis). >> The issue is due to Round Robin policy used by scheduler >> to pick up the next entity for execution. Under stress >> of many entities and long job queus within entity some > ^queues > >> jobs could be stack for very long time in it's entity's >> queue before being popped from the queue and executed >> while for other entites with samller job queues a job > ^entities; smaller > >> might execute ealier even though that job arrived later > ^earlier > >> then the job in the long queue. >> >> Fix: >> Add FIFO selection policy to entites in RQ, chose next enitity >> on rq in such order that if job on one entity arrived >> ealrier then job on another entity the first job will start >> executing ealier regardless of the length of the entity's job >> queue. >> >> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> >> Tested-by: Li Yunxiang (Teddy) <Yunxiang.Li@amd.com> >> --- >> drivers/gpu/drm/scheduler/sched_entity.c | 2 + >> drivers/gpu/drm/scheduler/sched_main.c | 65 ++++++++++++++++++++++-- >> include/drm/gpu_scheduler.h | 8 +++ >> 3 files changed, 71 insertions(+), 4 deletions(-) >> >> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c >> index 6b25b2f4f5a3..3bb7f69306ef 100644 >> --- a/drivers/gpu/drm/scheduler/sched_entity.c >> +++ b/drivers/gpu/drm/scheduler/sched_entity.c >> @@ -507,6 +507,8 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) >> atomic_inc(entity->rq->sched->score); >> WRITE_ONCE(entity->last_user, current->group_leader); >> first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node); >> + sched_job->submit_ts = ktime_get(); >> + >> >> /* first job wakes up scheduler */ >> if (first) { >> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c >> index 68317d3a7a27..c123aa120d06 100644 >> --- a/drivers/gpu/drm/scheduler/sched_main.c >> +++ b/drivers/gpu/drm/scheduler/sched_main.c >> @@ -59,6 +59,19 @@ >> #define CREATE_TRACE_POINTS >> #include "gpu_scheduler_trace.h" >> >> + >> + >> +int drm_sched_policy = -1; >> + >> +/** >> + * DOC: sched_policy (int) >> + * Used to override default entites scheduling policy in a run queue. >> + */ >> +MODULE_PARM_DESC(sched_policy, >> + "specify schedule policy for entites on a runqueue (-1 = auto(default) value, 0 = Round Robin,1 = use FIFO"); >> +module_param_named(sched_policy, drm_sched_policy, int, 0444); > As per Christian's comments, you can drop the "auto" and perhaps leave one as the default, > say the RR. > > I do think it is beneficial to have a module parameter control the scheduling policy, as shown above. Christian is not against it, just against adding 'auto' here - like the default. > >> + >> + >> #define to_drm_sched_job(sched_job) \ >> container_of((sched_job), struct drm_sched_job, queue_node) >> >> @@ -120,14 +133,16 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq, >> } >> >> /** >> - * drm_sched_rq_select_entity - Select an entity which could provide a job to run >> + * drm_sched_rq_select_entity_rr - Select an entity which could provide a job to run >> * >> * @rq: scheduler run queue to check. >> * >> - * Try to find a ready entity, returns NULL if none found. >> + * Try to find a ready entity, in round robin manner. >> + * >> + * Returns NULL if none found. >> */ >> static struct drm_sched_entity * >> -drm_sched_rq_select_entity(struct drm_sched_rq *rq) >> +drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq) >> { >> struct drm_sched_entity *entity; >> >> @@ -163,6 +178,45 @@ drm_sched_rq_select_entity(struct drm_sched_rq *rq) >> return NULL; >> } >> >> +/** >> + * drm_sched_rq_select_entity_fifo - Select an entity which could provide a job to run >> + * >> + * @rq: scheduler run queue to check. >> + * >> + * Try to find a ready entity, based on FIFO order of jobs arrivals. >> + * >> + * Returns NULL if none found. >> + */ >> +static struct drm_sched_entity * >> +drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq) >> +{ >> + struct drm_sched_entity *tmp, *entity = NULL; >> + ktime_t oldest_ts = KTIME_MAX; >> + struct drm_sched_job *sched_job; >> + >> + spin_lock(&rq->lock); >> + >> + list_for_each_entry(tmp, &rq->entities, list) { >> + >> + if (drm_sched_entity_is_ready(tmp)) { >> + sched_job = to_drm_sched_job(spsc_queue_peek(&tmp->job_queue)); >> + >> + if (ktime_before(sched_job->submit_ts, oldest_ts)) { >> + oldest_ts = sched_job->submit_ts; >> + entity = tmp; >> + } >> + } >> + } > Here I think we need an O(1) lookup of the next job to pick out to run. > I see a number of optimizations, for instance keeping the current/oldest > timestamp in the rq struct itself, This was my original design with rb tree based min heap per rq based on time stamp of the oldest job waiting in each entity's job queue (head of entity's SPCP job queue). But how in this case you record the timestamps of all the jobs waiting in entity's the SPCP queue behind the head job ? If you record only the oldest job and more jobs come you have no place to store their timestamps and so on next job select after current HEAD how you will know who came before or after between 2 job queues of of 2 entities ? > or better yet keeping the next job > to pick out to run at a head of list (a la timer wheel implementation). > For suck an optimization to work, you'd prep things up on job insertion, rather > than on job removal/pick to run. I looked at timer wheel and I don't see how this applies here - it assumes you know before job submission your deadline time for job's execution to start - which we don't so I don't see how we can use it. This seems more suitable for real time scheduler implementation where you have a hard requirement to execute a job by some specific time T. I also mentioned a list, obviously there is the brute force solution of just ordering all jobs in one giant list and get naturally a FIFO ordering this way with O(1) insertions and extractions but I assume we don't want one giant jobs queue for all the entities to avoid more dependeies between them (like locking the entire list when one specific entity is killed and needs to remove it's jobs from SW queue). > > I'm also surprised that there is no job transition from one queue to another, > as it is picked to run next--for the above optimizations to implemented, you'd > want a state transition from (state) queue to queue. Not sure what you have in mind here ? How this helps ? Andrey > > Regards, > Luben > In my origianl design >> + >> + if (entity) { >> + rq->current_entity = entity; >> + reinit_completion(&entity->entity_idle); >> + } >> + >> + spin_unlock(&rq->lock); >> + return entity; >> +} >> + >> /** >> * drm_sched_job_done - complete a job >> * @s_job: pointer to the job which is done >> @@ -804,7 +858,10 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched) >> >> /* Kernel run queue has higher priority than normal run queue*/ >> for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) { >> - entity = drm_sched_rq_select_entity(&sched->sched_rq[i]); >> + entity = drm_sched_policy != 1 ? >> + drm_sched_rq_select_entity_rr(&sched->sched_rq[i]) : >> + drm_sched_rq_select_entity_fifo(&sched->sched_rq[i]); >> + >> if (entity) >> break; >> } >> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h >> index addb135eeea6..95865881bfcf 100644 >> --- a/include/drm/gpu_scheduler.h >> +++ b/include/drm/gpu_scheduler.h >> @@ -314,6 +314,14 @@ struct drm_sched_job { >> >> /** @last_dependency: tracks @dependencies as they signal */ >> unsigned long last_dependency; >> + >> + /** >> + * @submit_ts: >> + * >> + * Marks job submit time >> + */ >> + ktime_t submit_ts; >> + >> }; >> >> static inline bool drm_sched_invalidate_job(struct drm_sched_job *s_job, ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] drm/sced: Add FIFO policy for scheduler rq 2022-08-23 18:13 ` Andrey Grodzovsky @ 2022-08-23 18:30 ` Luben Tuikov 2022-08-23 18:57 ` Andrey Grodzovsky 0 siblings, 1 reply; 14+ messages in thread From: Luben Tuikov @ 2022-08-23 18:30 UTC (permalink / raw) To: Andrey Grodzovsky, dri-devel; +Cc: ckoenig.leichtzumerken, Li Yunxiang, amd-gfx On 2022-08-23 14:13, Andrey Grodzovsky wrote: > > On 2022-08-23 12:58, Luben Tuikov wrote: >> Inlined: >> >> On 2022-08-22 16:09, Andrey Grodzovsky wrote: >>> Poblem: Given many entities competing for same rq on >> ^Problem >> >>> same scheduler an uncceptabliy long wait time for some >> ^unacceptably >> >>> jobs waiting stuck in rq before being picked up are >>> observed (seen using GPUVis). >>> The issue is due to Round Robin policy used by scheduler >>> to pick up the next entity for execution. Under stress >>> of many entities and long job queus within entity some >> ^queues >> >>> jobs could be stack for very long time in it's entity's >>> queue before being popped from the queue and executed >>> while for other entites with samller job queues a job >> ^entities; smaller >> >>> might execute ealier even though that job arrived later >> ^earlier >> >>> then the job in the long queue. >>> >>> Fix: >>> Add FIFO selection policy to entites in RQ, chose next enitity >>> on rq in such order that if job on one entity arrived >>> ealrier then job on another entity the first job will start >>> executing ealier regardless of the length of the entity's job >>> queue. >>> >>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> >>> Tested-by: Li Yunxiang (Teddy) <Yunxiang.Li@amd.com> >>> --- >>> drivers/gpu/drm/scheduler/sched_entity.c | 2 + >>> drivers/gpu/drm/scheduler/sched_main.c | 65 ++++++++++++++++++++++-- >>> include/drm/gpu_scheduler.h | 8 +++ >>> 3 files changed, 71 insertions(+), 4 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c >>> index 6b25b2f4f5a3..3bb7f69306ef 100644 >>> --- a/drivers/gpu/drm/scheduler/sched_entity.c >>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c >>> @@ -507,6 +507,8 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) >>> atomic_inc(entity->rq->sched->score); >>> WRITE_ONCE(entity->last_user, current->group_leader); >>> first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node); >>> + sched_job->submit_ts = ktime_get(); >>> + >>> >>> /* first job wakes up scheduler */ >>> if (first) { >>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c >>> index 68317d3a7a27..c123aa120d06 100644 >>> --- a/drivers/gpu/drm/scheduler/sched_main.c >>> +++ b/drivers/gpu/drm/scheduler/sched_main.c >>> @@ -59,6 +59,19 @@ >>> #define CREATE_TRACE_POINTS >>> #include "gpu_scheduler_trace.h" >>> >>> + >>> + >>> +int drm_sched_policy = -1; >>> + >>> +/** >>> + * DOC: sched_policy (int) >>> + * Used to override default entites scheduling policy in a run queue. >>> + */ >>> +MODULE_PARM_DESC(sched_policy, >>> + "specify schedule policy for entites on a runqueue (-1 = auto(default) value, 0 = Round Robin,1 = use FIFO"); >>> +module_param_named(sched_policy, drm_sched_policy, int, 0444); >> As per Christian's comments, you can drop the "auto" and perhaps leave one as the default, >> say the RR. >> >> I do think it is beneficial to have a module parameter control the scheduling policy, as shown above. > > > Christian is not against it, just against adding 'auto' here - like the > default. Exactly what I said. Also, I still think an O(1) scheduling (picking next to run) should be what we strive for in such a FIFO patch implementation. A FIFO mechanism is by it's nature an O(1) mechanism for picking the next element. Regards, Luben > > >> >>> + >>> + >>> #define to_drm_sched_job(sched_job) \ >>> container_of((sched_job), struct drm_sched_job, queue_node) >>> >>> @@ -120,14 +133,16 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq, >>> } >>> >>> /** >>> - * drm_sched_rq_select_entity - Select an entity which could provide a job to run >>> + * drm_sched_rq_select_entity_rr - Select an entity which could provide a job to run >>> * >>> * @rq: scheduler run queue to check. >>> * >>> - * Try to find a ready entity, returns NULL if none found. >>> + * Try to find a ready entity, in round robin manner. >>> + * >>> + * Returns NULL if none found. >>> */ >>> static struct drm_sched_entity * >>> -drm_sched_rq_select_entity(struct drm_sched_rq *rq) >>> +drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq) >>> { >>> struct drm_sched_entity *entity; >>> >>> @@ -163,6 +178,45 @@ drm_sched_rq_select_entity(struct drm_sched_rq *rq) >>> return NULL; >>> } >>> >>> +/** >>> + * drm_sched_rq_select_entity_fifo - Select an entity which could provide a job to run >>> + * >>> + * @rq: scheduler run queue to check. >>> + * >>> + * Try to find a ready entity, based on FIFO order of jobs arrivals. >>> + * >>> + * Returns NULL if none found. >>> + */ >>> +static struct drm_sched_entity * >>> +drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq) >>> +{ >>> + struct drm_sched_entity *tmp, *entity = NULL; >>> + ktime_t oldest_ts = KTIME_MAX; >>> + struct drm_sched_job *sched_job; >>> + >>> + spin_lock(&rq->lock); >>> + >>> + list_for_each_entry(tmp, &rq->entities, list) { >>> + >>> + if (drm_sched_entity_is_ready(tmp)) { >>> + sched_job = to_drm_sched_job(spsc_queue_peek(&tmp->job_queue)); >>> + >>> + if (ktime_before(sched_job->submit_ts, oldest_ts)) { >>> + oldest_ts = sched_job->submit_ts; >>> + entity = tmp; >>> + } >>> + } >>> + } >> Here I think we need an O(1) lookup of the next job to pick out to run. >> I see a number of optimizations, for instance keeping the current/oldest >> timestamp in the rq struct itself, > > > This was my original design with rb tree based min heap per rq based on > time stamp of > the oldest job waiting in each entity's job queue (head of entity's SPCP > job queue). But how in this > case you record the timestamps of all the jobs waiting in entity's the > SPCP queue behind the head job ? > If you record only the oldest job and more jobs come you have no place > to store their timestamps and so > on next job select after current HEAD how you will know who came before > or after between 2 job queues of > of 2 entities ? > > >> or better yet keeping the next job >> to pick out to run at a head of list (a la timer wheel implementation). >> For suck an optimization to work, you'd prep things up on job insertion, rather >> than on job removal/pick to run. > > > I looked at timer wheel and I don't see how this applies here - it > assumes you know before > job submission your deadline time for job's execution to start - which > we don't so I don't see > how we can use it. This seems more suitable for real time scheduler > implementation where you > have a hard requirement to execute a job by some specific time T. > > I also mentioned a list, obviously there is the brute force solution of > just ordering all jobs in one giant list and get > naturally a FIFO ordering this way with O(1) insertions and extractions > but I assume we don't want one giant jobs queue > for all the entities to avoid more dependeies between them (like locking > the entire list when one specific entity is killed and > needs to remove it's jobs from SW queue). > >> >> I'm also surprised that there is no job transition from one queue to another, >> as it is picked to run next--for the above optimizations to implemented, you'd >> want a state transition from (state) queue to queue. > > > Not sure what you have in mind here ? How this helps ? > > Andrey > > >> >> Regards, >> Luben >> > > In my origianl design > >>> + >>> + if (entity) { >>> + rq->current_entity = entity; >>> + reinit_completion(&entity->entity_idle); >>> + } >>> + >>> + spin_unlock(&rq->lock); >>> + return entity; >>> +} >>> + >>> /** >>> * drm_sched_job_done - complete a job >>> * @s_job: pointer to the job which is done >>> @@ -804,7 +858,10 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched) >>> >>> /* Kernel run queue has higher priority than normal run queue*/ >>> for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) { >>> - entity = drm_sched_rq_select_entity(&sched->sched_rq[i]); >>> + entity = drm_sched_policy != 1 ? >>> + drm_sched_rq_select_entity_rr(&sched->sched_rq[i]) : >>> + drm_sched_rq_select_entity_fifo(&sched->sched_rq[i]); >>> + >>> if (entity) >>> break; >>> } >>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h >>> index addb135eeea6..95865881bfcf 100644 >>> --- a/include/drm/gpu_scheduler.h >>> +++ b/include/drm/gpu_scheduler.h >>> @@ -314,6 +314,14 @@ struct drm_sched_job { >>> >>> /** @last_dependency: tracks @dependencies as they signal */ >>> unsigned long last_dependency; >>> + >>> + /** >>> + * @submit_ts: >>> + * >>> + * Marks job submit time >>> + */ >>> + ktime_t submit_ts; >>> + >>> }; >>> >>> static inline bool drm_sched_invalidate_job(struct drm_sched_job *s_job, Regards, -- Luben ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] drm/sced: Add FIFO policy for scheduler rq 2022-08-23 18:30 ` Luben Tuikov @ 2022-08-23 18:57 ` Andrey Grodzovsky 2022-08-23 21:37 ` Luben Tuikov 0 siblings, 1 reply; 14+ messages in thread From: Andrey Grodzovsky @ 2022-08-23 18:57 UTC (permalink / raw) To: Luben Tuikov, dri-devel; +Cc: ckoenig.leichtzumerken, Li Yunxiang, amd-gfx On 2022-08-23 14:30, Luben Tuikov wrote: > > On 2022-08-23 14:13, Andrey Grodzovsky wrote: >> On 2022-08-23 12:58, Luben Tuikov wrote: >>> Inlined: >>> >>> On 2022-08-22 16:09, Andrey Grodzovsky wrote: >>>> Poblem: Given many entities competing for same rq on >>> ^Problem >>> >>>> same scheduler an uncceptabliy long wait time for some >>> ^unacceptably >>> >>>> jobs waiting stuck in rq before being picked up are >>>> observed (seen using GPUVis). >>>> The issue is due to Round Robin policy used by scheduler >>>> to pick up the next entity for execution. Under stress >>>> of many entities and long job queus within entity some >>> ^queues >>> >>>> jobs could be stack for very long time in it's entity's >>>> queue before being popped from the queue and executed >>>> while for other entites with samller job queues a job >>> ^entities; smaller >>> >>>> might execute ealier even though that job arrived later >>> ^earlier >>> >>>> then the job in the long queue. >>>> >>>> Fix: >>>> Add FIFO selection policy to entites in RQ, chose next enitity >>>> on rq in such order that if job on one entity arrived >>>> ealrier then job on another entity the first job will start >>>> executing ealier regardless of the length of the entity's job >>>> queue. >>>> >>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> >>>> Tested-by: Li Yunxiang (Teddy) <Yunxiang.Li@amd.com> >>>> --- >>>> drivers/gpu/drm/scheduler/sched_entity.c | 2 + >>>> drivers/gpu/drm/scheduler/sched_main.c | 65 ++++++++++++++++++++++-- >>>> include/drm/gpu_scheduler.h | 8 +++ >>>> 3 files changed, 71 insertions(+), 4 deletions(-) >>>> >>>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c >>>> index 6b25b2f4f5a3..3bb7f69306ef 100644 >>>> --- a/drivers/gpu/drm/scheduler/sched_entity.c >>>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c >>>> @@ -507,6 +507,8 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) >>>> atomic_inc(entity->rq->sched->score); >>>> WRITE_ONCE(entity->last_user, current->group_leader); >>>> first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node); >>>> + sched_job->submit_ts = ktime_get(); >>>> + >>>> >>>> /* first job wakes up scheduler */ >>>> if (first) { >>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c >>>> index 68317d3a7a27..c123aa120d06 100644 >>>> --- a/drivers/gpu/drm/scheduler/sched_main.c >>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c >>>> @@ -59,6 +59,19 @@ >>>> #define CREATE_TRACE_POINTS >>>> #include "gpu_scheduler_trace.h" >>>> >>>> + >>>> + >>>> +int drm_sched_policy = -1; >>>> + >>>> +/** >>>> + * DOC: sched_policy (int) >>>> + * Used to override default entites scheduling policy in a run queue. >>>> + */ >>>> +MODULE_PARM_DESC(sched_policy, >>>> + "specify schedule policy for entites on a runqueue (-1 = auto(default) value, 0 = Round Robin,1 = use FIFO"); >>>> +module_param_named(sched_policy, drm_sched_policy, int, 0444); >>> As per Christian's comments, you can drop the "auto" and perhaps leave one as the default, >>> say the RR. >>> >>> I do think it is beneficial to have a module parameter control the scheduling policy, as shown above. >> >> Christian is not against it, just against adding 'auto' here - like the >> default. > Exactly what I said. > > Also, I still think an O(1) scheduling (picking next to run) should be > what we strive for in such a FIFO patch implementation. > A FIFO mechanism is by it's nature an O(1) mechanism for picking the next > element. > > Regards, > Luben The only solution i see for this now is keeping a global per rq jobs list parallel to SPCP queue per entity - we use this list when we switch to FIFO scheduling, we can even start building it ONLY when we switch to FIFO building it gradually as more jobs come. Do you have other solution in mind ? Andrey > >> >>>> + >>>> + >>>> #define to_drm_sched_job(sched_job) \ >>>> container_of((sched_job), struct drm_sched_job, queue_node) >>>> >>>> @@ -120,14 +133,16 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq, >>>> } >>>> >>>> /** >>>> - * drm_sched_rq_select_entity - Select an entity which could provide a job to run >>>> + * drm_sched_rq_select_entity_rr - Select an entity which could provide a job to run >>>> * >>>> * @rq: scheduler run queue to check. >>>> * >>>> - * Try to find a ready entity, returns NULL if none found. >>>> + * Try to find a ready entity, in round robin manner. >>>> + * >>>> + * Returns NULL if none found. >>>> */ >>>> static struct drm_sched_entity * >>>> -drm_sched_rq_select_entity(struct drm_sched_rq *rq) >>>> +drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq) >>>> { >>>> struct drm_sched_entity *entity; >>>> >>>> @@ -163,6 +178,45 @@ drm_sched_rq_select_entity(struct drm_sched_rq *rq) >>>> return NULL; >>>> } >>>> >>>> +/** >>>> + * drm_sched_rq_select_entity_fifo - Select an entity which could provide a job to run >>>> + * >>>> + * @rq: scheduler run queue to check. >>>> + * >>>> + * Try to find a ready entity, based on FIFO order of jobs arrivals. >>>> + * >>>> + * Returns NULL if none found. >>>> + */ >>>> +static struct drm_sched_entity * >>>> +drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq) >>>> +{ >>>> + struct drm_sched_entity *tmp, *entity = NULL; >>>> + ktime_t oldest_ts = KTIME_MAX; >>>> + struct drm_sched_job *sched_job; >>>> + >>>> + spin_lock(&rq->lock); >>>> + >>>> + list_for_each_entry(tmp, &rq->entities, list) { >>>> + >>>> + if (drm_sched_entity_is_ready(tmp)) { >>>> + sched_job = to_drm_sched_job(spsc_queue_peek(&tmp->job_queue)); >>>> + >>>> + if (ktime_before(sched_job->submit_ts, oldest_ts)) { >>>> + oldest_ts = sched_job->submit_ts; >>>> + entity = tmp; >>>> + } >>>> + } >>>> + } >>> Here I think we need an O(1) lookup of the next job to pick out to run. >>> I see a number of optimizations, for instance keeping the current/oldest >>> timestamp in the rq struct itself, >> >> This was my original design with rb tree based min heap per rq based on >> time stamp of >> the oldest job waiting in each entity's job queue (head of entity's SPCP >> job queue). But how in this >> case you record the timestamps of all the jobs waiting in entity's the >> SPCP queue behind the head job ? >> If you record only the oldest job and more jobs come you have no place >> to store their timestamps and so >> on next job select after current HEAD how you will know who came before >> or after between 2 job queues of >> of 2 entities ? >> >> >>> or better yet keeping the next job >>> to pick out to run at a head of list (a la timer wheel implementation). >>> For suck an optimization to work, you'd prep things up on job insertion, rather >>> than on job removal/pick to run. >> >> I looked at timer wheel and I don't see how this applies here - it >> assumes you know before >> job submission your deadline time for job's execution to start - which >> we don't so I don't see >> how we can use it. This seems more suitable for real time scheduler >> implementation where you >> have a hard requirement to execute a job by some specific time T. >> >> I also mentioned a list, obviously there is the brute force solution of >> just ordering all jobs in one giant list and get >> naturally a FIFO ordering this way with O(1) insertions and extractions >> but I assume we don't want one giant jobs queue >> for all the entities to avoid more dependeies between them (like locking >> the entire list when one specific entity is killed and >> needs to remove it's jobs from SW queue). >> >>> I'm also surprised that there is no job transition from one queue to another, >>> as it is picked to run next--for the above optimizations to implemented, you'd >>> want a state transition from (state) queue to queue. >> >> Not sure what you have in mind here ? How this helps ? >> >> Andrey >> >> >>> Regards, >>> Luben >>> >> In my origianl design >> >>>> + >>>> + if (entity) { >>>> + rq->current_entity = entity; >>>> + reinit_completion(&entity->entity_idle); >>>> + } >>>> + >>>> + spin_unlock(&rq->lock); >>>> + return entity; >>>> +} >>>> + >>>> /** >>>> * drm_sched_job_done - complete a job >>>> * @s_job: pointer to the job which is done >>>> @@ -804,7 +858,10 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched) >>>> >>>> /* Kernel run queue has higher priority than normal run queue*/ >>>> for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) { >>>> - entity = drm_sched_rq_select_entity(&sched->sched_rq[i]); >>>> + entity = drm_sched_policy != 1 ? >>>> + drm_sched_rq_select_entity_rr(&sched->sched_rq[i]) : >>>> + drm_sched_rq_select_entity_fifo(&sched->sched_rq[i]); >>>> + >>>> if (entity) >>>> break; >>>> } >>>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h >>>> index addb135eeea6..95865881bfcf 100644 >>>> --- a/include/drm/gpu_scheduler.h >>>> +++ b/include/drm/gpu_scheduler.h >>>> @@ -314,6 +314,14 @@ struct drm_sched_job { >>>> >>>> /** @last_dependency: tracks @dependencies as they signal */ >>>> unsigned long last_dependency; >>>> + >>>> + /** >>>> + * @submit_ts: >>>> + * >>>> + * Marks job submit time >>>> + */ >>>> + ktime_t submit_ts; >>>> + >>>> }; >>>> >>>> static inline bool drm_sched_invalidate_job(struct drm_sched_job *s_job, > Regards, ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] drm/sced: Add FIFO policy for scheduler rq 2022-08-23 18:57 ` Andrey Grodzovsky @ 2022-08-23 21:37 ` Luben Tuikov 2022-08-24 16:21 ` Andrey Grodzovsky 0 siblings, 1 reply; 14+ messages in thread From: Luben Tuikov @ 2022-08-23 21:37 UTC (permalink / raw) To: Andrey Grodzovsky, dri-devel; +Cc: ckoenig.leichtzumerken, Li Yunxiang, amd-gfx On 2022-08-23 14:57, Andrey Grodzovsky wrote: > On 2022-08-23 14:30, Luben Tuikov wrote: > >> >> On 2022-08-23 14:13, Andrey Grodzovsky wrote: >>> On 2022-08-23 12:58, Luben Tuikov wrote: >>>> Inlined: >>>> >>>> On 2022-08-22 16:09, Andrey Grodzovsky wrote: >>>>> Poblem: Given many entities competing for same rq on >>>> ^Problem >>>> >>>>> same scheduler an uncceptabliy long wait time for some >>>> ^unacceptably >>>> >>>>> jobs waiting stuck in rq before being picked up are >>>>> observed (seen using GPUVis). >>>>> The issue is due to Round Robin policy used by scheduler >>>>> to pick up the next entity for execution. Under stress >>>>> of many entities and long job queus within entity some >>>> ^queues >>>> >>>>> jobs could be stack for very long time in it's entity's >>>>> queue before being popped from the queue and executed >>>>> while for other entites with samller job queues a job >>>> ^entities; smaller >>>> >>>>> might execute ealier even though that job arrived later >>>> ^earlier >>>> >>>>> then the job in the long queue. >>>>> >>>>> Fix: >>>>> Add FIFO selection policy to entites in RQ, chose next enitity >>>>> on rq in such order that if job on one entity arrived >>>>> ealrier then job on another entity the first job will start >>>>> executing ealier regardless of the length of the entity's job >>>>> queue. >>>>> >>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> >>>>> Tested-by: Li Yunxiang (Teddy) <Yunxiang.Li@amd.com> >>>>> --- >>>>> drivers/gpu/drm/scheduler/sched_entity.c | 2 + >>>>> drivers/gpu/drm/scheduler/sched_main.c | 65 ++++++++++++++++++++++-- >>>>> include/drm/gpu_scheduler.h | 8 +++ >>>>> 3 files changed, 71 insertions(+), 4 deletions(-) >>>>> >>>>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c >>>>> index 6b25b2f4f5a3..3bb7f69306ef 100644 >>>>> --- a/drivers/gpu/drm/scheduler/sched_entity.c >>>>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c >>>>> @@ -507,6 +507,8 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) >>>>> atomic_inc(entity->rq->sched->score); >>>>> WRITE_ONCE(entity->last_user, current->group_leader); >>>>> first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node); >>>>> + sched_job->submit_ts = ktime_get(); >>>>> + >>>>> >>>>> /* first job wakes up scheduler */ >>>>> if (first) { >>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c >>>>> index 68317d3a7a27..c123aa120d06 100644 >>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c >>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c >>>>> @@ -59,6 +59,19 @@ >>>>> #define CREATE_TRACE_POINTS >>>>> #include "gpu_scheduler_trace.h" >>>>> >>>>> + >>>>> + >>>>> +int drm_sched_policy = -1; >>>>> + >>>>> +/** >>>>> + * DOC: sched_policy (int) >>>>> + * Used to override default entites scheduling policy in a run queue. >>>>> + */ >>>>> +MODULE_PARM_DESC(sched_policy, >>>>> + "specify schedule policy for entites on a runqueue (-1 = auto(default) value, 0 = Round Robin,1 = use FIFO"); >>>>> +module_param_named(sched_policy, drm_sched_policy, int, 0444); >>>> As per Christian's comments, you can drop the "auto" and perhaps leave one as the default, >>>> say the RR. >>>> >>>> I do think it is beneficial to have a module parameter control the scheduling policy, as shown above. >>> >>> Christian is not against it, just against adding 'auto' here - like the >>> default. >> Exactly what I said. >> >> Also, I still think an O(1) scheduling (picking next to run) should be >> what we strive for in such a FIFO patch implementation. >> A FIFO mechanism is by it's nature an O(1) mechanism for picking the next >> element. >> >> Regards, >> Luben > > > The only solution i see for this now is keeping a global per rq jobs > list parallel to SPCP queue per entity - we use this list when we switch > to FIFO scheduling, we can even start building it ONLY when we switch > to FIFO building it gradually as more jobs come. Do you have other solution > in mind ? The idea is to "sort" on insertion, not on picking the next one to run. cont'd below: > > Andrey > >> >>> >>>>> + >>>>> + >>>>> #define to_drm_sched_job(sched_job) \ >>>>> container_of((sched_job), struct drm_sched_job, queue_node) >>>>> >>>>> @@ -120,14 +133,16 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq, >>>>> } >>>>> >>>>> /** >>>>> - * drm_sched_rq_select_entity - Select an entity which could provide a job to run >>>>> + * drm_sched_rq_select_entity_rr - Select an entity which could provide a job to run >>>>> * >>>>> * @rq: scheduler run queue to check. >>>>> * >>>>> - * Try to find a ready entity, returns NULL if none found. >>>>> + * Try to find a ready entity, in round robin manner. >>>>> + * >>>>> + * Returns NULL if none found. >>>>> */ >>>>> static struct drm_sched_entity * >>>>> -drm_sched_rq_select_entity(struct drm_sched_rq *rq) >>>>> +drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq) >>>>> { >>>>> struct drm_sched_entity *entity; >>>>> >>>>> @@ -163,6 +178,45 @@ drm_sched_rq_select_entity(struct drm_sched_rq *rq) >>>>> return NULL; >>>>> } >>>>> >>>>> +/** >>>>> + * drm_sched_rq_select_entity_fifo - Select an entity which could provide a job to run >>>>> + * >>>>> + * @rq: scheduler run queue to check. >>>>> + * >>>>> + * Try to find a ready entity, based on FIFO order of jobs arrivals. >>>>> + * >>>>> + * Returns NULL if none found. >>>>> + */ >>>>> +static struct drm_sched_entity * >>>>> +drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq) >>>>> +{ >>>>> + struct drm_sched_entity *tmp, *entity = NULL; >>>>> + ktime_t oldest_ts = KTIME_MAX; >>>>> + struct drm_sched_job *sched_job; >>>>> + >>>>> + spin_lock(&rq->lock); >>>>> + >>>>> + list_for_each_entry(tmp, &rq->entities, list) { >>>>> + >>>>> + if (drm_sched_entity_is_ready(tmp)) { >>>>> + sched_job = to_drm_sched_job(spsc_queue_peek(&tmp->job_queue)); >>>>> + >>>>> + if (ktime_before(sched_job->submit_ts, oldest_ts)) { >>>>> + oldest_ts = sched_job->submit_ts; >>>>> + entity = tmp; >>>>> + } >>>>> + } >>>>> + } >>>> Here I think we need an O(1) lookup of the next job to pick out to run. >>>> I see a number of optimizations, for instance keeping the current/oldest >>>> timestamp in the rq struct itself, >>> >>> This was my original design with rb tree based min heap per rq based on >>> time stamp of >>> the oldest job waiting in each entity's job queue (head of entity's SPCP >>> job queue). But how in this >>> case you record the timestamps of all the jobs waiting in entity's the >>> SPCP queue behind the head job ? >>> If you record only the oldest job and more jobs come you have no place >>> to store their timestamps and so >>> on next job select after current HEAD how you will know who came before >>> or after between 2 job queues of >>> of 2 entities ? >>> >>> >>>> or better yet keeping the next job >>>> to pick out to run at a head of list (a la timer wheel implementation). >>>> For suck an optimization to work, you'd prep things up on job insertion, rather >>>> than on job removal/pick to run. >>> >>> I looked at timer wheel and I don't see how this applies here - it >>> assumes you know before >>> job submission your deadline time for job's execution to start - which >>> we don't so I don't see >>> how we can use it. This seems more suitable for real time scheduler >>> implementation where you >>> have a hard requirement to execute a job by some specific time T. In a timer wheel you instantly know the "soonest" job to run--it's naturally your "next" job, regardless of in what order the timers were added and what their timeout time is. >>> >>> I also mentioned a list, obviously there is the brute force solution of >>> just ordering all jobs in one giant list and get >>> naturally a FIFO ordering this way with O(1) insertions and extractions >>> but I assume we don't want one giant jobs queue >>> for all the entities to avoid more dependeies between them (like locking >>> the entire list when one specific entity is killed and >>> needs to remove it's jobs from SW queue). You can also have a list of list pointers. It'd be trivial to remove a whole list from the main list, by simply removing an element--akin to locking out a rq, or should you need to edit the rq's entity list. >>> >>>> I'm also surprised that there is no job transition from one queue to another, >>>> as it is picked to run next--for the above optimizations to implemented, you'd >>>> want a state transition from (state) queue to queue. >>> >>> Not sure what you have in mind here ? How this helps ? I think I've explained this a few times now--each list represents a state and a job/entity travels through lists as it travels through states, which states more or less represent the state of execution in the hardware--it could be as simple as incoming --> pending --> done. It allows a finer grain when resetting the hardware (should the hardware allow it). Note that this isn't directly related to the O(1) mechanism I brought up here. As I said, I was surprised to find out none such distinction existed--that's all. Don't fixate on this. Regards, Luben >>> >>> Andrey >>> >>> >>>> Regards, >>>> Luben >>>> >>> In my origianl design >>> >>>>> + >>>>> + if (entity) { >>>>> + rq->current_entity = entity; >>>>> + reinit_completion(&entity->entity_idle); >>>>> + } >>>>> + >>>>> + spin_unlock(&rq->lock); >>>>> + return entity; >>>>> +} >>>>> + >>>>> /** >>>>> * drm_sched_job_done - complete a job >>>>> * @s_job: pointer to the job which is done >>>>> @@ -804,7 +858,10 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched) >>>>> >>>>> /* Kernel run queue has higher priority than normal run queue*/ >>>>> for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) { >>>>> - entity = drm_sched_rq_select_entity(&sched->sched_rq[i]); >>>>> + entity = drm_sched_policy != 1 ? >>>>> + drm_sched_rq_select_entity_rr(&sched->sched_rq[i]) : >>>>> + drm_sched_rq_select_entity_fifo(&sched->sched_rq[i]); >>>>> + >>>>> if (entity) >>>>> break; >>>>> } >>>>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h >>>>> index addb135eeea6..95865881bfcf 100644 >>>>> --- a/include/drm/gpu_scheduler.h >>>>> +++ b/include/drm/gpu_scheduler.h >>>>> @@ -314,6 +314,14 @@ struct drm_sched_job { >>>>> >>>>> /** @last_dependency: tracks @dependencies as they signal */ >>>>> unsigned long last_dependency; >>>>> + >>>>> + /** >>>>> + * @submit_ts: >>>>> + * >>>>> + * Marks job submit time >>>>> + */ >>>>> + ktime_t submit_ts; >>>>> + >>>>> }; >>>>> >>>>> static inline bool drm_sched_invalidate_job(struct drm_sched_job *s_job, >> Regards, Regards, -- Luben ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] drm/sced: Add FIFO policy for scheduler rq 2022-08-23 21:37 ` Luben Tuikov @ 2022-08-24 16:21 ` Andrey Grodzovsky 2022-08-25 2:29 ` Luben Tuikov 0 siblings, 1 reply; 14+ messages in thread From: Andrey Grodzovsky @ 2022-08-24 16:21 UTC (permalink / raw) To: Luben Tuikov, dri-devel; +Cc: ckoenig.leichtzumerken, Li Yunxiang, amd-gfx [-- Attachment #1: Type: text/plain, Size: 13060 bytes --] On 2022-08-23 17:37, Luben Tuikov wrote: > > On 2022-08-23 14:57, Andrey Grodzovsky wrote: >> On 2022-08-23 14:30, Luben Tuikov wrote: >> >>> On 2022-08-23 14:13, Andrey Grodzovsky wrote: >>>> On 2022-08-23 12:58, Luben Tuikov wrote: >>>>> Inlined: >>>>> >>>>> On 2022-08-22 16:09, Andrey Grodzovsky wrote: >>>>>> Poblem: Given many entities competing for same rq on >>>>> ^Problem >>>>> >>>>>> same scheduler an uncceptabliy long wait time for some >>>>> ^unacceptably >>>>> >>>>>> jobs waiting stuck in rq before being picked up are >>>>>> observed (seen using GPUVis). >>>>>> The issue is due to Round Robin policy used by scheduler >>>>>> to pick up the next entity for execution. Under stress >>>>>> of many entities and long job queus within entity some >>>>> ^queues >>>>> >>>>>> jobs could be stack for very long time in it's entity's >>>>>> queue before being popped from the queue and executed >>>>>> while for other entites with samller job queues a job >>>>> ^entities; smaller >>>>> >>>>>> might execute ealier even though that job arrived later >>>>> ^earlier >>>>> >>>>>> then the job in the long queue. >>>>>> >>>>>> Fix: >>>>>> Add FIFO selection policy to entites in RQ, chose next enitity >>>>>> on rq in such order that if job on one entity arrived >>>>>> ealrier then job on another entity the first job will start >>>>>> executing ealier regardless of the length of the entity's job >>>>>> queue. >>>>>> >>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> >>>>>> Tested-by: Li Yunxiang (Teddy) <Yunxiang.Li@amd.com> >>>>>> --- >>>>>> drivers/gpu/drm/scheduler/sched_entity.c | 2 + >>>>>> drivers/gpu/drm/scheduler/sched_main.c | 65 ++++++++++++++++++++++-- >>>>>> include/drm/gpu_scheduler.h | 8 +++ >>>>>> 3 files changed, 71 insertions(+), 4 deletions(-) >>>>>> >>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c >>>>>> index 6b25b2f4f5a3..3bb7f69306ef 100644 >>>>>> --- a/drivers/gpu/drm/scheduler/sched_entity.c >>>>>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c >>>>>> @@ -507,6 +507,8 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) >>>>>> atomic_inc(entity->rq->sched->score); >>>>>> WRITE_ONCE(entity->last_user, current->group_leader); >>>>>> first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node); >>>>>> + sched_job->submit_ts = ktime_get(); >>>>>> + >>>>>> >>>>>> /* first job wakes up scheduler */ >>>>>> if (first) { >>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c >>>>>> index 68317d3a7a27..c123aa120d06 100644 >>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c >>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c >>>>>> @@ -59,6 +59,19 @@ >>>>>> #define CREATE_TRACE_POINTS >>>>>> #include "gpu_scheduler_trace.h" >>>>>> >>>>>> + >>>>>> + >>>>>> +int drm_sched_policy = -1; >>>>>> + >>>>>> +/** >>>>>> + * DOC: sched_policy (int) >>>>>> + * Used to override default entites scheduling policy in a run queue. >>>>>> + */ >>>>>> +MODULE_PARM_DESC(sched_policy, >>>>>> + "specify schedule policy for entites on a runqueue (-1 = auto(default) value, 0 = Round Robin,1 = use FIFO"); >>>>>> +module_param_named(sched_policy, drm_sched_policy, int, 0444); >>>>> As per Christian's comments, you can drop the "auto" and perhaps leave one as the default, >>>>> say the RR. >>>>> >>>>> I do think it is beneficial to have a module parameter control the scheduling policy, as shown above. >>>> Christian is not against it, just against adding 'auto' here - like the >>>> default. >>> Exactly what I said. >>> >>> Also, I still think an O(1) scheduling (picking next to run) should be >>> what we strive for in such a FIFO patch implementation. >>> A FIFO mechanism is by it's nature an O(1) mechanism for picking the next >>> element. >>> >>> Regards, >>> Luben >> >> The only solution i see for this now is keeping a global per rq jobs >> list parallel to SPCP queue per entity - we use this list when we switch >> to FIFO scheduling, we can even start building it ONLY when we switch >> to FIFO building it gradually as more jobs come. Do you have other solution >> in mind ? > The idea is to "sort" on insertion, not on picking the next one to run. > > cont'd below: > >> Andrey >> >>>>>> + >>>>>> + >>>>>> #define to_drm_sched_job(sched_job) \ >>>>>> container_of((sched_job), struct drm_sched_job, queue_node) >>>>>> >>>>>> @@ -120,14 +133,16 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq, >>>>>> } >>>>>> >>>>>> /** >>>>>> - * drm_sched_rq_select_entity - Select an entity which could provide a job to run >>>>>> + * drm_sched_rq_select_entity_rr - Select an entity which could provide a job to run >>>>>> * >>>>>> * @rq: scheduler run queue to check. >>>>>> * >>>>>> - * Try to find a ready entity, returns NULL if none found. >>>>>> + * Try to find a ready entity, in round robin manner. >>>>>> + * >>>>>> + * Returns NULL if none found. >>>>>> */ >>>>>> static struct drm_sched_entity * >>>>>> -drm_sched_rq_select_entity(struct drm_sched_rq *rq) >>>>>> +drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq) >>>>>> { >>>>>> struct drm_sched_entity *entity; >>>>>> >>>>>> @@ -163,6 +178,45 @@ drm_sched_rq_select_entity(struct drm_sched_rq *rq) >>>>>> return NULL; >>>>>> } >>>>>> >>>>>> +/** >>>>>> + * drm_sched_rq_select_entity_fifo - Select an entity which could provide a job to run >>>>>> + * >>>>>> + * @rq: scheduler run queue to check. >>>>>> + * >>>>>> + * Try to find a ready entity, based on FIFO order of jobs arrivals. >>>>>> + * >>>>>> + * Returns NULL if none found. >>>>>> + */ >>>>>> +static struct drm_sched_entity * >>>>>> +drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq) >>>>>> +{ >>>>>> + struct drm_sched_entity *tmp, *entity = NULL; >>>>>> + ktime_t oldest_ts = KTIME_MAX; >>>>>> + struct drm_sched_job *sched_job; >>>>>> + >>>>>> + spin_lock(&rq->lock); >>>>>> + >>>>>> + list_for_each_entry(tmp, &rq->entities, list) { >>>>>> + >>>>>> + if (drm_sched_entity_is_ready(tmp)) { >>>>>> + sched_job = to_drm_sched_job(spsc_queue_peek(&tmp->job_queue)); >>>>>> + >>>>>> + if (ktime_before(sched_job->submit_ts, oldest_ts)) { >>>>>> + oldest_ts = sched_job->submit_ts; >>>>>> + entity = tmp; >>>>>> + } >>>>>> + } >>>>>> + } >>>>> Here I think we need an O(1) lookup of the next job to pick out to run. >>>>> I see a number of optimizations, for instance keeping the current/oldest >>>>> timestamp in the rq struct itself, >>>> This was my original design with rb tree based min heap per rq based on >>>> time stamp of >>>> the oldest job waiting in each entity's job queue (head of entity's SPCP >>>> job queue). But how in this >>>> case you record the timestamps of all the jobs waiting in entity's the >>>> SPCP queue behind the head job ? >>>> If you record only the oldest job and more jobs come you have no place >>>> to store their timestamps and so >>>> on next job select after current HEAD how you will know who came before >>>> or after between 2 job queues of >>>> of 2 entities ? >>>> >>>> >>>>> or better yet keeping the next job >>>>> to pick out to run at a head of list (a la timer wheel implementation). >>>>> For suck an optimization to work, you'd prep things up on job insertion, rather >>>>> than on job removal/pick to run. >>>> I looked at timer wheel and I don't see how this applies here - it >>>> assumes you know before >>>> job submission your deadline time for job's execution to start - which >>>> we don't so I don't see >>>> how we can use it. This seems more suitable for real time scheduler >>>> implementation where you >>>> have a hard requirement to execute a job by some specific time T. > In a timer wheel you instantly know the "soonest" job to run--it's naturally > your "next" job, regardless of in what order the timers were added and what > their timeout time is. > >>>> I also mentioned a list, obviously there is the brute force solution of >>>> just ordering all jobs in one giant list and get >>>> naturally a FIFO ordering this way with O(1) insertions and extractions >>>> but I assume we don't want one giant jobs queue >>>> for all the entities to avoid more dependeies between them (like locking >>>> the entire list when one specific entity is killed and >>>> needs to remove it's jobs from SW queue). > You can also have a list of list pointers. It'd be trivial to remove a whole > list from the main list, by simply removing an element--akin to locking out a rq, > or should you need to edit the rq's entity list. So you do mean some kind of FIFO list. I really would want to avoid maintaining an extra data structure, we already have jobs stored in entity SPSC queue, and now we will have to add to a job struct a pointer to another, rq wide FIFO list, seems to me like a recipe for problems. Thinking more about it, if I do let each job have it's original submission timestamp stored I can go back to my original design of storing sched entities themself in min heap structure based on timestamp of the next job to run in the entity (head of job list), this way we get O(1) extraction of next job to run and it will still be FIFO. The cost will be O(log(# entites in rq)) for updating the min heap on each job extraction based on next head timestamp. Still better then linear. On the downside we need to maintain an rb tree structure to store the entitles in parallel to holding them in linear list for round robin as it's tpday. I am attaching my original patch to give sense of it with TODO section were I added this new code. There are actually some corner cases with empty SPCP queue becoming full and sampling not the head of the queue but the next one is what we need but i think it's solvable. In general it seems to me that before doing more complicated design we actually need to measure and see if there is really a substantial performance hit compared to current RR or even compared to possible O(1) extraction solution. No point to complicate design if we don't get significant performance improvement from it. > >>>>> I'm also surprised that there is no job transition from one queue to another, >>>>> as it is picked to run next--for the above optimizations to implemented, you'd >>>>> want a state transition from (state) queue to queue. >>>> Not sure what you have in mind here ? How this helps ? > I think I've explained this a few times now--each list represents a state and a job/entity > travels through lists as it travels through states, which states more or less represent > the state of execution in the hardware--it could be as simple as incoming --> pending --> done. > > It allows a finer grain when resetting the hardware (should the hardware allow it). > > Note that this isn't directly related to the O(1) mechanism I brought up here. As I said, I was surprised > to find out none such distinction existed--that's all. Don't fixate on this. > > Regards, > Luben Right, so this one is a separate topic regarding the pending job list refactoring which we need to discuss and fix separately - i have TODO to come back to your patches from a year ago or so which were addressing this. Probably will try to get to this after finishing this work. Andrey > >>>> Andrey >>>> >>>> >>>>> Regards, >>>>> Luben >>>>> >>>> In my origianl design >>>> >>>>>> + >>>>>> + if (entity) { >>>>>> + rq->current_entity = entity; >>>>>> + reinit_completion(&entity->entity_idle); >>>>>> + } >>>>>> + >>>>>> + spin_unlock(&rq->lock); >>>>>> + return entity; >>>>>> +} >>>>>> + >>>>>> /** >>>>>> * drm_sched_job_done - complete a job >>>>>> * @s_job: pointer to the job which is done >>>>>> @@ -804,7 +858,10 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched) >>>>>> >>>>>> /* Kernel run queue has higher priority than normal run queue*/ >>>>>> for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) { >>>>>> - entity = drm_sched_rq_select_entity(&sched->sched_rq[i]); >>>>>> + entity = drm_sched_policy != 1 ? >>>>>> + drm_sched_rq_select_entity_rr(&sched->sched_rq[i]) : >>>>>> + drm_sched_rq_select_entity_fifo(&sched->sched_rq[i]); >>>>>> + >>>>>> if (entity) >>>>>> break; >>>>>> } >>>>>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h >>>>>> index addb135eeea6..95865881bfcf 100644 >>>>>> --- a/include/drm/gpu_scheduler.h >>>>>> +++ b/include/drm/gpu_scheduler.h >>>>>> @@ -314,6 +314,14 @@ struct drm_sched_job { >>>>>> >>>>>> /** @last_dependency: tracks @dependencies as they signal */ >>>>>> unsigned long last_dependency; >>>>>> + >>>>>> + /** >>>>>> + * @submit_ts: >>>>>> + * >>>>>> + * Marks job submit time >>>>>> + */ >>>>>> + ktime_t submit_ts; >>>>>> + >>>>>> }; >>>>>> >>>>>> static inline bool drm_sched_invalidate_job(struct drm_sched_job *s_job, >>> Regards, > Regards, [-- Attachment #2: 0001-drm-sced-Add-FIFO-sched-policy-to-rq-v7.patch --] [-- Type: text/x-patch, Size: 9622 bytes --] From c7fad79b1790e5a4f43e95c390ebf4e638662fe4 Mon Sep 17 00:00:00 2001 From: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Date: Fri, 29 Jul 2022 12:26:50 -0400 Subject: drm/sced: Add FIFO sched policy to rq v7 Also add enblement flag. Add ordering based on TS Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> --- drivers/gpu/drm/scheduler/sched_entity.c | 6 ++ drivers/gpu/drm/scheduler/sched_main.c | 109 ++++++++++++++++++++++- include/drm/gpu_scheduler.h | 38 ++++++++ 3 files changed, 150 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c index 191c56064f19..6eb495dec09f 100644 --- a/drivers/gpu/drm/scheduler/sched_entity.c +++ b/drivers/gpu/drm/scheduler/sched_entity.c @@ -33,6 +33,8 @@ #define to_drm_sched_job(sched_job) \ container_of((sched_job), struct drm_sched_job, queue_node) +extern int drm_sched_policy; + /** * drm_sched_entity_init - Init a context entity used by scheduler when * submit to HW ring. @@ -73,6 +75,7 @@ int drm_sched_entity_init(struct drm_sched_entity *entity, entity->priority = priority; entity->sched_list = num_sched_list > 1 ? sched_list : NULL; entity->last_scheduled = NULL; + RB_CLEAR_NODE(&entity->rb_tree_node); if(num_sched_list) entity->rq = &sched_list[0]->sched_rq[entity->priority]; @@ -443,6 +446,7 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity) smp_wmb(); spsc_queue_pop(&entity->job_queue); + return sched_job; } @@ -507,6 +511,7 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) atomic_inc(entity->rq->sched->score); WRITE_ONCE(entity->last_user, current->group_leader); first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node); + sched_job->submit_ts = ktime_get(); /* first job wakes up scheduler */ if (first) { @@ -518,6 +523,7 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) DRM_ERROR("Trying to push to a killed entity\n"); return; } + drm_sched_rq_add_entity(entity->rq, entity); spin_unlock(&entity->rq_lock); drm_sched_wakeup(entity->rq->sched); diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index c5437ee03e3f..530f7f3e2aea 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -62,6 +62,42 @@ #define to_drm_sched_job(sched_job) \ container_of((sched_job), struct drm_sched_job, queue_node) +int drm_sched_policy = 1; + +/** + * DOC: sched_policy (int) + * Used to override default entites scheduling policy in a run queue. + */ +MODULE_PARM_DESC(sched_policy, + "specify schedule policy for entites on a runqueue (-1 = auto(default) value, 0 = Round Robin,1 = use FIFO"); +module_param_named(sched_policy, drm_sched_policy, int, 0444); + +static inline void drm_sched_rq_update_fifo_locked(struct drm_sched_rq *rq, + struct drm_sched_entity *entity, + bool remove_only) +{ + if (!RB_EMPTY_NODE(&entity->rb_tree_node)) { + rb_erase_cached(&entity->rb_tree_node, &rq->rb_tree_root); + RB_CLEAR_NODE(&entity->rb_tree_node); + } + + if (remove_only) + return; + + /* + * TODO - In case there is next job pending to run use it's TS to update + * the entity location in min heap. Otherwise just thow it to the 'back of + * the line' + */ + if (drm_sched_entity_is_ready(entity)) + entity->oldest_job_waiting = to_drm_sched_job(spsc_queue_peek(&tmp->job_queue)); + else + entity->oldest_job_waiting = ktime_get(); + + rb_add_cached(&entity->rb_tree_node, &rq->rb_tree_root, + drm_sched_entity_compare_earlier); +} + /** * drm_sched_rq_init - initialize a given run queue struct * @@ -75,6 +111,7 @@ static void drm_sched_rq_init(struct drm_gpu_scheduler *sched, { spin_lock_init(&rq->lock); INIT_LIST_HEAD(&rq->entities); + rq->rb_tree_root = RB_ROOT_CACHED; rq->current_entity = NULL; rq->sched = sched; } @@ -92,9 +129,15 @@ void drm_sched_rq_add_entity(struct drm_sched_rq *rq, { if (!list_empty(&entity->list)) return; + spin_lock(&rq->lock); + atomic_inc(rq->sched->score); list_add_tail(&entity->list, &rq->entities); + + if (drm_sched_policy == 1) + drm_sched_rq_update_fifo_locked(entity->rq, entity, false); + spin_unlock(&rq->lock); } @@ -111,23 +154,32 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq, { if (list_empty(&entity->list)) return; + spin_lock(&rq->lock); + + atomic_dec(rq->sched->score); list_del_init(&entity->list); + if (rq->current_entity == entity) rq->current_entity = NULL; + + if (drm_sched_policy == 1) + drm_sched_rq_update_fifo_locked(entity->rq, entity, true); + spin_unlock(&rq->lock); } + /** - * drm_sched_rq_select_entity - Select an entity which could provide a job to run + * drm_sched_rq_select_entity_rr - Select an entity which could provide a job to run * * @rq: scheduler run queue to check. * * Try to find a ready entity, returns NULL if none found. */ static struct drm_sched_entity * -drm_sched_rq_select_entity(struct drm_sched_rq *rq) +drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq) { struct drm_sched_entity *entity; @@ -163,6 +215,54 @@ drm_sched_rq_select_entity(struct drm_sched_rq *rq) return NULL; } + +/** + * drm_sched_rq_select_entity_fifo - Select an entity which could provide a job to run + * + * @rq: scheduler run queue to check. + * + * Try to find a ready entity, returns NULL if none found. + */ +static struct drm_sched_entity * +drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq) +{ + struct drm_sched_entity *first, *entity = NULL; + struct rb_node *rb; + spin_lock(&rq->lock); + + rb = rb_first_cached(&rq->rb_tree_root); + if (!rb) + goto out; + + first = rb_entry((rb), struct drm_sched_entity, rb_tree_node); + entity = first; + + while(true){ + + /* Update entity's TS for the FIFO and update the FIFO accordingly */ + drm_sched_rq_update_fifo_locked(entity->rq, entity, false); + + if (drm_sched_entity_is_ready(entity)) { + rq->current_entity = entity; + reinit_completion(&entity->entity_idle); + break; + } + + rb = rb_first_cached(&rq->rb_tree_root); + entity = rb_entry((rb), struct drm_sched_entity, rb_tree_node); + + /* We completed full cycle */ + if (!drm_sched_entity_is_ready(entity) && entity == first) { + entity = NULL; + break; + } + } + + out: + spin_unlock(&rq->lock); + return entity; +} + /** * drm_sched_job_done - complete a job * @s_job: pointer to the job which is done @@ -592,6 +692,7 @@ int drm_sched_job_init(struct drm_sched_job *job, struct drm_sched_entity *entity, void *owner) { + drm_sched_entity_select_rq(entity); if (!entity->rq) return -ENOENT; @@ -801,7 +902,9 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched) /* Kernel run queue has higher priority than normal run queue*/ for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) { - entity = drm_sched_rq_select_entity(&sched->sched_rq[i]); + entity = drm_sched_policy != 1 ? + drm_sched_rq_select_entity_rr(&sched->sched_rq[i]) : + drm_sched_rq_select_entity_fifo(&sched->sched_rq[i]); if (entity) break; } diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h index 944f83ef9f2e..1c841a24b5c0 100644 --- a/include/drm/gpu_scheduler.h +++ b/include/drm/gpu_scheduler.h @@ -196,6 +196,21 @@ struct drm_sched_entity { * drm_sched_entity_fini(). */ struct completion entity_idle; + + /** + * @oldest_job_waiting: + * + * Marks earliest job waiting in SW queue + */ + ktime_t oldest_job_waiting; + + /** + * @rb_tree_node: + * + * To insert this entity into time based priority queue + */ + struct rb_node rb_tree_node; + }; /** @@ -205,6 +220,7 @@ struct drm_sched_entity { * @sched: the scheduler to which this rq belongs to. * @entities: list of the entities to be scheduled. * @current_entity: the entity which is to be scheduled. + * @rb_tree_root: root of time based priory queue of entites for FIFO scheduling * * Run queue is a set of entities scheduling command submissions for * one specific ring. It implements the scheduling policy that selects @@ -215,6 +231,7 @@ struct drm_sched_rq { struct drm_gpu_scheduler *sched; struct list_head entities; struct drm_sched_entity *current_entity; + struct rb_root_cached rb_tree_root; }; /** @@ -258,6 +275,14 @@ struct drm_sched_fence { * @owner: job owner for debugging */ void *owner; + + /** + * @submit_ts: + * + * Marks job submit time + */ + ktime_t submit_ts; + }; struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f); @@ -501,6 +526,19 @@ void drm_sched_rq_add_entity(struct drm_sched_rq *rq, void drm_sched_rq_remove_entity(struct drm_sched_rq *rq, struct drm_sched_entity *entity); +void drm_sched_rq_update_fifo(struct drm_sched_rq *rq, + struct drm_sched_entity *entity, + bool remove_only); + +static __always_inline bool drm_sched_entity_compare_earlier(struct rb_node *a, + const struct rb_node *b) +{ + struct drm_sched_entity *ent_a = rb_entry((a), struct drm_sched_entity, rb_tree_node); + struct drm_sched_entity *ent_b = rb_entry((b), struct drm_sched_entity, rb_tree_node); + + return ktime_before(ent_a->oldest_job_waiting, ent_b->oldest_job_waiting); +} + int drm_sched_entity_init(struct drm_sched_entity *entity, enum drm_sched_priority priority, struct drm_gpu_scheduler **sched_list, -- 2.25.1 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH] drm/sced: Add FIFO policy for scheduler rq 2022-08-24 16:21 ` Andrey Grodzovsky @ 2022-08-25 2:29 ` Luben Tuikov 2022-08-25 13:49 ` Andrey Grodzovsky 2022-08-25 13:49 ` Andrey Grodzovsky 0 siblings, 2 replies; 14+ messages in thread From: Luben Tuikov @ 2022-08-25 2:29 UTC (permalink / raw) To: Andrey Grodzovsky, dri-devel; +Cc: ckoenig.leichtzumerken, Li Yunxiang, amd-gfx Inlined: On 2022-08-24 12:21, Andrey Grodzovsky wrote: > > On 2022-08-23 17:37, Luben Tuikov wrote: >> >> On 2022-08-23 14:57, Andrey Grodzovsky wrote: >>> On 2022-08-23 14:30, Luben Tuikov wrote: >>> >>>> On 2022-08-23 14:13, Andrey Grodzovsky wrote: >>>>> On 2022-08-23 12:58, Luben Tuikov wrote: >>>>>> Inlined: >>>>>> >>>>>> On 2022-08-22 16:09, Andrey Grodzovsky wrote: >>>>>>> Poblem: Given many entities competing for same rq on >>>>>> ^Problem >>>>>> >>>>>>> same scheduler an uncceptabliy long wait time for some >>>>>> ^unacceptably >>>>>> >>>>>>> jobs waiting stuck in rq before being picked up are >>>>>>> observed (seen using GPUVis). >>>>>>> The issue is due to Round Robin policy used by scheduler >>>>>>> to pick up the next entity for execution. Under stress >>>>>>> of many entities and long job queus within entity some >>>>>> ^queues >>>>>> >>>>>>> jobs could be stack for very long time in it's entity's >>>>>>> queue before being popped from the queue and executed >>>>>>> while for other entites with samller job queues a job >>>>>> ^entities; smaller >>>>>> >>>>>>> might execute ealier even though that job arrived later >>>>>> ^earlier >>>>>> >>>>>>> then the job in the long queue. >>>>>>> >>>>>>> Fix: >>>>>>> Add FIFO selection policy to entites in RQ, chose next enitity >>>>>>> on rq in such order that if job on one entity arrived >>>>>>> ealrier then job on another entity the first job will start >>>>>>> executing ealier regardless of the length of the entity's job >>>>>>> queue. >>>>>>> >>>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> >>>>>>> Tested-by: Li Yunxiang (Teddy) <Yunxiang.Li@amd.com> >>>>>>> --- >>>>>>> drivers/gpu/drm/scheduler/sched_entity.c | 2 + >>>>>>> drivers/gpu/drm/scheduler/sched_main.c | 65 ++++++++++++++++++++++-- >>>>>>> include/drm/gpu_scheduler.h | 8 +++ >>>>>>> 3 files changed, 71 insertions(+), 4 deletions(-) >>>>>>> >>>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c >>>>>>> index 6b25b2f4f5a3..3bb7f69306ef 100644 >>>>>>> --- a/drivers/gpu/drm/scheduler/sched_entity.c >>>>>>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c >>>>>>> @@ -507,6 +507,8 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) >>>>>>> atomic_inc(entity->rq->sched->score); >>>>>>> WRITE_ONCE(entity->last_user, current->group_leader); >>>>>>> first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node); >>>>>>> + sched_job->submit_ts = ktime_get(); >>>>>>> + >>>>>>> >>>>>>> /* first job wakes up scheduler */ >>>>>>> if (first) { >>>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c >>>>>>> index 68317d3a7a27..c123aa120d06 100644 >>>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c >>>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c >>>>>>> @@ -59,6 +59,19 @@ >>>>>>> #define CREATE_TRACE_POINTS >>>>>>> #include "gpu_scheduler_trace.h" >>>>>>> >>>>>>> + >>>>>>> + >>>>>>> +int drm_sched_policy = -1; >>>>>>> + >>>>>>> +/** >>>>>>> + * DOC: sched_policy (int) >>>>>>> + * Used to override default entites scheduling policy in a run queue. >>>>>>> + */ >>>>>>> +MODULE_PARM_DESC(sched_policy, >>>>>>> + "specify schedule policy for entites on a runqueue (-1 = auto(default) value, 0 = Round Robin,1 = use FIFO"); >>>>>>> +module_param_named(sched_policy, drm_sched_policy, int, 0444); >>>>>> As per Christian's comments, you can drop the "auto" and perhaps leave one as the default, >>>>>> say the RR. >>>>>> >>>>>> I do think it is beneficial to have a module parameter control the scheduling policy, as shown above. >>>>> Christian is not against it, just against adding 'auto' here - like the >>>>> default. >>>> Exactly what I said. >>>> >>>> Also, I still think an O(1) scheduling (picking next to run) should be >>>> what we strive for in such a FIFO patch implementation. >>>> A FIFO mechanism is by it's nature an O(1) mechanism for picking the next >>>> element. >>>> >>>> Regards, >>>> Luben >>> >>> The only solution i see for this now is keeping a global per rq jobs >>> list parallel to SPCP queue per entity - we use this list when we switch >>> to FIFO scheduling, we can even start building it ONLY when we switch >>> to FIFO building it gradually as more jobs come. Do you have other solution >>> in mind ? >> The idea is to "sort" on insertion, not on picking the next one to run. >> >> cont'd below: >> >>> Andrey >>> >>>>>>> + >>>>>>> + >>>>>>> #define to_drm_sched_job(sched_job) \ >>>>>>> container_of((sched_job), struct drm_sched_job, queue_node) >>>>>>> >>>>>>> @@ -120,14 +133,16 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq, >>>>>>> } >>>>>>> >>>>>>> /** >>>>>>> - * drm_sched_rq_select_entity - Select an entity which could provide a job to run >>>>>>> + * drm_sched_rq_select_entity_rr - Select an entity which could provide a job to run >>>>>>> * >>>>>>> * @rq: scheduler run queue to check. >>>>>>> * >>>>>>> - * Try to find a ready entity, returns NULL if none found. >>>>>>> + * Try to find a ready entity, in round robin manner. >>>>>>> + * >>>>>>> + * Returns NULL if none found. >>>>>>> */ >>>>>>> static struct drm_sched_entity * >>>>>>> -drm_sched_rq_select_entity(struct drm_sched_rq *rq) >>>>>>> +drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq) >>>>>>> { >>>>>>> struct drm_sched_entity *entity; >>>>>>> >>>>>>> @@ -163,6 +178,45 @@ drm_sched_rq_select_entity(struct drm_sched_rq *rq) >>>>>>> return NULL; >>>>>>> } >>>>>>> >>>>>>> +/** >>>>>>> + * drm_sched_rq_select_entity_fifo - Select an entity which could provide a job to run >>>>>>> + * >>>>>>> + * @rq: scheduler run queue to check. >>>>>>> + * >>>>>>> + * Try to find a ready entity, based on FIFO order of jobs arrivals. >>>>>>> + * >>>>>>> + * Returns NULL if none found. >>>>>>> + */ >>>>>>> +static struct drm_sched_entity * >>>>>>> +drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq) >>>>>>> +{ >>>>>>> + struct drm_sched_entity *tmp, *entity = NULL; >>>>>>> + ktime_t oldest_ts = KTIME_MAX; >>>>>>> + struct drm_sched_job *sched_job; >>>>>>> + >>>>>>> + spin_lock(&rq->lock); >>>>>>> + >>>>>>> + list_for_each_entry(tmp, &rq->entities, list) { >>>>>>> + >>>>>>> + if (drm_sched_entity_is_ready(tmp)) { >>>>>>> + sched_job = to_drm_sched_job(spsc_queue_peek(&tmp->job_queue)); >>>>>>> + >>>>>>> + if (ktime_before(sched_job->submit_ts, oldest_ts)) { >>>>>>> + oldest_ts = sched_job->submit_ts; >>>>>>> + entity = tmp; >>>>>>> + } >>>>>>> + } >>>>>>> + } >>>>>> Here I think we need an O(1) lookup of the next job to pick out to run. >>>>>> I see a number of optimizations, for instance keeping the current/oldest >>>>>> timestamp in the rq struct itself, >>>>> This was my original design with rb tree based min heap per rq based on >>>>> time stamp of >>>>> the oldest job waiting in each entity's job queue (head of entity's SPCP >>>>> job queue). But how in this >>>>> case you record the timestamps of all the jobs waiting in entity's the >>>>> SPCP queue behind the head job ? >>>>> If you record only the oldest job and more jobs come you have no place >>>>> to store their timestamps and so >>>>> on next job select after current HEAD how you will know who came before >>>>> or after between 2 job queues of >>>>> of 2 entities ? >>>>> >>>>> >>>>>> or better yet keeping the next job >>>>>> to pick out to run at a head of list (a la timer wheel implementation). >>>>>> For suck an optimization to work, you'd prep things up on job insertion, rather >>>>>> than on job removal/pick to run. >>>>> I looked at timer wheel and I don't see how this applies here - it >>>>> assumes you know before >>>>> job submission your deadline time for job's execution to start - which >>>>> we don't so I don't see >>>>> how we can use it. This seems more suitable for real time scheduler >>>>> implementation where you >>>>> have a hard requirement to execute a job by some specific time T. >> In a timer wheel you instantly know the "soonest" job to run--it's naturally >> your "next" job, regardless of in what order the timers were added and what >> their timeout time is. >> >>>>> I also mentioned a list, obviously there is the brute force solution of >>>>> just ordering all jobs in one giant list and get >>>>> naturally a FIFO ordering this way with O(1) insertions and extractions >>>>> but I assume we don't want one giant jobs queue >>>>> for all the entities to avoid more dependeies between them (like locking >>>>> the entire list when one specific entity is killed and >>>>> needs to remove it's jobs from SW queue). >> You can also have a list of list pointers. It'd be trivial to remove a whole >> list from the main list, by simply removing an element--akin to locking out a rq, >> or should you need to edit the rq's entity list. > > > So you do mean some kind of FIFO list. I really would want to avoid > maintaining an > extra data structure, we already have jobs stored in entity SPSC queue, > and now we will have to add > to a job struct a pointer to another, rq wide FIFO list, seems to me > like a recipe for problems. Maybe. > > Thinking more about it, if I do let each job have it's original > submission timestamp stored I can go back to my original design > of storing sched entities themself in min heap structure based on > timestamp of the next job to run in the entity > (head of job list), this way we get O(1) extraction of next job to run > and it will still be FIFO. The cost will be O(log(# entites in rq)) > for updating the min heap on each job extraction based on next head > timestamp. Still better then linear. On the downside we need to maintain > an rb tree structure O(1) would be ideal, and the "sorting" can be had when the jobs are submitted, which could take O(log N), as in sorting. But O(log N) on a balanced tree, such as you're using an R-B tree for a min heap, is a great metric and way faster than O(N). > to store the entitles in parallel to holding them in linear list for > round robin as it's tpday. You could choose to store one way or another, exclusively, depending on what scheduling policy had been chosen. > I am attaching my original patch to give sense of > it with TODO section were I added this new code. There are actually some > corner cases with empty SPCP queue becoming full and sampling not the > head of the > queue but the next one is what we need but i think it's solvable. Your WIP patch looks good. > > In general it seems to me that before doing more complicated design we > actually need to measure and see if there is really a substantial > performance hit compared to current RR > or even compared to possible O(1) extraction solution. No point to > complicate design if we don't get significant performance improvement > from it. WLOG, assuming all jobs are ready to be executed, then, for graphics, I'd imagine we'd want to pick the next job to run in O(1) time complexity. However O(log N) is still good and preferable. O(N) is slow. FWIW, the RR we have right now is O(1) given the job readiness assumption above. > > >> >>>>>> I'm also surprised that there is no job transition from one queue to another, >>>>>> as it is picked to run next--for the above optimizations to implemented, you'd >>>>>> want a state transition from (state) queue to queue. >>>>> Not sure what you have in mind here ? How this helps ? >> I think I've explained this a few times now--each list represents a state and a job/entity >> travels through lists as it travels through states, which states more or less represent >> the state of execution in the hardware--it could be as simple as incoming --> pending --> done. >> >> It allows a finer grain when resetting the hardware (should the hardware allow it). >> >> Note that this isn't directly related to the O(1) mechanism I brought up here. As I said, I was surprised >> to find out none such distinction existed--that's all. Don't fixate on this. >> >> Regards, >> Luben > > > Right, so this one is a separate topic regarding the pending job list > refactoring which we need to discuss and fix separately - i have TODO to > come back > to your patches from a year ago or so which were addressing this. > Probably will try to get to this after finishing this work. Excellent! Regards, Luben > > Andrey > > >> >>>>> Andrey >>>>> >>>>> >>>>>> Regards, >>>>>> Luben >>>>>> >>>>> In my origianl design >>>>> >>>>>>> + >>>>>>> + if (entity) { >>>>>>> + rq->current_entity = entity; >>>>>>> + reinit_completion(&entity->entity_idle); >>>>>>> + } >>>>>>> + >>>>>>> + spin_unlock(&rq->lock); >>>>>>> + return entity; >>>>>>> +} >>>>>>> + >>>>>>> /** >>>>>>> * drm_sched_job_done - complete a job >>>>>>> * @s_job: pointer to the job which is done >>>>>>> @@ -804,7 +858,10 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched) >>>>>>> >>>>>>> /* Kernel run queue has higher priority than normal run queue*/ >>>>>>> for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) { >>>>>>> - entity = drm_sched_rq_select_entity(&sched->sched_rq[i]); >>>>>>> + entity = drm_sched_policy != 1 ? >>>>>>> + drm_sched_rq_select_entity_rr(&sched->sched_rq[i]) : >>>>>>> + drm_sched_rq_select_entity_fifo(&sched->sched_rq[i]); >>>>>>> + >>>>>>> if (entity) >>>>>>> break; >>>>>>> } >>>>>>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h >>>>>>> index addb135eeea6..95865881bfcf 100644 >>>>>>> --- a/include/drm/gpu_scheduler.h >>>>>>> +++ b/include/drm/gpu_scheduler.h >>>>>>> @@ -314,6 +314,14 @@ struct drm_sched_job { >>>>>>> >>>>>>> /** @last_dependency: tracks @dependencies as they signal */ >>>>>>> unsigned long last_dependency; >>>>>>> + >>>>>>> + /** >>>>>>> + * @submit_ts: >>>>>>> + * >>>>>>> + * Marks job submit time >>>>>>> + */ >>>>>>> + ktime_t submit_ts; >>>>>>> + >>>>>>> }; >>>>>>> >>>>>>> static inline bool drm_sched_invalidate_job(struct drm_sched_job *s_job, >>>> Regards, >> Regards, ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] drm/sced: Add FIFO policy for scheduler rq 2022-08-25 2:29 ` Luben Tuikov @ 2022-08-25 13:49 ` Andrey Grodzovsky 2022-08-25 13:49 ` Andrey Grodzovsky 1 sibling, 0 replies; 14+ messages in thread From: Andrey Grodzovsky @ 2022-08-25 13:49 UTC (permalink / raw) To: Luben Tuikov, dri-devel; +Cc: ckoenig.leichtzumerken, Li Yunxiang, amd-gfx On 2022-08-24 22:29, Luben Tuikov wrote: > Inlined: > > On 2022-08-24 12:21, Andrey Grodzovsky wrote: >> On 2022-08-23 17:37, Luben Tuikov wrote: >>> On 2022-08-23 14:57, Andrey Grodzovsky wrote: >>>> On 2022-08-23 14:30, Luben Tuikov wrote: >>>> >>>>> On 2022-08-23 14:13, Andrey Grodzovsky wrote: >>>>>> On 2022-08-23 12:58, Luben Tuikov wrote: >>>>>>> Inlined: >>>>>>> >>>>>>> On 2022-08-22 16:09, Andrey Grodzovsky wrote: >>>>>>>> Poblem: Given many entities competing for same rq on >>>>>>> ^Problem >>>>>>> >>>>>>>> same scheduler an uncceptabliy long wait time for some >>>>>>> ^unacceptably >>>>>>> >>>>>>>> jobs waiting stuck in rq before being picked up are >>>>>>>> observed (seen using GPUVis). >>>>>>>> The issue is due to Round Robin policy used by scheduler >>>>>>>> to pick up the next entity for execution. Under stress >>>>>>>> of many entities and long job queus within entity some >>>>>>> ^queues >>>>>>> >>>>>>>> jobs could be stack for very long time in it's entity's >>>>>>>> queue before being popped from the queue and executed >>>>>>>> while for other entites with samller job queues a job >>>>>>> ^entities; smaller >>>>>>> >>>>>>>> might execute ealier even though that job arrived later >>>>>>> ^earlier >>>>>>> >>>>>>>> then the job in the long queue. >>>>>>>> >>>>>>>> Fix: >>>>>>>> Add FIFO selection policy to entites in RQ, chose next enitity >>>>>>>> on rq in such order that if job on one entity arrived >>>>>>>> ealrier then job on another entity the first job will start >>>>>>>> executing ealier regardless of the length of the entity's job >>>>>>>> queue. >>>>>>>> >>>>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> >>>>>>>> Tested-by: Li Yunxiang (Teddy) <Yunxiang.Li@amd.com> >>>>>>>> --- >>>>>>>> drivers/gpu/drm/scheduler/sched_entity.c | 2 + >>>>>>>> drivers/gpu/drm/scheduler/sched_main.c | 65 ++++++++++++++++++++++-- >>>>>>>> include/drm/gpu_scheduler.h | 8 +++ >>>>>>>> 3 files changed, 71 insertions(+), 4 deletions(-) >>>>>>>> >>>>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c >>>>>>>> index 6b25b2f4f5a3..3bb7f69306ef 100644 >>>>>>>> --- a/drivers/gpu/drm/scheduler/sched_entity.c >>>>>>>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c >>>>>>>> @@ -507,6 +507,8 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) >>>>>>>> atomic_inc(entity->rq->sched->score); >>>>>>>> WRITE_ONCE(entity->last_user, current->group_leader); >>>>>>>> first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node); >>>>>>>> + sched_job->submit_ts = ktime_get(); >>>>>>>> + >>>>>>>> >>>>>>>> /* first job wakes up scheduler */ >>>>>>>> if (first) { >>>>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c >>>>>>>> index 68317d3a7a27..c123aa120d06 100644 >>>>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c >>>>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c >>>>>>>> @@ -59,6 +59,19 @@ >>>>>>>> #define CREATE_TRACE_POINTS >>>>>>>> #include "gpu_scheduler_trace.h" >>>>>>>> >>>>>>>> + >>>>>>>> + >>>>>>>> +int drm_sched_policy = -1; >>>>>>>> + >>>>>>>> +/** >>>>>>>> + * DOC: sched_policy (int) >>>>>>>> + * Used to override default entites scheduling policy in a run queue. >>>>>>>> + */ >>>>>>>> +MODULE_PARM_DESC(sched_policy, >>>>>>>> + "specify schedule policy for entites on a runqueue (-1 = auto(default) value, 0 = Round Robin,1 = use FIFO"); >>>>>>>> +module_param_named(sched_policy, drm_sched_policy, int, 0444); >>>>>>> As per Christian's comments, you can drop the "auto" and perhaps leave one as the default, >>>>>>> say the RR. >>>>>>> >>>>>>> I do think it is beneficial to have a module parameter control the scheduling policy, as shown above. >>>>>> Christian is not against it, just against adding 'auto' here - like the >>>>>> default. >>>>> Exactly what I said. >>>>> >>>>> Also, I still think an O(1) scheduling (picking next to run) should be >>>>> what we strive for in such a FIFO patch implementation. >>>>> A FIFO mechanism is by it's nature an O(1) mechanism for picking the next >>>>> element. >>>>> >>>>> Regards, >>>>> Luben >>>> The only solution i see for this now is keeping a global per rq jobs >>>> list parallel to SPCP queue per entity - we use this list when we switch >>>> to FIFO scheduling, we can even start building it ONLY when we switch >>>> to FIFO building it gradually as more jobs come. Do you have other solution >>>> in mind ? >>> The idea is to "sort" on insertion, not on picking the next one to run. >>> >>> cont'd below: >>> >>>> Andrey >>>> >>>>>>>> + >>>>>>>> + >>>>>>>> #define to_drm_sched_job(sched_job) \ >>>>>>>> container_of((sched_job), struct drm_sched_job, queue_node) >>>>>>>> >>>>>>>> @@ -120,14 +133,16 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq, >>>>>>>> } >>>>>>>> >>>>>>>> /** >>>>>>>> - * drm_sched_rq_select_entity - Select an entity which could provide a job to run >>>>>>>> + * drm_sched_rq_select_entity_rr - Select an entity which could provide a job to run >>>>>>>> * >>>>>>>> * @rq: scheduler run queue to check. >>>>>>>> * >>>>>>>> - * Try to find a ready entity, returns NULL if none found. >>>>>>>> + * Try to find a ready entity, in round robin manner. >>>>>>>> + * >>>>>>>> + * Returns NULL if none found. >>>>>>>> */ >>>>>>>> static struct drm_sched_entity * >>>>>>>> -drm_sched_rq_select_entity(struct drm_sched_rq *rq) >>>>>>>> +drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq) >>>>>>>> { >>>>>>>> struct drm_sched_entity *entity; >>>>>>>> >>>>>>>> @@ -163,6 +178,45 @@ drm_sched_rq_select_entity(struct drm_sched_rq *rq) >>>>>>>> return NULL; >>>>>>>> } >>>>>>>> >>>>>>>> +/** >>>>>>>> + * drm_sched_rq_select_entity_fifo - Select an entity which could provide a job to run >>>>>>>> + * >>>>>>>> + * @rq: scheduler run queue to check. >>>>>>>> + * >>>>>>>> + * Try to find a ready entity, based on FIFO order of jobs arrivals. >>>>>>>> + * >>>>>>>> + * Returns NULL if none found. >>>>>>>> + */ >>>>>>>> +static struct drm_sched_entity * >>>>>>>> +drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq) >>>>>>>> +{ >>>>>>>> + struct drm_sched_entity *tmp, *entity = NULL; >>>>>>>> + ktime_t oldest_ts = KTIME_MAX; >>>>>>>> + struct drm_sched_job *sched_job; >>>>>>>> + >>>>>>>> + spin_lock(&rq->lock); >>>>>>>> + >>>>>>>> + list_for_each_entry(tmp, &rq->entities, list) { >>>>>>>> + >>>>>>>> + if (drm_sched_entity_is_ready(tmp)) { >>>>>>>> + sched_job = to_drm_sched_job(spsc_queue_peek(&tmp->job_queue)); >>>>>>>> + >>>>>>>> + if (ktime_before(sched_job->submit_ts, oldest_ts)) { >>>>>>>> + oldest_ts = sched_job->submit_ts; >>>>>>>> + entity = tmp; >>>>>>>> + } >>>>>>>> + } >>>>>>>> + } >>>>>>> Here I think we need an O(1) lookup of the next job to pick out to run. >>>>>>> I see a number of optimizations, for instance keeping the current/oldest >>>>>>> timestamp in the rq struct itself, >>>>>> This was my original design with rb tree based min heap per rq based on >>>>>> time stamp of >>>>>> the oldest job waiting in each entity's job queue (head of entity's SPCP >>>>>> job queue). But how in this >>>>>> case you record the timestamps of all the jobs waiting in entity's the >>>>>> SPCP queue behind the head job ? >>>>>> If you record only the oldest job and more jobs come you have no place >>>>>> to store their timestamps and so >>>>>> on next job select after current HEAD how you will know who came before >>>>>> or after between 2 job queues of >>>>>> of 2 entities ? >>>>>> >>>>>> >>>>>>> or better yet keeping the next job >>>>>>> to pick out to run at a head of list (a la timer wheel implementation). >>>>>>> For suck an optimization to work, you'd prep things up on job insertion, rather >>>>>>> than on job removal/pick to run. >>>>>> I looked at timer wheel and I don't see how this applies here - it >>>>>> assumes you know before >>>>>> job submission your deadline time for job's execution to start - which >>>>>> we don't so I don't see >>>>>> how we can use it. This seems more suitable for real time scheduler >>>>>> implementation where you >>>>>> have a hard requirement to execute a job by some specific time T. >>> In a timer wheel you instantly know the "soonest" job to run--it's naturally >>> your "next" job, regardless of in what order the timers were added and what >>> their timeout time is. >>> >>>>>> I also mentioned a list, obviously there is the brute force solution of >>>>>> just ordering all jobs in one giant list and get >>>>>> naturally a FIFO ordering this way with O(1) insertions and extractions >>>>>> but I assume we don't want one giant jobs queue >>>>>> for all the entities to avoid more dependeies between them (like locking >>>>>> the entire list when one specific entity is killed and >>>>>> needs to remove it's jobs from SW queue). >>> You can also have a list of list pointers. It'd be trivial to remove a whole >>> list from the main list, by simply removing an element--akin to locking out a rq, >>> or should you need to edit the rq's entity list. >> >> So you do mean some kind of FIFO list. I really would want to avoid >> maintaining an >> extra data structure, we already have jobs stored in entity SPSC queue, >> and now we will have to add >> to a job struct a pointer to another, rq wide FIFO list, seems to me >> like a recipe for problems. > Maybe. > >> Thinking more about it, if I do let each job have it's original >> submission timestamp stored I can go back to my original design >> of storing sched entities themself in min heap structure based on >> timestamp of the next job to run in the entity >> (head of job list), this way we get O(1) extraction of next job to run >> and it will still be FIFO. The cost will be O(log(# entites in rq)) >> for updating the min heap on each job extraction based on next head >> timestamp. Still better then linear. On the downside we need to maintain >> an rb tree structure > O(1) would be ideal, and the "sorting" can be had when the jobs are submitted, > which could take O(log N), as in sorting. > > But O(log N) on a balanced tree, such as you're using an R-B tree for a min heap, > is a great metric and way faster than O(N). > >> to store the entitles in parallel to holding them in linear list for >> round robin as it's tpday. > You could choose to store one way or another, exclusively, depending on what > scheduling policy had been chosen. > >> I am attaching my original patch to give sense of >> it with TODO section were I added this new code. There are actually some >> corner cases with empty SPCP queue becoming full and sampling not the >> head of the >> queue but the next one is what we need but i think it's solvable. > Your WIP patch looks good. > >> In general it seems to me that before doing more complicated design we >> actually need to measure and see if there is really a substantial >> performance hit compared to current RR >> or even compared to possible O(1) extraction solution. No point to >> complicate design if we don't get significant performance improvement >> from it. > WLOG, assuming all jobs are ready to be executed, then, for graphics, I'd imagine > we'd want to pick the next job to run in O(1) time complexity. However O(log N) is > still good and preferable. O(N) is slow. FWIW, the RR we have right now is O(1) given > the job readiness assumption above. You convinced it, going to try to rework my original patch. I asked Yunxiang to measure performance delta anyway, just in case this rework get stuck and then we will just use the primitive linear search for the client needs at least. Andrey > >> >>>>>>> I'm also surprised that there is no job transition from one queue to another, >>>>>>> as it is picked to run next--for the above optimizations to implemented, you'd >>>>>>> want a state transition from (state) queue to queue. >>>>>> Not sure what you have in mind here ? How this helps ? >>> I think I've explained this a few times now--each list represents a state and a job/entity >>> travels through lists as it travels through states, which states more or less represent >>> the state of execution in the hardware--it could be as simple as incoming --> pending --> done. >>> >>> It allows a finer grain when resetting the hardware (should the hardware allow it). >>> >>> Note that this isn't directly related to the O(1) mechanism I brought up here. As I said, I was surprised >>> to find out none such distinction existed--that's all. Don't fixate on this. >>> >>> Regards, >>> Luben >> >> Right, so this one is a separate topic regarding the pending job list >> refactoring which we need to discuss and fix separately - i have TODO to >> come back >> to your patches from a year ago or so which were addressing this. >> Probably will try to get to this after finishing this work. > Excellent! > > Regards, > Luben > >> Andrey >> >> >>>>>> Andrey >>>>>> >>>>>> >>>>>>> Regards, >>>>>>> Luben >>>>>>> >>>>>> In my origianl design >>>>>> >>>>>>>> + >>>>>>>> + if (entity) { >>>>>>>> + rq->current_entity = entity; >>>>>>>> + reinit_completion(&entity->entity_idle); >>>>>>>> + } >>>>>>>> + >>>>>>>> + spin_unlock(&rq->lock); >>>>>>>> + return entity; >>>>>>>> +} >>>>>>>> + >>>>>>>> /** >>>>>>>> * drm_sched_job_done - complete a job >>>>>>>> * @s_job: pointer to the job which is done >>>>>>>> @@ -804,7 +858,10 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched) >>>>>>>> >>>>>>>> /* Kernel run queue has higher priority than normal run queue*/ >>>>>>>> for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) { >>>>>>>> - entity = drm_sched_rq_select_entity(&sched->sched_rq[i]); >>>>>>>> + entity = drm_sched_policy != 1 ? >>>>>>>> + drm_sched_rq_select_entity_rr(&sched->sched_rq[i]) : >>>>>>>> + drm_sched_rq_select_entity_fifo(&sched->sched_rq[i]); >>>>>>>> + >>>>>>>> if (entity) >>>>>>>> break; >>>>>>>> } >>>>>>>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h >>>>>>>> index addb135eeea6..95865881bfcf 100644 >>>>>>>> --- a/include/drm/gpu_scheduler.h >>>>>>>> +++ b/include/drm/gpu_scheduler.h >>>>>>>> @@ -314,6 +314,14 @@ struct drm_sched_job { >>>>>>>> >>>>>>>> /** @last_dependency: tracks @dependencies as they signal */ >>>>>>>> unsigned long last_dependency; >>>>>>>> + >>>>>>>> + /** >>>>>>>> + * @submit_ts: >>>>>>>> + * >>>>>>>> + * Marks job submit time >>>>>>>> + */ >>>>>>>> + ktime_t submit_ts; >>>>>>>> + >>>>>>>> }; >>>>>>>> >>>>>>>> static inline bool drm_sched_invalidate_job(struct drm_sched_job *s_job, >>>>> Regards, >>> Regards, ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] drm/sced: Add FIFO policy for scheduler rq 2022-08-25 2:29 ` Luben Tuikov 2022-08-25 13:49 ` Andrey Grodzovsky @ 2022-08-25 13:49 ` Andrey Grodzovsky 1 sibling, 0 replies; 14+ messages in thread From: Andrey Grodzovsky @ 2022-08-25 13:49 UTC (permalink / raw) To: Luben Tuikov, dri-devel; +Cc: ckoenig.leichtzumerken, Li Yunxiang, amd-gfx On 2022-08-24 22:29, Luben Tuikov wrote: > Inlined: > > On 2022-08-24 12:21, Andrey Grodzovsky wrote: >> On 2022-08-23 17:37, Luben Tuikov wrote: >>> On 2022-08-23 14:57, Andrey Grodzovsky wrote: >>>> On 2022-08-23 14:30, Luben Tuikov wrote: >>>> >>>>> On 2022-08-23 14:13, Andrey Grodzovsky wrote: >>>>>> On 2022-08-23 12:58, Luben Tuikov wrote: >>>>>>> Inlined: >>>>>>> >>>>>>> On 2022-08-22 16:09, Andrey Grodzovsky wrote: >>>>>>>> Poblem: Given many entities competing for same rq on >>>>>>> ^Problem >>>>>>> >>>>>>>> same scheduler an uncceptabliy long wait time for some >>>>>>> ^unacceptably >>>>>>> >>>>>>>> jobs waiting stuck in rq before being picked up are >>>>>>>> observed (seen using GPUVis). >>>>>>>> The issue is due to Round Robin policy used by scheduler >>>>>>>> to pick up the next entity for execution. Under stress >>>>>>>> of many entities and long job queus within entity some >>>>>>> ^queues >>>>>>> >>>>>>>> jobs could be stack for very long time in it's entity's >>>>>>>> queue before being popped from the queue and executed >>>>>>>> while for other entites with samller job queues a job >>>>>>> ^entities; smaller >>>>>>> >>>>>>>> might execute ealier even though that job arrived later >>>>>>> ^earlier >>>>>>> >>>>>>>> then the job in the long queue. >>>>>>>> >>>>>>>> Fix: >>>>>>>> Add FIFO selection policy to entites in RQ, chose next enitity >>>>>>>> on rq in such order that if job on one entity arrived >>>>>>>> ealrier then job on another entity the first job will start >>>>>>>> executing ealier regardless of the length of the entity's job >>>>>>>> queue. >>>>>>>> >>>>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> >>>>>>>> Tested-by: Li Yunxiang (Teddy) <Yunxiang.Li@amd.com> >>>>>>>> --- >>>>>>>> drivers/gpu/drm/scheduler/sched_entity.c | 2 + >>>>>>>> drivers/gpu/drm/scheduler/sched_main.c | 65 ++++++++++++++++++++++-- >>>>>>>> include/drm/gpu_scheduler.h | 8 +++ >>>>>>>> 3 files changed, 71 insertions(+), 4 deletions(-) >>>>>>>> >>>>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c >>>>>>>> index 6b25b2f4f5a3..3bb7f69306ef 100644 >>>>>>>> --- a/drivers/gpu/drm/scheduler/sched_entity.c >>>>>>>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c >>>>>>>> @@ -507,6 +507,8 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job) >>>>>>>> atomic_inc(entity->rq->sched->score); >>>>>>>> WRITE_ONCE(entity->last_user, current->group_leader); >>>>>>>> first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node); >>>>>>>> + sched_job->submit_ts = ktime_get(); >>>>>>>> + >>>>>>>> >>>>>>>> /* first job wakes up scheduler */ >>>>>>>> if (first) { >>>>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c >>>>>>>> index 68317d3a7a27..c123aa120d06 100644 >>>>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c >>>>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c >>>>>>>> @@ -59,6 +59,19 @@ >>>>>>>> #define CREATE_TRACE_POINTS >>>>>>>> #include "gpu_scheduler_trace.h" >>>>>>>> >>>>>>>> + >>>>>>>> + >>>>>>>> +int drm_sched_policy = -1; >>>>>>>> + >>>>>>>> +/** >>>>>>>> + * DOC: sched_policy (int) >>>>>>>> + * Used to override default entites scheduling policy in a run queue. >>>>>>>> + */ >>>>>>>> +MODULE_PARM_DESC(sched_policy, >>>>>>>> + "specify schedule policy for entites on a runqueue (-1 = auto(default) value, 0 = Round Robin,1 = use FIFO"); >>>>>>>> +module_param_named(sched_policy, drm_sched_policy, int, 0444); >>>>>>> As per Christian's comments, you can drop the "auto" and perhaps leave one as the default, >>>>>>> say the RR. >>>>>>> >>>>>>> I do think it is beneficial to have a module parameter control the scheduling policy, as shown above. >>>>>> Christian is not against it, just against adding 'auto' here - like the >>>>>> default. >>>>> Exactly what I said. >>>>> >>>>> Also, I still think an O(1) scheduling (picking next to run) should be >>>>> what we strive for in such a FIFO patch implementation. >>>>> A FIFO mechanism is by it's nature an O(1) mechanism for picking the next >>>>> element. >>>>> >>>>> Regards, >>>>> Luben >>>> The only solution i see for this now is keeping a global per rq jobs >>>> list parallel to SPCP queue per entity - we use this list when we switch >>>> to FIFO scheduling, we can even start building it ONLY when we switch >>>> to FIFO building it gradually as more jobs come. Do you have other solution >>>> in mind ? >>> The idea is to "sort" on insertion, not on picking the next one to run. >>> >>> cont'd below: >>> >>>> Andrey >>>> >>>>>>>> + >>>>>>>> + >>>>>>>> #define to_drm_sched_job(sched_job) \ >>>>>>>> container_of((sched_job), struct drm_sched_job, queue_node) >>>>>>>> >>>>>>>> @@ -120,14 +133,16 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq, >>>>>>>> } >>>>>>>> >>>>>>>> /** >>>>>>>> - * drm_sched_rq_select_entity - Select an entity which could provide a job to run >>>>>>>> + * drm_sched_rq_select_entity_rr - Select an entity which could provide a job to run >>>>>>>> * >>>>>>>> * @rq: scheduler run queue to check. >>>>>>>> * >>>>>>>> - * Try to find a ready entity, returns NULL if none found. >>>>>>>> + * Try to find a ready entity, in round robin manner. >>>>>>>> + * >>>>>>>> + * Returns NULL if none found. >>>>>>>> */ >>>>>>>> static struct drm_sched_entity * >>>>>>>> -drm_sched_rq_select_entity(struct drm_sched_rq *rq) >>>>>>>> +drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq) >>>>>>>> { >>>>>>>> struct drm_sched_entity *entity; >>>>>>>> >>>>>>>> @@ -163,6 +178,45 @@ drm_sched_rq_select_entity(struct drm_sched_rq *rq) >>>>>>>> return NULL; >>>>>>>> } >>>>>>>> >>>>>>>> +/** >>>>>>>> + * drm_sched_rq_select_entity_fifo - Select an entity which could provide a job to run >>>>>>>> + * >>>>>>>> + * @rq: scheduler run queue to check. >>>>>>>> + * >>>>>>>> + * Try to find a ready entity, based on FIFO order of jobs arrivals. >>>>>>>> + * >>>>>>>> + * Returns NULL if none found. >>>>>>>> + */ >>>>>>>> +static struct drm_sched_entity * >>>>>>>> +drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq) >>>>>>>> +{ >>>>>>>> + struct drm_sched_entity *tmp, *entity = NULL; >>>>>>>> + ktime_t oldest_ts = KTIME_MAX; >>>>>>>> + struct drm_sched_job *sched_job; >>>>>>>> + >>>>>>>> + spin_lock(&rq->lock); >>>>>>>> + >>>>>>>> + list_for_each_entry(tmp, &rq->entities, list) { >>>>>>>> + >>>>>>>> + if (drm_sched_entity_is_ready(tmp)) { >>>>>>>> + sched_job = to_drm_sched_job(spsc_queue_peek(&tmp->job_queue)); >>>>>>>> + >>>>>>>> + if (ktime_before(sched_job->submit_ts, oldest_ts)) { >>>>>>>> + oldest_ts = sched_job->submit_ts; >>>>>>>> + entity = tmp; >>>>>>>> + } >>>>>>>> + } >>>>>>>> + } >>>>>>> Here I think we need an O(1) lookup of the next job to pick out to run. >>>>>>> I see a number of optimizations, for instance keeping the current/oldest >>>>>>> timestamp in the rq struct itself, >>>>>> This was my original design with rb tree based min heap per rq based on >>>>>> time stamp of >>>>>> the oldest job waiting in each entity's job queue (head of entity's SPCP >>>>>> job queue). But how in this >>>>>> case you record the timestamps of all the jobs waiting in entity's the >>>>>> SPCP queue behind the head job ? >>>>>> If you record only the oldest job and more jobs come you have no place >>>>>> to store their timestamps and so >>>>>> on next job select after current HEAD how you will know who came before >>>>>> or after between 2 job queues of >>>>>> of 2 entities ? >>>>>> >>>>>> >>>>>>> or better yet keeping the next job >>>>>>> to pick out to run at a head of list (a la timer wheel implementation). >>>>>>> For suck an optimization to work, you'd prep things up on job insertion, rather >>>>>>> than on job removal/pick to run. >>>>>> I looked at timer wheel and I don't see how this applies here - it >>>>>> assumes you know before >>>>>> job submission your deadline time for job's execution to start - which >>>>>> we don't so I don't see >>>>>> how we can use it. This seems more suitable for real time scheduler >>>>>> implementation where you >>>>>> have a hard requirement to execute a job by some specific time T. >>> In a timer wheel you instantly know the "soonest" job to run--it's naturally >>> your "next" job, regardless of in what order the timers were added and what >>> their timeout time is. >>> >>>>>> I also mentioned a list, obviously there is the brute force solution of >>>>>> just ordering all jobs in one giant list and get >>>>>> naturally a FIFO ordering this way with O(1) insertions and extractions >>>>>> but I assume we don't want one giant jobs queue >>>>>> for all the entities to avoid more dependeies between them (like locking >>>>>> the entire list when one specific entity is killed and >>>>>> needs to remove it's jobs from SW queue). >>> You can also have a list of list pointers. It'd be trivial to remove a whole >>> list from the main list, by simply removing an element--akin to locking out a rq, >>> or should you need to edit the rq's entity list. >> >> So you do mean some kind of FIFO list. I really would want to avoid >> maintaining an >> extra data structure, we already have jobs stored in entity SPSC queue, >> and now we will have to add >> to a job struct a pointer to another, rq wide FIFO list, seems to me >> like a recipe for problems. > Maybe. > >> Thinking more about it, if I do let each job have it's original >> submission timestamp stored I can go back to my original design >> of storing sched entities themself in min heap structure based on >> timestamp of the next job to run in the entity >> (head of job list), this way we get O(1) extraction of next job to run >> and it will still be FIFO. The cost will be O(log(# entites in rq)) >> for updating the min heap on each job extraction based on next head >> timestamp. Still better then linear. On the downside we need to maintain >> an rb tree structure > O(1) would be ideal, and the "sorting" can be had when the jobs are submitted, > which could take O(log N), as in sorting. > > But O(log N) on a balanced tree, such as you're using an R-B tree for a min heap, > is a great metric and way faster than O(N). > >> to store the entitles in parallel to holding them in linear list for >> round robin as it's tpday. > You could choose to store one way or another, exclusively, depending on what > scheduling policy had been chosen. > >> I am attaching my original patch to give sense of >> it with TODO section were I added this new code. There are actually some >> corner cases with empty SPCP queue becoming full and sampling not the >> head of the >> queue but the next one is what we need but i think it's solvable. > Your WIP patch looks good. > >> In general it seems to me that before doing more complicated design we >> actually need to measure and see if there is really a substantial >> performance hit compared to current RR >> or even compared to possible O(1) extraction solution. No point to >> complicate design if we don't get significant performance improvement >> from it. > WLOG, assuming all jobs are ready to be executed, then, for graphics, I'd imagine > we'd want to pick the next job to run in O(1) time complexity. However O(log N) is > still good and preferable. O(N) is slow. FWIW, the RR we have right now is O(1) given > the job readiness assumption above. You convinced me, going to try to rework my original patch. I asked Yunxiang to measure performance delta anyway, just in case this rework get stuck and then we will just use the primitive linear search for the client needs at least. Andrey > >> >>>>>>> I'm also surprised that there is no job transition from one queue to another, >>>>>>> as it is picked to run next--for the above optimizations to implemented, you'd >>>>>>> want a state transition from (state) queue to queue. >>>>>> Not sure what you have in mind here ? How this helps ? >>> I think I've explained this a few times now--each list represents a state and a job/entity >>> travels through lists as it travels through states, which states more or less represent >>> the state of execution in the hardware--it could be as simple as incoming --> pending --> done. >>> >>> It allows a finer grain when resetting the hardware (should the hardware allow it). >>> >>> Note that this isn't directly related to the O(1) mechanism I brought up here. As I said, I was surprised >>> to find out none such distinction existed--that's all. Don't fixate on this. >>> >>> Regards, >>> Luben >> >> Right, so this one is a separate topic regarding the pending job list >> refactoring which we need to discuss and fix separately - i have TODO to >> come back >> to your patches from a year ago or so which were addressing this. >> Probably will try to get to this after finishing this work. > Excellent! > > Regards, > Luben > >> Andrey >> >> >>>>>> Andrey >>>>>> >>>>>> >>>>>>> Regards, >>>>>>> Luben >>>>>>> >>>>>> In my origianl design >>>>>> >>>>>>>> + >>>>>>>> + if (entity) { >>>>>>>> + rq->current_entity = entity; >>>>>>>> + reinit_completion(&entity->entity_idle); >>>>>>>> + } >>>>>>>> + >>>>>>>> + spin_unlock(&rq->lock); >>>>>>>> + return entity; >>>>>>>> +} >>>>>>>> + >>>>>>>> /** >>>>>>>> * drm_sched_job_done - complete a job >>>>>>>> * @s_job: pointer to the job which is done >>>>>>>> @@ -804,7 +858,10 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched) >>>>>>>> >>>>>>>> /* Kernel run queue has higher priority than normal run queue*/ >>>>>>>> for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) { >>>>>>>> - entity = drm_sched_rq_select_entity(&sched->sched_rq[i]); >>>>>>>> + entity = drm_sched_policy != 1 ? >>>>>>>> + drm_sched_rq_select_entity_rr(&sched->sched_rq[i]) : >>>>>>>> + drm_sched_rq_select_entity_fifo(&sched->sched_rq[i]); >>>>>>>> + >>>>>>>> if (entity) >>>>>>>> break; >>>>>>>> } >>>>>>>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h >>>>>>>> index addb135eeea6..95865881bfcf 100644 >>>>>>>> --- a/include/drm/gpu_scheduler.h >>>>>>>> +++ b/include/drm/gpu_scheduler.h >>>>>>>> @@ -314,6 +314,14 @@ struct drm_sched_job { >>>>>>>> >>>>>>>> /** @last_dependency: tracks @dependencies as they signal */ >>>>>>>> unsigned long last_dependency; >>>>>>>> + >>>>>>>> + /** >>>>>>>> + * @submit_ts: >>>>>>>> + * >>>>>>>> + * Marks job submit time >>>>>>>> + */ >>>>>>>> + ktime_t submit_ts; >>>>>>>> + >>>>>>>> }; >>>>>>>> >>>>>>>> static inline bool drm_sched_invalidate_job(struct drm_sched_job *s_job, >>>>> Regards, >>> Regards, ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] drm/sced: Add FIFO policy for scheduler rq 2022-08-22 20:09 [PATCH] drm/sced: Add FIFO policy for scheduler rq Andrey Grodzovsky 2022-08-23 12:15 ` Christian König 2022-08-23 16:58 ` Luben Tuikov @ 2022-08-24 8:29 ` Michel Dänzer 2022-08-24 15:06 ` Andrey Grodzovsky 2 siblings, 1 reply; 14+ messages in thread From: Michel Dänzer @ 2022-08-24 8:29 UTC (permalink / raw) To: Andrey Grodzovsky, dri-devel Cc: ckoenig.leichtzumerken, Li Yunxiang, luben.tuikov, amd-gfx On 2022-08-22 22:09, Andrey Grodzovsky wrote: > Poblem: Given many entities competing for same rq on > same scheduler an uncceptabliy long wait time for some > jobs waiting stuck in rq before being picked up are > observed (seen using GPUVis). > The issue is due to Round Robin policy used by scheduler > to pick up the next entity for execution. Under stress > of many entities and long job queus within entity some > jobs could be stack for very long time in it's entity's > queue before being popped from the queue and executed > while for other entites with samller job queues a job > might execute ealier even though that job arrived later > then the job in the long queue. > > Fix: > Add FIFO selection policy to entites in RQ, chose next enitity > on rq in such order that if job on one entity arrived > ealrier then job on another entity the first job will start > executing ealier regardless of the length of the entity's job > queue. Instead of ordering based on when jobs are added, might it be possible to order them based on when they become ready to run? Otherwise it seems possible to e.g. submit a large number of inter-dependent jobs at once, and they would all run before any jobs from another queue get a chance. -- Earthling Michel Dänzer | https://redhat.com Libre software enthusiast | Mesa and Xwayland developer ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] drm/sced: Add FIFO policy for scheduler rq 2022-08-24 8:29 ` Michel Dänzer @ 2022-08-24 15:06 ` Andrey Grodzovsky 0 siblings, 0 replies; 14+ messages in thread From: Andrey Grodzovsky @ 2022-08-24 15:06 UTC (permalink / raw) To: Michel Dänzer, dri-devel Cc: ckoenig.leichtzumerken, Li Yunxiang, luben.tuikov, amd-gfx On 2022-08-24 04:29, Michel Dänzer wrote: > On 2022-08-22 22:09, Andrey Grodzovsky wrote: >> Poblem: Given many entities competing for same rq on >> same scheduler an uncceptabliy long wait time for some >> jobs waiting stuck in rq before being picked up are >> observed (seen using GPUVis). >> The issue is due to Round Robin policy used by scheduler >> to pick up the next entity for execution. Under stress >> of many entities and long job queus within entity some >> jobs could be stack for very long time in it's entity's >> queue before being popped from the queue and executed >> while for other entites with samller job queues a job >> might execute ealier even though that job arrived later >> then the job in the long queue. >> >> Fix: >> Add FIFO selection policy to entites in RQ, chose next enitity >> on rq in such order that if job on one entity arrived >> ealrier then job on another entity the first job will start >> executing ealier regardless of the length of the entity's job >> queue. > Instead of ordering based on when jobs are added, might it be possible to order them based on when they become ready to run? > > Otherwise it seems possible to e.g. submit a large number of inter-dependent jobs at once, and they would all run before any jobs from another queue get a chance. While any of them is not ready (i.e. still having unfulfilled dependency) this job will not be chosen to run (see drm_sched_entity_is_ready). In this scenario if an earlier job from entity E1 is not ready to run it will be skipped and a later job from entity E2 (which is ready) will be chosen to run so E1 job is not blocking E2 job. The moment E1 job does become ready it seems to me logical to let it run ASAP as it's by now it spent the most time of anyone waiting for execution, and I don't think it matters that part of this time was because it waited for dependency job to complete it's run. Andrey > > ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2022-08-25 13:50 UTC | newest] Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-08-22 20:09 [PATCH] drm/sced: Add FIFO policy for scheduler rq Andrey Grodzovsky 2022-08-23 12:15 ` Christian König 2022-08-23 15:15 ` Andrey Grodzovsky 2022-08-23 16:58 ` Luben Tuikov 2022-08-23 18:13 ` Andrey Grodzovsky 2022-08-23 18:30 ` Luben Tuikov 2022-08-23 18:57 ` Andrey Grodzovsky 2022-08-23 21:37 ` Luben Tuikov 2022-08-24 16:21 ` Andrey Grodzovsky 2022-08-25 2:29 ` Luben Tuikov 2022-08-25 13:49 ` Andrey Grodzovsky 2022-08-25 13:49 ` Andrey Grodzovsky 2022-08-24 8:29 ` Michel Dänzer 2022-08-24 15:06 ` Andrey Grodzovsky
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).