From: Luben Tuikov <luben.tuikov@amd.com>
To: Andrey Grodzovsky <andrey.grodzovsky@amd.com>,
dri-devel@lists.freedesktop.org
Cc: ckoenig.leichtzumerken@gmail.com,
Li Yunxiang <Yunxiang.Li@amd.com>,
amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/sced: Add FIFO policy for scheduler rq
Date: Tue, 23 Aug 2022 14:30:26 -0400 [thread overview]
Message-ID: <1eadb06c-7d2c-0317-a34a-c219c68b93c3@amd.com> (raw)
In-Reply-To: <bf0791b0-c917-6b6e-75bf-0c16869aa045@amd.com>
On 2022-08-23 14:13, Andrey Grodzovsky wrote:
>
> On 2022-08-23 12:58, Luben Tuikov wrote:
>> Inlined:
>>
>> On 2022-08-22 16:09, Andrey Grodzovsky wrote:
>>> Poblem: Given many entities competing for same rq on
>> ^Problem
>>
>>> same scheduler an uncceptabliy long wait time for some
>> ^unacceptably
>>
>>> jobs waiting stuck in rq before being picked up are
>>> observed (seen using GPUVis).
>>> The issue is due to Round Robin policy used by scheduler
>>> to pick up the next entity for execution. Under stress
>>> of many entities and long job queus within entity some
>> ^queues
>>
>>> jobs could be stack for very long time in it's entity's
>>> queue before being popped from the queue and executed
>>> while for other entites with samller job queues a job
>> ^entities; smaller
>>
>>> might execute ealier even though that job arrived later
>> ^earlier
>>
>>> than the job in the long queue.
>>>
>>> Fix:
>>> Add FIFO selection policy to entities in the RQ: choose the next entity
>>> on the rq in such an order that if a job on one entity arrived
>>> earlier than a job on another entity, the first job will start
>>> executing earlier, regardless of the length of the entity's job
>>> queue.
>>>
>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>> Tested-by: Li Yunxiang (Teddy) <Yunxiang.Li@amd.com>
>>> ---
>>> drivers/gpu/drm/scheduler/sched_entity.c | 2 +
>>> drivers/gpu/drm/scheduler/sched_main.c | 65 ++++++++++++++++++++++--
>>> include/drm/gpu_scheduler.h | 8 +++
>>> 3 files changed, 71 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
>>> index 6b25b2f4f5a3..3bb7f69306ef 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_entity.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
>>> @@ -507,6 +507,8 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
>>> atomic_inc(entity->rq->sched->score);
>>> WRITE_ONCE(entity->last_user, current->group_leader);
>>> first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node);
>>> + sched_job->submit_ts = ktime_get();
>>> +
>>>
>>> /* first job wakes up scheduler */
>>> if (first) {
>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>> index 68317d3a7a27..c123aa120d06 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>> @@ -59,6 +59,19 @@
>>> #define CREATE_TRACE_POINTS
>>> #include "gpu_scheduler_trace.h"
>>>
>>> +
>>> +
>>> +int drm_sched_policy = -1;
>>> +
>>> +/**
>>> + * DOC: sched_policy (int)
>>> + * Used to override default entites scheduling policy in a run queue.
>>> + */
>>> +MODULE_PARM_DESC(sched_policy,
>>> + "specify schedule policy for entites on a runqueue (-1 = auto(default) value, 0 = Round Robin,1 = use FIFO");
>>> +module_param_named(sched_policy, drm_sched_policy, int, 0444);
>> As per Christian's comments, you can drop the "auto" and perhaps leave one as the default,
>> say the RR.
>>
>> I do think it is beneficial to have a module parameter control the scheduling policy, as shown above.
>
>
> Christian is not against it, just against adding 'auto' here - like the
> default.
Exactly what I said.
Also, I still think O(1) scheduling (picking the next entity to run) should be
what we strive for in such a FIFO patch implementation.
A FIFO mechanism is by its nature an O(1) mechanism for picking the next
element.
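To make this concrete, here is a rough, untested userspace sketch (all "toy_*"
names are made up; these are not the real drm_sched structures): keep the
entities of the rq in a min-heap keyed by the submit timestamp of each
entity's head job. The entity with the oldest pending job is then always at
the root, so the pick is O(1) and maintenance is O(log n) instead of the O(n)
walk over rq->entities:

```c
#include <assert.h>
#include <stddef.h>

/* Toy illustration only -- "toy_entity"/"toy_rq" are made-up names, not
 * the real drm_sched structures. Entities sit in a binary min-heap keyed
 * by the submit timestamp of each entity's head job. */

struct toy_entity {
	long long head_ts;	/* submit_ts of this entity's head job */
	int id;
};

#define TOY_MAX 64

struct toy_rq {
	struct toy_entity *heap[TOY_MAX];
	int count;
};

static void toy_swap(struct toy_rq *rq, int a, int b)
{
	struct toy_entity *tmp = rq->heap[a];

	rq->heap[a] = rq->heap[b];
	rq->heap[b] = tmp;
}

static void toy_rq_push(struct toy_rq *rq, struct toy_entity *e)
{
	int i = rq->count++;

	rq->heap[i] = e;
	/* sift up: O(log n) */
	while (i > 0 && rq->heap[(i - 1) / 2]->head_ts > rq->heap[i]->head_ts) {
		toy_swap(rq, i, (i - 1) / 2);
		i = (i - 1) / 2;
	}
}

/* O(1): the entity with the oldest head job is always at the root. */
static struct toy_entity *toy_rq_peek(struct toy_rq *rq)
{
	return rq->count ? rq->heap[0] : NULL;
}

static struct toy_entity *toy_rq_pop(struct toy_rq *rq)
{
	struct toy_entity *top = toy_rq_peek(rq);
	int i = 0;

	if (!top)
		return NULL;
	rq->heap[0] = rq->heap[--rq->count];
	/* sift down: O(log n) */
	for (;;) {
		int l = 2 * i + 1, r = 2 * i + 2, m = i;

		if (l < rq->count && rq->heap[l]->head_ts < rq->heap[m]->head_ts)
			m = l;
		if (r < rq->count && rq->heap[r]->head_ts < rq->heap[m]->head_ts)
			m = r;
		if (m == i)
			break;
		toy_swap(rq, i, m);
		i = m;
	}
	return top;
}
```

In the kernel an rbtree with a cached leftmost node would serve the same
purpose; the point is only that the pick itself need not scan every entity.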
Regards,
Luben
>
>
>>
>>> +
>>> +
>>> #define to_drm_sched_job(sched_job) \
>>> container_of((sched_job), struct drm_sched_job, queue_node)
>>>
>>> @@ -120,14 +133,16 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
>>> }
>>>
>>> /**
>>> - * drm_sched_rq_select_entity - Select an entity which could provide a job to run
>>> + * drm_sched_rq_select_entity_rr - Select an entity which could provide a job to run
>>> *
>>> * @rq: scheduler run queue to check.
>>> *
>>> - * Try to find a ready entity, returns NULL if none found.
>>> + * Try to find a ready entity, in round robin manner.
>>> + *
>>> + * Returns NULL if none found.
>>> */
>>> static struct drm_sched_entity *
>>> -drm_sched_rq_select_entity(struct drm_sched_rq *rq)
>>> +drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq)
>>> {
>>> struct drm_sched_entity *entity;
>>>
>>> @@ -163,6 +178,45 @@ drm_sched_rq_select_entity(struct drm_sched_rq *rq)
>>> return NULL;
>>> }
>>>
>>> +/**
>>> + * drm_sched_rq_select_entity_fifo - Select an entity which could provide a job to run
>>> + *
>>> + * @rq: scheduler run queue to check.
>>> + *
>>> + * Try to find a ready entity, based on FIFO order of jobs arrivals.
>>> + *
>>> + * Returns NULL if none found.
>>> + */
>>> +static struct drm_sched_entity *
>>> +drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq)
>>> +{
>>> + struct drm_sched_entity *tmp, *entity = NULL;
>>> + ktime_t oldest_ts = KTIME_MAX;
>>> + struct drm_sched_job *sched_job;
>>> +
>>> + spin_lock(&rq->lock);
>>> +
>>> + list_for_each_entry(tmp, &rq->entities, list) {
>>> +
>>> + if (drm_sched_entity_is_ready(tmp)) {
>>> + sched_job = to_drm_sched_job(spsc_queue_peek(&tmp->job_queue));
>>> +
>>> + if (ktime_before(sched_job->submit_ts, oldest_ts)) {
>>> + oldest_ts = sched_job->submit_ts;
>>> + entity = tmp;
>>> + }
>>> + }
>>> + }
>> Here I think we need an O(1) lookup of the next job to pick out to run.
>> I see a number of optimizations, for instance keeping the current/oldest
>> timestamp in the rq struct itself,
>
>
> This was my original design: an rb-tree-based min-heap per rq, keyed on the
> timestamp of the oldest job waiting in each entity's job queue (the head of
> the entity's SPSC job queue). But in that case, how do you record the
> timestamps of all the jobs waiting in the entity's SPSC queue behind the
> head job? If you record only the oldest job and more jobs come, you have
> nowhere to store their timestamps, and so on the next job selection after
> the current HEAD, how will you know which came first between the job
> queues of two entities?
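I don't think you need to record anything beyond what your patch already
adds: every job carries its own submit_ts in the SPSC queue, so the rq only
needs to key each entity by the timestamp of its *current* head job, and
re-key the entity when that head is popped and the next job becomes the new
head. A toy, untested sketch (made-up names; two entities and plain arrays
standing in for the SPSC queues):

```c
#include <assert.h>
#include <stddef.h>

/* Toy sketch (made-up names). Every job keeps its own submit_ts -- which
 * is exactly what the patch adds -- so nothing behind the head is lost:
 * the rq only ever keys an entity by the ts of its current head job, and
 * re-keys the entity after each pop. */

struct toy_job {
	long long submit_ts;
};

struct toy_entity {
	struct toy_job jobs[8];	/* stand-in for the entity's SPSC queue */
	int head, count;
};

/* Key used to order the entity in the rq: ts of its head job. */
static long long toy_entity_key(const struct toy_entity *e)
{
	return e->count ? e->jobs[e->head].submit_ts : -1;
}

/* Pop the head job; the caller then re-inserts the entity into the rq's
 * ordering structure using the new toy_entity_key(). */
static struct toy_job *toy_entity_pop(struct toy_entity *e)
{
	if (!e->count)
		return NULL;
	e->count--;
	return &e->jobs[e->head++];
}

/* Pick whichever non-empty entity has the older head job. With a min-heap
 * or rbtree keyed by toy_entity_key(), this compare disappears. */
static struct toy_entity *toy_pick(struct toy_entity *a, struct toy_entity *b)
{
	if (!a->count)
		return b->count ? b : NULL;
	if (!b->count)
		return a;
	return toy_entity_key(a) <= toy_entity_key(b) ? a : b;
}
```

The re-key after each pop is the only maintenance cost, and the interleaved
pops come out in global submission order even though the jobs live in two
separate queues.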
>
>
>> or better yet keeping the next job
>> to pick out to run at the head of a list (a la a timer wheel implementation).
>> For such an optimization to work, you'd prep things up on job insertion, rather
>> than on job removal/pick to run.
>
>
> I looked at the timer wheel and I don't see how it applies here - it
> assumes you know, before job submission, the deadline by which the job's
> execution must start - which we don't - so I don't see how we can use it.
> It seems more suitable for a real-time scheduler implementation where you
> have a hard requirement to execute a job by some specific time T.
>
> I also mentioned a list. Obviously there is the brute-force solution of
> just ordering all jobs in one giant list, which naturally gives FIFO
> ordering with O(1) insertions and extractions, but I assume we don't want
> one giant job queue for all the entities, to avoid more dependencies
> between them (like locking the entire list when one specific entity is
> killed and needs to remove its jobs from the SW queue).
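Agreed on the trade-off. Just to have it written down: the brute force you
describe really is O(1) on both ends, and the entity-kill path is exactly
where it hurts. A toy, untested sketch (made-up names):

```c
#include <assert.h>
#include <stddef.h>

/* Toy sketch of the brute-force variant (made-up names). One global FIFO
 * of jobs: O(1) push at the tail, O(1) pop at the head -- but removing a
 * killed entity's jobs means walking the whole list under one lock. */

struct toy_fifo_job {
	int entity_id;
	struct toy_fifo_job *next;
};

struct toy_fifo {
	struct toy_fifo_job *head, *tail;
};

static void toy_fifo_push(struct toy_fifo *q, struct toy_fifo_job *j)
{
	j->next = NULL;
	if (q->tail)
		q->tail->next = j;
	else
		q->head = j;
	q->tail = j;
}

static struct toy_fifo_job *toy_fifo_pop(struct toy_fifo *q)
{
	struct toy_fifo_job *j = q->head;

	if (!j)
		return NULL;
	q->head = j->next;
	if (!q->head)
		q->tail = NULL;
	return j;
}

/* The expensive part: pruning one entity's jobs is O(n) over ALL jobs. */
static int toy_fifo_kill_entity(struct toy_fifo *q, int entity_id)
{
	struct toy_fifo_job **pp = &q->head;
	int removed = 0;

	while (*pp) {
		if ((*pp)->entity_id == entity_id) {
			*pp = (*pp)->next;
			removed++;
		} else {
			pp = &(*pp)->next;
		}
	}
	/* re-find the tail after removals */
	q->tail = NULL;
	for (struct toy_fifo_job *j = q->head; j; j = j->next)
		q->tail = j;
	return removed;
}
```

Which supports your point: keeping jobs in per-entity queues and ordering
only the entities avoids taking one big lock to prune a killed entity's jobs.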
>
>>
>> I'm also surprised that there is no job transition from one queue to another
>> as it is picked to run next--for the above optimizations to be implemented, you'd
>> want a state transition from (state) queue to queue.
>
>
> Not sure what you have in mind here? How does this help?
>
> Andrey
>
>
>>
>> Regards,
>> Luben
>>
>
>>> +
>>> + if (entity) {
>>> + rq->current_entity = entity;
>>> + reinit_completion(&entity->entity_idle);
>>> + }
>>> +
>>> + spin_unlock(&rq->lock);
>>> + return entity;
>>> +}
>>> +
>>> /**
>>> * drm_sched_job_done - complete a job
>>> * @s_job: pointer to the job which is done
>>> @@ -804,7 +858,10 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched)
>>>
>>> /* Kernel run queue has higher priority than normal run queue*/
>>> for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) {
>>> - entity = drm_sched_rq_select_entity(&sched->sched_rq[i]);
>>> + entity = drm_sched_policy != 1 ?
>>> + drm_sched_rq_select_entity_rr(&sched->sched_rq[i]) :
>>> + drm_sched_rq_select_entity_fifo(&sched->sched_rq[i]);
>>> +
>>> if (entity)
>>> break;
>>> }
>>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>>> index addb135eeea6..95865881bfcf 100644
>>> --- a/include/drm/gpu_scheduler.h
>>> +++ b/include/drm/gpu_scheduler.h
>>> @@ -314,6 +314,14 @@ struct drm_sched_job {
>>>
>>> /** @last_dependency: tracks @dependencies as they signal */
>>> unsigned long last_dependency;
>>> +
>>> + /**
>>> + * @submit_ts:
>>> + *
>>> + * Marks job submit time
>>> + */
>>> + ktime_t submit_ts;
>>> +
>>> };
>>>
>>> static inline bool drm_sched_invalidate_job(struct drm_sched_job *s_job,
Regards,
--
Luben