From: "Christian König" <ckoenig.leichtzumerken@gmail.com>
To: Andrey Grodzovsky <andrey.grodzovsky@amd.com>,
	dri-devel@lists.freedesktop.org
Cc: Li Yunxiang <Yunxiang.Li@amd.com>,
	luben.tuikov@amd.com, amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/sced: Add FIFO policy for scheduler rq
Date: Tue, 23 Aug 2022 14:15:13 +0200
Message-ID: <ae6aa412-326c-46e3-4cde-8870ded748b8@gmail.com>
In-Reply-To: <20220822200917.440681-1-andrey.grodzovsky@amd.com>



Am 22.08.22 um 22:09 schrieb Andrey Grodzovsky:
> Problem: Given many entities competing for the same rq on the
> same scheduler, an unacceptably long wait time is observed for
> some jobs stuck in the rq before being picked up (seen using
> GPUVis).
> The issue is due to the Round Robin policy used by the scheduler
> to pick the next entity for execution. Under the stress of many
> entities and long job queues within an entity, some jobs can be
> stuck for a very long time in their entity's queue before being
> popped from the queue and executed, while for other entities with
> smaller job queues a job might execute earlier even though it
> arrived later than the job in the long queue.
>
> Fix:
> Add a FIFO selection policy for entities in the rq: choose the
> next entity on the rq in such an order that if a job on one
> entity arrived earlier than a job on another entity, the first
> job starts executing earlier, regardless of the length of the
> entity's job queue.
>
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> Tested-by: Li Yunxiang (Teddy) <Yunxiang.Li@amd.com>
> ---
>   drivers/gpu/drm/scheduler/sched_entity.c |  2 +
>   drivers/gpu/drm/scheduler/sched_main.c   | 65 ++++++++++++++++++++++--
>   include/drm/gpu_scheduler.h              |  8 +++
>   3 files changed, 71 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> index 6b25b2f4f5a3..3bb7f69306ef 100644
> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> @@ -507,6 +507,8 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
>   	atomic_inc(entity->rq->sched->score);
>   	WRITE_ONCE(entity->last_user, current->group_leader);
>   	first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node);
> +	sched_job->submit_ts = ktime_get();
> +
>   
>   	/* first job wakes up scheduler */
>   	if (first) {
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 68317d3a7a27..c123aa120d06 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -59,6 +59,19 @@
>   #define CREATE_TRACE_POINTS
>   #include "gpu_scheduler_trace.h"
>   
> +
> +
> +int drm_sched_policy = -1;
> +
> +/**
> + * DOC: sched_policy (int)
> + * Used to override default entites scheduling policy in a run queue.
> + */
> +MODULE_PARM_DESC(sched_policy,
> +		"specify schedule policy for entites on a runqueue (-1 = auto(default) value, 0 = Round Robin,1  = use FIFO");

Well we don't really have an autodetect at the moment, so I would drop that.
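
Just as an untested sketch of what I mean (assuming Round Robin then simply
stays the default), the parameter could look something like this:

	int drm_sched_policy = 0;

	/**
	 * DOC: sched_policy (int)
	 * Used to override the default entity scheduling policy in a run queue.
	 */
	MODULE_PARM_DESC(sched_policy,
			 "Specify the scheduling policy for entities on a run queue (0 = Round Robin (default), 1 = FIFO)");
	module_param_named(sched_policy, drm_sched_policy, int, 0444);

With 0444 the parameter is read-only through sysfs, so it only takes effect
at module load time, e.g. gpu_sched.sched_policy=1 on the kernel command line
(assuming the scheduler module keeps its current name).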

> +module_param_named(sched_policy, drm_sched_policy, int, 0444);
> +
> +
>   #define to_drm_sched_job(sched_job)		\
>   		container_of((sched_job), struct drm_sched_job, queue_node)
>   
> @@ -120,14 +133,16 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
>   }
>   
>   /**
> - * drm_sched_rq_select_entity - Select an entity which could provide a job to run
> + * drm_sched_rq_select_entity_rr - Select an entity which could provide a job to run
>    *
>    * @rq: scheduler run queue to check.
>    *
> - * Try to find a ready entity, returns NULL if none found.
> + * Try to find a ready entity, in round robin manner.
> + *
> + * Returns NULL if none found.
>    */
>   static struct drm_sched_entity *
> -drm_sched_rq_select_entity(struct drm_sched_rq *rq)
> +drm_sched_rq_select_entity_rr(struct drm_sched_rq *rq)
>   {
>   	struct drm_sched_entity *entity;
>   
> @@ -163,6 +178,45 @@ drm_sched_rq_select_entity(struct drm_sched_rq *rq)
>   	return NULL;
>   }
>   
> +/**
> + * drm_sched_rq_select_entity_fifo - Select an entity which could provide a job to run
> + *
> + * @rq: scheduler run queue to check.
> + *
> + * Try to find a ready entity, based on FIFO order of jobs arrivals.
> + *
> + * Returns NULL if none found.
> + */
> +static struct drm_sched_entity *
> +drm_sched_rq_select_entity_fifo(struct drm_sched_rq *rq)
> +{
> +	struct drm_sched_entity *tmp, *entity = NULL;
> +	ktime_t oldest_ts = KTIME_MAX;
> +	struct drm_sched_job *sched_job;
> +
> +	spin_lock(&rq->lock);
> +
> +	list_for_each_entry(tmp, &rq->entities, list) {
> +
> +		if (drm_sched_entity_is_ready(tmp)) {
> +			sched_job = to_drm_sched_job(spsc_queue_peek(&tmp->job_queue));
> +
> +			if (ktime_before(sched_job->submit_ts, oldest_ts)) {
> +				oldest_ts = sched_job->submit_ts;
> +				entity = tmp;
> +			}
> +		}
> +	}
> +
> +	if (entity) {
> +		rq->current_entity = entity;
> +		reinit_completion(&entity->entity_idle);
> +	}

That should probably be a separate function, or at least moved outside of
this block here.
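
Roughly what I have in mind, just as an untested sketch with a placeholder
name, would be a small helper for that bookkeeping:

	static void drm_sched_rq_set_current(struct drm_sched_rq *rq,
					     struct drm_sched_entity *entity)
	{
		/* Remember which entity was picked and reset its idle completion. */
		rq->current_entity = entity;
		reinit_completion(&entity->entity_idle);
	}

Then drm_sched_rq_select_entity_fifo() would just call that for the non-NULL
case before dropping rq->lock.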

Apart from that it's a totally straightforward implementation. Any idea how
much extra overhead that is?

Regards,
Christian.

> +
> +	spin_unlock(&rq->lock);
> +	return entity;
> +}
> +
>   /**
>    * drm_sched_job_done - complete a job
>    * @s_job: pointer to the job which is done
> @@ -804,7 +858,10 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched)
>   
>   	/* Kernel run queue has higher priority than normal run queue*/
>   	for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) {
> -		entity = drm_sched_rq_select_entity(&sched->sched_rq[i]);
> +		entity = drm_sched_policy != 1 ?
> +				drm_sched_rq_select_entity_rr(&sched->sched_rq[i]) :
> +				drm_sched_rq_select_entity_fifo(&sched->sched_rq[i]);
> +
>   		if (entity)
>   			break;
>   	}
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index addb135eeea6..95865881bfcf 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -314,6 +314,14 @@ struct drm_sched_job {
>   
>   	/** @last_dependency: tracks @dependencies as they signal */
>   	unsigned long			last_dependency;
> +
> +       /**
> +	* @submit_ts:
> +	*
> +	* Marks job submit time
> +	*/
> +       ktime_t				submit_ts;
> +
>   };
>   
>   static inline bool drm_sched_invalidate_job(struct drm_sched_job *s_job,

