From: zhoucm1
Subject: Re: [PATCH 3/4] drm/scheduler: add new function to get least loaded sched v2
Date: Thu, 2 Aug 2018 14:42:58 +0800
Message-ID: <823db5d9-ec03-7469-0746-a5d9b521d933@amd.com>
References: <20180801082002.20696-1-nayan26deshmukh@gmail.com> <20180801082002.20696-3-nayan26deshmukh@gmail.com>
To: Nayan Deshmukh, David1.Zhou-5C7GfCeVMHo@public.gmane.org
Cc: Andrey.Grodzovsky-5C7GfCeVMHo@public.gmane.org, Mailing list - DRI developers, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org, Christian König
List-Id: dri-devel@lists.freedesktop.org



On 2018-08-02 14:01, Nayan Deshmukh wrote:
Hi David,

On Thu, Aug 2, 2018 at 8:22 AM Zhou, David(ChunMing) <David1.Zhou-5C7GfCeVMHo@public.gmane.org> wrote:

Another big question:

I agree the general idea of balancing scheduler load within the same ring family is good.

But when jobs from the same entity run on different schedulers, the later job could complete ahead of the earlier one, right?

Really good question. To avoid this scenario we do not move an entity which already has a job in the hardware queue. We only move entities whose last_scheduled fence has been signalled, which means that the last submitted job of this entity has finished executing.
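
For reference, that guarantee could look roughly like the following when (re)selecting a run queue. This is only an illustrative sketch written against the existing gpu_scheduler.c internals (entity->last_scheduled, drm_sched_rq_add_entity()/drm_sched_rq_remove_entity()); the helper name is made up and the actual wiring is done elsewhere in the series:

/*
 * Illustrative sketch only (hypothetical helper): migrate an entity
 * only when its previously submitted job has finished, so jobs of one
 * entity can never complete out of order across schedulers.
 */
static void drm_sched_entity_maybe_move(struct drm_sched_entity *entity)
{
	struct dma_fence *last = entity->last_scheduled;
	struct drm_sched_rq *rq;

	/* A job is still in flight on the current scheduler: do not move. */
	if (last && !dma_fence_is_signaled(last))
		return;

	rq = drm_sched_entity_get_free_sched(entity);
	if (rq == entity->rq)
		return;

	spin_lock(&entity->rq_lock);
	drm_sched_rq_remove_entity(entity->rq, entity);
	entity->rq = rq;
	drm_sched_rq_add_entity(rq, entity);
	spin_unlock(&entity->rq_lock);
}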
Good handling; I missed that when reviewing the patches.

Cheers,
David Zhou

Moving an entity which already has a job in the hardware queue would also hinder the dependency optimization that we are using, and hence would not lead to better performance anyway. I have discussed the issue in more detail here [1]. Please let me know if you have any more doubts regarding this.

Cheers,
Nayan

[1] http://ndesh26.github.io/gsoc/2018/06/14/GSoC-Update-A-Curious-Case-of-Dependency-Handling/

That would break the fence design: a later fence must be signaled after the earlier fence in the same fence context.

 

Anything I missed?

 

Regards,

David Zhou

 

From: dri-devel <dri-devel-bounces-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org> On Behalf Of Nayan Deshmukh
Sent: Thursday, August 02, 2018 12:07 AM
To: Grodzovsky, Andrey <Andrey.Grodzovsky-5C7GfCeVMHo@public.gmane.org>
Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org; Maling list - DRI developers <dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org>; Koenig, Christian <Christian.Koenig-5C7GfCeVMHo@public.gmane.org>
Subject: Re: [PATCH 3/4] drm/scheduler: add new function to get least loaded sched v2

 

Yes, that is correct. 

 

Nayan

 

On Wed, Aug 1, 2018, 9:05 PM Andrey Grodzovsky <Andrey.Grodzovsky-5C7GfCeVMHo@public.gmane.org> wrote:

Clarification question - if the run queues belong to different
schedulers they effectively point to different rings,

it means we allow moving (rescheduling) a drm_sched_entity from one ring
to another - I assume that was the idea in the first place, that

you have a set of HW rings and you can utilize any of them for your jobs
(like compute rings). Correct?

Andrey
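
As an illustration of that model, purely hypothetical driver-side code (not taken from this series; NUM_COMPUTE_RINGS, compute_sched[] and entity are stand-ins for driver-specific objects, and the drm_sched_entity_init() signature with rq_list/num_rq_list from earlier in this series is assumed): a driver hands one entity the run queues of several same-type hardware rings, and the scheduler core is then free to place the entity's jobs on any of them.

/*
 * Hypothetical driver-side setup: one run queue per hardware ring,
 * all at the same priority level.
 */
struct drm_sched_rq *rq_list[NUM_COMPUTE_RINGS];
unsigned int i;
int r;

for (i = 0; i < NUM_COMPUTE_RINGS; ++i)
	rq_list[i] = &compute_sched[i]->sched_rq[DRM_SCHED_PRIORITY_NORMAL];

/* The entity may now be (re)scheduled onto any of these rings. */
r = drm_sched_entity_init(&entity, rq_list, NUM_COMPUTE_RINGS, NULL);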


On 08/01/2018 04:20 AM, Nayan Deshmukh wrote:
> The function selects the run queue from the rq_list with the
> least load. The load is decided by the number of jobs in a
> scheduler.
>
> v2: avoid using atomic read twice consecutively, instead store
>      it locally
>
> Signed-off-by: Nayan Deshmukh <nayan26deshmukh-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> ---
>   drivers/gpu/drm/scheduler/gpu_scheduler.c | 25 +++++++++++++++++++++++++
>   1 file changed, 25 insertions(+)
>
> diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> index 375f6f7f6a93..fb4e542660b0 100644
> --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
> +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> @@ -255,6 +255,31 @@ static bool drm_sched_entity_is_ready(struct drm_sched_entity *entity)
>       return true;
>   }
>   
> +/**
> + * drm_sched_entity_get_free_sched - Get the rq from rq_list with least load
> + *
> + * @entity: scheduler entity
> + *
> + * Return the pointer to the rq with least load.
> + */
> +static struct drm_sched_rq *
> +drm_sched_entity_get_free_sched(struct drm_sched_entity *entity)
> +{
> +     struct drm_sched_rq *rq = NULL;
> +     unsigned int min_jobs = UINT_MAX, num_jobs;
> +     int i;
> +
> +     for (i = 0; i < entity->num_rq_list; ++i) {
> +             num_jobs = atomic_read(&entity->rq_list[i]->sched->num_jobs);
> +             if (num_jobs < min_jobs) {
> +                     min_jobs = num_jobs;
> +                     rq = entity->rq_list[i];
> +             }
> +     }
> +
> +     return rq;
> +}
> +
>   static void drm_sched_entity_kill_jobs_cb(struct dma_fence *f,
>                                   struct dma_fence_cb *cb)
>   {

