From: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Subject: Re: [PATCH 3/4] drm/scheduler: add new function to get least loaded sched v2
Date: Thu, 2 Aug 2018 11:31:56 +0530
To: David1.Zhou@amd.com
Cc: Maling list - DRI developers <dri-devel@lists.freedesktop.org>, amd-gfx@lists.freedesktop.org, Christian König <Christian.Koenig@amd.com>
References: <20180801082002.20696-1-nayan26deshmukh@gmail.com> <20180801082002.20696-3-nayan26deshmukh@gmail.com>
List-Id: dri-devel@lists.freedesktop.org

Hi David,

On Thu, Aug 2, 2018 at 8:22 AM Zhou, David(ChunMing) <David1.Zhou@amd.com> wrote:
>
> Another big question:
>
> I agree the general idea is good to balance scheduler load for the same
> ring family.
>
> But when jobs of the same entity run on different schedulers, that means
> the later job could be completed ahead of the earlier one, right?
>
Really good question. To avoid this scenario we do not move an entity
which already has a job in the hardware queue. We only move entities whose
last_scheduled fence has been signalled, which means that the last
submitted job of this entity has finished executing. (A rough sketch of
this check is appended at the end of this mail.)

Moving an entity which already has a job in the hardware queue would
hinder the dependency optimization that we are using and hence would not
lead to better performance anyway. I have talked about the issue in more
detail here [1]. Please let me know if you have any more doubts regarding
this.

Cheers,
Nayan

[1] http://ndesh26.github.io/gsoc/2018/06/14/GSoC-Update-A-Curious-Case-of-Dependency-Handling/

> That will break the fence design; a later fence must be signaled after an
> earlier fence in the same fence context.
>
> Anything I missed?
>
> Regards,
> David Zhou
>
> From: dri-devel <dri-devel-bounces@lists.freedesktop.org> On Behalf Of Nayan Deshmukh
> Sent: Thursday, August 02, 2018 12:07 AM
> To: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
> Cc: amd-gfx@lists.freedesktop.org; Maling list - DRI developers <dri-devel@lists.freedesktop.org>; Koenig, Christian <Christian.Koenig@amd.com>
> Subject: Re: [PATCH 3/4] drm/scheduler: add new function to get least loaded sched v2
>
> Yes, that is correct.
>
> Nayan
>
> On Wed, Aug 1, 2018, 9:05 PM Andrey Grodzovsky <Andrey.Grodzovsky@amd.com> wrote:
>
> Clarification question - if the run queues belong to different
> schedulers they effectively point to different rings,
>
> it means we allow to move (reschedule) a drm_sched_entity from one ring
> to another - I assume that was the idea in the first place, that
>
> you have a set of HW rings and you can utilize any of them for your jobs
> (like compute rings). Correct?
>
> Andrey
>
>
> On 08/01/2018 04:20 AM, Nayan Deshmukh wrote:
> > The function selects the run queue from the rq_list with the
> > least load. The load is decided by the number of jobs in a
> > scheduler.
> >
> > v2: avoid using atomic read twice consecutively, instead store
> >     it locally
> >
> > Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
> > ---
> >  drivers/gpu/drm/scheduler/gpu_scheduler.c | 25 +++++++++++++++++++++++++
> >  1 file changed, 25 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> > index 375f6f7f6a93..fb4e542660b0 100644
> > --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
> > +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> > @@ -255,6 +255,31 @@ static bool drm_sched_entity_is_ready(struct drm_sched_entity *entity)
> >       return true;
> >  }
> >
> > +/**
> > + * drm_sched_entity_get_free_sched - Get the rq from rq_list with least load
> > + *
> > + * @entity: scheduler entity
> > + *
> > + * Return the pointer to the rq with least load.
> > + */
> > +static struct drm_sched_rq *
> > +drm_sched_entity_get_free_sched(struct drm_sched_entity *entity)
> > +{
> > +     struct drm_sched_rq *rq = NULL;
> > +     unsigned int min_jobs = UINT_MAX, num_jobs;
> > +     int i;
> > +
> > +     for (i = 0; i < entity->num_rq_list; ++i) {
> > +             num_jobs = atomic_read(&entity->rq_list[i]->sched->num_jobs);
> > +             if (num_jobs < min_jobs) {
> > +                     min_jobs = num_jobs;
> > +                     rq = entity->rq_list[i];
> > +             }
> > +     }
> > +
> > +     return rq;
> > +}
> > +
> >  static void drm_sched_entity_kill_jobs_cb(struct dma_fence *f,
> >                                            struct dma_fence_cb *cb)
> >  {
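
For illustration only, a minimal sketch of the gating described above. It assumes a
hypothetical helper (drm_sched_entity_pick_rq() is not part of this series) sitting in
front of drm_sched_entity_get_free_sched(), and it relies on entity->last_scheduled and
dma_fence_is_signaled() from <linux/dma-fence.h>:

	/*
	 * Sketch only: rebalance an entity only after its previously scheduled
	 * job has finished, so two jobs of the same entity never run on
	 * different rings concurrently and cannot complete out of order.
	 */
	static struct drm_sched_rq *
	drm_sched_entity_pick_rq(struct drm_sched_entity *entity)
	{
		/* A job of this entity is still in flight: keep the current rq. */
		if (entity->last_scheduled &&
		    !dma_fence_is_signaled(entity->last_scheduled))
			return entity->rq;

		/* Last job has completed, so it is safe to move the entity to
		 * the run queue of the least loaded scheduler. */
		return drm_sched_entity_get_free_sched(entity);
	}

Where exactly such a check would be called (e.g. when the next job is pushed to the
entity) is not shown here; the sketch only captures the ordering argument.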