All of lore.kernel.org
 help / color / mirror / Atom feed
From: Rob Clark <robdclark@gmail.com>
To: "Christian König" <christian.koenig@amd.com>
Cc: dri-devel@lists.freedesktop.org,
	Rob Clark <robdclark@chromium.org>,
	Luben Tuikov <luben.tuikov@amd.com>,
	David Airlie <airlied@gmail.com>, Daniel Vetter <daniel@ffwll.ch>,
	Sumit Semwal <sumit.semwal@linaro.org>,
	open list <linux-kernel@vger.kernel.org>,
	"open list:DMA BUFFER SHARING FRAMEWORK" 
	<linux-media@vger.kernel.org>,
	"moderated list:DMA BUFFER SHARING FRAMEWORK" 
	<linaro-mm-sig@lists.linaro.org>
Subject: Re: [RFC] drm/scheduler: Unwrap job dependencies
Date: Thu, 23 Mar 2023 14:30:24 -0700	[thread overview]
Message-ID: <CAF6AEGvdVca_mnZVo9He9oKVfYp84e_kOPWaxX+K5aV4Es9kcQ@mail.gmail.com> (raw)
In-Reply-To: <e2fa296b-9b71-a41b-d37d-33f0fac2cd4e@amd.com>

On Thu, Mar 23, 2023 at 7:03 AM Christian König
<christian.koenig@amd.com> wrote:
>
> Am 23.03.23 um 14:54 schrieb Rob Clark:
> > On Thu, Mar 23, 2023 at 12:35 AM Christian König
> > <christian.koenig@amd.com> wrote:
> >> Am 22.03.23 um 23:44 schrieb Rob Clark:
> >>> From: Rob Clark <robdclark@chromium.org>
> >>>
> >>> Container fences have burner contexts, which makes the trick to store at
> >>> most one fence per context somewhat useless if we don't unwrap array or
> >>> chain fences.
> >> Mhm, we intentionally kept them not unwrapped since this way they only
> >> occupy one fence slot.
> >>
> >> But it might be better to unwrap them if you add many of those dependencies.
> >>
> >>> Signed-off-by: Rob Clark <robdclark@chromium.org>
> >>> ---
> >>> tbh, I'm not sure why we weren't doing this already, unless there is
> >>> something I'm overlooking
> >>>
> >>>    drivers/gpu/drm/scheduler/sched_main.c | 43 +++++++++++++++++---------
> >>>    1 file changed, 28 insertions(+), 15 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> >>> index c2ee44d6224b..f59e5335afbb 100644
> >>> --- a/drivers/gpu/drm/scheduler/sched_main.c
> >>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> >>> @@ -41,20 +41,21 @@
> >>>     * 4. Entities themselves maintain a queue of jobs that will be scheduled on
> >>>     *    the hardware.
> >>>     *
> >>>     * The jobs in a entity are always scheduled in the order that they were pushed.
> >>>     */
> >>>
> >>>    #include <linux/kthread.h>
> >>>    #include <linux/wait.h>
> >>>    #include <linux/sched.h>
> >>>    #include <linux/completion.h>
> >>> +#include <linux/dma-fence-unwrap.h>
> >>>    #include <linux/dma-resv.h>
> >>>    #include <uapi/linux/sched/types.h>
> >>>
> >>>    #include <drm/drm_print.h>
> >>>    #include <drm/drm_gem.h>
> >>>    #include <drm/gpu_scheduler.h>
> >>>    #include <drm/spsc_queue.h>
> >>>
> >>>    #define CREATE_TRACE_POINTS
> >>>    #include "gpu_scheduler_trace.h"
> >>> @@ -665,41 +666,27 @@ void drm_sched_job_arm(struct drm_sched_job *job)
> >>>        sched = entity->rq->sched;
> >>>
> >>>        job->sched = sched;
> >>>        job->s_priority = entity->rq - sched->sched_rq;
> >>>        job->id = atomic64_inc_return(&sched->job_id_count);
> >>>
> >>>        drm_sched_fence_init(job->s_fence, job->entity);
> >>>    }
> >>>    EXPORT_SYMBOL(drm_sched_job_arm);
> >>>
> >>> -/**
> >>> - * drm_sched_job_add_dependency - adds the fence as a job dependency
> >>> - * @job: scheduler job to add the dependencies to
> >>> - * @fence: the dma_fence to add to the list of dependencies.
> >>> - *
> >>> - * Note that @fence is consumed in both the success and error cases.
> >>> - *
> >>> - * Returns:
> >>> - * 0 on success, or an error on failing to expand the array.
> >>> - */
> >>> -int drm_sched_job_add_dependency(struct drm_sched_job *job,
> >>> -                              struct dma_fence *fence)
> >>> +static int _add_dependency(struct drm_sched_job *job, struct dma_fence *fence)
> >> Please keep the drm_sched_job_ prefix here even for static functions.
> >> The symbol _add_dependency just sucks in a backtrace, especially when
> >> it's tail optimized.
> >>
> >>>    {
> >>>        struct dma_fence *entry;
> >>>        unsigned long index;
> >>>        u32 id = 0;
> >>>        int ret;
> >>>
> >>> -     if (!fence)
> >>> -             return 0;
> >>> -
> >>>        /* Deduplicate if we already depend on a fence from the same context.
> >>>         * This lets the size of the array of deps scale with the number of
> >>>         * engines involved, rather than the number of BOs.
> >>>         */
> >>>        xa_for_each(&job->dependencies, index, entry) {
> >>>                if (entry->context != fence->context)
> >>>                        continue;
> >>>
> >>>                if (dma_fence_is_later(fence, entry)) {
> >>>                        dma_fence_put(entry);
> >>> @@ -709,20 +696,46 @@ int drm_sched_job_add_dependency(struct drm_sched_job *job,
> >>>                }
> >>>                return 0;
> >>>        }
> >>>
> >>>        ret = xa_alloc(&job->dependencies, &id, fence, xa_limit_32b, GFP_KERNEL);
> >>>        if (ret != 0)
> >>>                dma_fence_put(fence);
> >>>
> >>>        return ret;
> >>>    }
> >>> +
> >>> +/**
> >>> + * drm_sched_job_add_dependency - adds the fence as a job dependency
> >>> + * @job: scheduler job to add the dependencies to
> >>> + * @fence: the dma_fence to add to the list of dependencies.
> >>> + *
> >>> + * Note that @fence is consumed in both the success and error cases.
> >>> + *
> >>> + * Returns:
> >>> + * 0 on success, or an error on failing to expand the array.
> >>> + */
> >>> +int drm_sched_job_add_dependency(struct drm_sched_job *job,
> >>> +                              struct dma_fence *fence)
> >> Maybe name the new function drm_sched_job_unwrap_add_dependency or
> >> something like this.
> >>
> >> I need to double check, but I think for some cases we don't need or
> >> don't even want this in the driver.
> > I'd be curious to know the cases where you don't want this.. one thing
> > I was thinking about, what if you have a container fence with two
> > contained fences.  One is on the same ctx as the job, one is not but
> > signals sooner.  You end up artificially waiting on both, which seems
> > sub-optimal.
>
> Well resv objects don't contain other containers for example.

I suppose I have the explicit sync case more in mind, where the
dependent fence ends up being a chain or array (if userspace is
merging fence fd's).

> Then we also have an use case in amdgpu where fence need to be
> explicitly waited for even when they are from the same ctx as the job
> because otherwise we wouldn't see everything cache coherent.

This was the kinda weird case I wanted to make sure I wasn't breaking.
I remember seeing something fly by for this, but can't find it now or
remember what amdgpu's solution was..

> On the other hand we currently handle that amdgpu use case differently
> and the extra overhead of unwrapping fences even if they can't be
> containers is probably negligible.
>
> > Anyways, I can make this a new entrypoint which unwraps, and/or rename
> > the internal static function, if we think this is a good idea.
>
> If you think that's unnecessary keeping your original approach is fine
> with me as well.

I'm going to assume unnecessary until someone speaks up with their
weird specific case ;-)

BR,
-R

> Regards,
> Christian.
>
> >
> > BR,
> > -R
> >
> >> Christian.
> >>
> >>> +{
> >>> +     struct dma_fence_unwrap iter;
> >>> +     struct dma_fence *f;
> >>> +     int ret = 0;
> >>> +
> >>> +     dma_fence_unwrap_for_each (f, &iter, fence) {
> >>> +             ret = _add_dependency(job, f);
> >>> +             if (ret)
> >>> +                     break;
> >>> +     }
> >>> +
> >>> +     return ret;
> >>> +}
> >>>    EXPORT_SYMBOL(drm_sched_job_add_dependency);
> >>>
> >>>    /**
> >>>     * drm_sched_job_add_resv_dependencies - add all fences from the resv to the job
> >>>     * @job: scheduler job to add the dependencies to
> >>>     * @resv: the dma_resv object to get the fences from
> >>>     * @usage: the dma_resv_usage to use to filter the fences
> >>>     *
> >>>     * This adds all fences matching the given usage from @resv to @job.
> >>>     * Must be called with the @resv lock held.
>

WARNING: multiple messages have this Message-ID (diff)
From: Rob Clark <robdclark@gmail.com>
To: "Christian König" <christian.koenig@amd.com>
Cc: Rob Clark <robdclark@chromium.org>,
	open list <linux-kernel@vger.kernel.org>,
	dri-devel@lists.freedesktop.org,
	"moderated list:DMA BUFFER SHARING FRAMEWORK"
	<linaro-mm-sig@lists.linaro.org>,
	Luben Tuikov <luben.tuikov@amd.com>,
	Sumit Semwal <sumit.semwal@linaro.org>,
	"open list:DMA BUFFER SHARING FRAMEWORK"
	<linux-media@vger.kernel.org>
Subject: Re: [RFC] drm/scheduler: Unwrap job dependencies
Date: Thu, 23 Mar 2023 14:30:24 -0700	[thread overview]
Message-ID: <CAF6AEGvdVca_mnZVo9He9oKVfYp84e_kOPWaxX+K5aV4Es9kcQ@mail.gmail.com> (raw)
In-Reply-To: <e2fa296b-9b71-a41b-d37d-33f0fac2cd4e@amd.com>

On Thu, Mar 23, 2023 at 7:03 AM Christian König
<christian.koenig@amd.com> wrote:
>
> Am 23.03.23 um 14:54 schrieb Rob Clark:
> > On Thu, Mar 23, 2023 at 12:35 AM Christian König
> > <christian.koenig@amd.com> wrote:
> >> Am 22.03.23 um 23:44 schrieb Rob Clark:
> >>> From: Rob Clark <robdclark@chromium.org>
> >>>
> >>> Container fences have burner contexts, which makes the trick to store at
> >>> most one fence per context somewhat useless if we don't unwrap array or
> >>> chain fences.
> >> Mhm, we intentionally kept them not unwrapped since this way they only
> >> occupy one fence slot.
> >>
> >> But it might be better to unwrap them if you add many of those dependencies.
> >>
> >>> Signed-off-by: Rob Clark <robdclark@chromium.org>
> >>> ---
> >>> tbh, I'm not sure why we weren't doing this already, unless there is
> >>> something I'm overlooking
> >>>
> >>>    drivers/gpu/drm/scheduler/sched_main.c | 43 +++++++++++++++++---------
> >>>    1 file changed, 28 insertions(+), 15 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> >>> index c2ee44d6224b..f59e5335afbb 100644
> >>> --- a/drivers/gpu/drm/scheduler/sched_main.c
> >>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> >>> @@ -41,20 +41,21 @@
> >>>     * 4. Entities themselves maintain a queue of jobs that will be scheduled on
> >>>     *    the hardware.
> >>>     *
> >>>     * The jobs in a entity are always scheduled in the order that they were pushed.
> >>>     */
> >>>
> >>>    #include <linux/kthread.h>
> >>>    #include <linux/wait.h>
> >>>    #include <linux/sched.h>
> >>>    #include <linux/completion.h>
> >>> +#include <linux/dma-fence-unwrap.h>
> >>>    #include <linux/dma-resv.h>
> >>>    #include <uapi/linux/sched/types.h>
> >>>
> >>>    #include <drm/drm_print.h>
> >>>    #include <drm/drm_gem.h>
> >>>    #include <drm/gpu_scheduler.h>
> >>>    #include <drm/spsc_queue.h>
> >>>
> >>>    #define CREATE_TRACE_POINTS
> >>>    #include "gpu_scheduler_trace.h"
> >>> @@ -665,41 +666,27 @@ void drm_sched_job_arm(struct drm_sched_job *job)
> >>>        sched = entity->rq->sched;
> >>>
> >>>        job->sched = sched;
> >>>        job->s_priority = entity->rq - sched->sched_rq;
> >>>        job->id = atomic64_inc_return(&sched->job_id_count);
> >>>
> >>>        drm_sched_fence_init(job->s_fence, job->entity);
> >>>    }
> >>>    EXPORT_SYMBOL(drm_sched_job_arm);
> >>>
> >>> -/**
> >>> - * drm_sched_job_add_dependency - adds the fence as a job dependency
> >>> - * @job: scheduler job to add the dependencies to
> >>> - * @fence: the dma_fence to add to the list of dependencies.
> >>> - *
> >>> - * Note that @fence is consumed in both the success and error cases.
> >>> - *
> >>> - * Returns:
> >>> - * 0 on success, or an error on failing to expand the array.
> >>> - */
> >>> -int drm_sched_job_add_dependency(struct drm_sched_job *job,
> >>> -                              struct dma_fence *fence)
> >>> +static int _add_dependency(struct drm_sched_job *job, struct dma_fence *fence)
> >> Please keep the drm_sched_job_ prefix here even for static functions.
> >> The symbol _add_dependency just sucks in a backtrace, especially when
> >> it's tail optimized.
> >>
> >>>    {
> >>>        struct dma_fence *entry;
> >>>        unsigned long index;
> >>>        u32 id = 0;
> >>>        int ret;
> >>>
> >>> -     if (!fence)
> >>> -             return 0;
> >>> -
> >>>        /* Deduplicate if we already depend on a fence from the same context.
> >>>         * This lets the size of the array of deps scale with the number of
> >>>         * engines involved, rather than the number of BOs.
> >>>         */
> >>>        xa_for_each(&job->dependencies, index, entry) {
> >>>                if (entry->context != fence->context)
> >>>                        continue;
> >>>
> >>>                if (dma_fence_is_later(fence, entry)) {
> >>>                        dma_fence_put(entry);
> >>> @@ -709,20 +696,46 @@ int drm_sched_job_add_dependency(struct drm_sched_job *job,
> >>>                }
> >>>                return 0;
> >>>        }
> >>>
> >>>        ret = xa_alloc(&job->dependencies, &id, fence, xa_limit_32b, GFP_KERNEL);
> >>>        if (ret != 0)
> >>>                dma_fence_put(fence);
> >>>
> >>>        return ret;
> >>>    }
> >>> +
> >>> +/**
> >>> + * drm_sched_job_add_dependency - adds the fence as a job dependency
> >>> + * @job: scheduler job to add the dependencies to
> >>> + * @fence: the dma_fence to add to the list of dependencies.
> >>> + *
> >>> + * Note that @fence is consumed in both the success and error cases.
> >>> + *
> >>> + * Returns:
> >>> + * 0 on success, or an error on failing to expand the array.
> >>> + */
> >>> +int drm_sched_job_add_dependency(struct drm_sched_job *job,
> >>> +                              struct dma_fence *fence)
> >> Maybe name the new function drm_sched_job_unwrap_add_dependency or
> >> something like this.
> >>
> >> I need to double check, but I think for some cases we don't need or
> >> don't even want this in the driver.
> > I'd be curious to know the cases where you don't want this.. one thing
> > I was thinking about, what if you have a container fence with two
> > contained fences.  One is on the same ctx as the job, one is not but
> > signals sooner.  You end up artificially waiting on both, which seems
> > sub-optimal.
>
> Well resv objects don't contain other containers for example.

I suppose I have the explicit sync case more in mind, where the
dependent fence ends up being a chain or array (if userspace is
merging fence fd's).

> Then we also have an use case in amdgpu where fence need to be
> explicitly waited for even when they are from the same ctx as the job
> because otherwise we wouldn't see everything cache coherent.

This was the kinda weird case I wanted to make sure I wasn't breaking.
I remember seeing something fly by for this, but can't find it now or
remember what amdgpu's solution was..

> On the other hand we currently handle that amdgpu use case differently
> and the extra overhead of unwrapping fences even if they can't be
> containers is probably negligible.
>
> > Anyways, I can make this a new entrypoint which unwraps, and/or rename
> > the internal static function, if we think this is a good idea.
>
> If you think that's unnecessary keeping your original approach is fine
> with me as well.

I'm going to assume unnecessary until someone speaks up with their
weird specific case ;-)

BR,
-R

> Regards,
> Christian.
>
> >
> > BR,
> > -R
> >
> >> Christian.
> >>
> >>> +{
> >>> +     struct dma_fence_unwrap iter;
> >>> +     struct dma_fence *f;
> >>> +     int ret = 0;
> >>> +
> >>> +     dma_fence_unwrap_for_each (f, &iter, fence) {
> >>> +             ret = _add_dependency(job, f);
> >>> +             if (ret)
> >>> +                     break;
> >>> +     }
> >>> +
> >>> +     return ret;
> >>> +}
> >>>    EXPORT_SYMBOL(drm_sched_job_add_dependency);
> >>>
> >>>    /**
> >>>     * drm_sched_job_add_resv_dependencies - add all fences from the resv to the job
> >>>     * @job: scheduler job to add the dependencies to
> >>>     * @resv: the dma_resv object to get the fences from
> >>>     * @usage: the dma_resv_usage to use to filter the fences
> >>>     *
> >>>     * This adds all fences matching the given usage from @resv to @job.
> >>>     * Must be called with the @resv lock held.
>

  reply	other threads:[~2023-03-23 21:30 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-22 22:44 [RFC] drm/scheduler: Unwrap job dependencies Rob Clark
2023-03-22 22:44 ` Rob Clark
2023-03-23  7:35 ` Christian König
2023-03-23  7:35   ` Christian König
2023-03-23 13:54   ` Rob Clark
2023-03-23 13:54     ` Rob Clark
2023-03-23 14:03     ` Christian König
2023-03-23 14:03       ` Christian König
2023-03-23 21:30       ` Rob Clark [this message]
2023-03-23 21:30         ` Rob Clark
2023-12-04 21:54         ` Rob Clark
2023-12-04 21:54           ` Rob Clark
2023-12-05  6:46           ` Christian König
2023-12-05  6:46             ` Christian König
2023-12-05 15:41             ` Rob Clark
2023-12-05 15:41               ` Rob Clark
2023-12-05 15:58               ` Christian König
2023-12-05 15:58                 ` Christian König
2023-12-05 16:56                 ` Rob Clark
2023-12-05 16:56                   ` Rob Clark
2023-12-05 17:14                   ` Rob Clark
2023-12-05 17:14                     ` Rob Clark
2023-12-06  9:04                     ` [Linaro-mm-sig] " Christian König
2023-12-06  9:04                       ` Christian König

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAF6AEGvdVca_mnZVo9He9oKVfYp84e_kOPWaxX+K5aV4Es9kcQ@mail.gmail.com \
    --to=robdclark@gmail.com \
    --cc=airlied@gmail.com \
    --cc=christian.koenig@amd.com \
    --cc=daniel@ffwll.ch \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=linaro-mm-sig@lists.linaro.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-media@vger.kernel.org \
    --cc=luben.tuikov@amd.com \
    --cc=robdclark@chromium.org \
    --cc=sumit.semwal@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.