* drm/scheduler for vc5
From: Eric Anholt @ 2018-03-30 20:05 UTC
  To: dri-devel; +Cc: Alex Deucher, Christian König


I've been keeping my eye on what's going on with drm/scheduler, and I'm
definitely interested in using it.  I've got some questions about how to
fit it to this HW, though.

For this HW, most rendering jobs have two phases: binning and rendering,
and the HW has two small FIFOs for descriptions of each type of job to
be submitted.  The bin portion must be completed before emitting the
render.  Some jobs may be render only, skipping the bin phase.

The render side is what takes most of the time.  However, you can
usually bin the next frame while rendering the current one, helping keep
your shared shader cores busy when you're parsing command lists.  The
exception is if the next bin depends on your last render (think
render-to-texture with texturing in a vertex shader).

This makes me think that I should expose two entities, one for the HW's
binner and one for the renderer.  Each VC6 job would have two
drm_sched_jobs: the render job would depend on the fence from the bin
job, and the bin job may or may not depend on the previous render.

However, as an extra complication, the MMU is shared between binner and
renderer, so I can't schedule a new job with a page table change until
the other side finishes up.  Is there a good way to express this with
drm/scheduler, or should I work around this by internally stalling my
job submissions to the HW when a page table change is needed, and then
trigger that page table swap and submit once a job completes?
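
Roughly the submission path I have in mind, sketched against the
scheduler API as of this writing (all of the v3d_* structures and names
below are placeholders):

#include <drm/gpu_scheduler.h>

enum v3d_queue { V3D_BIN, V3D_RENDER };

/* Placeholder per-phase job wrapping the scheduler job. */
struct v3d_job {
	struct drm_sched_job base;
	struct dma_fence *bin_done;	/* render only: the bin job's fence */
};

static int v3d_submit(struct v3d_dev *v3d, struct v3d_file_priv *priv,
		      struct v3d_job *bin, struct v3d_job *render)
{
	int ret;

	if (bin) {
		ret = drm_sched_job_init(&bin->base,
					 &v3d->queue[V3D_BIN].sched,
					 &priv->entity[V3D_BIN], priv);
		if (ret)
			return ret;
		/* Handed back to the scheduler later through the
		 * dependency() callback. */
		render->bin_done =
			dma_fence_get(&bin->base.s_fence->finished);
	}

	ret = drm_sched_job_init(&render->base,
				 &v3d->queue[V3D_RENDER].sched,
				 &priv->entity[V3D_RENDER], priv);
	if (ret)
		return ret;	/* error unwinding omitted */

	if (bin)
		drm_sched_entity_push_job(&bin->base,
					  &priv->entity[V3D_BIN]);
	drm_sched_entity_push_job(&render->base,
				  &priv->entity[V3D_RENDER]);
	return 0;
}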


* Re: drm/scheduler for vc5
From: Christian König @ 2018-03-31 17:46 UTC
  To: Eric Anholt, dri-devel; +Cc: Alex Deucher


Hi Eric,

nice to see that the scheduler gets used more and more.

The feature you need to solve both your binning/rendering and your MMU
problems is dependency handling. See the "dependency" callback of the
backend operations.

With this callback the driver can return dma_fences which need to signal
(or at least be scheduled, if they target the same ring buffer/FIFO)
before the job is allowed to run.

Now you need a dma_fence as the result of your run_job callback for the
binning step anyway. So when you return that fence from the binning step
as a dependency of your rendering step, the scheduler does exactly what
you want, i.e. it won't start the rendering before the binning is
finished.
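
A minimal sketch of such a callback, assuming a job structure that
carries the bin fence (the v3d_* names are made up for illustration):

static struct dma_fence *
v3d_job_dependency(struct drm_sched_job *sched_job,
		   struct drm_sched_entity *s_entity)
{
	struct v3d_job *job = container_of(sched_job, struct v3d_job, base);
	struct dma_fence *fence = job->bin_done;

	/* The scheduler calls this repeatedly before run_job() and
	 * waits for each returned fence; returning NULL means no
	 * dependencies are left. */
	job->bin_done = NULL;
	return fence;
}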


The same idea can be used for the MMU switch. As an example of how to do
this, see how the dependency callback is implemented in
amdgpu_job_dependency():
>     struct dma_fence *fence = amdgpu_sync_get_fence(&job->sync, 
> &explicit);

First we get the "normal" dependencies from our sync object (a container
of fences).

...
>     while (fence == NULL && vm && !job->vmid) {
>         struct amdgpu_ring *ring = job->ring;
>
>         r = amdgpu_vmid_grab(vm, ring, &job->sync,
>                      &job->base.s_fence->finished,
>                      job);
>         if (r)
>             DRM_ERROR("Error getting VM ID (%d)\n", r);
>
>         fence = amdgpu_sync_get_fence(&job->sync, NULL);
>     }

If there are no "normal" dependencies left, we call into the VMID
subsystem to allocate a VMID for the job (the hardware has 16 of them).

This call will pick a VMID and remember that the job's process is now
the owner of that VMID. If the VMID didn't previously belong to the
current job's process, all fences of the old process are added to the
job->sync object again.

So after all "normal" dependencies have been returned, we return the one
necessary to grab the VMID hardware resource.
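
Translated to a single shared MMU, the same trick might look like this
(v3d_mmu_grab() and the got_mmu flag are made up): drain the normal
dependencies first, then return the fence guarding the page table
switch as the very last one:

static struct dma_fence *
v3d_job_dependency(struct drm_sched_job *sched_job,
		   struct drm_sched_entity *s_entity)
{
	struct v3d_job *job = container_of(sched_job, struct v3d_job, base);
	struct dma_fence *fence = job->bin_done;

	job->bin_done = NULL;
	if (fence)
		return fence;	/* "normal" dependencies first */

	/* Last dependency: whatever still runs under the old page
	 * tables.  v3d_mmu_grab() records this job's process as the
	 * new owner and returns the previous owner's last fence, or
	 * NULL if the page tables already match. */
	if (!job->got_mmu) {
		job->got_mmu = true;
		return v3d_mmu_grab(job);
	}
	return NULL;
}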

Regards,
Christian.

On 30.03.2018 at 22:05, Eric Anholt wrote:
> [SNIP]



* Re: drm/scheduler for vc5
From: Eric Anholt @ 2018-04-02 18:49 UTC
  To: Christian König, dri-devel; +Cc: Alex Deucher


Christian König <christian.koenig@amd.com> writes:

> [SNIP]
>
> If there are no "normal" dependencies left, we call into the VMID
> subsystem to allocate a VMID for the job (the hardware has 16 of them).
>
> This call will pick a VMID and remember that the job's process is now
> the owner of that VMID. If the VMID didn't previously belong to the
> current job's process, all fences of the old process are added to the
> job->sync object again.

This makes some sense when you have many VMIDs and reuse won't happen
very often.  I'm concerned that, since I effectively have one VMID that
I need to keep swapping, we'd be creating a specific serialization of
the jobs at the time they're submitted to the kernel (the dependency()
callback) rather than when the scheduler decides it would like to
submit to the HW (the run_job() callback, after picking a job based on
priority).


* Re: drm/scheduler for vc5
From: Christian König @ 2018-04-03  9:18 UTC
  To: Eric Anholt, dri-devel; +Cc: Alex Deucher

On 02.04.2018 at 20:49, Eric Anholt wrote:
> [SNIP]
>> This call will pick a VMID and remember that the job's process is now
>> the owner of that VMID. If the VMID didn't previously belong to the
>> current job's process, all fences of the old process are added to the
>> job->sync object again.
> This makes some sense when you have many VMIDs and reuse won't happen
> very often.  I'm concerned that, since I effectively have one VMID that
> I need to keep swapping, we'd be creating a specific serialization of
> the jobs at the time they're submitted to the kernel (the dependency()
> callback) rather than when the scheduler decides it would like to
> submit to the HW (the run_job() callback, after picking a job based on
> priority).

We have the same problem: a process currently active on a VMID would get
an unfair advantage, because no page table switch is needed for its
submissions.

That is avoided by remembering the submission as a user of the VMID as
soon as it asks for it, not when the job actually runs. This way the
current submission is guaranteed to get its turn on the MMU.
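
As a sketch, with made-up names and locking omitted:

struct dma_fence *v3d_mmu_grab(struct v3d_job *job)
{
	struct v3d_mmu *mmu = &job->v3d->mmu;
	struct dma_fence *old = NULL;

	if (mmu->owner != job->owner) {
		/* Wait for everything the old owner still has queued. */
		old = dma_fence_get(mmu->last_fence);
		/* Ownership is recorded here, at grab time, not in
		 * run_job(), so this submission keeps its place. */
		mmu->owner = job->owner;
	}
	return old;	/* the scheduler waits on this before run_job() */
}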

Regards,
Christian.

* Re: drm/scheduler for vc5
From: Eric Anholt @ 2018-04-03 23:08 UTC
  To: Christian König, dri-devel; +Cc: Alex Deucher


Christian König <christian.koenig@amd.com> writes:

> [SNIP]
>
> Now you need a dma_fence as the result of your run_job callback for the
> binning step anyway. So when you return that fence from the binning
> step as a dependency of your rendering step, the scheduler does exactly
> what you want, i.e. it won't start the rendering before the binning is
> finished.

It looks like in order to use the bin's fence returned from run_job,
render first needs to depend on exec->bin.base.s_fence->scheduled so
that run_job has been called.  Is there any reason not to just depend on
exec->bin.base.s_fence->finished, instead?  Finished will be signaled
basically immediately after the run_job fence completes, right?

Also, I hadn't quite followed your suggestion about MMU switching
before.  Your trick was that you return a newly-generated dependency on
MMU switching as the final dependency, so that you only decide on
serializing the MMU switch once you're ready to run and the scheduler
was about to pick your job anyway.  This seems good to me.


* Re: drm/scheduler for vc5
From: Christian König @ 2018-04-04  7:13 UTC
  To: Eric Anholt, dri-devel; +Cc: Alex Deucher

On 04.04.2018 at 01:08, Eric Anholt wrote:
> Christian König <christian.koenig@amd.com> writes:
>
>> [SNIP]
> It looks like in order to use the bin's fence returned from run_job,
> render first needs to depend on exec->bin.base.s_fence->scheduled so
> that run_job has been called.  Is there any reason not to just depend on
> exec->bin.base.s_fence->finished, instead?  Finished will be signaled
> basically immediately after the run_job fence completes, right?

Yes, exec->bin.base.s_fence->finished should be sufficient as well.

There are three fences involved in the scheduler:

1. The hardware fence, returned by the run_job callback.

The scheduler registers on that one to be notified of completion so
that it can schedule the next job.

If you use the timeout feature, a job can be pushed to the hardware
multiple times, and this fence is replaced each time that happens.

2. s_fence->scheduled, signaled when the scheduler has picked up the
job.

It is the first one to be signaled and generally means that the job has
entered the hardware FIFO.

3. s_fence->finished, signaled when the underlying hardware fence is
signaled.

The difference from the hardware fence is that it is created much
earlier, during command submission.
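
Putting it together, the per-job lifecycle is simply:

/*
 *   drm_sched_job_init()            s_fence (scheduled + finished) created
 *   scheduler picks the job         s_fence->scheduled signals
 *   run_job() returns the HW fence  scheduler registers for completion
 *                                   (fence may be replaced on a retry)
 *   HW fence signals                s_fence->finished signals
 */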

I should probably write all this into some kind of documentation.

Regards,
Christian.


