From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>, intel-gfx@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH 09/20] drm/i915/gem: Assign context id for async work
Date: Thu, 9 Jul 2020 12:01:29 +0100
Message-ID: <71aaf1cf-9d3a-6681-c9b0-fc25144b86b0@linux.intel.com>
In-Reply-To: <159422257929.17526.13795947568657610354@build.alporthouse.com>


On 08/07/2020 16:36, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2020-07-08 15:24:20)
>>
>> On 08/07/2020 13:42, Chris Wilson wrote:
>>> Quoting Tvrtko Ursulin (2020-07-08 13:26:24)
>>>>
>>>> On 06/07/2020 07:19, Chris Wilson wrote:
>>>>> Allocate a few dma fence context ids that we can use to associate async work
>>>>> [for the CPU] launched on behalf of this context. For extra fun, we allow
>>>>> a configurable concurrency width.
>>>>>
>>>>> A current example would be that we spawn an unbound worker for every
>>>>> userptr get_pages. In the future, we wish to charge this work to the
>>>>> context that initiated the async work and to impose concurrency limits
>>>>> based on the context.
>>>>>
>>>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>>>> ---
>>>>>     drivers/gpu/drm/i915/gem/i915_gem_context.c       | 4 ++++
>>>>>     drivers/gpu/drm/i915/gem/i915_gem_context.h       | 6 ++++++
>>>>>     drivers/gpu/drm/i915/gem/i915_gem_context_types.h | 6 ++++++
>>>>>     3 files changed, 16 insertions(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
>>>>> index 41784df51e58..bd68746327b3 100644
>>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
>>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
>>>>> @@ -714,6 +714,10 @@ __create_context(struct drm_i915_private *i915)
>>>>>         ctx->sched.priority = I915_USER_PRIORITY(I915_PRIORITY_NORMAL);
>>>>>         mutex_init(&ctx->mutex);
>>>>>     
>>>>> +     ctx->async.width = rounddown_pow_of_two(num_online_cpus());
>>>>> +     ctx->async.context = dma_fence_context_alloc(ctx->async.width);
>>>>> +     ctx->async.width--;
>>>>
>>>> Hey, I had a tri-core CPU back in the day.. :) Really, I can only assume
>>>> you are doing some tricks with masks which maybe only work with a
>>>> power-of-two number of CPUs? Hard to say.. please explain in a comment.
>>>
>>> Just a power-of-two (pot) mask, sized so that it fits in the currently
>>> available set of CPUs.
>>>    
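A minimal sketch of how I read the pot-mask trick, with illustrative names
rather than the driver's (rounddown_pow_of_two() is from <linux/log2.h>,
dma_fence_context_alloc() from <linux/dma-fence.h>):

	/*
	 * rounddown_pow_of_two() guarantees width is a power of two, so
	 * width - 1 is a contiguous low-bit mask and "counter & mask"
	 * cycles through 0..width-1 without a modulo.
	 */
	unsigned int width = rounddown_pow_of_two(num_online_cpus()); /* e.g. 6 CPUs -> 4 */
	u64 base = dma_fence_context_alloc(width);  /* width consecutive fence context ids */
	unsigned int mask = width - 1;               /* 4 -> 0x3 */
	atomic_t cur = ATOMIC_INIT(0);

	/* each new piece of async work is cycled onto one of the width ids */
	u64 async_id = base + (atomic_fetch_inc(&cur) & mask);
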
>>>> I don't even understand what the context will be for yet and why it
>>>> needs a separate context id.
>>>
>>> The longer term view is that I want to pull the various async tasks we
>>> use into one or more CPU-scheduling kthreads that share the same priority
>>> inheritance as their tasks. The issue at the moment is that because we use
>>> the system_wq, an implicit FIFO ordering is imposed on our tasks, upsetting
>>> our context priorities. This is a step towards that: it lets us start looking
>>> at how we might limit concurrency at various stages by using a bunch of
>>> timelines for each stage, and queuing our work along each timeline before
>>> submitting to an unbound system_wq. [The immediate goal is to limit how
>>> much of the CPU one client can hog by submitting deferred work that would
>>> run in parallel, with a view to making that configurable per-context.]
>>
>> Are you thinking of connecting the GEM context priority with task
>> priority? Or of creating the async kthreads with the same task priority as
>> the task that owns the GEM context? Will that be too many kthreads? I
>> suppose they would be created and destroyed on demand, so maybe not.
> 
> I'm thinking of having dedicated kthread task runners. Maybe adjusting
> between midRT-prio and normal-prio depending on workload. The essence is
> to simply replace the FIFO workqueue with our own priolists. (We run
> the first task in the queue; hopefully each task is short enough that
> we really don't have to start thinking about making the tasks
> preemptible.)
> 
> Then world domination.
> 
> But first something that works with/like kthread_worker.
> 
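For reference, a rough sketch of what driving one of these deferred tasks
through a kthread_worker could look like (purely illustrative names and
structure; the priolist replacement for the FIFO queue is the new part and
is not shown here):

	#include <linux/err.h>
	#include <linux/kthread.h>

	static void async_task_fn(struct kthread_work *work)
	{
		/* the deferred task, e.g. a userptr get_pages */
	}

	static int run_one_async_task(void)
	{
		struct kthread_worker *worker;
		struct kthread_work work;

		worker = kthread_create_worker(0, "i915/async"); /* dedicated task runner */
		if (IS_ERR(worker))
			return PTR_ERR(worker);

		/* worker->task's priority could later track the owning context */
		kthread_init_work(&work, async_task_fn);
		kthread_queue_work(worker, &work); /* FIFO today; priolists would slot in here */

		kthread_flush_worker(worker);
		kthread_destroy_worker(worker);
		return 0;
	}
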
>>>>>         spin_lock_init(&ctx->stale.lock);
>>>>>         INIT_LIST_HEAD(&ctx->stale.engines);
>>>>>     
>>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.h b/drivers/gpu/drm/i915/gem/i915_gem_context.h
>>>>> index 3702b2fb27ab..e104ff0ae740 100644
>>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.h
>>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.h
>>>>> @@ -134,6 +134,12 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
>>>>>     int i915_gem_context_reset_stats_ioctl(struct drm_device *dev, void *data,
>>>>>                                        struct drm_file *file);
>>>>>     
>>>>> +static inline u64 i915_gem_context_async_id(struct i915_gem_context *ctx)
>>>>> +{
>>>>> +     return (ctx->async.context +
>>>>> +             (atomic_fetch_inc(&ctx->async.cur) & ctx->async.width));
>>>>> +}
>>>>> +
>>>>>     static inline struct i915_gem_context *
>>>>>     i915_gem_context_get(struct i915_gem_context *ctx)
>>>>>     {
>>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
>>>>> index ae14ca24a11f..52561f98000f 100644
>>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
>>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
>>>>> @@ -85,6 +85,12 @@ struct i915_gem_context {
>>>>>     
>>>>>         struct intel_timeline *timeline;
>>>>>     
>>>>> +     struct {
>>>>> +             u64 context;
>>>>> +             atomic_t cur;
>>>>
>>>> What is cur? In which patch it gets used? (Can't see it.)
>>>
>>> See i915_gem_context_async_id() above.
>>
>> Yeah found it later.
>>
>> So in the patch where you use it, could you explain the significance of
>> the number of fence contexts vs the number of CPUs? What logic drives the
>> choice of CPU concurrency per GEM context?
> 
> Logic? Pick a number out of a hat.
> 
>> And what is the effective behaviour you get with N contexts - emit N
>> concurrent operations and block in execbuf for the N + 1th?
> 
> Each context defines a timeline. A task is not ready to run until the
> task before it in its timeline is completed. So we don't block in
> execbuf; the scheduler waits until the request is ready before putting
> it into the HW queues -- i.e. the normal chain of fences, with everything
> that entails about ensuring it runs to completion [whether successfully
> or not; if not, we then rely on the error propagation to limit the damage
> and report it back to the user if they kept a fence around to inspect].
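
If I follow, the per-timeline serialisation is essentially this (an
illustrative sketch, not the actual code; the struct and function names
are mine):

	struct async_timeline {          /* one per fence context id */
		spinlock_t lock;
		struct dma_fence *last;  /* fence of the most recently queued task */
	};

	/*
	 * Each new task waits on whatever was last queued on its timeline and
	 * then becomes the new tail, so work on one timeline runs strictly in
	 * order while separate timelines may run in parallel.
	 */
	static struct dma_fence *
	chain_onto_timeline(struct async_timeline *tl, struct dma_fence *fence)
	{
		struct dma_fence *prev;

		spin_lock(&tl->lock);
		prev = tl->last;                 /* dependency for the new task */
		tl->last = dma_fence_get(fence);
		spin_unlock(&tl->lock);

		return prev; /* caller adds it as a prerequisite, then dma_fence_put()s it */
	}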

Okay, but what is the benefit of N contexts in this series, before the 
work is actually spread over ctx->async.width CPUs? Is there any? If not, 
I would prefer this patch be delayed until some actual parallelism is 
ready to be added.

Regards,

Tvrtko
