From: Chris Wilson <chris@chris-wilson.co.uk>
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>,
	intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH 14/40] drm/i915: Combine multiple internal plists into the same i915_priolist bucket
Date: Tue, 25 Sep 2018 08:55:20 +0100
Message-ID: <153786212006.21139.15998380731239267981@skylake-alporthouse-com>
In-Reply-To: <1dd9f561-d12f-8bd6-f617-e50524e1ca4c@linux.intel.com>

Quoting Tvrtko Ursulin (2018-09-24 11:25:37)
> 
> On 19/09/2018 20:55, Chris Wilson wrote:
> > As we are about to allow ourselves to slightly bump the user priority
> > > into a few different sublevels, pack those internal priority lists
> > > into the same i915_priolist to keep the rbtree compact and avoid having
> > > to allocate the default user priority even after the internal bumping.
> > > The downside to having a requests[] rather than a node per active list,
> > is that we then have to walk over the empty higher priority lists. To
> > compensate, we track the active buckets and use a small bitmap to skip
> > over any inactive ones.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >   drivers/gpu/drm/i915/intel_engine_cs.c      |  6 +-
> >   drivers/gpu/drm/i915/intel_guc_submission.c | 12 ++-
> >   drivers/gpu/drm/i915/intel_lrc.c            | 87 ++++++++++++++-------
> >   drivers/gpu/drm/i915/intel_ringbuffer.h     | 13 ++-
> >   4 files changed, 80 insertions(+), 38 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> > index 217ed3ee1cab..83f2f7774c1f 100644
> > --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> > +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> > @@ -1534,10 +1534,10 @@ void intel_engine_dump(struct intel_engine_cs *engine,
> >       count = 0;
> >       drm_printf(m, "\t\tQueue priority: %d\n", execlists->queue_priority);
> >       for (rb = rb_first_cached(&execlists->queue); rb; rb = rb_next(rb)) {
> > -             struct i915_priolist *p =
> > -                     rb_entry(rb, typeof(*p), node);
> > +             struct i915_priolist *p = rb_entry(rb, typeof(*p), node);
> > +             int i;
> >   
> > -             list_for_each_entry(rq, &p->requests, sched.link) {
> > +             priolist_for_each_request(rq, p, i) {
> >                       if (count++ < MAX_REQUESTS_TO_SHOW - 1)
> >                               print_request(m, rq, "\t\tQ ");
> >                       else
> > diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
> > index 6f693ef62c64..8531bd917ec3 100644
> > --- a/drivers/gpu/drm/i915/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/intel_guc_submission.c
> > @@ -726,30 +726,28 @@ static bool __guc_dequeue(struct intel_engine_cs *engine)
> >       while ((rb = rb_first_cached(&execlists->queue))) {
> >               struct i915_priolist *p = to_priolist(rb);
> >               struct i915_request *rq, *rn;
> > +             int i;
> >   
> > -             list_for_each_entry_safe(rq, rn, &p->requests, sched.link) {
> > +             priolist_for_each_request_consume(rq, rn, p, i) {
> 
> Hm consumed clears the bitmask every time, but we are not certain yet we 
> will consume the request.

We clear it after the loop, so when we take the early break the bitmask
is unaffected.
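
To illustrate the point about the early break, here is a minimal userspace
sketch, not the driver macro itself. The struct, the counters standing in
for the request lists, and the names lowest_bit() and consume() are all
hypothetical. The bit is cleared in the increment expression of the outer
for loop, so it only happens once a bucket has been fully walked; jumping
out of the loop skips it.

```c
#define NBUCKETS 4

/* Hypothetical stand-in for i915_priolist: counters instead of lists. */
struct priolist {
	unsigned int used;	/* bit i set => bucket i may hold requests */
	int count[NBUCKETS];	/* number of queued requests per bucket */
};

/* Index of the lowest set bit; the caller guarantees mask != 0. */
static int lowest_bit(unsigned int mask)
{
	int i = 0;

	while (!(mask & 1u)) {
		mask >>= 1;
		i++;
	}
	return i;
}

/*
 * Consume at most 'budget' requests, lowest bucket index first.
 * The used bit of a bucket is cleared only in the for-loop increment,
 * i.e. after that bucket has been drained. The goto bypasses the
 * increment, so a partially consumed bucket keeps its bit.
 */
static int consume(struct priolist *p, int budget)
{
	int taken = 0;
	int i;

	for (; p->used ? (i = lowest_bit(p->used), 1) : 0;
	     p->used &= ~(1u << i)) {
		while (p->count[i] > 0) {
			if (taken == budget)
				goto done;	/* early exit, bitmask untouched */
			p->count[i]--;
			taken++;
		}
	}
done:
	return taken;
}
```

With buckets 0 and 2 populated and a budget that runs out partway through
bucket 2, bit 0 is cleared and bit 2 stays set, so a later call resumes
from bucket 2.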

> >                       if (last && rq->hw_context != last->hw_context) {
> > -                             if (port == last_port) {
> > -                                     __list_del_many(&p->requests,
> > -                                                     &rq->sched.link);
> > +                             if (port == last_port)
> >                                       goto done;
> > -                             }
> >   
> >                               if (submit)
> >                                       port_assign(port, last);
> >                               port++;
> >                       }
> >   
> > -                     INIT_LIST_HEAD(&rq->sched.link);
> > +                     list_del_init(&rq->sched.link);
> >   
> >                       __i915_request_submit(rq);
> >                       trace_i915_request_in(rq, port_index(port, execlists));
> > +
> >                       last = rq;
> >                       submit = true;
> >               }
> >   
> >               rb_erase_cached(&p->node, &execlists->queue);
> > -             INIT_LIST_HEAD(&p->requests);
> >               if (p->priority != I915_PRIORITY_NORMAL)
> >                       kmem_cache_free(engine->i915->priorities, p);
> >       }
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> > index e8de250c3413..aeae82b5223c 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > @@ -259,14 +259,49 @@ intel_lr_context_descriptor_update(struct i915_gem_context *ctx,
> >       ce->lrc_desc = desc;
> >   }
> >   
> > -static struct i915_priolist *
> > +static void assert_priolists(struct intel_engine_execlists * const execlists,
> > +                          int queue_priority)
> > +{
> > +     struct rb_node *rb;
> > +     int last_prio, i;
> > +
> > +     if (!IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
> > +             return;
> > +
> > +     GEM_BUG_ON(rb_first_cached(&execlists->queue) !=
> > +                rb_first(&execlists->queue.rb_root));
> > +
> > +     last_prio = (queue_priority >> I915_USER_PRIORITY_SHIFT) + 1;
> > +     for (rb = rb_first_cached(&execlists->queue); rb; rb = rb_next(rb)) {
> > +             struct i915_priolist *p = to_priolist(rb);
> > +
> > +             GEM_BUG_ON(p->priority >= last_prio);
> > +             last_prio = p->priority;
> > +
> > +             GEM_BUG_ON(!p->used);
> > +             for (i = 0; i < ARRAY_SIZE(p->requests); i++) {
> > +                     if (list_empty(&p->requests[i]))
> > +                             continue;
> 
> Asserting that bitmask slot is not set if list is empty is not interesting?

We can have the bit set with an empty list after request promotion. That
seemed like the reasonable tradeoff rather than have a more complicated
check after each list_move to decide if the old bit needs to be cleared.
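
A rough sketch of that tradeoff, again in plain userspace C with
hypothetical names (promote() and priolist_is_consistent() are not driver
functions, and counters stand in for the lists): promotion sets the bit
for the destination bucket and leaves the source bit alone, so the check
can only require that a non-empty bucket has its bit set, not the reverse.

```c
#include <stdbool.h>

#define NBUCKETS 4

/* Hypothetical stand-in for i915_priolist: counters instead of lists. */
struct priolist {
	unsigned int used;	/* bit i set => bucket i may hold requests */
	int count[NBUCKETS];	/* number of queued requests per bucket */
};

/*
 * Move one request between buckets. The destination bit is set; the
 * source bit is deliberately not re-evaluated, so it may go stale.
 */
static void promote(struct priolist *p, int from, int to)
{
	p->count[from]--;
	p->count[to]++;
	p->used |= 1u << to;
}

/* One-directional invariant: non-empty bucket implies bit set. */
static bool priolist_is_consistent(const struct priolist *p)
{
	for (int i = 0; i < NBUCKETS; i++) {
		if (p->count[i] == 0)
			continue;	/* a stale bit on an empty bucket is allowed */
		if (!(p->used & (1u << i)))
			return false;
	}
	return true;
}
```

The cost of a stale bit is one wasted visit to an empty bucket, after
which the walker can clear it.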

> > +
> > +                     GEM_BUG_ON(!(p->used & BIT(i)));
> > +             }
> > +     }
> 
> In general, won't this be a tiny bit too expensive on debug build and 
> with priority/rq stress tests? To walk absolutely all priority levels 
> times bucket count on every lookup and dequeue. Or maybe we don't have 
> any tests at the moment which instantiate that many levels.

Worth it though. Whenever we abuse the system, our ability to catch
errors depends on our sanity checks. I thought this was worth having the
reassurance that it worked :)

> > +}
> > +
> > +static struct list_head *
> >   lookup_priolist(struct intel_engine_cs *engine, int prio)
> >   {
> >       struct intel_engine_execlists * const execlists = &engine->execlists;
> >       struct i915_priolist *p;
> >       struct rb_node **parent, *rb;
> >       bool first = true;
> > +     int idx, i;
> > +
> > +     assert_priolists(execlists, INT_MAX);
> >   
> > +     /* buckets sorted from highest [in slot 0] to lowest priority */
> > +     idx = I915_PRIORITY_COUNT - (prio & ~I915_PRIORITY_MASK) - 1;
> 
> 0 - (prio & ~0) - 1 = -prio - 1, hm..?
> 
> Hm no.. I915_PRIORITY_COUNT with zero reserved internal bits is actually 
> 1.  So 1 - (prio & 0x0) - 1 = 0, ok phew..
> 
> Unintuitive for the count to be one with no reserved bits? Although it 
> is only a temporary stage..

Would it be preferable to use "~MASK - (prio & ~MASK)"?

IIRC, mask is only used for the internal bits. I followed the pattern of
page_mask but I think it would simplify a bit to use INTERNAL_MASK.
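
For reference, the two spellings can be compared in a small standalone
sketch. It assumes COUNT is 1 << shift and MASK is -COUNT (the page_mask
style, which matches the "count is 1, ~mask is 0x0" arithmetic above for
zero reserved bits). The function names and the shift parameter are
hypothetical, and two's complement integers are assumed.

```c
/*
 * Bucket index as written in the patch: highest internal priority
 * lands in slot 0. 'shift' is the number of reserved internal bits.
 */
static int bucket_index(int prio, int shift)
{
	int count = 1 << shift;	/* assumed: COUNT = BIT(shift) */
	int mask = -count;	/* assumed: MASK = -COUNT, so ~MASK = COUNT - 1 */

	return count - (prio & ~mask) - 1;
}

/* The alternative spelling: ~MASK - (prio & ~MASK). */
static int bucket_index_alt(int prio, int shift)
{
	int mask = -(1 << shift);

	return ~mask - (prio & ~mask);
}
```

With shift = 0 both return 0 for any prio; with shift = 2, internal bits
3 map to slot 0 and internal bits 0 map to slot 3.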
-Chris
