From: Chris Wilson <chris@chris-wilson.co.uk>
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>,
	intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH 3/3] drm/i915: Store a pointer to intel_context in i915_request
Date: Thu, 19 Apr 2018 12:10:22 +0100	[thread overview]
Message-ID: <152413622285.27610.8888272949330694993@mail.alporthouse.com> (raw)
In-Reply-To: <4bed3d0d-81e2-aa87-e804-16a26e9aed31@linux.intel.com>

Quoting Tvrtko Ursulin (2018-04-19 11:40:06)
> 
> On 19/04/2018 08:17, Chris Wilson wrote:
> > To ease the frequent and ugly pointer dance of
> > &request->gem_context->engine[request->engine->id] during request
> > submission, store that pointer as request->hw_context. One major
> > advantage that we will exploit later is that this decouples the logical
> > context state from the engine itself.
> 
> I can see merit in making all the places which work on engines or 
> requests more logically and elegantly grab the engine specific context.
> 
> Authors of current unmerged work will probably be not so thrilled though.

> > @@ -353,60 +346,56 @@ int intel_gvt_scan_and_shadow_workload(struct intel_vgpu_workload *workload)
> >       struct intel_vgpu_submission *s = &vgpu->submission;
> >       struct i915_gem_context *shadow_ctx = s->shadow_ctx;
> >       struct drm_i915_private *dev_priv = vgpu->gvt->dev_priv;
> > -     int ring_id = workload->ring_id;
> > -     struct intel_engine_cs *engine = dev_priv->engine[ring_id];
> > -     struct intel_ring *ring;
> > +     struct intel_engine_cs *engine = dev_priv->engine[workload->ring_id];
> > +     struct intel_context *ce;
> >       int ret;
> >   
> >       lockdep_assert_held(&dev_priv->drm.struct_mutex);
> >   
> > -     if (workload->shadowed)
> > +     if (workload->req)
> >               return 0;
> 
> GVT team will need to look at this hunk/function.
...
> At which point I skip over all GVT and continue further down. :)

That code doesn't merit your attention (or ours, really, until it is
improved; it really is staging/ quality).

> > diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> > index 2fe779cab298..ed1c591ee866 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_context.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> > @@ -125,16 +125,10 @@ static void i915_gem_context_free(struct i915_gem_context *ctx)
> >       i915_ppgtt_put(ctx->ppgtt);
> >   
> >       for (i = 0; i < I915_NUM_ENGINES; i++) {
> > -             struct intel_context *ce = &ctx->engine[i];
> > +             struct intel_context *ce = &ctx->__engine[i];
> >   
> > -             if (!ce->state)
> > -                     continue;
> > -
> > -             WARN_ON(ce->pin_count);
> > -             if (ce->ring)
> > -                     intel_ring_free(ce->ring);
> > -
> > -             __i915_gem_object_release_unless_active(ce->state->obj);
> > +             if (ce->ops)
> 
> Is ce->ops now the gate which was "if (!ce->state) continue" before, to 
> avoid GEM_BUG_ON in execlists_context_destroy? I am wondering if this 
> loop should not just be for_each_engine. Places which walk until 
> I915_NUM_ENGINES should be very special and this one doesn't feel like 
> one.

s/I915_NUM_ENGINES/ARRAY_SIZE(ctx->__engine)/ That's the semantic intent
here: we aren't just thinking about engines, but about the contents of
the context struct. We tear down the struct.
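
Roughly (sketch only, not the exact hunk):

        for (i = 0; i < ARRAY_SIZE(ctx->__engine); i++) {
                struct intel_context *ce = &ctx->__engine[i];

                if (ce->ops)
                        ce->ops->destroy(ce);
        }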

> > @@ -1010,9 +1018,12 @@ bool intel_engine_has_kernel_context(const struct intel_engine_cs *engine)
> >        */
> >       rq = __i915_gem_active_peek(&engine->timeline->last_request);
> >       if (rq)
> > -             return rq->gem_context == kernel_context;
> > -     else
> > -             return engine->last_retired_context == kernel_context;
> > +             return rq->gem_context == kctx;
> > +
> > +     if (!engine->last_retired_context)
> > +             return false;
> > +
> > +     return engine->last_retired_context->gem_context == kctx;
> 
> You thought this will be a bit more readable/obvious?

Hmm, I wrote this before I wrote to_intel_context(). If we use that, it
looks like

        const struct intel_context *kernel_context =
                to_intel_context(engine->i915->kernel_context, engine);
        struct i915_request *rq;

        rq = __i915_gem_active_peek(&engine->timeline->last_request);
        if (rq)
                return rq->hw_context == kernel_context;
        else
                return engine->last_retired_context == kernel_context;

which isn't as bad as the first pass was...
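
For reference, to_intel_context() here is just the array lookup wrapped
up, i.e. something like (sketch):

        static inline struct intel_context *
        to_intel_context(struct i915_gem_context *ctx,
                         const struct intel_engine_cs *engine)
        {
                return &ctx->__engine[engine->id];
        }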

> > @@ -511,9 +511,7 @@ static void guc_add_request(struct intel_guc *guc, struct i915_request *rq)
> >   {
> >       struct intel_guc_client *client = guc->execbuf_client;
> >       struct intel_engine_cs *engine = rq->engine;
> > -     u32 ctx_desc =
> > -             lower_32_bits(intel_lr_context_descriptor(rq->gem_context,
> > -                                                       engine));
> > +     u32 ctx_desc = lower_32_bits(rq->hw_context->lrc_desc);
> 
> Ha...
> 
> >       u32 ring_tail = intel_ring_set_tail(rq->ring, rq->tail) / sizeof(u64);
> >   
> >       spin_lock(&client->wq_lock);
> > @@ -551,8 +549,8 @@ static void inject_preempt_context(struct work_struct *work)
> >                                            preempt_work[engine->id]);
> >       struct intel_guc_client *client = guc->preempt_client;
> >       struct guc_stage_desc *stage_desc = __get_stage_desc(client);
> > -     u32 ctx_desc = lower_32_bits(intel_lr_context_descriptor(client->owner,
> > -                                                              engine));
> > +     u32 ctx_desc = lower_32_bits(to_intel_context(client->owner,
> > +                                                   engine)->lrc_desc);
> 
> ..hum.

Just pointing out that this code is horrible ;)

One of the goals of having rq->hw_context was to avoid exactly this kind
of pointer dance, as exemplified by intel_lr_context_descriptor(). But
here we should just stick the ctx_desc in the intel_guc_client.
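
Purely illustrative (preempt_ctx_desc is a made-up name, not in this
patch): give the client a per-engine cache of the descriptor,

        struct intel_guc_client {
                ...
                u32 preempt_ctx_desc[I915_NUM_ENGINES];
        };

fill it once alongside guc_fill_preempt_context(),

        client->preempt_ctx_desc[id] =
                lower_32_bits(to_intel_context(client->owner, engine)->lrc_desc);

and inject_preempt_context() then just reads

        u32 ctx_desc = client->preempt_ctx_desc[engine->id];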

> > @@ -708,7 +706,7 @@ static void guc_dequeue(struct intel_engine_cs *engine)
> >               struct i915_request *rq, *rn;
> >   
> >               list_for_each_entry_safe(rq, rn, &p->requests, sched.link) {
> > -                     if (last && rq->gem_context != last->gem_context) {
> > +                     if (last && rq->hw_context != last->hw_context) {
> 
> Same thing really since this is already per engine.

Not always...

> >                               if (port == last_port) {
> >                                       __list_del_many(&p->requests,
> >                                                       &rq->sched.link);
> > @@ -991,7 +989,8 @@ static void guc_fill_preempt_context(struct intel_guc *guc)
> >       enum intel_engine_id id;
> >   
> >       for_each_engine(engine, dev_priv, id) {
> > -             struct intel_context *ce = &client->owner->engine[id];
> > +             struct intel_context *ce =
> > +                     to_intel_context(client->owner, engine);
> >               u32 addr = intel_hws_preempt_done_address(engine);
> >               u32 *cs;
> >   
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> > index 0777226e65a6..e6725690b5b6 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > @@ -164,7 +164,8 @@
> >   #define WA_TAIL_BYTES (sizeof(u32) * WA_TAIL_DWORDS)
> >   
> >   static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
> > -                                         struct intel_engine_cs *engine);
> > +                                         struct intel_engine_cs *engine,
> > +                                         struct intel_context *ce);
> 
> Feels a bit redundant to pass in everything but OK.

You do remember which patch this was split out of ;)

> >   {
> >       return (IS_ENABLED(CONFIG_DRM_I915_GVT) &&
> > -             i915_gem_context_force_single_submission(ctx));
> > +             i915_gem_context_force_single_submission(ctx->gem_context));
> >   }
> 
> But the whole function change is again not needed I think.
> 
> >   
> > -static bool can_merge_ctx(const struct i915_gem_context *prev,
> > -                       const struct i915_gem_context *next)
> > +static bool can_merge_ctx(const struct intel_context *prev,
> > +                       const struct intel_context *next)
> 
> This one neither.

Consider what it means. The rule is that we are not allowed to submit
the same lrc descriptor to subsequent ports; that corresponds to
the hw context, not gem_context. The payoff, aside from the better
semantics now, is in subsequent patches.
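
The body of the check is unchanged, it just compares hw contexts now
(sketch):

        static bool can_merge_ctx(const struct intel_context *prev,
                                  const struct intel_context *next)
        {
                /* never the same lrc descriptor in adjacent ELSP ports */
                if (prev != next)
                        return false;

                if (ctx_single_port_submission(prev))
                        return false;

                return true;
        }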

> > @@ -686,14 +687,14 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> >                                * the same context (even though a different
> >                                * request) to the second port.
> >                                */
> > -                             if (ctx_single_port_submission(last->gem_context) ||
> > -                                 ctx_single_port_submission(rq->gem_context)) {
> > +                             if (ctx_single_port_submission(last->hw_context) ||
> > +                                 ctx_single_port_submission(rq->hw_context)) {
> >                                       __list_del_many(&p->requests,
> >                                                       &rq->sched.link);
> >                                       goto done;
> >                               }
> >   
> > -                             GEM_BUG_ON(last->gem_context == rq->gem_context);
> > +                             GEM_BUG_ON(last->hw_context == rq->hw_context);
> 
> And if you agree with the above then these two hunks do not have to be 
> in the patch.

Respectfully disagree, as this is groundwork for subsequent logical
contexts that float between engines.

> > +static struct intel_context *
> > +execlists_context_pin(struct intel_engine_cs *engine,
> > +                   struct i915_gem_context *ctx)
> >   {
> > -     struct intel_context *ce = &ctx->engine[engine->id];
> > +     struct intel_context *ce = to_intel_context(ctx, engine);
> >   
> >       lockdep_assert_held(&ctx->i915->drm.struct_mutex);
> > -     GEM_BUG_ON(ce->pin_count == 0);
> > -
> > -     if (--ce->pin_count)
> > -             return;
> >   
> > -     intel_ring_unpin(ce->ring);
> > +     if (likely(ce->pin_count++))
> > +             return ce;
> > +     GEM_BUG_ON(!ce->pin_count); /* no overflow please! */
> >   
> > -     ce->state->obj->pin_global--;
> > -     i915_gem_object_unpin_map(ce->state->obj);
> > -     i915_vma_unpin(ce->state);
> > +     ce->ops = &execlists_context_ops;
> 
> If ops are set on pin it would be good to unset them on destroy.

No point on destroy, since it's being destroyed.

> Hm, or even set them not on pin but on creation.

engine->context_pin() is our intel_context factory; pin is the creation
function.

Definitely a bit asymmetrical, but the key bit was having ce->unpin() so
that we didn't try to release the intel_context via a stale engine
reference.
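
i.e. the release path only needs the ce itself, something like (sketch):

        static inline void intel_context_unpin(struct intel_context *ce)
        {
                ce->ops->unpin(ce); /* no engine lookup required */
        }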

What I have been considering is passing intel_context to
i915_request_alloc() instead and trying to avoid that asymmetry by
managing the creation in the caller and separating it from pin().

> Not a huge deal but it is a 
> bit unbalanced, and also if ce->ops is also a guard to distinguish 
> possible from impossible engines then it is a bit more strange.

Not impossible engines in this regard, but unused contexts.

> >   
> > -     i915_gem_context_put(ctx);
> > +     return __execlists_context_pin(engine, ctx, ce);
> >   }
> >   
> >   static int execlists_request_alloc(struct i915_request *request)
> >   {
> > -     struct intel_engine_cs *engine = request->engine;
> > -     struct intel_context *ce = &request->gem_context->engine[engine->id];
> > -     int ret;
> > +     int err;
> >   
> > -     GEM_BUG_ON(!ce->pin_count);
> > +     GEM_BUG_ON(!request->hw_context->pin_count);
> >   
> >       /* Flush enough space to reduce the likelihood of waiting after
> >        * we start building the request - in which case we will just
> > @@ -1388,9 +1413,9 @@ static int execlists_request_alloc(struct i915_request *request)
> >        */
> >       request->reserved_space += EXECLISTS_REQUEST_SIZE;
> >   
> > -     ret = intel_ring_wait_for_space(request->ring, request->reserved_space);
> > -     if (ret)
> > -             return ret;
> > +     err = intel_ring_wait_for_space(request->ring, request->reserved_space);
> > +     if (err)
> > +             return err;
> 
> Rename of ret to err for a smaller diff is avoidable.

A couple of lines for trying to harmonise between ret and err.

> > @@ -1288,32 +1305,37 @@ intel_ring_context_pin(struct intel_engine_cs *engine,
> >   
> >       i915_gem_context_get(ctx);
> >   
> > -out:
> >       /* One ringbuffer to rule them all */
> > -     return engine->buffer;
> > +     GEM_BUG_ON(!engine->buffer);
> > +     ce->ring = engine->buffer;
> > +
> > +     return ce;
> >   
> >   err:
> >       ce->pin_count = 0;
> 
> Strictly speaking this should be done in the caller.

Sssh. Tail call, think of it as a single contiguous function ;)

> > @@ -1323,10 +1345,6 @@ static int intel_init_ring_buffer(struct intel_engine_cs *engine)
> >   
> >       intel_engine_setup_common(engine);
> >   
> > -     err = intel_engine_init_common(engine);
> > -     if (err)
> > -             goto err;
> > -
> >       ring = intel_engine_create_ring(engine, 32 * PAGE_SIZE);
> >       if (IS_ERR(ring)) {
> >               err = PTR_ERR(ring);
> > @@ -1341,8 +1359,14 @@ static int intel_init_ring_buffer(struct intel_engine_cs *engine)
> >       GEM_BUG_ON(engine->buffer);
> >       engine->buffer = ring;
> >   
> > +     err = intel_engine_init_common(engine);
> > +     if (err)
> > +             goto err_unpin;
> > +
> 
> This ordering change doesn't look like it has to be done in this patch.

It does. Do you want to take another look ;)

The problem here is the legacy ringbuffer submission uses a common
engine->buffer, and the kernel contexts are pinned in init_common; so we
have to create that buffer first.
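
Pulling the two hunks together, the resulting order is roughly (sketch,
error handling trimmed):

        intel_engine_setup_common(engine);

        ring = intel_engine_create_ring(engine, 32 * PAGE_SIZE);
        ...
        engine->buffer = ring; /* shared by every legacy ce->ring */

        /* now safe: pins the kernel contexts, which find engine->buffer */
        err = intel_engine_init_common(engine);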

> > diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> > index e6d4b882599a..47a9de741c5f 100644
> > --- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> > +++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> > @@ -243,6 +243,11 @@ struct drm_i915_private *mock_gem_device(void)
> >       if (!i915->kernel_context)
> >               goto err_engine;
> >   
> > +     /* Fake the initial load for the powersaving context */
> > +     i915->engine[RCS]->last_retired_context =
> > +             intel_context_pin(i915->kernel_context, i915->engine[RCS]);
> > +     GEM_BUG_ON(IS_ERR_OR_NULL(i915->engine[RCS]->last_retired_context));
> 
> What is this?

Keeping selftests happy. While chasing ce->ops I was asserting that
engine->last_retired_context was set and found an instance where it
wasn't, which caused this patch to blow up. And had I written
intel_engine_has_kernel_context() differently, this chunk wouldn't have
been required.
-Chris