From: Chris Wilson <chris@chris-wilson.co.uk>
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>,
	intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH 12/32] drm/i915: Invert the GEM wakeref hierarchy
Date: Thu, 18 Apr 2019 14:07:49 +0100	[thread overview]
Message-ID: <155559286924.4314.12445180445128039391@skylake-alporthouse-com> (raw)
In-Reply-To: <07b1d8d6-eb75-e5c1-1792-6d1d0ebfa499@linux.intel.com>

Quoting Tvrtko Ursulin (2019-04-18 13:42:59)
> 
> On 17/04/2019 08:56, Chris Wilson wrote:
> > +static bool switch_to_kernel_context(struct intel_engine_cs *engine)
> > +{
> > +     struct i915_request *rq;
> > +
> > +     /* Already inside the kernel context, safe to power down. */
> > +     if (engine->wakeref_serial == engine->serial)
> > +             return true;
> > +
> > +     /* GPU is pointing to the void, as good as in the kernel context. */
> > +     if (i915_reset_failed(engine->i915))
> > +             return true;
> > +
> > +     /*
> > +      * Note, we do this without taking the timeline->mutex. We cannot
> > +      * as we may be called while retiring the kernel context and so
> > +      * already underneath the timeline->mutex. Instead we rely on the
> > +      * exclusive property of the intel_engine_park that prevents anyone
> > +      * else from creating a request on this engine. This also requires
> > +      * that the ring is empty and we avoid any waits while constructing
> > +      * the context, as they assume protection by the timeline->mutex.
> > +      * This should hold true as we can only park the engine after
> > +      * retiring the last request, thus all rings should be empty and
> > +      * all timelines idle.
> > +      */
> > +     rq = __i915_request_create(engine->kernel_context, GFP_NOWAIT);
> > +     if (IS_ERR(rq))
> > +             /* Context switch failed, hope for the best! Maybe reset? */
> > +             return true;
> > +
> > +     /* Check again on the next retirement. */
> > +     engine->wakeref_serial = engine->serial + 1;
> 
> Is engine->serial guaranteed to be stable at this point? I guess so 
> since there can only be one park at a time.

Yes, we're inside the engine park routine, so we are serialised against
all other users of the engine.

> > +     __i915_request_commit(rq);
> > +
> > +     return false;
> > +}
> > +
> > +static int intel_engine_park(struct intel_wakeref *wf)
> > +{
> > +     struct intel_engine_cs *engine =
> > +             container_of(wf, typeof(*engine), wakeref);
> > +
> > +     /*
> > +      * If one and only one request is completed between pm events,
> > +      * we know that we are inside the kernel context and it is
> > +      * safe to power down. (We are paranoid in case that runtime
> > +      * suspend causes corruption to the active context image, and
> > +      * want to avoid that impacting userspace.)
> > +      */
> > +     if (!switch_to_kernel_context(engine))
> > +             return -EBUSY;
> 
> But it is ignored by intel_engine_pm_put. Should it be a WARN_ON or 
> something?

The intel_wakeref takes action and defers the put/parking. That's all we
need here, as the GEM layer stays awake with its background retire
worker still poking occasionally.

> > @@ -718,6 +721,7 @@ static void reset_prepare(struct drm_i915_private *i915)
> >       struct intel_engine_cs *engine;
> >       enum intel_engine_id id;
> >   
> > +     intel_gt_pm_get(i915);
> 
> Isn't it in the spirit of the patch to let the engines wake up the gt?

You could; this was just because I liked the look of it. In an operation
affecting all engines, it felt like the right thing to do.

> >       for_each_engine(engine, i915, id)
> >               reset_prepare_engine(engine);
> >   
> > @@ -755,48 +759,10 @@ static int gt_reset(struct drm_i915_private *i915,
> >   static void reset_finish_engine(struct intel_engine_cs *engine)
> >   {
> >       engine->reset.finish(engine);
> > +     intel_engine_pm_put(engine);
> >       intel_uncore_forcewake_put(engine->uncore, FORCEWAKE_ALL);
> >   }
> >   
> > -struct i915_gpu_restart {
> > -     struct work_struct work;
> > -     struct drm_i915_private *i915;
> > -};
> > -
> > -static void restart_work(struct work_struct *work)
> 
> Oh wow I did not see this part of the code so far. Ask a second pair of 
> eyes to check on it?

This is the best part! Resolved a very, very annoying thorn with the
reset requiring a worker.

> > @@ -1137,6 +1076,9 @@ int i915_reset_engine(struct intel_engine_cs *engine, const char *msg)
> >       GEM_TRACE("%s flags=%lx\n", engine->name, error->flags);
> >       GEM_BUG_ON(!test_bit(I915_RESET_ENGINE + engine->id, &error->flags));
> >   
> > +     if (!intel_wakeref_active(&engine->wakeref))
> > +             return 0;
> 
> I guess there can't be any races here since a stuck engine can't be 
> parked. Do we have any tests which trigger this without a guilty 
> request? I kind of remember that isn't possible so probably not.

We do, we have the reset idle engine selftests. Not that it proves very
much, just that we don't die.

> > -static int init_render_ring(struct intel_engine_cs *engine)
> > +static int rcs_resume(struct intel_engine_cs *engine)
> >   {
> >       struct drm_i915_private *dev_priv = engine->i915;
> > -     int ret = init_ring_common(engine);
> > -     if (ret)
> > -             return ret;
> >   
> >       /* WaTimedSingleVertexDispatch:cl,bw,ctg,elk,ilk,snb */
> >       if (IS_GEN_RANGE(dev_priv, 4, 6))
> > @@ -875,7 +875,7 @@ static int init_render_ring(struct intel_engine_cs *engine)
> >       if (INTEL_GEN(dev_priv) >= 6)
> >               ENGINE_WRITE(engine, RING_IMR, ~engine->irq_keep_mask);
> >   
> > -     return 0;
> > +     return xcs_resume(engine);
> 
> This inverts the order between the common and rcs init. One thing which 
> jumps out is the RING_IMR write, which is now done after starting the 
> engine. Can we lose an interrupt now?

That write shouldn't be there, we take care of that inside the restart.

> >   static void idle_work_handler(struct work_struct *work)
> >   {
> >       struct drm_i915_private *i915 =
> >               container_of(work, typeof(*i915), gem.idle_work.work);
> > -     bool rearm_hangcheck;
> > -
> > -     if (!READ_ONCE(i915->gt.awake))
> > -             return;
> > -
> > -     if (READ_ONCE(i915->gt.active_requests))
> > -             return;
> > -
> > -     rearm_hangcheck =
> > -             cancel_delayed_work_sync(&i915->gpu_error.hangcheck_work);
> >   
> >       if (!mutex_trylock(&i915->drm.struct_mutex)) {
> 
> Should struct_mutex be taken by i915_gem_park, inside the wakeref lock? 
> Or that would create a lock inversion somewhere else?

NO! Yes, that would create the most mighty of inversions! Never, ever,
ever, take struct_mutex inside another lock, for it's the outer lock for
the entire driver (give or take an insignificant amount).

About the only lock it is under is the mmap_sem, and look at the pain
that causes. :(

> > +bool i915_retire_requests(struct drm_i915_private *i915)
> >   {
> >       struct intel_ring *ring, *tmp;
> >   
> >       lockdep_assert_held(&i915->drm.struct_mutex);
> >   
> > -     if (!i915->gt.active_requests)
> > -             return;
> 
> You don't want to replace this with a wakeref_active check?

I didn't feel it was worth it in the short term. At the end of the day,
the main caller of i915_retire_requests() should be the retirement
worker, with perhaps a call from the shrinker (but hopefully not).

[snip]

> As said before, the concept is very elegant and I like it.
> 
> But it is a monster refactor, and as much as I did cross-reference the 
> diff versus the patched tree to get a full picture, I have to say my 
> review is more about the high level, trusting the CI to catch any 
> details. :I
> 
> My main concern is lock nesting, especially the nested annotation in 
> the preceding patch. Does lockdep catch anything if you don't have that 
> annotation?

Yes. The shrinker calls the intel_wakeref_put, but we need to take more
locks inside the intel_wakeref_get (the pin_map, and more in unpark).
Hence they get caught in the same lock_map and lockdep gets quite angry
even though they cannot overlap.
-Chris
