All of lore.kernel.org
 help / color / mirror / Atom feed
From: Chris Wilson <chris@chris-wilson.co.uk>
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>,
	intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH 3/5] drm/i915/execlists: Process interrupted context on reset
Date: Wed, 17 Jul 2019 14:43:34 +0100	[thread overview]
Message-ID: <156337101473.4375.9970391339491032566@skylake-alporthouse-com> (raw)
In-Reply-To: <156337082615.4375.11008860472465929485@skylake-alporthouse-com>

Quoting Chris Wilson (2019-07-17 14:40:26)
> Quoting Tvrtko Ursulin (2019-07-17 14:31:00)
> > 
> > On 16/07/2019 13:49, Chris Wilson wrote:
> > > By stopping the rings, we may trigger an arbitration point resulting in
> > > a premature context-switch (i.e. a completion event before the request
> > > is actually complete). This clears the active context before the reset,
> > > but we must remember to rewind the incomplete context for replay upon
> > > resume.
> > > 
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > ---
> > >   drivers/gpu/drm/i915/gt/intel_lrc.c | 6 ++++--
> > >   1 file changed, 4 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> > > index 9b87a2fc186c..7570a9256001 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> > > @@ -1419,7 +1419,8 @@ static void process_csb(struct intel_engine_cs *engine)
> > >                        * coherent (visible from the CPU) before the
> > >                        * user interrupt and CSB is processed.
> > >                        */
> > > -                     GEM_BUG_ON(!i915_request_completed(*execlists->active));
> > > +                     GEM_BUG_ON(!i915_request_completed(*execlists->active) &&
> > > +                                !reset_in_progress(execlists));
> > >                       execlists_schedule_out(*execlists->active++);
> > >   
> > >                       GEM_BUG_ON(execlists->active - execlists->inflight >
> > > @@ -2254,7 +2255,7 @@ static void __execlists_reset(struct intel_engine_cs *engine, bool stalled)
> > >        */
> > >       rq = execlists_active(execlists);
> > >       if (!rq)
> > > -             return;
> > > +             goto unwind;
> > >   
> > >       ce = rq->hw_context;
> > >       GEM_BUG_ON(i915_active_is_idle(&ce->active));
> > > @@ -2331,6 +2332,7 @@ static void __execlists_reset(struct intel_engine_cs *engine, bool stalled)
> > >       intel_ring_update_space(ce->ring);
> > >       __execlists_update_reg_state(ce, engine);
> > >   
> > > +unwind:
> > >       /* Push back any incomplete requests for replay after the reset. */
> > >       __unwind_incomplete_requests(engine);
> > >   }
> > > 
> > 
> > Sounds plausible.
> > 
> > Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > 
> > Shouldn't there be a Fixes: tag to go with it?
> 
> Yeah, it's rare even by our standards, I think there's a live_hangcheck
> failure about once a month that could be the result of this. However,
> the result would be an unrecoverable GPU hang as each attempt at
> resetting would not see the missing request and so it would remain
> perpetually in the engine->active.list until a set-wedged (i.e. suspend
> in the user case).

Heh, the commit responsible was one that was itself trying to workaround
the effect of stop_engines() setting RING_HEAD=0 :)

commit 1863e3020ab50bd5f68d85719ba26356cc282643
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Apr 11 14:05:15 2019 +0100

    drm/i915/execlists: Always reset the context's RING registers

    During reset, we try and stop the active ring. This has the consequence
    that we often clobber the RING registers within the context image. When
    we find an active request, we update the context image to rerun that
    request (if it was guilty, we replace the hanging user payload with
    NOPs). However, we were ignoring an active context if the request had
    completed, with the consequence that the next submission on that request
    would start with RING_HEAD==0 and not the tail of the previous request,
    causing all requests still in the ring to be rerun. Rare, but
    occasionally seen within CI where we would spot that the context seqno
    would reverse and complain that we were retiring an incomplete request.

-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

  reply	other threads:[~2019-07-17 13:43 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-07-16 12:49 [PATCH 1/5] drm/i915/userptr: Beware recursive lock_page() Chris Wilson
2019-07-16 12:49 ` [PATCH 2/5] drm/i915/gt: Push engine stopping into reset-prepare Chris Wilson
2019-07-17 13:04   ` Tvrtko Ursulin
2019-07-17 13:08     ` Chris Wilson
2019-07-17 13:21       ` Tvrtko Ursulin
2019-07-17 13:30         ` Chris Wilson
2019-07-17 13:42           ` Tvrtko Ursulin
2019-07-17 13:56             ` Chris Wilson
2019-07-17 17:29               ` Tvrtko Ursulin
2019-07-16 12:49 ` [PATCH 3/5] drm/i915/execlists: Process interrupted context on reset Chris Wilson
2019-07-17 13:31   ` Tvrtko Ursulin
2019-07-17 13:40     ` Chris Wilson
2019-07-17 13:43       ` Chris Wilson [this message]
2019-07-16 12:49 ` [PATCH 4/5] drm/i915/execlists: Cancel breadcrumb on preempting the virtual engine Chris Wilson
2019-07-17 13:40   ` Tvrtko Ursulin
2019-07-19 11:51     ` Chris Wilson
2019-07-16 12:49 ` [PATCH 5/5] drm/i915: Hide unshrinkable context objects from the shrinker Chris Wilson
2019-07-16 13:46 ` ✓ Fi.CI.BAT: success for series starting with [1/5] drm/i915/userptr: Beware recursive lock_page() Patchwork
2019-07-16 15:25 ` [Intel-gfx] [PATCH 1/5] " Tvrtko Ursulin
2019-07-16 15:25   ` Tvrtko Ursulin
2019-07-16 15:37   ` [Intel-gfx] " Chris Wilson
2019-07-17 13:09     ` Tvrtko Ursulin
2019-07-17 13:17       ` Chris Wilson
2019-07-17 13:23         ` Tvrtko Ursulin
2019-07-17 13:35           ` Chris Wilson
2019-07-17 13:46             ` Tvrtko Ursulin
2019-07-17 14:06               ` Chris Wilson
2019-07-17 18:09                 ` Tvrtko Ursulin
2019-07-26 13:38                   ` Lionel Landwerlin
2019-09-09 13:52                     ` Chris Wilson
2019-09-11 11:31                       ` Tvrtko Ursulin
2019-09-11 11:38                         ` Chris Wilson
2019-09-11 12:10                           ` Tvrtko Ursulin
2019-07-16 16:13 ` ✓ Fi.CI.IGT: success for series starting with [1/5] " Patchwork
2019-11-06  7:22 ` [PATCH 1/5] " Chris Wilson
2019-11-06  7:22   ` [Intel-gfx] " Chris Wilson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=156337101473.4375.9970391339491032566@skylake-alporthouse-com \
    --to=chris@chris-wilson.co.uk \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=tvrtko.ursulin@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.