intel-gfx.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
From: Chris Wilson <chris@chris-wilson.co.uk>
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>,
	intel-gfx@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH 3/3] drm/i915/execlists: Offline error capture
Date: Thu, 16 Jan 2020 18:32:31 +0000	[thread overview]
Message-ID: <157919955113.2795.11657063397337910037@skylake-alporthouse-com> (raw)
In-Reply-To: <e9be510e-fb47-a615-cb77-0507fe7ff04d@linux.intel.com>

Quoting Tvrtko Ursulin (2020-01-16 18:14:24)
> 
> On 16/01/2020 17:48, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2020-01-16 17:22:10)
> >>
> >> On 15/01/2020 08:33, Chris Wilson wrote:
> >>> +     /*
> >>> +      * We need to _quickly_ capture the engine state before we reset.
> >>> +      * We are inside an atomic section (softirq) here and we are delaying
> >>> +      * the forced preemption event.
> >>> +      */
> >>> +     cap->error = capture_regs(engine);
> >>> +     if (!cap->error)
> >>> +             goto err_free;
> >>> +
> >>> +     if (i915_request_completed(cap->rq)) /* oops, not so guilty! */
> >>> +             goto err_store;
> >>
> >> Should this be a bug on? Doesn't look active_request() can return a
> >> non-completed request. Hm I guess we can make a wrong decision to reset
> >> the engine.
> > 
> > Aye. Until we actually invoke the reset, the engine is still active and
> > so may have advanced. We call ring_set_paused() so it doesn't get too
> > far ahead, but that still lets the breadcrumb tick over, so it is still
> > possible for the active_request() to complete (but no more).
> 
> ...
> 
> >> But in any case, if request has completed in the meantime, why go to
> >> i915_error_state_store which will log a hang in dmesg?
> > 
> > Because we are about to call intel_reset_engine(), so want some debug
> > clue as to why we got into a situation where we invoked the forced
> > preemption. I thought it might be useful to see the engine state, and to
> > drop the "oops, please file a bug request" because of the reset.
> 
> ... so we could still decide to bail out if request completed in the 
> meantime and give up on the whole reset business. Why not if not? I 
> guess it is of little practical difference, micro-second here or there 
> before a potential false positive.

(When I first added the check here, it was following a hacky
__intel_gt_reset() to ensure the engine had stopped, so I needed to
always do a real reset to cleanup the mess.)

Hmm. I was about to say "but the preemption window expired and we need
to reset". However, if we have completed this request and having done
since our earlier inspection, it must also hit an arbitration point
where the preemption will take place.

So yes, we can bail out here quietly if we find ourselves with a
completed request at the last moment.

For simplicity, I'm just going to ignore the troublemaker and put it on
the hold list.

         * Note that because we have not yet reset the engine at this point,
         * it is possible for the request that we have identified as being
         * guilty, did in fact complete and we will then hit an arbitration
         * point allowing the preemption to succeed. The likelihood of that
         * is very low (as the capturing of the engine registers should be
         * fast enough to run inside an irq-off atomic section!), so we will
         * simply hold that request accountable for being non-preemptible
         * long enough to force the reset.

We will then skip the completed request when it comes time to dequeue.
Business as usual in the land of preempt-to-busy.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

  reply	other threads:[~2020-01-16 18:32 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-15  8:33 [Intel-gfx] [PATCH 1/3] drm/i915: Use common priotree lists for virtual engine Chris Wilson
2020-01-15  8:33 ` [Intel-gfx] [PATCH 2/3] drm/i915/gt: Allow temporary suspension of inflight requests Chris Wilson
2020-01-15 10:58   ` Tvrtko Ursulin
2020-01-15 11:01     ` Chris Wilson
2020-01-15 11:10   ` [Intel-gfx] [PATCH v3] " Chris Wilson
2020-01-15 11:37     ` Tvrtko Ursulin
2020-01-15 11:46       ` Chris Wilson
2020-01-16 17:12     ` Tvrtko Ursulin
2020-01-15  8:33 ` [Intel-gfx] [PATCH 3/3] drm/i915/execlists: Offline error capture Chris Wilson
2020-01-16 17:22   ` Tvrtko Ursulin
2020-01-16 17:48     ` Chris Wilson
2020-01-16 18:14       ` Tvrtko Ursulin
2020-01-16 18:32         ` Chris Wilson [this message]
2020-01-15  9:02 ` [Intel-gfx] [PATCH v2] drm/i915: Keep track of request among the scheduling lists Chris Wilson
2020-01-16 17:23   ` Tvrtko Ursulin
2020-01-15  9:44 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for series starting with [v2] drm/i915: Keep track of request among the scheduling lists (rev2) Patchwork
2020-01-15 10:06 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2020-01-15 10:06 ` [Intel-gfx] ✗ Fi.CI.BUILD: warning " Patchwork
2020-01-15 14:37 ` [Intel-gfx] ✓ Fi.CI.BAT: success for series starting with [v2] drm/i915: Keep track of request among the scheduling lists (rev3) Patchwork
2020-01-15 14:37 ` [Intel-gfx] ✗ Fi.CI.BUILD: warning " Patchwork
2020-01-17 20:47 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=157919955113.2795.11657063397337910037@skylake-alporthouse-com \
    --to=chris@chris-wilson.co.uk \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=tvrtko.ursulin@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).