From: Chris Wilson <chris@chris-wilson.co.uk>
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>,
	intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH 2/5] drm/i915/gt: Push engine stopping into reset-prepare
Date: Wed, 17 Jul 2019 14:30:52 +0100
Message-ID: <156337025269.4375.8104628033771518861@skylake-alporthouse-com>
In-Reply-To: <b462d4a4-d2ef-e44f-e633-a7f22f6142ef@linux.intel.com>

Quoting Tvrtko Ursulin (2019-07-17 14:21:50)
> 
> On 17/07/2019 14:08, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2019-07-17 14:04:34)
> >>
> >> On 16/07/2019 13:49, Chris Wilson wrote:
> >>> Push the engine stop into the backend reset_prepare (where it already was!).
> >>> This allows us to avoid dangerously setting the RING registers to 0 for
> >>> logical contexts. If we clear the register on a live context, those
> >>> invalid register values are recorded in the logical context state and
> >>> replayed (with hilarious results).
> >>
> >> So essentially the statement is that gen3_stop_engine is not needed,
> >> and is even dangerous, with execlists?
> > 
> > Yes. It has been a nuisance in the past, which is why we try to avoid
> > it. I have come to the conclusion that it serves no purpose for
> > execlists and only makes recovery worse.
> > 
> >>
> >>>
> >>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >>> ---
> >>>    drivers/gpu/drm/i915/gt/intel_lrc.c        | 16 +++++-
> >>>    drivers/gpu/drm/i915/gt/intel_reset.c      | 58 ----------------------
> >>>    drivers/gpu/drm/i915/gt/intel_ringbuffer.c | 40 ++++++++++++++-
> >>>    3 files changed, 53 insertions(+), 61 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> >>> index 9e0992498087..9b87a2fc186c 100644
> >>> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> >>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> >>> @@ -2173,11 +2173,23 @@ static void execlists_reset_prepare(struct intel_engine_cs *engine)
> >>>        __tasklet_disable_sync_once(&execlists->tasklet);
> >>>        GEM_BUG_ON(!reset_in_progress(execlists));
> >>>    
> >>> -     intel_engine_stop_cs(engine);
> >>> -
> >>>        /* And flush any current direct submission. */
> >>>        spin_lock_irqsave(&engine->active.lock, flags);
> >>>        spin_unlock_irqrestore(&engine->active.lock, flags);
> >>> +
> >>> +     /*
> >>> +      * We stop engines, otherwise we might get failed reset and a
> >>> +      * dead gpu (on elk). Also as modern gpu as kbl can suffer
> >>> +      * from system hang if batchbuffer is progressing when
> >>> +      * the reset is issued, regardless of READY_TO_RESET ack.
> >>> +      * Thus assume it is best to stop engines on all gens
> >>> +      * where we have a gpu reset.
> >>> +      *
> >>> +      * WaKBLVECSSemaphoreWaitPoll:kbl (on ALL_ENGINES)
> >>> +      *
> >>> +      * FIXME: Wa for more modern gens needs to be validated
> >>> +      */
> >>> +     intel_engine_stop_cs(engine);
> >>>    }
> >>>    
> >>>    static void reset_csb_pointers(struct intel_engine_cs *engine)
> >>> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
> >>> index 7ddedfb16aa2..55e2ddcbd215 100644
> >>> --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> >>> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> >>> @@ -135,47 +135,6 @@ void __i915_request_reset(struct i915_request *rq, bool guilty)
> >>>        }
> >>>    }
> >>>    
> >>> -static void gen3_stop_engine(struct intel_engine_cs *engine)
> >>> -{
> >>> -     struct intel_uncore *uncore = engine->uncore;
> >>> -     const u32 base = engine->mmio_base;
> >>> -
> >>> -     GEM_TRACE("%s\n", engine->name);
> >>> -
> >>> -     if (intel_engine_stop_cs(engine))
> >>> -             GEM_TRACE("%s: timed out on STOP_RING\n", engine->name);
> >>> -
> >>> -     intel_uncore_write_fw(uncore,
> >>> -                           RING_HEAD(base),
> >>> -                           intel_uncore_read_fw(uncore, RING_TAIL(base)));
> >>> -     intel_uncore_posting_read_fw(uncore, RING_HEAD(base)); /* paranoia */
> >>> -
> >>> -     intel_uncore_write_fw(uncore, RING_HEAD(base), 0);
> >>> -     intel_uncore_write_fw(uncore, RING_TAIL(base), 0);
> >>> -     intel_uncore_posting_read_fw(uncore, RING_TAIL(base));
> >>> -
> >>> -     /* The ring must be empty before it is disabled */
> >>> -     intel_uncore_write_fw(uncore, RING_CTL(base), 0);
> >>> -
> >>> -     /* Check acts as a post */
> >>> -     if (intel_uncore_read_fw(uncore, RING_HEAD(base)))
> >>> -             GEM_TRACE("%s: ring head [%x] not parked\n",
> >>> -                       engine->name,
> >>> -                       intel_uncore_read_fw(uncore, RING_HEAD(base)));
> >>> -}
> >>> -
> >>> -static void stop_engines(struct intel_gt *gt, intel_engine_mask_t engine_mask)
> >>> -{
> >>> -     struct intel_engine_cs *engine;
> >>> -     intel_engine_mask_t tmp;
> >>> -
> >>> -     if (INTEL_GEN(gt->i915) < 3)
> >>> -             return;
> >>> -
> >>> -     for_each_engine_masked(engine, gt->i915, engine_mask, tmp)
> >>> -             gen3_stop_engine(engine);
> >>> -}
> >>> -
> >>>    static bool i915_in_reset(struct pci_dev *pdev)
> >>>    {
> >>>        u8 gdrst;
> >>> @@ -607,23 +566,6 @@ int __intel_gt_reset(struct intel_gt *gt, intel_engine_mask_t engine_mask)
> >>>         */
> >>>        intel_uncore_forcewake_get(gt->uncore, FORCEWAKE_ALL);
> >>>        for (retry = 0; ret == -ETIMEDOUT && retry < retries; retry++) {
> >>> -             /*
> >>> -              * We stop engines, otherwise we might get failed reset and a
> >>> -              * dead gpu (on elk). Also as modern gpu as kbl can suffer
> >>> -              * from system hang if batchbuffer is progressing when
> >>> -              * the reset is issued, regardless of READY_TO_RESET ack.
> >>> -              * Thus assume it is best to stop engines on all gens
> >>> -              * where we have a gpu reset.
> >>> -              *
> >>> -              * WaKBLVECSSemaphoreWaitPoll:kbl (on ALL_ENGINES)
> >>> -              *
> >>> -              * WaMediaResetMainRingCleanup:ctg,elk (presumably)
> >>> -              *
> >>> -              * FIXME: Wa for more modern gens needs to be validated
> >>> -              */
> >>> -             if (retry)
> >>> -                     stop_engines(gt, engine_mask);
> >>> -
> >>
> >> The only other functional change I see is that we no longer retry
> >> stopping the engines before each reset attempt. I don't know whether
> >> that is a concern or not.
> > 
> > Ah, but we do stop the engine before resets, in *reset_prepare. The
> > other path into the reset is sanitize, where we don't know enough state
> > to safely tweak the engines. For that path, I claim it shouldn't matter,
> > as the engines should be idle and we only need the reset to clear stale
> > context state.
> 
> Yes, I know that we do call stop in prepare, just not on the reset retry
> path. It's the loop above: if the reset was failing and needed retries,
> before we would have retried stopping the engines, and now we would not.

The engines are still stopped. The functional change is to remove the
dangerous clearing of RING_HEAD/CTL.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Thread overview: 36+ messages
2019-07-16 12:49 [PATCH 1/5] drm/i915/userptr: Beware recursive lock_page() Chris Wilson
2019-07-16 12:49 ` [PATCH 2/5] drm/i915/gt: Push engine stopping into reset-prepare Chris Wilson
2019-07-17 13:04   ` Tvrtko Ursulin
2019-07-17 13:08     ` Chris Wilson
2019-07-17 13:21       ` Tvrtko Ursulin
2019-07-17 13:30         ` Chris Wilson [this message]
2019-07-17 13:42           ` Tvrtko Ursulin
2019-07-17 13:56             ` Chris Wilson
2019-07-17 17:29               ` Tvrtko Ursulin
2019-07-16 12:49 ` [PATCH 3/5] drm/i915/execlists: Process interrupted context on reset Chris Wilson
2019-07-17 13:31   ` Tvrtko Ursulin
2019-07-17 13:40     ` Chris Wilson
2019-07-17 13:43       ` Chris Wilson
2019-07-16 12:49 ` [PATCH 4/5] drm/i915/execlists: Cancel breadcrumb on preempting the virtual engine Chris Wilson
2019-07-17 13:40   ` Tvrtko Ursulin
2019-07-19 11:51     ` Chris Wilson
2019-07-16 12:49 ` [PATCH 5/5] drm/i915: Hide unshrinkable context objects from the shrinker Chris Wilson
2019-07-16 13:46 ` ✓ Fi.CI.BAT: success for series starting with [1/5] drm/i915/userptr: Beware recursive lock_page() Patchwork
2019-07-16 15:25 ` [Intel-gfx] [PATCH 1/5] " Tvrtko Ursulin
2019-07-16 15:25   ` Tvrtko Ursulin
2019-07-16 15:37   ` [Intel-gfx] " Chris Wilson
2019-07-17 13:09     ` Tvrtko Ursulin
2019-07-17 13:17       ` Chris Wilson
2019-07-17 13:23         ` Tvrtko Ursulin
2019-07-17 13:35           ` Chris Wilson
2019-07-17 13:46             ` Tvrtko Ursulin
2019-07-17 14:06               ` Chris Wilson
2019-07-17 18:09                 ` Tvrtko Ursulin
2019-07-26 13:38                   ` Lionel Landwerlin
2019-09-09 13:52                     ` Chris Wilson
2019-09-11 11:31                       ` Tvrtko Ursulin
2019-09-11 11:38                         ` Chris Wilson
2019-09-11 12:10                           ` Tvrtko Ursulin
2019-07-16 16:13 ` ✓ Fi.CI.IGT: success for series starting with [1/5] " Patchwork
2019-11-06  7:22 ` [PATCH 1/5] " Chris Wilson
2019-11-06  7:22   ` [Intel-gfx] " Chris Wilson
