From: "Ville Syrjälä" <ville.syrjala@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>
Cc: intel-gfx@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH v2] drm/i915/gt: Sanitize and reset GPU before removing powercontext
Date: Mon, 13 Jan 2020 16:09:23 +0200 [thread overview]
Message-ID: <20200113140923.GP13686@intel.com> (raw)
In-Reply-To: <20200113132956.1832986-1-chris@chris-wilson.co.uk>
On Mon, Jan 13, 2020 at 01:29:56PM +0000, Chris Wilson wrote:
> As a final paranoid step (we _should_ have reset the GPU on suspending
> the device prior to unload), reset the GPU once more before removing the
> powercontext and other related power saving paraphernalia.
>
> A clue that this may not be the case is
>
> <7> [313.203721] __intel_gt_set_wedged rcs'0
> <7> [313.203746] __intel_gt_set_wedged Awake? 3
> <7> [313.203751] __intel_gt_set_wedged Barriers?: no
> <7> [313.203756] __intel_gt_set_wedged Latency: 0us
> <7> [313.203762] __intel_gt_set_wedged Reset count: 0 (global 0)
> <7> [313.203766] __intel_gt_set_wedged Requests:
> <7> [313.203785] __intel_gt_set_wedged MMIO base: 0x00002000
> <7> [313.203819] __intel_gt_set_wedged RING_START: 0x00000000
> <7> [313.203826] __intel_gt_set_wedged RING_HEAD: 0x00000000
> <7> [313.203833] __intel_gt_set_wedged RING_TAIL: 0x00000000
> <7> [313.203844] __intel_gt_set_wedged RING_CTL: 0x00000000
> <7> [313.203854] __intel_gt_set_wedged RING_MODE: 0x00000000
> <7> [313.203861] __intel_gt_set_wedged RING_IMR: fffffefe
> <7> [313.203875] __intel_gt_set_wedged ACTHD: 0x00000000_00000000
> <7> [313.203888] __intel_gt_set_wedged BBADDR: 0x00000000_00000000
> <7> [313.203901] __intel_gt_set_wedged DMA_FADDR: 0x00000000_00000000
> <7> [313.203909] __intel_gt_set_wedged IPEIR: 0x00000000
> <7> [313.203916] __intel_gt_set_wedged IPEHR: 0xcccccccc
> <7> [313.203921] __intel_gt_set_wedged Execlist tasklet queued? no (enabled), preempt? inactive, timeslice? inactive
> <7> [313.203932] __intel_gt_set_wedged Execlist status: 0x00044032 00000020; CSB read:5, write:0, entries:6
> <7> [313.203937] __intel_gt_set_wedged Execlist CSB[0]: 0x00000001, context: 0
> <7> [313.203952] __intel_gt_set_wedged Pending[0] ring:{start:000c4000, hwsp:fedfc000, seqno:00000000}, rq: 402e:2- prio=2147483647 @ 207ms: [i915]
> <7> [313.203983] __intel_gt_set_wedged E 402e:2- prio=2147483647 @ 207ms: [i915]
> <7> [313.204006] __intel_gt_set_wedged Queue priority hint: 3
>
> during rapid fault-injection reloads. 0xcc is POISON_FREE_INIT which
> suggests that the system cleared the pages on initialisation as they are
> still being used from the previous module load.
>
> Despite that we also have a couple of GPU resets prior to this...
> I have a sneaky suspicion that may be a GuC artifact.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Andi Shyti <andi.shyti@intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>
> drm/i915/gt: Lift clearing GT wedged out of gt_sanitize
>
> We only want to try and reset a wedged device on resume, not before
> suspend, so lift the recovery out of the commont gt_sanitize().
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Andi Shyti <andi.shyti@intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> ---
> drivers/gpu/drm/i915/gt/intel_gt_pm.c | 56 +++++++++++----------------
> 1 file changed, 22 insertions(+), 34 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> index d1c2f034296a..09a78d767e24 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> @@ -118,36 +118,16 @@ void intel_gt_pm_init(struct intel_gt *gt)
> intel_rps_init(>->rps);
> }
>
> -static bool reset_engines(struct intel_gt *gt)
> +static void reset_engines(struct intel_gt *gt)
> {
> if (INTEL_INFO(gt->i915)->gpu_reset_clobbers_display)
Should that be a !gpu_reset_clobbers_display now?
> - return false;
> -
> - return __intel_gt_reset(gt, ALL_ENGINES) == 0;
> + __intel_gt_reset(gt, ALL_ENGINES);
> }
>
> -static void gt_sanitize(struct intel_gt *gt, bool force)
> +static void gt_sanitize(struct intel_gt *gt)
> {
> struct intel_engine_cs *engine;
> enum intel_engine_id id;
> - intel_wakeref_t wakeref;
> -
> - GT_TRACE(gt, "force:%s", yesno(force));
> -
> - /* Use a raw wakeref to avoid calling intel_display_power_get early */
> - wakeref = intel_runtime_pm_get(gt->uncore->rpm);
> - intel_uncore_forcewake_get(gt->uncore, FORCEWAKE_ALL);
> -
> - /*
> - * As we have just resumed the machine and woken the device up from
> - * deep PCI sleep (presumably D3_cold), assume the HW has been reset
> - * back to defaults, recovering from whatever wedged state we left it
> - * in and so worth trying to use the device once more.
> - */
> - if (intel_gt_is_wedged(gt))
> - intel_gt_unset_wedged(gt);
> -
> - intel_uc_sanitize(>->uc);
>
> for_each_engine(engine, gt, id)
> if (engine->reset.prepare)
> @@ -155,21 +135,18 @@ static void gt_sanitize(struct intel_gt *gt, bool force)
>
> intel_uc_reset_prepare(>->uc);
>
> - if (reset_engines(gt) || force) {
> - for_each_engine(engine, gt, id)
> - __intel_engine_reset(engine, false);
> - }
> + reset_engines(gt);
> + for_each_engine(engine, gt, id)
> + __intel_engine_reset(engine, false);
>
> for_each_engine(engine, gt, id)
> if (engine->reset.finish)
> engine->reset.finish(engine);
> -
> - intel_uncore_forcewake_put(gt->uncore, FORCEWAKE_ALL);
> - intel_runtime_pm_put(gt->uncore->rpm, wakeref);
> }
>
> void intel_gt_pm_fini(struct intel_gt *gt)
> {
> + intel_gt_set_wedged(gt);
> intel_rc6_fini(>->rc6);
> }
>
> @@ -194,13 +171,25 @@ int intel_gt_resume(struct intel_gt *gt)
> intel_gt_pm_get(gt);
>
> intel_uncore_forcewake_get(gt->uncore, FORCEWAKE_ALL);
> +
> intel_rc6_sanitize(>->rc6);
> - gt_sanitize(gt, true);
> - if (intel_gt_is_wedged(gt)) {
> + intel_uc_sanitize(>->uc);
> +
> + /*
> + * As we have just resumed the machine and woken the device up from
> + * deep PCI sleep (presumably D3_cold), assume the HW has been reset
> + * back to defaults, recovering from whatever wedged state we left it
> + * in and so worth trying to use the device once more.
> + */
> + if (intel_gt_is_wedged(gt))
> + intel_gt_unset_wedged(gt);
> + if (unlikely(intel_gt_is_wedged(gt))) {
> err = -EIO;
> goto out_fw;
> }
>
> + gt_sanitize(gt);
> +
> /* Only when the HW is re-initialised, can we replay the requests */
> err = intel_gt_init_hw(gt);
> if (err) {
> @@ -308,8 +297,7 @@ void intel_gt_suspend_late(struct intel_gt *gt)
> intel_llc_disable(>->llc);
> }
>
> - gt_sanitize(gt, false);
> -
> + intel_gt_set_wedged(gt);
> GT_TRACE(gt, "\n");
> }
>
> --
> 2.25.0.rc2
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
--
Ville Syrjälä
Intel
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
next prev parent reply other threads:[~2020-01-13 14:09 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-01-13 13:26 [Intel-gfx] [PATCH v2] drm/i915/gt: Sanitize and reset GPU before removing powercontext Chris Wilson
2020-01-13 13:29 ` Chris Wilson
2020-01-13 14:09 ` Ville Syrjälä [this message]
2020-01-13 14:20 ` Chris Wilson
2020-01-13 14:26 ` [Intel-gfx] [PATCH v3] " Chris Wilson
2020-01-13 16:24 ` [Intel-gfx] [PATCH v4] " Chris Wilson
2020-01-13 15:44 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915/gt: Sanitize and reset GPU before removing powercontext (rev4) Patchwork
2020-01-13 16:18 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2020-01-13 16:19 ` [Intel-gfx] ✗ Fi.CI.BUILD: warning " Patchwork
2020-01-13 16:39 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915/gt: Sanitize and reset GPU before removing powercontext (rev5) Patchwork
2020-01-13 17:05 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2020-01-13 17:05 ` [Intel-gfx] ✗ Fi.CI.BUILD: warning " Patchwork
2020-01-13 17:17 ` [Intel-gfx] [PATCH v5] drm/i915/gt: Sanitize and reset GPU before removing powercontext Chris Wilson
2020-01-13 18:06 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915/gt: Sanitize and reset GPU before removing powercontext (rev6) Patchwork
2020-01-13 18:30 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
2020-01-13 18:30 ` [Intel-gfx] ✗ Fi.CI.BUILD: warning " Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200113140923.GP13686@intel.com \
--to=ville.syrjala@linux.intel.com \
--cc=chris@chris-wilson.co.uk \
--cc=intel-gfx@lists.freedesktop.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.