All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Ville Syrjälä" <ville.syrjala@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>
Cc: intel-gfx@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH v2] drm/i915/gt: Sanitize and reset GPU before removing powercontext
Date: Mon, 13 Jan 2020 16:09:23 +0200	[thread overview]
Message-ID: <20200113140923.GP13686@intel.com> (raw)
In-Reply-To: <20200113132956.1832986-1-chris@chris-wilson.co.uk>

On Mon, Jan 13, 2020 at 01:29:56PM +0000, Chris Wilson wrote:
> As a final paranoid step (we _should_ have reset the GPU on suspending
> the device prior to unload), reset the GPU once more before removing the
> powercontext and other related power saving paraphernalia.
> 
> A clue that this may not be the case is
> 
> <7> [313.203721] __intel_gt_set_wedged rcs'0
> <7> [313.203746] __intel_gt_set_wedged 	Awake? 3
> <7> [313.203751] __intel_gt_set_wedged 	Barriers?: no
> <7> [313.203756] __intel_gt_set_wedged 	Latency: 0us
> <7> [313.203762] __intel_gt_set_wedged 	Reset count: 0 (global 0)
> <7> [313.203766] __intel_gt_set_wedged 	Requests:
> <7> [313.203785] __intel_gt_set_wedged 	MMIO base:  0x00002000
> <7> [313.203819] __intel_gt_set_wedged 	RING_START: 0x00000000
> <7> [313.203826] __intel_gt_set_wedged 	RING_HEAD:  0x00000000
> <7> [313.203833] __intel_gt_set_wedged 	RING_TAIL:  0x00000000
> <7> [313.203844] __intel_gt_set_wedged 	RING_CTL:   0x00000000
> <7> [313.203854] __intel_gt_set_wedged 	RING_MODE:  0x00000000
> <7> [313.203861] __intel_gt_set_wedged 	RING_IMR: fffffefe
> <7> [313.203875] __intel_gt_set_wedged 	ACTHD:  0x00000000_00000000
> <7> [313.203888] __intel_gt_set_wedged 	BBADDR: 0x00000000_00000000
> <7> [313.203901] __intel_gt_set_wedged 	DMA_FADDR: 0x00000000_00000000
> <7> [313.203909] __intel_gt_set_wedged 	IPEIR: 0x00000000
> <7> [313.203916] __intel_gt_set_wedged 	IPEHR: 0xcccccccc
> <7> [313.203921] __intel_gt_set_wedged 	Execlist tasklet queued? no (enabled), preempt? inactive, timeslice? inactive
> <7> [313.203932] __intel_gt_set_wedged 	Execlist status: 0x00044032 00000020; CSB read:5, write:0, entries:6
> <7> [313.203937] __intel_gt_set_wedged 	Execlist CSB[0]: 0x00000001, context: 0
> <7> [313.203952] __intel_gt_set_wedged 		Pending[0] ring:{start:000c4000, hwsp:fedfc000, seqno:00000000}, rq:  402e:2-  prio=2147483647 @ 207ms: [i915]
> <7> [313.203983] __intel_gt_set_wedged 		E  402e:2-  prio=2147483647 @ 207ms: [i915]
> <7> [313.204006] __intel_gt_set_wedged 		Queue priority hint: 3
> 
> during rapid fault-injection reloads. 0xcc is POISON_FREE_INIT which
> suggests that the system cleared the pages on initialisation as they are
> still being used from the previous module load.
> 
> Despite that we also have a couple of GPU resets prior to this...
> I have a sneaky suspicion that may be a GuC artifact.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Andi Shyti <andi.shyti@intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> 
> drm/i915/gt: Lift clearing GT wedged out of gt_sanitize
> 
> We only want to try and reset a wedged device on resume, not before
> suspend, so lift the recovery out of the commont gt_sanitize().
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Andi Shyti <andi.shyti@intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_gt_pm.c | 56 +++++++++++----------------
>  1 file changed, 22 insertions(+), 34 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> index d1c2f034296a..09a78d767e24 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> @@ -118,36 +118,16 @@ void intel_gt_pm_init(struct intel_gt *gt)
>  	intel_rps_init(&gt->rps);
>  }
>  
> -static bool reset_engines(struct intel_gt *gt)
> +static void reset_engines(struct intel_gt *gt)
>  {
>  	if (INTEL_INFO(gt->i915)->gpu_reset_clobbers_display)

Should that be a !gpu_reset_clobbers_display now?

> -		return false;
> -
> -	return __intel_gt_reset(gt, ALL_ENGINES) == 0;
> +		__intel_gt_reset(gt, ALL_ENGINES);
>  }
>  
> -static void gt_sanitize(struct intel_gt *gt, bool force)
> +static void gt_sanitize(struct intel_gt *gt)
>  {
>  	struct intel_engine_cs *engine;
>  	enum intel_engine_id id;
> -	intel_wakeref_t wakeref;
> -
> -	GT_TRACE(gt, "force:%s", yesno(force));
> -
> -	/* Use a raw wakeref to avoid calling intel_display_power_get early */
> -	wakeref = intel_runtime_pm_get(gt->uncore->rpm);
> -	intel_uncore_forcewake_get(gt->uncore, FORCEWAKE_ALL);
> -
> -	/*
> -	 * As we have just resumed the machine and woken the device up from
> -	 * deep PCI sleep (presumably D3_cold), assume the HW has been reset
> -	 * back to defaults, recovering from whatever wedged state we left it
> -	 * in and so worth trying to use the device once more.
> -	 */
> -	if (intel_gt_is_wedged(gt))
> -		intel_gt_unset_wedged(gt);
> -
> -	intel_uc_sanitize(&gt->uc);
>  
>  	for_each_engine(engine, gt, id)
>  		if (engine->reset.prepare)
> @@ -155,21 +135,18 @@ static void gt_sanitize(struct intel_gt *gt, bool force)
>  
>  	intel_uc_reset_prepare(&gt->uc);
>  
> -	if (reset_engines(gt) || force) {
> -		for_each_engine(engine, gt, id)
> -			__intel_engine_reset(engine, false);
> -	}
> +	reset_engines(gt);
> +	for_each_engine(engine, gt, id)
> +		__intel_engine_reset(engine, false);
>  
>  	for_each_engine(engine, gt, id)
>  		if (engine->reset.finish)
>  			engine->reset.finish(engine);
> -
> -	intel_uncore_forcewake_put(gt->uncore, FORCEWAKE_ALL);
> -	intel_runtime_pm_put(gt->uncore->rpm, wakeref);
>  }
>  
>  void intel_gt_pm_fini(struct intel_gt *gt)
>  {
> +	intel_gt_set_wedged(gt);
>  	intel_rc6_fini(&gt->rc6);
>  }
>  
> @@ -194,13 +171,25 @@ int intel_gt_resume(struct intel_gt *gt)
>  	intel_gt_pm_get(gt);
>  
>  	intel_uncore_forcewake_get(gt->uncore, FORCEWAKE_ALL);
> +
>  	intel_rc6_sanitize(&gt->rc6);
> -	gt_sanitize(gt, true);
> -	if (intel_gt_is_wedged(gt)) {
> +	intel_uc_sanitize(&gt->uc);
> +
> +	/*
> +	 * As we have just resumed the machine and woken the device up from
> +	 * deep PCI sleep (presumably D3_cold), assume the HW has been reset
> +	 * back to defaults, recovering from whatever wedged state we left it
> +	 * in and so worth trying to use the device once more.
> +	 */
> +	if (intel_gt_is_wedged(gt))
> +		intel_gt_unset_wedged(gt);
> +	if (unlikely(intel_gt_is_wedged(gt))) {
>  		err = -EIO;
>  		goto out_fw;
>  	}
>  
> +	gt_sanitize(gt);
> +
>  	/* Only when the HW is re-initialised, can we replay the requests */
>  	err = intel_gt_init_hw(gt);
>  	if (err) {
> @@ -308,8 +297,7 @@ void intel_gt_suspend_late(struct intel_gt *gt)
>  		intel_llc_disable(&gt->llc);
>  	}
>  
> -	gt_sanitize(gt, false);
> -
> +	intel_gt_set_wedged(gt);
>  	GT_TRACE(gt, "\n");
>  }
>  
> -- 
> 2.25.0.rc2
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Ville Syrjälä
Intel
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

  reply	other threads:[~2020-01-13 14:09 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-13 13:26 [Intel-gfx] [PATCH v2] drm/i915/gt: Sanitize and reset GPU before removing powercontext Chris Wilson
2020-01-13 13:29 ` Chris Wilson
2020-01-13 14:09   ` Ville Syrjälä [this message]
2020-01-13 14:20     ` Chris Wilson
2020-01-13 14:26 ` [Intel-gfx] [PATCH v3] " Chris Wilson
2020-01-13 16:24   ` [Intel-gfx] [PATCH v4] " Chris Wilson
2020-01-13 15:44 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915/gt: Sanitize and reset GPU before removing powercontext (rev4) Patchwork
2020-01-13 16:18 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2020-01-13 16:19 ` [Intel-gfx] ✗ Fi.CI.BUILD: warning " Patchwork
2020-01-13 16:39 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915/gt: Sanitize and reset GPU before removing powercontext (rev5) Patchwork
2020-01-13 17:05 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2020-01-13 17:05 ` [Intel-gfx] ✗ Fi.CI.BUILD: warning " Patchwork
2020-01-13 17:17 ` [Intel-gfx] [PATCH v5] drm/i915/gt: Sanitize and reset GPU before removing powercontext Chris Wilson
2020-01-13 18:06 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915/gt: Sanitize and reset GPU before removing powercontext (rev6) Patchwork
2020-01-13 18:30 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
2020-01-13 18:30 ` [Intel-gfx] ✗ Fi.CI.BUILD: warning " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200113140923.GP13686@intel.com \
    --to=ville.syrjala@linux.intel.com \
    --cc=chris@chris-wilson.co.uk \
    --cc=intel-gfx@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.