All of lore.kernel.org
 help / color / mirror / Atom feed
From: Chris Wilson <chris@chris-wilson.co.uk>
To: intel-gfx@lists.freedesktop.org
Subject: [Intel-gfx] [PATCH v4] drm/i915/gt: Sanitize and reset GPU before removing powercontext
Date: Mon, 13 Jan 2020 16:24:29 +0000	[thread overview]
Message-ID: <20200113162429.1920747-1-chris@chris-wilson.co.uk> (raw)
In-Reply-To: <20200113142630.1879666-1-chris@chris-wilson.co.uk>

As a final paranoid step (we _should_ have reset the GPU on suspending
the device prior to unload), reset the GPU once more before removing the
powercontext and other related power saving paraphernalia.

A clue that this may not be the case is

<7> [313.203721] __intel_gt_set_wedged rcs'0
<7> [313.203746] __intel_gt_set_wedged 	Awake? 3
<7> [313.203751] __intel_gt_set_wedged 	Barriers?: no
<7> [313.203756] __intel_gt_set_wedged 	Latency: 0us
<7> [313.203762] __intel_gt_set_wedged 	Reset count: 0 (global 0)
<7> [313.203766] __intel_gt_set_wedged 	Requests:
<7> [313.203785] __intel_gt_set_wedged 	MMIO base:  0x00002000
<7> [313.203819] __intel_gt_set_wedged 	RING_START: 0x00000000
<7> [313.203826] __intel_gt_set_wedged 	RING_HEAD:  0x00000000
<7> [313.203833] __intel_gt_set_wedged 	RING_TAIL:  0x00000000
<7> [313.203844] __intel_gt_set_wedged 	RING_CTL:   0x00000000
<7> [313.203854] __intel_gt_set_wedged 	RING_MODE:  0x00000000
<7> [313.203861] __intel_gt_set_wedged 	RING_IMR: fffffefe
<7> [313.203875] __intel_gt_set_wedged 	ACTHD:  0x00000000_00000000
<7> [313.203888] __intel_gt_set_wedged 	BBADDR: 0x00000000_00000000
<7> [313.203901] __intel_gt_set_wedged 	DMA_FADDR: 0x00000000_00000000
<7> [313.203909] __intel_gt_set_wedged 	IPEIR: 0x00000000
<7> [313.203916] __intel_gt_set_wedged 	IPEHR: 0xcccccccc
<7> [313.203921] __intel_gt_set_wedged 	Execlist tasklet queued? no (enabled), preempt? inactive, timeslice? inactive
<7> [313.203932] __intel_gt_set_wedged 	Execlist status: 0x00044032 00000020; CSB read:5, write:0, entries:6
<7> [313.203937] __intel_gt_set_wedged 	Execlist CSB[0]: 0x00000001, context: 0
<7> [313.203952] __intel_gt_set_wedged 		Pending[0] ring:{start:000c4000, hwsp:fedfc000, seqno:00000000}, rq:  402e:2-  prio=2147483647 @ 207ms: [i915]
<7> [313.203983] __intel_gt_set_wedged 		E  402e:2-  prio=2147483647 @ 207ms: [i915]
<7> [313.204006] __intel_gt_set_wedged 		Queue priority hint: 3

during rapid fault-injection reloads. 0xcc is POISON_FREE_INIT which
suggests that the system cleared the pages on initialisation as they are
still being used from the previous module load.

Despite that we also have a couple of GPU resets prior to this...
I have a sneaky suspicion that may be a GuC artifact.

v2: Just set the device as wedged (which includes a reset) on
suspend/unload, and leave the sanitization to load/resume.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andi Shyti <andi.shyti@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/gt/intel_gt.c    |  3 +-
 drivers/gpu/drm/i915/gt/intel_gt_pm.c | 60 ++++++++++-----------------
 2 files changed, 24 insertions(+), 39 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index da2b6e2ae692..700ee4c37487 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -588,7 +588,7 @@ int intel_gt_init(struct intel_gt *gt)
 
 	err = intel_gt_resume(gt);
 	if (err)
-		goto err_uc_init;
+		goto err_gt;
 
 	err = __engines_record_defaults(gt);
 	if (err)
@@ -606,7 +606,6 @@ int intel_gt_init(struct intel_gt *gt)
 err_gt:
 	__intel_gt_disable(gt);
 	intel_uc_fini_hw(&gt->uc);
-err_uc_init:
 	intel_uc_fini(&gt->uc);
 err_engines:
 	intel_engines_release(gt);
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
index d1c2f034296a..681cd986324f 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
@@ -118,36 +118,16 @@ void intel_gt_pm_init(struct intel_gt *gt)
 	intel_rps_init(&gt->rps);
 }
 
-static bool reset_engines(struct intel_gt *gt)
+static void reset_engines(struct intel_gt *gt)
 {
-	if (INTEL_INFO(gt->i915)->gpu_reset_clobbers_display)
-		return false;
-
-	return __intel_gt_reset(gt, ALL_ENGINES) == 0;
+	if (!INTEL_INFO(gt->i915)->gpu_reset_clobbers_display)
+		__intel_gt_reset(gt, ALL_ENGINES);
 }
 
-static void gt_sanitize(struct intel_gt *gt, bool force)
+static void gt_sanitize(struct intel_gt *gt)
 {
 	struct intel_engine_cs *engine;
 	enum intel_engine_id id;
-	intel_wakeref_t wakeref;
-
-	GT_TRACE(gt, "force:%s", yesno(force));
-
-	/* Use a raw wakeref to avoid calling intel_display_power_get early */
-	wakeref = intel_runtime_pm_get(gt->uncore->rpm);
-	intel_uncore_forcewake_get(gt->uncore, FORCEWAKE_ALL);
-
-	/*
-	 * As we have just resumed the machine and woken the device up from
-	 * deep PCI sleep (presumably D3_cold), assume the HW has been reset
-	 * back to defaults, recovering from whatever wedged state we left it
-	 * in and so worth trying to use the device once more.
-	 */
-	if (intel_gt_is_wedged(gt))
-		intel_gt_unset_wedged(gt);
-
-	intel_uc_sanitize(&gt->uc);
 
 	for_each_engine(engine, gt, id)
 		if (engine->reset.prepare)
@@ -155,21 +135,18 @@ static void gt_sanitize(struct intel_gt *gt, bool force)
 
 	intel_uc_reset_prepare(&gt->uc);
 
-	if (reset_engines(gt) || force) {
-		for_each_engine(engine, gt, id)
-			__intel_engine_reset(engine, false);
-	}
+	reset_engines(gt);
+	for_each_engine(engine, gt, id)
+		__intel_engine_reset(engine, false);
 
 	for_each_engine(engine, gt, id)
 		if (engine->reset.finish)
 			engine->reset.finish(engine);
-
-	intel_uncore_forcewake_put(gt->uncore, FORCEWAKE_ALL);
-	intel_runtime_pm_put(gt->uncore->rpm, wakeref);
 }
 
 void intel_gt_pm_fini(struct intel_gt *gt)
 {
+	intel_gt_set_wedged(gt);
 	intel_rc6_fini(&gt->rc6);
 }
 
@@ -192,15 +169,25 @@ int intel_gt_resume(struct intel_gt *gt)
 	 * allowing us to fixup the user contexts on their first pin.
 	 */
 	intel_gt_pm_get(gt);
-
 	intel_uncore_forcewake_get(gt->uncore, FORCEWAKE_ALL);
-	intel_rc6_sanitize(&gt->rc6);
-	gt_sanitize(gt, true);
-	if (intel_gt_is_wedged(gt)) {
+
+	/*
+	 * As we have just resumed the machine and woken the device up from
+	 * deep PCI sleep (presumably D3_cold), assume the HW has been reset
+	 * back to defaults, recovering from whatever wedged state we left it
+	 * in and so worth trying to use the device once more.
+	 */
+	if (intel_gt_is_wedged(gt))
+		intel_gt_unset_wedged(gt);
+	if (unlikely(intel_gt_is_wedged(gt))) {
 		err = -EIO;
 		goto out_fw;
 	}
 
+	intel_rc6_sanitize(&gt->rc6);
+	intel_uc_sanitize(&gt->uc);
+	gt_sanitize(gt);
+
 	/* Only when the HW is re-initialised, can we replay the requests */
 	err = intel_gt_init_hw(gt);
 	if (err) {
@@ -308,8 +295,7 @@ void intel_gt_suspend_late(struct intel_gt *gt)
 		intel_llc_disable(&gt->llc);
 	}
 
-	gt_sanitize(gt, false);
-
+	intel_gt_set_wedged(gt);
 	GT_TRACE(gt, "\n");
 }
 
-- 
2.25.0.rc2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

  reply	other threads:[~2020-01-13 16:25 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-13 13:26 [Intel-gfx] [PATCH v2] drm/i915/gt: Sanitize and reset GPU before removing powercontext Chris Wilson
2020-01-13 13:29 ` Chris Wilson
2020-01-13 14:09   ` Ville Syrjälä
2020-01-13 14:20     ` Chris Wilson
2020-01-13 14:26 ` [Intel-gfx] [PATCH v3] " Chris Wilson
2020-01-13 16:24   ` Chris Wilson [this message]
2020-01-13 15:44 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915/gt: Sanitize and reset GPU before removing powercontext (rev4) Patchwork
2020-01-13 16:18 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2020-01-13 16:19 ` [Intel-gfx] ✗ Fi.CI.BUILD: warning " Patchwork
2020-01-13 16:39 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915/gt: Sanitize and reset GPU before removing powercontext (rev5) Patchwork
2020-01-13 17:05 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2020-01-13 17:05 ` [Intel-gfx] ✗ Fi.CI.BUILD: warning " Patchwork
2020-01-13 17:17 ` [Intel-gfx] [PATCH v5] drm/i915/gt: Sanitize and reset GPU before removing powercontext Chris Wilson
2020-01-13 18:06 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915/gt: Sanitize and reset GPU before removing powercontext (rev6) Patchwork
2020-01-13 18:30 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
2020-01-13 18:30 ` [Intel-gfx] ✗ Fi.CI.BUILD: warning " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200113162429.1920747-1-chris@chris-wilson.co.uk \
    --to=chris@chris-wilson.co.uk \
    --cc=intel-gfx@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.