[PATCH 1/3] drm/i915: Re-enable GGTT earlier after GPU reset

* [PATCH 1/3] drm/i915: Re-enable GGTT earlier after GPU reset
@ 2017-12-17 13:28 Chris Wilson
  2017-12-17 13:28 ` [PATCH 2/3] drm/i915: Show IPEIR and IPEHR in the engine dump Chris Wilson
                   ` (6 more replies)
  0 siblings, 7 replies; 22+ messages in thread
From: Chris Wilson @ 2017-12-17 13:28 UTC (permalink / raw)
  To: intel-gfx

Inside i915_gem_reset(), we start touching the HW and so require the
low-level HW to be re-enabled, in particular the PCI BARs.

Fixes: 7b6da818d86f ("drm/i915: Restore the kernel context after a GPU reset on an idle engine")
Testcase: igt/drv_hangman # i915g/i915gm
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 6d39fdf2b604..72bea281edb7 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1924,9 +1924,6 @@ void i915_reset(struct drm_i915_private *i915, unsigned int flags)
 		goto taint;
 	}
 
-	i915_gem_reset(i915);
-	intel_overlay_reset(i915);
-
 	/* Ok, now get things going again... */
 
 	/*
@@ -1939,6 +1936,9 @@ void i915_reset(struct drm_i915_private *i915, unsigned int flags)
 		goto error;
 	}
 
+	i915_gem_reset(i915);
+	intel_overlay_reset(i915);
+
 	/*
 	 * Next we need to restore the context, but we don't use those
 	 * yet either...
-- 
2.15.1
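
The ordering constraint in the commit message above can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: `struct fake_hw`, `fake_enable_hw()` and `fake_gem_reset()` are hypothetical stand-ins, not the driver's data structures or functions. The model just counts hardware accesses attempted before low-level access (for example the PCI BARs) has been restored.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the device; not the i915 data structures. */
struct fake_hw {
	bool bars_enabled;    /* low-level access (e.g. PCI BARs) restored */
	int failed_accesses;  /* HW accesses attempted while access was off */
};

/* Stand-in for the step that re-enables low-level HW access after a reset. */
static void fake_enable_hw(struct fake_hw *hw)
{
	assert(hw);
	hw->bars_enabled = true;
}

/* Stand-in for a reset step that touches the HW, as i915_gem_reset() does. */
static void fake_gem_reset(struct fake_hw *hw)
{
	assert(hw);
	if (!hw->bars_enabled)
		hw->failed_accesses++;
}

/* Order before the patch: touch the HW first, re-enable access afterwards. */
static int reset_old_order(void)
{
	struct fake_hw hw = { .bars_enabled = false, .failed_accesses = 0 };

	fake_gem_reset(&hw);
	fake_enable_hw(&hw);
	return hw.failed_accesses;
}

/* Order after the patch: re-enable access first, then touch the HW. */
static int reset_new_order(void)
{
	struct fake_hw hw = { .bars_enabled = false, .failed_accesses = 0 };

	fake_enable_hw(&hw);
	fake_gem_reset(&hw);
	return hw.failed_accesses;
}
```

In this model `reset_old_order()` records one access made while the hardware was still unreachable, and `reset_new_order()` records none, which is the effect the reordering in the diff is after.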

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

* [PATCH 2/3] drm/i915: Show IPEIR and IPEHR in the engine dump
  2017-12-17 13:28 [PATCH 1/3] drm/i915: Re-enable GGTT earlier after GPU reset Chris Wilson
@ 2017-12-17 13:28 ` Chris Wilson
  2017-12-18 11:14   ` Tvrtko Ursulin
                     ` (4 more replies)
  2017-12-17 13:28 ` [PATCH 3/3] drm/i915/selftests: Fix up igt_reset_engine Chris Wilson
                   ` (5 subsequent siblings)
  6 siblings, 5 replies; 22+ messages in thread
From: Chris Wilson @ 2017-12-17 13:28 UTC (permalink / raw)
  To: intel-gfx

The error registers IPEIR and IPEHR are a useful bit of information when
inspecting GPU stalls from intel_engine_dump().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_engine_cs.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 510e0bc3a377..05bd9e17452c 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1757,6 +1757,13 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 	addr = intel_engine_get_last_batch_head(engine);
 	drm_printf(m, "\tBBADDR: 0x%08x_%08x\n",
 		   upper_32_bits(addr), lower_32_bits(addr));
+	addr = I915_READ(RING_DMA_FADD(engine->mmio_base));
+	drm_printf(m, "\tDMA_FADDR: 0x%08x_%08x\n",
+		   upper_32_bits(addr), lower_32_bits(addr));
+	drm_printf(m, "\tIPEIR: 0x%08x\n",
+		   I915_READ(RING_IPEIR(engine->mmio_base)));
+	drm_printf(m, "\tIPEHR: 0x%08x\n",
+		   I915_READ(RING_IPEHR(engine->mmio_base)));
 
 	if (HAS_EXECLISTS(dev_priv)) {
 		const u32 *hws = &engine->status_page.page_addr[I915_HWS_CSB_BUF0_INDEX];
-- 
2.15.1
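
The dump prints 64-bit addresses as two 32-bit halves joined by an underscore ("0x%08x_%08x"). A minimal user-space sketch of that formatting follows; `upper32()`, `lower32()` and `format_addr()` are local, hypothetical helpers that mirror what the kernel's upper_32_bits() and lower_32_bits() do, and are not part of the driver.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Local equivalents of the kernel's upper_32_bits()/lower_32_bits(). */
static uint32_t upper32(uint64_t v)
{
	return (uint32_t)(v >> 32);
}

static uint32_t lower32(uint64_t v)
{
	return (uint32_t)v;
}

/*
 * Format a 64-bit address as "0x<upper>_<lower>", eight hex digits each.
 * Returns the number of characters snprintf() would have written.
 */
static int format_addr(char *buf, size_t len, uint64_t addr)
{
	assert(buf && len);
	return snprintf(buf, len, "0x%08" PRIx32 "_%08" PRIx32,
			upper32(addr), lower32(addr));
}
```

One consequence worth keeping in mind: if the value stored in the 64-bit variable came from a single 32-bit register read, the upper half will always print as zeros.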

* [PATCH 3/3] drm/i915/selftests: Fix up igt_reset_engine
  2017-12-17 13:28 [PATCH 1/3] drm/i915: Re-enable GGTT earlier after GPU reset Chris Wilson
  2017-12-17 13:28 ` [PATCH 2/3] drm/i915: Show IPEIR and IPEHR in the engine dump Chris Wilson
@ 2017-12-17 13:28 ` Chris Wilson
  2017-12-18 21:50   ` Michel Thierry
  2017-12-17 14:07 ` ✓ Fi.CI.BAT: success for series starting with [1/3] drm/i915: Re-enable GGTT earlier after GPU reset Patchwork
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 22+ messages in thread
From: Chris Wilson @ 2017-12-17 13:28 UTC (permalink / raw)
  To: intel-gfx

Now that we skip a per-engine reset on an idle engine, we need to update
the selftest to take that into account. In the process, we find that we
were not stressing the per-engine reset very hard, so add those missing
active resets.

v2: Actually test i915_reset_engine() by loading it with requests.

Fixes: f6ba181ada55 ("drm/i915: Skip an engine reset if it recovered before our preparations")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/selftests/intel_hangcheck.c | 314 ++++++++++++++++++-----
 1 file changed, 250 insertions(+), 64 deletions(-)

diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
index f98546b8a7fa..c8a756e2139f 100644
--- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
+++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
@@ -132,6 +132,12 @@ static int emit_recurse_batch(struct hang *h,
 		*batch++ = lower_32_bits(hws_address(hws, rq));
 		*batch++ = upper_32_bits(hws_address(hws, rq));
 		*batch++ = rq->fence.seqno;
+		*batch++ = MI_ARB_CHECK;
+
+		memset(batch, 0, 1024);
+		batch += 1024 / sizeof(*batch);
+
+		*batch++ = MI_ARB_CHECK;
 		*batch++ = MI_BATCH_BUFFER_START | 1 << 8 | 1;
 		*batch++ = lower_32_bits(vma->node.start);
 		*batch++ = upper_32_bits(vma->node.start);
@@ -140,6 +146,12 @@ static int emit_recurse_batch(struct hang *h,
 		*batch++ = 0;
 		*batch++ = lower_32_bits(hws_address(hws, rq));
 		*batch++ = rq->fence.seqno;
+		*batch++ = MI_ARB_CHECK;
+
+		memset(batch, 0, 1024);
+		batch += 1024 / sizeof(*batch);
+
+		*batch++ = MI_ARB_CHECK;
 		*batch++ = MI_BATCH_BUFFER_START | 1 << 8;
 		*batch++ = lower_32_bits(vma->node.start);
 	} else if (INTEL_GEN(i915) >= 4) {
@@ -147,12 +159,24 @@ static int emit_recurse_batch(struct hang *h,
 		*batch++ = 0;
 		*batch++ = lower_32_bits(hws_address(hws, rq));
 		*batch++ = rq->fence.seqno;
+		*batch++ = MI_ARB_CHECK;
+
+		memset(batch, 0, 1024);
+		batch += 1024 / sizeof(*batch);
+
+		*batch++ = MI_ARB_CHECK;
 		*batch++ = MI_BATCH_BUFFER_START | 2 << 6;
 		*batch++ = lower_32_bits(vma->node.start);
 	} else {
 		*batch++ = MI_STORE_DWORD_IMM;
 		*batch++ = lower_32_bits(hws_address(hws, rq));
 		*batch++ = rq->fence.seqno;
+		*batch++ = MI_ARB_CHECK;
+
+		memset(batch, 0, 1024);
+		batch += 1024 / sizeof(*batch);
+
+		*batch++ = MI_ARB_CHECK;
 		*batch++ = MI_BATCH_BUFFER_START | 2 << 6 | 1;
 		*batch++ = lower_32_bits(vma->node.start);
 	}
@@ -234,6 +258,16 @@ static void hang_fini(struct hang *h)
 	i915_gem_wait_for_idle(h->i915, I915_WAIT_LOCKED);
 }
 
+static bool wait_for_hang(struct hang *h, struct drm_i915_gem_request *rq)
+{
+	return !(wait_for_us(i915_seqno_passed(hws_seqno(h, rq),
+					       rq->fence.seqno),
+			     10) &&
+		 wait_for(i915_seqno_passed(hws_seqno(h, rq),
+					    rq->fence.seqno),
+			  1000));
+}
+
 static int igt_hang_sanitycheck(void *arg)
 {
 	struct drm_i915_private *i915 = arg;
@@ -296,6 +330,9 @@ static void global_reset_lock(struct drm_i915_private *i915)
 	struct intel_engine_cs *engine;
 	enum intel_engine_id id;
 
+	pr_debug("%s: current gpu_error=%08lx\n",
+		 __func__, i915->gpu_error.flags);
+
 	while (test_and_set_bit(I915_RESET_BACKOFF, &i915->gpu_error.flags))
 		wait_event(i915->gpu_error.reset_queue,
 			   !test_bit(I915_RESET_BACKOFF,
@@ -353,54 +390,127 @@ static int igt_global_reset(void *arg)
 	return err;
 }
 
-static int igt_reset_engine(void *arg)
+static int __igt_reset_engine(struct drm_i915_private *i915, bool active)
 {
-	struct drm_i915_private *i915 = arg;
 	struct intel_engine_cs *engine;
 	enum intel_engine_id id;
-	unsigned int reset_count, reset_engine_count;
+	struct hang h;
 	int err = 0;
 
-	/* Check that we can issue a global GPU and engine reset */
+	/* Check that we can issue an engine reset on an idle engine (no-op) */
 
 	if (!intel_has_reset_engine(i915))
 		return 0;
 
+	if (active) {
+		mutex_lock(&i915->drm.struct_mutex);
+		err = hang_init(&h, i915);
+		mutex_unlock(&i915->drm.struct_mutex);
+		if (err)
+			return err;
+	}
+
 	for_each_engine(engine, i915, id) {
-		set_bit(I915_RESET_ENGINE + engine->id, &i915->gpu_error.flags);
+		unsigned int reset_count, reset_engine_count;
+		IGT_TIMEOUT(end_time);
+
+		if (active && !intel_engine_can_store_dword(engine))
+			continue;
+
 		reset_count = i915_reset_count(&i915->gpu_error);
 		reset_engine_count = i915_reset_engine_count(&i915->gpu_error,
 							     engine);
 
-		err = i915_reset_engine(engine, I915_RESET_QUIET);
-		if (err) {
-			pr_err("i915_reset_engine failed\n");
-			break;
-		}
+		set_bit(I915_RESET_ENGINE + id, &i915->gpu_error.flags);
+		do {
+			if (active) {
+				struct drm_i915_gem_request *rq;
+
+				mutex_lock(&i915->drm.struct_mutex);
+				rq = hang_create_request(&h, engine,
+							 i915->kernel_context);
+				if (IS_ERR(rq)) {
+					err = PTR_ERR(rq);
+					break;
+				}
+
+				i915_gem_request_get(rq);
+				__i915_add_request(rq, true);
+				mutex_unlock(&i915->drm.struct_mutex);
+
+				if (!wait_for_hang(&h, rq)) {
+					struct drm_printer p = drm_info_printer(i915->drm.dev);
+
+					pr_err("%s: Failed to start request %x, at %x\n",
+					       __func__, rq->fence.seqno, hws_seqno(&h, rq));
+					intel_engine_dump(engine, &p,
+							  "%s\n", engine->name);
+
+					i915_gem_request_put(rq);
+					err = -EIO;
+					break;
+				}
 
-		if (i915_reset_count(&i915->gpu_error) != reset_count) {
-			pr_err("Full GPU reset recorded! (engine reset expected)\n");
-			err = -EINVAL;
-			break;
-		}
+				i915_gem_request_put(rq);
+			}
+
+			engine->hangcheck.stalled = true;
+			engine->hangcheck.seqno =
+				intel_engine_get_seqno(engine);
+
+			err = i915_reset_engine(engine, I915_RESET_QUIET);
+			if (err) {
+				pr_err("i915_reset_engine failed\n");
+				break;
+			}
+
+			if (i915_reset_count(&i915->gpu_error) != reset_count) {
+				pr_err("Full GPU reset recorded! (engine reset expected)\n");
+				err = -EINVAL;
+				break;
+			}
+
+			reset_engine_count += active;
+			if (i915_reset_engine_count(&i915->gpu_error, engine) !=
+			    reset_engine_count) {
+				pr_err("%s engine reset %srecorded!\n",
+				       engine->name, active ? "not " : "");
+				err = -EINVAL;
+				break;
+			}
+
+			engine->hangcheck.stalled = false;
+		} while (time_before(jiffies, end_time));
+		clear_bit(I915_RESET_ENGINE + id, &i915->gpu_error.flags);
 
-		if (i915_reset_engine_count(&i915->gpu_error, engine) ==
-		    reset_engine_count) {
-			pr_err("No %s engine reset recorded!\n", engine->name);
-			err = -EINVAL;
+		if (err)
 			break;
-		}
 
-		clear_bit(I915_RESET_ENGINE + engine->id,
-			  &i915->gpu_error.flags);
+		cond_resched();
 	}
 
 	if (i915_terminally_wedged(&i915->gpu_error))
 		err = -EIO;
 
+	if (active) {
+		mutex_lock(&i915->drm.struct_mutex);
+		hang_fini(&h);
+		mutex_unlock(&i915->drm.struct_mutex);
+	}
+
 	return err;
 }
 
+static int igt_reset_idle_engine(void *arg)
+{
+	return __igt_reset_engine(arg, false);
+}
+
+static int igt_reset_active_engine(void *arg)
+{
+	return __igt_reset_engine(arg, true);
+}
+
 static int active_engine(void *data)
 {
 	struct intel_engine_cs *engine = data;
@@ -462,11 +572,12 @@ static int active_engine(void *data)
 	return err;
 }
 
-static int igt_reset_active_engines(void *arg)
+static int __igt_reset_engine_others(struct drm_i915_private *i915,
+				     bool active)
 {
-	struct drm_i915_private *i915 = arg;
-	struct intel_engine_cs *engine, *active;
+	struct intel_engine_cs *engine, *other;
 	enum intel_engine_id id, tmp;
+	struct hang h;
 	int err = 0;
 
 	/* Check that issuing a reset on one engine does not interfere
@@ -476,24 +587,36 @@ static int igt_reset_active_engines(void *arg)
 	if (!intel_has_reset_engine(i915))
 		return 0;
 
+	if (active) {
+		mutex_lock(&i915->drm.struct_mutex);
+		err = hang_init(&h, i915);
+		mutex_unlock(&i915->drm.struct_mutex);
+		if (err)
+			return err;
+	}
+
 	for_each_engine(engine, i915, id) {
-		struct task_struct *threads[I915_NUM_ENGINES];
+		struct task_struct *threads[I915_NUM_ENGINES] = {};
 		unsigned long resets[I915_NUM_ENGINES];
 		unsigned long global = i915_reset_count(&i915->gpu_error);
+		unsigned long count = 0;
 		IGT_TIMEOUT(end_time);
 
+		if (active && !intel_engine_can_store_dword(engine))
+			continue;
+
 		memset(threads, 0, sizeof(threads));
-		for_each_engine(active, i915, tmp) {
+		for_each_engine(other, i915, tmp) {
 			struct task_struct *tsk;
 
-			if (active == engine)
-				continue;
-
 			resets[tmp] = i915_reset_engine_count(&i915->gpu_error,
-							      active);
+							      other);
 
-			tsk = kthread_run(active_engine, active,
-					  "igt/%s", active->name);
+			if (other == engine)
+				continue;
+
+			tsk = kthread_run(active_engine, other,
+					  "igt/%s", other->name);
 			if (IS_ERR(tsk)) {
 				err = PTR_ERR(tsk);
 				goto unwind;
@@ -503,20 +626,70 @@ static int igt_reset_active_engines(void *arg)
 			get_task_struct(tsk);
 		}
 
-		set_bit(I915_RESET_ENGINE + engine->id, &i915->gpu_error.flags);
+		set_bit(I915_RESET_ENGINE + id, &i915->gpu_error.flags);
 		do {
+			if (active) {
+				struct drm_i915_gem_request *rq;
+
+				mutex_lock(&i915->drm.struct_mutex);
+				rq = hang_create_request(&h, engine,
+							 i915->kernel_context);
+				if (IS_ERR(rq)) {
+					err = PTR_ERR(rq);
+					mutex_unlock(&i915->drm.struct_mutex);
+					break;
+				}
+
+				i915_gem_request_get(rq);
+				__i915_add_request(rq, true);
+				mutex_unlock(&i915->drm.struct_mutex);
+
+				if (!wait_for_hang(&h, rq)) {
+					struct drm_printer p = drm_info_printer(i915->drm.dev);
+
+					pr_err("%s: Failed to start request %x, at %x\n",
+					       __func__, rq->fence.seqno, hws_seqno(&h, rq));
+					intel_engine_dump(engine, &p,
+							  "%s\n", engine->name);
+
+					i915_gem_request_put(rq);
+					err = -EIO;
+					break;
+				}
+
+				i915_gem_request_put(rq);
+			}
+
+			engine->hangcheck.stalled = true;
+			engine->hangcheck.seqno =
+				intel_engine_get_seqno(engine);
+
 			err = i915_reset_engine(engine, I915_RESET_QUIET);
 			if (err) {
-				pr_err("i915_reset_engine(%s) failed, err=%d\n",
-				       engine->name, err);
+				pr_err("i915_reset_engine(%s:%s) failed, err=%d\n",
+				       engine->name, active ? "active" : "idle", err);
 				break;
 			}
+
+			engine->hangcheck.stalled = false;
+			count++;
 		} while (time_before(jiffies, end_time));
-		clear_bit(I915_RESET_ENGINE + engine->id,
-			  &i915->gpu_error.flags);
+		clear_bit(I915_RESET_ENGINE + id, &i915->gpu_error.flags);
+		pr_info("i915_reset_engine(%s:%s): %lu resets\n",
+			engine->name, active ? "active" : "idle", count);
+
+		if (i915_reset_engine_count(&i915->gpu_error, engine) -
+		    resets[engine->id] != (active ? count : 0)) {
+			pr_err("i915_reset_engine(%s:%s): reset %lu times, but reported %lu\n",
+			       engine->name, active ? "active" : "idle", count,
+			       i915_reset_engine_count(&i915->gpu_error,
+						       engine) - resets[engine->id]);
+			if (!err)
+				err = -EINVAL;
+		}
 
 unwind:
-		for_each_engine(active, i915, tmp) {
+		for_each_engine(other, i915, tmp) {
 			int ret;
 
 			if (!threads[tmp])
@@ -524,27 +697,29 @@ static int igt_reset_active_engines(void *arg)
 
 			ret = kthread_stop(threads[tmp]);
 			if (ret) {
-				pr_err("kthread for active engine %s failed, err=%d\n",
-				       active->name, ret);
+				pr_err("kthread for other engine %s failed, err=%d\n",
+				       other->name, ret);
 				if (!err)
 					err = ret;
 			}
 			put_task_struct(threads[tmp]);
 
 			if (resets[tmp] != i915_reset_engine_count(&i915->gpu_error,
-								   active)) {
+								   other)) {
 				pr_err("Innocent engine %s was reset (count=%ld)\n",
-				       active->name,
+				       other->name,
 				       i915_reset_engine_count(&i915->gpu_error,
-							       active) - resets[tmp]);
-				err = -EIO;
+							       other) - resets[tmp]);
+				if (!err)
+					err = -EINVAL;
 			}
 		}
 
 		if (global != i915_reset_count(&i915->gpu_error)) {
 			pr_err("Global reset (count=%ld)!\n",
 			       i915_reset_count(&i915->gpu_error) - global);
-			err = -EIO;
+			if (!err)
+				err = -EINVAL;
 		}
 
 		if (err)
@@ -556,9 +731,25 @@ static int igt_reset_active_engines(void *arg)
 	if (i915_terminally_wedged(&i915->gpu_error))
 		err = -EIO;
 
+	if (active) {
+		mutex_lock(&i915->drm.struct_mutex);
+		hang_fini(&h);
+		mutex_unlock(&i915->drm.struct_mutex);
+	}
+
 	return err;
 }
 
+static int igt_reset_idle_engine_others(void *arg)
+{
+	return __igt_reset_engine_others(arg, false);
+}
+
+static int igt_reset_active_engine_others(void *arg)
+{
+	return __igt_reset_engine_others(arg, true);
+}
+
 static u32 fake_hangcheck(struct drm_i915_gem_request *rq)
 {
 	u32 reset_count;
@@ -574,16 +765,6 @@ static u32 fake_hangcheck(struct drm_i915_gem_request *rq)
 	return reset_count;
 }
 
-static bool wait_for_hang(struct hang *h, struct drm_i915_gem_request *rq)
-{
-	return !(wait_for_us(i915_seqno_passed(hws_seqno(h, rq),
-					       rq->fence.seqno),
-			     10) &&
-		 wait_for(i915_seqno_passed(hws_seqno(h, rq),
-					    rq->fence.seqno),
-			  1000));
-}
-
 static int igt_wait_reset(void *arg)
 {
 	struct drm_i915_private *i915 = arg;
@@ -617,8 +798,8 @@ static int igt_wait_reset(void *arg)
 	if (!wait_for_hang(&h, rq)) {
 		struct drm_printer p = drm_info_printer(i915->drm.dev);
 
-		pr_err("Failed to start request %x, at %x\n",
-		       rq->fence.seqno, hws_seqno(&h, rq));
+		pr_err("%s: Failed to start request %x, at %x\n",
+		       __func__, rq->fence.seqno, hws_seqno(&h, rq));
 		intel_engine_dump(rq->engine, &p, "%s\n", rq->engine->name);
 
 		i915_reset(i915, 0);
@@ -712,8 +893,8 @@ static int igt_reset_queue(void *arg)
 			if (!wait_for_hang(&h, prev)) {
 				struct drm_printer p = drm_info_printer(i915->drm.dev);
 
-				pr_err("Failed to start request %x, at %x\n",
-				       prev->fence.seqno, hws_seqno(&h, prev));
+				pr_err("%s: Failed to start request %x, at %x\n",
+				       __func__, prev->fence.seqno, hws_seqno(&h, prev));
 				intel_engine_dump(prev->engine, &p,
 						  "%s\n", prev->engine->name);
 
@@ -819,8 +1000,8 @@ static int igt_handle_error(void *arg)
 	if (!wait_for_hang(&h, rq)) {
 		struct drm_printer p = drm_info_printer(i915->drm.dev);
 
-		pr_err("Failed to start request %x, at %x\n",
-		       rq->fence.seqno, hws_seqno(&h, rq));
+		pr_err("%s: Failed to start request %x, at %x\n",
+		       __func__, rq->fence.seqno, hws_seqno(&h, rq));
 		intel_engine_dump(rq->engine, &p, "%s\n", rq->engine->name);
 
 		i915_reset(i915, 0);
@@ -864,21 +1045,26 @@ int intel_hangcheck_live_selftests(struct drm_i915_private *i915)
 	static const struct i915_subtest tests[] = {
 		SUBTEST(igt_global_reset), /* attempt to recover GPU first */
 		SUBTEST(igt_hang_sanitycheck),
-		SUBTEST(igt_reset_engine),
-		SUBTEST(igt_reset_active_engines),
+		SUBTEST(igt_reset_idle_engine),
+		SUBTEST(igt_reset_active_engine),
+		SUBTEST(igt_reset_idle_engine_others),
+		SUBTEST(igt_reset_active_engine_others),
 		SUBTEST(igt_wait_reset),
 		SUBTEST(igt_reset_queue),
 		SUBTEST(igt_handle_error),
 	};
+	bool saved_hangcheck;
 	int err;
 
 	if (!intel_has_gpu_reset(i915))
 		return 0;
 
 	intel_runtime_pm_get(i915);
+	saved_hangcheck = fetch_and_zero(&i915_modparams.enable_hangcheck);
 
 	err = i915_subtests(tests, i915);
 
+	i915_modparams.enable_hangcheck = saved_hangcheck;
 	intel_runtime_pm_put(i915);
 
 	return err;
-- 
2.15.1
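
wait_for_hang() in the patch above polls until the seqno written by the hanging batch has passed the request's seqno. The sketch below shows the kind of wraparound-safe comparison such a check relies on. It assumes the comparison is the usual signed-difference test on 32-bit sequence numbers; `seqno_passed()` here is a local illustration, not the driver's helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if seq1 is at or after seq2, treating the 32-bit counter as
 * circular. The unsigned subtraction wraps, and interpreting the result as
 * signed makes "slightly ahead" positive and "slightly behind" negative.
 */
static bool seqno_passed(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq1 - seq2) >= 0;
}
```

With this form, a counter that has wrapped from 0xfffffffe to 2 still counts as having passed 0xfffffffe, whereas a plain `seq1 >= seq2` would report the opposite.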

* ✓ Fi.CI.BAT: success for series starting with [1/3] drm/i915: Re-enable GGTT earlier after GPU reset
  2017-12-17 13:28 [PATCH 1/3] drm/i915: Re-enable GGTT earlier after GPU reset Chris Wilson
  2017-12-17 13:28 ` [PATCH 2/3] drm/i915: Show IPEIR and IPEHR in the engine dump Chris Wilson
  2017-12-17 13:28 ` [PATCH 3/3] drm/i915/selftests: Fix up igt_reset_engine Chris Wilson
@ 2017-12-17 14:07 ` Patchwork
  2017-12-17 15:36 ` ✗ Fi.CI.IGT: warning " Patchwork
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 22+ messages in thread
From: Patchwork @ 2017-12-17 14:07 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/3] drm/i915: Re-enable GGTT earlier after GPU reset
URL   : https://patchwork.freedesktop.org/series/35471/
State : success

== Summary ==

Series 35471v1 series starting with [1/3] drm/i915: Re-enable GGTT earlier after GPU reset
https://patchwork.freedesktop.org/api/1.0/series/35471/revisions/1/mbox/

Test debugfs_test:
        Subgroup read_all_entries:
                pass       -> DMESG-WARN (fi-elk-e7500) fdo#103989 +1
Test drv_hangman:
        Subgroup error-state-basic:
                dmesg-warn -> PASS       (fi-gdg-551)
Test gem_basic:
        Subgroup bad-close:
                incomplete -> PASS       (fi-gdg-551)
Test kms_pipe_crc_basic:
        Subgroup suspend-read-crc-pipe-a:
                dmesg-warn -> PASS       (fi-kbl-r) fdo#104172 +1
Test kms_psr_sink_crc:
        Subgroup psr_basic:
                pass       -> DMESG-WARN (fi-skl-6700hq) fdo#101144

fdo#103989 https://bugs.freedesktop.org/show_bug.cgi?id=103989
fdo#104172 https://bugs.freedesktop.org/show_bug.cgi?id=104172
fdo#101144 https://bugs.freedesktop.org/show_bug.cgi?id=101144

fi-bdw-5557u     total:288  pass:267  dwarn:0   dfail:0   fail:0   skip:21  time:434s
fi-bdw-gvtdvm    total:288  pass:264  dwarn:0   dfail:0   fail:0   skip:24  time:445s
fi-blb-e6850     total:288  pass:223  dwarn:1   dfail:0   fail:0   skip:64  time:381s
fi-bsw-n3050     total:288  pass:242  dwarn:0   dfail:0   fail:0   skip:46  time:504s
fi-bwr-2160      total:288  pass:183  dwarn:0   dfail:0   fail:0   skip:105 time:277s
fi-bxt-dsi       total:288  pass:258  dwarn:0   dfail:0   fail:0   skip:30  time:497s
fi-bxt-j4205     total:288  pass:259  dwarn:0   dfail:0   fail:0   skip:29  time:495s
fi-byt-j1900     total:288  pass:253  dwarn:0   dfail:0   fail:0   skip:35  time:475s
fi-byt-n2820     total:288  pass:249  dwarn:0   dfail:0   fail:0   skip:39  time:470s
fi-elk-e7500     total:224  pass:163  dwarn:15  dfail:0   fail:0   skip:45 
fi-gdg-551       total:288  pass:179  dwarn:1   dfail:0   fail:0   skip:108 time:265s
fi-glk-1         total:288  pass:260  dwarn:0   dfail:0   fail:0   skip:28  time:537s
fi-hsw-4770      total:288  pass:261  dwarn:0   dfail:0   fail:0   skip:27  time:405s
fi-hsw-4770r     total:288  pass:261  dwarn:0   dfail:0   fail:0   skip:27  time:413s
fi-ilk-650       total:288  pass:228  dwarn:0   dfail:0   fail:0   skip:60  time:383s
fi-ivb-3520m     total:288  pass:259  dwarn:0   dfail:0   fail:0   skip:29  time:468s
fi-ivb-3770      total:288  pass:255  dwarn:0   dfail:0   fail:0   skip:33  time:427s
fi-kbl-7500u     total:288  pass:263  dwarn:1   dfail:0   fail:0   skip:24  time:476s
fi-kbl-7560u     total:288  pass:268  dwarn:1   dfail:0   fail:0   skip:19  time:517s
fi-kbl-7567u     total:288  pass:268  dwarn:0   dfail:0   fail:0   skip:20  time:466s
fi-kbl-r         total:288  pass:260  dwarn:1   dfail:0   fail:0   skip:27  time:521s
fi-pnv-d510      total:288  pass:222  dwarn:1   dfail:0   fail:0   skip:65  time:606s
fi-skl-6260u     total:288  pass:268  dwarn:0   dfail:0   fail:0   skip:20  time:438s
fi-skl-6600u     total:288  pass:260  dwarn:1   dfail:0   fail:0   skip:27  time:531s
fi-skl-6700hq    total:288  pass:261  dwarn:1   dfail:0   fail:0   skip:26  time:555s
fi-skl-6770hq    total:288  pass:268  dwarn:0   dfail:0   fail:0   skip:20  time:510s
fi-skl-gvtdvm    total:288  pass:265  dwarn:0   dfail:0   fail:0   skip:23  time:442s
fi-snb-2520m     total:245  pass:211  dwarn:0   dfail:0   fail:0   skip:33 
fi-snb-2600      total:288  pass:248  dwarn:0   dfail:0   fail:0   skip:40  time:407s
Blacklisted hosts:
fi-cfl-s2        total:288  pass:262  dwarn:0   dfail:0   fail:0   skip:26  time:590s
fi-cnl-y         total:288  pass:262  dwarn:0   dfail:0   fail:0   skip:26  time:612s
fi-glk-dsi       total:94   pass:45   dwarn:0   dfail:1   fail:0   skip:47 
fi-skl-6700k2    total:288  pass:264  dwarn:0   dfail:0   fail:0   skip:24  time:503s

aceab3849b32f367b3a38fe6852c5118b1c95839 drm-tip: 2017y-12m-17d-12h-52m-17s UTC integration manifest
ed3ef1790ee7 drm/i915/selftests: Fix up igt_reset_engine
0895dbc71bb0 drm/i915: Show IPEIR and IPEHR in the engine dump
1e42e4c592f9 drm/i915: Re-enable GGTT earlier after GPU reset

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_7520/issues.html

* ✗ Fi.CI.IGT: warning for series starting with [1/3] drm/i915: Re-enable GGTT earlier after GPU reset
  2017-12-17 13:28 [PATCH 1/3] drm/i915: Re-enable GGTT earlier after GPU reset Chris Wilson
                   ` (2 preceding siblings ...)
  2017-12-17 14:07 ` ✓ Fi.CI.BAT: success for series starting with [1/3] drm/i915: Re-enable GGTT earlier after GPU reset Patchwork
@ 2017-12-17 15:36 ` Patchwork
  2017-12-17 18:19 ` [PATCH 1/3] " Chris Wilson
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 22+ messages in thread
From: Patchwork @ 2017-12-17 15:36 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/3] drm/i915: Re-enable GGTT earlier after GPU reset
URL   : https://patchwork.freedesktop.org/series/35471/
State : warning

== Summary ==

Test drv_suspend:
        Subgroup fence-restore-untiled:
                incomplete -> PASS       (shard-hsw)
Test gem_exec_suspend:
        Subgroup basic-s3-devices:
                incomplete -> PASS       (shard-hsw) fdo#103990
Test kms_draw_crc:
        Subgroup draw-method-rgb565-pwrite-untiled:
                pass       -> SKIP       (shard-snb)
Test drv_selftest:
        Subgroup live_hangcheck:
                incomplete -> PASS       (shard-snb) fdo#103880
Test kms_frontbuffer_tracking:
        Subgroup fbc-1p-offscren-pri-shrfb-draw-render:
                pass       -> FAIL       (shard-snb) fdo#101623 +1
        Subgroup psr-1p-primscrn-spr-indfb-draw-pwrite:
                incomplete -> SKIP       (shard-hsw)
Test gem_tiled_swapping:
        Subgroup non-threaded:
                dmesg-warn -> PASS       (shard-hsw) fdo#104218
Test drv_module_reload:
        Subgroup basic-reload:
                dmesg-warn -> PASS       (shard-snb) fdo#102848
                pass       -> DMESG-WARN (shard-hsw) fdo#102707
Test kms_setmode:
        Subgroup basic:
                fail       -> PASS       (shard-hsw) fdo#99912

fdo#103990 https://bugs.freedesktop.org/show_bug.cgi?id=103990
fdo#103880 https://bugs.freedesktop.org/show_bug.cgi?id=103880
fdo#101623 https://bugs.freedesktop.org/show_bug.cgi?id=101623
fdo#104218 https://bugs.freedesktop.org/show_bug.cgi?id=104218
fdo#102848 https://bugs.freedesktop.org/show_bug.cgi?id=102848
fdo#102707 https://bugs.freedesktop.org/show_bug.cgi?id=102707
fdo#99912 https://bugs.freedesktop.org/show_bug.cgi?id=99912

shard-hsw        total:2712 pass:1537 dwarn:2   dfail:0   fail:9   skip:1164 time:9416s
shard-snb        total:2712 pass:1306 dwarn:1   dfail:0   fail:13  skip:1392 time:8023s
Blacklisted hosts:
shard-apl        total:2712 pass:1686 dwarn:1   dfail:0   fail:24  skip:1001 time:14096s
shard-kbl        total:2694 pass:1788 dwarn:2   dfail:0   fail:25  skip:878 time:10879s

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_7520/shards.html

* Re: [PATCH 1/3] drm/i915: Re-enable GGTT earlier after GPU reset
  2017-12-17 13:28 [PATCH 1/3] drm/i915: Re-enable GGTT earlier after GPU reset Chris Wilson
                   ` (3 preceding siblings ...)
  2017-12-17 15:36 ` ✗ Fi.CI.IGT: warning " Patchwork
@ 2017-12-17 18:19 ` Chris Wilson
  2017-12-18 11:11 ` Tvrtko Ursulin
  2017-12-18 13:13 ` ✗ Fi.CI.BAT: failure for series starting with [1/3] drm/i915: Re-enable GGTT earlier after GPU reset (rev4) Patchwork
  6 siblings, 0 replies; 22+ messages in thread
From: Chris Wilson @ 2017-12-17 18:19 UTC (permalink / raw)
  To: intel-gfx

Quoting Chris Wilson (2017-12-17 13:28:50)
> Inside i915_gem_reset(), we start touching the HW and so require the
> low-level HW to be re-enabled, in particular the PCI BARs.
> 
> Fixes: 7b6da818d86f ("drm/i915: Restore the kernel context after a GPU reset on an idle engine")
References: 0db8c9612091 ("drm/i915: Re-enable GTT following a device reset")

Maybe fixes?
-Chris

* Re: [PATCH 1/3] drm/i915: Re-enable GGTT earlier after GPU reset
  2017-12-17 13:28 [PATCH 1/3] drm/i915: Re-enable GGTT earlier after GPU reset Chris Wilson
                   ` (4 preceding siblings ...)
  2017-12-17 18:19 ` [PATCH 1/3] " Chris Wilson
@ 2017-12-18 11:11 ` Tvrtko Ursulin
  2017-12-18 11:19   ` Chris Wilson
  2017-12-18 13:13 ` ✗ Fi.CI.BAT: failure for series starting with [1/3] drm/i915: Re-enable GGTT earlier after GPU reset (rev4) Patchwork
  6 siblings, 1 reply; 22+ messages in thread
From: Tvrtko Ursulin @ 2017-12-18 11:11 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 17/12/2017 13:28, Chris Wilson wrote:
> Inside i915_gem_reset(), we start touching the HW and so require the
> low-level HW to be re-enabled, in particular the PCI BARs.
> 
> Fixes: 7b6da818d86f ("drm/i915: Restore the kernel context after a GPU reset on an idle engine")
> Testcase: igt/drv_hangman # i915g/i915gm
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Michel Thierry <michel.thierry@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_drv.c | 6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index 6d39fdf2b604..72bea281edb7 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -1924,9 +1924,6 @@ void i915_reset(struct drm_i915_private *i915, unsigned int flags)
>   		goto taint;
>   	}
>   
> -	i915_gem_reset(i915);
> -	intel_overlay_reset(i915);
> -
>   	/* Ok, now get things going again... */
>   
>   	/*
> @@ -1939,6 +1936,9 @@ void i915_reset(struct drm_i915_private *i915, unsigned int flags)
>   		goto error;
>   	}
>   
> +	i915_gem_reset(i915);
> +	intel_overlay_reset(i915);
> +
>   	/*
>   	 * Next we need to restore the context, but we don't use those
>   	 * yet either...
> 

Looks fine to me.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

* Re: [PATCH 2/3] drm/i915: Show IPEIR and IPEHR in the engine dump
  2017-12-17 13:28 ` [PATCH 2/3] drm/i915: Show IPEIR and IPEHR in the engine dump Chris Wilson
@ 2017-12-18 11:14   ` Tvrtko Ursulin
  2017-12-18 11:18     ` Chris Wilson
  2017-12-18 11:14   ` Chris Wilson
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 22+ messages in thread
From: Tvrtko Ursulin @ 2017-12-18 11:14 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 17/12/2017 13:28, Chris Wilson wrote:
> A useful bit of information for inspecting GPU stalls from
> intel_engine_dump() are the error registers, IPEIR and IPEHR.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> ---
>   drivers/gpu/drm/i915/intel_engine_cs.c | 7 +++++++
>   1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> index 510e0bc3a377..05bd9e17452c 100644
> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> @@ -1757,6 +1757,13 @@ void intel_engine_dump(struct intel_engine_cs *engine,
>   	addr = intel_engine_get_last_batch_head(engine);
>   	drm_printf(m, "\tBBADDR: 0x%08x_%08x\n",
>   		   upper_32_bits(addr), lower_32_bits(addr));
> +	addr = I915_READ(RING_DMA_FADD(engine->mmio_base));
> +	drm_printf(m, "\tDMA_FADDR: 0x%08x_%08x\n",
> +		   upper_32_bits(addr), lower_32_bits(addr));

Error capture handles this register a bit differently.

> +	drm_printf(m, "\tIPEIR: 0x%08x\n",
> +		   I915_READ(RING_IPEIR(engine->mmio_base)));
> +	drm_printf(m, "\tIPEHR: 0x%08x\n",
> +		   I915_READ(RING_IPEHR(engine->mmio_base)));

This one as well has two code paths depending on the gen.

Regards,

Tvrtko

>   
>   	if (HAS_EXECLISTS(dev_priv)) {
>   		const u32 *hws = &engine->status_page.page_addr[I915_HWS_CSB_BUF0_INDEX];
> 

* Re: [PATCH 2/3] drm/i915: Show IPEIR and IPEHR in the engine dump
  2017-12-17 13:28 ` [PATCH 2/3] drm/i915: Show IPEIR and IPEHR in the engine dump Chris Wilson
  2017-12-18 11:14   ` Tvrtko Ursulin
@ 2017-12-18 11:14   ` Chris Wilson
  2017-12-18 11:26   ` [PATCH v2] " Chris Wilson
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 22+ messages in thread
From: Chris Wilson @ 2017-12-18 11:14 UTC (permalink / raw)
  To: intel-gfx

Quoting Chris Wilson (2017-12-17 13:28:51)
> A useful bit of information for inspecting GPU stalls from
> intel_engine_dump() are the error registers, IPEIR and IPEHR.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/intel_engine_cs.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> index 510e0bc3a377..05bd9e17452c 100644
> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> @@ -1757,6 +1757,13 @@ void intel_engine_dump(struct intel_engine_cs *engine,
>         addr = intel_engine_get_last_batch_head(engine);
>         drm_printf(m, "    BBADDR: 0x%08x_%08x\n",
>                    upper_32_bits(addr), lower_32_bits(addr));
> +       addr = I915_READ(RING_DMA_FADD(engine->mmio_base));

if (GEN >= 8)
	addr |= (u64)I915_READ(RING_DMA_FADD_UDW) << 32;
-Chris
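
The reason the suggestion above needs the (u64) cast can be shown in plain
userspace C. This is only a minimal sketch and not driver code:
combine_dwords(), format_addr() and the sample values are hypothetical
stand-ins for the two I915_READ() results and for the "0x%08x_%08x" line
printed by the dump.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: join two 32-bit register values into one 64-bit address. */
uint64_t combine_dwords(uint32_t lower, uint32_t upper)
{
	uint64_t addr = lower;

	/* Widen before shifting: shifting a 32-bit value by 32 is undefined in C. */
	addr |= (uint64_t)upper << 32;
	return addr;
}

/* Print the address as two 32-bit halves separated by '_', upper half first. */
int format_addr(char *buf, size_t len, uint64_t addr)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "0x%08x_%08x",
			(unsigned int)(addr >> 32),
			(unsigned int)(addr & 0xffffffffu));
}
```

On hardware without an upper dword register, the upper half simply stays
zero, so the same format string works for both cases.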

* Re: [PATCH 2/3] drm/i915: Show IPEIR and IPEHR in the engine dump
  2017-12-18 11:14   ` Tvrtko Ursulin
@ 2017-12-18 11:18     ` Chris Wilson
  0 siblings, 0 replies; 22+ messages in thread
From: Chris Wilson @ 2017-12-18 11:18 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2017-12-18 11:14:19)
> 
> On 17/12/2017 13:28, Chris Wilson wrote:
> > A useful bit of information for inspecting GPU stalls from
> > intel_engine_dump() are the error registers, IPEIR and IPEHR.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > ---
> >   drivers/gpu/drm/i915/intel_engine_cs.c | 7 +++++++
> >   1 file changed, 7 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> > index 510e0bc3a377..05bd9e17452c 100644
> > --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> > +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> > @@ -1757,6 +1757,13 @@ void intel_engine_dump(struct intel_engine_cs *engine,
> >       addr = intel_engine_get_last_batch_head(engine);
> >       drm_printf(m, "    BBADDR: 0x%08x_%08x\n",
> >                  upper_32_bits(addr), lower_32_bits(addr));
> > +     addr = I915_READ(RING_DMA_FADD(engine->mmio_base));
> > +     drm_printf(m, "    DMA_FADDR: 0x%08x_%08x\n",
> > +                upper_32_bits(addr), lower_32_bits(addr));
> 
> Error capture handles this register a bit differently.
> 
> > +     drm_printf(m, "    IPEIR: 0x%08x\n",
> > +                I915_READ(RING_IPEIR(engine->mmio_base)));
> > +     drm_printf(m, "    IPEHR: 0x%08x\n",
> > +                I915_READ(RING_IPEHR(engine->mmio_base)));
> 
> This one as well has two code paths depending on the gen.

My bad for assuming it was the same location, just per-engine-ified.
-Chris

* Re: [PATCH 1/3] drm/i915: Re-enable GGTT earlier after GPU reset
  2017-12-18 11:11 ` Tvrtko Ursulin
@ 2017-12-18 11:19   ` Chris Wilson
  0 siblings, 0 replies; 22+ messages in thread
From: Chris Wilson @ 2017-12-18 11:19 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2017-12-18 11:11:49)
> 
> On 17/12/2017 13:28, Chris Wilson wrote:
> > Inside i915_gem_reset(), we start touching the HW and so require the
> > low-level HW to be re-enabled, in particular the PCI BARs.
> > 
> > Fixes: 7b6da818d86f ("drm/i915: Restore the kernel context after a GPU reset on an idle engine")
> > Testcase: igt/drv_hangman # i915g/i915gm
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > Cc: Michel Thierry <michel.thierry@intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_drv.c | 6 +++---
> >   1 file changed, 3 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> > index 6d39fdf2b604..72bea281edb7 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.c
> > +++ b/drivers/gpu/drm/i915/i915_drv.c
> > @@ -1924,9 +1924,6 @@ void i915_reset(struct drm_i915_private *i915, unsigned int flags)
> >               goto taint;
> >       }
> >   
> > -     i915_gem_reset(i915);
> > -     intel_overlay_reset(i915);
> > -
> >       /* Ok, now get things going again... */
> >   
> >       /*
> > @@ -1939,6 +1936,9 @@ void i915_reset(struct drm_i915_private *i915, unsigned int flags)
> >               goto error;
> >       }
> >   
> > +     i915_gem_reset(i915);
> > +     intel_overlay_reset(i915);
> > +
> >       /*
> >        * Next we need to restore the context, but we don't use those
> >        * yet either...
> > 
> 
> Looks fine to me.
> 
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Ta, pushed this one by itself so we can bring gdg back online.
-Chris
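
The reordering in this patch comes down to a dependency: the fixup step
touches the HW, so the HW has to be re-enabled first. A toy userspace model
of that ordering follows. Every name in it is hypothetical, and it assumes
(as the commit message describes) that low-level HW access is unavailable
right after the reset; only the order of the calls mirrors the diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy device state; every name here is hypothetical. */
struct toy_dev {
	bool hw_enabled;      /* low-level HW access (e.g. PCI BARs) is back on */
	bool gem_reset_done;  /* the post-reset fixup completed */
	int bad_accesses;     /* HW touched while it was still disabled */
};

/* Model of the GPU reset itself: low-level access is lost until re-enabled. */
void toy_gpu_reset(struct toy_dev *d)
{
	d->hw_enabled = false;
	d->gem_reset_done = false;
}

void toy_enable_hw(struct toy_dev *d)
{
	d->hw_enabled = true;
}

/* Stand-in for a fixup step that needs to touch the HW. */
void toy_gem_reset(struct toy_dev *d)
{
	if (!d->hw_enabled) {
		d->bad_accesses++;
		return;
	}
	d->gem_reset_done = true;
}

/* Ordering before the patch: fixup first, HW re-enable second. */
void toy_reset_old_order(struct toy_dev *d)
{
	toy_gpu_reset(d);
	toy_gem_reset(d);
	toy_enable_hw(d);
}

/* Ordering after the patch: HW re-enable first, fixup second. */
void toy_reset_new_order(struct toy_dev *d)
{
	toy_gpu_reset(d);
	toy_enable_hw(d);
	toy_gem_reset(d);
	assert(d->gem_reset_done);
}
```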

* [PATCH v2] drm/i915: Show IPEIR and IPEHR in the engine dump
  2017-12-17 13:28 ` [PATCH 2/3] drm/i915: Show IPEIR and IPEHR in the engine dump Chris Wilson
  2017-12-18 11:14   ` Tvrtko Ursulin
  2017-12-18 11:14   ` Chris Wilson
@ 2017-12-18 11:26   ` Chris Wilson
  2017-12-18 12:08     ` Tvrtko Ursulin
  2017-12-18 12:17   ` [PATCH v3] " Chris Wilson
  2017-12-18 12:39   ` [PATCH v4] " Chris Wilson
  4 siblings, 1 reply; 22+ messages in thread
From: Chris Wilson @ 2017-12-18 11:26 UTC (permalink / raw)
  To: intel-gfx

A useful bit of information for inspecting GPU stalls from
intel_engine_dump() are the error registers, IPEIR and IPEHR.

v2: Fixup gen changes in register offsets (Tvrtko)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/intel_engine_cs.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 510e0bc3a377..92b9e0dd6378 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1757,6 +1757,20 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 	addr = intel_engine_get_last_batch_head(engine);
 	drm_printf(m, "\tBBADDR: 0x%08x_%08x\n",
 		   upper_32_bits(addr), lower_32_bits(addr));
+	addr = I915_READ(RING_DMA_FADD(engine->mmio_base));
+	if (INTEL_GEN(dev_priv) >= 8)
+		addr |= (u64)I915_READ(RING_DMA_FADD_UDW(engine->mmio_base)) << 32;
+	drm_printf(m, "\tDMA_FADDR: 0x%08x_%08x\n",
+		   upper_32_bits(addr), lower_32_bits(addr));
+	if (INTEL_GEN(dev_priv) >= 4) {
+		drm_printf(m, "\tIPEIR: 0x%08x\n",
+			   I915_READ(RING_IPEIR(engine->mmio_base)));
+		drm_printf(m, "\tIPEHR: 0x%08x\n",
+			   I915_READ(RING_IPEHR(engine->mmio_base)));
+	} else {
+		drm_printf(m, "\tIPEIR: 0x%08x\n", I915_READ(IPEIR));
+		drm_printf(m, "\tIPEHR: 0x%08x\n", I915_READ(IPEHR));
+	}
 
 	if (HAS_EXECLISTS(dev_priv)) {
 		const u32 *hws = &engine->status_page.page_addr[I915_HWS_CSB_BUF0_INDEX];
-- 
2.15.1


* Re: [PATCH v2] drm/i915: Show IPEIR and IPEHR in the engine dump
  2017-12-18 11:26   ` [PATCH v2] " Chris Wilson
@ 2017-12-18 12:08     ` Tvrtko Ursulin
  0 siblings, 0 replies; 22+ messages in thread
From: Tvrtko Ursulin @ 2017-12-18 12:08 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 18/12/2017 11:26, Chris Wilson wrote:
> A useful bit of information for inspecting GPU stalls from
> intel_engine_dump() are the error registers, IPEIR and IPEHR.
> 
> v2: Fixup gen changes in register offsets (Tvrtko)
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/intel_engine_cs.c | 14 ++++++++++++++
>   1 file changed, 14 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> index 510e0bc3a377..92b9e0dd6378 100644
> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> @@ -1757,6 +1757,20 @@ void intel_engine_dump(struct intel_engine_cs *engine,
>   	addr = intel_engine_get_last_batch_head(engine);
>   	drm_printf(m, "\tBBADDR: 0x%08x_%08x\n",
>   		   upper_32_bits(addr), lower_32_bits(addr));
> +	addr = I915_READ(RING_DMA_FADD(engine->mmio_base));
> +	if (INTEL_GEN(dev_priv) >= 8)
> +		addr |= (u64)I915_READ(RING_DMA_FADD_UDW(engine->mmio_base)) << 32;
> +	drm_printf(m, "\tDMA_FADDR: 0x%08x_%08x\n",
> +		   upper_32_bits(addr), lower_32_bits(addr));

< gen4 case not interesting here? Error capture reads DMA_FADD_I8XX in 
that case. Doesn't look like the same offset to me.

Regards,

Tvrtko

> +	if (INTEL_GEN(dev_priv) >= 4) {
> +		drm_printf(m, "\tIPEIR: 0x%08x\n",
> +			   I915_READ(RING_IPEIR(engine->mmio_base)));
> +		drm_printf(m, "\tIPEHR: 0x%08x\n",
> +			   I915_READ(RING_IPEHR(engine->mmio_base)));
> +	} else {
> +		drm_printf(m, "\tIPEIR: 0x%08x\n", I915_READ(IPEIR));
> +		drm_printf(m, "\tIPEHR: 0x%08x\n", I915_READ(IPEHR));
> +	}
>   
>   	if (HAS_EXECLISTS(dev_priv)) {
>   		const u32 *hws = &engine->status_page.page_addr[I915_HWS_CSB_BUF0_INDEX];
> 
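
The review comments so far amount to a three-way split on the hardware
generation for the fault address, plus a two-way split for IPEIR/IPEHR. A
minimal sketch of that decision in userspace C follows. The enum and
function names are hypothetical; only the gen thresholds and the register
names in the comments are taken from the patch and the review.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for where the DMA fault address would be read from. */
enum faddr_source {
	FADDR_I8XX_GLOBAL, /* one global register: DMA_FADD_I8XX */
	FADDR_RING_32,     /* per-engine RING_DMA_FADD only */
	FADDR_RING_64,     /* per-engine RING_DMA_FADD plus RING_DMA_FADD_UDW */
};

/* Pick the fault address source for a given hardware generation. */
enum faddr_source pick_faddr_source(int gen)
{
	assert(gen > 0);

	if (gen >= 8)
		return FADDR_RING_64;
	if (gen >= 4)
		return FADDR_RING_32;
	return FADDR_I8XX_GLOBAL;
}

/* IPEIR/IPEHR: per-engine RING_IPEIR()/RING_IPEHR() from gen4, global before. */
bool ipe_regs_are_per_engine(int gen)
{
	assert(gen > 0);
	return gen >= 4;
}
```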

* [PATCH v3] drm/i915: Show IPEIR and IPEHR in the engine dump
  2017-12-17 13:28 ` [PATCH 2/3] drm/i915: Show IPEIR and IPEHR in the engine dump Chris Wilson
                     ` (2 preceding siblings ...)
  2017-12-18 11:26   ` [PATCH v2] " Chris Wilson
@ 2017-12-18 12:17   ` Chris Wilson
  2017-12-18 12:32     ` Tvrtko Ursulin
  2017-12-18 12:39   ` [PATCH v4] " Chris Wilson
  4 siblings, 1 reply; 22+ messages in thread
From: Chris Wilson @ 2017-12-18 12:17 UTC (permalink / raw)
  To: intel-gfx

A useful bit of information for inspecting GPU stalls from
intel_engine_dump() are the error registers, IPEIR and IPEHR.

v2: Fixup gen changes in register offsets (Tvrtko)
v3: Old FADDR location as well

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/intel_engine_cs.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 510e0bc3a377..257b03a67e1c 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1757,6 +1757,26 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 	addr = intel_engine_get_last_batch_head(engine);
 	drm_printf(m, "\tBBADDR: 0x%08x_%08x\n",
 		   upper_32_bits(addr), lower_32_bits(addr));
+	if (INTEL_GEN(dev_priv) >= 4) {
+		if (INTEL_GEN(dev_priv) >= 8) {
+			addr = I915_READ(RING_DMA_FADD_UDW(engine->mmio_base));
+			addr <<= 32;
+		}
+		addr = I915_READ(RING_DMA_FADD(engine->mmio_base));
+	} else {
+		addr = I915_READ(DMA_FADD_I8XX);
+	}
+	drm_printf(m, "\tDMA_FADDR: 0x%08x_%08x\n",
+		   upper_32_bits(addr), lower_32_bits(addr));
+	if (INTEL_GEN(dev_priv) >= 4) {
+		drm_printf(m, "\tIPEIR: 0x%08x\n",
+			   I915_READ(RING_IPEIR(engine->mmio_base)));
+		drm_printf(m, "\tIPEHR: 0x%08x\n",
+			   I915_READ(RING_IPEHR(engine->mmio_base)));
+	} else {
+		drm_printf(m, "\tIPEIR: 0x%08x\n", I915_READ(IPEIR));
+		drm_printf(m, "\tIPEHR: 0x%08x\n", I915_READ(IPEHR));
+	}
 
 	if (HAS_EXECLISTS(dev_priv)) {
 		const u32 *hws = &engine->status_page.page_addr[I915_HWS_CSB_BUF0_INDEX];
-- 
2.15.1


* Re: [PATCH v3] drm/i915: Show IPEIR and IPEHR in the engine dump
  2017-12-18 12:17   ` [PATCH v3] " Chris Wilson
@ 2017-12-18 12:32     ` Tvrtko Ursulin
  2017-12-18 12:35       ` Chris Wilson
  0 siblings, 1 reply; 22+ messages in thread
From: Tvrtko Ursulin @ 2017-12-18 12:32 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 18/12/2017 12:17, Chris Wilson wrote:
> A useful bit of information for inspecting GPU stalls from
> intel_engine_dump() are the error registers, IPEIR and IPEHR.
> 
> v2: Fixup gen changes in register offsets (Tvrtko)
> v3: Old FADDR location as well
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/intel_engine_cs.c | 20 ++++++++++++++++++++
>   1 file changed, 20 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> index 510e0bc3a377..257b03a67e1c 100644
> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> @@ -1757,6 +1757,26 @@ void intel_engine_dump(struct intel_engine_cs *engine,
>   	addr = intel_engine_get_last_batch_head(engine);
>   	drm_printf(m, "\tBBADDR: 0x%08x_%08x\n",
>   		   upper_32_bits(addr), lower_32_bits(addr));
> +	if (INTEL_GEN(dev_priv) >= 4) {
> +		if (INTEL_GEN(dev_priv) >= 8) {
> +			addr = I915_READ(RING_DMA_FADD_UDW(engine->mmio_base));
> +			addr <<= 32;
> +		}
> +		addr = I915_READ(RING_DMA_FADD(engine->mmio_base));

|=, or better reverse order to avoid having to init addr.

Regards,

Tvrtko

> +	} else {
> +		addr = I915_READ(DMA_FADD_I8XX);
> +	}
> +	drm_printf(m, "\tDMA_FADDR: 0x%08x_%08x\n",
> +		   upper_32_bits(addr), lower_32_bits(addr));
> +	if (INTEL_GEN(dev_priv) >= 4) {
> +		drm_printf(m, "\tIPEIR: 0x%08x\n",
> +			   I915_READ(RING_IPEIR(engine->mmio_base)));
> +		drm_printf(m, "\tIPEHR: 0x%08x\n",
> +			   I915_READ(RING_IPEHR(engine->mmio_base)));
> +	} else {
> +		drm_printf(m, "\tIPEIR: 0x%08x\n", I915_READ(IPEIR));
> +		drm_printf(m, "\tIPEHR: 0x%08x\n", I915_READ(IPEHR));
> +	}
>   
>   	if (HAS_EXECLISTS(dev_priv)) {
>   		const u32 *hws = &engine->status_page.page_addr[I915_HWS_CSB_BUF0_INDEX];
> 

* Re: [PATCH v3] drm/i915: Show IPEIR and IPEHR in the engine dump
  2017-12-18 12:32     ` Tvrtko Ursulin
@ 2017-12-18 12:35       ` Chris Wilson
  0 siblings, 0 replies; 22+ messages in thread
From: Chris Wilson @ 2017-12-18 12:35 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2017-12-18 12:32:37)
> 
> On 18/12/2017 12:17, Chris Wilson wrote:
> > A useful bit of information for inspecting GPU stalls from
> > intel_engine_dump() are the error registers, IPEIR and IPEHR.
> > 
> > v2: Fixup gen changes in register offsets (Tvrtko)
> > v3: Old FADDR location as well
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > ---
> >   drivers/gpu/drm/i915/intel_engine_cs.c | 20 ++++++++++++++++++++
> >   1 file changed, 20 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> > index 510e0bc3a377..257b03a67e1c 100644
> > --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> > +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> > @@ -1757,6 +1757,26 @@ void intel_engine_dump(struct intel_engine_cs *engine,
> >       addr = intel_engine_get_last_batch_head(engine);
> >       drm_printf(m, "    BBADDR: 0x%08x_%08x\n",
> >                  upper_32_bits(addr), lower_32_bits(addr));
> > +     if (INTEL_GEN(dev_priv) >= 4) {
> > +             if (INTEL_GEN(dev_priv) >= 8) {
> > +                     addr = I915_READ(RING_DMA_FADD_UDW(engine->mmio_base));
> > +                     addr <<= 32;
> > +             }
> > +             addr = I915_READ(RING_DMA_FADD(engine->mmio_base));
> 
> |=, or better reverse order to avoid having to init addr.

 |= otherwise it's back to the ugly (u64) << 32;
Pick your poison. Or maybe if I started paying attention we wouldn't
need to be going round in so many circles.
-Chris
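
The problem with the v3 hunk discussed here is that the lower dword is
assigned with a plain '=' after the upper dword has been shifted into place,
which throws the upper half away. A minimal userspace sketch of the posted
ordering and of the two alternatives from the review follows. read_upper()
and read_lower() are hypothetical stand-ins for the I915_READ() calls, and
the values they return are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two MMIO reads; the values are made up. */
uint32_t read_upper(void) { return 0x00000012u; }
uint32_t read_lower(void) { return 0x34567000u; }

/* As posted in v3: the second assignment overwrites the shifted upper half. */
uint64_t faddr_v3_as_posted(void)
{
	uint64_t addr;

	addr = read_upper();
	addr <<= 32;
	addr = read_lower(); /* bug: upper half is lost here */
	return addr;
}

/* Alternative 1: keep the order, but OR the lower dword in. */
uint64_t faddr_or_lower(void)
{
	uint64_t addr;

	addr = read_upper();
	addr <<= 32;
	addr |= read_lower();
	return addr;
}

/* Alternative 2: read the lower dword first, then OR in the widened upper. */
uint64_t faddr_lower_first(void)
{
	uint64_t addr;

	addr = read_lower();
	addr |= (uint64_t)read_upper() << 32;
	return addr;
}

/* Sanity check that both alternatives agree and differ from the v3 result. */
void check_faddr_variants(void)
{
	assert(faddr_or_lower() == faddr_lower_first());
	assert(faddr_v3_as_posted() != faddr_or_lower());
}
```

With alternative 1, hardware that has no upper dword would skip the first
read, so addr needs an explicit initial value before the OR; otherwise it
would still hold whatever was last stored in it (in the hunk, the value
printed as BBADDR). Alternative 2 avoids that at the cost of the cast.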

* [PATCH v4] drm/i915: Show IPEIR and IPEHR in the engine dump
  2017-12-17 13:28 ` [PATCH 2/3] drm/i915: Show IPEIR and IPEHR in the engine dump Chris Wilson
                     ` (3 preceding siblings ...)
  2017-12-18 12:17   ` [PATCH v3] " Chris Wilson
@ 2017-12-18 12:39   ` Chris Wilson
  2017-12-18 12:58     ` Tvrtko Ursulin
  4 siblings, 1 reply; 22+ messages in thread
From: Chris Wilson @ 2017-12-18 12:39 UTC (permalink / raw)
  To: intel-gfx

A useful bit of information for inspecting GPU stalls from
intel_engine_dump() are the error registers, IPEIR and IPEHR.

v2: Fixup gen changes in register offsets (Tvrtko)
v3: Old FADDR location as well
v4: Use I915_READ64_2x32

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/intel_engine_cs.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 510e0bc3a377..b4807497e92d 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1757,6 +1757,24 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 	addr = intel_engine_get_last_batch_head(engine);
 	drm_printf(m, "\tBBADDR: 0x%08x_%08x\n",
 		   upper_32_bits(addr), lower_32_bits(addr));
+	if (INTEL_GEN(dev_priv) >= 8)
+		addr = I915_READ64_2x32(RING_DMA_FADD(engine->mmio_base),
+					RING_DMA_FADD_UDW(engine->mmio_base));
+	else if (INTEL_GEN(dev_priv) >= 4)
+		addr = I915_READ(RING_DMA_FADD(engine->mmio_base));
+	else
+		addr = I915_READ(DMA_FADD_I8XX);
+	drm_printf(m, "\tDMA_FADDR: 0x%08x_%08x\n",
+		   upper_32_bits(addr), lower_32_bits(addr));
+	if (INTEL_GEN(dev_priv) >= 4) {
+		drm_printf(m, "\tIPEIR: 0x%08x\n",
+			   I915_READ(RING_IPEIR(engine->mmio_base)));
+		drm_printf(m, "\tIPEHR: 0x%08x\n",
+			   I915_READ(RING_IPEHR(engine->mmio_base)));
+	} else {
+		drm_printf(m, "\tIPEIR: 0x%08x\n", I915_READ(IPEIR));
+		drm_printf(m, "\tIPEHR: 0x%08x\n", I915_READ(IPEHR));
+	}
 
 	if (HAS_EXECLISTS(dev_priv)) {
 		const u32 *hws = &engine->status_page.page_addr[I915_HWS_CSB_BUF0_INDEX];
-- 
2.15.1


* Re: [PATCH v4] drm/i915: Show IPEIR and IPEHR in the engine dump
  2017-12-18 12:39   ` [PATCH v4] " Chris Wilson
@ 2017-12-18 12:58     ` Tvrtko Ursulin
  2017-12-18 13:27       ` Chris Wilson
  0 siblings, 1 reply; 22+ messages in thread
From: Tvrtko Ursulin @ 2017-12-18 12:58 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 18/12/2017 12:39, Chris Wilson wrote:
> A useful bit of information for inspecting GPU stalls from
> intel_engine_dump() are the error registers, IPEIR and IPEHR.
> 
> v2: Fixup gen changes in register offsets (Tvrtko)
> v3: Old FADDR location as well
> v4: Use I915_READ64_2x32
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/intel_engine_cs.c | 18 ++++++++++++++++++
>   1 file changed, 18 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> index 510e0bc3a377..b4807497e92d 100644
> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> @@ -1757,6 +1757,24 @@ void intel_engine_dump(struct intel_engine_cs *engine,
>   	addr = intel_engine_get_last_batch_head(engine);
>   	drm_printf(m, "\tBBADDR: 0x%08x_%08x\n",
>   		   upper_32_bits(addr), lower_32_bits(addr));
> +	if (INTEL_GEN(dev_priv) >= 8)
> +		addr = I915_READ64_2x32(RING_DMA_FADD(engine->mmio_base),
> +					RING_DMA_FADD_UDW(engine->mmio_base));
> +	else if (INTEL_GEN(dev_priv) >= 4)
> +		addr = I915_READ(RING_DMA_FADD(engine->mmio_base));
> +	else
> +		addr = I915_READ(DMA_FADD_I8XX);
> +	drm_printf(m, "\tDMA_FADDR: 0x%08x_%08x\n",
> +		   upper_32_bits(addr), lower_32_bits(addr));
> +	if (INTEL_GEN(dev_priv) >= 4) {
> +		drm_printf(m, "\tIPEIR: 0x%08x\n",
> +			   I915_READ(RING_IPEIR(engine->mmio_base)));
> +		drm_printf(m, "\tIPEHR: 0x%08x\n",
> +			   I915_READ(RING_IPEHR(engine->mmio_base)));
> +	} else {
> +		drm_printf(m, "\tIPEIR: 0x%08x\n", I915_READ(IPEIR));
> +		drm_printf(m, "\tIPEHR: 0x%08x\n", I915_READ(IPEHR));
> +	}
>   
>   	if (HAS_EXECLISTS(dev_priv)) {
>   		const u32 *hws = &engine->status_page.page_addr[I915_HWS_CSB_BUF0_INDEX];
> 

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

* ✗ Fi.CI.BAT: failure for series starting with [1/3] drm/i915: Re-enable GGTT earlier after GPU reset (rev4)
  2017-12-17 13:28 [PATCH 1/3] drm/i915: Re-enable GGTT earlier after GPU reset Chris Wilson
                   ` (5 preceding siblings ...)
  2017-12-18 11:11 ` Tvrtko Ursulin
@ 2017-12-18 13:13 ` Patchwork
  6 siblings, 0 replies; 22+ messages in thread
From: Patchwork @ 2017-12-18 13:13 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/3] drm/i915: Re-enable GGTT earlier after GPU reset (rev4)
URL   : https://patchwork.freedesktop.org/series/35471/
State : failure

== Summary ==

Series 35471v4 series starting with [1/3] drm/i915: Re-enable GGTT earlier after GPU reset
https://patchwork.freedesktop.org/api/1.0/series/35471/revisions/4/mbox/

Test kms_pipe_crc_basic:
        Subgroup read-crc-pipe-b:
                pass       -> FAIL       (fi-skl-6700k2)
        Subgroup suspend-read-crc-pipe-a:
                dmesg-warn -> PASS       (fi-kbl-r) fdo#104172 +1
        Subgroup suspend-read-crc-pipe-b:
                pass       -> INCOMPLETE (fi-snb-2520m) fdo#103713
Test kms_psr_sink_crc:
        Subgroup psr_basic:
                dmesg-warn -> PASS       (fi-skl-6700hq) fdo#101144

fdo#104172 https://bugs.freedesktop.org/show_bug.cgi?id=104172
fdo#103713 https://bugs.freedesktop.org/show_bug.cgi?id=103713
fdo#101144 https://bugs.freedesktop.org/show_bug.cgi?id=101144

fi-bdw-5557u     total:288  pass:267  dwarn:0   dfail:0   fail:0   skip:21  time:433s
fi-bdw-gvtdvm    total:288  pass:264  dwarn:0   dfail:0   fail:0   skip:24  time:439s
fi-blb-e6850     total:288  pass:223  dwarn:1   dfail:0   fail:0   skip:64  time:381s
fi-bsw-n3050     total:288  pass:242  dwarn:0   dfail:0   fail:0   skip:46  time:489s
fi-bwr-2160      total:288  pass:183  dwarn:0   dfail:0   fail:0   skip:105 time:276s
fi-bxt-dsi       total:288  pass:258  dwarn:0   dfail:0   fail:0   skip:30  time:496s
fi-bxt-j4205     total:288  pass:259  dwarn:0   dfail:0   fail:0   skip:29  time:498s
fi-byt-j1900     total:288  pass:253  dwarn:0   dfail:0   fail:0   skip:35  time:475s
fi-byt-n2820     total:288  pass:249  dwarn:0   dfail:0   fail:0   skip:39  time:460s
fi-elk-e7500     total:224  pass:163  dwarn:15  dfail:0   fail:0   skip:45 
fi-gdg-551       total:288  pass:179  dwarn:1   dfail:0   fail:0   skip:108 time:262s
fi-glk-1         total:288  pass:260  dwarn:0   dfail:0   fail:0   skip:28  time:529s
fi-hsw-4770      total:288  pass:261  dwarn:0   dfail:0   fail:0   skip:27  time:405s
fi-hsw-4770r     total:288  pass:261  dwarn:0   dfail:0   fail:0   skip:27  time:411s
fi-ilk-650       total:288  pass:228  dwarn:0   dfail:0   fail:0   skip:60  time:390s
fi-ivb-3520m     total:288  pass:259  dwarn:0   dfail:0   fail:0   skip:29  time:469s
fi-ivb-3770      total:288  pass:255  dwarn:0   dfail:0   fail:0   skip:33  time:427s
fi-kbl-7500u     total:288  pass:263  dwarn:1   dfail:0   fail:0   skip:24  time:480s
fi-kbl-7560u     total:288  pass:268  dwarn:1   dfail:0   fail:0   skip:19  time:520s
fi-kbl-7567u     total:288  pass:268  dwarn:0   dfail:0   fail:0   skip:20  time:471s
fi-kbl-r         total:288  pass:260  dwarn:1   dfail:0   fail:0   skip:27  time:519s
fi-pnv-d510      total:288  pass:222  dwarn:1   dfail:0   fail:0   skip:65  time:595s
fi-skl-6260u     total:288  pass:268  dwarn:0   dfail:0   fail:0   skip:20  time:443s
fi-skl-6600u     total:288  pass:260  dwarn:1   dfail:0   fail:0   skip:27  time:525s
fi-skl-6700hq    total:288  pass:262  dwarn:0   dfail:0   fail:0   skip:26  time:558s
fi-skl-6700k2    total:288  pass:263  dwarn:0   dfail:0   fail:1   skip:24  time:505s
fi-skl-6770hq    total:288  pass:268  dwarn:0   dfail:0   fail:0   skip:20  time:495s
fi-skl-gvtdvm    total:288  pass:265  dwarn:0   dfail:0   fail:0   skip:23  time:441s
fi-snb-2520m     total:245  pass:211  dwarn:0   dfail:0   fail:0   skip:33 
fi-snb-2600      total:288  pass:248  dwarn:0   dfail:0   fail:0   skip:40  time:415s
Blacklisted hosts:
fi-cfl-s2        total:288  pass:262  dwarn:0   dfail:0   fail:0   skip:26  time:591s
fi-glk-dsi       total:288  pass:256  dwarn:0   dfail:0   fail:2   skip:30  time:496s

bf5cdf9e055a88559a6fc707b6e89e88077a2124 drm-tip: 2017y-12m-18d-11h-53m-39s UTC integration manifest
d60eefb6e585 drm/i915/selftests: Fix up igt_reset_engine
46fc9d5cbe36 drm/i915: Show IPEIR and IPEHR in the engine dump

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_7525/issues.html

* Re: [PATCH v4] drm/i915: Show IPEIR and IPEHR in the engine dump
  2017-12-18 12:58     ` Tvrtko Ursulin
@ 2017-12-18 13:27       ` Chris Wilson
  0 siblings, 0 replies; 22+ messages in thread
From: Chris Wilson @ 2017-12-18 13:27 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2017-12-18 12:58:51)
> 
> On 18/12/2017 12:39, Chris Wilson wrote:
> > A useful bit of information for inspecting GPU stalls from
> > intel_engine_dump() are the error registers, IPEIR and IPEHR.
> > 
> > v2: Fixup gen changes in register offsets (Tvrtko)
> > v3: Old FADDR location as well
> > v4: Use I915_READ64_2x32
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > ---
> >   drivers/gpu/drm/i915/intel_engine_cs.c | 18 ++++++++++++++++++
> >   1 file changed, 18 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> > index 510e0bc3a377..b4807497e92d 100644
> > --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> > +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> > @@ -1757,6 +1757,24 @@ void intel_engine_dump(struct intel_engine_cs *engine,
> >       addr = intel_engine_get_last_batch_head(engine);
> >       drm_printf(m, "    BBADDR: 0x%08x_%08x\n",
> >                  upper_32_bits(addr), lower_32_bits(addr));
> > +     if (INTEL_GEN(dev_priv) >= 8)
> > +             addr = I915_READ64_2x32(RING_DMA_FADD(engine->mmio_base),
> > +                                     RING_DMA_FADD_UDW(engine->mmio_base));
> > +     else if (INTEL_GEN(dev_priv) >= 4)
> > +             addr = I915_READ(RING_DMA_FADD(engine->mmio_base));
> > +     else
> > +             addr = I915_READ(DMA_FADD_I8XX);
> > +     drm_printf(m, "    DMA_FADDR: 0x%08x_%08x\n",
> > +                upper_32_bits(addr), lower_32_bits(addr));
> > +     if (INTEL_GEN(dev_priv) >= 4) {
> > +             drm_printf(m, "    IPEIR: 0x%08x\n",
> > +                        I915_READ(RING_IPEIR(engine->mmio_base)));
> > +             drm_printf(m, "    IPEHR: 0x%08x\n",
> > +                        I915_READ(RING_IPEHR(engine->mmio_base)));
> > +     } else {
> > +             drm_printf(m, "    IPEIR: 0x%08x\n", I915_READ(IPEIR));
> > +             drm_printf(m, "    IPEHR: 0x%08x\n", I915_READ(IPEHR));
> > +     }
> >   
> >       if (HAS_EXECLISTS(dev_priv)) {
> >               const u32 *hws = &engine->status_page.page_addr[I915_HWS_CSB_BUF0_INDEX];
> > 
> 
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Thanks for the review, lots of fixes in such a small patch.
Pushed, so just the selftest for per-engine resets remaining, which I
hope Michel will pick up later.
-Chris
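
For readers following the v4 note ("Use I915_READ64_2x32"), the idea of
reading a 64-bit address as two 32-bit halves can be sketched in plain
userspace C. This is only an illustration under stated assumptions, not the
kernel macro: fake_reg, read_lower(), read_upper(), read64_2x32() and
format_addr() are hypothetical names, and the "register" is a variable that
advances on every lower-half read to mimic hardware moving between accesses.
The upper half is re-read after the lower half, and the pair is retried a
bounded number of times if the upper half changed in between.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a 64-bit MMIO register split into two halves. */
static uint64_t fake_reg = 0x00000001fffffffeull;

static uint32_t read_lower(void)
{
	uint32_t v = (uint32_t)fake_reg;

	fake_reg += 4; /* the "hardware" keeps advancing between reads */
	return v;
}

static uint32_t read_upper(void)
{
	return (uint32_t)(fake_reg >> 32);
}

/* Read both halves; retry (bounded) if the upper half changed meanwhile. */
static uint64_t read64_2x32(void)
{
	uint32_t upper, lower, old_upper;
	int loop = 0;

	upper = read_upper();
	do {
		old_upper = upper;
		lower = read_lower();
		upper = read_upper();
	} while (upper != old_upper && loop++ < 2);

	return (uint64_t)upper << 32 | lower;
}

/* Format in the same "0x%08x_%08x" style the engine dump uses. */
static int format_addr(char *buf, size_t len, uint64_t addr)
{
	return snprintf(buf, len, "0x%08x_%08x",
			(unsigned int)(addr >> 32),
			(unsigned int)(addr & 0xffffffffu));
}
```

With the starting value above, the first attempt sees the lower half wrap
while the upper half changes from 1 to 2, so the loop retries once and
returns a consistent pair instead of a torn value such as
0x00000002_fffffffe.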

* Re: [PATCH 3/3] drm/i915/selftests: Fix up igt_reset_engine
  2017-12-17 13:28 ` [PATCH 3/3] drm/i915/selftests: Fix up igt_reset_engine Chris Wilson
@ 2017-12-18 21:50   ` Michel Thierry
  2017-12-18 21:54     ` Chris Wilson
  0 siblings, 1 reply; 22+ messages in thread
From: Michel Thierry @ 2017-12-18 21:50 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 17/12/17 05:28, Chris Wilson wrote:
> Now that we skip a per-engine reset on an idle engine, we need to update
> the selftest to take that into account. In the process, we find that we
> were not stressing the per-engine reset very hard, so add those missing
> active resets.
> 
> v2: Actually test i915_reset_engine() by loading it with requests.
> 
> Fixes: f6ba181ada55 ("drm/i915: Skip an engine reset if it recovered before our preparations")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Michel Thierry <michel.thierry@intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>


Reviewed-by: Michel Thierry <michel.thierry@intel.com>

And all these subtests passed with and without GuC in SKL.

> ---
>   drivers/gpu/drm/i915/selftests/intel_hangcheck.c | 314 ++++++++++++++++++-----
>   1 file changed, 250 insertions(+), 64 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> index f98546b8a7fa..c8a756e2139f 100644
> --- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> +++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> @@ -132,6 +132,12 @@ static int emit_recurse_batch(struct hang *h,
>   		*batch++ = lower_32_bits(hws_address(hws, rq));
>   		*batch++ = upper_32_bits(hws_address(hws, rq));
>   		*batch++ = rq->fence.seqno;
> +		*batch++ = MI_ARB_CHECK;
> +
> +		memset(batch, 0, 1024);
> +		batch += 1024 / sizeof(*batch);
> +
> +		*batch++ = MI_ARB_CHECK;
>   		*batch++ = MI_BATCH_BUFFER_START | 1 << 8 | 1;
>   		*batch++ = lower_32_bits(vma->node.start);
>   		*batch++ = upper_32_bits(vma->node.start);
> @@ -140,6 +146,12 @@ static int emit_recurse_batch(struct hang *h,
>   		*batch++ = 0;
>   		*batch++ = lower_32_bits(hws_address(hws, rq));
>   		*batch++ = rq->fence.seqno;
> +		*batch++ = MI_ARB_CHECK;
> +
> +		memset(batch, 0, 1024);
> +		batch += 1024 / sizeof(*batch);
> +
> +		*batch++ = MI_ARB_CHECK;
>   		*batch++ = MI_BATCH_BUFFER_START | 1 << 8;
>   		*batch++ = lower_32_bits(vma->node.start);
>   	} else if (INTEL_GEN(i915) >= 4) {
> @@ -147,12 +159,24 @@ static int emit_recurse_batch(struct hang *h,
>   		*batch++ = 0;
>   		*batch++ = lower_32_bits(hws_address(hws, rq));
>   		*batch++ = rq->fence.seqno;
> +		*batch++ = MI_ARB_CHECK;
> +
> +		memset(batch, 0, 1024);
> +		batch += 1024 / sizeof(*batch);
> +
> +		*batch++ = MI_ARB_CHECK;
>   		*batch++ = MI_BATCH_BUFFER_START | 2 << 6;
>   		*batch++ = lower_32_bits(vma->node.start);
>   	} else {
>   		*batch++ = MI_STORE_DWORD_IMM;
>   		*batch++ = lower_32_bits(hws_address(hws, rq));
>   		*batch++ = rq->fence.seqno;
> +		*batch++ = MI_ARB_CHECK;
> +
> +		memset(batch, 0, 1024);
> +		batch += 1024 / sizeof(*batch);
> +
> +		*batch++ = MI_ARB_CHECK;
>   		*batch++ = MI_BATCH_BUFFER_START | 2 << 6 | 1;
>   		*batch++ = lower_32_bits(vma->node.start);
>   	}
> @@ -234,6 +258,16 @@ static void hang_fini(struct hang *h)
>   	i915_gem_wait_for_idle(h->i915, I915_WAIT_LOCKED);
>   }
>   
> +static bool wait_for_hang(struct hang *h, struct drm_i915_gem_request *rq)
> +{
> +	return !(wait_for_us(i915_seqno_passed(hws_seqno(h, rq),
> +					       rq->fence.seqno),
> +			     10) &&
> +		 wait_for(i915_seqno_passed(hws_seqno(h, rq),
> +					    rq->fence.seqno),
> +			  1000));
> +}
> +
>   static int igt_hang_sanitycheck(void *arg)
>   {
>   	struct drm_i915_private *i915 = arg;
> @@ -296,6 +330,9 @@ static void global_reset_lock(struct drm_i915_private *i915)
>   	struct intel_engine_cs *engine;
>   	enum intel_engine_id id;
>   
> +	pr_debug("%s: current gpu_error=%08lx\n",
> +		 __func__, i915->gpu_error.flags);
> +
>   	while (test_and_set_bit(I915_RESET_BACKOFF, &i915->gpu_error.flags))
>   		wait_event(i915->gpu_error.reset_queue,
>   			   !test_bit(I915_RESET_BACKOFF,
> @@ -353,54 +390,127 @@ static int igt_global_reset(void *arg)
>   	return err;
>   }
>   
> -static int igt_reset_engine(void *arg)
> +static int __igt_reset_engine(struct drm_i915_private *i915, bool active)
>   {
> -	struct drm_i915_private *i915 = arg;
>   	struct intel_engine_cs *engine;
>   	enum intel_engine_id id;
> -	unsigned int reset_count, reset_engine_count;
> +	struct hang h;
>   	int err = 0;
>   
> -	/* Check that we can issue a global GPU and engine reset */
> +	/* Check that we can issue an engine reset on an idle engine (no-op) */
>   
>   	if (!intel_has_reset_engine(i915))
>   		return 0;
>   
> +	if (active) {
> +		mutex_lock(&i915->drm.struct_mutex);
> +		err = hang_init(&h, i915);
> +		mutex_unlock(&i915->drm.struct_mutex);
> +		if (err)
> +			return err;
> +	}
> +
>   	for_each_engine(engine, i915, id) {
> -		set_bit(I915_RESET_ENGINE + engine->id, &i915->gpu_error.flags);
> +		unsigned int reset_count, reset_engine_count;
> +		IGT_TIMEOUT(end_time);
> +
> +		if (active && !intel_engine_can_store_dword(engine))
> +			continue;
> +
>   		reset_count = i915_reset_count(&i915->gpu_error);
>   		reset_engine_count = i915_reset_engine_count(&i915->gpu_error,
>   							     engine);
>   
> -		err = i915_reset_engine(engine, I915_RESET_QUIET);
> -		if (err) {
> -			pr_err("i915_reset_engine failed\n");
> -			break;
> -		}
> +		set_bit(I915_RESET_ENGINE + id, &i915->gpu_error.flags);
> +		do {
> +			if (active) {
> +				struct drm_i915_gem_request *rq;
> +
> +				mutex_lock(&i915->drm.struct_mutex);
> +				rq = hang_create_request(&h, engine,
> +							 i915->kernel_context);
> +				if (IS_ERR(rq)) {
> +					err = PTR_ERR(rq);
> +					break;
> +				}
> +
> +				i915_gem_request_get(rq);
> +				__i915_add_request(rq, true);
> +				mutex_unlock(&i915->drm.struct_mutex);
> +
> +				if (!wait_for_hang(&h, rq)) {
> +					struct drm_printer p = drm_info_printer(i915->drm.dev);
> +
> +					pr_err("%s: Failed to start request %x, at %x\n",
> +					       __func__, rq->fence.seqno, hws_seqno(&h, rq));
> +					intel_engine_dump(engine, &p,
> +							  "%s\n", engine->name);
> +
> +					i915_gem_request_put(rq);
> +					err = -EIO;
> +					break;
> +				}
>   
> -		if (i915_reset_count(&i915->gpu_error) != reset_count) {
> -			pr_err("Full GPU reset recorded! (engine reset expected)\n");
> -			err = -EINVAL;
> -			break;
> -		}
> +				i915_gem_request_put(rq);
> +			}
> +
> +			engine->hangcheck.stalled = true;
> +			engine->hangcheck.seqno =
> +				intel_engine_get_seqno(engine);
> +
> +			err = i915_reset_engine(engine, I915_RESET_QUIET);
> +			if (err) {
> +				pr_err("i915_reset_engine failed\n");
> +				break;
> +			}
> +
> +			if (i915_reset_count(&i915->gpu_error) != reset_count) {
> +				pr_err("Full GPU reset recorded! (engine reset expected)\n");
> +				err = -EINVAL;
> +				break;
> +			}
> +
> +			reset_engine_count += active;
> +			if (i915_reset_engine_count(&i915->gpu_error, engine) !=
> +			    reset_engine_count) {
> +				pr_err("%s engine reset %srecorded!\n",
> +				       engine->name, active ? "not " : "");
> +				err = -EINVAL;
> +				break;
> +			}
> +
> +			engine->hangcheck.stalled = false;
> +		} while (time_before(jiffies, end_time));
> +		clear_bit(I915_RESET_ENGINE + id, &i915->gpu_error.flags);
>   
> -		if (i915_reset_engine_count(&i915->gpu_error, engine) ==
> -		    reset_engine_count) {
> -			pr_err("No %s engine reset recorded!\n", engine->name);
> -			err = -EINVAL;
> +		if (err)
>   			break;
> -		}
>   
> -		clear_bit(I915_RESET_ENGINE + engine->id,
> -			  &i915->gpu_error.flags);
> +		cond_resched();
>   	}
>   
>   	if (i915_terminally_wedged(&i915->gpu_error))
>   		err = -EIO;
>   
> +	if (active) {
> +		mutex_lock(&i915->drm.struct_mutex);
> +		hang_fini(&h);
> +		mutex_unlock(&i915->drm.struct_mutex);
> +	}
> +
>   	return err;
>   }
>   
> +static int igt_reset_idle_engine(void *arg)
> +{
> +	return __igt_reset_engine(arg, false);
> +}
> +
> +static int igt_reset_active_engine(void *arg)
> +{
> +	return __igt_reset_engine(arg, true);
> +}
> +
>   static int active_engine(void *data)
>   {
>   	struct intel_engine_cs *engine = data;
> @@ -462,11 +572,12 @@ static int active_engine(void *data)
>   	return err;
>   }
>   
> -static int igt_reset_active_engines(void *arg)
> +static int __igt_reset_engine_others(struct drm_i915_private *i915,
> +				     bool active)
>   {
> -	struct drm_i915_private *i915 = arg;
> -	struct intel_engine_cs *engine, *active;
> +	struct intel_engine_cs *engine, *other;
>   	enum intel_engine_id id, tmp;
> +	struct hang h;
>   	int err = 0;
>   
>   	/* Check that issuing a reset on one engine does not interfere
> @@ -476,24 +587,36 @@ static int igt_reset_active_engines(void *arg)
>   	if (!intel_has_reset_engine(i915))
>   		return 0;
>   
> +	if (active) {
> +		mutex_lock(&i915->drm.struct_mutex);
> +		err = hang_init(&h, i915);
> +		mutex_unlock(&i915->drm.struct_mutex);
> +		if (err)
> +			return err;
> +	}
> +
>   	for_each_engine(engine, i915, id) {
> -		struct task_struct *threads[I915_NUM_ENGINES];
> +		struct task_struct *threads[I915_NUM_ENGINES] = {};
>   		unsigned long resets[I915_NUM_ENGINES];
>   		unsigned long global = i915_reset_count(&i915->gpu_error);
> +		unsigned long count = 0;
>   		IGT_TIMEOUT(end_time);
>   
> +		if (active && !intel_engine_can_store_dword(engine))
> +			continue;
> +
>   		memset(threads, 0, sizeof(threads));
> -		for_each_engine(active, i915, tmp) {
> +		for_each_engine(other, i915, tmp) {
>   			struct task_struct *tsk;
>   
> -			if (active == engine)
> -				continue;
> -
>   			resets[tmp] = i915_reset_engine_count(&i915->gpu_error,
> -							      active);
> +							      other);
>   
> -			tsk = kthread_run(active_engine, active,
> -					  "igt/%s", active->name);
> +			if (other == engine)
> +				continue;
> +
> +			tsk = kthread_run(active_engine, other,
> +					  "igt/%s", other->name);
>   			if (IS_ERR(tsk)) {
>   				err = PTR_ERR(tsk);
>   				goto unwind;
> @@ -503,20 +626,70 @@ static int igt_reset_active_engines(void *arg)
>   			get_task_struct(tsk);
>   		}
>   
> -		set_bit(I915_RESET_ENGINE + engine->id, &i915->gpu_error.flags);
> +		set_bit(I915_RESET_ENGINE + id, &i915->gpu_error.flags);
>   		do {
> +			if (active) {
> +				struct drm_i915_gem_request *rq;
> +
> +				mutex_lock(&i915->drm.struct_mutex);
> +				rq = hang_create_request(&h, engine,
> +							 i915->kernel_context);
> +				if (IS_ERR(rq)) {
> +					err = PTR_ERR(rq);
> +					mutex_unlock(&i915->drm.struct_mutex);
> +					break;
> +				}
> +
> +				i915_gem_request_get(rq);
> +				__i915_add_request(rq, true);
> +				mutex_unlock(&i915->drm.struct_mutex);
> +
> +				if (!wait_for_hang(&h, rq)) {
> +					struct drm_printer p = drm_info_printer(i915->drm.dev);
> +
> +					pr_err("%s: Failed to start request %x, at %x\n",
> +					       __func__, rq->fence.seqno, hws_seqno(&h, rq));
> +					intel_engine_dump(engine, &p,
> +							  "%s\n", engine->name);
> +
> +					i915_gem_request_put(rq);
> +					err = -EIO;
> +					break;
> +				}
> +
> +				i915_gem_request_put(rq);
> +			}
> +
> +			engine->hangcheck.stalled = true;
> +			engine->hangcheck.seqno =
> +				intel_engine_get_seqno(engine);
> +
>   			err = i915_reset_engine(engine, I915_RESET_QUIET);
>   			if (err) {
> -				pr_err("i915_reset_engine(%s) failed, err=%d\n",
> -				       engine->name, err);
> +				pr_err("i915_reset_engine(%s:%s) failed, err=%d\n",
> +				       engine->name, active ? "active" : "idle", err);
>   				break;
>   			}
> +
> +			engine->hangcheck.stalled = false;
> +			count++;
>   		} while (time_before(jiffies, end_time));
> -		clear_bit(I915_RESET_ENGINE + engine->id,
> -			  &i915->gpu_error.flags);
> +		clear_bit(I915_RESET_ENGINE + id, &i915->gpu_error.flags);
> +		pr_info("i915_reset_engine(%s:%s): %lu resets\n",
> +			engine->name, active ? "active" : "idle", count);
> +
> +		if (i915_reset_engine_count(&i915->gpu_error, engine) -
> +		    resets[engine->id] != (active ? count : 0)) {
> +			pr_err("i915_reset_engine(%s:%s): reset %lu times, but reported %lu\n",
> +			       engine->name, active ? "active" : "idle", count,
> +			       i915_reset_engine_count(&i915->gpu_error,
> +						       engine) - resets[engine->id]);
> +			if (!err)
> +				err = -EINVAL;
> +		}
>   
>   unwind:
> -		for_each_engine(active, i915, tmp) {
> +		for_each_engine(other, i915, tmp) {
>   			int ret;
>   
>   			if (!threads[tmp])
> @@ -524,27 +697,29 @@ static int igt_reset_active_engines(void *arg)
>   
>   			ret = kthread_stop(threads[tmp]);
>   			if (ret) {
> -				pr_err("kthread for active engine %s failed, err=%d\n",
> -				       active->name, ret);
> +				pr_err("kthread for other engine %s failed, err=%d\n",
> +				       other->name, ret);
>   				if (!err)
>   					err = ret;
>   			}
>   			put_task_struct(threads[tmp]);
>   
>   			if (resets[tmp] != i915_reset_engine_count(&i915->gpu_error,
> -								   active)) {
> +								   other)) {
>   				pr_err("Innocent engine %s was reset (count=%ld)\n",
> -				       active->name,
> +				       other->name,
>   				       i915_reset_engine_count(&i915->gpu_error,
> -							       active) - resets[tmp]);
> -				err = -EIO;
> +							       other) - resets[tmp]);
> +				if (!err)
> +					err = -EINVAL;
>   			}
>   		}
>   
>   		if (global != i915_reset_count(&i915->gpu_error)) {
>   			pr_err("Global reset (count=%ld)!\n",
>   			       i915_reset_count(&i915->gpu_error) - global);
> -			err = -EIO;
> +			if (!err)
> +				err = -EINVAL;
>   		}
>   
>   		if (err)
> @@ -556,9 +731,25 @@ static int igt_reset_active_engines(void *arg)
>   	if (i915_terminally_wedged(&i915->gpu_error))
>   		err = -EIO;
>   
> +	if (active) {
> +		mutex_lock(&i915->drm.struct_mutex);
> +		hang_fini(&h);
> +		mutex_unlock(&i915->drm.struct_mutex);
> +	}
> +
>   	return err;
>   }
>   
> +static int igt_reset_idle_engine_others(void *arg)
> +{
> +	return __igt_reset_engine_others(arg, false);
> +}
> +
> +static int igt_reset_active_engine_others(void *arg)
> +{
> +	return __igt_reset_engine_others(arg, true);
> +}
> +
>   static u32 fake_hangcheck(struct drm_i915_gem_request *rq)
>   {
>   	u32 reset_count;
> @@ -574,16 +765,6 @@ static u32 fake_hangcheck(struct drm_i915_gem_request *rq)
>   	return reset_count;
>   }
>   
> -static bool wait_for_hang(struct hang *h, struct drm_i915_gem_request *rq)
> -{
> -	return !(wait_for_us(i915_seqno_passed(hws_seqno(h, rq),
> -					       rq->fence.seqno),
> -			     10) &&
> -		 wait_for(i915_seqno_passed(hws_seqno(h, rq),
> -					    rq->fence.seqno),
> -			  1000));
> -}
> -
>   static int igt_wait_reset(void *arg)
>   {
>   	struct drm_i915_private *i915 = arg;
> @@ -617,8 +798,8 @@ static int igt_wait_reset(void *arg)
>   	if (!wait_for_hang(&h, rq)) {
>   		struct drm_printer p = drm_info_printer(i915->drm.dev);
>   
> -		pr_err("Failed to start request %x, at %x\n",
> -		       rq->fence.seqno, hws_seqno(&h, rq));
> +		pr_err("%s: Failed to start request %x, at %x\n",
> +		       __func__, rq->fence.seqno, hws_seqno(&h, rq));
>   		intel_engine_dump(rq->engine, &p, "%s\n", rq->engine->name);
>   
>   		i915_reset(i915, 0);
> @@ -712,8 +893,8 @@ static int igt_reset_queue(void *arg)
>   			if (!wait_for_hang(&h, prev)) {
>   				struct drm_printer p = drm_info_printer(i915->drm.dev);
>   
> -				pr_err("Failed to start request %x, at %x\n",
> -				       prev->fence.seqno, hws_seqno(&h, prev));
> +				pr_err("%s: Failed to start request %x, at %x\n",
> +				       __func__, prev->fence.seqno, hws_seqno(&h, prev));
>   				intel_engine_dump(prev->engine, &p,
>   						  "%s\n", prev->engine->name);
>   
> @@ -819,8 +1000,8 @@ static int igt_handle_error(void *arg)
>   	if (!wait_for_hang(&h, rq)) {
>   		struct drm_printer p = drm_info_printer(i915->drm.dev);
>   
> -		pr_err("Failed to start request %x, at %x\n",
> -		       rq->fence.seqno, hws_seqno(&h, rq));
> +		pr_err("%s: Failed to start request %x, at %x\n",
> +		       __func__, rq->fence.seqno, hws_seqno(&h, rq));
>   		intel_engine_dump(rq->engine, &p, "%s\n", rq->engine->name);
>   
>   		i915_reset(i915, 0);
> @@ -864,21 +1045,26 @@ int intel_hangcheck_live_selftests(struct drm_i915_private *i915)
>   	static const struct i915_subtest tests[] = {
>   		SUBTEST(igt_global_reset), /* attempt to recover GPU first */
>   		SUBTEST(igt_hang_sanitycheck),
> -		SUBTEST(igt_reset_engine),
> -		SUBTEST(igt_reset_active_engines),
> +		SUBTEST(igt_reset_idle_engine),
> +		SUBTEST(igt_reset_active_engine),
> +		SUBTEST(igt_reset_idle_engine_others),
> +		SUBTEST(igt_reset_active_engine_others),
>   		SUBTEST(igt_wait_reset),
>   		SUBTEST(igt_reset_queue),
>   		SUBTEST(igt_handle_error),
>   	};
> +	bool saved_hangcheck;
>   	int err;
>   
>   	if (!intel_has_gpu_reset(i915))
>   		return 0;
>   
>   	intel_runtime_pm_get(i915);
> +	saved_hangcheck = fetch_and_zero(&i915_modparams.enable_hangcheck);
>   
>   	err = i915_subtests(tests, i915);
>   
> +	i915_modparams.enable_hangcheck = saved_hangcheck;
>   	intel_runtime_pm_put(i915);
>   
>   	return err;
> 

* Re: [PATCH 3/3] drm/i915/selftests: Fix up igt_reset_engine
  2017-12-18 21:50   ` Michel Thierry
@ 2017-12-18 21:54     ` Chris Wilson
  0 siblings, 0 replies; 22+ messages in thread
From: Chris Wilson @ 2017-12-18 21:54 UTC (permalink / raw)
  To: Michel Thierry, intel-gfx

Quoting Michel Thierry (2017-12-18 21:50:17)
> On 17/12/17 05:28, Chris Wilson wrote:
> > Now that we skip a per-engine reset on an idle engine, we need to update
> > the selftest to take that into account. In the process, we find that we
> > were not stressing the per-engine reset very hard, so add those missing
> > active resets.
> > 
> > v2: Actually test i915_reset_engine() by loading it with requests.
> > 
> > Fixes: f6ba181ada55 ("drm/i915: Skip an engine reset if it recovered before our preparations")
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Michel Thierry <michel.thierry@intel.com>
> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> 
> 
> Reviewed-by: Michel Thierry <michel.thierry@intel.com>

I could have put more effort into making it one function with a couple
of parameters (idle/active engine-reset; idle/active other engines), but
honestly I was just happy to put something together that worked!
 
> And all these subtests passed with and without GuC in SKL.

Happy and sad, I was hoping to break something! :)

Thanks, pushed for a quieter CI.
-Chris

end of thread, other threads:[~2017-12-18 21:54 UTC | newest]

Thread overview: 22+ messages
2017-12-17 13:28 [PATCH 1/3] drm/i915: Re-enable GGTT earlier after GPU reset Chris Wilson
2017-12-17 13:28 ` [PATCH 2/3] drm/i915: Show IPEIR and IPEHR in the engine dump Chris Wilson
2017-12-18 11:14   ` Tvrtko Ursulin
2017-12-18 11:18     ` Chris Wilson
2017-12-18 11:14   ` Chris Wilson
2017-12-18 11:26   ` [PATCH v2] " Chris Wilson
2017-12-18 12:08     ` Tvrtko Ursulin
2017-12-18 12:17   ` [PATCH v3] " Chris Wilson
2017-12-18 12:32     ` Tvrtko Ursulin
2017-12-18 12:35       ` Chris Wilson
2017-12-18 12:39   ` [PATCH v4] " Chris Wilson
2017-12-18 12:58     ` Tvrtko Ursulin
2017-12-18 13:27       ` Chris Wilson
2017-12-17 13:28 ` [PATCH 3/3] drm/i915/selftests: Fix up igt_reset_engine Chris Wilson
2017-12-18 21:50   ` Michel Thierry
2017-12-18 21:54     ` Chris Wilson
2017-12-17 14:07 ` ✓ Fi.CI.BAT: success for series starting with [1/3] drm/i915: Re-enable GGTT earlier after GPU reset Patchwork
2017-12-17 15:36 ` ✗ Fi.CI.IGT: warning " Patchwork
2017-12-17 18:19 ` [PATCH 1/3] " Chris Wilson
2017-12-18 11:11 ` Tvrtko Ursulin
2017-12-18 11:19   ` Chris Wilson
2017-12-18 13:13 ` ✗ Fi.CI.BAT: failure for series starting with [1/3] drm/i915: Re-enable GGTT earlier after GPU reset (rev4) Patchwork
