All of lore.kernel.org
 help / color / mirror / Atom feed
* [CI 1/4] drm/i915/gt: Try to more gracefully quiesce the system before resets
@ 2019-10-23 12:21 ` Chris Wilson
  0 siblings, 0 replies; 22+ messages in thread
From: Chris Wilson @ 2019-10-23 12:21 UTC (permalink / raw)
  To: intel-gfx

If we are doing a normal GPU reset triggered after detecting a long
period of stalled work, we can take our time and allow the engines to
quiesce. Since we've stopped submission to the engine, and if we wait
long enough an innocent context should complete, leaving the engine idle.
So by waiting a short amount of time, we should prevent clobbering other
users when resetting a stuck context.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/Kconfig.profile         | 11 +++++++++++
 drivers/gpu/drm/i915/gt/intel_engine_cs.c    | 20 +++++++++++++++++++-
 drivers/gpu/drm/i915/gt/intel_engine_types.h |  4 ++++
 3 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/Kconfig.profile b/drivers/gpu/drm/i915/Kconfig.profile
index 48df8889a88a..97f01bfeda41 100644
--- a/drivers/gpu/drm/i915/Kconfig.profile
+++ b/drivers/gpu/drm/i915/Kconfig.profile
@@ -25,3 +25,14 @@ config DRM_I915_SPIN_REQUEST
 	  May be 0 to disable the initial spin. In practice, we estimate
 	  the cost of enabling the interrupt (if currently disabled) to be
 	  a few microseconds.
+
+config DRM_I915_STOP_TIMEOUT
+	int "How long to wait for an engine to quiesce gracefully before reset (ms)"
+	default 100 # milliseconds
+	help
+	  By stopping submission and sleeping for a short time before resetting
+	  the GPU, we allow the innocent contexts also on the system to quiesce.
+	  It is then less likely for a hanging context to cause collateral
+	  damage as the system is reset in order to recover. The colorary is
+	  that the reset itself may take longer and so be more disruptive to
+	  interactive or low latency workloads.
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 0e20713603ec..499b775f7231 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -308,6 +308,9 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id)
 	engine->instance = info->instance;
 	__sprint_engine_name(engine);
 
+	engine->props.stop_timeout_ms =
+		CONFIG_DRM_I915_STOP_TIMEOUT;
+
 	/*
 	 * To be overridden by the backend on setup. However to facilitate
 	 * cleanup on error during setup, we always provide the destroy vfunc.
@@ -875,6 +878,21 @@ u64 intel_engine_get_last_batch_head(const struct intel_engine_cs *engine)
 	return bbaddr;
 }
 
+static unsigned long stop_timeout(const struct intel_engine_cs *engine)
+{
+	if (in_atomic() || irqs_disabled()) /* inside atomic preempt-reset? */
+		return 0; /* Going too fast, can't stop now! */
+
+	/*
+	 * If we are doing a normal GPU reset, we can take our time and allow
+	 * the engine to quiesce. We've stopped submission to the engine, and
+	 * if we wait long enough an innocent context should complete and
+	 * leave the engine idle. So they should not be caught unaware by
+	 * the forthcoming GPU reset (which usually follows the stop_cs)!
+	 */
+	return READ_ONCE(engine->props.stop_timeout_ms);
+}
+
 int intel_engine_stop_cs(struct intel_engine_cs *engine)
 {
 	struct intel_uncore *uncore = engine->uncore;
@@ -892,7 +910,7 @@ int intel_engine_stop_cs(struct intel_engine_cs *engine)
 	err = 0;
 	if (__intel_wait_for_register_fw(uncore,
 					 mode, MODE_IDLE, MODE_IDLE,
-					 1000, 0,
+					 1000, stop_timeout(engine),
 					 NULL)) {
 		GEM_TRACE("%s: timed out on STOP_RING -> IDLE\n", engine->name);
 		err = -ETIMEDOUT;
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 3451be034caf..87d5c4ef3ae7 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -542,6 +542,10 @@ struct intel_engine_cs {
 		 */
 		ktime_t total;
 	} stats;
+
+	struct {
+		unsigned long stop_timeout_ms;
+	} props;
 };
 
 static inline bool
-- 
2.24.0.rc0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Intel-gfx] [CI 1/4] drm/i915/gt: Try to more gracefully quiesce the system before resets
@ 2019-10-23 12:21 ` Chris Wilson
  0 siblings, 0 replies; 22+ messages in thread
From: Chris Wilson @ 2019-10-23 12:21 UTC (permalink / raw)
  To: intel-gfx

If we are doing a normal GPU reset triggered after detecting a long
period of stalled work, we can take our time and allow the engines to
quiesce. Since we've stopped submission to the engine, and if we wait
long enough an innocent context should complete, leaving the engine idle.
So by waiting a short amount of time, we should prevent clobbering other
users when resetting a stuck context.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/Kconfig.profile         | 11 +++++++++++
 drivers/gpu/drm/i915/gt/intel_engine_cs.c    | 20 +++++++++++++++++++-
 drivers/gpu/drm/i915/gt/intel_engine_types.h |  4 ++++
 3 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/Kconfig.profile b/drivers/gpu/drm/i915/Kconfig.profile
index 48df8889a88a..97f01bfeda41 100644
--- a/drivers/gpu/drm/i915/Kconfig.profile
+++ b/drivers/gpu/drm/i915/Kconfig.profile
@@ -25,3 +25,14 @@ config DRM_I915_SPIN_REQUEST
 	  May be 0 to disable the initial spin. In practice, we estimate
 	  the cost of enabling the interrupt (if currently disabled) to be
 	  a few microseconds.
+
+config DRM_I915_STOP_TIMEOUT
+	int "How long to wait for an engine to quiesce gracefully before reset (ms)"
+	default 100 # milliseconds
+	help
+	  By stopping submission and sleeping for a short time before resetting
+	  the GPU, we allow the innocent contexts also on the system to quiesce.
+	  It is then less likely for a hanging context to cause collateral
+	  damage as the system is reset in order to recover. The colorary is
+	  that the reset itself may take longer and so be more disruptive to
+	  interactive or low latency workloads.
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 0e20713603ec..499b775f7231 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -308,6 +308,9 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id)
 	engine->instance = info->instance;
 	__sprint_engine_name(engine);
 
+	engine->props.stop_timeout_ms =
+		CONFIG_DRM_I915_STOP_TIMEOUT;
+
 	/*
 	 * To be overridden by the backend on setup. However to facilitate
 	 * cleanup on error during setup, we always provide the destroy vfunc.
@@ -875,6 +878,21 @@ u64 intel_engine_get_last_batch_head(const struct intel_engine_cs *engine)
 	return bbaddr;
 }
 
+static unsigned long stop_timeout(const struct intel_engine_cs *engine)
+{
+	if (in_atomic() || irqs_disabled()) /* inside atomic preempt-reset? */
+		return 0; /* Going too fast, can't stop now! */
+
+	/*
+	 * If we are doing a normal GPU reset, we can take our time and allow
+	 * the engine to quiesce. We've stopped submission to the engine, and
+	 * if we wait long enough an innocent context should complete and
+	 * leave the engine idle. So they should not be caught unaware by
+	 * the forthcoming GPU reset (which usually follows the stop_cs)!
+	 */
+	return READ_ONCE(engine->props.stop_timeout_ms);
+}
+
 int intel_engine_stop_cs(struct intel_engine_cs *engine)
 {
 	struct intel_uncore *uncore = engine->uncore;
@@ -892,7 +910,7 @@ int intel_engine_stop_cs(struct intel_engine_cs *engine)
 	err = 0;
 	if (__intel_wait_for_register_fw(uncore,
 					 mode, MODE_IDLE, MODE_IDLE,
-					 1000, 0,
+					 1000, stop_timeout(engine),
 					 NULL)) {
 		GEM_TRACE("%s: timed out on STOP_RING -> IDLE\n", engine->name);
 		err = -ETIMEDOUT;
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 3451be034caf..87d5c4ef3ae7 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -542,6 +542,10 @@ struct intel_engine_cs {
 		 */
 		ktime_t total;
 	} stats;
+
+	struct {
+		unsigned long stop_timeout_ms;
+	} props;
 };
 
 static inline bool
-- 
2.24.0.rc0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [CI 2/4] drm/i915/execlists: Force preemption
@ 2019-10-23 12:21   ` Chris Wilson
  0 siblings, 0 replies; 22+ messages in thread
From: Chris Wilson @ 2019-10-23 12:21 UTC (permalink / raw)
  To: intel-gfx

If the preempted context takes too long to relinquish control, e.g. it
is stuck inside a shader with arbitration disabled, evict that context
with an engine reset. This ensures that preemptions are reasonably
responsive, providing a tighter QoS for the more important context at
the cost of flagging unresponsive contexts more frequently (i.e. instead
of using an ~10s hangcheck, we now evict at ~100ms).  The challenge of
lies in picking a timeout that can be reasonably serviced by HW for
typical workloads, balancing the existing clients against the needs for
responsiveness.

Note that coupled with timeslicing, this will lead to rapid GPU "hang"
detection with multiple active contexts vying for GPU time.

The forced preemption mechanism can be compiled out with

	./scripts/config --set-val DRM_I915_PREEMPT_TIMEOUT 0

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/Kconfig.profile         |  12 +++
 drivers/gpu/drm/i915/gt/intel_engine.h       |   9 ++
 drivers/gpu/drm/i915/gt/intel_engine_cs.c    |   5 +-
 drivers/gpu/drm/i915/gt/intel_engine_types.h |   6 ++
 drivers/gpu/drm/i915/gt/intel_lrc.c          |  93 +++++++++++++++--
 drivers/gpu/drm/i915/gt/selftest_lrc.c       | 100 +++++++++++++++++++
 drivers/gpu/drm/i915/i915_gem.h              |  14 ---
 drivers/gpu/drm/i915/i915_params.h           |   2 +-
 drivers/gpu/drm/i915/i915_utils.c            |  29 ++++++
 drivers/gpu/drm/i915/i915_utils.h            |   9 ++
 10 files changed, 255 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/i915/Kconfig.profile b/drivers/gpu/drm/i915/Kconfig.profile
index 97f01bfeda41..3d6b0ad4b920 100644
--- a/drivers/gpu/drm/i915/Kconfig.profile
+++ b/drivers/gpu/drm/i915/Kconfig.profile
@@ -36,3 +36,15 @@ config DRM_I915_STOP_TIMEOUT
 	  damage as the system is reset in order to recover. The colorary is
 	  that the reset itself may take longer and so be more disruptive to
 	  interactive or low latency workloads.
+
+config DRM_I915_PREEMPT_TIMEOUT
+	int "Preempt timeout (ms, jiffy granularity)"
+	default 100 # milliseconds
+	help
+	  How long to wait (in milliseconds) for a preemption event to occur
+	  when submitting a new context via execlists. If the current context
+	  does not hit an arbitration point and yield to HW before the timer
+	  expires, the HW will be reset to allow the more important context
+	  to execute.
+
+	  May be 0 to disable the timeout.
diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h
index c2d9d67c63d9..9409b7856299 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine.h
@@ -527,4 +527,13 @@ void intel_engine_init_active(struct intel_engine_cs *engine,
 #define ENGINE_MOCK	1
 #define ENGINE_VIRTUAL	2
 
+static inline bool
+intel_engine_has_preempt_reset(const struct intel_engine_cs *engine)
+{
+	if (!CONFIG_DRM_I915_PREEMPT_TIMEOUT)
+		return 0;
+
+	return intel_engine_has_preemption(engine);
+}
+
 #endif /* _INTEL_RINGBUFFER_H_ */
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 499b775f7231..c08a2c4a219e 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -308,6 +308,8 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id)
 	engine->instance = info->instance;
 	__sprint_engine_name(engine);
 
+	engine->props.preempt_timeout_ms =
+		CONFIG_DRM_I915_PREEMPT_TIMEOUT;
 	engine->props.stop_timeout_ms =
 		CONFIG_DRM_I915_STOP_TIMEOUT;
 
@@ -1338,10 +1340,11 @@ static void intel_engine_print_registers(struct intel_engine_cs *engine,
 		unsigned int idx;
 		u8 read, write;
 
-		drm_printf(m, "\tExeclist tasklet queued? %s (%s), timeslice? %s\n",
+		drm_printf(m, "\tExeclist tasklet queued? %s (%s), preempt? %s, timeslice? %s\n",
 			   yesno(test_bit(TASKLET_STATE_SCHED,
 					  &engine->execlists.tasklet.state)),
 			   enableddisabled(!atomic_read(&engine->execlists.tasklet.count)),
+			   repr_timer(&engine->execlists.preempt),
 			   repr_timer(&engine->execlists.timer));
 
 		read = execlists->csb_head;
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 87d5c4ef3ae7..1251dac91f31 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -174,6 +174,11 @@ struct intel_engine_execlists {
 	 */
 	struct timer_list timer;
 
+	/**
+	 * @preempt: reset the current context if it fails to give way
+	 */
+	struct timer_list preempt;
+
 	/**
 	 * @default_priolist: priority list for I915_PRIORITY_NORMAL
 	 */
@@ -544,6 +549,7 @@ struct intel_engine_cs {
 	} stats;
 
 	struct {
+		unsigned long preempt_timeout_ms;
 		unsigned long stop_timeout_ms;
 	} props;
 };
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index f9f3e985bb79..ff0dd297e782 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -1372,6 +1372,26 @@ static void record_preemption(struct intel_engine_execlists *execlists)
 	(void)I915_SELFTEST_ONLY(execlists->preempt_hang.count++);
 }
 
+static unsigned long active_preempt_timeout(struct intel_engine_cs *engine)
+{
+	struct i915_request *rq;
+
+	rq = last_active(&engine->execlists);
+	if (!rq)
+		return 0;
+
+	return READ_ONCE(engine->props.preempt_timeout_ms);
+}
+
+static void set_preempt_timeout(struct intel_engine_cs *engine)
+{
+	if (!intel_engine_has_preempt_reset(engine))
+		return;
+
+	set_timer_ms(&engine->execlists.preempt,
+		     active_preempt_timeout(engine));
+}
+
 static void execlists_dequeue(struct intel_engine_cs *engine)
 {
 	struct intel_engine_execlists * const execlists = &engine->execlists;
@@ -1747,6 +1767,8 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 
 		memset(port + 1, 0, (last_port - port) * sizeof(*port));
 		execlists_submit_ports(engine);
+
+		set_preempt_timeout(engine);
 	} else {
 skip_submit:
 		ring_set_paused(engine, 0);
@@ -1987,6 +2009,43 @@ static void __execlists_submission_tasklet(struct intel_engine_cs *const engine)
 	}
 }
 
+static noinline void preempt_reset(struct intel_engine_cs *engine)
+{
+	const unsigned int bit = I915_RESET_ENGINE + engine->id;
+	unsigned long *lock = &engine->gt->reset.flags;
+
+	if (i915_modparams.reset < 3)
+		return;
+
+	if (test_and_set_bit(bit, lock))
+		return;
+
+	/* Mark this tasklet as disabled to avoid waiting for it to complete */
+	tasklet_disable_nosync(&engine->execlists.tasklet);
+
+	GEM_TRACE("%s: preempt timeout %lu+%ums\n",
+		  engine->name,
+		  READ_ONCE(engine->props.preempt_timeout_ms),
+		  jiffies_to_msecs(jiffies - engine->execlists.preempt.expires));
+	intel_engine_reset(engine, "preemption time out");
+
+	tasklet_enable(&engine->execlists.tasklet);
+	clear_and_wake_up_bit(bit, lock);
+}
+
+static bool preempt_timeout(const struct intel_engine_cs *const engine)
+{
+	const struct timer_list *t = &engine->execlists.preempt;
+
+	if (!CONFIG_DRM_I915_PREEMPT_TIMEOUT)
+		return false;
+
+	if (!timer_expired(t))
+		return false;
+
+	return READ_ONCE(engine->execlists.pending[0]);
+}
+
 /*
  * Check the unread Context Status Buffers and manage the submission of new
  * contexts to the ELSP accordingly.
@@ -1994,23 +2053,39 @@ static void __execlists_submission_tasklet(struct intel_engine_cs *const engine)
 static void execlists_submission_tasklet(unsigned long data)
 {
 	struct intel_engine_cs * const engine = (struct intel_engine_cs *)data;
-	unsigned long flags;
+	bool timeout = preempt_timeout(engine);
 
 	process_csb(engine);
-	if (!READ_ONCE(engine->execlists.pending[0])) {
+	if (!READ_ONCE(engine->execlists.pending[0]) || timeout) {
+		unsigned long flags;
+
 		spin_lock_irqsave(&engine->active.lock, flags);
 		__execlists_submission_tasklet(engine);
 		spin_unlock_irqrestore(&engine->active.lock, flags);
+
+		/* Recheck after serialising with direct-submission */
+		if (timeout && preempt_timeout(engine))
+			preempt_reset(engine);
 	}
 }
 
-static void execlists_submission_timer(struct timer_list *timer)
+static void __execlists_kick(struct intel_engine_execlists *execlists)
 {
-	struct intel_engine_cs *engine =
-		from_timer(engine, timer, execlists.timer);
-
 	/* Kick the tasklet for some interrupt coalescing and reset handling */
-	tasklet_hi_schedule(&engine->execlists.tasklet);
+	tasklet_hi_schedule(&execlists->tasklet);
+}
+
+#define execlists_kick(t, member) \
+	__execlists_kick(container_of(t, struct intel_engine_execlists, member))
+
+static void execlists_timeslice(struct timer_list *timer)
+{
+	execlists_kick(timer, timer);
+}
+
+static void execlists_preempt(struct timer_list *timer)
+{
+	execlists_kick(timer, preempt);
 }
 
 static void queue_request(struct intel_engine_cs *engine,
@@ -3455,6 +3530,7 @@ gen12_emit_fini_breadcrumb_rcs(struct i915_request *request, u32 *cs)
 static void execlists_park(struct intel_engine_cs *engine)
 {
 	cancel_timer(&engine->execlists.timer);
+	cancel_timer(&engine->execlists.preempt);
 }
 
 void intel_execlists_set_default_submission(struct intel_engine_cs *engine)
@@ -3572,7 +3648,8 @@ int intel_execlists_submission_setup(struct intel_engine_cs *engine)
 {
 	tasklet_init(&engine->execlists.tasklet,
 		     execlists_submission_tasklet, (unsigned long)engine);
-	timer_setup(&engine->execlists.timer, execlists_submission_timer, 0);
+	timer_setup(&engine->execlists.timer, execlists_timeslice, 0);
+	timer_setup(&engine->execlists.preempt, execlists_preempt, 0);
 
 	logical_ring_default_vfuncs(engine);
 	logical_ring_default_irqs(engine);
diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c
index 7516d1c90925..b6352671c5a0 100644
--- a/drivers/gpu/drm/i915/gt/selftest_lrc.c
+++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c
@@ -1697,6 +1697,105 @@ static int live_preempt_hang(void *arg)
 	return err;
 }
 
+static int live_preempt_timeout(void *arg)
+{
+	struct intel_gt *gt = arg;
+	struct i915_gem_context *ctx_hi, *ctx_lo;
+	struct igt_spinner spin_lo;
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+	int err = -ENOMEM;
+
+	/*
+	 * Check that we force preemption to occur by cancelling the previous
+	 * context if it refuses to yield the GPU.
+	 */
+	if (!CONFIG_DRM_I915_PREEMPT_TIMEOUT)
+		return 0;
+
+	if (!HAS_LOGICAL_RING_PREEMPTION(gt->i915))
+		return 0;
+
+	if (!intel_has_reset_engine(gt))
+		return 0;
+
+	if (igt_spinner_init(&spin_lo, gt))
+		return -ENOMEM;
+
+	ctx_hi = kernel_context(gt->i915);
+	if (!ctx_hi)
+		goto err_spin_lo;
+	ctx_hi->sched.priority =
+		I915_USER_PRIORITY(I915_CONTEXT_MAX_USER_PRIORITY);
+
+	ctx_lo = kernel_context(gt->i915);
+	if (!ctx_lo)
+		goto err_ctx_hi;
+	ctx_lo->sched.priority =
+		I915_USER_PRIORITY(I915_CONTEXT_MIN_USER_PRIORITY);
+
+	for_each_engine(engine, gt, id) {
+		unsigned long saved_timeout;
+		struct i915_request *rq;
+
+		if (!intel_engine_has_preemption(engine))
+			continue;
+
+		rq = spinner_create_request(&spin_lo, ctx_lo, engine,
+					    MI_NOOP); /* preemption disabled */
+		if (IS_ERR(rq)) {
+			err = PTR_ERR(rq);
+			goto err_ctx_lo;
+		}
+
+		i915_request_add(rq);
+		if (!igt_wait_for_spinner(&spin_lo, rq)) {
+			intel_gt_set_wedged(gt);
+			err = -EIO;
+			goto err_ctx_lo;
+		}
+
+		rq = igt_request_alloc(ctx_hi, engine);
+		if (IS_ERR(rq)) {
+			igt_spinner_end(&spin_lo);
+			err = PTR_ERR(rq);
+			goto err_ctx_lo;
+		}
+
+		/* Flush the previous CS ack before changing timeouts */
+		while (READ_ONCE(engine->execlists.pending[0]))
+			cpu_relax();
+
+		saved_timeout = engine->props.preempt_timeout_ms;
+		engine->props.preempt_timeout_ms = 1; /* in ms, -> 1 jiffie */
+
+		i915_request_get(rq);
+		i915_request_add(rq);
+
+		intel_engine_flush_submission(engine);
+		engine->props.preempt_timeout_ms = saved_timeout;
+
+		if (i915_request_wait(rq, 0, HZ / 10) < 0) {
+			intel_gt_set_wedged(gt);
+			i915_request_put(rq);
+			err = -ETIME;
+			goto err_ctx_lo;
+		}
+
+		igt_spinner_end(&spin_lo);
+		i915_request_put(rq);
+	}
+
+	err = 0;
+err_ctx_lo:
+	kernel_context_close(ctx_lo);
+err_ctx_hi:
+	kernel_context_close(ctx_hi);
+err_spin_lo:
+	igt_spinner_fini(&spin_lo);
+	return err;
+}
+
 static int random_range(struct rnd_state *rnd, int min, int max)
 {
 	return i915_prandom_u32_max_state(max - min, rnd) + min;
@@ -2598,6 +2697,7 @@ int intel_execlists_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(live_suppress_wait_preempt),
 		SUBTEST(live_chain_preempt),
 		SUBTEST(live_preempt_hang),
+		SUBTEST(live_preempt_timeout),
 		SUBTEST(live_preempt_smoke),
 		SUBTEST(live_virtual_engine),
 		SUBTEST(live_virtual_mask),
diff --git a/drivers/gpu/drm/i915/i915_gem.h b/drivers/gpu/drm/i915/i915_gem.h
index 2011f8e9a9f1..f6f9675848b8 100644
--- a/drivers/gpu/drm/i915/i915_gem.h
+++ b/drivers/gpu/drm/i915/i915_gem.h
@@ -112,18 +112,4 @@ static inline bool __tasklet_is_scheduled(struct tasklet_struct *t)
 	return test_bit(TASKLET_STATE_SCHED, &t->state);
 }
 
-static inline void cancel_timer(struct timer_list *t)
-{
-	if (!READ_ONCE(t->expires))
-		return;
-
-	del_timer(t);
-	WRITE_ONCE(t->expires, 0);
-}
-
-static inline bool timer_expired(const struct timer_list *t)
-{
-	return READ_ONCE(t->expires) && !timer_pending(t);
-}
-
 #endif /* __I915_GEM_H__ */
diff --git a/drivers/gpu/drm/i915/i915_params.h b/drivers/gpu/drm/i915/i915_params.h
index d29ade3b7de6..56058978bb27 100644
--- a/drivers/gpu/drm/i915/i915_params.h
+++ b/drivers/gpu/drm/i915/i915_params.h
@@ -61,7 +61,7 @@ struct drm_printer;
 	param(char *, dmc_firmware_path, NULL) \
 	param(int, mmio_debug, -IS_ENABLED(CONFIG_DRM_I915_DEBUG_MMIO)) \
 	param(int, edp_vswing, 0) \
-	param(int, reset, 2) \
+	param(int, reset, 3) \
 	param(unsigned int, inject_load_failure, 0) \
 	param(int, fastboot, -1) \
 	param(int, enable_dpcd_backlight, 0) \
diff --git a/drivers/gpu/drm/i915/i915_utils.c b/drivers/gpu/drm/i915/i915_utils.c
index 16acdf7bdbe6..02e969b64505 100644
--- a/drivers/gpu/drm/i915/i915_utils.c
+++ b/drivers/gpu/drm/i915/i915_utils.c
@@ -76,3 +76,32 @@ bool i915_error_injected(void)
 }
 
 #endif
+
+void cancel_timer(struct timer_list *t)
+{
+	if (!READ_ONCE(t->expires))
+		return;
+
+	del_timer(t);
+	WRITE_ONCE(t->expires, 0);
+}
+
+void set_timer_ms(struct timer_list *t, unsigned long timeout)
+{
+	if (!timeout) {
+		cancel_timer(t);
+		return;
+	}
+
+	timeout = msecs_to_jiffies_timeout(timeout);
+
+	/*
+	 * Paranoia to make sure the compiler computes the timeout before
+	 * loading 'jiffies' as jiffies is volatile and may be updated in
+	 * the background by a timer tick. All to reduce the complexity
+	 * of the addition and reduce the risk of losing a jiffie.
+	 */
+	barrier();
+
+	mod_timer(t, jiffies + timeout);
+}
diff --git a/drivers/gpu/drm/i915/i915_utils.h b/drivers/gpu/drm/i915/i915_utils.h
index 562f756da421..94f136d8a5fd 100644
--- a/drivers/gpu/drm/i915/i915_utils.h
+++ b/drivers/gpu/drm/i915/i915_utils.h
@@ -32,6 +32,7 @@
 #include <linux/workqueue.h>
 
 struct drm_i915_private;
+struct timer_list;
 
 #undef WARN_ON
 /* Many gcc seem to no see through this and fall over :( */
@@ -421,4 +422,12 @@ static inline void add_taint_for_CI(unsigned int taint)
 	add_taint(taint, LOCKDEP_STILL_OK);
 }
 
+void cancel_timer(struct timer_list *t);
+void set_timer_ms(struct timer_list *t, unsigned long timeout);
+
+static inline bool timer_expired(const struct timer_list *t)
+{
+	return READ_ONCE(t->expires) && !timer_pending(t);
+}
+
 #endif /* !__I915_UTILS_H */
-- 
2.24.0.rc0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Intel-gfx] [CI 2/4] drm/i915/execlists: Force preemption
@ 2019-10-23 12:21   ` Chris Wilson
  0 siblings, 0 replies; 22+ messages in thread
From: Chris Wilson @ 2019-10-23 12:21 UTC (permalink / raw)
  To: intel-gfx

If the preempted context takes too long to relinquish control, e.g. it
is stuck inside a shader with arbitration disabled, evict that context
with an engine reset. This ensures that preemptions are reasonably
responsive, providing a tighter QoS for the more important context at
the cost of flagging unresponsive contexts more frequently (i.e. instead
of using an ~10s hangcheck, we now evict at ~100ms).  The challenge of
lies in picking a timeout that can be reasonably serviced by HW for
typical workloads, balancing the existing clients against the needs for
responsiveness.

Note that coupled with timeslicing, this will lead to rapid GPU "hang"
detection with multiple active contexts vying for GPU time.

The forced preemption mechanism can be compiled out with

	./scripts/config --set-val DRM_I915_PREEMPT_TIMEOUT 0

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/Kconfig.profile         |  12 +++
 drivers/gpu/drm/i915/gt/intel_engine.h       |   9 ++
 drivers/gpu/drm/i915/gt/intel_engine_cs.c    |   5 +-
 drivers/gpu/drm/i915/gt/intel_engine_types.h |   6 ++
 drivers/gpu/drm/i915/gt/intel_lrc.c          |  93 +++++++++++++++--
 drivers/gpu/drm/i915/gt/selftest_lrc.c       | 100 +++++++++++++++++++
 drivers/gpu/drm/i915/i915_gem.h              |  14 ---
 drivers/gpu/drm/i915/i915_params.h           |   2 +-
 drivers/gpu/drm/i915/i915_utils.c            |  29 ++++++
 drivers/gpu/drm/i915/i915_utils.h            |   9 ++
 10 files changed, 255 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/i915/Kconfig.profile b/drivers/gpu/drm/i915/Kconfig.profile
index 97f01bfeda41..3d6b0ad4b920 100644
--- a/drivers/gpu/drm/i915/Kconfig.profile
+++ b/drivers/gpu/drm/i915/Kconfig.profile
@@ -36,3 +36,15 @@ config DRM_I915_STOP_TIMEOUT
 	  damage as the system is reset in order to recover. The colorary is
 	  that the reset itself may take longer and so be more disruptive to
 	  interactive or low latency workloads.
+
+config DRM_I915_PREEMPT_TIMEOUT
+	int "Preempt timeout (ms, jiffy granularity)"
+	default 100 # milliseconds
+	help
+	  How long to wait (in milliseconds) for a preemption event to occur
+	  when submitting a new context via execlists. If the current context
+	  does not hit an arbitration point and yield to HW before the timer
+	  expires, the HW will be reset to allow the more important context
+	  to execute.
+
+	  May be 0 to disable the timeout.
diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h
index c2d9d67c63d9..9409b7856299 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine.h
@@ -527,4 +527,13 @@ void intel_engine_init_active(struct intel_engine_cs *engine,
 #define ENGINE_MOCK	1
 #define ENGINE_VIRTUAL	2
 
+static inline bool
+intel_engine_has_preempt_reset(const struct intel_engine_cs *engine)
+{
+	if (!CONFIG_DRM_I915_PREEMPT_TIMEOUT)
+		return 0;
+
+	return intel_engine_has_preemption(engine);
+}
+
 #endif /* _INTEL_RINGBUFFER_H_ */
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 499b775f7231..c08a2c4a219e 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -308,6 +308,8 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id)
 	engine->instance = info->instance;
 	__sprint_engine_name(engine);
 
+	engine->props.preempt_timeout_ms =
+		CONFIG_DRM_I915_PREEMPT_TIMEOUT;
 	engine->props.stop_timeout_ms =
 		CONFIG_DRM_I915_STOP_TIMEOUT;
 
@@ -1338,10 +1340,11 @@ static void intel_engine_print_registers(struct intel_engine_cs *engine,
 		unsigned int idx;
 		u8 read, write;
 
-		drm_printf(m, "\tExeclist tasklet queued? %s (%s), timeslice? %s\n",
+		drm_printf(m, "\tExeclist tasklet queued? %s (%s), preempt? %s, timeslice? %s\n",
 			   yesno(test_bit(TASKLET_STATE_SCHED,
 					  &engine->execlists.tasklet.state)),
 			   enableddisabled(!atomic_read(&engine->execlists.tasklet.count)),
+			   repr_timer(&engine->execlists.preempt),
 			   repr_timer(&engine->execlists.timer));
 
 		read = execlists->csb_head;
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 87d5c4ef3ae7..1251dac91f31 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -174,6 +174,11 @@ struct intel_engine_execlists {
 	 */
 	struct timer_list timer;
 
+	/**
+	 * @preempt: reset the current context if it fails to give way
+	 */
+	struct timer_list preempt;
+
 	/**
 	 * @default_priolist: priority list for I915_PRIORITY_NORMAL
 	 */
@@ -544,6 +549,7 @@ struct intel_engine_cs {
 	} stats;
 
 	struct {
+		unsigned long preempt_timeout_ms;
 		unsigned long stop_timeout_ms;
 	} props;
 };
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index f9f3e985bb79..ff0dd297e782 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -1372,6 +1372,26 @@ static void record_preemption(struct intel_engine_execlists *execlists)
 	(void)I915_SELFTEST_ONLY(execlists->preempt_hang.count++);
 }
 
+static unsigned long active_preempt_timeout(struct intel_engine_cs *engine)
+{
+	struct i915_request *rq;
+
+	rq = last_active(&engine->execlists);
+	if (!rq)
+		return 0;
+
+	return READ_ONCE(engine->props.preempt_timeout_ms);
+}
+
+static void set_preempt_timeout(struct intel_engine_cs *engine)
+{
+	if (!intel_engine_has_preempt_reset(engine))
+		return;
+
+	set_timer_ms(&engine->execlists.preempt,
+		     active_preempt_timeout(engine));
+}
+
 static void execlists_dequeue(struct intel_engine_cs *engine)
 {
 	struct intel_engine_execlists * const execlists = &engine->execlists;
@@ -1747,6 +1767,8 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 
 		memset(port + 1, 0, (last_port - port) * sizeof(*port));
 		execlists_submit_ports(engine);
+
+		set_preempt_timeout(engine);
 	} else {
 skip_submit:
 		ring_set_paused(engine, 0);
@@ -1987,6 +2009,43 @@ static void __execlists_submission_tasklet(struct intel_engine_cs *const engine)
 	}
 }
 
+static noinline void preempt_reset(struct intel_engine_cs *engine)
+{
+	const unsigned int bit = I915_RESET_ENGINE + engine->id;
+	unsigned long *lock = &engine->gt->reset.flags;
+
+	if (i915_modparams.reset < 3)
+		return;
+
+	if (test_and_set_bit(bit, lock))
+		return;
+
+	/* Mark this tasklet as disabled to avoid waiting for it to complete */
+	tasklet_disable_nosync(&engine->execlists.tasklet);
+
+	GEM_TRACE("%s: preempt timeout %lu+%ums\n",
+		  engine->name,
+		  READ_ONCE(engine->props.preempt_timeout_ms),
+		  jiffies_to_msecs(jiffies - engine->execlists.preempt.expires));
+	intel_engine_reset(engine, "preemption time out");
+
+	tasklet_enable(&engine->execlists.tasklet);
+	clear_and_wake_up_bit(bit, lock);
+}
+
+static bool preempt_timeout(const struct intel_engine_cs *const engine)
+{
+	const struct timer_list *t = &engine->execlists.preempt;
+
+	if (!CONFIG_DRM_I915_PREEMPT_TIMEOUT)
+		return false;
+
+	if (!timer_expired(t))
+		return false;
+
+	return READ_ONCE(engine->execlists.pending[0]);
+}
+
 /*
  * Check the unread Context Status Buffers and manage the submission of new
  * contexts to the ELSP accordingly.
@@ -1994,23 +2053,39 @@ static void __execlists_submission_tasklet(struct intel_engine_cs *const engine)
 static void execlists_submission_tasklet(unsigned long data)
 {
 	struct intel_engine_cs * const engine = (struct intel_engine_cs *)data;
-	unsigned long flags;
+	bool timeout = preempt_timeout(engine);
 
 	process_csb(engine);
-	if (!READ_ONCE(engine->execlists.pending[0])) {
+	if (!READ_ONCE(engine->execlists.pending[0]) || timeout) {
+		unsigned long flags;
+
 		spin_lock_irqsave(&engine->active.lock, flags);
 		__execlists_submission_tasklet(engine);
 		spin_unlock_irqrestore(&engine->active.lock, flags);
+
+		/* Recheck after serialising with direct-submission */
+		if (timeout && preempt_timeout(engine))
+			preempt_reset(engine);
 	}
 }
 
-static void execlists_submission_timer(struct timer_list *timer)
+static void __execlists_kick(struct intel_engine_execlists *execlists)
 {
-	struct intel_engine_cs *engine =
-		from_timer(engine, timer, execlists.timer);
-
 	/* Kick the tasklet for some interrupt coalescing and reset handling */
-	tasklet_hi_schedule(&engine->execlists.tasklet);
+	tasklet_hi_schedule(&execlists->tasklet);
+}
+
+#define execlists_kick(t, member) \
+	__execlists_kick(container_of(t, struct intel_engine_execlists, member))
+
+static void execlists_timeslice(struct timer_list *timer)
+{
+	execlists_kick(timer, timer);
+}
+
+static void execlists_preempt(struct timer_list *timer)
+{
+	execlists_kick(timer, preempt);
 }
 
 static void queue_request(struct intel_engine_cs *engine,
@@ -3455,6 +3530,7 @@ gen12_emit_fini_breadcrumb_rcs(struct i915_request *request, u32 *cs)
 static void execlists_park(struct intel_engine_cs *engine)
 {
 	cancel_timer(&engine->execlists.timer);
+	cancel_timer(&engine->execlists.preempt);
 }
 
 void intel_execlists_set_default_submission(struct intel_engine_cs *engine)
@@ -3572,7 +3648,8 @@ int intel_execlists_submission_setup(struct intel_engine_cs *engine)
 {
 	tasklet_init(&engine->execlists.tasklet,
 		     execlists_submission_tasklet, (unsigned long)engine);
-	timer_setup(&engine->execlists.timer, execlists_submission_timer, 0);
+	timer_setup(&engine->execlists.timer, execlists_timeslice, 0);
+	timer_setup(&engine->execlists.preempt, execlists_preempt, 0);
 
 	logical_ring_default_vfuncs(engine);
 	logical_ring_default_irqs(engine);
diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c
index 7516d1c90925..b6352671c5a0 100644
--- a/drivers/gpu/drm/i915/gt/selftest_lrc.c
+++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c
@@ -1697,6 +1697,105 @@ static int live_preempt_hang(void *arg)
 	return err;
 }
 
+static int live_preempt_timeout(void *arg)
+{
+	struct intel_gt *gt = arg;
+	struct i915_gem_context *ctx_hi, *ctx_lo;
+	struct igt_spinner spin_lo;
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+	int err = -ENOMEM;
+
+	/*
+	 * Check that we force preemption to occur by cancelling the previous
+	 * context if it refuses to yield the GPU.
+	 */
+	if (!CONFIG_DRM_I915_PREEMPT_TIMEOUT)
+		return 0;
+
+	if (!HAS_LOGICAL_RING_PREEMPTION(gt->i915))
+		return 0;
+
+	if (!intel_has_reset_engine(gt))
+		return 0;
+
+	if (igt_spinner_init(&spin_lo, gt))
+		return -ENOMEM;
+
+	ctx_hi = kernel_context(gt->i915);
+	if (!ctx_hi)
+		goto err_spin_lo;
+	ctx_hi->sched.priority =
+		I915_USER_PRIORITY(I915_CONTEXT_MAX_USER_PRIORITY);
+
+	ctx_lo = kernel_context(gt->i915);
+	if (!ctx_lo)
+		goto err_ctx_hi;
+	ctx_lo->sched.priority =
+		I915_USER_PRIORITY(I915_CONTEXT_MIN_USER_PRIORITY);
+
+	for_each_engine(engine, gt, id) {
+		unsigned long saved_timeout;
+		struct i915_request *rq;
+
+		if (!intel_engine_has_preemption(engine))
+			continue;
+
+		rq = spinner_create_request(&spin_lo, ctx_lo, engine,
+					    MI_NOOP); /* preemption disabled */
+		if (IS_ERR(rq)) {
+			err = PTR_ERR(rq);
+			goto err_ctx_lo;
+		}
+
+		i915_request_add(rq);
+		if (!igt_wait_for_spinner(&spin_lo, rq)) {
+			intel_gt_set_wedged(gt);
+			err = -EIO;
+			goto err_ctx_lo;
+		}
+
+		rq = igt_request_alloc(ctx_hi, engine);
+		if (IS_ERR(rq)) {
+			igt_spinner_end(&spin_lo);
+			err = PTR_ERR(rq);
+			goto err_ctx_lo;
+		}
+
+		/* Flush the previous CS ack before changing timeouts */
+		while (READ_ONCE(engine->execlists.pending[0]))
+			cpu_relax();
+
+		saved_timeout = engine->props.preempt_timeout_ms;
+		engine->props.preempt_timeout_ms = 1; /* in ms, -> 1 jiffie */
+
+		i915_request_get(rq);
+		i915_request_add(rq);
+
+		intel_engine_flush_submission(engine);
+		engine->props.preempt_timeout_ms = saved_timeout;
+
+		if (i915_request_wait(rq, 0, HZ / 10) < 0) {
+			intel_gt_set_wedged(gt);
+			i915_request_put(rq);
+			err = -ETIME;
+			goto err_ctx_lo;
+		}
+
+		igt_spinner_end(&spin_lo);
+		i915_request_put(rq);
+	}
+
+	err = 0;
+err_ctx_lo:
+	kernel_context_close(ctx_lo);
+err_ctx_hi:
+	kernel_context_close(ctx_hi);
+err_spin_lo:
+	igt_spinner_fini(&spin_lo);
+	return err;
+}
+
 static int random_range(struct rnd_state *rnd, int min, int max)
 {
 	return i915_prandom_u32_max_state(max - min, rnd) + min;
@@ -2598,6 +2697,7 @@ int intel_execlists_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(live_suppress_wait_preempt),
 		SUBTEST(live_chain_preempt),
 		SUBTEST(live_preempt_hang),
+		SUBTEST(live_preempt_timeout),
 		SUBTEST(live_preempt_smoke),
 		SUBTEST(live_virtual_engine),
 		SUBTEST(live_virtual_mask),
diff --git a/drivers/gpu/drm/i915/i915_gem.h b/drivers/gpu/drm/i915/i915_gem.h
index 2011f8e9a9f1..f6f9675848b8 100644
--- a/drivers/gpu/drm/i915/i915_gem.h
+++ b/drivers/gpu/drm/i915/i915_gem.h
@@ -112,18 +112,4 @@ static inline bool __tasklet_is_scheduled(struct tasklet_struct *t)
 	return test_bit(TASKLET_STATE_SCHED, &t->state);
 }
 
-static inline void cancel_timer(struct timer_list *t)
-{
-	if (!READ_ONCE(t->expires))
-		return;
-
-	del_timer(t);
-	WRITE_ONCE(t->expires, 0);
-}
-
-static inline bool timer_expired(const struct timer_list *t)
-{
-	return READ_ONCE(t->expires) && !timer_pending(t);
-}
-
 #endif /* __I915_GEM_H__ */
diff --git a/drivers/gpu/drm/i915/i915_params.h b/drivers/gpu/drm/i915/i915_params.h
index d29ade3b7de6..56058978bb27 100644
--- a/drivers/gpu/drm/i915/i915_params.h
+++ b/drivers/gpu/drm/i915/i915_params.h
@@ -61,7 +61,7 @@ struct drm_printer;
 	param(char *, dmc_firmware_path, NULL) \
 	param(int, mmio_debug, -IS_ENABLED(CONFIG_DRM_I915_DEBUG_MMIO)) \
 	param(int, edp_vswing, 0) \
-	param(int, reset, 2) \
+	param(int, reset, 3) \
 	param(unsigned int, inject_load_failure, 0) \
 	param(int, fastboot, -1) \
 	param(int, enable_dpcd_backlight, 0) \
diff --git a/drivers/gpu/drm/i915/i915_utils.c b/drivers/gpu/drm/i915/i915_utils.c
index 16acdf7bdbe6..02e969b64505 100644
--- a/drivers/gpu/drm/i915/i915_utils.c
+++ b/drivers/gpu/drm/i915/i915_utils.c
@@ -76,3 +76,32 @@ bool i915_error_injected(void)
 }
 
 #endif
+
+void cancel_timer(struct timer_list *t)
+{
+	if (!READ_ONCE(t->expires))
+		return;
+
+	del_timer(t);
+	WRITE_ONCE(t->expires, 0);
+}
+
+void set_timer_ms(struct timer_list *t, unsigned long timeout)
+{
+	if (!timeout) {
+		cancel_timer(t);
+		return;
+	}
+
+	timeout = msecs_to_jiffies_timeout(timeout);
+
+	/*
+	 * Paranoia to make sure the compiler computes the timeout before
+	 * loading 'jiffies' as jiffies is volatile and may be updated in
+	 * the background by a timer tick. All to reduce the complexity
+	 * of the addition and reduce the risk of losing a jiffie.
+	 */
+	barrier();
+
+	mod_timer(t, jiffies + timeout);
+}
diff --git a/drivers/gpu/drm/i915/i915_utils.h b/drivers/gpu/drm/i915/i915_utils.h
index 562f756da421..94f136d8a5fd 100644
--- a/drivers/gpu/drm/i915/i915_utils.h
+++ b/drivers/gpu/drm/i915/i915_utils.h
@@ -32,6 +32,7 @@
 #include <linux/workqueue.h>
 
 struct drm_i915_private;
+struct timer_list;
 
 #undef WARN_ON
 /* Many gcc seem to no see through this and fall over :( */
@@ -421,4 +422,12 @@ static inline void add_taint_for_CI(unsigned int taint)
 	add_taint(taint, LOCKDEP_STILL_OK);
 }
 
+void cancel_timer(struct timer_list *t);
+void set_timer_ms(struct timer_list *t, unsigned long timeout);
+
+static inline bool timer_expired(const struct timer_list *t)
+{
+	return READ_ONCE(t->expires) && !timer_pending(t);
+}
+
 #endif /* !__I915_UTILS_H */
-- 
2.24.0.rc0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [CI 3/4] drm/i915/execlists: Cancel banned contexts on schedule-out
@ 2019-10-23 12:21   ` Chris Wilson
  0 siblings, 0 replies; 22+ messages in thread
From: Chris Wilson @ 2019-10-23 12:21 UTC (permalink / raw)
  To: intel-gfx

On schedule-out (CS completion) of a banned context, scrub the context
image so that we do not replay the active payload. The intent is that we
skip banned payloads on request submission so that the timeline
advancement continues on in the background. However, if we are returning
to a preempted request, i915_request_skip() is ineffective and instead we
need to patch up the context image so that it continues from the start
of the next request.

v2: Fixup cancellation so that we only scrub the payload of the active
request and do not short-circuit the breadcrumbs (which might cause
other contexts to execute out of order).
v3: Grammar pass

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_lrc.c    | 131 ++++++----
 drivers/gpu/drm/i915/gt/selftest_lrc.c | 321 +++++++++++++++++++++++++
 2 files changed, 411 insertions(+), 41 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index ff0dd297e782..651d5dd39464 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -234,6 +234,9 @@ static void execlists_init_reg_state(u32 *reg_state,
 				     const struct intel_engine_cs *engine,
 				     const struct intel_ring *ring,
 				     bool close);
+static void
+__execlists_update_reg_state(const struct intel_context *ce,
+			     const struct intel_engine_cs *engine);
 
 static void mark_eio(struct i915_request *rq)
 {
@@ -246,6 +249,31 @@ static void mark_eio(struct i915_request *rq)
 	i915_request_mark_complete(rq);
 }
 
+static struct i915_request *active_request(struct i915_request *rq)
+{
+	const struct intel_context * const ce = rq->hw_context;
+	struct i915_request *active = NULL;
+	struct list_head *list;
+
+	if (!i915_request_is_active(rq)) /* unwound, but incomplete! */
+		return rq;
+
+	rcu_read_lock();
+	list = &rcu_dereference(rq->timeline)->requests;
+	list_for_each_entry_from_reverse(rq, list, link) {
+		if (i915_request_completed(rq))
+			break;
+
+		if (rq->hw_context != ce)
+			break;
+
+		active = rq;
+	}
+	rcu_read_unlock();
+
+	return active;
+}
+
 static inline u32 intel_hws_preempt_address(struct intel_engine_cs *engine)
 {
 	return (i915_ggtt_offset(engine->status_page.vma) +
@@ -972,6 +1000,58 @@ static void kick_siblings(struct i915_request *rq, struct intel_context *ce)
 		tasklet_schedule(&ve->base.execlists.tasklet);
 }
 
+static void restore_default_state(struct intel_context *ce,
+				  struct intel_engine_cs *engine)
+{
+	u32 *regs = ce->lrc_reg_state;
+
+	if (engine->pinned_default_state)
+		memcpy(regs, /* skip restoring the vanilla PPHWSP */
+		       engine->pinned_default_state + LRC_STATE_PN * PAGE_SIZE,
+		       engine->context_size - PAGE_SIZE);
+
+	execlists_init_reg_state(regs, ce, engine, ce->ring, false);
+}
+
+static void reset_active(struct i915_request *rq,
+			 struct intel_engine_cs *engine)
+{
+	struct intel_context * const ce = rq->hw_context;
+
+	/*
+	 * The executing context has been cancelled. We want to prevent
+	 * further execution along this context and propagate the error on
+	 * to anything depending on its results.
+	 *
+	 * In __i915_request_submit(), we apply the -EIO and remove the
+	 * requests' payloads for any banned requests. But first, we must
+	 * rewind the context back to the start of the incomplete request so
+	 * that we do not jump back into the middle of the batch.
+	 *
+	 * We preserve the breadcrumbs and semaphores of the incomplete
+	 * requests so that inter-timeline dependencies (i.e other timelines)
+	 * remain correctly ordered. And we defer to __i915_request_submit()
+	 * so that all asynchronous waits are correctly handled.
+	 */
+	GEM_TRACE("%s(%s): { rq=%llx:%lld }\n",
+		  __func__, engine->name, rq->fence.context, rq->fence.seqno);
+
+	/* On resubmission of the active request, payload will be scrubbed */
+	rq = active_request(rq);
+	if (rq)
+		ce->ring->head = intel_ring_wrap(ce->ring, rq->head);
+	else
+		ce->ring->head = ce->ring->tail;
+	intel_ring_update_space(ce->ring);
+
+	/* Scrub the context image to prevent replaying the previous batch */
+	restore_default_state(ce, engine);
+	__execlists_update_reg_state(ce, engine);
+
+	/* We've switched away, so this should be a no-op, but intent matters */
+	ce->lrc_desc |= CTX_DESC_FORCE_RESTORE;
+}
+
 static inline void
 __execlists_schedule_out(struct i915_request *rq,
 			 struct intel_engine_cs * const engine)
@@ -982,6 +1062,9 @@ __execlists_schedule_out(struct i915_request *rq,
 	execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_OUT);
 	intel_gt_pm_put(engine->gt);
 
+	if (unlikely(i915_gem_context_is_banned(ce->gem_context)))
+		reset_active(rq, engine);
+
 	/*
 	 * If this is part of a virtual engine, its next request may
 	 * have been blocked waiting for access to the active context.
@@ -1380,6 +1463,10 @@ static unsigned long active_preempt_timeout(struct intel_engine_cs *engine)
 	if (!rq)
 		return 0;
 
+	/* Force a fast reset for terminated contexts (ignoring sysfs!) */
+	if (unlikely(i915_gem_context_is_banned(rq->gem_context)))
+		return 1;
+
 	return READ_ONCE(engine->props.preempt_timeout_ms);
 }
 
@@ -2792,29 +2879,6 @@ static void reset_csb_pointers(struct intel_engine_cs *engine)
 			       &execlists->csb_status[reset_value]);
 }
 
-static struct i915_request *active_request(struct i915_request *rq)
-{
-	const struct intel_context * const ce = rq->hw_context;
-	struct i915_request *active = NULL;
-	struct list_head *list;
-
-	if (!i915_request_is_active(rq)) /* unwound, but incomplete! */
-		return rq;
-
-	list = &i915_request_active_timeline(rq)->requests;
-	list_for_each_entry_from_reverse(rq, list, link) {
-		if (i915_request_completed(rq))
-			break;
-
-		if (rq->hw_context != ce)
-			break;
-
-		active = rq;
-	}
-
-	return active;
-}
-
 static void __execlists_reset_reg_state(const struct intel_context *ce,
 					const struct intel_engine_cs *engine)
 {
@@ -2831,7 +2895,6 @@ static void __execlists_reset(struct intel_engine_cs *engine, bool stalled)
 	struct intel_engine_execlists * const execlists = &engine->execlists;
 	struct intel_context *ce;
 	struct i915_request *rq;
-	u32 *regs;
 
 	mb(); /* paranoia: read the CSB pointers from after the reset */
 	clflush(execlists->csb_write);
@@ -2907,13 +2970,7 @@ static void __execlists_reset(struct intel_engine_cs *engine, bool stalled)
 	 * to recreate its own state.
 	 */
 	GEM_BUG_ON(!intel_context_is_pinned(ce));
-	regs = ce->lrc_reg_state;
-	if (engine->pinned_default_state) {
-		memcpy(regs, /* skip restoring the vanilla PPHWSP */
-		       engine->pinned_default_state + LRC_STATE_PN * PAGE_SIZE,
-		       engine->context_size - PAGE_SIZE);
-	}
-	execlists_init_reg_state(regs, ce, engine, ce->ring, false);
+	restore_default_state(ce, engine);
 
 out_replay:
 	GEM_TRACE("%s replay {head:%04x, tail:%04x\n",
@@ -4574,16 +4631,8 @@ void intel_lr_context_reset(struct intel_engine_cs *engine,
 	 * future request will be after userspace has had the opportunity
 	 * to recreate its own state.
 	 */
-	if (scrub) {
-		u32 *regs = ce->lrc_reg_state;
-
-		if (engine->pinned_default_state) {
-			memcpy(regs, /* skip restoring the vanilla PPHWSP */
-			       engine->pinned_default_state + LRC_STATE_PN * PAGE_SIZE,
-			       engine->context_size - PAGE_SIZE);
-		}
-		execlists_init_reg_state(regs, ce, engine, ce->ring, false);
-	}
+	if (scrub)
+		restore_default_state(ce, engine);
 
 	/* Rerun the request; its payload has been neutered (if guilty). */
 	ce->ring->head = head;
diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c
index b6352671c5a0..d5d268be554e 100644
--- a/drivers/gpu/drm/i915/gt/selftest_lrc.c
+++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c
@@ -7,6 +7,7 @@
 #include <linux/prime_numbers.h>
 
 #include "gem/i915_gem_pm.h"
+#include "gt/intel_engine_heartbeat.h"
 #include "gt/intel_reset.h"
 
 #include "i915_selftest.h"
@@ -1160,6 +1161,325 @@ static int live_nopreempt(void *arg)
 	goto err_client_b;
 }
 
+struct live_preempt_cancel {
+	struct intel_engine_cs *engine;
+	struct preempt_client a, b;
+};
+
+static int __cancel_active0(struct live_preempt_cancel *arg)
+{
+	struct i915_request *rq;
+	struct igt_live_test t;
+	int err;
+
+	/* Preempt cancel of ELSP0 */
+	GEM_TRACE("%s(%s)\n", __func__, arg->engine->name);
+	if (igt_live_test_begin(&t, arg->engine->i915,
+				__func__, arg->engine->name))
+		return -EIO;
+
+	clear_bit(CONTEXT_BANNED, &arg->a.ctx->flags);
+	rq = spinner_create_request(&arg->a.spin,
+				    arg->a.ctx, arg->engine,
+				    MI_ARB_CHECK);
+	if (IS_ERR(rq))
+		return PTR_ERR(rq);
+
+	i915_request_get(rq);
+	i915_request_add(rq);
+	if (!igt_wait_for_spinner(&arg->a.spin, rq)) {
+		err = -EIO;
+		goto out;
+	}
+
+	i915_gem_context_set_banned(arg->a.ctx);
+	err = intel_engine_pulse(arg->engine);
+	if (err)
+		goto out;
+
+	if (i915_request_wait(rq, 0, HZ / 5) < 0) {
+		err = -EIO;
+		goto out;
+	}
+
+	if (rq->fence.error != -EIO) {
+		pr_err("Cancelled inflight0 request did not report -EIO\n");
+		err = -EINVAL;
+		goto out;
+	}
+
+out:
+	i915_request_put(rq);
+	if (igt_live_test_end(&t))
+		err = -EIO;
+	return err;
+}
+
+static int __cancel_active1(struct live_preempt_cancel *arg)
+{
+	struct i915_request *rq[2] = {};
+	struct igt_live_test t;
+	int err;
+
+	/* Preempt cancel of ELSP1 */
+	GEM_TRACE("%s(%s)\n", __func__, arg->engine->name);
+	if (igt_live_test_begin(&t, arg->engine->i915,
+				__func__, arg->engine->name))
+		return -EIO;
+
+	clear_bit(CONTEXT_BANNED, &arg->a.ctx->flags);
+	rq[0] = spinner_create_request(&arg->a.spin,
+				       arg->a.ctx, arg->engine,
+				       MI_NOOP); /* no preemption */
+	if (IS_ERR(rq[0]))
+		return PTR_ERR(rq[0]);
+
+	i915_request_get(rq[0]);
+	i915_request_add(rq[0]);
+	if (!igt_wait_for_spinner(&arg->a.spin, rq[0])) {
+		err = -EIO;
+		goto out;
+	}
+
+	clear_bit(CONTEXT_BANNED, &arg->b.ctx->flags);
+	rq[1] = spinner_create_request(&arg->b.spin,
+				       arg->b.ctx, arg->engine,
+				       MI_ARB_CHECK);
+	if (IS_ERR(rq[1])) {
+		err = PTR_ERR(rq[1]);
+		goto out;
+	}
+
+	i915_request_get(rq[1]);
+	err = i915_request_await_dma_fence(rq[1], &rq[0]->fence);
+	i915_request_add(rq[1]);
+	if (err)
+		goto out;
+
+	i915_gem_context_set_banned(arg->b.ctx);
+	err = intel_engine_pulse(arg->engine);
+	if (err)
+		goto out;
+
+	igt_spinner_end(&arg->a.spin);
+	if (i915_request_wait(rq[1], 0, HZ / 5) < 0) {
+		err = -EIO;
+		goto out;
+	}
+
+	if (rq[0]->fence.error != 0) {
+		pr_err("Normal inflight0 request did not complete\n");
+		err = -EINVAL;
+		goto out;
+	}
+
+	if (rq[1]->fence.error != -EIO) {
+		pr_err("Cancelled inflight1 request did not report -EIO\n");
+		err = -EINVAL;
+		goto out;
+	}
+
+out:
+	i915_request_put(rq[1]);
+	i915_request_put(rq[0]);
+	if (igt_live_test_end(&t))
+		err = -EIO;
+	return err;
+}
+
+static int __cancel_queued(struct live_preempt_cancel *arg)
+{
+	struct i915_request *rq[3] = {};
+	struct igt_live_test t;
+	int err;
+
+	/* Full ELSP and one in the wings */
+	GEM_TRACE("%s(%s)\n", __func__, arg->engine->name);
+	if (igt_live_test_begin(&t, arg->engine->i915,
+				__func__, arg->engine->name))
+		return -EIO;
+
+	clear_bit(CONTEXT_BANNED, &arg->a.ctx->flags);
+	rq[0] = spinner_create_request(&arg->a.spin,
+				       arg->a.ctx, arg->engine,
+				       MI_ARB_CHECK);
+	if (IS_ERR(rq[0]))
+		return PTR_ERR(rq[0]);
+
+	i915_request_get(rq[0]);
+	i915_request_add(rq[0]);
+	if (!igt_wait_for_spinner(&arg->a.spin, rq[0])) {
+		err = -EIO;
+		goto out;
+	}
+
+	clear_bit(CONTEXT_BANNED, &arg->b.ctx->flags);
+	rq[1] = igt_request_alloc(arg->b.ctx, arg->engine);
+	if (IS_ERR(rq[1])) {
+		err = PTR_ERR(rq[1]);
+		goto out;
+	}
+
+	i915_request_get(rq[1]);
+	err = i915_request_await_dma_fence(rq[1], &rq[0]->fence);
+	i915_request_add(rq[1]);
+	if (err)
+		goto out;
+
+	rq[2] = spinner_create_request(&arg->b.spin,
+				       arg->a.ctx, arg->engine,
+				       MI_ARB_CHECK);
+	if (IS_ERR(rq[2])) {
+		err = PTR_ERR(rq[2]);
+		goto out;
+	}
+
+	i915_request_get(rq[2]);
+	err = i915_request_await_dma_fence(rq[2], &rq[1]->fence);
+	i915_request_add(rq[2]);
+	if (err)
+		goto out;
+
+	i915_gem_context_set_banned(arg->a.ctx);
+	err = intel_engine_pulse(arg->engine);
+	if (err)
+		goto out;
+
+	if (i915_request_wait(rq[2], 0, HZ / 5) < 0) {
+		err = -EIO;
+		goto out;
+	}
+
+	if (rq[0]->fence.error != -EIO) {
+		pr_err("Cancelled inflight0 request did not report -EIO\n");
+		err = -EINVAL;
+		goto out;
+	}
+
+	if (rq[1]->fence.error != 0) {
+		pr_err("Normal inflight1 request did not complete\n");
+		err = -EINVAL;
+		goto out;
+	}
+
+	if (rq[2]->fence.error != -EIO) {
+		pr_err("Cancelled queued request did not report -EIO\n");
+		err = -EINVAL;
+		goto out;
+	}
+
+out:
+	i915_request_put(rq[2]);
+	i915_request_put(rq[1]);
+	i915_request_put(rq[0]);
+	if (igt_live_test_end(&t))
+		err = -EIO;
+	return err;
+}
+
+static int __cancel_hostile(struct live_preempt_cancel *arg)
+{
+	struct i915_request *rq;
+	int err;
+
+	/* Preempt cancel non-preemptible spinner in ELSP0 */
+	if (!CONFIG_DRM_I915_PREEMPT_TIMEOUT)
+		return 0;
+
+	GEM_TRACE("%s(%s)\n", __func__, arg->engine->name);
+	clear_bit(CONTEXT_BANNED, &arg->a.ctx->flags);
+	rq = spinner_create_request(&arg->a.spin,
+				    arg->a.ctx, arg->engine,
+				    MI_NOOP); /* preemption disabled */
+	if (IS_ERR(rq))
+		return PTR_ERR(rq);
+
+	i915_request_get(rq);
+	i915_request_add(rq);
+	if (!igt_wait_for_spinner(&arg->a.spin, rq)) {
+		err = -EIO;
+		goto out;
+	}
+
+	i915_gem_context_set_banned(arg->a.ctx);
+	err = intel_engine_pulse(arg->engine); /* force reset */
+	if (err)
+		goto out;
+
+	if (i915_request_wait(rq, 0, HZ / 5) < 0) {
+		err = -EIO;
+		goto out;
+	}
+
+	if (rq->fence.error != -EIO) {
+		pr_err("Cancelled inflight0 request did not report -EIO\n");
+		err = -EINVAL;
+		goto out;
+	}
+
+out:
+	i915_request_put(rq);
+	if (igt_flush_test(arg->engine->i915))
+		err = -EIO;
+	return err;
+}
+
+static int live_preempt_cancel(void *arg)
+{
+	struct intel_gt *gt = arg;
+	struct live_preempt_cancel data;
+	enum intel_engine_id id;
+	int err = -ENOMEM;
+
+	/*
+	 * To cancel an inflight context, we need to first remove it from the
+	 * GPU. That sounds like preemption! Plus a little bit of bookkeeping.
+	 */
+
+	if (!HAS_LOGICAL_RING_PREEMPTION(gt->i915))
+		return 0;
+
+	if (preempt_client_init(gt, &data.a))
+		return -ENOMEM;
+	if (preempt_client_init(gt, &data.b))
+		goto err_client_a;
+
+	for_each_engine(data.engine, gt, id) {
+		if (!intel_engine_has_preemption(data.engine))
+			continue;
+
+		err = __cancel_active0(&data);
+		if (err)
+			goto err_wedged;
+
+		err = __cancel_active1(&data);
+		if (err)
+			goto err_wedged;
+
+		err = __cancel_queued(&data);
+		if (err)
+			goto err_wedged;
+
+		err = __cancel_hostile(&data);
+		if (err)
+			goto err_wedged;
+	}
+
+	err = 0;
+err_client_b:
+	preempt_client_fini(&data.b);
+err_client_a:
+	preempt_client_fini(&data.a);
+	return err;
+
+err_wedged:
+	GEM_TRACE_DUMP();
+	igt_spinner_end(&data.b.spin);
+	igt_spinner_end(&data.a.spin);
+	intel_gt_set_wedged(gt);
+	goto err_client_b;
+}
+
 static int live_suppress_self_preempt(void *arg)
 {
 	struct intel_gt *gt = arg;
@@ -2693,6 +3013,7 @@ int intel_execlists_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(live_preempt),
 		SUBTEST(live_late_preempt),
 		SUBTEST(live_nopreempt),
+		SUBTEST(live_preempt_cancel),
 		SUBTEST(live_suppress_self_preempt),
 		SUBTEST(live_suppress_wait_preempt),
 		SUBTEST(live_chain_preempt),
-- 
2.24.0.rc0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Intel-gfx] [CI 3/4] drm/i915/execlists: Cancel banned contexts on schedule-out
@ 2019-10-23 12:21   ` Chris Wilson
  0 siblings, 0 replies; 22+ messages in thread
From: Chris Wilson @ 2019-10-23 12:21 UTC (permalink / raw)
  To: intel-gfx

On schedule-out (CS completion) of a banned context, scrub the context
image so that we do not replay the active payload. The intent is that we
skip banned payloads on request submission so that the timeline
advancement continues on in the background. However, if we are returning
to a preempted request, i915_request_skip() is ineffective and instead we
need to patch up the context image so that it continues from the start
of the next request.

v2: Fixup cancellation so that we only scrub the payload of the active
request and do not short-circuit the breadcrumbs (which might cause
other contexts to execute out of order).
v3: Grammar pass

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_lrc.c    | 131 ++++++----
 drivers/gpu/drm/i915/gt/selftest_lrc.c | 321 +++++++++++++++++++++++++
 2 files changed, 411 insertions(+), 41 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index ff0dd297e782..651d5dd39464 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -234,6 +234,9 @@ static void execlists_init_reg_state(u32 *reg_state,
 				     const struct intel_engine_cs *engine,
 				     const struct intel_ring *ring,
 				     bool close);
+static void
+__execlists_update_reg_state(const struct intel_context *ce,
+			     const struct intel_engine_cs *engine);
 
 static void mark_eio(struct i915_request *rq)
 {
@@ -246,6 +249,31 @@ static void mark_eio(struct i915_request *rq)
 	i915_request_mark_complete(rq);
 }
 
+static struct i915_request *active_request(struct i915_request *rq)
+{
+	const struct intel_context * const ce = rq->hw_context;
+	struct i915_request *active = NULL;
+	struct list_head *list;
+
+	if (!i915_request_is_active(rq)) /* unwound, but incomplete! */
+		return rq;
+
+	rcu_read_lock();
+	list = &rcu_dereference(rq->timeline)->requests;
+	list_for_each_entry_from_reverse(rq, list, link) {
+		if (i915_request_completed(rq))
+			break;
+
+		if (rq->hw_context != ce)
+			break;
+
+		active = rq;
+	}
+	rcu_read_unlock();
+
+	return active;
+}
+
 static inline u32 intel_hws_preempt_address(struct intel_engine_cs *engine)
 {
 	return (i915_ggtt_offset(engine->status_page.vma) +
@@ -972,6 +1000,58 @@ static void kick_siblings(struct i915_request *rq, struct intel_context *ce)
 		tasklet_schedule(&ve->base.execlists.tasklet);
 }
 
+static void restore_default_state(struct intel_context *ce,
+				  struct intel_engine_cs *engine)
+{
+	u32 *regs = ce->lrc_reg_state;
+
+	if (engine->pinned_default_state)
+		memcpy(regs, /* skip restoring the vanilla PPHWSP */
+		       engine->pinned_default_state + LRC_STATE_PN * PAGE_SIZE,
+		       engine->context_size - PAGE_SIZE);
+
+	execlists_init_reg_state(regs, ce, engine, ce->ring, false);
+}
+
+static void reset_active(struct i915_request *rq,
+			 struct intel_engine_cs *engine)
+{
+	struct intel_context * const ce = rq->hw_context;
+
+	/*
+	 * The executing context has been cancelled. We want to prevent
+	 * further execution along this context and propagate the error on
+	 * to anything depending on its results.
+	 *
+	 * In __i915_request_submit(), we apply the -EIO and remove the
+	 * requests' payloads for any banned requests. But first, we must
+	 * rewind the context back to the start of the incomplete request so
+	 * that we do not jump back into the middle of the batch.
+	 *
+	 * We preserve the breadcrumbs and semaphores of the incomplete
+	 * requests so that inter-timeline dependencies (i.e other timelines)
+	 * remain correctly ordered. And we defer to __i915_request_submit()
+	 * so that all asynchronous waits are correctly handled.
+	 */
+	GEM_TRACE("%s(%s): { rq=%llx:%lld }\n",
+		  __func__, engine->name, rq->fence.context, rq->fence.seqno);
+
+	/* On resubmission of the active request, payload will be scrubbed */
+	rq = active_request(rq);
+	if (rq)
+		ce->ring->head = intel_ring_wrap(ce->ring, rq->head);
+	else
+		ce->ring->head = ce->ring->tail;
+	intel_ring_update_space(ce->ring);
+
+	/* Scrub the context image to prevent replaying the previous batch */
+	restore_default_state(ce, engine);
+	__execlists_update_reg_state(ce, engine);
+
+	/* We've switched away, so this should be a no-op, but intent matters */
+	ce->lrc_desc |= CTX_DESC_FORCE_RESTORE;
+}
+
 static inline void
 __execlists_schedule_out(struct i915_request *rq,
 			 struct intel_engine_cs * const engine)
@@ -982,6 +1062,9 @@ __execlists_schedule_out(struct i915_request *rq,
 	execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_OUT);
 	intel_gt_pm_put(engine->gt);
 
+	if (unlikely(i915_gem_context_is_banned(ce->gem_context)))
+		reset_active(rq, engine);
+
 	/*
 	 * If this is part of a virtual engine, its next request may
 	 * have been blocked waiting for access to the active context.
@@ -1380,6 +1463,10 @@ static unsigned long active_preempt_timeout(struct intel_engine_cs *engine)
 	if (!rq)
 		return 0;
 
+	/* Force a fast reset for terminated contexts (ignoring sysfs!) */
+	if (unlikely(i915_gem_context_is_banned(rq->gem_context)))
+		return 1;
+
 	return READ_ONCE(engine->props.preempt_timeout_ms);
 }
 
@@ -2792,29 +2879,6 @@ static void reset_csb_pointers(struct intel_engine_cs *engine)
 			       &execlists->csb_status[reset_value]);
 }
 
-static struct i915_request *active_request(struct i915_request *rq)
-{
-	const struct intel_context * const ce = rq->hw_context;
-	struct i915_request *active = NULL;
-	struct list_head *list;
-
-	if (!i915_request_is_active(rq)) /* unwound, but incomplete! */
-		return rq;
-
-	list = &i915_request_active_timeline(rq)->requests;
-	list_for_each_entry_from_reverse(rq, list, link) {
-		if (i915_request_completed(rq))
-			break;
-
-		if (rq->hw_context != ce)
-			break;
-
-		active = rq;
-	}
-
-	return active;
-}
-
 static void __execlists_reset_reg_state(const struct intel_context *ce,
 					const struct intel_engine_cs *engine)
 {
@@ -2831,7 +2895,6 @@ static void __execlists_reset(struct intel_engine_cs *engine, bool stalled)
 	struct intel_engine_execlists * const execlists = &engine->execlists;
 	struct intel_context *ce;
 	struct i915_request *rq;
-	u32 *regs;
 
 	mb(); /* paranoia: read the CSB pointers from after the reset */
 	clflush(execlists->csb_write);
@@ -2907,13 +2970,7 @@ static void __execlists_reset(struct intel_engine_cs *engine, bool stalled)
 	 * to recreate its own state.
 	 */
 	GEM_BUG_ON(!intel_context_is_pinned(ce));
-	regs = ce->lrc_reg_state;
-	if (engine->pinned_default_state) {
-		memcpy(regs, /* skip restoring the vanilla PPHWSP */
-		       engine->pinned_default_state + LRC_STATE_PN * PAGE_SIZE,
-		       engine->context_size - PAGE_SIZE);
-	}
-	execlists_init_reg_state(regs, ce, engine, ce->ring, false);
+	restore_default_state(ce, engine);
 
 out_replay:
 	GEM_TRACE("%s replay {head:%04x, tail:%04x\n",
@@ -4574,16 +4631,8 @@ void intel_lr_context_reset(struct intel_engine_cs *engine,
 	 * future request will be after userspace has had the opportunity
 	 * to recreate its own state.
 	 */
-	if (scrub) {
-		u32 *regs = ce->lrc_reg_state;
-
-		if (engine->pinned_default_state) {
-			memcpy(regs, /* skip restoring the vanilla PPHWSP */
-			       engine->pinned_default_state + LRC_STATE_PN * PAGE_SIZE,
-			       engine->context_size - PAGE_SIZE);
-		}
-		execlists_init_reg_state(regs, ce, engine, ce->ring, false);
-	}
+	if (scrub)
+		restore_default_state(ce, engine);
 
 	/* Rerun the request; its payload has been neutered (if guilty). */
 	ce->ring->head = head;
diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c
index b6352671c5a0..d5d268be554e 100644
--- a/drivers/gpu/drm/i915/gt/selftest_lrc.c
+++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c
@@ -7,6 +7,7 @@
 #include <linux/prime_numbers.h>
 
 #include "gem/i915_gem_pm.h"
+#include "gt/intel_engine_heartbeat.h"
 #include "gt/intel_reset.h"
 
 #include "i915_selftest.h"
@@ -1160,6 +1161,325 @@ static int live_nopreempt(void *arg)
 	goto err_client_b;
 }
 
+struct live_preempt_cancel {
+	struct intel_engine_cs *engine;
+	struct preempt_client a, b;
+};
+
+static int __cancel_active0(struct live_preempt_cancel *arg)
+{
+	struct i915_request *rq;
+	struct igt_live_test t;
+	int err;
+
+	/* Preempt cancel of ELSP0 */
+	GEM_TRACE("%s(%s)\n", __func__, arg->engine->name);
+	if (igt_live_test_begin(&t, arg->engine->i915,
+				__func__, arg->engine->name))
+		return -EIO;
+
+	clear_bit(CONTEXT_BANNED, &arg->a.ctx->flags);
+	rq = spinner_create_request(&arg->a.spin,
+				    arg->a.ctx, arg->engine,
+				    MI_ARB_CHECK);
+	if (IS_ERR(rq))
+		return PTR_ERR(rq);
+
+	i915_request_get(rq);
+	i915_request_add(rq);
+	if (!igt_wait_for_spinner(&arg->a.spin, rq)) {
+		err = -EIO;
+		goto out;
+	}
+
+	i915_gem_context_set_banned(arg->a.ctx);
+	err = intel_engine_pulse(arg->engine);
+	if (err)
+		goto out;
+
+	if (i915_request_wait(rq, 0, HZ / 5) < 0) {
+		err = -EIO;
+		goto out;
+	}
+
+	if (rq->fence.error != -EIO) {
+		pr_err("Cancelled inflight0 request did not report -EIO\n");
+		err = -EINVAL;
+		goto out;
+	}
+
+out:
+	i915_request_put(rq);
+	if (igt_live_test_end(&t))
+		err = -EIO;
+	return err;
+}
+
+static int __cancel_active1(struct live_preempt_cancel *arg)
+{
+	struct i915_request *rq[2] = {};
+	struct igt_live_test t;
+	int err;
+
+	/* Preempt cancel of ELSP1 */
+	GEM_TRACE("%s(%s)\n", __func__, arg->engine->name);
+	if (igt_live_test_begin(&t, arg->engine->i915,
+				__func__, arg->engine->name))
+		return -EIO;
+
+	clear_bit(CONTEXT_BANNED, &arg->a.ctx->flags);
+	rq[0] = spinner_create_request(&arg->a.spin,
+				       arg->a.ctx, arg->engine,
+				       MI_NOOP); /* no preemption */
+	if (IS_ERR(rq[0]))
+		return PTR_ERR(rq[0]);
+
+	i915_request_get(rq[0]);
+	i915_request_add(rq[0]);
+	if (!igt_wait_for_spinner(&arg->a.spin, rq[0])) {
+		err = -EIO;
+		goto out;
+	}
+
+	clear_bit(CONTEXT_BANNED, &arg->b.ctx->flags);
+	rq[1] = spinner_create_request(&arg->b.spin,
+				       arg->b.ctx, arg->engine,
+				       MI_ARB_CHECK);
+	if (IS_ERR(rq[1])) {
+		err = PTR_ERR(rq[1]);
+		goto out;
+	}
+
+	i915_request_get(rq[1]);
+	err = i915_request_await_dma_fence(rq[1], &rq[0]->fence);
+	i915_request_add(rq[1]);
+	if (err)
+		goto out;
+
+	i915_gem_context_set_banned(arg->b.ctx);
+	err = intel_engine_pulse(arg->engine);
+	if (err)
+		goto out;
+
+	igt_spinner_end(&arg->a.spin);
+	if (i915_request_wait(rq[1], 0, HZ / 5) < 0) {
+		err = -EIO;
+		goto out;
+	}
+
+	if (rq[0]->fence.error != 0) {
+		pr_err("Normal inflight0 request did not complete\n");
+		err = -EINVAL;
+		goto out;
+	}
+
+	if (rq[1]->fence.error != -EIO) {
+		pr_err("Cancelled inflight1 request did not report -EIO\n");
+		err = -EINVAL;
+		goto out;
+	}
+
+out:
+	i915_request_put(rq[1]);
+	i915_request_put(rq[0]);
+	if (igt_live_test_end(&t))
+		err = -EIO;
+	return err;
+}
+
+static int __cancel_queued(struct live_preempt_cancel *arg)
+{
+	struct i915_request *rq[3] = {};
+	struct igt_live_test t;
+	int err;
+
+	/* Full ELSP and one in the wings */
+	GEM_TRACE("%s(%s)\n", __func__, arg->engine->name);
+	if (igt_live_test_begin(&t, arg->engine->i915,
+				__func__, arg->engine->name))
+		return -EIO;
+
+	clear_bit(CONTEXT_BANNED, &arg->a.ctx->flags);
+	rq[0] = spinner_create_request(&arg->a.spin,
+				       arg->a.ctx, arg->engine,
+				       MI_ARB_CHECK);
+	if (IS_ERR(rq[0]))
+		return PTR_ERR(rq[0]);
+
+	i915_request_get(rq[0]);
+	i915_request_add(rq[0]);
+	if (!igt_wait_for_spinner(&arg->a.spin, rq[0])) {
+		err = -EIO;
+		goto out;
+	}
+
+	clear_bit(CONTEXT_BANNED, &arg->b.ctx->flags);
+	rq[1] = igt_request_alloc(arg->b.ctx, arg->engine);
+	if (IS_ERR(rq[1])) {
+		err = PTR_ERR(rq[1]);
+		goto out;
+	}
+
+	i915_request_get(rq[1]);
+	err = i915_request_await_dma_fence(rq[1], &rq[0]->fence);
+	i915_request_add(rq[1]);
+	if (err)
+		goto out;
+
+	rq[2] = spinner_create_request(&arg->b.spin,
+				       arg->a.ctx, arg->engine,
+				       MI_ARB_CHECK);
+	if (IS_ERR(rq[2])) {
+		err = PTR_ERR(rq[2]);
+		goto out;
+	}
+
+	i915_request_get(rq[2]);
+	err = i915_request_await_dma_fence(rq[2], &rq[1]->fence);
+	i915_request_add(rq[2]);
+	if (err)
+		goto out;
+
+	i915_gem_context_set_banned(arg->a.ctx);
+	err = intel_engine_pulse(arg->engine);
+	if (err)
+		goto out;
+
+	if (i915_request_wait(rq[2], 0, HZ / 5) < 0) {
+		err = -EIO;
+		goto out;
+	}
+
+	if (rq[0]->fence.error != -EIO) {
+		pr_err("Cancelled inflight0 request did not report -EIO\n");
+		err = -EINVAL;
+		goto out;
+	}
+
+	if (rq[1]->fence.error != 0) {
+		pr_err("Normal inflight1 request did not complete\n");
+		err = -EINVAL;
+		goto out;
+	}
+
+	if (rq[2]->fence.error != -EIO) {
+		pr_err("Cancelled queued request did not report -EIO\n");
+		err = -EINVAL;
+		goto out;
+	}
+
+out:
+	i915_request_put(rq[2]);
+	i915_request_put(rq[1]);
+	i915_request_put(rq[0]);
+	if (igt_live_test_end(&t))
+		err = -EIO;
+	return err;
+}
+
+static int __cancel_hostile(struct live_preempt_cancel *arg)
+{
+	struct i915_request *rq;
+	int err;
+
+	/* Preempt cancel non-preemptible spinner in ELSP0 */
+	if (!CONFIG_DRM_I915_PREEMPT_TIMEOUT)
+		return 0;
+
+	GEM_TRACE("%s(%s)\n", __func__, arg->engine->name);
+	clear_bit(CONTEXT_BANNED, &arg->a.ctx->flags);
+	rq = spinner_create_request(&arg->a.spin,
+				    arg->a.ctx, arg->engine,
+				    MI_NOOP); /* preemption disabled */
+	if (IS_ERR(rq))
+		return PTR_ERR(rq);
+
+	i915_request_get(rq);
+	i915_request_add(rq);
+	if (!igt_wait_for_spinner(&arg->a.spin, rq)) {
+		err = -EIO;
+		goto out;
+	}
+
+	i915_gem_context_set_banned(arg->a.ctx);
+	err = intel_engine_pulse(arg->engine); /* force reset */
+	if (err)
+		goto out;
+
+	if (i915_request_wait(rq, 0, HZ / 5) < 0) {
+		err = -EIO;
+		goto out;
+	}
+
+	if (rq->fence.error != -EIO) {
+		pr_err("Cancelled inflight0 request did not report -EIO\n");
+		err = -EINVAL;
+		goto out;
+	}
+
+out:
+	i915_request_put(rq);
+	if (igt_flush_test(arg->engine->i915))
+		err = -EIO;
+	return err;
+}
+
+static int live_preempt_cancel(void *arg)
+{
+	struct intel_gt *gt = arg;
+	struct live_preempt_cancel data;
+	enum intel_engine_id id;
+	int err = -ENOMEM;
+
+	/*
+	 * To cancel an inflight context, we need to first remove it from the
+	 * GPU. That sounds like preemption! Plus a little bit of bookkeeping.
+	 */
+
+	if (!HAS_LOGICAL_RING_PREEMPTION(gt->i915))
+		return 0;
+
+	if (preempt_client_init(gt, &data.a))
+		return -ENOMEM;
+	if (preempt_client_init(gt, &data.b))
+		goto err_client_a;
+
+	for_each_engine(data.engine, gt, id) {
+		if (!intel_engine_has_preemption(data.engine))
+			continue;
+
+		err = __cancel_active0(&data);
+		if (err)
+			goto err_wedged;
+
+		err = __cancel_active1(&data);
+		if (err)
+			goto err_wedged;
+
+		err = __cancel_queued(&data);
+		if (err)
+			goto err_wedged;
+
+		err = __cancel_hostile(&data);
+		if (err)
+			goto err_wedged;
+	}
+
+	err = 0;
+err_client_b:
+	preempt_client_fini(&data.b);
+err_client_a:
+	preempt_client_fini(&data.a);
+	return err;
+
+err_wedged:
+	GEM_TRACE_DUMP();
+	igt_spinner_end(&data.b.spin);
+	igt_spinner_end(&data.a.spin);
+	intel_gt_set_wedged(gt);
+	goto err_client_b;
+}
+
 static int live_suppress_self_preempt(void *arg)
 {
 	struct intel_gt *gt = arg;
@@ -2693,6 +3013,7 @@ int intel_execlists_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(live_preempt),
 		SUBTEST(live_late_preempt),
 		SUBTEST(live_nopreempt),
+		SUBTEST(live_preempt_cancel),
 		SUBTEST(live_suppress_self_preempt),
 		SUBTEST(live_suppress_wait_preempt),
 		SUBTEST(live_chain_preempt),
-- 
2.24.0.rc0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [CI 4/4] drm/i915/gem: Cancel contexts when hangchecking is disabled
@ 2019-10-23 12:21   ` Chris Wilson
  0 siblings, 0 replies; 22+ messages in thread
From: Chris Wilson @ 2019-10-23 12:21 UTC (permalink / raw)
  To: intel-gfx

Normally, we rely on our hangcheck to prevent persistent batches from
hogging the GPU. However, if the user disables hangcheck, this mechanism
breaks down. Despite our insistence that this is unsafe, the users are
equally insistent that they want to use endless batches and will disable
the hangcheck mechanism. We are looking at replacing hangcheck, in the
next patch, with a softer mechanism, that sends a pulse down the engine
to check if it is well. We can use the same preemptive pulse to flush an
active context off the GPU upon context close, preventing resources
being lost and unkillable requests remaining on the GPU after process
termination.

Testcase: igt/gem_ctx_exec/basic-nohangcheck
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Reviewed-by: Jon Bloomfield <jon.bloomfield@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c | 141 ++++++++++++++++++++
 1 file changed, 141 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 7b01f4605f21..b2f042d87be0 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -70,6 +70,7 @@
 #include <drm/i915_drm.h>
 
 #include "gt/intel_lrc_reg.h"
+#include "gt/intel_engine_heartbeat.h"
 #include "gt/intel_engine_user.h"
 
 #include "i915_gem_context.h"
@@ -276,6 +277,135 @@ void i915_gem_context_release(struct kref *ref)
 		schedule_work(&gc->free_work);
 }
 
+static inline struct i915_gem_engines *
+__context_engines_static(const struct i915_gem_context *ctx)
+{
+	return rcu_dereference_protected(ctx->engines, true);
+}
+
+static bool __reset_engine(struct intel_engine_cs *engine)
+{
+	struct intel_gt *gt = engine->gt;
+	bool success = false;
+
+	if (!intel_has_reset_engine(gt))
+		return false;
+
+	if (!test_and_set_bit(I915_RESET_ENGINE + engine->id,
+			      &gt->reset.flags)) {
+		success = intel_engine_reset(engine, NULL) == 0;
+		clear_and_wake_up_bit(I915_RESET_ENGINE + engine->id,
+				      &gt->reset.flags);
+	}
+
+	return success;
+}
+
+static void __reset_context(struct i915_gem_context *ctx,
+			    struct intel_engine_cs *engine)
+{
+	intel_gt_handle_error(engine->gt, engine->mask, 0,
+			      "context closure in %s", ctx->name);
+}
+
+static bool __cancel_engine(struct intel_engine_cs *engine)
+{
+	/*
+	 * Send a "high priority pulse" down the engine to cause the
+	 * current request to be momentarily preempted. (If it fails to
+	 * be preempted, it will be reset). As we have marked our context
+	 * as banned, any incomplete request, including any running, will
+	 * be skipped following the preemption.
+	 *
+	 * If there is no hangchecking (one of the reasons why we try to
+	 * cancel the context) and no forced preemption, there may be no
+	 * means by which we reset the GPU and evict the persistent hog.
+	 * Ergo if we are unable to inject a preemptive pulse that can
+	 * kill the banned context, we fallback to doing a local reset
+	 * instead.
+	 */
+	if (CONFIG_DRM_I915_PREEMPT_TIMEOUT && !intel_engine_pulse(engine))
+		return true;
+
+	/* If we are unable to send a pulse, try resetting this engine. */
+	return __reset_engine(engine);
+}
+
+static struct intel_engine_cs *
+active_engine(struct dma_fence *fence, struct intel_context *ce)
+{
+	struct i915_request *rq = to_request(fence);
+	struct intel_engine_cs *engine, *locked;
+
+	/*
+	 * Serialise with __i915_request_submit() so that it sees
+	 * is-banned?, or we know the request is already inflight.
+	 */
+	locked = READ_ONCE(rq->engine);
+	spin_lock_irq(&locked->active.lock);
+	while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
+		spin_unlock(&locked->active.lock);
+		spin_lock(&engine->active.lock);
+		locked = engine;
+	}
+
+	engine = NULL;
+	if (i915_request_is_active(rq) && !rq->fence.error)
+		engine = rq->engine;
+
+	spin_unlock_irq(&locked->active.lock);
+
+	return engine;
+}
+
+static void kill_context(struct i915_gem_context *ctx)
+{
+	struct i915_gem_engines_iter it;
+	struct intel_context *ce;
+
+	/*
+	 * If we are already banned, it was due to a guilty request causing
+	 * a reset and the entire context being evicted from the GPU.
+	 */
+	if (i915_gem_context_is_banned(ctx))
+		return;
+
+	i915_gem_context_set_banned(ctx);
+
+	/*
+	 * Map the user's engine back to the actual engines; one virtual
+	 * engine will be mapped to multiple engines, and using ctx->engine[]
+	 * the same engine may be have multiple instances in the user's map.
+	 * However, we only care about pending requests, so only include
+	 * engines on which there are incomplete requests.
+	 */
+	for_each_gem_engine(ce, __context_engines_static(ctx), it) {
+		struct intel_engine_cs *engine;
+		struct dma_fence *fence;
+
+		if (!ce->timeline)
+			continue;
+
+		fence = i915_active_fence_get(&ce->timeline->last_request);
+		if (!fence)
+			continue;
+
+		/* Check with the backend if the request is still inflight */
+		engine = active_engine(fence, ce);
+
+		/* First attempt to gracefully cancel the context */
+		if (engine && !__cancel_engine(engine))
+			/*
+			 * If we are unable to send a preemptive pulse to bump
+			 * the context from the GPU, we have to resort to a full
+			 * reset. We hope the collateral damage is worth it.
+			 */
+			__reset_context(ctx, engine);
+
+		dma_fence_put(fence);
+	}
+}
+
 static void context_close(struct i915_gem_context *ctx)
 {
 	struct i915_address_space *vm;
@@ -298,6 +428,17 @@ static void context_close(struct i915_gem_context *ctx)
 	lut_close(ctx);
 
 	mutex_unlock(&ctx->mutex);
+
+	/*
+	 * If the user has disabled hangchecking, we can not be sure that
+	 * the batches will ever complete after the context is closed,
+	 * keeping the context and all resources pinned forever. So in this
+	 * case we opt to forcibly kill off all remaining requests on
+	 * context close.
+	 */
+	if (!i915_modparams.enable_hangcheck)
+		kill_context(ctx);
+
 	i915_gem_context_put(ctx);
 }
 
-- 
2.24.0.rc0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Intel-gfx] [CI 4/4] drm/i915/gem: Cancel contexts when hangchecking is disabled
@ 2019-10-23 12:21   ` Chris Wilson
  0 siblings, 0 replies; 22+ messages in thread
From: Chris Wilson @ 2019-10-23 12:21 UTC (permalink / raw)
  To: intel-gfx

Normally, we rely on our hangcheck to prevent persistent batches from
hogging the GPU. However, if the user disables hangcheck, this mechanism
breaks down. Despite our insistence that this is unsafe, the users are
equally insistent that they want to use endless batches and will disable
the hangcheck mechanism. We are looking at replacing hangcheck, in the
next patch, with a softer mechanism, that sends a pulse down the engine
to check if it is well. We can use the same preemptive pulse to flush an
active context off the GPU upon context close, preventing resources
being lost and unkillable requests remaining on the GPU after process
termination.

Testcase: igt/gem_ctx_exec/basic-nohangcheck
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Reviewed-by: Jon Bloomfield <jon.bloomfield@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c | 141 ++++++++++++++++++++
 1 file changed, 141 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 7b01f4605f21..b2f042d87be0 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -70,6 +70,7 @@
 #include <drm/i915_drm.h>
 
 #include "gt/intel_lrc_reg.h"
+#include "gt/intel_engine_heartbeat.h"
 #include "gt/intel_engine_user.h"
 
 #include "i915_gem_context.h"
@@ -276,6 +277,135 @@ void i915_gem_context_release(struct kref *ref)
 		schedule_work(&gc->free_work);
 }
 
+static inline struct i915_gem_engines *
+__context_engines_static(const struct i915_gem_context *ctx)
+{
+	return rcu_dereference_protected(ctx->engines, true);
+}
+
+static bool __reset_engine(struct intel_engine_cs *engine)
+{
+	struct intel_gt *gt = engine->gt;
+	bool success = false;
+
+	if (!intel_has_reset_engine(gt))
+		return false;
+
+	if (!test_and_set_bit(I915_RESET_ENGINE + engine->id,
+			      &gt->reset.flags)) {
+		success = intel_engine_reset(engine, NULL) == 0;
+		clear_and_wake_up_bit(I915_RESET_ENGINE + engine->id,
+				      &gt->reset.flags);
+	}
+
+	return success;
+}
+
+static void __reset_context(struct i915_gem_context *ctx,
+			    struct intel_engine_cs *engine)
+{
+	intel_gt_handle_error(engine->gt, engine->mask, 0,
+			      "context closure in %s", ctx->name);
+}
+
+static bool __cancel_engine(struct intel_engine_cs *engine)
+{
+	/*
+	 * Send a "high priority pulse" down the engine to cause the
+	 * current request to be momentarily preempted. (If it fails to
+	 * be preempted, it will be reset). As we have marked our context
+	 * as banned, any incomplete request, including any running, will
+	 * be skipped following the preemption.
+	 *
+	 * If there is no hangchecking (one of the reasons why we try to
+	 * cancel the context) and no forced preemption, there may be no
+	 * means by which we reset the GPU and evict the persistent hog.
+	 * Ergo if we are unable to inject a preemptive pulse that can
+	 * kill the banned context, we fallback to doing a local reset
+	 * instead.
+	 */
+	if (CONFIG_DRM_I915_PREEMPT_TIMEOUT && !intel_engine_pulse(engine))
+		return true;
+
+	/* If we are unable to send a pulse, try resetting this engine. */
+	return __reset_engine(engine);
+}
+
+static struct intel_engine_cs *
+active_engine(struct dma_fence *fence, struct intel_context *ce)
+{
+	struct i915_request *rq = to_request(fence);
+	struct intel_engine_cs *engine, *locked;
+
+	/*
+	 * Serialise with __i915_request_submit() so that it sees
+	 * is-banned?, or we know the request is already inflight.
+	 */
+	locked = READ_ONCE(rq->engine);
+	spin_lock_irq(&locked->active.lock);
+	while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
+		spin_unlock(&locked->active.lock);
+		spin_lock(&engine->active.lock);
+		locked = engine;
+	}
+
+	engine = NULL;
+	if (i915_request_is_active(rq) && !rq->fence.error)
+		engine = rq->engine;
+
+	spin_unlock_irq(&locked->active.lock);
+
+	return engine;
+}
+
+static void kill_context(struct i915_gem_context *ctx)
+{
+	struct i915_gem_engines_iter it;
+	struct intel_context *ce;
+
+	/*
+	 * If we are already banned, it was due to a guilty request causing
+	 * a reset and the entire context being evicted from the GPU.
+	 */
+	if (i915_gem_context_is_banned(ctx))
+		return;
+
+	i915_gem_context_set_banned(ctx);
+
+	/*
+	 * Map the user's engine back to the actual engines; one virtual
+	 * engine will be mapped to multiple engines, and using ctx->engine[]
+	 * the same engine may be have multiple instances in the user's map.
+	 * However, we only care about pending requests, so only include
+	 * engines on which there are incomplete requests.
+	 */
+	for_each_gem_engine(ce, __context_engines_static(ctx), it) {
+		struct intel_engine_cs *engine;
+		struct dma_fence *fence;
+
+		if (!ce->timeline)
+			continue;
+
+		fence = i915_active_fence_get(&ce->timeline->last_request);
+		if (!fence)
+			continue;
+
+		/* Check with the backend if the request is still inflight */
+		engine = active_engine(fence, ce);
+
+		/* First attempt to gracefully cancel the context */
+		if (engine && !__cancel_engine(engine))
+			/*
+			 * If we are unable to send a preemptive pulse to bump
+			 * the context from the GPU, we have to resort to a full
+			 * reset. We hope the collateral damage is worth it.
+			 */
+			__reset_context(ctx, engine);
+
+		dma_fence_put(fence);
+	}
+}
+
 static void context_close(struct i915_gem_context *ctx)
 {
 	struct i915_address_space *vm;
@@ -298,6 +428,17 @@ static void context_close(struct i915_gem_context *ctx)
 	lut_close(ctx);
 
 	mutex_unlock(&ctx->mutex);
+
+	/*
+	 * If the user has disabled hangchecking, we can not be sure that
+	 * the batches will ever complete after the context is closed,
+	 * keeping the context and all resources pinned forever. So in this
+	 * case we opt to forcibly kill off all remaining requests on
+	 * context close.
+	 */
+	if (!i915_modparams.enable_hangcheck)
+		kill_context(ctx);
+
 	i915_gem_context_put(ctx);
 }
 
-- 
2.24.0.rc0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [CI 1/4] drm/i915/gt: Try to more gracefully quiesce the system before resets
@ 2019-10-23 13:21   ` Mika Kuoppala
  0 siblings, 0 replies; 22+ messages in thread
From: Mika Kuoppala @ 2019-10-23 13:21 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> If we are doing a normal GPU reset triggered after detecting a long
> period of stalled work, we can take our time and allow the engines to
> quiesce. Since we've stopped submission to the engine, and if we wait
> long enough an innocent context should complete, leaving the engine idle.
> So by waiting a short amount of time, we should prevent clobbering other
> users when resetting a stuck context.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/Kconfig.profile         | 11 +++++++++++
>  drivers/gpu/drm/i915/gt/intel_engine_cs.c    | 20 +++++++++++++++++++-
>  drivers/gpu/drm/i915/gt/intel_engine_types.h |  4 ++++
>  3 files changed, 34 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/Kconfig.profile b/drivers/gpu/drm/i915/Kconfig.profile
> index 48df8889a88a..97f01bfeda41 100644
> --- a/drivers/gpu/drm/i915/Kconfig.profile
> +++ b/drivers/gpu/drm/i915/Kconfig.profile
> @@ -25,3 +25,14 @@ config DRM_I915_SPIN_REQUEST
>  	  May be 0 to disable the initial spin. In practice, we estimate
>  	  the cost of enabling the interrupt (if currently disabled) to be
>  	  a few microseconds.
> +
> +config DRM_I915_STOP_TIMEOUT
> +	int "How long to wait for an engine to quiesce gracefully before reset (ms)"
> +	default 100 # milliseconds
> +	help
> +	  By stopping submission and sleeping for a short time before resetting
> +	  the GPU, we allow the innocent contexts also on the system to quiesce.
> +	  It is then less likely for a hanging context to cause collateral
> +	  damage as the system is reset in order to recover. The colorary is

s/coloray/corollary

I am not claiming that I would know a better value for this tunable.

But atleast currently with the hangcheck periods we have, I think
there is room for more time to actual reset processing.

We could go as far as we start to idle the other engines
in parallel, when one shows symptoms. But well perhaps
the effect is the same as shortening the detection cycle.

But not a topic for this patch,
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>


> +	  that the reset itself may take longer and so be more disruptive to
> +	  interactive or low latency workloads.
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> index 0e20713603ec..499b775f7231 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> @@ -308,6 +308,9 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id)
>  	engine->instance = info->instance;
>  	__sprint_engine_name(engine);
>  
> +	engine->props.stop_timeout_ms =
> +		CONFIG_DRM_I915_STOP_TIMEOUT;
> +
>  	/*
>  	 * To be overridden by the backend on setup. However to facilitate
>  	 * cleanup on error during setup, we always provide the destroy vfunc.
> @@ -875,6 +878,21 @@ u64 intel_engine_get_last_batch_head(const struct intel_engine_cs *engine)
>  	return bbaddr;
>  }
>  
> +static unsigned long stop_timeout(const struct intel_engine_cs *engine)
> +{
> +	if (in_atomic() || irqs_disabled()) /* inside atomic preempt-reset? */
> +		return 0; /* Going too fast, can't stop now! */
> +
> +	/*
> +	 * If we are doing a normal GPU reset, we can take our time and allow
> +	 * the engine to quiesce. We've stopped submission to the engine, and
> +	 * if we wait long enough an innocent context should complete and
> +	 * leave the engine idle. So they should not be caught unaware by
> +	 * the forthcoming GPU reset (which usually follows the stop_cs)!
> +	 */
> +	return READ_ONCE(engine->props.stop_timeout_ms);
> +}
> +
>  int intel_engine_stop_cs(struct intel_engine_cs *engine)
>  {
>  	struct intel_uncore *uncore = engine->uncore;
> @@ -892,7 +910,7 @@ int intel_engine_stop_cs(struct intel_engine_cs *engine)
>  	err = 0;
>  	if (__intel_wait_for_register_fw(uncore,
>  					 mode, MODE_IDLE, MODE_IDLE,
> -					 1000, 0,
> +					 1000, stop_timeout(engine),
>  					 NULL)) {
>  		GEM_TRACE("%s: timed out on STOP_RING -> IDLE\n", engine->name);
>  		err = -ETIMEDOUT;
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> index 3451be034caf..87d5c4ef3ae7 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> @@ -542,6 +542,10 @@ struct intel_engine_cs {
>  		 */
>  		ktime_t total;
>  	} stats;
> +
> +	struct {
> +		unsigned long stop_timeout_ms;
> +	} props;
>  };
>  
>  static inline bool
> -- 
> 2.24.0.rc0
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Intel-gfx] [CI 1/4] drm/i915/gt: Try to more gracefully quiesce the system before resets
@ 2019-10-23 13:21   ` Mika Kuoppala
  0 siblings, 0 replies; 22+ messages in thread
From: Mika Kuoppala @ 2019-10-23 13:21 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> If we are doing a normal GPU reset triggered after detecting a long
> period of stalled work, we can take our time and allow the engines to
> quiesce. Since we've stopped submission to the engine, and if we wait
> long enough an innocent context should complete, leaving the engine idle.
> So by waiting a short amount of time, we should prevent clobbering other
> users when resetting a stuck context.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/Kconfig.profile         | 11 +++++++++++
>  drivers/gpu/drm/i915/gt/intel_engine_cs.c    | 20 +++++++++++++++++++-
>  drivers/gpu/drm/i915/gt/intel_engine_types.h |  4 ++++
>  3 files changed, 34 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/Kconfig.profile b/drivers/gpu/drm/i915/Kconfig.profile
> index 48df8889a88a..97f01bfeda41 100644
> --- a/drivers/gpu/drm/i915/Kconfig.profile
> +++ b/drivers/gpu/drm/i915/Kconfig.profile
> @@ -25,3 +25,14 @@ config DRM_I915_SPIN_REQUEST
>  	  May be 0 to disable the initial spin. In practice, we estimate
>  	  the cost of enabling the interrupt (if currently disabled) to be
>  	  a few microseconds.
> +
> +config DRM_I915_STOP_TIMEOUT
> +	int "How long to wait for an engine to quiesce gracefully before reset (ms)"
> +	default 100 # milliseconds
> +	help
> +	  By stopping submission and sleeping for a short time before resetting
> +	  the GPU, we allow the innocent contexts also on the system to quiesce.
> +	  It is then less likely for a hanging context to cause collateral
> +	  damage as the system is reset in order to recover. The colorary is

s/coloray/corollary

I am not claiming that I would know a better value for this tunable.

But atleast currently with the hangcheck periods we have, I think
there is room for more time to actual reset processing.

We could go as far as we start to idle the other engines
in parallel, when one shows symptoms. But well perhaps
the effect is the same as shortening the detection cycle.

But not a topic for this patch,
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>


> +	  that the reset itself may take longer and so be more disruptive to
> +	  interactive or low latency workloads.
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> index 0e20713603ec..499b775f7231 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> @@ -308,6 +308,9 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id)
>  	engine->instance = info->instance;
>  	__sprint_engine_name(engine);
>  
> +	engine->props.stop_timeout_ms =
> +		CONFIG_DRM_I915_STOP_TIMEOUT;
> +
>  	/*
>  	 * To be overridden by the backend on setup. However to facilitate
>  	 * cleanup on error during setup, we always provide the destroy vfunc.
> @@ -875,6 +878,21 @@ u64 intel_engine_get_last_batch_head(const struct intel_engine_cs *engine)
>  	return bbaddr;
>  }
>  
> +static unsigned long stop_timeout(const struct intel_engine_cs *engine)
> +{
> +	if (in_atomic() || irqs_disabled()) /* inside atomic preempt-reset? */
> +		return 0; /* Going too fast, can't stop now! */
> +
> +	/*
> +	 * If we are doing a normal GPU reset, we can take our time and allow
> +	 * the engine to quiesce. We've stopped submission to the engine, and
> +	 * if we wait long enough an innocent context should complete and
> +	 * leave the engine idle. So they should not be caught unaware by
> +	 * the forthcoming GPU reset (which usually follows the stop_cs)!
> +	 */
> +	return READ_ONCE(engine->props.stop_timeout_ms);
> +}
> +
>  int intel_engine_stop_cs(struct intel_engine_cs *engine)
>  {
>  	struct intel_uncore *uncore = engine->uncore;
> @@ -892,7 +910,7 @@ int intel_engine_stop_cs(struct intel_engine_cs *engine)
>  	err = 0;
>  	if (__intel_wait_for_register_fw(uncore,
>  					 mode, MODE_IDLE, MODE_IDLE,
> -					 1000, 0,
> +					 1000, stop_timeout(engine),
>  					 NULL)) {
>  		GEM_TRACE("%s: timed out on STOP_RING -> IDLE\n", engine->name);
>  		err = -ETIMEDOUT;
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> index 3451be034caf..87d5c4ef3ae7 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> @@ -542,6 +542,10 @@ struct intel_engine_cs {
>  		 */
>  		ktime_t total;
>  	} stats;
> +
> +	struct {
> +		unsigned long stop_timeout_ms;
> +	} props;
>  };
>  
>  static inline bool
> -- 
> 2.24.0.rc0
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [CI 1/4] drm/i915/gt: Try to more gracefully quiesce the system before resets
@ 2019-10-23 13:28     ` Chris Wilson
  0 siblings, 0 replies; 22+ messages in thread
From: Chris Wilson @ 2019-10-23 13:28 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

Quoting Mika Kuoppala (2019-10-23 14:21:01)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > If we are doing a normal GPU reset triggered after detecting a long
> > period of stalled work, we can take our time and allow the engines to
> > quiesce. Since we've stopped submission to the engine, and if we wait
> > long enough an innocent context should complete, leaving the engine idle.
> > So by waiting a short amount of time, we should prevent clobbering other
> > users when resetting a stuck context.
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > ---
> >  drivers/gpu/drm/i915/Kconfig.profile         | 11 +++++++++++
> >  drivers/gpu/drm/i915/gt/intel_engine_cs.c    | 20 +++++++++++++++++++-
> >  drivers/gpu/drm/i915/gt/intel_engine_types.h |  4 ++++
> >  3 files changed, 34 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/i915/Kconfig.profile b/drivers/gpu/drm/i915/Kconfig.profile
> > index 48df8889a88a..97f01bfeda41 100644
> > --- a/drivers/gpu/drm/i915/Kconfig.profile
> > +++ b/drivers/gpu/drm/i915/Kconfig.profile
> > @@ -25,3 +25,14 @@ config DRM_I915_SPIN_REQUEST
> >         May be 0 to disable the initial spin. In practice, we estimate
> >         the cost of enabling the interrupt (if currently disabled) to be
> >         a few microseconds.
> > +
> > +config DRM_I915_STOP_TIMEOUT
> > +     int "How long to wait for an engine to quiesce gracefully before reset (ms)"
> > +     default 100 # milliseconds
> > +     help
> > +       By stopping submission and sleeping for a short time before resetting
> > +       the GPU, we allow the innocent contexts also on the system to quiesce.
> > +       It is then less likely for a hanging context to cause collateral
> > +       damage as the system is reset in order to recover. The colorary is
> 
> s/coloray/corollary
> 
> I am not claiming that I would know a better value for this tunable.
> 
> But atleast currently with the hangcheck periods we have, I think
> there is room for more time to actual reset processing.
> 
> We could go as far as we start to idle the other engines
> in parallel, when one shows symptoms. But well perhaps
> the effect is the same as shortening the detection cycle.

True, the other idea I think I may experiment with is pushing the
stalled flag down. There's no point waiting for the engine if we've
declared it hung already, and that should eliminate the need for the if
(in_atomic). I think the essence of the path stands -- we can reset
more gracefully if we wait.

I probably should make it a Suggested-by Joonas & Jon.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Intel-gfx] [CI 1/4] drm/i915/gt: Try to more gracefully quiesce the system before resets
@ 2019-10-23 13:28     ` Chris Wilson
  0 siblings, 0 replies; 22+ messages in thread
From: Chris Wilson @ 2019-10-23 13:28 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

Quoting Mika Kuoppala (2019-10-23 14:21:01)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > If we are doing a normal GPU reset triggered after detecting a long
> > period of stalled work, we can take our time and allow the engines to
> > quiesce. Since we've stopped submission to the engine, and if we wait
> > long enough an innocent context should complete, leaving the engine idle.
> > So by waiting a short amount of time, we should prevent clobbering other
> > users when resetting a stuck context.
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > ---
> >  drivers/gpu/drm/i915/Kconfig.profile         | 11 +++++++++++
> >  drivers/gpu/drm/i915/gt/intel_engine_cs.c    | 20 +++++++++++++++++++-
> >  drivers/gpu/drm/i915/gt/intel_engine_types.h |  4 ++++
> >  3 files changed, 34 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/i915/Kconfig.profile b/drivers/gpu/drm/i915/Kconfig.profile
> > index 48df8889a88a..97f01bfeda41 100644
> > --- a/drivers/gpu/drm/i915/Kconfig.profile
> > +++ b/drivers/gpu/drm/i915/Kconfig.profile
> > @@ -25,3 +25,14 @@ config DRM_I915_SPIN_REQUEST
> >         May be 0 to disable the initial spin. In practice, we estimate
> >         the cost of enabling the interrupt (if currently disabled) to be
> >         a few microseconds.
> > +
> > +config DRM_I915_STOP_TIMEOUT
> > +     int "How long to wait for an engine to quiesce gracefully before reset (ms)"
> > +     default 100 # milliseconds
> > +     help
> > +       By stopping submission and sleeping for a short time before resetting
> > +       the GPU, we allow the innocent contexts also on the system to quiesce.
> > +       It is then less likely for a hanging context to cause collateral
> > +       damage as the system is reset in order to recover. The colorary is
> 
> s/coloray/corollary
> 
> I am not claiming that I would know a better value for this tunable.
> 
> But atleast currently with the hangcheck periods we have, I think
> there is room for more time to actual reset processing.
> 
> We could go as far as we start to idle the other engines
> in parallel, when one shows symptoms. But well perhaps
> the effect is the same as shortening the detection cycle.

True, the other idea I think I may experiment with is pushing the
stalled flag down. There's no point waiting for the engine if we've
declared it hung already, and that should eliminate the need for the if
(in_atomic). I think the essence of the path stands -- we can reset
more gracefully if we wait.

I probably should make it a Suggested-by Joonas & Jon.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* ✗ Fi.CI.CHECKPATCH: warning for series starting with [CI,1/4] drm/i915/gt: Try to more gracefully quiesce the system before resets
@ 2019-10-23 21:44   ` Patchwork
  0 siblings, 0 replies; 22+ messages in thread
From: Patchwork @ 2019-10-23 21:44 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [CI,1/4] drm/i915/gt: Try to more gracefully quiesce the system before resets
URL   : https://patchwork.freedesktop.org/series/68457/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
139aa7679696 drm/i915/gt: Try to more gracefully quiesce the system before resets
-:58: ERROR:IN_ATOMIC: do not use in_atomic in drivers
#58: FILE: drivers/gpu/drm/i915/gt/intel_engine_cs.c:883:
+	if (in_atomic() || irqs_disabled()) /* inside atomic preempt-reset? */

total: 1 errors, 0 warnings, 0 checks, 62 lines checked
de7f703951e2 drm/i915/execlists: Force preemption
5432266dcd68 drm/i915/execlists: Cancel banned contexts on schedule-out
c5ae74589e9e drm/i915/gem: Cancel contexts when hangchecking is disabled

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for series starting with [CI,1/4] drm/i915/gt: Try to more gracefully quiesce the system before resets
@ 2019-10-23 21:44   ` Patchwork
  0 siblings, 0 replies; 22+ messages in thread
From: Patchwork @ 2019-10-23 21:44 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [CI,1/4] drm/i915/gt: Try to more gracefully quiesce the system before resets
URL   : https://patchwork.freedesktop.org/series/68457/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
139aa7679696 drm/i915/gt: Try to more gracefully quiesce the system before resets
-:58: ERROR:IN_ATOMIC: do not use in_atomic in drivers
#58: FILE: drivers/gpu/drm/i915/gt/intel_engine_cs.c:883:
+	if (in_atomic() || irqs_disabled()) /* inside atomic preempt-reset? */

total: 1 errors, 0 warnings, 0 checks, 62 lines checked
de7f703951e2 drm/i915/execlists: Force preemption
5432266dcd68 drm/i915/execlists: Cancel banned contexts on schedule-out
c5ae74589e9e drm/i915/gem: Cancel contexts when hangchecking is disabled

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* ✓ Fi.CI.BAT: success for series starting with [CI,1/4] drm/i915/gt: Try to more gracefully quiesce the system before resets
@ 2019-10-23 22:14   ` Patchwork
  0 siblings, 0 replies; 22+ messages in thread
From: Patchwork @ 2019-10-23 22:14 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [CI,1/4] drm/i915/gt: Try to more gracefully quiesce the system before resets
URL   : https://patchwork.freedesktop.org/series/68457/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_7165 -> Patchwork_14948
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/index.html

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_14948:

### IGT changes ###

#### Suppressed ####

  The following results come from untrusted machines, tests, or statuses.
  They do not affect the overall result.

  * {igt@i915_selftest@live_gt_heartbeat}:
    - fi-glk-dsi:         [PASS][1] -> [DMESG-FAIL][2]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/fi-glk-dsi/igt@i915_selftest@live_gt_heartbeat.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/fi-glk-dsi/igt@i915_selftest@live_gt_heartbeat.html

  
Known issues
------------

  Here are the changes found in Patchwork_14948 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_flink_basic@basic:
    - fi-icl-u3:          [PASS][3] -> [DMESG-WARN][4] ([fdo#107724] / [fdo#112052 ])
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/fi-icl-u3/igt@gem_flink_basic@basic.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/fi-icl-u3/igt@gem_flink_basic@basic.html

  * igt@kms_frontbuffer_tracking@basic:
    - fi-hsw-peppy:       [PASS][5] -> [DMESG-WARN][6] ([fdo#102614])
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/fi-hsw-peppy/igt@kms_frontbuffer_tracking@basic.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/fi-hsw-peppy/igt@kms_frontbuffer_tracking@basic.html

  * igt@prime_busy@basic-before-default:
    - fi-icl-u3:          [PASS][7] -> [DMESG-WARN][8] ([fdo#107724]) +1 similar issue
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/fi-icl-u3/igt@prime_busy@basic-before-default.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/fi-icl-u3/igt@prime_busy@basic-before-default.html

  
#### Possible fixes ####

  * igt@gem_ctx_create@basic-files:
    - fi-bxt-dsi:         [INCOMPLETE][9] ([fdo#103927]) -> [PASS][10]
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/fi-bxt-dsi/igt@gem_ctx_create@basic-files.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/fi-bxt-dsi/igt@gem_ctx_create@basic-files.html
    - {fi-tgl-u2}:        [INCOMPLETE][11] ([fdo#111735]) -> [PASS][12]
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/fi-tgl-u2/igt@gem_ctx_create@basic-files.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/fi-tgl-u2/igt@gem_ctx_create@basic-files.html

  * igt@gem_exec_suspend@basic-s4-devices:
    - fi-icl-u3:          [DMESG-WARN][13] ([fdo#107724]) -> [PASS][14] +1 similar issue
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/fi-icl-u3/igt@gem_exec_suspend@basic-s4-devices.html
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/fi-icl-u3/igt@gem_exec_suspend@basic-s4-devices.html

  * {igt@i915_selftest@live_gt_timelines}:
    - {fi-tgl-u}:         [INCOMPLETE][15] ([fdo#111831]) -> [PASS][16]
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/fi-tgl-u/igt@i915_selftest@live_gt_timelines.html
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/fi-tgl-u/igt@i915_selftest@live_gt_timelines.html

  * igt@kms_chamelium@hdmi-hpd-fast:
    - fi-icl-u2:          [FAIL][17] ([fdo#109483]) -> [PASS][18]
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/fi-icl-u2/igt@kms_chamelium@hdmi-hpd-fast.html
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/fi-icl-u2/igt@kms_chamelium@hdmi-hpd-fast.html

  * igt@kms_pipe_crc_basic@suspend-read-crc-pipe-a:
    - fi-kbl-7500u:       [FAIL][19] ([fdo#103375]) -> [PASS][20]
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/fi-kbl-7500u/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-a.html
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/fi-kbl-7500u/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-a.html

  
#### Warnings ####

  * igt@kms_chamelium@hdmi-hpd-fast:
    - fi-kbl-7500u:       [FAIL][21] ([fdo#111407]) -> [FAIL][22] ([fdo#111045] / [fdo#111096])
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/fi-kbl-7500u/igt@kms_chamelium@hdmi-hpd-fast.html
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/fi-kbl-7500u/igt@kms_chamelium@hdmi-hpd-fast.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#102614]: https://bugs.freedesktop.org/show_bug.cgi?id=102614
  [fdo#103375]: https://bugs.freedesktop.org/show_bug.cgi?id=103375
  [fdo#103927]: https://bugs.freedesktop.org/show_bug.cgi?id=103927
  [fdo#107713]: https://bugs.freedesktop.org/show_bug.cgi?id=107713
  [fdo#107724]: https://bugs.freedesktop.org/show_bug.cgi?id=107724
  [fdo#109483]: https://bugs.freedesktop.org/show_bug.cgi?id=109483
  [fdo#111045]: https://bugs.freedesktop.org/show_bug.cgi?id=111045
  [fdo#111096]: https://bugs.freedesktop.org/show_bug.cgi?id=111096
  [fdo#111407]: https://bugs.freedesktop.org/show_bug.cgi?id=111407
  [fdo#111735]: https://bugs.freedesktop.org/show_bug.cgi?id=111735
  [fdo#111831]: https://bugs.freedesktop.org/show_bug.cgi?id=111831
  [fdo#112052 ]: https://bugs.freedesktop.org/show_bug.cgi?id=112052 
  [fdo#112096]: https://bugs.freedesktop.org/show_bug.cgi?id=112096


Participating hosts (51 -> 45)
------------------------------

  Additional (1): fi-bwr-2160 
  Missing    (7): fi-ilk-m540 fi-hsw-4200u fi-byt-squawks fi-bsw-cyan fi-icl-y fi-byt-clapper fi-bdw-samus 


Build changes
-------------

  * CI: CI-20190529 -> None
  * Linux: CI_DRM_7165 -> Patchwork_14948

  CI-20190529: 20190529
  CI_DRM_7165: b50cc0bc0f669a3c3ead36573496596b8b8ed8f4 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5236: 8153b95b53bdef26d2c3e318197d174e982b4265 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_14948: c5ae74589e9e678294ee6c11a40fe70b19b7907f @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

c5ae74589e9e drm/i915/gem: Cancel contexts when hangchecking is disabled
5432266dcd68 drm/i915/execlists: Cancel banned contexts on schedule-out
de7f703951e2 drm/i915/execlists: Force preemption
139aa7679696 drm/i915/gt: Try to more gracefully quiesce the system before resets

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/index.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Intel-gfx] ✓ Fi.CI.BAT: success for series starting with [CI,1/4] drm/i915/gt: Try to more gracefully quiesce the system before resets
@ 2019-10-23 22:14   ` Patchwork
  0 siblings, 0 replies; 22+ messages in thread
From: Patchwork @ 2019-10-23 22:14 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [CI,1/4] drm/i915/gt: Try to more gracefully quiesce the system before resets
URL   : https://patchwork.freedesktop.org/series/68457/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_7165 -> Patchwork_14948
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/index.html

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_14948:

### IGT changes ###

#### Suppressed ####

  The following results come from untrusted machines, tests, or statuses.
  They do not affect the overall result.

  * {igt@i915_selftest@live_gt_heartbeat}:
    - fi-glk-dsi:         [PASS][1] -> [DMESG-FAIL][2]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/fi-glk-dsi/igt@i915_selftest@live_gt_heartbeat.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/fi-glk-dsi/igt@i915_selftest@live_gt_heartbeat.html

  
Known issues
------------

  Here are the changes found in Patchwork_14948 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_flink_basic@basic:
    - fi-icl-u3:          [PASS][3] -> [DMESG-WARN][4] ([fdo#107724] / [fdo#112052 ])
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/fi-icl-u3/igt@gem_flink_basic@basic.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/fi-icl-u3/igt@gem_flink_basic@basic.html

  * igt@kms_frontbuffer_tracking@basic:
    - fi-hsw-peppy:       [PASS][5] -> [DMESG-WARN][6] ([fdo#102614])
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/fi-hsw-peppy/igt@kms_frontbuffer_tracking@basic.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/fi-hsw-peppy/igt@kms_frontbuffer_tracking@basic.html

  * igt@prime_busy@basic-before-default:
    - fi-icl-u3:          [PASS][7] -> [DMESG-WARN][8] ([fdo#107724]) +1 similar issue
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/fi-icl-u3/igt@prime_busy@basic-before-default.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/fi-icl-u3/igt@prime_busy@basic-before-default.html

  
#### Possible fixes ####

  * igt@gem_ctx_create@basic-files:
    - fi-bxt-dsi:         [INCOMPLETE][9] ([fdo#103927]) -> [PASS][10]
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/fi-bxt-dsi/igt@gem_ctx_create@basic-files.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/fi-bxt-dsi/igt@gem_ctx_create@basic-files.html
    - {fi-tgl-u2}:        [INCOMPLETE][11] ([fdo#111735]) -> [PASS][12]
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/fi-tgl-u2/igt@gem_ctx_create@basic-files.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/fi-tgl-u2/igt@gem_ctx_create@basic-files.html

  * igt@gem_exec_suspend@basic-s4-devices:
    - fi-icl-u3:          [DMESG-WARN][13] ([fdo#107724]) -> [PASS][14] +1 similar issue
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/fi-icl-u3/igt@gem_exec_suspend@basic-s4-devices.html
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/fi-icl-u3/igt@gem_exec_suspend@basic-s4-devices.html

  * {igt@i915_selftest@live_gt_timelines}:
    - {fi-tgl-u}:         [INCOMPLETE][15] ([fdo#111831]) -> [PASS][16]
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/fi-tgl-u/igt@i915_selftest@live_gt_timelines.html
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/fi-tgl-u/igt@i915_selftest@live_gt_timelines.html

  * igt@kms_chamelium@hdmi-hpd-fast:
    - fi-icl-u2:          [FAIL][17] ([fdo#109483]) -> [PASS][18]
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/fi-icl-u2/igt@kms_chamelium@hdmi-hpd-fast.html
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/fi-icl-u2/igt@kms_chamelium@hdmi-hpd-fast.html

  * igt@kms_pipe_crc_basic@suspend-read-crc-pipe-a:
    - fi-kbl-7500u:       [FAIL][19] ([fdo#103375]) -> [PASS][20]
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/fi-kbl-7500u/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-a.html
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/fi-kbl-7500u/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-a.html

  
#### Warnings ####

  * igt@kms_chamelium@hdmi-hpd-fast:
    - fi-kbl-7500u:       [FAIL][21] ([fdo#111407]) -> [FAIL][22] ([fdo#111045] / [fdo#111096])
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/fi-kbl-7500u/igt@kms_chamelium@hdmi-hpd-fast.html
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/fi-kbl-7500u/igt@kms_chamelium@hdmi-hpd-fast.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#102614]: https://bugs.freedesktop.org/show_bug.cgi?id=102614
  [fdo#103375]: https://bugs.freedesktop.org/show_bug.cgi?id=103375
  [fdo#103927]: https://bugs.freedesktop.org/show_bug.cgi?id=103927
  [fdo#107713]: https://bugs.freedesktop.org/show_bug.cgi?id=107713
  [fdo#107724]: https://bugs.freedesktop.org/show_bug.cgi?id=107724
  [fdo#109483]: https://bugs.freedesktop.org/show_bug.cgi?id=109483
  [fdo#111045]: https://bugs.freedesktop.org/show_bug.cgi?id=111045
  [fdo#111096]: https://bugs.freedesktop.org/show_bug.cgi?id=111096
  [fdo#111407]: https://bugs.freedesktop.org/show_bug.cgi?id=111407
  [fdo#111735]: https://bugs.freedesktop.org/show_bug.cgi?id=111735
  [fdo#111831]: https://bugs.freedesktop.org/show_bug.cgi?id=111831
  [fdo#112052 ]: https://bugs.freedesktop.org/show_bug.cgi?id=112052 
  [fdo#112096]: https://bugs.freedesktop.org/show_bug.cgi?id=112096


Participating hosts (51 -> 45)
------------------------------

  Additional (1): fi-bwr-2160 
  Missing    (7): fi-ilk-m540 fi-hsw-4200u fi-byt-squawks fi-bsw-cyan fi-icl-y fi-byt-clapper fi-bdw-samus 


Build changes
-------------

  * CI: CI-20190529 -> None
  * Linux: CI_DRM_7165 -> Patchwork_14948

  CI-20190529: 20190529
  CI_DRM_7165: b50cc0bc0f669a3c3ead36573496596b8b8ed8f4 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5236: 8153b95b53bdef26d2c3e318197d174e982b4265 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_14948: c5ae74589e9e678294ee6c11a40fe70b19b7907f @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

c5ae74589e9e drm/i915/gem: Cancel contexts when hangchecking is disabled
5432266dcd68 drm/i915/execlists: Cancel banned contexts on schedule-out
de7f703951e2 drm/i915/execlists: Force preemption
139aa7679696 drm/i915/gt: Try to more gracefully quiesce the system before resets

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/index.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [CI 4/4] drm/i915/gem: Cancel contexts when hangchecking is disabled
@ 2019-10-23 23:26     ` Kumar Valsan, Prathap
  0 siblings, 0 replies; 22+ messages in thread
From: Kumar Valsan, Prathap @ 2019-10-23 23:26 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Wed, Oct 23, 2019 at 01:21:51PM +0100, Chris Wilson wrote:
> Normally, we rely on our hangcheck to prevent persistent batches from
> hogging the GPU. However, if the user disables hangcheck, this mechanism
> breaks down. Despite our insistence that this is unsafe, the users are
> equally insistent that they want to use endless batches and will disable
> the hangcheck mechanism. We are looking at replacing hangcheck, in the
> next patch, with a softer mechanism, that sends a pulse down the engine
> to check if it is well. We can use the same preemptive pulse to flush an
> active context off the GPU upon context close, preventing resources
> being lost and unkillable requests remaining on the GPU after process
> termination.
> 
> Testcase: igt/gem_ctx_exec/basic-nohangcheck
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Michał Winiarski <michal.winiarski@intel.com>
> Cc: Jon Bloomfield <jon.bloomfield@intel.com>
> Reviewed-by: Jon Bloomfield <jon.bloomfield@intel.com>
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_context.c | 141 ++++++++++++++++++++
>  1 file changed, 141 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> index 7b01f4605f21..b2f042d87be0 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> @@ -70,6 +70,7 @@
>  #include <drm/i915_drm.h>
>  
>  #include "gt/intel_lrc_reg.h"
> +#include "gt/intel_engine_heartbeat.h"
>  #include "gt/intel_engine_user.h"
>  
>  #include "i915_gem_context.h"
> @@ -276,6 +277,135 @@ void i915_gem_context_release(struct kref *ref)
>  		schedule_work(&gc->free_work);
>  }
>  
> +static inline struct i915_gem_engines *
> +__context_engines_static(const struct i915_gem_context *ctx)
> +{
> +	return rcu_dereference_protected(ctx->engines, true);
> +}
> +
> +static bool __reset_engine(struct intel_engine_cs *engine)
> +{
> +	struct intel_gt *gt = engine->gt;
> +	bool success = false;
> +
> +	if (!intel_has_reset_engine(gt))
> +		return false;
> +
> +	if (!test_and_set_bit(I915_RESET_ENGINE + engine->id,
> +			      &gt->reset.flags)) {
> +		success = intel_engine_reset(engine, NULL) == 0;
> +		clear_and_wake_up_bit(I915_RESET_ENGINE + engine->id,
> +				      &gt->reset.flags);
> +	}
> +
> +	return success;
> +}
> +
> +static void __reset_context(struct i915_gem_context *ctx,
> +			    struct intel_engine_cs *engine)
> +{
> +	intel_gt_handle_error(engine->gt, engine->mask, 0,
> +			      "context closure in %s", ctx->name);
> +}
> +
> +static bool __cancel_engine(struct intel_engine_cs *engine)
> +{
> +	/*
> +	 * Send a "high priority pulse" down the engine to cause the
> +	 * current request to be momentarily preempted. (If it fails to
> +	 * be preempted, it will be reset). As we have marked our context
> +	 * as banned, any incomplete request, including any running, will
> +	 * be skipped following the preemption.
> +	 *
> +	 * If there is no hangchecking (one of the reasons why we try to
> +	 * cancel the context) and no forced preemption, there may be no
> +	 * means by which we reset the GPU and evict the persistent hog.
> +	 * Ergo if we are unable to inject a preemptive pulse that can
> +	 * kill the banned context, we fallback to doing a local reset
> +	 * instead.
> +	 */
> +	if (CONFIG_DRM_I915_PREEMPT_TIMEOUT && !intel_engine_pulse(engine))
> +		return true;
> +
> +	/* If we are unable to send a pulse, try resetting this engine. */
> +	return __reset_engine(engine);
> +}
> +
> +static struct intel_engine_cs *
> +active_engine(struct dma_fence *fence, struct intel_context *ce)
> +{
> +	struct i915_request *rq = to_request(fence);
> +	struct intel_engine_cs *engine, *locked;
> +
> +	/*
> +	 * Serialise with __i915_request_submit() so that it sees
> +	 * is-banned?, or we know the request is already inflight.
> +	 */
> +	locked = READ_ONCE(rq->engine);
> +	spin_lock_irq(&locked->active.lock);
> +	while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
> +		spin_unlock(&locked->active.lock);
> +		spin_lock(&engine->active.lock);
> +		locked = engine;
> +	}
> +
> +	engine = NULL;
> +	if (i915_request_is_active(rq) && !rq->fence.error)
> +		engine = rq->engine;
> +
> +	spin_unlock_irq(&locked->active.lock);
> +
> +	return engine;
> +}
> +
> +static void kill_context(struct i915_gem_context *ctx)
> +{
> +	struct i915_gem_engines_iter it;
> +	struct intel_context *ce;
> +
> +	/*
> +	 * If we are already banned, it was due to a guilty request causing
> +	 * a reset and the entire context being evicted from the GPU.
> +	 */
> +	if (i915_gem_context_is_banned(ctx))
> +		return;
> +
> +	i915_gem_context_set_banned(ctx);
> +
> +	/*
> +	 * Map the user's engine back to the actual engines; one virtual
> +	 * engine will be mapped to multiple engines, and using ctx->engine[]
> +	 * the same engine may be have multiple instances in the user's map.
> +	 * However, we only care about pending requests, so only include
> +	 * engines on which there are incomplete requests.
> +	 */
> +	for_each_gem_engine(ce, __context_engines_static(ctx), it) {
> +		struct intel_engine_cs *engine;
> +		struct dma_fence *fence;
> +
> +		if (!ce->timeline)
> +			continue;
> +
> +		fence = i915_active_fence_get(&ce->timeline->last_request);
> +		if (!fence)
> +			continue;
> +
> +		/* Check with the backend if the request is still inflight */
> +		engine = active_engine(fence, ce);
> +
> +		/* First attempt to gracefully cancel the context */
> +		if (engine && !__cancel_engine(engine))
> +			/*
> +			 * If we are unable to send a preemptive pulse to bump
> +			 * the context from the GPU, we have to resort to a full
> +			 * reset. We hope the collateral damage is worth it.
> +			 */
> +			__reset_context(ctx, engine);
> +
> +		dma_fence_put(fence);
> +	}
> +}
> +
>  static void context_close(struct i915_gem_context *ctx)
>  {
>  	struct i915_address_space *vm;
> @@ -298,6 +428,17 @@ static void context_close(struct i915_gem_context *ctx)
>  	lut_close(ctx);
>  
>  	mutex_unlock(&ctx->mutex);
> +
> +	/*
> +	 * If the user has disabled hangchecking, we can not be sure that
> +	 * the batches will ever complete after the context is closed,
> +	 * keeping the context and all resources pinned forever. So in this
> +	 * case we opt to forcibly kill off all remaining requests on
> +	 * context close.
> +	 */
> +	if (!i915_modparams.enable_hangcheck)
> +		kill_context(ctx);

Why not killing the context always when a context is closed?
When hang_check is enabled, how would it know the context is closed and
we should release its resources, unless and untill the context has
hanged?

Thanks,
Prathap
> +
>  	i915_gem_context_put(ctx);
>  }
>  
> -- 
> 2.24.0.rc0
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Intel-gfx] [CI 4/4] drm/i915/gem: Cancel contexts when hangchecking is disabled
@ 2019-10-23 23:26     ` Kumar Valsan, Prathap
  0 siblings, 0 replies; 22+ messages in thread
From: Kumar Valsan, Prathap @ 2019-10-23 23:26 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Wed, Oct 23, 2019 at 01:21:51PM +0100, Chris Wilson wrote:
> Normally, we rely on our hangcheck to prevent persistent batches from
> hogging the GPU. However, if the user disables hangcheck, this mechanism
> breaks down. Despite our insistence that this is unsafe, the users are
> equally insistent that they want to use endless batches and will disable
> the hangcheck mechanism. We are looking at replacing hangcheck, in the
> next patch, with a softer mechanism, that sends a pulse down the engine
> to check if it is well. We can use the same preemptive pulse to flush an
> active context off the GPU upon context close, preventing resources
> being lost and unkillable requests remaining on the GPU after process
> termination.
> 
> Testcase: igt/gem_ctx_exec/basic-nohangcheck
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Michał Winiarski <michal.winiarski@intel.com>
> Cc: Jon Bloomfield <jon.bloomfield@intel.com>
> Reviewed-by: Jon Bloomfield <jon.bloomfield@intel.com>
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_context.c | 141 ++++++++++++++++++++
>  1 file changed, 141 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> index 7b01f4605f21..b2f042d87be0 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> @@ -70,6 +70,7 @@
>  #include <drm/i915_drm.h>
>  
>  #include "gt/intel_lrc_reg.h"
> +#include "gt/intel_engine_heartbeat.h"
>  #include "gt/intel_engine_user.h"
>  
>  #include "i915_gem_context.h"
> @@ -276,6 +277,135 @@ void i915_gem_context_release(struct kref *ref)
>  		schedule_work(&gc->free_work);
>  }
>  
> +static inline struct i915_gem_engines *
> +__context_engines_static(const struct i915_gem_context *ctx)
> +{
> +	return rcu_dereference_protected(ctx->engines, true);
> +}
> +
> +static bool __reset_engine(struct intel_engine_cs *engine)
> +{
> +	struct intel_gt *gt = engine->gt;
> +	bool success = false;
> +
> +	if (!intel_has_reset_engine(gt))
> +		return false;
> +
> +	if (!test_and_set_bit(I915_RESET_ENGINE + engine->id,
> +			      &gt->reset.flags)) {
> +		success = intel_engine_reset(engine, NULL) == 0;
> +		clear_and_wake_up_bit(I915_RESET_ENGINE + engine->id,
> +				      &gt->reset.flags);
> +	}
> +
> +	return success;
> +}
> +
> +static void __reset_context(struct i915_gem_context *ctx,
> +			    struct intel_engine_cs *engine)
> +{
> +	intel_gt_handle_error(engine->gt, engine->mask, 0,
> +			      "context closure in %s", ctx->name);
> +}
> +
> +static bool __cancel_engine(struct intel_engine_cs *engine)
> +{
> +	/*
> +	 * Send a "high priority pulse" down the engine to cause the
> +	 * current request to be momentarily preempted. (If it fails to
> +	 * be preempted, it will be reset). As we have marked our context
> +	 * as banned, any incomplete request, including any running, will
> +	 * be skipped following the preemption.
> +	 *
> +	 * If there is no hangchecking (one of the reasons why we try to
> +	 * cancel the context) and no forced preemption, there may be no
> +	 * means by which we reset the GPU and evict the persistent hog.
> +	 * Ergo if we are unable to inject a preemptive pulse that can
> +	 * kill the banned context, we fallback to doing a local reset
> +	 * instead.
> +	 */
> +	if (CONFIG_DRM_I915_PREEMPT_TIMEOUT && !intel_engine_pulse(engine))
> +		return true;
> +
> +	/* If we are unable to send a pulse, try resetting this engine. */
> +	return __reset_engine(engine);
> +}
> +
> +static struct intel_engine_cs *
> +active_engine(struct dma_fence *fence, struct intel_context *ce)
> +{
> +	struct i915_request *rq = to_request(fence);
> +	struct intel_engine_cs *engine, *locked;
> +
> +	/*
> +	 * Serialise with __i915_request_submit() so that it sees
> +	 * is-banned?, or we know the request is already inflight.
> +	 */
> +	locked = READ_ONCE(rq->engine);
> +	spin_lock_irq(&locked->active.lock);
> +	while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
> +		spin_unlock(&locked->active.lock);
> +		spin_lock(&engine->active.lock);
> +		locked = engine;
> +	}
> +
> +	engine = NULL;
> +	if (i915_request_is_active(rq) && !rq->fence.error)
> +		engine = rq->engine;
> +
> +	spin_unlock_irq(&locked->active.lock);
> +
> +	return engine;
> +}
> +
> +static void kill_context(struct i915_gem_context *ctx)
> +{
> +	struct i915_gem_engines_iter it;
> +	struct intel_context *ce;
> +
> +	/*
> +	 * If we are already banned, it was due to a guilty request causing
> +	 * a reset and the entire context being evicted from the GPU.
> +	 */
> +	if (i915_gem_context_is_banned(ctx))
> +		return;
> +
> +	i915_gem_context_set_banned(ctx);
> +
> +	/*
> +	 * Map the user's engine back to the actual engines; one virtual
> +	 * engine will be mapped to multiple engines, and using ctx->engine[]
> +	 * the same engine may be have multiple instances in the user's map.
> +	 * However, we only care about pending requests, so only include
> +	 * engines on which there are incomplete requests.
> +	 */
> +	for_each_gem_engine(ce, __context_engines_static(ctx), it) {
> +		struct intel_engine_cs *engine;
> +		struct dma_fence *fence;
> +
> +		if (!ce->timeline)
> +			continue;
> +
> +		fence = i915_active_fence_get(&ce->timeline->last_request);
> +		if (!fence)
> +			continue;
> +
> +		/* Check with the backend if the request is still inflight */
> +		engine = active_engine(fence, ce);
> +
> +		/* First attempt to gracefully cancel the context */
> +		if (engine && !__cancel_engine(engine))
> +			/*
> +			 * If we are unable to send a preemptive pulse to bump
> +			 * the context from the GPU, we have to resort to a full
> +			 * reset. We hope the collateral damage is worth it.
> +			 */
> +			__reset_context(ctx, engine);
> +
> +		dma_fence_put(fence);
> +	}
> +}
> +
>  static void context_close(struct i915_gem_context *ctx)
>  {
>  	struct i915_address_space *vm;
> @@ -298,6 +428,17 @@ static void context_close(struct i915_gem_context *ctx)
>  	lut_close(ctx);
>  
>  	mutex_unlock(&ctx->mutex);
> +
> +	/*
> +	 * If the user has disabled hangchecking, we can not be sure that
> +	 * the batches will ever complete after the context is closed,
> +	 * keeping the context and all resources pinned forever. So in this
> +	 * case we opt to forcibly kill off all remaining requests on
> +	 * context close.
> +	 */
> +	if (!i915_modparams.enable_hangcheck)
> +		kill_context(ctx);

Why not killing the context always when a context is closed?
When hang_check is enabled, how would it know the context is closed and
we should release its resources, unless and untill the context has
hanged?

Thanks,
Prathap
> +
>  	i915_gem_context_put(ctx);
>  }
>  
> -- 
> 2.24.0.rc0
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [CI 4/4] drm/i915/gem: Cancel contexts when hangchecking is disabled
@ 2019-10-23 23:30       ` Chris Wilson
  0 siblings, 0 replies; 22+ messages in thread
From: Chris Wilson @ 2019-10-23 23:30 UTC (permalink / raw)
  To: Kumar Valsan, Prathap; +Cc: intel-gfx

Quoting Kumar Valsan, Prathap (2019-10-24 00:26:06)
> On Wed, Oct 23, 2019 at 01:21:51PM +0100, Chris Wilson wrote:
> > +
> > +     /*
> > +      * If the user has disabled hangchecking, we can not be sure that
> > +      * the batches will ever complete after the context is closed,
> > +      * keeping the context and all resources pinned forever. So in this
> > +      * case we opt to forcibly kill off all remaining requests on
> > +      * context close.
> > +      */
> > +     if (!i915_modparams.enable_hangcheck)
> > +             kill_context(ctx);
> 
> Why not killing the context always when a context is closed?

Because we historically have not and so desktop userspace has come to
depend on that behaviour (think one client handing over a framebuffer to
the display server with pending rendering).

> When hang_check is enabled, how would it know the context is closed and
> we should release its resources, unless and untill the context has
> hanged?

Exactly. The contexts persist until complete. The same dos prevention
rules apply to the outstanding work as applied when the context was
open.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Intel-gfx] [CI 4/4] drm/i915/gem: Cancel contexts when hangchecking is disabled
@ 2019-10-23 23:30       ` Chris Wilson
  0 siblings, 0 replies; 22+ messages in thread
From: Chris Wilson @ 2019-10-23 23:30 UTC (permalink / raw)
  To: Kumar Valsan, Prathap; +Cc: intel-gfx

Quoting Kumar Valsan, Prathap (2019-10-24 00:26:06)
> On Wed, Oct 23, 2019 at 01:21:51PM +0100, Chris Wilson wrote:
> > +
> > +     /*
> > +      * If the user has disabled hangchecking, we can not be sure that
> > +      * the batches will ever complete after the context is closed,
> > +      * keeping the context and all resources pinned forever. So in this
> > +      * case we opt to forcibly kill off all remaining requests on
> > +      * context close.
> > +      */
> > +     if (!i915_modparams.enable_hangcheck)
> > +             kill_context(ctx);
> 
> Why not killing the context always when a context is closed?

Because we historically have not and so desktop userspace has come to
depend on that behaviour (think one client handing over a framebuffer to
the display server with pending rendering).

> When hang_check is enabled, how would it know the context is closed and
> we should release its resources, unless and untill the context has
> hanged?

Exactly. The contexts persist until complete. The same dos prevention
rules apply to the outstanding work as applied when the context was
open.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* ✗ Fi.CI.IGT: failure for series starting with [CI,1/4] drm/i915/gt: Try to more gracefully quiesce the system before resets
@ 2019-10-24 16:35   ` Patchwork
  0 siblings, 0 replies; 22+ messages in thread
From: Patchwork @ 2019-10-24 16:35 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [CI,1/4] drm/i915/gt: Try to more gracefully quiesce the system before resets
URL   : https://patchwork.freedesktop.org/series/68457/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_7165_full -> Patchwork_14948_full
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_14948_full absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_14948_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_14948_full:

### IGT changes ###

#### Possible regressions ####

  * igt@gem_ctx_shared@q-promotion-bsd1:
    - shard-apl:          [PASS][1] -> [FAIL][2] +1 similar issue
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-apl6/igt@gem_ctx_shared@q-promotion-bsd1.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-apl2/igt@gem_ctx_shared@q-promotion-bsd1.html

  * igt@gem_exec_schedule@preempt-queue-contexts-chain-bsd:
    - shard-glk:          [PASS][3] -> [FAIL][4]
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-glk6/igt@gem_exec_schedule@preempt-queue-contexts-chain-bsd.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-glk6/igt@gem_exec_schedule@preempt-queue-contexts-chain-bsd.html

  * igt@gem_exec_schedule@preempt-queue-contexts-render:
    - shard-skl:          [PASS][5] -> [FAIL][6] +4 similar issues
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-skl9/igt@gem_exec_schedule@preempt-queue-contexts-render.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-skl6/igt@gem_exec_schedule@preempt-queue-contexts-render.html

  
#### Warnings ####

  * igt@gem_exec_schedule@preempt-queue-contexts-chain-bsd:
    - shard-iclb:         [SKIP][7] ([fdo#111325]) -> [FAIL][8]
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-iclb1/igt@gem_exec_schedule@preempt-queue-contexts-chain-bsd.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-iclb5/igt@gem_exec_schedule@preempt-queue-contexts-chain-bsd.html

  
#### Suppressed ####

  The following results come from untrusted machines, tests, or statuses.
  They do not affect the overall result.

  * igt@gem_exec_schedule@wide-bsd2:
    - {shard-tglb}:       [PASS][9] -> [FAIL][10] +1 similar issue
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-tglb8/igt@gem_exec_schedule@wide-bsd2.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-tglb6/igt@gem_exec_schedule@wide-bsd2.html

  * {igt@kms_cursor_crc@pipe-d-cursor-64x64-offscreen}:
    - {shard-tglb}:       [PASS][11] -> [INCOMPLETE][12] +1 similar issue
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-tglb1/igt@kms_cursor_crc@pipe-d-cursor-64x64-offscreen.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-tglb7/igt@kms_cursor_crc@pipe-d-cursor-64x64-offscreen.html

  

### Piglit changes ###

#### Possible regressions ####

  * spec@arb_shader_image_load_store@shader-mem-barrier (NEW):
    - pig-glk-j5005:      NOTRUN -> [FAIL][13]
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/pig-glk-j5005/spec@arb_shader_image_load_store@shader-mem-barrier.html

  
New tests
---------

  New tests have been introduced between CI_DRM_7165_full and Patchwork_14948_full:

### New Piglit tests (1) ###

  * spec@arb_shader_image_load_store@shader-mem-barrier:
    - Statuses : 1 fail(s)
    - Exec time: [2.75] s

  

Known issues
------------

  Here are the changes found in Patchwork_14948_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_ctx_exec@basic-invalid-context-vcs1:
    - shard-iclb:         [PASS][14] -> [SKIP][15] ([fdo#112080]) +16 similar issues
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-iclb4/igt@gem_ctx_exec@basic-invalid-context-vcs1.html
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-iclb7/igt@gem_ctx_exec@basic-invalid-context-vcs1.html

  * igt@gem_ctx_isolation@rcs0-s3:
    - shard-kbl:          [PASS][16] -> [DMESG-WARN][17] ([fdo#108566]) +4 similar issues
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-kbl2/igt@gem_ctx_isolation@rcs0-s3.html
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-kbl4/igt@gem_ctx_isolation@rcs0-s3.html

  * igt@gem_ctx_isolation@vcs1-dirty-create:
    - shard-iclb:         [PASS][18] -> [SKIP][19] ([fdo#109276] / [fdo#112080])
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-iclb2/igt@gem_ctx_isolation@vcs1-dirty-create.html
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-iclb3/igt@gem_ctx_isolation@vcs1-dirty-create.html

  * igt@gem_exec_schedule@fifo-bsd1:
    - shard-iclb:         [PASS][20] -> [SKIP][21] ([fdo#109276]) +23 similar issues
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-iclb4/igt@gem_exec_schedule@fifo-bsd1.html
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-iclb5/igt@gem_exec_schedule@fifo-bsd1.html

  * igt@gem_exec_schedule@pi-ringfull-render:
    - shard-kbl:          [PASS][22] -> [INCOMPLETE][23] ([fdo#103665])
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-kbl3/igt@gem_exec_schedule@pi-ringfull-render.html
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-kbl1/igt@gem_exec_schedule@pi-ringfull-render.html

  * igt@gem_exec_schedule@preempt-bsd:
    - shard-iclb:         [PASS][24] -> [SKIP][25] ([fdo#111325]) +3 similar issues
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-iclb7/igt@gem_exec_schedule@preempt-bsd.html
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-iclb1/igt@gem_exec_schedule@preempt-bsd.html

  * igt@gem_persistent_relocs@forked-interruptible-thrash-inactive:
    - shard-kbl:          [PASS][26] -> [TIMEOUT][27] ([fdo#112068 ])
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-kbl4/igt@gem_persistent_relocs@forked-interruptible-thrash-inactive.html
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-kbl2/igt@gem_persistent_relocs@forked-interruptible-thrash-inactive.html

  * igt@gem_userptr_blits@map-fixed-invalidate-busy-gup:
    - shard-hsw:          [PASS][28] -> [DMESG-WARN][29] ([fdo#111870]) +1 similar issue
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-hsw6/igt@gem_userptr_blits@map-fixed-invalidate-busy-gup.html
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-hsw5/igt@gem_userptr_blits@map-fixed-invalidate-busy-gup.html

  * igt@gem_userptr_blits@map-fixed-invalidate-overlap-busy:
    - shard-snb:          [PASS][30] -> [DMESG-WARN][31] ([fdo#111870])
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-snb6/igt@gem_userptr_blits@map-fixed-invalidate-overlap-busy.html
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-snb2/igt@gem_userptr_blits@map-fixed-invalidate-overlap-busy.html

  * igt@kms_color@pipe-a-ctm-0-25:
    - shard-skl:          [PASS][32] -> [DMESG-WARN][33] ([fdo#106107])
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-skl10/igt@kms_color@pipe-a-ctm-0-25.html
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-skl5/igt@kms_color@pipe-a-ctm-0-25.html

  * igt@kms_cursor_edge_walk@pipe-a-64x64-left-edge:
    - shard-apl:          [PASS][34] -> [INCOMPLETE][35] ([fdo#103927]) +4 similar issues
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-apl8/igt@kms_cursor_edge_walk@pipe-a-64x64-left-edge.html
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-apl7/igt@kms_cursor_edge_walk@pipe-a-64x64-left-edge.html

  * igt@kms_flip@flip-vs-suspend-interruptible:
    - shard-hsw:          [PASS][36] -> [INCOMPLETE][37] ([fdo#103540])
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-hsw5/igt@kms_flip@flip-vs-suspend-interruptible.html
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-hsw5/igt@kms_flip@flip-vs-suspend-interruptible.html

  * igt@kms_frontbuffer_tracking@fbc-badstride:
    - shard-iclb:         [PASS][38] -> [FAIL][39] ([fdo#103167]) +6 similar issues
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-iclb8/igt@kms_frontbuffer_tracking@fbc-badstride.html
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-iclb7/igt@kms_frontbuffer_tracking@fbc-badstride.html

  * igt@kms_frontbuffer_tracking@psr-1p-primscrn-cur-indfb-draw-mmap-gtt:
    - shard-skl:          [PASS][40] -> [FAIL][41] ([fdo#103167]) +1 similar issue
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-skl4/igt@kms_frontbuffer_tracking@psr-1p-primscrn-cur-indfb-draw-mmap-gtt.html
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-skl7/igt@kms_frontbuffer_tracking@psr-1p-primscrn-cur-indfb-draw-mmap-gtt.html

  * igt@kms_pipe_crc_basic@nonblocking-crc-pipe-a-frame-sequence:
    - shard-skl:          [PASS][42] -> [FAIL][43] ([fdo#103191])
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-skl4/igt@kms_pipe_crc_basic@nonblocking-crc-pipe-a-frame-sequence.html
   [43]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-skl7/igt@kms_pipe_crc_basic@nonblocking-crc-pipe-a-frame-sequence.html

  * igt@kms_psr@psr2_cursor_mmap_cpu:
    - shard-iclb:         [PASS][44] -> [SKIP][45] ([fdo#109441])
   [44]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-iclb2/igt@kms_psr@psr2_cursor_mmap_cpu.html
   [45]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-iclb8/igt@kms_psr@psr2_cursor_mmap_cpu.html

  * igt@kms_setmode@basic:
    - shard-glk:          [PASS][46] -> [FAIL][47] ([fdo#99912])
   [46]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-glk3/igt@kms_setmode@basic.html
   [47]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-glk8/igt@kms_setmode@basic.html

  * igt@prime_busy@wait-hang-bsd:
    - shard-kbl:          [PASS][48] -> [SKIP][49] ([fdo#109271])
   [48]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-kbl3/igt@prime_busy@wait-hang-bsd.html
   [49]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-kbl1/igt@prime_busy@wait-hang-bsd.html

  
#### Possible fixes ####

  * igt@gem_busy@busy-vcs1:
    - shard-iclb:         [SKIP][50] ([fdo#112080]) -> [PASS][51] +14 similar issues
   [50]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-iclb6/igt@gem_busy@busy-vcs1.html
   [51]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-iclb4/igt@gem_busy@busy-vcs1.html

  * {igt@gem_ctx_exec@basic-nohangcheck}:
    - shard-snb:          [FAIL][52] ([fdo#112117]) -> [PASS][53]
   [52]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-snb5/igt@gem_ctx_exec@basic-nohangcheck.html
   [53]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-snb5/igt@gem_ctx_exec@basic-nohangcheck.html
    - shard-hsw:          [FAIL][54] ([fdo#112117]) -> [PASS][55]
   [54]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-hsw6/igt@gem_ctx_exec@basic-nohangcheck.html
   [55]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-hsw5/igt@gem_ctx_exec@basic-nohangcheck.html

  * igt@gem_ctx_isolation@vcs1-clean:
    - shard-iclb:         [SKIP][56] ([fdo#109276] / [fdo#112080]) -> [PASS][57] +1 similar issue
   [56]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-iclb7/igt@gem_ctx_isolation@vcs1-clean.html
   [57]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-iclb1/igt@gem_ctx_isolation@vcs1-clean.html

  * igt@gem_ctx_isolation@vcs1-s3:
    - {shard-tglb}:       [INCOMPLETE][58] ([fdo#111832]) -> [PASS][59] +1 similar issue
   [58]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-tglb3/igt@gem_ctx_isolation@vcs1-s3.html
   [59]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-tglb6/igt@gem_ctx_isolation@vcs1-s3.html

  * igt@gem_ctx_shared@q-smoketest-bsd2:
    - {shard-tglb}:       [INCOMPLETE][60] ([fdo# 111852 ]) -> [PASS][61]
   [60]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-tglb6/igt@gem_ctx_shared@q-smoketest-bsd2.html
   [61]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-tglb2/igt@gem_ctx_shared@q-smoketest-bsd2.html

  * igt@gem_eio@wait-immediate:
    - shard-kbl:          [SKIP][62] ([fdo#109271]) -> [PASS][63] +4 similar issues
   [62]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-kbl6/igt@gem_eio@wait-immediate.html
   [63]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-kbl7/igt@gem_eio@wait-immediate.html

  * igt@gem_exec_balancer@smoke:
    - shard-iclb:         [SKIP][64] ([fdo#110854]) -> [PASS][65]
   [64]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-iclb5/igt@gem_exec_balancer@smoke.html
   [65]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-iclb2/igt@gem_exec_balancer@smoke.html

  * igt@gem_exec_schedule@preempt-other-chain-bsd:
    - shard-iclb:         [SKIP][66] ([fdo#111325]) -> [PASS][67] +9 similar issues
   [66]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-iclb1/igt@gem_exec_schedule@preempt-other-chain-bsd.html
   [67]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-iclb6/igt@gem_exec_schedule@preempt-other-chain-bsd.html

  * igt@gem_linear_blits@interruptible:
    - shard-apl:          [INCOMPLETE][68] ([fdo#103927] / [fdo#112067]) -> [PASS][69]
   [68]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-apl8/igt@gem_linear_blits@interruptible.html
   [69]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-apl8/igt@gem_linear_blits@interruptible.html

  * igt@gem_persistent_relocs@forked-interruptible-thrashing:
    - shard-kbl:          [FAIL][70] ([fdo#112037]) -> [PASS][71]
   [70]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-kbl3/igt@gem_persistent_relocs@forked-interruptible-thrashing.html
   [71]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-kbl2/igt@gem_persistent_relocs@forked-interruptible-thrashing.html

  * igt@gem_userptr_blits@dmabuf-sync:
    - shard-snb:          [DMESG-WARN][72] ([fdo#111870]) -> [PASS][73]
   [72]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-snb5/igt@gem_userptr_blits@dmabuf-sync.html
   [73]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-snb7/igt@gem_userptr_blits@dmabuf-sync.html

  * igt@gem_userptr_blits@dmabuf-unsync:
    - shard-hsw:          [DMESG-WARN][74] ([fdo#111870]) -> [PASS][75] +1 similar issue
   [74]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-hsw2/igt@gem_userptr_blits@dmabuf-unsync.html
   [75]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-hsw4/igt@gem_userptr_blits@dmabuf-unsync.html

  * igt@i915_hangman@error-state-basic:
    - shard-hsw:          [SKIP][76] ([fdo#109271]) -> [PASS][77] +5 similar issues
   [76]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-hsw6/igt@i915_hangman@error-state-basic.html
   [77]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-hsw2/igt@i915_hangman@error-state-basic.html

  * igt@i915_selftest@live_hangcheck:
    - {shard-tglb}:       [INCOMPLETE][78] ([fdo#111747]) -> [PASS][79]
   [78]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-tglb5/igt@i915_selftest@live_hangcheck.html
   [79]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-tglb8/igt@i915_selftest@live_hangcheck.html

  * igt@i915_suspend@fence-restore-tiled2untiled:
    - {shard-tglb}:       [INCOMPLETE][80] ([fdo#111832] / [fdo#111850]) -> [PASS][81]
   [80]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-tglb8/igt@i915_suspend@fence-restore-tiled2untiled.html
   [81]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-tglb5/igt@i915_suspend@fence-restore-tiled2untiled.html

  * igt@kms_busy@extended-pageflip-modeset-hang-oldfb-render-b:
    - shard-snb:          [SKIP][82] ([fdo#109271]) -> [PASS][83] +2 similar issues
   [82]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-snb5/igt@kms_busy@extended-pageflip-modeset-hang-oldfb-render-b.html
   [83]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-snb5/igt@kms_busy@extended-pageflip-modeset-hang-oldfb-render-b.html

  * igt@kms_color@pipe-a-ctm-green-to-red:
    - shard-skl:          [DMESG-WARN][84] ([fdo#106107]) -> [PASS][85]
   [84]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-skl7/igt@kms_color@pipe-a-ctm-green-to-red.html
   [85]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-skl9/igt@kms_color@pipe-a-ctm-green-to-red.html

  * igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic:
    - shard-apl:          [INCOMPLETE][86] ([fdo#103927]) -> [PASS][87] +1 similar issue
   [86]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-apl2/igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic.html
   [87]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-apl2/igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic.html

  * igt@kms_flip@flip-vs-expired-vblank:
    - shard-skl:          [FAIL][88] ([fdo#105363]) -> [PASS][89]
   [88]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-skl3/igt@kms_flip@flip-vs-expired-vblank.html
   [89]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-skl8/igt@kms_flip@flip-vs-expired-vblank.html

  * igt@kms_flip@flip-vs-suspend-interruptible:
    - shard-skl:          [INCOMPLETE][90] ([fdo#109507]) -> [PASS][91]
   [90]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-skl2/igt@kms_flip@flip-vs-suspend-interruptible.html
   [91]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-skl1/igt@kms_flip@flip-vs-suspend-interruptible.html

  * igt@kms_frontbuffer_tracking@fbc-stridechange:
    - {shard-tglb}:       [FAIL][92] ([fdo#103167]) -> [PASS][93] +4 similar issues
   [92]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-tglb7/igt@kms_frontbuffer_tracking@fbc-stridechange.html
   [93]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-tglb4/igt@kms_frontbuffer_tracking@fbc-stridechange.html

  * igt@kms_frontbuffer_tracking@fbcpsr-1p-primscrn-pri-indfb-draw-pwrite:
    - shard-iclb:         [FAIL][94] ([fdo#103167]) -> [PASS][95] +4 similar issues
   [94]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-iclb7/igt@kms_frontbuffer_tracking@fbcpsr-1p-primscrn-pri-indfb-draw-pwrite.html
   [95]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-iclb6/igt@kms_frontbuffer_tracking@fbcpsr-1p-primscrn-pri-indfb-draw-pwrite.html

  * igt@kms_plane@plane-panning-bottom-right-suspend-pipe-c-planes:
    - shard-kbl:          [DMESG-WARN][96] ([fdo#108566]) -> [PASS][97] +3 similar issues
   [96]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-kbl7/igt@kms_plane@plane-panning-bottom-right-suspend-pipe-c-planes.html
   [97]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-kbl3/igt@kms_plane@plane-panning-bottom-right-suspend-pipe-c-planes.html

  * igt@kms_plane_alpha_blend@pipe-b-coverage-7efc:
    - shard-skl:          [FAIL][98] ([fdo#108145] / [fdo#110403]) -> [PASS][99]
   [98]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-skl9/igt@kms_plane_alpha_blend@pipe-b-coverage-7efc.html
   [99]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-skl6/igt@kms_plane_alpha_blend@pipe-b-coverage-7efc.html

  * igt@kms_plane_alpha_blend@pipe-c-constant-alpha-min:
    - shard-skl:          [FAIL][100] ([fdo#108145]) -> [PASS][101]
   [100]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-skl1/igt@kms_plane_alpha_blend@pipe-c-constant-alpha-min.html
   [101]: https://intel-gfx-ci.

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/index.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Intel-gfx] ✗ Fi.CI.IGT: failure for series starting with [CI,1/4] drm/i915/gt: Try to more gracefully quiesce the system before resets
@ 2019-10-24 16:35   ` Patchwork
  0 siblings, 0 replies; 22+ messages in thread
From: Patchwork @ 2019-10-24 16:35 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [CI,1/4] drm/i915/gt: Try to more gracefully quiesce the system before resets
URL   : https://patchwork.freedesktop.org/series/68457/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_7165_full -> Patchwork_14948_full
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_14948_full absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_14948_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_14948_full:

### IGT changes ###

#### Possible regressions ####

  * igt@gem_ctx_shared@q-promotion-bsd1:
    - shard-apl:          [PASS][1] -> [FAIL][2] +1 similar issue
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-apl6/igt@gem_ctx_shared@q-promotion-bsd1.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-apl2/igt@gem_ctx_shared@q-promotion-bsd1.html

  * igt@gem_exec_schedule@preempt-queue-contexts-chain-bsd:
    - shard-glk:          [PASS][3] -> [FAIL][4]
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-glk6/igt@gem_exec_schedule@preempt-queue-contexts-chain-bsd.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-glk6/igt@gem_exec_schedule@preempt-queue-contexts-chain-bsd.html

  * igt@gem_exec_schedule@preempt-queue-contexts-render:
    - shard-skl:          [PASS][5] -> [FAIL][6] +4 similar issues
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-skl9/igt@gem_exec_schedule@preempt-queue-contexts-render.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-skl6/igt@gem_exec_schedule@preempt-queue-contexts-render.html

  
#### Warnings ####

  * igt@gem_exec_schedule@preempt-queue-contexts-chain-bsd:
    - shard-iclb:         [SKIP][7] ([fdo#111325]) -> [FAIL][8]
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-iclb1/igt@gem_exec_schedule@preempt-queue-contexts-chain-bsd.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-iclb5/igt@gem_exec_schedule@preempt-queue-contexts-chain-bsd.html

  
#### Suppressed ####

  The following results come from untrusted machines, tests, or statuses.
  They do not affect the overall result.

  * igt@gem_exec_schedule@wide-bsd2:
    - {shard-tglb}:       [PASS][9] -> [FAIL][10] +1 similar issue
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-tglb8/igt@gem_exec_schedule@wide-bsd2.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-tglb6/igt@gem_exec_schedule@wide-bsd2.html

  * {igt@kms_cursor_crc@pipe-d-cursor-64x64-offscreen}:
    - {shard-tglb}:       [PASS][11] -> [INCOMPLETE][12] +1 similar issue
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-tglb1/igt@kms_cursor_crc@pipe-d-cursor-64x64-offscreen.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-tglb7/igt@kms_cursor_crc@pipe-d-cursor-64x64-offscreen.html

  

### Piglit changes ###

#### Possible regressions ####

  * spec@arb_shader_image_load_store@shader-mem-barrier (NEW):
    - pig-glk-j5005:      NOTRUN -> [FAIL][13]
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/pig-glk-j5005/spec@arb_shader_image_load_store@shader-mem-barrier.html

  
New tests
---------

  New tests have been introduced between CI_DRM_7165_full and Patchwork_14948_full:

### New Piglit tests (1) ###

  * spec@arb_shader_image_load_store@shader-mem-barrier:
    - Statuses : 1 fail(s)
    - Exec time: [2.75] s

  

Known issues
------------

  Here are the changes found in Patchwork_14948_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_ctx_exec@basic-invalid-context-vcs1:
    - shard-iclb:         [PASS][14] -> [SKIP][15] ([fdo#112080]) +16 similar issues
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-iclb4/igt@gem_ctx_exec@basic-invalid-context-vcs1.html
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-iclb7/igt@gem_ctx_exec@basic-invalid-context-vcs1.html

  * igt@gem_ctx_isolation@rcs0-s3:
    - shard-kbl:          [PASS][16] -> [DMESG-WARN][17] ([fdo#108566]) +4 similar issues
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-kbl2/igt@gem_ctx_isolation@rcs0-s3.html
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-kbl4/igt@gem_ctx_isolation@rcs0-s3.html

  * igt@gem_ctx_isolation@vcs1-dirty-create:
    - shard-iclb:         [PASS][18] -> [SKIP][19] ([fdo#109276] / [fdo#112080])
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-iclb2/igt@gem_ctx_isolation@vcs1-dirty-create.html
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-iclb3/igt@gem_ctx_isolation@vcs1-dirty-create.html

  * igt@gem_exec_schedule@fifo-bsd1:
    - shard-iclb:         [PASS][20] -> [SKIP][21] ([fdo#109276]) +23 similar issues
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-iclb4/igt@gem_exec_schedule@fifo-bsd1.html
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-iclb5/igt@gem_exec_schedule@fifo-bsd1.html

  * igt@gem_exec_schedule@pi-ringfull-render:
    - shard-kbl:          [PASS][22] -> [INCOMPLETE][23] ([fdo#103665])
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-kbl3/igt@gem_exec_schedule@pi-ringfull-render.html
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-kbl1/igt@gem_exec_schedule@pi-ringfull-render.html

  * igt@gem_exec_schedule@preempt-bsd:
    - shard-iclb:         [PASS][24] -> [SKIP][25] ([fdo#111325]) +3 similar issues
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-iclb7/igt@gem_exec_schedule@preempt-bsd.html
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-iclb1/igt@gem_exec_schedule@preempt-bsd.html

  * igt@gem_persistent_relocs@forked-interruptible-thrash-inactive:
    - shard-kbl:          [PASS][26] -> [TIMEOUT][27] ([fdo#112068 ])
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-kbl4/igt@gem_persistent_relocs@forked-interruptible-thrash-inactive.html
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-kbl2/igt@gem_persistent_relocs@forked-interruptible-thrash-inactive.html

  * igt@gem_userptr_blits@map-fixed-invalidate-busy-gup:
    - shard-hsw:          [PASS][28] -> [DMESG-WARN][29] ([fdo#111870]) +1 similar issue
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-hsw6/igt@gem_userptr_blits@map-fixed-invalidate-busy-gup.html
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-hsw5/igt@gem_userptr_blits@map-fixed-invalidate-busy-gup.html

  * igt@gem_userptr_blits@map-fixed-invalidate-overlap-busy:
    - shard-snb:          [PASS][30] -> [DMESG-WARN][31] ([fdo#111870])
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-snb6/igt@gem_userptr_blits@map-fixed-invalidate-overlap-busy.html
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-snb2/igt@gem_userptr_blits@map-fixed-invalidate-overlap-busy.html

  * igt@kms_color@pipe-a-ctm-0-25:
    - shard-skl:          [PASS][32] -> [DMESG-WARN][33] ([fdo#106107])
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-skl10/igt@kms_color@pipe-a-ctm-0-25.html
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-skl5/igt@kms_color@pipe-a-ctm-0-25.html

  * igt@kms_cursor_edge_walk@pipe-a-64x64-left-edge:
    - shard-apl:          [PASS][34] -> [INCOMPLETE][35] ([fdo#103927]) +4 similar issues
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-apl8/igt@kms_cursor_edge_walk@pipe-a-64x64-left-edge.html
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-apl7/igt@kms_cursor_edge_walk@pipe-a-64x64-left-edge.html

  * igt@kms_flip@flip-vs-suspend-interruptible:
    - shard-hsw:          [PASS][36] -> [INCOMPLETE][37] ([fdo#103540])
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-hsw5/igt@kms_flip@flip-vs-suspend-interruptible.html
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-hsw5/igt@kms_flip@flip-vs-suspend-interruptible.html

  * igt@kms_frontbuffer_tracking@fbc-badstride:
    - shard-iclb:         [PASS][38] -> [FAIL][39] ([fdo#103167]) +6 similar issues
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-iclb8/igt@kms_frontbuffer_tracking@fbc-badstride.html
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-iclb7/igt@kms_frontbuffer_tracking@fbc-badstride.html

  * igt@kms_frontbuffer_tracking@psr-1p-primscrn-cur-indfb-draw-mmap-gtt:
    - shard-skl:          [PASS][40] -> [FAIL][41] ([fdo#103167]) +1 similar issue
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-skl4/igt@kms_frontbuffer_tracking@psr-1p-primscrn-cur-indfb-draw-mmap-gtt.html
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-skl7/igt@kms_frontbuffer_tracking@psr-1p-primscrn-cur-indfb-draw-mmap-gtt.html

  * igt@kms_pipe_crc_basic@nonblocking-crc-pipe-a-frame-sequence:
    - shard-skl:          [PASS][42] -> [FAIL][43] ([fdo#103191])
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-skl4/igt@kms_pipe_crc_basic@nonblocking-crc-pipe-a-frame-sequence.html
   [43]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-skl7/igt@kms_pipe_crc_basic@nonblocking-crc-pipe-a-frame-sequence.html

  * igt@kms_psr@psr2_cursor_mmap_cpu:
    - shard-iclb:         [PASS][44] -> [SKIP][45] ([fdo#109441])
   [44]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-iclb2/igt@kms_psr@psr2_cursor_mmap_cpu.html
   [45]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-iclb8/igt@kms_psr@psr2_cursor_mmap_cpu.html

  * igt@kms_setmode@basic:
    - shard-glk:          [PASS][46] -> [FAIL][47] ([fdo#99912])
   [46]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-glk3/igt@kms_setmode@basic.html
   [47]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-glk8/igt@kms_setmode@basic.html

  * igt@prime_busy@wait-hang-bsd:
    - shard-kbl:          [PASS][48] -> [SKIP][49] ([fdo#109271])
   [48]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-kbl3/igt@prime_busy@wait-hang-bsd.html
   [49]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-kbl1/igt@prime_busy@wait-hang-bsd.html

  
#### Possible fixes ####

  * igt@gem_busy@busy-vcs1:
    - shard-iclb:         [SKIP][50] ([fdo#112080]) -> [PASS][51] +14 similar issues
   [50]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-iclb6/igt@gem_busy@busy-vcs1.html
   [51]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-iclb4/igt@gem_busy@busy-vcs1.html

  * {igt@gem_ctx_exec@basic-nohangcheck}:
    - shard-snb:          [FAIL][52] ([fdo#112117]) -> [PASS][53]
   [52]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-snb5/igt@gem_ctx_exec@basic-nohangcheck.html
   [53]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-snb5/igt@gem_ctx_exec@basic-nohangcheck.html
    - shard-hsw:          [FAIL][54] ([fdo#112117]) -> [PASS][55]
   [54]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-hsw6/igt@gem_ctx_exec@basic-nohangcheck.html
   [55]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-hsw5/igt@gem_ctx_exec@basic-nohangcheck.html

  * igt@gem_ctx_isolation@vcs1-clean:
    - shard-iclb:         [SKIP][56] ([fdo#109276] / [fdo#112080]) -> [PASS][57] +1 similar issue
   [56]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-iclb7/igt@gem_ctx_isolation@vcs1-clean.html
   [57]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-iclb1/igt@gem_ctx_isolation@vcs1-clean.html

  * igt@gem_ctx_isolation@vcs1-s3:
    - {shard-tglb}:       [INCOMPLETE][58] ([fdo#111832]) -> [PASS][59] +1 similar issue
   [58]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-tglb3/igt@gem_ctx_isolation@vcs1-s3.html
   [59]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-tglb6/igt@gem_ctx_isolation@vcs1-s3.html

  * igt@gem_ctx_shared@q-smoketest-bsd2:
    - {shard-tglb}:       [INCOMPLETE][60] ([fdo# 111852 ]) -> [PASS][61]
   [60]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-tglb6/igt@gem_ctx_shared@q-smoketest-bsd2.html
   [61]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-tglb2/igt@gem_ctx_shared@q-smoketest-bsd2.html

  * igt@gem_eio@wait-immediate:
    - shard-kbl:          [SKIP][62] ([fdo#109271]) -> [PASS][63] +4 similar issues
   [62]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-kbl6/igt@gem_eio@wait-immediate.html
   [63]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-kbl7/igt@gem_eio@wait-immediate.html

  * igt@gem_exec_balancer@smoke:
    - shard-iclb:         [SKIP][64] ([fdo#110854]) -> [PASS][65]
   [64]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-iclb5/igt@gem_exec_balancer@smoke.html
   [65]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-iclb2/igt@gem_exec_balancer@smoke.html

  * igt@gem_exec_schedule@preempt-other-chain-bsd:
    - shard-iclb:         [SKIP][66] ([fdo#111325]) -> [PASS][67] +9 similar issues
   [66]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-iclb1/igt@gem_exec_schedule@preempt-other-chain-bsd.html
   [67]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-iclb6/igt@gem_exec_schedule@preempt-other-chain-bsd.html

  * igt@gem_linear_blits@interruptible:
    - shard-apl:          [INCOMPLETE][68] ([fdo#103927] / [fdo#112067]) -> [PASS][69]
   [68]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-apl8/igt@gem_linear_blits@interruptible.html
   [69]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-apl8/igt@gem_linear_blits@interruptible.html

  * igt@gem_persistent_relocs@forked-interruptible-thrashing:
    - shard-kbl:          [FAIL][70] ([fdo#112037]) -> [PASS][71]
   [70]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-kbl3/igt@gem_persistent_relocs@forked-interruptible-thrashing.html
   [71]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-kbl2/igt@gem_persistent_relocs@forked-interruptible-thrashing.html

  * igt@gem_userptr_blits@dmabuf-sync:
    - shard-snb:          [DMESG-WARN][72] ([fdo#111870]) -> [PASS][73]
   [72]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-snb5/igt@gem_userptr_blits@dmabuf-sync.html
   [73]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-snb7/igt@gem_userptr_blits@dmabuf-sync.html

  * igt@gem_userptr_blits@dmabuf-unsync:
    - shard-hsw:          [DMESG-WARN][74] ([fdo#111870]) -> [PASS][75] +1 similar issue
   [74]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-hsw2/igt@gem_userptr_blits@dmabuf-unsync.html
   [75]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-hsw4/igt@gem_userptr_blits@dmabuf-unsync.html

  * igt@i915_hangman@error-state-basic:
    - shard-hsw:          [SKIP][76] ([fdo#109271]) -> [PASS][77] +5 similar issues
   [76]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-hsw6/igt@i915_hangman@error-state-basic.html
   [77]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-hsw2/igt@i915_hangman@error-state-basic.html

  * igt@i915_selftest@live_hangcheck:
    - {shard-tglb}:       [INCOMPLETE][78] ([fdo#111747]) -> [PASS][79]
   [78]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-tglb5/igt@i915_selftest@live_hangcheck.html
   [79]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-tglb8/igt@i915_selftest@live_hangcheck.html

  * igt@i915_suspend@fence-restore-tiled2untiled:
    - {shard-tglb}:       [INCOMPLETE][80] ([fdo#111832] / [fdo#111850]) -> [PASS][81]
   [80]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-tglb8/igt@i915_suspend@fence-restore-tiled2untiled.html
   [81]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-tglb5/igt@i915_suspend@fence-restore-tiled2untiled.html

  * igt@kms_busy@extended-pageflip-modeset-hang-oldfb-render-b:
    - shard-snb:          [SKIP][82] ([fdo#109271]) -> [PASS][83] +2 similar issues
   [82]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-snb5/igt@kms_busy@extended-pageflip-modeset-hang-oldfb-render-b.html
   [83]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-snb5/igt@kms_busy@extended-pageflip-modeset-hang-oldfb-render-b.html

  * igt@kms_color@pipe-a-ctm-green-to-red:
    - shard-skl:          [DMESG-WARN][84] ([fdo#106107]) -> [PASS][85]
   [84]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-skl7/igt@kms_color@pipe-a-ctm-green-to-red.html
   [85]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-skl9/igt@kms_color@pipe-a-ctm-green-to-red.html

  * igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic:
    - shard-apl:          [INCOMPLETE][86] ([fdo#103927]) -> [PASS][87] +1 similar issue
   [86]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-apl2/igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic.html
   [87]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-apl2/igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic.html

  * igt@kms_flip@flip-vs-expired-vblank:
    - shard-skl:          [FAIL][88] ([fdo#105363]) -> [PASS][89]
   [88]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-skl3/igt@kms_flip@flip-vs-expired-vblank.html
   [89]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-skl8/igt@kms_flip@flip-vs-expired-vblank.html

  * igt@kms_flip@flip-vs-suspend-interruptible:
    - shard-skl:          [INCOMPLETE][90] ([fdo#109507]) -> [PASS][91]
   [90]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-skl2/igt@kms_flip@flip-vs-suspend-interruptible.html
   [91]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-skl1/igt@kms_flip@flip-vs-suspend-interruptible.html

  * igt@kms_frontbuffer_tracking@fbc-stridechange:
    - {shard-tglb}:       [FAIL][92] ([fdo#103167]) -> [PASS][93] +4 similar issues
   [92]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-tglb7/igt@kms_frontbuffer_tracking@fbc-stridechange.html
   [93]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-tglb4/igt@kms_frontbuffer_tracking@fbc-stridechange.html

  * igt@kms_frontbuffer_tracking@fbcpsr-1p-primscrn-pri-indfb-draw-pwrite:
    - shard-iclb:         [FAIL][94] ([fdo#103167]) -> [PASS][95] +4 similar issues
   [94]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-iclb7/igt@kms_frontbuffer_tracking@fbcpsr-1p-primscrn-pri-indfb-draw-pwrite.html
   [95]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-iclb6/igt@kms_frontbuffer_tracking@fbcpsr-1p-primscrn-pri-indfb-draw-pwrite.html

  * igt@kms_plane@plane-panning-bottom-right-suspend-pipe-c-planes:
    - shard-kbl:          [DMESG-WARN][96] ([fdo#108566]) -> [PASS][97] +3 similar issues
   [96]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-kbl7/igt@kms_plane@plane-panning-bottom-right-suspend-pipe-c-planes.html
   [97]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-kbl3/igt@kms_plane@plane-panning-bottom-right-suspend-pipe-c-planes.html

  * igt@kms_plane_alpha_blend@pipe-b-coverage-7efc:
    - shard-skl:          [FAIL][98] ([fdo#108145] / [fdo#110403]) -> [PASS][99]
   [98]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-skl9/igt@kms_plane_alpha_blend@pipe-b-coverage-7efc.html
   [99]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/shard-skl6/igt@kms_plane_alpha_blend@pipe-b-coverage-7efc.html

  * igt@kms_plane_alpha_blend@pipe-c-constant-alpha-min:
    - shard-skl:          [FAIL][100] ([fdo#108145]) -> [PASS][101]
   [100]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7165/shard-skl1/igt@kms_plane_alpha_blend@pipe-c-constant-alpha-min.html
   [101]: https://intel-gfx-ci.

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_14948/index.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2019-10-24 16:35 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-23 12:21 [CI 1/4] drm/i915/gt: Try to more gracefully quiesce the system before resets Chris Wilson
2019-10-23 12:21 ` [Intel-gfx] " Chris Wilson
2019-10-23 12:21 ` [CI 2/4] drm/i915/execlists: Force preemption Chris Wilson
2019-10-23 12:21   ` [Intel-gfx] " Chris Wilson
2019-10-23 12:21 ` [CI 3/4] drm/i915/execlists: Cancel banned contexts on schedule-out Chris Wilson
2019-10-23 12:21   ` [Intel-gfx] " Chris Wilson
2019-10-23 12:21 ` [CI 4/4] drm/i915/gem: Cancel contexts when hangchecking is disabled Chris Wilson
2019-10-23 12:21   ` [Intel-gfx] " Chris Wilson
2019-10-23 23:26   ` Kumar Valsan, Prathap
2019-10-23 23:26     ` [Intel-gfx] " Kumar Valsan, Prathap
2019-10-23 23:30     ` Chris Wilson
2019-10-23 23:30       ` [Intel-gfx] " Chris Wilson
2019-10-23 13:21 ` [CI 1/4] drm/i915/gt: Try to more gracefully quiesce the system before resets Mika Kuoppala
2019-10-23 13:21   ` [Intel-gfx] " Mika Kuoppala
2019-10-23 13:28   ` Chris Wilson
2019-10-23 13:28     ` [Intel-gfx] " Chris Wilson
2019-10-23 21:44 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [CI,1/4] " Patchwork
2019-10-23 21:44   ` [Intel-gfx] " Patchwork
2019-10-23 22:14 ` ✓ Fi.CI.BAT: success " Patchwork
2019-10-23 22:14   ` [Intel-gfx] " Patchwork
2019-10-24 16:35 ` ✗ Fi.CI.IGT: failure " Patchwork
2019-10-24 16:35   ` [Intel-gfx] " Patchwork

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.