* HWSP for HW semaphores
@ 2019-01-21 22:20 Chris Wilson
  2019-01-21 22:20 ` [PATCH 01/34] drm/i915/execlists: Mark up priority boost on preemption Chris Wilson
                   ` (40 more replies)
  0 siblings, 41 replies; 89+ messages in thread
From: Chris Wilson @ 2019-01-21 22:20 UTC (permalink / raw)
  To: intel-gfx

I extended the HWSP implementation to consider the impact of using it
for HW semaphores, one of the end goals of per-context seqno. That opens
up an interesting problem in that we need to keep the HWSP around until
all external GPU references to it are retired. For simplicity, this is
until the GPU is next idle, but Tvrtko suggested that the likelihood of
that happening on a busy system is slight, and that those busy systems
are also the ones more likely to run into resource contention issues.
That was a can of worms I was hoping to ignore until later, as one of
the simplifications for removing the global_seqno was that we could
simply keep all resources pinned until idle, a full GC, with a full GC
being forced if we ever starved. Far more graceful would be an
incremental GC; combined with tracking external references, that would
end up as a read-copy-update mechanism...

Anyway, this series shows off HW semaphores for inter-engine
synchronisation and should also extend easily to unordered work queuing
onto the GuC. I need the fence primitives for the next (well, older!)
series...
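
(For the curious: the rough shape of the inter-engine wait this enables
on gen8+ is an MI_SEMAPHORE_WAIT that polls the signaller's seqno slot
in its HWSP, so the wait is resolved entirely on the GPU rather than by
the CPU. A minimal sketch only, not lifted verbatim from any patch in
this series; emit_hwsp_wait, seqno and hwsp_offset are illustrative
names.)

static u32 *emit_hwsp_wait(u32 *cs, u32 seqno, u32 hwsp_offset)
{
	/* hwsp_offset: GGTT address of the signaller's seqno slot in its HWSP */
	*cs++ = MI_SEMAPHORE_WAIT |
		MI_SEMAPHORE_GLOBAL_GTT |
		MI_SEMAPHORE_POLL |
		MI_SEMAPHORE_SAD_GTE_SDD;
	*cs++ = seqno;	     /* wait until *hwsp_offset >= seqno */
	*cs++ = hwsp_offset; /* address, low dword */
	*cs++ = 0;	     /* address, high dword */

	return cs;
}
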
-Chris


* [PATCH 01/34] drm/i915/execlists: Mark up priority boost on preemption
  2019-01-21 22:20 HWSP for HW semaphores Chris Wilson
@ 2019-01-21 22:20 ` Chris Wilson
  2019-01-21 22:20 ` [PATCH 02/34] drm/i915/execlists: Suppress preempting self Chris Wilson
                   ` (39 subsequent siblings)
  40 siblings, 0 replies; 89+ messages in thread
From: Chris Wilson @ 2019-01-21 22:20 UTC (permalink / raw)
  To: intel-gfx

Record the priority boost we give to the preempted client, or else we
may end up in a situation where the priority queue no longer matches
the request priority order and we fall into an infinite loop of
preempting the same pair of requests.

Fixes: e9eaf82d97a2 ("drm/i915: Priority boost for waiting clients")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index c0a42afaf177..b74f25420683 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -302,6 +302,7 @@ static void __unwind_incomplete_requests(struct intel_engine_cs *engine)
 	 */
 	if (!(prio & I915_PRIORITY_NEWCLIENT)) {
 		prio |= I915_PRIORITY_NEWCLIENT;
+		active->sched.attr.priority = prio;
 		list_move_tail(&active->sched.link,
 			       i915_sched_lookup_priolist(engine, prio));
 	}
@@ -625,6 +626,9 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 		int i;
 
 		priolist_for_each_request_consume(rq, rn, p, i) {
+			GEM_BUG_ON(last &&
+				   need_preempt(engine, last, rq_prio(rq)));
+
 			/*
 			 * Can we combine this request with the current port?
 			 * It has to be the same context/ringbuffer and not
-- 
2.20.1

* [PATCH 02/34] drm/i915/execlists: Suppress preempting self
  2019-01-21 22:20 HWSP for HW semaphores Chris Wilson
  2019-01-21 22:20 ` [PATCH 01/34] drm/i915/execlists: Mark up priority boost on preemption Chris Wilson
@ 2019-01-21 22:20 ` Chris Wilson
  2019-01-22 22:18   ` John Harrison
  2019-01-21 22:20 ` [PATCH 03/34] drm/i915: Show all active engines on hangcheck Chris Wilson
                   ` (38 subsequent siblings)
  40 siblings, 1 reply; 89+ messages in thread
From: Chris Wilson @ 2019-01-21 22:20 UTC (permalink / raw)
  To: intel-gfx

In order to avoid preempting ourselves, we currently refuse to schedule
the tasklet if we reschedule an inflight context. However, this glosses
over a few issues, such as what happens after a CS completion event when
we then preempt the newly executing context with itself, or when
something else causes a tasklet_schedule that triggers the same
evaluation and preempts the active context with itself.

To avoid the extra complications, after deciding that we have
potentially queued a request with higher priority than the currently
executing request, inspect the head of the queue to see if it really is
a higher-priority request from another context.

References: a2bf92e8cc16 ("drm/i915/execlists: Avoid kicking priority on the current context")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_scheduler.c | 20 ++++++++++++++----
 drivers/gpu/drm/i915/intel_lrc.c      | 29 ++++++++++++++++++++++++++-
 2 files changed, 44 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 340faea6c08a..fb5d953430e5 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -239,6 +239,18 @@ sched_lock_engine(struct i915_sched_node *node, struct intel_engine_cs *locked)
 	return engine;
 }
 
+static bool inflight(const struct i915_request *rq,
+		     const struct intel_engine_cs *engine)
+{
+	const struct i915_request *active;
+
+	if (!rq->global_seqno)
+		return false;
+
+	active = port_request(engine->execlists.port);
+	return active->hw_context == rq->hw_context;
+}
+
 static void __i915_schedule(struct i915_request *rq,
 			    const struct i915_sched_attr *attr)
 {
@@ -328,6 +340,7 @@ static void __i915_schedule(struct i915_request *rq,
 		INIT_LIST_HEAD(&dep->dfs_link);
 
 		engine = sched_lock_engine(node, engine);
+		lockdep_assert_held(&engine->timeline.lock);
 
 		/* Recheck after acquiring the engine->timeline.lock */
 		if (prio <= node->attr.priority || node_signaled(node))
@@ -356,17 +369,16 @@ static void __i915_schedule(struct i915_request *rq,
 		if (prio <= engine->execlists.queue_priority)
 			continue;
 
+		engine->execlists.queue_priority = prio;
+
 		/*
 		 * If we are already the currently executing context, don't
 		 * bother evaluating if we should preempt ourselves.
 		 */
-		if (node_to_request(node)->global_seqno &&
-		    i915_seqno_passed(port_request(engine->execlists.port)->global_seqno,
-				      node_to_request(node)->global_seqno))
+		if (inflight(node_to_request(node), engine))
 			continue;
 
 		/* Defer (tasklet) submission until after all of our updates. */
-		engine->execlists.queue_priority = prio;
 		tasklet_hi_schedule(&engine->execlists.tasklet);
 	}
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index b74f25420683..28d183439952 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -190,6 +190,30 @@ static inline bool need_preempt(const struct intel_engine_cs *engine,
 		!i915_request_completed(last));
 }
 
+static inline bool check_preempt(const struct intel_engine_cs *engine,
+				 const struct i915_request *rq)
+{
+	const struct intel_context *ctx = rq->hw_context;
+	const int prio = rq_prio(rq);
+	struct rb_node *rb;
+	int idx;
+
+	list_for_each_entry_continue(rq, &engine->timeline.requests, link) {
+		GEM_BUG_ON(rq->hw_context == ctx);
+		if (rq_prio(rq) > prio)
+			return true;
+	}
+
+	rb = rb_first_cached(&engine->execlists.queue);
+	if (!rb)
+		return false;
+
+	priolist_for_each_request(rq, to_priolist(rb), idx)
+		return rq->hw_context != ctx && rq_prio(rq) > prio;
+
+	return false;
+}
+
 /*
  * The context descriptor encodes various attributes of a context,
  * including its GTT address and some flags. Because it's fairly
@@ -580,7 +604,8 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 		if (!execlists_is_active(execlists, EXECLISTS_ACTIVE_HWACK))
 			return;
 
-		if (need_preempt(engine, last, execlists->queue_priority)) {
+		if (need_preempt(engine, last, execlists->queue_priority) &&
+		    check_preempt(engine, last)) {
 			inject_preempt_context(engine);
 			return;
 		}
@@ -872,6 +897,8 @@ static void process_csb(struct intel_engine_cs *engine)
 	const u32 * const buf = execlists->csb_status;
 	u8 head, tail;
 
+	lockdep_assert_held(&engine->timeline.lock);
+
 	/*
 	 * Note that csb_write, csb_status may be either in HWSP or mmio.
 	 * When reading from the csb_write mmio register, we have to be
-- 
2.20.1

* [PATCH 03/34] drm/i915: Show all active engines on hangcheck
  2019-01-21 22:20 HWSP for HW semaphores Chris Wilson
  2019-01-21 22:20 ` [PATCH 01/34] drm/i915/execlists: Mark up priority boost on preemption Chris Wilson
  2019-01-21 22:20 ` [PATCH 02/34] drm/i915/execlists: Suppress preempting self Chris Wilson
@ 2019-01-21 22:20 ` Chris Wilson
  2019-01-22 12:33   ` Mika Kuoppala
  2019-01-21 22:20 ` [PATCH 04/34] drm/i915/selftests: Refactor common live_test framework Chris Wilson
                   ` (37 subsequent siblings)
  40 siblings, 1 reply; 89+ messages in thread
From: Chris Wilson @ 2019-01-21 22:20 UTC (permalink / raw)
  To: intel-gfx

This turns out to be quite useful if one happens to be debugging
semaphore deadlocks.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_hangcheck.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_hangcheck.c b/drivers/gpu/drm/i915/intel_hangcheck.c
index 7dc11fcb13de..741441daae32 100644
--- a/drivers/gpu/drm/i915/intel_hangcheck.c
+++ b/drivers/gpu/drm/i915/intel_hangcheck.c
@@ -195,10 +195,6 @@ static void hangcheck_accumulate_sample(struct intel_engine_cs *engine,
 		break;
 
 	case ENGINE_DEAD:
-		if (GEM_SHOW_DEBUG()) {
-			struct drm_printer p = drm_debug_printer("hangcheck");
-			intel_engine_dump(engine, &p, "%s\n", engine->name);
-		}
 		break;
 
 	default:
@@ -285,6 +281,17 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 			wedged |= intel_engine_flag(engine);
 	}
 
+	if (GEM_SHOW_DEBUG() && (hung | stuck)) {
+		struct drm_printer p = drm_debug_printer("hangcheck");
+
+		for_each_engine(engine, dev_priv, id) {
+			if (intel_engine_is_idle(engine))
+				continue;
+
+			intel_engine_dump(engine, &p, "%s\n", engine->name);
+		}
+	}
+
 	if (wedged) {
 		dev_err(dev_priv->drm.dev,
 			"GPU recovery timed out,"
-- 
2.20.1

* [PATCH 04/34] drm/i915/selftests: Refactor common live_test framework
  2019-01-21 22:20 HWSP for HW semaphores Chris Wilson
                   ` (2 preceding siblings ...)
  2019-01-21 22:20 ` [PATCH 03/34] drm/i915: Show all active engines on hangcheck Chris Wilson
@ 2019-01-21 22:20 ` Chris Wilson
  2019-01-22 12:37   ` Matthew Auld
  2019-01-21 22:20 ` [PATCH 05/34] drm/i915/selftests: Track evict objects explicitly Chris Wilson
                   ` (36 subsequent siblings)
  40 siblings, 1 reply; 89+ messages in thread
From: Chris Wilson @ 2019-01-21 22:20 UTC (permalink / raw)
  To: intel-gfx

Before adding yet another copy of struct live_test and its handlers,
refactor the existing code into a common framework for live selftests.
For many live selftests, we want to know if the GPU hung or otherwise
misbehaved during the execution of the test (beyond any infraction in
the behaviour under test); live_test provides this by comparing the GPU
state before and after, alerting if it unexpectedly changed (e.g. the
reset counter changed). It also ensures that the GPU is idle before and
after the test, so that any residual code running on the GPU is flushed
before testing.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/Makefile                 |   1 +
 .../gpu/drm/i915/selftests/i915_gem_context.c | 103 +++---------------
 drivers/gpu/drm/i915/selftests/i915_request.c |  86 +++------------
 .../gpu/drm/i915/selftests/igt_live_test.c    |  85 +++++++++++++++
 .../gpu/drm/i915/selftests/igt_live_test.h    |  35 ++++++
 5 files changed, 147 insertions(+), 163 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/selftests/igt_live_test.c
 create mode 100644 drivers/gpu/drm/i915/selftests/igt_live_test.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 611115ed00db..f050759686ca 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -167,6 +167,7 @@ i915-$(CONFIG_DRM_I915_SELFTEST) += \
 	selftests/i915_random.o \
 	selftests/i915_selftest.o \
 	selftests/igt_flush_test.o \
+	selftests/igt_live_test.o \
 	selftests/igt_reset.o \
 	selftests/igt_spinner.o
 
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
index 4cba50679607..e2c1f0bc2abe 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
@@ -27,6 +27,7 @@
 #include "../i915_selftest.h"
 #include "i915_random.h"
 #include "igt_flush_test.h"
+#include "igt_live_test.h"
 
 #include "mock_drm.h"
 #include "mock_gem_device.h"
@@ -34,84 +35,6 @@
 
 #define DW_PER_PAGE (PAGE_SIZE / sizeof(u32))
 
-struct live_test {
-	struct drm_i915_private *i915;
-	const char *func;
-	const char *name;
-
-	unsigned int reset_global;
-	unsigned int reset_engine[I915_NUM_ENGINES];
-};
-
-static int begin_live_test(struct live_test *t,
-			   struct drm_i915_private *i915,
-			   const char *func,
-			   const char *name)
-{
-	struct intel_engine_cs *engine;
-	enum intel_engine_id id;
-	int err;
-
-	t->i915 = i915;
-	t->func = func;
-	t->name = name;
-
-	err = i915_gem_wait_for_idle(i915,
-				     I915_WAIT_LOCKED,
-				     MAX_SCHEDULE_TIMEOUT);
-	if (err) {
-		pr_err("%s(%s): failed to idle before, with err=%d!",
-		       func, name, err);
-		return err;
-	}
-
-	i915->gpu_error.missed_irq_rings = 0;
-	t->reset_global = i915_reset_count(&i915->gpu_error);
-
-	for_each_engine(engine, i915, id)
-		t->reset_engine[id] =
-			i915_reset_engine_count(&i915->gpu_error, engine);
-
-	return 0;
-}
-
-static int end_live_test(struct live_test *t)
-{
-	struct drm_i915_private *i915 = t->i915;
-	struct intel_engine_cs *engine;
-	enum intel_engine_id id;
-
-	if (igt_flush_test(i915, I915_WAIT_LOCKED))
-		return -EIO;
-
-	if (t->reset_global != i915_reset_count(&i915->gpu_error)) {
-		pr_err("%s(%s): GPU was reset %d times!\n",
-		       t->func, t->name,
-		       i915_reset_count(&i915->gpu_error) - t->reset_global);
-		return -EIO;
-	}
-
-	for_each_engine(engine, i915, id) {
-		if (t->reset_engine[id] ==
-		    i915_reset_engine_count(&i915->gpu_error, engine))
-			continue;
-
-		pr_err("%s(%s): engine '%s' was reset %d times!\n",
-		       t->func, t->name, engine->name,
-		       i915_reset_engine_count(&i915->gpu_error, engine) -
-		       t->reset_engine[id]);
-		return -EIO;
-	}
-
-	if (i915->gpu_error.missed_irq_rings) {
-		pr_err("%s(%s): Missed interrupts on engines %lx\n",
-		       t->func, t->name, i915->gpu_error.missed_irq_rings);
-		return -EIO;
-	}
-
-	return 0;
-}
-
 static int live_nop_switch(void *arg)
 {
 	const unsigned int nctx = 1024;
@@ -120,8 +43,8 @@ static int live_nop_switch(void *arg)
 	struct i915_gem_context **ctx;
 	enum intel_engine_id id;
 	intel_wakeref_t wakeref;
+	struct igt_live_test t;
 	struct drm_file *file;
-	struct live_test t;
 	unsigned long n;
 	int err = -ENODEV;
 
@@ -185,7 +108,7 @@ static int live_nop_switch(void *arg)
 		pr_info("Populated %d contexts on %s in %lluns\n",
 			nctx, engine->name, ktime_to_ns(times[1] - times[0]));
 
-		err = begin_live_test(&t, i915, __func__, engine->name);
+		err = igt_live_test_begin(&t, i915, __func__, engine->name);
 		if (err)
 			goto out_unlock;
 
@@ -233,7 +156,7 @@ static int live_nop_switch(void *arg)
 				break;
 		}
 
-		err = end_live_test(&t);
+		err = igt_live_test_end(&t);
 		if (err)
 			goto out_unlock;
 
@@ -554,10 +477,10 @@ static int igt_ctx_exec(void *arg)
 	struct drm_i915_private *i915 = arg;
 	struct drm_i915_gem_object *obj = NULL;
 	unsigned long ncontexts, ndwords, dw;
+	struct igt_live_test t;
 	struct drm_file *file;
 	IGT_TIMEOUT(end_time);
 	LIST_HEAD(objects);
-	struct live_test t;
 	int err = -ENODEV;
 
 	/*
@@ -575,7 +498,7 @@ static int igt_ctx_exec(void *arg)
 
 	mutex_lock(&i915->drm.struct_mutex);
 
-	err = begin_live_test(&t, i915, __func__, "");
+	err = igt_live_test_begin(&t, i915, __func__, "");
 	if (err)
 		goto out_unlock;
 
@@ -645,7 +568,7 @@ static int igt_ctx_exec(void *arg)
 	}
 
 out_unlock:
-	if (end_live_test(&t))
+	if (igt_live_test_end(&t))
 		err = -EIO;
 	mutex_unlock(&i915->drm.struct_mutex);
 
@@ -660,11 +583,11 @@ static int igt_ctx_readonly(void *arg)
 	struct i915_gem_context *ctx;
 	struct i915_hw_ppgtt *ppgtt;
 	unsigned long ndwords, dw;
+	struct igt_live_test t;
 	struct drm_file *file;
 	I915_RND_STATE(prng);
 	IGT_TIMEOUT(end_time);
 	LIST_HEAD(objects);
-	struct live_test t;
 	int err = -ENODEV;
 
 	/*
@@ -679,7 +602,7 @@ static int igt_ctx_readonly(void *arg)
 
 	mutex_lock(&i915->drm.struct_mutex);
 
-	err = begin_live_test(&t, i915, __func__, "");
+	err = igt_live_test_begin(&t, i915, __func__, "");
 	if (err)
 		goto out_unlock;
 
@@ -757,7 +680,7 @@ static int igt_ctx_readonly(void *arg)
 	}
 
 out_unlock:
-	if (end_live_test(&t))
+	if (igt_live_test_end(&t))
 		err = -EIO;
 	mutex_unlock(&i915->drm.struct_mutex);
 
@@ -982,10 +905,10 @@ static int igt_vm_isolation(void *arg)
 	struct i915_gem_context *ctx_a, *ctx_b;
 	struct intel_engine_cs *engine;
 	intel_wakeref_t wakeref;
+	struct igt_live_test t;
 	struct drm_file *file;
 	I915_RND_STATE(prng);
 	unsigned long count;
-	struct live_test t;
 	unsigned int id;
 	u64 vm_total;
 	int err;
@@ -1004,7 +927,7 @@ static int igt_vm_isolation(void *arg)
 
 	mutex_lock(&i915->drm.struct_mutex);
 
-	err = begin_live_test(&t, i915, __func__, "");
+	err = igt_live_test_begin(&t, i915, __func__, "");
 	if (err)
 		goto out_unlock;
 
@@ -1075,7 +998,7 @@ static int igt_vm_isolation(void *arg)
 out_rpm:
 	intel_runtime_pm_put(i915, wakeref);
 out_unlock:
-	if (end_live_test(&t))
+	if (igt_live_test_end(&t))
 		err = -EIO;
 	mutex_unlock(&i915->drm.struct_mutex);
 
diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c
index 2e14d6d3bad7..4d4b86b5fa11 100644
--- a/drivers/gpu/drm/i915/selftests/i915_request.c
+++ b/drivers/gpu/drm/i915/selftests/i915_request.c
@@ -25,6 +25,7 @@
 #include <linux/prime_numbers.h>
 
 #include "../i915_selftest.h"
+#include "igt_live_test.h"
 
 #include "mock_context.h"
 #include "mock_gem_device.h"
@@ -270,73 +271,12 @@ int i915_request_mock_selftests(void)
 	return err;
 }
 
-struct live_test {
-	struct drm_i915_private *i915;
-	const char *func;
-	const char *name;
-
-	unsigned int reset_count;
-};
-
-static int begin_live_test(struct live_test *t,
-			   struct drm_i915_private *i915,
-			   const char *func,
-			   const char *name)
-{
-	int err;
-
-	t->i915 = i915;
-	t->func = func;
-	t->name = name;
-
-	err = i915_gem_wait_for_idle(i915,
-				     I915_WAIT_LOCKED,
-				     MAX_SCHEDULE_TIMEOUT);
-	if (err) {
-		pr_err("%s(%s): failed to idle before, with err=%d!",
-		       func, name, err);
-		return err;
-	}
-
-	i915->gpu_error.missed_irq_rings = 0;
-	t->reset_count = i915_reset_count(&i915->gpu_error);
-
-	return 0;
-}
-
-static int end_live_test(struct live_test *t)
-{
-	struct drm_i915_private *i915 = t->i915;
-
-	i915_retire_requests(i915);
-
-	if (wait_for(intel_engines_are_idle(i915), 10)) {
-		pr_err("%s(%s): GPU not idle\n", t->func, t->name);
-		return -EIO;
-	}
-
-	if (t->reset_count != i915_reset_count(&i915->gpu_error)) {
-		pr_err("%s(%s): GPU was reset %d times!\n",
-		       t->func, t->name,
-		       i915_reset_count(&i915->gpu_error) - t->reset_count);
-		return -EIO;
-	}
-
-	if (i915->gpu_error.missed_irq_rings) {
-		pr_err("%s(%s): Missed interrupts on engines %lx\n",
-		       t->func, t->name, i915->gpu_error.missed_irq_rings);
-		return -EIO;
-	}
-
-	return 0;
-}
-
 static int live_nop_request(void *arg)
 {
 	struct drm_i915_private *i915 = arg;
 	struct intel_engine_cs *engine;
 	intel_wakeref_t wakeref;
-	struct live_test t;
+	struct igt_live_test t;
 	unsigned int id;
 	int err = -ENODEV;
 
@@ -354,7 +294,7 @@ static int live_nop_request(void *arg)
 		IGT_TIMEOUT(end_time);
 		ktime_t times[2] = {};
 
-		err = begin_live_test(&t, i915, __func__, engine->name);
+		err = igt_live_test_begin(&t, i915, __func__, engine->name);
 		if (err)
 			goto out_unlock;
 
@@ -396,7 +336,7 @@ static int live_nop_request(void *arg)
 				break;
 		}
 
-		err = end_live_test(&t);
+		err = igt_live_test_end(&t);
 		if (err)
 			goto out_unlock;
 
@@ -483,8 +423,8 @@ static int live_empty_request(void *arg)
 	struct drm_i915_private *i915 = arg;
 	struct intel_engine_cs *engine;
 	intel_wakeref_t wakeref;
+	struct igt_live_test t;
 	struct i915_vma *batch;
-	struct live_test t;
 	unsigned int id;
 	int err = 0;
 
@@ -508,7 +448,7 @@ static int live_empty_request(void *arg)
 		unsigned long n, prime;
 		ktime_t times[2] = {};
 
-		err = begin_live_test(&t, i915, __func__, engine->name);
+		err = igt_live_test_begin(&t, i915, __func__, engine->name);
 		if (err)
 			goto out_batch;
 
@@ -544,7 +484,7 @@ static int live_empty_request(void *arg)
 				break;
 		}
 
-		err = end_live_test(&t);
+		err = igt_live_test_end(&t);
 		if (err)
 			goto out_batch;
 
@@ -643,8 +583,8 @@ static int live_all_engines(void *arg)
 	struct intel_engine_cs *engine;
 	struct i915_request *request[I915_NUM_ENGINES];
 	intel_wakeref_t wakeref;
+	struct igt_live_test t;
 	struct i915_vma *batch;
-	struct live_test t;
 	unsigned int id;
 	int err;
 
@@ -656,7 +596,7 @@ static int live_all_engines(void *arg)
 	mutex_lock(&i915->drm.struct_mutex);
 	wakeref = intel_runtime_pm_get(i915);
 
-	err = begin_live_test(&t, i915, __func__, "");
+	err = igt_live_test_begin(&t, i915, __func__, "");
 	if (err)
 		goto out_unlock;
 
@@ -728,7 +668,7 @@ static int live_all_engines(void *arg)
 		request[id] = NULL;
 	}
 
-	err = end_live_test(&t);
+	err = igt_live_test_end(&t);
 
 out_request:
 	for_each_engine(engine, i915, id)
@@ -749,7 +689,7 @@ static int live_sequential_engines(void *arg)
 	struct i915_request *prev = NULL;
 	struct intel_engine_cs *engine;
 	intel_wakeref_t wakeref;
-	struct live_test t;
+	struct igt_live_test t;
 	unsigned int id;
 	int err;
 
@@ -762,7 +702,7 @@ static int live_sequential_engines(void *arg)
 	mutex_lock(&i915->drm.struct_mutex);
 	wakeref = intel_runtime_pm_get(i915);
 
-	err = begin_live_test(&t, i915, __func__, "");
+	err = igt_live_test_begin(&t, i915, __func__, "");
 	if (err)
 		goto out_unlock;
 
@@ -845,7 +785,7 @@ static int live_sequential_engines(void *arg)
 		GEM_BUG_ON(!i915_request_completed(request[id]));
 	}
 
-	err = end_live_test(&t);
+	err = igt_live_test_end(&t);
 
 out_request:
 	for_each_engine(engine, i915, id) {
diff --git a/drivers/gpu/drm/i915/selftests/igt_live_test.c b/drivers/gpu/drm/i915/selftests/igt_live_test.c
new file mode 100644
index 000000000000..5deb485fb942
--- /dev/null
+++ b/drivers/gpu/drm/i915/selftests/igt_live_test.c
@@ -0,0 +1,85 @@
+/*
+ * SPDX-License-Identifier: MIT
+ *
+ * Copyright © 2018 Intel Corporation
+ */
+
+#include "../i915_drv.h"
+
+#include "../i915_selftest.h"
+#include "igt_flush_test.h"
+#include "igt_live_test.h"
+
+int igt_live_test_begin(struct igt_live_test *t,
+			struct drm_i915_private *i915,
+			const char *func,
+			const char *name)
+{
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+	int err;
+
+	lockdep_assert_held(&i915->drm.struct_mutex);
+
+	t->i915 = i915;
+	t->func = func;
+	t->name = name;
+
+	err = i915_gem_wait_for_idle(i915,
+				     I915_WAIT_INTERRUPTIBLE |
+				     I915_WAIT_LOCKED,
+				     MAX_SCHEDULE_TIMEOUT);
+	if (err) {
+		pr_err("%s(%s): failed to idle before, with err=%d!",
+		       func, name, err);
+		return err;
+	}
+
+	i915->gpu_error.missed_irq_rings = 0;
+	t->reset_global = i915_reset_count(&i915->gpu_error);
+
+	for_each_engine(engine, i915, id)
+		t->reset_engine[id] =
+			i915_reset_engine_count(&i915->gpu_error, engine);
+
+	return 0;
+}
+
+int igt_live_test_end(struct igt_live_test *t)
+{
+	struct drm_i915_private *i915 = t->i915;
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+
+	lockdep_assert_held(&i915->drm.struct_mutex);
+
+	if (igt_flush_test(i915, I915_WAIT_LOCKED))
+		return -EIO;
+
+	if (t->reset_global != i915_reset_count(&i915->gpu_error)) {
+		pr_err("%s(%s): GPU was reset %d times!\n",
+		       t->func, t->name,
+		       i915_reset_count(&i915->gpu_error) - t->reset_global);
+		return -EIO;
+	}
+
+	for_each_engine(engine, i915, id) {
+		if (t->reset_engine[id] ==
+		    i915_reset_engine_count(&i915->gpu_error, engine))
+			continue;
+
+		pr_err("%s(%s): engine '%s' was reset %d times!\n",
+		       t->func, t->name, engine->name,
+		       i915_reset_engine_count(&i915->gpu_error, engine) -
+		       t->reset_engine[id]);
+		return -EIO;
+	}
+
+	if (i915->gpu_error.missed_irq_rings) {
+		pr_err("%s(%s): Missed interrupts on engines %lx\n",
+		       t->func, t->name, i915->gpu_error.missed_irq_rings);
+		return -EIO;
+	}
+
+	return 0;
+}
diff --git a/drivers/gpu/drm/i915/selftests/igt_live_test.h b/drivers/gpu/drm/i915/selftests/igt_live_test.h
new file mode 100644
index 000000000000..c0e9f99d50de
--- /dev/null
+++ b/drivers/gpu/drm/i915/selftests/igt_live_test.h
@@ -0,0 +1,35 @@
+/*
+ * SPDX-License-Identifier: MIT
+ *
+ * Copyright © 2019 Intel Corporation
+ */
+
+#ifndef IGT_LIVE_TEST_H
+#define IGT_LIVE_TEST_H
+
+#include "../i915_gem.h"
+
+struct drm_i915_private;
+
+struct igt_live_test {
+	struct drm_i915_private *i915;
+	const char *func;
+	const char *name;
+
+	unsigned int reset_global;
+	unsigned int reset_engine[I915_NUM_ENGINES];
+};
+
+/*
+ * Flush the GPU state before and after the test to ensure that no residual
+ * code is running on the GPU that may affect this test. Also compare the
+ * state before and after the test and alert if it unexpectedly changes,
+ * e.g. if the GPU was reset.
+ */
+int igt_live_test_begin(struct igt_live_test *t,
+			struct drm_i915_private *i915,
+			const char *func,
+			const char *name);
+int igt_live_test_end(struct igt_live_test *t);
+
+#endif /* IGT_LIVE_TEST_H */
-- 
2.20.1

* [PATCH 05/34] drm/i915/selftests: Track evict objects explicitly
  2019-01-21 22:20 HWSP for HW semaphores Chris Wilson
                   ` (3 preceding siblings ...)
  2019-01-21 22:20 ` [PATCH 04/34] drm/i915/selftests: Refactor common live_test framework Chris Wilson
@ 2019-01-21 22:20 ` Chris Wilson
  2019-01-22 11:53   ` Matthew Auld
  2019-01-21 22:20 ` [PATCH 06/34] drm/i915/selftests: Create a clean GGTT for vma/gtt selftesting Chris Wilson
                   ` (35 subsequent siblings)
  40 siblings, 1 reply; 89+ messages in thread
From: Chris Wilson @ 2019-01-21 22:20 UTC (permalink / raw)
  To: intel-gfx

During review of commit 71fc448c1aaf ("drm/i915/selftests: Make evict
tolerant of foreign objects"), Matthew mentioned that it would be better
if we explicitly tracked the objects we created. We have an obj->st_link
hook for this purpose, so add the corresponding list of objects and
reduce our loops to only consider our own list.

References: 71fc448c1aaf ("drm/i915/selftests: Make evict tolerant of foreign objects")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 .../gpu/drm/i915/selftests/i915_gem_evict.c   | 114 +++++++++---------
 1 file changed, 55 insertions(+), 59 deletions(-)

diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
index 543d618c152b..d0553bc69705 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
@@ -29,25 +29,21 @@
 #include "mock_drm.h"
 #include "mock_gem_device.h"
 
-static int populate_ggtt(struct drm_i915_private *i915)
+static void quirk_add(struct drm_i915_gem_object *obj,
+		      struct list_head *objects)
+{
+	/* quirk is only for live tiled objects, use it to declare ownership */
+	GEM_BUG_ON(obj->mm.quirked);
+	obj->mm.quirked = true;
+	list_add(&obj->st_link, objects);
+}
+
+static int populate_ggtt(struct drm_i915_private *i915,
+			 struct list_head *objects)
 {
-	struct drm_i915_gem_object *obj, *on;
-	unsigned long expected_unbound, expected_bound;
 	unsigned long unbound, bound, count;
+	struct drm_i915_gem_object *obj;
 	u64 size;
-	int err;
-
-	expected_unbound = 0;
-	list_for_each_entry(obj, &i915->mm.unbound_list, mm.link) {
-		i915_gem_object_get(obj);
-		expected_unbound++;
-	}
-
-	expected_bound = 0;
-	list_for_each_entry(obj, &i915->mm.bound_list, mm.link) {
-		i915_gem_object_get(obj);
-		expected_bound++;
-	}
 
 	count = 0;
 	for (size = 0;
@@ -56,38 +52,36 @@ static int populate_ggtt(struct drm_i915_private *i915)
 		struct i915_vma *vma;
 
 		obj = i915_gem_object_create_internal(i915, I915_GTT_PAGE_SIZE);
-		if (IS_ERR(obj)) {
-			err = PTR_ERR(obj);
-			goto cleanup;
-		}
+		if (IS_ERR(obj))
+			return PTR_ERR(obj);
+
+		quirk_add(obj, objects);
 
 		vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0, 0);
-		if (IS_ERR(vma)) {
-			err = PTR_ERR(vma);
-			goto cleanup;
-		}
+		if (IS_ERR(vma))
+			return PTR_ERR(vma);
 
 		count++;
 	}
 
 	unbound = 0;
 	list_for_each_entry(obj, &i915->mm.unbound_list, mm.link)
-		unbound++;
-	if (unbound != expected_unbound) {
-		pr_err("%s: Found %lu objects unbound, expected %lu!\n",
-		       __func__, unbound, expected_unbound);
-		err = -EINVAL;
-		goto cleanup;
+		if (obj->mm.quirked)
+			unbound++;
+	if (unbound) {
+		pr_err("%s: Found %lu objects unbound, expected %u!\n",
+		       __func__, unbound, 0);
+		return -EINVAL;
 	}
 
 	bound = 0;
 	list_for_each_entry(obj, &i915->mm.bound_list, mm.link)
-		bound++;
-	if (bound != expected_bound + count) {
+		if (obj->mm.quirked)
+			bound++;
+	if (bound != count) {
 		pr_err("%s: Found %lu objects bound, expected %lu!\n",
-		       __func__, bound, expected_bound + count);
-		err = -EINVAL;
-		goto cleanup;
+		       __func__, bound, count);
+		return -EINVAL;
 	}
 
 	if (list_empty(&i915->ggtt.vm.inactive_list)) {
@@ -96,15 +90,6 @@ static int populate_ggtt(struct drm_i915_private *i915)
 	}
 
 	return 0;
-
-cleanup:
-	list_for_each_entry_safe(obj, on, &i915->mm.unbound_list, mm.link)
-		i915_gem_object_put(obj);
-
-	list_for_each_entry_safe(obj, on, &i915->mm.bound_list, mm.link)
-		i915_gem_object_put(obj);
-
-	return err;
 }
 
 static void unpin_ggtt(struct drm_i915_private *i915)
@@ -112,18 +97,20 @@ static void unpin_ggtt(struct drm_i915_private *i915)
 	struct i915_vma *vma;
 
 	list_for_each_entry(vma, &i915->ggtt.vm.inactive_list, vm_link)
-		i915_vma_unpin(vma);
+		if (vma->obj->mm.quirked)
+			i915_vma_unpin(vma);
 }
 
-static void cleanup_objects(struct drm_i915_private *i915)
+static void cleanup_objects(struct drm_i915_private *i915,
+			    struct list_head *list)
 {
 	struct drm_i915_gem_object *obj, *on;
 
-	list_for_each_entry_safe(obj, on, &i915->mm.unbound_list, mm.link)
-		i915_gem_object_put(obj);
-
-	list_for_each_entry_safe(obj, on, &i915->mm.bound_list, mm.link)
+	list_for_each_entry_safe(obj, on, list, st_link) {
+		GEM_BUG_ON(!obj->mm.quirked);
+		obj->mm.quirked = false;
 		i915_gem_object_put(obj);
+	}
 
 	mutex_unlock(&i915->drm.struct_mutex);
 
@@ -136,11 +123,12 @@ static int igt_evict_something(void *arg)
 {
 	struct drm_i915_private *i915 = arg;
 	struct i915_ggtt *ggtt = &i915->ggtt;
+	LIST_HEAD(objects);
 	int err;
 
 	/* Fill the GGTT with pinned objects and try to evict one. */
 
-	err = populate_ggtt(i915);
+	err = populate_ggtt(i915, &objects);
 	if (err)
 		goto cleanup;
 
@@ -169,7 +157,7 @@ static int igt_evict_something(void *arg)
 	}
 
 cleanup:
-	cleanup_objects(i915);
+	cleanup_objects(i915, &objects);
 	return err;
 }
 
@@ -178,13 +166,14 @@ static int igt_overcommit(void *arg)
 	struct drm_i915_private *i915 = arg;
 	struct drm_i915_gem_object *obj;
 	struct i915_vma *vma;
+	LIST_HEAD(objects);
 	int err;
 
 	/* Fill the GGTT with pinned objects and then try to pin one more.
 	 * We expect it to fail.
 	 */
 
-	err = populate_ggtt(i915);
+	err = populate_ggtt(i915, &objects);
 	if (err)
 		goto cleanup;
 
@@ -194,6 +183,8 @@ static int igt_overcommit(void *arg)
 		goto cleanup;
 	}
 
+	quirk_add(obj, &objects);
+
 	vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0, 0);
 	if (!IS_ERR(vma) || PTR_ERR(vma) != -ENOSPC) {
 		pr_err("Failed to evict+insert, i915_gem_object_ggtt_pin returned err=%d\n", (int)PTR_ERR(vma));
@@ -202,7 +193,7 @@ static int igt_overcommit(void *arg)
 	}
 
 cleanup:
-	cleanup_objects(i915);
+	cleanup_objects(i915, &objects);
 	return err;
 }
 
@@ -214,11 +205,12 @@ static int igt_evict_for_vma(void *arg)
 		.start = 0,
 		.size = 4096,
 	};
+	LIST_HEAD(objects);
 	int err;
 
 	/* Fill the GGTT with pinned objects and try to evict a range. */
 
-	err = populate_ggtt(i915);
+	err = populate_ggtt(i915, &objects);
 	if (err)
 		goto cleanup;
 
@@ -241,7 +233,7 @@ static int igt_evict_for_vma(void *arg)
 	}
 
 cleanup:
-	cleanup_objects(i915);
+	cleanup_objects(i915, &objects);
 	return err;
 }
 
@@ -264,6 +256,7 @@ static int igt_evict_for_cache_color(void *arg)
 	};
 	struct drm_i915_gem_object *obj;
 	struct i915_vma *vma;
+	LIST_HEAD(objects);
 	int err;
 
 	/* Currently the use of color_adjust is limited to cache domains within
@@ -279,6 +272,7 @@ static int igt_evict_for_cache_color(void *arg)
 		goto cleanup;
 	}
 	i915_gem_object_set_cache_level(obj, I915_CACHE_LLC);
+	quirk_add(obj, &objects);
 
 	vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0,
 				       I915_GTT_PAGE_SIZE | flags);
@@ -294,6 +288,7 @@ static int igt_evict_for_cache_color(void *arg)
 		goto cleanup;
 	}
 	i915_gem_object_set_cache_level(obj, I915_CACHE_LLC);
+	quirk_add(obj, &objects);
 
 	/* Neighbouring; same colour - should fit */
 	vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0,
@@ -329,7 +324,7 @@ static int igt_evict_for_cache_color(void *arg)
 
 cleanup:
 	unpin_ggtt(i915);
-	cleanup_objects(i915);
+	cleanup_objects(i915, &objects);
 	ggtt->vm.mm.color_adjust = NULL;
 	return err;
 }
@@ -338,11 +333,12 @@ static int igt_evict_vm(void *arg)
 {
 	struct drm_i915_private *i915 = arg;
 	struct i915_ggtt *ggtt = &i915->ggtt;
+	LIST_HEAD(objects);
 	int err;
 
 	/* Fill the GGTT with pinned objects and try to evict everything. */
 
-	err = populate_ggtt(i915);
+	err = populate_ggtt(i915, &objects);
 	if (err)
 		goto cleanup;
 
@@ -364,7 +360,7 @@ static int igt_evict_vm(void *arg)
 	}
 
 cleanup:
-	cleanup_objects(i915);
+	cleanup_objects(i915, &objects);
 	return err;
 }
 
-- 
2.20.1

* [PATCH 06/34] drm/i915/selftests: Create a clean GGTT for vma/gtt selftesting
  2019-01-21 22:20 HWSP for HW semaphores Chris Wilson
                   ` (4 preceding siblings ...)
  2019-01-21 22:20 ` [PATCH 05/34] drm/i915/selftests: Track evict objects explicitly Chris Wilson
@ 2019-01-21 22:20 ` Chris Wilson
  2019-01-22 12:07   ` Matthew Auld
  2019-01-21 22:20 ` [PATCH 07/34] drm/i915: Refactor out intel_context_init() Chris Wilson
                   ` (34 subsequent siblings)
  40 siblings, 1 reply; 89+ messages in thread
From: Chris Wilson @ 2019-01-21 22:20 UTC (permalink / raw)
  To: intel-gfx

Some tests (e.g. igt_vma_pin1) presume that we have a completely clean
GGTT so that they can probe boundaries without fear that something is
already allocated there. However, the mock device is starting to get
complicated and to follow similar rules to the live device, i.e. we
can't guarantee that i915->ggtt remains clean, so create a temporary
address_space equivalent to the mock ggtt for the purpose.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 108 +++++++++++-------
 drivers/gpu/drm/i915/selftests/i915_vma.c     |  77 +++++++------
 .../gpu/drm/i915/selftests/mock_gem_device.c  |   4 +-
 drivers/gpu/drm/i915/selftests/mock_gtt.c     |   9 +-
 drivers/gpu/drm/i915/selftests/mock_gtt.h     |   4 +-
 5 files changed, 114 insertions(+), 88 deletions(-)

diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
index fea8ab14e79d..06bde4a273cb 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
@@ -1267,27 +1267,35 @@ static int exercise_mock(struct drm_i915_private *i915,
 
 static int igt_mock_fill(void *arg)
 {
-	return exercise_mock(arg, fill_hole);
+	struct i915_ggtt *ggtt = arg;
+
+	return exercise_mock(ggtt->vm.i915, fill_hole);
 }
 
 static int igt_mock_walk(void *arg)
 {
-	return exercise_mock(arg, walk_hole);
+	struct i915_ggtt *ggtt = arg;
+
+	return exercise_mock(ggtt->vm.i915, walk_hole);
 }
 
 static int igt_mock_pot(void *arg)
 {
-	return exercise_mock(arg, pot_hole);
+	struct i915_ggtt *ggtt = arg;
+
+	return exercise_mock(ggtt->vm.i915, pot_hole);
 }
 
 static int igt_mock_drunk(void *arg)
 {
-	return exercise_mock(arg, drunk_hole);
+	struct i915_ggtt *ggtt = arg;
+
+	return exercise_mock(ggtt->vm.i915, drunk_hole);
 }
 
 static int igt_gtt_reserve(void *arg)
 {
-	struct drm_i915_private *i915 = arg;
+	struct i915_ggtt *ggtt = arg;
 	struct drm_i915_gem_object *obj, *on;
 	LIST_HEAD(objects);
 	u64 total;
@@ -1300,11 +1308,12 @@ static int igt_gtt_reserve(void *arg)
 
 	/* Start by filling the GGTT */
 	for (total = 0;
-	     total + 2*I915_GTT_PAGE_SIZE <= i915->ggtt.vm.total;
-	     total += 2*I915_GTT_PAGE_SIZE) {
+	     total + 2 * I915_GTT_PAGE_SIZE <= ggtt->vm.total;
+	     total += 2 * I915_GTT_PAGE_SIZE) {
 		struct i915_vma *vma;
 
-		obj = i915_gem_object_create_internal(i915, 2*PAGE_SIZE);
+		obj = i915_gem_object_create_internal(ggtt->vm.i915,
+						      2 * PAGE_SIZE);
 		if (IS_ERR(obj)) {
 			err = PTR_ERR(obj);
 			goto out;
@@ -1318,20 +1327,20 @@ static int igt_gtt_reserve(void *arg)
 
 		list_add(&obj->st_link, &objects);
 
-		vma = i915_vma_instance(obj, &i915->ggtt.vm, NULL);
+		vma = i915_vma_instance(obj, &ggtt->vm, NULL);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			goto out;
 		}
 
-		err = i915_gem_gtt_reserve(&i915->ggtt.vm, &vma->node,
+		err = i915_gem_gtt_reserve(&ggtt->vm, &vma->node,
 					   obj->base.size,
 					   total,
 					   obj->cache_level,
 					   0);
 		if (err) {
 			pr_err("i915_gem_gtt_reserve (pass 1) failed at %llu/%llu with err=%d\n",
-			       total, i915->ggtt.vm.total, err);
+			       total, ggtt->vm.total, err);
 			goto out;
 		}
 		track_vma_bind(vma);
@@ -1349,11 +1358,12 @@ static int igt_gtt_reserve(void *arg)
 
 	/* Now we start forcing evictions */
 	for (total = I915_GTT_PAGE_SIZE;
-	     total + 2*I915_GTT_PAGE_SIZE <= i915->ggtt.vm.total;
-	     total += 2*I915_GTT_PAGE_SIZE) {
+	     total + 2 * I915_GTT_PAGE_SIZE <= ggtt->vm.total;
+	     total += 2 * I915_GTT_PAGE_SIZE) {
 		struct i915_vma *vma;
 
-		obj = i915_gem_object_create_internal(i915, 2*PAGE_SIZE);
+		obj = i915_gem_object_create_internal(ggtt->vm.i915,
+						      2 * PAGE_SIZE);
 		if (IS_ERR(obj)) {
 			err = PTR_ERR(obj);
 			goto out;
@@ -1367,20 +1377,20 @@ static int igt_gtt_reserve(void *arg)
 
 		list_add(&obj->st_link, &objects);
 
-		vma = i915_vma_instance(obj, &i915->ggtt.vm, NULL);
+		vma = i915_vma_instance(obj, &ggtt->vm, NULL);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			goto out;
 		}
 
-		err = i915_gem_gtt_reserve(&i915->ggtt.vm, &vma->node,
+		err = i915_gem_gtt_reserve(&ggtt->vm, &vma->node,
 					   obj->base.size,
 					   total,
 					   obj->cache_level,
 					   0);
 		if (err) {
 			pr_err("i915_gem_gtt_reserve (pass 2) failed at %llu/%llu with err=%d\n",
-			       total, i915->ggtt.vm.total, err);
+			       total, ggtt->vm.total, err);
 			goto out;
 		}
 		track_vma_bind(vma);
@@ -1401,7 +1411,7 @@ static int igt_gtt_reserve(void *arg)
 		struct i915_vma *vma;
 		u64 offset;
 
-		vma = i915_vma_instance(obj, &i915->ggtt.vm, NULL);
+		vma = i915_vma_instance(obj, &ggtt->vm, NULL);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			goto out;
@@ -1413,18 +1423,18 @@ static int igt_gtt_reserve(void *arg)
 			goto out;
 		}
 
-		offset = random_offset(0, i915->ggtt.vm.total,
+		offset = random_offset(0, ggtt->vm.total,
 				       2*I915_GTT_PAGE_SIZE,
 				       I915_GTT_MIN_ALIGNMENT);
 
-		err = i915_gem_gtt_reserve(&i915->ggtt.vm, &vma->node,
+		err = i915_gem_gtt_reserve(&ggtt->vm, &vma->node,
 					   obj->base.size,
 					   offset,
 					   obj->cache_level,
 					   0);
 		if (err) {
 			pr_err("i915_gem_gtt_reserve (pass 3) failed at %llu/%llu with err=%d\n",
-			       total, i915->ggtt.vm.total, err);
+			       total, ggtt->vm.total, err);
 			goto out;
 		}
 		track_vma_bind(vma);
@@ -1450,7 +1460,7 @@ static int igt_gtt_reserve(void *arg)
 
 static int igt_gtt_insert(void *arg)
 {
-	struct drm_i915_private *i915 = arg;
+	struct i915_ggtt *ggtt = arg;
 	struct drm_i915_gem_object *obj, *on;
 	struct drm_mm_node tmp = {};
 	const struct invalid_insert {
@@ -1459,8 +1469,8 @@ static int igt_gtt_insert(void *arg)
 		u64 start, end;
 	} invalid_insert[] = {
 		{
-			i915->ggtt.vm.total + I915_GTT_PAGE_SIZE, 0,
-			0, i915->ggtt.vm.total,
+			ggtt->vm.total + I915_GTT_PAGE_SIZE, 0,
+			0, ggtt->vm.total,
 		},
 		{
 			2*I915_GTT_PAGE_SIZE, 0,
@@ -1490,7 +1500,7 @@ static int igt_gtt_insert(void *arg)
 
 	/* Check a couple of obviously invalid requests */
 	for (ii = invalid_insert; ii->size; ii++) {
-		err = i915_gem_gtt_insert(&i915->ggtt.vm, &tmp,
+		err = i915_gem_gtt_insert(&ggtt->vm, &tmp,
 					  ii->size, ii->alignment,
 					  I915_COLOR_UNEVICTABLE,
 					  ii->start, ii->end,
@@ -1505,11 +1515,12 @@ static int igt_gtt_insert(void *arg)
 
 	/* Start by filling the GGTT */
 	for (total = 0;
-	     total + I915_GTT_PAGE_SIZE <= i915->ggtt.vm.total;
+	     total + I915_GTT_PAGE_SIZE <= ggtt->vm.total;
 	     total += I915_GTT_PAGE_SIZE) {
 		struct i915_vma *vma;
 
-		obj = i915_gem_object_create_internal(i915, I915_GTT_PAGE_SIZE);
+		obj = i915_gem_object_create_internal(ggtt->vm.i915,
+						      I915_GTT_PAGE_SIZE);
 		if (IS_ERR(obj)) {
 			err = PTR_ERR(obj);
 			goto out;
@@ -1523,15 +1534,15 @@ static int igt_gtt_insert(void *arg)
 
 		list_add(&obj->st_link, &objects);
 
-		vma = i915_vma_instance(obj, &i915->ggtt.vm, NULL);
+		vma = i915_vma_instance(obj, &ggtt->vm, NULL);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			goto out;
 		}
 
-		err = i915_gem_gtt_insert(&i915->ggtt.vm, &vma->node,
+		err = i915_gem_gtt_insert(&ggtt->vm, &vma->node,
 					  obj->base.size, 0, obj->cache_level,
-					  0, i915->ggtt.vm.total,
+					  0, ggtt->vm.total,
 					  0);
 		if (err == -ENOSPC) {
 			/* maxed out the GGTT space */
@@ -1540,7 +1551,7 @@ static int igt_gtt_insert(void *arg)
 		}
 		if (err) {
 			pr_err("i915_gem_gtt_insert (pass 1) failed at %llu/%llu with err=%d\n",
-			       total, i915->ggtt.vm.total, err);
+			       total, ggtt->vm.total, err);
 			goto out;
 		}
 		track_vma_bind(vma);
@@ -1552,7 +1563,7 @@ static int igt_gtt_insert(void *arg)
 	list_for_each_entry(obj, &objects, st_link) {
 		struct i915_vma *vma;
 
-		vma = i915_vma_instance(obj, &i915->ggtt.vm, NULL);
+		vma = i915_vma_instance(obj, &ggtt->vm, NULL);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			goto out;
@@ -1572,7 +1583,7 @@ static int igt_gtt_insert(void *arg)
 		struct i915_vma *vma;
 		u64 offset;
 
-		vma = i915_vma_instance(obj, &i915->ggtt.vm, NULL);
+		vma = i915_vma_instance(obj, &ggtt->vm, NULL);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			goto out;
@@ -1587,13 +1598,13 @@ static int igt_gtt_insert(void *arg)
 			goto out;
 		}
 
-		err = i915_gem_gtt_insert(&i915->ggtt.vm, &vma->node,
+		err = i915_gem_gtt_insert(&ggtt->vm, &vma->node,
 					  obj->base.size, 0, obj->cache_level,
-					  0, i915->ggtt.vm.total,
+					  0, ggtt->vm.total,
 					  0);
 		if (err) {
 			pr_err("i915_gem_gtt_insert (pass 2) failed at %llu/%llu with err=%d\n",
-			       total, i915->ggtt.vm.total, err);
+			       total, ggtt->vm.total, err);
 			goto out;
 		}
 		track_vma_bind(vma);
@@ -1609,11 +1620,12 @@ static int igt_gtt_insert(void *arg)
 
 	/* And then force evictions */
 	for (total = 0;
-	     total + 2*I915_GTT_PAGE_SIZE <= i915->ggtt.vm.total;
-	     total += 2*I915_GTT_PAGE_SIZE) {
+	     total + 2 * I915_GTT_PAGE_SIZE <= ggtt->vm.total;
+	     total += 2 * I915_GTT_PAGE_SIZE) {
 		struct i915_vma *vma;
 
-		obj = i915_gem_object_create_internal(i915, 2*I915_GTT_PAGE_SIZE);
+		obj = i915_gem_object_create_internal(ggtt->vm.i915,
+						      2 * I915_GTT_PAGE_SIZE);
 		if (IS_ERR(obj)) {
 			err = PTR_ERR(obj);
 			goto out;
@@ -1627,19 +1639,19 @@ static int igt_gtt_insert(void *arg)
 
 		list_add(&obj->st_link, &objects);
 
-		vma = i915_vma_instance(obj, &i915->ggtt.vm, NULL);
+		vma = i915_vma_instance(obj, &ggtt->vm, NULL);
 		if (IS_ERR(vma)) {
 			err = PTR_ERR(vma);
 			goto out;
 		}
 
-		err = i915_gem_gtt_insert(&i915->ggtt.vm, &vma->node,
+		err = i915_gem_gtt_insert(&ggtt->vm, &vma->node,
 					  obj->base.size, 0, obj->cache_level,
-					  0, i915->ggtt.vm.total,
+					  0, ggtt->vm.total,
 					  0);
 		if (err) {
 			pr_err("i915_gem_gtt_insert (pass 3) failed at %llu/%llu with err=%d\n",
-			       total, i915->ggtt.vm.total, err);
+			       total, ggtt->vm.total, err);
 			goto out;
 		}
 		track_vma_bind(vma);
@@ -1666,17 +1678,25 @@ int i915_gem_gtt_mock_selftests(void)
 		SUBTEST(igt_gtt_insert),
 	};
 	struct drm_i915_private *i915;
+	struct i915_ggtt ggtt;
 	int err;
 
 	i915 = mock_gem_device();
 	if (!i915)
 		return -ENOMEM;
 
+	mock_init_ggtt(i915, &ggtt);
+
 	mutex_lock(&i915->drm.struct_mutex);
-	err = i915_subtests(tests, i915);
+	err = i915_subtests(tests, &ggtt);
+	mock_device_flush(i915);
 	mutex_unlock(&i915->drm.struct_mutex);
 
+	i915_gem_drain_freed_objects(i915);
+
+	mock_fini_ggtt(&ggtt);
 	drm_dev_put(&i915->drm);
+
 	return err;
 }
 
diff --git a/drivers/gpu/drm/i915/selftests/i915_vma.c b/drivers/gpu/drm/i915/selftests/i915_vma.c
index ffa74290e054..f0a32edfb9b1 100644
--- a/drivers/gpu/drm/i915/selftests/i915_vma.c
+++ b/drivers/gpu/drm/i915/selftests/i915_vma.c
@@ -28,6 +28,7 @@
 
 #include "mock_gem_device.h"
 #include "mock_context.h"
+#include "mock_gtt.h"
 
 static bool assert_vma(struct i915_vma *vma,
 		       struct drm_i915_gem_object *obj,
@@ -141,7 +142,8 @@ static int create_vmas(struct drm_i915_private *i915,
 
 static int igt_vma_create(void *arg)
 {
-	struct drm_i915_private *i915 = arg;
+	struct i915_ggtt *ggtt = arg;
+	struct drm_i915_private *i915 = ggtt->vm.i915;
 	struct drm_i915_gem_object *obj, *on;
 	struct i915_gem_context *ctx, *cn;
 	unsigned long num_obj, num_ctx;
@@ -245,7 +247,7 @@ static bool assert_pin_einval(const struct i915_vma *vma,
 
 static int igt_vma_pin1(void *arg)
 {
-	struct drm_i915_private *i915 = arg;
+	struct i915_ggtt *ggtt = arg;
 	const struct pin_mode modes[] = {
 #define VALID(sz, fl) { .size = (sz), .flags = (fl), .assert = assert_pin_valid, .string = #sz ", " #fl ", (valid) " }
 #define __INVALID(sz, fl, check, eval) { .size = (sz), .flags = (fl), .assert = (check), .string = #sz ", " #fl ", (invalid " #eval ")" }
@@ -256,30 +258,30 @@ static int igt_vma_pin1(void *arg)
 
 		VALID(0, PIN_GLOBAL | PIN_OFFSET_BIAS | 4096),
 		VALID(0, PIN_GLOBAL | PIN_OFFSET_BIAS | 8192),
-		VALID(0, PIN_GLOBAL | PIN_OFFSET_BIAS | (i915->ggtt.mappable_end - 4096)),
-		VALID(0, PIN_GLOBAL | PIN_MAPPABLE | PIN_OFFSET_BIAS | (i915->ggtt.mappable_end - 4096)),
-		VALID(0, PIN_GLOBAL | PIN_OFFSET_BIAS | (i915->ggtt.vm.total - 4096)),
-
-		VALID(0, PIN_GLOBAL | PIN_MAPPABLE | PIN_OFFSET_FIXED | (i915->ggtt.mappable_end - 4096)),
-		INVALID(0, PIN_GLOBAL | PIN_MAPPABLE | PIN_OFFSET_FIXED | i915->ggtt.mappable_end),
-		VALID(0, PIN_GLOBAL | PIN_OFFSET_FIXED | (i915->ggtt.vm.total - 4096)),
-		INVALID(0, PIN_GLOBAL | PIN_OFFSET_FIXED | i915->ggtt.vm.total),
+		VALID(0, PIN_GLOBAL | PIN_OFFSET_BIAS | (ggtt->mappable_end - 4096)),
+		VALID(0, PIN_GLOBAL | PIN_MAPPABLE | PIN_OFFSET_BIAS | (ggtt->mappable_end - 4096)),
+		VALID(0, PIN_GLOBAL | PIN_OFFSET_BIAS | (ggtt->vm.total - 4096)),
+
+		VALID(0, PIN_GLOBAL | PIN_MAPPABLE | PIN_OFFSET_FIXED | (ggtt->mappable_end - 4096)),
+		INVALID(0, PIN_GLOBAL | PIN_MAPPABLE | PIN_OFFSET_FIXED | ggtt->mappable_end),
+		VALID(0, PIN_GLOBAL | PIN_OFFSET_FIXED | (ggtt->vm.total - 4096)),
+		INVALID(0, PIN_GLOBAL | PIN_OFFSET_FIXED | ggtt->vm.total),
 		INVALID(0, PIN_GLOBAL | PIN_OFFSET_FIXED | round_down(U64_MAX, PAGE_SIZE)),
 
 		VALID(4096, PIN_GLOBAL),
 		VALID(8192, PIN_GLOBAL),
-		VALID(i915->ggtt.mappable_end - 4096, PIN_GLOBAL | PIN_MAPPABLE),
-		VALID(i915->ggtt.mappable_end, PIN_GLOBAL | PIN_MAPPABLE),
-		NOSPACE(i915->ggtt.mappable_end + 4096, PIN_GLOBAL | PIN_MAPPABLE),
-		VALID(i915->ggtt.vm.total - 4096, PIN_GLOBAL),
-		VALID(i915->ggtt.vm.total, PIN_GLOBAL),
-		NOSPACE(i915->ggtt.vm.total + 4096, PIN_GLOBAL),
+		VALID(ggtt->mappable_end - 4096, PIN_GLOBAL | PIN_MAPPABLE),
+		VALID(ggtt->mappable_end, PIN_GLOBAL | PIN_MAPPABLE),
+		NOSPACE(ggtt->mappable_end + 4096, PIN_GLOBAL | PIN_MAPPABLE),
+		VALID(ggtt->vm.total - 4096, PIN_GLOBAL),
+		VALID(ggtt->vm.total, PIN_GLOBAL),
+		NOSPACE(ggtt->vm.total + 4096, PIN_GLOBAL),
 		NOSPACE(round_down(U64_MAX, PAGE_SIZE), PIN_GLOBAL),
-		INVALID(8192, PIN_GLOBAL | PIN_MAPPABLE | PIN_OFFSET_FIXED | (i915->ggtt.mappable_end - 4096)),
-		INVALID(8192, PIN_GLOBAL | PIN_OFFSET_FIXED | (i915->ggtt.vm.total - 4096)),
+		INVALID(8192, PIN_GLOBAL | PIN_MAPPABLE | PIN_OFFSET_FIXED | (ggtt->mappable_end - 4096)),
+		INVALID(8192, PIN_GLOBAL | PIN_OFFSET_FIXED | (ggtt->vm.total - 4096)),
 		INVALID(8192, PIN_GLOBAL | PIN_OFFSET_FIXED | (round_down(U64_MAX, PAGE_SIZE) - 4096)),
 
-		VALID(8192, PIN_GLOBAL | PIN_OFFSET_BIAS | (i915->ggtt.mappable_end - 4096)),
+		VALID(8192, PIN_GLOBAL | PIN_OFFSET_BIAS | (ggtt->mappable_end - 4096)),
 
 #if !IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)
 		/* Misusing BIAS is a programming error (it is not controllable
@@ -287,10 +289,10 @@ static int igt_vma_pin1(void *arg)
 		 * However, the tests are still quite interesting for checking
 		 * variable start, end and size.
 		 */
-		NOSPACE(0, PIN_GLOBAL | PIN_MAPPABLE | PIN_OFFSET_BIAS | i915->ggtt.mappable_end),
-		NOSPACE(0, PIN_GLOBAL | PIN_OFFSET_BIAS | i915->ggtt.vm.total),
-		NOSPACE(8192, PIN_GLOBAL | PIN_MAPPABLE | PIN_OFFSET_BIAS | (i915->ggtt.mappable_end - 4096)),
-		NOSPACE(8192, PIN_GLOBAL | PIN_OFFSET_BIAS | (i915->ggtt.vm.total - 4096)),
+		NOSPACE(0, PIN_GLOBAL | PIN_MAPPABLE | PIN_OFFSET_BIAS | ggtt->mappable_end),
+		NOSPACE(0, PIN_GLOBAL | PIN_OFFSET_BIAS | ggtt->vm.total),
+		NOSPACE(8192, PIN_GLOBAL | PIN_MAPPABLE | PIN_OFFSET_BIAS | (ggtt->mappable_end - 4096)),
+		NOSPACE(8192, PIN_GLOBAL | PIN_OFFSET_BIAS | (ggtt->vm.total - 4096)),
 #endif
 		{ },
 #undef NOSPACE
@@ -306,13 +308,13 @@ static int igt_vma_pin1(void *arg)
 	 * focusing on error handling of boundary conditions.
 	 */
 
-	GEM_BUG_ON(!drm_mm_clean(&i915->ggtt.vm.mm));
+	GEM_BUG_ON(!drm_mm_clean(&ggtt->vm.mm));
 
-	obj = i915_gem_object_create_internal(i915, PAGE_SIZE);
+	obj = i915_gem_object_create_internal(ggtt->vm.i915, PAGE_SIZE);
 	if (IS_ERR(obj))
 		return PTR_ERR(obj);
 
-	vma = checked_vma_instance(obj, &i915->ggtt.vm, NULL);
+	vma = checked_vma_instance(obj, &ggtt->vm, NULL);
 	if (IS_ERR(vma))
 		goto out;
 
@@ -403,8 +405,8 @@ static unsigned int rotated_size(const struct intel_rotation_plane_info *a,
 
 static int igt_vma_rotate(void *arg)
 {
-	struct drm_i915_private *i915 = arg;
-	struct i915_address_space *vm = &i915->ggtt.vm;
+	struct i915_ggtt *ggtt = arg;
+	struct i915_address_space *vm = &ggtt->vm;
 	struct drm_i915_gem_object *obj;
 	const struct intel_rotation_plane_info planes[] = {
 		{ .width = 1, .height = 1, .stride = 1 },
@@ -431,7 +433,7 @@ static int igt_vma_rotate(void *arg)
 	 * that the page layout within the rotated VMA match our expectations.
 	 */
 
-	obj = i915_gem_object_create_internal(i915, max_pages * PAGE_SIZE);
+	obj = i915_gem_object_create_internal(vm->i915, max_pages * PAGE_SIZE);
 	if (IS_ERR(obj))
 		goto out;
 
@@ -602,8 +604,8 @@ static bool assert_pin(struct i915_vma *vma,
 
 static int igt_vma_partial(void *arg)
 {
-	struct drm_i915_private *i915 = arg;
-	struct i915_address_space *vm = &i915->ggtt.vm;
+	struct i915_ggtt *ggtt = arg;
+	struct i915_address_space *vm = &ggtt->vm;
 	const unsigned int npages = 1021; /* prime! */
 	struct drm_i915_gem_object *obj;
 	const struct phase {
@@ -621,7 +623,7 @@ static int igt_vma_partial(void *arg)
 	 * we are returned the same VMA when we later request the same range.
 	 */
 
-	obj = i915_gem_object_create_internal(i915, npages*PAGE_SIZE);
+	obj = i915_gem_object_create_internal(vm->i915, npages * PAGE_SIZE);
 	if (IS_ERR(obj))
 		goto out;
 
@@ -723,17 +725,24 @@ int i915_vma_mock_selftests(void)
 		SUBTEST(igt_vma_partial),
 	};
 	struct drm_i915_private *i915;
+	struct i915_ggtt ggtt;
 	int err;
 
 	i915 = mock_gem_device();
 	if (!i915)
 		return -ENOMEM;
 
+	mock_init_ggtt(i915, &ggtt);
+
 	mutex_lock(&i915->drm.struct_mutex);
-	err = i915_subtests(tests, i915);
+	err = i915_subtests(tests, &ggtt);
+	mock_device_flush(i915);
 	mutex_unlock(&i915->drm.struct_mutex);
 
+	i915_gem_drain_freed_objects(i915);
+
+	mock_fini_ggtt(&ggtt);
 	drm_dev_put(&i915->drm);
+
 	return err;
 }
-
diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
index 3cda66292e76..5477ad4a7e7d 100644
--- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
+++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
@@ -72,7 +72,7 @@ static void mock_device_release(struct drm_device *dev)
 	i915_gem_drain_freed_objects(i915);
 
 	mutex_lock(&i915->drm.struct_mutex);
-	mock_fini_ggtt(i915);
+	mock_fini_ggtt(&i915->ggtt);
 	mutex_unlock(&i915->drm.struct_mutex);
 	WARN_ON(!list_empty(&i915->gt.timelines));
 
@@ -232,7 +232,7 @@ struct drm_i915_private *mock_gem_device(void)
 
 	mutex_lock(&i915->drm.struct_mutex);
 
-	mock_init_ggtt(i915);
+	mock_init_ggtt(i915, &i915->ggtt);
 
 	mkwrite_device_info(i915)->ring_mask = BIT(0);
 	i915->kernel_context = mock_context(i915, NULL);
diff --git a/drivers/gpu/drm/i915/selftests/mock_gtt.c b/drivers/gpu/drm/i915/selftests/mock_gtt.c
index 976c862b3842..cd83929fde8e 100644
--- a/drivers/gpu/drm/i915/selftests/mock_gtt.c
+++ b/drivers/gpu/drm/i915/selftests/mock_gtt.c
@@ -97,9 +97,9 @@ static void mock_unbind_ggtt(struct i915_vma *vma)
 {
 }
 
-void mock_init_ggtt(struct drm_i915_private *i915)
+void mock_init_ggtt(struct drm_i915_private *i915, struct i915_ggtt *ggtt)
 {
-	struct i915_ggtt *ggtt = &i915->ggtt;
+	memset(ggtt, 0, sizeof(*ggtt));
 
 	ggtt->vm.i915 = i915;
 	ggtt->vm.is_ggtt = true;
@@ -118,13 +118,10 @@ void mock_init_ggtt(struct drm_i915_private *i915)
 	ggtt->vm.vma_ops.set_pages   = ggtt_set_pages;
 	ggtt->vm.vma_ops.clear_pages = clear_pages;
 
-
 	i915_address_space_init(&ggtt->vm, VM_CLASS_GGTT);
 }
 
-void mock_fini_ggtt(struct drm_i915_private *i915)
+void mock_fini_ggtt(struct i915_ggtt *ggtt)
 {
-	struct i915_ggtt *ggtt = &i915->ggtt;
-
 	i915_address_space_fini(&ggtt->vm);
 }
diff --git a/drivers/gpu/drm/i915/selftests/mock_gtt.h b/drivers/gpu/drm/i915/selftests/mock_gtt.h
index 9a0a833bb545..40d544bde1d5 100644
--- a/drivers/gpu/drm/i915/selftests/mock_gtt.h
+++ b/drivers/gpu/drm/i915/selftests/mock_gtt.h
@@ -25,8 +25,8 @@
 #ifndef __MOCK_GTT_H
 #define __MOCK_GTT_H
 
-void mock_init_ggtt(struct drm_i915_private *i915);
-void mock_fini_ggtt(struct drm_i915_private *i915);
+void mock_init_ggtt(struct drm_i915_private *i915, struct i915_ggtt *ggtt);
+void mock_fini_ggtt(struct i915_ggtt *ggtt);
 
 struct i915_hw_ppgtt *
 mock_ppgtt(struct drm_i915_private *i915,
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH 07/34] drm/i915: Refactor out intel_context_init()
  2019-01-21 22:20 HWSP for HW semaphores Chris Wilson
                   ` (5 preceding siblings ...)
  2019-01-21 22:20 ` [PATCH 06/34] drm/i915/selftests: Create a clean GGTT for vma/gtt selftesting Chris Wilson
@ 2019-01-21 22:20 ` Chris Wilson
  2019-01-22 12:32   ` Matthew Auld
  2019-01-22 12:39   ` Mika Kuoppala
  2019-01-21 22:20 ` [PATCH 08/34] drm/i915: Make all GPU resets atomic Chris Wilson
                   ` (33 subsequent siblings)
  40 siblings, 2 replies; 89+ messages in thread
From: Chris Wilson @ 2019-01-21 22:20 UTC (permalink / raw)
  To: intel-gfx

Prior to adding a third instance of intel_context_init() and extending
the information stored therein, refactor out the common assignments.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_context.c       | 7 ++-----
 drivers/gpu/drm/i915/i915_gem_context.h       | 8 ++++++++
 drivers/gpu/drm/i915/selftests/mock_context.c | 7 ++-----
 3 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 5933adbe3d99..fae68c4c4683 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -338,11 +338,8 @@ __create_hw_context(struct drm_i915_private *dev_priv,
 	ctx->i915 = dev_priv;
 	ctx->sched.priority = I915_USER_PRIORITY(I915_PRIORITY_NORMAL);
 
-	for (n = 0; n < ARRAY_SIZE(ctx->__engine); n++) {
-		struct intel_context *ce = &ctx->__engine[n];
-
-		ce->gem_context = ctx;
-	}
+	for (n = 0; n < ARRAY_SIZE(ctx->__engine); n++)
+		intel_context_init(&ctx->__engine[n], ctx, dev_priv->engine[n]);
 
 	INIT_RADIX_TREE(&ctx->handles_vma, GFP_KERNEL);
 	INIT_LIST_HEAD(&ctx->handles_list);
diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
index f6d870b1f73e..47d82ce7ba6a 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/i915_gem_context.h
@@ -364,4 +364,12 @@ static inline void i915_gem_context_put(struct i915_gem_context *ctx)
 	kref_put(&ctx->ref, i915_gem_context_release);
 }
 
+static inline void
+intel_context_init(struct intel_context *ce,
+		   struct i915_gem_context *ctx,
+		   struct intel_engine_cs *engine)
+{
+	ce->gem_context = ctx;
+}
+
 #endif /* !__I915_GEM_CONTEXT_H__ */
diff --git a/drivers/gpu/drm/i915/selftests/mock_context.c b/drivers/gpu/drm/i915/selftests/mock_context.c
index d937bdff26f9..b646cdcdd602 100644
--- a/drivers/gpu/drm/i915/selftests/mock_context.c
+++ b/drivers/gpu/drm/i915/selftests/mock_context.c
@@ -45,11 +45,8 @@ mock_context(struct drm_i915_private *i915,
 	INIT_LIST_HEAD(&ctx->handles_list);
 	INIT_LIST_HEAD(&ctx->hw_id_link);
 
-	for (n = 0; n < ARRAY_SIZE(ctx->__engine); n++) {
-		struct intel_context *ce = &ctx->__engine[n];
-
-		ce->gem_context = ctx;
-	}
+	for (n = 0; n < ARRAY_SIZE(ctx->__engine); n++)
+		intel_context_init(&ctx->__engine[n], ctx, i915->engine[n]);
 
 	ret = i915_gem_context_pin_hw_id(ctx);
 	if (ret < 0)
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH 08/34] drm/i915: Make all GPU resets atomic
  2019-01-21 22:20 HWSP for HW semaphores Chris Wilson
                   ` (6 preceding siblings ...)
  2019-01-21 22:20 ` [PATCH 07/34] drm/i915: Refactor out intel_context_init() Chris Wilson
@ 2019-01-21 22:20 ` Chris Wilson
  2019-01-22 22:19   ` John Harrison
  2019-01-21 22:20 ` [PATCH 09/34] drm/i915/guc: Disable global reset Chris Wilson
                   ` (32 subsequent siblings)
  40 siblings, 1 reply; 89+ messages in thread
From: Chris Wilson @ 2019-01-21 22:20 UTC (permalink / raw)
  To: intel-gfx

In preparation for the next few commits, make resetting the GPU atomic.
Currently, we have prepared gen6+ for atomic resetting of individual
engines, but now there is a requirement to perform the whole device
level reset (just the register poking) from inside an atomic context.
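
As a rough, self-contained sketch of the non-sleeping pattern this switches
to (purely illustrative: toy_do_reset_atomic() and the TOY_* register bits
are invented, while udelay(), writel() and readl_poll_timeout_atomic() are
the stock kernel helpers), the sequence below stays legal with preemption
disabled because nothing in it can sleep:

  #include <linux/bits.h>
  #include <linux/delay.h>
  #include <linux/io.h>
  #include <linux/iopoll.h>

  #define TOY_GDRST		0x0	/* hypothetical reset register */
  #define TOY_RESET_ENABLE	BIT(0)
  #define TOY_RESET_READY	BIT(1)

  static int toy_do_reset_atomic(void __iomem *regs)
  {
  	u32 tmp;

  	/* Assert reset; udelay() spins rather than sleeps. */
  	writel(TOY_RESET_ENABLE, regs + TOY_GDRST);
  	udelay(50);

  	/* Poll for acknowledgement without sleeping: 10us steps, 50ms cap. */
  	return readl_poll_timeout_atomic(regs + TOY_GDRST, tmp,
  					 tmp & TOY_RESET_READY,
  					 10, 50 * 1000);
  }

The hunks below do the same with the driver's existing wait_for_atomic()
and __intel_wait_for_register_fw() helpers, and intel_gpu_reset() wraps the
reset callback in preempt_disable()/preempt_enable().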

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_reset.c | 50 +++++++++++++++++--------------
 1 file changed, 27 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
index 342d9ee42601..b9d0ea70361c 100644
--- a/drivers/gpu/drm/i915/i915_reset.c
+++ b/drivers/gpu/drm/i915/i915_reset.c
@@ -144,14 +144,14 @@ static int i915_do_reset(struct drm_i915_private *i915,
 
 	/* Assert reset for at least 20 usec, and wait for acknowledgement. */
 	pci_write_config_byte(pdev, I915_GDRST, GRDOM_RESET_ENABLE);
-	usleep_range(50, 200);
-	err = wait_for(i915_in_reset(pdev), 500);
+	udelay(50);
+	err = wait_for_atomic(i915_in_reset(pdev), 50);
 
 	/* Clear the reset request. */
 	pci_write_config_byte(pdev, I915_GDRST, 0);
-	usleep_range(50, 200);
+	udelay(50);
 	if (!err)
-		err = wait_for(!i915_in_reset(pdev), 500);
+		err = wait_for_atomic(!i915_in_reset(pdev), 50);
 
 	return err;
 }
@@ -171,7 +171,7 @@ static int g33_do_reset(struct drm_i915_private *i915,
 	struct pci_dev *pdev = i915->drm.pdev;
 
 	pci_write_config_byte(pdev, I915_GDRST, GRDOM_RESET_ENABLE);
-	return wait_for(g4x_reset_complete(pdev), 500);
+	return wait_for_atomic(g4x_reset_complete(pdev), 50);
 }
 
 static int g4x_do_reset(struct drm_i915_private *dev_priv,
@@ -182,13 +182,13 @@ static int g4x_do_reset(struct drm_i915_private *dev_priv,
 	int ret;
 
 	/* WaVcpClkGateDisableForMediaReset:ctg,elk */
-	I915_WRITE(VDECCLK_GATE_D,
-		   I915_READ(VDECCLK_GATE_D) | VCP_UNIT_CLOCK_GATE_DISABLE);
-	POSTING_READ(VDECCLK_GATE_D);
+	I915_WRITE_FW(VDECCLK_GATE_D,
+		      I915_READ(VDECCLK_GATE_D) | VCP_UNIT_CLOCK_GATE_DISABLE);
+	POSTING_READ_FW(VDECCLK_GATE_D);
 
 	pci_write_config_byte(pdev, I915_GDRST,
 			      GRDOM_MEDIA | GRDOM_RESET_ENABLE);
-	ret =  wait_for(g4x_reset_complete(pdev), 500);
+	ret =  wait_for_atomic(g4x_reset_complete(pdev), 50);
 	if (ret) {
 		DRM_DEBUG_DRIVER("Wait for media reset failed\n");
 		goto out;
@@ -196,7 +196,7 @@ static int g4x_do_reset(struct drm_i915_private *dev_priv,
 
 	pci_write_config_byte(pdev, I915_GDRST,
 			      GRDOM_RENDER | GRDOM_RESET_ENABLE);
-	ret =  wait_for(g4x_reset_complete(pdev), 500);
+	ret =  wait_for_atomic(g4x_reset_complete(pdev), 50);
 	if (ret) {
 		DRM_DEBUG_DRIVER("Wait for render reset failed\n");
 		goto out;
@@ -205,9 +205,9 @@ static int g4x_do_reset(struct drm_i915_private *dev_priv,
 out:
 	pci_write_config_byte(pdev, I915_GDRST, 0);
 
-	I915_WRITE(VDECCLK_GATE_D,
-		   I915_READ(VDECCLK_GATE_D) & ~VCP_UNIT_CLOCK_GATE_DISABLE);
-	POSTING_READ(VDECCLK_GATE_D);
+	I915_WRITE_FW(VDECCLK_GATE_D,
+		      I915_READ(VDECCLK_GATE_D) & ~VCP_UNIT_CLOCK_GATE_DISABLE);
+	POSTING_READ_FW(VDECCLK_GATE_D);
 
 	return ret;
 }
@@ -218,27 +218,29 @@ static int ironlake_do_reset(struct drm_i915_private *dev_priv,
 {
 	int ret;
 
-	I915_WRITE(ILK_GDSR, ILK_GRDOM_RENDER | ILK_GRDOM_RESET_ENABLE);
-	ret = intel_wait_for_register(dev_priv,
-				      ILK_GDSR, ILK_GRDOM_RESET_ENABLE, 0,
-				      500);
+	I915_WRITE_FW(ILK_GDSR, ILK_GRDOM_RENDER | ILK_GRDOM_RESET_ENABLE);
+	ret = __intel_wait_for_register_fw(dev_priv, ILK_GDSR,
+					   ILK_GRDOM_RESET_ENABLE, 0,
+					   5000, 0,
+					   NULL);
 	if (ret) {
 		DRM_DEBUG_DRIVER("Wait for render reset failed\n");
 		goto out;
 	}
 
-	I915_WRITE(ILK_GDSR, ILK_GRDOM_MEDIA | ILK_GRDOM_RESET_ENABLE);
-	ret = intel_wait_for_register(dev_priv,
-				      ILK_GDSR, ILK_GRDOM_RESET_ENABLE, 0,
-				      500);
+	I915_WRITE_FW(ILK_GDSR, ILK_GRDOM_MEDIA | ILK_GRDOM_RESET_ENABLE);
+	ret = __intel_wait_for_register_fw(dev_priv, ILK_GDSR,
+					   ILK_GRDOM_RESET_ENABLE, 0,
+					   5000, 0,
+					   NULL);
 	if (ret) {
 		DRM_DEBUG_DRIVER("Wait for media reset failed\n");
 		goto out;
 	}
 
 out:
-	I915_WRITE(ILK_GDSR, 0);
-	POSTING_READ(ILK_GDSR);
+	I915_WRITE_FW(ILK_GDSR, 0);
+	POSTING_READ_FW(ILK_GDSR);
 	return ret;
 }
 
@@ -572,7 +574,9 @@ int intel_gpu_reset(struct drm_i915_private *i915, unsigned int engine_mask)
 		ret = -ENODEV;
 		if (reset) {
 			GEM_TRACE("engine_mask=%x\n", engine_mask);
+			preempt_disable();
 			ret = reset(i915, engine_mask, retry);
+			preempt_enable();
 		}
 		if (ret != -ETIMEDOUT || engine_mask != ALL_ENGINES)
 			break;
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH 09/34] drm/i915/guc: Disable global reset
  2019-01-21 22:20 HWSP for HW semaphores Chris Wilson
                   ` (7 preceding siblings ...)
  2019-01-21 22:20 ` [PATCH 08/34] drm/i915: Make all GPU resets atomic Chris Wilson
@ 2019-01-21 22:20 ` Chris Wilson
  2019-01-22 22:23   ` John Harrison
  2019-01-21 22:20 ` [PATCH 10/34] drm/i915: Remove GPU reset dependence on struct_mutex Chris Wilson
                   ` (31 subsequent siblings)
  40 siblings, 1 reply; 89+ messages in thread
From: Chris Wilson @ 2019-01-21 22:20 UTC (permalink / raw)
  To: intel-gfx

The guc (and huc) currently inextricably depend on struct_mutex for
device reinitialisation from inside the reset, and indeed taking any
mutex here is verboten (as we must be able to reset from underneath any
of our mutexes). That makes recovering the guc unviable without, for
example, reserving contiguous vma space and pages for it to use.

The plan to re-enable global reset for the GuC centres around reusing the
WOPCM reserved space at the top of the aperture (where we know we can
populate a contiguous range large enough to DMA transfer the fw image).

In the meantime, hopefully no one even notices as the device-reset is
only used as a backup to the per-engine resets for handling GPU hangs.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Acked-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
---
 drivers/gpu/drm/i915/i915_reset.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
index b9d0ea70361c..2961c21d9420 100644
--- a/drivers/gpu/drm/i915/i915_reset.c
+++ b/drivers/gpu/drm/i915/i915_reset.c
@@ -590,6 +590,9 @@ int intel_gpu_reset(struct drm_i915_private *i915, unsigned int engine_mask)
 
 bool intel_has_gpu_reset(struct drm_i915_private *i915)
 {
+	if (USES_GUC(i915))
+		return false;
+
 	return intel_get_gpu_reset(i915);
 }
 
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH 10/34] drm/i915: Remove GPU reset dependence on struct_mutex
  2019-01-21 22:20 HWSP for HW semaphores Chris Wilson
                   ` (8 preceding siblings ...)
  2019-01-21 22:20 ` [PATCH 09/34] drm/i915/guc: Disable global reset Chris Wilson
@ 2019-01-21 22:20 ` Chris Wilson
  2019-01-24 12:06   ` Mika Kuoppala
  2019-01-21 22:20 ` [PATCH 11/34] drm/i915/selftests: Trim struct_mutex duration for set-wedged selftest Chris Wilson
                   ` (30 subsequent siblings)
  40 siblings, 1 reply; 89+ messages in thread
From: Chris Wilson @ 2019-01-21 22:20 UTC (permalink / raw)
  To: intel-gfx

Now that the submission backends are controlled via their own spinlocks,
with a wave of a magic wand we can lift the struct_mutex requirement
around GPU reset. That is, we allow the submission frontend (userspace)
to keep on submitting while we process the GPU reset, as we can suspend
the backend independently.

The major change is around the backoff/handoff strategy for performing
the reset. With no mutex deadlock, we no longer have to coordinate with
any waiter, and just perform the reset immediately.
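
To make the new handoff shape concrete, here is a rough stand-alone sketch
of the deferral idiom that reset_restart()/restart_work() below rely on
(purely illustrative: the toy_* type and functions are invented, while the
workqueue and mutex calls are the stock kernel ones). The reset path itself
takes no mutex; anything that still needs one is packaged into a worker
that runs later from ordinary process context:

  #include <linux/kernel.h>
  #include <linux/mutex.h>
  #include <linux/slab.h>
  #include <linux/workqueue.h>

  struct toy_device {
  	struct mutex lock;		/* stand-in for struct_mutex */
  };

  struct toy_restart {
  	struct work_struct work;
  	struct toy_device *dev;
  };

  static void toy_restart_work(struct work_struct *work)
  {
  	struct toy_restart *arg = container_of(work, typeof(*arg), work);

  	/* Ordinary process context: taking the mutex here is fine. */
  	mutex_lock(&arg->dev->lock);
  	/* ... reload a kernel context, tidy up bookkeeping, etc ... */
  	mutex_unlock(&arg->dev->lock);

  	kfree(arg);
  }

  static void toy_queue_restart(struct toy_device *dev)
  {
  	struct toy_restart *arg;

  	/* Called from the reset path: no mutexes taken, no handoff to wait on. */
  	arg = kmalloc(sizeof(*arg), GFP_KERNEL);
  	if (!arg)
  		return;	/* best effort; a later reset can retry */

  	arg->dev = dev;
  	INIT_WORK(&arg->work, toy_restart_work);
  	queue_work(system_wq, &arg->work);
  }

The real restart_work() additionally grabs a runtime-pm wakeref and
resubmits the kernel context on any idle engine, but the locking shape is
the same.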

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c           |  38 +-
 drivers/gpu/drm/i915/i915_drv.h               |   5 -
 drivers/gpu/drm/i915/i915_gem.c               |  18 +-
 drivers/gpu/drm/i915/i915_gem_fence_reg.h     |   1 -
 drivers/gpu/drm/i915/i915_gem_gtt.h           |   1 +
 drivers/gpu/drm/i915/i915_gpu_error.c         | 104 +++--
 drivers/gpu/drm/i915/i915_gpu_error.h         |  28 +-
 drivers/gpu/drm/i915/i915_request.c           |  47 ---
 drivers/gpu/drm/i915/i915_reset.c             | 397 ++++++++----------
 drivers/gpu/drm/i915/i915_reset.h             |   3 +
 drivers/gpu/drm/i915/intel_engine_cs.c        |   6 +-
 drivers/gpu/drm/i915/intel_guc_submission.c   |   5 +-
 drivers/gpu/drm/i915/intel_hangcheck.c        |  28 +-
 drivers/gpu/drm/i915/intel_lrc.c              |  92 ++--
 drivers/gpu/drm/i915/intel_overlay.c          |   2 -
 drivers/gpu/drm/i915/intel_ringbuffer.c       |  91 ++--
 drivers/gpu/drm/i915/intel_ringbuffer.h       |  17 +-
 .../gpu/drm/i915/selftests/intel_hangcheck.c  |  57 +--
 .../drm/i915/selftests/intel_workarounds.c    |   3 -
 .../gpu/drm/i915/selftests/mock_gem_device.c  |   4 +-
 20 files changed, 393 insertions(+), 554 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 24d6d4ce14ef..3ec369980d40 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1284,8 +1284,6 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
 		seq_puts(m, "Wedged\n");
 	if (test_bit(I915_RESET_BACKOFF, &dev_priv->gpu_error.flags))
 		seq_puts(m, "Reset in progress: struct_mutex backoff\n");
-	if (test_bit(I915_RESET_HANDOFF, &dev_priv->gpu_error.flags))
-		seq_puts(m, "Reset in progress: reset handoff to waiter\n");
 	if (waitqueue_active(&dev_priv->gpu_error.wait_queue))
 		seq_puts(m, "Waiter holding struct mutex\n");
 	if (waitqueue_active(&dev_priv->gpu_error.reset_queue))
@@ -1321,15 +1319,15 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
 		struct rb_node *rb;
 
 		seq_printf(m, "%s:\n", engine->name);
-		seq_printf(m, "\tseqno = %x [current %x, last %x]\n",
+		seq_printf(m, "\tseqno = %x [current %x, last %x], %dms ago\n",
 			   engine->hangcheck.seqno, seqno[id],
-			   intel_engine_last_submit(engine));
-		seq_printf(m, "\twaiters? %s, fake irq active? %s, stalled? %s, wedged? %s\n",
+			   intel_engine_last_submit(engine),
+			   jiffies_to_msecs(jiffies -
+					    engine->hangcheck.action_timestamp));
+		seq_printf(m, "\twaiters? %s, fake irq active? %s\n",
 			   yesno(intel_engine_has_waiter(engine)),
 			   yesno(test_bit(engine->id,
-					  &dev_priv->gpu_error.missed_irq_rings)),
-			   yesno(engine->hangcheck.stalled),
-			   yesno(engine->hangcheck.wedged));
+					  &dev_priv->gpu_error.missed_irq_rings)));
 
 		spin_lock_irq(&b->rb_lock);
 		for (rb = rb_first(&b->waiters); rb; rb = rb_next(rb)) {
@@ -1343,11 +1341,6 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
 		seq_printf(m, "\tACTHD = 0x%08llx [current 0x%08llx]\n",
 			   (long long)engine->hangcheck.acthd,
 			   (long long)acthd[id]);
-		seq_printf(m, "\taction = %s(%d) %d ms ago\n",
-			   hangcheck_action_to_str(engine->hangcheck.action),
-			   engine->hangcheck.action,
-			   jiffies_to_msecs(jiffies -
-					    engine->hangcheck.action_timestamp));
 
 		if (engine->id == RCS) {
 			seq_puts(m, "\tinstdone read =\n");
@@ -3886,8 +3879,6 @@ static int
 i915_wedged_set(void *data, u64 val)
 {
 	struct drm_i915_private *i915 = data;
-	struct intel_engine_cs *engine;
-	unsigned int tmp;
 
 	/*
 	 * There is no safeguard against this debugfs entry colliding
@@ -3900,18 +3891,8 @@ i915_wedged_set(void *data, u64 val)
 	if (i915_reset_backoff(&i915->gpu_error))
 		return -EAGAIN;
 
-	for_each_engine_masked(engine, i915, val, tmp) {
-		engine->hangcheck.seqno = intel_engine_get_seqno(engine);
-		engine->hangcheck.stalled = true;
-	}
-
 	i915_handle_error(i915, val, I915_ERROR_CAPTURE,
 			  "Manually set wedged engine mask = %llx", val);
-
-	wait_on_bit(&i915->gpu_error.flags,
-		    I915_RESET_HANDOFF,
-		    TASK_UNINTERRUPTIBLE);
-
 	return 0;
 }
 
@@ -4066,13 +4047,8 @@ i915_drop_caches_set(void *data, u64 val)
 		mutex_unlock(&i915->drm.struct_mutex);
 	}
 
-	if (val & DROP_RESET_ACTIVE &&
-	    i915_terminally_wedged(&i915->gpu_error)) {
+	if (val & DROP_RESET_ACTIVE && i915_terminally_wedged(&i915->gpu_error))
 		i915_handle_error(i915, ALL_ENGINES, 0, NULL);
-		wait_on_bit(&i915->gpu_error.flags,
-			    I915_RESET_HANDOFF,
-			    TASK_UNINTERRUPTIBLE);
-	}
 
 	fs_reclaim_acquire(GFP_KERNEL);
 	if (val & DROP_BOUND)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 03db011caa8e..59a7e90113d7 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3001,11 +3001,6 @@ static inline bool i915_reset_backoff(struct i915_gpu_error *error)
 	return unlikely(test_bit(I915_RESET_BACKOFF, &error->flags));
 }
 
-static inline bool i915_reset_handoff(struct i915_gpu_error *error)
-{
-	return unlikely(test_bit(I915_RESET_HANDOFF, &error->flags));
-}
-
 static inline bool i915_terminally_wedged(struct i915_gpu_error *error)
 {
 	return unlikely(test_bit(I915_WEDGED, &error->flags));
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index b359390ba22c..d20b42386c3c 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -657,11 +657,6 @@ i915_gem_object_wait(struct drm_i915_gem_object *obj,
 		     struct intel_rps_client *rps_client)
 {
 	might_sleep();
-#if IS_ENABLED(CONFIG_LOCKDEP)
-	GEM_BUG_ON(debug_locks &&
-		   !!lockdep_is_held(&obj->base.dev->struct_mutex) !=
-		   !!(flags & I915_WAIT_LOCKED));
-#endif
 	GEM_BUG_ON(timeout < 0);
 
 	timeout = i915_gem_object_wait_reservation(obj->resv,
@@ -4493,8 +4488,6 @@ void i915_gem_sanitize(struct drm_i915_private *i915)
 
 	GEM_TRACE("\n");
 
-	mutex_lock(&i915->drm.struct_mutex);
-
 	wakeref = intel_runtime_pm_get(i915);
 	intel_uncore_forcewake_get(i915, FORCEWAKE_ALL);
 
@@ -4520,6 +4513,7 @@ void i915_gem_sanitize(struct drm_i915_private *i915)
 	intel_uncore_forcewake_put(i915, FORCEWAKE_ALL);
 	intel_runtime_pm_put(i915, wakeref);
 
+	mutex_lock(&i915->drm.struct_mutex);
 	i915_gem_contexts_lost(i915);
 	mutex_unlock(&i915->drm.struct_mutex);
 }
@@ -4534,6 +4528,8 @@ int i915_gem_suspend(struct drm_i915_private *i915)
 	wakeref = intel_runtime_pm_get(i915);
 	intel_suspend_gt_powersave(i915);
 
+	flush_workqueue(i915->wq);
+
 	mutex_lock(&i915->drm.struct_mutex);
 
 	/*
@@ -4563,11 +4559,9 @@ int i915_gem_suspend(struct drm_i915_private *i915)
 	i915_retire_requests(i915); /* ensure we flush after wedging */
 
 	mutex_unlock(&i915->drm.struct_mutex);
+	i915_reset_flush(i915);
 
-	intel_uc_suspend(i915);
-
-	cancel_delayed_work_sync(&i915->gpu_error.hangcheck_work);
-	cancel_delayed_work_sync(&i915->gt.retire_work);
+	drain_delayed_work(&i915->gt.retire_work);
 
 	/*
 	 * As the idle_work is rearming if it detects a race, play safe and
@@ -4575,6 +4569,8 @@ int i915_gem_suspend(struct drm_i915_private *i915)
 	 */
 	drain_delayed_work(&i915->gt.idle_work);
 
+	intel_uc_suspend(i915);
+
 	/*
 	 * Assert that we successfully flushed all the work and
 	 * reset the GPU back to its idle, low power state.
diff --git a/drivers/gpu/drm/i915/i915_gem_fence_reg.h b/drivers/gpu/drm/i915/i915_gem_fence_reg.h
index 99a31ded4dfd..09dcaf14121b 100644
--- a/drivers/gpu/drm/i915/i915_gem_fence_reg.h
+++ b/drivers/gpu/drm/i915/i915_gem_fence_reg.h
@@ -50,4 +50,3 @@ struct drm_i915_fence_reg {
 };
 
 #endif
-
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 9229b03d629b..a0039ea97cdc 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -39,6 +39,7 @@
 #include <linux/pagevec.h>
 
 #include "i915_request.h"
+#include "i915_reset.h"
 #include "i915_selftest.h"
 #include "i915_timeline.h"
 
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 1f8e80e31b49..4eef0462489c 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -533,10 +533,7 @@ static void error_print_engine(struct drm_i915_error_state_buf *m,
 	err_printf(m, "  waiting: %s\n", yesno(ee->waiting));
 	err_printf(m, "  ring->head: 0x%08x\n", ee->cpu_ring_head);
 	err_printf(m, "  ring->tail: 0x%08x\n", ee->cpu_ring_tail);
-	err_printf(m, "  hangcheck stall: %s\n", yesno(ee->hangcheck_stalled));
-	err_printf(m, "  hangcheck action: %s\n",
-		   hangcheck_action_to_str(ee->hangcheck_action));
-	err_printf(m, "  hangcheck action timestamp: %dms (%lu%s)\n",
+	err_printf(m, "  hangcheck timestamp: %dms (%lu%s)\n",
 		   jiffies_to_msecs(ee->hangcheck_timestamp - epoch),
 		   ee->hangcheck_timestamp,
 		   ee->hangcheck_timestamp == epoch ? "; epoch" : "");
@@ -684,15 +681,15 @@ static void __err_print_to_sgl(struct drm_i915_error_state_buf *m,
 		   jiffies_to_msecs(error->capture - error->epoch));
 
 	for (i = 0; i < ARRAY_SIZE(error->engine); i++) {
-		if (error->engine[i].hangcheck_stalled &&
-		    error->engine[i].context.pid) {
-			err_printf(m, "Active process (on ring %s): %s [%d], score %d%s\n",
-				   engine_name(m->i915, i),
-				   error->engine[i].context.comm,
-				   error->engine[i].context.pid,
-				   error->engine[i].context.ban_score,
-				   bannable(&error->engine[i].context));
-		}
+		if (!error->engine[i].context.pid)
+			continue;
+
+		err_printf(m, "Active process (on ring %s): %s [%d], score %d%s\n",
+			   engine_name(m->i915, i),
+			   error->engine[i].context.comm,
+			   error->engine[i].context.pid,
+			   error->engine[i].context.ban_score,
+			   bannable(&error->engine[i].context));
 	}
 	err_printf(m, "Reset count: %u\n", error->reset_count);
 	err_printf(m, "Suspend count: %u\n", error->suspend_count);
@@ -1144,7 +1141,8 @@ static u32 capture_error_bo(struct drm_i915_error_buffer *err,
 	return i;
 }
 
-/* Generate a semi-unique error code. The code is not meant to have meaning, The
+/*
+ * Generate a semi-unique error code. The code is not meant to have meaning, The
  * code's only purpose is to try to prevent false duplicated bug reports by
  * grossly estimating a GPU error state.
  *
@@ -1153,29 +1151,23 @@ static u32 capture_error_bo(struct drm_i915_error_buffer *err,
  *
  * It's only a small step better than a random number in its current form.
  */
-static u32 i915_error_generate_code(struct drm_i915_private *dev_priv,
-				    struct i915_gpu_state *error,
-				    int *engine_id)
+static u32 i915_error_generate_code(struct i915_gpu_state *error,
+				    unsigned long engine_mask)
 {
-	u32 error_code = 0;
-	int i;
-
-	/* IPEHR would be an ideal way to detect errors, as it's the gross
+	/*
+	 * IPEHR would be an ideal way to detect errors, as it's the gross
 	 * measure of "the command that hung." However, has some very common
 	 * synchronization commands which almost always appear in the case
 	 * strictly a client bug. Use instdone to differentiate those some.
 	 */
-	for (i = 0; i < I915_NUM_ENGINES; i++) {
-		if (error->engine[i].hangcheck_stalled) {
-			if (engine_id)
-				*engine_id = i;
+	if (engine_mask) {
+		struct drm_i915_error_engine *ee =
+			&error->engine[ffs(engine_mask)];
 
-			return error->engine[i].ipehr ^
-			       error->engine[i].instdone.instdone;
-		}
+		return ee->ipehr ^ ee->instdone.instdone;
 	}
 
-	return error_code;
+	return 0;
 }
 
 static void gem_record_fences(struct i915_gpu_state *error)
@@ -1338,9 +1330,8 @@ static void error_record_engine_registers(struct i915_gpu_state *error,
 	}
 
 	ee->idle = intel_engine_is_idle(engine);
-	ee->hangcheck_timestamp = engine->hangcheck.action_timestamp;
-	ee->hangcheck_action = engine->hangcheck.action;
-	ee->hangcheck_stalled = engine->hangcheck.stalled;
+	if (!ee->idle)
+		ee->hangcheck_timestamp = engine->hangcheck.action_timestamp;
 	ee->reset_count = i915_reset_engine_count(&dev_priv->gpu_error,
 						  engine);
 
@@ -1783,31 +1774,35 @@ static void capture_reg_state(struct i915_gpu_state *error)
 	error->pgtbl_er = I915_READ(PGTBL_ER);
 }
 
-static void i915_error_capture_msg(struct drm_i915_private *dev_priv,
-				   struct i915_gpu_state *error,
-				   u32 engine_mask,
-				   const char *error_msg)
+static const char *
+error_msg(struct i915_gpu_state *error, unsigned long engines, const char *msg)
 {
-	u32 ecode;
-	int engine_id = -1, len;
+	int len;
+	int i;
 
-	ecode = i915_error_generate_code(dev_priv, error, &engine_id);
+	for (i = 0; i < ARRAY_SIZE(error->engine); i++)
+		if (!error->engine[i].context.pid)
+			engines &= ~BIT(i);
 
 	len = scnprintf(error->error_msg, sizeof(error->error_msg),
-			"GPU HANG: ecode %d:%d:0x%08x",
-			INTEL_GEN(dev_priv), engine_id, ecode);
-
-	if (engine_id != -1 && error->engine[engine_id].context.pid)
+			"GPU HANG: ecode %d:%lx:0x%08x",
+			INTEL_GEN(error->i915), engines,
+			i915_error_generate_code(error, engines));
+	if (engines) {
+		/* Just show the first executing process, more is confusing */
+		i = ffs(engines);
 		len += scnprintf(error->error_msg + len,
 				 sizeof(error->error_msg) - len,
 				 ", in %s [%d]",
-				 error->engine[engine_id].context.comm,
-				 error->engine[engine_id].context.pid);
+				 error->engine[i].context.comm,
+				 error->engine[i].context.pid);
+	}
+	if (msg)
+		len += scnprintf(error->error_msg + len,
+				 sizeof(error->error_msg) - len,
+				 ", %s", msg);
 
-	scnprintf(error->error_msg + len, sizeof(error->error_msg) - len,
-		  ", reason: %s, action: %s",
-		  error_msg,
-		  engine_mask ? "reset" : "continue");
+	return error->error_msg;
 }
 
 static void capture_gen_state(struct i915_gpu_state *error)
@@ -1847,7 +1842,7 @@ static unsigned long capture_find_epoch(const struct i915_gpu_state *error)
 	for (i = 0; i < ARRAY_SIZE(error->engine); i++) {
 		const struct drm_i915_error_engine *ee = &error->engine[i];
 
-		if (ee->hangcheck_stalled &&
+		if (ee->hangcheck_timestamp &&
 		    time_before(ee->hangcheck_timestamp, epoch))
 			epoch = ee->hangcheck_timestamp;
 	}
@@ -1921,7 +1916,7 @@ i915_capture_gpu_state(struct drm_i915_private *i915)
  * i915_capture_error_state - capture an error record for later analysis
  * @i915: i915 device
  * @engine_mask: the mask of engines triggering the hang
- * @error_msg: a message to insert into the error capture header
+ * @msg: a message to insert into the error capture header
  *
  * Should be called when an error is detected (either a hang or an error
  * interrupt) to capture error state from the time of the error.  Fills
@@ -1929,8 +1924,8 @@ i915_capture_gpu_state(struct drm_i915_private *i915)
  * to pick up.
  */
 void i915_capture_error_state(struct drm_i915_private *i915,
-			      u32 engine_mask,
-			      const char *error_msg)
+			      unsigned long engine_mask,
+			      const char *msg)
 {
 	static bool warned;
 	struct i915_gpu_state *error;
@@ -1946,8 +1941,7 @@ void i915_capture_error_state(struct drm_i915_private *i915,
 	if (IS_ERR(error))
 		return;
 
-	i915_error_capture_msg(i915, error, engine_mask, error_msg);
-	DRM_INFO("%s\n", error->error_msg);
+	dev_info(i915->drm.dev, "%s\n", error_msg(error, engine_mask, msg));
 
 	if (!error->simulated) {
 		spin_lock_irqsave(&i915->gpu_error.lock, flags);
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.h b/drivers/gpu/drm/i915/i915_gpu_error.h
index 604291f7762d..231173786eae 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.h
+++ b/drivers/gpu/drm/i915/i915_gpu_error.h
@@ -85,8 +85,6 @@ struct i915_gpu_state {
 		bool waiting;
 		int num_waiters;
 		unsigned long hangcheck_timestamp;
-		bool hangcheck_stalled;
-		enum intel_engine_hangcheck_action hangcheck_action;
 		struct i915_address_space *vm;
 		int num_requests;
 		u32 reset_count;
@@ -197,6 +195,8 @@ struct i915_gpu_state {
 	struct scatterlist *sgl, *fit;
 };
 
+struct i915_gpu_restart;
+
 struct i915_gpu_error {
 	/* For hangcheck timer */
 #define DRM_I915_HANGCHECK_PERIOD 1500 /* in ms */
@@ -247,15 +247,6 @@ struct i915_gpu_error {
 	 * i915_mutex_lock_interruptible()?). I915_RESET_BACKOFF serves a
 	 * secondary role in preventing two concurrent global reset attempts.
 	 *
-	 * #I915_RESET_HANDOFF - To perform the actual GPU reset, we need the
-	 * struct_mutex. We try to acquire the struct_mutex in the reset worker,
-	 * but it may be held by some long running waiter (that we cannot
-	 * interrupt without causing trouble). Once we are ready to do the GPU
-	 * reset, we set the I915_RESET_HANDOFF bit and wakeup any waiters. If
-	 * they already hold the struct_mutex and want to participate they can
-	 * inspect the bit and do the reset directly, otherwise the worker
-	 * waits for the struct_mutex.
-	 *
 	 * #I915_RESET_ENGINE[num_engines] - Since the driver doesn't need to
 	 * acquire the struct_mutex to reset an engine, we need an explicit
 	 * flag to prevent two concurrent reset attempts in the same engine.
@@ -269,20 +260,13 @@ struct i915_gpu_error {
 	 */
 	unsigned long flags;
 #define I915_RESET_BACKOFF	0
-#define I915_RESET_HANDOFF	1
-#define I915_RESET_MODESET	2
-#define I915_RESET_ENGINE	3
+#define I915_RESET_MODESET	1
+#define I915_RESET_ENGINE	2
 #define I915_WEDGED		(BITS_PER_LONG - 1)
 
 	/** Number of times an engine has been reset */
 	u32 reset_engine_count[I915_NUM_ENGINES];
 
-	/** Set of stalled engines with guilty requests, in the current reset */
-	u32 stalled_mask;
-
-	/** Reason for the current *global* reset */
-	const char *reason;
-
 	struct mutex wedge_mutex; /* serialises wedging/unwedging */
 
 	/**
@@ -299,6 +283,8 @@ struct i915_gpu_error {
 
 	/* For missed irq/seqno simulation. */
 	unsigned long test_irq_rings;
+
+	struct i915_gpu_restart *restart;
 };
 
 struct drm_i915_error_state_buf {
@@ -320,7 +306,7 @@ void i915_error_printf(struct drm_i915_error_state_buf *e, const char *f, ...);
 
 struct i915_gpu_state *i915_capture_gpu_state(struct drm_i915_private *i915);
 void i915_capture_error_state(struct drm_i915_private *dev_priv,
-			      u32 engine_mask,
+			      unsigned long engine_mask,
 			      const char *error_msg);
 
 static inline struct i915_gpu_state *
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 5e178f5ac18b..80232de8e2be 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1083,18 +1083,6 @@ static bool __i915_spin_request(const struct i915_request *rq,
 	return false;
 }
 
-static bool __i915_wait_request_check_and_reset(struct i915_request *request)
-{
-	struct i915_gpu_error *error = &request->i915->gpu_error;
-
-	if (likely(!i915_reset_handoff(error)))
-		return false;
-
-	__set_current_state(TASK_RUNNING);
-	i915_reset(request->i915, error->stalled_mask, error->reason);
-	return true;
-}
-
 /**
  * i915_request_wait - wait until execution of request has finished
  * @rq: the request to wait upon
@@ -1120,17 +1108,10 @@ long i915_request_wait(struct i915_request *rq,
 {
 	const int state = flags & I915_WAIT_INTERRUPTIBLE ?
 		TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
-	wait_queue_head_t *errq = &rq->i915->gpu_error.wait_queue;
-	DEFINE_WAIT_FUNC(reset, default_wake_function);
 	DEFINE_WAIT_FUNC(exec, default_wake_function);
 	struct intel_wait wait;
 
 	might_sleep();
-#if IS_ENABLED(CONFIG_LOCKDEP)
-	GEM_BUG_ON(debug_locks &&
-		   !!lockdep_is_held(&rq->i915->drm.struct_mutex) !=
-		   !!(flags & I915_WAIT_LOCKED));
-#endif
 	GEM_BUG_ON(timeout < 0);
 
 	if (i915_request_completed(rq))
@@ -1140,11 +1121,7 @@ long i915_request_wait(struct i915_request *rq,
 		return -ETIME;
 
 	trace_i915_request_wait_begin(rq, flags);
-
 	add_wait_queue(&rq->execute, &exec);
-	if (flags & I915_WAIT_LOCKED)
-		add_wait_queue(errq, &reset);
-
 	intel_wait_init(&wait);
 	if (flags & I915_WAIT_PRIORITY)
 		i915_schedule_bump_priority(rq, I915_PRIORITY_WAIT);
@@ -1155,10 +1132,6 @@ long i915_request_wait(struct i915_request *rq,
 		if (intel_wait_update_request(&wait, rq))
 			break;
 
-		if (flags & I915_WAIT_LOCKED &&
-		    __i915_wait_request_check_and_reset(rq))
-			continue;
-
 		if (signal_pending_state(state, current)) {
 			timeout = -ERESTARTSYS;
 			goto complete;
@@ -1188,9 +1161,6 @@ long i915_request_wait(struct i915_request *rq,
 		 */
 		goto wakeup;
 
-	if (flags & I915_WAIT_LOCKED)
-		__i915_wait_request_check_and_reset(rq);
-
 	for (;;) {
 		if (signal_pending_state(state, current)) {
 			timeout = -ERESTARTSYS;
@@ -1214,21 +1184,6 @@ long i915_request_wait(struct i915_request *rq,
 		if (i915_request_completed(rq))
 			break;
 
-		/*
-		 * If the GPU is hung, and we hold the lock, reset the GPU
-		 * and then check for completion. On a full reset, the engine's
-		 * HW seqno will be advanced passed us and we are complete.
-		 * If we do a partial reset, we have to wait for the GPU to
-		 * resume and update the breadcrumb.
-		 *
-		 * If we don't hold the mutex, we can just wait for the worker
-		 * to come along and update the breadcrumb (either directly
-		 * itself, or indirectly by recovering the GPU).
-		 */
-		if (flags & I915_WAIT_LOCKED &&
-		    __i915_wait_request_check_and_reset(rq))
-			continue;
-
 		/* Only spin if we know the GPU is processing this request */
 		if (__i915_spin_request(rq, wait.seqno, state, 2))
 			break;
@@ -1242,8 +1197,6 @@ long i915_request_wait(struct i915_request *rq,
 	intel_engine_remove_wait(rq->engine, &wait);
 complete:
 	__set_current_state(TASK_RUNNING);
-	if (flags & I915_WAIT_LOCKED)
-		remove_wait_queue(errq, &reset);
 	remove_wait_queue(&rq->execute, &exec);
 	trace_i915_request_wait_end(rq);
 
diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
index 2961c21d9420..064fc6da1512 100644
--- a/drivers/gpu/drm/i915/i915_reset.c
+++ b/drivers/gpu/drm/i915/i915_reset.c
@@ -5,6 +5,7 @@
  */
 
 #include <linux/sched/mm.h>
+#include <linux/stop_machine.h>
 
 #include "i915_drv.h"
 #include "i915_gpu_error.h"
@@ -17,22 +18,23 @@ static void engine_skip_context(struct i915_request *rq)
 	struct intel_engine_cs *engine = rq->engine;
 	struct i915_gem_context *hung_ctx = rq->gem_context;
 	struct i915_timeline *timeline = rq->timeline;
-	unsigned long flags;
 
+	lockdep_assert_held(&engine->timeline.lock);
 	GEM_BUG_ON(timeline == &engine->timeline);
 
-	spin_lock_irqsave(&engine->timeline.lock, flags);
 	spin_lock(&timeline->lock);
 
-	list_for_each_entry_continue(rq, &engine->timeline.requests, link)
-		if (rq->gem_context == hung_ctx)
-			i915_request_skip(rq, -EIO);
+	if (rq->global_seqno) {
+		list_for_each_entry_continue(rq,
+					     &engine->timeline.requests, link)
+			if (rq->gem_context == hung_ctx)
+				i915_request_skip(rq, -EIO);
+	}
 
 	list_for_each_entry(rq, &timeline->requests, link)
 		i915_request_skip(rq, -EIO);
 
 	spin_unlock(&timeline->lock);
-	spin_unlock_irqrestore(&engine->timeline.lock, flags);
 }
 
 static void client_mark_guilty(struct drm_i915_file_private *file_priv,
@@ -59,7 +61,7 @@ static void client_mark_guilty(struct drm_i915_file_private *file_priv,
 	}
 }
 
-static void context_mark_guilty(struct i915_gem_context *ctx)
+static bool context_mark_guilty(struct i915_gem_context *ctx)
 {
 	unsigned int score;
 	bool banned, bannable;
@@ -72,7 +74,7 @@ static void context_mark_guilty(struct i915_gem_context *ctx)
 
 	/* Cool contexts don't accumulate client ban score */
 	if (!bannable)
-		return;
+		return false;
 
 	if (banned) {
 		DRM_DEBUG_DRIVER("context %s: guilty %d, score %u, banned\n",
@@ -83,6 +85,8 @@ static void context_mark_guilty(struct i915_gem_context *ctx)
 
 	if (!IS_ERR_OR_NULL(ctx->file_priv))
 		client_mark_guilty(ctx->file_priv, ctx);
+
+	return banned;
 }
 
 static void context_mark_innocent(struct i915_gem_context *ctx)
@@ -90,6 +94,21 @@ static void context_mark_innocent(struct i915_gem_context *ctx)
 	atomic_inc(&ctx->active_count);
 }
 
+void i915_reset_request(struct i915_request *rq, bool guilty)
+{
+	lockdep_assert_held(&rq->engine->timeline.lock);
+	GEM_BUG_ON(i915_request_completed(rq));
+
+	if (guilty) {
+		i915_request_skip(rq, -EIO);
+		if (context_mark_guilty(rq->gem_context))
+			engine_skip_context(rq);
+	} else {
+		dma_fence_set_error(&rq->fence, -EAGAIN);
+		context_mark_innocent(rq->gem_context);
+	}
+}
+
 static void gen3_stop_engine(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *dev_priv = engine->i915;
@@ -533,22 +552,6 @@ int intel_gpu_reset(struct drm_i915_private *i915, unsigned int engine_mask)
 	int retry;
 	int ret;
 
-	/*
-	 * We want to perform per-engine reset from atomic context (e.g.
-	 * softirq), which imposes the constraint that we cannot sleep.
-	 * However, experience suggests that spending a bit of time waiting
-	 * for a reset helps in various cases, so for a full-device reset
-	 * we apply the opposite rule and wait if we want to. As we should
-	 * always follow up a failed per-engine reset with a full device reset,
-	 * being a little faster, stricter and more error prone for the
-	 * atomic case seems an acceptable compromise.
-	 *
-	 * Unfortunately this leads to a bimodal routine, when the goal was
-	 * to have a single reset function that worked for resetting any
-	 * number of engines simultaneously.
-	 */
-	might_sleep_if(engine_mask == ALL_ENGINES);
-
 	/*
 	 * If the power well sleeps during the reset, the reset
 	 * request may be dropped and never completes (causing -EIO).
@@ -580,8 +583,6 @@ int intel_gpu_reset(struct drm_i915_private *i915, unsigned int engine_mask)
 		}
 		if (ret != -ETIMEDOUT || engine_mask != ALL_ENGINES)
 			break;
-
-		cond_resched();
 	}
 	intel_uncore_forcewake_put(i915, FORCEWAKE_ALL);
 
@@ -620,11 +621,8 @@ int intel_reset_guc(struct drm_i915_private *i915)
  * Ensure irq handler finishes, and not run again.
  * Also return the active request so that we only search for it once.
  */
-static struct i915_request *
-reset_prepare_engine(struct intel_engine_cs *engine)
+static void reset_prepare_engine(struct intel_engine_cs *engine)
 {
-	struct i915_request *rq;
-
 	/*
 	 * During the reset sequence, we must prevent the engine from
 	 * entering RC6. As the context state is undefined until we restart
@@ -633,162 +631,86 @@ reset_prepare_engine(struct intel_engine_cs *engine)
 	 * GPU state upon resume, i.e. fail to restart after a reset.
 	 */
 	intel_uncore_forcewake_get(engine->i915, FORCEWAKE_ALL);
-
-	rq = engine->reset.prepare(engine);
-	if (rq && rq->fence.error == -EIO)
-		rq = ERR_PTR(-EIO); /* Previous reset failed! */
-
-	return rq;
+	engine->reset.prepare(engine);
 }
 
-static int reset_prepare(struct drm_i915_private *i915)
+static void reset_prepare(struct drm_i915_private *i915)
 {
 	struct intel_engine_cs *engine;
-	struct i915_request *rq;
 	enum intel_engine_id id;
-	int err = 0;
 
-	for_each_engine(engine, i915, id) {
-		rq = reset_prepare_engine(engine);
-		if (IS_ERR(rq)) {
-			err = PTR_ERR(rq);
-			continue;
-		}
-
-		engine->hangcheck.active_request = rq;
-	}
+	for_each_engine(engine, i915, id)
+		reset_prepare_engine(engine);
 
-	i915_gem_revoke_fences(i915);
 	intel_uc_sanitize(i915);
-
-	return err;
 }
 
-/* Returns the request if it was guilty of the hang */
-static struct i915_request *
-reset_request(struct intel_engine_cs *engine,
-	      struct i915_request *rq,
-	      bool stalled)
+static int gt_reset(struct drm_i915_private *i915, unsigned int stalled_mask)
 {
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+	int err;
+
 	/*
-	 * The guilty request will get skipped on a hung engine.
-	 *
-	 * Users of client default contexts do not rely on logical
-	 * state preserved between batches so it is safe to execute
-	 * queued requests following the hang. Non default contexts
-	 * rely on preserved state, so skipping a batch loses the
-	 * evolution of the state and it needs to be considered corrupted.
-	 * Executing more queued batches on top of corrupted state is
-	 * risky. But we take the risk by trying to advance through
-	 * the queued requests in order to make the client behaviour
-	 * more predictable around resets, by not throwing away random
-	 * amount of batches it has prepared for execution. Sophisticated
-	 * clients can use gem_reset_stats_ioctl and dma fence status
-	 * (exported via sync_file info ioctl on explicit fences) to observe
-	 * when it loses the context state and should rebuild accordingly.
-	 *
-	 * The context ban, and ultimately the client ban, mechanism are safety
-	 * valves if client submission ends up resulting in nothing more than
-	 * subsequent hangs.
+	 * Everything depends on having the GTT running, so we need to start
+	 * there.
 	 */
+	err = i915_ggtt_enable_hw(i915);
+	if (err)
+		return err;
 
-	if (i915_request_completed(rq)) {
-		GEM_TRACE("%s pardoned global=%d (fence %llx:%lld), current %d\n",
-			  engine->name, rq->global_seqno,
-			  rq->fence.context, rq->fence.seqno,
-			  intel_engine_get_seqno(engine));
-		stalled = false;
-	}
-
-	if (stalled) {
-		context_mark_guilty(rq->gem_context);
-		i915_request_skip(rq, -EIO);
+	for_each_engine(engine, i915, id)
+		intel_engine_reset(engine, stalled_mask & ENGINE_MASK(id));
 
-		/* If this context is now banned, skip all pending requests. */
-		if (i915_gem_context_is_banned(rq->gem_context))
-			engine_skip_context(rq);
-	} else {
-		/*
-		 * Since this is not the hung engine, it may have advanced
-		 * since the hang declaration. Double check by refinding
-		 * the active request at the time of the reset.
-		 */
-		rq = i915_gem_find_active_request(engine);
-		if (rq) {
-			unsigned long flags;
-
-			context_mark_innocent(rq->gem_context);
-			dma_fence_set_error(&rq->fence, -EAGAIN);
-
-			/* Rewind the engine to replay the incomplete rq */
-			spin_lock_irqsave(&engine->timeline.lock, flags);
-			rq = list_prev_entry(rq, link);
-			if (&rq->link == &engine->timeline.requests)
-				rq = NULL;
-			spin_unlock_irqrestore(&engine->timeline.lock, flags);
-		}
-	}
+	i915_gem_restore_fences(i915);
 
-	return rq;
+	return err;
 }
 
-static void reset_engine(struct intel_engine_cs *engine,
-			 struct i915_request *rq,
-			 bool stalled)
+static void reset_finish_engine(struct intel_engine_cs *engine)
 {
-	if (rq)
-		rq = reset_request(engine, rq, stalled);
-
-	/* Setup the CS to resume from the breadcrumb of the hung request */
-	engine->reset.reset(engine, rq);
+	engine->reset.finish(engine);
+	intel_uncore_forcewake_put(engine->i915, FORCEWAKE_ALL);
 }
 
-static void gt_reset(struct drm_i915_private *i915, unsigned int stalled_mask)
+struct i915_gpu_restart {
+	struct work_struct work;
+	struct drm_i915_private *i915;
+};
+
+static void restart_work(struct work_struct *work)
 {
+	struct i915_gpu_restart *arg = container_of(work, typeof(*arg), work);
+	struct drm_i915_private *i915 = arg->i915;
 	struct intel_engine_cs *engine;
 	enum intel_engine_id id;
+	intel_wakeref_t wakeref;
 
-	lockdep_assert_held(&i915->drm.struct_mutex);
+	wakeref = intel_runtime_pm_get(i915);
+	mutex_lock(&i915->drm.struct_mutex);
 
-	i915_retire_requests(i915);
+	smp_store_mb(i915->gpu_error.restart, NULL);
 
 	for_each_engine(engine, i915, id) {
-		struct intel_context *ce;
-
-		reset_engine(engine,
-			     engine->hangcheck.active_request,
-			     stalled_mask & ENGINE_MASK(id));
-		ce = fetch_and_zero(&engine->last_retired_context);
-		if (ce)
-			intel_context_unpin(ce);
+		struct i915_request *rq;
 
 		/*
 		 * Ostensibily, we always want a context loaded for powersaving,
 		 * so if the engine is idle after the reset, send a request
 		 * to load our scratch kernel_context.
-		 *
-		 * More mysteriously, if we leave the engine idle after a reset,
-		 * the next userspace batch may hang, with what appears to be
-		 * an incoherent read by the CS (presumably stale TLB). An
-		 * empty request appears sufficient to paper over the glitch.
 		 */
-		if (intel_engine_is_idle(engine)) {
-			struct i915_request *rq;
+		if (!intel_engine_is_idle(engine))
+			continue;
 
-			rq = i915_request_alloc(engine, i915->kernel_context);
-			if (!IS_ERR(rq))
-				i915_request_add(rq);
-		}
+		rq = i915_request_alloc(engine, i915->kernel_context);
+		if (!IS_ERR(rq))
+			i915_request_add(rq);
 	}
 
-	i915_gem_restore_fences(i915);
-}
-
-static void reset_finish_engine(struct intel_engine_cs *engine)
-{
-	engine->reset.finish(engine);
+	mutex_unlock(&i915->drm.struct_mutex);
+	intel_runtime_pm_put(i915, wakeref);
 
-	intel_uncore_forcewake_put(engine->i915, FORCEWAKE_ALL);
+	kfree(arg);
 }
 
 static void reset_finish(struct drm_i915_private *i915)
@@ -796,11 +718,30 @@ static void reset_finish(struct drm_i915_private *i915)
 	struct intel_engine_cs *engine;
 	enum intel_engine_id id;
 
-	lockdep_assert_held(&i915->drm.struct_mutex);
-
-	for_each_engine(engine, i915, id) {
-		engine->hangcheck.active_request = NULL;
+	for_each_engine(engine, i915, id)
 		reset_finish_engine(engine);
+}
+
+static void reset_restart(struct drm_i915_private *i915)
+{
+	struct i915_gpu_restart *arg;
+
+	/*
+	 * Following the reset, ensure that we always reload context for
+	 * powersaving, and to correct engine->last_retired_context. Since
+	 * this requires us to submit a request, queue a worker to do that
+	 * task for us to evade any locking here.
+	 */
+	if (READ_ONCE(i915->gpu_error.restart))
+		return;
+
+	arg = kmalloc(sizeof(*arg), GFP_KERNEL);
+	if (arg) {
+		arg->i915 = i915;
+		INIT_WORK(&arg->work, restart_work);
+
+		WRITE_ONCE(i915->gpu_error.restart, arg);
+		queue_work(i915->wq, &arg->work);
 	}
 }
 
@@ -889,8 +830,6 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915)
 	struct i915_timeline *tl;
 	bool ret = false;
 
-	lockdep_assert_held(&i915->drm.struct_mutex);
-
 	if (!test_bit(I915_WEDGED, &error->flags))
 		return true;
 
@@ -913,9 +852,9 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915)
 	 */
 	list_for_each_entry(tl, &i915->gt.timelines, link) {
 		struct i915_request *rq;
+		long timeout;
 
-		rq = i915_gem_active_peek(&tl->last_request,
-					  &i915->drm.struct_mutex);
+		rq = i915_gem_active_get_unlocked(&tl->last_request);
 		if (!rq)
 			continue;
 
@@ -930,12 +869,12 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915)
 		 * and when the seqno passes the fence, the signaler
 		 * then signals the fence waking us up).
 		 */
-		if (dma_fence_default_wait(&rq->fence, true,
-					   MAX_SCHEDULE_TIMEOUT) < 0)
+		timeout = dma_fence_default_wait(&rq->fence, true,
+						 MAX_SCHEDULE_TIMEOUT);
+		i915_request_put(rq);
+		if (timeout < 0)
 			goto unlock;
 	}
-	i915_retire_requests(i915);
-	GEM_BUG_ON(i915->gt.active_requests);
 
 	intel_engines_sanitize(i915, false);
 
@@ -949,7 +888,6 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915)
 	 * context and do not require stop_machine().
 	 */
 	intel_engines_reset_default_submission(i915);
-	i915_gem_contexts_lost(i915);
 
 	GEM_TRACE("end\n");
 
@@ -962,6 +900,43 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915)
 	return ret;
 }
 
+struct __i915_reset {
+	struct drm_i915_private *i915;
+	unsigned int stalled_mask;
+};
+
+static int __i915_reset__BKL(void *data)
+{
+	struct __i915_reset *arg = data;
+	int err;
+
+	err = intel_gpu_reset(arg->i915, ALL_ENGINES);
+	if (err)
+		return err;
+
+	return gt_reset(arg->i915, arg->stalled_mask);
+}
+
+#if 0
+#define __do_reset(fn, arg) stop_machine(fn, arg, NULL)
+#else
+#define __do_reset(fn, arg) fn(arg)
+#endif
+
+static int do_reset(struct drm_i915_private *i915, unsigned int stalled_mask)
+{
+	struct __i915_reset arg = { i915, stalled_mask };
+	int err, i;
+
+	err = __do_reset(__i915_reset__BKL, &arg);
+	for (i = 0; err && i < 3; i++) {
+		msleep(100);
+		err = __do_reset(__i915_reset__BKL, &arg);
+	}
+
+	return err;
+}
+
 /**
  * i915_reset - reset chip after a hang
  * @i915: #drm_i915_private to reset
@@ -987,31 +962,22 @@ void i915_reset(struct drm_i915_private *i915,
 {
 	struct i915_gpu_error *error = &i915->gpu_error;
 	int ret;
-	int i;
 
 	GEM_TRACE("flags=%lx\n", error->flags);
 
 	might_sleep();
-	lockdep_assert_held(&i915->drm.struct_mutex);
 	assert_rpm_wakelock_held(i915);
 	GEM_BUG_ON(!test_bit(I915_RESET_BACKOFF, &error->flags));
 
-	if (!test_bit(I915_RESET_HANDOFF, &error->flags))
-		return;
-
 	/* Clear any previous failed attempts at recovery. Time to try again. */
 	if (!i915_gem_unset_wedged(i915))
-		goto wakeup;
+		return;
 
 	if (reason)
 		dev_notice(i915->drm.dev, "Resetting chip for %s\n", reason);
 	error->reset_count++;
 
-	ret = reset_prepare(i915);
-	if (ret) {
-		dev_err(i915->drm.dev, "GPU recovery failed\n");
-		goto taint;
-	}
+	reset_prepare(i915);
 
 	if (!intel_has_gpu_reset(i915)) {
 		if (i915_modparams.reset)
@@ -1021,32 +987,11 @@ void i915_reset(struct drm_i915_private *i915,
 		goto error;
 	}
 
-	for (i = 0; i < 3; i++) {
-		ret = intel_gpu_reset(i915, ALL_ENGINES);
-		if (ret == 0)
-			break;
-
-		msleep(100);
-	}
-	if (ret) {
+	if (do_reset(i915, stalled_mask)) {
 		dev_err(i915->drm.dev, "Failed to reset chip\n");
 		goto taint;
 	}
 
-	/* Ok, now get things going again... */
-
-	/*
-	 * Everything depends on having the GTT running, so we need to start
-	 * there.
-	 */
-	ret = i915_ggtt_enable_hw(i915);
-	if (ret) {
-		DRM_ERROR("Failed to re-enable GGTT following reset (%d)\n",
-			  ret);
-		goto error;
-	}
-
-	gt_reset(i915, stalled_mask);
 	intel_overlay_reset(i915);
 
 	/*
@@ -1068,9 +1013,8 @@ void i915_reset(struct drm_i915_private *i915,
 
 finish:
 	reset_finish(i915);
-wakeup:
-	clear_bit(I915_RESET_HANDOFF, &error->flags);
-	wake_up_bit(&error->flags, I915_RESET_HANDOFF);
+	if (!i915_terminally_wedged(error))
+		reset_restart(i915);
 	return;
 
 taint:
@@ -1089,7 +1033,6 @@ void i915_reset(struct drm_i915_private *i915,
 	add_taint(TAINT_WARN, LOCKDEP_STILL_OK);
 error:
 	i915_gem_set_wedged(i915);
-	i915_retire_requests(i915);
 	goto finish;
 }
 
@@ -1115,18 +1058,16 @@ static inline int intel_gt_reset_engine(struct drm_i915_private *i915,
 int i915_reset_engine(struct intel_engine_cs *engine, const char *msg)
 {
 	struct i915_gpu_error *error = &engine->i915->gpu_error;
-	struct i915_request *active_request;
 	int ret;
 
 	GEM_TRACE("%s flags=%lx\n", engine->name, error->flags);
 	GEM_BUG_ON(!test_bit(I915_RESET_ENGINE + engine->id, &error->flags));
 
-	active_request = reset_prepare_engine(engine);
-	if (IS_ERR_OR_NULL(active_request)) {
-		/* Either the previous reset failed, or we pardon the reset. */
-		ret = PTR_ERR(active_request);
-		goto out;
-	}
+	if (i915_seqno_passed(intel_engine_get_seqno(engine),
+			      intel_engine_last_submit(engine)))
+		return 0;
+
+	reset_prepare_engine(engine);
 
 	if (msg)
 		dev_notice(engine->i915->drm.dev,
@@ -1150,7 +1091,7 @@ int i915_reset_engine(struct intel_engine_cs *engine, const char *msg)
 	 * active request and can drop it, adjust head to skip the offending
 	 * request to resume executing remaining requests in the queue.
 	 */
-	reset_engine(engine, active_request, true);
+	intel_engine_reset(engine, true);
 
 	/*
 	 * The engine and its registers (and workarounds in case of render)
@@ -1187,30 +1128,7 @@ static void i915_reset_device(struct drm_i915_private *i915,
 	i915_wedge_on_timeout(&w, i915, 5 * HZ) {
 		intel_prepare_reset(i915);
 
-		error->reason = reason;
-		error->stalled_mask = engine_mask;
-
-		/* Signal that locked waiters should reset the GPU */
-		smp_mb__before_atomic();
-		set_bit(I915_RESET_HANDOFF, &error->flags);
-		wake_up_all(&error->wait_queue);
-
-		/*
-		 * Wait for anyone holding the lock to wakeup, without
-		 * blocking indefinitely on struct_mutex.
-		 */
-		do {
-			if (mutex_trylock(&i915->drm.struct_mutex)) {
-				i915_reset(i915, engine_mask, reason);
-				mutex_unlock(&i915->drm.struct_mutex);
-			}
-		} while (wait_on_bit_timeout(&error->flags,
-					     I915_RESET_HANDOFF,
-					     TASK_UNINTERRUPTIBLE,
-					     1));
-
-		error->stalled_mask = 0;
-		error->reason = NULL;
+		i915_reset(i915, engine_mask, reason);
 
 		intel_finish_reset(i915);
 	}
@@ -1366,6 +1284,25 @@ void i915_handle_error(struct drm_i915_private *i915,
 	intel_runtime_pm_put(i915, wakeref);
 }
 
+bool i915_reset_flush(struct drm_i915_private *i915)
+{
+	int err;
+
+	cancel_delayed_work_sync(&i915->gpu_error.hangcheck_work);
+
+	flush_workqueue(i915->wq);
+	GEM_BUG_ON(READ_ONCE(i915->gpu_error.restart));
+
+	mutex_lock(&i915->drm.struct_mutex);
+	err = i915_gem_wait_for_idle(i915,
+				     I915_WAIT_LOCKED |
+				     I915_WAIT_FOR_IDLE_BOOST,
+				     MAX_SCHEDULE_TIMEOUT);
+	mutex_unlock(&i915->drm.struct_mutex);
+
+	return !err;
+}
+
 static void i915_wedge_me(struct work_struct *work)
 {
 	struct i915_wedge_me *w = container_of(work, typeof(*w), work.work);
diff --git a/drivers/gpu/drm/i915/i915_reset.h b/drivers/gpu/drm/i915/i915_reset.h
index b6a519bde67d..f2d347f319df 100644
--- a/drivers/gpu/drm/i915/i915_reset.h
+++ b/drivers/gpu/drm/i915/i915_reset.h
@@ -29,6 +29,9 @@ void i915_reset(struct drm_i915_private *i915,
 int i915_reset_engine(struct intel_engine_cs *engine,
 		      const char *reason);
 
+void i915_reset_request(struct i915_request *rq, bool guilty);
+bool i915_reset_flush(struct drm_i915_private *i915);
+
 bool intel_has_gpu_reset(struct drm_i915_private *i915);
 bool intel_has_reset_engine(struct drm_i915_private *i915);
 
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 2f3c71f6d313..fc52737751e7 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1071,10 +1071,8 @@ void intel_engines_sanitize(struct drm_i915_private *i915, bool force)
 	if (!reset_engines(i915) && !force)
 		return;
 
-	for_each_engine(engine, i915, id) {
-		if (engine->reset.reset)
-			engine->reset.reset(engine, NULL);
-	}
+	for_each_engine(engine, i915, id)
+		intel_engine_reset(engine, false);
 }
 
 /**
diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
index ab1c49b106f2..7217c7e3ee8d 100644
--- a/drivers/gpu/drm/i915/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/intel_guc_submission.c
@@ -834,8 +834,7 @@ static void guc_submission_tasklet(unsigned long data)
 	spin_unlock_irqrestore(&engine->timeline.lock, flags);
 }
 
-static struct i915_request *
-guc_reset_prepare(struct intel_engine_cs *engine)
+static void guc_reset_prepare(struct intel_engine_cs *engine)
 {
 	struct intel_engine_execlists * const execlists = &engine->execlists;
 
@@ -861,8 +860,6 @@ guc_reset_prepare(struct intel_engine_cs *engine)
 	 */
 	if (engine->i915->guc.preempt_wq)
 		flush_workqueue(engine->i915->guc.preempt_wq);
-
-	return i915_gem_find_active_request(engine);
 }
 
 /*
diff --git a/drivers/gpu/drm/i915/intel_hangcheck.c b/drivers/gpu/drm/i915/intel_hangcheck.c
index 741441daae32..5662d6fed523 100644
--- a/drivers/gpu/drm/i915/intel_hangcheck.c
+++ b/drivers/gpu/drm/i915/intel_hangcheck.c
@@ -25,6 +25,17 @@
 #include "i915_drv.h"
 #include "i915_reset.h"
 
+struct hangcheck {
+	u64 acthd;
+	u32 seqno;
+	enum intel_engine_hangcheck_action action;
+	unsigned long action_timestamp;
+	int deadlock;
+	struct intel_instdone instdone;
+	bool wedged:1;
+	bool stalled:1;
+};
+
 static bool instdone_unchanged(u32 current_instdone, u32 *old_instdone)
 {
 	u32 tmp = current_instdone | *old_instdone;
@@ -119,25 +130,22 @@ engine_stuck(struct intel_engine_cs *engine, u64 acthd)
 }
 
 static void hangcheck_load_sample(struct intel_engine_cs *engine,
-				  struct intel_engine_hangcheck *hc)
+				  struct hangcheck *hc)
 {
 	hc->acthd = intel_engine_get_active_head(engine);
 	hc->seqno = intel_engine_get_seqno(engine);
 }
 
 static void hangcheck_store_sample(struct intel_engine_cs *engine,
-				   const struct intel_engine_hangcheck *hc)
+				   const struct hangcheck *hc)
 {
 	engine->hangcheck.acthd = hc->acthd;
 	engine->hangcheck.seqno = hc->seqno;
-	engine->hangcheck.action = hc->action;
-	engine->hangcheck.stalled = hc->stalled;
-	engine->hangcheck.wedged = hc->wedged;
 }
 
 static enum intel_engine_hangcheck_action
 hangcheck_get_action(struct intel_engine_cs *engine,
-		     const struct intel_engine_hangcheck *hc)
+		     const struct hangcheck *hc)
 {
 	if (engine->hangcheck.seqno != hc->seqno)
 		return ENGINE_ACTIVE_SEQNO;
@@ -149,7 +157,7 @@ hangcheck_get_action(struct intel_engine_cs *engine,
 }
 
 static void hangcheck_accumulate_sample(struct intel_engine_cs *engine,
-					struct intel_engine_hangcheck *hc)
+					struct hangcheck *hc)
 {
 	unsigned long timeout = I915_ENGINE_DEAD_TIMEOUT;
 
@@ -265,19 +273,19 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 	intel_uncore_arm_unclaimed_mmio_detection(dev_priv);
 
 	for_each_engine(engine, dev_priv, id) {
-		struct intel_engine_hangcheck hc;
+		struct hangcheck hc;
 
 		hangcheck_load_sample(engine, &hc);
 		hangcheck_accumulate_sample(engine, &hc);
 		hangcheck_store_sample(engine, &hc);
 
-		if (engine->hangcheck.stalled) {
+		if (hc.stalled) {
 			hung |= intel_engine_flag(engine);
 			if (hc.action != ENGINE_DEAD)
 				stuck |= intel_engine_flag(engine);
 		}
 
-		if (engine->hangcheck.wedged)
+		if (hc.wedged)
 			wedged |= intel_engine_flag(engine);
 	}
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 28d183439952..c11cbf34258d 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -136,6 +136,7 @@
 #include <drm/i915_drm.h>
 #include "i915_drv.h"
 #include "i915_gem_render_state.h"
+#include "i915_reset.h"
 #include "i915_vgpu.h"
 #include "intel_lrc_reg.h"
 #include "intel_mocs.h"
@@ -288,7 +289,8 @@ static void unwind_wa_tail(struct i915_request *rq)
 	assert_ring_tail_valid(rq->ring, rq->tail);
 }
 
-static void __unwind_incomplete_requests(struct intel_engine_cs *engine)
+static struct i915_request *
+__unwind_incomplete_requests(struct intel_engine_cs *engine)
 {
 	struct i915_request *rq, *rn, *active = NULL;
 	struct list_head *uninitialized_var(pl);
@@ -330,6 +332,8 @@ static void __unwind_incomplete_requests(struct intel_engine_cs *engine)
 		list_move_tail(&active->sched.link,
 			       i915_sched_lookup_priolist(engine, prio));
 	}
+
+	return active;
 }
 
 void
@@ -1743,11 +1747,9 @@ static int gen8_init_common_ring(struct intel_engine_cs *engine)
 	return 0;
 }
 
-static struct i915_request *
-execlists_reset_prepare(struct intel_engine_cs *engine)
+static void execlists_reset_prepare(struct intel_engine_cs *engine)
 {
 	struct intel_engine_execlists * const execlists = &engine->execlists;
-	struct i915_request *request, *active;
 	unsigned long flags;
 
 	GEM_TRACE("%s: depth<-%d\n", engine->name,
@@ -1763,59 +1765,21 @@ execlists_reset_prepare(struct intel_engine_cs *engine)
 	 * prevents the race.
 	 */
 	__tasklet_disable_sync_once(&execlists->tasklet);
+	GEM_BUG_ON(!reset_in_progress(execlists));
 
+	/* And flush any current direct submission. */
 	spin_lock_irqsave(&engine->timeline.lock, flags);
-
-	/*
-	 * We want to flush the pending context switches, having disabled
-	 * the tasklet above, we can assume exclusive access to the execlists.
-	 * For this allows us to catch up with an inflight preemption event,
-	 * and avoid blaming an innocent request if the stall was due to the
-	 * preemption itself.
-	 */
-	process_csb(engine);
-
-	/*
-	 * The last active request can then be no later than the last request
-	 * now in ELSP[0]. So search backwards from there, so that if the GPU
-	 * has advanced beyond the last CSB update, it will be pardoned.
-	 */
-	active = NULL;
-	request = port_request(execlists->port);
-	if (request) {
-		/*
-		 * Prevent the breadcrumb from advancing before we decide
-		 * which request is currently active.
-		 */
-		intel_engine_stop_cs(engine);
-
-		list_for_each_entry_from_reverse(request,
-						 &engine->timeline.requests,
-						 link) {
-			if (__i915_request_completed(request,
-						     request->global_seqno))
-				break;
-
-			active = request;
-		}
-	}
-
+	process_csb(engine); /* drain preemption events */
 	spin_unlock_irqrestore(&engine->timeline.lock, flags);
-
-	return active;
 }
 
-static void execlists_reset(struct intel_engine_cs *engine,
-			    struct i915_request *request)
+static void execlists_reset(struct intel_engine_cs *engine, bool stalled)
 {
 	struct intel_engine_execlists * const execlists = &engine->execlists;
+	struct i915_request *rq;
 	unsigned long flags;
 	u32 *regs;
 
-	GEM_TRACE("%s request global=%d, current=%d\n",
-		  engine->name, request ? request->global_seqno : 0,
-		  intel_engine_get_seqno(engine));
-
 	spin_lock_irqsave(&engine->timeline.lock, flags);
 
 	/*
@@ -1830,12 +1794,18 @@ static void execlists_reset(struct intel_engine_cs *engine,
 	execlists_cancel_port_requests(execlists);
 
 	/* Push back any incomplete requests for replay after the reset. */
-	__unwind_incomplete_requests(engine);
+	rq = __unwind_incomplete_requests(engine);
 
 	/* Following the reset, we need to reload the CSB read/write pointers */
 	reset_csb_pointers(&engine->execlists);
 
-	spin_unlock_irqrestore(&engine->timeline.lock, flags);
+	GEM_TRACE("%s seqno=%d, current=%d, stalled? %s\n",
+		  engine->name,
+		  rq ? rq->global_seqno : 0,
+		  intel_engine_get_seqno(engine),
+		  yesno(stalled));
+	if (!rq)
+		goto out_unlock;
 
 	/*
 	 * If the request was innocent, we leave the request in the ELSP
@@ -1848,8 +1818,9 @@ static void execlists_reset(struct intel_engine_cs *engine,
 	 * and have to at least restore the RING register in the context
 	 * image back to the expected values to skip over the guilty request.
 	 */
-	if (!request || request->fence.error != -EIO)
-		return;
+	i915_reset_request(rq, stalled);
+	if (!stalled)
+		goto out_unlock;
 
 	/*
 	 * We want a simple context + ring to execute the breadcrumb update.
@@ -1859,25 +1830,23 @@ static void execlists_reset(struct intel_engine_cs *engine,
 	 * future request will be after userspace has had the opportunity
 	 * to recreate its own state.
 	 */
-	regs = request->hw_context->lrc_reg_state;
+	regs = rq->hw_context->lrc_reg_state;
 	if (engine->pinned_default_state) {
 		memcpy(regs, /* skip restoring the vanilla PPHWSP */
 		       engine->pinned_default_state + LRC_STATE_PN * PAGE_SIZE,
 		       engine->context_size - PAGE_SIZE);
 	}
-	execlists_init_reg_state(regs,
-				 request->gem_context, engine, request->ring);
+	execlists_init_reg_state(regs, rq->gem_context, engine, rq->ring);
 
 	/* Move the RING_HEAD onto the breadcrumb, past the hanging batch */
-	regs[CTX_RING_BUFFER_START + 1] = i915_ggtt_offset(request->ring->vma);
-
-	request->ring->head = intel_ring_wrap(request->ring, request->postfix);
-	regs[CTX_RING_HEAD + 1] = request->ring->head;
+	regs[CTX_RING_BUFFER_START + 1] = i915_ggtt_offset(rq->ring->vma);
 
-	intel_ring_update_space(request->ring);
+	rq->ring->head = intel_ring_wrap(rq->ring, rq->postfix);
+	regs[CTX_RING_HEAD + 1] = rq->ring->head;
+	intel_ring_update_space(rq->ring);
 
-	/* Reset WaIdleLiteRestore:bdw,skl as well */
-	unwind_wa_tail(request);
+out_unlock:
+	spin_unlock_irqrestore(&engine->timeline.lock, flags);
 }
 
 static void execlists_reset_finish(struct intel_engine_cs *engine)
@@ -1890,6 +1859,7 @@ static void execlists_reset_finish(struct intel_engine_cs *engine)
 	 * to sleep before we restart and reload a context.
 	 *
 	 */
+	GEM_BUG_ON(!reset_in_progress(execlists));
 	if (!RB_EMPTY_ROOT(&execlists->queue.rb_root))
 		execlists->tasklet.func(execlists->tasklet.data);
 
diff --git a/drivers/gpu/drm/i915/intel_overlay.c b/drivers/gpu/drm/i915/intel_overlay.c
index c81db81e4416..f68c7975006c 100644
--- a/drivers/gpu/drm/i915/intel_overlay.c
+++ b/drivers/gpu/drm/i915/intel_overlay.c
@@ -478,8 +478,6 @@ void intel_overlay_reset(struct drm_i915_private *dev_priv)
 	if (!overlay)
 		return;
 
-	intel_overlay_release_old_vid(overlay);
-
 	overlay->old_xscale = 0;
 	overlay->old_yscale = 0;
 	overlay->crtc = NULL;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 26b7274a2d43..662907e1a286 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -33,6 +33,7 @@
 
 #include "i915_drv.h"
 #include "i915_gem_render_state.h"
+#include "i915_reset.h"
 #include "i915_trace.h"
 #include "intel_drv.h"
 #include "intel_workarounds.h"
@@ -707,52 +708,80 @@ static int init_ring_common(struct intel_engine_cs *engine)
 	return ret;
 }
 
-static struct i915_request *reset_prepare(struct intel_engine_cs *engine)
+static void reset_prepare(struct intel_engine_cs *engine)
 {
 	intel_engine_stop_cs(engine);
-	return i915_gem_find_active_request(engine);
 }
 
-static void skip_request(struct i915_request *rq)
+static void reset_ring(struct intel_engine_cs *engine, bool stalled)
 {
-	void *vaddr = rq->ring->vaddr;
+	struct i915_timeline *tl = &engine->timeline;
+	struct i915_request *pos, *rq;
+	unsigned long flags;
 	u32 head;
 
-	head = rq->infix;
-	if (rq->postfix < head) {
-		memset32(vaddr + head, MI_NOOP,
-			 (rq->ring->size - head) / sizeof(u32));
-		head = 0;
+	rq = NULL;
+	spin_lock_irqsave(&tl->lock, flags);
+	list_for_each_entry(pos, &tl->requests, link) {
+		if (!__i915_request_completed(pos, pos->global_seqno)) {
+			rq = pos;
+			break;
+		}
 	}
-	memset32(vaddr + head, MI_NOOP, (rq->postfix - head) / sizeof(u32));
-}
-
-static void reset_ring(struct intel_engine_cs *engine, struct i915_request *rq)
-{
-	GEM_TRACE("%s request global=%d, current=%d\n",
-		  engine->name, rq ? rq->global_seqno : 0,
-		  intel_engine_get_seqno(engine));
 
+	GEM_TRACE("%s seqno=%d, current=%d, stalled? %s\n",
+		  engine->name,
+		  rq ? rq->global_seqno : 0,
+		  intel_engine_get_seqno(engine),
+		  yesno(stalled));
 	/*
-	 * Try to restore the logical GPU state to match the continuation
-	 * of the request queue. If we skip the context/PD restore, then
-	 * the next request may try to execute assuming that its context
-	 * is valid and loaded on the GPU and so may try to access invalid
-	 * memory, prompting repeated GPU hangs.
+	 * The guilty request will get skipped on a hung engine.
 	 *
-	 * If the request was guilty, we still restore the logical state
-	 * in case the next request requires it (e.g. the aliasing ppgtt),
-	 * but skip over the hung batch.
+	 * Users of client default contexts do not rely on logical
+	 * state preserved between batches so it is safe to execute
+	 * queued requests following the hang. Non default contexts
+	 * queued requests following the hang. Non-default contexts
+	 * evolution of the state and it needs to be considered corrupted.
+	 * Executing more queued batches on top of corrupted state is
+	 * risky. But we take the risk by trying to advance through
+	 * the queued requests in order to make the client behaviour
+	 * more predictable around resets, by not throwing away a random
+	 * amount of batches it has prepared for execution. Sophisticated
+	 * clients can use gem_reset_stats_ioctl and dma fence status
+	 * (exported via sync_file info ioctl on explicit fences) to observe
+	 * when they lose the context state and should rebuild accordingly.
 	 *
-	 * If the request was innocent, we try to replay the request with
-	 * the restored context.
+	 * The context ban, and ultimately the client ban, mechanisms are safety
+	 * valves if client submission ends up resulting in nothing more than
+	 * subsequent hangs.
 	 */
+
 	if (rq) {
-		/* If the rq hung, jump to its breadcrumb and skip the batch */
-		rq->ring->head = intel_ring_wrap(rq->ring, rq->head);
-		if (rq->fence.error == -EIO)
-			skip_request(rq);
+		/*
+		 * Try to restore the logical GPU state to match the
+		 * continuation of the request queue. If we skip the
+		 * context/PD restore, then the next request may try to execute
+		 * assuming that its context is valid and loaded on the GPU and
+		 * so may try to access invalid memory, prompting repeated GPU
+		 * hangs.
+		 *
+		 * If the request was guilty, we still restore the logical
+		 * state in case the next request requires it (e.g. the
+		 * aliasing ppgtt), but skip over the hung batch.
+		 *
+		 * If the request was innocent, we try to replay the request
+		 * with the restored context.
+		 */
+		i915_reset_request(rq, stalled);
+
+		GEM_BUG_ON(rq->ring != engine->buffer);
+		head = rq->head;
+	} else {
+		head = engine->buffer->tail;
 	}
+	engine->buffer->head = intel_ring_wrap(engine->buffer, head);
+
+	spin_unlock_irqrestore(&tl->lock, flags);
 }
 
 static void reset_finish(struct intel_engine_cs *engine)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index c3ef0f9bf321..32ed44196c1a 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -120,13 +120,8 @@ struct intel_instdone {
 struct intel_engine_hangcheck {
 	u64 acthd;
 	u32 seqno;
-	enum intel_engine_hangcheck_action action;
 	unsigned long action_timestamp;
-	int deadlock;
 	struct intel_instdone instdone;
-	struct i915_request *active_request;
-	bool stalled:1;
-	bool wedged:1;
 };
 
 struct intel_ring {
@@ -444,9 +439,8 @@ struct intel_engine_cs {
 	int		(*init_hw)(struct intel_engine_cs *engine);
 
 	struct {
-		struct i915_request *(*prepare)(struct intel_engine_cs *engine);
-		void (*reset)(struct intel_engine_cs *engine,
-			      struct i915_request *rq);
+		void (*prepare)(struct intel_engine_cs *engine);
+		void (*reset)(struct intel_engine_cs *engine, bool stalled);
 		void (*finish)(struct intel_engine_cs *engine);
 	} reset;
 
@@ -1018,6 +1012,13 @@ gen8_emit_ggtt_write(u32 *cs, u32 value, u32 gtt_offset)
 	return cs;
 }
 
+static inline void intel_engine_reset(struct intel_engine_cs *engine,
+				      bool stalled)
+{
+	if (engine->reset.reset)
+		engine->reset.reset(engine, stalled);
+}
+
 void intel_engines_sanitize(struct drm_i915_private *i915, bool force);
 
 bool intel_engine_is_idle(struct intel_engine_cs *engine);
diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
index 12550b55c42f..67431355cd6e 100644
--- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
+++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
@@ -363,9 +363,7 @@ static int igt_global_reset(void *arg)
 	/* Check that we can issue a global GPU reset */
 
 	igt_global_reset_lock(i915);
-	set_bit(I915_RESET_HANDOFF, &i915->gpu_error.flags);
 
-	mutex_lock(&i915->drm.struct_mutex);
 	reset_count = i915_reset_count(&i915->gpu_error);
 
 	i915_reset(i915, ALL_ENGINES, NULL);
@@ -374,9 +372,7 @@ static int igt_global_reset(void *arg)
 		pr_err("No GPU reset recorded!\n");
 		err = -EINVAL;
 	}
-	mutex_unlock(&i915->drm.struct_mutex);
 
-	GEM_BUG_ON(test_bit(I915_RESET_HANDOFF, &i915->gpu_error.flags));
 	igt_global_reset_unlock(i915);
 
 	if (i915_terminally_wedged(&i915->gpu_error))
@@ -399,9 +395,7 @@ static int igt_wedged_reset(void *arg)
 	i915_gem_set_wedged(i915);
 	GEM_BUG_ON(!i915_terminally_wedged(&i915->gpu_error));
 
-	set_bit(I915_RESET_HANDOFF, &i915->gpu_error.flags);
 	i915_reset(i915, ALL_ENGINES, NULL);
-	GEM_BUG_ON(test_bit(I915_RESET_HANDOFF, &i915->gpu_error.flags));
 
 	intel_runtime_pm_put(i915, wakeref);
 	mutex_unlock(&i915->drm.struct_mutex);
@@ -511,7 +505,7 @@ static int __igt_reset_engine(struct drm_i915_private *i915, bool active)
 				break;
 			}
 
-			if (!wait_for_idle(engine)) {
+			if (!i915_reset_flush(i915)) {
 				struct drm_printer p =
 					drm_info_printer(i915->drm.dev);
 
@@ -903,20 +897,13 @@ static int igt_reset_engines(void *arg)
 	return 0;
 }
 
-static u32 fake_hangcheck(struct i915_request *rq, u32 mask)
+static u32 fake_hangcheck(struct drm_i915_private *i915, u32 mask)
 {
-	struct i915_gpu_error *error = &rq->i915->gpu_error;
-	u32 reset_count = i915_reset_count(error);
-
-	error->stalled_mask = mask;
-
-	/* set_bit() must be after we have setup the backchannel (mask) */
-	smp_mb__before_atomic();
-	set_bit(I915_RESET_HANDOFF, &error->flags);
+	u32 count = i915_reset_count(&i915->gpu_error);
 
-	wake_up_all(&error->wait_queue);
+	i915_reset(i915, mask, NULL);
 
-	return reset_count;
+	return count;
 }
 
 static int igt_reset_wait(void *arg)
@@ -962,7 +949,7 @@ static int igt_reset_wait(void *arg)
 		goto out_rq;
 	}
 
-	reset_count = fake_hangcheck(rq, ALL_ENGINES);
+	reset_count = fake_hangcheck(i915, ALL_ENGINES);
 
 	timeout = i915_request_wait(rq, I915_WAIT_LOCKED, 10);
 	if (timeout < 0) {
@@ -972,7 +959,6 @@ static int igt_reset_wait(void *arg)
 		goto out_rq;
 	}
 
-	GEM_BUG_ON(test_bit(I915_RESET_HANDOFF, &i915->gpu_error.flags));
 	if (i915_reset_count(&i915->gpu_error) == reset_count) {
 		pr_err("No GPU reset recorded!\n");
 		err = -EINVAL;
@@ -1162,7 +1148,7 @@ static int __igt_reset_evict_vma(struct drm_i915_private *i915,
 	}
 
 out_reset:
-	fake_hangcheck(rq, intel_engine_flag(rq->engine));
+	fake_hangcheck(rq->i915, intel_engine_flag(rq->engine));
 
 	if (tsk) {
 		struct igt_wedge_me w;
@@ -1341,12 +1327,7 @@ static int igt_reset_queue(void *arg)
 				goto fini;
 			}
 
-			reset_count = fake_hangcheck(prev, ENGINE_MASK(id));
-
-			i915_reset(i915, ENGINE_MASK(id), NULL);
-
-			GEM_BUG_ON(test_bit(I915_RESET_HANDOFF,
-					    &i915->gpu_error.flags));
+			reset_count = fake_hangcheck(i915, ENGINE_MASK(id));
 
 			if (prev->fence.error != -EIO) {
 				pr_err("GPU reset not recorded on hanging request [fence.error=%d]!\n",
@@ -1565,6 +1546,7 @@ static int igt_atomic_reset_engine(struct intel_engine_cs *engine,
 		pr_err("%s(%s): Failed to start request %llx, at %x\n",
 		       __func__, engine->name,
 		       rq->fence.seqno, hws_seqno(&h, rq));
+		i915_gem_set_wedged(i915);
 		err = -EIO;
 	}
 
@@ -1588,7 +1570,6 @@ static int igt_atomic_reset_engine(struct intel_engine_cs *engine,
 static void force_reset(struct drm_i915_private *i915)
 {
 	i915_gem_set_wedged(i915);
-	set_bit(I915_RESET_HANDOFF, &i915->gpu_error.flags);
 	i915_reset(i915, 0, NULL);
 }
 
@@ -1618,6 +1599,26 @@ static int igt_atomic_reset(void *arg)
 	if (i915_terminally_wedged(&i915->gpu_error))
 		goto unlock;
 
+	if (intel_has_gpu_reset(i915)) {
+		const typeof(*phases) *p;
+
+		for (p = phases; p->name; p++) {
+			GEM_TRACE("intel_gpu_reset under %s\n", p->name);
+
+			p->critical_section_begin();
+			err = intel_gpu_reset(i915, ALL_ENGINES);
+			p->critical_section_end();
+
+			if (err) {
+				pr_err("intel_gpu_reset failed under %s\n",
+				       p->name);
+				goto out;
+			}
+		}
+
+		force_reset(i915);
+	}
+
 	if (intel_has_reset_engine(i915)) {
 		struct intel_engine_cs *engine;
 		enum intel_engine_id id;
diff --git a/drivers/gpu/drm/i915/selftests/intel_workarounds.c b/drivers/gpu/drm/i915/selftests/intel_workarounds.c
index a8cac56be835..b15c4f26c593 100644
--- a/drivers/gpu/drm/i915/selftests/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/selftests/intel_workarounds.c
@@ -214,7 +214,6 @@ static int check_whitelist(struct i915_gem_context *ctx,
 
 static int do_device_reset(struct intel_engine_cs *engine)
 {
-	set_bit(I915_RESET_HANDOFF, &engine->i915->gpu_error.flags);
 	i915_reset(engine->i915, ENGINE_MASK(engine->id), "live_workarounds");
 	return 0;
 }
@@ -394,7 +393,6 @@ static int
 live_gpu_reset_gt_engine_workarounds(void *arg)
 {
 	struct drm_i915_private *i915 = arg;
-	struct i915_gpu_error *error = &i915->gpu_error;
 	intel_wakeref_t wakeref;
 	struct wa_lists lists;
 	bool ok;
@@ -413,7 +411,6 @@ live_gpu_reset_gt_engine_workarounds(void *arg)
 	if (!ok)
 		goto out;
 
-	set_bit(I915_RESET_HANDOFF, &error->flags);
 	i915_reset(i915, ALL_ENGINES, "live_workarounds");
 
 	ok = verify_gt_engine_wa(i915, &lists, "after reset");
diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
index 5477ad4a7e7d..8ab5a2688a0c 100644
--- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
+++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
@@ -58,8 +58,8 @@ static void mock_device_release(struct drm_device *dev)
 	i915_gem_contexts_lost(i915);
 	mutex_unlock(&i915->drm.struct_mutex);
 
-	cancel_delayed_work_sync(&i915->gt.retire_work);
-	cancel_delayed_work_sync(&i915->gt.idle_work);
+	drain_delayed_work(&i915->gt.retire_work);
+	drain_delayed_work(&i915->gt.idle_work);
 	i915_gem_drain_workqueue(i915);
 
 	mutex_lock(&i915->drm.struct_mutex);
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH 11/34] drm/i915/selftests: Trim struct_mutex duration for set-wedged selftest
  2019-01-21 22:20 HWSP for HW semaphores Chris Wilson
                   ` (9 preceding siblings ...)
  2019-01-21 22:20 ` [PATCH 10/34] drm/i915: Remove GPU reset dependence on struct_mutex Chris Wilson
@ 2019-01-21 22:20 ` Chris Wilson
  2019-01-21 22:20 ` [PATCH 12/34] drm/i915: Issue engine resets onto idle engines Chris Wilson
                   ` (29 subsequent siblings)
  40 siblings, 0 replies; 89+ messages in thread
From: Chris Wilson @ 2019-01-21 22:20 UTC (permalink / raw)
  To: intel-gfx; +Cc: Mika Kuoppala

Trim the struct_mutex hold time and move the call to
i915_gem_set_wedged() outside it, as a reminder that it must be callable
without struct_mutex held.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/selftests/intel_hangcheck.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
index 67431355cd6e..8025c7e0bf6c 100644
--- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
+++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
@@ -389,16 +389,16 @@ static int igt_wedged_reset(void *arg)
 	/* Check that we can recover a wedged device with a GPU reset */
 
 	igt_global_reset_lock(i915);
-	mutex_lock(&i915->drm.struct_mutex);
 	wakeref = intel_runtime_pm_get(i915);
 
 	i915_gem_set_wedged(i915);
-	GEM_BUG_ON(!i915_terminally_wedged(&i915->gpu_error));
 
+	mutex_lock(&i915->drm.struct_mutex);
+	GEM_BUG_ON(!i915_terminally_wedged(&i915->gpu_error));
 	i915_reset(i915, ALL_ENGINES, NULL);
+	mutex_unlock(&i915->drm.struct_mutex);
 
 	intel_runtime_pm_put(i915, wakeref);
-	mutex_unlock(&i915->drm.struct_mutex);
 	igt_global_reset_unlock(i915);
 
 	return i915_terminally_wedged(&i915->gpu_error) ? -EIO : 0;
@@ -1675,6 +1675,7 @@ int intel_hangcheck_live_selftests(struct drm_i915_private *i915)
 
 	wakeref = intel_runtime_pm_get(i915);
 	saved_hangcheck = fetch_and_zero(&i915_modparams.enable_hangcheck);
+	drain_delayed_work(&i915->gpu_error.hangcheck_work); /* flush param */
 
 	err = i915_subtests(tests, i915);
 
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH 12/34] drm/i915: Issue engine resets onto idle engines
  2019-01-21 22:20 HWSP for HW semaphores Chris Wilson
                   ` (10 preceding siblings ...)
  2019-01-21 22:20 ` [PATCH 11/34] drm/i915/selftests: Trim struct_mutex duration for set-wedged selftest Chris Wilson
@ 2019-01-21 22:20 ` Chris Wilson
  2019-01-23  1:18   ` John Harrison
  2019-01-21 22:20 ` [PATCH 13/34] drm/i915: Stop tracking MRU activity on VMA Chris Wilson
                   ` (28 subsequent siblings)
  40 siblings, 1 reply; 89+ messages in thread
From: Chris Wilson @ 2019-01-21 22:20 UTC (permalink / raw)
  To: intel-gfx

Always perform the requested reset, even if we believe the engine is
idle. Presumably there was a reason the caller wanted the reset, and in
the near future we lose the easy tracking for whether the engine is
idle.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_reset.c             |  4 ----
 .../gpu/drm/i915/selftests/intel_hangcheck.c  | 22 +++++--------------
 2 files changed, 6 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
index 064fc6da1512..d44b095e2860 100644
--- a/drivers/gpu/drm/i915/i915_reset.c
+++ b/drivers/gpu/drm/i915/i915_reset.c
@@ -1063,10 +1063,6 @@ int i915_reset_engine(struct intel_engine_cs *engine, const char *msg)
 	GEM_TRACE("%s flags=%lx\n", engine->name, error->flags);
 	GEM_BUG_ON(!test_bit(I915_RESET_ENGINE + engine->id, &error->flags));
 
-	if (i915_seqno_passed(intel_engine_get_seqno(engine),
-			      intel_engine_last_submit(engine)))
-		return 0;
-
 	reset_prepare_engine(engine);
 
 	if (msg)
diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
index 8025c7e0bf6c..2c38ea5892d9 100644
--- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
+++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
@@ -449,8 +449,6 @@ static int __igt_reset_engine(struct drm_i915_private *i915, bool active)
 
 		set_bit(I915_RESET_ENGINE + id, &i915->gpu_error.flags);
 		do {
-			u32 seqno = intel_engine_get_seqno(engine);
-
 			if (active) {
 				struct i915_request *rq;
 
@@ -479,8 +477,6 @@ static int __igt_reset_engine(struct drm_i915_private *i915, bool active)
 					break;
 				}
 
-				GEM_BUG_ON(!rq->global_seqno);
-				seqno = rq->global_seqno - 1;
 				i915_request_put(rq);
 			}
 
@@ -496,11 +492,10 @@ static int __igt_reset_engine(struct drm_i915_private *i915, bool active)
 				break;
 			}
 
-			reset_engine_count += active;
 			if (i915_reset_engine_count(&i915->gpu_error, engine) !=
-			    reset_engine_count) {
-				pr_err("%s engine reset %srecorded!\n",
-				       engine->name, active ? "not " : "");
+			    ++reset_engine_count) {
+				pr_err("%s engine reset not recorded!\n",
+				       engine->name);
 				err = -EINVAL;
 				break;
 			}
@@ -728,7 +723,6 @@ static int __igt_reset_engines(struct drm_i915_private *i915,
 
 		set_bit(I915_RESET_ENGINE + id, &i915->gpu_error.flags);
 		do {
-			u32 seqno = intel_engine_get_seqno(engine);
 			struct i915_request *rq = NULL;
 
 			if (flags & TEST_ACTIVE) {
@@ -756,9 +750,6 @@ static int __igt_reset_engines(struct drm_i915_private *i915,
 					err = -EIO;
 					break;
 				}
-
-				GEM_BUG_ON(!rq->global_seqno);
-				seqno = rq->global_seqno - 1;
 			}
 
 			err = i915_reset_engine(engine, NULL);
@@ -795,10 +786,9 @@ static int __igt_reset_engines(struct drm_i915_private *i915,
 
 		reported = i915_reset_engine_count(&i915->gpu_error, engine);
 		reported -= threads[engine->id].resets;
-		if (reported != (flags & TEST_ACTIVE ? count : 0)) {
-			pr_err("i915_reset_engine(%s:%s): reset %lu times, but reported %lu, expected %lu reported\n",
-			       engine->name, test_name, count, reported,
-			       (flags & TEST_ACTIVE ? count : 0));
+		if (reported != count) {
+			pr_err("i915_reset_engine(%s:%s): reset %lu times, but reported %lu\n",
+			       engine->name, test_name, count, reported);
 			if (!err)
 				err = -EINVAL;
 		}
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH 13/34] drm/i915: Stop tracking MRU activity on VMA
  2019-01-21 22:20 HWSP for HW semaphores Chris Wilson
                   ` (11 preceding siblings ...)
  2019-01-21 22:20 ` [PATCH 12/34] drm/i915: Issue engine resets onto idle engines Chris Wilson
@ 2019-01-21 22:20 ` Chris Wilson
  2019-01-21 22:20 ` [PATCH 14/34] drm/i915: Pull VM lists under the VM mutex Chris Wilson
                   ` (27 subsequent siblings)
  40 siblings, 0 replies; 89+ messages in thread
From: Chris Wilson @ 2019-01-21 22:20 UTC (permalink / raw)
  To: intel-gfx

Our goal is to remove struct_mutex and replace it with fine-grained
locking. One of the thorny issues is our eviction logic for reclaiming
space for an execbuffer (or GTT mmapping, among a few other examples).
While eviction itself is easy to move under a per-VM mutex, performing
the activity tracking is less agreeable. One solution is not to do any
MRU tracking and do a simple coarse evaluation during eviction of
active/inactive, with a loose temporal ordering of last
insertion/evaluation. That keeps all the locking constrained to when we
are manipulating the VM itself, neatly avoiding the tricky handling of
possible recursive locking during execbuf and elsewhere.

Note that discarding the MRU is unlikely to impact upon our efficiency
to reclaim VM space (where we think a LRU model is best) as our
current strategy is to use random idle replacement first before doing
a search, and over time the use of softpinned 48b per-ppGTT is growing
(thereby eliminating any need to perform any eviction searches, in
theory at least).
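
For reference, the heart of the single-list scan is the cycle detection
in the i915_gem_evict.c hunk below. A condensed sketch of just that part
(the PIN_NONBLOCK and retire-and-rescan handling via ERR_PTR(-EAGAIN)
is omitted here):

	struct i915_vma *vma, *next, *active = NULL;

	list_for_each_entry_safe(vma, next, &vm->bound_list, vm_link) {
		if (i915_vma_is_active(vma)) {
			if (vma == active)
				break; /* seen every active element once */

			if (!active)
				active = vma; /* mark where the lap began */

			/* rotate actives to the tail, keeping scan order */
			list_move_tail(&vma->vm_link, &vm->bound_list);
			continue;
		}

		if (mark_free(&scan, vma, flags, &eviction_list))
			goto found;
	}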

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c               | 10 +--
 drivers/gpu/drm/i915/i915_gem_evict.c         | 71 ++++++++++++-------
 drivers/gpu/drm/i915/i915_gem_gtt.c           | 15 ++--
 drivers/gpu/drm/i915/i915_gem_gtt.h           | 26 +------
 drivers/gpu/drm/i915/i915_gem_shrinker.c      |  8 ++-
 drivers/gpu/drm/i915/i915_gem_stolen.c        |  3 +-
 drivers/gpu/drm/i915/i915_gpu_error.c         | 37 +++++-----
 drivers/gpu/drm/i915/i915_vma.c               |  9 +--
 .../gpu/drm/i915/selftests/i915_gem_evict.c   |  4 +-
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c |  2 +-
 10 files changed, 84 insertions(+), 101 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index d20b42386c3c..f45186ddb236 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -253,10 +253,7 @@ i915_gem_get_aperture_ioctl(struct drm_device *dev, void *data,
 
 	pinned = ggtt->vm.reserved;
 	mutex_lock(&dev->struct_mutex);
-	list_for_each_entry(vma, &ggtt->vm.active_list, vm_link)
-		if (i915_vma_is_pinned(vma))
-			pinned += vma->node.size;
-	list_for_each_entry(vma, &ggtt->vm.inactive_list, vm_link)
+	list_for_each_entry(vma, &ggtt->vm.bound_list, vm_link)
 		if (i915_vma_is_pinned(vma))
 			pinned += vma->node.size;
 	mutex_unlock(&dev->struct_mutex);
@@ -1539,13 +1536,10 @@ static void i915_gem_object_bump_inactive_ggtt(struct drm_i915_gem_object *obj)
 	GEM_BUG_ON(!i915_gem_object_has_pinned_pages(obj));
 
 	for_each_ggtt_vma(vma, obj) {
-		if (i915_vma_is_active(vma))
-			continue;
-
 		if (!drm_mm_node_allocated(&vma->node))
 			continue;
 
-		list_move_tail(&vma->vm_link, &vma->vm->inactive_list);
+		list_move_tail(&vma->vm_link, &vma->vm->bound_list);
 	}
 
 	i915 = to_i915(obj->base.dev);
diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
index f6855401f247..5cfe4b75e7d6 100644
--- a/drivers/gpu/drm/i915/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/i915_gem_evict.c
@@ -126,14 +126,10 @@ i915_gem_evict_something(struct i915_address_space *vm,
 	struct drm_i915_private *dev_priv = vm->i915;
 	struct drm_mm_scan scan;
 	struct list_head eviction_list;
-	struct list_head *phases[] = {
-		&vm->inactive_list,
-		&vm->active_list,
-		NULL,
-	}, **phase;
 	struct i915_vma *vma, *next;
 	struct drm_mm_node *node;
 	enum drm_mm_insert_mode mode;
+	struct i915_vma *active;
 	int ret;
 
 	lockdep_assert_held(&vm->i915->drm.struct_mutex);
@@ -169,17 +165,46 @@ i915_gem_evict_something(struct i915_address_space *vm,
 	 */
 	if (!(flags & PIN_NONBLOCK))
 		i915_retire_requests(dev_priv);
-	else
-		phases[1] = NULL;
 
 search_again:
+	active = NULL;
 	INIT_LIST_HEAD(&eviction_list);
-	phase = phases;
-	do {
-		list_for_each_entry(vma, *phase, vm_link)
-			if (mark_free(&scan, vma, flags, &eviction_list))
-				goto found;
-	} while (*++phase);
+	list_for_each_entry_safe(vma, next, &vm->bound_list, vm_link) {
+		/*
+		 * We keep this list in a rough least-recently scanned order
+		 * of active elements (inactive elements are cheap to reap).
+		 * New entries are added to the end, and we move anything we
+		 * scan to the end. The assumption is that the working set
+		 * of applications is either steady state (and thanks to the
+		 * userspace bo cache it almost always is) or volatile and
+		 * frequently replaced after a frame, which are self-evicting!
+		 * Given that assumption, the MRU order of the scan list is
+		 * fairly static, and keeping it in least-recently scanned order
+		 * is suitable.
+		 *
+		 * To notice when we complete one full cycle, we record the
+		 * first active element seen, before moving it to the tail.
+		 */
+		if (i915_vma_is_active(vma)) {
+			if (vma == active) {
+				if (flags & PIN_NONBLOCK)
+					break;
+
+				active = ERR_PTR(-EAGAIN);
+			}
+
+			if (active != ERR_PTR(-EAGAIN)) {
+				if (!active)
+					active = vma;
+
+				list_move_tail(&vma->vm_link, &vm->bound_list);
+				continue;
+			}
+		}
+
+		if (mark_free(&scan, vma, flags, &eviction_list))
+			goto found;
+	}
 
 	/* Nothing found, clean up and bail out! */
 	list_for_each_entry_safe(vma, next, &eviction_list, evict_link) {
@@ -388,11 +413,6 @@ int i915_gem_evict_for_node(struct i915_address_space *vm,
  */
 int i915_gem_evict_vm(struct i915_address_space *vm)
 {
-	struct list_head *phases[] = {
-		&vm->inactive_list,
-		&vm->active_list,
-		NULL
-	}, **phase;
 	struct list_head eviction_list;
 	struct i915_vma *vma, *next;
 	int ret;
@@ -412,16 +432,13 @@ int i915_gem_evict_vm(struct i915_address_space *vm)
 	}
 
 	INIT_LIST_HEAD(&eviction_list);
-	phase = phases;
-	do {
-		list_for_each_entry(vma, *phase, vm_link) {
-			if (i915_vma_is_pinned(vma))
-				continue;
+	list_for_each_entry(vma, &vm->bound_list, vm_link) {
+		if (i915_vma_is_pinned(vma))
+			continue;
 
-			__i915_vma_pin(vma);
-			list_add(&vma->evict_link, &eviction_list);
-		}
-	} while (*++phase);
+		__i915_vma_pin(vma);
+		list_add(&vma->evict_link, &eviction_list);
+	}
 
 	ret = 0;
 	list_for_each_entry_safe(vma, next, &eviction_list, evict_link) {
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 9081e3bc5a59..2ad9070a54c1 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -491,9 +491,8 @@ static void i915_address_space_init(struct i915_address_space *vm, int subclass)
 
 	stash_init(&vm->free_pages);
 
-	INIT_LIST_HEAD(&vm->active_list);
-	INIT_LIST_HEAD(&vm->inactive_list);
 	INIT_LIST_HEAD(&vm->unbound_list);
+	INIT_LIST_HEAD(&vm->bound_list);
 }
 
 static void i915_address_space_fini(struct i915_address_space *vm)
@@ -2111,8 +2110,7 @@ void i915_ppgtt_close(struct i915_address_space *vm)
 static void ppgtt_destroy_vma(struct i915_address_space *vm)
 {
 	struct list_head *phases[] = {
-		&vm->active_list,
-		&vm->inactive_list,
+		&vm->bound_list,
 		&vm->unbound_list,
 		NULL,
 	}, **phase;
@@ -2135,8 +2133,7 @@ void i915_ppgtt_release(struct kref *kref)
 
 	ppgtt_destroy_vma(&ppgtt->vm);
 
-	GEM_BUG_ON(!list_empty(&ppgtt->vm.active_list));
-	GEM_BUG_ON(!list_empty(&ppgtt->vm.inactive_list));
+	GEM_BUG_ON(!list_empty(&ppgtt->vm.bound_list));
 	GEM_BUG_ON(!list_empty(&ppgtt->vm.unbound_list));
 
 	ppgtt->vm.cleanup(&ppgtt->vm);
@@ -2801,8 +2798,7 @@ void i915_ggtt_cleanup_hw(struct drm_i915_private *dev_priv)
 	mutex_lock(&dev_priv->drm.struct_mutex);
 	i915_gem_fini_aliasing_ppgtt(dev_priv);
 
-	GEM_BUG_ON(!list_empty(&ggtt->vm.active_list));
-	list_for_each_entry_safe(vma, vn, &ggtt->vm.inactive_list, vm_link)
+	list_for_each_entry_safe(vma, vn, &ggtt->vm.bound_list, vm_link)
 		WARN_ON(i915_vma_unbind(vma));
 
 	if (drm_mm_node_allocated(&ggtt->error_capture))
@@ -3514,8 +3510,7 @@ void i915_gem_restore_gtt_mappings(struct drm_i915_private *dev_priv)
 	ggtt->vm.closed = true; /* skip rewriting PTE on VMA unbind */
 
 	/* clflush objects bound into the GGTT and rebind them. */
-	GEM_BUG_ON(!list_empty(&ggtt->vm.active_list));
-	list_for_each_entry_safe(vma, vn, &ggtt->vm.inactive_list, vm_link) {
+	list_for_each_entry_safe(vma, vn, &ggtt->vm.bound_list, vm_link) {
 		struct drm_i915_gem_object *obj = vma->obj;
 
 		if (!(vma->flags & I915_VMA_GLOBAL_BIND))
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index a0039ea97cdc..bd679c8c56dd 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -299,32 +299,12 @@ struct i915_address_space {
 	struct i915_page_directory_pointer *scratch_pdp; /* GEN8+ & 48b PPGTT */
 
 	/**
-	 * List of objects currently involved in rendering.
-	 *
-	 * Includes buffers having the contents of their GPU caches
-	 * flushed, not necessarily primitives. last_read_req
-	 * represents when the rendering involved will be completed.
-	 *
-	 * A reference is held on the buffer while on this list.
+	 * List of vma currently bound.
 	 */
-	struct list_head active_list;
+	struct list_head bound_list;
 
 	/**
-	 * LRU list of objects which are not in the ringbuffer and
-	 * are ready to unbind, but are still in the GTT.
-	 *
-	 * last_read_req is NULL while an object is in this list.
-	 *
-	 * A reference is not held on the buffer while on this list,
-	 * as merely being GTT-bound shouldn't prevent its being
-	 * freed, and we'll pull it off the list in the free path.
-	 */
-	struct list_head inactive_list;
-
-	/**
-	 * List of vma that have been unbound.
-	 *
-	 * A reference is not held on the buffer while on this list.
+	 * List of vma that are not bound.
 	 */
 	struct list_head unbound_list;
 
diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
index 8ceecb026910..a76d6c95c824 100644
--- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
+++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
@@ -462,9 +462,13 @@ i915_gem_shrinker_vmap(struct notifier_block *nb, unsigned long event, void *ptr
 
 	/* We also want to clear any cached iomaps as they wrap vmap */
 	list_for_each_entry_safe(vma, next,
-				 &i915->ggtt.vm.inactive_list, vm_link) {
+				 &i915->ggtt.vm.bound_list, vm_link) {
 		unsigned long count = vma->node.size >> PAGE_SHIFT;
-		if (vma->iomap && i915_vma_unbind(vma) == 0)
+
+		if (!vma->iomap || i915_vma_is_active(vma))
+			continue;
+
+		if (i915_vma_unbind(vma) == 0)
 			freed_pages += count;
 	}
 
diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c b/drivers/gpu/drm/i915/i915_gem_stolen.c
index 9df615eea2d8..a9e365789686 100644
--- a/drivers/gpu/drm/i915/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
@@ -701,7 +701,8 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_i915_private *dev_priv
 	vma->pages = obj->mm.pages;
 	vma->flags |= I915_VMA_GLOBAL_BIND;
 	__i915_vma_set_map_and_fenceable(vma);
-	list_move_tail(&vma->vm_link, &ggtt->vm.inactive_list);
+
+	list_move_tail(&vma->vm_link, &ggtt->vm.bound_list);
 
 	spin_lock(&dev_priv->mm.obj_lock);
 	list_move_tail(&obj->mm.link, &dev_priv->mm.bound_list);
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 4eef0462489c..96d1d634a29d 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1121,7 +1121,7 @@ static void capture_bo(struct drm_i915_error_buffer *err,
 
 static u32 capture_error_bo(struct drm_i915_error_buffer *err,
 			    int count, struct list_head *head,
-			    bool pinned_only)
+			    bool active_only, bool pinned_only)
 {
 	struct i915_vma *vma;
 	int i = 0;
@@ -1130,6 +1130,9 @@ static u32 capture_error_bo(struct drm_i915_error_buffer *err,
 		if (!vma->obj)
 			continue;
 
+		if (active_only && !i915_vma_is_active(vma))
+			continue;
+
 		if (pinned_only && !i915_vma_is_pinned(vma))
 			continue;
 
@@ -1601,14 +1604,16 @@ static void gem_capture_vm(struct i915_gpu_state *error,
 	int count;
 
 	count = 0;
-	list_for_each_entry(vma, &vm->active_list, vm_link)
-		count++;
+	list_for_each_entry(vma, &vm->bound_list, vm_link)
+		if (i915_vma_is_active(vma))
+			count++;
 
 	active_bo = NULL;
 	if (count)
 		active_bo = kcalloc(count, sizeof(*active_bo), GFP_ATOMIC);
 	if (active_bo)
-		count = capture_error_bo(active_bo, count, &vm->active_list, false);
+		count = capture_error_bo(active_bo, count, &vm->bound_list,
+					 true, false);
 	else
 		count = 0;
 
@@ -1646,28 +1651,20 @@ static void capture_pinned_buffers(struct i915_gpu_state *error)
 	struct i915_address_space *vm = &error->i915->ggtt.vm;
 	struct drm_i915_error_buffer *bo;
 	struct i915_vma *vma;
-	int count_inactive, count_active;
-
-	count_inactive = 0;
-	list_for_each_entry(vma, &vm->inactive_list, vm_link)
-		count_inactive++;
+	int count;
 
-	count_active = 0;
-	list_for_each_entry(vma, &vm->active_list, vm_link)
-		count_active++;
+	count = 0;
+	list_for_each_entry(vma, &vm->bound_list, vm_link)
+		count++;
 
 	bo = NULL;
-	if (count_inactive + count_active)
-		bo = kcalloc(count_inactive + count_active,
-			     sizeof(*bo), GFP_ATOMIC);
+	if (count)
+		bo = kcalloc(count, sizeof(*bo), GFP_ATOMIC);
 	if (!bo)
 		return;
 
-	count_inactive = capture_error_bo(bo, count_inactive,
-					  &vm->active_list, true);
-	count_active = capture_error_bo(bo + count_inactive, count_active,
-					&vm->inactive_list, true);
-	error->pinned_bo_count = count_inactive + count_active;
+	error->pinned_bo_count =
+		capture_error_bo(bo, count, &vm->bound_list, false, true);
 	error->pinned_bo = bo;
 }
 
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 5b4d78cdb4ca..7de28baffb8f 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -79,9 +79,6 @@ __i915_vma_retire(struct i915_vma *vma, struct i915_request *rq)
 	if (--vma->active_count)
 		return;
 
-	GEM_BUG_ON(!drm_mm_node_allocated(&vma->node));
-	list_move_tail(&vma->vm_link, &vma->vm->inactive_list);
-
 	GEM_BUG_ON(!i915_gem_object_is_active(obj));
 	if (--obj->active_count)
 		return;
@@ -659,7 +656,7 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
 	GEM_BUG_ON(!drm_mm_node_allocated(&vma->node));
 	GEM_BUG_ON(!i915_gem_valid_gtt_space(vma, cache_level));
 
-	list_move_tail(&vma->vm_link, &vma->vm->inactive_list);
+	list_move_tail(&vma->vm_link, &vma->vm->bound_list);
 
 	if (vma->obj) {
 		struct drm_i915_gem_object *obj = vma->obj;
@@ -1003,10 +1000,8 @@ int i915_vma_move_to_active(struct i915_vma *vma,
 	 * add the active reference first and queue for it to be dropped
 	 * *last*.
 	 */
-	if (!i915_gem_active_isset(active) && !vma->active_count++) {
-		list_move_tail(&vma->vm_link, &vma->vm->active_list);
+	if (!i915_gem_active_isset(active) && !vma->active_count++)
 		obj->active_count++;
-	}
 	i915_gem_active_set(active, rq);
 	GEM_BUG_ON(!i915_vma_is_active(vma));
 	GEM_BUG_ON(!obj->active_count);
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
index d0553bc69705..af9b85cb8639 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
@@ -84,7 +84,7 @@ static int populate_ggtt(struct drm_i915_private *i915,
 		return -EINVAL;
 	}
 
-	if (list_empty(&i915->ggtt.vm.inactive_list)) {
+	if (list_empty(&i915->ggtt.vm.bound_list)) {
 		pr_err("No objects on the GGTT inactive list!\n");
 		return -EINVAL;
 	}
@@ -96,7 +96,7 @@ static void unpin_ggtt(struct drm_i915_private *i915)
 {
 	struct i915_vma *vma;
 
-	list_for_each_entry(vma, &i915->ggtt.vm.inactive_list, vm_link)
+	list_for_each_entry(vma, &i915->ggtt.vm.bound_list, vm_link)
 		if (vma->obj->mm.quirked)
 			i915_vma_unpin(vma);
 }
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
index 06bde4a273cb..8feb4af308ff 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
@@ -1237,7 +1237,7 @@ static void track_vma_bind(struct i915_vma *vma)
 	__i915_gem_object_pin_pages(obj);
 
 	vma->pages = obj->mm.pages;
-	list_move_tail(&vma->vm_link, &vma->vm->inactive_list);
+	list_move_tail(&vma->vm_link, &vma->vm->bound_list);
 }
 
 static int exercise_mock(struct drm_i915_private *i915,
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH 14/34] drm/i915: Pull VM lists under the VM mutex.
  2019-01-21 22:20 HWSP for HW semaphores Chris Wilson
                   ` (12 preceding siblings ...)
  2019-01-21 22:20 ` [PATCH 13/34] drm/i915: Stop tracking MRU activity on VMA Chris Wilson
@ 2019-01-21 22:20 ` Chris Wilson
  2019-01-22  9:09   ` Tvrtko Ursulin
  2019-01-21 22:20 ` [PATCH 15/34] drm/i915: Move vma lookup to its own lock Chris Wilson
                   ` (26 subsequent siblings)
  40 siblings, 1 reply; 89+ messages in thread
From: Chris Wilson @ 2019-01-21 22:20 UTC (permalink / raw)
  To: intel-gfx

A starting point to counter the pervasive struct_mutex. For the goal of
avoiding (or at least not blocking under!) global locks during user
request submission, a simple but important step is being able to manage
each client's GTT separately. To that end, we want to stop using the
struct_mutex as the guard for all things GTT/VM and switch instead to a
specific mutex inside i915_address_space.
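
Where a walk of one of those lists has to call something that may sleep
or recurse into the VM (e.g. i915_vma_unbind()), the lock is dropped and
reacquired around the call. Roughly the shape used in the shrinker and
GTT-restore hunks below:

	mutex_lock(&i915->ggtt.vm.mutex);
	list_for_each_entry_safe(vma, next,
				 &i915->ggtt.vm.bound_list, vm_link) {
		if (!vma->iomap || i915_vma_is_active(vma))
			continue;

		/* unbind may sleep, so drop the vm mutex around it */
		mutex_unlock(&i915->ggtt.vm.mutex);
		i915_vma_unbind(vma);
		mutex_lock(&i915->ggtt.vm.mutex);
	}
	mutex_unlock(&i915->ggtt.vm.mutex);

At this point in the series the walks still run under struct_mutex,
which is what keeps the list stable across the unlock; vm->mutex only
guards the list manipulation itself.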

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c                 | 14 ++++++++------
 drivers/gpu/drm/i915/i915_gem_evict.c           |  2 ++
 drivers/gpu/drm/i915/i915_gem_gtt.c             | 15 +++++++++++++--
 drivers/gpu/drm/i915/i915_gem_shrinker.c        |  4 ++++
 drivers/gpu/drm/i915/i915_gem_stolen.c          |  2 ++
 drivers/gpu/drm/i915/i915_vma.c                 | 11 +++++++++++
 drivers/gpu/drm/i915/selftests/i915_gem_evict.c |  3 +++
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c   |  3 +++
 8 files changed, 46 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index f45186ddb236..538fa5404603 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -245,18 +245,19 @@ int
 i915_gem_get_aperture_ioctl(struct drm_device *dev, void *data,
 			    struct drm_file *file)
 {
-	struct drm_i915_private *dev_priv = to_i915(dev);
-	struct i915_ggtt *ggtt = &dev_priv->ggtt;
+	struct i915_ggtt *ggtt = &to_i915(dev)->ggtt;
 	struct drm_i915_gem_get_aperture *args = data;
 	struct i915_vma *vma;
 	u64 pinned;
 
+	mutex_lock(&ggtt->vm.mutex);
+
 	pinned = ggtt->vm.reserved;
-	mutex_lock(&dev->struct_mutex);
 	list_for_each_entry(vma, &ggtt->vm.bound_list, vm_link)
 		if (i915_vma_is_pinned(vma))
 			pinned += vma->node.size;
-	mutex_unlock(&dev->struct_mutex);
+
+	mutex_unlock(&ggtt->vm.mutex);
 
 	args->aper_size = ggtt->vm.total;
 	args->aper_available_size = args->aper_size - pinned;
@@ -1529,20 +1530,21 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
 
 static void i915_gem_object_bump_inactive_ggtt(struct drm_i915_gem_object *obj)
 {
-	struct drm_i915_private *i915;
+	struct drm_i915_private *i915 = to_i915(obj->base.dev);
 	struct list_head *list;
 	struct i915_vma *vma;
 
 	GEM_BUG_ON(!i915_gem_object_has_pinned_pages(obj));
 
+	mutex_lock(&i915->ggtt.vm.mutex);
 	for_each_ggtt_vma(vma, obj) {
 		if (!drm_mm_node_allocated(&vma->node))
 			continue;
 
 		list_move_tail(&vma->vm_link, &vma->vm->bound_list);
 	}
+	mutex_unlock(&i915->ggtt.vm.mutex);
 
-	i915 = to_i915(obj->base.dev);
 	spin_lock(&i915->mm.obj_lock);
 	list = obj->bind_count ? &i915->mm.bound_list : &i915->mm.unbound_list;
 	list_move_tail(&obj->mm.link, list);
diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
index 5cfe4b75e7d6..dc137701acb8 100644
--- a/drivers/gpu/drm/i915/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/i915_gem_evict.c
@@ -432,6 +432,7 @@ int i915_gem_evict_vm(struct i915_address_space *vm)
 	}
 
 	INIT_LIST_HEAD(&eviction_list);
+	mutex_lock(&vm->mutex);
 	list_for_each_entry(vma, &vm->bound_list, vm_link) {
 		if (i915_vma_is_pinned(vma))
 			continue;
@@ -439,6 +440,7 @@ int i915_gem_evict_vm(struct i915_address_space *vm)
 		__i915_vma_pin(vma);
 		list_add(&vma->evict_link, &eviction_list);
 	}
+	mutex_unlock(&vm->mutex);
 
 	ret = 0;
 	list_for_each_entry_safe(vma, next, &eviction_list, evict_link) {
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 2ad9070a54c1..49b00996a15e 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1931,7 +1931,10 @@ static struct i915_vma *pd_vma_create(struct gen6_hw_ppgtt *ppgtt, int size)
 	vma->ggtt_view.type = I915_GGTT_VIEW_ROTATED; /* prevent fencing */
 
 	INIT_LIST_HEAD(&vma->obj_link);
+
+	mutex_lock(&vma->vm->mutex);
 	list_add(&vma->vm_link, &vma->vm->unbound_list);
+	mutex_unlock(&vma->vm->mutex);
 
 	return vma;
 }
@@ -3504,9 +3507,10 @@ void i915_gem_restore_gtt_mappings(struct drm_i915_private *dev_priv)
 
 	i915_check_and_clear_faults(dev_priv);
 
+	mutex_lock(&ggtt->vm.mutex);
+
 	/* First fill our portion of the GTT with scratch pages */
 	ggtt->vm.clear_range(&ggtt->vm, 0, ggtt->vm.total);
-
 	ggtt->vm.closed = true; /* skip rewriting PTE on VMA unbind */
 
 	/* clflush objects bound into the GGTT and rebind them. */
@@ -3516,19 +3520,26 @@ void i915_gem_restore_gtt_mappings(struct drm_i915_private *dev_priv)
 		if (!(vma->flags & I915_VMA_GLOBAL_BIND))
 			continue;
 
+		mutex_unlock(&ggtt->vm.mutex);
+
 		if (!i915_vma_unbind(vma))
-			continue;
+			goto lock;
 
 		WARN_ON(i915_vma_bind(vma,
 				      obj ? obj->cache_level : 0,
 				      PIN_UPDATE));
 		if (obj)
 			WARN_ON(i915_gem_object_set_to_gtt_domain(obj, false));
+
+lock:
+		mutex_lock(&ggtt->vm.mutex);
 	}
 
 	ggtt->vm.closed = false;
 	i915_ggtt_invalidate(dev_priv);
 
+	mutex_unlock(&ggtt->vm.mutex);
+
 	if (INTEL_GEN(dev_priv) >= 8) {
 		struct intel_ppat *ppat = &dev_priv->ppat;
 
diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
index a76d6c95c824..6da795c7e62e 100644
--- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
+++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
@@ -461,6 +461,7 @@ i915_gem_shrinker_vmap(struct notifier_block *nb, unsigned long event, void *ptr
 					       I915_SHRINK_VMAPS);
 
 	/* We also want to clear any cached iomaps as they wrap vmap */
+	mutex_lock(&i915->ggtt.vm.mutex);
 	list_for_each_entry_safe(vma, next,
 				 &i915->ggtt.vm.bound_list, vm_link) {
 		unsigned long count = vma->node.size >> PAGE_SHIFT;
@@ -468,9 +469,12 @@ i915_gem_shrinker_vmap(struct notifier_block *nb, unsigned long event, void *ptr
 		if (!vma->iomap || i915_vma_is_active(vma))
 			continue;
 
+		mutex_unlock(&i915->ggtt.vm.mutex);
 		if (i915_vma_unbind(vma) == 0)
 			freed_pages += count;
+		mutex_lock(&i915->ggtt.vm.mutex);
 	}
+	mutex_unlock(&i915->ggtt.vm.mutex);
 
 out:
 	shrinker_unlock(i915, unlock);
diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c b/drivers/gpu/drm/i915/i915_gem_stolen.c
index a9e365789686..74a9661479ca 100644
--- a/drivers/gpu/drm/i915/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
@@ -702,7 +702,9 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_i915_private *dev_priv
 	vma->flags |= I915_VMA_GLOBAL_BIND;
 	__i915_vma_set_map_and_fenceable(vma);
 
+	mutex_lock(&ggtt->vm.mutex);
 	list_move_tail(&vma->vm_link, &ggtt->vm.bound_list);
+	mutex_unlock(&ggtt->vm.mutex);
 
 	spin_lock(&dev_priv->mm.obj_lock);
 	list_move_tail(&obj->mm.link, &dev_priv->mm.bound_list);
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 7de28baffb8f..dcbd0d345c72 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -213,7 +213,10 @@ vma_create(struct drm_i915_gem_object *obj,
 	}
 	rb_link_node(&vma->obj_node, rb, p);
 	rb_insert_color(&vma->obj_node, &obj->vma_tree);
+
+	mutex_lock(&vm->mutex);
 	list_add(&vma->vm_link, &vm->unbound_list);
+	mutex_unlock(&vm->mutex);
 
 	return vma;
 
@@ -656,7 +659,9 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
 	GEM_BUG_ON(!drm_mm_node_allocated(&vma->node));
 	GEM_BUG_ON(!i915_gem_valid_gtt_space(vma, cache_level));
 
+	mutex_lock(&vma->vm->mutex);
 	list_move_tail(&vma->vm_link, &vma->vm->bound_list);
+	mutex_unlock(&vma->vm->mutex);
 
 	if (vma->obj) {
 		struct drm_i915_gem_object *obj = vma->obj;
@@ -689,8 +694,10 @@ i915_vma_remove(struct i915_vma *vma)
 
 	vma->ops->clear_pages(vma);
 
+	mutex_lock(&vma->vm->mutex);
 	drm_mm_remove_node(&vma->node);
 	list_move_tail(&vma->vm_link, &vma->vm->unbound_list);
+	mutex_unlock(&vma->vm->mutex);
 
 	/*
 	 * Since the unbound list is global, only move to that list if
@@ -802,7 +809,11 @@ static void __i915_vma_destroy(struct i915_vma *vma)
 	GEM_BUG_ON(i915_gem_active_isset(&vma->last_fence));
 
 	list_del(&vma->obj_link);
+
+	mutex_lock(&vma->vm->mutex);
 	list_del(&vma->vm_link);
+	mutex_unlock(&vma->vm->mutex);
+
 	if (vma->obj)
 		rb_erase(&vma->obj_node, &vma->obj->vma_tree);
 
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
index af9b85cb8639..32dce7176f63 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
@@ -94,11 +94,14 @@ static int populate_ggtt(struct drm_i915_private *i915,
 
 static void unpin_ggtt(struct drm_i915_private *i915)
 {
+	struct i915_ggtt *ggtt = &i915->ggtt;
 	struct i915_vma *vma;
 
+	mutex_lock(&ggtt->vm.mutex);
 	list_for_each_entry(vma, &i915->ggtt.vm.bound_list, vm_link)
 		if (vma->obj->mm.quirked)
 			i915_vma_unpin(vma);
+	mutex_unlock(&ggtt->vm.mutex);
 }
 
 static void cleanup_objects(struct drm_i915_private *i915,
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
index 8feb4af308ff..3850ef4a5ec8 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
@@ -1237,7 +1237,10 @@ static void track_vma_bind(struct i915_vma *vma)
 	__i915_gem_object_pin_pages(obj);
 
 	vma->pages = obj->mm.pages;
+
+	mutex_lock(&vma->vm->mutex);
 	list_move_tail(&vma->vm_link, &vma->vm->bound_list);
+	mutex_unlock(&vma->vm->mutex);
 }
 
 static int exercise_mock(struct drm_i915_private *i915,
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH 15/34] drm/i915: Move vma lookup to its own lock
  2019-01-21 22:20 HWSP for HW semaphores Chris Wilson
                   ` (13 preceding siblings ...)
  2019-01-21 22:20 ` [PATCH 14/34] drm/i915: Pull VM lists under the VM mutex Chris Wilson
@ 2019-01-21 22:20 ` Chris Wilson
  2019-01-21 22:20 ` [PATCH 16/34] drm/i915: Always allocate an object/vma for the HWSP Chris Wilson
                   ` (25 subsequent siblings)
  40 siblings, 0 replies; 89+ messages in thread
From: Chris Wilson @ 2019-01-21 22:20 UTC (permalink / raw)
  To: intel-gfx

Remove the struct_mutex requirement for looking up the vma for an
object.

v2: Highlight how the race for duplicate vma creation is resolved on
reacquiring the lock with a short comment.
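
For the duplicate-creation race: the lookup itself runs under the new
obj->vma.lock spinlock, but allocation has to happen outside it, so the
tree is rechecked once the lock is retaken. A generic sketch of that
shape (vma_lookup()/vma_alloc()/vma_insert()/vma_free() here are
illustrative stand-ins, not necessarily the exact helpers in this
patch):

	spin_lock(&obj->vma.lock);
	vma = vma_lookup(obj, vm, view);
	spin_unlock(&obj->vma.lock);
	if (vma)
		return vma;

	new = vma_alloc(obj, vm, view); /* may sleep, no spinlock held */

	spin_lock(&obj->vma.lock);
	vma = vma_lookup(obj, vm, view); /* recheck: did someone beat us? */
	if (!vma) {
		vma_insert(obj, new); /* we won; publish our vma */
		vma = new;
		new = NULL;
	}
	spin_unlock(&obj->vma.lock);

	if (new)
		vma_free(new); /* lost the race; drop the duplicate */

	return vma;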

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c       |  6 +--
 drivers/gpu/drm/i915/i915_gem.c           | 33 +++++++-----
 drivers/gpu/drm/i915/i915_gem_object.h    | 45 +++++++++-------
 drivers/gpu/drm/i915/i915_vma.c           | 66 ++++++++++++++++-------
 drivers/gpu/drm/i915/i915_vma.h           |  2 +-
 drivers/gpu/drm/i915/selftests/i915_vma.c |  4 +-
 6 files changed, 98 insertions(+), 58 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 3ec369980d40..2a6e4044f25b 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -159,14 +159,14 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 		   obj->mm.madv == I915_MADV_DONTNEED ? " purgeable" : "");
 	if (obj->base.name)
 		seq_printf(m, " (name: %d)", obj->base.name);
-	list_for_each_entry(vma, &obj->vma_list, obj_link) {
+	list_for_each_entry(vma, &obj->vma.list, obj_link) {
 		if (i915_vma_is_pinned(vma))
 			pin_count++;
 	}
 	seq_printf(m, " (pinned x %d)", pin_count);
 	if (obj->pin_global)
 		seq_printf(m, " (global)");
-	list_for_each_entry(vma, &obj->vma_list, obj_link) {
+	list_for_each_entry(vma, &obj->vma.list, obj_link) {
 		if (!drm_mm_node_allocated(&vma->node))
 			continue;
 
@@ -322,7 +322,7 @@ static int per_file_stats(int id, void *ptr, void *data)
 	if (obj->base.name || obj->base.dma_buf)
 		stats->shared += obj->base.size;
 
-	list_for_each_entry(vma, &obj->vma_list, obj_link) {
+	list_for_each_entry(vma, &obj->vma.list, obj_link) {
 		if (!drm_mm_node_allocated(&vma->node))
 			continue;
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 538fa5404603..15acd052da46 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -437,15 +437,19 @@ int i915_gem_object_unbind(struct drm_i915_gem_object *obj)
 	if (ret)
 		return ret;
 
-	while ((vma = list_first_entry_or_null(&obj->vma_list,
-					       struct i915_vma,
-					       obj_link))) {
+	spin_lock(&obj->vma.lock);
+	while (!ret && (vma = list_first_entry_or_null(&obj->vma.list,
+						       struct i915_vma,
+						       obj_link))) {
 		list_move_tail(&vma->obj_link, &still_in_list);
+		spin_unlock(&obj->vma.lock);
+
 		ret = i915_vma_unbind(vma);
-		if (ret)
-			break;
+
+		spin_lock(&obj->vma.lock);
 	}
-	list_splice(&still_in_list, &obj->vma_list);
+	list_splice(&still_in_list, &obj->vma.list);
+	spin_unlock(&obj->vma.lock);
 
 	return ret;
 }
@@ -3489,7 +3493,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 	 * reading an invalid PTE on older architectures.
 	 */
 restart:
-	list_for_each_entry(vma, &obj->vma_list, obj_link) {
+	list_for_each_entry(vma, &obj->vma.list, obj_link) {
 		if (!drm_mm_node_allocated(&vma->node))
 			continue;
 
@@ -3567,7 +3571,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 			 */
 		}
 
-		list_for_each_entry(vma, &obj->vma_list, obj_link) {
+		list_for_each_entry(vma, &obj->vma.list, obj_link) {
 			if (!drm_mm_node_allocated(&vma->node))
 				continue;
 
@@ -3577,7 +3581,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 		}
 	}
 
-	list_for_each_entry(vma, &obj->vma_list, obj_link)
+	list_for_each_entry(vma, &obj->vma.list, obj_link)
 		vma->node.color = cache_level;
 	i915_gem_object_set_cache_coherency(obj, cache_level);
 	obj->cache_dirty = true; /* Always invalidate stale cachelines */
@@ -4153,7 +4157,9 @@ void i915_gem_object_init(struct drm_i915_gem_object *obj,
 {
 	mutex_init(&obj->mm.lock);
 
-	INIT_LIST_HEAD(&obj->vma_list);
+	spin_lock_init(&obj->vma.lock);
+	INIT_LIST_HEAD(&obj->vma.list);
+
 	INIT_LIST_HEAD(&obj->lut_list);
 	INIT_LIST_HEAD(&obj->batch_pool_link);
 
@@ -4319,14 +4325,13 @@ static void __i915_gem_free_objects(struct drm_i915_private *i915,
 		mutex_lock(&i915->drm.struct_mutex);
 
 		GEM_BUG_ON(i915_gem_object_is_active(obj));
-		list_for_each_entry_safe(vma, vn,
-					 &obj->vma_list, obj_link) {
+		list_for_each_entry_safe(vma, vn, &obj->vma.list, obj_link) {
 			GEM_BUG_ON(i915_vma_is_active(vma));
 			vma->flags &= ~I915_VMA_PIN_MASK;
 			i915_vma_destroy(vma);
 		}
-		GEM_BUG_ON(!list_empty(&obj->vma_list));
-		GEM_BUG_ON(!RB_EMPTY_ROOT(&obj->vma_tree));
+		GEM_BUG_ON(!list_empty(&obj->vma.list));
+		GEM_BUG_ON(!RB_EMPTY_ROOT(&obj->vma.tree));
 
 		/* This serializes freeing with the shrinker. Since the free
 		 * is delayed, first by RCU then by the workqueue, we want the
diff --git a/drivers/gpu/drm/i915/i915_gem_object.h b/drivers/gpu/drm/i915/i915_gem_object.h
index cb1b0144d274..5a33b6d9f942 100644
--- a/drivers/gpu/drm/i915/i915_gem_object.h
+++ b/drivers/gpu/drm/i915/i915_gem_object.h
@@ -87,24 +87,33 @@ struct drm_i915_gem_object {
 
 	const struct drm_i915_gem_object_ops *ops;
 
-	/**
-	 * @vma_list: List of VMAs backed by this object
-	 *
-	 * The VMA on this list are ordered by type, all GGTT vma are placed
-	 * at the head and all ppGTT vma are placed at the tail. The different
-	 * types of GGTT vma are unordered between themselves, use the
-	 * @vma_tree (which has a defined order between all VMA) to find an
-	 * exact match.
-	 */
-	struct list_head vma_list;
-	/**
-	 * @vma_tree: Ordered tree of VMAs backed by this object
-	 *
-	 * All VMA created for this object are placed in the @vma_tree for
-	 * fast retrieval via a binary search in i915_vma_instance().
-	 * They are also added to @vma_list for easy iteration.
-	 */
-	struct rb_root vma_tree;
+	struct {
+		/**
+		 * @vma.lock: protect the list/tree of vmas
+		 */
+		struct spinlock lock;
+
+		/**
+		 * @vma.list: List of VMAs backed by this object
+		 *
+		 * The VMA on this list are ordered by type, all GGTT vma are
+		 * placed at the head and all ppGTT vma are placed at the tail.
+		 * The different types of GGTT vma are unordered between
+		 * themselves, use the @vma.tree (which has a defined order
+		 * between all VMA) to quickly find an exact match.
+		 */
+		struct list_head list;
+
+		/**
+		 * @vma.tree: Ordered tree of VMAs backed by this object
+		 *
+		 * All VMA created for this object are placed in the @vma.tree
+		 * for fast retrieval via a binary search in
+		 * i915_vma_instance(). They are also added to @vma.list for
+		 * easy iteration.
+		 */
+		struct rb_root tree;
+	} vma;
 
 	/**
 	 * @lut_list: List of vma lookup entries in use for this object.
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index dcbd0d345c72..d83b8ad5f859 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -187,32 +187,52 @@ vma_create(struct drm_i915_gem_object *obj,
 								i915_gem_object_get_stride(obj));
 		GEM_BUG_ON(!is_power_of_2(vma->fence_alignment));
 
-		/*
-		 * We put the GGTT vma at the start of the vma-list, followed
-		 * by the ppGGTT vma. This allows us to break early when
-		 * iterating over only the GGTT vma for an object, see
-		 * for_each_ggtt_vma()
-		 */
 		vma->flags |= I915_VMA_GGTT;
-		list_add(&vma->obj_link, &obj->vma_list);
-	} else {
-		list_add_tail(&vma->obj_link, &obj->vma_list);
 	}
 
+	spin_lock(&obj->vma.lock);
+
 	rb = NULL;
-	p = &obj->vma_tree.rb_node;
+	p = &obj->vma.tree.rb_node;
 	while (*p) {
 		struct i915_vma *pos;
+		long cmp;
 
 		rb = *p;
 		pos = rb_entry(rb, struct i915_vma, obj_node);
-		if (i915_vma_compare(pos, vm, view) < 0)
+
+		/*
+		 * If the view already exists in the tree, another thread
+		 * already created a matching vma, so return the older instance
+		 * and dispose of ours.
+		 */
+		cmp = i915_vma_compare(pos, vm, view);
+		if (cmp == 0) {
+			spin_unlock(&obj->vma.lock);
+			kmem_cache_free(vm->i915->vmas, vma);
+			return pos;
+		}
+
+		if (cmp < 0)
 			p = &rb->rb_right;
 		else
 			p = &rb->rb_left;
 	}
 	rb_link_node(&vma->obj_node, rb, p);
-	rb_insert_color(&vma->obj_node, &obj->vma_tree);
+	rb_insert_color(&vma->obj_node, &obj->vma.tree);
+
+	if (i915_vma_is_ggtt(vma))
+		/*
+		 * We put the GGTT vma at the start of the vma-list, followed
+		 * by the ppGGTT vma. This allows us to break early when
+		 * iterating over only the GGTT vma for an object, see
+		 * for_each_ggtt_vma()
+		 */
+		list_add(&vma->obj_link, &obj->vma.list);
+	else
+		list_add_tail(&vma->obj_link, &obj->vma.list);
+
+	spin_unlock(&obj->vma.lock);
 
 	mutex_lock(&vm->mutex);
 	list_add(&vma->vm_link, &vm->unbound_list);
@@ -232,7 +252,7 @@ vma_lookup(struct drm_i915_gem_object *obj,
 {
 	struct rb_node *rb;
 
-	rb = obj->vma_tree.rb_node;
+	rb = obj->vma.tree.rb_node;
 	while (rb) {
 		struct i915_vma *vma = rb_entry(rb, struct i915_vma, obj_node);
 		long cmp;
@@ -272,16 +292,18 @@ i915_vma_instance(struct drm_i915_gem_object *obj,
 {
 	struct i915_vma *vma;
 
-	lockdep_assert_held(&obj->base.dev->struct_mutex);
 	GEM_BUG_ON(view && !i915_is_ggtt(vm));
 	GEM_BUG_ON(vm->closed);
 
+	spin_lock(&obj->vma.lock);
 	vma = vma_lookup(obj, vm, view);
-	if (!vma)
+	spin_unlock(&obj->vma.lock);
+
+	/* vma_create() will resolve the race if another creates the vma */
+	if (unlikely(!vma))
 		vma = vma_create(obj, vm, view);
 
 	GEM_BUG_ON(!IS_ERR(vma) && i915_vma_compare(vma, vm, view));
-	GEM_BUG_ON(!IS_ERR(vma) && vma_lookup(obj, vm, view) != vma);
 	return vma;
 }
 
@@ -808,14 +830,18 @@ static void __i915_vma_destroy(struct i915_vma *vma)
 
 	GEM_BUG_ON(i915_gem_active_isset(&vma->last_fence));
 
-	list_del(&vma->obj_link);
-
 	mutex_lock(&vma->vm->mutex);
 	list_del(&vma->vm_link);
 	mutex_unlock(&vma->vm->mutex);
 
-	if (vma->obj)
-		rb_erase(&vma->obj_node, &vma->obj->vma_tree);
+	if (vma->obj) {
+		struct drm_i915_gem_object *obj = vma->obj;
+
+		spin_lock(&obj->vma.lock);
+		list_del(&vma->obj_link);
+		rb_erase(&vma->obj_node, &vma->obj->vma.tree);
+		spin_unlock(&obj->vma.lock);
+	}
 
 	rbtree_postorder_for_each_entry_safe(iter, n, &vma->active, node) {
 		GEM_BUG_ON(i915_gem_active_isset(&iter->base));
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index 4f7c1c7599f4..7252abc73d3e 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -425,7 +425,7 @@ void i915_vma_parked(struct drm_i915_private *i915);
  * or the list is empty ofc.
  */
 #define for_each_ggtt_vma(V, OBJ) \
-	list_for_each_entry(V, &(OBJ)->vma_list, obj_link)		\
+	list_for_each_entry(V, &(OBJ)->vma.list, obj_link)		\
 		for_each_until(!i915_vma_is_ggtt(V))
 
 #endif
diff --git a/drivers/gpu/drm/i915/selftests/i915_vma.c b/drivers/gpu/drm/i915/selftests/i915_vma.c
index f0a32edfb9b1..cf1de82741fa 100644
--- a/drivers/gpu/drm/i915/selftests/i915_vma.c
+++ b/drivers/gpu/drm/i915/selftests/i915_vma.c
@@ -672,7 +672,7 @@ static int igt_vma_partial(void *arg)
 		}
 
 		count = 0;
-		list_for_each_entry(vma, &obj->vma_list, obj_link)
+		list_for_each_entry(vma, &obj->vma.list, obj_link)
 			count++;
 		if (count != nvma) {
 			pr_err("(%s) All partial vma were not recorded on the obj->vma_list: found %u, expected %u\n",
@@ -701,7 +701,7 @@ static int igt_vma_partial(void *arg)
 		i915_vma_unpin(vma);
 
 		count = 0;
-		list_for_each_entry(vma, &obj->vma_list, obj_link)
+		list_for_each_entry(vma, &obj->vma.list, obj_link)
 			count++;
 		if (count != nvma) {
 			pr_err("(%s) allocated an extra full vma!\n", p->name);
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH 16/34] drm/i915: Always allocate an object/vma for the HWSP
  2019-01-21 22:20 HWSP for HW semaphores Chris Wilson
                   ` (14 preceding siblings ...)
  2019-01-21 22:20 ` [PATCH 15/34] drm/i915: Move vma lookup to its own lock Chris Wilson
@ 2019-01-21 22:20 ` Chris Wilson
  2019-01-21 22:21 ` [PATCH 17/34] drm/i915: Move list of timelines under its own lock Chris Wilson
                   ` (24 subsequent siblings)
  40 siblings, 0 replies; 89+ messages in thread
From: Chris Wilson @ 2019-01-21 22:20 UTC (permalink / raw)
  To: intel-gfx; +Cc: Matthew Auld

Currently we only allocate an object and vma when using a GGTT virtual
HWSP; for a physical HWSP we allocate just a plain struct page. For
convenience later on with global timelines, it will be useful to always
have the status page tracked by a struct i915_vma. Make it so.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
---
 drivers/gpu/drm/i915/intel_engine_cs.c       | 109 ++++++++++---------
 drivers/gpu/drm/i915/intel_guc_submission.c  |   6 +
 drivers/gpu/drm/i915/intel_lrc.c             |  12 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c      |  21 +++-
 drivers/gpu/drm/i915/intel_ringbuffer.h      |  23 +---
 drivers/gpu/drm/i915/selftests/mock_engine.c |   2 +-
 6 files changed, 93 insertions(+), 80 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index fc52737751e7..4b4b7358c482 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -506,27 +506,61 @@ void intel_engine_setup_common(struct intel_engine_cs *engine)
 
 static void cleanup_status_page(struct intel_engine_cs *engine)
 {
+	struct i915_vma *vma;
+
 	/* Prevent writes into HWSP after returning the page to the system */
 	intel_engine_set_hwsp_writemask(engine, ~0u);
 
-	if (HWS_NEEDS_PHYSICAL(engine->i915)) {
-		void *addr = fetch_and_zero(&engine->status_page.page_addr);
+	vma = fetch_and_zero(&engine->status_page.vma);
+	if (!vma)
+		return;
 
-		__free_page(virt_to_page(addr));
-	}
+	if (!HWS_NEEDS_PHYSICAL(engine->i915))
+		i915_vma_unpin(vma);
+
+	i915_gem_object_unpin_map(vma->obj);
+	__i915_gem_object_release_unless_active(vma->obj);
+}
+
+static int pin_ggtt_status_page(struct intel_engine_cs *engine,
+				struct i915_vma *vma)
+{
+	unsigned int flags;
+
+	flags = PIN_GLOBAL;
+	if (!HAS_LLC(engine->i915))
+		/*
+		 * On g33, we cannot place HWS above 256MiB, so
+		 * restrict its pinning to the low mappable arena.
+		 * Though this restriction is not documented for
+		 * gen4, gen5, or byt, they also behave similarly
+		 * and hang if the HWS is placed at the top of the
+		 * GTT. To generalise, it appears that all !llc
+		 * platforms have issues with us placing the HWS
+		 * above the mappable region (even though we never
+		 * actually map it).
+		 */
+		flags |= PIN_MAPPABLE;
+	else
+		flags |= PIN_HIGH;
 
-	i915_vma_unpin_and_release(&engine->status_page.vma,
-				   I915_VMA_RELEASE_MAP);
+	return i915_vma_pin(vma, 0, 0, flags);
 }
 
 static int init_status_page(struct intel_engine_cs *engine)
 {
 	struct drm_i915_gem_object *obj;
 	struct i915_vma *vma;
-	unsigned int flags;
 	void *vaddr;
 	int ret;
 
+	/*
+	 * Though the HWS register does support 36bit addresses, historically
+	 * we have had hangs and corruption reported due to wild writes if
+	 * the HWS is placed above 4G. We only allow objects to be allocated
+	 * in GFP_DMA32 for i965, and no earlier physical address users had
+	 * access to more than 4G.
+	 */
 	obj = i915_gem_object_create_internal(engine->i915, PAGE_SIZE);
 	if (IS_ERR(obj)) {
 		DRM_ERROR("Failed to allocate status page\n");
@@ -543,61 +577,30 @@ static int init_status_page(struct intel_engine_cs *engine)
 		goto err;
 	}
 
-	flags = PIN_GLOBAL;
-	if (!HAS_LLC(engine->i915))
-		/* On g33, we cannot place HWS above 256MiB, so
-		 * restrict its pinning to the low mappable arena.
-		 * Though this restriction is not documented for
-		 * gen4, gen5, or byt, they also behave similarly
-		 * and hang if the HWS is placed at the top of the
-		 * GTT. To generalise, it appears that all !llc
-		 * platforms have issues with us placing the HWS
-		 * above the mappable region (even though we never
-		 * actually map it).
-		 */
-		flags |= PIN_MAPPABLE;
-	else
-		flags |= PIN_HIGH;
-	ret = i915_vma_pin(vma, 0, 0, flags);
-	if (ret)
-		goto err;
-
 	vaddr = i915_gem_object_pin_map(obj, I915_MAP_WB);
 	if (IS_ERR(vaddr)) {
 		ret = PTR_ERR(vaddr);
-		goto err_unpin;
+		goto err;
 	}
 
+	engine->status_page.addr = memset(vaddr, 0, PAGE_SIZE);
 	engine->status_page.vma = vma;
-	engine->status_page.ggtt_offset = i915_ggtt_offset(vma);
-	engine->status_page.page_addr = memset(vaddr, 0, PAGE_SIZE);
+
+	if (!HWS_NEEDS_PHYSICAL(engine->i915)) {
+		ret = pin_ggtt_status_page(engine, vma);
+		if (ret)
+			goto err_unpin;
+	}
+
 	return 0;
 
 err_unpin:
-	i915_vma_unpin(vma);
+	i915_gem_object_unpin_map(obj);
 err:
 	i915_gem_object_put(obj);
 	return ret;
 }
 
-static int init_phys_status_page(struct intel_engine_cs *engine)
-{
-	struct page *page;
-
-	/*
-	 * Though the HWS register does support 36bit addresses, historically
-	 * we have had hangs and corruption reported due to wild writes if
-	 * the HWS is placed above 4G.
-	 */
-	page = alloc_page(GFP_KERNEL | __GFP_DMA32 | __GFP_ZERO);
-	if (!page)
-		return -ENOMEM;
-
-	engine->status_page.page_addr = page_address(page);
-
-	return 0;
-}
-
 static void __intel_context_unpin(struct i915_gem_context *ctx,
 				  struct intel_engine_cs *engine)
 {
@@ -650,10 +653,7 @@ int intel_engine_init_common(struct intel_engine_cs *engine)
 	if (ret)
 		goto err_unpin_preempt;
 
-	if (HWS_NEEDS_PHYSICAL(i915))
-		ret = init_phys_status_page(engine);
-	else
-		ret = init_status_page(engine);
+	ret = init_status_page(engine);
 	if (ret)
 		goto err_breadcrumbs;
 
@@ -1318,7 +1318,8 @@ static void intel_engine_print_registers(const struct intel_engine_cs *engine,
 	}
 
 	if (HAS_EXECLISTS(dev_priv)) {
-		const u32 *hws = &engine->status_page.page_addr[I915_HWS_CSB_BUF0_INDEX];
+		const u32 *hws =
+			&engine->status_page.addr[I915_HWS_CSB_BUF0_INDEX];
 		unsigned int idx;
 		u8 read, write;
 
@@ -1501,7 +1502,7 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 	spin_unlock_irqrestore(&b->rb_lock, flags);
 
 	drm_printf(m, "HWSP:\n");
-	hexdump(m, engine->status_page.page_addr, PAGE_SIZE);
+	hexdump(m, engine->status_page.addr, PAGE_SIZE);
 
 	drm_printf(m, "Idle? %s\n", yesno(intel_engine_is_idle(engine)));
 }
diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
index 7217c7e3ee8d..9a860aa5d276 100644
--- a/drivers/gpu/drm/i915/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/intel_guc_submission.c
@@ -81,6 +81,12 @@
  *
  */
 
+static inline u32 intel_hws_preempt_done_address(struct intel_engine_cs *engine)
+{
+	return (i915_ggtt_offset(engine->status_page.vma) +
+		I915_GEM_HWS_PREEMPT_ADDR);
+}
+
 static inline struct i915_priolist *to_priolist(struct rb_node *rb)
 {
 	return rb_entry(rb, struct i915_priolist, node);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index c11cbf34258d..bc65d8006e16 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -172,6 +172,12 @@ static void execlists_init_reg_state(u32 *reg_state,
 				     struct intel_engine_cs *engine,
 				     struct intel_ring *ring);
 
+static inline u32 intel_hws_seqno_address(struct intel_engine_cs *engine)
+{
+	return (i915_ggtt_offset(engine->status_page.vma) +
+		I915_GEM_HWS_INDEX_ADDR);
+}
+
 static inline struct i915_priolist *to_priolist(struct rb_node *rb)
 {
 	return rb_entry(rb, struct i915_priolist, node);
@@ -1710,7 +1716,7 @@ static void enable_execlists(struct intel_engine_cs *engine)
 		   _MASKED_BIT_DISABLE(STOP_RING));
 
 	I915_WRITE(RING_HWS_PGA(engine->mmio_base),
-		   engine->status_page.ggtt_offset);
+		   i915_ggtt_offset(engine->status_page.vma));
 	POSTING_READ(RING_HWS_PGA(engine->mmio_base));
 }
 
@@ -2257,10 +2263,10 @@ static int logical_ring_init(struct intel_engine_cs *engine)
 	}
 
 	execlists->csb_status =
-		&engine->status_page.page_addr[I915_HWS_CSB_BUF0_INDEX];
+		&engine->status_page.addr[I915_HWS_CSB_BUF0_INDEX];
 
 	execlists->csb_write =
-		&engine->status_page.page_addr[intel_hws_csb_write_index(i915)];
+		&engine->status_page.addr[intel_hws_csb_write_index(i915)];
 
 	reset_csb_pointers(execlists);
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 662907e1a286..66dc8e2fa353 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -43,6 +43,12 @@
  */
 #define LEGACY_REQUEST_SIZE 200
 
+static inline u32 intel_hws_seqno_address(struct intel_engine_cs *engine)
+{
+	return (i915_ggtt_offset(engine->status_page.vma) +
+		I915_GEM_HWS_INDEX_ADDR);
+}
+
 static unsigned int __intel_ring_space(unsigned int head,
 				       unsigned int tail,
 				       unsigned int size)
@@ -499,12 +505,17 @@ static void set_hws_pga(struct intel_engine_cs *engine, phys_addr_t phys)
 	I915_WRITE(HWS_PGA, addr);
 }
 
-static void ring_setup_phys_status_page(struct intel_engine_cs *engine)
+static struct page *status_page(struct intel_engine_cs *engine)
 {
-	struct page *page = virt_to_page(engine->status_page.page_addr);
-	phys_addr_t phys = PFN_PHYS(page_to_pfn(page));
+	struct drm_i915_gem_object *obj = engine->status_page.vma->obj;
 
-	set_hws_pga(engine, phys);
+	GEM_BUG_ON(!i915_gem_object_has_pinned_pages(obj));
+	return sg_page(obj->mm.pages->sgl);
+}
+
+static void ring_setup_phys_status_page(struct intel_engine_cs *engine)
+{
+	set_hws_pga(engine, PFN_PHYS(page_to_pfn(status_page(engine))));
 	set_hwstam(engine, ~0u);
 }
 
@@ -571,7 +582,7 @@ static void flush_cs_tlb(struct intel_engine_cs *engine)
 
 static void ring_setup_status_page(struct intel_engine_cs *engine)
 {
-	set_hwsp(engine, engine->status_page.ggtt_offset);
+	set_hwsp(engine, i915_ggtt_offset(engine->status_page.vma));
 	set_hwstam(engine, ~0u);
 
 	flush_cs_tlb(engine);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 32ed44196c1a..f43d0bb451a9 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -32,8 +32,7 @@ struct i915_sched_attr;
 
 struct intel_hw_status_page {
 	struct i915_vma *vma;
-	u32 *page_addr;
-	u32 ggtt_offset;
+	u32 *addr;
 };
 
 #define I915_READ_TAIL(engine) I915_READ(RING_TAIL((engine)->mmio_base))
@@ -671,7 +670,7 @@ static inline u32
 intel_read_status_page(const struct intel_engine_cs *engine, int reg)
 {
 	/* Ensure that the compiler doesn't optimize away the load. */
-	return READ_ONCE(engine->status_page.page_addr[reg]);
+	return READ_ONCE(engine->status_page.addr[reg]);
 }
 
 static inline void
@@ -684,12 +683,12 @@ intel_write_status_page(struct intel_engine_cs *engine, int reg, u32 value)
 	 */
 	if (static_cpu_has(X86_FEATURE_CLFLUSH)) {
 		mb();
-		clflush(&engine->status_page.page_addr[reg]);
-		engine->status_page.page_addr[reg] = value;
-		clflush(&engine->status_page.page_addr[reg]);
+		clflush(&engine->status_page.addr[reg]);
+		engine->status_page.addr[reg] = value;
+		clflush(&engine->status_page.addr[reg]);
 		mb();
 	} else {
-		WRITE_ONCE(engine->status_page.page_addr[reg], value);
+		WRITE_ONCE(engine->status_page.addr[reg], value);
 	}
 }
 
@@ -877,16 +876,6 @@ static inline bool intel_engine_has_started(struct intel_engine_cs *engine,
 void intel_engine_get_instdone(struct intel_engine_cs *engine,
 			       struct intel_instdone *instdone);
 
-static inline u32 intel_hws_seqno_address(struct intel_engine_cs *engine)
-{
-	return engine->status_page.ggtt_offset + I915_GEM_HWS_INDEX_ADDR;
-}
-
-static inline u32 intel_hws_preempt_done_address(struct intel_engine_cs *engine)
-{
-	return engine->status_page.ggtt_offset + I915_GEM_HWS_PREEMPT_ADDR;
-}
-
 /* intel_breadcrumbs.c -- user interrupt bottom-half for waiters */
 int intel_engine_init_breadcrumbs(struct intel_engine_cs *engine);
 
diff --git a/drivers/gpu/drm/i915/selftests/mock_engine.c b/drivers/gpu/drm/i915/selftests/mock_engine.c
index 8b8d51af7d6a..968a7e139a67 100644
--- a/drivers/gpu/drm/i915/selftests/mock_engine.c
+++ b/drivers/gpu/drm/i915/selftests/mock_engine.c
@@ -201,7 +201,7 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
 	engine->base.i915 = i915;
 	snprintf(engine->base.name, sizeof(engine->base.name), "%s", name);
 	engine->base.id = id;
-	engine->base.status_page.page_addr = (void *)(engine + 1);
+	engine->base.status_page.addr = (void *)(engine + 1);
 
 	engine->base.context_pin = mock_context_pin;
 	engine->base.request_alloc = mock_request_alloc;
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH 17/34] drm/i915: Move list of timelines under its own lock
  2019-01-21 22:20 HWSP for HW semaphores Chris Wilson
                   ` (15 preceding siblings ...)
  2019-01-21 22:20 ` [PATCH 16/34] drm/i915: Always allocate an object/vma for the HWSP Chris Wilson
@ 2019-01-21 22:21 ` Chris Wilson
  2019-01-21 22:21 ` [PATCH 18/34] drm/i915/selftests: Use common mock_engine::advance Chris Wilson
                   ` (23 subsequent siblings)
  40 siblings, 0 replies; 89+ messages in thread
From: Chris Wilson @ 2019-01-21 22:21 UTC (permalink / raw)
  To: intel-gfx

Currently, the list of timelines is serialised by the struct_mutex, but
to alleviate difficulties with using that mutex in future, move the
list management under its own dedicated mutex.
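
The new wait_for_timelines() below follows the usual idiom of dropping
the list lock around a blocking wait and restarting the walk afterwards;
a condensed, self-contained sketch of that idiom (the item type and its
completion/kref are hypothetical stand-ins for the timeline and its
last_request):

#include <linux/completion.h>
#include <linux/kref.h>
#include <linux/list.h>
#include <linux/mutex.h>
#include <linux/slab.h>

/* Hypothetical list element used purely for this sketch. */
struct item {
        struct list_head link;
        struct completion done;
        struct kref ref;
};

static void item_release(struct kref *ref)
{
        kfree(container_of(ref, struct item, ref));
}

static void wait_for_all(struct list_head *head, struct mutex *lock)
{
        struct item *it;

        mutex_lock(lock);
        list_for_each_entry(it, head, link) {
                if (completion_done(&it->done))
                        continue;

                kref_get(&it->ref);     /* keep it alive while unlocked */
                mutex_unlock(lock);

                wait_for_completion(&it->done); /* may sleep */
                kref_put(&it->ref, item_release);

                mutex_lock(lock);
                /* Restart from the head: our cursor may have been removed. */
                it = list_entry(head, typeof(*it), link);
        }
        mutex_unlock(lock);
}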

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h               |   5 +-
 drivers/gpu/drm/i915/i915_gem.c               | 103 ++++++++++--------
 drivers/gpu/drm/i915/i915_reset.c             |   8 +-
 drivers/gpu/drm/i915/i915_timeline.c          |  38 ++++++-
 drivers/gpu/drm/i915/i915_timeline.h          |   3 +
 .../gpu/drm/i915/selftests/mock_gem_device.c  |   7 +-
 .../gpu/drm/i915/selftests/mock_timeline.c    |   3 +-
 7 files changed, 109 insertions(+), 58 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 59a7e90113d7..364067f811f7 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1975,7 +1975,10 @@ struct drm_i915_private {
 		void (*resume)(struct drm_i915_private *);
 		void (*cleanup_engine)(struct intel_engine_cs *engine);
 
-		struct list_head timelines;
+		struct i915_gt_timelines {
+			struct mutex mutex; /* protects list, tainted by GPU */
+			struct list_head list;
+		} timelines;
 
 		struct list_head active_rings;
 		struct list_head closed_vma;
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 15acd052da46..761714448ff3 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3222,33 +3222,6 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 	return ret;
 }
 
-static long wait_for_timeline(struct i915_timeline *tl,
-			      unsigned int flags, long timeout)
-{
-	struct i915_request *rq;
-
-	rq = i915_gem_active_get_unlocked(&tl->last_request);
-	if (!rq)
-		return timeout;
-
-	/*
-	 * "Race-to-idle".
-	 *
-	 * Switching to the kernel context is often used a synchronous
-	 * step prior to idling, e.g. in suspend for flushing all
-	 * current operations to memory before sleeping. These we
-	 * want to complete as quickly as possible to avoid prolonged
-	 * stalls, so allow the gpu to boost to maximum clocks.
-	 */
-	if (flags & I915_WAIT_FOR_IDLE_BOOST)
-		gen6_rps_boost(rq, NULL);
-
-	timeout = i915_request_wait(rq, flags, timeout);
-	i915_request_put(rq);
-
-	return timeout;
-}
-
 static int wait_for_engines(struct drm_i915_private *i915)
 {
 	if (wait_for(intel_engines_are_idle(i915), I915_IDLE_ENGINES_TIMEOUT)) {
@@ -3262,6 +3235,52 @@ static int wait_for_engines(struct drm_i915_private *i915)
 	return 0;
 }
 
+static long
+wait_for_timelines(struct drm_i915_private *i915,
+		   unsigned int flags, long timeout)
+{
+	struct i915_gt_timelines *gt = &i915->gt.timelines;
+	struct i915_timeline *tl;
+
+	if (!READ_ONCE(i915->gt.active_requests))
+		return timeout;
+
+	mutex_lock(&gt->mutex);
+	list_for_each_entry(tl, &gt->list, link) {
+		struct i915_request *rq;
+
+		rq = i915_gem_active_get_unlocked(&tl->last_request);
+		if (!rq)
+			continue;
+
+		mutex_unlock(&gt->mutex);
+
+		/*
+		 * "Race-to-idle".
+		 *
+		 * Switching to the kernel context is often used a synchronous
+		 * step prior to idling, e.g. in suspend for flushing all
+		 * current operations to memory before sleeping. These we
+		 * want to complete as quickly as possible to avoid prolonged
+		 * stalls, so allow the gpu to boost to maximum clocks.
+		 */
+		if (flags & I915_WAIT_FOR_IDLE_BOOST)
+			gen6_rps_boost(rq, NULL);
+
+		timeout = i915_request_wait(rq, flags, timeout);
+		i915_request_put(rq);
+		if (timeout < 0)
+			return timeout;
+
+		/* restart after reacquiring the lock */
+		mutex_lock(&gt->mutex);
+		tl = list_entry(&gt->list, typeof(*tl), link);
+	}
+	mutex_unlock(&gt->mutex);
+
+	return timeout;
+}
+
 int i915_gem_wait_for_idle(struct drm_i915_private *i915,
 			   unsigned int flags, long timeout)
 {
@@ -3273,17 +3292,15 @@ int i915_gem_wait_for_idle(struct drm_i915_private *i915,
 	if (!READ_ONCE(i915->gt.awake))
 		return 0;
 
+	timeout = wait_for_timelines(i915, flags, timeout);
+	if (timeout < 0)
+		return timeout;
+
 	if (flags & I915_WAIT_LOCKED) {
-		struct i915_timeline *tl;
 		int err;
 
 		lockdep_assert_held(&i915->drm.struct_mutex);
 
-		list_for_each_entry(tl, &i915->gt.timelines, link) {
-			timeout = wait_for_timeline(tl, flags, timeout);
-			if (timeout < 0)
-				return timeout;
-		}
 		if (GEM_SHOW_DEBUG() && !timeout) {
 			/* Presume that timeout was non-zero to begin with! */
 			dev_warn(&i915->drm.pdev->dev,
@@ -3297,17 +3314,6 @@ int i915_gem_wait_for_idle(struct drm_i915_private *i915,
 
 		i915_retire_requests(i915);
 		GEM_BUG_ON(i915->gt.active_requests);
-	} else {
-		struct intel_engine_cs *engine;
-		enum intel_engine_id id;
-
-		for_each_engine(engine, i915, id) {
-			struct i915_timeline *tl = &engine->timeline;
-
-			timeout = wait_for_timeline(tl, flags, timeout);
-			if (timeout < 0)
-				return timeout;
-		}
 	}
 
 	return 0;
@@ -5008,6 +5014,8 @@ int i915_gem_init(struct drm_i915_private *dev_priv)
 		dev_priv->gt.cleanup_engine = intel_engine_cleanup;
 	}
 
+	i915_timelines_init(dev_priv);
+
 	ret = i915_gem_init_userptr(dev_priv);
 	if (ret)
 		return ret;
@@ -5130,8 +5138,10 @@ int i915_gem_init(struct drm_i915_private *dev_priv)
 err_uc_misc:
 	intel_uc_fini_misc(dev_priv);
 
-	if (ret != -EIO)
+	if (ret != -EIO) {
 		i915_gem_cleanup_userptr(dev_priv);
+		i915_timelines_fini(dev_priv);
+	}
 
 	if (ret == -EIO) {
 		mutex_lock(&dev_priv->drm.struct_mutex);
@@ -5182,6 +5192,7 @@ void i915_gem_fini(struct drm_i915_private *dev_priv)
 
 	intel_uc_fini_misc(dev_priv);
 	i915_gem_cleanup_userptr(dev_priv);
+	i915_timelines_fini(dev_priv);
 
 	i915_gem_drain_freed_objects(dev_priv);
 
@@ -5284,7 +5295,6 @@ int i915_gem_init_early(struct drm_i915_private *dev_priv)
 	if (!dev_priv->priorities)
 		goto err_dependencies;
 
-	INIT_LIST_HEAD(&dev_priv->gt.timelines);
 	INIT_LIST_HEAD(&dev_priv->gt.active_rings);
 	INIT_LIST_HEAD(&dev_priv->gt.closed_vma);
 
@@ -5328,7 +5338,6 @@ void i915_gem_cleanup_early(struct drm_i915_private *dev_priv)
 	GEM_BUG_ON(!llist_empty(&dev_priv->mm.free_list));
 	GEM_BUG_ON(atomic_read(&dev_priv->mm.free_count));
 	WARN_ON(dev_priv->mm.object_count);
-	WARN_ON(!list_empty(&dev_priv->gt.timelines));
 
 	kmem_cache_destroy(dev_priv->priorities);
 	kmem_cache_destroy(dev_priv->dependencies);
diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
index d44b095e2860..12e5a2bc825c 100644
--- a/drivers/gpu/drm/i915/i915_reset.c
+++ b/drivers/gpu/drm/i915/i915_reset.c
@@ -850,7 +850,8 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915)
 	 *
 	 * No more can be submitted until we reset the wedged bit.
 	 */
-	list_for_each_entry(tl, &i915->gt.timelines, link) {
+	mutex_lock(&i915->gt.timelines.mutex);
+	list_for_each_entry(tl, &i915->gt.timelines.list, link) {
 		struct i915_request *rq;
 		long timeout;
 
@@ -872,9 +873,12 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915)
 		timeout = dma_fence_default_wait(&rq->fence, true,
 						 MAX_SCHEDULE_TIMEOUT);
 		i915_request_put(rq);
-		if (timeout < 0)
+		if (timeout < 0) {
+			mutex_unlock(&i915->gt.timelines.mutex);
 			goto unlock;
+		}
 	}
+	mutex_unlock(&i915->gt.timelines.mutex);
 
 	intel_engines_sanitize(i915, false);
 
diff --git a/drivers/gpu/drm/i915/i915_timeline.c b/drivers/gpu/drm/i915/i915_timeline.c
index 4667cc08c416..84550f17d3df 100644
--- a/drivers/gpu/drm/i915/i915_timeline.c
+++ b/drivers/gpu/drm/i915/i915_timeline.c
@@ -13,7 +13,7 @@ void i915_timeline_init(struct drm_i915_private *i915,
 			struct i915_timeline *timeline,
 			const char *name)
 {
-	lockdep_assert_held(&i915->drm.struct_mutex);
+	struct i915_gt_timelines *gt = &i915->gt.timelines;
 
 	/*
 	 * Ideally we want a set of engines on a single leaf as we expect
@@ -23,9 +23,12 @@ void i915_timeline_init(struct drm_i915_private *i915,
 	 */
 	BUILD_BUG_ON(KSYNCMAP < I915_NUM_ENGINES);
 
+	timeline->i915 = i915;
 	timeline->name = name;
 
-	list_add(&timeline->link, &i915->gt.timelines);
+	mutex_lock(&gt->mutex);
+	list_add(&timeline->link, &gt->list);
+	mutex_unlock(&gt->mutex);
 
 	/* Called during early_init before we know how many engines there are */
 
@@ -39,6 +42,17 @@ void i915_timeline_init(struct drm_i915_private *i915,
 	i915_syncmap_init(&timeline->sync);
 }
 
+void i915_timelines_init(struct drm_i915_private *i915)
+{
+	struct i915_gt_timelines *gt = &i915->gt.timelines;
+
+	mutex_init(&gt->mutex);
+	INIT_LIST_HEAD(&gt->list);
+
+	/* via i915_gem_wait_for_idle() */
+	i915_gem_shrinker_taints_mutex(i915, &gt->mutex);
+}
+
 /**
  * i915_timelines_park - called when the driver idles
  * @i915: the drm_i915_private device
@@ -51,11 +65,11 @@ void i915_timeline_init(struct drm_i915_private *i915,
  */
 void i915_timelines_park(struct drm_i915_private *i915)
 {
+	struct i915_gt_timelines *gt = &i915->gt.timelines;
 	struct i915_timeline *timeline;
 
-	lockdep_assert_held(&i915->drm.struct_mutex);
-
-	list_for_each_entry(timeline, &i915->gt.timelines, link) {
+	mutex_lock(&gt->mutex);
+	list_for_each_entry(timeline, &gt->list, link) {
 		/*
 		 * All known fences are completed so we can scrap
 		 * the current sync point tracking and start afresh,
@@ -64,15 +78,20 @@ void i915_timelines_park(struct drm_i915_private *i915)
 		 */
 		i915_syncmap_free(&timeline->sync);
 	}
+	mutex_unlock(&gt->mutex);
 }
 
 void i915_timeline_fini(struct i915_timeline *timeline)
 {
+	struct i915_gt_timelines *gt = &timeline->i915->gt.timelines;
+
 	GEM_BUG_ON(!list_empty(&timeline->requests));
 
 	i915_syncmap_free(&timeline->sync);
 
+	mutex_lock(&gt->mutex);
 	list_del(&timeline->link);
+	mutex_unlock(&gt->mutex);
 }
 
 struct i915_timeline *
@@ -99,6 +118,15 @@ void __i915_timeline_free(struct kref *kref)
 	kfree(timeline);
 }
 
+void i915_timelines_fini(struct drm_i915_private *i915)
+{
+	struct i915_gt_timelines *gt = &i915->gt.timelines;
+
+	GEM_BUG_ON(!list_empty(&gt->list));
+
+	mutex_destroy(&gt->mutex);
+}
+
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 #include "selftests/mock_timeline.c"
 #include "selftests/i915_timeline.c"
diff --git a/drivers/gpu/drm/i915/i915_timeline.h b/drivers/gpu/drm/i915/i915_timeline.h
index 38c1e15e927a..87ad2dd31c20 100644
--- a/drivers/gpu/drm/i915/i915_timeline.h
+++ b/drivers/gpu/drm/i915/i915_timeline.h
@@ -66,6 +66,7 @@ struct i915_timeline {
 
 	struct list_head link;
 	const char *name;
+	struct drm_i915_private *i915;
 
 	struct kref kref;
 };
@@ -134,6 +135,8 @@ static inline bool i915_timeline_sync_is_later(struct i915_timeline *tl,
 	return __i915_timeline_sync_is_later(tl, fence->context, fence->seqno);
 }
 
+void i915_timelines_init(struct drm_i915_private *i915);
 void i915_timelines_park(struct drm_i915_private *i915);
+void i915_timelines_fini(struct drm_i915_private *i915);
 
 #endif
diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
index 8ab5a2688a0c..14ae46fda49f 100644
--- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
+++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
@@ -68,13 +68,14 @@ static void mock_device_release(struct drm_device *dev)
 	i915_gem_contexts_fini(i915);
 	mutex_unlock(&i915->drm.struct_mutex);
 
+	i915_timelines_fini(i915);
+
 	drain_workqueue(i915->wq);
 	i915_gem_drain_freed_objects(i915);
 
 	mutex_lock(&i915->drm.struct_mutex);
 	mock_fini_ggtt(&i915->ggtt);
 	mutex_unlock(&i915->drm.struct_mutex);
-	WARN_ON(!list_empty(&i915->gt.timelines));
 
 	destroy_workqueue(i915->wq);
 
@@ -226,7 +227,8 @@ struct drm_i915_private *mock_gem_device(void)
 	if (!i915->priorities)
 		goto err_dependencies;
 
-	INIT_LIST_HEAD(&i915->gt.timelines);
+	i915_timelines_init(i915);
+
 	INIT_LIST_HEAD(&i915->gt.active_rings);
 	INIT_LIST_HEAD(&i915->gt.closed_vma);
 
@@ -253,6 +255,7 @@ struct drm_i915_private *mock_gem_device(void)
 	i915_gem_contexts_fini(i915);
 err_unlock:
 	mutex_unlock(&i915->drm.struct_mutex);
+	i915_timelines_fini(i915);
 	kmem_cache_destroy(i915->priorities);
 err_dependencies:
 	kmem_cache_destroy(i915->dependencies);
diff --git a/drivers/gpu/drm/i915/selftests/mock_timeline.c b/drivers/gpu/drm/i915/selftests/mock_timeline.c
index dcf3b16f5a07..cf39ccd9fc05 100644
--- a/drivers/gpu/drm/i915/selftests/mock_timeline.c
+++ b/drivers/gpu/drm/i915/selftests/mock_timeline.c
@@ -10,6 +10,7 @@
 
 void mock_timeline_init(struct i915_timeline *timeline, u64 context)
 {
+	timeline->i915 = NULL;
 	timeline->fence_context = context;
 
 	spin_lock_init(&timeline->lock);
@@ -24,5 +25,5 @@ void mock_timeline_init(struct i915_timeline *timeline, u64 context)
 
 void mock_timeline_fini(struct i915_timeline *timeline)
 {
-	i915_timeline_fini(timeline);
+	i915_syncmap_free(&timeline->sync);
 }
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH 18/34] drm/i915/selftests: Use common mock_engine::advance
  2019-01-21 22:20 HWSP for HW semaphores Chris Wilson
                   ` (16 preceding siblings ...)
  2019-01-21 22:21 ` [PATCH 17/34] drm/i915: Move list of timelines under its own lock Chris Wilson
@ 2019-01-21 22:21 ` Chris Wilson
  2019-01-22  9:33   ` Tvrtko Ursulin
  2019-01-21 22:21 ` [PATCH 19/34] drm/i915: Tidy common test_bit probing of i915_request->fence.flags Chris Wilson
                   ` (22 subsequent siblings)
  40 siblings, 1 reply; 89+ messages in thread
From: Chris Wilson @ 2019-01-21 22:21 UTC (permalink / raw)
  To: intel-gfx

Replace the open-coded advance in mock_engine_flush() with a call to the
common helper instead.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/selftests/mock_engine.c | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/selftests/mock_engine.c b/drivers/gpu/drm/i915/selftests/mock_engine.c
index 968a7e139a67..386dfa7e2d5c 100644
--- a/drivers/gpu/drm/i915/selftests/mock_engine.c
+++ b/drivers/gpu/drm/i915/selftests/mock_engine.c
@@ -67,11 +67,10 @@ static struct mock_request *first_request(struct mock_engine *engine)
 					link);
 }
 
-static void advance(struct mock_engine *engine,
-		    struct mock_request *request)
+static void advance(struct mock_request *request)
 {
 	list_del_init(&request->link);
-	mock_seqno_advance(&engine->base, request->base.global_seqno);
+	mock_seqno_advance(request->base.engine, request->base.global_seqno);
 }
 
 static void hw_delay_complete(struct timer_list *t)
@@ -84,7 +83,7 @@ static void hw_delay_complete(struct timer_list *t)
 	/* Timer fired, first request is complete */
 	request = first_request(engine);
 	if (request)
-		advance(engine, request);
+		advance(request);
 
 	/*
 	 * Also immediately signal any subsequent 0-delay requests, but
@@ -96,7 +95,7 @@ static void hw_delay_complete(struct timer_list *t)
 			break;
 		}
 
-		advance(engine, request);
+		advance(request);
 	}
 
 	spin_unlock(&engine->hw_lock);
@@ -180,7 +179,7 @@ static void mock_submit_request(struct i915_request *request)
 		if (mock->delay)
 			mod_timer(&engine->hw_delay, jiffies + mock->delay);
 		else
-			advance(engine, mock);
+			advance(mock);
 	}
 	spin_unlock_irq(&engine->hw_lock);
 }
@@ -240,10 +239,8 @@ void mock_engine_flush(struct intel_engine_cs *engine)
 	del_timer_sync(&mock->hw_delay);
 
 	spin_lock_irq(&mock->hw_lock);
-	list_for_each_entry_safe(request, rn, &mock->hw_queue, link) {
-		list_del_init(&request->link);
-		mock_seqno_advance(&mock->base, request->base.global_seqno);
-	}
+	list_for_each_entry_safe(request, rn, &mock->hw_queue, link)
+		advance(request);
 	spin_unlock_irq(&mock->hw_lock);
 }
 
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH 19/34] drm/i915: Tidy common test_bit probing of i915_request->fence.flags
  2019-01-21 22:20 HWSP for HW semaphores Chris Wilson
                   ` (17 preceding siblings ...)
  2019-01-21 22:21 ` [PATCH 18/34] drm/i915/selftests: Use common mock_engine::advance Chris Wilson
@ 2019-01-21 22:21 ` Chris Wilson
  2019-01-22  9:35   ` Tvrtko Ursulin
  2019-01-21 22:21 ` [PATCH 20/34] drm/i915: Introduce concept of per-timeline (context) HWSP Chris Wilson
                   ` (21 subsequent siblings)
  40 siblings, 1 reply; 89+ messages in thread
From: Chris Wilson @ 2019-01-21 22:21 UTC (permalink / raw)
  To: intel-gfx

A repeated pattern is to test the signaled bit of our
request->fence.flags. Make this an inline to shorten a few lines and
remove unnecessary line continuations.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_irq.c          | 3 +--
 drivers/gpu/drm/i915/i915_request.c      | 2 +-
 drivers/gpu/drm/i915/i915_request.h      | 5 +++++
 drivers/gpu/drm/i915/intel_breadcrumbs.c | 3 +--
 drivers/gpu/drm/i915/intel_lrc.c         | 2 +-
 drivers/gpu/drm/i915/intel_pm.c          | 2 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c  | 3 +--
 7 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 1abfc3fa76ad..5fd5080c4ccb 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1182,8 +1182,7 @@ static void notify_ring(struct intel_engine_cs *engine)
 			struct i915_request *waiter = wait->request;
 
 			if (waiter &&
-			    !test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
-				      &waiter->fence.flags) &&
+			    !i915_request_signaled(waiter) &&
 			    intel_wait_check_request(wait, waiter))
 				rq = i915_request_get(waiter);
 
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 80232de8e2be..2721a356368f 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -198,7 +198,7 @@ static void __retire_engine_request(struct intel_engine_cs *engine,
 	spin_unlock(&engine->timeline.lock);
 
 	spin_lock(&rq->lock);
-	if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &rq->fence.flags))
+	if (!i915_request_signaled(rq))
 		dma_fence_signal_locked(&rq->fence);
 	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &rq->fence.flags))
 		intel_engine_cancel_signaling(rq);
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index d014b0605445..c0f084ca4f29 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -280,6 +280,11 @@ long i915_request_wait(struct i915_request *rq,
 #define I915_WAIT_ALL		BIT(3) /* used by i915_gem_object_wait() */
 #define I915_WAIT_FOR_IDLE_BOOST BIT(4)
 
+static inline bool i915_request_signaled(const struct i915_request *rq)
+{
+	return test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &rq->fence.flags);
+}
+
 static inline bool intel_engine_has_started(struct intel_engine_cs *engine,
 					    u32 seqno);
 static inline bool intel_engine_has_completed(struct intel_engine_cs *engine,
diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
index 4fad93fe3678..b58915b8708b 100644
--- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
@@ -631,8 +631,7 @@ static int intel_breadcrumbs_signaler(void *arg)
 				rq->signaling.wait.seqno = 0;
 				__list_del_entry(&rq->signaling.link);
 
-				if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
-					      &rq->fence.flags)) {
+				if (!i915_request_signaled(rq)) {
 					list_add_tail(&rq->signaling.link,
 						      &list);
 					i915_request_get(rq);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index bc65d8006e16..464dd309fa99 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -855,7 +855,7 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine)
 	list_for_each_entry(rq, &engine->timeline.requests, link) {
 		GEM_BUG_ON(!rq->global_seqno);
 
-		if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &rq->fence.flags))
+		if (i915_request_signaled(rq))
 			continue;
 
 		dma_fence_set_error(&rq->fence, -EIO);
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 8b63afa3a221..fdc28a3d2936 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -6662,7 +6662,7 @@ void gen6_rps_boost(struct i915_request *rq,
 	if (!rps->enabled)
 		return;
 
-	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &rq->fence.flags))
+	if (i915_request_signaled(rq))
 		return;
 
 	/* Serializes with i915_request_retire() */
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 66dc8e2fa353..bc620ae297b4 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -876,8 +876,7 @@ static void cancel_requests(struct intel_engine_cs *engine)
 	list_for_each_entry(request, &engine->timeline.requests, link) {
 		GEM_BUG_ON(!request->global_seqno);
 
-		if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
-			     &request->fence.flags))
+		if (i915_request_signaled(request))
 			continue;
 
 		dma_fence_set_error(&request->fence, -EIO);
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH 20/34] drm/i915: Introduce concept of per-timeline (context) HWSP
  2019-01-21 22:20 HWSP for HW semaphores Chris Wilson
                   ` (18 preceding siblings ...)
  2019-01-21 22:21 ` [PATCH 19/34] drm/i915: Tidy common test_bit probing of i915_request->fence.flags Chris Wilson
@ 2019-01-21 22:21 ` Chris Wilson
  2019-01-23  1:35   ` John Harrison
  2019-01-21 22:21 ` [PATCH 21/34] drm/i915: Enlarge vma->pin_count Chris Wilson
                   ` (20 subsequent siblings)
  40 siblings, 1 reply; 89+ messages in thread
From: Chris Wilson @ 2019-01-21 22:21 UTC (permalink / raw)
  To: intel-gfx

Supplement the per-engine HWSP with a per-timeline HWSP. That is a
per-request pointer through which we can check a local seqno,
abstracting away the presumption of a global seqno. In this first step,
we point each request back into the engine's HWSP so everything
continues to work with the global timeline.

v2: s/i915_request_hwsp/hwsp_seqno/ to emphasise that this is the current
HW value and that we are accessing it via i915_request merely as a
convenience.
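
With that pointer in place, completion checks reduce to the wrap-safe
seqno comparison against the value the GPU has written into the status
page. A tiny standalone illustration of that comparison (plain userspace
C, values chosen only to show the wrap behaviour):

#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wrap-safe "seq1 is at or after seq2", as in i915_seqno_passed(). */
static bool seqno_passed(uint32_t seq1, uint32_t seq2)
{
        return (int32_t)(seq1 - seq2) >= 0;
}

int main(void)
{
        /*
         * HWSP reports 2: a breadcrumb of 0xfffffffe (just before the
         * wrap) has completed, while a breadcrumb of 3 has not.
         */
        assert(seqno_passed(2, 0xfffffffe));
        assert(!seqno_passed(2, 3));
        return 0;
}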

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_request.c | 16 ++++++----
 drivers/gpu/drm/i915/i915_request.h | 45 ++++++++++++++++++++++++-----
 drivers/gpu/drm/i915/intel_lrc.c    |  9 ++++--
 3 files changed, 55 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 2721a356368f..d61e86c6a1d1 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -182,10 +182,11 @@ static void free_capture_list(struct i915_request *request)
 static void __retire_engine_request(struct intel_engine_cs *engine,
 				    struct i915_request *rq)
 {
-	GEM_TRACE("%s(%s) fence %llx:%lld, global=%d, current %d\n",
+	GEM_TRACE("%s(%s) fence %llx:%lld, global=%d, current %d:%d\n",
 		  __func__, engine->name,
 		  rq->fence.context, rq->fence.seqno,
 		  rq->global_seqno,
+		  hwsp_seqno(rq),
 		  intel_engine_get_seqno(engine));
 
 	GEM_BUG_ON(!i915_request_completed(rq));
@@ -244,10 +245,11 @@ static void i915_request_retire(struct i915_request *request)
 {
 	struct i915_gem_active *active, *next;
 
-	GEM_TRACE("%s fence %llx:%lld, global=%d, current %d\n",
+	GEM_TRACE("%s fence %llx:%lld, global=%d, current %d:%d\n",
 		  request->engine->name,
 		  request->fence.context, request->fence.seqno,
 		  request->global_seqno,
+		  hwsp_seqno(request),
 		  intel_engine_get_seqno(request->engine));
 
 	lockdep_assert_held(&request->i915->drm.struct_mutex);
@@ -307,10 +309,11 @@ void i915_request_retire_upto(struct i915_request *rq)
 	struct intel_ring *ring = rq->ring;
 	struct i915_request *tmp;
 
-	GEM_TRACE("%s fence %llx:%lld, global=%d, current %d\n",
+	GEM_TRACE("%s fence %llx:%lld, global=%d, current %d:%d\n",
 		  rq->engine->name,
 		  rq->fence.context, rq->fence.seqno,
 		  rq->global_seqno,
+		  hwsp_seqno(rq),
 		  intel_engine_get_seqno(rq->engine));
 
 	lockdep_assert_held(&rq->i915->drm.struct_mutex);
@@ -355,10 +358,11 @@ void __i915_request_submit(struct i915_request *request)
 	struct intel_engine_cs *engine = request->engine;
 	u32 seqno;
 
-	GEM_TRACE("%s fence %llx:%lld -> global=%d, current %d\n",
+	GEM_TRACE("%s fence %llx:%lld -> global=%d, current %d:%d\n",
 		  engine->name,
 		  request->fence.context, request->fence.seqno,
 		  engine->timeline.seqno + 1,
+		  hwsp_seqno(request),
 		  intel_engine_get_seqno(engine));
 
 	GEM_BUG_ON(!irqs_disabled());
@@ -405,10 +409,11 @@ void __i915_request_unsubmit(struct i915_request *request)
 {
 	struct intel_engine_cs *engine = request->engine;
 
-	GEM_TRACE("%s fence %llx:%lld <- global=%d, current %d\n",
+	GEM_TRACE("%s fence %llx:%lld <- global=%d, current %d:%d\n",
 		  engine->name,
 		  request->fence.context, request->fence.seqno,
 		  request->global_seqno,
+		  hwsp_seqno(request),
 		  intel_engine_get_seqno(engine));
 
 	GEM_BUG_ON(!irqs_disabled());
@@ -616,6 +621,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	rq->ring = ce->ring;
 	rq->timeline = ce->ring->timeline;
 	GEM_BUG_ON(rq->timeline == &engine->timeline);
+	rq->hwsp_seqno = &engine->status_page.addr[I915_GEM_HWS_INDEX];
 
 	spin_lock_init(&rq->lock);
 	dma_fence_init(&rq->fence,
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index c0f084ca4f29..ade010fe6e26 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -130,6 +130,13 @@ struct i915_request {
 	struct i915_sched_node sched;
 	struct i915_dependency dep;
 
+	/*
+	 * A convenience pointer to the current breadcrumb value stored in
+	 * the HW status page (or our timeline's local equivalent). The full
+	 * path would be rq->hw_context->ring->timeline->hwsp_seqno.
+	 */
+	const u32 *hwsp_seqno;
+
 	/**
 	 * GEM sequence number associated with this request on the
 	 * global execution timeline. It is zero when the request is not
@@ -285,11 +292,6 @@ static inline bool i915_request_signaled(const struct i915_request *rq)
 	return test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &rq->fence.flags);
 }
 
-static inline bool intel_engine_has_started(struct intel_engine_cs *engine,
-					    u32 seqno);
-static inline bool intel_engine_has_completed(struct intel_engine_cs *engine,
-					      u32 seqno);
-
 /**
  * Returns true if seq1 is later than seq2.
  */
@@ -298,6 +300,35 @@ static inline bool i915_seqno_passed(u32 seq1, u32 seq2)
 	return (s32)(seq1 - seq2) >= 0;
 }
 
+static inline u32 __hwsp_seqno(const struct i915_request *rq)
+{
+	return READ_ONCE(*rq->hwsp_seqno);
+}
+
+/**
+ * hwsp_seqno - the current breadcrumb value in the HW status page
+ * @rq: the request, to chase the relevant HW status page
+ *
+ * The emphasis in naming here is that hwsp_seqno() is not a property of the
+ * request, but an indication of the current HW state (associated with this
+ * request). Its value will change as the GPU executes more requests.
+ *
+ * Returns the current breadcrumb value in the associated HW status page (or
+ * the local timeline's equivalent) for this request. The request itself
+ * has the associated breadcrumb value of rq->fence.seqno, when the HW
+ * status page has that breadcrumb or later, this request is complete.
+ */
+static inline u32 hwsp_seqno(const struct i915_request *rq)
+{
+	u32 seqno;
+
+	rcu_read_lock(); /* the HWSP may be freed at runtime */
+	seqno = __hwsp_seqno(rq);
+	rcu_read_unlock();
+
+	return seqno;
+}
+
 /**
  * i915_request_started - check if the request has begun being executed
  * @rq: the request
@@ -315,14 +346,14 @@ static inline bool i915_request_started(const struct i915_request *rq)
 	if (!seqno) /* not yet submitted to HW */
 		return false;
 
-	return intel_engine_has_started(rq->engine, seqno);
+	return i915_seqno_passed(hwsp_seqno(rq), seqno - 1);
 }
 
 static inline bool
 __i915_request_completed(const struct i915_request *rq, u32 seqno)
 {
 	GEM_BUG_ON(!seqno);
-	return intel_engine_has_completed(rq->engine, seqno) &&
+	return i915_seqno_passed(hwsp_seqno(rq), seqno) &&
 		seqno == i915_request_global_seqno(rq);
 }
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 464dd309fa99..e45c4e29c435 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -470,11 +470,12 @@ static void execlists_submit_ports(struct intel_engine_cs *engine)
 			desc = execlists_update_context(rq);
 			GEM_DEBUG_EXEC(port[n].context_id = upper_32_bits(desc));
 
-			GEM_TRACE("%s in[%d]:  ctx=%d.%d, global=%d (fence %llx:%lld) (current %d), prio=%d\n",
+			GEM_TRACE("%s in[%d]:  ctx=%d.%d, global=%d (fence %llx:%lld) (current %d:%d), prio=%d\n",
 				  engine->name, n,
 				  port[n].context_id, count,
 				  rq->global_seqno,
 				  rq->fence.context, rq->fence.seqno,
+				  hwsp_seqno(rq),
 				  intel_engine_get_seqno(engine),
 				  rq_prio(rq));
 		} else {
@@ -767,11 +768,12 @@ execlists_cancel_port_requests(struct intel_engine_execlists * const execlists)
 	while (num_ports-- && port_isset(port)) {
 		struct i915_request *rq = port_request(port);
 
-		GEM_TRACE("%s:port%u global=%d (fence %llx:%lld), (current %d)\n",
+		GEM_TRACE("%s:port%u global=%d (fence %llx:%lld), (current %d:%d)\n",
 			  rq->engine->name,
 			  (unsigned int)(port - execlists->port),
 			  rq->global_seqno,
 			  rq->fence.context, rq->fence.seqno,
+			  hwsp_seqno(rq),
 			  intel_engine_get_seqno(rq->engine));
 
 		GEM_BUG_ON(!execlists->active);
@@ -997,12 +999,13 @@ static void process_csb(struct intel_engine_cs *engine)
 						EXECLISTS_ACTIVE_USER));
 
 		rq = port_unpack(port, &count);
-		GEM_TRACE("%s out[0]: ctx=%d.%d, global=%d (fence %llx:%lld) (current %d), prio=%d\n",
+		GEM_TRACE("%s out[0]: ctx=%d.%d, global=%d (fence %llx:%lld) (current %d:%d), prio=%d\n",
 			  engine->name,
 			  port->context_id, count,
 			  rq ? rq->global_seqno : 0,
 			  rq ? rq->fence.context : 0,
 			  rq ? rq->fence.seqno : 0,
+			  rq ? hwsp_seqno(rq) : 0,
 			  intel_engine_get_seqno(engine),
 			  rq ? rq_prio(rq) : 0);
 
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH 21/34] drm/i915: Enlarge vma->pin_count
  2019-01-21 22:20 HWSP for HW semaphores Chris Wilson
                   ` (19 preceding siblings ...)
  2019-01-21 22:21 ` [PATCH 20/34] drm/i915: Introduce concept of per-timeline (context) HWSP Chris Wilson
@ 2019-01-21 22:21 ` Chris Wilson
  2019-01-21 22:21 ` [PATCH 22/34] drm/i915: Allocate a status page for each timeline Chris Wilson
                   ` (19 subsequent siblings)
  40 siblings, 0 replies; 89+ messages in thread
From: Chris Wilson @ 2019-01-21 22:21 UTC (permalink / raw)
  To: intel-gfx

Previously we only accommodated having a vma pinned by a small number of
users, with the maximum number of pins being held for use by the display
engine. As such, we used a small bitfield only large enough to allow the
vma to be pinned twice (for back/front buffers) in each scanout plane.
Keeping the maximum permissible pin_count small allows us to quickly
catch a potential leak. However, as we want to split a 4096B page into
64 different cachelines and pin each cacheline for use by a different
timeline, we will exceed the current maximum permissible vma->pin_count,
so the time has come to enlarge it.

Whilst we are here, try to pull together the similar bits:

Address/layout specification:
 - bias, mappable, zone_4g: address limit specifiers
 - fixed: address override, limits still apply though
 - high: not strictly an address limit, but an address direction to search

Search controls:
 - nonblock, nonfault, noevict

v2: Rewrite the guideline comment on bit consumption.
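
For a rough feel of the new bit budget (a sketch mirroring, not quoting,
the driver's headers): an 8-bit pin count in the low byte leaves the
overflow trap at bit 8, with the bind and status flags stacked above it,
while comfortably covering the worst cases called out in the comment
below (an estimated 84 display pins, and up to 64 per-cacheline timeline
pins from a 4096B page split into 64B cachelines):

#include <assert.h>

/* Hypothetical mirror of the layout: pin count in the low 8 bits. */
#define PIN_MASK        0xffu
#define PIN_OVERFLOW    (1u << 8)
#define GLOBAL_BIND     (1u << 9)

int main(void)
{
        unsigned int flags = GLOBAL_BIND;

        /* 84 display pins plus 64 timeline cacheline pins still fit. */
        flags += 84 + 64;
        assert((flags & PIN_MASK) == 148);
        assert(!(flags & PIN_OVERFLOW));
        assert(flags & GLOBAL_BIND);

        /* 4096B page / 64B cachelines = 64 suballocation slots. */
        assert(4096 / 64 == 64);
        return 0;
}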

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: John Harrison <john.C.Harrison@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.h | 26 ++++++++---------
 drivers/gpu/drm/i915/i915_vma.h     | 45 +++++++++++++++++++----------
 2 files changed, 42 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index bd679c8c56dd..03ade71b8d9a 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -642,19 +642,19 @@ int i915_gem_gtt_insert(struct i915_address_space *vm,
 
 /* Flags used by pin/bind&friends. */
 #define PIN_NONBLOCK		BIT_ULL(0)
-#define PIN_MAPPABLE		BIT_ULL(1)
-#define PIN_ZONE_4G		BIT_ULL(2)
-#define PIN_NONFAULT		BIT_ULL(3)
-#define PIN_NOEVICT		BIT_ULL(4)
-
-#define PIN_MBZ			BIT_ULL(5) /* I915_VMA_PIN_OVERFLOW */
-#define PIN_GLOBAL		BIT_ULL(6) /* I915_VMA_GLOBAL_BIND */
-#define PIN_USER		BIT_ULL(7) /* I915_VMA_LOCAL_BIND */
-#define PIN_UPDATE		BIT_ULL(8)
-
-#define PIN_HIGH		BIT_ULL(9)
-#define PIN_OFFSET_BIAS		BIT_ULL(10)
-#define PIN_OFFSET_FIXED	BIT_ULL(11)
+#define PIN_NONFAULT		BIT_ULL(1)
+#define PIN_NOEVICT		BIT_ULL(2)
+#define PIN_MAPPABLE		BIT_ULL(3)
+#define PIN_ZONE_4G		BIT_ULL(4)
+#define PIN_HIGH		BIT_ULL(5)
+#define PIN_OFFSET_BIAS		BIT_ULL(6)
+#define PIN_OFFSET_FIXED	BIT_ULL(7)
+
+#define PIN_MBZ			BIT_ULL(8) /* I915_VMA_PIN_OVERFLOW */
+#define PIN_GLOBAL		BIT_ULL(9) /* I915_VMA_GLOBAL_BIND */
+#define PIN_USER		BIT_ULL(10) /* I915_VMA_LOCAL_BIND */
+#define PIN_UPDATE		BIT_ULL(11)
+
 #define PIN_OFFSET_MASK		(-I915_GTT_PAGE_SIZE)
 
 #endif
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index 7252abc73d3e..5793abe509a2 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -71,29 +71,42 @@ struct i915_vma {
 	unsigned int open_count;
 	unsigned long flags;
 	/**
-	 * How many users have pinned this object in GTT space. The following
-	 * users can each hold at most one reference: pwrite/pread, execbuffer
-	 * (objects are not allowed multiple times for the same batchbuffer),
-	 * and the framebuffer code. When switching/pageflipping, the
-	 * framebuffer code has at most two buffers pinned per crtc.
+	 * How many users have pinned this object in GTT space.
 	 *
-	 * In the worst case this is 1 + 1 + 1 + 2*2 = 7. That would fit into 3
-	 * bits with absolutely no headroom. So use 4 bits.
+	 * This is a tightly bound, fairly small number of users, so we
+	 * stuff it inside the flags field so that we can both check for overflow
+	 * and detect a no-op i915_vma_pin() in a single check, while also
+	 * pinning the vma.
+	 *
+	 * The worst case display setup would have the same vma pinned for
+	 * use on each plane on each crtc, while also building the next atomic
+	 * state and holding a pin for the length of the cleanup queue. In the
+	 * future, the flip queue may be increased from 1.
+	 * Estimated worst case: 3 [qlen] * 4 [max crtcs] * 7 [max planes] = 84
+	 *
+	 * For GEM, the number of concurrent users for pwrite/pread is
+	 * unbounded. For execbuffer, it is currently one but will in future
+	 * be extended to allow multiple clients to pin vma concurrently.
+	 *
+	 * We also use suballocated pages, with each suballocation claiming
+	 * its own pin on the shared vma. At present, this is limited to
+	 * exclusive cachelines of a single page, so a maximum of 64 possible
+	 * users.
 	 */
-#define I915_VMA_PIN_MASK 0xf
-#define I915_VMA_PIN_OVERFLOW	BIT(5)
+#define I915_VMA_PIN_MASK 0xff
+#define I915_VMA_PIN_OVERFLOW	BIT(8)
 
 	/** Flags and address space this VMA is bound to */
-#define I915_VMA_GLOBAL_BIND	BIT(6)
-#define I915_VMA_LOCAL_BIND	BIT(7)
+#define I915_VMA_GLOBAL_BIND	BIT(9)
+#define I915_VMA_LOCAL_BIND	BIT(10)
 #define I915_VMA_BIND_MASK (I915_VMA_GLOBAL_BIND | I915_VMA_LOCAL_BIND | I915_VMA_PIN_OVERFLOW)
 
-#define I915_VMA_GGTT		BIT(8)
-#define I915_VMA_CAN_FENCE	BIT(9)
-#define I915_VMA_CLOSED		BIT(10)
-#define I915_VMA_USERFAULT_BIT	11
+#define I915_VMA_GGTT		BIT(11)
+#define I915_VMA_CAN_FENCE	BIT(12)
+#define I915_VMA_CLOSED		BIT(13)
+#define I915_VMA_USERFAULT_BIT	14
 #define I915_VMA_USERFAULT	BIT(I915_VMA_USERFAULT_BIT)
-#define I915_VMA_GGTT_WRITE	BIT(12)
+#define I915_VMA_GGTT_WRITE	BIT(15)
 
 	unsigned int active_count;
 	struct rb_root active;
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH 22/34] drm/i915: Allocate a status page for each timeline
  2019-01-21 22:20 HWSP for HW semaphores Chris Wilson
                   ` (20 preceding siblings ...)
  2019-01-21 22:21 ` [PATCH 21/34] drm/i915: Enlarge vma->pin_count Chris Wilson
@ 2019-01-21 22:21 ` Chris Wilson
  2019-01-21 22:21 ` [PATCH 23/34] drm/i915: Share per-timeline HWSP using a slab suballocator Chris Wilson
                   ` (18 subsequent siblings)
  40 siblings, 0 replies; 89+ messages in thread
From: Chris Wilson @ 2019-01-21 22:21 UTC (permalink / raw)
  To: intel-gfx

Allocate a page for use as a status page by a group of timelines; as we
only need a dword of storage for each (rounded up to a cacheline for
safety), we can pack multiple timelines into the same page. Each timeline
will then be able to track its own HW seqno.
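
As a rough sketch of the layout this gives (the helper name below is made
up for illustration; the real accessors are in the diff further down),
each timeline owns one cacheline in the page and its breadcrumb is the
first dword of that cacheline:

	/* PAGE_SIZE / CACHELINE_BYTES == 4096 / 64 == 64 timelines per page */
	static u32 read_timeline_breadcrumb(const void *hwsp_vaddr,
					    unsigned int hwsp_offset)
	{
		/*
		 * hwsp_offset is the timeline's byte offset into the page,
		 * a multiple of CACHELINE_BYTES; its seqno is the first dword.
		 */
		return READ_ONCE(*(const u32 *)(hwsp_vaddr + hwsp_offset));
	}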

v2: Reuse the common per-engine HWSP for the solitary ringbuffer
timeline, so that we do not have to emit (using per-gen specialised
vfuncs) the breadcrumb into the distinct timeline HWSP and instead can
keep on using the common MI_STORE_DWORD_INDEX. However, to maintain the
sleight-of-hand for the global/per-context seqno switchover, we will
store both temporarily (and so use a custom offset for the shared timeline
HWSP until the switchover).

v3: Keep things simple and allocate a page for each timeline, page
sharing comes next.

v4: I was caught repeating the same MI_STORE_DWORD_IMM over and over
again in selftests.

v5: And caught red handed copying create timeline + check.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_timeline.c          | 121 ++++++-
 drivers/gpu/drm/i915/i915_timeline.h          |  21 +-
 drivers/gpu/drm/i915/intel_engine_cs.c        |  64 ++--
 drivers/gpu/drm/i915/intel_lrc.c              |  22 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c       |  10 +-
 drivers/gpu/drm/i915/intel_ringbuffer.h       |   6 +-
 .../drm/i915/selftests/i915_live_selftests.h  |   1 +
 .../drm/i915/selftests/i915_mock_selftests.h  |   2 +-
 .../gpu/drm/i915/selftests/i915_timeline.c    | 326 +++++++++++++++++-
 drivers/gpu/drm/i915/selftests/mock_engine.c  |  14 +-
 10 files changed, 535 insertions(+), 52 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_timeline.c b/drivers/gpu/drm/i915/i915_timeline.c
index 84550f17d3df..8d5792311a8f 100644
--- a/drivers/gpu/drm/i915/i915_timeline.c
+++ b/drivers/gpu/drm/i915/i915_timeline.c
@@ -9,28 +9,78 @@
 #include "i915_timeline.h"
 #include "i915_syncmap.h"
 
-void i915_timeline_init(struct drm_i915_private *i915,
-			struct i915_timeline *timeline,
-			const char *name)
+static struct i915_vma *__hwsp_alloc(struct drm_i915_private *i915)
+{
+	struct drm_i915_gem_object *obj;
+	struct i915_vma *vma;
+
+	obj = i915_gem_object_create_internal(i915, PAGE_SIZE);
+	if (IS_ERR(obj))
+		return ERR_CAST(obj);
+
+	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
+
+	vma = i915_vma_instance(obj, &i915->ggtt.vm, NULL);
+	if (IS_ERR(vma))
+		i915_gem_object_put(obj);
+
+	return vma;
+}
+
+static int hwsp_alloc(struct i915_timeline *timeline)
+{
+	struct i915_vma *vma;
+
+	vma = __hwsp_alloc(timeline->i915);
+	if (IS_ERR(vma))
+		return PTR_ERR(vma);
+
+	timeline->hwsp_ggtt = vma;
+	timeline->hwsp_offset = 0;
+
+	return 0;
+}
+
+int i915_timeline_init(struct drm_i915_private *i915,
+		       struct i915_timeline *timeline,
+		       const char *name,
+		       struct i915_vma *global_hwsp)
 {
 	struct i915_gt_timelines *gt = &i915->gt.timelines;
+	void *vaddr;
+	int err;
 
 	/*
 	 * Ideally we want a set of engines on a single leaf as we expect
 	 * to mostly be tracking synchronisation between engines. It is not
 	 * a huge issue if this is not the case, but we may want to mitigate
 	 * any page crossing penalties if they become an issue.
+	 *
+	 * Called during early_init before we know how many engines there are.
 	 */
 	BUILD_BUG_ON(KSYNCMAP < I915_NUM_ENGINES);
 
 	timeline->i915 = i915;
 	timeline->name = name;
+	timeline->pin_count = 0;
+
+	if (global_hwsp) {
+		timeline->hwsp_ggtt = i915_vma_get(global_hwsp);
+		timeline->hwsp_offset = I915_GEM_HWS_SEQNO_ADDR;
+	} else {
+		err = hwsp_alloc(timeline);
+		if (err)
+			return err;
+	}
 
-	mutex_lock(&gt->mutex);
-	list_add(&timeline->link, &gt->list);
-	mutex_unlock(&gt->mutex);
+	vaddr = i915_gem_object_pin_map(timeline->hwsp_ggtt->obj, I915_MAP_WB);
+	if (IS_ERR(vaddr)) {
+		i915_vma_put(timeline->hwsp_ggtt);
+		return PTR_ERR(vaddr);
+	}
 
-	/* Called during early_init before we know how many engines there are */
+	timeline->hwsp_seqno =
+		memset(vaddr + timeline->hwsp_offset, 0, CACHELINE_BYTES);
 
 	timeline->fence_context = dma_fence_context_alloc(1);
 
@@ -40,6 +90,12 @@ void i915_timeline_init(struct drm_i915_private *i915,
 	INIT_LIST_HEAD(&timeline->requests);
 
 	i915_syncmap_init(&timeline->sync);
+
+	mutex_lock(&gt->mutex);
+	list_add(&timeline->link, &gt->list);
+	mutex_unlock(&gt->mutex);
+
+	return 0;
 }
 
 void i915_timelines_init(struct drm_i915_private *i915)
@@ -85,6 +141,7 @@ void i915_timeline_fini(struct i915_timeline *timeline)
 {
 	struct i915_gt_timelines *gt = &timeline->i915->gt.timelines;
 
+	GEM_BUG_ON(timeline->pin_count);
 	GEM_BUG_ON(!list_empty(&timeline->requests));
 
 	i915_syncmap_free(&timeline->sync);
@@ -92,23 +149,69 @@ void i915_timeline_fini(struct i915_timeline *timeline)
 	mutex_lock(&gt->mutex);
 	list_del(&timeline->link);
 	mutex_unlock(&gt->mutex);
+
+	i915_gem_object_unpin_map(timeline->hwsp_ggtt->obj);
+	i915_vma_put(timeline->hwsp_ggtt);
 }
 
 struct i915_timeline *
-i915_timeline_create(struct drm_i915_private *i915, const char *name)
+i915_timeline_create(struct drm_i915_private *i915,
+		     const char *name,
+		     struct i915_vma *global_hwsp)
 {
 	struct i915_timeline *timeline;
+	int err;
 
 	timeline = kzalloc(sizeof(*timeline), GFP_KERNEL);
 	if (!timeline)
 		return ERR_PTR(-ENOMEM);
 
-	i915_timeline_init(i915, timeline, name);
+	err = i915_timeline_init(i915, timeline, name, global_hwsp);
+	if (err) {
+		kfree(timeline);
+		return ERR_PTR(err);
+	}
+
 	kref_init(&timeline->kref);
 
 	return timeline;
 }
 
+int i915_timeline_pin(struct i915_timeline *tl)
+{
+	int err;
+
+	if (tl->pin_count++)
+		return 0;
+	GEM_BUG_ON(!tl->pin_count);
+
+	err = i915_vma_pin(tl->hwsp_ggtt, 0, 0, PIN_GLOBAL | PIN_HIGH);
+	if (err)
+		goto unpin;
+
+	return 0;
+
+unpin:
+	tl->pin_count = 0;
+	return err;
+}
+
+void i915_timeline_unpin(struct i915_timeline *tl)
+{
+	GEM_BUG_ON(!tl->pin_count);
+	if (--tl->pin_count)
+		return;
+
+	/*
+	 * Since this timeline is idle, all barriers upon which we were waiting
+	 * must also be complete and so we can discard the last used barriers
+	 * without loss of information.
+	 */
+	i915_syncmap_free(&tl->sync);
+
+	__i915_vma_unpin(tl->hwsp_ggtt);
+}
+
 void __i915_timeline_free(struct kref *kref)
 {
 	struct i915_timeline *timeline =
diff --git a/drivers/gpu/drm/i915/i915_timeline.h b/drivers/gpu/drm/i915/i915_timeline.h
index 87ad2dd31c20..0c3739d53d79 100644
--- a/drivers/gpu/drm/i915/i915_timeline.h
+++ b/drivers/gpu/drm/i915/i915_timeline.h
@@ -32,6 +32,8 @@
 #include "i915_syncmap.h"
 #include "i915_utils.h"
 
+struct i915_vma;
+
 struct i915_timeline {
 	u64 fence_context;
 	u32 seqno;
@@ -40,6 +42,11 @@ struct i915_timeline {
 #define TIMELINE_CLIENT 0 /* default subclass */
 #define TIMELINE_ENGINE 1
 
+	unsigned int pin_count;
+	const u32 *hwsp_seqno;
+	struct i915_vma *hwsp_ggtt;
+	u32 hwsp_offset;
+
 	/**
 	 * List of breadcrumbs associated with GPU requests currently
 	 * outstanding.
@@ -71,9 +78,10 @@ struct i915_timeline {
 	struct kref kref;
 };
 
-void i915_timeline_init(struct drm_i915_private *i915,
-			struct i915_timeline *tl,
-			const char *name);
+int i915_timeline_init(struct drm_i915_private *i915,
+		       struct i915_timeline *tl,
+		       const char *name,
+		       struct i915_vma *hwsp);
 void i915_timeline_fini(struct i915_timeline *tl);
 
 static inline void
@@ -96,7 +104,9 @@ i915_timeline_set_subclass(struct i915_timeline *timeline,
 }
 
 struct i915_timeline *
-i915_timeline_create(struct drm_i915_private *i915, const char *name);
+i915_timeline_create(struct drm_i915_private *i915,
+		     const char *name,
+		     struct i915_vma *global_hwsp);
 
 static inline struct i915_timeline *
 i915_timeline_get(struct i915_timeline *timeline)
@@ -135,6 +145,9 @@ static inline bool i915_timeline_sync_is_later(struct i915_timeline *tl,
 	return __i915_timeline_sync_is_later(tl, fence->context, fence->seqno);
 }
 
+int i915_timeline_pin(struct i915_timeline *tl);
+void i915_timeline_unpin(struct i915_timeline *tl);
+
 void i915_timelines_init(struct drm_i915_private *i915);
 void i915_timelines_park(struct drm_i915_private *i915);
 void i915_timelines_fini(struct drm_i915_private *i915);
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 4b4b7358c482..c850d131d8c3 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -484,26 +484,6 @@ static void intel_engine_init_execlist(struct intel_engine_cs *engine)
 	execlists->queue = RB_ROOT_CACHED;
 }
 
-/**
- * intel_engines_setup_common - setup engine state not requiring hw access
- * @engine: Engine to setup.
- *
- * Initializes @engine@ structure members shared between legacy and execlists
- * submission modes which do not require hardware access.
- *
- * Typically done early in the submission mode specific engine setup stage.
- */
-void intel_engine_setup_common(struct intel_engine_cs *engine)
-{
-	i915_timeline_init(engine->i915, &engine->timeline, engine->name);
-	i915_timeline_set_subclass(&engine->timeline, TIMELINE_ENGINE);
-
-	intel_engine_init_execlist(engine);
-	intel_engine_init_hangcheck(engine);
-	intel_engine_init_batch_pool(engine);
-	intel_engine_init_cmd_parser(engine);
-}
-
 static void cleanup_status_page(struct intel_engine_cs *engine)
 {
 	struct i915_vma *vma;
@@ -601,6 +581,44 @@ static int init_status_page(struct intel_engine_cs *engine)
 	return ret;
 }
 
+/**
+ * intel_engine_setup_common - setup engine state not requiring hw access
+ * @engine: Engine to setup.
+ *
+ * Initializes @engine@ structure members shared between legacy and execlists
+ * submission modes which do not require hardware access.
+ *
+ * Typically done early in the submission mode specific engine setup stage.
+ */
+int intel_engine_setup_common(struct intel_engine_cs *engine)
+{
+	int err;
+
+	err = init_status_page(engine);
+	if (err)
+		return err;
+
+	err = i915_timeline_init(engine->i915,
+				 &engine->timeline,
+				 engine->name,
+				 engine->status_page.vma);
+	if (err)
+		goto err_hwsp;
+
+	i915_timeline_set_subclass(&engine->timeline, TIMELINE_ENGINE);
+
+	intel_engine_init_execlist(engine);
+	intel_engine_init_hangcheck(engine);
+	intel_engine_init_batch_pool(engine);
+	intel_engine_init_cmd_parser(engine);
+
+	return 0;
+
+err_hwsp:
+	cleanup_status_page(engine);
+	return err;
+}
+
 static void __intel_context_unpin(struct i915_gem_context *ctx,
 				  struct intel_engine_cs *engine)
 {
@@ -653,14 +671,8 @@ int intel_engine_init_common(struct intel_engine_cs *engine)
 	if (ret)
 		goto err_unpin_preempt;
 
-	ret = init_status_page(engine);
-	if (ret)
-		goto err_breadcrumbs;
-
 	return 0;
 
-err_breadcrumbs:
-	intel_engine_fini_breadcrumbs(engine);
 err_unpin_preempt:
 	if (i915->preempt_context)
 		__intel_context_unpin(i915->preempt_context, engine);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index e45c4e29c435..5c830a1ca332 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2219,10 +2219,14 @@ logical_ring_default_irqs(struct intel_engine_cs *engine)
 	engine->irq_keep_mask = GT_CONTEXT_SWITCH_INTERRUPT << shift;
 }
 
-static void
+static int
 logical_ring_setup(struct intel_engine_cs *engine)
 {
-	intel_engine_setup_common(engine);
+	int err;
+
+	err = intel_engine_setup_common(engine);
+	if (err)
+		return err;
 
 	/* Intentionally left blank. */
 	engine->buffer = NULL;
@@ -2232,6 +2236,8 @@ logical_ring_setup(struct intel_engine_cs *engine)
 
 	logical_ring_default_vfuncs(engine);
 	logical_ring_default_irqs(engine);
+
+	return 0;
 }
 
 static int logical_ring_init(struct intel_engine_cs *engine)
@@ -2280,7 +2286,9 @@ int logical_render_ring_init(struct intel_engine_cs *engine)
 {
 	int ret;
 
-	logical_ring_setup(engine);
+	ret = logical_ring_setup(engine);
+	if (ret)
+		return ret;
 
 	/* Override some for render ring. */
 	engine->init_context = gen8_init_rcs_context;
@@ -2310,7 +2318,11 @@ int logical_render_ring_init(struct intel_engine_cs *engine)
 
 int logical_xcs_ring_init(struct intel_engine_cs *engine)
 {
-	logical_ring_setup(engine);
+	int err;
+
+	err = logical_ring_setup(engine);
+	if (err)
+		return err;
 
 	return logical_ring_init(engine);
 }
@@ -2644,7 +2656,7 @@ static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
 		goto error_deref_obj;
 	}
 
-	timeline = i915_timeline_create(ctx->i915, ctx->name);
+	timeline = i915_timeline_create(ctx->i915, ctx->name, NULL);
 	if (IS_ERR(timeline)) {
 		ret = PTR_ERR(timeline);
 		goto error_deref_obj;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index bc620ae297b4..cad25f7b8c2e 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1539,9 +1539,13 @@ static int intel_init_ring_buffer(struct intel_engine_cs *engine)
 	struct intel_ring *ring;
 	int err;
 
-	intel_engine_setup_common(engine);
+	err = intel_engine_setup_common(engine);
+	if (err)
+		return err;
 
-	timeline = i915_timeline_create(engine->i915, engine->name);
+	timeline = i915_timeline_create(engine->i915,
+					engine->name,
+					engine->status_page.vma);
 	if (IS_ERR(timeline)) {
 		err = PTR_ERR(timeline);
 		goto err;
@@ -1565,6 +1569,8 @@ static int intel_init_ring_buffer(struct intel_engine_cs *engine)
 	if (err)
 		goto err_unpin;
 
+	GEM_BUG_ON(ring->timeline->hwsp_ggtt != engine->status_page.vma);
+
 	return 0;
 
 err_unpin:
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index f43d0bb451a9..a792bacf2930 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -712,7 +712,9 @@ intel_write_status_page(struct intel_engine_cs *engine, int reg, u32 value)
 #define I915_GEM_HWS_INDEX_ADDR (I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT)
 #define I915_GEM_HWS_PREEMPT_INDEX	0x32
 #define I915_GEM_HWS_PREEMPT_ADDR (I915_GEM_HWS_PREEMPT_INDEX << MI_STORE_DWORD_INDEX_SHIFT)
-#define I915_GEM_HWS_SCRATCH_INDEX	0x40
+#define I915_GEM_HWS_SEQNO		0x40
+#define I915_GEM_HWS_SEQNO_ADDR (I915_GEM_HWS_SEQNO << MI_STORE_DWORD_INDEX_SHIFT)
+#define I915_GEM_HWS_SCRATCH_INDEX	0x80
 #define I915_GEM_HWS_SCRATCH_ADDR (I915_GEM_HWS_SCRATCH_INDEX << MI_STORE_DWORD_INDEX_SHIFT)
 
 #define I915_HWS_CSB_BUF0_INDEX		0x10
@@ -818,7 +820,7 @@ intel_ring_set_tail(struct intel_ring *ring, unsigned int tail)
 
 void intel_engine_write_global_seqno(struct intel_engine_cs *engine, u32 seqno);
 
-void intel_engine_setup_common(struct intel_engine_cs *engine);
+int intel_engine_setup_common(struct intel_engine_cs *engine);
 int intel_engine_init_common(struct intel_engine_cs *engine);
 void intel_engine_cleanup_common(struct intel_engine_cs *engine);
 
diff --git a/drivers/gpu/drm/i915/selftests/i915_live_selftests.h b/drivers/gpu/drm/i915/selftests/i915_live_selftests.h
index a15713cae3b3..76b4f87fc853 100644
--- a/drivers/gpu/drm/i915/selftests/i915_live_selftests.h
+++ b/drivers/gpu/drm/i915/selftests/i915_live_selftests.h
@@ -13,6 +13,7 @@ selftest(sanitycheck, i915_live_sanitycheck) /* keep first (igt selfcheck) */
 selftest(uncore, intel_uncore_live_selftests)
 selftest(workarounds, intel_workarounds_live_selftests)
 selftest(requests, i915_request_live_selftests)
+selftest(timelines, i915_timeline_live_selftests)
 selftest(objects, i915_gem_object_live_selftests)
 selftest(dmabuf, i915_gem_dmabuf_live_selftests)
 selftest(coherency, i915_gem_coherency_live_selftests)
diff --git a/drivers/gpu/drm/i915/selftests/i915_mock_selftests.h b/drivers/gpu/drm/i915/selftests/i915_mock_selftests.h
index 1b70208eeea7..4a83a1c6c406 100644
--- a/drivers/gpu/drm/i915/selftests/i915_mock_selftests.h
+++ b/drivers/gpu/drm/i915/selftests/i915_mock_selftests.h
@@ -16,7 +16,7 @@ selftest(syncmap, i915_syncmap_mock_selftests)
 selftest(uncore, intel_uncore_mock_selftests)
 selftest(engine, intel_engine_cs_mock_selftests)
 selftest(breadcrumbs, intel_breadcrumbs_mock_selftests)
-selftest(timelines, i915_gem_timeline_mock_selftests)
+selftest(timelines, i915_timeline_mock_selftests)
 selftest(requests, i915_request_mock_selftests)
 selftest(objects, i915_gem_object_mock_selftests)
 selftest(dmabuf, i915_gem_dmabuf_mock_selftests)
diff --git a/drivers/gpu/drm/i915/selftests/i915_timeline.c b/drivers/gpu/drm/i915/selftests/i915_timeline.c
index 19f1c6a5c8fb..1585b614510d 100644
--- a/drivers/gpu/drm/i915/selftests/i915_timeline.c
+++ b/drivers/gpu/drm/i915/selftests/i915_timeline.c
@@ -7,6 +7,7 @@
 #include "../i915_selftest.h"
 #include "i915_random.h"
 
+#include "igt_flush_test.h"
 #include "mock_gem_device.h"
 #include "mock_timeline.h"
 
@@ -256,7 +257,7 @@ static int bench_sync(void *arg)
 	return 0;
 }
 
-int i915_gem_timeline_mock_selftests(void)
+int i915_timeline_mock_selftests(void)
 {
 	static const struct i915_subtest tests[] = {
 		SUBTEST(igt_sync),
@@ -265,3 +266,326 @@ int i915_gem_timeline_mock_selftests(void)
 
 	return i915_subtests(tests, NULL);
 }
+
+static int emit_ggtt_store_dw(struct i915_request *rq, u32 addr, u32 value)
+{
+	u32 *cs;
+
+	cs = intel_ring_begin(rq, 4);
+	if (IS_ERR(cs))
+		return PTR_ERR(cs);
+
+	if (INTEL_GEN(rq->i915) >= 8) {
+		*cs++ = MI_STORE_DWORD_IMM_GEN4 | MI_USE_GGTT;
+		*cs++ = addr;
+		*cs++ = 0;
+		*cs++ = value;
+	} else if (INTEL_GEN(rq->i915) >= 4) {
+		*cs++ = MI_STORE_DWORD_IMM_GEN4 | MI_USE_GGTT;
+		*cs++ = 0;
+		*cs++ = addr;
+		*cs++ = value;
+	} else {
+		*cs++ = MI_STORE_DWORD_IMM | MI_MEM_VIRTUAL;
+		*cs++ = addr;
+		*cs++ = value;
+		*cs++ = MI_NOOP;
+	}
+
+	intel_ring_advance(rq, cs);
+
+	return 0;
+}
+
+static u32 hwsp_address(const struct i915_timeline *tl)
+{
+	return i915_ggtt_offset(tl->hwsp_ggtt) + tl->hwsp_offset;
+}
+
+static struct i915_request *
+tl_write(struct i915_timeline *tl, struct intel_engine_cs *engine, u32 value)
+{
+	struct i915_request *rq;
+	int err;
+
+	lockdep_assert_held(&tl->i915->drm.struct_mutex); /* lazy rq refs */
+
+	err = i915_timeline_pin(tl);
+	if (err) {
+		rq = ERR_PTR(err);
+		goto out;
+	}
+
+	rq = i915_request_alloc(engine, engine->i915->kernel_context);
+	if (IS_ERR(rq))
+		goto out_unpin;
+
+	err = emit_ggtt_store_dw(rq, hwsp_address(tl), value);
+	i915_request_add(rq);
+	if (err)
+		rq = ERR_PTR(err);
+
+out_unpin:
+	i915_timeline_unpin(tl);
+out:
+	if (IS_ERR(rq))
+		pr_err("Failed to write to timeline!\n");
+	return rq;
+}
+
+static struct i915_timeline *
+checked_i915_timeline_create(struct drm_i915_private *i915)
+{
+	struct i915_timeline *tl;
+
+	tl = i915_timeline_create(i915, "live", NULL);
+	if (IS_ERR(tl))
+		return tl;
+
+	if (*tl->hwsp_seqno != tl->seqno) {
+		pr_err("Timeline created with incorrect breadcrumb, found %x, expected %x\n",
+		       *tl->hwsp_seqno, tl->seqno);
+		i915_timeline_put(tl);
+		return ERR_PTR(-EINVAL);
+	}
+
+	return tl;
+}
+
+static int live_hwsp_engine(void *arg)
+{
+#define NUM_TIMELINES 4096
+	struct drm_i915_private *i915 = arg;
+	struct i915_timeline **timelines;
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+	intel_wakeref_t wakeref;
+	unsigned long count, n;
+	int err = 0;
+
+	/*
+	 * Create a bunch of timelines and check we can write
+	 * independently to each of their breadcrumb slots.
+	 */
+
+	timelines = kvmalloc_array(NUM_TIMELINES * I915_NUM_ENGINES,
+				   sizeof(*timelines),
+				   GFP_KERNEL);
+	if (!timelines)
+		return -ENOMEM;
+
+	mutex_lock(&i915->drm.struct_mutex);
+	wakeref = intel_runtime_pm_get(i915);
+
+	count = 0;
+	for_each_engine(engine, i915, id) {
+		if (!intel_engine_can_store_dword(engine))
+			continue;
+
+		for (n = 0; n < NUM_TIMELINES; n++) {
+			struct i915_timeline *tl;
+			struct i915_request *rq;
+
+			tl = checked_i915_timeline_create(i915);
+			if (IS_ERR(tl)) {
+				err = PTR_ERR(tl);
+				goto out;
+			}
+
+			rq = tl_write(tl, engine, count);
+			if (IS_ERR(rq)) {
+				i915_timeline_put(tl);
+				err = PTR_ERR(rq);
+				goto out;
+			}
+
+			timelines[count++] = tl;
+		}
+	}
+
+out:
+	if (igt_flush_test(i915, I915_WAIT_LOCKED))
+		err = -EIO;
+
+	for (n = 0; n < count; n++) {
+		struct i915_timeline *tl = timelines[n];
+
+		if (!err && *tl->hwsp_seqno != n) {
+			pr_err("Invalid seqno stored in timeline %lu, found 0x%x\n",
+			       n, *tl->hwsp_seqno);
+			err = -EINVAL;
+		}
+		i915_timeline_put(tl);
+	}
+
+	intel_runtime_pm_put(i915, wakeref);
+	mutex_unlock(&i915->drm.struct_mutex);
+
+	kvfree(timelines);
+
+	return err;
+#undef NUM_TIMELINES
+}
+
+static int live_hwsp_alternate(void *arg)
+{
+#define NUM_TIMELINES 4096
+	struct drm_i915_private *i915 = arg;
+	struct i915_timeline **timelines;
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+	intel_wakeref_t wakeref;
+	unsigned long count, n;
+	int err = 0;
+
+	/*
+	 * Create a bunch of timelines and check we can write
+	 * independently to each of their breadcrumb slots with adjacent
+	 * engines.
+	 */
+
+	timelines = kvmalloc_array(NUM_TIMELINES * I915_NUM_ENGINES,
+				   sizeof(*timelines),
+				   GFP_KERNEL);
+	if (!timelines)
+		return -ENOMEM;
+
+	mutex_lock(&i915->drm.struct_mutex);
+	wakeref = intel_runtime_pm_get(i915);
+
+	count = 0;
+	for (n = 0; n < NUM_TIMELINES; n++) {
+		for_each_engine(engine, i915, id) {
+			struct i915_timeline *tl;
+			struct i915_request *rq;
+
+			if (!intel_engine_can_store_dword(engine))
+				continue;
+
+			tl = checked_i915_timeline_create(i915);
+			if (IS_ERR(tl)) {
+				err = PTR_ERR(tl);
+				goto out;
+			}
+
+			rq = tl_write(tl, engine, count);
+			if (IS_ERR(rq)) {
+				i915_timeline_put(tl);
+				err = PTR_ERR(rq);
+				goto out;
+			}
+
+			timelines[count++] = tl;
+		}
+	}
+
+out:
+	if (igt_flush_test(i915, I915_WAIT_LOCKED))
+		err = -EIO;
+
+	for (n = 0; n < count; n++) {
+		struct i915_timeline *tl = timelines[n];
+
+		if (!err && *tl->hwsp_seqno != n) {
+			pr_err("Invalid seqno stored in timeline %lu, found 0x%x\n",
+			       n, *tl->hwsp_seqno);
+			err = -EINVAL;
+		}
+		i915_timeline_put(tl);
+	}
+
+	intel_runtime_pm_put(i915, wakeref);
+	mutex_unlock(&i915->drm.struct_mutex);
+
+	kvfree(timelines);
+
+	return err;
+#undef NUM_TIMELINES
+}
+
+static int live_hwsp_recycle(void *arg)
+{
+	struct drm_i915_private *i915 = arg;
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+	intel_wakeref_t wakeref;
+	unsigned long count;
+	int err = 0;
+
+	/*
+	 * Check seqno writes into one timeline at a time. We expect to
+	 * recycle the breadcrumb slot between iterations and neither
+	 * want to confuse ourselves or the GPU.
+	 */
+
+	mutex_lock(&i915->drm.struct_mutex);
+	wakeref = intel_runtime_pm_get(i915);
+
+	count = 0;
+	for_each_engine(engine, i915, id) {
+		IGT_TIMEOUT(end_time);
+
+		if (!intel_engine_can_store_dword(engine))
+			continue;
+
+		do {
+			struct i915_timeline *tl;
+			struct i915_request *rq;
+
+			tl = checked_i915_timeline_create(i915);
+			if (IS_ERR(tl)) {
+				err = PTR_ERR(tl);
+				goto out;
+			}
+
+			rq = tl_write(tl, engine, count);
+			if (IS_ERR(rq)) {
+				i915_timeline_put(tl);
+				err = PTR_ERR(rq);
+				goto out;
+			}
+
+			if (i915_request_wait(rq,
+					      I915_WAIT_LOCKED,
+					      HZ / 5) < 0) {
+				pr_err("Wait for timeline writes timed out!\n");
+				i915_timeline_put(tl);
+				err = -EIO;
+				goto out;
+			}
+
+			if (*tl->hwsp_seqno != count) {
+				pr_err("Invalid seqno stored in timeline %lu, found 0x%x\n",
+				       count, *tl->hwsp_seqno);
+				err = -EINVAL;
+			}
+
+			i915_timeline_put(tl);
+			count++;
+
+			if (err)
+				goto out;
+
+			i915_timelines_park(i915); /* Encourage recycling! */
+		} while (!__igt_timeout(end_time, NULL));
+	}
+
+out:
+	if (igt_flush_test(i915, I915_WAIT_LOCKED))
+		err = -EIO;
+	intel_runtime_pm_put(i915, wakeref);
+	mutex_unlock(&i915->drm.struct_mutex);
+
+	return err;
+}
+
+int i915_timeline_live_selftests(struct drm_i915_private *i915)
+{
+	static const struct i915_subtest tests[] = {
+		SUBTEST(live_hwsp_recycle),
+		SUBTEST(live_hwsp_engine),
+		SUBTEST(live_hwsp_alternate),
+	};
+
+	return i915_subtests(tests, i915);
+}
diff --git a/drivers/gpu/drm/i915/selftests/mock_engine.c b/drivers/gpu/drm/i915/selftests/mock_engine.c
index 386dfa7e2d5c..ca95ab278da3 100644
--- a/drivers/gpu/drm/i915/selftests/mock_engine.c
+++ b/drivers/gpu/drm/i915/selftests/mock_engine.c
@@ -39,7 +39,12 @@ static struct intel_ring *mock_ring(struct intel_engine_cs *engine)
 	if (!ring)
 		return NULL;
 
-	i915_timeline_init(engine->i915, &ring->timeline, engine->name);
+	if (i915_timeline_init(engine->i915,
+			       &ring->timeline, engine->name,
+			       NULL)) {
+		kfree(ring);
+		return NULL;
+	}
 
 	ring->base.size = sz;
 	ring->base.effective_size = sz;
@@ -208,7 +213,11 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
 	engine->base.emit_breadcrumb = mock_emit_breadcrumb;
 	engine->base.submit_request = mock_submit_request;
 
-	i915_timeline_init(i915, &engine->base.timeline, engine->base.name);
+	if (i915_timeline_init(i915,
+			       &engine->base.timeline,
+			       engine->base.name,
+			       NULL))
+		goto err_free;
 	i915_timeline_set_subclass(&engine->base.timeline, TIMELINE_ENGINE);
 
 	intel_engine_init_breadcrumbs(&engine->base);
@@ -226,6 +235,7 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
 err_breadcrumbs:
 	intel_engine_fini_breadcrumbs(&engine->base);
 	i915_timeline_fini(&engine->base.timeline);
+err_free:
 	kfree(engine);
 	return NULL;
 }
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH 23/34] drm/i915: Share per-timeline HWSP using a slab suballocator
  2019-01-21 22:20 HWSP for HW semaphores Chris Wilson
                   ` (21 preceding siblings ...)
  2019-01-21 22:21 ` [PATCH 22/34] drm/i915: Allocate a status page for each timeline Chris Wilson
@ 2019-01-21 22:21 ` Chris Wilson
  2019-01-22 10:47   ` Tvrtko Ursulin
  2019-01-21 22:21 ` [PATCH 24/34] drm/i915: Track the context's seqno in its own timeline HWSP Chris Wilson
                   ` (17 subsequent siblings)
  40 siblings, 1 reply; 89+ messages in thread
From: Chris Wilson @ 2019-01-21 22:21 UTC (permalink / raw)
  To: intel-gfx

If we restrict ourselves to only using a cacheline for each timeline's
HWSP (we could go smaller, but want to avoid needlessly polluting
cachelines on different engines between different contexts), then we can
suballocate a single 4k page into 64 different timeline HWSP. By
treating each fresh allocation as a slab of 64 entries, we can keep it
around for the next 64 allocation attempts until we need to refresh the
slab cache.

John Harrison noted the issue of fragmentation leading to the same worst
case performance of one page per timeline as before, which can be
mitigated by adopting a freelist.
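
In outline, the per-page bookkeeping is just a 64-bit bitmap, one bit per
cacheline. This is a condensed sketch of the hwsp_alloc()/hwsp_free()
hunks below, with the spinlock and freelist handling omitted:

	/* allocate: take the lowest free cacheline from the page */
	cacheline = __ffs64(hwsp->free_bitmap);
	hwsp->free_bitmap &= ~BIT_ULL(cacheline);
	timeline->hwsp_offset = cacheline * CACHELINE_BYTES;

	/*
	 * free: return the cacheline; when every bit is set again the
	 * page is idle and can be handed back to the system
	 */
	hwsp->free_bitmap |= BIT_ULL(timeline->hwsp_offset / CACHELINE_BYTES);
	if (hwsp->free_bitmap == ~0ull)
		i915_vma_put(hwsp->vma);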

v2: Keep all partially allocated HWSP on a freelist

This is still without migration, so it is possible for the system to end
up with each timeline in its own page, but we ensure that no new
allocation would needlessly allocate a fresh page!

v3: Throw a selftest at the allocator to try and catch invalid cacheline
reuse.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h               |   4 +
 drivers/gpu/drm/i915/i915_timeline.c          | 117 ++++++++++++---
 drivers/gpu/drm/i915/i915_timeline.h          |   1 +
 drivers/gpu/drm/i915/i915_vma.h               |  12 ++
 drivers/gpu/drm/i915/selftests/i915_random.c  |  33 ++++-
 drivers/gpu/drm/i915/selftests/i915_random.h  |   3 +
 .../gpu/drm/i915/selftests/i915_timeline.c    | 140 ++++++++++++++++++
 7 files changed, 282 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 364067f811f7..c00eaf2889fb 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1978,6 +1978,10 @@ struct drm_i915_private {
 		struct i915_gt_timelines {
 			struct mutex mutex; /* protects list, tainted by GPU */
 			struct list_head list;
+
+			/* Pack multiple timelines' seqnos into the same page */
+			spinlock_t hwsp_lock;
+			struct list_head hwsp_free_list;
 		} timelines;
 
 		struct list_head active_rings;
diff --git a/drivers/gpu/drm/i915/i915_timeline.c b/drivers/gpu/drm/i915/i915_timeline.c
index 8d5792311a8f..69ee33dfa340 100644
--- a/drivers/gpu/drm/i915/i915_timeline.c
+++ b/drivers/gpu/drm/i915/i915_timeline.c
@@ -9,6 +9,12 @@
 #include "i915_timeline.h"
 #include "i915_syncmap.h"
 
+struct i915_timeline_hwsp {
+	struct i915_vma *vma;
+	struct list_head free_link;
+	u64 free_bitmap;
+};
+
 static struct i915_vma *__hwsp_alloc(struct drm_i915_private *i915)
 {
 	struct drm_i915_gem_object *obj;
@@ -27,28 +33,92 @@ static struct i915_vma *__hwsp_alloc(struct drm_i915_private *i915)
 	return vma;
 }
 
-static int hwsp_alloc(struct i915_timeline *timeline)
+static struct i915_vma *
+hwsp_alloc(struct i915_timeline *timeline, int *offset)
 {
-	struct i915_vma *vma;
+	struct drm_i915_private *i915 = timeline->i915;
+	struct i915_gt_timelines *gt = &i915->gt.timelines;
+	struct i915_timeline_hwsp *hwsp;
+	int cacheline;
 
-	vma = __hwsp_alloc(timeline->i915);
-	if (IS_ERR(vma))
-		return PTR_ERR(vma);
+	BUILD_BUG_ON(BITS_PER_TYPE(u64) * CACHELINE_BYTES > PAGE_SIZE);
 
-	timeline->hwsp_ggtt = vma;
-	timeline->hwsp_offset = 0;
+	spin_lock(&gt->hwsp_lock);
 
-	return 0;
+	/* hwsp_free_list only contains HWSP that have available cachelines */
+	hwsp = list_first_entry_or_null(&gt->hwsp_free_list,
+					typeof(*hwsp), free_link);
+	if (!hwsp) {
+		struct i915_vma *vma;
+
+		spin_unlock(&gt->hwsp_lock);
+
+		hwsp = kmalloc(sizeof(*hwsp), GFP_KERNEL);
+		if (!hwsp)
+			return ERR_PTR(-ENOMEM);
+
+		vma = __hwsp_alloc(i915);
+		if (IS_ERR(vma)) {
+			kfree(hwsp);
+			return vma;
+		}
+
+		vma->private = hwsp;
+		hwsp->vma = vma;
+		hwsp->free_bitmap = ~0ull;
+
+		spin_lock(&gt->hwsp_lock);
+		list_add(&hwsp->free_link, &gt->hwsp_free_list);
+	}
+
+	GEM_BUG_ON(!hwsp->free_bitmap);
+	cacheline = __ffs64(hwsp->free_bitmap);
+	hwsp->free_bitmap &= ~BIT_ULL(cacheline);
+	if (!hwsp->free_bitmap)
+		list_del(&hwsp->free_link);
+
+	spin_unlock(&gt->hwsp_lock);
+
+	GEM_BUG_ON(hwsp->vma->private != hwsp);
+
+	*offset = cacheline * CACHELINE_BYTES;
+	return hwsp->vma;
+}
+
+static void hwsp_free(struct i915_timeline *timeline)
+{
+	struct i915_gt_timelines *gt = &timeline->i915->gt.timelines;
+	struct i915_timeline_hwsp *hwsp;
+
+	hwsp = i915_timeline_hwsp(timeline);
+	if (!hwsp) /* leave global HWSP alone! */
+		return;
+
+	spin_lock(&gt->hwsp_lock);
+
+	/* As a cacheline becomes available, publish the HWSP on the freelist */
+	if (!hwsp->free_bitmap)
+		list_add_tail(&hwsp->free_link, &gt->hwsp_free_list);
+
+	hwsp->free_bitmap |= BIT_ULL(timeline->hwsp_offset / CACHELINE_BYTES);
+
+	/* And if no one is left using it, give the page back to the system */
+	if (hwsp->free_bitmap == ~0ull) {
+		i915_vma_put(hwsp->vma);
+		list_del(&hwsp->free_link);
+		kfree(hwsp);
+	}
+
+	spin_unlock(&gt->hwsp_lock);
 }
 
 int i915_timeline_init(struct drm_i915_private *i915,
 		       struct i915_timeline *timeline,
 		       const char *name,
-		       struct i915_vma *global_hwsp)
+		       struct i915_vma *hwsp)
 {
 	struct i915_gt_timelines *gt = &i915->gt.timelines;
 	void *vaddr;
-	int err;
 
 	/*
 	 * Ideally we want a set of engines on a single leaf as we expect
@@ -64,18 +134,18 @@ int i915_timeline_init(struct drm_i915_private *i915,
 	timeline->name = name;
 	timeline->pin_count = 0;
 
-	if (global_hwsp) {
-		timeline->hwsp_ggtt = i915_vma_get(global_hwsp);
-		timeline->hwsp_offset = I915_GEM_HWS_SEQNO_ADDR;
-	} else {
-		err = hwsp_alloc(timeline);
-		if (err)
-			return err;
+	timeline->hwsp_offset = I915_GEM_HWS_SEQNO_ADDR;
+	if (!hwsp) {
+		hwsp = hwsp_alloc(timeline, &timeline->hwsp_offset);
+		if (IS_ERR(hwsp))
+			return PTR_ERR(hwsp);
 	}
+	timeline->hwsp_ggtt = i915_vma_get(hwsp);
 
-	vaddr = i915_gem_object_pin_map(timeline->hwsp_ggtt->obj, I915_MAP_WB);
+	vaddr = i915_gem_object_pin_map(hwsp->obj, I915_MAP_WB);
 	if (IS_ERR(vaddr)) {
-		i915_vma_put(timeline->hwsp_ggtt);
+		hwsp_free(timeline);
+		i915_vma_put(hwsp);
 		return PTR_ERR(vaddr);
 	}
 
@@ -105,6 +175,9 @@ void i915_timelines_init(struct drm_i915_private *i915)
 	mutex_init(&gt->mutex);
 	INIT_LIST_HEAD(&gt->list);
 
+	spin_lock_init(&gt->hwsp_lock);
+	INIT_LIST_HEAD(&gt->hwsp_free_list);
+
 	/* via i915_gem_wait_for_idle() */
 	i915_gem_shrinker_taints_mutex(i915, &gt->mutex);
 }
@@ -144,12 +217,13 @@ void i915_timeline_fini(struct i915_timeline *timeline)
 	GEM_BUG_ON(timeline->pin_count);
 	GEM_BUG_ON(!list_empty(&timeline->requests));
 
-	i915_syncmap_free(&timeline->sync);
-
 	mutex_lock(&gt->mutex);
 	list_del(&timeline->link);
 	mutex_unlock(&gt->mutex);
 
+	i915_syncmap_free(&timeline->sync);
+	hwsp_free(timeline);
+
 	i915_gem_object_unpin_map(timeline->hwsp_ggtt->obj);
 	i915_vma_put(timeline->hwsp_ggtt);
 }
@@ -226,6 +300,7 @@ void i915_timelines_fini(struct drm_i915_private *i915)
 	struct i915_gt_timelines *gt = &i915->gt.timelines;
 
 	GEM_BUG_ON(!list_empty(&gt->list));
+	GEM_BUG_ON(!list_empty(&gt->hwsp_free_list));
 
 	mutex_destroy(&gt->mutex);
 }
diff --git a/drivers/gpu/drm/i915/i915_timeline.h b/drivers/gpu/drm/i915/i915_timeline.h
index 0c3739d53d79..ab736e2e5707 100644
--- a/drivers/gpu/drm/i915/i915_timeline.h
+++ b/drivers/gpu/drm/i915/i915_timeline.h
@@ -33,6 +33,7 @@
 #include "i915_utils.h"
 
 struct i915_vma;
+struct i915_timeline_hwsp;
 
 struct i915_timeline {
 	u64 fence_context;
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index 5793abe509a2..46eb818ed309 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -226,6 +226,18 @@ static inline u32 i915_ggtt_pin_bias(struct i915_vma *vma)
 	return i915_vm_to_ggtt(vma->vm)->pin_bias;
 }
 
+/* XXX inline spaghetti */
+static inline struct i915_timeline_hwsp *
+i915_timeline_hwsp(const struct i915_timeline *tl)
+{
+	return tl->hwsp_ggtt->private;
+}
+
+static inline bool i915_timeline_is_global(const struct i915_timeline *tl)
+{
+	return !i915_timeline_hwsp(tl);
+}
+
 static inline struct i915_vma *i915_vma_get(struct i915_vma *vma)
 {
 	i915_gem_object_get(vma->obj);
diff --git a/drivers/gpu/drm/i915/selftests/i915_random.c b/drivers/gpu/drm/i915/selftests/i915_random.c
index 1f415ce47018..716a3f19f030 100644
--- a/drivers/gpu/drm/i915/selftests/i915_random.c
+++ b/drivers/gpu/drm/i915/selftests/i915_random.c
@@ -41,18 +41,37 @@ u64 i915_prandom_u64_state(struct rnd_state *rnd)
 	return x;
 }
 
-void i915_random_reorder(unsigned int *order, unsigned int count,
-			 struct rnd_state *state)
+void i915_prandom_shuffle(void *arr, size_t elsz, size_t count,
+			  struct rnd_state *state)
 {
-	unsigned int i, j;
+	char stack[128];
+
+	if (WARN_ON(elsz > sizeof(stack) || count > U32_MAX))
+		return;
+
+	if (!elsz || !count)
+		return;
+
+	/* Fisher-Yates shuffle courtesy of Knuth */
+	while (--count) {
+		size_t swp;
+
+		swp = i915_prandom_u32_max_state(count + 1, state);
+		if (swp == count)
+			continue;
 
-	for (i = 0; i < count; i++) {
-		BUILD_BUG_ON(sizeof(unsigned int) > sizeof(u32));
-		j = i915_prandom_u32_max_state(count, state);
-		swap(order[i], order[j]);
+		memcpy(stack, arr + count * elsz, elsz);
+		memcpy(arr + count * elsz, arr + swp * elsz, elsz);
+		memcpy(arr + swp * elsz, stack, elsz);
 	}
 }
 
+void i915_random_reorder(unsigned int *order, unsigned int count,
+			 struct rnd_state *state)
+{
+	i915_prandom_shuffle(order, sizeof(*order), count, state);
+}
+
 unsigned int *i915_random_order(unsigned int count, struct rnd_state *state)
 {
 	unsigned int *order, i;
diff --git a/drivers/gpu/drm/i915/selftests/i915_random.h b/drivers/gpu/drm/i915/selftests/i915_random.h
index 7dffedc501ca..8e1ff9c105b6 100644
--- a/drivers/gpu/drm/i915/selftests/i915_random.h
+++ b/drivers/gpu/drm/i915/selftests/i915_random.h
@@ -54,4 +54,7 @@ void i915_random_reorder(unsigned int *order,
 			 unsigned int count,
 			 struct rnd_state *state);
 
+void i915_prandom_shuffle(void *arr, size_t elsz, size_t count,
+			  struct rnd_state *state);
+
 #endif /* !__I915_SELFTESTS_RANDOM_H__ */
diff --git a/drivers/gpu/drm/i915/selftests/i915_timeline.c b/drivers/gpu/drm/i915/selftests/i915_timeline.c
index 1585b614510d..1cecc71fba74 100644
--- a/drivers/gpu/drm/i915/selftests/i915_timeline.c
+++ b/drivers/gpu/drm/i915/selftests/i915_timeline.c
@@ -4,6 +4,8 @@
  * Copyright © 2017-2018 Intel Corporation
  */
 
+#include <linux/prime_numbers.h>
+
 #include "../i915_selftest.h"
 #include "i915_random.h"
 
@@ -11,6 +13,143 @@
 #include "mock_gem_device.h"
 #include "mock_timeline.h"
 
+static struct page *hwsp_page(struct i915_timeline *tl)
+{
+	struct drm_i915_gem_object *obj = tl->hwsp_ggtt->obj;
+
+	GEM_BUG_ON(!i915_gem_object_has_pinned_pages(obj));
+	return sg_page(obj->mm.pages->sgl);
+}
+
+static unsigned long hwsp_cacheline(struct i915_timeline *tl)
+{
+	unsigned long address = (unsigned long)page_address(hwsp_page(tl));
+
+	return (address + tl->hwsp_offset) / CACHELINE_BYTES;
+}
+
+#define CACHELINES_PER_PAGE (PAGE_SIZE / CACHELINE_BYTES)
+
+struct mock_hwsp_freelist {
+	struct drm_i915_private *i915;
+	struct radix_tree_root cachelines;
+	struct i915_timeline **history;
+	unsigned long count, max;
+	struct rnd_state prng;
+};
+
+enum {
+	SHUFFLE = BIT(0),
+};
+
+static void __mock_hwsp_record(struct mock_hwsp_freelist *state,
+			       unsigned int idx,
+			       struct i915_timeline *tl)
+{
+	tl = xchg(&state->history[idx], tl);
+	if (tl) {
+		radix_tree_delete(&state->cachelines, hwsp_cacheline(tl));
+		i915_timeline_put(tl);
+	}
+}
+
+static int __mock_hwsp_timeline(struct mock_hwsp_freelist *state,
+				unsigned int count,
+				unsigned int flags)
+{
+	struct i915_timeline *tl;
+	unsigned int idx;
+
+	while (count--) {
+		unsigned long cacheline;
+		int err;
+
+		tl = i915_timeline_create(state->i915, "mock", NULL);
+		if (IS_ERR(tl))
+			return PTR_ERR(tl);
+
+		cacheline = hwsp_cacheline(tl);
+		err = radix_tree_insert(&state->cachelines, cacheline, tl);
+		if (err) {
+			if (err == -EEXIST) {
+				pr_err("HWSP cacheline %lu already used; duplicate allocation!\n",
+				       cacheline);
+			}
+			i915_timeline_put(tl);
+			return err;
+		}
+
+		idx = state->count++ % state->max;
+		__mock_hwsp_record(state, idx, tl);
+	}
+
+	if (flags & SHUFFLE)
+		i915_prandom_shuffle(state->history,
+				     sizeof(*state->history),
+				     min(state->count, state->max),
+				     &state->prng);
+
+	count = i915_prandom_u32_max_state(min(state->count, state->max),
+					   &state->prng);
+	while (count--) {
+		idx = --state->count % state->max;
+		__mock_hwsp_record(state, idx, NULL);
+	}
+
+	return 0;
+}
+
+static int mock_hwsp_freelist(void *arg)
+{
+	struct mock_hwsp_freelist state;
+	const struct {
+		const char *name;
+		unsigned int flags;
+	} phases[] = {
+		{ "linear", 0 },
+		{ "shuffled", SHUFFLE },
+		{ },
+	}, *p;
+	unsigned int na;
+	int err = 0;
+
+	INIT_RADIX_TREE(&state.cachelines, GFP_KERNEL);
+	state.prng = I915_RND_STATE_INITIALIZER(i915_selftest.random_seed);
+
+	state.i915 = mock_gem_device();
+	if (!state.i915)
+		return -ENOMEM;
+
+	/*
+	 * Create a bunch of timelines and check that their HWSP do not overlap.
+	 * Free some, and try again.
+	 */
+
+	state.max = PAGE_SIZE / sizeof(*state.history);
+	state.count = 0;
+	state.history = kcalloc(state.max, sizeof(*state.history), GFP_KERNEL);
+	if (!state.history) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	for (p = phases; p->name; p++) {
+		pr_debug("%s(%s)\n", __func__, p->name);
+		for_each_prime_number_from(na, 1, 2 * CACHELINES_PER_PAGE) {
+			err = __mock_hwsp_timeline(&state, na, p->flags);
+			if (err)
+				goto out;
+		}
+	}
+
+out:
+	for (na = 0; na < state.max; na++)
+		__mock_hwsp_record(&state, na, NULL);
+	kfree(state.history);
+	drm_dev_put(&state.i915->drm);
+	return err;
+}
+
 struct __igt_sync {
 	const char *name;
 	u32 seqno;
@@ -260,6 +399,7 @@ static int bench_sync(void *arg)
 int i915_timeline_mock_selftests(void)
 {
 	static const struct i915_subtest tests[] = {
+		SUBTEST(mock_hwsp_freelist),
 		SUBTEST(igt_sync),
 		SUBTEST(bench_sync),
 	};
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH 24/34] drm/i915: Track the context's seqno in its own timeline HWSP
  2019-01-21 22:20 HWSP for HW semaphores Chris Wilson
                   ` (22 preceding siblings ...)
  2019-01-21 22:21 ` [PATCH 23/34] drm/i915: Share per-timeline HWSP using a slab suballocator Chris Wilson
@ 2019-01-21 22:21 ` Chris Wilson
  2019-01-22 12:24   ` Tvrtko Ursulin
  2019-01-21 22:21 ` [PATCH 25/34] drm/i915: Track active timelines Chris Wilson
                   ` (16 subsequent siblings)
  40 siblings, 1 reply; 89+ messages in thread
From: Chris Wilson @ 2019-01-21 22:21 UTC (permalink / raw)
  To: intel-gfx

Now that we have allocated ourselves a cacheline to store a breadcrumb,
we can emit a write of the per-context seqno from the GPU into the
timeline's HWSP as we complete each request. This drops the mirroring of
the per-engine HWSP and allows each context to operate independently. We
do not need to unwind the per-context timeline, and so requests are
always consistent with the timeline breadcrumb, greatly simplifying the
completion checks as we no longer need to be concerned about the
global_seqno changing mid-check.
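
Condensed from the gen8 breadcrumb hunk further down, each request now
emits two GGTT writes - the per-context seqno into its timeline's HWSP
cacheline, and (for now) the global seqno into the engine's HWSP - before
raising the user interrupt:

	cs = gen8_emit_ggtt_write(cs, request->fence.seqno,
				  i915_timeline_seqno_address(request->timeline));
	cs = gen8_emit_ggtt_write(cs, request->global_seqno,
				  intel_hws_seqno_address(request->engine));
	*cs++ = MI_USER_INTERRUPT;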

One complication though is that we have to be wary that the request may
outlive the HWSP and so avoid touching the potentially dangling pointer
after we have retired the fence. We also have to guard our access to the
HWSP with RCU; the release of the obj->mm.pages should already be RCU-safe.
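
The decoupling itself is a one-liner (quoted from the hunk below): on
retirement the request's breadcrumb pointer is redirected to its own
fence.seqno, so the i915_seqno_passed(hwsp_seqno(rq), rq->fence.seqno)
completion check trivially passes and the stale HWSP cacheline is never
dereferenced again:

	static inline void i915_request_mark_complete(struct i915_request *rq)
	{
		rq->hwsp_seqno = (u32 *)&rq->fence.seqno; /* decouple from HWSP */
	}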

At this point, we are emitting both per-context and global seqno and
still using the single per-engine execution timeline for resolving
interrupts.

v2: s/fake_complete/mark_complete/

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c              |  2 +-
 drivers/gpu/drm/i915/i915_request.c          |  3 +-
 drivers/gpu/drm/i915/i915_request.h          | 30 +++----
 drivers/gpu/drm/i915/i915_reset.c            |  1 +
 drivers/gpu/drm/i915/i915_vma.h              |  6 ++
 drivers/gpu/drm/i915/intel_engine_cs.c       |  7 +-
 drivers/gpu/drm/i915/intel_lrc.c             | 35 +++++---
 drivers/gpu/drm/i915/intel_ringbuffer.c      | 88 +++++++++++++++-----
 drivers/gpu/drm/i915/selftests/mock_engine.c | 20 ++++-
 9 files changed, 132 insertions(+), 60 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 761714448ff3..4e0de22f0166 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2890,7 +2890,7 @@ i915_gem_find_active_request(struct intel_engine_cs *engine)
 	 */
 	spin_lock_irqsave(&engine->timeline.lock, flags);
 	list_for_each_entry(request, &engine->timeline.requests, link) {
-		if (__i915_request_completed(request, request->global_seqno))
+		if (i915_request_completed(request))
 			continue;
 
 		active = request;
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index d61e86c6a1d1..bb2885f1dc1e 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -199,6 +199,7 @@ static void __retire_engine_request(struct intel_engine_cs *engine,
 	spin_unlock(&engine->timeline.lock);
 
 	spin_lock(&rq->lock);
+	i915_request_mark_complete(rq);
 	if (!i915_request_signaled(rq))
 		dma_fence_signal_locked(&rq->fence);
 	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &rq->fence.flags))
@@ -621,7 +622,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	rq->ring = ce->ring;
 	rq->timeline = ce->ring->timeline;
 	GEM_BUG_ON(rq->timeline == &engine->timeline);
-	rq->hwsp_seqno = &engine->status_page.addr[I915_GEM_HWS_INDEX];
+	rq->hwsp_seqno = rq->timeline->hwsp_seqno;
 
 	spin_lock_init(&rq->lock);
 	dma_fence_init(&rq->fence,
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index ade010fe6e26..96c586d6ff4d 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -289,6 +289,7 @@ long i915_request_wait(struct i915_request *rq,
 
 static inline bool i915_request_signaled(const struct i915_request *rq)
 {
+	/* The request may live longer than its HWSP, so check flags first! */
 	return test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &rq->fence.flags);
 }
 
@@ -340,32 +341,23 @@ static inline u32 hwsp_seqno(const struct i915_request *rq)
  */
 static inline bool i915_request_started(const struct i915_request *rq)
 {
-	u32 seqno;
-
-	seqno = i915_request_global_seqno(rq);
-	if (!seqno) /* not yet submitted to HW */
-		return false;
+	if (i915_request_signaled(rq))
+		return true;
 
-	return i915_seqno_passed(hwsp_seqno(rq), seqno - 1);
-}
-
-static inline bool
-__i915_request_completed(const struct i915_request *rq, u32 seqno)
-{
-	GEM_BUG_ON(!seqno);
-	return i915_seqno_passed(hwsp_seqno(rq), seqno) &&
-		seqno == i915_request_global_seqno(rq);
+	return i915_seqno_passed(hwsp_seqno(rq), rq->fence.seqno - 1);
 }
 
 static inline bool i915_request_completed(const struct i915_request *rq)
 {
-	u32 seqno;
+	if (i915_request_signaled(rq))
+		return true;
 
-	seqno = i915_request_global_seqno(rq);
-	if (!seqno)
-		return false;
+	return i915_seqno_passed(hwsp_seqno(rq), rq->fence.seqno);
+}
 
-	return __i915_request_completed(rq, seqno);
+static inline void i915_request_mark_complete(struct i915_request *rq)
+{
+	rq->hwsp_seqno = (u32 *)&rq->fence.seqno; /* decouple from HWSP */
 }
 
 void i915_retire_requests(struct drm_i915_private *i915);
diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
index 12e5a2bc825c..09edf488f711 100644
--- a/drivers/gpu/drm/i915/i915_reset.c
+++ b/drivers/gpu/drm/i915/i915_reset.c
@@ -756,6 +756,7 @@ static void nop_submit_request(struct i915_request *request)
 
 	spin_lock_irqsave(&request->engine->timeline.lock, flags);
 	__i915_request_submit(request);
+	i915_request_mark_complete(request);
 	intel_engine_write_global_seqno(request->engine, request->global_seqno);
 	spin_unlock_irqrestore(&request->engine->timeline.lock, flags);
 }
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index 46eb818ed309..b0f6b1d904a5 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -227,6 +227,12 @@ static inline u32 i915_ggtt_pin_bias(struct i915_vma *vma)
 }
 
 /* XXX inline spaghetti */
+static inline u32 i915_timeline_seqno_address(const struct i915_timeline *tl)
+{
+	GEM_BUG_ON(!tl->pin_count);
+	return i915_ggtt_offset(tl->hwsp_ggtt) + tl->hwsp_offset;
+}
+
 static inline struct i915_timeline_hwsp *
 i915_timeline_hwsp(const struct i915_timeline *tl)
 {
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index c850d131d8c3..e532b4b27239 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1374,9 +1374,10 @@ static void intel_engine_print_registers(const struct intel_engine_cs *engine,
 				char hdr[80];
 
 				snprintf(hdr, sizeof(hdr),
-					 "\t\tELSP[%d] count=%d, ring->start=%08x, rq: ",
+					 "\t\tELSP[%d] count=%d, ring: {start:%08x, hwsp:%08x}, rq: ",
 					 idx, count,
-					 i915_ggtt_offset(rq->ring->vma));
+					 i915_ggtt_offset(rq->ring->vma),
+					 i915_timeline_seqno_address(rq->timeline));
 				print_request(m, rq, hdr);
 			} else {
 				drm_printf(m, "\t\tELSP[%d] idle\n", idx);
@@ -1486,6 +1487,8 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 			   rq->ring->emit);
 		drm_printf(m, "\t\tring->space:  0x%08x\n",
 			   rq->ring->space);
+		drm_printf(m, "\t\tring->hwsp:   0x%08x\n",
+			   i915_timeline_seqno_address(rq->timeline));
 
 		print_request_ring(m, rq);
 	}
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 5c830a1ca332..1bf178ca3e00 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -857,10 +857,10 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine)
 	list_for_each_entry(rq, &engine->timeline.requests, link) {
 		GEM_BUG_ON(!rq->global_seqno);
 
-		if (i915_request_signaled(rq))
-			continue;
+		if (!i915_request_signaled(rq))
+			dma_fence_set_error(&rq->fence, -EIO);
 
-		dma_fence_set_error(&rq->fence, -EIO);
+		i915_request_mark_complete(rq);
 	}
 
 	/* Flush the queued requests to the timeline list (for retiring). */
@@ -870,9 +870,9 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine)
 
 		priolist_for_each_request_consume(rq, rn, p, i) {
 			list_del_init(&rq->sched.link);
-
-			dma_fence_set_error(&rq->fence, -EIO);
 			__i915_request_submit(rq);
+			dma_fence_set_error(&rq->fence, -EIO);
+			i915_request_mark_complete(rq);
 		}
 
 		rb_erase_cached(&p->node, &execlists->queue);
@@ -2054,31 +2054,40 @@ static void gen8_emit_breadcrumb(struct i915_request *request, u32 *cs)
 	/* w/a: bit 5 needs to be zero for MI_FLUSH_DW address. */
 	BUILD_BUG_ON(I915_GEM_HWS_INDEX_ADDR & (1 << 5));
 
-	cs = gen8_emit_ggtt_write(cs, request->global_seqno,
+	cs = gen8_emit_ggtt_write(cs,
+				  request->fence.seqno,
+				  i915_timeline_seqno_address(request->timeline));
+
+	cs = gen8_emit_ggtt_write(cs,
+				  request->global_seqno,
 				  intel_hws_seqno_address(request->engine));
+
 	*cs++ = MI_USER_INTERRUPT;
 	*cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
+
 	request->tail = intel_ring_offset(request, cs);
 	assert_ring_tail_valid(request->ring, request->tail);
 
 	gen8_emit_wa_tail(request, cs);
 }
-static const int gen8_emit_breadcrumb_sz = 6 + WA_TAIL_DWORDS;
+static const int gen8_emit_breadcrumb_sz = 10 + WA_TAIL_DWORDS;
 
 static void gen8_emit_breadcrumb_rcs(struct i915_request *request, u32 *cs)
 {
-	/* We're using qword write, seqno should be aligned to 8 bytes. */
-	BUILD_BUG_ON(I915_GEM_HWS_INDEX & 1);
-
 	cs = gen8_emit_ggtt_write_rcs(cs,
-				      request->global_seqno,
-				      intel_hws_seqno_address(request->engine),
+				      request->fence.seqno,
+				      i915_timeline_seqno_address(request->timeline),
 				      PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH |
 				      PIPE_CONTROL_DEPTH_CACHE_FLUSH |
 				      PIPE_CONTROL_DC_FLUSH_ENABLE |
 				      PIPE_CONTROL_FLUSH_ENABLE |
 				      PIPE_CONTROL_CS_STALL);
 
+	cs = gen8_emit_ggtt_write_rcs(cs,
+				      request->global_seqno,
+				      intel_hws_seqno_address(request->engine),
+				      PIPE_CONTROL_CS_STALL);
+
 	*cs++ = MI_USER_INTERRUPT;
 	*cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
 
@@ -2087,7 +2096,7 @@ static void gen8_emit_breadcrumb_rcs(struct i915_request *request, u32 *cs)
 
 	gen8_emit_wa_tail(request, cs);
 }
-static const int gen8_emit_breadcrumb_rcs_sz = 8 + WA_TAIL_DWORDS;
+static const int gen8_emit_breadcrumb_rcs_sz = 14 + WA_TAIL_DWORDS;
 
 static int gen8_init_rcs_context(struct i915_request *rq)
 {
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index cad25f7b8c2e..751bd4e7da42 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -326,6 +326,12 @@ static void gen6_rcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
 		 PIPE_CONTROL_DC_FLUSH_ENABLE |
 		 PIPE_CONTROL_QW_WRITE |
 		 PIPE_CONTROL_CS_STALL);
+	*cs++ = i915_timeline_seqno_address(rq->timeline) |
+		PIPE_CONTROL_GLOBAL_GTT;
+	*cs++ = rq->fence.seqno;
+
+	*cs++ = GFX_OP_PIPE_CONTROL(4);
+	*cs++ = PIPE_CONTROL_QW_WRITE | PIPE_CONTROL_CS_STALL;
 	*cs++ = intel_hws_seqno_address(rq->engine) | PIPE_CONTROL_GLOBAL_GTT;
 	*cs++ = rq->global_seqno;
 
@@ -335,7 +341,7 @@ static void gen6_rcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
 	rq->tail = intel_ring_offset(rq, cs);
 	assert_ring_tail_valid(rq->ring, rq->tail);
 }
-static const int gen6_rcs_emit_breadcrumb_sz = 14;
+static const int gen6_rcs_emit_breadcrumb_sz = 18;
 
 static int
 gen7_render_ring_cs_stall_wa(struct i915_request *rq)
@@ -426,6 +432,13 @@ static void gen7_rcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
 		 PIPE_CONTROL_QW_WRITE |
 		 PIPE_CONTROL_GLOBAL_GTT_IVB |
 		 PIPE_CONTROL_CS_STALL);
+	*cs++ = i915_timeline_seqno_address(rq->timeline);
+	*cs++ = rq->fence.seqno;
+
+	*cs++ = GFX_OP_PIPE_CONTROL(4);
+	*cs++ = (PIPE_CONTROL_QW_WRITE |
+		 PIPE_CONTROL_GLOBAL_GTT_IVB |
+		 PIPE_CONTROL_CS_STALL);
 	*cs++ = intel_hws_seqno_address(rq->engine);
 	*cs++ = rq->global_seqno;
 
@@ -435,27 +448,37 @@ static void gen7_rcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
 	rq->tail = intel_ring_offset(rq, cs);
 	assert_ring_tail_valid(rq->ring, rq->tail);
 }
-static const int gen7_rcs_emit_breadcrumb_sz = 6;
+static const int gen7_rcs_emit_breadcrumb_sz = 10;
 
 static void gen6_xcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
 {
-	*cs++ = MI_FLUSH_DW | MI_FLUSH_DW_OP_STOREDW;
-	*cs++ = intel_hws_seqno_address(rq->engine) | MI_FLUSH_DW_USE_GTT;
+	*cs++ = MI_FLUSH_DW | MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_DW_STORE_INDEX;
+	*cs++ = I915_GEM_HWS_SEQNO_ADDR | MI_FLUSH_DW_USE_GTT;
+	*cs++ = rq->fence.seqno;
+
+	*cs++ = MI_FLUSH_DW | MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_DW_STORE_INDEX;
+	*cs++ = I915_GEM_HWS_INDEX_ADDR | MI_FLUSH_DW_USE_GTT;
 	*cs++ = rq->global_seqno;
+
 	*cs++ = MI_USER_INTERRUPT;
+	*cs++ = MI_NOOP;
 
 	rq->tail = intel_ring_offset(rq, cs);
 	assert_ring_tail_valid(rq->ring, rq->tail);
 }
-static const int gen6_xcs_emit_breadcrumb_sz = 4;
+static const int gen6_xcs_emit_breadcrumb_sz = 8;
 
 #define GEN7_XCS_WA 32
 static void gen7_xcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
 {
 	int i;
 
-	*cs++ = MI_FLUSH_DW | MI_FLUSH_DW_OP_STOREDW;
-	*cs++ = intel_hws_seqno_address(rq->engine) | MI_FLUSH_DW_USE_GTT;
+	*cs++ = MI_FLUSH_DW | MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_DW_STORE_INDEX;
+	*cs++ = I915_GEM_HWS_SEQNO_ADDR | MI_FLUSH_DW_USE_GTT;
+	*cs++ = rq->fence.seqno;
+
+	*cs++ = MI_FLUSH_DW | MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_DW_STORE_INDEX;
+	*cs++ = I915_GEM_HWS_INDEX_ADDR | MI_FLUSH_DW_USE_GTT;
 	*cs++ = rq->global_seqno;
 
 	for (i = 0; i < GEN7_XCS_WA; i++) {
@@ -469,12 +492,11 @@ static void gen7_xcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
 	*cs++ = 0;
 
 	*cs++ = MI_USER_INTERRUPT;
-	*cs++ = MI_NOOP;
 
 	rq->tail = intel_ring_offset(rq, cs);
 	assert_ring_tail_valid(rq->ring, rq->tail);
 }
-static const int gen7_xcs_emit_breadcrumb_sz = 8 + GEN7_XCS_WA * 3;
+static const int gen7_xcs_emit_breadcrumb_sz = 10 + GEN7_XCS_WA * 3;
 #undef GEN7_XCS_WA
 
 static void set_hwstam(struct intel_engine_cs *engine, u32 mask)
@@ -734,7 +756,7 @@ static void reset_ring(struct intel_engine_cs *engine, bool stalled)
 	rq = NULL;
 	spin_lock_irqsave(&tl->lock, flags);
 	list_for_each_entry(pos, &tl->requests, link) {
-		if (!__i915_request_completed(pos, pos->global_seqno)) {
+		if (!i915_request_completed(pos)) {
 			rq = pos;
 			break;
 		}
@@ -876,10 +898,10 @@ static void cancel_requests(struct intel_engine_cs *engine)
 	list_for_each_entry(request, &engine->timeline.requests, link) {
 		GEM_BUG_ON(!request->global_seqno);
 
-		if (i915_request_signaled(request))
-			continue;
+		if (!i915_request_signaled(request))
+			dma_fence_set_error(&request->fence, -EIO);
 
-		dma_fence_set_error(&request->fence, -EIO);
+		i915_request_mark_complete(request);
 	}
 
 	intel_write_status_page(engine,
@@ -903,27 +925,38 @@ static void i9xx_submit_request(struct i915_request *request)
 
 static void i9xx_emit_breadcrumb(struct i915_request *rq, u32 *cs)
 {
+	GEM_BUG_ON(rq->timeline->hwsp_ggtt != rq->engine->status_page.vma);
+
 	*cs++ = MI_FLUSH;
 
+	*cs++ = MI_STORE_DWORD_INDEX;
+	*cs++ = I915_GEM_HWS_SEQNO_ADDR;
+	*cs++ = rq->fence.seqno;
+
 	*cs++ = MI_STORE_DWORD_INDEX;
 	*cs++ = I915_GEM_HWS_INDEX_ADDR;
 	*cs++ = rq->global_seqno;
 
 	*cs++ = MI_USER_INTERRUPT;
-	*cs++ = MI_NOOP;
 
 	rq->tail = intel_ring_offset(rq, cs);
 	assert_ring_tail_valid(rq->ring, rq->tail);
 }
-static const int i9xx_emit_breadcrumb_sz = 6;
+static const int i9xx_emit_breadcrumb_sz = 8;
 
 #define GEN5_WA_STORES 8 /* must be at least 1! */
 static void gen5_emit_breadcrumb(struct i915_request *rq, u32 *cs)
 {
 	int i;
 
+	GEM_BUG_ON(rq->timeline->hwsp_ggtt != rq->engine->status_page.vma);
+
 	*cs++ = MI_FLUSH;
 
+	*cs++ = MI_STORE_DWORD_INDEX;
+	*cs++ = I915_GEM_HWS_SEQNO_ADDR;
+	*cs++ = rq->fence.seqno;
+
 	BUILD_BUG_ON(GEN5_WA_STORES < 1);
 	for (i = 0; i < GEN5_WA_STORES; i++) {
 		*cs++ = MI_STORE_DWORD_INDEX;
@@ -932,11 +965,12 @@ static void gen5_emit_breadcrumb(struct i915_request *rq, u32 *cs)
 	}
 
 	*cs++ = MI_USER_INTERRUPT;
+	*cs++ = MI_NOOP;
 
 	rq->tail = intel_ring_offset(rq, cs);
 	assert_ring_tail_valid(rq->ring, rq->tail);
 }
-static const int gen5_emit_breadcrumb_sz = GEN5_WA_STORES * 3 + 2;
+static const int gen5_emit_breadcrumb_sz = GEN5_WA_STORES * 3 + 6;
 #undef GEN5_WA_STORES
 
 static void
@@ -1163,6 +1197,10 @@ int intel_ring_pin(struct intel_ring *ring)
 
 	GEM_BUG_ON(ring->vaddr);
 
+	ret = i915_timeline_pin(ring->timeline);
+	if (ret)
+		return ret;
+
 	flags = PIN_GLOBAL;
 
 	/* Ring wraparound at offset 0 sometimes hangs. No idea why. */
@@ -1179,28 +1217,32 @@ int intel_ring_pin(struct intel_ring *ring)
 		else
 			ret = i915_gem_object_set_to_cpu_domain(vma->obj, true);
 		if (unlikely(ret))
-			return ret;
+			goto unpin_timeline;
 	}
 
 	ret = i915_vma_pin(vma, 0, 0, flags);
 	if (unlikely(ret))
-		return ret;
+		goto unpin_timeline;
 
 	if (i915_vma_is_map_and_fenceable(vma))
 		addr = (void __force *)i915_vma_pin_iomap(vma);
 	else
 		addr = i915_gem_object_pin_map(vma->obj, map);
-	if (IS_ERR(addr))
-		goto err;
+	if (IS_ERR(addr)) {
+		ret = PTR_ERR(addr);
+		goto unpin_ring;
+	}
 
 	vma->obj->pin_global++;
 
 	ring->vaddr = addr;
 	return 0;
 
-err:
+unpin_ring:
 	i915_vma_unpin(vma);
-	return PTR_ERR(addr);
+unpin_timeline:
+	i915_timeline_unpin(ring->timeline);
+	return ret;
 }
 
 void intel_ring_reset(struct intel_ring *ring, u32 tail)
@@ -1229,6 +1271,8 @@ void intel_ring_unpin(struct intel_ring *ring)
 
 	ring->vma->obj->pin_global--;
 	i915_vma_unpin(ring->vma);
+
+	i915_timeline_unpin(ring->timeline);
 }
 
 static struct i915_vma *
diff --git a/drivers/gpu/drm/i915/selftests/mock_engine.c b/drivers/gpu/drm/i915/selftests/mock_engine.c
index ca95ab278da3..c0a408828415 100644
--- a/drivers/gpu/drm/i915/selftests/mock_engine.c
+++ b/drivers/gpu/drm/i915/selftests/mock_engine.c
@@ -30,6 +30,17 @@ struct mock_ring {
 	struct i915_timeline timeline;
 };
 
+static void mock_timeline_pin(struct i915_timeline *tl)
+{
+	tl->pin_count++;
+}
+
+static void mock_timeline_unpin(struct i915_timeline *tl)
+{
+	GEM_BUG_ON(!tl->pin_count);
+	tl->pin_count--;
+}
+
 static struct intel_ring *mock_ring(struct intel_engine_cs *engine)
 {
 	const unsigned long sz = PAGE_SIZE / 2;
@@ -76,6 +87,8 @@ static void advance(struct mock_request *request)
 {
 	list_del_init(&request->link);
 	mock_seqno_advance(request->base.engine, request->base.global_seqno);
+	i915_request_mark_complete(&request->base);
+	GEM_BUG_ON(!i915_request_completed(&request->base));
 }
 
 static void hw_delay_complete(struct timer_list *t)
@@ -108,6 +121,7 @@ static void hw_delay_complete(struct timer_list *t)
 
 static void mock_context_unpin(struct intel_context *ce)
 {
+	mock_timeline_unpin(ce->ring->timeline);
 	i915_gem_context_put(ce->gem_context);
 }
 
@@ -129,6 +143,7 @@ mock_context_pin(struct intel_engine_cs *engine,
 		 struct i915_gem_context *ctx)
 {
 	struct intel_context *ce = to_intel_context(ctx, engine);
+	int err = -ENOMEM;
 
 	if (ce->pin_count++)
 		return ce;
@@ -139,13 +154,15 @@ mock_context_pin(struct intel_engine_cs *engine,
 			goto err;
 	}
 
+	mock_timeline_pin(ce->ring->timeline);
+
 	ce->ops = &mock_context_ops;
 	i915_gem_context_get(ctx);
 	return ce;
 
 err:
 	ce->pin_count = 0;
-	return ERR_PTR(-ENOMEM);
+	return ERR_PTR(err);
 }
 
 static int mock_request_alloc(struct i915_request *request)
@@ -256,7 +273,6 @@ void mock_engine_flush(struct intel_engine_cs *engine)
 
 void mock_engine_reset(struct intel_engine_cs *engine)
 {
-	intel_write_status_page(engine, I915_GEM_HWS_INDEX, 0);
 }
 
 void mock_engine_free(struct intel_engine_cs *engine)
-- 
2.20.1


* [PATCH 25/34] drm/i915: Track active timelines
  2019-01-21 22:20 HWSP for HW semaphores Chris Wilson
                   ` (23 preceding siblings ...)
  2019-01-21 22:21 ` [PATCH 24/34] drm/i915: Track the context's seqno in its own timeline HWSP Chris Wilson
@ 2019-01-21 22:21 ` Chris Wilson
  2019-01-22 14:56   ` Tvrtko Ursulin
  2019-01-21 22:21 ` [PATCH 26/34] drm/i915: Identify active requests Chris Wilson
                   ` (15 subsequent siblings)
  40 siblings, 1 reply; 89+ messages in thread
From: Chris Wilson @ 2019-01-21 22:21 UTC (permalink / raw)
  To: intel-gfx

Now that we pin timelines around use, we have a clearly defined lifetime
and convenient points at which we can track only the active timelines.
This allows us to reduce the list iteration to consider only the
active timelines rather than every timeline.
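
In short, this is a condensed view of the i915_timeline.c hunks below:
list membership now follows the pin count, so the walkers
(wait_for_timelines(), i915_timelines_park(), i915_gem_unset_wedged())
only ever see timelines that are currently pinned.

    /* first pin: join gt->timelines.active_list */
    static void timeline_active(struct i915_timeline *tl)
    {
            struct i915_gt_timelines *gt = &tl->i915->gt.timelines;

            mutex_lock(&gt->mutex);
            list_add(&tl->link, &gt->active_list);
            mutex_unlock(&gt->mutex);
    }

    /* last unpin: drop off the active list again */
    static void timeline_inactive(struct i915_timeline *tl)
    {
            struct i915_gt_timelines *gt = &tl->i915->gt.timelines;

            mutex_lock(&gt->mutex);
            list_del(&tl->link);
            mutex_unlock(&gt->mutex);
    }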

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h      |  2 +-
 drivers/gpu/drm/i915/i915_gem.c      |  4 +--
 drivers/gpu/drm/i915/i915_reset.c    |  2 +-
 drivers/gpu/drm/i915/i915_timeline.c | 39 ++++++++++++++++++----------
 4 files changed, 29 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index c00eaf2889fb..5577e0e1034f 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1977,7 +1977,7 @@ struct drm_i915_private {
 
 		struct i915_gt_timelines {
 			struct mutex mutex; /* protects list, tainted by GPU */
-			struct list_head list;
+			struct list_head active_list;
 
 			/* Pack multiple timelines' seqnos into the same page */
 			spinlock_t hwsp_lock;
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 4e0de22f0166..9c499edb4c13 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3246,7 +3246,7 @@ wait_for_timelines(struct drm_i915_private *i915,
 		return timeout;
 
 	mutex_lock(&gt->mutex);
-	list_for_each_entry(tl, &gt->list, link) {
+	list_for_each_entry(tl, &gt->active_list, link) {
 		struct i915_request *rq;
 
 		rq = i915_gem_active_get_unlocked(&tl->last_request);
@@ -3274,7 +3274,7 @@ wait_for_timelines(struct drm_i915_private *i915,
 
 		/* restart after reacquiring the lock */
 		mutex_lock(&gt->mutex);
-		tl = list_entry(&gt->list, typeof(*tl), link);
+		tl = list_entry(&gt->active_list, typeof(*tl), link);
 	}
 	mutex_unlock(&gt->mutex);
 
diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
index 09edf488f711..9b9169508139 100644
--- a/drivers/gpu/drm/i915/i915_reset.c
+++ b/drivers/gpu/drm/i915/i915_reset.c
@@ -852,7 +852,7 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915)
 	 * No more can be submitted until we reset the wedged bit.
 	 */
 	mutex_lock(&i915->gt.timelines.mutex);
-	list_for_each_entry(tl, &i915->gt.timelines.list, link) {
+	list_for_each_entry(tl, &i915->gt.timelines.active_list, link) {
 		struct i915_request *rq;
 		long timeout;
 
diff --git a/drivers/gpu/drm/i915/i915_timeline.c b/drivers/gpu/drm/i915/i915_timeline.c
index 69ee33dfa340..007348b1b469 100644
--- a/drivers/gpu/drm/i915/i915_timeline.c
+++ b/drivers/gpu/drm/i915/i915_timeline.c
@@ -117,7 +117,6 @@ int i915_timeline_init(struct drm_i915_private *i915,
 		       const char *name,
 		       struct i915_vma *hwsp)
 {
-	struct i915_gt_timelines *gt = &i915->gt.timelines;
 	void *vaddr;
 
 	/*
@@ -161,10 +160,6 @@ int i915_timeline_init(struct drm_i915_private *i915,
 
 	i915_syncmap_init(&timeline->sync);
 
-	mutex_lock(&gt->mutex);
-	list_add(&timeline->link, &gt->list);
-	mutex_unlock(&gt->mutex);
-
 	return 0;
 }
 
@@ -173,7 +168,7 @@ void i915_timelines_init(struct drm_i915_private *i915)
 	struct i915_gt_timelines *gt = &i915->gt.timelines;
 
 	mutex_init(&gt->mutex);
-	INIT_LIST_HEAD(&gt->list);
+	INIT_LIST_HEAD(&gt->active_list);
 
 	spin_lock_init(&gt->hwsp_lock);
 	INIT_LIST_HEAD(&gt->hwsp_free_list);
@@ -182,6 +177,24 @@ void i915_timelines_init(struct drm_i915_private *i915)
 	i915_gem_shrinker_taints_mutex(i915, &gt->mutex);
 }
 
+static void timeline_active(struct i915_timeline *tl)
+{
+	struct i915_gt_timelines *gt = &tl->i915->gt.timelines;
+
+	mutex_lock(&gt->mutex);
+	list_add(&tl->link, &gt->active_list);
+	mutex_unlock(&gt->mutex);
+}
+
+static void timeline_inactive(struct i915_timeline *tl)
+{
+	struct i915_gt_timelines *gt = &tl->i915->gt.timelines;
+
+	mutex_lock(&gt->mutex);
+	list_del(&tl->link);
+	mutex_unlock(&gt->mutex);
+}
+
 /**
  * i915_timelines_park - called when the driver idles
  * @i915: the drm_i915_private device
@@ -198,7 +211,7 @@ void i915_timelines_park(struct drm_i915_private *i915)
 	struct i915_timeline *timeline;
 
 	mutex_lock(&gt->mutex);
-	list_for_each_entry(timeline, &gt->list, link) {
+	list_for_each_entry(timeline, &gt->active_list, link) {
 		/*
 		 * All known fences are completed so we can scrap
 		 * the current sync point tracking and start afresh,
@@ -212,15 +225,9 @@ void i915_timelines_park(struct drm_i915_private *i915)
 
 void i915_timeline_fini(struct i915_timeline *timeline)
 {
-	struct i915_gt_timelines *gt = &timeline->i915->gt.timelines;
-
 	GEM_BUG_ON(timeline->pin_count);
 	GEM_BUG_ON(!list_empty(&timeline->requests));
 
-	mutex_lock(&gt->mutex);
-	list_del(&timeline->link);
-	mutex_unlock(&gt->mutex);
-
 	i915_syncmap_free(&timeline->sync);
 	hwsp_free(timeline);
 
@@ -263,6 +270,8 @@ int i915_timeline_pin(struct i915_timeline *tl)
 	if (err)
 		goto unpin;
 
+	timeline_active(tl);
+
 	return 0;
 
 unpin:
@@ -276,6 +285,8 @@ void i915_timeline_unpin(struct i915_timeline *tl)
 	if (--tl->pin_count)
 		return;
 
+	timeline_inactive(tl);
+
 	/*
 	 * Since this timeline is idle, all barriers upon which we were waiting
 	 * must also be complete and so we can discard the last used barriers
@@ -299,7 +310,7 @@ void i915_timelines_fini(struct drm_i915_private *i915)
 {
 	struct i915_gt_timelines *gt = &i915->gt.timelines;
 
-	GEM_BUG_ON(!list_empty(&gt->list));
+	GEM_BUG_ON(!list_empty(&gt->active_list));
 	GEM_BUG_ON(!list_empty(&gt->hwsp_free_list));
 
 	mutex_destroy(&gt->mutex);
-- 
2.20.1


* [PATCH 26/34] drm/i915: Identify active requests
  2019-01-21 22:20 HWSP for HW semaphores Chris Wilson
                   ` (24 preceding siblings ...)
  2019-01-21 22:21 ` [PATCH 25/34] drm/i915: Track active timelines Chris Wilson
@ 2019-01-21 22:21 ` Chris Wilson
  2019-01-22 15:34   ` Tvrtko Ursulin
  2019-01-21 22:21 ` [PATCH 27/34] drm/i915: Remove the intel_engine_notify tracepoint Chris Wilson
                   ` (14 subsequent siblings)
  40 siblings, 1 reply; 89+ messages in thread
From: Chris Wilson @ 2019-01-21 22:21 UTC (permalink / raw)
  To: intel-gfx

To allow requests to forgo a common execution timeline, one question we
need to be able to answer is "is this request running?". To track
whether a request has started on HW, we can emit a breadcrumb at the
beginning of the request and check its timeline's HWSP to see if the
breadcrumb has advanced past the start of this request. (This is in
contrast to the global timeline where we need only ask if we are on the
global timeline and if the timeline has advanced past the end of the
previous request.)

There is still some ambiguity with a preempted request, which has
already started but has relinquished the HW to a higher-priority
request. For the
common case, this discrepancy should be negligible. However, for
identification of hung requests, knowing which one was running at the
time of the hang will be much more important.
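
As a sketch of the resulting test (this mirrors the i915_request.h and
intel_lrc.c hunks below): the new init breadcrumb stores fence.seqno - 1
into the timeline's HWSP slot before the user payload runs, so "has this
request started?" reduces to a seqno comparison against that slot.

    static inline bool i915_request_started(const struct i915_request *rq)
    {
            if (i915_request_signaled(rq))
                    return true;

            /* Remember: started but may have since been preempted! */
            return i915_seqno_passed(hwsp_seqno(rq), rq->fence.seqno - 1);
    }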

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c   |  6 +++
 drivers/gpu/drm/i915/i915_request.c          |  9 ++--
 drivers/gpu/drm/i915/i915_request.h          |  1 +
 drivers/gpu/drm/i915/i915_timeline.c         |  1 +
 drivers/gpu/drm/i915/i915_timeline.h         |  2 +
 drivers/gpu/drm/i915/intel_engine_cs.c       |  4 +-
 drivers/gpu/drm/i915/intel_lrc.c             | 47 ++++++++++++++++----
 drivers/gpu/drm/i915/intel_ringbuffer.c      | 43 ++++++++++--------
 drivers/gpu/drm/i915/intel_ringbuffer.h      |  6 ++-
 drivers/gpu/drm/i915/selftests/mock_engine.c |  2 +-
 10 files changed, 86 insertions(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index f250109e1f66..defe7d60bb88 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1976,6 +1976,12 @@ static int eb_submit(struct i915_execbuffer *eb)
 			return err;
 	}
 
+	if (eb->engine->emit_init_breadcrumb) {
+		err = eb->engine->emit_init_breadcrumb(eb->request);
+		if (err)
+			return err;
+	}
+
 	err = eb->engine->emit_bb_start(eb->request,
 					eb->batch->node.start +
 					eb->batch_start_offset,
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index bb2885f1dc1e..0a8a2a1bf55d 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -333,6 +333,7 @@ void i915_request_retire_upto(struct i915_request *rq)
 
 static u32 timeline_get_seqno(struct i915_timeline *tl)
 {
+	tl->seqno += tl->has_initial_breadcrumb;
 	return ++tl->seqno;
 }
 
@@ -382,8 +383,8 @@ void __i915_request_submit(struct i915_request *request)
 		intel_engine_enable_signaling(request, false);
 	spin_unlock(&request->lock);
 
-	engine->emit_breadcrumb(request,
-				request->ring->vaddr + request->postfix);
+	engine->emit_fini_breadcrumb(request,
+				     request->ring->vaddr + request->postfix);
 
 	/* Transfer from per-context onto the global per-engine timeline */
 	move_to_timeline(request, &engine->timeline);
@@ -657,7 +658,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	 * around inside i915_request_add() there is sufficient space at
 	 * the beginning of the ring as well.
 	 */
-	rq->reserved_space = 2 * engine->emit_breadcrumb_sz * sizeof(u32);
+	rq->reserved_space = 2 * engine->emit_fini_breadcrumb_sz * sizeof(u32);
 
 	/*
 	 * Record the position of the start of the request so that
@@ -908,7 +909,7 @@ void i915_request_add(struct i915_request *request)
 	 * GPU processing the request, we never over-estimate the
 	 * position of the ring's HEAD.
 	 */
-	cs = intel_ring_begin(request, engine->emit_breadcrumb_sz);
+	cs = intel_ring_begin(request, engine->emit_fini_breadcrumb_sz);
 	GEM_BUG_ON(IS_ERR(cs));
 	request->postfix = intel_ring_offset(request, cs);
 
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index 96c586d6ff4d..340d6216791c 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -344,6 +344,7 @@ static inline bool i915_request_started(const struct i915_request *rq)
 	if (i915_request_signaled(rq))
 		return true;
 
+	/* Remember: started but may have since been preempted! */
 	return i915_seqno_passed(hwsp_seqno(rq), rq->fence.seqno - 1);
 }
 
diff --git a/drivers/gpu/drm/i915/i915_timeline.c b/drivers/gpu/drm/i915/i915_timeline.c
index 007348b1b469..7bc9164733bc 100644
--- a/drivers/gpu/drm/i915/i915_timeline.c
+++ b/drivers/gpu/drm/i915/i915_timeline.c
@@ -132,6 +132,7 @@ int i915_timeline_init(struct drm_i915_private *i915,
 	timeline->i915 = i915;
 	timeline->name = name;
 	timeline->pin_count = 0;
+	timeline->has_initial_breadcrumb = !hwsp;
 
 	timeline->hwsp_offset = I915_GEM_HWS_SEQNO_ADDR;
 	if (!hwsp) {
diff --git a/drivers/gpu/drm/i915/i915_timeline.h b/drivers/gpu/drm/i915/i915_timeline.h
index ab736e2e5707..8caeb66d1cd5 100644
--- a/drivers/gpu/drm/i915/i915_timeline.h
+++ b/drivers/gpu/drm/i915/i915_timeline.h
@@ -48,6 +48,8 @@ struct i915_timeline {
 	struct i915_vma *hwsp_ggtt;
 	u32 hwsp_offset;
 
+	bool has_initial_breadcrumb;
+
 	/**
 	 * List of breadcrumbs associated with GPU requests currently
 	 * outstanding.
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index e532b4b27239..2a4c547240a1 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1239,7 +1239,9 @@ static void print_request(struct drm_printer *m,
 	drm_printf(m, "%s%x%s [%llx:%llx]%s @ %dms: %s\n",
 		   prefix,
 		   rq->global_seqno,
-		   i915_request_completed(rq) ? "!" : "",
+		   i915_request_completed(rq) ? "!" :
+		   i915_request_started(rq) ? "*" :
+		   "",
 		   rq->fence.context, rq->fence.seqno,
 		   buf,
 		   jiffies_to_msecs(jiffies - rq->emitted_jiffies),
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 1bf178ca3e00..0a2d53f19625 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -649,7 +649,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 		 * WaIdleLiteRestore:bdw,skl
 		 * Apply the wa NOOPs to prevent
 		 * ring:HEAD == rq:TAIL as we resubmit the
-		 * request. See gen8_emit_breadcrumb() for
+		 * request. See gen8_emit_fini_breadcrumb() for
 		 * where we prepare the padding after the
 		 * end of the request.
 		 */
@@ -1294,6 +1294,34 @@ execlists_context_pin(struct intel_engine_cs *engine,
 	return __execlists_context_pin(engine, ctx, ce);
 }
 
+static int gen8_emit_init_breadcrumb(struct i915_request *rq)
+{
+	u32 *cs;
+
+	GEM_BUG_ON(!rq->timeline->has_initial_breadcrumb);
+
+	cs = intel_ring_begin(rq, 6);
+	if (IS_ERR(cs))
+		return PTR_ERR(cs);
+
+	/*
+	 * Check if we have been preempted before we even get started.
+	 *
+	 * After this point i915_request_started() reports true, even if
+	 * we get preempted and so are no longer running.
+	 */
+	*cs++ = MI_ARB_CHECK;
+	*cs++ = MI_NOOP;
+
+	*cs++ = MI_STORE_DWORD_IMM_GEN4 | MI_USE_GGTT;
+	*cs++ = i915_timeline_seqno_address(rq->timeline);
+	*cs++ = 0;
+	*cs++ = rq->fence.seqno - 1;
+
+	intel_ring_advance(rq, cs);
+	return 0;
+}
+
 static int emit_pdps(struct i915_request *rq)
 {
 	const struct intel_engine_cs * const engine = rq->engine;
@@ -2049,7 +2077,7 @@ static void gen8_emit_wa_tail(struct i915_request *request, u32 *cs)
 	request->wa_tail = intel_ring_offset(request, cs);
 }
 
-static void gen8_emit_breadcrumb(struct i915_request *request, u32 *cs)
+static void gen8_emit_fini_breadcrumb(struct i915_request *request, u32 *cs)
 {
 	/* w/a: bit 5 needs to be zero for MI_FLUSH_DW address. */
 	BUILD_BUG_ON(I915_GEM_HWS_INDEX_ADDR & (1 << 5));
@@ -2070,9 +2098,9 @@ static void gen8_emit_breadcrumb(struct i915_request *request, u32 *cs)
 
 	gen8_emit_wa_tail(request, cs);
 }
-static const int gen8_emit_breadcrumb_sz = 10 + WA_TAIL_DWORDS;
+static const int gen8_emit_fini_breadcrumb_sz = 10 + WA_TAIL_DWORDS;
 
-static void gen8_emit_breadcrumb_rcs(struct i915_request *request, u32 *cs)
+static void gen8_emit_fini_breadcrumb_rcs(struct i915_request *request, u32 *cs)
 {
 	cs = gen8_emit_ggtt_write_rcs(cs,
 				      request->fence.seqno,
@@ -2096,7 +2124,7 @@ static void gen8_emit_breadcrumb_rcs(struct i915_request *request, u32 *cs)
 
 	gen8_emit_wa_tail(request, cs);
 }
-static const int gen8_emit_breadcrumb_rcs_sz = 14 + WA_TAIL_DWORDS;
+static const int gen8_emit_fini_breadcrumb_rcs_sz = 14 + WA_TAIL_DWORDS;
 
 static int gen8_init_rcs_context(struct i915_request *rq)
 {
@@ -2188,8 +2216,9 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
 	engine->request_alloc = execlists_request_alloc;
 
 	engine->emit_flush = gen8_emit_flush;
-	engine->emit_breadcrumb = gen8_emit_breadcrumb;
-	engine->emit_breadcrumb_sz = gen8_emit_breadcrumb_sz;
+	engine->emit_init_breadcrumb = gen8_emit_init_breadcrumb;
+	engine->emit_fini_breadcrumb = gen8_emit_fini_breadcrumb;
+	engine->emit_fini_breadcrumb_sz = gen8_emit_fini_breadcrumb_sz;
 
 	engine->set_default_submission = intel_execlists_set_default_submission;
 
@@ -2302,8 +2331,8 @@ int logical_render_ring_init(struct intel_engine_cs *engine)
 	/* Override some for render ring. */
 	engine->init_context = gen8_init_rcs_context;
 	engine->emit_flush = gen8_emit_flush_render;
-	engine->emit_breadcrumb = gen8_emit_breadcrumb_rcs;
-	engine->emit_breadcrumb_sz = gen8_emit_breadcrumb_rcs_sz;
+	engine->emit_fini_breadcrumb = gen8_emit_fini_breadcrumb_rcs;
+	engine->emit_fini_breadcrumb_sz = gen8_emit_fini_breadcrumb_rcs_sz;
 
 	ret = logical_ring_init(engine);
 	if (ret)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 751bd4e7da42..f6b30eb46263 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1594,6 +1594,7 @@ static int intel_init_ring_buffer(struct intel_engine_cs *engine)
 		err = PTR_ERR(timeline);
 		goto err;
 	}
+	GEM_BUG_ON(timeline->has_initial_breadcrumb);
 
 	ring = intel_engine_create_ring(engine, timeline, 32 * PAGE_SIZE);
 	i915_timeline_put(timeline);
@@ -1947,6 +1948,7 @@ static int ring_request_alloc(struct i915_request *request)
 	int ret;
 
 	GEM_BUG_ON(!request->hw_context->pin_count);
+	GEM_BUG_ON(request->timeline->has_initial_breadcrumb);
 
 	/*
 	 * Flush enough space to reduce the likelihood of waiting after
@@ -2283,11 +2285,16 @@ static void intel_ring_default_vfuncs(struct drm_i915_private *dev_priv,
 	engine->context_pin = intel_ring_context_pin;
 	engine->request_alloc = ring_request_alloc;
 
-	engine->emit_breadcrumb = i9xx_emit_breadcrumb;
-	engine->emit_breadcrumb_sz = i9xx_emit_breadcrumb_sz;
+	/*
+	 * Using a global execution timeline; the previous final breadcrumb is
+	 * equivalent to our next initial breadcrumb so we can elide
+	 * engine->emit_init_breadcrumb().
+	 */
+	engine->emit_fini_breadcrumb = i9xx_emit_breadcrumb;
+	engine->emit_fini_breadcrumb_sz = i9xx_emit_breadcrumb_sz;
 	if (IS_GEN(dev_priv, 5)) {
-		engine->emit_breadcrumb = gen5_emit_breadcrumb;
-		engine->emit_breadcrumb_sz = gen5_emit_breadcrumb_sz;
+		engine->emit_fini_breadcrumb = gen5_emit_breadcrumb;
+		engine->emit_fini_breadcrumb_sz = gen5_emit_breadcrumb_sz;
 	}
 
 	engine->set_default_submission = i9xx_set_default_submission;
@@ -2317,13 +2324,13 @@ int intel_init_render_ring_buffer(struct intel_engine_cs *engine)
 	if (INTEL_GEN(dev_priv) >= 7) {
 		engine->init_context = intel_rcs_ctx_init;
 		engine->emit_flush = gen7_render_ring_flush;
-		engine->emit_breadcrumb = gen7_rcs_emit_breadcrumb;
-		engine->emit_breadcrumb_sz = gen7_rcs_emit_breadcrumb_sz;
+		engine->emit_fini_breadcrumb = gen7_rcs_emit_breadcrumb;
+		engine->emit_fini_breadcrumb_sz = gen7_rcs_emit_breadcrumb_sz;
 	} else if (IS_GEN(dev_priv, 6)) {
 		engine->init_context = intel_rcs_ctx_init;
 		engine->emit_flush = gen6_render_ring_flush;
-		engine->emit_breadcrumb = gen6_rcs_emit_breadcrumb;
-		engine->emit_breadcrumb_sz = gen6_rcs_emit_breadcrumb_sz;
+		engine->emit_fini_breadcrumb = gen6_rcs_emit_breadcrumb;
+		engine->emit_fini_breadcrumb_sz = gen6_rcs_emit_breadcrumb_sz;
 	} else if (IS_GEN(dev_priv, 5)) {
 		engine->emit_flush = gen4_render_ring_flush;
 	} else {
@@ -2360,11 +2367,11 @@ int intel_init_bsd_ring_buffer(struct intel_engine_cs *engine)
 		engine->irq_enable_mask = GT_BSD_USER_INTERRUPT;
 
 		if (IS_GEN(dev_priv, 6)) {
-			engine->emit_breadcrumb = gen6_xcs_emit_breadcrumb;
-			engine->emit_breadcrumb_sz = gen6_xcs_emit_breadcrumb_sz;
+			engine->emit_fini_breadcrumb = gen6_xcs_emit_breadcrumb;
+			engine->emit_fini_breadcrumb_sz = gen6_xcs_emit_breadcrumb_sz;
 		} else {
-			engine->emit_breadcrumb = gen7_xcs_emit_breadcrumb;
-			engine->emit_breadcrumb_sz = gen7_xcs_emit_breadcrumb_sz;
+			engine->emit_fini_breadcrumb = gen7_xcs_emit_breadcrumb;
+			engine->emit_fini_breadcrumb_sz = gen7_xcs_emit_breadcrumb_sz;
 		}
 	} else {
 		engine->emit_flush = bsd_ring_flush;
@@ -2389,11 +2396,11 @@ int intel_init_blt_ring_buffer(struct intel_engine_cs *engine)
 	engine->irq_enable_mask = GT_BLT_USER_INTERRUPT;
 
 	if (IS_GEN(dev_priv, 6)) {
-		engine->emit_breadcrumb = gen6_xcs_emit_breadcrumb;
-		engine->emit_breadcrumb_sz = gen6_xcs_emit_breadcrumb_sz;
+		engine->emit_fini_breadcrumb = gen6_xcs_emit_breadcrumb;
+		engine->emit_fini_breadcrumb_sz = gen6_xcs_emit_breadcrumb_sz;
 	} else {
-		engine->emit_breadcrumb = gen7_xcs_emit_breadcrumb;
-		engine->emit_breadcrumb_sz = gen7_xcs_emit_breadcrumb_sz;
+		engine->emit_fini_breadcrumb = gen7_xcs_emit_breadcrumb;
+		engine->emit_fini_breadcrumb_sz = gen7_xcs_emit_breadcrumb_sz;
 	}
 
 	return intel_init_ring_buffer(engine);
@@ -2412,8 +2419,8 @@ int intel_init_vebox_ring_buffer(struct intel_engine_cs *engine)
 	engine->irq_enable = hsw_vebox_irq_enable;
 	engine->irq_disable = hsw_vebox_irq_disable;
 
-	engine->emit_breadcrumb = gen7_xcs_emit_breadcrumb;
-	engine->emit_breadcrumb_sz = gen7_xcs_emit_breadcrumb_sz;
+	engine->emit_fini_breadcrumb = gen7_xcs_emit_breadcrumb;
+	engine->emit_fini_breadcrumb_sz = gen7_xcs_emit_breadcrumb_sz;
 
 	return intel_init_ring_buffer(engine);
 }
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index a792bacf2930..d3d4f3667afb 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -463,8 +463,10 @@ struct intel_engine_cs {
 					 unsigned int dispatch_flags);
 #define I915_DISPATCH_SECURE BIT(0)
 #define I915_DISPATCH_PINNED BIT(1)
-	void		(*emit_breadcrumb)(struct i915_request *rq, u32 *cs);
-	int		emit_breadcrumb_sz;
+	int		(*emit_init_breadcrumb)(struct i915_request *rq);
+	void		(*emit_fini_breadcrumb)(struct i915_request *rq,
+						u32 *cs);
+	unsigned int	emit_fini_breadcrumb_sz;
 
 	/* Pass the request to the hardware queue (e.g. directly into
 	 * the legacy ringbuffer or to the end of an execlist).
diff --git a/drivers/gpu/drm/i915/selftests/mock_engine.c b/drivers/gpu/drm/i915/selftests/mock_engine.c
index c0a408828415..2515cffb4490 100644
--- a/drivers/gpu/drm/i915/selftests/mock_engine.c
+++ b/drivers/gpu/drm/i915/selftests/mock_engine.c
@@ -227,7 +227,7 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
 	engine->base.context_pin = mock_context_pin;
 	engine->base.request_alloc = mock_request_alloc;
 	engine->base.emit_flush = mock_emit_flush;
-	engine->base.emit_breadcrumb = mock_emit_breadcrumb;
+	engine->base.emit_fini_breadcrumb = mock_emit_breadcrumb;
 	engine->base.submit_request = mock_submit_request;
 
 	if (i915_timeline_init(i915,
-- 
2.20.1


* [PATCH 27/34] drm/i915: Remove the intel_engine_notify tracepoint
  2019-01-21 22:20 HWSP for HW semaphores Chris Wilson
                   ` (25 preceding siblings ...)
  2019-01-21 22:21 ` [PATCH 26/34] drm/i915: Identify active requests Chris Wilson
@ 2019-01-21 22:21 ` Chris Wilson
  2019-01-22 15:50   ` Tvrtko Ursulin
  2019-01-21 22:21 ` [PATCH 28/34] drm/i915: Replace global breadcrumbs with per-context interrupt tracking Chris Wilson
                   ` (13 subsequent siblings)
  40 siblings, 1 reply; 89+ messages in thread
From: Chris Wilson @ 2019-01-21 22:21 UTC (permalink / raw)
  To: intel-gfx

The global seqno is defunct and so we have no meaningful indicator of
forward progress for an engine. You need to listen to the request
signaling tracepoints instead.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_irq.c   |  2 --
 drivers/gpu/drm/i915/i915_trace.h | 25 -------------------------
 2 files changed, 27 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 5fd5080c4ccb..71d11dc2c235 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1209,8 +1209,6 @@ static void notify_ring(struct intel_engine_cs *engine)
 		wake_up_process(tsk);
 
 	rcu_read_unlock();
-
-	trace_intel_engine_notify(engine, wait);
 }
 
 static void vlv_c0_read(struct drm_i915_private *dev_priv,
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 33d90eca9cdd..cb5bc65d575d 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -750,31 +750,6 @@ trace_i915_request_out(struct i915_request *rq)
 #endif
 #endif
 
-TRACE_EVENT(intel_engine_notify,
-	    TP_PROTO(struct intel_engine_cs *engine, bool waiters),
-	    TP_ARGS(engine, waiters),
-
-	    TP_STRUCT__entry(
-			     __field(u32, dev)
-			     __field(u16, class)
-			     __field(u16, instance)
-			     __field(u32, seqno)
-			     __field(bool, waiters)
-			     ),
-
-	    TP_fast_assign(
-			   __entry->dev = engine->i915->drm.primary->index;
-			   __entry->class = engine->uabi_class;
-			   __entry->instance = engine->instance;
-			   __entry->seqno = intel_engine_get_seqno(engine);
-			   __entry->waiters = waiters;
-			   ),
-
-	    TP_printk("dev=%u, engine=%u:%u, seqno=%u, waiters=%u",
-		      __entry->dev, __entry->class, __entry->instance,
-		      __entry->seqno, __entry->waiters)
-);
-
 DEFINE_EVENT(i915_request, i915_request_retire,
 	    TP_PROTO(struct i915_request *rq),
 	    TP_ARGS(rq)
-- 
2.20.1


* [PATCH 28/34] drm/i915: Replace global breadcrumbs with per-context interrupt tracking
  2019-01-21 22:20 HWSP for HW semaphores Chris Wilson
                   ` (26 preceding siblings ...)
  2019-01-21 22:21 ` [PATCH 27/34] drm/i915: Remove the intel_engine_notify tracepoint Chris Wilson
@ 2019-01-21 22:21 ` Chris Wilson
  2019-01-23  9:21   ` Tvrtko Ursulin
  2019-01-23 11:41   ` [PATCH] " Chris Wilson
  2019-01-21 22:21 ` [PATCH 29/34] drm/i915: Drop fake breadcrumb irq Chris Wilson
                   ` (12 subsequent siblings)
  40 siblings, 2 replies; 89+ messages in thread
From: Chris Wilson @ 2019-01-21 22:21 UTC (permalink / raw)
  To: intel-gfx

A few years ago, see commit 688e6c725816 ("drm/i915: Slaughter the
thundering i915_wait_request herd"), the issue of handling multiple
clients waiting in parallel was brought to our attention. The
requirement was that every client should be woken immediately upon its
request being signaled, without incurring any cpu overhead.

Handling certain fragility of our hw meant that we could not do a
simple check inside the irq handler (some generations required almost
unbounded delays before we could be sure of seqno coherency) and so
request completion checking required delegation.

Before commit 688e6c725816, the solution was simple. Every client waking
on a request would be woken on every interrupt and each would do a
heavyweight check to see if their request was complete. Commit
688e6c725816 introduced an rbtree so that only the earliest waiter on
the global timeline would be woken, and would wake the next and so on.
(Along with various complications to handle requests being reordered
along the global timeline, and also a requirement for a kthread to provide
a delegate for fence signaling that had no process context.)

The global rbtree depends on knowing the execution timeline (and global
seqno). Without knowing that order, we must instead check all contexts
queued to the HW to see which may have advanced. We trim that list by
only checking queued contexts that are being waited on, but still we
keep a list of all active contexts and their active signalers that we
inspect from inside the irq handler. By moving the waiters onto the fence
signal list, we can combine the client wakeup with the dma_fence
signaling (a dramatic reduction in complexity, but does require the HW
being coherent, the seqno must be visible from the cpu before the
interrupt is raised - we keep a timer backup just in case).
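
On the client side this reduces i915_request_wait() to an ordinary
dma_fence callback plus a sleep loop (condensed from the i915_request.c
hunk below):

    struct request_wait {
            struct dma_fence_cb cb;
            struct task_struct *tsk;
    };

    static void request_wait_wake(struct dma_fence *fence, struct dma_fence_cb *cb)
    {
            struct request_wait *wait = container_of(cb, typeof(*wait), cb);

            wake_up_process(wait->tsk);
    }

    /* in i915_request_wait(): optimistic spin, then sleep on the fence */
    wait.tsk = current;
    if (dma_fence_add_callback(&rq->fence, &wait.cb, request_wait_wake))
            goto out; /* fence already signaled */

    for (;;) {
            set_current_state(state);
            if (i915_request_completed(rq))
                    break;
            if (signal_pending_state(state, current)) {
                    timeout = -ERESTARTSYS;
                    break;
            }
            if (!timeout) {
                    timeout = -ETIME;
                    break;
            }
            timeout = io_schedule_timeout(timeout);
    }
    __set_current_state(TASK_RUNNING);

    dma_fence_remove_callback(&rq->fence, &wait.cb);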

Having previously fixed all the issues with irq-seqno serialisation (by
inserting delays onto the GPU after each request instead of random delays
on the CPU after each interrupt), we can rely on the seqno state to
perform direct wakeups from the interrupt handler. This allows us to
preserve our single context switch behaviour of the current routine,
with the only downside that we lose the RT priority sorting of wakeups.
In general, direct wakeup latency of multiple clients is about the same
(about 10% better in most cases) with a reduction in total CPU time spent
in the waiter (about 20-50% depending on gen). Average herd behaviour is
improved, but at the cost of not delegating wakeups on task_prio.

References: 688e6c725816 ("drm/i915: Slaughter the thundering i915_wait_request herd")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c           |  28 +-
 drivers/gpu/drm/i915/i915_gem_context.h       |   5 +
 drivers/gpu/drm/i915/i915_gpu_error.c         |  73 --
 drivers/gpu/drm/i915/i915_gpu_error.h         |   8 -
 drivers/gpu/drm/i915/i915_irq.c               |  87 +-
 drivers/gpu/drm/i915/i915_request.c           | 128 +--
 drivers/gpu/drm/i915/i915_request.h           |  22 +-
 drivers/gpu/drm/i915/i915_reset.c             |  13 +-
 drivers/gpu/drm/i915/intel_breadcrumbs.c      | 797 +++++-------------
 drivers/gpu/drm/i915/intel_engine_cs.c        |  34 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c       |   6 +-
 drivers/gpu/drm/i915/intel_ringbuffer.h       |  95 +--
 .../drm/i915/selftests/i915_mock_selftests.h  |   1 -
 drivers/gpu/drm/i915/selftests/i915_request.c | 398 +++++++++
 drivers/gpu/drm/i915/selftests/igt_spinner.c  |   5 -
 .../drm/i915/selftests/intel_breadcrumbs.c    | 470 -----------
 .../gpu/drm/i915/selftests/intel_hangcheck.c  |   2 +-
 drivers/gpu/drm/i915/selftests/lib_sw_fence.c |  54 ++
 drivers/gpu/drm/i915/selftests/lib_sw_fence.h |   3 +
 drivers/gpu/drm/i915/selftests/mock_engine.c  |  16 +-
 drivers/gpu/drm/i915/selftests/mock_engine.h  |   6 -
 21 files changed, 774 insertions(+), 1477 deletions(-)
 delete mode 100644 drivers/gpu/drm/i915/selftests/intel_breadcrumbs.c

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 2a6e4044f25b..d7764e62e9b4 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1315,29 +1315,16 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
 	seq_printf(m, "GT active? %s\n", yesno(dev_priv->gt.awake));
 
 	for_each_engine(engine, dev_priv, id) {
-		struct intel_breadcrumbs *b = &engine->breadcrumbs;
-		struct rb_node *rb;
-
 		seq_printf(m, "%s:\n", engine->name);
 		seq_printf(m, "\tseqno = %x [current %x, last %x], %dms ago\n",
 			   engine->hangcheck.seqno, seqno[id],
 			   intel_engine_last_submit(engine),
 			   jiffies_to_msecs(jiffies -
 					    engine->hangcheck.action_timestamp));
-		seq_printf(m, "\twaiters? %s, fake irq active? %s\n",
-			   yesno(intel_engine_has_waiter(engine)),
+		seq_printf(m, "\tfake irq active? %s\n",
 			   yesno(test_bit(engine->id,
 					  &dev_priv->gpu_error.missed_irq_rings)));
 
-		spin_lock_irq(&b->rb_lock);
-		for (rb = rb_first(&b->waiters); rb; rb = rb_next(rb)) {
-			struct intel_wait *w = rb_entry(rb, typeof(*w), node);
-
-			seq_printf(m, "\t%s [%d] waiting for %x\n",
-				   w->tsk->comm, w->tsk->pid, w->seqno);
-		}
-		spin_unlock_irq(&b->rb_lock);
-
 		seq_printf(m, "\tACTHD = 0x%08llx [current 0x%08llx]\n",
 			   (long long)engine->hangcheck.acthd,
 			   (long long)acthd[id]);
@@ -2021,18 +2008,6 @@ static int i915_swizzle_info(struct seq_file *m, void *data)
 	return 0;
 }
 
-static int count_irq_waiters(struct drm_i915_private *i915)
-{
-	struct intel_engine_cs *engine;
-	enum intel_engine_id id;
-	int count = 0;
-
-	for_each_engine(engine, i915, id)
-		count += intel_engine_has_waiter(engine);
-
-	return count;
-}
-
 static const char *rps_power_to_str(unsigned int power)
 {
 	static const char * const strings[] = {
@@ -2072,7 +2047,6 @@ static int i915_rps_boost_info(struct seq_file *m, void *data)
 	seq_printf(m, "RPS enabled? %d\n", rps->enabled);
 	seq_printf(m, "GPU busy? %s [%d requests]\n",
 		   yesno(dev_priv->gt.awake), dev_priv->gt.active_requests);
-	seq_printf(m, "CPU waiting? %d\n", count_irq_waiters(dev_priv));
 	seq_printf(m, "Boosts outstanding? %d\n",
 		   atomic_read(&rps->num_waiters));
 	seq_printf(m, "Interactive? %d\n", READ_ONCE(rps->power.interactive));
diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
index 47d82ce7ba6a..9e11b31acd01 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/i915_gem_context.h
@@ -164,6 +164,8 @@ struct i915_gem_context {
 	struct intel_context {
 		struct i915_gem_context *gem_context;
 		struct intel_engine_cs *active;
+		struct list_head signal_link;
+		struct list_head signals;
 		struct i915_vma *state;
 		struct intel_ring *ring;
 		u32 *lrc_reg_state;
@@ -370,6 +372,9 @@ intel_context_init(struct intel_context *ce,
 		   struct intel_engine_cs *engine)
 {
 	ce->gem_context = ctx;
+
+	INIT_LIST_HEAD(&ce->signal_link);
+	INIT_LIST_HEAD(&ce->signals);
 }
 
 #endif /* !__I915_GEM_CONTEXT_H__ */
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 96d1d634a29d..825572127029 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -530,7 +530,6 @@ static void error_print_engine(struct drm_i915_error_state_buf *m,
 	}
 	err_printf(m, "  seqno: 0x%08x\n", ee->seqno);
 	err_printf(m, "  last_seqno: 0x%08x\n", ee->last_seqno);
-	err_printf(m, "  waiting: %s\n", yesno(ee->waiting));
 	err_printf(m, "  ring->head: 0x%08x\n", ee->cpu_ring_head);
 	err_printf(m, "  ring->tail: 0x%08x\n", ee->cpu_ring_tail);
 	err_printf(m, "  hangcheck timestamp: %dms (%lu%s)\n",
@@ -804,21 +803,6 @@ static void __err_print_to_sgl(struct drm_i915_error_state_buf *m,
 						    error->epoch);
 		}
 
-		if (IS_ERR(ee->waiters)) {
-			err_printf(m, "%s --- ? waiters [unable to acquire spinlock]\n",
-				   m->i915->engine[i]->name);
-		} else if (ee->num_waiters) {
-			err_printf(m, "%s --- %d waiters\n",
-				   m->i915->engine[i]->name,
-				   ee->num_waiters);
-			for (j = 0; j < ee->num_waiters; j++) {
-				err_printf(m, " seqno 0x%08x for %s [%d]\n",
-					   ee->waiters[j].seqno,
-					   ee->waiters[j].comm,
-					   ee->waiters[j].pid);
-			}
-		}
-
 		print_error_obj(m, m->i915->engine[i],
 				"ringbuffer", ee->ringbuffer);
 
@@ -1000,8 +984,6 @@ void __i915_gpu_state_free(struct kref *error_ref)
 		i915_error_object_free(ee->wa_ctx);
 
 		kfree(ee->requests);
-		if (!IS_ERR_OR_NULL(ee->waiters))
-			kfree(ee->waiters);
 	}
 
 	for (i = 0; i < ARRAY_SIZE(error->active_bo); i++)
@@ -1203,59 +1185,6 @@ static void gen6_record_semaphore_state(struct intel_engine_cs *engine,
 			I915_READ(RING_SYNC_2(engine->mmio_base));
 }
 
-static void error_record_engine_waiters(struct intel_engine_cs *engine,
-					struct drm_i915_error_engine *ee)
-{
-	struct intel_breadcrumbs *b = &engine->breadcrumbs;
-	struct drm_i915_error_waiter *waiter;
-	struct rb_node *rb;
-	int count;
-
-	ee->num_waiters = 0;
-	ee->waiters = NULL;
-
-	if (RB_EMPTY_ROOT(&b->waiters))
-		return;
-
-	if (!spin_trylock_irq(&b->rb_lock)) {
-		ee->waiters = ERR_PTR(-EDEADLK);
-		return;
-	}
-
-	count = 0;
-	for (rb = rb_first(&b->waiters); rb != NULL; rb = rb_next(rb))
-		count++;
-	spin_unlock_irq(&b->rb_lock);
-
-	waiter = NULL;
-	if (count)
-		waiter = kmalloc_array(count,
-				       sizeof(struct drm_i915_error_waiter),
-				       GFP_ATOMIC);
-	if (!waiter)
-		return;
-
-	if (!spin_trylock_irq(&b->rb_lock)) {
-		kfree(waiter);
-		ee->waiters = ERR_PTR(-EDEADLK);
-		return;
-	}
-
-	ee->waiters = waiter;
-	for (rb = rb_first(&b->waiters); rb; rb = rb_next(rb)) {
-		struct intel_wait *w = rb_entry(rb, typeof(*w), node);
-
-		strcpy(waiter->comm, w->tsk->comm);
-		waiter->pid = w->tsk->pid;
-		waiter->seqno = w->seqno;
-		waiter++;
-
-		if (++ee->num_waiters == count)
-			break;
-	}
-	spin_unlock_irq(&b->rb_lock);
-}
-
 static void error_record_engine_registers(struct i915_gpu_state *error,
 					  struct intel_engine_cs *engine,
 					  struct drm_i915_error_engine *ee)
@@ -1291,7 +1220,6 @@ static void error_record_engine_registers(struct i915_gpu_state *error,
 
 	intel_engine_get_instdone(engine, &ee->instdone);
 
-	ee->waiting = intel_engine_has_waiter(engine);
 	ee->instpm = I915_READ(RING_INSTPM(engine->mmio_base));
 	ee->acthd = intel_engine_get_active_head(engine);
 	ee->seqno = intel_engine_get_seqno(engine);
@@ -1540,7 +1468,6 @@ static void gem_record_rings(struct i915_gpu_state *error)
 		ee->engine_id = i;
 
 		error_record_engine_registers(error, engine, ee);
-		error_record_engine_waiters(engine, ee);
 		error_record_engine_execlists(engine, ee);
 
 		request = i915_gem_find_active_request(engine);
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.h b/drivers/gpu/drm/i915/i915_gpu_error.h
index 231173786eae..0e184712cbcc 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.h
+++ b/drivers/gpu/drm/i915/i915_gpu_error.h
@@ -82,8 +82,6 @@ struct i915_gpu_state {
 		int engine_id;
 		/* Software tracked state */
 		bool idle;
-		bool waiting;
-		int num_waiters;
 		unsigned long hangcheck_timestamp;
 		struct i915_address_space *vm;
 		int num_requests;
@@ -159,12 +157,6 @@ struct i915_gpu_state {
 		} *requests, execlist[EXECLIST_MAX_PORTS];
 		unsigned int num_ports;
 
-		struct drm_i915_error_waiter {
-			char comm[TASK_COMM_LEN];
-			pid_t pid;
-			u32 seqno;
-		} *waiters;
-
 		struct {
 			u32 gfx_mode;
 			union {
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 71d11dc2c235..7669b1caeef0 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -28,9 +28,10 @@
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
-#include <linux/sysrq.h>
-#include <linux/slab.h>
 #include <linux/circ_buf.h>
+#include <linux/slab.h>
+#include <linux/sysrq.h>
+
 #include <drm/i915_drm.h>
 #include "i915_drv.h"
 #include "i915_trace.h"
@@ -1151,66 +1152,6 @@ static void ironlake_rps_change_irq_handler(struct drm_i915_private *dev_priv)
 	return;
 }
 
-static void notify_ring(struct intel_engine_cs *engine)
-{
-	const u32 seqno = intel_engine_get_seqno(engine);
-	struct i915_request *rq = NULL;
-	struct task_struct *tsk = NULL;
-	struct intel_wait *wait;
-
-	if (unlikely(!engine->breadcrumbs.irq_armed))
-		return;
-
-	rcu_read_lock();
-
-	spin_lock(&engine->breadcrumbs.irq_lock);
-	wait = engine->breadcrumbs.irq_wait;
-	if (wait) {
-		/*
-		 * We use a callback from the dma-fence to submit
-		 * requests after waiting on our own requests. To
-		 * ensure minimum delay in queuing the next request to
-		 * hardware, signal the fence now rather than wait for
-		 * the signaler to be woken up. We still wake up the
-		 * waiter in order to handle the irq-seqno coherency
-		 * issues (we may receive the interrupt before the
-		 * seqno is written, see __i915_request_irq_complete())
-		 * and to handle coalescing of multiple seqno updates
-		 * and many waiters.
-		 */
-		if (i915_seqno_passed(seqno, wait->seqno)) {
-			struct i915_request *waiter = wait->request;
-
-			if (waiter &&
-			    !i915_request_signaled(waiter) &&
-			    intel_wait_check_request(wait, waiter))
-				rq = i915_request_get(waiter);
-
-			tsk = wait->tsk;
-		}
-
-		engine->breadcrumbs.irq_count++;
-	} else {
-		if (engine->breadcrumbs.irq_armed)
-			__intel_engine_disarm_breadcrumbs(engine);
-	}
-	spin_unlock(&engine->breadcrumbs.irq_lock);
-
-	if (rq) {
-		spin_lock(&rq->lock);
-		dma_fence_signal_locked(&rq->fence);
-		GEM_BUG_ON(!i915_request_completed(rq));
-		spin_unlock(&rq->lock);
-
-		i915_request_put(rq);
-	}
-
-	if (tsk && tsk->state & TASK_NORMAL)
-		wake_up_process(tsk);
-
-	rcu_read_unlock();
-}
-
 static void vlv_c0_read(struct drm_i915_private *dev_priv,
 			struct intel_rps_ei *ei)
 {
@@ -1455,20 +1396,20 @@ static void ilk_gt_irq_handler(struct drm_i915_private *dev_priv,
 			       u32 gt_iir)
 {
 	if (gt_iir & GT_RENDER_USER_INTERRUPT)
-		notify_ring(dev_priv->engine[RCS]);
+		intel_engine_breadcrumbs_irq(dev_priv->engine[RCS]);
 	if (gt_iir & ILK_BSD_USER_INTERRUPT)
-		notify_ring(dev_priv->engine[VCS]);
+		intel_engine_breadcrumbs_irq(dev_priv->engine[VCS]);
 }
 
 static void snb_gt_irq_handler(struct drm_i915_private *dev_priv,
 			       u32 gt_iir)
 {
 	if (gt_iir & GT_RENDER_USER_INTERRUPT)
-		notify_ring(dev_priv->engine[RCS]);
+		intel_engine_breadcrumbs_irq(dev_priv->engine[RCS]);
 	if (gt_iir & GT_BSD_USER_INTERRUPT)
-		notify_ring(dev_priv->engine[VCS]);
+		intel_engine_breadcrumbs_irq(dev_priv->engine[VCS]);
 	if (gt_iir & GT_BLT_USER_INTERRUPT)
-		notify_ring(dev_priv->engine[BCS]);
+		intel_engine_breadcrumbs_irq(dev_priv->engine[BCS]);
 
 	if (gt_iir & (GT_BLT_CS_ERROR_INTERRUPT |
 		      GT_BSD_CS_ERROR_INTERRUPT |
@@ -1488,7 +1429,7 @@ gen8_cs_irq_handler(struct intel_engine_cs *engine, u32 iir)
 		tasklet = true;
 
 	if (iir & GT_RENDER_USER_INTERRUPT) {
-		notify_ring(engine);
+		intel_engine_breadcrumbs_irq(engine);
 		tasklet |= USES_GUC_SUBMISSION(engine->i915);
 	}
 
@@ -1834,7 +1775,7 @@ static void gen6_rps_irq_handler(struct drm_i915_private *dev_priv, u32 pm_iir)
 
 	if (HAS_VEBOX(dev_priv)) {
 		if (pm_iir & PM_VEBOX_USER_INTERRUPT)
-			notify_ring(dev_priv->engine[VECS]);
+			intel_engine_breadcrumbs_irq(dev_priv->engine[VECS]);
 
 		if (pm_iir & PM_VEBOX_CS_ERROR_INTERRUPT)
 			DRM_DEBUG("Command parser error, pm_iir 0x%08x\n", pm_iir);
@@ -4257,7 +4198,7 @@ static irqreturn_t i8xx_irq_handler(int irq, void *arg)
 		I915_WRITE16(IIR, iir);
 
 		if (iir & I915_USER_INTERRUPT)
-			notify_ring(dev_priv->engine[RCS]);
+			intel_engine_breadcrumbs_irq(dev_priv->engine[RCS]);
 
 		if (iir & I915_MASTER_ERROR_INTERRUPT)
 			i8xx_error_irq_handler(dev_priv, eir, eir_stuck);
@@ -4365,7 +4306,7 @@ static irqreturn_t i915_irq_handler(int irq, void *arg)
 		I915_WRITE(IIR, iir);
 
 		if (iir & I915_USER_INTERRUPT)
-			notify_ring(dev_priv->engine[RCS]);
+			intel_engine_breadcrumbs_irq(dev_priv->engine[RCS]);
 
 		if (iir & I915_MASTER_ERROR_INTERRUPT)
 			i9xx_error_irq_handler(dev_priv, eir, eir_stuck);
@@ -4510,10 +4451,10 @@ static irqreturn_t i965_irq_handler(int irq, void *arg)
 		I915_WRITE(IIR, iir);
 
 		if (iir & I915_USER_INTERRUPT)
-			notify_ring(dev_priv->engine[RCS]);
+			intel_engine_breadcrumbs_irq(dev_priv->engine[RCS]);
 
 		if (iir & I915_BSD_USER_INTERRUPT)
-			notify_ring(dev_priv->engine[VCS]);
+			intel_engine_breadcrumbs_irq(dev_priv->engine[VCS]);
 
 		if (iir & I915_MASTER_ERROR_INTERRUPT)
 			i9xx_error_irq_handler(dev_priv, eir, eir_stuck);
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 0a8a2a1bf55d..cca437ac8a7e 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -60,7 +60,7 @@ static bool i915_fence_signaled(struct dma_fence *fence)
 
 static bool i915_fence_enable_signaling(struct dma_fence *fence)
 {
-	return intel_engine_enable_signaling(to_request(fence), true);
+	return intel_engine_enable_signaling(to_request(fence));
 }
 
 static signed long i915_fence_wait(struct dma_fence *fence,
@@ -378,9 +378,11 @@ void __i915_request_submit(struct i915_request *request)
 
 	/* We may be recursing from the signal callback of another i915 fence */
 	spin_lock_nested(&request->lock, SINGLE_DEPTH_NESTING);
+	set_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags);
 	request->global_seqno = seqno;
-	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &request->fence.flags))
-		intel_engine_enable_signaling(request, false);
+	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &request->fence.flags) &&
+	    !intel_engine_enable_signaling(request))
+		intel_engine_queue_breadcrumbs(engine);
 	spin_unlock(&request->lock);
 
 	engine->emit_fini_breadcrumb(request,
@@ -390,8 +392,6 @@ void __i915_request_submit(struct i915_request *request)
 	move_to_timeline(request, &engine->timeline);
 
 	trace_i915_request_execute(request);
-
-	wake_up_all(&request->execute);
 }
 
 void i915_request_submit(struct i915_request *request)
@@ -435,6 +435,7 @@ void __i915_request_unsubmit(struct i915_request *request)
 	request->global_seqno = 0;
 	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &request->fence.flags))
 		intel_engine_cancel_signaling(request);
+	clear_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags);
 	spin_unlock(&request->lock);
 
 	/* Transfer back from the global per-engine timeline to per-context */
@@ -634,13 +635,11 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 
 	/* We bump the ref for the fence chain */
 	i915_sw_fence_init(&i915_request_get(rq)->submit, submit_notify);
-	init_waitqueue_head(&rq->execute);
 
 	i915_sched_node_init(&rq->sched);
 
 	/* No zalloc, must clear what we need by hand */
 	rq->global_seqno = 0;
-	rq->signaling.wait.seqno = 0;
 	rq->file_priv = NULL;
 	rq->batch = NULL;
 	rq->capture_list = NULL;
@@ -1031,13 +1030,10 @@ static bool busywait_stop(unsigned long timeout, unsigned int cpu)
 	return this_cpu != cpu;
 }
 
-static bool __i915_spin_request(const struct i915_request *rq,
-				u32 seqno, int state, unsigned long timeout_us)
+static bool __i915_spin_request(const struct i915_request * const rq,
+				int state, unsigned long timeout_us)
 {
-	struct intel_engine_cs *engine = rq->engine;
-	unsigned int irq, cpu;
-
-	GEM_BUG_ON(!seqno);
+	unsigned int cpu;
 
 	/*
 	 * Only wait for the request if we know it is likely to complete.
@@ -1050,7 +1046,7 @@ static bool __i915_spin_request(const struct i915_request *rq,
 	 * it is a fair assumption that it will not complete within our
 	 * relatively short timeout.
 	 */
-	if (!intel_engine_has_started(engine, seqno))
+	if (!i915_request_started(rq))
 		return false;
 
 	/*
@@ -1064,20 +1060,10 @@ static bool __i915_spin_request(const struct i915_request *rq,
 	 * takes to sleep on a request, on the order of a microsecond.
 	 */
 
-	irq = READ_ONCE(engine->breadcrumbs.irq_count);
 	timeout_us += local_clock_us(&cpu);
 	do {
-		if (intel_engine_has_completed(engine, seqno))
-			return seqno == i915_request_global_seqno(rq);
-
-		/*
-		 * Seqno are meant to be ordered *before* the interrupt. If
-		 * we see an interrupt without a corresponding seqno advance,
-		 * assume we won't see one in the near future but require
-		 * the engine->seqno_barrier() to fixup coherency.
-		 */
-		if (READ_ONCE(engine->breadcrumbs.irq_count) != irq)
-			break;
+		if (i915_request_completed(rq))
+			return true;
 
 		if (signal_pending_state(state, current))
 			break;
@@ -1091,6 +1077,18 @@ static bool __i915_spin_request(const struct i915_request *rq,
 	return false;
 }
 
+struct request_wait {
+	struct dma_fence_cb cb;
+	struct task_struct *tsk;
+};
+
+static void request_wait_wake(struct dma_fence *fence, struct dma_fence_cb *cb)
+{
+	struct request_wait *wait = container_of(cb, typeof(*wait), cb);
+
+	wake_up_process(wait->tsk);
+}
+
 /**
  * i915_request_wait - wait until execution of request has finished
  * @rq: the request to wait upon
@@ -1116,8 +1114,7 @@ long i915_request_wait(struct i915_request *rq,
 {
 	const int state = flags & I915_WAIT_INTERRUPTIBLE ?
 		TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
-	DEFINE_WAIT_FUNC(exec, default_wake_function);
-	struct intel_wait wait;
+	struct request_wait wait;
 
 	might_sleep();
 	GEM_BUG_ON(timeout < 0);
@@ -1129,47 +1126,24 @@ long i915_request_wait(struct i915_request *rq,
 		return -ETIME;
 
 	trace_i915_request_wait_begin(rq, flags);
-	add_wait_queue(&rq->execute, &exec);
-	intel_wait_init(&wait);
-	if (flags & I915_WAIT_PRIORITY)
-		i915_schedule_bump_priority(rq, I915_PRIORITY_WAIT);
-
-restart:
-	do {
-		set_current_state(state);
-		if (intel_wait_update_request(&wait, rq))
-			break;
-
-		if (signal_pending_state(state, current)) {
-			timeout = -ERESTARTSYS;
-			goto complete;
-		}
 
-		if (!timeout) {
-			timeout = -ETIME;
-			goto complete;
-		}
+	/* Optimistic short spin before touching IRQs */
+	if (__i915_spin_request(rq, state, 5))
+		goto out;
 
-		timeout = io_schedule_timeout(timeout);
-	} while (1);
+	if (flags & I915_WAIT_PRIORITY)
+		i915_schedule_bump_priority(rq, I915_PRIORITY_WAIT);
 
-	GEM_BUG_ON(!intel_wait_has_seqno(&wait));
-	GEM_BUG_ON(!i915_sw_fence_signaled(&rq->submit));
+	wait.tsk = current;
+	if (dma_fence_add_callback(&rq->fence, &wait.cb, request_wait_wake))
+		goto out;
 
-	/* Optimistic short spin before touching IRQs */
-	if (__i915_spin_request(rq, wait.seqno, state, 5))
-		goto complete;
+	for (;;) {
+		set_current_state(state);
 
-	set_current_state(state);
-	if (intel_engine_add_wait(rq->engine, &wait))
-		/*
-		 * In order to check that we haven't missed the interrupt
-		 * as we enabled it, we need to kick ourselves to do a
-		 * coherent check on the seqno before we sleep.
-		 */
-		goto wakeup;
+		if (i915_request_completed(rq))
+			break;
 
-	for (;;) {
 		if (signal_pending_state(state, current)) {
 			timeout = -ERESTARTSYS;
 			break;
@@ -1181,33 +1155,13 @@ long i915_request_wait(struct i915_request *rq,
 		}
 
 		timeout = io_schedule_timeout(timeout);
-
-		if (intel_wait_complete(&wait) &&
-		    intel_wait_check_request(&wait, rq))
-			break;
-
-		set_current_state(state);
-
-wakeup:
-		if (i915_request_completed(rq))
-			break;
-
-		/* Only spin if we know the GPU is processing this request */
-		if (__i915_spin_request(rq, wait.seqno, state, 2))
-			break;
-
-		if (!intel_wait_check_request(&wait, rq)) {
-			intel_engine_remove_wait(rq->engine, &wait);
-			goto restart;
-		}
 	}
-
-	intel_engine_remove_wait(rq->engine, &wait);
-complete:
 	__set_current_state(TASK_RUNNING);
-	remove_wait_queue(&rq->execute, &exec);
-	trace_i915_request_wait_end(rq);
 
+	dma_fence_remove_callback(&rq->fence, &wait.cb);
+
+out:
+	trace_i915_request_wait_end(rq);
 	return timeout;
 }
 
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index 340d6216791c..8f78ac97b8d6 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -38,23 +38,16 @@ struct drm_i915_gem_object;
 struct i915_request;
 struct i915_timeline;
 
-struct intel_wait {
-	struct rb_node node;
-	struct task_struct *tsk;
-	struct i915_request *request;
-	u32 seqno;
-};
-
-struct intel_signal_node {
-	struct intel_wait wait;
-	struct list_head link;
-};
-
 struct i915_capture_list {
 	struct i915_capture_list *next;
 	struct i915_vma *vma;
 };
 
+enum {
+	I915_FENCE_FLAG_ACTIVE = DMA_FENCE_FLAG_USER_BITS,
+	I915_FENCE_FLAG_SIGNAL,
+};
+
 /**
  * Request queue structure.
  *
@@ -97,7 +90,7 @@ struct i915_request {
 	struct intel_context *hw_context;
 	struct intel_ring *ring;
 	struct i915_timeline *timeline;
-	struct intel_signal_node signaling;
+	struct list_head signal_link;
 
 	/*
 	 * The rcu epoch of when this request was allocated. Used to judiciously
@@ -116,7 +109,6 @@ struct i915_request {
 	 */
 	struct i915_sw_fence submit;
 	wait_queue_entry_t submitq;
-	wait_queue_head_t execute;
 
 	/*
 	 * A list of everyone we wait upon, and everyone who waits upon us.
@@ -255,7 +247,7 @@ i915_request_put(struct i915_request *rq)
  * that it has passed the global seqno and the global seqno is unchanged
  * after the read, it is indeed complete).
  */
-static u32
+static inline u32
 i915_request_global_seqno(const struct i915_request *request)
 {
 	return READ_ONCE(request->global_seqno);
diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
index 9b9169508139..d7d2840fcaa5 100644
--- a/drivers/gpu/drm/i915/i915_reset.c
+++ b/drivers/gpu/drm/i915/i915_reset.c
@@ -747,18 +747,19 @@ static void reset_restart(struct drm_i915_private *i915)
 
 static void nop_submit_request(struct i915_request *request)
 {
+	struct intel_engine_cs *engine = request->engine;
 	unsigned long flags;
 
 	GEM_TRACE("%s fence %llx:%lld -> -EIO\n",
-		  request->engine->name,
-		  request->fence.context, request->fence.seqno);
+		  engine->name, request->fence.context, request->fence.seqno);
 	dma_fence_set_error(&request->fence, -EIO);
 
-	spin_lock_irqsave(&request->engine->timeline.lock, flags);
+	spin_lock_irqsave(&engine->timeline.lock, flags);
 	__i915_request_submit(request);
 	i915_request_mark_complete(request);
-	intel_engine_write_global_seqno(request->engine, request->global_seqno);
-	spin_unlock_irqrestore(&request->engine->timeline.lock, flags);
+	spin_unlock_irqrestore(&engine->timeline.lock, flags);
+
+	intel_engine_queue_breadcrumbs(engine);
 }
 
 void i915_gem_set_wedged(struct drm_i915_private *i915)
@@ -813,7 +814,7 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
 
 	for_each_engine(engine, i915, id) {
 		reset_finish_engine(engine);
-		intel_engine_wakeup(engine);
+		intel_engine_signal_breadcrumbs(engine);
 	}
 
 	smp_mb__before_atomic();
diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
index b58915b8708b..faeb0083b561 100644
--- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
@@ -29,48 +29,132 @@
 
 #define task_asleep(tsk) ((tsk)->state & TASK_NORMAL && !(tsk)->on_rq)
 
-static unsigned int __intel_breadcrumbs_wakeup(struct intel_breadcrumbs *b)
+static void irq_enable(struct intel_engine_cs *engine)
 {
-	struct intel_wait *wait;
-	unsigned int result = 0;
+	if (!engine->irq_enable)
+		return;
+
+	/* Caller disables interrupts */
+	spin_lock(&engine->i915->irq_lock);
+	engine->irq_enable(engine);
+	spin_unlock(&engine->i915->irq_lock);
+}
+
+static void irq_disable(struct intel_engine_cs *engine)
+{
+	if (!engine->irq_disable)
+		return;
+
+	/* Caller disables interrupts */
+	spin_lock(&engine->i915->irq_lock);
+	engine->irq_disable(engine);
+	spin_unlock(&engine->i915->irq_lock);
+}
 
+static void __intel_breadcrumbs_disarm_irq(struct intel_breadcrumbs *b)
+{
 	lockdep_assert_held(&b->irq_lock);
 
-	wait = b->irq_wait;
-	if (wait) {
-		/*
-		 * N.B. Since task_asleep() and ttwu are not atomic, the
-		 * waiter may actually go to sleep after the check, causing
-		 * us to suppress a valid wakeup. We prefer to reduce the
-		 * number of false positive missed_breadcrumb() warnings
-		 * at the expense of a few false negatives, as it it easy
-		 * to trigger a false positive under heavy load. Enough
-		 * signal should remain from genuine missed_breadcrumb()
-		 * for us to detect in CI.
-		 */
-		bool was_asleep = task_asleep(wait->tsk);
-
-		result = ENGINE_WAKEUP_WAITER;
-		if (wake_up_process(wait->tsk) && was_asleep)
-			result |= ENGINE_WAKEUP_ASLEEP;
-	}
+	GEM_BUG_ON(!b->irq_enabled);
+	if (!--b->irq_enabled)
+		irq_disable(container_of(b,
+					 struct intel_engine_cs,
+					 breadcrumbs));
 
-	return result;
+	b->irq_armed = false;
 }
 
-unsigned int intel_engine_wakeup(struct intel_engine_cs *engine)
+void intel_engine_disarm_breadcrumbs(struct intel_engine_cs *engine)
 {
 	struct intel_breadcrumbs *b = &engine->breadcrumbs;
-	unsigned long flags;
-	unsigned int result;
 
-	spin_lock_irqsave(&b->irq_lock, flags);
-	result = __intel_breadcrumbs_wakeup(b);
-	spin_unlock_irqrestore(&b->irq_lock, flags);
+	if (!b->irq_armed)
+		return;
+
+	spin_lock_irq(&b->irq_lock);
+	if (b->irq_armed)
+		__intel_breadcrumbs_disarm_irq(b);
+	spin_unlock_irq(&b->irq_lock);
+}
+
+static inline bool __request_completed(const struct i915_request *rq)
+{
+	return i915_seqno_passed(__hwsp_seqno(rq), rq->fence.seqno);
+}
+
+bool intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine)
+{
+	struct intel_breadcrumbs *b = &engine->breadcrumbs;
+	struct intel_context *ce, *cn;
+	struct i915_request *rq, *rn;
+	LIST_HEAD(signal);
+
+	spin_lock(&b->irq_lock);
+
+	b->irq_fired = true;
+	if (b->irq_armed && list_empty(&b->signalers))
+		__intel_breadcrumbs_disarm_irq(b);
+
+	list_for_each_entry_safe(ce, cn, &b->signalers, signal_link) {
+		GEM_BUG_ON(list_empty(&ce->signals));
+
+		list_for_each_entry_safe(rq, rn, &ce->signals, signal_link) {
+			if (!__request_completed(rq))
+				break;
+
+			GEM_BUG_ON(!test_bit(I915_FENCE_FLAG_SIGNAL,
+					     &rq->fence.flags));
+			clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);
+
+			if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
+				     &rq->fence.flags))
+				continue;
+
+			/*
+			 * Queue for execution after dropping the signaling
+			 * spinlock as the callback chain may end up adding
+			 * more signalers to the same context or engine.
+			 */
+			i915_request_get(rq);
+			list_add_tail(&rq->signal_link, &signal);
+		}
+
+		if (!list_is_first(&rq->signal_link, &ce->signals)) {
+			__list_del_many(&ce->signals, &rq->signal_link);
+			if (&ce->signals == &rq->signal_link)
+				list_del_init(&ce->signal_link);
+		}
+	}
+
+	spin_unlock(&b->irq_lock);
+
+	list_for_each_entry_safe(rq, rn, &signal, signal_link) {
+		dma_fence_signal(&rq->fence);
+		i915_request_put(rq);
+	}
+
+	return !list_empty(&signal);
+}
+
+bool intel_engine_signal_breadcrumbs(struct intel_engine_cs *engine)
+{
+	bool result;
+
+	local_irq_disable();
+	result = intel_engine_breadcrumbs_irq(engine);
+	local_irq_enable();
 
 	return result;
 }
 
+static void signal_irq_work(struct irq_work *work)
+{
+	struct intel_engine_cs *engine =
+		container_of(work, typeof(*engine), breadcrumbs.irq_work);
+
+	intel_engine_breadcrumbs_irq(engine);
+}
+
 static unsigned long wait_timeout(void)
 {
 	return round_jiffies_up(jiffies + DRM_I915_HANGCHECK_JIFFIES);
@@ -94,19 +178,15 @@ static void intel_breadcrumbs_hangcheck(struct timer_list *t)
 	struct intel_engine_cs *engine =
 		from_timer(engine, t, breadcrumbs.hangcheck);
 	struct intel_breadcrumbs *b = &engine->breadcrumbs;
-	unsigned int irq_count;
 
 	if (!b->irq_armed)
 		return;
 
-	irq_count = READ_ONCE(b->irq_count);
-	if (b->hangcheck_interrupts != irq_count) {
-		b->hangcheck_interrupts = irq_count;
-		mod_timer(&b->hangcheck, wait_timeout());
-		return;
-	}
+	if (b->irq_fired)
+		goto rearm;
 
-	/* We keep the hangcheck timer alive until we disarm the irq, even
+	/*
+	 * We keep the hangcheck timer alive until we disarm the irq, even
 	 * if there are no waiters at present.
 	 *
 	 * If the waiter was currently running, assume it hasn't had a chance
@@ -118,10 +198,13 @@ static void intel_breadcrumbs_hangcheck(struct timer_list *t)
 	 * but we still have a waiter. Assuming all batches complete within
 	 * DRM_I915_HANGCHECK_JIFFIES [1.5s]!
 	 */
-	if (intel_engine_wakeup(engine) & ENGINE_WAKEUP_ASLEEP) {
+	synchronize_hardirq(engine->i915->drm.irq);
+	if (intel_engine_signal_breadcrumbs(engine)) {
 		missed_breadcrumb(engine);
 		mod_timer(&b->fake_irq, jiffies + 1);
 	} else {
+rearm:
+		b->irq_fired = false;
 		mod_timer(&b->hangcheck, wait_timeout());
 	}
 }
@@ -140,11 +223,7 @@ static void intel_breadcrumbs_fake_irq(struct timer_list *t)
 	 * oldest waiter to do the coherent seqno check.
 	 */
 
-	spin_lock_irq(&b->irq_lock);
-	if (b->irq_armed && !__intel_breadcrumbs_wakeup(b))
-		__intel_engine_disarm_breadcrumbs(engine);
-	spin_unlock_irq(&b->irq_lock);
-	if (!b->irq_armed)
+	if (!intel_engine_signal_breadcrumbs(engine) && !b->irq_armed)
 		return;
 
 	/* If the user has disabled the fake-irq, restore the hangchecking */
@@ -156,43 +235,6 @@ static void intel_breadcrumbs_fake_irq(struct timer_list *t)
 	mod_timer(&b->fake_irq, jiffies + 1);
 }
 
-static void irq_enable(struct intel_engine_cs *engine)
-{
-	if (!engine->irq_enable)
-		return;
-
-	/* Caller disables interrupts */
-	spin_lock(&engine->i915->irq_lock);
-	engine->irq_enable(engine);
-	spin_unlock(&engine->i915->irq_lock);
-}
-
-static void irq_disable(struct intel_engine_cs *engine)
-{
-	if (!engine->irq_disable)
-		return;
-
-	/* Caller disables interrupts */
-	spin_lock(&engine->i915->irq_lock);
-	engine->irq_disable(engine);
-	spin_unlock(&engine->i915->irq_lock);
-}
-
-void __intel_engine_disarm_breadcrumbs(struct intel_engine_cs *engine)
-{
-	struct intel_breadcrumbs *b = &engine->breadcrumbs;
-
-	lockdep_assert_held(&b->irq_lock);
-	GEM_BUG_ON(b->irq_wait);
-	GEM_BUG_ON(!b->irq_armed);
-
-	GEM_BUG_ON(!b->irq_enabled);
-	if (!--b->irq_enabled)
-		irq_disable(engine);
-
-	b->irq_armed = false;
-}
-
 void intel_engine_pin_breadcrumbs_irq(struct intel_engine_cs *engine)
 {
 	struct intel_breadcrumbs *b = &engine->breadcrumbs;
@@ -215,40 +257,6 @@ void intel_engine_unpin_breadcrumbs_irq(struct intel_engine_cs *engine)
 	spin_unlock_irq(&b->irq_lock);
 }
 
-void intel_engine_disarm_breadcrumbs(struct intel_engine_cs *engine)
-{
-	struct intel_breadcrumbs *b = &engine->breadcrumbs;
-	struct intel_wait *wait, *n;
-
-	if (!b->irq_armed)
-		return;
-
-	/*
-	 * We only disarm the irq when we are idle (all requests completed),
-	 * so if the bottom-half remains asleep, it missed the request
-	 * completion.
-	 */
-	if (intel_engine_wakeup(engine) & ENGINE_WAKEUP_ASLEEP)
-		missed_breadcrumb(engine);
-
-	spin_lock_irq(&b->rb_lock);
-
-	spin_lock(&b->irq_lock);
-	b->irq_wait = NULL;
-	if (b->irq_armed)
-		__intel_engine_disarm_breadcrumbs(engine);
-	spin_unlock(&b->irq_lock);
-
-	rbtree_postorder_for_each_entry_safe(wait, n, &b->waiters, node) {
-		GEM_BUG_ON(!intel_engine_signaled(engine, wait->seqno));
-		RB_CLEAR_NODE(&wait->node);
-		wake_up_process(wait->tsk);
-	}
-	b->waiters = RB_ROOT;
-
-	spin_unlock_irq(&b->rb_lock);
-}
-
 static bool use_fake_irq(const struct intel_breadcrumbs *b)
 {
 	const struct intel_engine_cs *engine =
@@ -264,7 +272,7 @@ static bool use_fake_irq(const struct intel_breadcrumbs *b)
 	 * engine->seqno_barrier(), a timing error that should be transient
 	 * and unlikely to reoccur.
 	 */
-	return READ_ONCE(b->irq_count) == b->hangcheck_interrupts;
+	return !b->irq_fired;
 }
 
 static void enable_fake_irq(struct intel_breadcrumbs *b)
@@ -276,7 +284,7 @@ static void enable_fake_irq(struct intel_breadcrumbs *b)
 		mod_timer(&b->hangcheck, wait_timeout());
 }
 
-static bool __intel_breadcrumbs_enable_irq(struct intel_breadcrumbs *b)
+static bool __intel_breadcrumbs_arm_irq(struct intel_breadcrumbs *b)
 {
 	struct intel_engine_cs *engine =
 		container_of(b, struct intel_engine_cs, breadcrumbs);
@@ -315,536 +323,135 @@ static bool __intel_breadcrumbs_enable_irq(struct intel_breadcrumbs *b)
 	return enabled;
 }
 
-static inline struct intel_wait *to_wait(struct rb_node *node)
+void intel_engine_init_breadcrumbs(struct intel_engine_cs *engine)
 {
-	return rb_entry(node, struct intel_wait, node);
-}
+	struct intel_breadcrumbs *b = &engine->breadcrumbs;
 
-static inline void __intel_breadcrumbs_finish(struct intel_breadcrumbs *b,
-					      struct intel_wait *wait)
-{
-	lockdep_assert_held(&b->rb_lock);
-	GEM_BUG_ON(b->irq_wait == wait);
+	spin_lock_init(&b->irq_lock);
+	INIT_LIST_HEAD(&b->signalers);
 
-	/*
-	 * This request is completed, so remove it from the tree, mark it as
-	 * complete, and *then* wake up the associated task. N.B. when the
-	 * task wakes up, it will find the empty rb_node, discern that it
-	 * has already been removed from the tree and skip the serialisation
-	 * of the b->rb_lock and b->irq_lock. This means that the destruction
-	 * of the intel_wait is not serialised with the interrupt handler
-	 * by the waiter - it must instead be serialised by the caller.
-	 */
-	rb_erase(&wait->node, &b->waiters);
-	RB_CLEAR_NODE(&wait->node);
+	init_irq_work(&b->irq_work, signal_irq_work);
 
-	if (wait->tsk->state != TASK_RUNNING)
-		wake_up_process(wait->tsk); /* implicit smp_wmb() */
+	timer_setup(&b->fake_irq, intel_breadcrumbs_fake_irq, 0);
+	timer_setup(&b->hangcheck, intel_breadcrumbs_hangcheck, 0);
 }
 
-static inline void __intel_breadcrumbs_next(struct intel_engine_cs *engine,
-					    struct rb_node *next)
+static void cancel_fake_irq(struct intel_engine_cs *engine)
 {
 	struct intel_breadcrumbs *b = &engine->breadcrumbs;
 
-	spin_lock(&b->irq_lock);
-	GEM_BUG_ON(!b->irq_armed);
-	GEM_BUG_ON(!b->irq_wait);
-	b->irq_wait = to_wait(next);
-	spin_unlock(&b->irq_lock);
-
-	/* We always wake up the next waiter that takes over as the bottom-half
-	 * as we may delegate not only the irq-seqno barrier to the next waiter
-	 * but also the task of waking up concurrent waiters.
-	 */
-	if (next)
-		wake_up_process(to_wait(next)->tsk);
+	del_timer_sync(&b->fake_irq); /* may queue b->hangcheck */
+	del_timer_sync(&b->hangcheck);
+	clear_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings);
 }
 
-static bool __intel_engine_add_wait(struct intel_engine_cs *engine,
-				    struct intel_wait *wait)
+void intel_engine_reset_breadcrumbs(struct intel_engine_cs *engine)
 {
 	struct intel_breadcrumbs *b = &engine->breadcrumbs;
-	struct rb_node **p, *parent, *completed;
-	bool first, armed;
-	u32 seqno;
+	unsigned long flags;
 
-	GEM_BUG_ON(!wait->seqno);
+	spin_lock_irqsave(&b->irq_lock, flags);
 
-	/* Insert the request into the retirement ordered list
-	 * of waiters by walking the rbtree. If we are the oldest
-	 * seqno in the tree (the first to be retired), then
-	 * set ourselves as the bottom-half.
-	 *
-	 * As we descend the tree, prune completed branches since we hold the
-	 * spinlock we know that the first_waiter must be delayed and can
-	 * reduce some of the sequential wake up latency if we take action
-	 * ourselves and wake up the completed tasks in parallel. Also, by
-	 * removing stale elements in the tree, we may be able to reduce the
-	 * ping-pong between the old bottom-half and ourselves as first-waiter.
+	/*
+	 * Leave the fake_irq timer enabled (if it is running), but clear the
+	 * bit so that it turns itself off on its next wake up and goes back
+	 * to the long hangcheck interval if still required.
 	 */
-	armed = false;
-	first = true;
-	parent = NULL;
-	completed = NULL;
-	seqno = intel_engine_get_seqno(engine);
-
-	 /* If the request completed before we managed to grab the spinlock,
-	  * return now before adding ourselves to the rbtree. We let the
-	  * current bottom-half handle any pending wakeups and instead
-	  * try and get out of the way quickly.
-	  */
-	if (i915_seqno_passed(seqno, wait->seqno)) {
-		RB_CLEAR_NODE(&wait->node);
-		return first;
-	}
-
-	p = &b->waiters.rb_node;
-	while (*p) {
-		parent = *p;
-		if (wait->seqno == to_wait(parent)->seqno) {
-			/* We have multiple waiters on the same seqno, select
-			 * the highest priority task (that with the smallest
-			 * task->prio) to serve as the bottom-half for this
-			 * group.
-			 */
-			if (wait->tsk->prio > to_wait(parent)->tsk->prio) {
-				p = &parent->rb_right;
-				first = false;
-			} else {
-				p = &parent->rb_left;
-			}
-		} else if (i915_seqno_passed(wait->seqno,
-					     to_wait(parent)->seqno)) {
-			p = &parent->rb_right;
-			if (i915_seqno_passed(seqno, to_wait(parent)->seqno))
-				completed = parent;
-			else
-				first = false;
-		} else {
-			p = &parent->rb_left;
-		}
-	}
-	rb_link_node(&wait->node, parent, p);
-	rb_insert_color(&wait->node, &b->waiters);
-
-	if (first) {
-		spin_lock(&b->irq_lock);
-		b->irq_wait = wait;
-		/* After assigning ourselves as the new bottom-half, we must
-		 * perform a cursory check to prevent a missed interrupt.
-		 * Either we miss the interrupt whilst programming the hardware,
-		 * or if there was a previous waiter (for a later seqno) they
-		 * may be woken instead of us (due to the inherent race
-		 * in the unlocked read of b->irq_seqno_bh in the irq handler)
-		 * and so we miss the wake up.
-		 */
-		armed = __intel_breadcrumbs_enable_irq(b);
-		spin_unlock(&b->irq_lock);
-	}
-
-	if (completed) {
-		/* Advance the bottom-half (b->irq_wait) before we wake up
-		 * the waiters who may scribble over their intel_wait
-		 * just as the interrupt handler is dereferencing it via
-		 * b->irq_wait.
-		 */
-		if (!first) {
-			struct rb_node *next = rb_next(completed);
-			GEM_BUG_ON(next == &wait->node);
-			__intel_breadcrumbs_next(engine, next);
-		}
-
-		do {
-			struct intel_wait *crumb = to_wait(completed);
-			completed = rb_prev(completed);
-			__intel_breadcrumbs_finish(b, crumb);
-		} while (completed);
-	}
-
-	GEM_BUG_ON(!b->irq_wait);
-	GEM_BUG_ON(!b->irq_armed);
-	GEM_BUG_ON(rb_first(&b->waiters) != &b->irq_wait->node);
-
-	return armed;
-}
-
-bool intel_engine_add_wait(struct intel_engine_cs *engine,
-			   struct intel_wait *wait)
-{
-	struct intel_breadcrumbs *b = &engine->breadcrumbs;
-	bool armed;
-
-	spin_lock_irq(&b->rb_lock);
-	armed = __intel_engine_add_wait(engine, wait);
-	spin_unlock_irq(&b->rb_lock);
-	if (armed)
-		return armed;
-
-	/* Make the caller recheck if its request has already started. */
-	return intel_engine_has_started(engine, wait->seqno);
-}
-
-static inline bool chain_wakeup(struct rb_node *rb, int priority)
-{
-	return rb && to_wait(rb)->tsk->prio <= priority;
-}
+	clear_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings);
 
-static inline int wakeup_priority(struct intel_breadcrumbs *b,
-				  struct task_struct *tsk)
-{
-	if (tsk == b->signaler)
-		return INT_MIN;
+	if (b->irq_enabled)
+		irq_enable(engine);
 	else
-		return tsk->prio;
-}
-
-static void __intel_engine_remove_wait(struct intel_engine_cs *engine,
-				       struct intel_wait *wait)
-{
-	struct intel_breadcrumbs *b = &engine->breadcrumbs;
-
-	lockdep_assert_held(&b->rb_lock);
-
-	if (RB_EMPTY_NODE(&wait->node))
-		goto out;
-
-	if (b->irq_wait == wait) {
-		const int priority = wakeup_priority(b, wait->tsk);
-		struct rb_node *next;
-
-		/* We are the current bottom-half. Find the next candidate,
-		 * the first waiter in the queue on the remaining oldest
-		 * request. As multiple seqnos may complete in the time it
-		 * takes us to wake up and find the next waiter, we have to
-		 * wake up that waiter for it to perform its own coherent
-		 * completion check.
-		 */
-		next = rb_next(&wait->node);
-		if (chain_wakeup(next, priority)) {
-			/* If the next waiter is already complete,
-			 * wake it up and continue onto the next waiter. So
-			 * if have a small herd, they will wake up in parallel
-			 * rather than sequentially, which should reduce
-			 * the overall latency in waking all the completed
-			 * clients.
-			 *
-			 * However, waking up a chain adds extra latency to
-			 * the first_waiter. This is undesirable if that
-			 * waiter is a high priority task.
-			 */
-			u32 seqno = intel_engine_get_seqno(engine);
-
-			while (i915_seqno_passed(seqno, to_wait(next)->seqno)) {
-				struct rb_node *n = rb_next(next);
-
-				__intel_breadcrumbs_finish(b, to_wait(next));
-				next = n;
-				if (!chain_wakeup(next, priority))
-					break;
-			}
-		}
-
-		__intel_breadcrumbs_next(engine, next);
-	} else {
-		GEM_BUG_ON(rb_first(&b->waiters) == &wait->node);
-	}
-
-	GEM_BUG_ON(RB_EMPTY_NODE(&wait->node));
-	rb_erase(&wait->node, &b->waiters);
-	RB_CLEAR_NODE(&wait->node);
-
-out:
-	GEM_BUG_ON(b->irq_wait == wait);
-	GEM_BUG_ON(rb_first(&b->waiters) !=
-		   (b->irq_wait ? &b->irq_wait->node : NULL));
-}
-
-void intel_engine_remove_wait(struct intel_engine_cs *engine,
-			      struct intel_wait *wait)
-{
-	struct intel_breadcrumbs *b = &engine->breadcrumbs;
-
-	/* Quick check to see if this waiter was already decoupled from
-	 * the tree by the bottom-half to avoid contention on the spinlock
-	 * by the herd.
-	 */
-	if (RB_EMPTY_NODE(&wait->node)) {
-		GEM_BUG_ON(READ_ONCE(b->irq_wait) == wait);
-		return;
-	}
+		irq_disable(engine);
 
-	spin_lock_irq(&b->rb_lock);
-	__intel_engine_remove_wait(engine, wait);
-	spin_unlock_irq(&b->rb_lock);
+	spin_unlock_irqrestore(&b->irq_lock, flags);
 }
 
-static void signaler_set_rtpriority(void)
+void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine)
 {
-	 struct sched_param param = { .sched_priority = 1 };
-
-	 sched_setscheduler_nocheck(current, SCHED_FIFO, &param);
+	cancel_fake_irq(engine);
 }
 
-static int intel_breadcrumbs_signaler(void *arg)
+bool intel_engine_enable_signaling(struct i915_request *rq)
 {
-	struct intel_engine_cs *engine = arg;
-	struct intel_breadcrumbs *b = &engine->breadcrumbs;
-	struct i915_request *rq, *n;
-
-	/* Install ourselves with high priority to reduce signalling latency */
-	signaler_set_rtpriority();
-
-	do {
-		bool do_schedule = true;
-		LIST_HEAD(list);
-		u32 seqno;
-
-		set_current_state(TASK_INTERRUPTIBLE);
-		if (list_empty(&b->signals))
-			goto sleep;
-
-		/*
-		 * We are either woken up by the interrupt bottom-half,
-		 * or by a client adding a new signaller. In both cases,
-		 * the GPU seqno may have advanced beyond our oldest signal.
-		 * If it has, propagate the signal, remove the waiter and
-		 * check again with the next oldest signal. Otherwise we
-		 * need to wait for a new interrupt from the GPU or for
-		 * a new client.
-		 */
-		seqno = intel_engine_get_seqno(engine);
-
-		spin_lock_irq(&b->rb_lock);
-		list_for_each_entry_safe(rq, n, &b->signals, signaling.link) {
-			u32 this = rq->signaling.wait.seqno;
-
-			GEM_BUG_ON(!rq->signaling.wait.seqno);
-
-			if (!i915_seqno_passed(seqno, this))
-				break;
-
-			if (likely(this == i915_request_global_seqno(rq))) {
-				__intel_engine_remove_wait(engine,
-							   &rq->signaling.wait);
+	struct intel_breadcrumbs *b = &rq->engine->breadcrumbs;
 
-				rq->signaling.wait.seqno = 0;
-				__list_del_entry(&rq->signaling.link);
+	GEM_BUG_ON(test_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags));
 
-				if (!i915_request_signaled(rq)) {
-					list_add_tail(&rq->signaling.link,
-						      &list);
-					i915_request_get(rq);
-				}
-			}
-		}
-		spin_unlock_irq(&b->rb_lock);
+	if (!test_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags))
+		return true;
 
-		if (!list_empty(&list)) {
-			local_bh_disable();
-			list_for_each_entry_safe(rq, n, &list, signaling.link) {
-				dma_fence_signal(&rq->fence);
-				GEM_BUG_ON(!i915_request_completed(rq));
-				i915_request_put(rq);
-			}
-			local_bh_enable(); /* kick start the tasklets */
+	spin_lock(&b->irq_lock);
+	if (test_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags) &&
+	    !__request_completed(rq)) {
+		struct intel_context *ce = rq->hw_context;
+		struct list_head *pos;
 
-			/*
-			 * If the engine is saturated we may be continually
-			 * processing completed requests. This angers the
-			 * NMI watchdog if we never let anything else
-			 * have access to the CPU. Let's pretend to be nice
-			 * and relinquish the CPU if we burn through the
-			 * entire RT timeslice!
-			 */
-			do_schedule = need_resched();
-		}
+		__intel_breadcrumbs_arm_irq(b);
 
-		if (unlikely(do_schedule)) {
-sleep:
-			if (kthread_should_park())
-				kthread_parkme();
+		list_for_each_prev(pos, &ce->signals) {
+			struct i915_request *it =
+				list_entry(pos, typeof(*it), signal_link);
 
-			if (unlikely(kthread_should_stop()))
+			if (i915_seqno_passed(rq->fence.seqno, it->fence.seqno))
 				break;
-
-			schedule();
 		}
-	} while (1);
-	__set_current_state(TASK_RUNNING);
-
-	return 0;
-}
-
-static void insert_signal(struct intel_breadcrumbs *b,
-			  struct i915_request *request,
-			  const u32 seqno)
-{
-	struct i915_request *iter;
-
-	lockdep_assert_held(&b->rb_lock);
-
-	/*
-	 * A reasonable assumption is that we are called to add signals
-	 * in sequence, as the requests are submitted for execution and
-	 * assigned a global_seqno. This will be the case for the majority
-	 * of internally generated signals (inter-engine signaling).
-	 *
-	 * Out of order waiters triggering random signaling enabling will
-	 * be more problematic, but hopefully rare enough and the list
-	 * small enough that the O(N) insertion sort is not an issue.
-	 */
-
-	list_for_each_entry_reverse(iter, &b->signals, signaling.link)
-		if (i915_seqno_passed(seqno, iter->signaling.wait.seqno))
-			break;
-
-	list_add(&request->signaling.link, &iter->signaling.link);
-}
+		list_add(&rq->signal_link, pos);
+		if (pos == &ce->signals)
+			list_move_tail(&ce->signal_link, &b->signalers);
 
-bool intel_engine_enable_signaling(struct i915_request *request, bool wakeup)
-{
-	struct intel_engine_cs *engine = request->engine;
-	struct intel_breadcrumbs *b = &engine->breadcrumbs;
-	struct intel_wait *wait = &request->signaling.wait;
-	u32 seqno;
-
-	/*
-	 * Note that we may be called from an interrupt handler on another
-	 * device (e.g. nouveau signaling a fence completion causing us
-	 * to submit a request, and so enable signaling). As such,
-	 * we need to make sure that all other users of b->rb_lock protect
-	 * against interrupts, i.e. use spin_lock_irqsave.
-	 */
-
-	/* locked by dma_fence_enable_sw_signaling() (irqsafe fence->lock) */
-	GEM_BUG_ON(!irqs_disabled());
-	lockdep_assert_held(&request->lock);
-
-	seqno = i915_request_global_seqno(request);
-	if (!seqno) /* will be enabled later upon execution */
-		return true;
-
-	GEM_BUG_ON(wait->seqno);
-	wait->tsk = b->signaler;
-	wait->request = request;
-	wait->seqno = seqno;
-
-	/*
-	 * Add ourselves into the list of waiters, but registering our
-	 * bottom-half as the signaller thread. As per usual, only the oldest
-	 * waiter (not just signaller) is tasked as the bottom-half waking
-	 * up all completed waiters after the user interrupt.
-	 *
-	 * If we are the oldest waiter, enable the irq (after which we
-	 * must double check that the seqno did not complete).
-	 */
-	spin_lock(&b->rb_lock);
-	insert_signal(b, request, seqno);
-	wakeup &= __intel_engine_add_wait(engine, wait);
-	spin_unlock(&b->rb_lock);
-
-	if (wakeup) {
-		wake_up_process(b->signaler);
-		return !intel_wait_complete(wait);
+		set_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);
 	}
+	spin_unlock(&b->irq_lock);
 
-	return true;
+	return !__request_completed(rq);
 }
 
-void intel_engine_cancel_signaling(struct i915_request *request)
+void intel_engine_cancel_signaling(struct i915_request *rq)
 {
-	struct intel_engine_cs *engine = request->engine;
-	struct intel_breadcrumbs *b = &engine->breadcrumbs;
-
-	GEM_BUG_ON(!irqs_disabled());
-	lockdep_assert_held(&request->lock);
+	struct intel_breadcrumbs *b = &rq->engine->breadcrumbs;
 
-	if (!READ_ONCE(request->signaling.wait.seqno))
+	if (!test_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags))
 		return;
 
-	spin_lock(&b->rb_lock);
-	__intel_engine_remove_wait(engine, &request->signaling.wait);
-	if (fetch_and_zero(&request->signaling.wait.seqno))
-		__list_del_entry(&request->signaling.link);
-	spin_unlock(&b->rb_lock);
-}
-
-int intel_engine_init_breadcrumbs(struct intel_engine_cs *engine)
-{
-	struct intel_breadcrumbs *b = &engine->breadcrumbs;
-	struct task_struct *tsk;
-
-	spin_lock_init(&b->rb_lock);
-	spin_lock_init(&b->irq_lock);
-
-	timer_setup(&b->fake_irq, intel_breadcrumbs_fake_irq, 0);
-	timer_setup(&b->hangcheck, intel_breadcrumbs_hangcheck, 0);
-
-	INIT_LIST_HEAD(&b->signals);
-
-	/* Spawn a thread to provide a common bottom-half for all signals.
-	 * As this is an asynchronous interface we cannot steal the current
-	 * task for handling the bottom-half to the user interrupt, therefore
-	 * we create a thread to do the coherent seqno dance after the
-	 * interrupt and then signal the waitqueue (via the dma-buf/fence).
-	 */
-	tsk = kthread_run(intel_breadcrumbs_signaler, engine,
-			  "i915/signal:%d", engine->id);
-	if (IS_ERR(tsk))
-		return PTR_ERR(tsk);
-
-	b->signaler = tsk;
+	spin_lock(&b->irq_lock);
+	if (test_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags)) {
+		struct intel_context *ce = rq->hw_context;
 
-	return 0;
-}
+		list_del(&rq->signal_link);
+		if (list_empty(&ce->signals))
+			list_del_init(&ce->signal_link);
 
-static void cancel_fake_irq(struct intel_engine_cs *engine)
-{
-	struct intel_breadcrumbs *b = &engine->breadcrumbs;
-
-	del_timer_sync(&b->fake_irq); /* may queue b->hangcheck */
-	del_timer_sync(&b->hangcheck);
-	clear_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings);
+		clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);
+	}
+	spin_unlock(&b->irq_lock);
 }
 
-void intel_engine_reset_breadcrumbs(struct intel_engine_cs *engine)
+void intel_engine_print_breadcrumbs(struct intel_engine_cs *engine,
+				    struct drm_printer *p)
 {
 	struct intel_breadcrumbs *b = &engine->breadcrumbs;
-	unsigned long flags;
-
-	spin_lock_irqsave(&b->irq_lock, flags);
-
-	/*
-	 * Leave the fake_irq timer enabled (if it is running), but clear the
-	 * bit so that it turns itself off on its next wake up and goes back
-	 * to the long hangcheck interval if still required.
-	 */
-	clear_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings);
+	struct intel_context *ce;
+	struct i915_request *rq;
 
-	if (b->irq_enabled)
-		irq_enable(engine);
-	else
-		irq_disable(engine);
-
-	spin_unlock_irqrestore(&b->irq_lock, flags);
-}
-
-void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine)
-{
-	struct intel_breadcrumbs *b = &engine->breadcrumbs;
+	if (list_empty(&b->signalers))
+		return;
 
-	/* The engines should be idle and all requests accounted for! */
-	WARN_ON(READ_ONCE(b->irq_wait));
-	WARN_ON(!RB_EMPTY_ROOT(&b->waiters));
-	WARN_ON(!list_empty(&b->signals));
+	drm_printf(p, "Signals:\n");
 
-	if (!IS_ERR_OR_NULL(b->signaler))
-		kthread_stop(b->signaler);
+	spin_lock_irq(&b->irq_lock);
+	list_for_each_entry(ce, &b->signalers, signal_link) {
+		list_for_each_entry(rq, &ce->signals, signal_link) {
+			drm_printf(p, "\t[%llx:%llx%s] @ %dms\n",
+				   rq->fence.context, rq->fence.seqno,
+				   i915_request_completed(rq) ? "!" :
+				   i915_request_started(rq) ? "*" :
+				   "",
+				   jiffies_to_msecs(jiffies - rq->emitted_jiffies));
+		}
+	}
+	spin_unlock_irq(&b->irq_lock);
 
-	cancel_fake_irq(engine);
+	if (test_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings))
+		drm_printf(p, "Fake irq active\n");
 }
-
-#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
-#include "selftests/intel_breadcrumbs.c"
-#endif
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 2a4c547240a1..1d9157bf96ae 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -458,12 +458,6 @@ int intel_engines_init(struct drm_i915_private *dev_priv)
 void intel_engine_write_global_seqno(struct intel_engine_cs *engine, u32 seqno)
 {
 	intel_write_status_page(engine, I915_GEM_HWS_INDEX, seqno);
-
-	/* After manually advancing the seqno, fake the interrupt in case
-	 * there are any waiters for that seqno.
-	 */
-	intel_engine_wakeup(engine);
-
 	GEM_BUG_ON(intel_engine_get_seqno(engine) != seqno);
 }
 
@@ -667,16 +661,10 @@ int intel_engine_init_common(struct intel_engine_cs *engine)
 		}
 	}
 
-	ret = intel_engine_init_breadcrumbs(engine);
-	if (ret)
-		goto err_unpin_preempt;
+	intel_engine_init_breadcrumbs(engine);
 
 	return 0;
 
-err_unpin_preempt:
-	if (i915->preempt_context)
-		__intel_context_unpin(i915->preempt_context, engine);
-
 err_unpin_kernel:
 	__intel_context_unpin(i915->kernel_context, engine);
 	return ret;
@@ -1236,12 +1224,14 @@ static void print_request(struct drm_printer *m,
 
 	x = print_sched_attr(rq->i915, &rq->sched.attr, buf, x, sizeof(buf));
 
-	drm_printf(m, "%s%x%s [%llx:%llx]%s @ %dms: %s\n",
+	drm_printf(m, "%s%x%s%s [%llx:%llx]%s @ %dms: %s\n",
 		   prefix,
 		   rq->global_seqno,
 		   i915_request_completed(rq) ? "!" :
 		   i915_request_started(rq) ? "*" :
 		   "",
+		   test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
+			    &rq->fence.flags) ? "+" : "",
 		   rq->fence.context, rq->fence.seqno,
 		   buf,
 		   jiffies_to_msecs(jiffies - rq->emitted_jiffies),
@@ -1434,12 +1424,9 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 		       struct drm_printer *m,
 		       const char *header, ...)
 {
-	struct intel_breadcrumbs * const b = &engine->breadcrumbs;
 	struct i915_gpu_error * const error = &engine->i915->gpu_error;
 	struct i915_request *rq;
 	intel_wakeref_t wakeref;
-	unsigned long flags;
-	struct rb_node *rb;
 
 	if (header) {
 		va_list ap;
@@ -1507,21 +1494,12 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 
 	intel_execlists_show_requests(engine, m, print_request, 8);
 
-	spin_lock_irqsave(&b->rb_lock, flags);
-	for (rb = rb_first(&b->waiters); rb; rb = rb_next(rb)) {
-		struct intel_wait *w = rb_entry(rb, typeof(*w), node);
-
-		drm_printf(m, "\t%s [%d:%c] waiting for %x\n",
-			   w->tsk->comm, w->tsk->pid,
-			   task_state_to_char(w->tsk),
-			   w->seqno);
-	}
-	spin_unlock_irqrestore(&b->rb_lock, flags);
-
 	drm_printf(m, "HWSP:\n");
 	hexdump(m, engine->status_page.addr, PAGE_SIZE);
 
 	drm_printf(m, "Idle? %s\n", yesno(intel_engine_is_idle(engine)));
+
+	intel_engine_print_breadcrumbs(engine, m);
 }
 
 static u8 user_class_map[] = {
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index f6b30eb46263..bd44ea41d7ca 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -483,8 +483,8 @@ static void gen7_xcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
 
 	for (i = 0; i < GEN7_XCS_WA; i++) {
 		*cs++ = MI_STORE_DWORD_INDEX;
-		*cs++ = I915_GEM_HWS_INDEX_ADDR;
-		*cs++ = rq->global_seqno;
+		*cs++ = I915_GEM_HWS_SEQNO_ADDR;
+		*cs++ = rq->fence.seqno;
 	}
 
 	*cs++ = MI_FLUSH_DW;
@@ -734,7 +734,7 @@ static int init_ring_common(struct intel_engine_cs *engine)
 	}
 
 	/* Papering over lost _interrupts_ immediately following the restart */
-	intel_engine_wakeup(engine);
+	intel_engine_queue_breadcrumbs(engine);
 out:
 	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index d3d4f3667afb..b78cb9bd4bc2 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -5,6 +5,7 @@
 #include <drm/drm_util.h>
 
 #include <linux/hashtable.h>
+#include <linux/irq_work.h>
 #include <linux/seqlock.h>
 
 #include "i915_gem_batch_pool.h"
@@ -376,22 +377,19 @@ struct intel_engine_cs {
 	 * the overhead of waking that client is much preferred.
 	 */
 	struct intel_breadcrumbs {
-		spinlock_t irq_lock; /* protects irq_*; irqsafe */
-		struct intel_wait *irq_wait; /* oldest waiter by retirement */
+		spinlock_t irq_lock;
+		struct list_head signalers;
 
-		spinlock_t rb_lock; /* protects the rb and wraps irq_lock */
-		struct rb_root waiters; /* sorted by retirement, priority */
-		struct list_head signals; /* sorted by retirement */
-		struct task_struct *signaler; /* used for fence signalling */
+		struct irq_work irq_work;
 
 		struct timer_list fake_irq; /* used after a missed interrupt */
 		struct timer_list hangcheck; /* detect missed interrupts */
 
 		unsigned int hangcheck_interrupts;
 		unsigned int irq_enabled;
-		unsigned int irq_count;
 
-		bool irq_armed : 1;
+		bool irq_armed;
+		bool irq_fired;
 	} breadcrumbs;
 
 	struct {
@@ -880,83 +878,32 @@ static inline bool intel_engine_has_started(struct intel_engine_cs *engine,
 void intel_engine_get_instdone(struct intel_engine_cs *engine,
 			       struct intel_instdone *instdone);
 
-/* intel_breadcrumbs.c -- user interrupt bottom-half for waiters */
-int intel_engine_init_breadcrumbs(struct intel_engine_cs *engine);
-
-static inline void intel_wait_init(struct intel_wait *wait)
-{
-	wait->tsk = current;
-	wait->request = NULL;
-}
-
-static inline void intel_wait_init_for_seqno(struct intel_wait *wait, u32 seqno)
-{
-	wait->tsk = current;
-	wait->seqno = seqno;
-}
-
-static inline bool intel_wait_has_seqno(const struct intel_wait *wait)
-{
-	return wait->seqno;
-}
-
-static inline bool
-intel_wait_update_seqno(struct intel_wait *wait, u32 seqno)
-{
-	wait->seqno = seqno;
-	return intel_wait_has_seqno(wait);
-}
-
-static inline bool
-intel_wait_update_request(struct intel_wait *wait,
-			  const struct i915_request *rq)
-{
-	return intel_wait_update_seqno(wait, i915_request_global_seqno(rq));
-}
-
-static inline bool
-intel_wait_check_seqno(const struct intel_wait *wait, u32 seqno)
-{
-	return wait->seqno == seqno;
-}
-
-static inline bool
-intel_wait_check_request(const struct intel_wait *wait,
-			 const struct i915_request *rq)
-{
-	return intel_wait_check_seqno(wait, i915_request_global_seqno(rq));
-}
-
-static inline bool intel_wait_complete(const struct intel_wait *wait)
-{
-	return RB_EMPTY_NODE(&wait->node);
-}
+void intel_engine_init_breadcrumbs(struct intel_engine_cs *engine);
+void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine);
 
-bool intel_engine_add_wait(struct intel_engine_cs *engine,
-			   struct intel_wait *wait);
-void intel_engine_remove_wait(struct intel_engine_cs *engine,
-			      struct intel_wait *wait);
-bool intel_engine_enable_signaling(struct i915_request *request, bool wakeup);
+bool intel_engine_enable_signaling(struct i915_request *request);
 void intel_engine_cancel_signaling(struct i915_request *request);
 
-static inline bool intel_engine_has_waiter(const struct intel_engine_cs *engine)
-{
-	return READ_ONCE(engine->breadcrumbs.irq_wait);
-}
-
-unsigned int intel_engine_wakeup(struct intel_engine_cs *engine);
-#define ENGINE_WAKEUP_WAITER BIT(0)
-#define ENGINE_WAKEUP_ASLEEP BIT(1)
-
 void intel_engine_pin_breadcrumbs_irq(struct intel_engine_cs *engine);
 void intel_engine_unpin_breadcrumbs_irq(struct intel_engine_cs *engine);
 
-void __intel_engine_disarm_breadcrumbs(struct intel_engine_cs *engine);
+bool intel_engine_signal_breadcrumbs(struct intel_engine_cs *engine);
 void intel_engine_disarm_breadcrumbs(struct intel_engine_cs *engine);
 
+static inline void
+intel_engine_queue_breadcrumbs(struct intel_engine_cs *engine)
+{
+	irq_work_queue(&engine->breadcrumbs.irq_work);
+}
+
+bool intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine);
+
 void intel_engine_reset_breadcrumbs(struct intel_engine_cs *engine);
 void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine);
 
+void intel_engine_print_breadcrumbs(struct intel_engine_cs *engine,
+				    struct drm_printer *p);
+
 static inline u32 *gen8_emit_pipe_control(u32 *batch, u32 flags, u32 offset)
 {
 	memset(batch, 0, 6 * sizeof(u32));
diff --git a/drivers/gpu/drm/i915/selftests/i915_mock_selftests.h b/drivers/gpu/drm/i915/selftests/i915_mock_selftests.h
index 4a83a1c6c406..88e5ab586337 100644
--- a/drivers/gpu/drm/i915/selftests/i915_mock_selftests.h
+++ b/drivers/gpu/drm/i915/selftests/i915_mock_selftests.h
@@ -15,7 +15,6 @@ selftest(scatterlist, scatterlist_mock_selftests)
 selftest(syncmap, i915_syncmap_mock_selftests)
 selftest(uncore, intel_uncore_mock_selftests)
 selftest(engine, intel_engine_cs_mock_selftests)
-selftest(breadcrumbs, intel_breadcrumbs_mock_selftests)
 selftest(timelines, i915_timeline_mock_selftests)
 selftest(requests, i915_request_mock_selftests)
 selftest(objects, i915_gem_object_mock_selftests)
diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c
index 4d4b86b5fa11..a5359a03bfec 100644
--- a/drivers/gpu/drm/i915/selftests/i915_request.c
+++ b/drivers/gpu/drm/i915/selftests/i915_request.c
@@ -25,9 +25,12 @@
 #include <linux/prime_numbers.h>
 
 #include "../i915_selftest.h"
+#include "i915_random.h"
 #include "igt_live_test.h"
+#include "lib_sw_fence.h"
 
 #include "mock_context.h"
+#include "mock_drm.h"
 #include "mock_gem_device.h"
 
 static int igt_add_request(void *arg)
@@ -247,6 +250,239 @@ static int igt_request_rewind(void *arg)
 	return err;
 }
 
+struct smoketest {
+	struct intel_engine_cs *engine;
+	struct i915_gem_context **contexts;
+	unsigned int ncontexts, max_batch;
+	atomic_long_t num_waits, num_fences;
+	struct i915_request *(*request_alloc)(struct i915_gem_context *,
+					      struct intel_engine_cs *);
+
+};
+
+static struct i915_request *
+__mock_request_alloc(struct i915_gem_context *ctx,
+		     struct intel_engine_cs *engine)
+{
+	return mock_request(engine, ctx, 0);
+}
+
+static struct i915_request *
+__live_request_alloc(struct i915_gem_context *ctx,
+		     struct intel_engine_cs *engine)
+{
+	return i915_request_alloc(engine, ctx);
+}
+
+static int __igt_breadcrumbs_smoketest(void *arg)
+{
+	struct smoketest *t = arg;
+	struct mutex *BKL = &t->engine->i915->drm.struct_mutex;
+	struct i915_request **requests;
+	I915_RND_STATE(prng);
+	const unsigned int total = 4 * t->ncontexts + 1;
+	const unsigned int max_batch = min(t->ncontexts, t->max_batch) - 1;
+	unsigned int num_waits = 0, num_fences = 0;
+	unsigned int *order;
+	int err = 0;
+
+	requests = kmalloc_array(total, sizeof(*requests), GFP_KERNEL);
+	if (!requests)
+		return -ENOMEM;
+
+	order = i915_random_order(total, &prng);
+	if (!order) {
+		err = -ENOMEM;
+		goto out_requests;
+	}
+
+	while (!kthread_should_stop()) {
+		struct i915_sw_fence *submit, *wait;
+		unsigned int n, count;
+
+		submit = heap_fence_create(GFP_KERNEL);
+		if (!submit) {
+			err = -ENOMEM;
+			break;
+		}
+
+		wait = heap_fence_create(GFP_KERNEL);
+		if (!wait) {
+			i915_sw_fence_commit(submit);
+			heap_fence_put(submit);
+			err = -ENOMEM;
+			break;
+		}
+
+		i915_random_reorder(order, total, &prng);
+		count = 1 + i915_prandom_u32_max_state(max_batch, &prng);
+
+		for (n = 0; n < count; n++) {
+			struct i915_gem_context *ctx =
+				t->contexts[order[n] % t->ncontexts];
+			struct i915_request *rq;
+
+			mutex_lock(BKL);
+
+			rq = t->request_alloc(ctx, t->engine);
+			if (IS_ERR(rq)) {
+				mutex_unlock(BKL);
+				err = PTR_ERR(rq);
+				count = n;
+				break;
+			}
+
+			err = i915_sw_fence_await_sw_fence_gfp(&rq->submit,
+							       submit,
+							       GFP_KERNEL);
+
+			requests[n] = i915_request_get(rq);
+			i915_request_add(rq);
+
+			mutex_unlock(BKL);
+
+			if (err >= 0)
+				err = i915_sw_fence_await_dma_fence(wait,
+								    &rq->fence,
+								    0,
+								    GFP_KERNEL);
+			if (err < 0) {
+				i915_request_put(rq);
+				count = n;
+				break;
+			}
+		}
+
+		i915_sw_fence_commit(submit);
+		i915_sw_fence_commit(wait);
+
+		if (!wait_event_timeout(wait->wait,
+					i915_sw_fence_done(wait),
+					HZ / 2)) {
+			struct i915_request *rq = requests[count - 1];
+
+			pr_err("waiting for %d fences (last %llx:%lld) on %s timed out!\n",
+			       count,
+			       rq->fence.context, rq->fence.seqno,
+			       t->engine->name);
+			i915_gem_set_wedged(t->engine->i915);
+			GEM_BUG_ON(!i915_request_completed(rq));
+			i915_sw_fence_wait(wait);
+			err = -EIO;
+		}
+
+		for (n = 0; n < count; n++) {
+			struct i915_request *rq = requests[n];
+
+			if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
+				      &rq->fence.flags)) {
+				pr_err("%llu:%llu was not signaled!\n",
+				       rq->fence.context, rq->fence.seqno);
+				err = -EINVAL;
+			}
+
+			i915_request_put(rq);
+		}
+
+		heap_fence_put(wait);
+		heap_fence_put(submit);
+
+		if (err < 0)
+			break;
+
+		num_fences += count;
+		num_waits++;
+
+		cond_resched();
+	}
+
+	atomic_long_add(num_fences, &t->num_fences);
+	atomic_long_add(num_waits, &t->num_waits);
+
+	kfree(order);
+out_requests:
+	kfree(requests);
+	return err;
+}
+
+static int mock_breadcrumbs_smoketest(void *arg)
+{
+	struct drm_i915_private *i915 = arg;
+	struct smoketest t = {
+		.engine = i915->engine[RCS],
+		.ncontexts = 1024,
+		.max_batch = 1024,
+		.request_alloc = __mock_request_alloc
+	};
+	unsigned int ncpus = num_online_cpus();
+	struct task_struct **threads;
+	unsigned int n;
+	int ret = 0;
+
+	threads = kmalloc_array(ncpus, sizeof(*threads), GFP_KERNEL);
+	if (!threads)
+		return -ENOMEM;
+
+	t.contexts =
+		kmalloc_array(t.ncontexts, sizeof(*t.contexts), GFP_KERNEL);
+	if (!t.contexts) {
+		ret = -ENOMEM;
+		goto out_threads;
+	}
+
+	mutex_lock(&t.engine->i915->drm.struct_mutex);
+	for (n = 0; n < t.ncontexts; n++) {
+		t.contexts[n] = mock_context(t.engine->i915, "mock");
+		if (!t.contexts[n]) {
+			ret = -ENOMEM;
+			goto out_contexts;
+		}
+	}
+
+	for (n = 0; n < ncpus; n++) {
+		threads[n] = kthread_run(__igt_breadcrumbs_smoketest,
+					 &t, "igt/%d", n);
+		if (IS_ERR(threads[n])) {
+			ret = PTR_ERR(threads[n]);
+			ncpus = n;
+			break;
+		}
+
+		get_task_struct(threads[n]);
+	}
+	mutex_unlock(&t.engine->i915->drm.struct_mutex);
+
+	msleep(jiffies_to_msecs(i915_selftest.timeout_jiffies));
+
+	for (n = 0; n < ncpus; n++) {
+		int err;
+
+		err = kthread_stop(threads[n]);
+		if (err < 0 && !ret)
+			ret = err;
+
+		put_task_struct(threads[n]);
+	}
+	pr_info("Completed %lu waits for %lu fences across %d cpus\n",
+		atomic_long_read(&t.num_waits),
+		atomic_long_read(&t.num_fences),
+		ncpus);
+
+	mutex_lock(&t.engine->i915->drm.struct_mutex);
+out_contexts:
+	for (n = 0; n < t.ncontexts; n++) {
+		if (!t.contexts[n])
+			break;
+		mock_context_close(t.contexts[n]);
+	}
+	mutex_unlock(&t.engine->i915->drm.struct_mutex);
+	kfree(t.contexts);
+out_threads:
+	kfree(threads);
+
+	return ret;
+}
+
 int i915_request_mock_selftests(void)
 {
 	static const struct i915_subtest tests[] = {
@@ -254,6 +490,7 @@ int i915_request_mock_selftests(void)
 		SUBTEST(igt_wait_request),
 		SUBTEST(igt_fence_wait),
 		SUBTEST(igt_request_rewind),
+		SUBTEST(mock_breadcrumbs_smoketest),
 	};
 	struct drm_i915_private *i915;
 	intel_wakeref_t wakeref;
@@ -812,6 +1049,166 @@ static int live_sequential_engines(void *arg)
 	return err;
 }
 
+static int
+max_batches(struct i915_gem_context *ctx, struct intel_engine_cs *engine)
+{
+	struct i915_request *rq;
+	int ret;
+
+	/*
+	 * Before execlists, all contexts share the same ringbuffer. With
+	 * execlists, each context/engine has a separate ringbuffer and
+	 * for the purposes of this test, inexhaustible.
+	 *
+	 * For the global ringbuffer though, we have to be very careful
+	 * that we do not wrap while preventing the execution of requests
+	 * with an unsignaled fence.
+	 */
+	if (HAS_EXECLISTS(ctx->i915))
+		return INT_MAX;
+
+	rq = i915_request_alloc(engine, ctx);
+	if (IS_ERR(rq)) {
+		ret = PTR_ERR(rq);
+	} else {
+		int sz;
+
+		ret = rq->ring->size - rq->reserved_space;
+		i915_request_add(rq);
+
+		sz = rq->ring->emit - rq->head;
+		if (sz < 0)
+			sz += rq->ring->size;
+		ret /= sz;
+		ret /= 2; /* leave half spare, in case of emergency! */
+
+		/* One ring interleaved between requests from all cpus */
+		ret /= num_online_cpus() + 1;
+	}
+
+	return ret;
+}
+
+static int live_breadcrumbs_smoketest(void *arg)
+{
+	struct drm_i915_private *i915 = arg;
+	struct smoketest t[I915_NUM_ENGINES];
+	unsigned int ncpus = num_online_cpus();
+	unsigned long num_waits, num_fences;
+	struct intel_engine_cs *engine;
+	struct task_struct **threads;
+	struct igt_live_test live;
+	enum intel_engine_id id;
+	intel_wakeref_t wakeref;
+	struct drm_file *file;
+	unsigned int n;
+	int ret = 0;
+
+	wakeref = intel_runtime_pm_get(i915);
+
+	file = mock_file(i915);
+	if (IS_ERR(file)) {
+		ret = PTR_ERR(file);
+		goto out_rpm;
+	}
+
+	threads = kcalloc(ncpus * I915_NUM_ENGINES,
+			  sizeof(*threads),
+			  GFP_KERNEL);
+	if (!threads)
+		return -ENOMEM;
+
+	memset(&t[0], 0, sizeof(t[0]));
+	t[0].request_alloc = __live_request_alloc;
+	t[0].ncontexts = 64;
+	t[0].contexts = kmalloc_array(t[0].ncontexts,
+				      sizeof(*t[0].contexts),
+				      GFP_KERNEL);
+	if (!t[0].contexts) {
+		ret = -ENOMEM;
+		goto out_threads;
+	}
+
+	mutex_lock(&i915->drm.struct_mutex);
+	for (n = 0; n < t[0].ncontexts; n++) {
+		t[0].contexts[n] = live_context(i915, file);
+		if (!t[0].contexts[n]) {
+			ret = -ENOMEM;
+			goto out_contexts;
+		}
+	}
+
+	ret = igt_live_test_begin(&live, i915, __func__, "");
+	if (ret)
+		goto out_contexts;
+
+	for_each_engine(engine, i915, id) {
+		t[id] = t[0];
+		t[id].engine = engine;
+		t[id].max_batch = max_batches(t[0].contexts[0], engine);
+		if (t[id].max_batch < 0) {
+			ret = t[id].max_batch;
+			goto out_flush;
+		}
+		pr_debug("Limiting batches to %d requests on %s\n",
+			 t[id].max_batch, engine->name);
+
+		for (n = 0; n < ncpus; n++) {
+			struct task_struct *tsk;
+
+			tsk = kthread_run(__igt_breadcrumbs_smoketest,
+					  &t[id], "igt/%d.%d", id, n);
+			if (IS_ERR(tsk)) {
+				ret = PTR_ERR(tsk);
+				goto out_flush;
+			}
+
+			get_task_struct(tsk);
+			threads[id * ncpus + n] = tsk;
+		}
+	}
+	mutex_unlock(&i915->drm.struct_mutex);
+
+	msleep(jiffies_to_msecs(i915_selftest.timeout_jiffies));
+
+out_flush:
+	num_waits = 0;
+	num_fences = 0;
+	for_each_engine(engine, i915, id) {
+		for (n = 0; n < ncpus; n++) {
+			struct task_struct *tsk = threads[id * ncpus + n];
+			int err;
+
+			if (!tsk)
+				continue;
+
+			err = kthread_stop(tsk);
+			if (err < 0 && !ret)
+				ret = err;
+
+			put_task_struct(tsk);
+		}
+
+		num_waits += atomic_long_read(&t[id].num_waits);
+		num_fences += atomic_long_read(&t[id].num_fences);
+	}
+	pr_info("Completed %lu waits for %lu fences across %d engines and %d cpus\n",
+		num_waits, num_fences, RUNTIME_INFO(i915)->num_rings, ncpus);
+
+	mutex_lock(&i915->drm.struct_mutex);
+	ret = igt_live_test_end(&live) ?: ret;
+out_contexts:
+	mutex_unlock(&i915->drm.struct_mutex);
+	kfree(t[0].contexts);
+out_threads:
+	kfree(threads);
+	mock_file_free(i915, file);
+out_rpm:
+	intel_runtime_pm_put(i915, wakeref);
+
+	return ret;
+}
+
 int i915_request_live_selftests(struct drm_i915_private *i915)
 {
 	static const struct i915_subtest tests[] = {
@@ -819,6 +1216,7 @@ int i915_request_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(live_all_engines),
 		SUBTEST(live_sequential_engines),
 		SUBTEST(live_empty_request),
+		SUBTEST(live_breadcrumbs_smoketest),
 	};
 
 	if (i915_terminally_wedged(&i915->gpu_error))
diff --git a/drivers/gpu/drm/i915/selftests/igt_spinner.c b/drivers/gpu/drm/i915/selftests/igt_spinner.c
index 0e70df0230b8..9ebd9225684e 100644
--- a/drivers/gpu/drm/i915/selftests/igt_spinner.c
+++ b/drivers/gpu/drm/i915/selftests/igt_spinner.c
@@ -185,11 +185,6 @@ void igt_spinner_fini(struct igt_spinner *spin)
 
 bool igt_wait_for_spinner(struct igt_spinner *spin, struct i915_request *rq)
 {
-	if (!wait_event_timeout(rq->execute,
-				READ_ONCE(rq->global_seqno),
-				msecs_to_jiffies(10)))
-		return false;
-
 	return !(wait_for_us(i915_seqno_passed(hws_seqno(spin, rq),
 					       rq->fence.seqno),
 			     10) &&
diff --git a/drivers/gpu/drm/i915/selftests/intel_breadcrumbs.c b/drivers/gpu/drm/i915/selftests/intel_breadcrumbs.c
deleted file mode 100644
index f03b407fdbe2..000000000000
--- a/drivers/gpu/drm/i915/selftests/intel_breadcrumbs.c
+++ /dev/null
@@ -1,470 +0,0 @@
-/*
- * Copyright © 2016 Intel Corporation
- *
- * Permission is hereby granted, free of charge, to any person obtaining a
- * copy of this software and associated documentation files (the "Software"),
- * to deal in the Software without restriction, including without limitation
- * the rights to use, copy, modify, merge, publish, distribute, sublicense,
- * and/or sell copies of the Software, and to permit persons to whom the
- * Software is furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice (including the next
- * paragraph) shall be included in all copies or substantial portions of the
- * Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
- * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
- * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
- * IN THE SOFTWARE.
- *
- */
-
-#include "../i915_selftest.h"
-#include "i915_random.h"
-
-#include "mock_gem_device.h"
-#include "mock_engine.h"
-
-static int check_rbtree(struct intel_engine_cs *engine,
-			const unsigned long *bitmap,
-			const struct intel_wait *waiters,
-			const int count)
-{
-	struct intel_breadcrumbs *b = &engine->breadcrumbs;
-	struct rb_node *rb;
-	int n;
-
-	if (&b->irq_wait->node != rb_first(&b->waiters)) {
-		pr_err("First waiter does not match first element of wait-tree\n");
-		return -EINVAL;
-	}
-
-	n = find_first_bit(bitmap, count);
-	for (rb = rb_first(&b->waiters); rb; rb = rb_next(rb)) {
-		struct intel_wait *w = container_of(rb, typeof(*w), node);
-		int idx = w - waiters;
-
-		if (!test_bit(idx, bitmap)) {
-			pr_err("waiter[%d, seqno=%d] removed but still in wait-tree\n",
-			       idx, w->seqno);
-			return -EINVAL;
-		}
-
-		if (n != idx) {
-			pr_err("waiter[%d, seqno=%d] does not match expected next element in tree [%d]\n",
-			       idx, w->seqno, n);
-			return -EINVAL;
-		}
-
-		n = find_next_bit(bitmap, count, n + 1);
-	}
-
-	return 0;
-}
-
-static int check_completion(struct intel_engine_cs *engine,
-			    const unsigned long *bitmap,
-			    const struct intel_wait *waiters,
-			    const int count)
-{
-	int n;
-
-	for (n = 0; n < count; n++) {
-		if (intel_wait_complete(&waiters[n]) != !!test_bit(n, bitmap))
-			continue;
-
-		pr_err("waiter[%d, seqno=%d] is %s, but expected %s\n",
-		       n, waiters[n].seqno,
-		       intel_wait_complete(&waiters[n]) ? "complete" : "active",
-		       test_bit(n, bitmap) ? "active" : "complete");
-		return -EINVAL;
-	}
-
-	return 0;
-}
-
-static int check_rbtree_empty(struct intel_engine_cs *engine)
-{
-	struct intel_breadcrumbs *b = &engine->breadcrumbs;
-
-	if (b->irq_wait) {
-		pr_err("Empty breadcrumbs still has a waiter\n");
-		return -EINVAL;
-	}
-
-	if (!RB_EMPTY_ROOT(&b->waiters)) {
-		pr_err("Empty breadcrumbs, but wait-tree not empty\n");
-		return -EINVAL;
-	}
-
-	return 0;
-}
-
-static int igt_random_insert_remove(void *arg)
-{
-	const u32 seqno_bias = 0x1000;
-	I915_RND_STATE(prng);
-	struct intel_engine_cs *engine = arg;
-	struct intel_wait *waiters;
-	const int count = 4096;
-	unsigned int *order;
-	unsigned long *bitmap;
-	int err = -ENOMEM;
-	int n;
-
-	mock_engine_reset(engine);
-
-	waiters = kvmalloc_array(count, sizeof(*waiters), GFP_KERNEL);
-	if (!waiters)
-		goto out_engines;
-
-	bitmap = kcalloc(DIV_ROUND_UP(count, BITS_PER_LONG), sizeof(*bitmap),
-			 GFP_KERNEL);
-	if (!bitmap)
-		goto out_waiters;
-
-	order = i915_random_order(count, &prng);
-	if (!order)
-		goto out_bitmap;
-
-	for (n = 0; n < count; n++)
-		intel_wait_init_for_seqno(&waiters[n], seqno_bias + n);
-
-	err = check_rbtree(engine, bitmap, waiters, count);
-	if (err)
-		goto out_order;
-
-	/* Add and remove waiters into the rbtree in random order. At each
-	 * step, we verify that the rbtree is correctly ordered.
-	 */
-	for (n = 0; n < count; n++) {
-		int i = order[n];
-
-		intel_engine_add_wait(engine, &waiters[i]);
-		__set_bit(i, bitmap);
-
-		err = check_rbtree(engine, bitmap, waiters, count);
-		if (err)
-			goto out_order;
-	}
-
-	i915_random_reorder(order, count, &prng);
-	for (n = 0; n < count; n++) {
-		int i = order[n];
-
-		intel_engine_remove_wait(engine, &waiters[i]);
-		__clear_bit(i, bitmap);
-
-		err = check_rbtree(engine, bitmap, waiters, count);
-		if (err)
-			goto out_order;
-	}
-
-	err = check_rbtree_empty(engine);
-out_order:
-	kfree(order);
-out_bitmap:
-	kfree(bitmap);
-out_waiters:
-	kvfree(waiters);
-out_engines:
-	mock_engine_flush(engine);
-	return err;
-}
-
-static int igt_insert_complete(void *arg)
-{
-	const u32 seqno_bias = 0x1000;
-	struct intel_engine_cs *engine = arg;
-	struct intel_wait *waiters;
-	const int count = 4096;
-	unsigned long *bitmap;
-	int err = -ENOMEM;
-	int n, m;
-
-	mock_engine_reset(engine);
-
-	waiters = kvmalloc_array(count, sizeof(*waiters), GFP_KERNEL);
-	if (!waiters)
-		goto out_engines;
-
-	bitmap = kcalloc(DIV_ROUND_UP(count, BITS_PER_LONG), sizeof(*bitmap),
-			 GFP_KERNEL);
-	if (!bitmap)
-		goto out_waiters;
-
-	for (n = 0; n < count; n++) {
-		intel_wait_init_for_seqno(&waiters[n], n + seqno_bias);
-		intel_engine_add_wait(engine, &waiters[n]);
-		__set_bit(n, bitmap);
-	}
-	err = check_rbtree(engine, bitmap, waiters, count);
-	if (err)
-		goto out_bitmap;
-
-	/* On each step, we advance the seqno so that several waiters are then
-	 * complete (we increase the seqno by increasingly larger values to
-	 * retire more and more waiters at once). All retired waiters should
-	 * be woken and removed from the rbtree, and so that we check.
-	 */
-	for (n = 0; n < count; n = m) {
-		int seqno = 2 * n;
-
-		GEM_BUG_ON(find_first_bit(bitmap, count) != n);
-
-		if (intel_wait_complete(&waiters[n])) {
-			pr_err("waiter[%d, seqno=%d] completed too early\n",
-			       n, waiters[n].seqno);
-			err = -EINVAL;
-			goto out_bitmap;
-		}
-
-		/* complete the following waiters */
-		mock_seqno_advance(engine, seqno + seqno_bias);
-		for (m = n; m <= seqno; m++) {
-			if (m == count)
-				break;
-
-			GEM_BUG_ON(!test_bit(m, bitmap));
-			__clear_bit(m, bitmap);
-		}
-
-		intel_engine_remove_wait(engine, &waiters[n]);
-		RB_CLEAR_NODE(&waiters[n].node);
-
-		err = check_rbtree(engine, bitmap, waiters, count);
-		if (err) {
-			pr_err("rbtree corrupt after seqno advance to %d\n",
-			       seqno + seqno_bias);
-			goto out_bitmap;
-		}
-
-		err = check_completion(engine, bitmap, waiters, count);
-		if (err) {
-			pr_err("completions after seqno advance to %d failed\n",
-			       seqno + seqno_bias);
-			goto out_bitmap;
-		}
-	}
-
-	err = check_rbtree_empty(engine);
-out_bitmap:
-	kfree(bitmap);
-out_waiters:
-	kvfree(waiters);
-out_engines:
-	mock_engine_flush(engine);
-	return err;
-}
-
-struct igt_wakeup {
-	struct task_struct *tsk;
-	atomic_t *ready, *set, *done;
-	struct intel_engine_cs *engine;
-	unsigned long flags;
-#define STOP 0
-#define IDLE 1
-	wait_queue_head_t *wq;
-	u32 seqno;
-};
-
-static bool wait_for_ready(struct igt_wakeup *w)
-{
-	DEFINE_WAIT(ready);
-
-	set_bit(IDLE, &w->flags);
-	if (atomic_dec_and_test(w->done))
-		wake_up_var(w->done);
-
-	if (test_bit(STOP, &w->flags))
-		goto out;
-
-	for (;;) {
-		prepare_to_wait(w->wq, &ready, TASK_INTERRUPTIBLE);
-		if (atomic_read(w->ready) == 0)
-			break;
-
-		schedule();
-	}
-	finish_wait(w->wq, &ready);
-
-out:
-	clear_bit(IDLE, &w->flags);
-	if (atomic_dec_and_test(w->set))
-		wake_up_var(w->set);
-
-	return !test_bit(STOP, &w->flags);
-}
-
-static int igt_wakeup_thread(void *arg)
-{
-	struct igt_wakeup *w = arg;
-	struct intel_wait wait;
-
-	while (wait_for_ready(w)) {
-		GEM_BUG_ON(kthread_should_stop());
-
-		intel_wait_init_for_seqno(&wait, w->seqno);
-		intel_engine_add_wait(w->engine, &wait);
-		for (;;) {
-			set_current_state(TASK_UNINTERRUPTIBLE);
-			if (i915_seqno_passed(intel_engine_get_seqno(w->engine),
-					      w->seqno))
-				break;
-
-			if (test_bit(STOP, &w->flags)) /* emergency escape */
-				break;
-
-			schedule();
-		}
-		intel_engine_remove_wait(w->engine, &wait);
-		__set_current_state(TASK_RUNNING);
-	}
-
-	return 0;
-}
-
-static void igt_wake_all_sync(atomic_t *ready,
-			      atomic_t *set,
-			      atomic_t *done,
-			      wait_queue_head_t *wq,
-			      int count)
-{
-	atomic_set(set, count);
-	atomic_set(ready, 0);
-	wake_up_all(wq);
-
-	wait_var_event(set, !atomic_read(set));
-	atomic_set(ready, count);
-	atomic_set(done, count);
-}
-
-static int igt_wakeup(void *arg)
-{
-	I915_RND_STATE(prng);
-	struct intel_engine_cs *engine = arg;
-	struct igt_wakeup *waiters;
-	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
-	const int count = 4096;
-	const u32 max_seqno = count / 4;
-	atomic_t ready, set, done;
-	int err = -ENOMEM;
-	int n, step;
-
-	mock_engine_reset(engine);
-
-	waiters = kvmalloc_array(count, sizeof(*waiters), GFP_KERNEL);
-	if (!waiters)
-		goto out_engines;
-
-	/* Create a large number of threads, each waiting on a random seqno.
-	 * Multiple waiters will be waiting for the same seqno.
-	 */
-	atomic_set(&ready, count);
-	for (n = 0; n < count; n++) {
-		waiters[n].wq = &wq;
-		waiters[n].ready = &ready;
-		waiters[n].set = &set;
-		waiters[n].done = &done;
-		waiters[n].engine = engine;
-		waiters[n].flags = BIT(IDLE);
-
-		waiters[n].tsk = kthread_run(igt_wakeup_thread, &waiters[n],
-					     "i915/igt:%d", n);
-		if (IS_ERR(waiters[n].tsk))
-			goto out_waiters;
-
-		get_task_struct(waiters[n].tsk);
-	}
-
-	for (step = 1; step <= max_seqno; step <<= 1) {
-		u32 seqno;
-
-		/* The waiter threads start paused as we assign them a random
-		 * seqno and reset the engine. Once the engine is reset,
-		 * we signal that the threads may begin their wait upon their
-		 * seqno.
-		 */
-		for (n = 0; n < count; n++) {
-			GEM_BUG_ON(!test_bit(IDLE, &waiters[n].flags));
-			waiters[n].seqno =
-				1 + prandom_u32_state(&prng) % max_seqno;
-		}
-		mock_seqno_advance(engine, 0);
-		igt_wake_all_sync(&ready, &set, &done, &wq, count);
-
-		/* Simulate the GPU doing chunks of work, with one or more
-		 * seqno appearing to finish at the same time. A random number
-		 * of threads will be waiting upon the update and hopefully be
-		 * woken.
-		 */
-		for (seqno = 1; seqno <= max_seqno + step; seqno += step) {
-			usleep_range(50, 500);
-			mock_seqno_advance(engine, seqno);
-		}
-		GEM_BUG_ON(intel_engine_get_seqno(engine) < 1 + max_seqno);
-
-		/* With the seqno now beyond any of the waiting threads, they
-		 * should all be woken, see that they are complete and signal
-		 * that they are ready for the next test. We wait until all
-		 * threads are complete and waiting for us (i.e. not a seqno).
-		 */
-		if (!wait_var_event_timeout(&done,
-					    !atomic_read(&done), 10 * HZ)) {
-			pr_err("Timed out waiting for %d remaining waiters\n",
-			       atomic_read(&done));
-			err = -ETIMEDOUT;
-			break;
-		}
-
-		err = check_rbtree_empty(engine);
-		if (err)
-			break;
-	}
-
-out_waiters:
-	for (n = 0; n < count; n++) {
-		if (IS_ERR(waiters[n].tsk))
-			break;
-
-		set_bit(STOP, &waiters[n].flags);
-	}
-	mock_seqno_advance(engine, INT_MAX); /* wakeup any broken waiters */
-	igt_wake_all_sync(&ready, &set, &done, &wq, n);
-
-	for (n = 0; n < count; n++) {
-		if (IS_ERR(waiters[n].tsk))
-			break;
-
-		kthread_stop(waiters[n].tsk);
-		put_task_struct(waiters[n].tsk);
-	}
-
-	kvfree(waiters);
-out_engines:
-	mock_engine_flush(engine);
-	return err;
-}
-
-int intel_breadcrumbs_mock_selftests(void)
-{
-	static const struct i915_subtest tests[] = {
-		SUBTEST(igt_random_insert_remove),
-		SUBTEST(igt_insert_complete),
-		SUBTEST(igt_wakeup),
-	};
-	struct drm_i915_private *i915;
-	int err;
-
-	i915 = mock_gem_device();
-	if (!i915)
-		return -ENOMEM;
-
-	err = i915_subtests(tests, i915->engine[RCS]);
-	drm_dev_put(&i915->drm);
-
-	return err;
-}
diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
index 2c38ea5892d9..7b6f3bea9ef8 100644
--- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
+++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
@@ -1127,7 +1127,7 @@ static int __igt_reset_evict_vma(struct drm_i915_private *i915,
 
 	wait_for_completion(&arg.completion);
 
-	if (wait_for(waitqueue_active(&rq->execute), 10)) {
+	if (wait_for(!list_empty(&rq->fence.cb_list), 10)) {
 		struct drm_printer p = drm_info_printer(i915->drm.dev);
 
 		pr_err("igt/evict_vma kthread did not wait\n");
diff --git a/drivers/gpu/drm/i915/selftests/lib_sw_fence.c b/drivers/gpu/drm/i915/selftests/lib_sw_fence.c
index b26f07b55d86..2bfa72c1654b 100644
--- a/drivers/gpu/drm/i915/selftests/lib_sw_fence.c
+++ b/drivers/gpu/drm/i915/selftests/lib_sw_fence.c
@@ -76,3 +76,57 @@ void timed_fence_fini(struct timed_fence *tf)
 	destroy_timer_on_stack(&tf->timer);
 	i915_sw_fence_fini(&tf->fence);
 }
+
+struct heap_fence {
+	struct i915_sw_fence fence;
+	union {
+		struct kref ref;
+		struct rcu_head rcu;
+	};
+};
+
+static int __i915_sw_fence_call
+heap_fence_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
+{
+	struct heap_fence *h = container_of(fence, typeof(*h), fence);
+
+	switch (state) {
+	case FENCE_COMPLETE:
+		break;
+
+	case FENCE_FREE:
+		heap_fence_put(&h->fence);
+	}
+
+	return NOTIFY_DONE;
+}
+
+struct i915_sw_fence *heap_fence_create(gfp_t gfp)
+{
+	struct heap_fence *h;
+
+	h = kmalloc(sizeof(*h), gfp);
+	if (!h)
+		return NULL;
+
+	i915_sw_fence_init(&h->fence, heap_fence_notify);
+	refcount_set(&h->ref.refcount, 2);
+
+	return &h->fence;
+}
+
+static void heap_fence_release(struct kref *ref)
+{
+	struct heap_fence *h = container_of(ref, typeof(*h), ref);
+
+	i915_sw_fence_fini(&h->fence);
+
+	kfree_rcu(h, rcu);
+}
+
+void heap_fence_put(struct i915_sw_fence *fence)
+{
+	struct heap_fence *h = container_of(fence, typeof(*h), fence);
+
+	kref_put(&h->ref, heap_fence_release);
+}
diff --git a/drivers/gpu/drm/i915/selftests/lib_sw_fence.h b/drivers/gpu/drm/i915/selftests/lib_sw_fence.h
index 474aafb92ae1..1f9927e10f3a 100644
--- a/drivers/gpu/drm/i915/selftests/lib_sw_fence.h
+++ b/drivers/gpu/drm/i915/selftests/lib_sw_fence.h
@@ -39,4 +39,7 @@ struct timed_fence {
 void timed_fence_init(struct timed_fence *tf, unsigned long expires);
 void timed_fence_fini(struct timed_fence *tf);
 
+struct i915_sw_fence *heap_fence_create(gfp_t gfp);
+void heap_fence_put(struct i915_sw_fence *fence);
+
 #endif /* _LIB_SW_FENCE_H_ */
diff --git a/drivers/gpu/drm/i915/selftests/mock_engine.c b/drivers/gpu/drm/i915/selftests/mock_engine.c
index 2515cffb4490..e70b4a6cfc67 100644
--- a/drivers/gpu/drm/i915/selftests/mock_engine.c
+++ b/drivers/gpu/drm/i915/selftests/mock_engine.c
@@ -86,17 +86,21 @@ static struct mock_request *first_request(struct mock_engine *engine)
 static void advance(struct mock_request *request)
 {
 	list_del_init(&request->link);
-	mock_seqno_advance(request->base.engine, request->base.global_seqno);
+	intel_engine_write_global_seqno(request->base.engine,
+					request->base.global_seqno);
 	i915_request_mark_complete(&request->base);
 	GEM_BUG_ON(!i915_request_completed(&request->base));
+
+	intel_engine_queue_breadcrumbs(request->base.engine);
 }
 
 static void hw_delay_complete(struct timer_list *t)
 {
 	struct mock_engine *engine = from_timer(engine, t, hw_delay);
 	struct mock_request *request;
+	unsigned long flags;
 
-	spin_lock(&engine->hw_lock);
+	spin_lock_irqsave(&engine->hw_lock, flags);
 
 	/* Timer fired, first request is complete */
 	request = first_request(engine);
@@ -116,7 +120,7 @@ static void hw_delay_complete(struct timer_list *t)
 		advance(request);
 	}
 
-	spin_unlock(&engine->hw_lock);
+	spin_unlock_irqrestore(&engine->hw_lock, flags);
 }
 
 static void mock_context_unpin(struct intel_context *ce)
@@ -191,11 +195,12 @@ static void mock_submit_request(struct i915_request *request)
 	struct mock_request *mock = container_of(request, typeof(*mock), base);
 	struct mock_engine *engine =
 		container_of(request->engine, typeof(*engine), base);
+	unsigned long flags;
 
 	i915_request_submit(request);
 	GEM_BUG_ON(!request->global_seqno);
 
-	spin_lock_irq(&engine->hw_lock);
+	spin_lock_irqsave(&engine->hw_lock, flags);
 	list_add_tail(&mock->link, &engine->hw_queue);
 	if (mock->link.prev == &engine->hw_queue) {
 		if (mock->delay)
@@ -203,7 +208,7 @@ static void mock_submit_request(struct i915_request *request)
 		else
 			advance(mock);
 	}
-	spin_unlock_irq(&engine->hw_lock);
+	spin_unlock_irqrestore(&engine->hw_lock, flags);
 }
 
 struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
@@ -273,6 +278,7 @@ void mock_engine_flush(struct intel_engine_cs *engine)
 
 void mock_engine_reset(struct intel_engine_cs *engine)
 {
+	intel_engine_write_global_seqno(engine, 0);
 }
 
 void mock_engine_free(struct intel_engine_cs *engine)
diff --git a/drivers/gpu/drm/i915/selftests/mock_engine.h b/drivers/gpu/drm/i915/selftests/mock_engine.h
index 133d0c21790d..b9cc3a245f16 100644
--- a/drivers/gpu/drm/i915/selftests/mock_engine.h
+++ b/drivers/gpu/drm/i915/selftests/mock_engine.h
@@ -46,10 +46,4 @@ void mock_engine_flush(struct intel_engine_cs *engine);
 void mock_engine_reset(struct intel_engine_cs *engine);
 void mock_engine_free(struct intel_engine_cs *engine);
 
-static inline void mock_seqno_advance(struct intel_engine_cs *engine, u32 seqno)
-{
-	intel_write_status_page(engine, I915_GEM_HWS_INDEX, seqno);
-	intel_engine_wakeup(engine);
-}
-
 #endif /* !__MOCK_ENGINE_H__ */
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH 29/34] drm/i915: Drop fake breadcrumb irq
  2019-01-21 22:20 HWSP for HW semaphores Chris Wilson
                   ` (27 preceding siblings ...)
  2019-01-21 22:21 ` [PATCH 28/34] drm/i915: Replace global breadcrumbs with per-context interrupt tracking Chris Wilson
@ 2019-01-21 22:21 ` Chris Wilson
  2019-01-24 17:55   ` Tvrtko Ursulin
  2019-01-21 22:21 ` [PATCH 30/34] drm/i915: Keep timeline HWSP allocated until the system is idle Chris Wilson
                   ` (11 subsequent siblings)
  40 siblings, 1 reply; 89+ messages in thread
From: Chris Wilson @ 2019-01-21 22:21 UTC (permalink / raw)
  To: intel-gfx

Missed breadcrumb detection is defunct due to the tight coupling with
dma_fence signaling and the myriad ways we may signal fences from
everywhere but the interrupt handler, i.e. we frequently signal a fence
before we even see its interrupt. This means that even if we miss an
interrupt for a fence, it is still signaled before our breadcrumb
hangcheck fires. So simplify the breadcrumb hangchecking by moving it
into the GPU hangcheck, and forgo the fake interrupts.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c           |  93 -----------
 drivers/gpu/drm/i915/i915_gpu_error.c         |   2 -
 drivers/gpu/drm/i915/i915_gpu_error.h         |   5 -
 drivers/gpu/drm/i915/intel_breadcrumbs.c      | 147 +-----------------
 drivers/gpu/drm/i915/intel_hangcheck.c        |   2 +
 drivers/gpu/drm/i915/intel_ringbuffer.h       |   5 -
 .../gpu/drm/i915/selftests/igt_live_test.c    |   7 -
 7 files changed, 5 insertions(+), 256 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index d7764e62e9b4..c2aaf010c3d1 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1321,9 +1321,6 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
 			   intel_engine_last_submit(engine),
 			   jiffies_to_msecs(jiffies -
 					    engine->hangcheck.action_timestamp));
-		seq_printf(m, "\tfake irq active? %s\n",
-			   yesno(test_bit(engine->id,
-					  &dev_priv->gpu_error.missed_irq_rings)));
 
 		seq_printf(m, "\tACTHD = 0x%08llx [current 0x%08llx]\n",
 			   (long long)engine->hangcheck.acthd,
@@ -3874,94 +3871,6 @@ DEFINE_SIMPLE_ATTRIBUTE(i915_wedged_fops,
 			i915_wedged_get, i915_wedged_set,
 			"%llu\n");
 
-static int
-fault_irq_set(struct drm_i915_private *i915,
-	      unsigned long *irq,
-	      unsigned long val)
-{
-	int err;
-
-	err = mutex_lock_interruptible(&i915->drm.struct_mutex);
-	if (err)
-		return err;
-
-	err = i915_gem_wait_for_idle(i915,
-				     I915_WAIT_LOCKED |
-				     I915_WAIT_INTERRUPTIBLE,
-				     MAX_SCHEDULE_TIMEOUT);
-	if (err)
-		goto err_unlock;
-
-	*irq = val;
-	mutex_unlock(&i915->drm.struct_mutex);
-
-	/* Flush idle worker to disarm irq */
-	drain_delayed_work(&i915->gt.idle_work);
-
-	return 0;
-
-err_unlock:
-	mutex_unlock(&i915->drm.struct_mutex);
-	return err;
-}
-
-static int
-i915_ring_missed_irq_get(void *data, u64 *val)
-{
-	struct drm_i915_private *dev_priv = data;
-
-	*val = dev_priv->gpu_error.missed_irq_rings;
-	return 0;
-}
-
-static int
-i915_ring_missed_irq_set(void *data, u64 val)
-{
-	struct drm_i915_private *i915 = data;
-
-	return fault_irq_set(i915, &i915->gpu_error.missed_irq_rings, val);
-}
-
-DEFINE_SIMPLE_ATTRIBUTE(i915_ring_missed_irq_fops,
-			i915_ring_missed_irq_get, i915_ring_missed_irq_set,
-			"0x%08llx\n");
-
-static int
-i915_ring_test_irq_get(void *data, u64 *val)
-{
-	struct drm_i915_private *dev_priv = data;
-
-	*val = dev_priv->gpu_error.test_irq_rings;
-
-	return 0;
-}
-
-static int
-i915_ring_test_irq_set(void *data, u64 val)
-{
-	struct drm_i915_private *i915 = data;
-
-	/* GuC keeps the user interrupt permanently enabled for submission */
-	if (USES_GUC_SUBMISSION(i915))
-		return -ENODEV;
-
-	/*
-	 * From icl, we can no longer individually mask interrupt generation
-	 * from each engine.
-	 */
-	if (INTEL_GEN(i915) >= 11)
-		return -ENODEV;
-
-	val &= INTEL_INFO(i915)->ring_mask;
-	DRM_DEBUG_DRIVER("Masking interrupts on rings 0x%08llx\n", val);
-
-	return fault_irq_set(i915, &i915->gpu_error.test_irq_rings, val);
-}
-
-DEFINE_SIMPLE_ATTRIBUTE(i915_ring_test_irq_fops,
-			i915_ring_test_irq_get, i915_ring_test_irq_set,
-			"0x%08llx\n");
-
 #define DROP_UNBOUND	BIT(0)
 #define DROP_BOUND	BIT(1)
 #define DROP_RETIRE	BIT(2)
@@ -4724,8 +4633,6 @@ static const struct i915_debugfs_files {
 } i915_debugfs_files[] = {
 	{"i915_wedged", &i915_wedged_fops},
 	{"i915_cache_sharing", &i915_cache_sharing_fops},
-	{"i915_ring_missed_irq", &i915_ring_missed_irq_fops},
-	{"i915_ring_test_irq", &i915_ring_test_irq_fops},
 	{"i915_gem_drop_caches", &i915_drop_caches_fops},
 #if IS_ENABLED(CONFIG_DRM_I915_CAPTURE_ERROR)
 	{"i915_error_state", &i915_error_state_fops},
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 825572127029..0584c8dfa6ae 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -718,8 +718,6 @@ static void __err_print_to_sgl(struct drm_i915_error_state_buf *m,
 	err_printf(m, "FORCEWAKE: 0x%08x\n", error->forcewake);
 	err_printf(m, "DERRMR: 0x%08x\n", error->derrmr);
 	err_printf(m, "CCID: 0x%08x\n", error->ccid);
-	err_printf(m, "Missed interrupts: 0x%08lx\n",
-		   m->i915->gpu_error.missed_irq_rings);
 
 	for (i = 0; i < error->nfence; i++)
 		err_printf(m, "  fence[%d] = %08llx\n", i, error->fence[i]);
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.h b/drivers/gpu/drm/i915/i915_gpu_error.h
index 0e184712cbcc..99a53c0cd6da 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.h
+++ b/drivers/gpu/drm/i915/i915_gpu_error.h
@@ -203,8 +203,6 @@ struct i915_gpu_error {
 
 	atomic_t pending_fb_pin;
 
-	unsigned long missed_irq_rings;
-
 	/**
 	 * State variable controlling the reset flow and count
 	 *
@@ -273,9 +271,6 @@ struct i915_gpu_error {
 	 */
 	wait_queue_head_t reset_queue;
 
-	/* For missed irq/seqno simulation. */
-	unsigned long test_irq_rings;
-
 	struct i915_gpu_restart *restart;
 };
 
diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
index faeb0083b561..3bdfa63ea4a1 100644
--- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
@@ -91,7 +91,6 @@ bool intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine)
 
 	spin_lock(&b->irq_lock);
 
-	b->irq_fired = true;
 	if (b->irq_armed && list_empty(&b->signalers))
 		__intel_breadcrumbs_disarm_irq(b);
 
@@ -155,86 +154,6 @@ static void signal_irq_work(struct irq_work *work)
 	intel_engine_breadcrumbs_irq(engine);
 }
 
-static unsigned long wait_timeout(void)
-{
-	return round_jiffies_up(jiffies + DRM_I915_HANGCHECK_JIFFIES);
-}
-
-static noinline void missed_breadcrumb(struct intel_engine_cs *engine)
-{
-	if (GEM_SHOW_DEBUG()) {
-		struct drm_printer p = drm_debug_printer(__func__);
-
-		intel_engine_dump(engine, &p,
-				  "%s missed breadcrumb at %pS\n",
-				  engine->name, __builtin_return_address(0));
-	}
-
-	set_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings);
-}
-
-static void intel_breadcrumbs_hangcheck(struct timer_list *t)
-{
-	struct intel_engine_cs *engine =
-		from_timer(engine, t, breadcrumbs.hangcheck);
-	struct intel_breadcrumbs *b = &engine->breadcrumbs;
-
-	if (!b->irq_armed)
-		return;
-
-	if (b->irq_fired)
-		goto rearm;
-
-	/*
-	 * We keep the hangcheck timer alive until we disarm the irq, even
-	 * if there are no waiters at present.
-	 *
-	 * If the waiter was currently running, assume it hasn't had a chance
-	 * to process the pending interrupt (e.g, low priority task on a loaded
-	 * system) and wait until it sleeps before declaring a missed interrupt.
-	 *
-	 * If the waiter was asleep (and not even pending a wakeup), then we
-	 * must have missed an interrupt as the GPU has stopped advancing
-	 * but we still have a waiter. Assuming all batches complete within
-	 * DRM_I915_HANGCHECK_JIFFIES [1.5s]!
-	 */
-	synchronize_hardirq(engine->i915->drm.irq);
-	if (intel_engine_signal_breadcrumbs(engine)) {
-		missed_breadcrumb(engine);
-		mod_timer(&b->fake_irq, jiffies + 1);
-	} else {
-rearm:
-		b->irq_fired = false;
-		mod_timer(&b->hangcheck, wait_timeout());
-	}
-}
-
-static void intel_breadcrumbs_fake_irq(struct timer_list *t)
-{
-	struct intel_engine_cs *engine =
-		from_timer(engine, t, breadcrumbs.fake_irq);
-	struct intel_breadcrumbs *b = &engine->breadcrumbs;
-
-	/*
-	 * The timer persists in case we cannot enable interrupts,
-	 * or if we have previously seen seqno/interrupt incoherency
-	 * ("missed interrupt" syndrome, better known as a "missed breadcrumb").
-	 * Here the worker will wake up every jiffie in order to kick the
-	 * oldest waiter to do the coherent seqno check.
-	 */
-
-	if (!intel_engine_signal_breadcrumbs(engine) && !b->irq_armed)
-		return;
-
-	/* If the user has disabled the fake-irq, restore the hangchecking */
-	if (!test_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings)) {
-		mod_timer(&b->hangcheck, wait_timeout());
-		return;
-	}
-
-	mod_timer(&b->fake_irq, jiffies + 1);
-}
-
 void intel_engine_pin_breadcrumbs_irq(struct intel_engine_cs *engine)
 {
 	struct intel_breadcrumbs *b = &engine->breadcrumbs;
@@ -257,43 +176,14 @@ void intel_engine_unpin_breadcrumbs_irq(struct intel_engine_cs *engine)
 	spin_unlock_irq(&b->irq_lock);
 }
 
-static bool use_fake_irq(const struct intel_breadcrumbs *b)
-{
-	const struct intel_engine_cs *engine =
-		container_of(b, struct intel_engine_cs, breadcrumbs);
-
-	if (!test_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings))
-		return false;
-
-	/*
-	 * Only start with the heavy weight fake irq timer if we have not
-	 * seen any interrupts since enabling it the first time. If the
-	 * interrupts are still arriving, it means we made a mistake in our
-	 * engine->seqno_barrier(), a timing error that should be transient
-	 * and unlikely to reoccur.
-	 */
-	return !b->irq_fired;
-}
-
-static void enable_fake_irq(struct intel_breadcrumbs *b)
-{
-	/* Ensure we never sleep indefinitely */
-	if (!b->irq_enabled || use_fake_irq(b))
-		mod_timer(&b->fake_irq, jiffies + 1);
-	else
-		mod_timer(&b->hangcheck, wait_timeout());
-}
-
-static bool __intel_breadcrumbs_arm_irq(struct intel_breadcrumbs *b)
+static void __intel_breadcrumbs_arm_irq(struct intel_breadcrumbs *b)
 {
 	struct intel_engine_cs *engine =
 		container_of(b, struct intel_engine_cs, breadcrumbs);
-	struct drm_i915_private *i915 = engine->i915;
-	bool enabled;
 
 	lockdep_assert_held(&b->irq_lock);
 	if (b->irq_armed)
-		return false;
+		return;
 
 	/*
 	 * The breadcrumb irq will be disarmed on the interrupt after the
@@ -311,16 +201,8 @@ static bool __intel_breadcrumbs_arm_irq(struct intel_breadcrumbs *b)
 	 * the driver is idle) we disarm the breadcrumbs.
 	 */
 
-	/* No interrupts? Kick the waiter every jiffie! */
-	enabled = false;
-	if (!b->irq_enabled++ &&
-	    !test_bit(engine->id, &i915->gpu_error.test_irq_rings)) {
+	if (!b->irq_enabled++)
 		irq_enable(engine);
-		enabled = true;
-	}
-
-	enable_fake_irq(b);
-	return enabled;
 }
 
 void intel_engine_init_breadcrumbs(struct intel_engine_cs *engine)
@@ -331,18 +213,6 @@ void intel_engine_init_breadcrumbs(struct intel_engine_cs *engine)
 	INIT_LIST_HEAD(&b->signalers);
 
 	init_irq_work(&b->irq_work, signal_irq_work);
-
-	timer_setup(&b->fake_irq, intel_breadcrumbs_fake_irq, 0);
-	timer_setup(&b->hangcheck, intel_breadcrumbs_hangcheck, 0);
-}
-
-static void cancel_fake_irq(struct intel_engine_cs *engine)
-{
-	struct intel_breadcrumbs *b = &engine->breadcrumbs;
-
-	del_timer_sync(&b->fake_irq); /* may queue b->hangcheck */
-	del_timer_sync(&b->hangcheck);
-	clear_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings);
 }
 
 void intel_engine_reset_breadcrumbs(struct intel_engine_cs *engine)
@@ -352,13 +222,6 @@ void intel_engine_reset_breadcrumbs(struct intel_engine_cs *engine)
 
 	spin_lock_irqsave(&b->irq_lock, flags);
 
-	/*
-	 * Leave the fake_irq timer enabled (if it is running), but clear the
-	 * bit so that it turns itself off on its next wake up and goes back
-	 * to the long hangcheck interval if still required.
-	 */
-	clear_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings);
-
 	if (b->irq_enabled)
 		irq_enable(engine);
 	else
@@ -369,7 +232,6 @@ void intel_engine_reset_breadcrumbs(struct intel_engine_cs *engine)
 
 void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine)
 {
-	cancel_fake_irq(engine);
 }
 
 bool intel_engine_enable_signaling(struct i915_request *rq)
@@ -451,7 +313,4 @@ void intel_engine_print_breadcrumbs(struct intel_engine_cs *engine,
 		}
 	}
 	spin_unlock_irq(&b->irq_lock);
-
-	if (test_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings))
-		drm_printf(p, "Fake irq active\n");
 }
diff --git a/drivers/gpu/drm/i915/intel_hangcheck.c b/drivers/gpu/drm/i915/intel_hangcheck.c
index 5662d6fed523..a219c796e56d 100644
--- a/drivers/gpu/drm/i915/intel_hangcheck.c
+++ b/drivers/gpu/drm/i915/intel_hangcheck.c
@@ -275,6 +275,8 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 	for_each_engine(engine, dev_priv, id) {
 		struct hangcheck hc;
 
+		intel_engine_signal_breadcrumbs(engine);
+
 		hangcheck_load_sample(engine, &hc);
 		hangcheck_accumulate_sample(engine, &hc);
 		hangcheck_store_sample(engine, &hc);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index b78cb9bd4bc2..7eec96cf2a0b 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -382,14 +382,9 @@ struct intel_engine_cs {
 
 		struct irq_work irq_work;
 
-		struct timer_list fake_irq; /* used after a missed interrupt */
-		struct timer_list hangcheck; /* detect missed interrupts */
-
-		unsigned int hangcheck_interrupts;
 		unsigned int irq_enabled;
 
 		bool irq_armed;
-		bool irq_fired;
 	} breadcrumbs;
 
 	struct {
diff --git a/drivers/gpu/drm/i915/selftests/igt_live_test.c b/drivers/gpu/drm/i915/selftests/igt_live_test.c
index 5deb485fb942..3e902761cd16 100644
--- a/drivers/gpu/drm/i915/selftests/igt_live_test.c
+++ b/drivers/gpu/drm/i915/selftests/igt_live_test.c
@@ -35,7 +35,6 @@ int igt_live_test_begin(struct igt_live_test *t,
 		return err;
 	}
 
-	i915->gpu_error.missed_irq_rings = 0;
 	t->reset_global = i915_reset_count(&i915->gpu_error);
 
 	for_each_engine(engine, i915, id)
@@ -75,11 +74,5 @@ int igt_live_test_end(struct igt_live_test *t)
 		return -EIO;
 	}
 
-	if (i915->gpu_error.missed_irq_rings) {
-		pr_err("%s(%s): Missed interrupts on engines %lx\n",
-		       t->func, t->name, i915->gpu_error.missed_irq_rings);
-		return -EIO;
-	}
-
 	return 0;
 }
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH 30/34] drm/i915: Keep timeline HWSP allocated until the system is idle
  2019-01-21 22:20 HWSP for HW semaphores Chris Wilson
                   ` (28 preceding siblings ...)
  2019-01-21 22:21 ` [PATCH 29/34] drm/i915: Drop fake breadcrumb irq Chris Wilson
@ 2019-01-21 22:21 ` Chris Wilson
  2019-01-21 22:37   ` Chris Wilson
  2019-01-21 22:21 ` [PATCH 31/34] drm/i915/execlists: Refactor out can_merge_rq() Chris Wilson
                   ` (10 subsequent siblings)
  40 siblings, 1 reply; 89+ messages in thread
From: Chris Wilson @ 2019-01-21 22:21 UTC (permalink / raw)
  To: intel-gfx

In preparation for enabling HW semaphores, we need to keep the in-flight
timeline HWSP alive until the entire system is idle, as any other
timeline active on the GPU may still refer back to the already retired
timeline. We have to delay both recycling available cachelines and
unpinning the old HWSP until the next idle point (i.e. on parking).

That we have to keep the HWSP alive for external references on HW raises
an interesting conundrum. On a busy system, we may never see a global
idle point, essentially meaning the resource will keep leaking until we
are forced to sleep. What we need is a set of RCU primitives for the GPU!
This should also help mitigate the resource starvation issues stemming
from keeping all logical state pinned until idle (instead of, as
currently handled, until the next context switch).
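
In short, the life of a HWSP cacheline under this scheme is roughly:

  hwsp_alloc:  bit taken from free_bitmap, cacheline handed to a timeline
  pin:         backing page kept pinned (hwsp_pin_list) while in use
  hwsp_free:   bit set in dead_bitmap; not recycled yet, as the GPU may
               still be reading the old seqno
  parking:     dead_bitmap folded back into free_bitmap, pins dropped,
               and the page freed once free_bitmap == ~0ull again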

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h      |   2 +
 drivers/gpu/drm/i915/i915_request.c  |  34 ++++---
 drivers/gpu/drm/i915/i915_timeline.c | 127 ++++++++++++++++++++++++---
 drivers/gpu/drm/i915/i915_timeline.h |   1 +
 4 files changed, 133 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 5577e0e1034f..7ca701cf9086 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1981,7 +1981,9 @@ struct drm_i915_private {
 
 			/* Pack multiple timelines' seqnos into the same page */
 			spinlock_t hwsp_lock;
+			struct list_head hwsp_pin_list;
 			struct list_head hwsp_free_list;
+			struct list_head hwsp_dead_list;
 		} timelines;
 
 		struct list_head active_rings;
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index cca437ac8a7e..099c6f994b99 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -331,12 +331,6 @@ void i915_request_retire_upto(struct i915_request *rq)
 	} while (tmp != rq);
 }
 
-static u32 timeline_get_seqno(struct i915_timeline *tl)
-{
-	tl->seqno += tl->has_initial_breadcrumb;
-	return ++tl->seqno;
-}
-
 static void move_to_timeline(struct i915_request *request,
 			     struct i915_timeline *timeline)
 {
@@ -538,8 +532,10 @@ struct i915_request *
 i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 {
 	struct drm_i915_private *i915 = engine->i915;
-	struct i915_request *rq;
 	struct intel_context *ce;
+	struct i915_timeline *tl;
+	struct i915_request *rq;
+	u32 seqno;
 	int ret;
 
 	lockdep_assert_held(&i915->drm.struct_mutex);
@@ -614,7 +610,15 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 		}
 	}
 
-	rq->rcustate = get_state_synchronize_rcu();
+	tl = ce->ring->timeline;
+	GEM_BUG_ON(tl == &engine->timeline);
+	ret = i915_timeline_get_seqno(tl, &seqno);
+	if (ret)
+		goto err_free;
+
+	spin_lock_init(&rq->lock);
+	dma_fence_init(&rq->fence, &i915_fence_ops, &rq->lock,
+		       tl->fence_context, seqno);
 
 	INIT_LIST_HEAD(&rq->active_list);
 	rq->i915 = i915;
@@ -622,16 +626,9 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	rq->gem_context = ctx;
 	rq->hw_context = ce;
 	rq->ring = ce->ring;
-	rq->timeline = ce->ring->timeline;
-	GEM_BUG_ON(rq->timeline == &engine->timeline);
-	rq->hwsp_seqno = rq->timeline->hwsp_seqno;
-
-	spin_lock_init(&rq->lock);
-	dma_fence_init(&rq->fence,
-		       &i915_fence_ops,
-		       &rq->lock,
-		       rq->timeline->fence_context,
-		       timeline_get_seqno(rq->timeline));
+	rq->timeline = tl;
+	rq->hwsp_seqno = tl->hwsp_seqno;
+	rq->rcustate = get_state_synchronize_rcu();
 
 	/* We bump the ref for the fence chain */
 	i915_sw_fence_init(&i915_request_get(rq)->submit, submit_notify);
@@ -688,6 +685,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	GEM_BUG_ON(!list_empty(&rq->sched.signalers_list));
 	GEM_BUG_ON(!list_empty(&rq->sched.waiters_list));
 
+err_free:
 	kmem_cache_free(i915->requests, rq);
 err_unreserve:
 	unreserve_gt(i915);
diff --git a/drivers/gpu/drm/i915/i915_timeline.c b/drivers/gpu/drm/i915/i915_timeline.c
index 7bc9164733bc..a0bbc993048b 100644
--- a/drivers/gpu/drm/i915/i915_timeline.c
+++ b/drivers/gpu/drm/i915/i915_timeline.c
@@ -11,8 +11,11 @@
 
 struct i915_timeline_hwsp {
 	struct i915_vma *vma;
+	struct list_head pin_link;
 	struct list_head free_link;
+	struct list_head dead_link;
 	u64 free_bitmap;
+	u64 dead_bitmap;
 };
 
 static struct i915_vma *__hwsp_alloc(struct drm_i915_private *i915)
@@ -33,8 +36,7 @@ static struct i915_vma *__hwsp_alloc(struct drm_i915_private *i915)
 	return vma;
 }
 
-static struct i915_vma *
-hwsp_alloc(struct i915_timeline *timeline, int *offset)
+static struct i915_vma *hwsp_alloc(struct i915_timeline *timeline, int *offset)
 {
 	struct drm_i915_private *i915 = timeline->i915;
 	struct i915_gt_timelines *gt = &i915->gt.timelines;
@@ -66,6 +68,7 @@ hwsp_alloc(struct i915_timeline *timeline, int *offset)
 		vma->private = hwsp;
 		hwsp->vma = vma;
 		hwsp->free_bitmap = ~0ull;
+		hwsp->dead_bitmap = 0;
 
 		spin_lock(&gt->hwsp_lock);
 		list_add(&hwsp->free_link, &gt->hwsp_free_list);
@@ -96,18 +99,11 @@ static void hwsp_free(struct i915_timeline *timeline)
 
 	spin_lock(&gt->hwsp_lock);
 
-	/* As a cacheline becomes available, publish the HWSP on the freelist */
-	if (!hwsp->free_bitmap)
-		list_add_tail(&hwsp->free_link, &gt->hwsp_free_list);
-
-	hwsp->free_bitmap |= BIT_ULL(timeline->hwsp_offset / CACHELINE_BYTES);
+	/* Defer recycling the HWSP cacheline until after the GPU is idle. */
+	if (!hwsp->dead_bitmap)
+		list_add_tail(&hwsp->dead_link, &gt->hwsp_dead_list);
 
-	/* And if no one is left using it, give the page back to the system */
-	if (hwsp->free_bitmap == ~0ull) {
-		i915_vma_put(hwsp->vma);
-		list_del(&hwsp->free_link);
-		kfree(hwsp);
-	}
+	hwsp->dead_bitmap |= BIT_ULL(timeline->hwsp_offset / CACHELINE_BYTES);
 
 	spin_unlock(&gt->hwsp_lock);
 }
@@ -172,7 +168,9 @@ void i915_timelines_init(struct drm_i915_private *i915)
 	INIT_LIST_HEAD(&gt->active_list);
 
 	spin_lock_init(&gt->hwsp_lock);
+	INIT_LIST_HEAD(&gt->hwsp_pin_list);
 	INIT_LIST_HEAD(&gt->hwsp_free_list);
+	INIT_LIST_HEAD(&gt->hwsp_dead_list);
 
 	/* via i915_gem_wait_for_idle() */
 	i915_gem_shrinker_taints_mutex(i915, &gt->mutex);
@@ -209,6 +207,7 @@ static void timeline_inactive(struct i915_timeline *tl)
 void i915_timelines_park(struct drm_i915_private *i915)
 {
 	struct i915_gt_timelines *gt = &i915->gt.timelines;
+	struct i915_timeline_hwsp *hwsp, *hn;
 	struct i915_timeline *timeline;
 
 	mutex_lock(&gt->mutex);
@@ -222,6 +221,38 @@ void i915_timelines_park(struct drm_i915_private *i915)
 		i915_syncmap_free(&timeline->sync);
 	}
 	mutex_unlock(&gt->mutex);
+
+	/*
+	 * Now the system is idle, we can be sure that there are no more
+	 * references to our old HWSP pages remaining on the HW, so we
+	 * can return the pages back to the system.
+	 */
+	spin_lock(&gt->hwsp_lock);
+
+	list_for_each_entry_safe(hwsp, hn, &gt->hwsp_pin_list, pin_link) {
+		INIT_LIST_HEAD(&hwsp->pin_link);
+		i915_vma_unpin(hwsp->vma);
+	}
+	INIT_LIST_HEAD(&gt->hwsp_pin_list);
+
+	list_for_each_entry_safe(hwsp, hn, &gt->hwsp_dead_list, dead_link) {
+		GEM_BUG_ON(!hwsp->dead_bitmap);
+
+		if (!hwsp->free_bitmap)
+			list_add_tail(&hwsp->free_link, &gt->hwsp_free_list);
+
+		hwsp->free_bitmap |= hwsp->dead_bitmap;
+		hwsp->dead_bitmap = 0;
+
+		if (hwsp->free_bitmap == ~0ull) {
+			list_del(&hwsp->free_link);
+			i915_vma_put(hwsp->vma);
+			kfree(hwsp);
+		}
+	}
+	INIT_LIST_HEAD(&gt->hwsp_dead_list);
+
+	spin_unlock(&gt->hwsp_lock);
 }
 
 void i915_timeline_fini(struct i915_timeline *timeline)
@@ -259,6 +290,24 @@ i915_timeline_create(struct drm_i915_private *i915,
 	return timeline;
 }
 
+static void
+__i915_timeline_pin_hwsp(struct i915_timeline *tl,
+			 struct i915_timeline_hwsp *hwsp)
+{
+	GEM_BUG_ON(!tl->pin_count);
+
+	if (hwsp && list_empty(&hwsp->pin_link)) {
+		struct i915_gt_timelines *gt = &tl->i915->gt.timelines;
+
+		spin_lock(&gt->hwsp_lock);
+		if (list_empty(&hwsp->pin_link)) {
+			list_add(&hwsp->pin_link, &gt->hwsp_pin_list);
+			__i915_vma_pin(hwsp->vma);
+		}
+		spin_unlock(&gt->hwsp_lock);
+	}
+}
+
 int i915_timeline_pin(struct i915_timeline *tl)
 {
 	int err;
@@ -271,6 +320,7 @@ int i915_timeline_pin(struct i915_timeline *tl)
 	if (err)
 		goto unpin;
 
+	__i915_timeline_pin_hwsp(tl, tl->hwsp_ggtt->private);
 	timeline_active(tl);
 
 	return 0;
@@ -280,6 +330,53 @@ int i915_timeline_pin(struct i915_timeline *tl)
 	return err;
 }
 
+static u32 timeline_advance(struct i915_timeline *tl)
+{
+	GEM_BUG_ON(!tl->pin_count);
+	GEM_BUG_ON(tl->seqno & tl->has_initial_breadcrumb);
+
+	tl->seqno += tl->has_initial_breadcrumb;
+	return ++tl->seqno;
+}
+
+static void timeline_rollback(struct i915_timeline *tl)
+{
+	tl->seqno--;
+	tl->seqno -= tl->has_initial_breadcrumb;
+}
+
+static noinline int
+__i915_timeline_get_seqno(struct i915_timeline *tl, u32 *seqno)
+{
+	struct i915_vma *vma;
+	int offset;
+
+	vma = hwsp_alloc(tl, &offset);
+	if (IS_ERR(vma)) {
+		timeline_rollback(tl);
+		return PTR_ERR(vma);
+	}
+	hwsp_free(tl);
+
+	tl->hwsp_ggtt = i915_vma_get(vma);
+	tl->hwsp_offset = offset;
+	__i915_timeline_pin_hwsp(tl, vma->private);
+
+	*seqno = timeline_advance(tl);
+	return 0;
+}
+
+int i915_timeline_get_seqno(struct i915_timeline *tl, u32 *seqno)
+{
+	*seqno = timeline_advance(tl);
+
+	/* Replace the HWSP on wraparound for HW semaphores */
+	if (unlikely(!*seqno && !i915_timeline_is_global(tl)))
+		return __i915_timeline_get_seqno(tl, seqno);
+
+	return 0;
+}
+
 void i915_timeline_unpin(struct i915_timeline *tl)
 {
 	GEM_BUG_ON(!tl->pin_count);
@@ -311,8 +408,12 @@ void i915_timelines_fini(struct drm_i915_private *i915)
 {
 	struct i915_gt_timelines *gt = &i915->gt.timelines;
 
+	i915_timelines_park(i915);
+
 	GEM_BUG_ON(!list_empty(&gt->active_list));
+	GEM_BUG_ON(!list_empty(&gt->hwsp_pin_list));
 	GEM_BUG_ON(!list_empty(&gt->hwsp_free_list));
+	GEM_BUG_ON(!list_empty(&gt->hwsp_dead_list));
 
 	mutex_destroy(&gt->mutex);
 }
diff --git a/drivers/gpu/drm/i915/i915_timeline.h b/drivers/gpu/drm/i915/i915_timeline.h
index 8caeb66d1cd5..c01b81a85a15 100644
--- a/drivers/gpu/drm/i915/i915_timeline.h
+++ b/drivers/gpu/drm/i915/i915_timeline.h
@@ -149,6 +149,7 @@ static inline bool i915_timeline_sync_is_later(struct i915_timeline *tl,
 }
 
 int i915_timeline_pin(struct i915_timeline *tl);
+int i915_timeline_get_seqno(struct i915_timeline *tl, u32 *seqno);
 void i915_timeline_unpin(struct i915_timeline *tl);
 
 void i915_timelines_init(struct drm_i915_private *i915);
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH 31/34] drm/i915/execlists: Refactor out can_merge_rq()
  2019-01-21 22:20 HWSP for HW semaphores Chris Wilson
                   ` (29 preceding siblings ...)
  2019-01-21 22:21 ` [PATCH 30/34] drm/i915: Keep timeline HWSP allocated until the system is idle Chris Wilson
@ 2019-01-21 22:21 ` Chris Wilson
  2019-01-21 22:21 ` [PATCH 32/34] drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+ Chris Wilson
                   ` (9 subsequent siblings)
  40 siblings, 0 replies; 89+ messages in thread
From: Chris Wilson @ 2019-01-21 22:21 UTC (permalink / raw)
  To: intel-gfx

In the next patch, we add another user that wants to check whether
requests can be merged into a single HW execution, and in the future we
want to add more conditions under which requests from the same context
cannot be merged. In preparation, extract out can_merge_rq().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_lrc.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 0a2d53f19625..3d8fffa1b6dc 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -511,6 +511,17 @@ static bool can_merge_ctx(const struct intel_context *prev,
 	return true;
 }
 
+static bool can_merge_rq(const struct i915_request *prev,
+			 const struct i915_request *next)
+{
+	GEM_BUG_ON(need_preempt(prev->engine, prev, rq_prio(next)));
+
+	if (!can_merge_ctx(prev->hw_context, next->hw_context))
+		return false;
+
+	return true;
+}
+
 static void port_assign(struct execlist_port *port, struct i915_request *rq)
 {
 	GEM_BUG_ON(rq == port_request(port));
@@ -662,9 +673,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 		int i;
 
 		priolist_for_each_request_consume(rq, rn, p, i) {
-			GEM_BUG_ON(last &&
-				   need_preempt(engine, last, rq_prio(rq)));
-
 			/*
 			 * Can we combine this request with the current port?
 			 * It has to be the same context/ringbuffer and not
@@ -676,8 +684,10 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 			 * second request, and so we never need to tell the
 			 * hardware about the first.
 			 */
-			if (last &&
-			    !can_merge_ctx(rq->hw_context, last->hw_context)) {
+			if (last && !can_merge_rq(last, rq)) {
+				if (last->hw_context == rq->hw_context)
+					goto done;
+
 				/*
 				 * If we are on the second port and cannot
 				 * combine this request with the last, then we
@@ -697,7 +707,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 				    ctx_single_port_submission(rq->hw_context))
 					goto done;
 
-				GEM_BUG_ON(last->hw_context == rq->hw_context);
 
 				if (submit)
 					port_assign(port, last);
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH 32/34] drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+
  2019-01-21 22:20 HWSP for HW semaphores Chris Wilson
                   ` (30 preceding siblings ...)
  2019-01-21 22:21 ` [PATCH 31/34] drm/i915/execlists: Refactor out can_merge_rq() Chris Wilson
@ 2019-01-21 22:21 ` Chris Wilson
  2019-01-21 22:21 ` [PATCH 33/34] drm/i915: Prioritise non-busywait semaphore workloads Chris Wilson
                   ` (8 subsequent siblings)
  40 siblings, 0 replies; 89+ messages in thread
From: Chris Wilson @ 2019-01-21 22:21 UTC (permalink / raw)
  To: intel-gfx

Having introduced per-context seqno, we now have a means to identify
progress across the system without fear of rollback as befell the
global_seqno. That is, we can program a MI_SEMAPHORE_WAIT operation in
advance of submission, safe in the knowledge that our target seqno and
address are stable.

However, since we are telling the GPU to busy-spin on the target address
until it matches the signaling seqno, we only want to do so when we are
sure that busy-spin will be completed quickly. To achieve this we only
submit the request to HW once the signaler is itself executing (modulo
preemption causing us to wait longer), and we only do so for default and
above priority requests (so that idle priority tasks never themselves
hog the GPU waiting for others).

But what about AB-BA deadlocks? If you remove B, there can be no
deadlock... The issue is that with a deep ELSP queue, we can queue up an
AB-BA pair on different engines, thus forming a classic mutual exclusion
deadlock. We side-step that issue by restricting the queue depth so as
to avoid having multiple semaphores in flight, and so we only ever take
one set of locks at a time.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
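A tiny userspace sketch of the two compare modes used above (purely
illustrative -- the helper names and values are made up, and this is of
course not how the command streamer evaluates the poll):

#include <stdint.h>
#include <stdio.h>

/* MI_SEMAPHORE_SAD_GTE_SDD: poll until *addr >= data */
static int sad_gte_sdd(uint32_t mem, uint32_t data) { return mem >= data; }

/* MI_SEMAPHORE_SAD_NEQ_SDD: poll until *addr != data */
static int sad_neq_sdd(uint32_t mem, uint32_t data) { return mem != data; }

int main(void)
{
	/*
	 * Signaler already started: wait for anything but the start value
	 * (seqno - 1); any later breadcrumb write releases the wait.
	 */
	printf("NEQ after completion: %d\n", sad_neq_sdd(5, 5 - 1));

	/*
	 * Signaler not yet started: wait for seqno >= target. Small
	 * post-wrap values would never satisfy a pre-wrap target, which
	 * is why the timeline HWSP is replaced on wraparound.
	 */
	printf("GTE across a wrap:    %d\n", sad_gte_sdd(2, 0xfffffff0u));

	return 0;
}
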
 drivers/gpu/drm/i915/i915_request.c       | 139 +++++++++++++++++++++-
 drivers/gpu/drm/i915/i915_request.h       |   1 +
 drivers/gpu/drm/i915/i915_scheduler.c     |   1 +
 drivers/gpu/drm/i915/i915_scheduler.h     |   1 +
 drivers/gpu/drm/i915/i915_sw_fence.c      |   4 +-
 drivers/gpu/drm/i915/i915_sw_fence.h      |   3 +
 drivers/gpu/drm/i915/intel_gpu_commands.h |   5 +
 drivers/gpu/drm/i915/intel_lrc.c          |  13 +-
 8 files changed, 163 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 099c6f994b99..b7554a399c39 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -22,8 +22,9 @@
  *
  */
 
-#include <linux/prefetch.h>
 #include <linux/dma-fence-array.h>
+#include <linux/irq_work.h>
+#include <linux/prefetch.h>
 #include <linux/sched.h>
 #include <linux/sched/clock.h>
 #include <linux/sched/signal.h>
@@ -331,6 +332,66 @@ void i915_request_retire_upto(struct i915_request *rq)
 	} while (tmp != rq);
 }
 
+struct execute_cb {
+	struct list_head link;
+	struct irq_work work;
+	struct i915_sw_fence *fence;
+};
+
+static void irq_execute_cb(struct irq_work *wrk)
+{
+	struct execute_cb *cb = container_of(wrk, typeof(*cb), work);
+
+	i915_sw_fence_complete(cb->fence);
+	kfree(cb);
+}
+
+static void __notify_execute_cb(struct i915_request *rq)
+{
+	struct execute_cb *cb;
+
+	lockdep_assert_held(&rq->lock);
+
+	if (list_empty(&rq->execute_cb))
+		return;
+
+	list_for_each_entry(cb, &rq->execute_cb, link)
+		irq_work_queue(&cb->work);
+
+	INIT_LIST_HEAD(&rq->execute_cb);
+}
+
+static int
+i915_request_await_execution(struct i915_request *rq,
+			     struct i915_request *signal,
+			     gfp_t gfp)
+{
+	struct execute_cb *cb;
+	unsigned long flags;
+
+	if (test_bit(I915_FENCE_FLAG_ACTIVE, &signal->fence.flags))
+		return 0;
+
+	cb = kmalloc(sizeof(*cb), gfp);
+	if (!cb)
+		return -ENOMEM;
+
+	cb->fence = &rq->submit;
+	i915_sw_fence_await(cb->fence);
+	init_irq_work(&cb->work, irq_execute_cb);
+
+	spin_lock_irqsave(&signal->lock, flags);
+	if (test_bit(I915_FENCE_FLAG_ACTIVE, &signal->fence.flags)) {
+		i915_sw_fence_complete(cb->fence);
+		kfree(cb);
+	} else {
+		list_add_tail(&cb->link, &signal->execute_cb);
+	}
+	spin_unlock_irqrestore(&signal->lock, flags);
+
+	return 0;
+}
+
 static void move_to_timeline(struct i915_request *request,
 			     struct i915_timeline *timeline)
 {
@@ -377,6 +438,7 @@ void __i915_request_submit(struct i915_request *request)
 	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &request->fence.flags) &&
 	    !intel_engine_enable_signaling(request))
 		intel_engine_queue_breadcrumbs(engine);
+	__notify_execute_cb(request);
 	spin_unlock(&request->lock);
 
 	engine->emit_fini_breadcrumb(request,
@@ -621,6 +683,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 		       tl->fence_context, seqno);
 
 	INIT_LIST_HEAD(&rq->active_list);
+	INIT_LIST_HEAD(&rq->execute_cb);
 	rq->i915 = i915;
 	rq->engine = engine;
 	rq->gem_context = ctx;
@@ -693,6 +756,77 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	return ERR_PTR(ret);
 }
 
+static int
+emit_semaphore_wait(struct i915_request *to,
+		    struct i915_request *from,
+		    gfp_t gfp)
+{
+	u32 *cs;
+
+	GEM_BUG_ON(i915_timeline_is_global(from->timeline));
+	GEM_BUG_ON(!from->timeline->has_initial_breadcrumb);
+
+	/*
+	 * If we know our signaling request has started, we know that it
+	 * must, at least, have passed its initial breadcrumb and that its
+	 * seqno can only increase, therefore any change in its breadcrumb
+	 * must indicate completion. By using a "not equal to start" compare
+	 * we avoid the murky issue of how to handle seqno wraparound in an
+	 * async environment (short answer, we must stop the world whenever
+	 * any context wraps!) as the likelihood of missing one request then
+	 * seeing the same start value for a new request is 1 in 2^31, and
+	 * even then we know that the new request has started and is in
+	 * progress, so we are sure it will complete soon enough (not to
+	 * worry about).
+	 */
+	if (i915_request_started(from)) {
+		cs = intel_ring_begin(to, 4);
+		if (IS_ERR(cs))
+			return PTR_ERR(cs);
+
+		*cs++ = MI_SEMAPHORE_WAIT |
+			MI_SEMAPHORE_GLOBAL_GTT |
+			MI_SEMAPHORE_POLL |
+			MI_SEMAPHORE_SAD_NEQ_SDD;
+		*cs++ = from->fence.seqno - 1;
+		*cs++ = i915_timeline_seqno_address(from->timeline);
+		*cs++ = 0;
+
+		intel_ring_advance(to, cs);
+	} else {
+		int err;
+
+		err = i915_request_await_execution(to, from, gfp);
+		if (err)
+			return err;
+
+		cs = intel_ring_begin(to, 4);
+		if (IS_ERR(cs))
+			return PTR_ERR(cs);
+
+		/*
+		 * Using greater-than-or-equal here means we have to worry
+		 * about seqno wraparound. To side step that issue, we swap
+		 * the timeline HWSP upon wrapping, so that everyone listening
+		 * for the old (pre-wrap) values do not see the much smaller
+		 * (post-wrap) values than they were expecting (and so wait
+		 * forever).
+		 */
+		*cs++ = MI_SEMAPHORE_WAIT |
+			MI_SEMAPHORE_GLOBAL_GTT |
+			MI_SEMAPHORE_POLL |
+			MI_SEMAPHORE_SAD_GTE_SDD;
+		*cs++ = from->fence.seqno;
+		*cs++ = i915_timeline_seqno_address(from->timeline);
+		*cs++ = 0;
+
+		intel_ring_advance(to, cs);
+	}
+
+	to->sched.semaphore = true;
+	return 0;
+}
+
 static int
 i915_request_await_request(struct i915_request *to, struct i915_request *from)
 {
@@ -716,6 +850,9 @@ i915_request_await_request(struct i915_request *to, struct i915_request *from)
 		ret = i915_sw_fence_await_sw_fence_gfp(&to->submit,
 						       &from->submit,
 						       I915_FENCE_GFP);
+	} else if (HAS_EXECLISTS(to->i915) &&
+		   to->gem_context->sched.priority >= I915_PRIORITY_NORMAL) {
+		ret = emit_semaphore_wait(to, from, I915_FENCE_GFP);
 	} else {
 		ret = i915_sw_fence_await_dma_fence(&to->submit,
 						    &from->fence, 0,
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index 8f78ac97b8d6..d76be15ba0dd 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -109,6 +109,7 @@ struct i915_request {
 	 */
 	struct i915_sw_fence submit;
 	wait_queue_entry_t submitq;
+	struct list_head execute_cb;
 
 	/*
 	 * A list of everyone we wait upon, and everyone who waits upon us.
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index fb5d953430e5..c41a36e48d12 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -29,6 +29,7 @@ void i915_sched_node_init(struct i915_sched_node *node)
 	INIT_LIST_HEAD(&node->waiters_list);
 	INIT_LIST_HEAD(&node->link);
 	node->attr.priority = I915_PRIORITY_INVALID;
+	node->semaphore = false;
 }
 
 static struct i915_dependency *
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index dbe9cb7ecd82..d764cf10536f 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -72,6 +72,7 @@ struct i915_sched_node {
 	struct list_head waiters_list; /* those after us, they depend upon us */
 	struct list_head link;
 	struct i915_sched_attr attr;
+	bool semaphore;
 };
 
 struct i915_dependency {
diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c b/drivers/gpu/drm/i915/i915_sw_fence.c
index 7c58b049ecb5..8d1400d378d7 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence.c
+++ b/drivers/gpu/drm/i915/i915_sw_fence.c
@@ -192,7 +192,7 @@ static void __i915_sw_fence_complete(struct i915_sw_fence *fence,
 	__i915_sw_fence_notify(fence, FENCE_FREE);
 }
 
-static void i915_sw_fence_complete(struct i915_sw_fence *fence)
+void i915_sw_fence_complete(struct i915_sw_fence *fence)
 {
 	debug_fence_assert(fence);
 
@@ -202,7 +202,7 @@ static void i915_sw_fence_complete(struct i915_sw_fence *fence)
 	__i915_sw_fence_complete(fence, NULL);
 }
 
-static void i915_sw_fence_await(struct i915_sw_fence *fence)
+void i915_sw_fence_await(struct i915_sw_fence *fence)
 {
 	debug_fence_assert(fence);
 	WARN_ON(atomic_inc_return(&fence->pending) <= 1);
diff --git a/drivers/gpu/drm/i915/i915_sw_fence.h b/drivers/gpu/drm/i915/i915_sw_fence.h
index 0e055ea0179f..6dec9e1d1102 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence.h
+++ b/drivers/gpu/drm/i915/i915_sw_fence.h
@@ -79,6 +79,9 @@ int i915_sw_fence_await_reservation(struct i915_sw_fence *fence,
 				    unsigned long timeout,
 				    gfp_t gfp);
 
+void i915_sw_fence_await(struct i915_sw_fence *fence);
+void i915_sw_fence_complete(struct i915_sw_fence *fence);
+
 static inline bool i915_sw_fence_signaled(const struct i915_sw_fence *fence)
 {
 	return atomic_read(&fence->pending) <= 0;
diff --git a/drivers/gpu/drm/i915/intel_gpu_commands.h b/drivers/gpu/drm/i915/intel_gpu_commands.h
index 105e2a9e874a..bafad94f751b 100644
--- a/drivers/gpu/drm/i915/intel_gpu_commands.h
+++ b/drivers/gpu/drm/i915/intel_gpu_commands.h
@@ -106,7 +106,12 @@
 #define   MI_SEMAPHORE_TARGET(engine)	((engine)<<15)
 #define MI_SEMAPHORE_WAIT	MI_INSTR(0x1c, 2) /* GEN8+ */
 #define   MI_SEMAPHORE_POLL		(1<<15)
+#define   MI_SEMAPHORE_SAD_GT_SDD	(0<<12)
 #define   MI_SEMAPHORE_SAD_GTE_SDD	(1<<12)
+#define   MI_SEMAPHORE_SAD_LT_SDD	(2<<12)
+#define   MI_SEMAPHORE_SAD_LTE_SDD	(3<<12)
+#define   MI_SEMAPHORE_SAD_EQ_SDD	(4<<12)
+#define   MI_SEMAPHORE_SAD_NEQ_SDD	(5<<12)
 #define MI_STORE_DWORD_IMM	MI_INSTR(0x20, 1)
 #define MI_STORE_DWORD_IMM_GEN4	MI_INSTR(0x20, 2)
 #define   MI_MEM_VIRTUAL	(1 << 22) /* 945,g33,965 */
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 3d8fffa1b6dc..b59cfec1d5d4 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -332,7 +332,7 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine)
 	 * stream, so give it the equivalent small priority bump to prevent
 	 * it being gazumped a second time by another peer.
 	 */
-	if (!(prio & I915_PRIORITY_NEWCLIENT)) {
+	if (!(prio & I915_PRIORITY_NEWCLIENT) && i915_request_started(active)) {
 		prio |= I915_PRIORITY_NEWCLIENT;
 		active->sched.attr.priority = prio;
 		list_move_tail(&active->sched.link,
@@ -516,6 +516,17 @@ static bool can_merge_rq(const struct i915_request *prev,
 {
 	GEM_BUG_ON(need_preempt(prev->engine, prev, rq_prio(next)));
 
+	/*
+	 * To avoid AB-BA deadlocks, we simply restrict ourselves to only
+	 * submitting one semaphore (think HW spinlock) to HW at a time. This
+	 * prevents the execution callback on a later semaphore from being
+	 * queued on another engine, so no cycle can be formed. Preemption
+	 * rules should mean that if this semaphore is preempted, its
+	 * dependency chain is preserved and suitably promoted via PI.
+	 */
+	if (prev->sched.semaphore && !i915_request_started(prev))
+		return false;
+
 	if (!can_merge_ctx(prev->hw_context, next->hw_context))
 		return false;
 
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH 33/34] drm/i915: Prioritise non-busywait semaphore workloads
  2019-01-21 22:20 HWSP for HW semaphores Chris Wilson
                   ` (31 preceding siblings ...)
  2019-01-21 22:21 ` [PATCH 32/34] drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+ Chris Wilson
@ 2019-01-21 22:21 ` Chris Wilson
  2019-01-23  0:33   ` Chris Wilson
  2019-01-21 22:21 ` [PATCH 34/34] drm/i915: Replace global_seqno with a hangcheck heartbeat seqno Chris Wilson
                   ` (7 subsequent siblings)
  40 siblings, 1 reply; 89+ messages in thread
From: Chris Wilson @ 2019-01-21 22:21 UTC (permalink / raw)
  To: intel-gfx

We don't want to busywait on the GPU if we have other work to do. If we
give non-busywaiting workloads higher (initial) priority than workloads
that require a busywait, we will prioritise work that is ready to run
immediately.
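
As a rough illustration of the ordering effect, here is a stand-alone
user-space sketch (not the driver code; the names are simplified
stand-ins for the I915_PRIORITY_* definitions touched below):

#include <stdbool.h>
#include <stdio.h>

/* simplified stand-ins for the i915_scheduler.h definitions */
#define USER_PRIORITY_SHIFT	3
#define PRIORITY_NOSEMAPHORE	(1 << 2)

static int effective_priority(int user_prio, bool needs_busywait)
{
	int prio = user_prio << USER_PRIORITY_SHIFT;

	/* requests that never busywait get the bonus bit */
	if (!needs_busywait)
		prio |= PRIORITY_NOSEMAPHORE;

	return prio;
}

int main(void)
{
	/* same user priority (0): 4 vs 0, so the non-busywaiting request
	 * is picked first by a highest-priority-first queue */
	printf("%d vs %d\n",
	       effective_priority(0, false), effective_priority(0, true));
	return 0;
}

Keeping the bonus bits below I915_USER_PRIORITY_SHIFT means they only
ever break ties within a user priority level, never override it.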

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_request.c   | 3 +++
 drivers/gpu/drm/i915/i915_scheduler.h | 7 ++++---
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index b7554a399c39..815386581f1a 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1096,6 +1096,9 @@ void i915_request_add(struct i915_request *request)
 	if (engine->schedule) {
 		struct i915_sched_attr attr = request->gem_context->sched;
 
+		if (!request->sched.semaphore)
+			attr.priority |= I915_PRIORITY_NOSEMAPHORE;
+
 		/*
 		 * Boost priorities to new clients (new request flows).
 		 *
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index d764cf10536f..7f194a8db785 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -24,14 +24,15 @@ enum {
 	I915_PRIORITY_INVALID = INT_MIN
 };
 
-#define I915_USER_PRIORITY_SHIFT 2
+#define I915_USER_PRIORITY_SHIFT 3
 #define I915_USER_PRIORITY(x) ((x) << I915_USER_PRIORITY_SHIFT)
 
 #define I915_PRIORITY_COUNT BIT(I915_USER_PRIORITY_SHIFT)
 #define I915_PRIORITY_MASK (I915_PRIORITY_COUNT - 1)
 
-#define I915_PRIORITY_WAIT	((u8)BIT(0))
-#define I915_PRIORITY_NEWCLIENT	((u8)BIT(1))
+#define I915_PRIORITY_WAIT		((u8)BIT(0))
+#define I915_PRIORITY_NEWCLIENT		((u8)BIT(1))
+#define I915_PRIORITY_NOSEMAPHORE	((u8)BIT(2))
 
 struct i915_sched_attr {
 	/**
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH 34/34] drm/i915: Replace global_seqno with a hangcheck heartbeat seqno
  2019-01-21 22:20 HWSP for HW semaphores Chris Wilson
                   ` (32 preceding siblings ...)
  2019-01-21 22:21 ` [PATCH 33/34] drm/i915: Prioritise non-busywait semaphore workloads Chris Wilson
@ 2019-01-21 22:21 ` Chris Wilson
  2019-01-22  0:09 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/34] drm/i915/execlists: Mark up priority boost on preemption Patchwork
                   ` (6 subsequent siblings)
  40 siblings, 0 replies; 89+ messages in thread
From: Chris Wilson @ 2019-01-21 22:21 UTC (permalink / raw)
  To: intel-gfx

To determine whether an engine is 'stuck', we simply check whether or
not it is still on the same seqno for several seconds. To keep this simple
mechanism intact over the loss of a global seqno, we can simply add a
new global heartbeat seqno instead. As we cannot know the sequence in
which requests will then be completed, we use a primitive random number
generator instead (with a cycle long enough to not matter over an
interval of a few thousand requests between hangcheck samples).
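
The mechanism, as a stand-alone sketch (hypothetical names; the LCG
below merely has the same form as next_pseudo_random32()): every
breadcrumb stores a fresh value into a dedicated HWS dword, and
hangcheck only asks whether that value has changed since its previous
sample.

#include <stdbool.h>
#include <stdint.h>

/* stand-in with the same form as next_pseudo_random32(): a 32-bit LCG */
static uint32_t next_heartbeat(uint32_t prev)
{
	return prev * 1664525u + 1013904223u;
}

struct engine {
	uint32_t hwsp_heartbeat;	/* dword the GPU writes per breadcrumb */
	uint32_t last_sample;		/* value stored by the previous hangcheck */
};

/* in the driver the store is emitted into the ring and only lands when
 * the request completes; here it is collapsed into a direct write */
static void emit_breadcrumb(struct engine *e)
{
	e->hwsp_heartbeat = next_heartbeat(e->hwsp_heartbeat);
}

/* sampled every few seconds: any change at all proves forward progress */
static bool engine_made_progress(struct engine *e)
{
	uint32_t now = e->hwsp_heartbeat;
	bool progress = now != e->last_sample;

	e->last_sample = now;
	return progress;
}

int main(void)
{
	struct engine e = { 0, 0 };

	emit_breadcrumb(&e);
	return engine_made_progress(&e) ? 0 : 1;
}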

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c     |  7 ++++---
 drivers/gpu/drm/i915/intel_engine_cs.c  |  5 +++--
 drivers/gpu/drm/i915/intel_hangcheck.c  |  6 +++---
 drivers/gpu/drm/i915/intel_lrc.c        | 19 +++++++++++++++--
 drivers/gpu/drm/i915/intel_ringbuffer.c | 28 +++++++++++++++++++------
 drivers/gpu/drm/i915/intel_ringbuffer.h | 19 ++++++++++++++++-
 6 files changed, 67 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index c2aaf010c3d1..16a9384de478 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1297,7 +1297,7 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
 	with_intel_runtime_pm(dev_priv, wakeref) {
 		for_each_engine(engine, dev_priv, id) {
 			acthd[id] = intel_engine_get_active_head(engine);
-			seqno[id] = intel_engine_get_seqno(engine);
+			seqno[id] = intel_engine_get_hangcheck_seqno(engine);
 		}
 
 		intel_engine_get_instdone(dev_priv->engine[RCS], &instdone);
@@ -1317,8 +1317,9 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
 	for_each_engine(engine, dev_priv, id) {
 		seq_printf(m, "%s:\n", engine->name);
 		seq_printf(m, "\tseqno = %x [current %x, last %x], %dms ago\n",
-			   engine->hangcheck.seqno, seqno[id],
-			   intel_engine_last_submit(engine),
+			   engine->hangcheck.last_seqno,
+			   seqno[id],
+			   engine->hangcheck.next_seqno,
 			   jiffies_to_msecs(jiffies -
 					    engine->hangcheck.action_timestamp));
 
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 1d9157bf96ae..f631ad23a702 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1439,10 +1439,11 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 	if (i915_terminally_wedged(&engine->i915->gpu_error))
 		drm_printf(m, "*** WEDGED ***\n");
 
-	drm_printf(m, "\tcurrent seqno %x, last %x, hangcheck %x [%d ms]\n",
+	drm_printf(m, "\tcurrent seqno %x, last %x, hangcheck %x/%x [%d ms]\n",
 		   intel_engine_get_seqno(engine),
 		   intel_engine_last_submit(engine),
-		   engine->hangcheck.seqno,
+		   engine->hangcheck.last_seqno,
+		   engine->hangcheck.next_seqno,
 		   jiffies_to_msecs(jiffies - engine->hangcheck.action_timestamp));
 	drm_printf(m, "\tReset count: %d (global %d)\n",
 		   i915_reset_engine_count(error, engine),
diff --git a/drivers/gpu/drm/i915/intel_hangcheck.c b/drivers/gpu/drm/i915/intel_hangcheck.c
index a219c796e56d..e04b2560369e 100644
--- a/drivers/gpu/drm/i915/intel_hangcheck.c
+++ b/drivers/gpu/drm/i915/intel_hangcheck.c
@@ -133,21 +133,21 @@ static void hangcheck_load_sample(struct intel_engine_cs *engine,
 				  struct hangcheck *hc)
 {
 	hc->acthd = intel_engine_get_active_head(engine);
-	hc->seqno = intel_engine_get_seqno(engine);
+	hc->seqno = intel_engine_get_hangcheck_seqno(engine);
 }
 
 static void hangcheck_store_sample(struct intel_engine_cs *engine,
 				   const struct hangcheck *hc)
 {
 	engine->hangcheck.acthd = hc->acthd;
-	engine->hangcheck.seqno = hc->seqno;
+	engine->hangcheck.last_seqno = hc->seqno;
 }
 
 static enum intel_engine_hangcheck_action
 hangcheck_get_action(struct intel_engine_cs *engine,
 		     const struct hangcheck *hc)
 {
-	if (engine->hangcheck.seqno != hc->seqno)
+	if (engine->hangcheck.last_seqno != hc->seqno)
 		return ENGINE_ACTIVE_SEQNO;
 
 	if (intel_engine_is_idle(engine))
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index b59cfec1d5d4..2864a9f542aa 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -178,6 +178,12 @@ static inline u32 intel_hws_seqno_address(struct intel_engine_cs *engine)
 		I915_GEM_HWS_INDEX_ADDR);
 }
 
+static inline u32 intel_hws_hangcheck_address(struct intel_engine_cs *engine)
+{
+	return (i915_ggtt_offset(engine->status_page.vma) +
+		I915_GEM_HWS_HANGCHECK_ADDR);
+}
+
 static inline struct i915_priolist *to_priolist(struct rb_node *rb)
 {
 	return rb_entry(rb, struct i915_priolist, node);
@@ -2106,6 +2112,10 @@ static void gen8_emit_fini_breadcrumb(struct i915_request *request, u32 *cs)
 				  request->fence.seqno,
 				  i915_timeline_seqno_address(request->timeline));
 
+	cs = gen8_emit_ggtt_write(cs,
+				  intel_engine_next_hangcheck_seqno(request->engine),
+				  intel_hws_hangcheck_address(request->engine));
+
 	cs = gen8_emit_ggtt_write(cs,
 				  request->global_seqno,
 				  intel_hws_seqno_address(request->engine));
@@ -2118,7 +2128,7 @@ static void gen8_emit_fini_breadcrumb(struct i915_request *request, u32 *cs)
 
 	gen8_emit_wa_tail(request, cs);
 }
-static const int gen8_emit_fini_breadcrumb_sz = 10 + WA_TAIL_DWORDS;
+static const int gen8_emit_fini_breadcrumb_sz = 14 + WA_TAIL_DWORDS;
 
 static void gen8_emit_fini_breadcrumb_rcs(struct i915_request *request, u32 *cs)
 {
@@ -2131,6 +2141,11 @@ static void gen8_emit_fini_breadcrumb_rcs(struct i915_request *request, u32 *cs)
 				      PIPE_CONTROL_FLUSH_ENABLE |
 				      PIPE_CONTROL_CS_STALL);
 
+	cs = gen8_emit_ggtt_write_rcs(cs,
+				      intel_engine_next_hangcheck_seqno(request->engine),
+				      intel_hws_hangcheck_address(request->engine),
+				      PIPE_CONTROL_CS_STALL);
+
 	cs = gen8_emit_ggtt_write_rcs(cs,
 				      request->global_seqno,
 				      intel_hws_seqno_address(request->engine),
@@ -2144,7 +2159,7 @@ static void gen8_emit_fini_breadcrumb_rcs(struct i915_request *request, u32 *cs)
 
 	gen8_emit_wa_tail(request, cs);
 }
-static const int gen8_emit_fini_breadcrumb_rcs_sz = 14 + WA_TAIL_DWORDS;
+static const int gen8_emit_fini_breadcrumb_rcs_sz = 20 + WA_TAIL_DWORDS;
 
 static int gen8_init_rcs_context(struct i915_request *rq)
 {
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index bd44ea41d7ca..13f2836bb7b4 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -456,17 +456,20 @@ static void gen6_xcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
 	*cs++ = I915_GEM_HWS_SEQNO_ADDR | MI_FLUSH_DW_USE_GTT;
 	*cs++ = rq->fence.seqno;
 
+	*cs++ = MI_FLUSH_DW | MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_DW_STORE_INDEX;
+	*cs++ = I915_GEM_HWS_HANGCHECK_ADDR | MI_FLUSH_DW_USE_GTT;
+	*cs++ = intel_engine_next_hangcheck_seqno(rq->engine);
+
 	*cs++ = MI_FLUSH_DW | MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_DW_STORE_INDEX;
 	*cs++ = I915_GEM_HWS_INDEX_ADDR | MI_FLUSH_DW_USE_GTT;
 	*cs++ = rq->global_seqno;
 
 	*cs++ = MI_USER_INTERRUPT;
-	*cs++ = MI_NOOP;
 
 	rq->tail = intel_ring_offset(rq, cs);
 	assert_ring_tail_valid(rq->ring, rq->tail);
 }
-static const int gen6_xcs_emit_breadcrumb_sz = 8;
+static const int gen6_xcs_emit_breadcrumb_sz = 10;
 
 #define GEN7_XCS_WA 32
 static void gen7_xcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
@@ -477,6 +480,10 @@ static void gen7_xcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
 	*cs++ = I915_GEM_HWS_SEQNO_ADDR | MI_FLUSH_DW_USE_GTT;
 	*cs++ = rq->fence.seqno;
 
+	*cs++ = MI_FLUSH_DW | MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_DW_STORE_INDEX;
+	*cs++ = I915_GEM_HWS_HANGCHECK_ADDR | MI_FLUSH_DW_USE_GTT;
+	*cs++ = intel_engine_next_hangcheck_seqno(rq->engine);
+
 	*cs++ = MI_FLUSH_DW | MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_DW_STORE_INDEX;
 	*cs++ = I915_GEM_HWS_INDEX_ADDR | MI_FLUSH_DW_USE_GTT;
 	*cs++ = rq->global_seqno;
@@ -492,11 +499,12 @@ static void gen7_xcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
 	*cs++ = 0;
 
 	*cs++ = MI_USER_INTERRUPT;
+	*cs++ = MI_NOOP;
 
 	rq->tail = intel_ring_offset(rq, cs);
 	assert_ring_tail_valid(rq->ring, rq->tail);
 }
-static const int gen7_xcs_emit_breadcrumb_sz = 10 + GEN7_XCS_WA * 3;
+static const int gen7_xcs_emit_breadcrumb_sz = 14 + GEN7_XCS_WA * 3;
 #undef GEN7_XCS_WA
 
 static void set_hwstam(struct intel_engine_cs *engine, u32 mask)
@@ -933,16 +941,21 @@ static void i9xx_emit_breadcrumb(struct i915_request *rq, u32 *cs)
 	*cs++ = I915_GEM_HWS_SEQNO_ADDR;
 	*cs++ = rq->fence.seqno;
 
+	*cs++ = MI_STORE_DWORD_INDEX;
+	*cs++ = I915_GEM_HWS_HANGCHECK_ADDR;
+	*cs++ = intel_engine_next_hangcheck_seqno(rq->engine);
+
 	*cs++ = MI_STORE_DWORD_INDEX;
 	*cs++ = I915_GEM_HWS_INDEX_ADDR;
 	*cs++ = rq->global_seqno;
 
 	*cs++ = MI_USER_INTERRUPT;
+	*cs++ = MI_NOOP;
 
 	rq->tail = intel_ring_offset(rq, cs);
 	assert_ring_tail_valid(rq->ring, rq->tail);
 }
-static const int i9xx_emit_breadcrumb_sz = 8;
+static const int i9xx_emit_breadcrumb_sz = 12;
 
 #define GEN5_WA_STORES 8 /* must be at least 1! */
 static void gen5_emit_breadcrumb(struct i915_request *rq, u32 *cs)
@@ -957,6 +970,10 @@ static void gen5_emit_breadcrumb(struct i915_request *rq, u32 *cs)
 	*cs++ = I915_GEM_HWS_SEQNO_ADDR;
 	*cs++ = rq->fence.seqno;
 
+	*cs++ = MI_STORE_DWORD_INDEX;
+	*cs++ = I915_GEM_HWS_HANGCHECK_ADDR;
+	*cs++ = intel_engine_next_hangcheck_seqno(rq->engine);
+
 	BUILD_BUG_ON(GEN5_WA_STORES < 1);
 	for (i = 0; i < GEN5_WA_STORES; i++) {
 		*cs++ = MI_STORE_DWORD_INDEX;
@@ -965,12 +982,11 @@ static void gen5_emit_breadcrumb(struct i915_request *rq, u32 *cs)
 	}
 
 	*cs++ = MI_USER_INTERRUPT;
-	*cs++ = MI_NOOP;
 
 	rq->tail = intel_ring_offset(rq, cs);
 	assert_ring_tail_valid(rq->ring, rq->tail);
 }
-static const int gen5_emit_breadcrumb_sz = GEN5_WA_STORES * 3 + 6;
+static const int gen5_emit_breadcrumb_sz = GEN5_WA_STORES * 3 + 8;
 #undef GEN5_WA_STORES
 
 static void
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 7eec96cf2a0b..e3ac22181295 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -6,6 +6,7 @@
 
 #include <linux/hashtable.h>
 #include <linux/irq_work.h>
+#include <linux/random.h>
 #include <linux/seqlock.h>
 
 #include "i915_gem_batch_pool.h"
@@ -119,7 +120,8 @@ struct intel_instdone {
 
 struct intel_engine_hangcheck {
 	u64 acthd;
-	u32 seqno;
+	u32 last_seqno;
+	u32 next_seqno;
 	unsigned long action_timestamp;
 	struct intel_instdone instdone;
 };
@@ -707,6 +709,8 @@ intel_write_status_page(struct intel_engine_cs *engine, int reg, u32 value)
 #define I915_GEM_HWS_INDEX_ADDR (I915_GEM_HWS_INDEX << MI_STORE_DWORD_INDEX_SHIFT)
 #define I915_GEM_HWS_PREEMPT_INDEX	0x32
 #define I915_GEM_HWS_PREEMPT_ADDR (I915_GEM_HWS_PREEMPT_INDEX << MI_STORE_DWORD_INDEX_SHIFT)
+#define I915_GEM_HWS_HANGCHECK		0x34
+#define I915_GEM_HWS_HANGCHECK_ADDR (I915_GEM_HWS_HANGCHECK << MI_STORE_DWORD_INDEX_SHIFT)
 #define I915_GEM_HWS_SEQNO		0x40
 #define I915_GEM_HWS_SEQNO_ADDR (I915_GEM_HWS_SEQNO << MI_STORE_DWORD_INDEX_SHIFT)
 #define I915_GEM_HWS_SCRATCH_INDEX	0x80
@@ -1058,4 +1062,17 @@ static inline bool inject_preempt_hang(struct intel_engine_execlists *execlists)
 
 #endif
 
+static inline u32 intel_engine_next_hangcheck_seqno(struct intel_engine_cs *engine)
+{
+	engine->hangcheck.next_seqno =
+		next_pseudo_random32(engine->hangcheck.next_seqno);
+
+	return engine->hangcheck.next_seqno;
+}
+
+static inline u32 intel_engine_get_hangcheck_seqno(struct intel_engine_cs *engine)
+{
+	return intel_read_status_page(engine, I915_GEM_HWS_HANGCHECK);
+}
+
 #endif /* _INTEL_RINGBUFFER_H_ */
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* Re: [PATCH 30/34] drm/i915: Keep timeline HWSP allocated until the system is idle
  2019-01-21 22:21 ` [PATCH 30/34] drm/i915: Keep timeline HWSP allocated until the system is idle Chris Wilson
@ 2019-01-21 22:37   ` Chris Wilson
  2019-01-21 22:48     ` Chris Wilson
  0 siblings, 1 reply; 89+ messages in thread
From: Chris Wilson @ 2019-01-21 22:37 UTC (permalink / raw)
  To: intel-gfx

Quoting Chris Wilson (2019-01-21 22:21:13)
> In preparation for enabling HW semaphores, we need to keep in flight
> timeline HWSP alive until the entire system is idle, as any other
> timeline active on the GPU may still refer back to the already retired
> timeline. We both have to delay recycling available cachelines and
> unpinning old HWSP until the next idle point (i.e. on parking).
> 
> That we have to keep the HWSP alive for external references on HW raises
> an interesting conundrum. On a busy system, we may never see a global
> idle point, essentially meaning the resource will be leaking until we
> are forced to sleep. What we need is a set of RCU primitives for the GPU!
> This should also help mitigate the resource starvation issues
> promulgating from keeping all logical state pinned until idle (instead
> of as currently handled until the next context switch).

I was resisting adding all the i915_vma_move_to_active() thinking that
it was overkill, but perhaps that is exactly what I mean by
rcu_read_lock(). Hmm. More so that I was trying to avoid having to keep
moving the HWSP from one request to the next (for the write lock), but
that should be, for the normal case, covered by the context pinning
itself, and for the realloc we can add a write lock to the next rq.

How does that help? Good question.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 30/34] drm/i915: Keep timeline HWSP allocated until the system is idle
  2019-01-21 22:37   ` Chris Wilson
@ 2019-01-21 22:48     ` Chris Wilson
  0 siblings, 0 replies; 89+ messages in thread
From: Chris Wilson @ 2019-01-21 22:48 UTC (permalink / raw)
  To: intel-gfx

Quoting Chris Wilson (2019-01-21 22:37:13)
> Quoting Chris Wilson (2019-01-21 22:21:13)
> > In preparation for enabling HW semaphores, we need to keep in flight
> > timeline HWSP alive until the entire system is idle, as any other
> > timeline active on the GPU may still refer back to the already retired
> > timeline. We both have to delay recycling available cachelines and
> > unpinning old HWSP until the next idle point (i.e. on parking).
> > 
> > That we have to keep the HWSP alive for external references on HW raises
> > an interesting conundrum. On a busy system, we may never see a global
> > idle point, essentially meaning the resource will be leaking until we
> > are forced to sleep. What we need is a set of RCU primitives for the GPU!
> > This should also help mitigate the resource starvation issues
> > promulgating from keeping all logical state pinned until idle (instead
> > of as currently handled until the next context switch).
> 
> I was resisting adding all the i915_vma_move_to_active() thinking that
> it was overkill, but perhaps that is exactly what I mean by
> rcu_read_lock(). Hmm. More so that I was trying to avoid having to keep
> moving the HWSP from one request to the next (for the write lock), but
> that should be, for the normal case, covered by the context pinning
> itself, and for the realloc we can add a write lock to the next rq.

Also because that mechanism is guarded by the struct_mutex and I have an
aversion to struct_mutex...
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 89+ messages in thread

* ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/34] drm/i915/execlists: Mark up priority boost on preemption
  2019-01-21 22:20 HWSP for HW semaphores Chris Wilson
                   ` (33 preceding siblings ...)
  2019-01-21 22:21 ` [PATCH 34/34] drm/i915: Replace global_seqno with a hangcheck heartbeat seqno Chris Wilson
@ 2019-01-22  0:09 ` Patchwork
  2019-01-22  0:22 ` ✗ Fi.CI.SPARSE: " Patchwork
                   ` (5 subsequent siblings)
  40 siblings, 0 replies; 89+ messages in thread
From: Patchwork @ 2019-01-22  0:09 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/34] drm/i915/execlists: Mark up priority boost on preemption
URL   : https://patchwork.freedesktop.org/series/55528/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
096d4f120d3e drm/i915/execlists: Mark up priority boost on preemption
85c9694f46a4 drm/i915/execlists: Suppress preempting self
-:18: WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#18: 
References: a2bf92e8cc16 ("drm/i915/execlists: Avoid kicking priority on the current context")

-:18: ERROR:GIT_COMMIT_ID: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit a2bf92e8cc16 ("drm/i915/execlists: Avoid kicking priority on the current context")'
#18: 
References: a2bf92e8cc16 ("drm/i915/execlists: Avoid kicking priority on the current context")

total: 1 errors, 1 warnings, 0 checks, 92 lines checked
58648274db42 drm/i915: Show all active engines on hangcheck
fd236ee5650b drm/i915/selftests: Refactor common live_test framework
-:435: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#435: 
new file mode 100644

-:440: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#440: FILE: drivers/gpu/drm/i915/selftests/igt_live_test.c:1:
+/*

-:531: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#531: FILE: drivers/gpu/drm/i915/selftests/igt_live_test.h:1:
+/*

total: 0 errors, 3 warnings, 0 checks, 496 lines checked
424a90dd6d68 drm/i915/selftests: Track evict objects explicitly
-:12: WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#12: 
References: 71fc448c1aaf ("drm/i915/selftests: Make evict tolerant of foreign objects")

-:12: ERROR:GIT_COMMIT_ID: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit 71fc448c1aaf ("drm/i915/selftests: Make evict tolerant of foreign objects")'
#12: 
References: 71fc448c1aaf ("drm/i915/selftests: Make evict tolerant of foreign objects")

total: 1 errors, 1 warnings, 0 checks, 256 lines checked
525322709fa3 drm/i915/selftests: Create a clean GGTT for vma/gtt selftesting
-:393: WARNING:LONG_LINE: line over 100 characters
#393: FILE: drivers/gpu/drm/i915/selftests/i915_vma.c:265:
+		VALID(0, PIN_GLOBAL | PIN_MAPPABLE | PIN_OFFSET_FIXED | (ggtt->mappable_end - 4096)),

-:416: WARNING:LONG_LINE: line over 100 characters
#416: FILE: drivers/gpu/drm/i915/selftests/i915_vma.c:280:
+		INVALID(8192, PIN_GLOBAL | PIN_MAPPABLE | PIN_OFFSET_FIXED | (ggtt->mappable_end - 4096)),

-:435: WARNING:LONG_LINE: line over 100 characters
#435: FILE: drivers/gpu/drm/i915/selftests/i915_vma.c:294:
+		NOSPACE(8192, PIN_GLOBAL | PIN_MAPPABLE | PIN_OFFSET_BIAS | (ggtt->mappable_end - 4096)),

total: 0 errors, 3 warnings, 0 checks, 520 lines checked
82cdc06b4a39 drm/i915: Refactor out intel_context_init()
912b47059d6e drm/i915: Make all GPU resets atomic
-:24: CHECK:USLEEP_RANGE: usleep_range is preferred over udelay; see Documentation/timers/timers-howto.txt
#24: FILE: drivers/gpu/drm/i915/i915_reset.c:147:
+	udelay(50);

-:30: CHECK:USLEEP_RANGE: usleep_range is preferred over udelay; see Documentation/timers/timers-howto.txt
#30: FILE: drivers/gpu/drm/i915/i915_reset.c:152:
+	udelay(50);

total: 0 errors, 0 warnings, 2 checks, 111 lines checked
00695a093e1f drm/i915/guc: Disable global reset
f6dd5893f7b8 drm/i915: Remove GPU reset dependence on struct_mutex
-:878: WARNING:MEMORY_BARRIER: memory barrier without comment
#878: FILE: drivers/gpu/drm/i915/i915_reset.c:692:
+	smp_store_mb(i915->gpu_error.restart, NULL);

-:1031: WARNING:IF_0: Consider removing the code enclosed by this #if 0 and its #endif
#1031: FILE: drivers/gpu/drm/i915/i915_reset.c:920:
+#if 0

-:1302: WARNING:BOOL_BITFIELD: Avoid using bool as bitfield.  Prefer bool bitfields as unsigned int or u<8|16|32>
#1302: FILE: drivers/gpu/drm/i915/intel_hangcheck.c:35:
+	bool wedged:1;

-:1303: WARNING:BOOL_BITFIELD: Avoid using bool as bitfield.  Prefer bool bitfields as unsigned int or u<8|16|32>
#1303: FILE: drivers/gpu/drm/i915/intel_hangcheck.c:36:
+	bool stalled:1;

total: 0 errors, 4 warnings, 0 checks, 1729 lines checked
faf113b5e213 drm/i915/selftests: Trim struct_mutex duration for set-wedged selftest
789901576adf drm/i915: Issue engine resets onto idle engines
4f90b7cb5b32 drm/i915: Stop tracking MRU activity on VMA
3e5e7f676feb drm/i915: Pull VM lists under the VM mutex.
ef26f0a3b3db drm/i915: Move vma lookup to its own lock
-:161: WARNING:USE_SPINLOCK_T: struct spinlock should be spinlock_t
#161: FILE: drivers/gpu/drm/i915/i915_gem_object.h:94:
+		struct spinlock lock;

total: 0 errors, 1 warnings, 0 checks, 290 lines checked
f2c29c8ca52c drm/i915: Always allocate an object/vma for the HWSP
25604144be52 drm/i915: Move list of timelines under its own lock
b048cc2c6473 drm/i915/selftests: Use common mock_engine::advance
744111bb6c5e drm/i915: Tidy common test_bit probing of i915_request->fence.flags
e817b79ab8b6 drm/i915: Introduce concept of per-timeline (context) HWSP
d2f9c373370e drm/i915: Enlarge vma->pin_count
74aa9e1fc5b3 drm/i915: Allocate a status page for each timeline
5f0f0c9295ff drm/i915: Share per-timeline HWSP using a slab suballocator
-:79: CHECK:SPACING: No space is necessary after a cast
#79: FILE: drivers/gpu/drm/i915/i915_timeline.c:44:
+	BUILD_BUG_ON(BITS_PER_TYPE(u64) * CACHELINE_BYTES > PAGE_SIZE);

total: 0 errors, 0 warnings, 1 checks, 416 lines checked
daf8d943452d drm/i915: Track the context's seqno in its own timeline HWSP
-:224: CHECK:LINE_SPACING: Please use a blank line after function/struct/union/enum declarations
#224: FILE: drivers/gpu/drm/i915/intel_lrc.c:2073:
 }
+static const int gen8_emit_breadcrumb_sz = 10 + WA_TAIL_DWORDS;

-:255: CHECK:LINE_SPACING: Please use a blank line after function/struct/union/enum declarations
#255: FILE: drivers/gpu/drm/i915/intel_lrc.c:2099:
 }
+static const int gen8_emit_breadcrumb_rcs_sz = 14 + WA_TAIL_DWORDS;

-:281: CHECK:LINE_SPACING: Please use a blank line after function/struct/union/enum declarations
#281: FILE: drivers/gpu/drm/i915/intel_ringbuffer.c:344:
 }
+static const int gen6_rcs_emit_breadcrumb_sz = 18;

-:304: CHECK:LINE_SPACING: Please use a blank line after function/struct/union/enum declarations
#304: FILE: drivers/gpu/drm/i915/intel_ringbuffer.c:451:
 }
+static const int gen7_rcs_emit_breadcrumb_sz = 10;

-:325: CHECK:LINE_SPACING: Please use a blank line after function/struct/union/enum declarations
#325: FILE: drivers/gpu/drm/i915/intel_ringbuffer.c:469:
 }
+static const int gen6_xcs_emit_breadcrumb_sz = 8;

-:353: CHECK:LINE_SPACING: Please use a blank line after function/struct/union/enum declarations
#353: FILE: drivers/gpu/drm/i915/intel_ringbuffer.c:499:
 }
+static const int gen7_xcs_emit_breadcrumb_sz = 10 + GEN7_XCS_WA * 3;

-:403: CHECK:LINE_SPACING: Please use a blank line after function/struct/union/enum declarations
#403: FILE: drivers/gpu/drm/i915/intel_ringbuffer.c:945:
 }
+static const int i9xx_emit_breadcrumb_sz = 8;

-:431: CHECK:LINE_SPACING: Please use a blank line after function/struct/union/enum declarations
#431: FILE: drivers/gpu/drm/i915/intel_ringbuffer.c:973:
 }
+static const int gen5_emit_breadcrumb_sz = GEN5_WA_STORES * 3 + 6;

total: 0 errors, 0 warnings, 8 checks, 471 lines checked
4daf132957fc drm/i915: Track active timelines
6671fcc878bd drm/i915: Identify active requests
-:195: CHECK:LINE_SPACING: Please use a blank line after function/struct/union/enum declarations
#195: FILE: drivers/gpu/drm/i915/intel_lrc.c:2101:
 }
+static const int gen8_emit_fini_breadcrumb_sz = 10 + WA_TAIL_DWORDS;

-:207: CHECK:LINE_SPACING: Please use a blank line after function/struct/union/enum declarations
#207: FILE: drivers/gpu/drm/i915/intel_lrc.c:2127:
 }
+static const int gen8_emit_fini_breadcrumb_rcs_sz = 14 + WA_TAIL_DWORDS;

total: 0 errors, 0 warnings, 2 checks, 278 lines checked
063e9cd8eb0a drm/i915: Remove the intel_engine_notify tracepoint
9534b50448fe drm/i915: Replace global breadcrumbs with per-context interrupt tracking
-:18: ERROR:GIT_COMMIT_ID: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit 688e6c725816 ("drm/i915: Slaughter the thundering i915_wait_request herd")'
#18: 
Before commit 688e6c725816, the solution was simple. Every client waking

-:21: ERROR:GIT_COMMIT_ID: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit 688e6c725816 ("drm/i915: Slaughter the thundering i915_wait_request herd")'
#21: 
688e6c725816 introduced an rbtree so that only the earliest waiter on

-:49: WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#49: 
References: 688e6c725816 ("drm/i915: Slaughter the thundering i915_wait_request herd")

-:49: ERROR:GIT_COMMIT_ID: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit 688e6c725816 ("drm/i915: Slaughter the thundering i915_wait_request herd")'
#49: 
References: 688e6c725816 ("drm/i915: Slaughter the thundering i915_wait_request herd")

-:1997: WARNING:FUNCTION_ARGUMENTS: function definition argument 'struct i915_gem_context *' should also have an identifier name
#1997: FILE: drivers/gpu/drm/i915/selftests/i915_request.c:258:
+	struct i915_request *(*request_alloc)(struct i915_gem_context *,

-:1997: WARNING:FUNCTION_ARGUMENTS: function definition argument 'struct intel_engine_cs *' should also have an identifier name
#1997: FILE: drivers/gpu/drm/i915/selftests/i915_request.c:258:
+	struct i915_request *(*request_alloc)(struct i915_gem_context *,

-:2021: WARNING:LINE_SPACING: Missing a blank line after declarations
#2021: FILE: drivers/gpu/drm/i915/selftests/i915_request.c:282:
+	struct i915_request **requests;
+	I915_RND_STATE(prng);

-:2428: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#2428: 
deleted file mode 100644

total: 3 errors, 5 warnings, 0 checks, 2387 lines checked
09c15d6b8399 drm/i915: Drop fake breadcrumb irq
6a7f5b9d1dfc drm/i915: Keep timeline HWSP allocated until the system is idle
4c26f95d5c56 drm/i915/execlists: Refactor out can_merge_rq()
f557fb2f2149 drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+
-:296: CHECK:SPACING: spaces preferred around that '<<' (ctx:VxV)
#296: FILE: drivers/gpu/drm/i915/intel_gpu_commands.h:109:
+#define   MI_SEMAPHORE_SAD_GT_SDD	(0<<12)
                                  	  ^

-:298: CHECK:SPACING: spaces preferred around that '<<' (ctx:VxV)
#298: FILE: drivers/gpu/drm/i915/intel_gpu_commands.h:111:
+#define   MI_SEMAPHORE_SAD_LT_SDD	(2<<12)
                                  	  ^

-:299: CHECK:SPACING: spaces preferred around that '<<' (ctx:VxV)
#299: FILE: drivers/gpu/drm/i915/intel_gpu_commands.h:112:
+#define   MI_SEMAPHORE_SAD_LTE_SDD	(3<<12)
                                   	  ^

-:300: CHECK:SPACING: spaces preferred around that '<<' (ctx:VxV)
#300: FILE: drivers/gpu/drm/i915/intel_gpu_commands.h:113:
+#define   MI_SEMAPHORE_SAD_EQ_SDD	(4<<12)
                                  	  ^

-:301: CHECK:SPACING: spaces preferred around that '<<' (ctx:VxV)
#301: FILE: drivers/gpu/drm/i915/intel_gpu_commands.h:114:
+#define   MI_SEMAPHORE_SAD_NEQ_SDD	(5<<12)
                                   	  ^

total: 0 errors, 0 warnings, 5 checks, 259 lines checked
2befa074162f drm/i915: Prioritise non-busywait semaphore workloads
6cedf9409751 drm/i915: Replace global_seqno with a hangcheck heartbeat seqno
-:122: CHECK:LINE_SPACING: Please use a blank line after function/struct/union/enum declarations
#122: FILE: drivers/gpu/drm/i915/intel_lrc.c:2131:
 }
+static const int gen8_emit_fini_breadcrumb_sz = 14 + WA_TAIL_DWORDS;

-:143: CHECK:LINE_SPACING: Please use a blank line after function/struct/union/enum declarations
#143: FILE: drivers/gpu/drm/i915/intel_lrc.c:2162:
 }
+static const int gen8_emit_fini_breadcrumb_rcs_sz = 20 + WA_TAIL_DWORDS;

-:170: CHECK:LINE_SPACING: Please use a blank line after function/struct/union/enum declarations
#170: FILE: drivers/gpu/drm/i915/intel_ringbuffer.c:472:
 }
+static const int gen6_xcs_emit_breadcrumb_sz = 10;

-:195: CHECK:LINE_SPACING: Please use a blank line after function/struct/union/enum declarations
#195: FILE: drivers/gpu/drm/i915/intel_ringbuffer.c:507:
 }
+static const int gen7_xcs_emit_breadcrumb_sz = 14 + GEN7_XCS_WA * 3;

-:218: CHECK:LINE_SPACING: Please use a blank line after function/struct/union/enum declarations
#218: FILE: drivers/gpu/drm/i915/intel_ringbuffer.c:958:
 }
+static const int i9xx_emit_breadcrumb_sz = 12;

-:243: CHECK:LINE_SPACING: Please use a blank line after function/struct/union/enum declarations
#243: FILE: drivers/gpu/drm/i915/intel_ringbuffer.c:989:
 }
+static const int gen5_emit_breadcrumb_sz = GEN5_WA_STORES * 3 + 8;

total: 0 errors, 0 warnings, 6 checks, 236 lines checked

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 89+ messages in thread

* ✗ Fi.CI.SPARSE: warning for series starting with [01/34] drm/i915/execlists: Mark up priority boost on preemption
  2019-01-21 22:20 HWSP for HW semaphores Chris Wilson
                   ` (34 preceding siblings ...)
  2019-01-22  0:09 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/34] drm/i915/execlists: Mark up priority boost on preemption Patchwork
@ 2019-01-22  0:22 ` Patchwork
  2019-01-22  0:30 ` ✓ Fi.CI.BAT: success " Patchwork
                   ` (4 subsequent siblings)
  40 siblings, 0 replies; 89+ messages in thread
From: Patchwork @ 2019-01-22  0:22 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/34] drm/i915/execlists: Mark up priority boost on preemption
URL   : https://patchwork.freedesktop.org/series/55528/
State : warning

== Summary ==

$ dim sparse origin/drm-tip
Sparse version: v0.5.2
Commit: drm/i915/execlists: Mark up priority boost on preemption
+drivers/gpu/drm/i915/intel_ringbuffer.h:602:23: warning: expression using sizeof(void)

Commit: drm/i915/execlists: Suppress preempting self
Okay!

Commit: drm/i915: Show all active engines on hangcheck
Okay!

Commit: drm/i915/selftests: Refactor common live_test framework
+./include/uapi/linux/perf_event.h:147:56: warning: cast truncates bits from constant value (8000000000000000 becomes 0)

Commit: drm/i915/selftests: Track evict objects explicitly
Okay!

Commit: drm/i915/selftests: Create a clean GGTT for vma/gtt selftesting
Okay!

Commit: drm/i915: Refactor out intel_context_init()
Okay!

Commit: drm/i915: Make all GPU resets atomic
Okay!

Commit: drm/i915/guc: Disable global reset
Okay!

Commit: drm/i915: Remove GPU reset dependence on struct_mutex
-drivers/gpu/drm/i915/selftests/../i915_drv.h:3546:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/../i915_drv.h:3541:16: warning: expression using sizeof(void)

Commit: drm/i915/selftests: Trim struct_mutex duration for set-wedged selftest
Okay!

Commit: drm/i915: Issue engine resets onto idle engines
Okay!

Commit: drm/i915: Stop tracking MRU activity on VMA
Okay!

Commit: drm/i915: Pull VM lists under the VM mutex.
Okay!

Commit: drm/i915: Move vma lookup to its own lock
Okay!

Commit: drm/i915: Always allocate an object/vma for the HWSP
Okay!

Commit: drm/i915: Move list of timelines under its own lock
-drivers/gpu/drm/i915/selftests/../i915_drv.h:3541:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/../i915_drv.h:3544:16: warning: expression using sizeof(void)

Commit: drm/i915/selftests: Use common mock_engine::advance
Okay!

Commit: drm/i915: Tidy common test_bit probing of i915_request->fence.flags
Okay!

Commit: drm/i915: Introduce concept of per-timeline (context) HWSP
Okay!

Commit: drm/i915: Enlarge vma->pin_count
Okay!

Commit: drm/i915: Allocate a status page for each timeline
+./include/linux/mm.h:619:13: error: not a function <noident>
+./include/linux/mm.h:619:13: error: undefined identifier '__builtin_mul_overflow'
+./include/linux/mm.h:619:13: warning: call with no type!

Commit: drm/i915: Share per-timeline HWSP using a slab suballocator
-drivers/gpu/drm/i915/selftests/../i915_drv.h:3544:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/../i915_drv.h:3548:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/i915_timeline.c:89:38: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/i915_timeline.c:89:38: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/i915_timeline.c:92:44: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/i915_timeline.c:92:44: warning: expression using sizeof(void)
+./include/linux/slab.h:664:13: error: undefined identifier '__builtin_mul_overflow'
+./include/linux/slab.h:664:13: warning: call with no type!

Commit: drm/i915: Track the context's seqno in its own timeline HWSP
Okay!

Commit: drm/i915: Track active timelines
Okay!

Commit: drm/i915: Identify active requests
Okay!

Commit: drm/i915: Remove the intel_engine_notify tracepoint
Okay!

Commit: drm/i915: Replace global breadcrumbs with per-context interrupt tracking
+drivers/gpu/drm/i915/selftests/i915_request.c:284:40: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/i915_request.c:284:40: warning: expression using sizeof(void)
-./include/linux/mm.h:619:13: error: not a function <noident>
-./include/linux/mm.h:619:13: error: not a function <noident>
-./include/linux/mm.h:619:13: error: undefined identifier '__builtin_mul_overflow'
-./include/linux/mm.h:619:13: warning: call with no type!
+./include/linux/slab.h:664:13: error: not a function <noident>
+./include/linux/slab.h:664:13: error: not a function <noident>

Commit: drm/i915: Drop fake breadcrumb irq
Okay!

Commit: drm/i915: Keep timeline HWSP allocated until the system is idle
-drivers/gpu/drm/i915/selftests/../i915_drv.h:3548:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/../i915_drv.h:3550:16: warning: expression using sizeof(void)

Commit: drm/i915/execlists: Refactor out can_merge_rq()
Okay!

Commit: drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+
Okay!

Commit: drm/i915: Prioritise non-busywait semaphore workloads
Okay!

Commit: drm/i915: Replace global_seqno with a hangcheck heartbeat seqno
Okay!

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 89+ messages in thread

* ✓ Fi.CI.BAT: success for series starting with [01/34] drm/i915/execlists: Mark up priority boost on preemption
  2019-01-21 22:20 HWSP for HW semaphores Chris Wilson
                   ` (35 preceding siblings ...)
  2019-01-22  0:22 ` ✗ Fi.CI.SPARSE: " Patchwork
@ 2019-01-22  0:30 ` Patchwork
  2019-01-22  1:35 ` ✗ Fi.CI.IGT: failure " Patchwork
                   ` (3 subsequent siblings)
  40 siblings, 0 replies; 89+ messages in thread
From: Patchwork @ 2019-01-22  0:30 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/34] drm/i915/execlists: Mark up priority boost on preemption
URL   : https://patchwork.freedesktop.org/series/55528/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_5459 -> Patchwork_12001
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/55528/revisions/1/mbox/

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_12001:

### IGT changes ###

#### Suppressed ####

  The following results come from untrusted machines, tests, or statuses.
  They do not affect the overall result.

  * {igt@runner@aborted}:
    - fi-bxt-j4205:       NOTRUN -> FAIL

  
Known issues
------------

  Here are the changes found in Patchwork_12001 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@i915_selftest@live_execlists:
    - fi-apl-guc:         PASS -> INCOMPLETE [fdo#103927]

  * igt@kms_pipe_crc_basic@nonblocking-crc-pipe-a-frame-sequence:
    - fi-byt-clapper:     PASS -> FAIL [fdo#103191] / [fdo#107362]

  
#### Possible fixes ####

  * igt@i915_selftest@live_hangcheck:
    - fi-bwr-2160:        DMESG-FAIL [fdo#108735] -> PASS

  * igt@kms_frontbuffer_tracking@basic:
    - {fi-icl-u2}:        FAIL [fdo#103167] -> PASS

  * igt@pm_rpm@module-reload:
    - {fi-icl-u2}:        DMESG-WARN [fdo#108654] -> PASS

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#103167]: https://bugs.freedesktop.org/show_bug.cgi?id=103167
  [fdo#103191]: https://bugs.freedesktop.org/show_bug.cgi?id=103191
  [fdo#103927]: https://bugs.freedesktop.org/show_bug.cgi?id=103927
  [fdo#107362]: https://bugs.freedesktop.org/show_bug.cgi?id=107362
  [fdo#108569]: https://bugs.freedesktop.org/show_bug.cgi?id=108569
  [fdo#108622]: https://bugs.freedesktop.org/show_bug.cgi?id=108622
  [fdo#108654]: https://bugs.freedesktop.org/show_bug.cgi?id=108654
  [fdo#108735]: https://bugs.freedesktop.org/show_bug.cgi?id=108735
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109315]: https://bugs.freedesktop.org/show_bug.cgi?id=109315


Participating hosts (47 -> 42)
------------------------------

  Additional (1): fi-glk-j4005 
  Missing    (6): fi-kbl-soraka fi-ilk-m540 fi-hsw-peppy fi-byt-squawks fi-bsw-cyan fi-bdw-samus 


Build changes
-------------

    * Linux: CI_DRM_5459 -> Patchwork_12001

  CI_DRM_5459: 0f693a275dd91391b476ada7481cf08f4fe610aa @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4780: 1c1612bdc36b44a704095e7b0ba5542818ce793f @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_12001: 6cedf9409751062b0d231b0f22dad0cf0d9d3f1d @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

6cedf9409751 drm/i915: Replace global_seqno with a hangcheck heartbeat seqno
2befa074162f drm/i915: Prioritise non-busywait semaphore workloads
f557fb2f2149 drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+
4c26f95d5c56 drm/i915/execlists: Refactor out can_merge_rq()
6a7f5b9d1dfc drm/i915: Keep timeline HWSP allocated until the system is idle
09c15d6b8399 drm/i915: Drop fake breadcrumb irq
9534b50448fe drm/i915: Replace global breadcrumbs with per-context interrupt tracking
063e9cd8eb0a drm/i915: Remove the intel_engine_notify tracepoint
6671fcc878bd drm/i915: Identify active requests
4daf132957fc drm/i915: Track active timelines
daf8d943452d drm/i915: Track the context's seqno in its own timeline HWSP
5f0f0c9295ff drm/i915: Share per-timeline HWSP using a slab suballocator
74aa9e1fc5b3 drm/i915: Allocate a status page for each timeline
d2f9c373370e drm/i915: Enlarge vma->pin_count
e817b79ab8b6 drm/i915: Introduce concept of per-timeline (context) HWSP
744111bb6c5e drm/i915: Tidy common test_bit probing of i915_request->fence.flags
b048cc2c6473 drm/i915/selftests: Use common mock_engine::advance
25604144be52 drm/i915: Move list of timelines under its own lock
f2c29c8ca52c drm/i915: Always allocate an object/vma for the HWSP
ef26f0a3b3db drm/i915: Move vma lookup to its own lock
3e5e7f676feb drm/i915: Pull VM lists under the VM mutex.
4f90b7cb5b32 drm/i915: Stop tracking MRU activity on VMA
789901576adf drm/i915: Issue engine resets onto idle engines
faf113b5e213 drm/i915/selftests: Trim struct_mutex duration for set-wedged selftest
f6dd5893f7b8 drm/i915: Remove GPU reset dependence on struct_mutex
00695a093e1f drm/i915/guc: Disable global reset
912b47059d6e drm/i915: Make all GPU resets atomic
82cdc06b4a39 drm/i915: Refactor out intel_context_init()
525322709fa3 drm/i915/selftests: Create a clean GGTT for vma/gtt selftesting
424a90dd6d68 drm/i915/selftests: Track evict objects explicitly
fd236ee5650b drm/i915/selftests: Refactor common live_test framework
58648274db42 drm/i915: Show all active engines on hangcheck
85c9694f46a4 drm/i915/execlists: Suppress preempting self
096d4f120d3e drm/i915/execlists: Mark up priority boost on preemption

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12001/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 89+ messages in thread

* ✗ Fi.CI.IGT: failure for series starting with [01/34] drm/i915/execlists: Mark up priority boost on preemption
  2019-01-21 22:20 HWSP for HW semaphores Chris Wilson
                   ` (36 preceding siblings ...)
  2019-01-22  0:30 ` ✓ Fi.CI.BAT: success " Patchwork
@ 2019-01-22  1:35 ` Patchwork
  2019-01-23 12:00 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/34] drm/i915/execlists: Mark up priority boost on preemption (rev2) Patchwork
                   ` (2 subsequent siblings)
  40 siblings, 0 replies; 89+ messages in thread
From: Patchwork @ 2019-01-22  1:35 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/34] drm/i915/execlists: Mark up priority boost on preemption
URL   : https://patchwork.freedesktop.org/series/55528/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_5459_full -> Patchwork_12001_full
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_12001_full absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_12001_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_12001_full:

### IGT changes ###

#### Possible regressions ####

  * igt@gem_ctx_switch@basic-all-heavy:
    - shard-snb:          PASS -> DMESG-FAIL

  * igt@gem_exec_await@wide-all:
    - shard-hsw:          PASS -> FAIL +3
    - shard-snb:          PASS -> FAIL +2

  * igt@gem_mmap_gtt@hang:
    - shard-kbl:          PASS -> FAIL
    - shard-apl:          PASS -> FAIL

  
Known issues
------------

  Here are the changes found in Patchwork_12001_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_eio@reset-stress:
    - shard-snb:          PASS -> INCOMPLETE [fdo#105411]

  * igt@gem_exec_schedule@pi-ringfull-blt:
    - shard-apl:          NOTRUN -> FAIL [fdo#103158]

  * igt@kms_content_protection@legacy:
    - shard-apl:          NOTRUN -> FAIL [fdo#108597]

  * igt@kms_cursor_crc@cursor-128x42-sliding:
    - shard-glk:          PASS -> FAIL [fdo#103232]

  * igt@kms_cursor_legacy@2x-long-flip-vs-cursor-atomic:
    - shard-glk:          PASS -> FAIL [fdo#104873]

  * igt@kms_cursor_legacy@pipe-a-single-move:
    - shard-glk:          PASS -> INCOMPLETE [fdo#103359] / [k.org#198133]

  * igt@kms_flip@dpms-vs-vblank-race:
    - shard-glk:          PASS -> FAIL [fdo#103060]

  * igt@kms_flip@flip-vs-expired-vblank-interruptible:
    - shard-glk:          PASS -> FAIL [fdo#105363]

  * igt@kms_frontbuffer_tracking@fbc-2p-primscrn-spr-indfb-onoff:
    - shard-glk:          PASS -> FAIL [fdo#103167] +1

  * igt@kms_plane_multiple@atomic-pipe-c-tiling-none:
    - shard-glk:          PASS -> FAIL [fdo#103166] +2

  
#### Possible fixes ####

  * igt@kms_busy@extended-modeset-hang-newfb-render-a:
    - shard-kbl:          DMESG-WARN [fdo#107956] -> PASS

  * igt@kms_busy@extended-modeset-hang-newfb-render-b:
    - shard-hsw:          DMESG-WARN [fdo#107956] -> PASS +5

  * igt@kms_busy@extended-modeset-hang-newfb-with-reset-render-b:
    - shard-snb:          DMESG-WARN [fdo#107956] -> PASS +3

  * igt@kms_busy@extended-pageflip-hang-oldfb-render-b:
    - shard-snb:          {SKIP} [fdo#109271] / [fdo#109278] -> PASS

  * igt@kms_cursor_crc@cursor-256x85-sliding:
    - shard-glk:          FAIL [fdo#103232] -> PASS +1

  * igt@kms_cursor_crc@cursor-64x64-random:
    - shard-apl:          FAIL [fdo#103232] -> PASS +1

  * igt@kms_draw_crc@draw-method-xrgb8888-mmap-gtt-untiled:
    - shard-snb:          {SKIP} [fdo#109271] -> PASS +4

  * igt@kms_plane@pixel-format-pipe-c-planes:
    - shard-apl:          FAIL [fdo#103166] -> PASS

  * igt@kms_plane_alpha_blend@pipe-b-constant-alpha-max:
    - shard-glk:          FAIL [fdo#108145] -> PASS +1

  * igt@kms_rotation_crc@multiplane-rotation:
    - shard-kbl:          FAIL -> PASS

  
#### Warnings ####

  * igt@kms_setmode@basic:
    - shard-apl:          INCOMPLETE [fdo#103927] -> FAIL [fdo#99912]

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#103060]: https://bugs.freedesktop.org/show_bug.cgi?id=103060
  [fdo#103158]: https://bugs.freedesktop.org/show_bug.cgi?id=103158
  [fdo#103166]: https://bugs.freedesktop.org/show_bug.cgi?id=103166
  [fdo#103167]: https://bugs.freedesktop.org/show_bug.cgi?id=103167
  [fdo#103232]: https://bugs.freedesktop.org/show_bug.cgi?id=103232
  [fdo#103359]: https://bugs.freedesktop.org/show_bug.cgi?id=103359
  [fdo#103927]: https://bugs.freedesktop.org/show_bug.cgi?id=103927
  [fdo#104873]: https://bugs.freedesktop.org/show_bug.cgi?id=104873
  [fdo#105363]: https://bugs.freedesktop.org/show_bug.cgi?id=105363
  [fdo#105411]: https://bugs.freedesktop.org/show_bug.cgi?id=105411
  [fdo#107956]: https://bugs.freedesktop.org/show_bug.cgi?id=107956
  [fdo#108145]: https://bugs.freedesktop.org/show_bug.cgi?id=108145
  [fdo#108597]: https://bugs.freedesktop.org/show_bug.cgi?id=108597
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109278]: https://bugs.freedesktop.org/show_bug.cgi?id=109278
  [fdo#99912]: https://bugs.freedesktop.org/show_bug.cgi?id=99912
  [k.org#198133]: https://bugzilla.kernel.org/show_bug.cgi?id=198133


Participating hosts (7 -> 5)
------------------------------

  Missing    (2): shard-skl shard-iclb 


Build changes
-------------

    * Linux: CI_DRM_5459 -> Patchwork_12001

  CI_DRM_5459: 0f693a275dd91391b476ada7481cf08f4fe610aa @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4780: 1c1612bdc36b44a704095e7b0ba5542818ce793f @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_12001: 6cedf9409751062b0d231b0f22dad0cf0d9d3f1d @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12001/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 14/34] drm/i915: Pull VM lists under the VM mutex.
  2019-01-21 22:20 ` [PATCH 14/34] drm/i915: Pull VM lists under the VM mutex Chris Wilson
@ 2019-01-22  9:09   ` Tvrtko Ursulin
  0 siblings, 0 replies; 89+ messages in thread
From: Tvrtko Ursulin @ 2019-01-22  9:09 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 21/01/2019 22:20, Chris Wilson wrote:
> A starting point to counter the pervasive struct_mutex. For the goal of
> avoiding (or at least blocking under them!) global locks during user
> request submission, a simple but important step is being able to manage
> each clients GTT separately. For which, we want to replace using the
> struct_mutex as the guard for all things GTT/VM and switch instead to a
> specific mutex inside i915_address_space.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

> ---
>   drivers/gpu/drm/i915/i915_gem.c                 | 14 ++++++++------
>   drivers/gpu/drm/i915/i915_gem_evict.c           |  2 ++
>   drivers/gpu/drm/i915/i915_gem_gtt.c             | 15 +++++++++++++--
>   drivers/gpu/drm/i915/i915_gem_shrinker.c        |  4 ++++
>   drivers/gpu/drm/i915/i915_gem_stolen.c          |  2 ++
>   drivers/gpu/drm/i915/i915_vma.c                 | 11 +++++++++++
>   drivers/gpu/drm/i915/selftests/i915_gem_evict.c |  3 +++
>   drivers/gpu/drm/i915/selftests/i915_gem_gtt.c   |  3 +++
>   8 files changed, 46 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index f45186ddb236..538fa5404603 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -245,18 +245,19 @@ int
>   i915_gem_get_aperture_ioctl(struct drm_device *dev, void *data,
>   			    struct drm_file *file)
>   {
> -	struct drm_i915_private *dev_priv = to_i915(dev);
> -	struct i915_ggtt *ggtt = &dev_priv->ggtt;
> +	struct i915_ggtt *ggtt = &to_i915(dev)->ggtt;
>   	struct drm_i915_gem_get_aperture *args = data;
>   	struct i915_vma *vma;
>   	u64 pinned;
>   
> +	mutex_lock(&ggtt->vm.mutex);
> +
>   	pinned = ggtt->vm.reserved;
> -	mutex_lock(&dev->struct_mutex);
>   	list_for_each_entry(vma, &ggtt->vm.bound_list, vm_link)
>   		if (i915_vma_is_pinned(vma))
>   			pinned += vma->node.size;
> -	mutex_unlock(&dev->struct_mutex);
> +
> +	mutex_unlock(&ggtt->vm.mutex);
>   
>   	args->aper_size = ggtt->vm.total;
>   	args->aper_available_size = args->aper_size - pinned;
> @@ -1529,20 +1530,21 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
>   
>   static void i915_gem_object_bump_inactive_ggtt(struct drm_i915_gem_object *obj)
>   {
> -	struct drm_i915_private *i915;
> +	struct drm_i915_private *i915 = to_i915(obj->base.dev);
>   	struct list_head *list;
>   	struct i915_vma *vma;
>   
>   	GEM_BUG_ON(!i915_gem_object_has_pinned_pages(obj));
>   
> +	mutex_lock(&i915->ggtt.vm.mutex);
>   	for_each_ggtt_vma(vma, obj) {
>   		if (!drm_mm_node_allocated(&vma->node))
>   			continue;
>   
>   		list_move_tail(&vma->vm_link, &vma->vm->bound_list);
>   	}
> +	mutex_unlock(&i915->ggtt.vm.mutex);
>   
> -	i915 = to_i915(obj->base.dev);
>   	spin_lock(&i915->mm.obj_lock);
>   	list = obj->bind_count ? &i915->mm.bound_list : &i915->mm.unbound_list;
>   	list_move_tail(&obj->mm.link, list);
> diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
> index 5cfe4b75e7d6..dc137701acb8 100644
> --- a/drivers/gpu/drm/i915/i915_gem_evict.c
> +++ b/drivers/gpu/drm/i915/i915_gem_evict.c
> @@ -432,6 +432,7 @@ int i915_gem_evict_vm(struct i915_address_space *vm)
>   	}
>   
>   	INIT_LIST_HEAD(&eviction_list);
> +	mutex_lock(&vm->mutex);
>   	list_for_each_entry(vma, &vm->bound_list, vm_link) {
>   		if (i915_vma_is_pinned(vma))
>   			continue;
> @@ -439,6 +440,7 @@ int i915_gem_evict_vm(struct i915_address_space *vm)
>   		__i915_vma_pin(vma);
>   		list_add(&vma->evict_link, &eviction_list);
>   	}
> +	mutex_unlock(&vm->mutex);
>   
>   	ret = 0;
>   	list_for_each_entry_safe(vma, next, &eviction_list, evict_link) {
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 2ad9070a54c1..49b00996a15e 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -1931,7 +1931,10 @@ static struct i915_vma *pd_vma_create(struct gen6_hw_ppgtt *ppgtt, int size)
>   	vma->ggtt_view.type = I915_GGTT_VIEW_ROTATED; /* prevent fencing */
>   
>   	INIT_LIST_HEAD(&vma->obj_link);
> +
> +	mutex_lock(&vma->vm->mutex);
>   	list_add(&vma->vm_link, &vma->vm->unbound_list);
> +	mutex_unlock(&vma->vm->mutex);
>   
>   	return vma;
>   }
> @@ -3504,9 +3507,10 @@ void i915_gem_restore_gtt_mappings(struct drm_i915_private *dev_priv)
>   
>   	i915_check_and_clear_faults(dev_priv);
>   
> +	mutex_lock(&ggtt->vm.mutex);
> +
>   	/* First fill our portion of the GTT with scratch pages */
>   	ggtt->vm.clear_range(&ggtt->vm, 0, ggtt->vm.total);
> -
>   	ggtt->vm.closed = true; /* skip rewriting PTE on VMA unbind */
>   
>   	/* clflush objects bound into the GGTT and rebind them. */
> @@ -3516,19 +3520,26 @@ void i915_gem_restore_gtt_mappings(struct drm_i915_private *dev_priv)
>   		if (!(vma->flags & I915_VMA_GLOBAL_BIND))
>   			continue;
>   
> +		mutex_unlock(&ggtt->vm.mutex);
> +
>   		if (!i915_vma_unbind(vma))
> -			continue;
> +			goto lock;
>   
>   		WARN_ON(i915_vma_bind(vma,
>   				      obj ? obj->cache_level : 0,
>   				      PIN_UPDATE));
>   		if (obj)
>   			WARN_ON(i915_gem_object_set_to_gtt_domain(obj, false));
> +
> +lock:
> +		mutex_lock(&ggtt->vm.mutex);
>   	}
>   
>   	ggtt->vm.closed = false;
>   	i915_ggtt_invalidate(dev_priv);
>   
> +	mutex_unlock(&ggtt->vm.mutex);
> +
>   	if (INTEL_GEN(dev_priv) >= 8) {
>   		struct intel_ppat *ppat = &dev_priv->ppat;
>   
> diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
> index a76d6c95c824..6da795c7e62e 100644
> --- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
> +++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
> @@ -461,6 +461,7 @@ i915_gem_shrinker_vmap(struct notifier_block *nb, unsigned long event, void *ptr
>   					       I915_SHRINK_VMAPS);
>   
>   	/* We also want to clear any cached iomaps as they wrap vmap */
> +	mutex_lock(&i915->ggtt.vm.mutex);
>   	list_for_each_entry_safe(vma, next,
>   				 &i915->ggtt.vm.bound_list, vm_link) {
>   		unsigned long count = vma->node.size >> PAGE_SHIFT;
> @@ -468,9 +469,12 @@ i915_gem_shrinker_vmap(struct notifier_block *nb, unsigned long event, void *ptr
>   		if (!vma->iomap || i915_vma_is_active(vma))
>   			continue;
>   
> +		mutex_unlock(&i915->ggtt.vm.mutex);
>   		if (i915_vma_unbind(vma) == 0)
>   			freed_pages += count;
> +		mutex_lock(&i915->ggtt.vm.mutex);
>   	}
> +	mutex_unlock(&i915->ggtt.vm.mutex);
>   
>   out:
>   	shrinker_unlock(i915, unlock);
> diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c b/drivers/gpu/drm/i915/i915_gem_stolen.c
> index a9e365789686..74a9661479ca 100644
> --- a/drivers/gpu/drm/i915/i915_gem_stolen.c
> +++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
> @@ -702,7 +702,9 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_i915_private *dev_priv
>   	vma->flags |= I915_VMA_GLOBAL_BIND;
>   	__i915_vma_set_map_and_fenceable(vma);
>   
> +	mutex_lock(&ggtt->vm.mutex);
>   	list_move_tail(&vma->vm_link, &ggtt->vm.bound_list);
> +	mutex_unlock(&ggtt->vm.mutex);
>   
>   	spin_lock(&dev_priv->mm.obj_lock);
>   	list_move_tail(&obj->mm.link, &dev_priv->mm.bound_list);
> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
> index 7de28baffb8f..dcbd0d345c72 100644
> --- a/drivers/gpu/drm/i915/i915_vma.c
> +++ b/drivers/gpu/drm/i915/i915_vma.c
> @@ -213,7 +213,10 @@ vma_create(struct drm_i915_gem_object *obj,
>   	}
>   	rb_link_node(&vma->obj_node, rb, p);
>   	rb_insert_color(&vma->obj_node, &obj->vma_tree);
> +
> +	mutex_lock(&vm->mutex);
>   	list_add(&vma->vm_link, &vm->unbound_list);
> +	mutex_unlock(&vm->mutex);
>   
>   	return vma;
>   
> @@ -656,7 +659,9 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
>   	GEM_BUG_ON(!drm_mm_node_allocated(&vma->node));
>   	GEM_BUG_ON(!i915_gem_valid_gtt_space(vma, cache_level));
>   
> +	mutex_lock(&vma->vm->mutex);
>   	list_move_tail(&vma->vm_link, &vma->vm->bound_list);
> +	mutex_unlock(&vma->vm->mutex);
>   
>   	if (vma->obj) {
>   		struct drm_i915_gem_object *obj = vma->obj;
> @@ -689,8 +694,10 @@ i915_vma_remove(struct i915_vma *vma)
>   
>   	vma->ops->clear_pages(vma);
>   
> +	mutex_lock(&vma->vm->mutex);
>   	drm_mm_remove_node(&vma->node);
>   	list_move_tail(&vma->vm_link, &vma->vm->unbound_list);
> +	mutex_unlock(&vma->vm->mutex);
>   
>   	/*
>   	 * Since the unbound list is global, only move to that list if
> @@ -802,7 +809,11 @@ static void __i915_vma_destroy(struct i915_vma *vma)
>   	GEM_BUG_ON(i915_gem_active_isset(&vma->last_fence));
>   
>   	list_del(&vma->obj_link);
> +
> +	mutex_lock(&vma->vm->mutex);
>   	list_del(&vma->vm_link);
> +	mutex_unlock(&vma->vm->mutex);
> +
>   	if (vma->obj)
>   		rb_erase(&vma->obj_node, &vma->obj->vma_tree);
>   
> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
> index af9b85cb8639..32dce7176f63 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
> @@ -94,11 +94,14 @@ static int populate_ggtt(struct drm_i915_private *i915,
>   
>   static void unpin_ggtt(struct drm_i915_private *i915)
>   {
> +	struct i915_ggtt *ggtt = &i915->ggtt;
>   	struct i915_vma *vma;
>   
> +	mutex_lock(&ggtt->vm.mutex);
>   	list_for_each_entry(vma, &i915->ggtt.vm.bound_list, vm_link)
>   		if (vma->obj->mm.quirked)
>   			i915_vma_unpin(vma);
> +	mutex_unlock(&ggtt->vm.mutex);
>   }
>   
>   static void cleanup_objects(struct drm_i915_private *i915,
> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> index 8feb4af308ff..3850ef4a5ec8 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> @@ -1237,7 +1237,10 @@ static void track_vma_bind(struct i915_vma *vma)
>   	__i915_gem_object_pin_pages(obj);
>   
>   	vma->pages = obj->mm.pages;
> +
> +	mutex_lock(&vma->vm->mutex);
>   	list_move_tail(&vma->vm_link, &vma->vm->bound_list);
> +	mutex_unlock(&vma->vm->mutex);
>   }
>   
>   static int exercise_mock(struct drm_i915_private *i915,
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 18/34] drm/i915/selftests: Use common mock_engine::advance
  2019-01-21 22:21 ` [PATCH 18/34] drm/i915/selftests: Use common mock_engine::advance Chris Wilson
@ 2019-01-22  9:33   ` Tvrtko Ursulin
  0 siblings, 0 replies; 89+ messages in thread
From: Tvrtko Ursulin @ 2019-01-22  9:33 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 21/01/2019 22:21, Chris Wilson wrote:
> Replace the open-coding of advance with a call instead.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/selftests/mock_engine.c | 17 +++++++----------
>   1 file changed, 7 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/selftests/mock_engine.c b/drivers/gpu/drm/i915/selftests/mock_engine.c
> index 968a7e139a67..386dfa7e2d5c 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_engine.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_engine.c
> @@ -67,11 +67,10 @@ static struct mock_request *first_request(struct mock_engine *engine)
>   					link);
>   }
>   
> -static void advance(struct mock_engine *engine,
> -		    struct mock_request *request)
> +static void advance(struct mock_request *request)
>   {
>   	list_del_init(&request->link);
> -	mock_seqno_advance(&engine->base, request->base.global_seqno);
> +	mock_seqno_advance(request->base.engine, request->base.global_seqno);
>   }
>   
>   static void hw_delay_complete(struct timer_list *t)
> @@ -84,7 +83,7 @@ static void hw_delay_complete(struct timer_list *t)
>   	/* Timer fired, first request is complete */
>   	request = first_request(engine);
>   	if (request)
> -		advance(engine, request);
> +		advance(request);
>   
>   	/*
>   	 * Also immediately signal any subsequent 0-delay requests, but
> @@ -96,7 +95,7 @@ static void hw_delay_complete(struct timer_list *t)
>   			break;
>   		}
>   
> -		advance(engine, request);
> +		advance(request);
>   	}
>   
>   	spin_unlock(&engine->hw_lock);
> @@ -180,7 +179,7 @@ static void mock_submit_request(struct i915_request *request)
>   		if (mock->delay)
>   			mod_timer(&engine->hw_delay, jiffies + mock->delay);
>   		else
> -			advance(engine, mock);
> +			advance(mock);
>   	}
>   	spin_unlock_irq(&engine->hw_lock);
>   }
> @@ -240,10 +239,8 @@ void mock_engine_flush(struct intel_engine_cs *engine)
>   	del_timer_sync(&mock->hw_delay);
>   
>   	spin_lock_irq(&mock->hw_lock);
> -	list_for_each_entry_safe(request, rn, &mock->hw_queue, link) {
> -		list_del_init(&request->link);
> -		mock_seqno_advance(&mock->base, request->base.global_seqno);
> -	}
> +	list_for_each_entry_safe(request, rn, &mock->hw_queue, link)
> +		advance(request);
>   	spin_unlock_irq(&mock->hw_lock);
>   }
>   
> 

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 19/34] drm/i915: Tidy common test_bit probing of i915_request->fence.flags
  2019-01-21 22:21 ` [PATCH 19/34] drm/i915: Tidy common test_bit probing of i915_request->fence.flags Chris Wilson
@ 2019-01-22  9:35   ` Tvrtko Ursulin
  0 siblings, 0 replies; 89+ messages in thread
From: Tvrtko Ursulin @ 2019-01-22  9:35 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 21/01/2019 22:21, Chris Wilson wrote:
> A repeated pattern is to test the signaled bit of our
> request->fence.flags. Make this an inline to shorten a few lines and
> remove unnecessary line continuations.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_irq.c          | 3 +--
>   drivers/gpu/drm/i915/i915_request.c      | 2 +-
>   drivers/gpu/drm/i915/i915_request.h      | 5 +++++
>   drivers/gpu/drm/i915/intel_breadcrumbs.c | 3 +--
>   drivers/gpu/drm/i915/intel_lrc.c         | 2 +-
>   drivers/gpu/drm/i915/intel_pm.c          | 2 +-
>   drivers/gpu/drm/i915/intel_ringbuffer.c  | 3 +--
>   7 files changed, 11 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 1abfc3fa76ad..5fd5080c4ccb 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -1182,8 +1182,7 @@ static void notify_ring(struct intel_engine_cs *engine)
>   			struct i915_request *waiter = wait->request;
>   
>   			if (waiter &&
> -			    !test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
> -				      &waiter->fence.flags) &&
> +			    !i915_request_signaled(waiter) &&
>   			    intel_wait_check_request(wait, waiter))
>   				rq = i915_request_get(waiter);
>   
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 80232de8e2be..2721a356368f 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -198,7 +198,7 @@ static void __retire_engine_request(struct intel_engine_cs *engine,
>   	spin_unlock(&engine->timeline.lock);
>   
>   	spin_lock(&rq->lock);
> -	if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &rq->fence.flags))
> +	if (!i915_request_signaled(rq))
>   		dma_fence_signal_locked(&rq->fence);
>   	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &rq->fence.flags))
>   		intel_engine_cancel_signaling(rq);
> diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
> index d014b0605445..c0f084ca4f29 100644
> --- a/drivers/gpu/drm/i915/i915_request.h
> +++ b/drivers/gpu/drm/i915/i915_request.h
> @@ -280,6 +280,11 @@ long i915_request_wait(struct i915_request *rq,
>   #define I915_WAIT_ALL		BIT(3) /* used by i915_gem_object_wait() */
>   #define I915_WAIT_FOR_IDLE_BOOST BIT(4)
>   
> +static inline bool i915_request_signaled(const struct i915_request *rq)
> +{
> +	return test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &rq->fence.flags);
> +}
> +
>   static inline bool intel_engine_has_started(struct intel_engine_cs *engine,
>   					    u32 seqno);
>   static inline bool intel_engine_has_completed(struct intel_engine_cs *engine,
> diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> index 4fad93fe3678..b58915b8708b 100644
> --- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
> +++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> @@ -631,8 +631,7 @@ static int intel_breadcrumbs_signaler(void *arg)
>   				rq->signaling.wait.seqno = 0;
>   				__list_del_entry(&rq->signaling.link);
>   
> -				if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
> -					      &rq->fence.flags)) {
> +				if (!i915_request_signaled(rq)) {
>   					list_add_tail(&rq->signaling.link,
>   						      &list);
>   					i915_request_get(rq);
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index bc65d8006e16..464dd309fa99 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -855,7 +855,7 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine)
>   	list_for_each_entry(rq, &engine->timeline.requests, link) {
>   		GEM_BUG_ON(!rq->global_seqno);
>   
> -		if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &rq->fence.flags))
> +		if (i915_request_signaled(rq))
>   			continue;
>   
>   		dma_fence_set_error(&rq->fence, -EIO);
> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> index 8b63afa3a221..fdc28a3d2936 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -6662,7 +6662,7 @@ void gen6_rps_boost(struct i915_request *rq,
>   	if (!rps->enabled)
>   		return;
>   
> -	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &rq->fence.flags))
> +	if (i915_request_signaled(rq))
>   		return;
>   
>   	/* Serializes with i915_request_retire() */
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 66dc8e2fa353..bc620ae297b4 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -876,8 +876,7 @@ static void cancel_requests(struct intel_engine_cs *engine)
>   	list_for_each_entry(request, &engine->timeline.requests, link) {
>   		GEM_BUG_ON(!request->global_seqno);
>   
> -		if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
> -			     &request->fence.flags))
> +		if (i915_request_signaled(request))
>   			continue;
>   
>   		dma_fence_set_error(&request->fence, -EIO);
> 

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 23/34] drm/i915: Share per-timeline HWSP using a slab suballocator
  2019-01-21 22:21 ` [PATCH 23/34] drm/i915: Share per-timeline HWSP using a slab suballocator Chris Wilson
@ 2019-01-22 10:47   ` Tvrtko Ursulin
  2019-01-22 11:12     ` Chris Wilson
  0 siblings, 1 reply; 89+ messages in thread
From: Tvrtko Ursulin @ 2019-01-22 10:47 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 21/01/2019 22:21, Chris Wilson wrote:
> If we restrict ourselves to only using a cacheline for each timeline's
> HWSP (we could go smaller, but want to avoid needlessly polluting
> cachelines on different engines between different contexts), then we can
> suballocate a single 4k page into 64 different timeline HWSP. By
> treating each fresh allocation as a slab of 64 entries, we can keep it
> around for the next 64 allocation attempts until we need to refresh the
> slab cache.
> 
> John Harrison noted the issue of fragmentation leading to the same worst
> case performance of one page per timeline as before, which can be
> mitigated by adopting a freelist.
> 
> v2: Keep all partially allocated HWSP on a freelist
> 
> This is still without migration, so it is possible for the system to end
> up with each timeline in its own page, but we ensure that no new
> allocation would needlessly allocate a fresh page!
> 
> v3: Throw a selftest at the allocator to try and catch invalid cacheline
> reuse.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: John Harrison <John.C.Harrison@Intel.com>
> ---
>   drivers/gpu/drm/i915/i915_drv.h               |   4 +
>   drivers/gpu/drm/i915/i915_timeline.c          | 117 ++++++++++++---
>   drivers/gpu/drm/i915/i915_timeline.h          |   1 +
>   drivers/gpu/drm/i915/i915_vma.h               |  12 ++
>   drivers/gpu/drm/i915/selftests/i915_random.c  |  33 ++++-
>   drivers/gpu/drm/i915/selftests/i915_random.h  |   3 +
>   .../gpu/drm/i915/selftests/i915_timeline.c    | 140 ++++++++++++++++++
>   7 files changed, 282 insertions(+), 28 deletions(-)
> 
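
As a quick sanity check of the numbers in the commit message (my own
back-of-the-envelope arithmetic, not from the patch): with CACHELINE_BYTES
= 64 and 4K pages, PAGE_SIZE / CACHELINE_BYTES = 4096 / 64 = 64 timeline
slots per page, which is exactly what a single u64 free_bitmap (and the
BUILD_BUG_ON(BITS_PER_TYPE(u64) * CACHELINE_BYTES > PAGE_SIZE) below) can
track.
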
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 364067f811f7..c00eaf2889fb 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1978,6 +1978,10 @@ struct drm_i915_private {
>   		struct i915_gt_timelines {
>   			struct mutex mutex; /* protects list, tainted by GPU */
>   			struct list_head list;
> +
> +			/* Pack multiple timelines' seqnos into the same page */
> +			spinlock_t hwsp_lock;
> +			struct list_head hwsp_free_list;
>   		} timelines;
>   
>   		struct list_head active_rings;
> diff --git a/drivers/gpu/drm/i915/i915_timeline.c b/drivers/gpu/drm/i915/i915_timeline.c
> index 8d5792311a8f..69ee33dfa340 100644
> --- a/drivers/gpu/drm/i915/i915_timeline.c
> +++ b/drivers/gpu/drm/i915/i915_timeline.c
> @@ -9,6 +9,12 @@
>   #include "i915_timeline.h"
>   #include "i915_syncmap.h"
>   
> +struct i915_timeline_hwsp {
> +	struct i915_vma *vma;
> +	struct list_head free_link;
> +	u64 free_bitmap;
> +};
> +
>   static struct i915_vma *__hwsp_alloc(struct drm_i915_private *i915)
>   {
>   	struct drm_i915_gem_object *obj;
> @@ -27,28 +33,92 @@ static struct i915_vma *__hwsp_alloc(struct drm_i915_private *i915)
>   	return vma;
>   }
>   
> -static int hwsp_alloc(struct i915_timeline *timeline)
> +static struct i915_vma *
> +hwsp_alloc(struct i915_timeline *timeline, int *offset)
>   {
> -	struct i915_vma *vma;
> +	struct drm_i915_private *i915 = timeline->i915;
> +	struct i915_gt_timelines *gt = &i915->gt.timelines;
> +	struct i915_timeline_hwsp *hwsp;
> +	int cacheline;
>   
> -	vma = __hwsp_alloc(timeline->i915);
> -	if (IS_ERR(vma))
> -		return PTR_ERR(vma);
> +	BUILD_BUG_ON(BITS_PER_TYPE(u64) * CACHELINE_BYTES > PAGE_SIZE);
>   
> -	timeline->hwsp_ggtt = vma;
> -	timeline->hwsp_offset = 0;
> +	spin_lock(&gt->hwsp_lock);
>   
> -	return 0;
> +	/* hwsp_free_list only contains HWSP that have available cachelines */
> +	hwsp = list_first_entry_or_null(&gt->hwsp_free_list,
> +					typeof(*hwsp), free_link);
> +	if (!hwsp) {
> +		struct i915_vma *vma;
> +
> +		spin_unlock(&gt->hwsp_lock);
> +
> +		hwsp = kmalloc(sizeof(*hwsp), GFP_KERNEL);
> +		if (!hwsp)
> +			return ERR_PTR(-ENOMEM);
> +
> +		vma = __hwsp_alloc(i915);
> +		if (IS_ERR(vma)) {
> +			kfree(hwsp);
> +			return vma;
> +		}
> +
> +		vma->private = hwsp;
> +		hwsp->vma = vma;
> +		hwsp->free_bitmap = ~0ull;
> +
> +		spin_lock(&gt->hwsp_lock);
> +		list_add(&hwsp->free_link, &gt->hwsp_free_list);
> +	}
> +
> +	GEM_BUG_ON(!hwsp->free_bitmap);
> +	cacheline = __ffs64(hwsp->free_bitmap);
> +	hwsp->free_bitmap &= ~BIT_ULL(cacheline);
> +	if (!hwsp->free_bitmap)
> +		list_del(&hwsp->free_link);
> +
> +	spin_unlock(&gt->hwsp_lock);
> +
> +	GEM_BUG_ON(hwsp->vma->private != hwsp);
> +
> +	*offset = cacheline * CACHELINE_BYTES;
> +	return hwsp->vma;
> +}
> +
> +static void hwsp_free(struct i915_timeline *timeline)
> +{
> +	struct i915_gt_timelines *gt = &timeline->i915->gt.timelines;
> +	struct i915_timeline_hwsp *hwsp;
> +
> +	hwsp = i915_timeline_hwsp(timeline);
> +	if (!hwsp) /* leave global HWSP alone! */

Later you add i915_timeline_is_global, so you could use it here.

> +		return;
> +
> +	spin_lock(&gt->hwsp_lock);
> +
> +	/* As a cacheline becomes available, publish the HWSP on the freelist */

Thank you! :)

> +	if (!hwsp->free_bitmap)
> +		list_add_tail(&hwsp->free_link, &gt->hwsp_free_list);
> +
> +	hwsp->free_bitmap |= BIT_ULL(timeline->hwsp_offset / CACHELINE_BYTES);
> +
> +	/* And if no one is left using it, give the page back to the system */
> +	if (hwsp->free_bitmap == ~0ull) {
> +		i915_vma_put(hwsp->vma);
> +		list_del(&hwsp->free_link);
> +		kfree(hwsp);
> +	}
> +
> +	spin_unlock(&gt->hwsp_lock);
>   }
>   
>   int i915_timeline_init(struct drm_i915_private *i915,
>   		       struct i915_timeline *timeline,
>   		       const char *name,
> -		       struct i915_vma *global_hwsp)
> +		       struct i915_vma *hwsp)
>   {
>   	struct i915_gt_timelines *gt = &i915->gt.timelines;
>   	void *vaddr;
> -	int err;
>   
>   	/*
>   	 * Ideally we want a set of engines on a single leaf as we expect
> @@ -64,18 +134,18 @@ int i915_timeline_init(struct drm_i915_private *i915,
>   	timeline->name = name;
>   	timeline->pin_count = 0;
>   
> -	if (global_hwsp) {
> -		timeline->hwsp_ggtt = i915_vma_get(global_hwsp);
> -		timeline->hwsp_offset = I915_GEM_HWS_SEQNO_ADDR;
> -	} else {
> -		err = hwsp_alloc(timeline);
> -		if (err)
> -			return err;
> +	timeline->hwsp_offset = I915_GEM_HWS_SEQNO_ADDR;

Could be clearer to put this on the else branch.

> +	if (!hwsp) {
> +		hwsp = hwsp_alloc(timeline, &timeline->hwsp_offset);
> +		if (IS_ERR(hwsp))
> +			return PTR_ERR(hwsp);
>   	}
> +	timeline->hwsp_ggtt = i915_vma_get(hwsp);
>   
> -	vaddr = i915_gem_object_pin_map(timeline->hwsp_ggtt->obj, I915_MAP_WB);
> +	vaddr = i915_gem_object_pin_map(hwsp->obj, I915_MAP_WB);
>   	if (IS_ERR(vaddr)) {
> -		i915_vma_put(timeline->hwsp_ggtt);
> +		hwsp_free(timeline);
> +		i915_vma_put(hwsp);
>   		return PTR_ERR(vaddr);
>   	}
>   
> @@ -105,6 +175,9 @@ void i915_timelines_init(struct drm_i915_private *i915)
>   	mutex_init(&gt->mutex);
>   	INIT_LIST_HEAD(&gt->list);
>   
> +	spin_lock_init(&gt->hwsp_lock);
> +	INIT_LIST_HEAD(&gt->hwsp_free_list);
> +
>   	/* via i915_gem_wait_for_idle() */
>   	i915_gem_shrinker_taints_mutex(i915, &gt->mutex);
>   }
> @@ -144,12 +217,13 @@ void i915_timeline_fini(struct i915_timeline *timeline)
>   	GEM_BUG_ON(timeline->pin_count);
>   	GEM_BUG_ON(!list_empty(&timeline->requests));
>   
> -	i915_syncmap_free(&timeline->sync);
> -
>   	mutex_lock(&gt->mutex);
>   	list_del(&timeline->link);
>   	mutex_unlock(&gt->mutex);
>   
> +	i915_syncmap_free(&timeline->sync);
> +	hwsp_free(timeline);
> +
>   	i915_gem_object_unpin_map(timeline->hwsp_ggtt->obj);
>   	i915_vma_put(timeline->hwsp_ggtt);
>   }
> @@ -226,6 +300,7 @@ void i915_timelines_fini(struct drm_i915_private *i915)
>   	struct i915_gt_timelines *gt = &i915->gt.timelines;
>   
>   	GEM_BUG_ON(!list_empty(&gt->list));
> +	GEM_BUG_ON(!list_empty(&gt->hwsp_free_list));
>   
>   	mutex_destroy(&gt->mutex);
>   }
> diff --git a/drivers/gpu/drm/i915/i915_timeline.h b/drivers/gpu/drm/i915/i915_timeline.h
> index 0c3739d53d79..ab736e2e5707 100644
> --- a/drivers/gpu/drm/i915/i915_timeline.h
> +++ b/drivers/gpu/drm/i915/i915_timeline.h
> @@ -33,6 +33,7 @@
>   #include "i915_utils.h"
>   
>   struct i915_vma;
> +struct i915_timeline_hwsp;
>   
>   struct i915_timeline {
>   	u64 fence_context;
> diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
> index 5793abe509a2..46eb818ed309 100644
> --- a/drivers/gpu/drm/i915/i915_vma.h
> +++ b/drivers/gpu/drm/i915/i915_vma.h
> @@ -226,6 +226,18 @@ static inline u32 i915_ggtt_pin_bias(struct i915_vma *vma)
>   	return i915_vm_to_ggtt(vma->vm)->pin_bias;
>   }
>   
> +/* XXX inline spaghetti */
> +static inline struct i915_timeline_hwsp *
> +i915_timeline_hwsp(const struct i915_timeline *tl)
> +{
> +	return tl->hwsp_ggtt->private;
> +}
> +
> +static inline bool i915_timeline_is_global(const struct i915_timeline *tl)
> +{
> +	return !i915_timeline_hwsp(tl);
> +}
> +
>   static inline struct i915_vma *i915_vma_get(struct i915_vma *vma)
>   {
>   	i915_gem_object_get(vma->obj);
> diff --git a/drivers/gpu/drm/i915/selftests/i915_random.c b/drivers/gpu/drm/i915/selftests/i915_random.c
> index 1f415ce47018..716a3f19f030 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_random.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_random.c
> @@ -41,18 +41,37 @@ u64 i915_prandom_u64_state(struct rnd_state *rnd)
>   	return x;
>   }
>   
> -void i915_random_reorder(unsigned int *order, unsigned int count,
> -			 struct rnd_state *state)
> +void i915_prandom_shuffle(void *arr, size_t elsz, size_t count,
> +			  struct rnd_state *state)
>   {
> -	unsigned int i, j;
> +	char stack[128];
> +
> +	if (WARN_ON(elsz > sizeof(stack) || count > U32_MAX))

I wonder if the elsz > sizeof(stack) check would work as a BUILD_BUG_ON if
elsz were marked as const. It seems to be const at both call sites.

A step further would be sizing the stack by it, but... that's some GCC
extension we should not use, right?

> +		return;
> +
> +	if (!elsz || !count)
> +		return;
> +
> +	/* Fisher-Yates shuffle courtesy of Knuth */
> +	while (--count) {
> +		size_t swp;
> +
> +		swp = i915_prandom_u32_max_state(count + 1, state);
> +		if (swp == count)
> +			continue;
>   
> -	for (i = 0; i < count; i++) {
> -		BUILD_BUG_ON(sizeof(unsigned int) > sizeof(u32));
> -		j = i915_prandom_u32_max_state(count, state);
> -		swap(order[i], order[j]);
> +		memcpy(stack, arr + count * elsz, elsz);
> +		memcpy(arr + count * elsz, arr + swp * elsz, elsz);
> +		memcpy(arr + swp * elsz, stack, elsz);
>   	}
>   }
>   
> +void i915_random_reorder(unsigned int *order, unsigned int count,
> +			 struct rnd_state *state)
> +{
> +	i915_prandom_shuffle(order, sizeof(*order), count, state);
> +}
> +
>   unsigned int *i915_random_order(unsigned int count, struct rnd_state *state)
>   {
>   	unsigned int *order, i;
> diff --git a/drivers/gpu/drm/i915/selftests/i915_random.h b/drivers/gpu/drm/i915/selftests/i915_random.h
> index 7dffedc501ca..8e1ff9c105b6 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_random.h
> +++ b/drivers/gpu/drm/i915/selftests/i915_random.h
> @@ -54,4 +54,7 @@ void i915_random_reorder(unsigned int *order,
>   			 unsigned int count,
>   			 struct rnd_state *state);
>   
> +void i915_prandom_shuffle(void *arr, size_t elsz, size_t count,
> +			  struct rnd_state *state);
> +
>   #endif /* !__I915_SELFTESTS_RANDOM_H__ */
> diff --git a/drivers/gpu/drm/i915/selftests/i915_timeline.c b/drivers/gpu/drm/i915/selftests/i915_timeline.c
> index 1585b614510d..1cecc71fba74 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_timeline.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_timeline.c
> @@ -4,6 +4,8 @@
>    * Copyright © 2017-2018 Intel Corporation
>    */
>   
> +#include <linux/prime_numbers.h>
> +
>   #include "../i915_selftest.h"
>   #include "i915_random.h"
>   
> @@ -11,6 +13,143 @@
>   #include "mock_gem_device.h"
>   #include "mock_timeline.h"
>   
> +static struct page *hwsp_page(struct i915_timeline *tl)
> +{
> +	struct drm_i915_gem_object *obj = tl->hwsp_ggtt->obj;
> +
> +	GEM_BUG_ON(!i915_gem_object_has_pinned_pages(obj));
> +	return sg_page(obj->mm.pages->sgl);
> +}
> +
> +static unsigned long hwsp_cacheline(struct i915_timeline *tl)
> +{
> +	unsigned long address = (unsigned long)page_address(hwsp_page(tl));
> +
> +	return (address + tl->hwsp_offset) / CACHELINE_BYTES;
> +}
> +
> +#define CACHELINES_PER_PAGE (PAGE_SIZE / CACHELINE_BYTES)
> +
> +struct mock_hwsp_freelist {
> +	struct drm_i915_private *i915;
> +	struct radix_tree_root cachelines;
> +	struct i915_timeline **history;
> +	unsigned long count, max;
> +	struct rnd_state prng;
> +};
> +
> +enum {
> +	SHUFFLE = BIT(0),
> +};
> +
> +static void __mock_hwsp_record(struct mock_hwsp_freelist *state,
> +			       unsigned int idx,
> +			       struct i915_timeline *tl)
> +{
> +	tl = xchg(&state->history[idx], tl);
> +	if (tl) {
> +		radix_tree_delete(&state->cachelines, hwsp_cacheline(tl));
> +		i915_timeline_put(tl);
> +	}
> +}
> +
> +static int __mock_hwsp_timeline(struct mock_hwsp_freelist *state,
> +				unsigned int count,
> +				unsigned int flags)
> +{
> +	struct i915_timeline *tl;
> +	unsigned int idx;
> +
> +	while (count--) {
> +		unsigned long cacheline;
> +		int err;
> +
> +		tl = i915_timeline_create(state->i915, "mock", NULL);
> +		if (IS_ERR(tl))
> +			return PTR_ERR(tl);
> +
> +		cacheline = hwsp_cacheline(tl);
> +		err = radix_tree_insert(&state->cachelines, cacheline, tl);
> +		if (err) {
> +			if (err == -EEXIST) {
> +				pr_err("HWSP cacheline %lu already used; duplicate allocation!\n",
> +				       cacheline);
> +			}

The radix tree is just to look up potential offset duplicates? I mean, to
avoid doing a linear search on the history array? If so, is it worth it,
since there aren't that many timelines used in the test? Could be, since
below I figured out the maximum number of timelines the test will use...
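
Something like this, say (a hypothetical, untested sketch of the linear
search, just to illustrate what I mean; not something I am asking for):

	unsigned long n;

	/* scan the live history for a cacheline we have already handed out */
	for (n = 0; n < min(state->count, state->max); n++) {
		if (state->history[n] &&
		    hwsp_cacheline(state->history[n]) == cacheline) {
			pr_err("HWSP cacheline %lu already used; duplicate allocation!\n",
			       cacheline);
			i915_timeline_put(tl);
			return -EEXIST;
		}
	}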

> +			i915_timeline_put(tl);
> +			return err;
> +		}
> +
> +		idx = state->count++ % state->max;
> +		__mock_hwsp_record(state, idx, tl);

This doesn't hit "recycling" of slots, right? The max count is 2 *
CACHELINES_PER_PAGE = 128 while state->max is 512.

> +	}
> +
> +	if (flags & SHUFFLE)
> +		i915_prandom_shuffle(state->history,
> +				     sizeof(*state->history),
> +				     min(state->count, state->max),
> +				     &state->prng);
> +
> +	count = i915_prandom_u32_max_state(min(state->count, state->max),
> +					   &state->prng);
> +	while (count--) {
> +		idx = --state->count % state->max;
> +		__mock_hwsp_record(state, idx, NULL);
> +	}

There are no allocations after shuffling, so the code path of allocating
from the hwsp free list will not be exercised, I think.

> +
> +	return 0;
> +}
> +
> +static int mock_hwsp_freelist(void *arg)
> +{
> +	struct mock_hwsp_freelist state;
> +	const struct {
> +		const char *name;
> +		unsigned int flags;
> +	} phases[] = {
> +		{ "linear", 0 },
> +		{ "shuffled", SHUFFLE },
> +		{ },
> +	}, *p;
> +	unsigned int na;
> +	int err = 0;
> +
> +	INIT_RADIX_TREE(&state.cachelines, GFP_KERNEL);
> +	state.prng = I915_RND_STATE_INITIALIZER(i915_selftest.random_seed);
> +
> +	state.i915 = mock_gem_device();
> +	if (!state.i915)
> +		return -ENOMEM;
> +
> +	/*
> +	 * Create a bunch of timelines and check that their HWSP do not overlap.
> +	 * Free some, and try again.
> +	 */
> +
> +	state.max = PAGE_SIZE / sizeof(*state.history);

So the maximum number of live timelines is 512 on 64-bit, which should
translate to at most 8 pages of hwsp backing store.
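
(Back-of-the-envelope, assuming 64-bit pointers, 4K pages and 64B cachelines:
state.max = PAGE_SIZE / sizeof(*state.history) = 4096 / 8 = 512 live
timelines, CACHELINES_PER_PAGE = 4096 / 64 = 64, so 512 / 64 = 8 HWSP pages
when fully packed.)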

> +	state.count = 0;
> +	state.history = kcalloc(state.max, sizeof(*state.history), GFP_KERNEL);
> +	if (!state.history) {
> +		err = -ENOMEM;
> +		goto out;
> +	}
> +
> +	for (p = phases; p->name; p++) {
> +		pr_debug("%s(%s)\n", __func__, p->name);
> +		for_each_prime_number_from(na, 1, 2 * CACHELINES_PER_PAGE) {
> +			err = __mock_hwsp_timeline(&state, na, p->flags);
> +			if (err)
> +				goto out;
> +		}
> +	}
> +
> +out:
> +	for (na = 0; na < state.max; na++)
> +		__mock_hwsp_record(&state, na, NULL);
> +	kfree(state.history);
> +	drm_dev_put(&state.i915->drm);
> +	return err;
> +}
> +
>   struct __igt_sync {
>   	const char *name;
>   	u32 seqno;
> @@ -260,6 +399,7 @@ static int bench_sync(void *arg)
>   int i915_timeline_mock_selftests(void)
>   {
>   	static const struct i915_subtest tests[] = {
> +		SUBTEST(mock_hwsp_freelist),
>   		SUBTEST(igt_sync),
>   		SUBTEST(bench_sync),
>   	};
> 

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 23/34] drm/i915: Share per-timeline HWSP using a slab suballocator
  2019-01-22 10:47   ` Tvrtko Ursulin
@ 2019-01-22 11:12     ` Chris Wilson
  2019-01-22 11:33       ` Tvrtko Ursulin
  0 siblings, 1 reply; 89+ messages in thread
From: Chris Wilson @ 2019-01-22 11:12 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-01-22 10:47:11)
> 
> On 21/01/2019 22:21, Chris Wilson wrote:
> > If we restrict ourselves to only using a cacheline for each timeline's
> > HWSP (we could go smaller, but want to avoid needless polluting
> > cachelines on different engines between different contexts), then we can
> > suballocate a single 4k page into 64 different timeline HWSP. By
> > treating each fresh allocation as a slab of 64 entries, we can keep it
> > around for the next 64 allocation attempts until we need to refresh the
> > slab cache.
> > 
> > John Harrison noted the issue of fragmentation leading to the same worst
> > case performance of one page per timeline as before, which can be
> > mitigated by adopting a freelist.
> > 
> > v2: Keep all partially allocated HWSP on a freelist
> > 
> > This is still without migration, so it is possible for the system to end
> > up with each timeline in its own page, but we ensure that no new
> > allocation would needless allocate a fresh page!
> > 
> > v3: Throw a selftest at the allocator to try and catch invalid cacheline
> > reuse.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: John Harrison <John.C.Harrison@Intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_drv.h               |   4 +
> >   drivers/gpu/drm/i915/i915_timeline.c          | 117 ++++++++++++---
> >   drivers/gpu/drm/i915/i915_timeline.h          |   1 +
> >   drivers/gpu/drm/i915/i915_vma.h               |  12 ++
> >   drivers/gpu/drm/i915/selftests/i915_random.c  |  33 ++++-
> >   drivers/gpu/drm/i915/selftests/i915_random.h  |   3 +
> >   .../gpu/drm/i915/selftests/i915_timeline.c    | 140 ++++++++++++++++++
> >   7 files changed, 282 insertions(+), 28 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index 364067f811f7..c00eaf2889fb 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -1978,6 +1978,10 @@ struct drm_i915_private {
> >               struct i915_gt_timelines {
> >                       struct mutex mutex; /* protects list, tainted by GPU */
> >                       struct list_head list;
> > +
> > +                     /* Pack multiple timelines' seqnos into the same page */
> > +                     spinlock_t hwsp_lock;
> > +                     struct list_head hwsp_free_list;
> >               } timelines;
> >   
> >               struct list_head active_rings;
> > diff --git a/drivers/gpu/drm/i915/i915_timeline.c b/drivers/gpu/drm/i915/i915_timeline.c
> > index 8d5792311a8f..69ee33dfa340 100644
> > --- a/drivers/gpu/drm/i915/i915_timeline.c
> > +++ b/drivers/gpu/drm/i915/i915_timeline.c
> > @@ -9,6 +9,12 @@
> >   #include "i915_timeline.h"
> >   #include "i915_syncmap.h"
> >   
> > +struct i915_timeline_hwsp {
> > +     struct i915_vma *vma;
> > +     struct list_head free_link;
> > +     u64 free_bitmap;
> > +};
> > +
> >   static struct i915_vma *__hwsp_alloc(struct drm_i915_private *i915)
> >   {
> >       struct drm_i915_gem_object *obj;
> > @@ -27,28 +33,92 @@ static struct i915_vma *__hwsp_alloc(struct drm_i915_private *i915)
> >       return vma;
> >   }
> >   
> > -static int hwsp_alloc(struct i915_timeline *timeline)
> > +static struct i915_vma *
> > +hwsp_alloc(struct i915_timeline *timeline, int *offset)
> >   {
> > -     struct i915_vma *vma;
> > +     struct drm_i915_private *i915 = timeline->i915;
> > +     struct i915_gt_timelines *gt = &i915->gt.timelines;
> > +     struct i915_timeline_hwsp *hwsp;
> > +     int cacheline;
> >   
> > -     vma = __hwsp_alloc(timeline->i915);
> > -     if (IS_ERR(vma))
> > -             return PTR_ERR(vma);
> > +     BUILD_BUG_ON(BITS_PER_TYPE(u64) * CACHELINE_BYTES > PAGE_SIZE);
> >   
> > -     timeline->hwsp_ggtt = vma;
> > -     timeline->hwsp_offset = 0;
> > +     spin_lock(&gt->hwsp_lock);
> >   
> > -     return 0;
> > +     /* hwsp_free_list only contains HWSP that have available cachelines */
> > +     hwsp = list_first_entry_or_null(&gt->hwsp_free_list,
> > +                                     typeof(*hwsp), free_link);
> > +     if (!hwsp) {
> > +             struct i915_vma *vma;
> > +
> > +             spin_unlock(&gt->hwsp_lock);
> > +
> > +             hwsp = kmalloc(sizeof(*hwsp), GFP_KERNEL);
> > +             if (!hwsp)
> > +                     return ERR_PTR(-ENOMEM);
> > +
> > +             vma = __hwsp_alloc(i915);
> > +             if (IS_ERR(vma)) {
> > +                     kfree(hwsp);
> > +                     return vma;
> > +             }
> > +
> > +             vma->private = hwsp;
> > +             hwsp->vma = vma;
> > +             hwsp->free_bitmap = ~0ull;
> > +
> > +             spin_lock(&gt->hwsp_lock);
> > +             list_add(&hwsp->free_link, &gt->hwsp_free_list);
> > +     }
> > +
> > +     GEM_BUG_ON(!hwsp->free_bitmap);
> > +     cacheline = __ffs64(hwsp->free_bitmap);
> > +     hwsp->free_bitmap &= ~BIT_ULL(cacheline);
> > +     if (!hwsp->free_bitmap)
> > +             list_del(&hwsp->free_link);
> > +
> > +     spin_unlock(&gt->hwsp_lock);
> > +
> > +     GEM_BUG_ON(hwsp->vma->private != hwsp);
> > +
> > +     *offset = cacheline * CACHELINE_BYTES;
> > +     return hwsp->vma;
> > +}
> > +
> > +static void hwsp_free(struct i915_timeline *timeline)
> > +{
> > +     struct i915_gt_timelines *gt = &timeline->i915->gt.timelines;
> > +     struct i915_timeline_hwsp *hwsp;
> > +
> > +     hwsp = i915_timeline_hwsp(timeline);
> > +     if (!hwsp) /* leave global HWSP alone! */
> 
> Later you add i915_timeline_is_global so could use it here.

Difference is that we want the hwsp for use in this function?

if (i915_timeline_is_global(timeline))
	return;

hwsp = i915_timeline_hwsp(timeline);

I suppose it has the feeling of being more descriptive, and the compiler
should be smart enough to dtrt.
> 
> > +             return;
> > +
> > +     spin_lock(&gt->hwsp_lock);
> > +
> > +     /* As a cacheline becomes available, publish the HWSP on the freelist */
> 
> Thank you! :)
> 
> > +     if (!hwsp->free_bitmap)
> > +             list_add_tail(&hwsp->free_link, &gt->hwsp_free_list);
> > +
> > +     hwsp->free_bitmap |= BIT_ULL(timeline->hwsp_offset / CACHELINE_BYTES);
> > +
> > +     /* And if no one is left using it, give the page back to the system */
> > +     if (hwsp->free_bitmap == ~0ull) {
> > +             i915_vma_put(hwsp->vma);
> > +             list_del(&hwsp->free_link);
> > +             kfree(hwsp);
> > +     }
> > +
> > +     spin_unlock(&gt->hwsp_lock);
> >   }
> >   
> >   int i915_timeline_init(struct drm_i915_private *i915,
> >                      struct i915_timeline *timeline,
> >                      const char *name,
> > -                    struct i915_vma *global_hwsp)
> > +                    struct i915_vma *hwsp)
> >   {
> >       struct i915_gt_timelines *gt = &i915->gt.timelines;
> >       void *vaddr;
> > -     int err;
> >   
> >       /*
> >        * Ideally we want a set of engines on a single leaf as we expect
> > @@ -64,18 +134,18 @@ int i915_timeline_init(struct drm_i915_private *i915,
> >       timeline->name = name;
> >       timeline->pin_count = 0;
> >   
> > -     if (global_hwsp) {
> > -             timeline->hwsp_ggtt = i915_vma_get(global_hwsp);
> > -             timeline->hwsp_offset = I915_GEM_HWS_SEQNO_ADDR;
> > -     } else {
> > -             err = hwsp_alloc(timeline);
> > -             if (err)
> > -                     return err;
> > +     timeline->hwsp_offset = I915_GEM_HWS_SEQNO_ADDR;
> 
> Could be clearer to put this on the else branch.

Hey! I rewrote this because I thought it was tidier without the else and
repeated i915_vma_get() :)

> > -void i915_random_reorder(unsigned int *order, unsigned int count,
> > -                      struct rnd_state *state)
> > +void i915_prandom_shuffle(void *arr, size_t elsz, size_t count,
> > +                       struct rnd_state *state)
> >   {
> > -     unsigned int i, j;
> > +     char stack[128];
> > +
> > +     if (WARN_ON(elsz > sizeof(stack) || count > U32_MAX))
> 
> I wonder if the elsz > sizeof(stack) would work as a BUILD_BUG_ON if 
> elsz was marked as const. Seems to be const at both call sites.

BUILD_BUG_ON is cpp-ish, so it needs to be provably constant, i.e. constant
within the same translation unit.
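
Roughly like this minimal sketch of mine (the shuffle() name and body are
made up for illustration, not code from the series):

	static void shuffle(void *arr, size_t elsz, size_t count)
	{
		char stack[128];

		/*
		 * BUILD_BUG_ON(elsz > sizeof(stack));
		 *
		 * would not build here: inside the callee, elsz is not an
		 * integer constant expression, even though every call site
		 * happens to pass a constant sizeof(). So the runtime guard
		 * is the best we can do:
		 */
		if (WARN_ON(elsz > sizeof(stack)))
			return;

		/* ... use stack[] as the element swap buffer ... */
	}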

> Step further sizing the stack by it, but.. that's some GCC extension we 
> should not use right?

Right. No char stack[elsz] for us. The alternative is to make this into a
macro-builder and have a custom shuffle for every type. For selftesting,
it's just not worth it. (Until we have an example where we are severely
limited in our testing by how long it takes to shuffle numbers... But we
can probably just change any test to shuffle less.)
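
The macro-builder would be something along these lines (a hypothetical
sketch; DEFINE_I915_SHUFFLE and i915_shuffle_##name are made-up names, we
never wrote this):

	/* Hypothetical per-type Fisher-Yates shuffle generator */
	#define DEFINE_I915_SHUFFLE(name, T)				\
	static void i915_shuffle_##name(T *arr, size_t count,		\
					struct rnd_state *state)	\
	{								\
		if (!count) /* assumes count also fits in a u32 */	\
			return;						\
									\
		while (--count) {					\
			size_t swp;					\
									\
			swp = i915_prandom_u32_max_state(count + 1,	\
							 state);	\
			swap(arr[count], arr[swp]);			\
		}							\
	}

	DEFINE_I915_SHUFFLE(uint, unsigned int)	/* i915_shuffle_uint() */
	DEFINE_I915_SHUFFLE(ptr, void *)	/* i915_shuffle_ptr() */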

> > +static int __mock_hwsp_timeline(struct mock_hwsp_freelist *state,
> > +                             unsigned int count,
> > +                             unsigned int flags)
> > +{
> > +     struct i915_timeline *tl;
> > +     unsigned int idx;
> > +
> > +     while (count--) {
> > +             unsigned long cacheline;
> > +             int err;
> > +
> > +             tl = i915_timeline_create(state->i915, "mock", NULL);
> > +             if (IS_ERR(tl))
> > +                     return PTR_ERR(tl);
> > +
> > +             cacheline = hwsp_cacheline(tl);
> > +             err = radix_tree_insert(&state->cachelines, cacheline, tl);
> > +             if (err) {
> > +                     if (err == -EEXIST) {
> > +                             pr_err("HWSP cacheline %lu already used; duplicate allocation!\n",
> > +                                    cacheline);
> > +                     }
> 
> Radix tree is just to lookup potential offset duplicates? I mean, to 
> avoid doing a linear search on the history array? If so is it worth it 
> since there aren't that many timelines used in test? Could be since 
> below I figured the maximum number of timelines test will use...

Radix tree because that's where I started with the test: I want to
detect duplicate cachelines.

> > +                     i915_timeline_put(tl);
> > +                     return err;
> > +             }
> > +
> > +             idx = state->count++ % state->max;
> > +             __mock_hwsp_record(state, idx, tl);
> 
> This doesn't hit "recycling" of slots right? Max count is 2 * 
> CACHELINES_PER_PAGE = 128 while state->max is 512.

The history is kept over the different phases. We add a block, remove a
smaller block; repeat. My thinking was that it would create a more
interesting freelist pattern.

> > +     }
> > +
> > +     if (flags & SHUFFLE)
> > +             i915_prandom_shuffle(state->history,
> > +                                  sizeof(*state->history),
> > +                                  min(state->count, state->max),
> > +                                  &state->prng);
> > +
> > +     count = i915_prandom_u32_max_state(min(state->count, state->max),
> > +                                        &state->prng);
> > +     while (count--) {
> > +             idx = --state->count % state->max;
> > +             __mock_hwsp_record(state, idx, NULL);
> > +     }
> 
> There is no allocations after shuffling, so the code path of allocating 
> from hwsp free list will not be exercised I think.

We keep history from one iteration to the next. I think if we added a loop
here (____mock_hwsp_timeline!) you would be happier overall.

> > +     return 0;
> > +}
> > +
> > +static int mock_hwsp_freelist(void *arg)
> > +{
> > +     struct mock_hwsp_freelist state;
> > +     const struct {
> > +             const char *name;
> > +             unsigned int flags;
> > +     } phases[] = {
> > +             { "linear", 0 },
> > +             { "shuffled", SHUFFLE },
> > +             { },
> > +     }, *p;
> > +     unsigned int na;
> > +     int err = 0;
> > +
> > +     INIT_RADIX_TREE(&state.cachelines, GFP_KERNEL);
> > +     state.prng = I915_RND_STATE_INITIALIZER(i915_selftest.random_seed);
> > +
> > +     state.i915 = mock_gem_device();
> > +     if (!state.i915)
> > +             return -ENOMEM;
> > +
> > +     /*
> > +      * Create a bunch of timelines and check that their HWSP do not overlap.
> > +      * Free some, and try again.
> > +      */
> > +
> > +     state.max = PAGE_SIZE / sizeof(*state.history);
> 
> So maximum live number of timelines is 512 on 64-bit which should 
> translate to 8 max pages of hwsp backing store.

Do you feel like we need more or less? I think that's a reasonable
number; it means we should eventually have some pages on the freelist.
And not so many that reuse is sporadic (although the allocator is geared
towards keeping the same page for reuse until full).
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 23/34] drm/i915: Share per-timeline HWSP using a slab suballocator
  2019-01-22 11:12     ` Chris Wilson
@ 2019-01-22 11:33       ` Tvrtko Ursulin
  0 siblings, 0 replies; 89+ messages in thread
From: Tvrtko Ursulin @ 2019-01-22 11:33 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 22/01/2019 11:12, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2019-01-22 10:47:11)
>>
>> On 21/01/2019 22:21, Chris Wilson wrote:
>>> If we restrict ourselves to only using a cacheline for each timeline's
>>> HWSP (we could go smaller, but want to avoid needless polluting
>>> cachelines on different engines between different contexts), then we can
>>> suballocate a single 4k page into 64 different timeline HWSP. By
>>> treating each fresh allocation as a slab of 64 entries, we can keep it
>>> around for the next 64 allocation attempts until we need to refresh the
>>> slab cache.
>>>
>>> John Harrison noted the issue of fragmentation leading to the same worst
>>> case performance of one page per timeline as before, which can be
>>> mitigated by adopting a freelist.
>>>
>>> v2: Keep all partially allocated HWSP on a freelist
>>>
>>> This is still without migration, so it is possible for the system to end
>>> up with each timeline in its own page, but we ensure that no new
>>> allocation would needless allocate a fresh page!
>>>
>>> v3: Throw a selftest at the allocator to try and catch invalid cacheline
>>> reuse.
>>>
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> Cc: John Harrison <John.C.Harrison@Intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/i915_drv.h               |   4 +
>>>    drivers/gpu/drm/i915/i915_timeline.c          | 117 ++++++++++++---
>>>    drivers/gpu/drm/i915/i915_timeline.h          |   1 +
>>>    drivers/gpu/drm/i915/i915_vma.h               |  12 ++
>>>    drivers/gpu/drm/i915/selftests/i915_random.c  |  33 ++++-
>>>    drivers/gpu/drm/i915/selftests/i915_random.h  |   3 +
>>>    .../gpu/drm/i915/selftests/i915_timeline.c    | 140 ++++++++++++++++++
>>>    7 files changed, 282 insertions(+), 28 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>>> index 364067f811f7..c00eaf2889fb 100644
>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>> @@ -1978,6 +1978,10 @@ struct drm_i915_private {
>>>                struct i915_gt_timelines {
>>>                        struct mutex mutex; /* protects list, tainted by GPU */
>>>                        struct list_head list;
>>> +
>>> +                     /* Pack multiple timelines' seqnos into the same page */
>>> +                     spinlock_t hwsp_lock;
>>> +                     struct list_head hwsp_free_list;
>>>                } timelines;
>>>    
>>>                struct list_head active_rings;
>>> diff --git a/drivers/gpu/drm/i915/i915_timeline.c b/drivers/gpu/drm/i915/i915_timeline.c
>>> index 8d5792311a8f..69ee33dfa340 100644
>>> --- a/drivers/gpu/drm/i915/i915_timeline.c
>>> +++ b/drivers/gpu/drm/i915/i915_timeline.c
>>> @@ -9,6 +9,12 @@
>>>    #include "i915_timeline.h"
>>>    #include "i915_syncmap.h"
>>>    
>>> +struct i915_timeline_hwsp {
>>> +     struct i915_vma *vma;
>>> +     struct list_head free_link;
>>> +     u64 free_bitmap;
>>> +};
>>> +
>>>    static struct i915_vma *__hwsp_alloc(struct drm_i915_private *i915)
>>>    {
>>>        struct drm_i915_gem_object *obj;
>>> @@ -27,28 +33,92 @@ static struct i915_vma *__hwsp_alloc(struct drm_i915_private *i915)
>>>        return vma;
>>>    }
>>>    
>>> -static int hwsp_alloc(struct i915_timeline *timeline)
>>> +static struct i915_vma *
>>> +hwsp_alloc(struct i915_timeline *timeline, int *offset)
>>>    {
>>> -     struct i915_vma *vma;
>>> +     struct drm_i915_private *i915 = timeline->i915;
>>> +     struct i915_gt_timelines *gt = &i915->gt.timelines;
>>> +     struct i915_timeline_hwsp *hwsp;
>>> +     int cacheline;
>>>    
>>> -     vma = __hwsp_alloc(timeline->i915);
>>> -     if (IS_ERR(vma))
>>> -             return PTR_ERR(vma);
>>> +     BUILD_BUG_ON(BITS_PER_TYPE(u64) * CACHELINE_BYTES > PAGE_SIZE);
>>>    
>>> -     timeline->hwsp_ggtt = vma;
>>> -     timeline->hwsp_offset = 0;
>>> +     spin_lock(&gt->hwsp_lock);
>>>    
>>> -     return 0;
>>> +     /* hwsp_free_list only contains HWSP that have available cachelines */
>>> +     hwsp = list_first_entry_or_null(&gt->hwsp_free_list,
>>> +                                     typeof(*hwsp), free_link);
>>> +     if (!hwsp) {
>>> +             struct i915_vma *vma;
>>> +
>>> +             spin_unlock(&gt->hwsp_lock);
>>> +
>>> +             hwsp = kmalloc(sizeof(*hwsp), GFP_KERNEL);
>>> +             if (!hwsp)
>>> +                     return ERR_PTR(-ENOMEM);
>>> +
>>> +             vma = __hwsp_alloc(i915);
>>> +             if (IS_ERR(vma)) {
>>> +                     kfree(hwsp);
>>> +                     return vma;
>>> +             }
>>> +
>>> +             vma->private = hwsp;
>>> +             hwsp->vma = vma;
>>> +             hwsp->free_bitmap = ~0ull;
>>> +
>>> +             spin_lock(&gt->hwsp_lock);
>>> +             list_add(&hwsp->free_link, &gt->hwsp_free_list);
>>> +     }
>>> +
>>> +     GEM_BUG_ON(!hwsp->free_bitmap);
>>> +     cacheline = __ffs64(hwsp->free_bitmap);
>>> +     hwsp->free_bitmap &= ~BIT_ULL(cacheline);
>>> +     if (!hwsp->free_bitmap)
>>> +             list_del(&hwsp->free_link);
>>> +
>>> +     spin_unlock(&gt->hwsp_lock);
>>> +
>>> +     GEM_BUG_ON(hwsp->vma->private != hwsp);
>>> +
>>> +     *offset = cacheline * CACHELINE_BYTES;
>>> +     return hwsp->vma;
>>> +}
>>> +
>>> +static void hwsp_free(struct i915_timeline *timeline)
>>> +{
>>> +     struct i915_gt_timelines *gt = &timeline->i915->gt.timelines;
>>> +     struct i915_timeline_hwsp *hwsp;
>>> +
>>> +     hwsp = i915_timeline_hwsp(timeline);
>>> +     if (!hwsp) /* leave global HWSP alone! */
>>
>> Later you add i915_timeline_is_global so could use it here.
> 
> Difference is that we want the hwsp for use in this function?
> 
> if (i915_timeline_is_global(timeline))
> 	return;
> 
> hwsp = i915_timeline_hwsp(timeline);
> 
> I suppose has the feeling of being more descriptive and the compiler
> should be smart enough to dtrt.

I missed that, so I think it is best as it is. The existing comment is
enough to make it clear.

>>
>>> +             return;
>>> +
>>> +     spin_lock(&gt->hwsp_lock);
>>> +
>>> +     /* As a cacheline becomes available, publish the HWSP on the freelist */
>>
>> Thank you! :)
>>
>>> +     if (!hwsp->free_bitmap)
>>> +             list_add_tail(&hwsp->free_link, &gt->hwsp_free_list);
>>> +
>>> +     hwsp->free_bitmap |= BIT_ULL(timeline->hwsp_offset / CACHELINE_BYTES);
>>> +
>>> +     /* And if no one is left using it, give the page back to the system */
>>> +     if (hwsp->free_bitmap == ~0ull) {
>>> +             i915_vma_put(hwsp->vma);
>>> +             list_del(&hwsp->free_link);
>>> +             kfree(hwsp);
>>> +     }
>>> +
>>> +     spin_unlock(&gt->hwsp_lock);
>>>    }
>>>    
>>>    int i915_timeline_init(struct drm_i915_private *i915,
>>>                       struct i915_timeline *timeline,
>>>                       const char *name,
>>> -                    struct i915_vma *global_hwsp)
>>> +                    struct i915_vma *hwsp)
>>>    {
>>>        struct i915_gt_timelines *gt = &i915->gt.timelines;
>>>        void *vaddr;
>>> -     int err;
>>>    
>>>        /*
>>>         * Ideally we want a set of engines on a single leaf as we expect
>>> @@ -64,18 +134,18 @@ int i915_timeline_init(struct drm_i915_private *i915,
>>>        timeline->name = name;
>>>        timeline->pin_count = 0;
>>>    
>>> -     if (global_hwsp) {
>>> -             timeline->hwsp_ggtt = i915_vma_get(global_hwsp);
>>> -             timeline->hwsp_offset = I915_GEM_HWS_SEQNO_ADDR;
>>> -     } else {
>>> -             err = hwsp_alloc(timeline);
>>> -             if (err)
>>> -                     return err;
>>> +     timeline->hwsp_offset = I915_GEM_HWS_SEQNO_ADDR;
>>
>> Could be clearer to put this on the else branch.
> 
> Hey! I rewrote this because I thought it was tidier without the else and
> repeated i915_vma_get() :)
> 
>>> -void i915_random_reorder(unsigned int *order, unsigned int count,
>>> -                      struct rnd_state *state)
>>> +void i915_prandom_shuffle(void *arr, size_t elsz, size_t count,
>>> +                       struct rnd_state *state)
>>>    {
>>> -     unsigned int i, j;
>>> +     char stack[128];
>>> +
>>> +     if (WARN_ON(elsz > sizeof(stack) || count > U32_MAX))
>>
>> I wonder if the elsz > sizeof(stack) would work as a BUILD_BUG_ON if
>> elsz was marked as const. Seems to be const at both call sites.
> 
> BUILD_BUG_ON is cpp-ish, so it needs to provably constant, so in the same
> translation unit.
> 
>> Step further sizing the stack by it, but.. that's some GCC extension we
>> should not use right?
> 
> Right. No char stack[elsz] for us. Alternatively is to make this into a
> macro-builder and have a custom shuffle for every type. For selftesting,
> just not worth it. (Until we have an example were we are severely
> limited in our testing by how long it takes to shuffle numbers... But we
> can probably just change any test to shuffle less.)
> 
>>> +static int __mock_hwsp_timeline(struct mock_hwsp_freelist *state,
>>> +                             unsigned int count,
>>> +                             unsigned int flags)
>>> +{
>>> +     struct i915_timeline *tl;
>>> +     unsigned int idx;
>>> +
>>> +     while (count--) {
>>> +             unsigned long cacheline;
>>> +             int err;
>>> +
>>> +             tl = i915_timeline_create(state->i915, "mock", NULL);
>>> +             if (IS_ERR(tl))
>>> +                     return PTR_ERR(tl);
>>> +
>>> +             cacheline = hwsp_cacheline(tl);
>>> +             err = radix_tree_insert(&state->cachelines, cacheline, tl);
>>> +             if (err) {
>>> +                     if (err == -EEXIST) {
>>> +                             pr_err("HWSP cacheline %lu already used; duplicate allocation!\n",
>>> +                                    cacheline);
>>> +                     }
>>
>> Radix tree is just to lookup potential offset duplicates? I mean, to
>> avoid doing a linear search on the history array? If so is it worth it
>> since there aren't that many timelines used in test? Could be since
>> below I figured the maximum number of timelines test will use...
> 
> Radix tree because that's where I started with the test: I want to
> detect duplicate cachelines.
> 
>>> +                     i915_timeline_put(tl);
>>> +                     return err;
>>> +             }
>>> +
>>> +             idx = state->count++ % state->max;
>>> +             __mock_hwsp_record(state, idx, tl);
>>
>> This doesn't hit "recycling" of slots right? Max count is 2 *
>> CACHELINES_PER_PAGE = 128 while state->max is 512.
> 
> The history is kept over the different phases. We add a block, remove a
> smaller block; repeat. My thinking was that would create a more
> interesting freelist pattern.

Yeah, I missed that the state is kept. I think it is okay then.

>>> +     }
>>> +
>>> +     if (flags & SHUFFLE)
>>> +             i915_prandom_shuffle(state->history,
>>> +                                  sizeof(*state->history),
>>> +                                  min(state->count, state->max),
>>> +                                  &state->prng);
>>> +
>>> +     count = i915_prandom_u32_max_state(min(state->count, state->max),
>>> +                                        &state->prng);
>>> +     while (count--) {
>>> +             idx = --state->count % state->max;
>>> +             __mock_hwsp_record(state, idx, NULL);
>>> +     }
>>
>> There is no allocations after shuffling, so the code path of allocating
>> from hwsp free list will not be exercised I think.
> 
> We keep history from one iteration to the next. I think if we added a loop
> here (____mock_hwsp_timeline!) you would be happier overall.

A loop to allocate a random number of entries? Yeah, that would make a pass
more self-contained. But optional...

> 
>>> +     return 0;
>>> +}
>>> +
>>> +static int mock_hwsp_freelist(void *arg)
>>> +{
>>> +     struct mock_hwsp_freelist state;
>>> +     const struct {
>>> +             const char *name;
>>> +             unsigned int flags;
>>> +     } phases[] = {
>>> +             { "linear", 0 },
>>> +             { "shuffled", SHUFFLE },
>>> +             { },
>>> +     }, *p;
>>> +     unsigned int na;
>>> +     int err = 0;
>>> +
>>> +     INIT_RADIX_TREE(&state.cachelines, GFP_KERNEL);
>>> +     state.prng = I915_RND_STATE_INITIALIZER(i915_selftest.random_seed);
>>> +
>>> +     state.i915 = mock_gem_device();
>>> +     if (!state.i915)
>>> +             return -ENOMEM;
>>> +
>>> +     /*
>>> +      * Create a bunch of timelines and check that their HWSP do not overlap.
>>> +      * Free some, and try again.
>>> +      */
>>> +
>>> +     state.max = PAGE_SIZE / sizeof(*state.history);
>>
>> So the maximum live number of timelines is 512 on 64-bit, which should
>> translate to at most 8 pages of hwsp backing store.
> 
> Do you feel like we need more or less? I think that's a reasonable
> number, it means we should have some pages eventually on the freelist.
> And not so many that reuse is sporadic (although the allocator is geared
> towards keeping the same page for reuse until full).

I think it is fine, was just talking out loud as I was figuring out what 
the test is doing.
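(For reference, the arithmetic behind those numbers, assuming 4KiB pages,
64-byte HWSP cachelines and 64-bit pointers:

        state.max = PAGE_SIZE / sizeof(*state.history) = 4096 / 8 = 512
        CACHELINES_PER_PAGE = PAGE_SIZE / CACHELINE_BYTES = 4096 / 64 = 64
        maximum HWSP pages = 512 / 64 = 8

so at most eight pages of HWSP backing store are ever live at once.)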

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* Re: [PATCH 05/34] drm/i915/selftests: Track evict objects explicitly
  2019-01-21 22:20 ` [PATCH 05/34] drm/i915/selftests: Track evict objects explicitly Chris Wilson
@ 2019-01-22 11:53   ` Matthew Auld
  0 siblings, 0 replies; 89+ messages in thread
From: Matthew Auld @ 2019-01-22 11:53 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Intel Graphics Development

On Mon, 21 Jan 2019 at 22:22, Chris Wilson <chris@chris-wilson.co.uk> wrote:
>
> During review of commit 71fc448c1aaf ("drm/i915/selftests: Make evict
> tolerant of foreign objects"), Matthew mentioned it would be better if
> we explicitly tracked the objects we created. We have an obj->st_link
> hook for this purpose, so add the corresponding list of objects and
> reduce our loops to only consider our own list.
>
> References: 71fc448c1aaf ("drm/i915/selftests: Make evict tolerant of foreign objects")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* Re: [PATCH 06/34] drm/i915/selftests: Create a clean GGTT for vma/gtt selftesting
  2019-01-21 22:20 ` [PATCH 06/34] drm/i915/selftests: Create a clean GGTT for vma/gtt selftesting Chris Wilson
@ 2019-01-22 12:07   ` Matthew Auld
  0 siblings, 0 replies; 89+ messages in thread
From: Matthew Auld @ 2019-01-22 12:07 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Intel Graphics Development

On Mon, 21 Jan 2019 at 23:41, Chris Wilson <chris@chris-wilson.co.uk> wrote:
>
> Some tests (e.g. igt_vma_pin1) presume that we have a completely clean
> GGTT so that it can probe boundaries without fear that something is
> already allocated there. However, the mock device is starting to get
> complicated and following similar rules to the live device, i.e. we
> can't guarantee that i915->ggtt remains clean, so create a temporary
> address_space equivalent to the mock ggtt for the purpose.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* Re: [PATCH 24/34] drm/i915: Track the context's seqno in its own timeline HWSP
  2019-01-21 22:21 ` [PATCH 24/34] drm/i915: Track the context's seqno in its own timeline HWSP Chris Wilson
@ 2019-01-22 12:24   ` Tvrtko Ursulin
  0 siblings, 0 replies; 89+ messages in thread
From: Tvrtko Ursulin @ 2019-01-22 12:24 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 21/01/2019 22:21, Chris Wilson wrote:
> Now that we have allocated ourselves a cacheline to store a breadcrumb,
> we can emit a write from the GPU into the timeline's HWSP of the
> per-context seqno as we complete each request. This drops the mirroring
> of the per-engine HWSP and allows each context to operate independently.
> We do not need to unwind the per-context timeline, and so requests are
> always consistent with the timeline breadcrumb, greatly simplifying the
> completion checks as we no longer need to be concerned about the
> global_seqno changing mid check.
> 
> One complication though is that we have to be wary that the request may
> outlive the HWSP and so avoid touching the potentially dangling pointer
> after we have retired the fence. We also have to guard our access of the
> HWSP with RCU, the release of the obj->mm.pages should already be RCU-safe.
> 
> At this point, we are emitting both per-context and global seqno and
> still using the single per-engine execution timeline for resolving
> interrupts.
> 
> v2: s/fake_complete/mark_complete/
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_gem.c              |  2 +-
>   drivers/gpu/drm/i915/i915_request.c          |  3 +-
>   drivers/gpu/drm/i915/i915_request.h          | 30 +++----
>   drivers/gpu/drm/i915/i915_reset.c            |  1 +
>   drivers/gpu/drm/i915/i915_vma.h              |  6 ++
>   drivers/gpu/drm/i915/intel_engine_cs.c       |  7 +-
>   drivers/gpu/drm/i915/intel_lrc.c             | 35 +++++---
>   drivers/gpu/drm/i915/intel_ringbuffer.c      | 88 +++++++++++++++-----
>   drivers/gpu/drm/i915/selftests/mock_engine.c | 20 ++++-
>   9 files changed, 132 insertions(+), 60 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 761714448ff3..4e0de22f0166 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2890,7 +2890,7 @@ i915_gem_find_active_request(struct intel_engine_cs *engine)
>   	 */
>   	spin_lock_irqsave(&engine->timeline.lock, flags);
>   	list_for_each_entry(request, &engine->timeline.requests, link) {
> -		if (__i915_request_completed(request, request->global_seqno))
> +		if (i915_request_completed(request))
>   			continue;
>   
>   		active = request;
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index d61e86c6a1d1..bb2885f1dc1e 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -199,6 +199,7 @@ static void __retire_engine_request(struct intel_engine_cs *engine,
>   	spin_unlock(&engine->timeline.lock);
>   
>   	spin_lock(&rq->lock);
> +	i915_request_mark_complete(rq);
>   	if (!i915_request_signaled(rq))
>   		dma_fence_signal_locked(&rq->fence);
>   	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &rq->fence.flags))
> @@ -621,7 +622,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
>   	rq->ring = ce->ring;
>   	rq->timeline = ce->ring->timeline;
>   	GEM_BUG_ON(rq->timeline == &engine->timeline);
> -	rq->hwsp_seqno = &engine->status_page.addr[I915_GEM_HWS_INDEX];
> +	rq->hwsp_seqno = rq->timeline->hwsp_seqno;
>   
>   	spin_lock_init(&rq->lock);
>   	dma_fence_init(&rq->fence,
> diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
> index ade010fe6e26..96c586d6ff4d 100644
> --- a/drivers/gpu/drm/i915/i915_request.h
> +++ b/drivers/gpu/drm/i915/i915_request.h
> @@ -289,6 +289,7 @@ long i915_request_wait(struct i915_request *rq,
>   
>   static inline bool i915_request_signaled(const struct i915_request *rq)
>   {
> +	/* The request may live longer than its HWSP, so check flags first! */
>   	return test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &rq->fence.flags);
>   }
>   
> @@ -340,32 +341,23 @@ static inline u32 hwsp_seqno(const struct i915_request *rq)
>    */
>   static inline bool i915_request_started(const struct i915_request *rq)
>   {
> -	u32 seqno;
> -
> -	seqno = i915_request_global_seqno(rq);
> -	if (!seqno) /* not yet submitted to HW */
> -		return false;
> +	if (i915_request_signaled(rq))
> +		return true;
>   
> -	return i915_seqno_passed(hwsp_seqno(rq), seqno - 1);
> -}
> -
> -static inline bool
> -__i915_request_completed(const struct i915_request *rq, u32 seqno)
> -{
> -	GEM_BUG_ON(!seqno);
> -	return i915_seqno_passed(hwsp_seqno(rq), seqno) &&
> -		seqno == i915_request_global_seqno(rq);
> +	return i915_seqno_passed(hwsp_seqno(rq), rq->fence.seqno - 1);
>   }
>   
>   static inline bool i915_request_completed(const struct i915_request *rq)
>   {
> -	u32 seqno;
> +	if (i915_request_signaled(rq))
> +		return true;
>   
> -	seqno = i915_request_global_seqno(rq);
> -	if (!seqno)
> -		return false;
> +	return i915_seqno_passed(hwsp_seqno(rq), rq->fence.seqno);
> +}
>   
> -	return __i915_request_completed(rq, seqno);
> +static inline void i915_request_mark_complete(struct i915_request *rq)
> +{
> +	rq->hwsp_seqno = (u32 *)&rq->fence.seqno; /* decouple from HWSP */
>   }
>   
>   void i915_retire_requests(struct drm_i915_private *i915);
> diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
> index 12e5a2bc825c..09edf488f711 100644
> --- a/drivers/gpu/drm/i915/i915_reset.c
> +++ b/drivers/gpu/drm/i915/i915_reset.c
> @@ -756,6 +756,7 @@ static void nop_submit_request(struct i915_request *request)
>   
>   	spin_lock_irqsave(&request->engine->timeline.lock, flags);
>   	__i915_request_submit(request);
> +	i915_request_mark_complete(request);
>   	intel_engine_write_global_seqno(request->engine, request->global_seqno);
>   	spin_unlock_irqrestore(&request->engine->timeline.lock, flags);
>   }
> diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
> index 46eb818ed309..b0f6b1d904a5 100644
> --- a/drivers/gpu/drm/i915/i915_vma.h
> +++ b/drivers/gpu/drm/i915/i915_vma.h
> @@ -227,6 +227,12 @@ static inline u32 i915_ggtt_pin_bias(struct i915_vma *vma)
>   }
>   
>   /* XXX inline spaghetti */
> +static inline u32 i915_timeline_seqno_address(const struct i915_timeline *tl)
> +{
> +	GEM_BUG_ON(!tl->pin_count);
> +	return i915_ggtt_offset(tl->hwsp_ggtt) + tl->hwsp_offset;
> +}
> +
>   static inline struct i915_timeline_hwsp *
>   i915_timeline_hwsp(const struct i915_timeline *tl)
>   {
> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> index c850d131d8c3..e532b4b27239 100644
> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> @@ -1374,9 +1374,10 @@ static void intel_engine_print_registers(const struct intel_engine_cs *engine,
>   				char hdr[80];
>   
>   				snprintf(hdr, sizeof(hdr),
> -					 "\t\tELSP[%d] count=%d, ring->start=%08x, rq: ",
> +					 "\t\tELSP[%d] count=%d, ring: {start:%08x, hwsp:%08x}, rq: ",
>   					 idx, count,
> -					 i915_ggtt_offset(rq->ring->vma));
> +					 i915_ggtt_offset(rq->ring->vma),
> +					 i915_timeline_seqno_address(rq->timeline));
>   				print_request(m, rq, hdr);
>   			} else {
>   				drm_printf(m, "\t\tELSP[%d] idle\n", idx);
> @@ -1486,6 +1487,8 @@ void intel_engine_dump(struct intel_engine_cs *engine,
>   			   rq->ring->emit);
>   		drm_printf(m, "\t\tring->space:  0x%08x\n",
>   			   rq->ring->space);
> +		drm_printf(m, "\t\tring->hwsp:   0x%08x\n",
> +			   i915_timeline_seqno_address(rq->timeline));
>   
>   		print_request_ring(m, rq);
>   	}
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 5c830a1ca332..1bf178ca3e00 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -857,10 +857,10 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine)
>   	list_for_each_entry(rq, &engine->timeline.requests, link) {
>   		GEM_BUG_ON(!rq->global_seqno);
>   
> -		if (i915_request_signaled(rq))
> -			continue;
> +		if (!i915_request_signaled(rq))
> +			dma_fence_set_error(&rq->fence, -EIO);
>   
> -		dma_fence_set_error(&rq->fence, -EIO);
> +		i915_request_mark_complete(rq);
>   	}
>   
>   	/* Flush the queued requests to the timeline list (for retiring). */
> @@ -870,9 +870,9 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine)
>   
>   		priolist_for_each_request_consume(rq, rn, p, i) {
>   			list_del_init(&rq->sched.link);
> -
> -			dma_fence_set_error(&rq->fence, -EIO);
>   			__i915_request_submit(rq);
> +			dma_fence_set_error(&rq->fence, -EIO);
> +			i915_request_mark_complete(rq);
>   		}
>   
>   		rb_erase_cached(&p->node, &execlists->queue);
> @@ -2054,31 +2054,40 @@ static void gen8_emit_breadcrumb(struct i915_request *request, u32 *cs)
>   	/* w/a: bit 5 needs to be zero for MI_FLUSH_DW address. */
>   	BUILD_BUG_ON(I915_GEM_HWS_INDEX_ADDR & (1 << 5));
>   
> -	cs = gen8_emit_ggtt_write(cs, request->global_seqno,
> +	cs = gen8_emit_ggtt_write(cs,
> +				  request->fence.seqno,
> +				  i915_timeline_seqno_address(request->timeline));
> +
> +	cs = gen8_emit_ggtt_write(cs,
> +				  request->global_seqno,
>   				  intel_hws_seqno_address(request->engine));
> +
>   	*cs++ = MI_USER_INTERRUPT;
>   	*cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
> +
>   	request->tail = intel_ring_offset(request, cs);
>   	assert_ring_tail_valid(request->ring, request->tail);
>   
>   	gen8_emit_wa_tail(request, cs);
>   }
> -static const int gen8_emit_breadcrumb_sz = 6 + WA_TAIL_DWORDS;
> +static const int gen8_emit_breadcrumb_sz = 10 + WA_TAIL_DWORDS;
>   
>   static void gen8_emit_breadcrumb_rcs(struct i915_request *request, u32 *cs)
>   {
> -	/* We're using qword write, seqno should be aligned to 8 bytes. */
> -	BUILD_BUG_ON(I915_GEM_HWS_INDEX & 1);
> -
>   	cs = gen8_emit_ggtt_write_rcs(cs,
> -				      request->global_seqno,
> -				      intel_hws_seqno_address(request->engine),
> +				      request->fence.seqno,
> +				      i915_timeline_seqno_address(request->timeline),
>   				      PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH |
>   				      PIPE_CONTROL_DEPTH_CACHE_FLUSH |
>   				      PIPE_CONTROL_DC_FLUSH_ENABLE |
>   				      PIPE_CONTROL_FLUSH_ENABLE |
>   				      PIPE_CONTROL_CS_STALL);
>   
> +	cs = gen8_emit_ggtt_write_rcs(cs,
> +				      request->global_seqno,
> +				      intel_hws_seqno_address(request->engine),
> +				      PIPE_CONTROL_CS_STALL);
> +
>   	*cs++ = MI_USER_INTERRUPT;
>   	*cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
>   
> @@ -2087,7 +2096,7 @@ static void gen8_emit_breadcrumb_rcs(struct i915_request *request, u32 *cs)
>   
>   	gen8_emit_wa_tail(request, cs);
>   }
> -static const int gen8_emit_breadcrumb_rcs_sz = 8 + WA_TAIL_DWORDS;
> +static const int gen8_emit_breadcrumb_rcs_sz = 14 + WA_TAIL_DWORDS;
>   
>   static int gen8_init_rcs_context(struct i915_request *rq)
>   {
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index cad25f7b8c2e..751bd4e7da42 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -326,6 +326,12 @@ static void gen6_rcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
>   		 PIPE_CONTROL_DC_FLUSH_ENABLE |
>   		 PIPE_CONTROL_QW_WRITE |
>   		 PIPE_CONTROL_CS_STALL);
> +	*cs++ = i915_timeline_seqno_address(rq->timeline) |
> +		PIPE_CONTROL_GLOBAL_GTT;
> +	*cs++ = rq->fence.seqno;
> +
> +	*cs++ = GFX_OP_PIPE_CONTROL(4);
> +	*cs++ = PIPE_CONTROL_QW_WRITE | PIPE_CONTROL_CS_STALL;
>   	*cs++ = intel_hws_seqno_address(rq->engine) | PIPE_CONTROL_GLOBAL_GTT;
>   	*cs++ = rq->global_seqno;
>   
> @@ -335,7 +341,7 @@ static void gen6_rcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
>   	rq->tail = intel_ring_offset(rq, cs);
>   	assert_ring_tail_valid(rq->ring, rq->tail);
>   }
> -static const int gen6_rcs_emit_breadcrumb_sz = 14;
> +static const int gen6_rcs_emit_breadcrumb_sz = 18;
>   
>   static int
>   gen7_render_ring_cs_stall_wa(struct i915_request *rq)
> @@ -426,6 +432,13 @@ static void gen7_rcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
>   		 PIPE_CONTROL_QW_WRITE |
>   		 PIPE_CONTROL_GLOBAL_GTT_IVB |
>   		 PIPE_CONTROL_CS_STALL);
> +	*cs++ = i915_timeline_seqno_address(rq->timeline);
> +	*cs++ = rq->fence.seqno;
> +
> +	*cs++ = GFX_OP_PIPE_CONTROL(4);
> +	*cs++ = (PIPE_CONTROL_QW_WRITE |
> +		 PIPE_CONTROL_GLOBAL_GTT_IVB |
> +		 PIPE_CONTROL_CS_STALL);
>   	*cs++ = intel_hws_seqno_address(rq->engine);
>   	*cs++ = rq->global_seqno;
>   
> @@ -435,27 +448,37 @@ static void gen7_rcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
>   	rq->tail = intel_ring_offset(rq, cs);
>   	assert_ring_tail_valid(rq->ring, rq->tail);
>   }
> -static const int gen7_rcs_emit_breadcrumb_sz = 6;
> +static const int gen7_rcs_emit_breadcrumb_sz = 10;
>   
>   static void gen6_xcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
>   {
> -	*cs++ = MI_FLUSH_DW | MI_FLUSH_DW_OP_STOREDW;
> -	*cs++ = intel_hws_seqno_address(rq->engine) | MI_FLUSH_DW_USE_GTT;
> +	*cs++ = MI_FLUSH_DW | MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_DW_STORE_INDEX;
> +	*cs++ = I915_GEM_HWS_SEQNO_ADDR | MI_FLUSH_DW_USE_GTT;
> +	*cs++ = rq->fence.seqno;
> +
> +	*cs++ = MI_FLUSH_DW | MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_DW_STORE_INDEX;
> +	*cs++ = I915_GEM_HWS_INDEX_ADDR | MI_FLUSH_DW_USE_GTT;
>   	*cs++ = rq->global_seqno;
> +
>   	*cs++ = MI_USER_INTERRUPT;
> +	*cs++ = MI_NOOP;
>   
>   	rq->tail = intel_ring_offset(rq, cs);
>   	assert_ring_tail_valid(rq->ring, rq->tail);
>   }
> -static const int gen6_xcs_emit_breadcrumb_sz = 4;
> +static const int gen6_xcs_emit_breadcrumb_sz = 8;
>   
>   #define GEN7_XCS_WA 32
>   static void gen7_xcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
>   {
>   	int i;
>   
> -	*cs++ = MI_FLUSH_DW | MI_FLUSH_DW_OP_STOREDW;
> -	*cs++ = intel_hws_seqno_address(rq->engine) | MI_FLUSH_DW_USE_GTT;
> +	*cs++ = MI_FLUSH_DW | MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_DW_STORE_INDEX;
> +	*cs++ = I915_GEM_HWS_SEQNO_ADDR | MI_FLUSH_DW_USE_GTT;
> +	*cs++ = rq->fence.seqno;
> +
> +	*cs++ = MI_FLUSH_DW | MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_DW_STORE_INDEX;
> +	*cs++ = I915_GEM_HWS_INDEX_ADDR | MI_FLUSH_DW_USE_GTT;
>   	*cs++ = rq->global_seqno;
>   
>   	for (i = 0; i < GEN7_XCS_WA; i++) {
> @@ -469,12 +492,11 @@ static void gen7_xcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
>   	*cs++ = 0;
>   
>   	*cs++ = MI_USER_INTERRUPT;
> -	*cs++ = MI_NOOP;
>   
>   	rq->tail = intel_ring_offset(rq, cs);
>   	assert_ring_tail_valid(rq->ring, rq->tail);
>   }
> -static const int gen7_xcs_emit_breadcrumb_sz = 8 + GEN7_XCS_WA * 3;
> +static const int gen7_xcs_emit_breadcrumb_sz = 10 + GEN7_XCS_WA * 3;
>   #undef GEN7_XCS_WA
>   
>   static void set_hwstam(struct intel_engine_cs *engine, u32 mask)
> @@ -734,7 +756,7 @@ static void reset_ring(struct intel_engine_cs *engine, bool stalled)
>   	rq = NULL;
>   	spin_lock_irqsave(&tl->lock, flags);
>   	list_for_each_entry(pos, &tl->requests, link) {
> -		if (!__i915_request_completed(pos, pos->global_seqno)) {
> +		if (!i915_request_completed(pos)) {
>   			rq = pos;
>   			break;
>   		}
> @@ -876,10 +898,10 @@ static void cancel_requests(struct intel_engine_cs *engine)
>   	list_for_each_entry(request, &engine->timeline.requests, link) {
>   		GEM_BUG_ON(!request->global_seqno);
>   
> -		if (i915_request_signaled(request))
> -			continue;
> +		if (!i915_request_signaled(request))
> +			dma_fence_set_error(&request->fence, -EIO);
>   
> -		dma_fence_set_error(&request->fence, -EIO);
> +		i915_request_mark_complete(request);
>   	}
>   
>   	intel_write_status_page(engine,
> @@ -903,27 +925,38 @@ static void i9xx_submit_request(struct i915_request *request)
>   
>   static void i9xx_emit_breadcrumb(struct i915_request *rq, u32 *cs)
>   {
> +	GEM_BUG_ON(rq->timeline->hwsp_ggtt != rq->engine->status_page.vma);
> +
>   	*cs++ = MI_FLUSH;
>   
> +	*cs++ = MI_STORE_DWORD_INDEX;
> +	*cs++ = I915_GEM_HWS_SEQNO_ADDR;
> +	*cs++ = rq->fence.seqno;
> +
>   	*cs++ = MI_STORE_DWORD_INDEX;
>   	*cs++ = I915_GEM_HWS_INDEX_ADDR;
>   	*cs++ = rq->global_seqno;
>   
>   	*cs++ = MI_USER_INTERRUPT;
> -	*cs++ = MI_NOOP;
>   
>   	rq->tail = intel_ring_offset(rq, cs);
>   	assert_ring_tail_valid(rq->ring, rq->tail);
>   }
> -static const int i9xx_emit_breadcrumb_sz = 6;
> +static const int i9xx_emit_breadcrumb_sz = 8;
>   
>   #define GEN5_WA_STORES 8 /* must be at least 1! */
>   static void gen5_emit_breadcrumb(struct i915_request *rq, u32 *cs)
>   {
>   	int i;
>   
> +	GEM_BUG_ON(rq->timeline->hwsp_ggtt != rq->engine->status_page.vma);
> +
>   	*cs++ = MI_FLUSH;
>   
> +	*cs++ = MI_STORE_DWORD_INDEX;
> +	*cs++ = I915_GEM_HWS_SEQNO_ADDR;
> +	*cs++ = rq->fence.seqno;
> +
>   	BUILD_BUG_ON(GEN5_WA_STORES < 1);
>   	for (i = 0; i < GEN5_WA_STORES; i++) {
>   		*cs++ = MI_STORE_DWORD_INDEX;
> @@ -932,11 +965,12 @@ static void gen5_emit_breadcrumb(struct i915_request *rq, u32 *cs)
>   	}
>   
>   	*cs++ = MI_USER_INTERRUPT;
> +	*cs++ = MI_NOOP;
>   
>   	rq->tail = intel_ring_offset(rq, cs);
>   	assert_ring_tail_valid(rq->ring, rq->tail);
>   }
> -static const int gen5_emit_breadcrumb_sz = GEN5_WA_STORES * 3 + 2;
> +static const int gen5_emit_breadcrumb_sz = GEN5_WA_STORES * 3 + 6;
>   #undef GEN5_WA_STORES
>   
>   static void
> @@ -1163,6 +1197,10 @@ int intel_ring_pin(struct intel_ring *ring)
>   
>   	GEM_BUG_ON(ring->vaddr);
>   
> +	ret = i915_timeline_pin(ring->timeline);
> +	if (ret)
> +		return ret;
> +
>   	flags = PIN_GLOBAL;
>   
>   	/* Ring wraparound at offset 0 sometimes hangs. No idea why. */
> @@ -1179,28 +1217,32 @@ int intel_ring_pin(struct intel_ring *ring)
>   		else
>   			ret = i915_gem_object_set_to_cpu_domain(vma->obj, true);
>   		if (unlikely(ret))
> -			return ret;
> +			goto unpin_timeline;
>   	}
>   
>   	ret = i915_vma_pin(vma, 0, 0, flags);
>   	if (unlikely(ret))
> -		return ret;
> +		goto unpin_timeline;
>   
>   	if (i915_vma_is_map_and_fenceable(vma))
>   		addr = (void __force *)i915_vma_pin_iomap(vma);
>   	else
>   		addr = i915_gem_object_pin_map(vma->obj, map);
> -	if (IS_ERR(addr))
> -		goto err;
> +	if (IS_ERR(addr)) {
> +		ret = PTR_ERR(addr);
> +		goto unpin_ring;
> +	}
>   
>   	vma->obj->pin_global++;
>   
>   	ring->vaddr = addr;
>   	return 0;
>   
> -err:
> +unpin_ring:
>   	i915_vma_unpin(vma);
> -	return PTR_ERR(addr);
> +unpin_timeline:
> +	i915_timeline_unpin(ring->timeline);
> +	return ret;
>   }
>   
>   void intel_ring_reset(struct intel_ring *ring, u32 tail)
> @@ -1229,6 +1271,8 @@ void intel_ring_unpin(struct intel_ring *ring)
>   
>   	ring->vma->obj->pin_global--;
>   	i915_vma_unpin(ring->vma);
> +
> +	i915_timeline_unpin(ring->timeline);
>   }
>   
>   static struct i915_vma *
> diff --git a/drivers/gpu/drm/i915/selftests/mock_engine.c b/drivers/gpu/drm/i915/selftests/mock_engine.c
> index ca95ab278da3..c0a408828415 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_engine.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_engine.c
> @@ -30,6 +30,17 @@ struct mock_ring {
>   	struct i915_timeline timeline;
>   };
>   
> +static void mock_timeline_pin(struct i915_timeline *tl)
> +{
> +	tl->pin_count++;
> +}
> +
> +static void mock_timeline_unpin(struct i915_timeline *tl)
> +{
> +	GEM_BUG_ON(!tl->pin_count);
> +	tl->pin_count--;
> +}
> +
>   static struct intel_ring *mock_ring(struct intel_engine_cs *engine)
>   {
>   	const unsigned long sz = PAGE_SIZE / 2;
> @@ -76,6 +87,8 @@ static void advance(struct mock_request *request)
>   {
>   	list_del_init(&request->link);
>   	mock_seqno_advance(request->base.engine, request->base.global_seqno);
> +	i915_request_mark_complete(&request->base);
> +	GEM_BUG_ON(!i915_request_completed(&request->base));
>   }
>   
>   static void hw_delay_complete(struct timer_list *t)
> @@ -108,6 +121,7 @@ static void hw_delay_complete(struct timer_list *t)
>   
>   static void mock_context_unpin(struct intel_context *ce)
>   {
> +	mock_timeline_unpin(ce->ring->timeline);
>   	i915_gem_context_put(ce->gem_context);
>   }
>   
> @@ -129,6 +143,7 @@ mock_context_pin(struct intel_engine_cs *engine,
>   		 struct i915_gem_context *ctx)
>   {
>   	struct intel_context *ce = to_intel_context(ctx, engine);
> +	int err = -ENOMEM;
>   
>   	if (ce->pin_count++)
>   		return ce;
> @@ -139,13 +154,15 @@ mock_context_pin(struct intel_engine_cs *engine,
>   			goto err;
>   	}
>   
> +	mock_timeline_pin(ce->ring->timeline);
> +
>   	ce->ops = &mock_context_ops;
>   	i915_gem_context_get(ctx);
>   	return ce;
>   
>   err:
>   	ce->pin_count = 0;
> -	return ERR_PTR(-ENOMEM);
> +	return ERR_PTR(err);
>   }
>   
>   static int mock_request_alloc(struct i915_request *request)
> @@ -256,7 +273,6 @@ void mock_engine_flush(struct intel_engine_cs *engine)
>   
>   void mock_engine_reset(struct intel_engine_cs *engine)
>   {
> -	intel_write_status_page(engine, I915_GEM_HWS_INDEX, 0);
>   }
>   
>   void mock_engine_free(struct intel_engine_cs *engine)
> 

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* Re: [PATCH 07/34] drm/i915: Refactor out intel_context_init()
  2019-01-21 22:20 ` [PATCH 07/34] drm/i915: Refactor out intel_context_init() Chris Wilson
@ 2019-01-22 12:32   ` Matthew Auld
  2019-01-22 12:39   ` Mika Kuoppala
  1 sibling, 0 replies; 89+ messages in thread
From: Matthew Auld @ 2019-01-22 12:32 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Intel Graphics Development

On Mon, 21 Jan 2019 at 23:41, Chris Wilson <chris@chris-wilson.co.uk> wrote:
>
> Prior to adding a third instance of intel_context_init() and extending
> the information stored therewithin, refactor out the common assignments.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* Re: [PATCH 03/34] drm/i915: Show all active engines on hangcheck
  2019-01-21 22:20 ` [PATCH 03/34] drm/i915: Show all active engines on hangcheck Chris Wilson
@ 2019-01-22 12:33   ` Mika Kuoppala
  2019-01-22 12:42     ` Chris Wilson
  0 siblings, 1 reply; 89+ messages in thread
From: Mika Kuoppala @ 2019-01-22 12:33 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> This turns out to be quite useful if one happens to be debugging
> semaphore deadlocks.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/intel_hangcheck.c | 15 +++++++++++----
>  1 file changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_hangcheck.c b/drivers/gpu/drm/i915/intel_hangcheck.c
> index 7dc11fcb13de..741441daae32 100644
> --- a/drivers/gpu/drm/i915/intel_hangcheck.c
> +++ b/drivers/gpu/drm/i915/intel_hangcheck.c
> @@ -195,10 +195,6 @@ static void hangcheck_accumulate_sample(struct intel_engine_cs *engine,
>  		break;
>  
>  	case ENGINE_DEAD:
> -		if (GEM_SHOW_DEBUG()) {
> -			struct drm_printer p = drm_debug_printer("hangcheck");
> -			intel_engine_dump(engine, &p, "%s\n", engine->name);
> -		}
>  		break;
>  
>  	default:
> @@ -285,6 +281,17 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
>  			wedged |= intel_engine_flag(engine);
>  	}
>  
> +	if (GEM_SHOW_DEBUG() && (hung | stuck)) {
> +		struct drm_printer p = drm_debug_printer("hangcheck");
> +
> +		for_each_engine(engine, dev_priv, id) {
> +			if (intel_engine_is_idle(engine))
> +				continue;

Looks rather harmless, though there is that local_bh_disable.
I was pondering whether it was worthwhile to determine idle here
with a more lightweight approach, but as we already use
the exact same method to determine the hangcheck action, let's
stick to this, as it should then be in parity with the
engine action we got earlier.

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

> +
> +			intel_engine_dump(engine, &p, "%s\n", engine->name);
> +		}
> +	}
> +
>  	if (wedged) {
>  		dev_err(dev_priv->drm.dev,
>  			"GPU recovery timed out,"
> -- 
> 2.20.1
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* Re: [PATCH 04/34] drm/i915/selftests: Refactor common live_test framework
  2019-01-21 22:20 ` [PATCH 04/34] drm/i915/selftests: Refactor common live_test framework Chris Wilson
@ 2019-01-22 12:37   ` Matthew Auld
  0 siblings, 0 replies; 89+ messages in thread
From: Matthew Auld @ 2019-01-22 12:37 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Intel Graphics Development

On Mon, 21 Jan 2019 at 22:22, Chris Wilson <chris@chris-wilson.co.uk> wrote:
>
> Before adding yet another copy of struct live_test and its handler,
> refactor the existing code into a common framework for live selftests.
> For many live selftests, we want to know if the GPU hung or otherwise
> misbehaved during the execution of the test (beyond any infraction in
> the behaviour under test), live_test provides this by comparing the
> GPU state before and after, alerting if it unexpectedly changed (e.g.
> the reset counter changed). It also ensures that the GPU is idle before
> and after the test, so that residual code running on the GPU is flushed
> before testing.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* Re: [PATCH 07/34] drm/i915: Refactor out intel_context_init()
  2019-01-21 22:20 ` [PATCH 07/34] drm/i915: Refactor out intel_context_init() Chris Wilson
  2019-01-22 12:32   ` Matthew Auld
@ 2019-01-22 12:39   ` Mika Kuoppala
  2019-01-22 12:48     ` Chris Wilson
  1 sibling, 1 reply; 89+ messages in thread
From: Mika Kuoppala @ 2019-01-22 12:39 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> Prior to adding a third instance of intel_context_init() and extending
> the information stored therewithin, refactor out the common assignments.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_gem_context.c       | 7 ++-----
>  drivers/gpu/drm/i915/i915_gem_context.h       | 8 ++++++++
>  drivers/gpu/drm/i915/selftests/mock_context.c | 7 ++-----
>  3 files changed, 12 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 5933adbe3d99..fae68c4c4683 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -338,11 +338,8 @@ __create_hw_context(struct drm_i915_private *dev_priv,
>  	ctx->i915 = dev_priv;
>  	ctx->sched.priority = I915_USER_PRIORITY(I915_PRIORITY_NORMAL);
>  
> -	for (n = 0; n < ARRAY_SIZE(ctx->__engine); n++) {
> -		struct intel_context *ce = &ctx->__engine[n];
> -
> -		ce->gem_context = ctx;
> -	}
> +	for (n = 0; n < ARRAY_SIZE(ctx->__engine); n++)
> +		intel_context_init(&ctx->__engine[n], ctx, dev_priv->engine[n]);
>  
>  	INIT_RADIX_TREE(&ctx->handles_vma, GFP_KERNEL);
>  	INIT_LIST_HEAD(&ctx->handles_list);
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
> index f6d870b1f73e..47d82ce7ba6a 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.h
> +++ b/drivers/gpu/drm/i915/i915_gem_context.h
> @@ -364,4 +364,12 @@ static inline void i915_gem_context_put(struct i915_gem_context *ctx)
>  	kref_put(&ctx->ref, i915_gem_context_release);
>  }
>  
> +static inline void
> +intel_context_init(struct intel_context *ce,
> +		   struct i915_gem_context *ctx,
> +		   struct intel_engine_cs *engine)
> +{
> +	ce->gem_context = ctx;
> +}
> +

The audience was also waiting for intel_context_init_engines()

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

>  #endif /* !__I915_GEM_CONTEXT_H__ */
> diff --git a/drivers/gpu/drm/i915/selftests/mock_context.c b/drivers/gpu/drm/i915/selftests/mock_context.c
> index d937bdff26f9..b646cdcdd602 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_context.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_context.c
> @@ -45,11 +45,8 @@ mock_context(struct drm_i915_private *i915,
>  	INIT_LIST_HEAD(&ctx->handles_list);
>  	INIT_LIST_HEAD(&ctx->hw_id_link);
>  
> -	for (n = 0; n < ARRAY_SIZE(ctx->__engine); n++) {
> -		struct intel_context *ce = &ctx->__engine[n];
> -
> -		ce->gem_context = ctx;
> -	}
> +	for (n = 0; n < ARRAY_SIZE(ctx->__engine); n++)
> +		intel_context_init(&ctx->__engine[n], ctx, i915->engine[n]);
>  
>  	ret = i915_gem_context_pin_hw_id(ctx);
>  	if (ret < 0)
> -- 
> 2.20.1
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* Re: [PATCH 03/34] drm/i915: Show all active engines on hangcheck
  2019-01-22 12:33   ` Mika Kuoppala
@ 2019-01-22 12:42     ` Chris Wilson
  0 siblings, 0 replies; 89+ messages in thread
From: Chris Wilson @ 2019-01-22 12:42 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

Quoting Mika Kuoppala (2019-01-22 12:33:00)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > This turns out to be quite useful if one happens to be debugging
> > semaphore deadlocks.
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >  drivers/gpu/drm/i915/intel_hangcheck.c | 15 +++++++++++----
> >  1 file changed, 11 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/intel_hangcheck.c b/drivers/gpu/drm/i915/intel_hangcheck.c
> > index 7dc11fcb13de..741441daae32 100644
> > --- a/drivers/gpu/drm/i915/intel_hangcheck.c
> > +++ b/drivers/gpu/drm/i915/intel_hangcheck.c
> > @@ -195,10 +195,6 @@ static void hangcheck_accumulate_sample(struct intel_engine_cs *engine,
> >               break;
> >  
> >       case ENGINE_DEAD:
> > -             if (GEM_SHOW_DEBUG()) {
> > -                     struct drm_printer p = drm_debug_printer("hangcheck");
> > -                     intel_engine_dump(engine, &p, "%s\n", engine->name);
> > -             }
> >               break;
> >  
> >       default:
> > @@ -285,6 +281,17 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
> >                       wedged |= intel_engine_flag(engine);
> >       }
> >  
> > +     if (GEM_SHOW_DEBUG() && (hung | stuck)) {
> > +             struct drm_printer p = drm_debug_printer("hangcheck");
> > +
> > +             for_each_engine(engine, dev_priv, id) {
> > +                     if (intel_engine_is_idle(engine))
> > +                             continue;
> 
> Looks rather harmless, though there is that local_bh_disable.
> I was pondering whether it was worthwhile to determine idle here
> with a more lightweight approach, but as we already use
> the exact same method to determine the hangcheck action, let's
> stick to this, as it should then be in parity with the
> engine action we got earlier.

Plus it's only for glancing at the dmesg; the error state is meant to be
the be-all and end-all of debugging information. I just find it convenient
when watching netconsole, and most kernel bugs can be deduced from the
register state itself.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* Re: [PATCH 07/34] drm/i915: Refactor out intel_context_init()
  2019-01-22 12:39   ` Mika Kuoppala
@ 2019-01-22 12:48     ` Chris Wilson
  0 siblings, 0 replies; 89+ messages in thread
From: Chris Wilson @ 2019-01-22 12:48 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

Quoting Mika Kuoppala (2019-01-22 12:39:08)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> > +static inline void
> > +intel_context_init(struct intel_context *ce,
> > +                struct i915_gem_context *ctx,
> > +                struct intel_engine_cs *engine)
> > +{
> > +     ce->gem_context = ctx;
> > +}
> > +
> 
> Audience was also waiting intel_context_init_engines()

struct intel_context is the per-engine instance, and it's not actually
guaranteed that there will be a contiguous set :) One should always skip
to the end of the novel to find out it was the butler who did it.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* Re: [PATCH 25/34] drm/i915: Track active timelines
  2019-01-21 22:21 ` [PATCH 25/34] drm/i915: Track active timelines Chris Wilson
@ 2019-01-22 14:56   ` Tvrtko Ursulin
  2019-01-22 15:17     ` Chris Wilson
  0 siblings, 1 reply; 89+ messages in thread
From: Tvrtko Ursulin @ 2019-01-22 14:56 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 21/01/2019 22:21, Chris Wilson wrote:
> Now that we pin timelines around use, we have a clearly defined lifetime
> and convenient points at which we can track only the active timelines.
> This allows us to reduce the list iteration to only consider those
> active timelines and not all.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_drv.h      |  2 +-
>   drivers/gpu/drm/i915/i915_gem.c      |  4 +--
>   drivers/gpu/drm/i915/i915_reset.c    |  2 +-
>   drivers/gpu/drm/i915/i915_timeline.c | 39 ++++++++++++++++++----------
>   4 files changed, 29 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index c00eaf2889fb..5577e0e1034f 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1977,7 +1977,7 @@ struct drm_i915_private {
>   
>   		struct i915_gt_timelines {
>   			struct mutex mutex; /* protects list, tainted by GPU */
> -			struct list_head list;
> +			struct list_head active_list;
>   
>   			/* Pack multiple timelines' seqnos into the same page */
>   			spinlock_t hwsp_lock;
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 4e0de22f0166..9c499edb4c13 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3246,7 +3246,7 @@ wait_for_timelines(struct drm_i915_private *i915,
>   		return timeout;
>   
>   	mutex_lock(&gt->mutex);
> -	list_for_each_entry(tl, &gt->list, link) {
> +	list_for_each_entry(tl, &gt->active_list, link) {
>   		struct i915_request *rq;
>   
>   		rq = i915_gem_active_get_unlocked(&tl->last_request);
> @@ -3274,7 +3274,7 @@ wait_for_timelines(struct drm_i915_private *i915,
>   
>   		/* restart after reacquiring the lock */
>   		mutex_lock(&gt->mutex);
> -		tl = list_entry(&gt->list, typeof(*tl), link);
> +		tl = list_entry(&gt->active_list, typeof(*tl), link);
>   	}
>   	mutex_unlock(&gt->mutex);
>   
> diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
> index 09edf488f711..9b9169508139 100644
> --- a/drivers/gpu/drm/i915/i915_reset.c
> +++ b/drivers/gpu/drm/i915/i915_reset.c
> @@ -852,7 +852,7 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915)
>   	 * No more can be submitted until we reset the wedged bit.
>   	 */
>   	mutex_lock(&i915->gt.timelines.mutex);
> -	list_for_each_entry(tl, &i915->gt.timelines.list, link) {
> +	list_for_each_entry(tl, &i915->gt.timelines.active_list, link) {
>   		struct i915_request *rq;
>   		long timeout;
>   
> diff --git a/drivers/gpu/drm/i915/i915_timeline.c b/drivers/gpu/drm/i915/i915_timeline.c
> index 69ee33dfa340..007348b1b469 100644
> --- a/drivers/gpu/drm/i915/i915_timeline.c
> +++ b/drivers/gpu/drm/i915/i915_timeline.c
> @@ -117,7 +117,6 @@ int i915_timeline_init(struct drm_i915_private *i915,
>   		       const char *name,
>   		       struct i915_vma *hwsp)
>   {
> -	struct i915_gt_timelines *gt = &i915->gt.timelines;
>   	void *vaddr;
>   
>   	/*
> @@ -161,10 +160,6 @@ int i915_timeline_init(struct drm_i915_private *i915,
>   
>   	i915_syncmap_init(&timeline->sync);
>   
> -	mutex_lock(&gt->mutex);
> -	list_add(&timeline->link, &gt->list);
> -	mutex_unlock(&gt->mutex);
> -
>   	return 0;
>   }
>   
> @@ -173,7 +168,7 @@ void i915_timelines_init(struct drm_i915_private *i915)
>   	struct i915_gt_timelines *gt = &i915->gt.timelines;
>   
>   	mutex_init(&gt->mutex);
> -	INIT_LIST_HEAD(&gt->list);
> +	INIT_LIST_HEAD(&gt->active_list);
>   
>   	spin_lock_init(&gt->hwsp_lock);
>   	INIT_LIST_HEAD(&gt->hwsp_free_list);
> @@ -182,6 +177,24 @@ void i915_timelines_init(struct drm_i915_private *i915)
>   	i915_gem_shrinker_taints_mutex(i915, &gt->mutex);
>   }
>   
> +static void timeline_active(struct i915_timeline *tl)
> +{
> +	struct i915_gt_timelines *gt = &tl->i915->gt.timelines;
> +
> +	mutex_lock(&gt->mutex);
> +	list_add(&tl->link, &gt->active_list);
> +	mutex_unlock(&gt->mutex);
> +}
> +
> +static void timeline_inactive(struct i915_timeline *tl)
> +{
> +	struct i915_gt_timelines *gt = &tl->i915->gt.timelines;
> +
> +	mutex_lock(&gt->mutex);
> +	list_del(&tl->link);
> +	mutex_unlock(&gt->mutex);
> +}

Bike shedding comments only:
Would it be better to use a verb suffix? Even though timeline_activate 
also wouldn't sound perfect. Since it is file local - activate_timeline? 
Or even just inline to pin/unpin. Unless more gets put into them later..

> +
>   /**
>    * i915_timelines_park - called when the driver idles
>    * @i915: the drm_i915_private device
> @@ -198,7 +211,7 @@ void i915_timelines_park(struct drm_i915_private *i915)
>   	struct i915_timeline *timeline;
>   
>   	mutex_lock(&gt->mutex);
> -	list_for_each_entry(timeline, &gt->list, link) {
> +	list_for_each_entry(timeline, &gt->active_list, link) {
>   		/*
>   		 * All known fences are completed so we can scrap
>   		 * the current sync point tracking and start afresh,
> @@ -212,15 +225,9 @@ void i915_timelines_park(struct drm_i915_private *i915)
>   
>   void i915_timeline_fini(struct i915_timeline *timeline)
>   {
> -	struct i915_gt_timelines *gt = &timeline->i915->gt.timelines;
> -
>   	GEM_BUG_ON(timeline->pin_count);
>   	GEM_BUG_ON(!list_empty(&timeline->requests));
>   
> -	mutex_lock(&gt->mutex);
> -	list_del(&timeline->link);
> -	mutex_unlock(&gt->mutex);
> -
>   	i915_syncmap_free(&timeline->sync);
>   	hwsp_free(timeline);
>   
> @@ -263,6 +270,8 @@ int i915_timeline_pin(struct i915_timeline *tl)
>   	if (err)
>   		goto unpin;
>   
> +	timeline_active(tl);
> +
>   	return 0;
>   
>   unpin:
> @@ -276,6 +285,8 @@ void i915_timeline_unpin(struct i915_timeline *tl)
>   	if (--tl->pin_count)
>   		return;
>   
> +	timeline_inactive(tl);
> +
>   	/*
>   	 * Since this timeline is idle, all bariers upon which we were waiting
>   	 * must also be complete and so we can discard the last used barriers
> @@ -299,7 +310,7 @@ void i915_timelines_fini(struct drm_i915_private *i915)
>   {
>   	struct i915_gt_timelines *gt = &i915->gt.timelines;
>   
> -	GEM_BUG_ON(!list_empty(&gt->list));
> +	GEM_BUG_ON(!list_empty(&gt->active_list));
>   	GEM_BUG_ON(!list_empty(&gt->hwsp_free_list));
>   
>   	mutex_destroy(&gt->mutex);
> 

Never mind the bikeshedding comments:

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 25/34] drm/i915: Track active timelines
  2019-01-22 14:56   ` Tvrtko Ursulin
@ 2019-01-22 15:17     ` Chris Wilson
  2019-01-23 22:32       ` John Harrison
  0 siblings, 1 reply; 89+ messages in thread
From: Chris Wilson @ 2019-01-22 15:17 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-01-22 14:56:32)
> 
> On 21/01/2019 22:21, Chris Wilson wrote:
> > Now that we pin timelines around use, we have a clearly defined lifetime
> > and convenient points at which we can track only the active timelines.
> > This allows us to reduce the list iteration to only consider those
> > active timelines and not all.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >   drivers/gpu/drm/i915/i915_drv.h      |  2 +-
> >   drivers/gpu/drm/i915/i915_gem.c      |  4 +--
> >   drivers/gpu/drm/i915/i915_reset.c    |  2 +-
> >   drivers/gpu/drm/i915/i915_timeline.c | 39 ++++++++++++++++++----------
> >   4 files changed, 29 insertions(+), 18 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index c00eaf2889fb..5577e0e1034f 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -1977,7 +1977,7 @@ struct drm_i915_private {
> >   
> >               struct i915_gt_timelines {
> >                       struct mutex mutex; /* protects list, tainted by GPU */
> > -                     struct list_head list;
> > +                     struct list_head active_list;
> >   
> >                       /* Pack multiple timelines' seqnos into the same page */
> >                       spinlock_t hwsp_lock;
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 4e0de22f0166..9c499edb4c13 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -3246,7 +3246,7 @@ wait_for_timelines(struct drm_i915_private *i915,
> >               return timeout;
> >   
> >       mutex_lock(&gt->mutex);
> > -     list_for_each_entry(tl, &gt->list, link) {
> > +     list_for_each_entry(tl, &gt->active_list, link) {
> >               struct i915_request *rq;
> >   
> >               rq = i915_gem_active_get_unlocked(&tl->last_request);
> > @@ -3274,7 +3274,7 @@ wait_for_timelines(struct drm_i915_private *i915,
> >   
> >               /* restart after reacquiring the lock */
> >               mutex_lock(&gt->mutex);
> > -             tl = list_entry(&gt->list, typeof(*tl), link);
> > +             tl = list_entry(&gt->active_list, typeof(*tl), link);
> >       }
> >       mutex_unlock(&gt->mutex);
> >   
> > diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
> > index 09edf488f711..9b9169508139 100644
> > --- a/drivers/gpu/drm/i915/i915_reset.c
> > +++ b/drivers/gpu/drm/i915/i915_reset.c
> > @@ -852,7 +852,7 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915)
> >        * No more can be submitted until we reset the wedged bit.
> >        */
> >       mutex_lock(&i915->gt.timelines.mutex);
> > -     list_for_each_entry(tl, &i915->gt.timelines.list, link) {
> > +     list_for_each_entry(tl, &i915->gt.timelines.active_list, link) {
> >               struct i915_request *rq;
> >               long timeout;
> >   
> > diff --git a/drivers/gpu/drm/i915/i915_timeline.c b/drivers/gpu/drm/i915/i915_timeline.c
> > index 69ee33dfa340..007348b1b469 100644
> > --- a/drivers/gpu/drm/i915/i915_timeline.c
> > +++ b/drivers/gpu/drm/i915/i915_timeline.c
> > @@ -117,7 +117,6 @@ int i915_timeline_init(struct drm_i915_private *i915,
> >                      const char *name,
> >                      struct i915_vma *hwsp)
> >   {
> > -     struct i915_gt_timelines *gt = &i915->gt.timelines;
> >       void *vaddr;
> >   
> >       /*
> > @@ -161,10 +160,6 @@ int i915_timeline_init(struct drm_i915_private *i915,
> >   
> >       i915_syncmap_init(&timeline->sync);
> >   
> > -     mutex_lock(&gt->mutex);
> > -     list_add(&timeline->link, &gt->list);
> > -     mutex_unlock(&gt->mutex);
> > -
> >       return 0;
> >   }
> >   
> > @@ -173,7 +168,7 @@ void i915_timelines_init(struct drm_i915_private *i915)
> >       struct i915_gt_timelines *gt = &i915->gt.timelines;
> >   
> >       mutex_init(&gt->mutex);
> > -     INIT_LIST_HEAD(&gt->list);
> > +     INIT_LIST_HEAD(&gt->active_list);
> >   
> >       spin_lock_init(&gt->hwsp_lock);
> >       INIT_LIST_HEAD(&gt->hwsp_free_list);
> > @@ -182,6 +177,24 @@ void i915_timelines_init(struct drm_i915_private *i915)
> >       i915_gem_shrinker_taints_mutex(i915, &gt->mutex);
> >   }
> >   
> > +static void timeline_active(struct i915_timeline *tl)
> > +{
> > +     struct i915_gt_timelines *gt = &tl->i915->gt.timelines;
> > +
> > +     mutex_lock(&gt->mutex);
> > +     list_add(&tl->link, &gt->active_list);
> > +     mutex_unlock(&gt->mutex);
> > +}
> > +
> > +static void timeline_inactive(struct i915_timeline *tl)
> > +{
> > +     struct i915_gt_timelines *gt = &tl->i915->gt.timelines;
> > +
> > +     mutex_lock(&gt->mutex);
> > +     list_del(&tl->link);
> > +     mutex_unlock(&gt->mutex);
> > +}
> 
> Bike shedding comments only:
> Would it be better to use a verb suffix? Even though timeline_activate 
> also wouldn't sound perfect. Since it is file local - activate_timeline? 
> Or even just inline to pin/unpin. Unless more gets put into them later..

Haven't got any plans for more here, yet, and was thinking this is a
pinned_list myself. I picked active_list since I was using 'active'
elsewhere for active_ring, active_engines, active_contexts, etc.

I didn't like activate/deactivate enough to switch, and was trying to
avoid reusing pin/unpin along this path:
	i915_timeline_pin -> timeline_pin
begged confusion

[snip]
> Never mind the bikeshedding comments:

There's time enough for someone to open a new pot of paint.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* Re: [PATCH 26/34] drm/i915: Identify active requests
  2019-01-21 22:21 ` [PATCH 26/34] drm/i915: Identify active requests Chris Wilson
@ 2019-01-22 15:34   ` Tvrtko Ursulin
  2019-01-22 15:45     ` Chris Wilson
  0 siblings, 1 reply; 89+ messages in thread
From: Tvrtko Ursulin @ 2019-01-22 15:34 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 21/01/2019 22:21, Chris Wilson wrote:
> To allow requests to forgo a common execution timeline, one question we
> need to be able to answer is "is this request running?". To track
> whether a request has started on HW, we can emit a breadcrumb at the
> beginning of the request and check its timeline's HWSP to see if the
> breadcrumb has advanced past the start of this request. (This is in
> contrast to the global timeline where we need only ask if we are on the
> global timeline and if the timeline has advanced past the end of the
> previous request.)
> 
> There is still confusion from a preempted request, which has already
> started but relinquished the HW to a high priority request. For the
> common case, this discrepancy should be negligible. However, for
> identification of hung requests, knowing which one was running at the
> time of the hang will be much more important.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_gem_execbuffer.c   |  6 +++
>   drivers/gpu/drm/i915/i915_request.c          |  9 ++--
>   drivers/gpu/drm/i915/i915_request.h          |  1 +
>   drivers/gpu/drm/i915/i915_timeline.c         |  1 +
>   drivers/gpu/drm/i915/i915_timeline.h         |  2 +
>   drivers/gpu/drm/i915/intel_engine_cs.c       |  4 +-
>   drivers/gpu/drm/i915/intel_lrc.c             | 47 ++++++++++++++++----
>   drivers/gpu/drm/i915/intel_ringbuffer.c      | 43 ++++++++++--------
>   drivers/gpu/drm/i915/intel_ringbuffer.h      |  6 ++-
>   drivers/gpu/drm/i915/selftests/mock_engine.c |  2 +-
>   10 files changed, 86 insertions(+), 35 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index f250109e1f66..defe7d60bb88 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -1976,6 +1976,12 @@ static int eb_submit(struct i915_execbuffer *eb)
>   			return err;
>   	}
>   
> +	if (eb->engine->emit_init_breadcrumb) {
> +		err = eb->engine->emit_init_breadcrumb(eb->request);
> +		if (err)
> +			return err;
> +	}
> +
>   	err = eb->engine->emit_bb_start(eb->request,
>   					eb->batch->node.start +
>   					eb->batch_start_offset,
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index bb2885f1dc1e..0a8a2a1bf55d 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -333,6 +333,7 @@ void i915_request_retire_upto(struct i915_request *rq)
>   
>   static u32 timeline_get_seqno(struct i915_timeline *tl)
>   {
> +	tl->seqno += tl->has_initial_breadcrumb;
>   	return ++tl->seqno;

return tl->seqno += 1 + tl->has_initial_breadcrumb?

Not sure if it would make any difference in the code.
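(Worked through, assuming the timeline's seqno starts at zero: with an
initial breadcrumb the first request on a fresh timeline gets
fence.seqno == 2, gen8_emit_init_breadcrumb() further down stores 1,
i.e. fence.seqno - 1, into the HWSP once the request starts executing,
and the fini breadcrumb stores 2 on completion, so i915_request_started()
compares the HWSP against 1 and i915_request_completed() against 2.
Reserving two seqno values per request is what makes room for that
start-of-request write.)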

>   }
>   
> @@ -382,8 +383,8 @@ void __i915_request_submit(struct i915_request *request)
>   		intel_engine_enable_signaling(request, false);
>   	spin_unlock(&request->lock);
>   
> -	engine->emit_breadcrumb(request,
> -				request->ring->vaddr + request->postfix);
> +	engine->emit_fini_breadcrumb(request,
> +				     request->ring->vaddr + request->postfix);
>   
>   	/* Transfer from per-context onto the global per-engine timeline */
>   	move_to_timeline(request, &engine->timeline);
> @@ -657,7 +658,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
>   	 * around inside i915_request_add() there is sufficient space at
>   	 * the beginning of the ring as well.
>   	 */
> -	rq->reserved_space = 2 * engine->emit_breadcrumb_sz * sizeof(u32);
> +	rq->reserved_space = 2 * engine->emit_fini_breadcrumb_sz * sizeof(u32);

The logic being that the fini breadcrumb is at least as big as the init
one? I can't think of any easy assert to verify that.

Also, a little bit of ring space wastage but I guess we don't care.

>   
>   	/*
>   	 * Record the position of the start of the request so that
> @@ -908,7 +909,7 @@ void i915_request_add(struct i915_request *request)
>   	 * GPU processing the request, we never over-estimate the
>   	 * position of the ring's HEAD.
>   	 */
> -	cs = intel_ring_begin(request, engine->emit_breadcrumb_sz);
> +	cs = intel_ring_begin(request, engine->emit_fini_breadcrumb_sz);
>   	GEM_BUG_ON(IS_ERR(cs));
>   	request->postfix = intel_ring_offset(request, cs);
>   
> diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
> index 96c586d6ff4d..340d6216791c 100644
> --- a/drivers/gpu/drm/i915/i915_request.h
> +++ b/drivers/gpu/drm/i915/i915_request.h
> @@ -344,6 +344,7 @@ static inline bool i915_request_started(const struct i915_request *rq)
>   	if (i915_request_signaled(rq))
>   		return true;
>   
> +	/* Remember: started but may have since been preempted! */
>   	return i915_seqno_passed(hwsp_seqno(rq), rq->fence.seqno - 1);
>   }
>   
> diff --git a/drivers/gpu/drm/i915/i915_timeline.c b/drivers/gpu/drm/i915/i915_timeline.c
> index 007348b1b469..7bc9164733bc 100644
> --- a/drivers/gpu/drm/i915/i915_timeline.c
> +++ b/drivers/gpu/drm/i915/i915_timeline.c
> @@ -132,6 +132,7 @@ int i915_timeline_init(struct drm_i915_private *i915,
>   	timeline->i915 = i915;
>   	timeline->name = name;
>   	timeline->pin_count = 0;
> +	timeline->has_initial_breadcrumb = !hwsp;
>   
>   	timeline->hwsp_offset = I915_GEM_HWS_SEQNO_ADDR;
>   	if (!hwsp) {
> diff --git a/drivers/gpu/drm/i915/i915_timeline.h b/drivers/gpu/drm/i915/i915_timeline.h
> index ab736e2e5707..8caeb66d1cd5 100644
> --- a/drivers/gpu/drm/i915/i915_timeline.h
> +++ b/drivers/gpu/drm/i915/i915_timeline.h
> @@ -48,6 +48,8 @@ struct i915_timeline {
>   	struct i915_vma *hwsp_ggtt;
>   	u32 hwsp_offset;
>   
> +	bool has_initial_breadcrumb;
> +
>   	/**
>   	 * List of breadcrumbs associated with GPU requests currently
>   	 * outstanding.
> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> index e532b4b27239..2a4c547240a1 100644
> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> @@ -1239,7 +1239,9 @@ static void print_request(struct drm_printer *m,
>   	drm_printf(m, "%s%x%s [%llx:%llx]%s @ %dms: %s\n",
>   		   prefix,
>   		   rq->global_seqno,
> -		   i915_request_completed(rq) ? "!" : "",
> +		   i915_request_completed(rq) ? "!" :
> +		   i915_request_started(rq) ? "*" :
> +		   "",
>   		   rq->fence.context, rq->fence.seqno,
>   		   buf,
>   		   jiffies_to_msecs(jiffies - rq->emitted_jiffies),
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 1bf178ca3e00..0a2d53f19625 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -649,7 +649,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   		 * WaIdleLiteRestore:bdw,skl
>   		 * Apply the wa NOOPs to prevent
>   		 * ring:HEAD == rq:TAIL as we resubmit the
> -		 * request. See gen8_emit_breadcrumb() for
> +		 * request. See gen8_emit_fini_breadcrumb() for
>   		 * where we prepare the padding after the
>   		 * end of the request.
>   		 */
> @@ -1294,6 +1294,34 @@ execlists_context_pin(struct intel_engine_cs *engine,
>   	return __execlists_context_pin(engine, ctx, ce);
>   }
>   
> +static int gen8_emit_init_breadcrumb(struct i915_request *rq)
> +{
> +	u32 *cs;
> +
> +	GEM_BUG_ON(!rq->timeline->has_initial_breadcrumb);
> +
> +	cs = intel_ring_begin(rq, 6);
> +	if (IS_ERR(cs))
> +		return PTR_ERR(cs);
> +
> +	/*
> +	 * Check if we have been preempted before we even get started.
> +	 *
> +	 * After this point i915_request_started() reports true, even if
> +	 * we get preempted and so are no longer running.
> +	 */
> +	*cs++ = MI_ARB_CHECK;
> +	*cs++ = MI_NOOP;
> +
> +	*cs++ = MI_STORE_DWORD_IMM_GEN4 | MI_USE_GGTT;
> +	*cs++ = i915_timeline_seqno_address(rq->timeline);
> +	*cs++ = 0;
> +	*cs++ = rq->fence.seqno - 1;
> +
> +	intel_ring_advance(rq, cs);
> +	return 0;
> +}
> +
>   static int emit_pdps(struct i915_request *rq)
>   {
>   	const struct intel_engine_cs * const engine = rq->engine;
> @@ -2049,7 +2077,7 @@ static void gen8_emit_wa_tail(struct i915_request *request, u32 *cs)
>   	request->wa_tail = intel_ring_offset(request, cs);
>   }
>   
> -static void gen8_emit_breadcrumb(struct i915_request *request, u32 *cs)
> +static void gen8_emit_fini_breadcrumb(struct i915_request *request, u32 *cs)
>   {
>   	/* w/a: bit 5 needs to be zero for MI_FLUSH_DW address. */
>   	BUILD_BUG_ON(I915_GEM_HWS_INDEX_ADDR & (1 << 5));
> @@ -2070,9 +2098,9 @@ static void gen8_emit_breadcrumb(struct i915_request *request, u32 *cs)
>   
>   	gen8_emit_wa_tail(request, cs);
>   }
> -static const int gen8_emit_breadcrumb_sz = 10 + WA_TAIL_DWORDS;
> +static const int gen8_emit_fini_breadcrumb_sz = 10 + WA_TAIL_DWORDS;
>   
> -static void gen8_emit_breadcrumb_rcs(struct i915_request *request, u32 *cs)
> +static void gen8_emit_fini_breadcrumb_rcs(struct i915_request *request, u32 *cs)
>   {
>   	cs = gen8_emit_ggtt_write_rcs(cs,
>   				      request->fence.seqno,
> @@ -2096,7 +2124,7 @@ static void gen8_emit_breadcrumb_rcs(struct i915_request *request, u32 *cs)
>   
>   	gen8_emit_wa_tail(request, cs);
>   }
> -static const int gen8_emit_breadcrumb_rcs_sz = 14 + WA_TAIL_DWORDS;
> +static const int gen8_emit_fini_breadcrumb_rcs_sz = 14 + WA_TAIL_DWORDS;
>   
>   static int gen8_init_rcs_context(struct i915_request *rq)
>   {
> @@ -2188,8 +2216,9 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
>   	engine->request_alloc = execlists_request_alloc;
>   
>   	engine->emit_flush = gen8_emit_flush;
> -	engine->emit_breadcrumb = gen8_emit_breadcrumb;
> -	engine->emit_breadcrumb_sz = gen8_emit_breadcrumb_sz;
> +	engine->emit_init_breadcrumb = gen8_emit_init_breadcrumb;
> +	engine->emit_fini_breadcrumb = gen8_emit_fini_breadcrumb;
> +	engine->emit_fini_breadcrumb_sz = gen8_emit_fini_breadcrumb_sz;
>   
>   	engine->set_default_submission = intel_execlists_set_default_submission;
>   
> @@ -2302,8 +2331,8 @@ int logical_render_ring_init(struct intel_engine_cs *engine)
>   	/* Override some for render ring. */
>   	engine->init_context = gen8_init_rcs_context;
>   	engine->emit_flush = gen8_emit_flush_render;
> -	engine->emit_breadcrumb = gen8_emit_breadcrumb_rcs;
> -	engine->emit_breadcrumb_sz = gen8_emit_breadcrumb_rcs_sz;
> +	engine->emit_fini_breadcrumb = gen8_emit_fini_breadcrumb_rcs;
> +	engine->emit_fini_breadcrumb_sz = gen8_emit_fini_breadcrumb_rcs_sz;
>   
>   	ret = logical_ring_init(engine);
>   	if (ret)
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 751bd4e7da42..f6b30eb46263 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1594,6 +1594,7 @@ static int intel_init_ring_buffer(struct intel_engine_cs *engine)
>   		err = PTR_ERR(timeline);
>   		goto err;
>   	}
> +	GEM_BUG_ON(timeline->has_initial_breadcrumb);
>   
>   	ring = intel_engine_create_ring(engine, timeline, 32 * PAGE_SIZE);
>   	i915_timeline_put(timeline);
> @@ -1947,6 +1948,7 @@ static int ring_request_alloc(struct i915_request *request)
>   	int ret;
>   
>   	GEM_BUG_ON(!request->hw_context->pin_count);
> +	GEM_BUG_ON(request->timeline->has_initial_breadcrumb);
>   
>   	/*
>   	 * Flush enough space to reduce the likelihood of waiting after
> @@ -2283,11 +2285,16 @@ static void intel_ring_default_vfuncs(struct drm_i915_private *dev_priv,
>   	engine->context_pin = intel_ring_context_pin;
>   	engine->request_alloc = ring_request_alloc;
>   
> -	engine->emit_breadcrumb = i9xx_emit_breadcrumb;
> -	engine->emit_breadcrumb_sz = i9xx_emit_breadcrumb_sz;
> +	/*
> +	 * Using a global execution timeline; the previous final breadcrumb is
> +	 * equivalent to our next initial breadcrumb so we can elide
> +	 * engine->emit_init_breadcrumb().
> +	 */
> +	engine->emit_fini_breadcrumb = i9xx_emit_breadcrumb;
> +	engine->emit_fini_breadcrumb_sz = i9xx_emit_breadcrumb_sz;
>   	if (IS_GEN(dev_priv, 5)) {
> -		engine->emit_breadcrumb = gen5_emit_breadcrumb;
> -		engine->emit_breadcrumb_sz = gen5_emit_breadcrumb_sz;
> +		engine->emit_fini_breadcrumb = gen5_emit_breadcrumb;
> +		engine->emit_fini_breadcrumb_sz = gen5_emit_breadcrumb_sz;
>   	}
>   
>   	engine->set_default_submission = i9xx_set_default_submission;
> @@ -2317,13 +2324,13 @@ int intel_init_render_ring_buffer(struct intel_engine_cs *engine)
>   	if (INTEL_GEN(dev_priv) >= 7) {
>   		engine->init_context = intel_rcs_ctx_init;
>   		engine->emit_flush = gen7_render_ring_flush;
> -		engine->emit_breadcrumb = gen7_rcs_emit_breadcrumb;
> -		engine->emit_breadcrumb_sz = gen7_rcs_emit_breadcrumb_sz;
> +		engine->emit_fini_breadcrumb = gen7_rcs_emit_breadcrumb;
> +		engine->emit_fini_breadcrumb_sz = gen7_rcs_emit_breadcrumb_sz;
>   	} else if (IS_GEN(dev_priv, 6)) {
>   		engine->init_context = intel_rcs_ctx_init;
>   		engine->emit_flush = gen6_render_ring_flush;
> -		engine->emit_breadcrumb = gen6_rcs_emit_breadcrumb;
> -		engine->emit_breadcrumb_sz = gen6_rcs_emit_breadcrumb_sz;
> +		engine->emit_fini_breadcrumb = gen6_rcs_emit_breadcrumb;
> +		engine->emit_fini_breadcrumb_sz = gen6_rcs_emit_breadcrumb_sz;
>   	} else if (IS_GEN(dev_priv, 5)) {
>   		engine->emit_flush = gen4_render_ring_flush;
>   	} else {
> @@ -2360,11 +2367,11 @@ int intel_init_bsd_ring_buffer(struct intel_engine_cs *engine)
>   		engine->irq_enable_mask = GT_BSD_USER_INTERRUPT;
>   
>   		if (IS_GEN(dev_priv, 6)) {
> -			engine->emit_breadcrumb = gen6_xcs_emit_breadcrumb;
> -			engine->emit_breadcrumb_sz = gen6_xcs_emit_breadcrumb_sz;
> +			engine->emit_fini_breadcrumb = gen6_xcs_emit_breadcrumb;
> +			engine->emit_fini_breadcrumb_sz = gen6_xcs_emit_breadcrumb_sz;
>   		} else {
> -			engine->emit_breadcrumb = gen7_xcs_emit_breadcrumb;
> -			engine->emit_breadcrumb_sz = gen7_xcs_emit_breadcrumb_sz;
> +			engine->emit_fini_breadcrumb = gen7_xcs_emit_breadcrumb;
> +			engine->emit_fini_breadcrumb_sz = gen7_xcs_emit_breadcrumb_sz;
>   		}
>   	} else {
>   		engine->emit_flush = bsd_ring_flush;
> @@ -2389,11 +2396,11 @@ int intel_init_blt_ring_buffer(struct intel_engine_cs *engine)
>   	engine->irq_enable_mask = GT_BLT_USER_INTERRUPT;
>   
>   	if (IS_GEN(dev_priv, 6)) {
> -		engine->emit_breadcrumb = gen6_xcs_emit_breadcrumb;
> -		engine->emit_breadcrumb_sz = gen6_xcs_emit_breadcrumb_sz;
> +		engine->emit_fini_breadcrumb = gen6_xcs_emit_breadcrumb;
> +		engine->emit_fini_breadcrumb_sz = gen6_xcs_emit_breadcrumb_sz;
>   	} else {
> -		engine->emit_breadcrumb = gen7_xcs_emit_breadcrumb;
> -		engine->emit_breadcrumb_sz = gen7_xcs_emit_breadcrumb_sz;
> +		engine->emit_fini_breadcrumb = gen7_xcs_emit_breadcrumb;
> +		engine->emit_fini_breadcrumb_sz = gen7_xcs_emit_breadcrumb_sz;
>   	}
>   
>   	return intel_init_ring_buffer(engine);
> @@ -2412,8 +2419,8 @@ int intel_init_vebox_ring_buffer(struct intel_engine_cs *engine)
>   	engine->irq_enable = hsw_vebox_irq_enable;
>   	engine->irq_disable = hsw_vebox_irq_disable;
>   
> -	engine->emit_breadcrumb = gen7_xcs_emit_breadcrumb;
> -	engine->emit_breadcrumb_sz = gen7_xcs_emit_breadcrumb_sz;
> +	engine->emit_fini_breadcrumb = gen7_xcs_emit_breadcrumb;
> +	engine->emit_fini_breadcrumb_sz = gen7_xcs_emit_breadcrumb_sz;
>   
>   	return intel_init_ring_buffer(engine);
>   }
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index a792bacf2930..d3d4f3667afb 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -463,8 +463,10 @@ struct intel_engine_cs {
>   					 unsigned int dispatch_flags);
>   #define I915_DISPATCH_SECURE BIT(0)
>   #define I915_DISPATCH_PINNED BIT(1)
> -	void		(*emit_breadcrumb)(struct i915_request *rq, u32 *cs);
> -	int		emit_breadcrumb_sz;
> +	int		(*emit_init_breadcrumb)(struct i915_request *rq);
> +	void		(*emit_fini_breadcrumb)(struct i915_request *rq,
> +						u32 *cs);
> +	unsigned int	emit_fini_breadcrumb_sz;
>   
>   	/* Pass the request to the hardware queue (e.g. directly into
>   	 * the legacy ringbuffer or to the end of an execlist).
> diff --git a/drivers/gpu/drm/i915/selftests/mock_engine.c b/drivers/gpu/drm/i915/selftests/mock_engine.c
> index c0a408828415..2515cffb4490 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_engine.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_engine.c
> @@ -227,7 +227,7 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
>   	engine->base.context_pin = mock_context_pin;
>   	engine->base.request_alloc = mock_request_alloc;
>   	engine->base.emit_flush = mock_emit_flush;
> -	engine->base.emit_breadcrumb = mock_emit_breadcrumb;
> +	engine->base.emit_fini_breadcrumb = mock_emit_breadcrumb;
>   	engine->base.submit_request = mock_submit_request;
>   
>   	if (i915_timeline_init(i915,
> 

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 26/34] drm/i915: Identify active requests
  2019-01-22 15:34   ` Tvrtko Ursulin
@ 2019-01-22 15:45     ` Chris Wilson
  0 siblings, 0 replies; 89+ messages in thread
From: Chris Wilson @ 2019-01-22 15:45 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-01-22 15:34:07)
> 
> On 21/01/2019 22:21, Chris Wilson wrote:
> > To allow requests to forgo a common execution timeline, one question we
> > need to be able to answer is "is this request running?". To track
> > whether a request has started on HW, we can emit a breadcrumb at the
> > beginning of the request and check its timeline's HWSP to see if the
> > breadcrumb has advanced past the start of this request. (This is in
> > contrast to the global timeline where we need only ask if we are on the
> > global timeline and if the timeline has advanced past the end of the
> > previous request.)
> > 
> > There is still confusion from a preempted request, which has already
> > started but relinquished the HW to a high priority request. For the
> > common case, this discrepancy should be negligible. However, for
> > identification of hung requests, knowing which one was running at the
> > time of the hang will be much more important.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >   drivers/gpu/drm/i915/i915_gem_execbuffer.c   |  6 +++
> >   drivers/gpu/drm/i915/i915_request.c          |  9 ++--
> >   drivers/gpu/drm/i915/i915_request.h          |  1 +
> >   drivers/gpu/drm/i915/i915_timeline.c         |  1 +
> >   drivers/gpu/drm/i915/i915_timeline.h         |  2 +
> >   drivers/gpu/drm/i915/intel_engine_cs.c       |  4 +-
> >   drivers/gpu/drm/i915/intel_lrc.c             | 47 ++++++++++++++++----
> >   drivers/gpu/drm/i915/intel_ringbuffer.c      | 43 ++++++++++--------
> >   drivers/gpu/drm/i915/intel_ringbuffer.h      |  6 ++-
> >   drivers/gpu/drm/i915/selftests/mock_engine.c |  2 +-
> >   10 files changed, 86 insertions(+), 35 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > index f250109e1f66..defe7d60bb88 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > @@ -1976,6 +1976,12 @@ static int eb_submit(struct i915_execbuffer *eb)
> >                       return err;
> >       }
> >   
> > +     if (eb->engine->emit_init_breadcrumb) {
> > +             err = eb->engine->emit_init_breadcrumb(eb->request);
> > +             if (err)
> > +                     return err;
> > +     }
> > +
> >       err = eb->engine->emit_bb_start(eb->request,
> >                                       eb->batch->node.start +
> >                                       eb->batch_start_offset,
> > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> > index bb2885f1dc1e..0a8a2a1bf55d 100644
> > --- a/drivers/gpu/drm/i915/i915_request.c
> > +++ b/drivers/gpu/drm/i915/i915_request.c
> > @@ -333,6 +333,7 @@ void i915_request_retire_upto(struct i915_request *rq)
> >   
> >   static u32 timeline_get_seqno(struct i915_timeline *tl)
> >   {
> > +     tl->seqno += tl->has_initial_breadcrumb;
> >       return ++tl->seqno;
> 
> return tl->seqno += 1 + tl->has_initial_breadcrumb?
> 
> Not sure if it would make any difference in the code.

Identical code generation, but it does read better than a conditional
increment followed by a pre-increment.
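
For the record, the two spellings side by side (just a sketch; both rely on the
has_initial_breadcrumb field added by this patch):

static u32 timeline_get_seqno(struct i915_timeline *tl)
{
	/* as in the patch: skip the seqno reserved for the initial breadcrumb */
	tl->seqno += tl->has_initial_breadcrumb;
	return ++tl->seqno;
}

static u32 timeline_get_seqno(struct i915_timeline *tl)
{
	/* the suggested alternative (either one, not both): same effect */
	return tl->seqno += 1 + tl->has_initial_breadcrumb;
}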

> > @@ -382,8 +383,8 @@ void __i915_request_submit(struct i915_request *request)
> >               intel_engine_enable_signaling(request, false);
> >       spin_unlock(&request->lock);
> >   
> > -     engine->emit_breadcrumb(request,
> > -                             request->ring->vaddr + request->postfix);
> > +     engine->emit_fini_breadcrumb(request,
> > +                                  request->ring->vaddr + request->postfix);
> >   
> >       /* Transfer from per-context onto the global per-engine timeline */
> >       move_to_timeline(request, &engine->timeline);
> > @@ -657,7 +658,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
> >        * around inside i915_request_add() there is sufficient space at
> >        * the beginning of the ring as well.
> >        */
> > -     rq->reserved_space = 2 * engine->emit_breadcrumb_sz * sizeof(u32);
> > +     rq->reserved_space = 2 * engine->emit_fini_breadcrumb_sz * sizeof(u32);
> 
> Is the logic that the fini breadcrumb is at least as big as the init one?
> I can't think of any easy asserts to verify that.

We emit engine->emit_init_breadcrumb() normally; it's just
engine->emit_fini_breadcrumb() that goes into the reserved portion.

The factor of 2 is to allow for the space wasted on wraparound.
 
> Also, a little bit of ring space is wasted, but I guess we don't care.

We don't actually waste space; we only ever use emit_fini_breadcrumb_sz of
it. We just flush enough of the ring for 2*sz to be sure that, even if we
have to wrap, there's enough room at the start of the ring for our emit.

So we are overzealous on flushing if the ring is full, in which case we
throttle a millisecond earlier than is strictly required (given that the
ring already contains a few seconds' worth of batches).
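
To put rough numbers on it (a sketch only, assuming gen8 with
WA_TAIL_DWORDS == 2, so emit_fini_breadcrumb_sz = 10 + 2 = 12 dwords):

	rq->reserved_space = 2 * 12 * sizeof(u32); /* flush 96 bytes up front */

	/*
	 * Only 48 bytes are ever written for the final breadcrumb. If those
	 * 48 bytes would straddle the end of the ring, intel_ring_begin()
	 * pads the remainder with MI_NOOP and emits at the start of the ring
	 * instead - which the other half of the reservation guarantees has
	 * already been flushed and is free.
	 */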

The real problem here is that throttling one client strangles them all.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 27/34] drm/i915: Remove the intel_engine_notify tracepoint
  2019-01-21 22:21 ` [PATCH 27/34] drm/i915: Remove the intel_engine_notify tracepoint Chris Wilson
@ 2019-01-22 15:50   ` Tvrtko Ursulin
  2019-01-23 12:54     ` Chris Wilson
  0 siblings, 1 reply; 89+ messages in thread
From: Tvrtko Ursulin @ 2019-01-22 15:50 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 21/01/2019 22:21, Chris Wilson wrote:
> The global seqno is defunct and so we have no meaningful indicator of
> forward progress for an engine. You need to listen to the request
> signaling tracepoints instead.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_irq.c   |  2 --
>   drivers/gpu/drm/i915/i915_trace.h | 25 -------------------------
>   2 files changed, 27 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 5fd5080c4ccb..71d11dc2c235 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -1209,8 +1209,6 @@ static void notify_ring(struct intel_engine_cs *engine)
>   		wake_up_process(tsk);
>   
>   	rcu_read_unlock();
> -
> -	trace_intel_engine_notify(engine, wait);
>   }
>   
>   static void vlv_c0_read(struct drm_i915_private *dev_priv,
> diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
> index 33d90eca9cdd..cb5bc65d575d 100644
> --- a/drivers/gpu/drm/i915/i915_trace.h
> +++ b/drivers/gpu/drm/i915/i915_trace.h
> @@ -750,31 +750,6 @@ trace_i915_request_out(struct i915_request *rq)
>   #endif
>   #endif
>   
> -TRACE_EVENT(intel_engine_notify,
> -	    TP_PROTO(struct intel_engine_cs *engine, bool waiters),
> -	    TP_ARGS(engine, waiters),
> -
> -	    TP_STRUCT__entry(
> -			     __field(u32, dev)
> -			     __field(u16, class)
> -			     __field(u16, instance)
> -			     __field(u32, seqno)
> -			     __field(bool, waiters)
> -			     ),
> -
> -	    TP_fast_assign(
> -			   __entry->dev = engine->i915->drm.primary->index;
> -			   __entry->class = engine->uabi_class;
> -			   __entry->instance = engine->instance;
> -			   __entry->seqno = intel_engine_get_seqno(engine);
> -			   __entry->waiters = waiters;
> -			   ),
> -
> -	    TP_printk("dev=%u, engine=%u:%u, seqno=%u, waiters=%u",
> -		      __entry->dev, __entry->class, __entry->instance,
> -		      __entry->seqno, __entry->waiters)
> -);
> -
>   DEFINE_EVENT(i915_request, i915_request_retire,
>   	    TP_PROTO(struct i915_request *rq),
>   	    TP_ARGS(rq)
> 

I cannot decide whether keeping what we can of it would still be useful.
Certainly not for debugging intel_engine_breadcrumbs_irq... a sequence of
intel_engine_notify(dev, class, instance) -> dma_fence_signaled would be
a very unreliable trace of which engine actually executed something. What
do you think?
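
For illustration, the cut-down version I have in mind would be something like
this (purely a sketch, reusing the fields from the event being removed above,
minus the seqno that no longer has a meaningful source):

TRACE_EVENT(intel_engine_notify,
	    TP_PROTO(struct intel_engine_cs *engine, bool waiters),
	    TP_ARGS(engine, waiters),

	    TP_STRUCT__entry(
			     __field(u32, dev)
			     __field(u16, class)
			     __field(u16, instance)
			     __field(bool, waiters)
			     ),

	    TP_fast_assign(
			   __entry->dev = engine->i915->drm.primary->index;
			   __entry->class = engine->uabi_class;
			   __entry->instance = engine->instance;
			   __entry->waiters = waiters;
			   ),

	    TP_printk("dev=%u, engine=%u:%u, waiters=%u",
		      __entry->dev, __entry->class, __entry->instance,
		      __entry->waiters)
);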

Regards,

Tvrtko



_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 02/34] drm/i915/execlists: Suppress preempting self
  2019-01-21 22:20 ` [PATCH 02/34] drm/i915/execlists: Suppress preempting self Chris Wilson
@ 2019-01-22 22:18   ` John Harrison
  2019-01-22 22:38     ` Chris Wilson
  0 siblings, 1 reply; 89+ messages in thread
From: John Harrison @ 2019-01-22 22:18 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 1/21/2019 14:20, Chris Wilson wrote:
> In order to avoid preempting ourselves, we currently refuse to schedule
> the tasklet if we reschedule an inflight context. However, this glosses
> over a few issues such as what happens after a CS completion event and
> we then preempt the newly executing context with itself, or if something
> else causes a tasklet_schedule triggering the same evaluation to
> preempt the active context with itself.
>
> To avoid the extra complications, after deciding that we have
> potentially queued a request with higher priority than the currently
> executing request, inspect the head of the queue to see if it is indeed
> higher priority from another context.
>
> References: a2bf92e8cc16 ("drm/i915/execlists: Avoid kicking priority on the current context")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Can you explain what was wrong with the previous version of this patch 
(drm/i915/execlists: Store the highest priority context)? It seemed simpler.

John.


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 08/34] drm/i915: Make all GPU resets atomic
  2019-01-21 22:20 ` [PATCH 08/34] drm/i915: Make all GPU resets atomic Chris Wilson
@ 2019-01-22 22:19   ` John Harrison
  2019-01-22 22:27     ` Chris Wilson
  2019-01-23  8:52     ` Mika Kuoppala
  0 siblings, 2 replies; 89+ messages in thread
From: John Harrison @ 2019-01-22 22:19 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 1/21/2019 14:20, Chris Wilson wrote:
> In preparation for the next few commits, make resetting the GPU atomic.
> Currently, we have prepared gen6+ for atomic resetting of individual
> engines, but now there is a requirement to perform the whole device
> level reset (just the register poking) from inside an atomic context.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> ---
>   drivers/gpu/drm/i915/i915_reset.c | 50 +++++++++++++++++--------------
>   1 file changed, 27 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
> index 342d9ee42601..b9d0ea70361c 100644
> --- a/drivers/gpu/drm/i915/i915_reset.c
> +++ b/drivers/gpu/drm/i915/i915_reset.c
> @@ -144,14 +144,14 @@ static int i915_do_reset(struct drm_i915_private *i915,
>   
>   	/* Assert reset for at least 20 usec, and wait for acknowledgement. */
>   	pci_write_config_byte(pdev, I915_GDRST, GRDOM_RESET_ENABLE);
> -	usleep_range(50, 200);
> -	err = wait_for(i915_in_reset(pdev), 500);
> +	udelay(50);
> +	err = wait_for_atomic(i915_in_reset(pdev), 50);
Is it known to be safe to reduce all of these timeout values? Where did 
the original 500ms value come from? Is there any chance of getting 
sporadic failures because 50ms is borderline in the worst case scenario? 
It still sounds huge but an order of magnitude change in a timeout 
always seems worrying!

>   
>   	/* Clear the reset request. */
>   	pci_write_config_byte(pdev, I915_GDRST, 0);
> -	usleep_range(50, 200);
> +	udelay(50);
>   	if (!err)
> -		err = wait_for(!i915_in_reset(pdev), 500);
> +		err = wait_for_atomic(!i915_in_reset(pdev), 50);
>   
>   	return err;
>   }
> @@ -171,7 +171,7 @@ static int g33_do_reset(struct drm_i915_private *i915,
>   	struct pci_dev *pdev = i915->drm.pdev;
>   
>   	pci_write_config_byte(pdev, I915_GDRST, GRDOM_RESET_ENABLE);
> -	return wait_for(g4x_reset_complete(pdev), 500);
> +	return wait_for_atomic(g4x_reset_complete(pdev), 50);
>   }
>   
>   static int g4x_do_reset(struct drm_i915_private *dev_priv,
> @@ -182,13 +182,13 @@ static int g4x_do_reset(struct drm_i915_private *dev_priv,
>   	int ret;
>   
>   	/* WaVcpClkGateDisableForMediaReset:ctg,elk */
> -	I915_WRITE(VDECCLK_GATE_D,
> -		   I915_READ(VDECCLK_GATE_D) | VCP_UNIT_CLOCK_GATE_DISABLE);
> -	POSTING_READ(VDECCLK_GATE_D);
> +	I915_WRITE_FW(VDECCLK_GATE_D,
> +		      I915_READ(VDECCLK_GATE_D) | VCP_UNIT_CLOCK_GATE_DISABLE);
> +	POSTING_READ_FW(VDECCLK_GATE_D);
>   
>   	pci_write_config_byte(pdev, I915_GDRST,
>   			      GRDOM_MEDIA | GRDOM_RESET_ENABLE);
> -	ret =  wait_for(g4x_reset_complete(pdev), 500);
> +	ret =  wait_for_atomic(g4x_reset_complete(pdev), 50);
>   	if (ret) {
>   		DRM_DEBUG_DRIVER("Wait for media reset failed\n");
>   		goto out;
> @@ -196,7 +196,7 @@ static int g4x_do_reset(struct drm_i915_private *dev_priv,
>   
>   	pci_write_config_byte(pdev, I915_GDRST,
>   			      GRDOM_RENDER | GRDOM_RESET_ENABLE);
> -	ret =  wait_for(g4x_reset_complete(pdev), 500);
> +	ret =  wait_for_atomic(g4x_reset_complete(pdev), 50);
>   	if (ret) {
>   		DRM_DEBUG_DRIVER("Wait for render reset failed\n");
>   		goto out;
> @@ -205,9 +205,9 @@ static int g4x_do_reset(struct drm_i915_private *dev_priv,
>   out:
>   	pci_write_config_byte(pdev, I915_GDRST, 0);
>   
> -	I915_WRITE(VDECCLK_GATE_D,
> -		   I915_READ(VDECCLK_GATE_D) & ~VCP_UNIT_CLOCK_GATE_DISABLE);
> -	POSTING_READ(VDECCLK_GATE_D);
> +	I915_WRITE_FW(VDECCLK_GATE_D,
> +		      I915_READ(VDECCLK_GATE_D) & ~VCP_UNIT_CLOCK_GATE_DISABLE);
> +	POSTING_READ_FW(VDECCLK_GATE_D);
>   
>   	return ret;
>   }
> @@ -218,27 +218,29 @@ static int ironlake_do_reset(struct drm_i915_private *dev_priv,
>   {
>   	int ret;
>   
> -	I915_WRITE(ILK_GDSR, ILK_GRDOM_RENDER | ILK_GRDOM_RESET_ENABLE);
> -	ret = intel_wait_for_register(dev_priv,
> -				      ILK_GDSR, ILK_GRDOM_RESET_ENABLE, 0,
> -				      500);
> +	I915_WRITE_FW(ILK_GDSR, ILK_GRDOM_RENDER | ILK_GRDOM_RESET_ENABLE);
> +	ret = __intel_wait_for_register_fw(dev_priv, ILK_GDSR,
> +					   ILK_GRDOM_RESET_ENABLE, 0,
> +					   5000, 0,
> +					   NULL);
These two timeouts are now two orders of magnitude smaller? It was 500ms 
but is now 5000us (=5ms)?

John.


>   	if (ret) {
>   		DRM_DEBUG_DRIVER("Wait for render reset failed\n");
>   		goto out;
>   	}
>   
> -	I915_WRITE(ILK_GDSR, ILK_GRDOM_MEDIA | ILK_GRDOM_RESET_ENABLE);
> -	ret = intel_wait_for_register(dev_priv,
> -				      ILK_GDSR, ILK_GRDOM_RESET_ENABLE, 0,
> -				      500);
> +	I915_WRITE_FW(ILK_GDSR, ILK_GRDOM_MEDIA | ILK_GRDOM_RESET_ENABLE);
> +	ret = __intel_wait_for_register_fw(dev_priv, ILK_GDSR,
> +					   ILK_GRDOM_RESET_ENABLE, 0,
> +					   5000, 0,
> +					   NULL);
>   	if (ret) {
>   		DRM_DEBUG_DRIVER("Wait for media reset failed\n");
>   		goto out;
>   	}
>   
>   out:
> -	I915_WRITE(ILK_GDSR, 0);
> -	POSTING_READ(ILK_GDSR);
> +	I915_WRITE_FW(ILK_GDSR, 0);
> +	POSTING_READ_FW(ILK_GDSR);
>   	return ret;
>   }
>   
> @@ -572,7 +574,9 @@ int intel_gpu_reset(struct drm_i915_private *i915, unsigned int engine_mask)
>   		ret = -ENODEV;
>   		if (reset) {
>   			GEM_TRACE("engine_mask=%x\n", engine_mask);
> +			preempt_disable();
>   			ret = reset(i915, engine_mask, retry);
> +			preempt_enable();
>   		}
>   		if (ret != -ETIMEDOUT || engine_mask != ALL_ENGINES)
>   			break;

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 09/34] drm/i915/guc: Disable global reset
  2019-01-21 22:20 ` [PATCH 09/34] drm/i915/guc: Disable global reset Chris Wilson
@ 2019-01-22 22:23   ` John Harrison
  0 siblings, 0 replies; 89+ messages in thread
From: John Harrison @ 2019-01-22 22:23 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 1/21/2019 14:20, Chris Wilson wrote:
> The guc (and huc) currently inextricably depend on struct_mutex for
> device reinitialisation from inside the reset, and indeed taking any
> mutex here is verboten (as we must be able to reset from underneath any
> of our mutexes). That makes recovering the guc unviable without, for
> example, reserving contiguous vma space and pages for it to use.
>
> The plan to re-enable global reset for the GuC centres around reusing the
> WOPM reserved space at the top of the aperture (that we know we can
> populate a contiguous range large enough to dma xfer the fw image).
>
> In the meantime, hopefully no one even notices as the device-reset is
> only used as a backup to the per-engine resets for handling GPU hangs.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Acked-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_reset.c | 3 +++
>   1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
> index b9d0ea70361c..2961c21d9420 100644
> --- a/drivers/gpu/drm/i915/i915_reset.c
> +++ b/drivers/gpu/drm/i915/i915_reset.c
> @@ -590,6 +590,9 @@ int intel_gpu_reset(struct drm_i915_private *i915, unsigned int engine_mask)
>   
>   bool intel_has_gpu_reset(struct drm_i915_private *i915)
>   {
> +	if (USES_GUC(i915))
> +		return false;
> +
>   	return intel_get_gpu_reset(i915);
>   }
>   

Reviewed-by: John Harrison <John.C.Harrison@Intel.com>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 08/34] drm/i915: Make all GPU resets atomic
  2019-01-22 22:19   ` John Harrison
@ 2019-01-22 22:27     ` Chris Wilson
  2019-01-23  8:52     ` Mika Kuoppala
  1 sibling, 0 replies; 89+ messages in thread
From: Chris Wilson @ 2019-01-22 22:27 UTC (permalink / raw)
  To: John Harrison, intel-gfx

Quoting John Harrison (2019-01-22 22:19:04)
> On 1/21/2019 14:20, Chris Wilson wrote:
> > In preparation for the next few commits, make resetting the GPU atomic.
> > Currently, we have prepared gen6+ for atomic resetting of individual
> > engines, but now there is a requirement to perform the whole device
> > level reset (just the register poking) from inside an atomic context.
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_reset.c | 50 +++++++++++++++++--------------
> >   1 file changed, 27 insertions(+), 23 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
> > index 342d9ee42601..b9d0ea70361c 100644
> > --- a/drivers/gpu/drm/i915/i915_reset.c
> > +++ b/drivers/gpu/drm/i915/i915_reset.c
> > @@ -144,14 +144,14 @@ static int i915_do_reset(struct drm_i915_private *i915,
> >   
> >       /* Assert reset for at least 20 usec, and wait for acknowledgement. */
> >       pci_write_config_byte(pdev, I915_GDRST, GRDOM_RESET_ENABLE);
> > -     usleep_range(50, 200);
> > -     err = wait_for(i915_in_reset(pdev), 500);
> > +     udelay(50);
> > +     err = wait_for_atomic(i915_in_reset(pdev), 50);

> Is it known to be safe to reduce all of these timeout values? Where did 
> the original 500ms value come from?

I chose it entirely upon a whim, picking a huge number unlikely to ever
be exceeded, and if it were we would be right to conclude the HW was
unrecoverable.

> Is there any chance of getting 
> sporadic failures because 50ms is borderline in the worst case scenario? 
> It still sounds huge but an order of magnitude change in a timeout 
> always seems worrying!

Whereas 50us is more in line with the little bits of documentation that
still exist.
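
For reference, the hunk in question (same code as quoted above; as far as I
can tell, wait_for_atomic() takes its timeout in milliseconds and busy-spins
with preemption disabled, so the cap has to be something we can tolerate
spinning for in the worst case):

	/* Assert reset for at least 20 usec, and wait for acknowledgement. */
	pci_write_config_byte(pdev, I915_GDRST, GRDOM_RESET_ENABLE);
	udelay(50);
	err = wait_for_atomic(i915_in_reset(pdev), 50); /* 50ms busy-wait cap */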

> > @@ -218,27 +218,29 @@ static int ironlake_do_reset(struct drm_i915_private *dev_priv,
> >   {
> >       int ret;
> >   
> > -     I915_WRITE(ILK_GDSR, ILK_GRDOM_RENDER | ILK_GRDOM_RESET_ENABLE);
> > -     ret = intel_wait_for_register(dev_priv,
> > -                                   ILK_GDSR, ILK_GRDOM_RESET_ENABLE, 0,
> > -                                   500);
> > +     I915_WRITE_FW(ILK_GDSR, ILK_GRDOM_RENDER | ILK_GRDOM_RESET_ENABLE);
> > +     ret = __intel_wait_for_register_fw(dev_priv, ILK_GDSR,
> > +                                        ILK_GRDOM_RESET_ENABLE, 0,
> > +                                        5000, 0,
> > +                                        NULL);
> These two timeouts are now two orders of magnitude smaller? It was 500ms 
> but is now 5000us (=5ms)?

0.5s was the same number plucked from the air. No guidance here, that I
know of, except we have lots of runs through CI to try to estimate
bounds.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 02/34] drm/i915/execlists: Suppress preempting self
  2019-01-22 22:18   ` John Harrison
@ 2019-01-22 22:38     ` Chris Wilson
  0 siblings, 0 replies; 89+ messages in thread
From: Chris Wilson @ 2019-01-22 22:38 UTC (permalink / raw)
  To: John Harrison, intel-gfx

Quoting John Harrison (2019-01-22 22:18:46)
> On 1/21/2019 14:20, Chris Wilson wrote:
> > In order to avoid preempting ourselves, we currently refuse to schedule
> > the tasklet if we reschedule an inflight context. However, this glosses
> > over a few issues such as what happens after a CS completion event and
> > we then preempt the newly executing context with itself, or if something
> > else causes a tasklet_schedule triggering the same evaluation to
> > preempt the active context with itself.
> >
> > To avoid the extra complications, after deciding that we have
> > potentially queued a request with higher priority than the currently
> > executing request, inspect the head of the queue to see if it is indeed
> > higher priority from another context.
> >
> > References: a2bf92e8cc16 ("drm/i915/execlists: Avoid kicking priority on the current context")
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Can you explain what was wrong with the previous version of this patch 
> (drm/i915/execlists: Store the highest priority context)? It seemed simpler.

The goal here is to be a more general suppression mechanism than the
first version. queue_priority is a hint and can't be trusted, as we may
have set it for an inflight request that has since completed. Given that
it tells us a preemption point _was_ required, but we don't want to
forcibly inject an idle barrier, we inspect the queue instead of taking
the hint at face value. In that light, queue_context is superfluous as
we ignore the ELSP[0] context anyway.
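
Roughly, the check now looks like this (a paraphrased sketch rather than the
exact patch; first_request() is a hypothetical stand-in for peeking at the
head of the priolist rbtree):

static bool need_preempt_queue(const struct intel_engine_cs *engine,
			       const struct i915_request *last)
{
	const struct i915_request *rq;

	if (!need_preempt(engine, last, engine->execlists.queue_priority))
		return false;

	/*
	 * queue_priority said a preemption point was wanted; only act on
	 * it if the head of the queue really is a higher priority request
	 * from a different context.
	 */
	rq = first_request(&engine->execlists);

	return rq &&
	       rq->hw_context != last->hw_context &&
	       rq_prio(rq) > rq_prio(last);
}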

The patch is slightly bigger than it needed to be because I was
refactoring out some changes for later, plus a few paranoid asserts
from debugging that didn't really belong in the bugfix.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 33/34] drm/i915: Prioritise non-busywait semaphore workloads
  2019-01-21 22:21 ` [PATCH 33/34] drm/i915: Prioritise non-busywait semaphore workloads Chris Wilson
@ 2019-01-23  0:33   ` Chris Wilson
  0 siblings, 0 replies; 89+ messages in thread
From: Chris Wilson @ 2019-01-23  0:33 UTC (permalink / raw)
  To: intel-gfx

Quoting Chris Wilson (2019-01-21 22:21:16)
> We don't want to busywait on the GPU if we have other work to do. If we
> give non-busywaiting workloads higher (initial) priority than workloads
> that require a busywait, we will prioritise work that is ready to run
> immediately.

Fwiw, without preemption, using HW semaphores does perturb user-visible
scheduling behaviour (enough for me to be able to write a userspace
deadlock).
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 12/34] drm/i915: Issue engine resets onto idle engines
  2019-01-21 22:20 ` [PATCH 12/34] drm/i915: Issue engine resets onto idle engines Chris Wilson
@ 2019-01-23  1:18   ` John Harrison
  2019-01-23  1:31     ` Chris Wilson
  0 siblings, 1 reply; 89+ messages in thread
From: John Harrison @ 2019-01-23  1:18 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 1/21/2019 14:20, Chris Wilson wrote:
> Always perform the requested reset, even if we believe the engine is
> idle. Presumably there was a reason the caller wanted the reset, and in
> the near future we lose the easy tracking for whether the engine is
> idle.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_reset.c             |  4 ----
>   .../gpu/drm/i915/selftests/intel_hangcheck.c  | 22 +++++--------------
>   2 files changed, 6 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
> index 064fc6da1512..d44b095e2860 100644
> --- a/drivers/gpu/drm/i915/i915_reset.c
> +++ b/drivers/gpu/drm/i915/i915_reset.c
> @@ -1063,10 +1063,6 @@ int i915_reset_engine(struct intel_engine_cs *engine, const char *msg)
>   	GEM_TRACE("%s flags=%lx\n", engine->name, error->flags);
>   	GEM_BUG_ON(!test_bit(I915_RESET_ENGINE + engine->id, &error->flags));
>   
> -	if (i915_seqno_passed(intel_engine_get_seqno(engine),
> -			      intel_engine_last_submit(engine)))
> -		return 0;
> -
>   	reset_prepare_engine(engine);
>   
>   	if (msg)
> diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> index 8025c7e0bf6c..2c38ea5892d9 100644
> --- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> +++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> @@ -449,8 +449,6 @@ static int __igt_reset_engine(struct drm_i915_private *i915, bool active)
>   
>   		set_bit(I915_RESET_ENGINE + id, &i915->gpu_error.flags);
>   		do {
> -			u32 seqno = intel_engine_get_seqno(engine);
> -
>   			if (active) {
>   				struct i915_request *rq;
>   
> @@ -479,8 +477,6 @@ static int __igt_reset_engine(struct drm_i915_private *i915, bool active)
>   					break;
>   				}
>   
> -				GEM_BUG_ON(!rq->global_seqno);
> -				seqno = rq->global_seqno - 1;
AFAICT this saved seqno value was never used anyway? It only exists 
inside the loop, was only used in a pr_err earlier in the loop, and the 
start of the loop always (re-)initialises it. Or am I missing some 
hidden macro magic somewhere?


>   				i915_request_put(rq);
>   			}
>   
> @@ -496,11 +492,10 @@ static int __igt_reset_engine(struct drm_i915_private *i915, bool active)
>   				break;
>   			}
>   
> -			reset_engine_count += active;
>   			if (i915_reset_engine_count(&i915->gpu_error, engine) !=
> -			    reset_engine_count) {
> -				pr_err("%s engine reset %srecorded!\n",
> -				       engine->name, active ? "not " : "");
> +			    ++reset_engine_count) {
> +				pr_err("%s engine reset not recorded!\n",
> +				       engine->name);
>   				err = -EINVAL;
>   				break;
>   			}
> @@ -728,7 +723,6 @@ static int __igt_reset_engines(struct drm_i915_private *i915,
>   
>   		set_bit(I915_RESET_ENGINE + id, &i915->gpu_error.flags);
>   		do {
> -			u32 seqno = intel_engine_get_seqno(engine);
>   			struct i915_request *rq = NULL;
>   
>   			if (flags & TEST_ACTIVE) {
> @@ -756,9 +750,6 @@ static int __igt_reset_engines(struct drm_i915_private *i915,
>   					err = -EIO;
>   					break;
>   				}
> -
> -				GEM_BUG_ON(!rq->global_seqno);
> -				seqno = rq->global_seqno - 1;
>   			}
>   
>   			err = i915_reset_engine(engine, NULL);
> @@ -795,10 +786,9 @@ static int __igt_reset_engines(struct drm_i915_private *i915,
>   
>   		reported = i915_reset_engine_count(&i915->gpu_error, engine);
>   		reported -= threads[engine->id].resets;
> -		if (reported != (flags & TEST_ACTIVE ? count : 0)) {
> -			pr_err("i915_reset_engine(%s:%s): reset %lu times, but reported %lu, expected %lu reported\n",
> -			       engine->name, test_name, count, reported,
> -			       (flags & TEST_ACTIVE ? count : 0));
> +		if (reported != count) {
> +			pr_err("i915_reset_engine(%s:%s): reset %lu times, but reported %lu\n",
> +			       engine->name, test_name, count, reported);
>   			if (!err)
>   				err = -EINVAL;
>   		}

Reviewed-by: John Harrison <John.C.Harrison@Intel.com>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 12/34] drm/i915: Issue engine resets onto idle engines
  2019-01-23  1:18   ` John Harrison
@ 2019-01-23  1:31     ` Chris Wilson
  0 siblings, 0 replies; 89+ messages in thread
From: Chris Wilson @ 2019-01-23  1:31 UTC (permalink / raw)
  To: John Harrison, intel-gfx

Quoting John Harrison (2019-01-23 01:18:36)
> On 1/21/2019 14:20, Chris Wilson wrote:
> > @@ -479,8 +477,6 @@ static int __igt_reset_engine(struct drm_i915_private *i915, bool active)
> >                                       break;
> >                               }
> >   
> > -                             GEM_BUG_ON(!rq->global_seqno);
> > -                             seqno = rq->global_seqno - 1;
> AFAICT this saved seqno value was never used anyway? It only exists 
> inside the loop, was only used in a pr_err earlier in the loop, and the 
> start of the loop always (re-)initialises it. Or am I missing some 
> hidden macro magic somewhere?

I can't remember what I intended it to be in the first place. History
says that we had to set engine->hangcheck.seqno correctly to force a
per-engine reset. Lost in

commit bba0869b18e44ff2f713c98575ddad8c7c5e9b10
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Apr 6 23:03:53 2018 +0100

    drm/i915: Treat i915_reset_engine() as guilty until proven innocent

-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 20/34] drm/i915: Introduce concept of per-timeline (context) HWSP
  2019-01-21 22:21 ` [PATCH 20/34] drm/i915: Introduce concept of per-timeline (context) HWSP Chris Wilson
@ 2019-01-23  1:35   ` John Harrison
  0 siblings, 0 replies; 89+ messages in thread
From: John Harrison @ 2019-01-23  1:35 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 1/21/2019 14:21, Chris Wilson wrote:
> Supplement the per-engine HWSP with a per-timeline HWSP. That is a
> per-request pointer through which we can check a local seqno,
> abstracting away the presumption of a global seqno. In this first step,
> we point each request back into the engine's HWSP so everything
> continues to work with the global timeline.
>
> v2: s/i915_request_hwsp/hwsp_seqno/ to emphasis that this is the current
> HW value and that we are accessing it via i915_request merely as a
> convenience.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_request.c | 16 ++++++----
>   drivers/gpu/drm/i915/i915_request.h | 45 ++++++++++++++++++++++++-----
>   drivers/gpu/drm/i915/intel_lrc.c    |  9 ++++--
>   3 files changed, 55 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 2721a356368f..d61e86c6a1d1 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -182,10 +182,11 @@ static void free_capture_list(struct i915_request *request)
>   static void __retire_engine_request(struct intel_engine_cs *engine,
>   				    struct i915_request *rq)
>   {
> -	GEM_TRACE("%s(%s) fence %llx:%lld, global=%d, current %d\n",
> +	GEM_TRACE("%s(%s) fence %llx:%lld, global=%d, current %d:%d\n",
>   		  __func__, engine->name,
>   		  rq->fence.context, rq->fence.seqno,
>   		  rq->global_seqno,
> +		  hwsp_seqno(rq),
>   		  intel_engine_get_seqno(engine));
>   
>   	GEM_BUG_ON(!i915_request_completed(rq));
> @@ -244,10 +245,11 @@ static void i915_request_retire(struct i915_request *request)
>   {
>   	struct i915_gem_active *active, *next;
>   
> -	GEM_TRACE("%s fence %llx:%lld, global=%d, current %d\n",
> +	GEM_TRACE("%s fence %llx:%lld, global=%d, current %d:%d\n",
>   		  request->engine->name,
>   		  request->fence.context, request->fence.seqno,
>   		  request->global_seqno,
> +		  hwsp_seqno(request),
>   		  intel_engine_get_seqno(request->engine));
>   
>   	lockdep_assert_held(&request->i915->drm.struct_mutex);
> @@ -307,10 +309,11 @@ void i915_request_retire_upto(struct i915_request *rq)
>   	struct intel_ring *ring = rq->ring;
>   	struct i915_request *tmp;
>   
> -	GEM_TRACE("%s fence %llx:%lld, global=%d, current %d\n",
> +	GEM_TRACE("%s fence %llx:%lld, global=%d, current %d:%d\n",
>   		  rq->engine->name,
>   		  rq->fence.context, rq->fence.seqno,
>   		  rq->global_seqno,
> +		  hwsp_seqno(rq),
>   		  intel_engine_get_seqno(rq->engine));
>   
>   	lockdep_assert_held(&rq->i915->drm.struct_mutex);
> @@ -355,10 +358,11 @@ void __i915_request_submit(struct i915_request *request)
>   	struct intel_engine_cs *engine = request->engine;
>   	u32 seqno;
>   
> -	GEM_TRACE("%s fence %llx:%lld -> global=%d, current %d\n",
> +	GEM_TRACE("%s fence %llx:%lld -> global=%d, current %d:%d\n",
>   		  engine->name,
>   		  request->fence.context, request->fence.seqno,
>   		  engine->timeline.seqno + 1,
> +		  hwsp_seqno(request),
>   		  intel_engine_get_seqno(engine));
>   
>   	GEM_BUG_ON(!irqs_disabled());
> @@ -405,10 +409,11 @@ void __i915_request_unsubmit(struct i915_request *request)
>   {
>   	struct intel_engine_cs *engine = request->engine;
>   
> -	GEM_TRACE("%s fence %llx:%lld <- global=%d, current %d\n",
> +	GEM_TRACE("%s fence %llx:%lld <- global=%d, current %d:%d\n",
>   		  engine->name,
>   		  request->fence.context, request->fence.seqno,
>   		  request->global_seqno,
> +		  hwsp_seqno(request),
>   		  intel_engine_get_seqno(engine));
>   
>   	GEM_BUG_ON(!irqs_disabled());
> @@ -616,6 +621,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
>   	rq->ring = ce->ring;
>   	rq->timeline = ce->ring->timeline;
>   	GEM_BUG_ON(rq->timeline == &engine->timeline);
> +	rq->hwsp_seqno = &engine->status_page.addr[I915_GEM_HWS_INDEX];
>   
>   	spin_lock_init(&rq->lock);
>   	dma_fence_init(&rq->fence,
> diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
> index c0f084ca4f29..ade010fe6e26 100644
> --- a/drivers/gpu/drm/i915/i915_request.h
> +++ b/drivers/gpu/drm/i915/i915_request.h
> @@ -130,6 +130,13 @@ struct i915_request {
>   	struct i915_sched_node sched;
>   	struct i915_dependency dep;
>   
> +	/*
> +	 * A convenience pointer to the current breadcrumb value stored in
> +	 * the HW status page (or our timeline's local equivalent). The full
> +	 * path would be rq->hw_context->ring->timeline->hwsp_seqno.
> +	 */
> +	const u32 *hwsp_seqno;
> +
>   	/**
>   	 * GEM sequence number associated with this request on the
>   	 * global execution timeline. It is zero when the request is not
> @@ -285,11 +292,6 @@ static inline bool i915_request_signaled(const struct i915_request *rq)
>   	return test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &rq->fence.flags);
>   }
>   
> -static inline bool intel_engine_has_started(struct intel_engine_cs *engine,
> -					    u32 seqno);
> -static inline bool intel_engine_has_completed(struct intel_engine_cs *engine,
> -					      u32 seqno);
> -
>   /**
>    * Returns true if seq1 is later than seq2.
>    */
> @@ -298,6 +300,35 @@ static inline bool i915_seqno_passed(u32 seq1, u32 seq2)
>   	return (s32)(seq1 - seq2) >= 0;
>   }
>   
> +static inline u32 __hwsp_seqno(const struct i915_request *rq)
> +{
> +	return READ_ONCE(*rq->hwsp_seqno);
> +}
> +
> +/**
> + * hwsp_seqno - the current breadcrumb value in the HW status page
> + * @rq: the request, to chase the relevant HW status page
> + *
> + * The emphasis in naming here is that hwsp_seqno() is not a property of the
> + * request, but an indication of the current HW state (associated with this
> + * request). Its value will change as the GPU executes more requests.
> + *
> + * Returns the current breadcrumb value in the associated HW status page (or
> + * the local timeline's equivalent) for this request. The request itself
> + * has the associated breadcrumb value of rq->fence.seqno, when the HW
> + * status page has that breadcrumb or later, this request is complete.
> + */
> +static inline u32 hwsp_seqno(const struct i915_request *rq)
> +{
> +	u32 seqno;
> +
> +	rcu_read_lock(); /* the HWSP may be freed at runtime */
> +	seqno = __hwsp_seqno(rq);
> +	rcu_read_unlock();
> +
> +	return seqno;
> +}
> +

Much better name and good comments too :).

Reviewed-by: John Harrison <John.C.Harrison@Intel.com>

>   /**
>    * i915_request_started - check if the request has begun being executed
>    * @rq: the request
> @@ -315,14 +346,14 @@ static inline bool i915_request_started(const struct i915_request *rq)
>   	if (!seqno) /* not yet submitted to HW */
>   		return false;
>   
> -	return intel_engine_has_started(rq->engine, seqno);
> +	return i915_seqno_passed(hwsp_seqno(rq), seqno - 1);
>   }
>   
>   static inline bool
>   __i915_request_completed(const struct i915_request *rq, u32 seqno)
>   {
>   	GEM_BUG_ON(!seqno);
> -	return intel_engine_has_completed(rq->engine, seqno) &&
> +	return i915_seqno_passed(hwsp_seqno(rq), seqno) &&
>   		seqno == i915_request_global_seqno(rq);
>   }
>   
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 464dd309fa99..e45c4e29c435 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -470,11 +470,12 @@ static void execlists_submit_ports(struct intel_engine_cs *engine)
>   			desc = execlists_update_context(rq);
>   			GEM_DEBUG_EXEC(port[n].context_id = upper_32_bits(desc));
>   
> -			GEM_TRACE("%s in[%d]:  ctx=%d.%d, global=%d (fence %llx:%lld) (current %d), prio=%d\n",
> +			GEM_TRACE("%s in[%d]:  ctx=%d.%d, global=%d (fence %llx:%lld) (current %d:%d), prio=%d\n",
>   				  engine->name, n,
>   				  port[n].context_id, count,
>   				  rq->global_seqno,
>   				  rq->fence.context, rq->fence.seqno,
> +				  hwsp_seqno(rq),
>   				  intel_engine_get_seqno(engine),
>   				  rq_prio(rq));
>   		} else {
> @@ -767,11 +768,12 @@ execlists_cancel_port_requests(struct intel_engine_execlists * const execlists)
>   	while (num_ports-- && port_isset(port)) {
>   		struct i915_request *rq = port_request(port);
>   
> -		GEM_TRACE("%s:port%u global=%d (fence %llx:%lld), (current %d)\n",
> +		GEM_TRACE("%s:port%u global=%d (fence %llx:%lld), (current %d:%d)\n",
>   			  rq->engine->name,
>   			  (unsigned int)(port - execlists->port),
>   			  rq->global_seqno,
>   			  rq->fence.context, rq->fence.seqno,
> +			  hwsp_seqno(rq),
>   			  intel_engine_get_seqno(rq->engine));
>   
>   		GEM_BUG_ON(!execlists->active);
> @@ -997,12 +999,13 @@ static void process_csb(struct intel_engine_cs *engine)
>   						EXECLISTS_ACTIVE_USER));
>   
>   		rq = port_unpack(port, &count);
> -		GEM_TRACE("%s out[0]: ctx=%d.%d, global=%d (fence %llx:%lld) (current %d), prio=%d\n",
> +		GEM_TRACE("%s out[0]: ctx=%d.%d, global=%d (fence %llx:%lld) (current %d:%d), prio=%d\n",
>   			  engine->name,
>   			  port->context_id, count,
>   			  rq ? rq->global_seqno : 0,
>   			  rq ? rq->fence.context : 0,
>   			  rq ? rq->fence.seqno : 0,
> +			  rq ? hwsp_seqno(rq) : 0,
>   			  intel_engine_get_seqno(engine),
>   			  rq ? rq_prio(rq) : 0);
>   

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 08/34] drm/i915: Make all GPU resets atomic
  2019-01-22 22:19   ` John Harrison
  2019-01-22 22:27     ` Chris Wilson
@ 2019-01-23  8:52     ` Mika Kuoppala
  1 sibling, 0 replies; 89+ messages in thread
From: Mika Kuoppala @ 2019-01-23  8:52 UTC (permalink / raw)
  To: John Harrison, Chris Wilson, intel-gfx

John Harrison <John.C.Harrison@Intel.com> writes:

> On 1/21/2019 14:20, Chris Wilson wrote:
>> In preparation for the next few commits, make resetting the GPU atomic.
>> Currently, we have prepared gen6+ for atomic resetting of individual
>> engines, but now there is a requirement to perform the whole device
>> level reset (just the register poking) from inside an atomic context.
>>
>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_reset.c | 50 +++++++++++++++++--------------
>>   1 file changed, 27 insertions(+), 23 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
>> index 342d9ee42601..b9d0ea70361c 100644
>> --- a/drivers/gpu/drm/i915/i915_reset.c
>> +++ b/drivers/gpu/drm/i915/i915_reset.c
>> @@ -144,14 +144,14 @@ static int i915_do_reset(struct drm_i915_private *i915,
>>   
>>   	/* Assert reset for at least 20 usec, and wait for acknowledgement. */
>>   	pci_write_config_byte(pdev, I915_GDRST, GRDOM_RESET_ENABLE);
>> -	usleep_range(50, 200);
>> -	err = wait_for(i915_in_reset(pdev), 500);
>> +	udelay(50);
>> +	err = wait_for_atomic(i915_in_reset(pdev), 50);
> Is it known to be safe to reduce all of these timeout values? Where did 
> the original 500ms value come from? Is there any chance of getting 
> sporadic failures because 50ms is borderline in the worst case scenario? 
> It still sounds huge but an order of magnitude change in a timeout 
> always seems worrying!
>
>>   
>>   	/* Clear the reset request. */
>>   	pci_write_config_byte(pdev, I915_GDRST, 0);
>> -	usleep_range(50, 200);
>> +	udelay(50);
>>   	if (!err)
>> -		err = wait_for(!i915_in_reset(pdev), 500);
>> +		err = wait_for_atomic(!i915_in_reset(pdev), 50);
>>   
>>   	return err;
>>   }
>> @@ -171,7 +171,7 @@ static int g33_do_reset(struct drm_i915_private *i915,
>>   	struct pci_dev *pdev = i915->drm.pdev;
>>   
>>   	pci_write_config_byte(pdev, I915_GDRST, GRDOM_RESET_ENABLE);
>> -	return wait_for(g4x_reset_complete(pdev), 500);
>> +	return wait_for_atomic(g4x_reset_complete(pdev), 50);
>>   }
>>   
>>   static int g4x_do_reset(struct drm_i915_private *dev_priv,
>> @@ -182,13 +182,13 @@ static int g4x_do_reset(struct drm_i915_private *dev_priv,
>>   	int ret;
>>   
>>   	/* WaVcpClkGateDisableForMediaReset:ctg,elk */
>> -	I915_WRITE(VDECCLK_GATE_D,
>> -		   I915_READ(VDECCLK_GATE_D) | VCP_UNIT_CLOCK_GATE_DISABLE);
>> -	POSTING_READ(VDECCLK_GATE_D);
>> +	I915_WRITE_FW(VDECCLK_GATE_D,
>> +		      I915_READ(VDECCLK_GATE_D) | VCP_UNIT_CLOCK_GATE_DISABLE);
>> +	POSTING_READ_FW(VDECCLK_GATE_D);
>>   
>>   	pci_write_config_byte(pdev, I915_GDRST,
>>   			      GRDOM_MEDIA | GRDOM_RESET_ENABLE);
>> -	ret =  wait_for(g4x_reset_complete(pdev), 500);
>> +	ret =  wait_for_atomic(g4x_reset_complete(pdev), 50);
>>   	if (ret) {
>>   		DRM_DEBUG_DRIVER("Wait for media reset failed\n");
>>   		goto out;
>> @@ -196,7 +196,7 @@ static int g4x_do_reset(struct drm_i915_private *dev_priv,
>>   
>>   	pci_write_config_byte(pdev, I915_GDRST,
>>   			      GRDOM_RENDER | GRDOM_RESET_ENABLE);
>> -	ret =  wait_for(g4x_reset_complete(pdev), 500);
>> +	ret =  wait_for_atomic(g4x_reset_complete(pdev), 50);
>>   	if (ret) {
>>   		DRM_DEBUG_DRIVER("Wait for render reset failed\n");
>>   		goto out;
>> @@ -205,9 +205,9 @@ static int g4x_do_reset(struct drm_i915_private *dev_priv,
>>   out:
>>   	pci_write_config_byte(pdev, I915_GDRST, 0);
>>   
>> -	I915_WRITE(VDECCLK_GATE_D,
>> -		   I915_READ(VDECCLK_GATE_D) & ~VCP_UNIT_CLOCK_GATE_DISABLE);
>> -	POSTING_READ(VDECCLK_GATE_D);
>> +	I915_WRITE_FW(VDECCLK_GATE_D,
>> +		      I915_READ(VDECCLK_GATE_D) & ~VCP_UNIT_CLOCK_GATE_DISABLE);
>> +	POSTING_READ_FW(VDECCLK_GATE_D);
>>   
>>   	return ret;
>>   }
>> @@ -218,27 +218,29 @@ static int ironlake_do_reset(struct drm_i915_private *dev_priv,
>>   {
>>   	int ret;
>>   
>> -	I915_WRITE(ILK_GDSR, ILK_GRDOM_RENDER | ILK_GRDOM_RESET_ENABLE);
>> -	ret = intel_wait_for_register(dev_priv,
>> -				      ILK_GDSR, ILK_GRDOM_RESET_ENABLE, 0,
>> -				      500);
>> +	I915_WRITE_FW(ILK_GDSR, ILK_GRDOM_RENDER | ILK_GRDOM_RESET_ENABLE);
>> +	ret = __intel_wait_for_register_fw(dev_priv, ILK_GDSR,
>> +					   ILK_GRDOM_RESET_ENABLE, 0,
>> +					   5000, 0,
>> +					   NULL);
> These two timeouts are now two orders of magnitude smaller? It was 500ms 
> but is now 5000us (=5ms)?

Agreed. I indirecty raised same concern on previous round of
review by saying that it would be nice if we had some statistics
from CI.

The original ballooning of these numbers, from the little
that is available in the documentation, comes down to the fact that
previously it didn't do much harm to pick a large number
to be on the safe side, so why not.

Now, it is a different game.
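
To spell out why: wait_for() can sleep between polls, so a generous
500ms upper bound cost nothing when the condition came true quickly.
wait_for_atomic() has to busy-spin, and the reset path now sits under
preempt_disable(), so the timeout becomes a worst-case bound on how
long we spin with preemption off. A minimal sketch of that kind of
non-sleeping poll (assumed semantics for illustration, not the actual
i915 macro):

	static int wait_for_atomic_sketch(bool (*cond)(void *), void *data,
					  unsigned int timeout_ms)
	{
		ktime_t end = ktime_add_ms(ktime_get_raw(), timeout_ms);

		for (;;) {
			if (cond(data))
				return 0;	/* condition met */
			if (ktime_after(ktime_get_raw(), end))
				return -ETIMEDOUT;
			cpu_relax();	/* no sleeping allowed here */
		}
	}

which is why the new, much smaller numbers need to be argued against
the worst case rather than just the typical one.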

-Mika

>
> John.
>
>
>>   	if (ret) {
>>   		DRM_DEBUG_DRIVER("Wait for render reset failed\n");
>>   		goto out;
>>   	}
>>   
>> -	I915_WRITE(ILK_GDSR, ILK_GRDOM_MEDIA | ILK_GRDOM_RESET_ENABLE);
>> -	ret = intel_wait_for_register(dev_priv,
>> -				      ILK_GDSR, ILK_GRDOM_RESET_ENABLE, 0,
>> -				      500);
>> +	I915_WRITE_FW(ILK_GDSR, ILK_GRDOM_MEDIA | ILK_GRDOM_RESET_ENABLE);
>> +	ret = __intel_wait_for_register_fw(dev_priv, ILK_GDSR,
>> +					   ILK_GRDOM_RESET_ENABLE, 0,
>> +					   5000, 0,
>> +					   NULL);
>>   	if (ret) {
>>   		DRM_DEBUG_DRIVER("Wait for media reset failed\n");
>>   		goto out;
>>   	}
>>   
>>   out:
>> -	I915_WRITE(ILK_GDSR, 0);
>> -	POSTING_READ(ILK_GDSR);
>> +	I915_WRITE_FW(ILK_GDSR, 0);
>> +	POSTING_READ_FW(ILK_GDSR);
>>   	return ret;
>>   }
>>   
>> @@ -572,7 +574,9 @@ int intel_gpu_reset(struct drm_i915_private *i915, unsigned int engine_mask)
>>   		ret = -ENODEV;
>>   		if (reset) {
>>   			GEM_TRACE("engine_mask=%x\n", engine_mask);
>> +			preempt_disable();
>>   			ret = reset(i915, engine_mask, retry);
>> +			preempt_enable();
>>   		}
>>   		if (ret != -ETIMEDOUT || engine_mask != ALL_ENGINES)
>>   			break;
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 28/34] drm/i915: Replace global breadcrumbs with per-context interrupt tracking
  2019-01-21 22:21 ` [PATCH 28/34] drm/i915: Replace global breadcrumbs with per-context interrupt tracking Chris Wilson
@ 2019-01-23  9:21   ` Tvrtko Ursulin
  2019-01-23 10:01     ` Chris Wilson
  2019-01-23 11:41   ` [PATCH] " Chris Wilson
  1 sibling, 1 reply; 89+ messages in thread
From: Tvrtko Ursulin @ 2019-01-23  9:21 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 21/01/2019 22:21, Chris Wilson wrote:
> A few years ago, see commit 688e6c725816 ("drm/i915: Slaughter the
> thundering i915_wait_request herd"), the issue of handling multiple
> clients waiting in parallel was brought to our attention. The
> requirement was that every client should be woken immediately upon its
> request being signaled, without incurring any cpu overhead.
> 
> Handling certain fragility of our hw meant that we could not do a
> simple check inside the irq handler (some generations required almost
> unbounded delays before we could be sure of seqno coherency) and so
> request completion checking required delegation.
> 
> Before commit 688e6c725816, the solution was simple. Every client waking
> on a request would be woken on every interrupt and each would do a
> heavyweight check to see if their request was complete. Commit
> 688e6c725816 introduced an rbtree so that only the earliest waiter on
> the global timeline would be woken, and would wake the next and so on.
> (Along with various complications to handle requests being reordered
> along the global timeline, and also a requirement for kthread to provide
> a delegate for fence signaling that had no process context.)
> 
> The global rbtree depends on knowing the execution timeline (and global
> seqno). Without knowing that order, we must instead check all contexts
> queued to the HW to see which may have advanced. We trim that list by
> only checking queued contexts that are being waited on, but still we
> keep a list of all active contexts and their active signalers that we
> inspect from inside the irq handler. By moving the waiters onto the fence
> signal list, we can combine the client wakeup with the dma_fence
> signaling (a dramatic reduction in complexity, but it does require the HW
> to be coherent: the seqno must be visible from the cpu before the
> interrupt is raised - we keep a timer backup just in case).
> 
> Having previously fixed all the issues with irq-seqno serialisation (by
> inserting delays onto the GPU after each request instead of random delays
> on the CPU after each interrupt), we can rely on the seqno state to
> perform direct wakeups from the interrupt handler. This allows us to
> preserve our single context switch behaviour of the current routine,
> with the only downside that we lose the RT priority sorting of wakeups.
> In general, direct wakeup latency of multiple clients is about the same
> (about 10% better in most cases) with a reduction in total CPU time spent
> in the waiter (about 20-50% depending on gen). Average herd behaviour is
> improved, but at the cost of not delegating wakeups on task_prio.
> 
> References: 688e6c725816 ("drm/i915: Slaughter the thundering i915_wait_request herd")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_debugfs.c           |  28 +-
>   drivers/gpu/drm/i915/i915_gem_context.h       |   5 +
>   drivers/gpu/drm/i915/i915_gpu_error.c         |  73 --
>   drivers/gpu/drm/i915/i915_gpu_error.h         |   8 -
>   drivers/gpu/drm/i915/i915_irq.c               |  87 +-
>   drivers/gpu/drm/i915/i915_request.c           | 128 +--
>   drivers/gpu/drm/i915/i915_request.h           |  22 +-
>   drivers/gpu/drm/i915/i915_reset.c             |  13 +-
>   drivers/gpu/drm/i915/intel_breadcrumbs.c      | 797 +++++-------------
>   drivers/gpu/drm/i915/intel_engine_cs.c        |  34 +-
>   drivers/gpu/drm/i915/intel_ringbuffer.c       |   6 +-
>   drivers/gpu/drm/i915/intel_ringbuffer.h       |  95 +--
>   .../drm/i915/selftests/i915_mock_selftests.h  |   1 -
>   drivers/gpu/drm/i915/selftests/i915_request.c | 398 +++++++++
>   drivers/gpu/drm/i915/selftests/igt_spinner.c  |   5 -
>   .../drm/i915/selftests/intel_breadcrumbs.c    | 470 -----------
>   .../gpu/drm/i915/selftests/intel_hangcheck.c  |   2 +-
>   drivers/gpu/drm/i915/selftests/lib_sw_fence.c |  54 ++
>   drivers/gpu/drm/i915/selftests/lib_sw_fence.h |   3 +
>   drivers/gpu/drm/i915/selftests/mock_engine.c  |  16 +-
>   drivers/gpu/drm/i915/selftests/mock_engine.h  |   6 -
>   21 files changed, 774 insertions(+), 1477 deletions(-)
>   delete mode 100644 drivers/gpu/drm/i915/selftests/intel_breadcrumbs.c
> 
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 2a6e4044f25b..d7764e62e9b4 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -1315,29 +1315,16 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
>   	seq_printf(m, "GT active? %s\n", yesno(dev_priv->gt.awake));
>   
>   	for_each_engine(engine, dev_priv, id) {
> -		struct intel_breadcrumbs *b = &engine->breadcrumbs;
> -		struct rb_node *rb;
> -
>   		seq_printf(m, "%s:\n", engine->name);
>   		seq_printf(m, "\tseqno = %x [current %x, last %x], %dms ago\n",
>   			   engine->hangcheck.seqno, seqno[id],
>   			   intel_engine_last_submit(engine),
>   			   jiffies_to_msecs(jiffies -
>   					    engine->hangcheck.action_timestamp));
> -		seq_printf(m, "\twaiters? %s, fake irq active? %s\n",
> -			   yesno(intel_engine_has_waiter(engine)),
> +		seq_printf(m, "\tfake irq active? %s\n",
>   			   yesno(test_bit(engine->id,
>   					  &dev_priv->gpu_error.missed_irq_rings)));
>   
> -		spin_lock_irq(&b->rb_lock);
> -		for (rb = rb_first(&b->waiters); rb; rb = rb_next(rb)) {
> -			struct intel_wait *w = rb_entry(rb, typeof(*w), node);
> -
> -			seq_printf(m, "\t%s [%d] waiting for %x\n",
> -				   w->tsk->comm, w->tsk->pid, w->seqno);
> -		}
> -		spin_unlock_irq(&b->rb_lock);
> -
>   		seq_printf(m, "\tACTHD = 0x%08llx [current 0x%08llx]\n",
>   			   (long long)engine->hangcheck.acthd,
>   			   (long long)acthd[id]);
> @@ -2021,18 +2008,6 @@ static int i915_swizzle_info(struct seq_file *m, void *data)
>   	return 0;
>   }
>   
> -static int count_irq_waiters(struct drm_i915_private *i915)
> -{
> -	struct intel_engine_cs *engine;
> -	enum intel_engine_id id;
> -	int count = 0;
> -
> -	for_each_engine(engine, i915, id)
> -		count += intel_engine_has_waiter(engine);
> -
> -	return count;
> -}
> -
>   static const char *rps_power_to_str(unsigned int power)
>   {
>   	static const char * const strings[] = {
> @@ -2072,7 +2047,6 @@ static int i915_rps_boost_info(struct seq_file *m, void *data)
>   	seq_printf(m, "RPS enabled? %d\n", rps->enabled);
>   	seq_printf(m, "GPU busy? %s [%d requests]\n",
>   		   yesno(dev_priv->gt.awake), dev_priv->gt.active_requests);
> -	seq_printf(m, "CPU waiting? %d\n", count_irq_waiters(dev_priv));
>   	seq_printf(m, "Boosts outstanding? %d\n",
>   		   atomic_read(&rps->num_waiters));
>   	seq_printf(m, "Interactive? %d\n", READ_ONCE(rps->power.interactive));
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
> index 47d82ce7ba6a..9e11b31acd01 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.h
> +++ b/drivers/gpu/drm/i915/i915_gem_context.h
> @@ -164,6 +164,8 @@ struct i915_gem_context {
>   	struct intel_context {
>   		struct i915_gem_context *gem_context;
>   		struct intel_engine_cs *active;
> +		struct list_head signal_link;
> +		struct list_head signals;
>   		struct i915_vma *state;
>   		struct intel_ring *ring;
>   		u32 *lrc_reg_state;
> @@ -370,6 +372,9 @@ intel_context_init(struct intel_context *ce,
>   		   struct intel_engine_cs *engine)
>   {
>   	ce->gem_context = ctx;
> +
> +	INIT_LIST_HEAD(&ce->signal_link);
> +	INIT_LIST_HEAD(&ce->signals);
>   }
>   
>   #endif /* !__I915_GEM_CONTEXT_H__ */
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index 96d1d634a29d..825572127029 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -530,7 +530,6 @@ static void error_print_engine(struct drm_i915_error_state_buf *m,
>   	}
>   	err_printf(m, "  seqno: 0x%08x\n", ee->seqno);
>   	err_printf(m, "  last_seqno: 0x%08x\n", ee->last_seqno);
> -	err_printf(m, "  waiting: %s\n", yesno(ee->waiting));
>   	err_printf(m, "  ring->head: 0x%08x\n", ee->cpu_ring_head);
>   	err_printf(m, "  ring->tail: 0x%08x\n", ee->cpu_ring_tail);
>   	err_printf(m, "  hangcheck timestamp: %dms (%lu%s)\n",
> @@ -804,21 +803,6 @@ static void __err_print_to_sgl(struct drm_i915_error_state_buf *m,
>   						    error->epoch);
>   		}
>   
> -		if (IS_ERR(ee->waiters)) {
> -			err_printf(m, "%s --- ? waiters [unable to acquire spinlock]\n",
> -				   m->i915->engine[i]->name);
> -		} else if (ee->num_waiters) {
> -			err_printf(m, "%s --- %d waiters\n",
> -				   m->i915->engine[i]->name,
> -				   ee->num_waiters);
> -			for (j = 0; j < ee->num_waiters; j++) {
> -				err_printf(m, " seqno 0x%08x for %s [%d]\n",
> -					   ee->waiters[j].seqno,
> -					   ee->waiters[j].comm,
> -					   ee->waiters[j].pid);
> -			}
> -		}
> -
>   		print_error_obj(m, m->i915->engine[i],
>   				"ringbuffer", ee->ringbuffer);
>   
> @@ -1000,8 +984,6 @@ void __i915_gpu_state_free(struct kref *error_ref)
>   		i915_error_object_free(ee->wa_ctx);
>   
>   		kfree(ee->requests);
> -		if (!IS_ERR_OR_NULL(ee->waiters))
> -			kfree(ee->waiters);
>   	}
>   
>   	for (i = 0; i < ARRAY_SIZE(error->active_bo); i++)
> @@ -1203,59 +1185,6 @@ static void gen6_record_semaphore_state(struct intel_engine_cs *engine,
>   			I915_READ(RING_SYNC_2(engine->mmio_base));
>   }
>   
> -static void error_record_engine_waiters(struct intel_engine_cs *engine,
> -					struct drm_i915_error_engine *ee)
> -{
> -	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> -	struct drm_i915_error_waiter *waiter;
> -	struct rb_node *rb;
> -	int count;
> -
> -	ee->num_waiters = 0;
> -	ee->waiters = NULL;
> -
> -	if (RB_EMPTY_ROOT(&b->waiters))
> -		return;
> -
> -	if (!spin_trylock_irq(&b->rb_lock)) {
> -		ee->waiters = ERR_PTR(-EDEADLK);
> -		return;
> -	}
> -
> -	count = 0;
> -	for (rb = rb_first(&b->waiters); rb != NULL; rb = rb_next(rb))
> -		count++;
> -	spin_unlock_irq(&b->rb_lock);
> -
> -	waiter = NULL;
> -	if (count)
> -		waiter = kmalloc_array(count,
> -				       sizeof(struct drm_i915_error_waiter),
> -				       GFP_ATOMIC);
> -	if (!waiter)
> -		return;
> -
> -	if (!spin_trylock_irq(&b->rb_lock)) {
> -		kfree(waiter);
> -		ee->waiters = ERR_PTR(-EDEADLK);
> -		return;
> -	}
> -
> -	ee->waiters = waiter;
> -	for (rb = rb_first(&b->waiters); rb; rb = rb_next(rb)) {
> -		struct intel_wait *w = rb_entry(rb, typeof(*w), node);
> -
> -		strcpy(waiter->comm, w->tsk->comm);
> -		waiter->pid = w->tsk->pid;
> -		waiter->seqno = w->seqno;
> -		waiter++;
> -
> -		if (++ee->num_waiters == count)
> -			break;
> -	}
> -	spin_unlock_irq(&b->rb_lock);
> -}

Capturing context waiters is not interesting for error state?

> -
>   static void error_record_engine_registers(struct i915_gpu_state *error,
>   					  struct intel_engine_cs *engine,
>   					  struct drm_i915_error_engine *ee)
> @@ -1291,7 +1220,6 @@ static void error_record_engine_registers(struct i915_gpu_state *error,
>   
>   	intel_engine_get_instdone(engine, &ee->instdone);
>   
> -	ee->waiting = intel_engine_has_waiter(engine);
>   	ee->instpm = I915_READ(RING_INSTPM(engine->mmio_base));
>   	ee->acthd = intel_engine_get_active_head(engine);
>   	ee->seqno = intel_engine_get_seqno(engine);
> @@ -1540,7 +1468,6 @@ static void gem_record_rings(struct i915_gpu_state *error)
>   		ee->engine_id = i;
>   
>   		error_record_engine_registers(error, engine, ee);
> -		error_record_engine_waiters(engine, ee);
>   		error_record_engine_execlists(engine, ee);
>   
>   		request = i915_gem_find_active_request(engine);
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.h b/drivers/gpu/drm/i915/i915_gpu_error.h
> index 231173786eae..0e184712cbcc 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.h
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.h
> @@ -82,8 +82,6 @@ struct i915_gpu_state {
>   		int engine_id;
>   		/* Software tracked state */
>   		bool idle;
> -		bool waiting;
> -		int num_waiters;
>   		unsigned long hangcheck_timestamp;
>   		struct i915_address_space *vm;
>   		int num_requests;
> @@ -159,12 +157,6 @@ struct i915_gpu_state {
>   		} *requests, execlist[EXECLIST_MAX_PORTS];
>   		unsigned int num_ports;
>   
> -		struct drm_i915_error_waiter {
> -			char comm[TASK_COMM_LEN];
> -			pid_t pid;
> -			u32 seqno;
> -		} *waiters;
> -
>   		struct {
>   			u32 gfx_mode;
>   			union {
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 71d11dc2c235..7669b1caeef0 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -28,9 +28,10 @@
>   
>   #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>   
> -#include <linux/sysrq.h>
> -#include <linux/slab.h>
>   #include <linux/circ_buf.h>
> +#include <linux/slab.h>
> +#include <linux/sysrq.h>
> +
>   #include <drm/i915_drm.h>
>   #include "i915_drv.h"
>   #include "i915_trace.h"
> @@ -1151,66 +1152,6 @@ static void ironlake_rps_change_irq_handler(struct drm_i915_private *dev_priv)
>   	return;
>   }
>   
> -static void notify_ring(struct intel_engine_cs *engine)
> -{
> -	const u32 seqno = intel_engine_get_seqno(engine);
> -	struct i915_request *rq = NULL;
> -	struct task_struct *tsk = NULL;
> -	struct intel_wait *wait;
> -
> -	if (unlikely(!engine->breadcrumbs.irq_armed))
> -		return;
> -
> -	rcu_read_lock();
> -
> -	spin_lock(&engine->breadcrumbs.irq_lock);
> -	wait = engine->breadcrumbs.irq_wait;
> -	if (wait) {
> -		/*
> -		 * We use a callback from the dma-fence to submit
> -		 * requests after waiting on our own requests. To
> -		 * ensure minimum delay in queuing the next request to
> -		 * hardware, signal the fence now rather than wait for
> -		 * the signaler to be woken up. We still wake up the
> -		 * waiter in order to handle the irq-seqno coherency
> -		 * issues (we may receive the interrupt before the
> -		 * seqno is written, see __i915_request_irq_complete())
> -		 * and to handle coalescing of multiple seqno updates
> -		 * and many waiters.
> -		 */
> -		if (i915_seqno_passed(seqno, wait->seqno)) {
> -			struct i915_request *waiter = wait->request;
> -
> -			if (waiter &&
> -			    !i915_request_signaled(waiter) &&
> -			    intel_wait_check_request(wait, waiter))
> -				rq = i915_request_get(waiter);
> -
> -			tsk = wait->tsk;
> -		}
> -
> -		engine->breadcrumbs.irq_count++;
> -	} else {
> -		if (engine->breadcrumbs.irq_armed)
> -			__intel_engine_disarm_breadcrumbs(engine);
> -	}
> -	spin_unlock(&engine->breadcrumbs.irq_lock);
> -
> -	if (rq) {
> -		spin_lock(&rq->lock);
> -		dma_fence_signal_locked(&rq->fence);
> -		GEM_BUG_ON(!i915_request_completed(rq));
> -		spin_unlock(&rq->lock);
> -
> -		i915_request_put(rq);
> -	}
> -
> -	if (tsk && tsk->state & TASK_NORMAL)
> -		wake_up_process(tsk);
> -
> -	rcu_read_unlock();
> -}
> -
>   static void vlv_c0_read(struct drm_i915_private *dev_priv,
>   			struct intel_rps_ei *ei)
>   {
> @@ -1455,20 +1396,20 @@ static void ilk_gt_irq_handler(struct drm_i915_private *dev_priv,
>   			       u32 gt_iir)
>   {
>   	if (gt_iir & GT_RENDER_USER_INTERRUPT)
> -		notify_ring(dev_priv->engine[RCS]);
> +		intel_engine_breadcrumbs_irq(dev_priv->engine[RCS]);
>   	if (gt_iir & ILK_BSD_USER_INTERRUPT)
> -		notify_ring(dev_priv->engine[VCS]);
> +		intel_engine_breadcrumbs_irq(dev_priv->engine[VCS]);
>   }
>   
>   static void snb_gt_irq_handler(struct drm_i915_private *dev_priv,
>   			       u32 gt_iir)
>   {
>   	if (gt_iir & GT_RENDER_USER_INTERRUPT)
> -		notify_ring(dev_priv->engine[RCS]);
> +		intel_engine_breadcrumbs_irq(dev_priv->engine[RCS]);
>   	if (gt_iir & GT_BSD_USER_INTERRUPT)
> -		notify_ring(dev_priv->engine[VCS]);
> +		intel_engine_breadcrumbs_irq(dev_priv->engine[VCS]);
>   	if (gt_iir & GT_BLT_USER_INTERRUPT)
> -		notify_ring(dev_priv->engine[BCS]);
> +		intel_engine_breadcrumbs_irq(dev_priv->engine[BCS]);
>   
>   	if (gt_iir & (GT_BLT_CS_ERROR_INTERRUPT |
>   		      GT_BSD_CS_ERROR_INTERRUPT |
> @@ -1488,7 +1429,7 @@ gen8_cs_irq_handler(struct intel_engine_cs *engine, u32 iir)
>   		tasklet = true;
>   
>   	if (iir & GT_RENDER_USER_INTERRUPT) {
> -		notify_ring(engine);
> +		intel_engine_breadcrumbs_irq(engine);
>   		tasklet |= USES_GUC_SUBMISSION(engine->i915);
>   	}
>   
> @@ -1834,7 +1775,7 @@ static void gen6_rps_irq_handler(struct drm_i915_private *dev_priv, u32 pm_iir)
>   
>   	if (HAS_VEBOX(dev_priv)) {
>   		if (pm_iir & PM_VEBOX_USER_INTERRUPT)
> -			notify_ring(dev_priv->engine[VECS]);
> +			intel_engine_breadcrumbs_irq(dev_priv->engine[VECS]);
>   
>   		if (pm_iir & PM_VEBOX_CS_ERROR_INTERRUPT)
>   			DRM_DEBUG("Command parser error, pm_iir 0x%08x\n", pm_iir);
> @@ -4257,7 +4198,7 @@ static irqreturn_t i8xx_irq_handler(int irq, void *arg)
>   		I915_WRITE16(IIR, iir);
>   
>   		if (iir & I915_USER_INTERRUPT)
> -			notify_ring(dev_priv->engine[RCS]);
> +			intel_engine_breadcrumbs_irq(dev_priv->engine[RCS]);
>   
>   		if (iir & I915_MASTER_ERROR_INTERRUPT)
>   			i8xx_error_irq_handler(dev_priv, eir, eir_stuck);
> @@ -4365,7 +4306,7 @@ static irqreturn_t i915_irq_handler(int irq, void *arg)
>   		I915_WRITE(IIR, iir);
>   
>   		if (iir & I915_USER_INTERRUPT)
> -			notify_ring(dev_priv->engine[RCS]);
> +			intel_engine_breadcrumbs_irq(dev_priv->engine[RCS]);
>   
>   		if (iir & I915_MASTER_ERROR_INTERRUPT)
>   			i9xx_error_irq_handler(dev_priv, eir, eir_stuck);
> @@ -4510,10 +4451,10 @@ static irqreturn_t i965_irq_handler(int irq, void *arg)
>   		I915_WRITE(IIR, iir);
>   
>   		if (iir & I915_USER_INTERRUPT)
> -			notify_ring(dev_priv->engine[RCS]);
> +			intel_engine_breadcrumbs_irq(dev_priv->engine[RCS]);
>   
>   		if (iir & I915_BSD_USER_INTERRUPT)
> -			notify_ring(dev_priv->engine[VCS]);
> +			intel_engine_breadcrumbs_irq(dev_priv->engine[VCS]);
>   
>   		if (iir & I915_MASTER_ERROR_INTERRUPT)
>   			i9xx_error_irq_handler(dev_priv, eir, eir_stuck);
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 0a8a2a1bf55d..cca437ac8a7e 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -60,7 +60,7 @@ static bool i915_fence_signaled(struct dma_fence *fence)
>   
>   static bool i915_fence_enable_signaling(struct dma_fence *fence)
>   {
> -	return intel_engine_enable_signaling(to_request(fence), true);
> +	return intel_engine_enable_signaling(to_request(fence));
>   }
>   
>   static signed long i915_fence_wait(struct dma_fence *fence,
> @@ -378,9 +378,11 @@ void __i915_request_submit(struct i915_request *request)
>   
>   	/* We may be recursing from the signal callback of another i915 fence */
>   	spin_lock_nested(&request->lock, SINGLE_DEPTH_NESTING);
> +	set_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags);
>   	request->global_seqno = seqno;
> -	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &request->fence.flags))
> -		intel_engine_enable_signaling(request, false);
> +	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &request->fence.flags) &&
> +	    !intel_engine_enable_signaling(request))
> +		intel_engine_queue_breadcrumbs(engine);

Why queue it manually at this point? Could warrant a comment.
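
If I read intel_engine_enable_signaling() right, it returns false when
the request has already completed by the time we try to arm signaling,
in which case no user interrupt is coming for it. Something along these
lines as the comment, perhaps (wording is only a suggestion):

	/*
	 * If the request completed before we could add it to the
	 * signal list, no interrupt will fire for it; kick the
	 * breadcrumbs irq_work ourselves so the fence is signaled
	 * without waiting for the next user interrupt.
	 */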

>   	spin_unlock(&request->lock);
>   
>   	engine->emit_fini_breadcrumb(request,
> @@ -390,8 +392,6 @@ void __i915_request_submit(struct i915_request *request)
>   	move_to_timeline(request, &engine->timeline);
>   
>   	trace_i915_request_execute(request);
> -
> -	wake_up_all(&request->execute);
>   }
>   
>   void i915_request_submit(struct i915_request *request)
> @@ -435,6 +435,7 @@ void __i915_request_unsubmit(struct i915_request *request)
>   	request->global_seqno = 0;
>   	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &request->fence.flags))
>   		intel_engine_cancel_signaling(request);
> +	clear_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags);
>   	spin_unlock(&request->lock);
>   
>   	/* Transfer back from the global per-engine timeline to per-context */
> @@ -634,13 +635,11 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
>   
>   	/* We bump the ref for the fence chain */
>   	i915_sw_fence_init(&i915_request_get(rq)->submit, submit_notify);
> -	init_waitqueue_head(&rq->execute);
>   
>   	i915_sched_node_init(&rq->sched);
>   
>   	/* No zalloc, must clear what we need by hand */
>   	rq->global_seqno = 0;
> -	rq->signaling.wait.seqno = 0;
>   	rq->file_priv = NULL;
>   	rq->batch = NULL;
>   	rq->capture_list = NULL;
> @@ -1031,13 +1030,10 @@ static bool busywait_stop(unsigned long timeout, unsigned int cpu)
>   	return this_cpu != cpu;
>   }
>   
> -static bool __i915_spin_request(const struct i915_request *rq,
> -				u32 seqno, int state, unsigned long timeout_us)
> +static bool __i915_spin_request(const struct i915_request * const rq,
> +				int state, unsigned long timeout_us)
>   {
> -	struct intel_engine_cs *engine = rq->engine;
> -	unsigned int irq, cpu;
> -
> -	GEM_BUG_ON(!seqno);
> +	unsigned int cpu;
>   
>   	/*
>   	 * Only wait for the request if we know it is likely to complete.
> @@ -1050,7 +1046,7 @@ static bool __i915_spin_request(const struct i915_request *rq,
>   	 * it is a fair assumption that it will not complete within our
>   	 * relatively short timeout.
>   	 */
> -	if (!intel_engine_has_started(engine, seqno))
> +	if (!i915_request_started(rq))

Might be more wasteful the more preemption is going on. Probably not the
most important thing to try to fix straight away, but something to put
down on a to-do list.

The comment above is also outdated now (engine order).

>   		return false;
>   
>   	/*
> @@ -1064,20 +1060,10 @@ static bool __i915_spin_request(const struct i915_request *rq,
>   	 * takes to sleep on a request, on the order of a microsecond.
>   	 */
>   
> -	irq = READ_ONCE(engine->breadcrumbs.irq_count);
>   	timeout_us += local_clock_us(&cpu);
>   	do {
> -		if (intel_engine_has_completed(engine, seqno))
> -			return seqno == i915_request_global_seqno(rq);
> -
> -		/*
> -		 * Seqno are meant to be ordered *before* the interrupt. If
> -		 * we see an interrupt without a corresponding seqno advance,
> -		 * assume we won't see one in the near future but require
> -		 * the engine->seqno_barrier() to fixup coherency.
> -		 */
> -		if (READ_ONCE(engine->breadcrumbs.irq_count) != irq)
> -			break;
> +		if (i915_request_completed(rq))
> +			return true;
>   
>   		if (signal_pending_state(state, current))
>   			break;
> @@ -1091,6 +1077,18 @@ static bool __i915_spin_request(const struct i915_request *rq,
>   	return false;
>   }
>   
> +struct request_wait {
> +	struct dma_fence_cb cb;
> +	struct task_struct *tsk;
> +};
> +
> +static void request_wait_wake(struct dma_fence *fence, struct dma_fence_cb *cb)
> +{
> +	struct request_wait *wait = container_of(cb, typeof(*wait), cb);
> +
> +	wake_up_process(wait->tsk);
> +}
> +
>   /**
>    * i915_request_wait - wait until execution of request has finished
>    * @rq: the request to wait upon
> @@ -1116,8 +1114,7 @@ long i915_request_wait(struct i915_request *rq,
>   {
>   	const int state = flags & I915_WAIT_INTERRUPTIBLE ?
>   		TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
> -	DEFINE_WAIT_FUNC(exec, default_wake_function);
> -	struct intel_wait wait;
> +	struct request_wait wait;
>   
>   	might_sleep();
>   	GEM_BUG_ON(timeout < 0);
> @@ -1129,47 +1126,24 @@ long i915_request_wait(struct i915_request *rq,
>   		return -ETIME;
>   
>   	trace_i915_request_wait_begin(rq, flags);
> -	add_wait_queue(&rq->execute, &exec);
> -	intel_wait_init(&wait);
> -	if (flags & I915_WAIT_PRIORITY)
> -		i915_schedule_bump_priority(rq, I915_PRIORITY_WAIT);
> -
> -restart:
> -	do {
> -		set_current_state(state);
> -		if (intel_wait_update_request(&wait, rq))
> -			break;
> -
> -		if (signal_pending_state(state, current)) {
> -			timeout = -ERESTARTSYS;
> -			goto complete;
> -		}
>   
> -		if (!timeout) {
> -			timeout = -ETIME;
> -			goto complete;
> -		}
> +	/* Optimistic short spin before touching IRQs */
> +	if (__i915_spin_request(rq, state, 5))
> +		goto out;
>   
> -		timeout = io_schedule_timeout(timeout);
> -	} while (1);
> +	if (flags & I915_WAIT_PRIORITY)
> +		i915_schedule_bump_priority(rq, I915_PRIORITY_WAIT);
>   
> -	GEM_BUG_ON(!intel_wait_has_seqno(&wait));
> -	GEM_BUG_ON(!i915_sw_fence_signaled(&rq->submit));
> +	wait.tsk = current;
> +	if (dma_fence_add_callback(&rq->fence, &wait.cb, request_wait_wake))
> +		goto out;
>   
> -	/* Optimistic short spin before touching IRQs */
> -	if (__i915_spin_request(rq, wait.seqno, state, 5))
> -		goto complete;
> +	for (;;) {
> +		set_current_state(state);
>   
> -	set_current_state(state);
> -	if (intel_engine_add_wait(rq->engine, &wait))
> -		/*
> -		 * In order to check that we haven't missed the interrupt
> -		 * as we enabled it, we need to kick ourselves to do a
> -		 * coherent check on the seqno before we sleep.
> -		 */
> -		goto wakeup;
> +		if (i915_request_completed(rq))
> +			break;
>   
> -	for (;;) {
>   		if (signal_pending_state(state, current)) {
>   			timeout = -ERESTARTSYS;
>   			break;
> @@ -1181,33 +1155,13 @@ long i915_request_wait(struct i915_request *rq,
>   		}
>   
>   		timeout = io_schedule_timeout(timeout);
> -
> -		if (intel_wait_complete(&wait) &&
> -		    intel_wait_check_request(&wait, rq))
> -			break;
> -
> -		set_current_state(state);
> -
> -wakeup:
> -		if (i915_request_completed(rq))
> -			break;
> -
> -		/* Only spin if we know the GPU is processing this request */
> -		if (__i915_spin_request(rq, wait.seqno, state, 2))
> -			break;
> -
> -		if (!intel_wait_check_request(&wait, rq)) {
> -			intel_engine_remove_wait(rq->engine, &wait);
> -			goto restart;
> -		}
>   	}
> -
> -	intel_engine_remove_wait(rq->engine, &wait);
> -complete:
>   	__set_current_state(TASK_RUNNING);
> -	remove_wait_queue(&rq->execute, &exec);
> -	trace_i915_request_wait_end(rq);
>   
> +	dma_fence_remove_callback(&rq->fence, &wait.cb);
> +
> +out:
> +	trace_i915_request_wait_end(rq);
>   	return timeout;
>   }
>   
> diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
> index 340d6216791c..8f78ac97b8d6 100644
> --- a/drivers/gpu/drm/i915/i915_request.h
> +++ b/drivers/gpu/drm/i915/i915_request.h
> @@ -38,23 +38,16 @@ struct drm_i915_gem_object;
>   struct i915_request;
>   struct i915_timeline;
>   
> -struct intel_wait {
> -	struct rb_node node;
> -	struct task_struct *tsk;
> -	struct i915_request *request;
> -	u32 seqno;
> -};
> -
> -struct intel_signal_node {
> -	struct intel_wait wait;
> -	struct list_head link;
> -};
> -
>   struct i915_capture_list {
>   	struct i915_capture_list *next;
>   	struct i915_vma *vma;
>   };
>   
> +enum {
> +	I915_FENCE_FLAG_ACTIVE = DMA_FENCE_FLAG_USER_BITS,
> +	I915_FENCE_FLAG_SIGNAL,

Describe in comments what these mean please.
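
Something along these lines, perhaps (my reading of where the bits are
set and cleared, so treat the wording as a suggestion):

	enum {
		/*
		 * I915_FENCE_FLAG_ACTIVE - the request has been
		 * submitted to HW; set in __i915_request_submit()
		 * and cleared again on unsubmit.
		 */
		I915_FENCE_FLAG_ACTIVE = DMA_FENCE_FLAG_USER_BITS,

		/*
		 * I915_FENCE_FLAG_SIGNAL - the request is on its
		 * context's ce->signals list, waiting for the
		 * breadcrumbs irq worker to signal its fence.
		 */
		I915_FENCE_FLAG_SIGNAL,
	};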

> +};
> +
>   /**
>    * Request queue structure.
>    *
> @@ -97,7 +90,7 @@ struct i915_request {
>   	struct intel_context *hw_context;
>   	struct intel_ring *ring;
>   	struct i915_timeline *timeline;
> -	struct intel_signal_node signaling;
> +	struct list_head signal_link;
>   
>   	/*
>   	 * The rcu epoch of when this request was allocated. Used to judiciously
> @@ -116,7 +109,6 @@ struct i915_request {
>   	 */
>   	struct i915_sw_fence submit;
>   	wait_queue_entry_t submitq;
> -	wait_queue_head_t execute;
>   
>   	/*
>   	 * A list of everyone we wait upon, and everyone who waits upon us.
> @@ -255,7 +247,7 @@ i915_request_put(struct i915_request *rq)
>    * that it has passed the global seqno and the global seqno is unchanged
>    * after the read, it is indeed complete).
>    */
> -static u32
> +static inline u32
>   i915_request_global_seqno(const struct i915_request *request)
>   {
>   	return READ_ONCE(request->global_seqno);
> diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
> index 9b9169508139..d7d2840fcaa5 100644
> --- a/drivers/gpu/drm/i915/i915_reset.c
> +++ b/drivers/gpu/drm/i915/i915_reset.c
> @@ -747,18 +747,19 @@ static void reset_restart(struct drm_i915_private *i915)
>   
>   static void nop_submit_request(struct i915_request *request)
>   {
> +	struct intel_engine_cs *engine = request->engine;
>   	unsigned long flags;
>   
>   	GEM_TRACE("%s fence %llx:%lld -> -EIO\n",
> -		  request->engine->name,
> -		  request->fence.context, request->fence.seqno);
> +		  engine->name, request->fence.context, request->fence.seqno);
>   	dma_fence_set_error(&request->fence, -EIO);
>   
> -	spin_lock_irqsave(&request->engine->timeline.lock, flags);
> +	spin_lock_irqsave(&engine->timeline.lock, flags);
>   	__i915_request_submit(request);
>   	i915_request_mark_complete(request);
> -	intel_engine_write_global_seqno(request->engine, request->global_seqno);
> -	spin_unlock_irqrestore(&request->engine->timeline.lock, flags);
> +	spin_unlock_irqrestore(&engine->timeline.lock, flags);
> +
> +	intel_engine_queue_breadcrumbs(engine);
>   }
>   
>   void i915_gem_set_wedged(struct drm_i915_private *i915)
> @@ -813,7 +814,7 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
>   
>   	for_each_engine(engine, i915, id) {
>   		reset_finish_engine(engine);
> -		intel_engine_wakeup(engine);
> +		intel_engine_signal_breadcrumbs(engine);
>   	}
>   
>   	smp_mb__before_atomic();
> diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> index b58915b8708b..faeb0083b561 100644
> --- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
> +++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> @@ -29,48 +29,132 @@
>   
>   #define task_asleep(tsk) ((tsk)->state & TASK_NORMAL && !(tsk)->on_rq)
>   
> -static unsigned int __intel_breadcrumbs_wakeup(struct intel_breadcrumbs *b)
> +static void irq_enable(struct intel_engine_cs *engine)
>   {
> -	struct intel_wait *wait;
> -	unsigned int result = 0;
> +	if (!engine->irq_enable)
> +		return;
> +
> +	/* Caller disables interrupts */
> +	spin_lock(&engine->i915->irq_lock);
> +	engine->irq_enable(engine);
> +	spin_unlock(&engine->i915->irq_lock);
> +}
> +
> +static void irq_disable(struct intel_engine_cs *engine)
> +{
> +	if (!engine->irq_disable)
> +		return;
> +
> +	/* Caller disables interrupts */
> +	spin_lock(&engine->i915->irq_lock);
> +	engine->irq_disable(engine);
> +	spin_unlock(&engine->i915->irq_lock);
> +}
>   
> +static void __intel_breadcrumbs_disarm_irq(struct intel_breadcrumbs *b)
> +{
>   	lockdep_assert_held(&b->irq_lock);
>   
> -	wait = b->irq_wait;
> -	if (wait) {
> -		/*
> -		 * N.B. Since task_asleep() and ttwu are not atomic, the
> -		 * waiter may actually go to sleep after the check, causing
> -		 * us to suppress a valid wakeup. We prefer to reduce the
> -		 * number of false positive missed_breadcrumb() warnings
> -		 * at the expense of a few false negatives, as it it easy
> -		 * to trigger a false positive under heavy load. Enough
> -		 * signal should remain from genuine missed_breadcrumb()
> -		 * for us to detect in CI.
> -		 */
> -		bool was_asleep = task_asleep(wait->tsk);
> -
> -		result = ENGINE_WAKEUP_WAITER;
> -		if (wake_up_process(wait->tsk) && was_asleep)
> -			result |= ENGINE_WAKEUP_ASLEEP;
> -	}
> +	GEM_BUG_ON(!b->irq_enabled);
> +	if (!--b->irq_enabled)
> +		irq_disable(container_of(b,
> +					 struct intel_engine_cs,
> +					 breadcrumbs));
>   
> -	return result;
> +	b->irq_armed = false;
>   }
>   
> -unsigned int intel_engine_wakeup(struct intel_engine_cs *engine)
> +void intel_engine_disarm_breadcrumbs(struct intel_engine_cs *engine)
>   {
>   	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> -	unsigned long flags;
> -	unsigned int result;
>   
> -	spin_lock_irqsave(&b->irq_lock, flags);
> -	result = __intel_breadcrumbs_wakeup(b);
> -	spin_unlock_irqrestore(&b->irq_lock, flags);
> +	if (!b->irq_armed)
> +		return;
> +
> +	spin_lock_irq(&b->irq_lock);
> +	if (b->irq_armed)
> +		__intel_breadcrumbs_disarm_irq(b);
> +	spin_unlock_irq(&b->irq_lock);
> +}
> +
> +static inline bool __request_completed(const struct i915_request *rq)
> +{
> +	return i915_seqno_passed(__hwsp_seqno(rq), rq->fence.seqno);
> +}
> +
> +bool intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine)
> +{
> +	struct intel_breadcrumbs *b = &engine->breadcrumbs;

How can you afford to have this per engine? I guess I might figure it
out later in the patch/series.
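
(For anyone else reading along, the shape of the new tracking as I
understand it is two levels of lists:

	engine->breadcrumbs.signalers	- intel_contexts with pending
					  signals, linked via
					  ce->signal_link
	ce->signals			- that context's requests,
					  linked via rq->signal_link and
					  kept in fence.seqno order

so the irq handler only walks contexts that currently have signaling
enabled, rather than every request on the engine.)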

> +	struct intel_context *ce, *cn;
> +	struct i915_request *rq, *rn;
> +	LIST_HEAD(signal);
> +
> +	spin_lock(&b->irq_lock);
> +
> +	b->irq_fired = true;
> +	if (b->irq_armed && list_empty(&b->signalers))
> +		__intel_breadcrumbs_disarm_irq(b);
> +
> +	list_for_each_entry_safe(ce, cn, &b->signalers, signal_link) {
> +		GEM_BUG_ON(list_empty(&ce->signals));
> +
> +		list_for_each_entry_safe(rq, rn, &ce->signals, signal_link) {
> +			if (!__request_completed(rq))
> +				break;
> +
> +			GEM_BUG_ON(!test_bit(I915_FENCE_FLAG_SIGNAL,
> +					     &rq->fence.flags));
> +			clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);
> +
> +			if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
> +				     &rq->fence.flags))
> +				continue;

Request has been signalled already, but is still on this list? Who will
then remove it from this list?

> +
> +			/*
> +			 * Queue for execution after dropping the signaling
> +			 * spinlock as the callback chain may end adding
> +			 * more signalers to the same context or engine.
> +			 */
> +			i915_request_get(rq);
> +			list_add_tail(&rq->signal_link, &signal);

Shouldn't this be list_move_tail since rq is already on the ce->signals 
list?
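
For reference, list_move_tail() is just unlink-plus-add:

	static inline void list_move_tail(struct list_head *list,
					  struct list_head *head)
	{
		__list_del_entry(list);
		list_add_tail(list, head);
	}

so, if I read the later __list_del_many() right, the difference only
matters if rq->signal_link is still meant to be reachable from
ce->signals until that bulk removal cuts the completed requests out in
one go.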

> +		}
> +
> +		if (!list_is_first(&rq->signal_link, &ce->signals)) {

Can't rq be NULL here - if only completed requests are on the list and 
so the iterator reached the end?

> +			__list_del_many(&ce->signals, &rq->signal_link);


This block could use a comment - I at least failed to quickly understand 
it. How can we be unlinking entries, if they have already been unlinked?

> +			if (&ce->signals == &rq->signal_link)
> +				list_del_init(&ce->signal_link);

This is another list_empty hack like from another day? Please put a 
comment if you don't want it to be self documenting.
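
My reading, for what the comment could say (treat as a suggestion):

	/*
	 * Everything before rq->signal_link has been moved onto the
	 * local signal list above; cut those entries out of
	 * ce->signals in one go.  If the cursor reached the list
	 * head, the context has no signals left pending, so drop it
	 * from the engine's signalers list as well.
	 */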

> +		}
> +	}
> +
> +	spin_unlock(&b->irq_lock);
> +
> +	list_for_each_entry_safe(rq, rn, &signal, signal_link) {
> +		dma_fence_signal(&rq->fence);
> +		i915_request_put(rq);
> +	}
> +
> +	return !list_empty(&signal);
> +}
> +
> +bool intel_engine_signal_breadcrumbs(struct intel_engine_cs *engine)
> +{
> +	bool result;
> +
> +	local_irq_disable();
> +	result = intel_engine_breadcrumbs_irq(engine);
> +	local_irq_enable();
>   
>   	return result;
>   }
>   
> +static void signal_irq_work(struct irq_work *work)
> +{
> +	struct intel_engine_cs *engine =
> +		container_of(work, typeof(*engine), breadcrumbs.irq_work);
> +
> +	intel_engine_breadcrumbs_irq(engine);
> +}
> +
>   static unsigned long wait_timeout(void)
>   {
>   	return round_jiffies_up(jiffies + DRM_I915_HANGCHECK_JIFFIES);
> @@ -94,19 +178,15 @@ static void intel_breadcrumbs_hangcheck(struct timer_list *t)
>   	struct intel_engine_cs *engine =
>   		from_timer(engine, t, breadcrumbs.hangcheck);
>   	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> -	unsigned int irq_count;
>   
>   	if (!b->irq_armed)
>   		return;
>   
> -	irq_count = READ_ONCE(b->irq_count);
> -	if (b->hangcheck_interrupts != irq_count) {
> -		b->hangcheck_interrupts = irq_count;
> -		mod_timer(&b->hangcheck, wait_timeout());
> -		return;
> -	}
> +	if (b->irq_fired)
> +		goto rearm;
>   
> -	/* We keep the hangcheck timer alive until we disarm the irq, even
> +	/*
> +	 * We keep the hangcheck timer alive until we disarm the irq, even
>   	 * if there are no waiters at present.
>   	 *
>   	 * If the waiter was currently running, assume it hasn't had a chance
> @@ -118,10 +198,13 @@ static void intel_breadcrumbs_hangcheck(struct timer_list *t)
>   	 * but we still have a waiter. Assuming all batches complete within
>   	 * DRM_I915_HANGCHECK_JIFFIES [1.5s]!
>   	 */
> -	if (intel_engine_wakeup(engine) & ENGINE_WAKEUP_ASLEEP) {
> +	synchronize_hardirq(engine->i915->drm.irq);
> +	if (intel_engine_signal_breadcrumbs(engine)) {
>   		missed_breadcrumb(engine);
>   		mod_timer(&b->fake_irq, jiffies + 1);
>   	} else {
> +rearm:
> +		b->irq_fired = false;
>   		mod_timer(&b->hangcheck, wait_timeout());
>   	}
>   }
> @@ -140,11 +223,7 @@ static void intel_breadcrumbs_fake_irq(struct timer_list *t)
>   	 * oldest waiter to do the coherent seqno check.
>   	 */
>   
> -	spin_lock_irq(&b->irq_lock);
> -	if (b->irq_armed && !__intel_breadcrumbs_wakeup(b))
> -		__intel_engine_disarm_breadcrumbs(engine);
> -	spin_unlock_irq(&b->irq_lock);
> -	if (!b->irq_armed)
> +	if (!intel_engine_signal_breadcrumbs(engine) && !b->irq_armed)
>   		return;
>   
>   	/* If the user has disabled the fake-irq, restore the hangchecking */
> @@ -156,43 +235,6 @@ static void intel_breadcrumbs_fake_irq(struct timer_list *t)
>   	mod_timer(&b->fake_irq, jiffies + 1);
>   }
>   
> -static void irq_enable(struct intel_engine_cs *engine)
> -{
> -	if (!engine->irq_enable)
> -		return;
> -
> -	/* Caller disables interrupts */
> -	spin_lock(&engine->i915->irq_lock);
> -	engine->irq_enable(engine);
> -	spin_unlock(&engine->i915->irq_lock);
> -}
> -
> -static void irq_disable(struct intel_engine_cs *engine)
> -{
> -	if (!engine->irq_disable)
> -		return;
> -
> -	/* Caller disables interrupts */
> -	spin_lock(&engine->i915->irq_lock);
> -	engine->irq_disable(engine);
> -	spin_unlock(&engine->i915->irq_lock);
> -}
> -
> -void __intel_engine_disarm_breadcrumbs(struct intel_engine_cs *engine)
> -{
> -	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> -
> -	lockdep_assert_held(&b->irq_lock);
> -	GEM_BUG_ON(b->irq_wait);
> -	GEM_BUG_ON(!b->irq_armed);
> -
> -	GEM_BUG_ON(!b->irq_enabled);
> -	if (!--b->irq_enabled)
> -		irq_disable(engine);
> -
> -	b->irq_armed = false;
> -}
> -
>   void intel_engine_pin_breadcrumbs_irq(struct intel_engine_cs *engine)
>   {
>   	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> @@ -215,40 +257,6 @@ void intel_engine_unpin_breadcrumbs_irq(struct intel_engine_cs *engine)
>   	spin_unlock_irq(&b->irq_lock);
>   }
>   
> -void intel_engine_disarm_breadcrumbs(struct intel_engine_cs *engine)
> -{
> -	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> -	struct intel_wait *wait, *n;
> -
> -	if (!b->irq_armed)
> -		return;
> -
> -	/*
> -	 * We only disarm the irq when we are idle (all requests completed),
> -	 * so if the bottom-half remains asleep, it missed the request
> -	 * completion.
> -	 */
> -	if (intel_engine_wakeup(engine) & ENGINE_WAKEUP_ASLEEP)
> -		missed_breadcrumb(engine);
> -
> -	spin_lock_irq(&b->rb_lock);
> -
> -	spin_lock(&b->irq_lock);
> -	b->irq_wait = NULL;
> -	if (b->irq_armed)
> -		__intel_engine_disarm_breadcrumbs(engine);
> -	spin_unlock(&b->irq_lock);
> -
> -	rbtree_postorder_for_each_entry_safe(wait, n, &b->waiters, node) {
> -		GEM_BUG_ON(!intel_engine_signaled(engine, wait->seqno));
> -		RB_CLEAR_NODE(&wait->node);
> -		wake_up_process(wait->tsk);
> -	}
> -	b->waiters = RB_ROOT;
> -
> -	spin_unlock_irq(&b->rb_lock);
> -}
> -
>   static bool use_fake_irq(const struct intel_breadcrumbs *b)
>   {
>   	const struct intel_engine_cs *engine =
> @@ -264,7 +272,7 @@ static bool use_fake_irq(const struct intel_breadcrumbs *b)
>   	 * engine->seqno_barrier(), a timing error that should be transient
>   	 * and unlikely to reoccur.
>   	 */
> -	return READ_ONCE(b->irq_count) == b->hangcheck_interrupts;
> +	return !b->irq_fired;
>   }
>   
>   static void enable_fake_irq(struct intel_breadcrumbs *b)
> @@ -276,7 +284,7 @@ static void enable_fake_irq(struct intel_breadcrumbs *b)
>   		mod_timer(&b->hangcheck, wait_timeout());
>   }
>   
> -static bool __intel_breadcrumbs_enable_irq(struct intel_breadcrumbs *b)
> +static bool __intel_breadcrumbs_arm_irq(struct intel_breadcrumbs *b)
>   {
>   	struct intel_engine_cs *engine =
>   		container_of(b, struct intel_engine_cs, breadcrumbs);
> @@ -315,536 +323,135 @@ static bool __intel_breadcrumbs_enable_irq(struct intel_breadcrumbs *b)
>   	return enabled;
>   }
>   
> -static inline struct intel_wait *to_wait(struct rb_node *node)
> +void intel_engine_init_breadcrumbs(struct intel_engine_cs *engine)
>   {
> -	return rb_entry(node, struct intel_wait, node);
> -}
> +	struct intel_breadcrumbs *b = &engine->breadcrumbs;
>   
> -static inline void __intel_breadcrumbs_finish(struct intel_breadcrumbs *b,
> -					      struct intel_wait *wait)
> -{
> -	lockdep_assert_held(&b->rb_lock);
> -	GEM_BUG_ON(b->irq_wait == wait);
> +	spin_lock_init(&b->irq_lock);
> +	INIT_LIST_HEAD(&b->signalers);
>   
> -	/*
> -	 * This request is completed, so remove it from the tree, mark it as
> -	 * complete, and *then* wake up the associated task. N.B. when the
> -	 * task wakes up, it will find the empty rb_node, discern that it
> -	 * has already been removed from the tree and skip the serialisation
> -	 * of the b->rb_lock and b->irq_lock. This means that the destruction
> -	 * of the intel_wait is not serialised with the interrupt handler
> -	 * by the waiter - it must instead be serialised by the caller.
> -	 */
> -	rb_erase(&wait->node, &b->waiters);
> -	RB_CLEAR_NODE(&wait->node);
> +	init_irq_work(&b->irq_work, signal_irq_work);
>   
> -	if (wait->tsk->state != TASK_RUNNING)
> -		wake_up_process(wait->tsk); /* implicit smp_wmb() */
> +	timer_setup(&b->fake_irq, intel_breadcrumbs_fake_irq, 0);
> +	timer_setup(&b->hangcheck, intel_breadcrumbs_hangcheck, 0);
>   }
>   
> -static inline void __intel_breadcrumbs_next(struct intel_engine_cs *engine,
> -					    struct rb_node *next)
> +static void cancel_fake_irq(struct intel_engine_cs *engine)
>   {
>   	struct intel_breadcrumbs *b = &engine->breadcrumbs;
>   
> -	spin_lock(&b->irq_lock);
> -	GEM_BUG_ON(!b->irq_armed);
> -	GEM_BUG_ON(!b->irq_wait);
> -	b->irq_wait = to_wait(next);
> -	spin_unlock(&b->irq_lock);
> -
> -	/* We always wake up the next waiter that takes over as the bottom-half
> -	 * as we may delegate not only the irq-seqno barrier to the next waiter
> -	 * but also the task of waking up concurrent waiters.
> -	 */
> -	if (next)
> -		wake_up_process(to_wait(next)->tsk);
> +	del_timer_sync(&b->fake_irq); /* may queue b->hangcheck */
> +	del_timer_sync(&b->hangcheck);
> +	clear_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings);
>   }
>   
> -static bool __intel_engine_add_wait(struct intel_engine_cs *engine,
> -				    struct intel_wait *wait)
> +void intel_engine_reset_breadcrumbs(struct intel_engine_cs *engine)
>   {
>   	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> -	struct rb_node **p, *parent, *completed;
> -	bool first, armed;
> -	u32 seqno;
> +	unsigned long flags;
>   
> -	GEM_BUG_ON(!wait->seqno);
> +	spin_lock_irqsave(&b->irq_lock, flags);
>   
> -	/* Insert the request into the retirement ordered list
> -	 * of waiters by walking the rbtree. If we are the oldest
> -	 * seqno in the tree (the first to be retired), then
> -	 * set ourselves as the bottom-half.
> -	 *
> -	 * As we descend the tree, prune completed branches since we hold the
> -	 * spinlock we know that the first_waiter must be delayed and can
> -	 * reduce some of the sequential wake up latency if we take action
> -	 * ourselves and wake up the completed tasks in parallel. Also, by
> -	 * removing stale elements in the tree, we may be able to reduce the
> -	 * ping-pong between the old bottom-half and ourselves as first-waiter.
> +	/*
> +	 * Leave the fake_irq timer enabled (if it is running), but clear the
> +	 * bit so that it turns itself off on its next wake up and goes back
> +	 * to the long hangcheck interval if still required.
>   	 */
> -	armed = false;
> -	first = true;
> -	parent = NULL;
> -	completed = NULL;
> -	seqno = intel_engine_get_seqno(engine);
> -
> -	 /* If the request completed before we managed to grab the spinlock,
> -	  * return now before adding ourselves to the rbtree. We let the
> -	  * current bottom-half handle any pending wakeups and instead
> -	  * try and get out of the way quickly.
> -	  */
> -	if (i915_seqno_passed(seqno, wait->seqno)) {
> -		RB_CLEAR_NODE(&wait->node);
> -		return first;
> -	}
> -
> -	p = &b->waiters.rb_node;
> -	while (*p) {
> -		parent = *p;
> -		if (wait->seqno == to_wait(parent)->seqno) {
> -			/* We have multiple waiters on the same seqno, select
> -			 * the highest priority task (that with the smallest
> -			 * task->prio) to serve as the bottom-half for this
> -			 * group.
> -			 */
> -			if (wait->tsk->prio > to_wait(parent)->tsk->prio) {
> -				p = &parent->rb_right;
> -				first = false;
> -			} else {
> -				p = &parent->rb_left;
> -			}
> -		} else if (i915_seqno_passed(wait->seqno,
> -					     to_wait(parent)->seqno)) {
> -			p = &parent->rb_right;
> -			if (i915_seqno_passed(seqno, to_wait(parent)->seqno))
> -				completed = parent;
> -			else
> -				first = false;
> -		} else {
> -			p = &parent->rb_left;
> -		}
> -	}
> -	rb_link_node(&wait->node, parent, p);
> -	rb_insert_color(&wait->node, &b->waiters);
> -
> -	if (first) {
> -		spin_lock(&b->irq_lock);
> -		b->irq_wait = wait;
> -		/* After assigning ourselves as the new bottom-half, we must
> -		 * perform a cursory check to prevent a missed interrupt.
> -		 * Either we miss the interrupt whilst programming the hardware,
> -		 * or if there was a previous waiter (for a later seqno) they
> -		 * may be woken instead of us (due to the inherent race
> -		 * in the unlocked read of b->irq_seqno_bh in the irq handler)
> -		 * and so we miss the wake up.
> -		 */
> -		armed = __intel_breadcrumbs_enable_irq(b);
> -		spin_unlock(&b->irq_lock);
> -	}
> -
> -	if (completed) {
> -		/* Advance the bottom-half (b->irq_wait) before we wake up
> -		 * the waiters who may scribble over their intel_wait
> -		 * just as the interrupt handler is dereferencing it via
> -		 * b->irq_wait.
> -		 */
> -		if (!first) {
> -			struct rb_node *next = rb_next(completed);
> -			GEM_BUG_ON(next == &wait->node);
> -			__intel_breadcrumbs_next(engine, next);
> -		}
> -
> -		do {
> -			struct intel_wait *crumb = to_wait(completed);
> -			completed = rb_prev(completed);
> -			__intel_breadcrumbs_finish(b, crumb);
> -		} while (completed);
> -	}
> -
> -	GEM_BUG_ON(!b->irq_wait);
> -	GEM_BUG_ON(!b->irq_armed);
> -	GEM_BUG_ON(rb_first(&b->waiters) != &b->irq_wait->node);
> -
> -	return armed;
> -}
> -
> -bool intel_engine_add_wait(struct intel_engine_cs *engine,
> -			   struct intel_wait *wait)
> -{
> -	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> -	bool armed;
> -
> -	spin_lock_irq(&b->rb_lock);
> -	armed = __intel_engine_add_wait(engine, wait);
> -	spin_unlock_irq(&b->rb_lock);
> -	if (armed)
> -		return armed;
> -
> -	/* Make the caller recheck if its request has already started. */
> -	return intel_engine_has_started(engine, wait->seqno);
> -}
> -
> -static inline bool chain_wakeup(struct rb_node *rb, int priority)
> -{
> -	return rb && to_wait(rb)->tsk->prio <= priority;
> -}
> +	clear_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings);
>   
> -static inline int wakeup_priority(struct intel_breadcrumbs *b,
> -				  struct task_struct *tsk)
> -{
> -	if (tsk == b->signaler)
> -		return INT_MIN;
> +	if (b->irq_enabled)
> +		irq_enable(engine);
>   	else
> -		return tsk->prio;
> -}
> -
> -static void __intel_engine_remove_wait(struct intel_engine_cs *engine,
> -				       struct intel_wait *wait)
> -{
> -	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> -
> -	lockdep_assert_held(&b->rb_lock);
> -
> -	if (RB_EMPTY_NODE(&wait->node))
> -		goto out;
> -
> -	if (b->irq_wait == wait) {
> -		const int priority = wakeup_priority(b, wait->tsk);
> -		struct rb_node *next;
> -
> -		/* We are the current bottom-half. Find the next candidate,
> -		 * the first waiter in the queue on the remaining oldest
> -		 * request. As multiple seqnos may complete in the time it
> -		 * takes us to wake up and find the next waiter, we have to
> -		 * wake up that waiter for it to perform its own coherent
> -		 * completion check.
> -		 */
> -		next = rb_next(&wait->node);
> -		if (chain_wakeup(next, priority)) {
> -			/* If the next waiter is already complete,
> -			 * wake it up and continue onto the next waiter. So
> -			 * if have a small herd, they will wake up in parallel
> -			 * rather than sequentially, which should reduce
> -			 * the overall latency in waking all the completed
> -			 * clients.
> -			 *
> -			 * However, waking up a chain adds extra latency to
> -			 * the first_waiter. This is undesirable if that
> -			 * waiter is a high priority task.
> -			 */
> -			u32 seqno = intel_engine_get_seqno(engine);
> -
> -			while (i915_seqno_passed(seqno, to_wait(next)->seqno)) {
> -				struct rb_node *n = rb_next(next);
> -
> -				__intel_breadcrumbs_finish(b, to_wait(next));
> -				next = n;
> -				if (!chain_wakeup(next, priority))
> -					break;
> -			}
> -		}
> -
> -		__intel_breadcrumbs_next(engine, next);
> -	} else {
> -		GEM_BUG_ON(rb_first(&b->waiters) == &wait->node);
> -	}
> -
> -	GEM_BUG_ON(RB_EMPTY_NODE(&wait->node));
> -	rb_erase(&wait->node, &b->waiters);
> -	RB_CLEAR_NODE(&wait->node);
> -
> -out:
> -	GEM_BUG_ON(b->irq_wait == wait);
> -	GEM_BUG_ON(rb_first(&b->waiters) !=
> -		   (b->irq_wait ? &b->irq_wait->node : NULL));
> -}
> -
> -void intel_engine_remove_wait(struct intel_engine_cs *engine,
> -			      struct intel_wait *wait)
> -{
> -	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> -
> -	/* Quick check to see if this waiter was already decoupled from
> -	 * the tree by the bottom-half to avoid contention on the spinlock
> -	 * by the herd.
> -	 */
> -	if (RB_EMPTY_NODE(&wait->node)) {
> -		GEM_BUG_ON(READ_ONCE(b->irq_wait) == wait);
> -		return;
> -	}
> +		irq_disable(engine);
>   
> -	spin_lock_irq(&b->rb_lock);
> -	__intel_engine_remove_wait(engine, wait);
> -	spin_unlock_irq(&b->rb_lock);
> +	spin_unlock_irqrestore(&b->irq_lock, flags);
>   }
>   
> -static void signaler_set_rtpriority(void)
> +void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine)
>   {
> -	 struct sched_param param = { .sched_priority = 1 };
> -
> -	 sched_setscheduler_nocheck(current, SCHED_FIFO, &param);
> +	cancel_fake_irq(engine);
>   }
>   
> -static int intel_breadcrumbs_signaler(void *arg)
> +bool intel_engine_enable_signaling(struct i915_request *rq)

intel_request_enable_signaling?

>   {
> -	struct intel_engine_cs *engine = arg;
> -	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> -	struct i915_request *rq, *n;
> -
> -	/* Install ourselves with high priority to reduce signalling latency */
> -	signaler_set_rtpriority();
> -
> -	do {
> -		bool do_schedule = true;
> -		LIST_HEAD(list);
> -		u32 seqno;
> -
> -		set_current_state(TASK_INTERRUPTIBLE);
> -		if (list_empty(&b->signals))
> -			goto sleep;
> -
> -		/*
> -		 * We are either woken up by the interrupt bottom-half,
> -		 * or by a client adding a new signaller. In both cases,
> -		 * the GPU seqno may have advanced beyond our oldest signal.
> -		 * If it has, propagate the signal, remove the waiter and
> -		 * check again with the next oldest signal. Otherwise we
> -		 * need to wait for a new interrupt from the GPU or for
> -		 * a new client.
> -		 */
> -		seqno = intel_engine_get_seqno(engine);
> -
> -		spin_lock_irq(&b->rb_lock);
> -		list_for_each_entry_safe(rq, n, &b->signals, signaling.link) {
> -			u32 this = rq->signaling.wait.seqno;
> -
> -			GEM_BUG_ON(!rq->signaling.wait.seqno);
> -
> -			if (!i915_seqno_passed(seqno, this))
> -				break;
> -
> -			if (likely(this == i915_request_global_seqno(rq))) {
> -				__intel_engine_remove_wait(engine,
> -							   &rq->signaling.wait);
> +	struct intel_breadcrumbs *b = &rq->engine->breadcrumbs;
>   
> -				rq->signaling.wait.seqno = 0;
> -				__list_del_entry(&rq->signaling.link);
> +	GEM_BUG_ON(test_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags));
>   
> -				if (!i915_request_signaled(rq)) {
> -					list_add_tail(&rq->signaling.link,
> -						      &list);
> -					i915_request_get(rq);
> -				}
> -			}
> -		}
> -		spin_unlock_irq(&b->rb_lock);
> +	if (!test_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags))
> +		return true;
>   
> -		if (!list_empty(&list)) {
> -			local_bh_disable();
> -			list_for_each_entry_safe(rq, n, &list, signaling.link) {
> -				dma_fence_signal(&rq->fence);
> -				GEM_BUG_ON(!i915_request_completed(rq));
> -				i915_request_put(rq);
> -			}
> -			local_bh_enable(); /* kick start the tasklets */
> +	spin_lock(&b->irq_lock);
> +	if (test_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags) &&

__test_bit?

> +	    !__request_completed(rq)) {
> +		struct intel_context *ce = rq->hw_context;
> +		struct list_head *pos;
>   
> -			/*
> -			 * If the engine is saturated we may be continually
> -			 * processing completed requests. This angers the
> -			 * NMI watchdog if we never let anything else
> -			 * have access to the CPU. Let's pretend to be nice
> -			 * and relinquish the CPU if we burn through the
> -			 * entire RT timeslice!
> -			 */
> -			do_schedule = need_resched();
> -		}
> +		__intel_breadcrumbs_arm_irq(b);
>   
> -		if (unlikely(do_schedule)) {
> -sleep:
> -			if (kthread_should_park())
> -				kthread_parkme();
> +		list_for_each_prev(pos, &ce->signals) {
> +			struct i915_request *it =
> +				list_entry(pos, typeof(*it), signal_link);
>   
> -			if (unlikely(kthread_should_stop()))
> +			if (i915_seqno_passed(rq->fence.seqno,
> +					      it->fence.seqno))
>   				break;

Put a comment against this loop please saying where in the list it is
looking to insert...
> -
> -			schedule();
>   		}
> -	} while (1);
> -	__set_current_state(TASK_RUNNING);
> -
> -	return 0;
> -}
> -
> -static void insert_signal(struct intel_breadcrumbs *b,
> -			  struct i915_request *request,
> -			  const u32 seqno)
> -{
> -	struct i915_request *iter;
> -
> -	lockdep_assert_held(&b->rb_lock);
> -
> -	/*
> -	 * A reasonable assumption is that we are called to add signals
> -	 * in sequence, as the requests are submitted for execution and
> -	 * assigned a global_seqno. This will be the case for the majority
> -	 * of internally generated signals (inter-engine signaling).
> -	 *
> -	 * Out of order waiters triggering random signaling enabling will
> -	 * be more problematic, but hopefully rare enough and the list
> -	 * small enough that the O(N) insertion sort is not an issue.
> -	 */
> -
> -	list_for_each_entry_reverse(iter, &b->signals, signaling.link)
> -		if (i915_seqno_passed(seqno, iter->signaling.wait.seqno))
> -			break;
> -
> -	list_add(&request->signaling.link, &iter->signaling.link);
> -}
> +		list_add(&rq->signal_link, pos);
> +		if (pos == &ce->signals)
> +			list_move_tail(&ce->signal_link, &b->signalers);

... and here how it manages the other list as well, on transition from 
empty to active.
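
Combining both requests for comments, something like this above the
loop, perhaps (my reading of it, so only a suggestion):

	/*
	 * ce->signals is kept in fence.seqno order, oldest first.
	 * New signalers almost always arrive in order, so walk
	 * backwards from the tail to find the insertion point.  If
	 * we end up at the list head, rq is now the oldest pending
	 * signal - including the previously-empty case - so (re)queue
	 * this context on the engine's signalers list.
	 */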

>   
> -bool intel_engine_enable_signaling(struct i915_request *request, bool wakeup)
> -{
> -	struct intel_engine_cs *engine = request->engine;
> -	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> -	struct intel_wait *wait = &request->signaling.wait;
> -	u32 seqno;
> -
> -	/*
> -	 * Note that we may be called from an interrupt handler on another
> -	 * device (e.g. nouveau signaling a fence completion causing us
> -	 * to submit a request, and so enable signaling). As such,
> -	 * we need to make sure that all other users of b->rb_lock protect
> -	 * against interrupts, i.e. use spin_lock_irqsave.
> -	 */
> -
> -	/* locked by dma_fence_enable_sw_signaling() (irqsafe fence->lock) */
> -	GEM_BUG_ON(!irqs_disabled());
> -	lockdep_assert_held(&request->lock);
> -
> -	seqno = i915_request_global_seqno(request);
> -	if (!seqno) /* will be enabled later upon execution */
> -		return true;
> -
> -	GEM_BUG_ON(wait->seqno);
> -	wait->tsk = b->signaler;
> -	wait->request = request;
> -	wait->seqno = seqno;
> -
> -	/*
> -	 * Add ourselves into the list of waiters, but registering our
> -	 * bottom-half as the signaller thread. As per usual, only the oldest
> -	 * waiter (not just signaller) is tasked as the bottom-half waking
> -	 * up all completed waiters after the user interrupt.
> -	 *
> -	 * If we are the oldest waiter, enable the irq (after which we
> -	 * must double check that the seqno did not complete).
> -	 */
> -	spin_lock(&b->rb_lock);
> -	insert_signal(b, request, seqno);
> -	wakeup &= __intel_engine_add_wait(engine, wait);
> -	spin_unlock(&b->rb_lock);
> -
> -	if (wakeup) {
> -		wake_up_process(b->signaler);
> -		return !intel_wait_complete(wait);
> +		set_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);
>   	}
> +	spin_unlock(&b->irq_lock);
>   
> -	return true;
> +	return !__request_completed(rq);
>   }
>   
> -void intel_engine_cancel_signaling(struct i915_request *request)
> +void intel_engine_cancel_signaling(struct i915_request *rq)

intel_request_cancel_signaling?

>   {
> -	struct intel_engine_cs *engine = request->engine;
> -	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> -
> -	GEM_BUG_ON(!irqs_disabled());
> -	lockdep_assert_held(&request->lock);
> +	struct intel_breadcrumbs *b = &rq->engine->breadcrumbs;
>   
> -	if (!READ_ONCE(request->signaling.wait.seqno))
> +	if (!test_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags))
>   		return;
>   
> -	spin_lock(&b->rb_lock);
> -	__intel_engine_remove_wait(engine, &request->signaling.wait);
> -	if (fetch_and_zero(&request->signaling.wait.seqno))
> -		__list_del_entry(&request->signaling.link);
> -	spin_unlock(&b->rb_lock);
> -}
> -
> -int intel_engine_init_breadcrumbs(struct intel_engine_cs *engine)
> -{
> -	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> -	struct task_struct *tsk;
> -
> -	spin_lock_init(&b->rb_lock);
> -	spin_lock_init(&b->irq_lock);
> -
> -	timer_setup(&b->fake_irq, intel_breadcrumbs_fake_irq, 0);
> -	timer_setup(&b->hangcheck, intel_breadcrumbs_hangcheck, 0);
> -
> -	INIT_LIST_HEAD(&b->signals);
> -
> -	/* Spawn a thread to provide a common bottom-half for all signals.
> -	 * As this is an asynchronous interface we cannot steal the current
> -	 * task for handling the bottom-half to the user interrupt, therefore
> -	 * we create a thread to do the coherent seqno dance after the
> -	 * interrupt and then signal the waitqueue (via the dma-buf/fence).
> -	 */
> -	tsk = kthread_run(intel_breadcrumbs_signaler, engine,
> -			  "i915/signal:%d", engine->id);
> -	if (IS_ERR(tsk))
> -		return PTR_ERR(tsk);
> -
> -	b->signaler = tsk;
> +	spin_lock(&b->irq_lock);
> +	if (test_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags)) {

__test_and_clear_bit ?

> +		struct intel_context *ce = rq->hw_context;
>   
> -	return 0;
> -}
> +		list_del(&rq->signal_link);
> +		if (list_empty(&ce->signals))
> +			list_del_init(&ce->signal_link);
>   
> -static void cancel_fake_irq(struct intel_engine_cs *engine)
> -{
> -	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> -
> -	del_timer_sync(&b->fake_irq); /* may queue b->hangcheck */
> -	del_timer_sync(&b->hangcheck);
> -	clear_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings);
> +		clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);
> +	}
> +	spin_unlock(&b->irq_lock);
>   }
>   
> -void intel_engine_reset_breadcrumbs(struct intel_engine_cs *engine)
> +void intel_engine_print_breadcrumbs(struct intel_engine_cs *engine,
> +				    struct drm_printer *p)
>   {
>   	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> -	unsigned long flags;
> -
> -	spin_lock_irqsave(&b->irq_lock, flags);
> -
> -	/*
> -	 * Leave the fake_irq timer enabled (if it is running), but clear the
> -	 * bit so that it turns itself off on its next wake up and goes back
> -	 * to the long hangcheck interval if still required.
> -	 */
> -	clear_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings);
> +	struct intel_context *ce;
> +	struct i915_request *rq;
>   
> -	if (b->irq_enabled)
> -		irq_enable(engine);
> -	else
> -		irq_disable(engine);
> -
> -	spin_unlock_irqrestore(&b->irq_lock, flags);
> -}
> -
> -void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine)
> -{
> -	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> +	if (list_empty(&b->signalers))
> +		return;
>   
> -	/* The engines should be idle and all requests accounted for! */
> -	WARN_ON(READ_ONCE(b->irq_wait));
> -	WARN_ON(!RB_EMPTY_ROOT(&b->waiters));
> -	WARN_ON(!list_empty(&b->signals));
> +	drm_printf(p, "Signals:\n");
>   
> -	if (!IS_ERR_OR_NULL(b->signaler))
> -		kthread_stop(b->signaler);
> +	spin_lock_irq(&b->irq_lock);
> +	list_for_each_entry(ce, &b->signalers, signal_link) {
> +		list_for_each_entry(rq, &ce->signals, signal_link) {
> +			drm_printf(p, "\t[%llx:%llx%s] @ %dms\n",
> +				   rq->fence.context, rq->fence.seqno,
> +				   i915_request_completed(rq) ? "!" :
> +				   i915_request_started(rq) ? "*" :
> +				   "",
> +				   jiffies_to_msecs(jiffies - rq->emitted_jiffies));
> +		}
> +	}
> +	spin_unlock_irq(&b->irq_lock);
>   
> -	cancel_fake_irq(engine);
> +	if (test_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings))
> +		drm_printf(p, "Fake irq active\n");
>   }
> -
> -#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
> -#include "selftests/intel_breadcrumbs.c"
> -#endif
> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> index 2a4c547240a1..1d9157bf96ae 100644
> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> @@ -458,12 +458,6 @@ int intel_engines_init(struct drm_i915_private *dev_priv)
>   void intel_engine_write_global_seqno(struct intel_engine_cs *engine, u32 seqno)
>   {
>   	intel_write_status_page(engine, I915_GEM_HWS_INDEX, seqno);
> -
> -	/* After manually advancing the seqno, fake the interrupt in case
> -	 * there are any waiters for that seqno.
> -	 */
> -	intel_engine_wakeup(engine);
> -
>   	GEM_BUG_ON(intel_engine_get_seqno(engine) != seqno);
>   }
>   
> @@ -667,16 +661,10 @@ int intel_engine_init_common(struct intel_engine_cs *engine)
>   		}
>   	}
>   
> -	ret = intel_engine_init_breadcrumbs(engine);
> -	if (ret)
> -		goto err_unpin_preempt;
> +	intel_engine_init_breadcrumbs(engine);
>   
>   	return 0;
>   
> -err_unpin_preempt:
> -	if (i915->preempt_context)
> -		__intel_context_unpin(i915->preempt_context, engine);
> -
>   err_unpin_kernel:
>   	__intel_context_unpin(i915->kernel_context, engine);
>   	return ret;
> @@ -1236,12 +1224,14 @@ static void print_request(struct drm_printer *m,
>   
>   	x = print_sched_attr(rq->i915, &rq->sched.attr, buf, x, sizeof(buf));
>   
> -	drm_printf(m, "%s%x%s [%llx:%llx]%s @ %dms: %s\n",
> +	drm_printf(m, "%s%x%s%s [%llx:%llx]%s @ %dms: %s\n",
>   		   prefix,
>   		   rq->global_seqno,
>   		   i915_request_completed(rq) ? "!" :
>   		   i915_request_started(rq) ? "*" :
>   		   "",
> +		   test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
> +			    &rq->fence.flags) ?  "+" : "",
>   		   rq->fence.context, rq->fence.seqno,
>   		   buf,
>   		   jiffies_to_msecs(jiffies - rq->emitted_jiffies),
> @@ -1434,12 +1424,9 @@ void intel_engine_dump(struct intel_engine_cs *engine,
>   		       struct drm_printer *m,
>   		       const char *header, ...)
>   {
> -	struct intel_breadcrumbs * const b = &engine->breadcrumbs;
>   	struct i915_gpu_error * const error = &engine->i915->gpu_error;
>   	struct i915_request *rq;
>   	intel_wakeref_t wakeref;
> -	unsigned long flags;
> -	struct rb_node *rb;
>   
>   	if (header) {
>   		va_list ap;
> @@ -1507,21 +1494,12 @@ void intel_engine_dump(struct intel_engine_cs *engine,
>   
>   	intel_execlists_show_requests(engine, m, print_request, 8);
>   
> -	spin_lock_irqsave(&b->rb_lock, flags);
> -	for (rb = rb_first(&b->waiters); rb; rb = rb_next(rb)) {
> -		struct intel_wait *w = rb_entry(rb, typeof(*w), node);
> -
> -		drm_printf(m, "\t%s [%d:%c] waiting for %x\n",
> -			   w->tsk->comm, w->tsk->pid,
> -			   task_state_to_char(w->tsk),
> -			   w->seqno);
> -	}
> -	spin_unlock_irqrestore(&b->rb_lock, flags);
> -
>   	drm_printf(m, "HWSP:\n");
>   	hexdump(m, engine->status_page.addr, PAGE_SIZE);
>   
>   	drm_printf(m, "Idle? %s\n", yesno(intel_engine_is_idle(engine)));
> +
> +	intel_engine_print_breadcrumbs(engine, m);
>   }
>   
>   static u8 user_class_map[] = {
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index f6b30eb46263..bd44ea41d7ca 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -483,8 +483,8 @@ static void gen7_xcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
>   
>   	for (i = 0; i < GEN7_XCS_WA; i++) {
>   		*cs++ = MI_STORE_DWORD_INDEX;
> -		*cs++ = I915_GEM_HWS_INDEX_ADDR;
> -		*cs++ = rq->global_seqno;
> +		*cs++ = I915_GEM_HWS_SEQNO_ADDR;
> +		*cs++ = rq->fence.seqno;
>   	}
>   
>   	*cs++ = MI_FLUSH_DW;
> @@ -734,7 +734,7 @@ static int init_ring_common(struct intel_engine_cs *engine)
>   	}
>   
>   	/* Papering over lost _interrupts_ immediately following the restart */
> -	intel_engine_wakeup(engine);
> +	intel_engine_queue_breadcrumbs(engine);
>   out:
>   	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
>   
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index d3d4f3667afb..b78cb9bd4bc2 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -5,6 +5,7 @@
>   #include <drm/drm_util.h>
>   
>   #include <linux/hashtable.h>
> +#include <linux/irq_work.h>
>   #include <linux/seqlock.h>
>   
>   #include "i915_gem_batch_pool.h"
> @@ -376,22 +377,19 @@ struct intel_engine_cs {
>   	 * the overhead of waking that client is much preferred.
>   	 */
>   	struct intel_breadcrumbs {
> -		spinlock_t irq_lock; /* protects irq_*; irqsafe */
> -		struct intel_wait *irq_wait; /* oldest waiter by retirement */
> +		spinlock_t irq_lock;
> +		struct list_head signalers;
>   
> -		spinlock_t rb_lock; /* protects the rb and wraps irq_lock */
> -		struct rb_root waiters; /* sorted by retirement, priority */
> -		struct list_head signals; /* sorted by retirement */
> -		struct task_struct *signaler; /* used for fence signalling */
> +		struct irq_work irq_work;

Why did you need irq work and not just invoke the handler directly? 
Maybe put a comment here giving a hint.

>   
>   		struct timer_list fake_irq; /* used after a missed interrupt */
>   		struct timer_list hangcheck; /* detect missed interrupts */
>   
>   		unsigned int hangcheck_interrupts;
>   		unsigned int irq_enabled;
> -		unsigned int irq_count;
>   
> -		bool irq_armed : 1;
> +		bool irq_armed;
> +		bool irq_fired;
>   	} breadcrumbs;
>   
>   	struct {
> @@ -880,83 +878,32 @@ static inline bool intel_engine_has_started(struct intel_engine_cs *engine,
>   void intel_engine_get_instdone(struct intel_engine_cs *engine,
>   			       struct intel_instdone *instdone);
>   
> -/* intel_breadcrumbs.c -- user interrupt bottom-half for waiters */
> -int intel_engine_init_breadcrumbs(struct intel_engine_cs *engine);
> -
> -static inline void intel_wait_init(struct intel_wait *wait)
> -{
> -	wait->tsk = current;
> -	wait->request = NULL;
> -}
> -
> -static inline void intel_wait_init_for_seqno(struct intel_wait *wait, u32 seqno)
> -{
> -	wait->tsk = current;
> -	wait->seqno = seqno;
> -}
> -
> -static inline bool intel_wait_has_seqno(const struct intel_wait *wait)
> -{
> -	return wait->seqno;
> -}
> -
> -static inline bool
> -intel_wait_update_seqno(struct intel_wait *wait, u32 seqno)
> -{
> -	wait->seqno = seqno;
> -	return intel_wait_has_seqno(wait);
> -}
> -
> -static inline bool
> -intel_wait_update_request(struct intel_wait *wait,
> -			  const struct i915_request *rq)
> -{
> -	return intel_wait_update_seqno(wait, i915_request_global_seqno(rq));
> -}
> -
> -static inline bool
> -intel_wait_check_seqno(const struct intel_wait *wait, u32 seqno)
> -{
> -	return wait->seqno == seqno;
> -}
> -
> -static inline bool
> -intel_wait_check_request(const struct intel_wait *wait,
> -			 const struct i915_request *rq)
> -{
> -	return intel_wait_check_seqno(wait, i915_request_global_seqno(rq));
> -}
> -
> -static inline bool intel_wait_complete(const struct intel_wait *wait)
> -{
> -	return RB_EMPTY_NODE(&wait->node);
> -}
> +void intel_engine_init_breadcrumbs(struct intel_engine_cs *engine);
> +void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine);
>   
> -bool intel_engine_add_wait(struct intel_engine_cs *engine,
> -			   struct intel_wait *wait);
> -void intel_engine_remove_wait(struct intel_engine_cs *engine,
> -			      struct intel_wait *wait);
> -bool intel_engine_enable_signaling(struct i915_request *request, bool wakeup);
> +bool intel_engine_enable_signaling(struct i915_request *request);
>   void intel_engine_cancel_signaling(struct i915_request *request);
>   
> -static inline bool intel_engine_has_waiter(const struct intel_engine_cs *engine)
> -{
> -	return READ_ONCE(engine->breadcrumbs.irq_wait);
> -}
> -
> -unsigned int intel_engine_wakeup(struct intel_engine_cs *engine);
> -#define ENGINE_WAKEUP_WAITER BIT(0)
> -#define ENGINE_WAKEUP_ASLEEP BIT(1)
> -
>   void intel_engine_pin_breadcrumbs_irq(struct intel_engine_cs *engine);
>   void intel_engine_unpin_breadcrumbs_irq(struct intel_engine_cs *engine);
>   
> -void __intel_engine_disarm_breadcrumbs(struct intel_engine_cs *engine);
> +bool intel_engine_signal_breadcrumbs(struct intel_engine_cs *engine);
>   void intel_engine_disarm_breadcrumbs(struct intel_engine_cs *engine);
>   
> +static inline void
> +intel_engine_queue_breadcrumbs(struct intel_engine_cs *engine)
> +{
> +	irq_work_queue(&engine->breadcrumbs.irq_work);
> +}
> +
> +bool intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine);
> +
>   void intel_engine_reset_breadcrumbs(struct intel_engine_cs *engine);
>   void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine);
>   
> +void intel_engine_print_breadcrumbs(struct intel_engine_cs *engine,
> +				    struct drm_printer *p);
> +
>   static inline u32 *gen8_emit_pipe_control(u32 *batch, u32 flags, u32 offset)
>   {
>   	memset(batch, 0, 6 * sizeof(u32));
> diff --git a/drivers/gpu/drm/i915/selftests/i915_mock_selftests.h b/drivers/gpu/drm/i915/selftests/i915_mock_selftests.h
> index 4a83a1c6c406..88e5ab586337 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_mock_selftests.h
> +++ b/drivers/gpu/drm/i915/selftests/i915_mock_selftests.h
> @@ -15,7 +15,6 @@ selftest(scatterlist, scatterlist_mock_selftests)
>   selftest(syncmap, i915_syncmap_mock_selftests)
>   selftest(uncore, intel_uncore_mock_selftests)
>   selftest(engine, intel_engine_cs_mock_selftests)
> -selftest(breadcrumbs, intel_breadcrumbs_mock_selftests)
>   selftest(timelines, i915_timeline_mock_selftests)
>   selftest(requests, i915_request_mock_selftests)
>   selftest(objects, i915_gem_object_mock_selftests)
> diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c
> index 4d4b86b5fa11..a5359a03bfec 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_request.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_request.c
> @@ -25,9 +25,12 @@
>   #include <linux/prime_numbers.h>
>   
>   #include "../i915_selftest.h"
> +#include "i915_random.h"
>   #include "igt_live_test.h"
> +#include "lib_sw_fence.h"
>   
>   #include "mock_context.h"
> +#include "mock_drm.h"
>   #include "mock_gem_device.h"
>   
>   static int igt_add_request(void *arg)
> @@ -247,6 +250,239 @@ static int igt_request_rewind(void *arg)
>   	return err;
>   }
>   
> +struct smoketest {
> +	struct intel_engine_cs *engine;
> +	struct i915_gem_context **contexts;
> +	unsigned int ncontexts, max_batch;
> +	atomic_long_t num_waits, num_fences;
> +	struct i915_request *(*request_alloc)(struct i915_gem_context *,
> +					      struct intel_engine_cs *);
> +
> +};
> +
> +static struct i915_request *
> +__mock_request_alloc(struct i915_gem_context *ctx,
> +		     struct intel_engine_cs *engine)
> +{
> +	return mock_request(engine, ctx, 0);
> +}
> +
> +static struct i915_request *
> +__live_request_alloc(struct i915_gem_context *ctx,
> +		     struct intel_engine_cs *engine)
> +{
> +	return i915_request_alloc(engine, ctx);
> +}
> +
> +static int __igt_breadcrumbs_smoketest(void *arg)
> +{
> +	struct smoketest *t = arg;
> +	struct mutex *BKL = &t->engine->i915->drm.struct_mutex;

Breaking new ground, well okay, although caching dev or i915 would be 
good enough.

> +	struct i915_request **requests;
> +	I915_RND_STATE(prng);
> +	const unsigned int total = 4 * t->ncontexts + 1;
> +	const unsigned int max_batch = min(t->ncontexts, t->max_batch) - 1;
> +	unsigned int num_waits = 0, num_fences = 0;
> +	unsigned int *order;
> +	int err = 0;

Still in the Christmas spirit? ;) No worries, it's selftests.

I ran out of steam and will look at the selftests during a second pass.
In the meantime, please put a high-level comment against each test to
roughly say what it plans to test and with what approach. It makes
reverse engineering the algorithm much easier.

Regards,

Tvrtko

> +
> +	requests = kmalloc_array(total, sizeof(*requests), GFP_KERNEL);
> +	if (!requests)
> +		return -ENOMEM;
> +
> +	order = i915_random_order(total, &prng);
> +	if (!order) {
> +		err = -ENOMEM;
> +		goto out_requests;
> +	}
> +
> +	while (!kthread_should_stop()) {
> +		struct i915_sw_fence *submit, *wait;
> +		unsigned int n, count;
> +
> +		submit = heap_fence_create(GFP_KERNEL);
> +		if (!submit) {
> +			err = -ENOMEM;
> +			break;
> +		}
> +
> +		wait = heap_fence_create(GFP_KERNEL);
> +		if (!wait) {
> +			i915_sw_fence_commit(submit);
> +			heap_fence_put(submit);
> +			err = -ENOMEM;
> +			break;
> +		}
> +
> +		i915_random_reorder(order, total, &prng);
> +		count = 1 + i915_prandom_u32_max_state(max_batch, &prng);
> +
> +		for (n = 0; n < count; n++) {
> +			struct i915_gem_context *ctx =
> +				t->contexts[order[n] % t->ncontexts];
> +			struct i915_request *rq;
> +
> +			mutex_lock(BKL);
> +
> +			rq = t->request_alloc(ctx, t->engine);
> +			if (IS_ERR(rq)) {
> +				mutex_unlock(BKL);
> +				err = PTR_ERR(rq);
> +				count = n;
> +				break;
> +			}
> +
> +			err = i915_sw_fence_await_sw_fence_gfp(&rq->submit,
> +							       submit,
> +							       GFP_KERNEL);
> +
> +			requests[n] = i915_request_get(rq);
> +			i915_request_add(rq);
> +
> +			mutex_unlock(BKL);
> +
> +			if (err >= 0)
> +				err = i915_sw_fence_await_dma_fence(wait,
> +								    &rq->fence,
> +								    0,
> +								    GFP_KERNEL);
> +			if (err < 0) {
> +				i915_request_put(rq);
> +				count = n;
> +				break;
> +			}
> +		}
> +
> +		i915_sw_fence_commit(submit);
> +		i915_sw_fence_commit(wait);
> +
> +		if (!wait_event_timeout(wait->wait,
> +					i915_sw_fence_done(wait),
> +					HZ / 2)) {
> +			struct i915_request *rq = requests[count - 1];
> +
> +			pr_err("waiting for %d fences (last %llx:%lld) on %s timed out!\n",
> +			       count,
> +			       rq->fence.context, rq->fence.seqno,
> +			       t->engine->name);
> +			i915_gem_set_wedged(t->engine->i915);
> +			GEM_BUG_ON(!i915_request_completed(rq));
> +			i915_sw_fence_wait(wait);
> +			err = -EIO;
> +		}
> +
> +		for (n = 0; n < count; n++) {
> +			struct i915_request *rq = requests[n];
> +
> +			if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
> +				      &rq->fence.flags)) {
> +				pr_err("%llu:%llu was not signaled!\n",
> +				       rq->fence.context, rq->fence.seqno);
> +				err = -EINVAL;
> +			}
> +
> +			i915_request_put(rq);
> +		}
> +
> +		heap_fence_put(wait);
> +		heap_fence_put(submit);
> +
> +		if (err < 0)
> +			break;
> +
> +		num_fences += count;
> +		num_waits++;
> +
> +		cond_resched();
> +	}
> +
> +	atomic_long_add(num_fences, &t->num_fences);
> +	atomic_long_add(num_waits, &t->num_waits);
> +
> +	kfree(order);
> +out_requests:
> +	kfree(requests);
> +	return err;
> +}
> +
> +static int mock_breadcrumbs_smoketest(void *arg)
> +{
> +	struct drm_i915_private *i915 = arg;
> +	struct smoketest t = {
> +		.engine = i915->engine[RCS],
> +		.ncontexts = 1024,
> +		.max_batch = 1024,
> +		.request_alloc = __mock_request_alloc
> +	};
> +	unsigned int ncpus = num_online_cpus();
> +	struct task_struct **threads;
> +	unsigned int n;
> +	int ret = 0;
> +
> +	threads = kmalloc_array(ncpus, sizeof(*threads), GFP_KERNEL);
> +	if (!threads)
> +		return -ENOMEM;
> +
> +	t.contexts =
> +		kmalloc_array(t.ncontexts, sizeof(*t.contexts), GFP_KERNEL);
> +	if (!t.contexts) {
> +		ret = -ENOMEM;
> +		goto out_threads;
> +	}
> +
> +	mutex_lock(&t.engine->i915->drm.struct_mutex);
> +	for (n = 0; n < t.ncontexts; n++) {
> +		t.contexts[n] = mock_context(t.engine->i915, "mock");
> +		if (!t.contexts[n]) {
> +			ret = -ENOMEM;
> +			goto out_contexts;
> +		}
> +	}
> +
> +	for (n = 0; n < ncpus; n++) {
> +		threads[n] = kthread_run(__igt_breadcrumbs_smoketest,
> +					 &t, "igt/%d", n);
> +		if (IS_ERR(threads[n])) {
> +			ret = PTR_ERR(threads[n]);
> +			ncpus = n;
> +			break;
> +		}
> +
> +		get_task_struct(threads[n]);
> +	}
> +	mutex_unlock(&t.engine->i915->drm.struct_mutex);
> +
> +	msleep(jiffies_to_msecs(i915_selftest.timeout_jiffies));
> +
> +	for (n = 0; n < ncpus; n++) {
> +		int err;
> +
> +		err = kthread_stop(threads[n]);
> +		if (err < 0 && !ret)
> +			ret = err;
> +
> +		put_task_struct(threads[n]);
> +	}
> +	pr_info("Completed %lu waits for %lu fences across %d cpus\n",
> +		atomic_long_read(&t.num_waits),
> +		atomic_long_read(&t.num_fences),
> +		ncpus);
> +
> +	mutex_lock(&t.engine->i915->drm.struct_mutex);
> +out_contexts:
> +	for (n = 0; n < t.ncontexts; n++) {
> +		if (!t.contexts[n])
> +			break;
> +		mock_context_close(t.contexts[n]);
> +	}
> +	mutex_unlock(&t.engine->i915->drm.struct_mutex);
> +	kfree(t.contexts);
> +out_threads:
> +	kfree(threads);
> +
> +	return ret;
> +}
> +
>   int i915_request_mock_selftests(void)
>   {
>   	static const struct i915_subtest tests[] = {
> @@ -254,6 +490,7 @@ int i915_request_mock_selftests(void)
>   		SUBTEST(igt_wait_request),
>   		SUBTEST(igt_fence_wait),
>   		SUBTEST(igt_request_rewind),
> +		SUBTEST(mock_breadcrumbs_smoketest),
>   	};
>   	struct drm_i915_private *i915;
>   	intel_wakeref_t wakeref;
> @@ -812,6 +1049,166 @@ static int live_sequential_engines(void *arg)
>   	return err;
>   }
>   
> +static int
> +max_batches(struct i915_gem_context *ctx, struct intel_engine_cs *engine)
> +{
> +	struct i915_request *rq;
> +	int ret;
> +
> +	/*
> +	 * Before execlists, all contexts share the same ringbuffer. With
> +	 * execlists, each context/engine has a separate ringbuffer and
> +	 * for the purposes of this test, inexhaustible.
> +	 *
> +	 * For the global ringbuffer though, we have to be very careful
> +	 * that we do not wrap while preventing the execution of requests
> +	 * with an unsignaled fence.
> +	 */
> +	if (HAS_EXECLISTS(ctx->i915))
> +		return INT_MAX;
> +
> +	rq = i915_request_alloc(engine, ctx);
> +	if (IS_ERR(rq)) {
> +		ret = PTR_ERR(rq);
> +	} else {
> +		int sz;
> +
> +		ret = rq->ring->size - rq->reserved_space;
> +		i915_request_add(rq);
> +
> +		sz = rq->ring->emit - rq->head;
> +		if (sz < 0)
> +			sz += rq->ring->size;
> +		ret /= sz;
> +		ret /= 2; /* leave half spare, in case of emergency! */
> +
> +		/* One ring interleaved between requests from all cpus */
> +		ret /= num_online_cpus() + 1;
> +	}
> +
> +	return ret;
> +}
> +
> +static int live_breadcrumbs_smoketest(void *arg)
> +{
> +	struct drm_i915_private *i915 = arg;
> +	struct smoketest t[I915_NUM_ENGINES];
> +	unsigned int ncpus = num_online_cpus();
> +	unsigned long num_waits, num_fences;
> +	struct intel_engine_cs *engine;
> +	struct task_struct **threads;
> +	struct igt_live_test live;
> +	enum intel_engine_id id;
> +	intel_wakeref_t wakeref;
> +	struct drm_file *file;
> +	unsigned int n;
> +	int ret = 0;
> +
> +	wakeref = intel_runtime_pm_get(i915);
> +
> +	file = mock_file(i915);
> +	if (IS_ERR(file)) {
> +		ret = PTR_ERR(file);
> +		goto out_rpm;
> +	}
> +
> +	threads = kcalloc(ncpus * I915_NUM_ENGINES,
> +			  sizeof(*threads),
> +			  GFP_KERNEL);
> +	if (!threads)
> +		return -ENOMEM;
> +
> +	memset(&t[0], 0, sizeof(t[0]));
> +	t[0].request_alloc = __live_request_alloc;
> +	t[0].ncontexts = 64;
> +	t[0].contexts = kmalloc_array(t[0].ncontexts,
> +				      sizeof(*t[0].contexts),
> +				      GFP_KERNEL);
> +	if (!t[0].contexts) {
> +		ret = -ENOMEM;
> +		goto out_threads;
> +	}
> +
> +	mutex_lock(&i915->drm.struct_mutex);
> +	for (n = 0; n < t[0].ncontexts; n++) {
> +		t[0].contexts[n] = live_context(i915, file);
> +		if (!t[0].contexts[n]) {
> +			ret = -ENOMEM;
> +			goto out_contexts;
> +		}
> +	}
> +
> +	ret = igt_live_test_begin(&live, i915, __func__, "");
> +	if (ret)
> +		goto out_contexts;
> +
> +	for_each_engine(engine, i915, id) {
> +		t[id] = t[0];
> +		t[id].engine = engine;
> +		t[id].max_batch = max_batches(t[0].contexts[0], engine);
> +		if (t[id].max_batch < 0) {
> +			ret = t[id].max_batch;
> +			goto out_flush;
> +		}
> +		pr_debug("Limiting batches to %d requests on %s\n",
> +			 t[id].max_batch, engine->name);
> +
> +		for (n = 0; n < ncpus; n++) {
> +			struct task_struct *tsk;
> +
> +			tsk = kthread_run(__igt_breadcrumbs_smoketest,
> +					  &t[id], "igt/%d.%d", id, n);
> +			if (IS_ERR(tsk)) {
> +				ret = PTR_ERR(tsk);
> +				goto out_flush;
> +			}
> +
> +			get_task_struct(tsk);
> +			threads[id * ncpus + n] = tsk;
> +		}
> +	}
> +	mutex_unlock(&i915->drm.struct_mutex);
> +
> +	msleep(jiffies_to_msecs(i915_selftest.timeout_jiffies));
> +
> +out_flush:
> +	num_waits = 0;
> +	num_fences = 0;
> +	for_each_engine(engine, i915, id) {
> +		for (n = 0; n < ncpus; n++) {
> +			struct task_struct *tsk = threads[id * ncpus + n];
> +			int err;
> +
> +			if (!tsk)
> +				continue;
> +
> +			err = kthread_stop(tsk);
> +			if (err < 0 && !ret)
> +				ret = err;
> +
> +			put_task_struct(tsk);
> +		}
> +
> +		num_waits += atomic_long_read(&t[id].num_waits);
> +		num_fences += atomic_long_read(&t[id].num_fences);
> +	}
> +	pr_info("Completed %lu waits for %lu fences across %d engines and %d cpus\n",
> +		num_waits, num_fences, RUNTIME_INFO(i915)->num_rings, ncpus);
> +
> +	mutex_lock(&i915->drm.struct_mutex);
> +	ret = igt_live_test_end(&live) ?: ret;
> +out_contexts:
> +	mutex_unlock(&i915->drm.struct_mutex);
> +	kfree(t[0].contexts);
> +out_threads:
> +	kfree(threads);
> +	mock_file_free(i915, file);
> +out_rpm:
> +	intel_runtime_pm_put(i915, wakeref);
> +
> +	return ret;
> +}
> +
>   int i915_request_live_selftests(struct drm_i915_private *i915)
>   {
>   	static const struct i915_subtest tests[] = {
> @@ -819,6 +1216,7 @@ int i915_request_live_selftests(struct drm_i915_private *i915)
>   		SUBTEST(live_all_engines),
>   		SUBTEST(live_sequential_engines),
>   		SUBTEST(live_empty_request),
> +		SUBTEST(live_breadcrumbs_smoketest),
>   	};
>   
>   	if (i915_terminally_wedged(&i915->gpu_error))
> diff --git a/drivers/gpu/drm/i915/selftests/igt_spinner.c b/drivers/gpu/drm/i915/selftests/igt_spinner.c
> index 0e70df0230b8..9ebd9225684e 100644
> --- a/drivers/gpu/drm/i915/selftests/igt_spinner.c
> +++ b/drivers/gpu/drm/i915/selftests/igt_spinner.c
> @@ -185,11 +185,6 @@ void igt_spinner_fini(struct igt_spinner *spin)
>   
>   bool igt_wait_for_spinner(struct igt_spinner *spin, struct i915_request *rq)
>   {
> -	if (!wait_event_timeout(rq->execute,
> -				READ_ONCE(rq->global_seqno),
> -				msecs_to_jiffies(10)))
> -		return false;
> -
>   	return !(wait_for_us(i915_seqno_passed(hws_seqno(spin, rq),
>   					       rq->fence.seqno),
>   			     10) &&
> diff --git a/drivers/gpu/drm/i915/selftests/intel_breadcrumbs.c b/drivers/gpu/drm/i915/selftests/intel_breadcrumbs.c
> deleted file mode 100644
> index f03b407fdbe2..000000000000
> --- a/drivers/gpu/drm/i915/selftests/intel_breadcrumbs.c
> +++ /dev/null
> @@ -1,470 +0,0 @@
> -/*
> - * Copyright © 2016 Intel Corporation
> - *
> - * Permission is hereby granted, free of charge, to any person obtaining a
> - * copy of this software and associated documentation files (the "Software"),
> - * to deal in the Software without restriction, including without limitation
> - * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> - * and/or sell copies of the Software, and to permit persons to whom the
> - * Software is furnished to do so, subject to the following conditions:
> - *
> - * The above copyright notice and this permission notice (including the next
> - * paragraph) shall be included in all copies or substantial portions of the
> - * Software.
> - *
> - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> - * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> - * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> - * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> - * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> - * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> - * IN THE SOFTWARE.
> - *
> - */
> -
> -#include "../i915_selftest.h"
> -#include "i915_random.h"
> -
> -#include "mock_gem_device.h"
> -#include "mock_engine.h"
> -
> -static int check_rbtree(struct intel_engine_cs *engine,
> -			const unsigned long *bitmap,
> -			const struct intel_wait *waiters,
> -			const int count)
> -{
> -	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> -	struct rb_node *rb;
> -	int n;
> -
> -	if (&b->irq_wait->node != rb_first(&b->waiters)) {
> -		pr_err("First waiter does not match first element of wait-tree\n");
> -		return -EINVAL;
> -	}
> -
> -	n = find_first_bit(bitmap, count);
> -	for (rb = rb_first(&b->waiters); rb; rb = rb_next(rb)) {
> -		struct intel_wait *w = container_of(rb, typeof(*w), node);
> -		int idx = w - waiters;
> -
> -		if (!test_bit(idx, bitmap)) {
> -			pr_err("waiter[%d, seqno=%d] removed but still in wait-tree\n",
> -			       idx, w->seqno);
> -			return -EINVAL;
> -		}
> -
> -		if (n != idx) {
> -			pr_err("waiter[%d, seqno=%d] does not match expected next element in tree [%d]\n",
> -			       idx, w->seqno, n);
> -			return -EINVAL;
> -		}
> -
> -		n = find_next_bit(bitmap, count, n + 1);
> -	}
> -
> -	return 0;
> -}
> -
> -static int check_completion(struct intel_engine_cs *engine,
> -			    const unsigned long *bitmap,
> -			    const struct intel_wait *waiters,
> -			    const int count)
> -{
> -	int n;
> -
> -	for (n = 0; n < count; n++) {
> -		if (intel_wait_complete(&waiters[n]) != !!test_bit(n, bitmap))
> -			continue;
> -
> -		pr_err("waiter[%d, seqno=%d] is %s, but expected %s\n",
> -		       n, waiters[n].seqno,
> -		       intel_wait_complete(&waiters[n]) ? "complete" : "active",
> -		       test_bit(n, bitmap) ? "active" : "complete");
> -		return -EINVAL;
> -	}
> -
> -	return 0;
> -}
> -
> -static int check_rbtree_empty(struct intel_engine_cs *engine)
> -{
> -	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> -
> -	if (b->irq_wait) {
> -		pr_err("Empty breadcrumbs still has a waiter\n");
> -		return -EINVAL;
> -	}
> -
> -	if (!RB_EMPTY_ROOT(&b->waiters)) {
> -		pr_err("Empty breadcrumbs, but wait-tree not empty\n");
> -		return -EINVAL;
> -	}
> -
> -	return 0;
> -}
> -
> -static int igt_random_insert_remove(void *arg)
> -{
> -	const u32 seqno_bias = 0x1000;
> -	I915_RND_STATE(prng);
> -	struct intel_engine_cs *engine = arg;
> -	struct intel_wait *waiters;
> -	const int count = 4096;
> -	unsigned int *order;
> -	unsigned long *bitmap;
> -	int err = -ENOMEM;
> -	int n;
> -
> -	mock_engine_reset(engine);
> -
> -	waiters = kvmalloc_array(count, sizeof(*waiters), GFP_KERNEL);
> -	if (!waiters)
> -		goto out_engines;
> -
> -	bitmap = kcalloc(DIV_ROUND_UP(count, BITS_PER_LONG), sizeof(*bitmap),
> -			 GFP_KERNEL);
> -	if (!bitmap)
> -		goto out_waiters;
> -
> -	order = i915_random_order(count, &prng);
> -	if (!order)
> -		goto out_bitmap;
> -
> -	for (n = 0; n < count; n++)
> -		intel_wait_init_for_seqno(&waiters[n], seqno_bias + n);
> -
> -	err = check_rbtree(engine, bitmap, waiters, count);
> -	if (err)
> -		goto out_order;
> -
> -	/* Add and remove waiters into the rbtree in random order. At each
> -	 * step, we verify that the rbtree is correctly ordered.
> -	 */
> -	for (n = 0; n < count; n++) {
> -		int i = order[n];
> -
> -		intel_engine_add_wait(engine, &waiters[i]);
> -		__set_bit(i, bitmap);
> -
> -		err = check_rbtree(engine, bitmap, waiters, count);
> -		if (err)
> -			goto out_order;
> -	}
> -
> -	i915_random_reorder(order, count, &prng);
> -	for (n = 0; n < count; n++) {
> -		int i = order[n];
> -
> -		intel_engine_remove_wait(engine, &waiters[i]);
> -		__clear_bit(i, bitmap);
> -
> -		err = check_rbtree(engine, bitmap, waiters, count);
> -		if (err)
> -			goto out_order;
> -	}
> -
> -	err = check_rbtree_empty(engine);
> -out_order:
> -	kfree(order);
> -out_bitmap:
> -	kfree(bitmap);
> -out_waiters:
> -	kvfree(waiters);
> -out_engines:
> -	mock_engine_flush(engine);
> -	return err;
> -}
> -
> -static int igt_insert_complete(void *arg)
> -{
> -	const u32 seqno_bias = 0x1000;
> -	struct intel_engine_cs *engine = arg;
> -	struct intel_wait *waiters;
> -	const int count = 4096;
> -	unsigned long *bitmap;
> -	int err = -ENOMEM;
> -	int n, m;
> -
> -	mock_engine_reset(engine);
> -
> -	waiters = kvmalloc_array(count, sizeof(*waiters), GFP_KERNEL);
> -	if (!waiters)
> -		goto out_engines;
> -
> -	bitmap = kcalloc(DIV_ROUND_UP(count, BITS_PER_LONG), sizeof(*bitmap),
> -			 GFP_KERNEL);
> -	if (!bitmap)
> -		goto out_waiters;
> -
> -	for (n = 0; n < count; n++) {
> -		intel_wait_init_for_seqno(&waiters[n], n + seqno_bias);
> -		intel_engine_add_wait(engine, &waiters[n]);
> -		__set_bit(n, bitmap);
> -	}
> -	err = check_rbtree(engine, bitmap, waiters, count);
> -	if (err)
> -		goto out_bitmap;
> -
> -	/* On each step, we advance the seqno so that several waiters are then
> -	 * complete (we increase the seqno by increasingly larger values to
> -	 * retire more and more waiters at once). All retired waiters should
> -	 * be woken and removed from the rbtree, and so that we check.
> -	 */
> -	for (n = 0; n < count; n = m) {
> -		int seqno = 2 * n;
> -
> -		GEM_BUG_ON(find_first_bit(bitmap, count) != n);
> -
> -		if (intel_wait_complete(&waiters[n])) {
> -			pr_err("waiter[%d, seqno=%d] completed too early\n",
> -			       n, waiters[n].seqno);
> -			err = -EINVAL;
> -			goto out_bitmap;
> -		}
> -
> -		/* complete the following waiters */
> -		mock_seqno_advance(engine, seqno + seqno_bias);
> -		for (m = n; m <= seqno; m++) {
> -			if (m == count)
> -				break;
> -
> -			GEM_BUG_ON(!test_bit(m, bitmap));
> -			__clear_bit(m, bitmap);
> -		}
> -
> -		intel_engine_remove_wait(engine, &waiters[n]);
> -		RB_CLEAR_NODE(&waiters[n].node);
> -
> -		err = check_rbtree(engine, bitmap, waiters, count);
> -		if (err) {
> -			pr_err("rbtree corrupt after seqno advance to %d\n",
> -			       seqno + seqno_bias);
> -			goto out_bitmap;
> -		}
> -
> -		err = check_completion(engine, bitmap, waiters, count);
> -		if (err) {
> -			pr_err("completions after seqno advance to %d failed\n",
> -			       seqno + seqno_bias);
> -			goto out_bitmap;
> -		}
> -	}
> -
> -	err = check_rbtree_empty(engine);
> -out_bitmap:
> -	kfree(bitmap);
> -out_waiters:
> -	kvfree(waiters);
> -out_engines:
> -	mock_engine_flush(engine);
> -	return err;
> -}
> -
> -struct igt_wakeup {
> -	struct task_struct *tsk;
> -	atomic_t *ready, *set, *done;
> -	struct intel_engine_cs *engine;
> -	unsigned long flags;
> -#define STOP 0
> -#define IDLE 1
> -	wait_queue_head_t *wq;
> -	u32 seqno;
> -};
> -
> -static bool wait_for_ready(struct igt_wakeup *w)
> -{
> -	DEFINE_WAIT(ready);
> -
> -	set_bit(IDLE, &w->flags);
> -	if (atomic_dec_and_test(w->done))
> -		wake_up_var(w->done);
> -
> -	if (test_bit(STOP, &w->flags))
> -		goto out;
> -
> -	for (;;) {
> -		prepare_to_wait(w->wq, &ready, TASK_INTERRUPTIBLE);
> -		if (atomic_read(w->ready) == 0)
> -			break;
> -
> -		schedule();
> -	}
> -	finish_wait(w->wq, &ready);
> -
> -out:
> -	clear_bit(IDLE, &w->flags);
> -	if (atomic_dec_and_test(w->set))
> -		wake_up_var(w->set);
> -
> -	return !test_bit(STOP, &w->flags);
> -}
> -
> -static int igt_wakeup_thread(void *arg)
> -{
> -	struct igt_wakeup *w = arg;
> -	struct intel_wait wait;
> -
> -	while (wait_for_ready(w)) {
> -		GEM_BUG_ON(kthread_should_stop());
> -
> -		intel_wait_init_for_seqno(&wait, w->seqno);
> -		intel_engine_add_wait(w->engine, &wait);
> -		for (;;) {
> -			set_current_state(TASK_UNINTERRUPTIBLE);
> -			if (i915_seqno_passed(intel_engine_get_seqno(w->engine),
> -					      w->seqno))
> -				break;
> -
> -			if (test_bit(STOP, &w->flags)) /* emergency escape */
> -				break;
> -
> -			schedule();
> -		}
> -		intel_engine_remove_wait(w->engine, &wait);
> -		__set_current_state(TASK_RUNNING);
> -	}
> -
> -	return 0;
> -}
> -
> -static void igt_wake_all_sync(atomic_t *ready,
> -			      atomic_t *set,
> -			      atomic_t *done,
> -			      wait_queue_head_t *wq,
> -			      int count)
> -{
> -	atomic_set(set, count);
> -	atomic_set(ready, 0);
> -	wake_up_all(wq);
> -
> -	wait_var_event(set, !atomic_read(set));
> -	atomic_set(ready, count);
> -	atomic_set(done, count);
> -}
> -
> -static int igt_wakeup(void *arg)
> -{
> -	I915_RND_STATE(prng);
> -	struct intel_engine_cs *engine = arg;
> -	struct igt_wakeup *waiters;
> -	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
> -	const int count = 4096;
> -	const u32 max_seqno = count / 4;
> -	atomic_t ready, set, done;
> -	int err = -ENOMEM;
> -	int n, step;
> -
> -	mock_engine_reset(engine);
> -
> -	waiters = kvmalloc_array(count, sizeof(*waiters), GFP_KERNEL);
> -	if (!waiters)
> -		goto out_engines;
> -
> -	/* Create a large number of threads, each waiting on a random seqno.
> -	 * Multiple waiters will be waiting for the same seqno.
> -	 */
> -	atomic_set(&ready, count);
> -	for (n = 0; n < count; n++) {
> -		waiters[n].wq = &wq;
> -		waiters[n].ready = &ready;
> -		waiters[n].set = &set;
> -		waiters[n].done = &done;
> -		waiters[n].engine = engine;
> -		waiters[n].flags = BIT(IDLE);
> -
> -		waiters[n].tsk = kthread_run(igt_wakeup_thread, &waiters[n],
> -					     "i915/igt:%d", n);
> -		if (IS_ERR(waiters[n].tsk))
> -			goto out_waiters;
> -
> -		get_task_struct(waiters[n].tsk);
> -	}
> -
> -	for (step = 1; step <= max_seqno; step <<= 1) {
> -		u32 seqno;
> -
> -		/* The waiter threads start paused as we assign them a random
> -		 * seqno and reset the engine. Once the engine is reset,
> -		 * we signal that the threads may begin their wait upon their
> -		 * seqno.
> -		 */
> -		for (n = 0; n < count; n++) {
> -			GEM_BUG_ON(!test_bit(IDLE, &waiters[n].flags));
> -			waiters[n].seqno =
> -				1 + prandom_u32_state(&prng) % max_seqno;
> -		}
> -		mock_seqno_advance(engine, 0);
> -		igt_wake_all_sync(&ready, &set, &done, &wq, count);
> -
> -		/* Simulate the GPU doing chunks of work, with one or more
> -		 * seqno appearing to finish at the same time. A random number
> -		 * of threads will be waiting upon the update and hopefully be
> -		 * woken.
> -		 */
> -		for (seqno = 1; seqno <= max_seqno + step; seqno += step) {
> -			usleep_range(50, 500);
> -			mock_seqno_advance(engine, seqno);
> -		}
> -		GEM_BUG_ON(intel_engine_get_seqno(engine) < 1 + max_seqno);
> -
> -		/* With the seqno now beyond any of the waiting threads, they
> -		 * should all be woken, see that they are complete and signal
> -		 * that they are ready for the next test. We wait until all
> -		 * threads are complete and waiting for us (i.e. not a seqno).
> -		 */
> -		if (!wait_var_event_timeout(&done,
> -					    !atomic_read(&done), 10 * HZ)) {
> -			pr_err("Timed out waiting for %d remaining waiters\n",
> -			       atomic_read(&done));
> -			err = -ETIMEDOUT;
> -			break;
> -		}
> -
> -		err = check_rbtree_empty(engine);
> -		if (err)
> -			break;
> -	}
> -
> -out_waiters:
> -	for (n = 0; n < count; n++) {
> -		if (IS_ERR(waiters[n].tsk))
> -			break;
> -
> -		set_bit(STOP, &waiters[n].flags);
> -	}
> -	mock_seqno_advance(engine, INT_MAX); /* wakeup any broken waiters */
> -	igt_wake_all_sync(&ready, &set, &done, &wq, n);
> -
> -	for (n = 0; n < count; n++) {
> -		if (IS_ERR(waiters[n].tsk))
> -			break;
> -
> -		kthread_stop(waiters[n].tsk);
> -		put_task_struct(waiters[n].tsk);
> -	}
> -
> -	kvfree(waiters);
> -out_engines:
> -	mock_engine_flush(engine);
> -	return err;
> -}
> -
> -int intel_breadcrumbs_mock_selftests(void)
> -{
> -	static const struct i915_subtest tests[] = {
> -		SUBTEST(igt_random_insert_remove),
> -		SUBTEST(igt_insert_complete),
> -		SUBTEST(igt_wakeup),
> -	};
> -	struct drm_i915_private *i915;
> -	int err;
> -
> -	i915 = mock_gem_device();
> -	if (!i915)
> -		return -ENOMEM;
> -
> -	err = i915_subtests(tests, i915->engine[RCS]);
> -	drm_dev_put(&i915->drm);
> -
> -	return err;
> -}
> diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> index 2c38ea5892d9..7b6f3bea9ef8 100644
> --- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> +++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> @@ -1127,7 +1127,7 @@ static int __igt_reset_evict_vma(struct drm_i915_private *i915,
>   
>   	wait_for_completion(&arg.completion);
>   
> -	if (wait_for(waitqueue_active(&rq->execute), 10)) {
> +	if (wait_for(!list_empty(&rq->fence.cb_list), 10)) {
>   		struct drm_printer p = drm_info_printer(i915->drm.dev);
>   
>   		pr_err("igt/evict_vma kthread did not wait\n");
> diff --git a/drivers/gpu/drm/i915/selftests/lib_sw_fence.c b/drivers/gpu/drm/i915/selftests/lib_sw_fence.c
> index b26f07b55d86..2bfa72c1654b 100644
> --- a/drivers/gpu/drm/i915/selftests/lib_sw_fence.c
> +++ b/drivers/gpu/drm/i915/selftests/lib_sw_fence.c
> @@ -76,3 +76,57 @@ void timed_fence_fini(struct timed_fence *tf)
>   	destroy_timer_on_stack(&tf->timer);
>   	i915_sw_fence_fini(&tf->fence);
>   }
> +
> +struct heap_fence {
> +	struct i915_sw_fence fence;
> +	union {
> +		struct kref ref;
> +		struct rcu_head rcu;
> +	};
> +};
> +
> +static int __i915_sw_fence_call
> +heap_fence_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
> +{
> +	struct heap_fence *h = container_of(fence, typeof(*h), fence);
> +
> +	switch (state) {
> +	case FENCE_COMPLETE:
> +		break;
> +
> +	case FENCE_FREE:
> +		heap_fence_put(&h->fence);
> +	}
> +
> +	return NOTIFY_DONE;
> +}
> +
> +struct i915_sw_fence *heap_fence_create(gfp_t gfp)
> +{
> +	struct heap_fence *h;
> +
> +	h = kmalloc(sizeof(*h), gfp);
> +	if (!h)
> +		return NULL;
> +
> +	i915_sw_fence_init(&h->fence, heap_fence_notify);
> +	refcount_set(&h->ref.refcount, 2);
> +
> +	return &h->fence;
> +}
> +
> +static void heap_fence_release(struct kref *ref)
> +{
> +	struct heap_fence *h = container_of(ref, typeof(*h), ref);
> +
> +	i915_sw_fence_fini(&h->fence);
> +
> +	kfree_rcu(h, rcu);
> +}
> +
> +void heap_fence_put(struct i915_sw_fence *fence)
> +{
> +	struct heap_fence *h = container_of(fence, typeof(*h), fence);
> +
> +	kref_put(&h->ref, heap_fence_release);
> +}
> diff --git a/drivers/gpu/drm/i915/selftests/lib_sw_fence.h b/drivers/gpu/drm/i915/selftests/lib_sw_fence.h
> index 474aafb92ae1..1f9927e10f3a 100644
> --- a/drivers/gpu/drm/i915/selftests/lib_sw_fence.h
> +++ b/drivers/gpu/drm/i915/selftests/lib_sw_fence.h
> @@ -39,4 +39,7 @@ struct timed_fence {
>   void timed_fence_init(struct timed_fence *tf, unsigned long expires);
>   void timed_fence_fini(struct timed_fence *tf);
>   
> +struct i915_sw_fence *heap_fence_create(gfp_t gfp);
> +void heap_fence_put(struct i915_sw_fence *fence);
> +
>   #endif /* _LIB_SW_FENCE_H_ */
> diff --git a/drivers/gpu/drm/i915/selftests/mock_engine.c b/drivers/gpu/drm/i915/selftests/mock_engine.c
> index 2515cffb4490..e70b4a6cfc67 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_engine.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_engine.c
> @@ -86,17 +86,21 @@ static struct mock_request *first_request(struct mock_engine *engine)
>   static void advance(struct mock_request *request)
>   {
>   	list_del_init(&request->link);
> -	mock_seqno_advance(request->base.engine, request->base.global_seqno);
> +	intel_engine_write_global_seqno(request->base.engine,
> +					request->base.global_seqno);
>   	i915_request_mark_complete(&request->base);
>   	GEM_BUG_ON(!i915_request_completed(&request->base));
> +
> +	intel_engine_queue_breadcrumbs(request->base.engine);
>   }
>   
>   static void hw_delay_complete(struct timer_list *t)
>   {
>   	struct mock_engine *engine = from_timer(engine, t, hw_delay);
>   	struct mock_request *request;
> +	unsigned long flags;
>   
> -	spin_lock(&engine->hw_lock);
> +	spin_lock_irqsave(&engine->hw_lock, flags);
>   
>   	/* Timer fired, first request is complete */
>   	request = first_request(engine);
> @@ -116,7 +120,7 @@ static void hw_delay_complete(struct timer_list *t)
>   		advance(request);
>   	}
>   
> -	spin_unlock(&engine->hw_lock);
> +	spin_unlock_irqrestore(&engine->hw_lock, flags);
>   }
>   
>   static void mock_context_unpin(struct intel_context *ce)
> @@ -191,11 +195,12 @@ static void mock_submit_request(struct i915_request *request)
>   	struct mock_request *mock = container_of(request, typeof(*mock), base);
>   	struct mock_engine *engine =
>   		container_of(request->engine, typeof(*engine), base);
> +	unsigned long flags;
>   
>   	i915_request_submit(request);
>   	GEM_BUG_ON(!request->global_seqno);
>   
> -	spin_lock_irq(&engine->hw_lock);
> +	spin_lock_irqsave(&engine->hw_lock, flags);
>   	list_add_tail(&mock->link, &engine->hw_queue);
>   	if (mock->link.prev == &engine->hw_queue) {
>   		if (mock->delay)
> @@ -203,7 +208,7 @@ static void mock_submit_request(struct i915_request *request)
>   		else
>   			advance(mock);
>   	}
> -	spin_unlock_irq(&engine->hw_lock);
> +	spin_unlock_irqrestore(&engine->hw_lock, flags);
>   }
>   
>   struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
> @@ -273,6 +278,7 @@ void mock_engine_flush(struct intel_engine_cs *engine)
>   
>   void mock_engine_reset(struct intel_engine_cs *engine)
>   {
> +	intel_engine_write_global_seqno(engine, 0);
>   }
>   
>   void mock_engine_free(struct intel_engine_cs *engine)
> diff --git a/drivers/gpu/drm/i915/selftests/mock_engine.h b/drivers/gpu/drm/i915/selftests/mock_engine.h
> index 133d0c21790d..b9cc3a245f16 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_engine.h
> +++ b/drivers/gpu/drm/i915/selftests/mock_engine.h
> @@ -46,10 +46,4 @@ void mock_engine_flush(struct intel_engine_cs *engine);
>   void mock_engine_reset(struct intel_engine_cs *engine);
>   void mock_engine_free(struct intel_engine_cs *engine);
>   
> -static inline void mock_seqno_advance(struct intel_engine_cs *engine, u32 seqno)
> -{
> -	intel_write_status_page(engine, I915_GEM_HWS_INDEX, seqno);
> -	intel_engine_wakeup(engine);
> -}
> -
>   #endif /* !__MOCK_ENGINE_H__ */
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 28/34] drm/i915: Replace global breadcrumbs with per-context interrupt tracking
  2019-01-23  9:21   ` Tvrtko Ursulin
@ 2019-01-23 10:01     ` Chris Wilson
  2019-01-23 16:28       ` Tvrtko Ursulin
  0 siblings, 1 reply; 89+ messages in thread
From: Chris Wilson @ 2019-01-23 10:01 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-01-23 09:21:45)
> 
> On 21/01/2019 22:21, Chris Wilson wrote:
> > -static void error_record_engine_waiters(struct intel_engine_cs *engine,
> > -                                     struct drm_i915_error_engine *ee)
> > -{
> > -     struct intel_breadcrumbs *b = &engine->breadcrumbs;
> > -     struct drm_i915_error_waiter *waiter;
> > -     struct rb_node *rb;
> > -     int count;
> > -
> > -     ee->num_waiters = 0;
> > -     ee->waiters = NULL;
> > -
> > -     if (RB_EMPTY_ROOT(&b->waiters))
> > -             return;
> > -
> > -     if (!spin_trylock_irq(&b->rb_lock)) {
> > -             ee->waiters = ERR_PTR(-EDEADLK);
> > -             return;
> > -     }
> > -
> > -     count = 0;
> > -     for (rb = rb_first(&b->waiters); rb != NULL; rb = rb_next(rb))
> > -             count++;
> > -     spin_unlock_irq(&b->rb_lock);
> > -
> > -     waiter = NULL;
> > -     if (count)
> > -             waiter = kmalloc_array(count,
> > -                                    sizeof(struct drm_i915_error_waiter),
> > -                                    GFP_ATOMIC);
> > -     if (!waiter)
> > -             return;
> > -
> > -     if (!spin_trylock_irq(&b->rb_lock)) {
> > -             kfree(waiter);
> > -             ee->waiters = ERR_PTR(-EDEADLK);
> > -             return;
> > -     }
> > -
> > -     ee->waiters = waiter;
> > -     for (rb = rb_first(&b->waiters); rb; rb = rb_next(rb)) {
> > -             struct intel_wait *w = rb_entry(rb, typeof(*w), node);
> > -
> > -             strcpy(waiter->comm, w->tsk->comm);
> > -             waiter->pid = w->tsk->pid;
> > -             waiter->seqno = w->seqno;
> > -             waiter++;
> > -
> > -             if (++ee->num_waiters == count)
> > -                     break;
> > -     }
> > -     spin_unlock_irq(&b->rb_lock);
> > -}
> 
> Capturing context waiters is not interesting for error state?

Not really, we don't have a direct link to the process. We could dig it
out by identifying our special wait_cb inside the fence->signal_list,
but I couldn't be bothered. Who's waiting at the time of the error has
never been that interesting for error debugging, just provides an
overview of the system state.

Who issued the hanging command is of much more interest to the hunting
posse than their victim.

However, storing fence->flags (i.e. the DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT
+ DMA_FENCE_FLAG_SIGNALED_BIT) seems like it would come in handy.
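
e.g. something as simple as (sketch only; 'fence_flags' here is a
made-up field name for the error state):

	/* stash the signaling state of the fence, illustration only */
	ee->fence_flags = READ_ONCE(rq->fence.flags) &
			  (BIT(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT) |
			   BIT(DMA_FENCE_FLAG_SIGNALED_BIT));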

> > -static bool __i915_spin_request(const struct i915_request *rq,
> > -                             u32 seqno, int state, unsigned long timeout_us)
> > +static bool __i915_spin_request(const struct i915_request * const rq,
> > +                             int state, unsigned long timeout_us)
> >   {
> > -     struct intel_engine_cs *engine = rq->engine;
> > -     unsigned int irq, cpu;
> > -
> > -     GEM_BUG_ON(!seqno);
> > +     unsigned int cpu;
> >   
> >       /*
> >        * Only wait for the request if we know it is likely to complete.
> > @@ -1050,7 +1046,7 @@ static bool __i915_spin_request(const struct i915_request *rq,
> >        * it is a fair assumption that it will not complete within our
> >        * relatively short timeout.
> >        */
> > -     if (!intel_engine_has_started(engine, seqno))
> > +     if (!i915_request_started(rq))
> 
> Might be more wasteful the more preemption is going on. Probably not the
> most important thing to fix straight away, but something to put down on
> a to-do list.

Actually... That would be cheap to fix here as we do a test_bit(ACTIVE).
Hmm, I wonder if that makes sense for all callers.

Maybe i915_request_is_running(rq) as a followup.
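
Roughly (sketch only, not in this series; exactly what "running" should
check is the open question):

static inline bool i915_request_is_running(const struct i915_request *rq)
{
	/* has begun execution and is still active on the HW */
	return i915_request_started(rq) &&
	       test_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
}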
 
> Above comment is also outdated now (engine order).

I left a comment! Silly me.

> > +enum {
> > +     I915_FENCE_FLAG_ACTIVE = DMA_FENCE_FLAG_USER_BITS,
> > +     I915_FENCE_FLAG_SIGNAL,
> 
> Describe in comments what these mean please.

Mean, you expect them to have meaning outside of their use? :)

> > +bool intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine)
> > +{
> > +     struct intel_breadcrumbs *b = &engine->breadcrumbs;
> 
> How can you afford to have this per engine? I guess I might figure out 
> later in the patch/series.

Hmm, it's always been per engine... What cost are you considering?

> > +     struct intel_context *ce, *cn;
> > +     struct i915_request *rq, *rn;
> > +     LIST_HEAD(signal);
> > +
> > +     spin_lock(&b->irq_lock);
> > +
> > +     b->irq_fired = true;
> > +     if (b->irq_armed && list_empty(&b->signalers))
> > +             __intel_breadcrumbs_disarm_irq(b);
> > +
> > +     list_for_each_entry_safe(ce, cn, &b->signalers, signal_link) {
> > +             GEM_BUG_ON(list_empty(&ce->signals));
> > +
> > +             list_for_each_entry_safe(rq, rn, &ce->signals, signal_link) {
> > +                     if (!__request_completed(rq))
> > +                             break;
> > +
> > +                     GEM_BUG_ON(!test_bit(I915_FENCE_FLAG_SIGNAL,
> > +                                          &rq->fence.flags));
> > +                     clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);
> > +
> > +                     if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
> > +                                  &rq->fence.flags))
> > +                             continue;
> 
> Request has been signalled already, but is still on this list? Who will 
> then remove it from this list?

Race with retire-request, as we operate here only under b->irq_lock not
rq->lock, and retire-request uses rq->lock then b->irq_lock.
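
For reference, the two sides nest the locks like this (rough sketch,
function names made up, not the exact code):

static void breadcrumbs_irq_path(struct intel_breadcrumbs *b)
{
	spin_lock(&b->irq_lock);
	/* walks ce->signals; may observe a request that the retire
	 * path below has already marked DMA_FENCE_FLAG_SIGNALED_BIT
	 * but has not yet unlinked
	 */
	spin_unlock(&b->irq_lock);
}

static void retire_path(struct i915_request *rq, struct intel_breadcrumbs *b)
{
	spin_lock_irq(&rq->lock);	/* request lock first */
	spin_lock(&b->irq_lock);	/* then the signaling lock */
	/* unlinks rq from ce->signals */
	spin_unlock(&b->irq_lock);
	spin_unlock_irq(&rq->lock);
}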

> > +                     /*
> > +                      * Queue for execution after dropping the signaling
> > +                      * spinlock as the callback chain may end adding
> > +                      * more signalers to the same context or engine.
> > +                      */
> > +                     i915_request_get(rq);
> > +                     list_add_tail(&rq->signal_link, &signal);
> 
> Shouldn't this be list_move_tail since rq is already on the ce->signals 
> list?

(1) We delete in bulk, see (2)

> > +             }
> > +
> > +             if (!list_is_first(&rq->signal_link, &ce->signals)) {
> 
> Can't rq be NULL here - if only completed requests are on the list and 
> so the iterator reached the end?

Iterator at end == &ce->signals.

> > +                     __list_del_many(&ce->signals, &rq->signal_link);
> 
> 
> This block could use a comment - I at least failed to quickly understand 
> it. How can we be unlinking entries, if they have already been unlinked?

(2) Because we did list_add not list_move, see (1).
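
i.e. the shape is add-then-bulk-unlink; a simplified restatement of the
hunk above, with the flag handling elided:

	list_for_each_entry_safe(rq, rn, &ce->signals, signal_link) {
		if (!__request_completed(rq))
			break;

		i915_request_get(rq);
		list_add_tail(&rq->signal_link, &signal); /* add, not move */
	}

	/* rq is now the first incomplete request (or the list head), and
	 * the whole completed prefix is cut out of ce->signals in one go,
	 * which is why the list_add above did not need to unlink anything.
	 */
	if (!list_is_first(&rq->signal_link, &ce->signals))
		__list_del_many(&ce->signals, &rq->signal_link);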

> > +                     if (&ce->signals == &rq->signal_link)
> > +                             list_del_init(&ce->signal_link);
> 
> This is another list_empty hack like the one from the other day? Please
> put a comment if you don't want it to be self-documenting.


> > -static int intel_breadcrumbs_signaler(void *arg)
> > +bool intel_engine_enable_signaling(struct i915_request *rq)
> 
> intel_request_enable_signaling?

I'm warming to it.

> > +     spin_lock(&b->irq_lock);
> > +     if (test_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags) &&
> 
> __test_bit?

Heh. test_bit == __test_bit :)

In general though, we have to be cautious as we don't own the whole flags
field.

> > +             list_for_each_prev(pos, &ce->signals) {
> > +                     struct i915_request *it =
> > +                             list_entry(pos, typeof(*it), signal_link);
> >   
> > -                     if (unlikely(kthread_should_stop()))
> > +                     if (i915_seqno_passed(rq->fence.seqno, 
> 
> Put a comment against this loop please saying where in the list it is 
> looking to insert...

Oh you haven't written this variant of insertion sort for vblank
handling 20 times. I swear I end up repeating my mistakes over and over
again at every level in the stack.

> > +             list_add(&rq->signal_link, pos);
> > +             if (pos == &ce->signals)
> > +                     list_move_tail(&ce->signal_link, &b->signalers);
> 
> ... and here how it manages the other list as well, on transition from 
> empty to active.

Seems like the code was easy enough to follow ;)

> > +     spin_lock(&b->irq_lock);
> > +     if (test_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags)) {
> 
> __test_and_clear_bit ?

Yeah, this is where we need to be careful with rq->fence.flags. It is not
specified that __test_and_clear_bit only operates on the single bit, and
so to be cautious we use the locked instruction to avoid clobbering other
atomic updates to the rest of the field.

Although you can make a very strong case that all fence->flags are
serialised for signaling today.
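
To spell out the worry (sketch only):

	/*
	 * Our bits share the word with the dma-fence core's bits, which
	 * are updated with atomic ops. The locked helper touches only
	 * the named bit:
	 */
	clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);

	/*
	 * whereas __test_and_clear_bit() is a plain read-modify-write of
	 * the whole word, so a concurrent atomic update to another bit
	 * of fence.flags could, in principle, be lost.
	 */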

> > -             spinlock_t rb_lock; /* protects the rb and wraps irq_lock */
> > -             struct rb_root waiters; /* sorted by retirement, priority */
> > -             struct list_head signals; /* sorted by retirement */
> > -             struct task_struct *signaler; /* used for fence signalling */
> > +             struct irq_work irq_work;
> 
> Why did you need irq work and not just invoke the handler directly? 
> Maybe put a comment here giving a hint.

/* lock inversion horrors */

Due to the way we may directly submit requests while handling the
dma_fence_signal, we can end up processing an
i915_request_enable_signaling on the same engine as is currently
emitting the signal.
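i.e. very roughly (a picture of the recursion, not an exact call trace):

	dma_fence_signal(&rq->fence)
	  -> sw-fence callback of a waiting request fires
	    -> __i915_request_submit() on this same engine
	      -> i915_request_enable_breadcrumb()
	        -> spin_lock(&b->irq_lock)

If we invoked the signaling handler directly from a path that already
holds b->irq_lock (or the engine submission locks), we would recurse
straight back into them; queuing b->irq_work instead lets the handler
run later with none of those locks held.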

> > +static int __igt_breadcrumbs_smoketest(void *arg)
> > +{
> > +     struct smoketest *t = arg;
> > +     struct mutex *BKL = &t->engine->i915->drm.struct_mutex;
> 
> Breaking new ground, well okay, although caching dev or i915 would be 
> good enough.
> 
> > +     struct i915_request **requests;
> > +     I915_RND_STATE(prng);
> > +     const unsigned int total = 4 * t->ncontexts + 1;
> > +     const unsigned int max_batch = min(t->ncontexts, t->max_batch) - 1;
> > +     unsigned int num_waits = 0, num_fences = 0;
> > +     unsigned int *order;
> > +     int err = 0;
> 
> Still in the Christmas spirit? ;) No worries, it's selftests.

I was feeling generous in replacing the elaborate breadcrumb testing we
had with something at all!

That testing was the best part of intel_breadcrumbs.
 
> I ran out of steam and will look at selftests during some second pass. 
> In expectation, please put some high level comments for each test to 
> roughly say what it plans to test and with what approach. It makes 
> reverse engineering the algorithm much easier.

There's only one test (just run with mock_request and i915_request), a
very, very, very simple smoketest.

I did not come up with ways of testing the new signal_list to the same
rigour as we did before. :(
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 89+ messages in thread

* [PATCH] drm/i915: Replace global breadcrumbs with per-context interrupt tracking
  2019-01-21 22:21 ` [PATCH 28/34] drm/i915: Replace global breadcrumbs with per-context interrupt tracking Chris Wilson
  2019-01-23  9:21   ` Tvrtko Ursulin
@ 2019-01-23 11:41   ` Chris Wilson
  1 sibling, 0 replies; 89+ messages in thread
From: Chris Wilson @ 2019-01-23 11:41 UTC (permalink / raw)
  To: intel-gfx

A few years ago, see commit 688e6c725816 ("drm/i915: Slaughter the
thundering i915_wait_request herd"), the issue of handling multiple
clients waiting in parallel was brought to our attention. The
requirement was that every client should be woken immediately upon its
request being signaled, without incurring any cpu overhead.

Handling certain fragility of our hw meant that we could not do a
simple check inside the irq handler (some generations required almost
unbounded delays before we could be sure of seqno coherency) and so
request completion checking required delegation.

Before commit 688e6c725816, the solution was simple. Every client waiting
on a request would be woken on every interrupt and each would do a
heavyweight check to see if their request was complete. Commit
688e6c725816 introduced an rbtree so that only the earliest waiter on
the global timeline would be woken, and it would then wake the next and
so on. (Along with various complications to handle requests being
reordered along the global timeline, and also a requirement for a
kthread to provide a delegate for fence signaling that had no process
context.)

The global rbtree depends on knowing the execution timeline (and global
seqno). Without knowing that order, we must instead check all contexts
queued to the HW to see which may have advanced. We trim that list by
only checking queued contexts that are being waited on, but still we
keep a list of all active contexts and their active signalers that we
inspect from inside the irq handler. By moving the waiters onto the fence
signal list, we can combine the client wakeup with the dma_fence
signaling (a dramatic reduction in complexity, but it does require the HW
to be coherent: the seqno must be visible from the cpu before the
interrupt is raised - we keep a timer backup just in case).

Having previously fixed all the issues with irq-seqno serialisation (by
inserting delays onto the GPU after each request instead of random delays
on the CPU after each interrupt), we can rely on the seqno state to
perform direct wakeups from the interrupt handler. This allows us to
preserve our single context switch behaviour of the current routine,
with the only downside that we lose the RT priority sorting of wakeups.
In general, direct wakeup latency of multiple clients is about the same
(about 10% better in most cases) with a reduction in total CPU time spent
in the waiter (about 20-50% depending on gen). Average herd behaviour is
improved, but at the cost of not delegating wakeups on task_prio.

v2: Capture fence signaling state for error state and add comments to
warm even the most cold of hearts.
v3: Check if the request is still active before busywaiting

References: 688e6c725816 ("drm/i915: Slaughter the thundering i915_wait_request herd")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c           |  28 +-
 drivers/gpu/drm/i915/i915_gem_context.h       |   5 +
 drivers/gpu/drm/i915/i915_gpu_error.c         |  83 +-
 drivers/gpu/drm/i915/i915_gpu_error.h         |   9 +-
 drivers/gpu/drm/i915/i915_irq.c               |  87 +-
 drivers/gpu/drm/i915/i915_request.c           | 140 +--
 drivers/gpu/drm/i915/i915_request.h           |  72 +-
 drivers/gpu/drm/i915/i915_reset.c             |  13 +-
 drivers/gpu/drm/i915/intel_breadcrumbs.c      | 811 +++++-------------
 drivers/gpu/drm/i915/intel_engine_cs.c        |  34 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c       |   6 +-
 drivers/gpu/drm/i915/intel_ringbuffer.h       |  94 +-
 .../drm/i915/selftests/i915_mock_selftests.h  |   1 -
 drivers/gpu/drm/i915/selftests/i915_request.c | 420 +++++++++
 drivers/gpu/drm/i915/selftests/igt_spinner.c  |   5 -
 .../drm/i915/selftests/intel_breadcrumbs.c    | 470 ----------
 .../gpu/drm/i915/selftests/intel_hangcheck.c  |   2 +-
 drivers/gpu/drm/i915/selftests/lib_sw_fence.c |  54 ++
 drivers/gpu/drm/i915/selftests/lib_sw_fence.h |   3 +
 drivers/gpu/drm/i915/selftests/mock_engine.c  |  16 +-
 drivers/gpu/drm/i915/selftests/mock_engine.h  |   6 -
 21 files changed, 877 insertions(+), 1482 deletions(-)
 delete mode 100644 drivers/gpu/drm/i915/selftests/intel_breadcrumbs.c

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index c88cb566fe10..7bcb9cbfeae8 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1315,29 +1315,16 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
 	seq_printf(m, "GT active? %s\n", yesno(dev_priv->gt.awake));
 
 	for_each_engine(engine, dev_priv, id) {
-		struct intel_breadcrumbs *b = &engine->breadcrumbs;
-		struct rb_node *rb;
-
 		seq_printf(m, "%s:\n", engine->name);
 		seq_printf(m, "\tseqno = %x [current %x, last %x], %dms ago\n",
 			   engine->hangcheck.seqno, seqno[id],
 			   intel_engine_last_submit(engine),
 			   jiffies_to_msecs(jiffies -
 					    engine->hangcheck.action_timestamp));
-		seq_printf(m, "\twaiters? %s, fake irq active? %s\n",
-			   yesno(intel_engine_has_waiter(engine)),
+		seq_printf(m, "\tfake irq active? %s\n",
 			   yesno(test_bit(engine->id,
 					  &dev_priv->gpu_error.missed_irq_rings)));
 
-		spin_lock_irq(&b->rb_lock);
-		for (rb = rb_first(&b->waiters); rb; rb = rb_next(rb)) {
-			struct intel_wait *w = rb_entry(rb, typeof(*w), node);
-
-			seq_printf(m, "\t%s [%d] waiting for %x\n",
-				   w->tsk->comm, w->tsk->pid, w->seqno);
-		}
-		spin_unlock_irq(&b->rb_lock);
-
 		seq_printf(m, "\tACTHD = 0x%08llx [current 0x%08llx]\n",
 			   (long long)engine->hangcheck.acthd,
 			   (long long)acthd[id]);
@@ -2021,18 +2008,6 @@ static int i915_swizzle_info(struct seq_file *m, void *data)
 	return 0;
 }
 
-static int count_irq_waiters(struct drm_i915_private *i915)
-{
-	struct intel_engine_cs *engine;
-	enum intel_engine_id id;
-	int count = 0;
-
-	for_each_engine(engine, i915, id)
-		count += intel_engine_has_waiter(engine);
-
-	return count;
-}
-
 static const char *rps_power_to_str(unsigned int power)
 {
 	static const char * const strings[] = {
@@ -2072,7 +2047,6 @@ static int i915_rps_boost_info(struct seq_file *m, void *data)
 	seq_printf(m, "RPS enabled? %d\n", rps->enabled);
 	seq_printf(m, "GPU busy? %s [%d requests]\n",
 		   yesno(dev_priv->gt.awake), dev_priv->gt.active_requests);
-	seq_printf(m, "CPU waiting? %d\n", count_irq_waiters(dev_priv));
 	seq_printf(m, "Boosts outstanding? %d\n",
 		   atomic_read(&rps->num_waiters));
 	seq_printf(m, "Interactive? %d\n", READ_ONCE(rps->power.interactive));
diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
index 47d82ce7ba6a..9e11b31acd01 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/i915_gem_context.h
@@ -164,6 +164,8 @@ struct i915_gem_context {
 	struct intel_context {
 		struct i915_gem_context *gem_context;
 		struct intel_engine_cs *active;
+		struct list_head signal_link;
+		struct list_head signals;
 		struct i915_vma *state;
 		struct intel_ring *ring;
 		u32 *lrc_reg_state;
@@ -370,6 +372,9 @@ intel_context_init(struct intel_context *ce,
 		   struct intel_engine_cs *engine)
 {
 	ce->gem_context = ctx;
+
+	INIT_LIST_HEAD(&ce->signal_link);
+	INIT_LIST_HEAD(&ce->signals);
 }
 
 #endif /* !__I915_GEM_CONTEXT_H__ */
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 96d1d634a29d..420b94341433 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -447,9 +447,14 @@ static void error_print_request(struct drm_i915_error_state_buf *m,
 	if (!erq->seqno)
 		return;
 
-	err_printf(m, "%s pid %d, ban score %d, seqno %8x:%08x, prio %d, emitted %dms, start %08x, head %08x, tail %08x\n",
+	err_printf(m, "%s pid %d, ban score %d, seqno %8x:%08x%s%s, prio %d, emitted %dms, start %08x, head %08x, tail %08x\n",
 		   prefix, erq->pid, erq->ban_score,
-		   erq->context, erq->seqno, erq->sched_attr.priority,
+		   erq->context, erq->seqno,
+		   test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
+			    &erq->flags) ? "!" : "",
+		   test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
+			    &erq->flags) ? "+" : "",
+		   erq->sched_attr.priority,
 		   jiffies_to_msecs(erq->jiffies - epoch),
 		   erq->start, erq->head, erq->tail);
 }
@@ -530,7 +535,6 @@ static void error_print_engine(struct drm_i915_error_state_buf *m,
 	}
 	err_printf(m, "  seqno: 0x%08x\n", ee->seqno);
 	err_printf(m, "  last_seqno: 0x%08x\n", ee->last_seqno);
-	err_printf(m, "  waiting: %s\n", yesno(ee->waiting));
 	err_printf(m, "  ring->head: 0x%08x\n", ee->cpu_ring_head);
 	err_printf(m, "  ring->tail: 0x%08x\n", ee->cpu_ring_tail);
 	err_printf(m, "  hangcheck timestamp: %dms (%lu%s)\n",
@@ -804,21 +808,6 @@ static void __err_print_to_sgl(struct drm_i915_error_state_buf *m,
 						    error->epoch);
 		}
 
-		if (IS_ERR(ee->waiters)) {
-			err_printf(m, "%s --- ? waiters [unable to acquire spinlock]\n",
-				   m->i915->engine[i]->name);
-		} else if (ee->num_waiters) {
-			err_printf(m, "%s --- %d waiters\n",
-				   m->i915->engine[i]->name,
-				   ee->num_waiters);
-			for (j = 0; j < ee->num_waiters; j++) {
-				err_printf(m, " seqno 0x%08x for %s [%d]\n",
-					   ee->waiters[j].seqno,
-					   ee->waiters[j].comm,
-					   ee->waiters[j].pid);
-			}
-		}
-
 		print_error_obj(m, m->i915->engine[i],
 				"ringbuffer", ee->ringbuffer);
 
@@ -1000,8 +989,6 @@ void __i915_gpu_state_free(struct kref *error_ref)
 		i915_error_object_free(ee->wa_ctx);
 
 		kfree(ee->requests);
-		if (!IS_ERR_OR_NULL(ee->waiters))
-			kfree(ee->waiters);
 	}
 
 	for (i = 0; i < ARRAY_SIZE(error->active_bo); i++)
@@ -1203,59 +1190,6 @@ static void gen6_record_semaphore_state(struct intel_engine_cs *engine,
 			I915_READ(RING_SYNC_2(engine->mmio_base));
 }
 
-static void error_record_engine_waiters(struct intel_engine_cs *engine,
-					struct drm_i915_error_engine *ee)
-{
-	struct intel_breadcrumbs *b = &engine->breadcrumbs;
-	struct drm_i915_error_waiter *waiter;
-	struct rb_node *rb;
-	int count;
-
-	ee->num_waiters = 0;
-	ee->waiters = NULL;
-
-	if (RB_EMPTY_ROOT(&b->waiters))
-		return;
-
-	if (!spin_trylock_irq(&b->rb_lock)) {
-		ee->waiters = ERR_PTR(-EDEADLK);
-		return;
-	}
-
-	count = 0;
-	for (rb = rb_first(&b->waiters); rb != NULL; rb = rb_next(rb))
-		count++;
-	spin_unlock_irq(&b->rb_lock);
-
-	waiter = NULL;
-	if (count)
-		waiter = kmalloc_array(count,
-				       sizeof(struct drm_i915_error_waiter),
-				       GFP_ATOMIC);
-	if (!waiter)
-		return;
-
-	if (!spin_trylock_irq(&b->rb_lock)) {
-		kfree(waiter);
-		ee->waiters = ERR_PTR(-EDEADLK);
-		return;
-	}
-
-	ee->waiters = waiter;
-	for (rb = rb_first(&b->waiters); rb; rb = rb_next(rb)) {
-		struct intel_wait *w = rb_entry(rb, typeof(*w), node);
-
-		strcpy(waiter->comm, w->tsk->comm);
-		waiter->pid = w->tsk->pid;
-		waiter->seqno = w->seqno;
-		waiter++;
-
-		if (++ee->num_waiters == count)
-			break;
-	}
-	spin_unlock_irq(&b->rb_lock);
-}
-
 static void error_record_engine_registers(struct i915_gpu_state *error,
 					  struct intel_engine_cs *engine,
 					  struct drm_i915_error_engine *ee)
@@ -1291,7 +1225,6 @@ static void error_record_engine_registers(struct i915_gpu_state *error,
 
 	intel_engine_get_instdone(engine, &ee->instdone);
 
-	ee->waiting = intel_engine_has_waiter(engine);
 	ee->instpm = I915_READ(RING_INSTPM(engine->mmio_base));
 	ee->acthd = intel_engine_get_active_head(engine);
 	ee->seqno = intel_engine_get_seqno(engine);
@@ -1365,6 +1298,7 @@ static void record_request(struct i915_request *request,
 {
 	struct i915_gem_context *ctx = request->gem_context;
 
+	erq->flags = request->fence.flags;
 	erq->context = ctx->hw_id;
 	erq->sched_attr = request->sched.attr;
 	erq->ban_score = atomic_read(&ctx->ban_score);
@@ -1540,7 +1474,6 @@ static void gem_record_rings(struct i915_gpu_state *error)
 		ee->engine_id = i;
 
 		error_record_engine_registers(error, engine, ee);
-		error_record_engine_waiters(engine, ee);
 		error_record_engine_execlists(engine, ee);
 
 		request = i915_gem_find_active_request(engine);
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.h b/drivers/gpu/drm/i915/i915_gpu_error.h
index 231173786eae..74757c424aab 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.h
+++ b/drivers/gpu/drm/i915/i915_gpu_error.h
@@ -82,8 +82,6 @@ struct i915_gpu_state {
 		int engine_id;
 		/* Software tracked state */
 		bool idle;
-		bool waiting;
-		int num_waiters;
 		unsigned long hangcheck_timestamp;
 		struct i915_address_space *vm;
 		int num_requests;
@@ -147,6 +145,7 @@ struct i915_gpu_state {
 		struct drm_i915_error_object *default_state;
 
 		struct drm_i915_error_request {
+			unsigned long flags;
 			long jiffies;
 			pid_t pid;
 			u32 context;
@@ -159,12 +158,6 @@ struct i915_gpu_state {
 		} *requests, execlist[EXECLIST_MAX_PORTS];
 		unsigned int num_ports;
 
-		struct drm_i915_error_waiter {
-			char comm[TASK_COMM_LEN];
-			pid_t pid;
-			u32 seqno;
-		} *waiters;
-
 		struct {
 			u32 gfx_mode;
 			union {
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 71d11dc2c235..7669b1caeef0 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -28,9 +28,10 @@
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
-#include <linux/sysrq.h>
-#include <linux/slab.h>
 #include <linux/circ_buf.h>
+#include <linux/slab.h>
+#include <linux/sysrq.h>
+
 #include <drm/i915_drm.h>
 #include "i915_drv.h"
 #include "i915_trace.h"
@@ -1151,66 +1152,6 @@ static void ironlake_rps_change_irq_handler(struct drm_i915_private *dev_priv)
 	return;
 }
 
-static void notify_ring(struct intel_engine_cs *engine)
-{
-	const u32 seqno = intel_engine_get_seqno(engine);
-	struct i915_request *rq = NULL;
-	struct task_struct *tsk = NULL;
-	struct intel_wait *wait;
-
-	if (unlikely(!engine->breadcrumbs.irq_armed))
-		return;
-
-	rcu_read_lock();
-
-	spin_lock(&engine->breadcrumbs.irq_lock);
-	wait = engine->breadcrumbs.irq_wait;
-	if (wait) {
-		/*
-		 * We use a callback from the dma-fence to submit
-		 * requests after waiting on our own requests. To
-		 * ensure minimum delay in queuing the next request to
-		 * hardware, signal the fence now rather than wait for
-		 * the signaler to be woken up. We still wake up the
-		 * waiter in order to handle the irq-seqno coherency
-		 * issues (we may receive the interrupt before the
-		 * seqno is written, see __i915_request_irq_complete())
-		 * and to handle coalescing of multiple seqno updates
-		 * and many waiters.
-		 */
-		if (i915_seqno_passed(seqno, wait->seqno)) {
-			struct i915_request *waiter = wait->request;
-
-			if (waiter &&
-			    !i915_request_signaled(waiter) &&
-			    intel_wait_check_request(wait, waiter))
-				rq = i915_request_get(waiter);
-
-			tsk = wait->tsk;
-		}
-
-		engine->breadcrumbs.irq_count++;
-	} else {
-		if (engine->breadcrumbs.irq_armed)
-			__intel_engine_disarm_breadcrumbs(engine);
-	}
-	spin_unlock(&engine->breadcrumbs.irq_lock);
-
-	if (rq) {
-		spin_lock(&rq->lock);
-		dma_fence_signal_locked(&rq->fence);
-		GEM_BUG_ON(!i915_request_completed(rq));
-		spin_unlock(&rq->lock);
-
-		i915_request_put(rq);
-	}
-
-	if (tsk && tsk->state & TASK_NORMAL)
-		wake_up_process(tsk);
-
-	rcu_read_unlock();
-}
-
 static void vlv_c0_read(struct drm_i915_private *dev_priv,
 			struct intel_rps_ei *ei)
 {
@@ -1455,20 +1396,20 @@ static void ilk_gt_irq_handler(struct drm_i915_private *dev_priv,
 			       u32 gt_iir)
 {
 	if (gt_iir & GT_RENDER_USER_INTERRUPT)
-		notify_ring(dev_priv->engine[RCS]);
+		intel_engine_breadcrumbs_irq(dev_priv->engine[RCS]);
 	if (gt_iir & ILK_BSD_USER_INTERRUPT)
-		notify_ring(dev_priv->engine[VCS]);
+		intel_engine_breadcrumbs_irq(dev_priv->engine[VCS]);
 }
 
 static void snb_gt_irq_handler(struct drm_i915_private *dev_priv,
 			       u32 gt_iir)
 {
 	if (gt_iir & GT_RENDER_USER_INTERRUPT)
-		notify_ring(dev_priv->engine[RCS]);
+		intel_engine_breadcrumbs_irq(dev_priv->engine[RCS]);
 	if (gt_iir & GT_BSD_USER_INTERRUPT)
-		notify_ring(dev_priv->engine[VCS]);
+		intel_engine_breadcrumbs_irq(dev_priv->engine[VCS]);
 	if (gt_iir & GT_BLT_USER_INTERRUPT)
-		notify_ring(dev_priv->engine[BCS]);
+		intel_engine_breadcrumbs_irq(dev_priv->engine[BCS]);
 
 	if (gt_iir & (GT_BLT_CS_ERROR_INTERRUPT |
 		      GT_BSD_CS_ERROR_INTERRUPT |
@@ -1488,7 +1429,7 @@ gen8_cs_irq_handler(struct intel_engine_cs *engine, u32 iir)
 		tasklet = true;
 
 	if (iir & GT_RENDER_USER_INTERRUPT) {
-		notify_ring(engine);
+		intel_engine_breadcrumbs_irq(engine);
 		tasklet |= USES_GUC_SUBMISSION(engine->i915);
 	}
 
@@ -1834,7 +1775,7 @@ static void gen6_rps_irq_handler(struct drm_i915_private *dev_priv, u32 pm_iir)
 
 	if (HAS_VEBOX(dev_priv)) {
 		if (pm_iir & PM_VEBOX_USER_INTERRUPT)
-			notify_ring(dev_priv->engine[VECS]);
+			intel_engine_breadcrumbs_irq(dev_priv->engine[VECS]);
 
 		if (pm_iir & PM_VEBOX_CS_ERROR_INTERRUPT)
 			DRM_DEBUG("Command parser error, pm_iir 0x%08x\n", pm_iir);
@@ -4257,7 +4198,7 @@ static irqreturn_t i8xx_irq_handler(int irq, void *arg)
 		I915_WRITE16(IIR, iir);
 
 		if (iir & I915_USER_INTERRUPT)
-			notify_ring(dev_priv->engine[RCS]);
+			intel_engine_breadcrumbs_irq(dev_priv->engine[RCS]);
 
 		if (iir & I915_MASTER_ERROR_INTERRUPT)
 			i8xx_error_irq_handler(dev_priv, eir, eir_stuck);
@@ -4365,7 +4306,7 @@ static irqreturn_t i915_irq_handler(int irq, void *arg)
 		I915_WRITE(IIR, iir);
 
 		if (iir & I915_USER_INTERRUPT)
-			notify_ring(dev_priv->engine[RCS]);
+			intel_engine_breadcrumbs_irq(dev_priv->engine[RCS]);
 
 		if (iir & I915_MASTER_ERROR_INTERRUPT)
 			i9xx_error_irq_handler(dev_priv, eir, eir_stuck);
@@ -4510,10 +4451,10 @@ static irqreturn_t i965_irq_handler(int irq, void *arg)
 		I915_WRITE(IIR, iir);
 
 		if (iir & I915_USER_INTERRUPT)
-			notify_ring(dev_priv->engine[RCS]);
+			intel_engine_breadcrumbs_irq(dev_priv->engine[RCS]);
 
 		if (iir & I915_BSD_USER_INTERRUPT)
-			notify_ring(dev_priv->engine[VCS]);
+			intel_engine_breadcrumbs_irq(dev_priv->engine[VCS]);
 
 		if (iir & I915_MASTER_ERROR_INTERRUPT)
 			i9xx_error_irq_handler(dev_priv, eir, eir_stuck);
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 5dbedec99ad9..b5c844fa4670 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -60,7 +60,7 @@ static bool i915_fence_signaled(struct dma_fence *fence)
 
 static bool i915_fence_enable_signaling(struct dma_fence *fence)
 {
-	return intel_engine_enable_signaling(to_request(fence), true);
+	return i915_request_enable_breadcrumb(to_request(fence));
 }
 
 static signed long i915_fence_wait(struct dma_fence *fence,
@@ -203,7 +203,7 @@ static void __retire_engine_request(struct intel_engine_cs *engine,
 	if (!i915_request_signaled(rq))
 		dma_fence_signal_locked(&rq->fence);
 	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &rq->fence.flags))
-		intel_engine_cancel_signaling(rq);
+		i915_request_cancel_breadcrumb(rq);
 	if (rq->waitboost) {
 		GEM_BUG_ON(!atomic_read(&rq->i915->gt_pm.rps.num_waiters));
 		atomic_dec(&rq->i915->gt_pm.rps.num_waiters);
@@ -377,9 +377,11 @@ void __i915_request_submit(struct i915_request *request)
 
 	/* We may be recursing from the signal callback of another i915 fence */
 	spin_lock_nested(&request->lock, SINGLE_DEPTH_NESTING);
+	set_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags);
 	request->global_seqno = seqno;
-	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &request->fence.flags))
-		intel_engine_enable_signaling(request, false);
+	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &request->fence.flags) &&
+	    !i915_request_enable_breadcrumb(request))
+		intel_engine_queue_breadcrumbs(engine);
 	spin_unlock(&request->lock);
 
 	engine->emit_fini_breadcrumb(request,
@@ -389,8 +391,6 @@ void __i915_request_submit(struct i915_request *request)
 	move_to_timeline(request, &engine->timeline);
 
 	trace_i915_request_execute(request);
-
-	wake_up_all(&request->execute);
 }
 
 void i915_request_submit(struct i915_request *request)
@@ -433,7 +433,8 @@ void __i915_request_unsubmit(struct i915_request *request)
 	spin_lock_nested(&request->lock, SINGLE_DEPTH_NESTING);
 	request->global_seqno = 0;
 	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &request->fence.flags))
-		intel_engine_cancel_signaling(request);
+		i915_request_cancel_breadcrumb(request);
+	clear_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags);
 	spin_unlock(&request->lock);
 
 	/* Transfer back from the global per-engine timeline to per-context */
@@ -633,13 +634,11 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 
 	/* We bump the ref for the fence chain */
 	i915_sw_fence_init(&i915_request_get(rq)->submit, submit_notify);
-	init_waitqueue_head(&rq->execute);
 
 	i915_sched_node_init(&rq->sched);
 
 	/* No zalloc, must clear what we need by hand */
 	rq->global_seqno = 0;
-	rq->signaling.wait.seqno = 0;
 	rq->file_priv = NULL;
 	rq->batch = NULL;
 	rq->capture_list = NULL;
@@ -1030,13 +1029,10 @@ static bool busywait_stop(unsigned long timeout, unsigned int cpu)
 	return this_cpu != cpu;
 }
 
-static bool __i915_spin_request(const struct i915_request *rq,
-				u32 seqno, int state, unsigned long timeout_us)
+static bool __i915_spin_request(const struct i915_request * const rq,
+				int state, unsigned long timeout_us)
 {
-	struct intel_engine_cs *engine = rq->engine;
-	unsigned int irq, cpu;
-
-	GEM_BUG_ON(!seqno);
+	unsigned int cpu;
 
 	/*
 	 * Only wait for the request if we know it is likely to complete.
@@ -1044,12 +1040,12 @@ static bool __i915_spin_request(const struct i915_request *rq,
 	 * We don't track the timestamps around requests, nor the average
 	 * request length, so we do not have a good indicator that this
 	 * request will complete within the timeout. What we do know is the
-	 * order in which requests are executed by the engine and so we can
-	 * tell if the request has started. If the request hasn't started yet,
-	 * it is a fair assumption that it will not complete within our
-	 * relatively short timeout.
+	 * order in which requests are executed by the context and so we can
+	 * tell if the request has been started. If the request is not even
+	 * running yet, it is a fair assumption that it will not complete
+	 * within our relatively short timeout.
 	 */
-	if (!intel_engine_has_started(engine, seqno))
+	if (!i915_request_is_running(rq))
 		return false;
 
 	/*
@@ -1063,20 +1059,10 @@ static bool __i915_spin_request(const struct i915_request *rq,
 	 * takes to sleep on a request, on the order of a microsecond.
 	 */
 
-	irq = READ_ONCE(engine->breadcrumbs.irq_count);
 	timeout_us += local_clock_us(&cpu);
 	do {
-		if (intel_engine_has_completed(engine, seqno))
-			return seqno == i915_request_global_seqno(rq);
-
-		/*
-		 * Seqno are meant to be ordered *before* the interrupt. If
-		 * we see an interrupt without a corresponding seqno advance,
-		 * assume we won't see one in the near future but require
-		 * the engine->seqno_barrier() to fixup coherency.
-		 */
-		if (READ_ONCE(engine->breadcrumbs.irq_count) != irq)
-			break;
+		if (i915_request_completed(rq))
+			return true;
 
 		if (signal_pending_state(state, current))
 			break;
@@ -1090,6 +1076,18 @@ static bool __i915_spin_request(const struct i915_request *rq,
 	return false;
 }
 
+struct request_wait {
+	struct dma_fence_cb cb;
+	struct task_struct *tsk;
+};
+
+static void request_wait_wake(struct dma_fence *fence, struct dma_fence_cb *cb)
+{
+	struct request_wait *wait = container_of(cb, typeof(*wait), cb);
+
+	wake_up_process(wait->tsk);
+}
+
 /**
  * i915_request_wait - wait until execution of request has finished
  * @rq: the request to wait upon
@@ -1115,8 +1113,7 @@ long i915_request_wait(struct i915_request *rq,
 {
 	const int state = flags & I915_WAIT_INTERRUPTIBLE ?
 		TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
-	DEFINE_WAIT_FUNC(exec, default_wake_function);
-	struct intel_wait wait;
+	struct request_wait wait;
 
 	might_sleep();
 	GEM_BUG_ON(timeout < 0);
@@ -1128,47 +1125,24 @@ long i915_request_wait(struct i915_request *rq,
 		return -ETIME;
 
 	trace_i915_request_wait_begin(rq, flags);
-	add_wait_queue(&rq->execute, &exec);
-	intel_wait_init(&wait);
-	if (flags & I915_WAIT_PRIORITY)
-		i915_schedule_bump_priority(rq, I915_PRIORITY_WAIT);
-
-restart:
-	do {
-		set_current_state(state);
-		if (intel_wait_update_request(&wait, rq))
-			break;
-
-		if (signal_pending_state(state, current)) {
-			timeout = -ERESTARTSYS;
-			goto complete;
-		}
 
-		if (!timeout) {
-			timeout = -ETIME;
-			goto complete;
-		}
+	/* Optimistic short spin before touching IRQs */
+	if (__i915_spin_request(rq, state, 5))
+		goto out;
 
-		timeout = io_schedule_timeout(timeout);
-	} while (1);
+	if (flags & I915_WAIT_PRIORITY)
+		i915_schedule_bump_priority(rq, I915_PRIORITY_WAIT);
 
-	GEM_BUG_ON(!intel_wait_has_seqno(&wait));
-	GEM_BUG_ON(!i915_sw_fence_signaled(&rq->submit));
+	wait.tsk = current;
+	if (dma_fence_add_callback(&rq->fence, &wait.cb, request_wait_wake))
+		goto out;
 
-	/* Optimistic short spin before touching IRQs */
-	if (__i915_spin_request(rq, wait.seqno, state, 5))
-		goto complete;
+	for (;;) {
+		set_current_state(state);
 
-	set_current_state(state);
-	if (intel_engine_add_wait(rq->engine, &wait))
-		/*
-		 * In order to check that we haven't missed the interrupt
-		 * as we enabled it, we need to kick ourselves to do a
-		 * coherent check on the seqno before we sleep.
-		 */
-		goto wakeup;
+		if (i915_request_completed(rq))
+			break;
 
-	for (;;) {
 		if (signal_pending_state(state, current)) {
 			timeout = -ERESTARTSYS;
 			break;
@@ -1180,33 +1154,13 @@ long i915_request_wait(struct i915_request *rq,
 		}
 
 		timeout = io_schedule_timeout(timeout);
-
-		if (intel_wait_complete(&wait) &&
-		    intel_wait_check_request(&wait, rq))
-			break;
-
-		set_current_state(state);
-
-wakeup:
-		if (i915_request_completed(rq))
-			break;
-
-		/* Only spin if we know the GPU is processing this request */
-		if (__i915_spin_request(rq, wait.seqno, state, 2))
-			break;
-
-		if (!intel_wait_check_request(&wait, rq)) {
-			intel_engine_remove_wait(rq->engine, &wait);
-			goto restart;
-		}
 	}
-
-	intel_engine_remove_wait(rq->engine, &wait);
-complete:
 	__set_current_state(TASK_RUNNING);
-	remove_wait_queue(&rq->execute, &exec);
-	trace_i915_request_wait_end(rq);
 
+	dma_fence_remove_callback(&rq->fence, &wait.cb);
+
+out:
+	trace_i915_request_wait_end(rq);
 	return timeout;
 }
 
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index 340d6216791c..3cffb96203b9 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -38,23 +38,34 @@ struct drm_i915_gem_object;
 struct i915_request;
 struct i915_timeline;
 
-struct intel_wait {
-	struct rb_node node;
-	struct task_struct *tsk;
-	struct i915_request *request;
-	u32 seqno;
-};
-
-struct intel_signal_node {
-	struct intel_wait wait;
-	struct list_head link;
-};
-
 struct i915_capture_list {
 	struct i915_capture_list *next;
 	struct i915_vma *vma;
 };
 
+enum {
+	/*
+	 * I915_FENCE_FLAG_ACTIVE - this request is currently submitted to HW.
+	 *
+	 * Set by __i915_request_submit() on handing over to HW, and cleared
+	 * by __i915_request_unsubmit() if we preempt this request.
+	 *
+	 * Finally cleared for consistency on retiring the request, when
+	 * we know the HW is no longer running this request.
+	 *
+	 * See i915_request_is_active()
+	 */
+	I915_FENCE_FLAG_ACTIVE = DMA_FENCE_FLAG_USER_BITS,
+
+	/*
+	 * I915_FENCE_FLAG_SIGNAL - this request is currently on signal_list
+	 *
+	 * Internal bookkeeping used by the breadcrumb code to track when
+	 * a request is on the various signal_list.
+	 */
+	I915_FENCE_FLAG_SIGNAL,
+};
+
 /**
  * Request queue structure.
  *
@@ -97,7 +108,7 @@ struct i915_request {
 	struct intel_context *hw_context;
 	struct intel_ring *ring;
 	struct i915_timeline *timeline;
-	struct intel_signal_node signaling;
+	struct list_head signal_link;
 
 	/*
 	 * The rcu epoch of when this request was allocated. Used to judiciously
@@ -116,7 +127,6 @@ struct i915_request {
 	 */
 	struct i915_sw_fence submit;
 	wait_queue_entry_t submitq;
-	wait_queue_head_t execute;
 
 	/*
 	 * A list of everyone we wait upon, and everyone who waits upon us.
@@ -255,7 +265,7 @@ i915_request_put(struct i915_request *rq)
  * that it has passed the global seqno and the global seqno is unchanged
  * after the read, it is indeed complete).
  */
-static u32
+static inline u32
 i915_request_global_seqno(const struct i915_request *request)
 {
 	return READ_ONCE(request->global_seqno);
@@ -277,6 +287,10 @@ void i915_request_skip(struct i915_request *request, int error);
 void __i915_request_unsubmit(struct i915_request *request);
 void i915_request_unsubmit(struct i915_request *request);
 
+/* Note: part of the intel_breadcrumbs family */
+bool i915_request_enable_breadcrumb(struct i915_request *request);
+void i915_request_cancel_breadcrumb(struct i915_request *request);
+
 long i915_request_wait(struct i915_request *rq,
 		       unsigned int flags,
 		       long timeout)
@@ -293,6 +307,11 @@ static inline bool i915_request_signaled(const struct i915_request *rq)
 	return test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &rq->fence.flags);
 }
 
+static inline bool i915_request_is_active(const struct i915_request *rq)
+{
+	return test_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
+}
+
 /**
  * Returns true if seq1 is later than seq2.
  */
@@ -330,6 +349,11 @@ static inline u32 hwsp_seqno(const struct i915_request *rq)
 	return seqno;
 }
 
+static inline bool __i915_request_has_started(const struct i915_request *rq)
+{
+	return i915_seqno_passed(hwsp_seqno(rq), rq->fence.seqno - 1);
+}
+
 /**
  * i915_request_started - check if the request has begun being executed
  * @rq: the request
@@ -345,7 +369,23 @@ static inline bool i915_request_started(const struct i915_request *rq)
 		return true;
 
 	/* Remember: started but may have since been preempted! */
-	return i915_seqno_passed(hwsp_seqno(rq), rq->fence.seqno - 1);
+	return __i915_request_has_started(rq);
+}
+
+/**
+ * i915_request_is_running - check if the request may actually be executing
+ * @rq: the request
+ *
+ * Returns true if the request is currently submitted to hardware, has passed
+ * its start point (i.e. the context is setup and not busywaiting). Note that
+ * it may no longer be running by the time the function returns!
+ */
+static inline bool i915_request_is_running(const struct i915_request *rq)
+{
+	if (!i915_request_is_active(rq))
+		return false;
+
+	return __i915_request_has_started(rq);
 }
 
 static inline bool i915_request_completed(const struct i915_request *rq)
diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
index 9b9169508139..d7d2840fcaa5 100644
--- a/drivers/gpu/drm/i915/i915_reset.c
+++ b/drivers/gpu/drm/i915/i915_reset.c
@@ -747,18 +747,19 @@ static void reset_restart(struct drm_i915_private *i915)
 
 static void nop_submit_request(struct i915_request *request)
 {
+	struct intel_engine_cs *engine = request->engine;
 	unsigned long flags;
 
 	GEM_TRACE("%s fence %llx:%lld -> -EIO\n",
-		  request->engine->name,
-		  request->fence.context, request->fence.seqno);
+		  engine->name, request->fence.context, request->fence.seqno);
 	dma_fence_set_error(&request->fence, -EIO);
 
-	spin_lock_irqsave(&request->engine->timeline.lock, flags);
+	spin_lock_irqsave(&engine->timeline.lock, flags);
 	__i915_request_submit(request);
 	i915_request_mark_complete(request);
-	intel_engine_write_global_seqno(request->engine, request->global_seqno);
-	spin_unlock_irqrestore(&request->engine->timeline.lock, flags);
+	spin_unlock_irqrestore(&engine->timeline.lock, flags);
+
+	intel_engine_queue_breadcrumbs(engine);
 }
 
 void i915_gem_set_wedged(struct drm_i915_private *i915)
@@ -813,7 +814,7 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
 
 	for_each_engine(engine, i915, id) {
 		reset_finish_engine(engine);
-		intel_engine_wakeup(engine);
+		intel_engine_signal_breadcrumbs(engine);
 	}
 
 	smp_mb__before_atomic();
diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
index b58915b8708b..fef74c47a4a0 100644
--- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
@@ -29,48 +29,142 @@
 
 #define task_asleep(tsk) ((tsk)->state & TASK_NORMAL && !(tsk)->on_rq)
 
-static unsigned int __intel_breadcrumbs_wakeup(struct intel_breadcrumbs *b)
+static void irq_enable(struct intel_engine_cs *engine)
+{
+	if (!engine->irq_enable)
+		return;
+
+	/* Caller disables interrupts */
+	spin_lock(&engine->i915->irq_lock);
+	engine->irq_enable(engine);
+	spin_unlock(&engine->i915->irq_lock);
+}
+
+static void irq_disable(struct intel_engine_cs *engine)
 {
-	struct intel_wait *wait;
-	unsigned int result = 0;
+	if (!engine->irq_disable)
+		return;
+
+	/* Caller disables interrupts */
+	spin_lock(&engine->i915->irq_lock);
+	engine->irq_disable(engine);
+	spin_unlock(&engine->i915->irq_lock);
+}
 
+static void __intel_breadcrumbs_disarm_irq(struct intel_breadcrumbs *b)
+{
 	lockdep_assert_held(&b->irq_lock);
 
-	wait = b->irq_wait;
-	if (wait) {
+	GEM_BUG_ON(!b->irq_enabled);
+	if (!--b->irq_enabled)
+		irq_disable(container_of(b,
+					 struct intel_engine_cs,
+					 breadcrumbs));
+
+	b->irq_armed = false;
+}
+
+void intel_engine_disarm_breadcrumbs(struct intel_engine_cs *engine)
+{
+	struct intel_breadcrumbs *b = &engine->breadcrumbs;
+
+	if (!b->irq_armed)
+		return;
+
+	spin_lock_irq(&b->irq_lock);
+	if (b->irq_armed)
+		__intel_breadcrumbs_disarm_irq(b);
+	spin_unlock_irq(&b->irq_lock);
+}
+
+static inline bool __request_completed(const struct i915_request *rq)
+{
+	return i915_seqno_passed(__hwsp_seqno(rq), rq->fence.seqno);
+}
+
+bool intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine)
+{
+	struct intel_breadcrumbs *b = &engine->breadcrumbs;
+	struct intel_context *ce, *cn;
+	struct i915_request *rq, *rn;
+	LIST_HEAD(signal);
+
+	spin_lock(&b->irq_lock);
+
+	b->irq_fired = true;
+	if (b->irq_armed && list_empty(&b->signalers))
+		__intel_breadcrumbs_disarm_irq(b);
+
+	list_for_each_entry_safe(ce, cn, &b->signalers, signal_link) {
+		GEM_BUG_ON(list_empty(&ce->signals));
+
+		list_for_each_entry_safe(rq, rn, &ce->signals, signal_link) {
+			if (!__request_completed(rq))
+				break;
+
+			GEM_BUG_ON(!test_bit(I915_FENCE_FLAG_SIGNAL,
+					     &rq->fence.flags));
+			clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);
+
+			/*
+			 * We may race with direct invocation of
+			 * dma_fence_signal(), e.g. i915_request_retire(),
+			 * in which case we can skip processing it ourselves.
+			 */
+			if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
+				     &rq->fence.flags))
+				continue;
+
+			/*
+			 * Queue for execution after dropping the signaling
+			 * spinlock as the callback chain may end adding
+			 * more signalers to the same context or engine.
+			 */
+			i915_request_get(rq);
+			list_add_tail(&rq->signal_link, &signal);
+		}
+
 		/*
-		 * N.B. Since task_asleep() and ttwu are not atomic, the
-		 * waiter may actually go to sleep after the check, causing
-		 * us to suppress a valid wakeup. We prefer to reduce the
-		 * number of false positive missed_breadcrumb() warnings
-		 * at the expense of a few false negatives, as it it easy
-		 * to trigger a false positive under heavy load. Enough
-		 * signal should remain from genuine missed_breadcrumb()
-		 * for us to detect in CI.
+		 * We process the list deletion in bulk, only using a list
+		 * add (not list move) above but keeping the state of
+		 * rq->signal_link known with the I915_FENCE_FLAG_SIGNAL bit.
 		 */
-		bool was_asleep = task_asleep(wait->tsk);
+		if (!list_is_first(&rq->signal_link, &ce->signals)) {
+			__list_del_many(&ce->signals, &rq->signal_link);
+			if (&ce->signals == &rq->signal_link) /* now empty */
+				list_del_init(&ce->signal_link);
+		}
+	}
+
+	spin_unlock(&b->irq_lock);
 
-		result = ENGINE_WAKEUP_WAITER;
-		if (wake_up_process(wait->tsk) && was_asleep)
-			result |= ENGINE_WAKEUP_ASLEEP;
+	list_for_each_entry_safe(rq, rn, &signal, signal_link) {
+		dma_fence_signal(&rq->fence);
+		i915_request_put(rq);
 	}
 
-	return result;
+	return !list_empty(&signal);
 }
 
-unsigned int intel_engine_wakeup(struct intel_engine_cs *engine)
+bool intel_engine_signal_breadcrumbs(struct intel_engine_cs *engine)
 {
-	struct intel_breadcrumbs *b = &engine->breadcrumbs;
-	unsigned long flags;
-	unsigned int result;
+	bool result;
 
-	spin_lock_irqsave(&b->irq_lock, flags);
-	result = __intel_breadcrumbs_wakeup(b);
-	spin_unlock_irqrestore(&b->irq_lock, flags);
+	local_irq_disable();
+	result = intel_engine_breadcrumbs_irq(engine);
+	local_irq_enable();
 
 	return result;
 }
 
+static void signal_irq_work(struct irq_work *work)
+{
+	struct intel_engine_cs *engine =
+		container_of(work, typeof(*engine), breadcrumbs.irq_work);
+
+	intel_engine_breadcrumbs_irq(engine);
+}
+
 static unsigned long wait_timeout(void)
 {
 	return round_jiffies_up(jiffies + DRM_I915_HANGCHECK_JIFFIES);
@@ -94,19 +188,15 @@ static void intel_breadcrumbs_hangcheck(struct timer_list *t)
 	struct intel_engine_cs *engine =
 		from_timer(engine, t, breadcrumbs.hangcheck);
 	struct intel_breadcrumbs *b = &engine->breadcrumbs;
-	unsigned int irq_count;
 
 	if (!b->irq_armed)
 		return;
 
-	irq_count = READ_ONCE(b->irq_count);
-	if (b->hangcheck_interrupts != irq_count) {
-		b->hangcheck_interrupts = irq_count;
-		mod_timer(&b->hangcheck, wait_timeout());
-		return;
-	}
+	if (b->irq_fired)
+		goto rearm;
 
-	/* We keep the hangcheck timer alive until we disarm the irq, even
+	/*
+	 * We keep the hangcheck timer alive until we disarm the irq, even
 	 * if there are no waiters at present.
 	 *
 	 * If the waiter was currently running, assume it hasn't had a chance
@@ -118,10 +208,13 @@ static void intel_breadcrumbs_hangcheck(struct timer_list *t)
 	 * but we still have a waiter. Assuming all batches complete within
 	 * DRM_I915_HANGCHECK_JIFFIES [1.5s]!
 	 */
-	if (intel_engine_wakeup(engine) & ENGINE_WAKEUP_ASLEEP) {
+	synchronize_hardirq(engine->i915->drm.irq);
+	if (intel_engine_signal_breadcrumbs(engine)) {
 		missed_breadcrumb(engine);
 		mod_timer(&b->fake_irq, jiffies + 1);
 	} else {
+rearm:
+		b->irq_fired = false;
 		mod_timer(&b->hangcheck, wait_timeout());
 	}
 }
@@ -140,11 +233,7 @@ static void intel_breadcrumbs_fake_irq(struct timer_list *t)
 	 * oldest waiter to do the coherent seqno check.
 	 */
 
-	spin_lock_irq(&b->irq_lock);
-	if (b->irq_armed && !__intel_breadcrumbs_wakeup(b))
-		__intel_engine_disarm_breadcrumbs(engine);
-	spin_unlock_irq(&b->irq_lock);
-	if (!b->irq_armed)
+	if (!intel_engine_signal_breadcrumbs(engine) && !b->irq_armed)
 		return;
 
 	/* If the user has disabled the fake-irq, restore the hangchecking */
@@ -156,43 +245,6 @@ static void intel_breadcrumbs_fake_irq(struct timer_list *t)
 	mod_timer(&b->fake_irq, jiffies + 1);
 }
 
-static void irq_enable(struct intel_engine_cs *engine)
-{
-	if (!engine->irq_enable)
-		return;
-
-	/* Caller disables interrupts */
-	spin_lock(&engine->i915->irq_lock);
-	engine->irq_enable(engine);
-	spin_unlock(&engine->i915->irq_lock);
-}
-
-static void irq_disable(struct intel_engine_cs *engine)
-{
-	if (!engine->irq_disable)
-		return;
-
-	/* Caller disables interrupts */
-	spin_lock(&engine->i915->irq_lock);
-	engine->irq_disable(engine);
-	spin_unlock(&engine->i915->irq_lock);
-}
-
-void __intel_engine_disarm_breadcrumbs(struct intel_engine_cs *engine)
-{
-	struct intel_breadcrumbs *b = &engine->breadcrumbs;
-
-	lockdep_assert_held(&b->irq_lock);
-	GEM_BUG_ON(b->irq_wait);
-	GEM_BUG_ON(!b->irq_armed);
-
-	GEM_BUG_ON(!b->irq_enabled);
-	if (!--b->irq_enabled)
-		irq_disable(engine);
-
-	b->irq_armed = false;
-}
-
 void intel_engine_pin_breadcrumbs_irq(struct intel_engine_cs *engine)
 {
 	struct intel_breadcrumbs *b = &engine->breadcrumbs;
@@ -215,40 +267,6 @@ void intel_engine_unpin_breadcrumbs_irq(struct intel_engine_cs *engine)
 	spin_unlock_irq(&b->irq_lock);
 }
 
-void intel_engine_disarm_breadcrumbs(struct intel_engine_cs *engine)
-{
-	struct intel_breadcrumbs *b = &engine->breadcrumbs;
-	struct intel_wait *wait, *n;
-
-	if (!b->irq_armed)
-		return;
-
-	/*
-	 * We only disarm the irq when we are idle (all requests completed),
-	 * so if the bottom-half remains asleep, it missed the request
-	 * completion.
-	 */
-	if (intel_engine_wakeup(engine) & ENGINE_WAKEUP_ASLEEP)
-		missed_breadcrumb(engine);
-
-	spin_lock_irq(&b->rb_lock);
-
-	spin_lock(&b->irq_lock);
-	b->irq_wait = NULL;
-	if (b->irq_armed)
-		__intel_engine_disarm_breadcrumbs(engine);
-	spin_unlock(&b->irq_lock);
-
-	rbtree_postorder_for_each_entry_safe(wait, n, &b->waiters, node) {
-		GEM_BUG_ON(!intel_engine_signaled(engine, wait->seqno));
-		RB_CLEAR_NODE(&wait->node);
-		wake_up_process(wait->tsk);
-	}
-	b->waiters = RB_ROOT;
-
-	spin_unlock_irq(&b->rb_lock);
-}
-
 static bool use_fake_irq(const struct intel_breadcrumbs *b)
 {
 	const struct intel_engine_cs *engine =
@@ -264,7 +282,7 @@ static bool use_fake_irq(const struct intel_breadcrumbs *b)
 	 * engine->seqno_barrier(), a timing error that should be transient
 	 * and unlikely to reoccur.
 	 */
-	return READ_ONCE(b->irq_count) == b->hangcheck_interrupts;
+	return !b->irq_fired;
 }
 
 static void enable_fake_irq(struct intel_breadcrumbs *b)
@@ -276,7 +294,7 @@ static void enable_fake_irq(struct intel_breadcrumbs *b)
 		mod_timer(&b->hangcheck, wait_timeout());
 }
 
-static bool __intel_breadcrumbs_enable_irq(struct intel_breadcrumbs *b)
+static bool __intel_breadcrumbs_arm_irq(struct intel_breadcrumbs *b)
 {
 	struct intel_engine_cs *engine =
 		container_of(b, struct intel_engine_cs, breadcrumbs);
@@ -315,536 +333,149 @@ static bool __intel_breadcrumbs_enable_irq(struct intel_breadcrumbs *b)
 	return enabled;
 }
 
-static inline struct intel_wait *to_wait(struct rb_node *node)
+void intel_engine_init_breadcrumbs(struct intel_engine_cs *engine)
 {
-	return rb_entry(node, struct intel_wait, node);
-}
+	struct intel_breadcrumbs *b = &engine->breadcrumbs;
 
-static inline void __intel_breadcrumbs_finish(struct intel_breadcrumbs *b,
-					      struct intel_wait *wait)
-{
-	lockdep_assert_held(&b->rb_lock);
-	GEM_BUG_ON(b->irq_wait == wait);
+	spin_lock_init(&b->irq_lock);
+	INIT_LIST_HEAD(&b->signalers);
 
-	/*
-	 * This request is completed, so remove it from the tree, mark it as
-	 * complete, and *then* wake up the associated task. N.B. when the
-	 * task wakes up, it will find the empty rb_node, discern that it
-	 * has already been removed from the tree and skip the serialisation
-	 * of the b->rb_lock and b->irq_lock. This means that the destruction
-	 * of the intel_wait is not serialised with the interrupt handler
-	 * by the waiter - it must instead be serialised by the caller.
-	 */
-	rb_erase(&wait->node, &b->waiters);
-	RB_CLEAR_NODE(&wait->node);
+	init_irq_work(&b->irq_work, signal_irq_work);
 
-	if (wait->tsk->state != TASK_RUNNING)
-		wake_up_process(wait->tsk); /* implicit smp_wmb() */
+	timer_setup(&b->fake_irq, intel_breadcrumbs_fake_irq, 0);
+	timer_setup(&b->hangcheck, intel_breadcrumbs_hangcheck, 0);
 }
 
-static inline void __intel_breadcrumbs_next(struct intel_engine_cs *engine,
-					    struct rb_node *next)
+static void cancel_fake_irq(struct intel_engine_cs *engine)
 {
 	struct intel_breadcrumbs *b = &engine->breadcrumbs;
 
-	spin_lock(&b->irq_lock);
-	GEM_BUG_ON(!b->irq_armed);
-	GEM_BUG_ON(!b->irq_wait);
-	b->irq_wait = to_wait(next);
-	spin_unlock(&b->irq_lock);
-
-	/* We always wake up the next waiter that takes over as the bottom-half
-	 * as we may delegate not only the irq-seqno barrier to the next waiter
-	 * but also the task of waking up concurrent waiters.
-	 */
-	if (next)
-		wake_up_process(to_wait(next)->tsk);
+	del_timer_sync(&b->fake_irq); /* may queue b->hangcheck */
+	del_timer_sync(&b->hangcheck);
+	clear_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings);
 }
 
-static bool __intel_engine_add_wait(struct intel_engine_cs *engine,
-				    struct intel_wait *wait)
+void intel_engine_reset_breadcrumbs(struct intel_engine_cs *engine)
 {
 	struct intel_breadcrumbs *b = &engine->breadcrumbs;
-	struct rb_node **p, *parent, *completed;
-	bool first, armed;
-	u32 seqno;
+	unsigned long flags;
 
-	GEM_BUG_ON(!wait->seqno);
+	spin_lock_irqsave(&b->irq_lock, flags);
 
-	/* Insert the request into the retirement ordered list
-	 * of waiters by walking the rbtree. If we are the oldest
-	 * seqno in the tree (the first to be retired), then
-	 * set ourselves as the bottom-half.
-	 *
-	 * As we descend the tree, prune completed branches since we hold the
-	 * spinlock we know that the first_waiter must be delayed and can
-	 * reduce some of the sequential wake up latency if we take action
-	 * ourselves and wake up the completed tasks in parallel. Also, by
-	 * removing stale elements in the tree, we may be able to reduce the
-	 * ping-pong between the old bottom-half and ourselves as first-waiter.
+	/*
+	 * Leave the fake_irq timer enabled (if it is running), but clear the
+	 * bit so that it turns itself off on its next wake up and goes back
+	 * to the long hangcheck interval if still required.
 	 */
-	armed = false;
-	first = true;
-	parent = NULL;
-	completed = NULL;
-	seqno = intel_engine_get_seqno(engine);
-
-	 /* If the request completed before we managed to grab the spinlock,
-	  * return now before adding ourselves to the rbtree. We let the
-	  * current bottom-half handle any pending wakeups and instead
-	  * try and get out of the way quickly.
-	  */
-	if (i915_seqno_passed(seqno, wait->seqno)) {
-		RB_CLEAR_NODE(&wait->node);
-		return first;
-	}
-
-	p = &b->waiters.rb_node;
-	while (*p) {
-		parent = *p;
-		if (wait->seqno == to_wait(parent)->seqno) {
-			/* We have multiple waiters on the same seqno, select
-			 * the highest priority task (that with the smallest
-			 * task->prio) to serve as the bottom-half for this
-			 * group.
-			 */
-			if (wait->tsk->prio > to_wait(parent)->tsk->prio) {
-				p = &parent->rb_right;
-				first = false;
-			} else {
-				p = &parent->rb_left;
-			}
-		} else if (i915_seqno_passed(wait->seqno,
-					     to_wait(parent)->seqno)) {
-			p = &parent->rb_right;
-			if (i915_seqno_passed(seqno, to_wait(parent)->seqno))
-				completed = parent;
-			else
-				first = false;
-		} else {
-			p = &parent->rb_left;
-		}
-	}
-	rb_link_node(&wait->node, parent, p);
-	rb_insert_color(&wait->node, &b->waiters);
-
-	if (first) {
-		spin_lock(&b->irq_lock);
-		b->irq_wait = wait;
-		/* After assigning ourselves as the new bottom-half, we must
-		 * perform a cursory check to prevent a missed interrupt.
-		 * Either we miss the interrupt whilst programming the hardware,
-		 * or if there was a previous waiter (for a later seqno) they
-		 * may be woken instead of us (due to the inherent race
-		 * in the unlocked read of b->irq_seqno_bh in the irq handler)
-		 * and so we miss the wake up.
-		 */
-		armed = __intel_breadcrumbs_enable_irq(b);
-		spin_unlock(&b->irq_lock);
-	}
-
-	if (completed) {
-		/* Advance the bottom-half (b->irq_wait) before we wake up
-		 * the waiters who may scribble over their intel_wait
-		 * just as the interrupt handler is dereferencing it via
-		 * b->irq_wait.
-		 */
-		if (!first) {
-			struct rb_node *next = rb_next(completed);
-			GEM_BUG_ON(next == &wait->node);
-			__intel_breadcrumbs_next(engine, next);
-		}
-
-		do {
-			struct intel_wait *crumb = to_wait(completed);
-			completed = rb_prev(completed);
-			__intel_breadcrumbs_finish(b, crumb);
-		} while (completed);
-	}
-
-	GEM_BUG_ON(!b->irq_wait);
-	GEM_BUG_ON(!b->irq_armed);
-	GEM_BUG_ON(rb_first(&b->waiters) != &b->irq_wait->node);
-
-	return armed;
-}
-
-bool intel_engine_add_wait(struct intel_engine_cs *engine,
-			   struct intel_wait *wait)
-{
-	struct intel_breadcrumbs *b = &engine->breadcrumbs;
-	bool armed;
-
-	spin_lock_irq(&b->rb_lock);
-	armed = __intel_engine_add_wait(engine, wait);
-	spin_unlock_irq(&b->rb_lock);
-	if (armed)
-		return armed;
-
-	/* Make the caller recheck if its request has already started. */
-	return intel_engine_has_started(engine, wait->seqno);
-}
-
-static inline bool chain_wakeup(struct rb_node *rb, int priority)
-{
-	return rb && to_wait(rb)->tsk->prio <= priority;
-}
+	clear_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings);
 
-static inline int wakeup_priority(struct intel_breadcrumbs *b,
-				  struct task_struct *tsk)
-{
-	if (tsk == b->signaler)
-		return INT_MIN;
+	if (b->irq_enabled)
+		irq_enable(engine);
 	else
-		return tsk->prio;
-}
-
-static void __intel_engine_remove_wait(struct intel_engine_cs *engine,
-				       struct intel_wait *wait)
-{
-	struct intel_breadcrumbs *b = &engine->breadcrumbs;
-
-	lockdep_assert_held(&b->rb_lock);
-
-	if (RB_EMPTY_NODE(&wait->node))
-		goto out;
-
-	if (b->irq_wait == wait) {
-		const int priority = wakeup_priority(b, wait->tsk);
-		struct rb_node *next;
-
-		/* We are the current bottom-half. Find the next candidate,
-		 * the first waiter in the queue on the remaining oldest
-		 * request. As multiple seqnos may complete in the time it
-		 * takes us to wake up and find the next waiter, we have to
-		 * wake up that waiter for it to perform its own coherent
-		 * completion check.
-		 */
-		next = rb_next(&wait->node);
-		if (chain_wakeup(next, priority)) {
-			/* If the next waiter is already complete,
-			 * wake it up and continue onto the next waiter. So
-			 * if have a small herd, they will wake up in parallel
-			 * rather than sequentially, which should reduce
-			 * the overall latency in waking all the completed
-			 * clients.
-			 *
-			 * However, waking up a chain adds extra latency to
-			 * the first_waiter. This is undesirable if that
-			 * waiter is a high priority task.
-			 */
-			u32 seqno = intel_engine_get_seqno(engine);
-
-			while (i915_seqno_passed(seqno, to_wait(next)->seqno)) {
-				struct rb_node *n = rb_next(next);
-
-				__intel_breadcrumbs_finish(b, to_wait(next));
-				next = n;
-				if (!chain_wakeup(next, priority))
-					break;
-			}
-		}
-
-		__intel_breadcrumbs_next(engine, next);
-	} else {
-		GEM_BUG_ON(rb_first(&b->waiters) == &wait->node);
-	}
-
-	GEM_BUG_ON(RB_EMPTY_NODE(&wait->node));
-	rb_erase(&wait->node, &b->waiters);
-	RB_CLEAR_NODE(&wait->node);
+		irq_disable(engine);
 
-out:
-	GEM_BUG_ON(b->irq_wait == wait);
-	GEM_BUG_ON(rb_first(&b->waiters) !=
-		   (b->irq_wait ? &b->irq_wait->node : NULL));
+	spin_unlock_irqrestore(&b->irq_lock, flags);
 }
 
-void intel_engine_remove_wait(struct intel_engine_cs *engine,
-			      struct intel_wait *wait)
+void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine)
 {
-	struct intel_breadcrumbs *b = &engine->breadcrumbs;
-
-	/* Quick check to see if this waiter was already decoupled from
-	 * the tree by the bottom-half to avoid contention on the spinlock
-	 * by the herd.
-	 */
-	if (RB_EMPTY_NODE(&wait->node)) {
-		GEM_BUG_ON(READ_ONCE(b->irq_wait) == wait);
-		return;
-	}
-
-	spin_lock_irq(&b->rb_lock);
-	__intel_engine_remove_wait(engine, wait);
-	spin_unlock_irq(&b->rb_lock);
+	cancel_fake_irq(engine);
 }
 
-static void signaler_set_rtpriority(void)
+bool i915_request_enable_breadcrumb(struct i915_request *rq)
 {
-	 struct sched_param param = { .sched_priority = 1 };
+	struct intel_breadcrumbs *b = &rq->engine->breadcrumbs;
 
-	 sched_setscheduler_nocheck(current, SCHED_FIFO, &param);
-}
+	GEM_BUG_ON(test_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags));
 
-static int intel_breadcrumbs_signaler(void *arg)
-{
-	struct intel_engine_cs *engine = arg;
-	struct intel_breadcrumbs *b = &engine->breadcrumbs;
-	struct i915_request *rq, *n;
-
-	/* Install ourselves with high priority to reduce signalling latency */
-	signaler_set_rtpriority();
+	if (!test_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags))
+		return true;
 
-	do {
-		bool do_schedule = true;
-		LIST_HEAD(list);
-		u32 seqno;
+	spin_lock(&b->irq_lock);
+	if (test_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags) &&
+	    !__request_completed(rq)) {
+		struct intel_context *ce = rq->hw_context;
+		struct list_head *pos;
 
-		set_current_state(TASK_INTERRUPTIBLE);
-		if (list_empty(&b->signals))
-			goto sleep;
+		__intel_breadcrumbs_arm_irq(b);
 
 		/*
-		 * We are either woken up by the interrupt bottom-half,
-		 * or by a client adding a new signaller. In both cases,
-		 * the GPU seqno may have advanced beyond our oldest signal.
-		 * If it has, propagate the signal, remove the waiter and
-		 * check again with the next oldest signal. Otherwise we
-		 * need to wait for a new interrupt from the GPU or for
-		 * a new client.
+		 * We keep the seqno in retirement order, so we can break
+		 * inside intel_engine_breadcrumbs_irq as soon as we've passed
+		 * the last completed request (or seen a request that hasn't
+		 * event started). We could iterate the timeline->requests list,
+		 * but keeping a separate signalers_list has the advantage of
+		 * hopefully being much smaller than the full list and so
+		 * provides faster iteration and detection when there are no
+		 * more interrupts required for this context.
+		 *
+		 * We typically expect to add new signalers in order, so we
+		 * start looking for our insertion point from the tail of
+		 * the list.
 		 */
-		seqno = intel_engine_get_seqno(engine);
-
-		spin_lock_irq(&b->rb_lock);
-		list_for_each_entry_safe(rq, n, &b->signals, signaling.link) {
-			u32 this = rq->signaling.wait.seqno;
-
-			GEM_BUG_ON(!rq->signaling.wait.seqno);
+		list_for_each_prev(pos, &ce->signals) {
+			struct i915_request *it =
+				list_entry(pos, typeof(*it), signal_link);
 
-			if (!i915_seqno_passed(seqno, this))
+			if (i915_seqno_passed(rq->fence.seqno, it->fence.seqno))
 				break;
-
-			if (likely(this == i915_request_global_seqno(rq))) {
-				__intel_engine_remove_wait(engine,
-							   &rq->signaling.wait);
-
-				rq->signaling.wait.seqno = 0;
-				__list_del_entry(&rq->signaling.link);
-
-				if (!i915_request_signaled(rq)) {
-					list_add_tail(&rq->signaling.link,
-						      &list);
-					i915_request_get(rq);
-				}
-			}
 		}
-		spin_unlock_irq(&b->rb_lock);
-
-		if (!list_empty(&list)) {
-			local_bh_disable();
-			list_for_each_entry_safe(rq, n, &list, signaling.link) {
-				dma_fence_signal(&rq->fence);
-				GEM_BUG_ON(!i915_request_completed(rq));
-				i915_request_put(rq);
-			}
-			local_bh_enable(); /* kick start the tasklets */
+		list_add(&rq->signal_link, pos);
+		if (pos == &ce->signals) /* catch transitions from empty list */
+			list_move_tail(&ce->signal_link, &b->signalers);
 
-			/*
-			 * If the engine is saturated we may be continually
-			 * processing completed requests. This angers the
-			 * NMI watchdog if we never let anything else
-			 * have access to the CPU. Let's pretend to be nice
-			 * and relinquish the CPU if we burn through the
-			 * entire RT timeslice!
-			 */
-			do_schedule = need_resched();
-		}
-
-		if (unlikely(do_schedule)) {
-sleep:
-			if (kthread_should_park())
-				kthread_parkme();
-
-			if (unlikely(kthread_should_stop()))
-				break;
-
-			schedule();
-		}
-	} while (1);
-	__set_current_state(TASK_RUNNING);
-
-	return 0;
-}
-
-static void insert_signal(struct intel_breadcrumbs *b,
-			  struct i915_request *request,
-			  const u32 seqno)
-{
-	struct i915_request *iter;
-
-	lockdep_assert_held(&b->rb_lock);
-
-	/*
-	 * A reasonable assumption is that we are called to add signals
-	 * in sequence, as the requests are submitted for execution and
-	 * assigned a global_seqno. This will be the case for the majority
-	 * of internally generated signals (inter-engine signaling).
-	 *
-	 * Out of order waiters triggering random signaling enabling will
-	 * be more problematic, but hopefully rare enough and the list
-	 * small enough that the O(N) insertion sort is not an issue.
-	 */
-
-	list_for_each_entry_reverse(iter, &b->signals, signaling.link)
-		if (i915_seqno_passed(seqno, iter->signaling.wait.seqno))
-			break;
-
-	list_add(&request->signaling.link, &iter->signaling.link);
-}
-
-bool intel_engine_enable_signaling(struct i915_request *request, bool wakeup)
-{
-	struct intel_engine_cs *engine = request->engine;
-	struct intel_breadcrumbs *b = &engine->breadcrumbs;
-	struct intel_wait *wait = &request->signaling.wait;
-	u32 seqno;
-
-	/*
-	 * Note that we may be called from an interrupt handler on another
-	 * device (e.g. nouveau signaling a fence completion causing us
-	 * to submit a request, and so enable signaling). As such,
-	 * we need to make sure that all other users of b->rb_lock protect
-	 * against interrupts, i.e. use spin_lock_irqsave.
-	 */
-
-	/* locked by dma_fence_enable_sw_signaling() (irqsafe fence->lock) */
-	GEM_BUG_ON(!irqs_disabled());
-	lockdep_assert_held(&request->lock);
-
-	seqno = i915_request_global_seqno(request);
-	if (!seqno) /* will be enabled later upon execution */
-		return true;
-
-	GEM_BUG_ON(wait->seqno);
-	wait->tsk = b->signaler;
-	wait->request = request;
-	wait->seqno = seqno;
-
-	/*
-	 * Add ourselves into the list of waiters, but registering our
-	 * bottom-half as the signaller thread. As per usual, only the oldest
-	 * waiter (not just signaller) is tasked as the bottom-half waking
-	 * up all completed waiters after the user interrupt.
-	 *
-	 * If we are the oldest waiter, enable the irq (after which we
-	 * must double check that the seqno did not complete).
-	 */
-	spin_lock(&b->rb_lock);
-	insert_signal(b, request, seqno);
-	wakeup &= __intel_engine_add_wait(engine, wait);
-	spin_unlock(&b->rb_lock);
-
-	if (wakeup) {
-		wake_up_process(b->signaler);
-		return !intel_wait_complete(wait);
+		set_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);
 	}
+	spin_unlock(&b->irq_lock);
 
-	return true;
+	return !__request_completed(rq);
 }
 
-void intel_engine_cancel_signaling(struct i915_request *request)
+void i915_request_cancel_breadcrumb(struct i915_request *rq)
 {
-	struct intel_engine_cs *engine = request->engine;
-	struct intel_breadcrumbs *b = &engine->breadcrumbs;
-
-	GEM_BUG_ON(!irqs_disabled());
-	lockdep_assert_held(&request->lock);
+	struct intel_breadcrumbs *b = &rq->engine->breadcrumbs;
 
-	if (!READ_ONCE(request->signaling.wait.seqno))
+	if (!test_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags))
 		return;
 
-	spin_lock(&b->rb_lock);
-	__intel_engine_remove_wait(engine, &request->signaling.wait);
-	if (fetch_and_zero(&request->signaling.wait.seqno))
-		__list_del_entry(&request->signaling.link);
-	spin_unlock(&b->rb_lock);
-}
-
-int intel_engine_init_breadcrumbs(struct intel_engine_cs *engine)
-{
-	struct intel_breadcrumbs *b = &engine->breadcrumbs;
-	struct task_struct *tsk;
-
-	spin_lock_init(&b->rb_lock);
-	spin_lock_init(&b->irq_lock);
-
-	timer_setup(&b->fake_irq, intel_breadcrumbs_fake_irq, 0);
-	timer_setup(&b->hangcheck, intel_breadcrumbs_hangcheck, 0);
-
-	INIT_LIST_HEAD(&b->signals);
-
-	/* Spawn a thread to provide a common bottom-half for all signals.
-	 * As this is an asynchronous interface we cannot steal the current
-	 * task for handling the bottom-half to the user interrupt, therefore
-	 * we create a thread to do the coherent seqno dance after the
-	 * interrupt and then signal the waitqueue (via the dma-buf/fence).
-	 */
-	tsk = kthread_run(intel_breadcrumbs_signaler, engine,
-			  "i915/signal:%d", engine->id);
-	if (IS_ERR(tsk))
-		return PTR_ERR(tsk);
-
-	b->signaler = tsk;
-
-	return 0;
-}
+	spin_lock(&b->irq_lock);
+	if (test_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags)) {
+		struct intel_context *ce = rq->hw_context;
 
-static void cancel_fake_irq(struct intel_engine_cs *engine)
-{
-	struct intel_breadcrumbs *b = &engine->breadcrumbs;
+		list_del(&rq->signal_link);
+		if (list_empty(&ce->signals))
+			list_del_init(&ce->signal_link);
 
-	del_timer_sync(&b->fake_irq); /* may queue b->hangcheck */
-	del_timer_sync(&b->hangcheck);
-	clear_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings);
+		clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);
+	}
+	spin_unlock(&b->irq_lock);
 }
 
-void intel_engine_reset_breadcrumbs(struct intel_engine_cs *engine)
+void intel_engine_print_breadcrumbs(struct intel_engine_cs *engine,
+				    struct drm_printer *p)
 {
 	struct intel_breadcrumbs *b = &engine->breadcrumbs;
-	unsigned long flags;
+	struct intel_context *ce;
+	struct i915_request *rq;
 
-	spin_lock_irqsave(&b->irq_lock, flags);
-
-	/*
-	 * Leave the fake_irq timer enabled (if it is running), but clear the
-	 * bit so that it turns itself off on its next wake up and goes back
-	 * to the long hangcheck interval if still required.
-	 */
-	clear_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings);
-
-	if (b->irq_enabled)
-		irq_enable(engine);
-	else
-		irq_disable(engine);
-
-	spin_unlock_irqrestore(&b->irq_lock, flags);
-}
-
-void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine)
-{
-	struct intel_breadcrumbs *b = &engine->breadcrumbs;
+	if (list_empty(&b->signalers))
+		return;
 
-	/* The engines should be idle and all requests accounted for! */
-	WARN_ON(READ_ONCE(b->irq_wait));
-	WARN_ON(!RB_EMPTY_ROOT(&b->waiters));
-	WARN_ON(!list_empty(&b->signals));
+	drm_printf(p, "Signals:\n");
 
-	if (!IS_ERR_OR_NULL(b->signaler))
-		kthread_stop(b->signaler);
+	spin_lock_irq(&b->irq_lock);
+	list_for_each_entry(ce, &b->signalers, signal_link) {
+		list_for_each_entry(rq, &ce->signals, signal_link) {
+			drm_printf(p, "\t[%llx:%llx%s] @ %dms\n",
+				   rq->fence.context, rq->fence.seqno,
+				   i915_request_completed(rq) ? "!" :
+				   i915_request_started(rq) ? "*" :
+				   "",
+				   jiffies_to_msecs(jiffies - rq->emitted_jiffies));
+		}
+	}
+	spin_unlock_irq(&b->irq_lock);
 
-	cancel_fake_irq(engine);
+	if (test_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings))
+		drm_printf(p, "Fake irq active\n");
 }
-
-#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
-#include "selftests/intel_breadcrumbs.c"
-#endif
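
As a side note, the list handling above reduces to an insertion sort walked
from the tail: ce->signals is kept in retirement (seqno) order and new
signalers almost always arrive in order. A rough userspace model of that
insertion, using simplified stand-in types rather than the kernel list API:

#include <stdbool.h>
#include <stdio.h>

/* toy doubly-linked list node; stands in for the request's signal_link */
struct sig {
	unsigned long long seqno;
	struct sig *prev, *next;
};

/* same wrap-safe comparison idea as i915_seqno_passed() */
static bool seqno_passed(unsigned long long a, unsigned long long b)
{
	return (long long)(a - b) >= 0;
}

static void list_init(struct sig *head)
{
	head->prev = head->next = head;
}

static void list_add_after(struct sig *pos, struct sig *node)
{
	node->prev = pos;
	node->next = pos->next;
	pos->next->prev = node;
	pos->next = node;
}

/*
 * Keep the list in retirement order, oldest at head->next. New signalers
 * normally arrive in order, so walk backwards from the tail until we find
 * an entry our seqno has passed, then insert after it.
 */
static void insert_signal(struct sig *head, struct sig *rq)
{
	struct sig *pos;

	for (pos = head->prev; pos != head; pos = pos->prev)
		if (seqno_passed(rq->seqno, pos->seqno))
			break;

	list_add_after(pos, rq);
}

int main(void)
{
	struct sig head, a = { .seqno = 1 }, b = { .seqno = 3 }, c = { .seqno = 2 };
	struct sig *it;

	list_init(&head);
	insert_signal(&head, &a);
	insert_signal(&head, &b);
	insert_signal(&head, &c); /* out of order: lands between a and b */

	for (it = head.next; it != &head; it = it->next)
		printf("%llu\n", it->seqno); /* prints 1 2 3 */

	return 0;
}
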
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 2a4c547240a1..1d9157bf96ae 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -458,12 +458,6 @@ int intel_engines_init(struct drm_i915_private *dev_priv)
 void intel_engine_write_global_seqno(struct intel_engine_cs *engine, u32 seqno)
 {
 	intel_write_status_page(engine, I915_GEM_HWS_INDEX, seqno);
-
-	/* After manually advancing the seqno, fake the interrupt in case
-	 * there are any waiters for that seqno.
-	 */
-	intel_engine_wakeup(engine);
-
 	GEM_BUG_ON(intel_engine_get_seqno(engine) != seqno);
 }
 
@@ -667,16 +661,10 @@ int intel_engine_init_common(struct intel_engine_cs *engine)
 		}
 	}
 
-	ret = intel_engine_init_breadcrumbs(engine);
-	if (ret)
-		goto err_unpin_preempt;
+	intel_engine_init_breadcrumbs(engine);
 
 	return 0;
 
-err_unpin_preempt:
-	if (i915->preempt_context)
-		__intel_context_unpin(i915->preempt_context, engine);
-
 err_unpin_kernel:
 	__intel_context_unpin(i915->kernel_context, engine);
 	return ret;
@@ -1236,12 +1224,14 @@ static void print_request(struct drm_printer *m,
 
 	x = print_sched_attr(rq->i915, &rq->sched.attr, buf, x, sizeof(buf));
 
-	drm_printf(m, "%s%x%s [%llx:%llx]%s @ %dms: %s\n",
+	drm_printf(m, "%s%x%s%s [%llx:%llx]%s @ %dms: %s\n",
 		   prefix,
 		   rq->global_seqno,
 		   i915_request_completed(rq) ? "!" :
 		   i915_request_started(rq) ? "*" :
 		   "",
+		   test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
+			    &rq->fence.flags) ?  "+" : "",
 		   rq->fence.context, rq->fence.seqno,
 		   buf,
 		   jiffies_to_msecs(jiffies - rq->emitted_jiffies),
@@ -1434,12 +1424,9 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 		       struct drm_printer *m,
 		       const char *header, ...)
 {
-	struct intel_breadcrumbs * const b = &engine->breadcrumbs;
 	struct i915_gpu_error * const error = &engine->i915->gpu_error;
 	struct i915_request *rq;
 	intel_wakeref_t wakeref;
-	unsigned long flags;
-	struct rb_node *rb;
 
 	if (header) {
 		va_list ap;
@@ -1507,21 +1494,12 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 
 	intel_execlists_show_requests(engine, m, print_request, 8);
 
-	spin_lock_irqsave(&b->rb_lock, flags);
-	for (rb = rb_first(&b->waiters); rb; rb = rb_next(rb)) {
-		struct intel_wait *w = rb_entry(rb, typeof(*w), node);
-
-		drm_printf(m, "\t%s [%d:%c] waiting for %x\n",
-			   w->tsk->comm, w->tsk->pid,
-			   task_state_to_char(w->tsk),
-			   w->seqno);
-	}
-	spin_unlock_irqrestore(&b->rb_lock, flags);
-
 	drm_printf(m, "HWSP:\n");
 	hexdump(m, engine->status_page.addr, PAGE_SIZE);
 
 	drm_printf(m, "Idle? %s\n", yesno(intel_engine_is_idle(engine)));
+
+	intel_engine_print_breadcrumbs(engine, m);
 }
 
 static u8 user_class_map[] = {
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index f6b30eb46263..bd44ea41d7ca 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -483,8 +483,8 @@ static void gen7_xcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
 
 	for (i = 0; i < GEN7_XCS_WA; i++) {
 		*cs++ = MI_STORE_DWORD_INDEX;
-		*cs++ = I915_GEM_HWS_INDEX_ADDR;
-		*cs++ = rq->global_seqno;
+		*cs++ = I915_GEM_HWS_SEQNO_ADDR;
+		*cs++ = rq->fence.seqno;
 	}
 
 	*cs++ = MI_FLUSH_DW;
@@ -734,7 +734,7 @@ static int init_ring_common(struct intel_engine_cs *engine)
 	}
 
 	/* Papering over lost _interrupts_ immediately following the restart */
-	intel_engine_wakeup(engine);
+	intel_engine_queue_breadcrumbs(engine);
 out:
 	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index d3d4f3667afb..e4ccc01b06ec 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -5,6 +5,7 @@
 #include <drm/drm_util.h>
 
 #include <linux/hashtable.h>
+#include <linux/irq_work.h>
 #include <linux/seqlock.h>
 
 #include "i915_gem_batch_pool.h"
@@ -376,22 +377,19 @@ struct intel_engine_cs {
 	 * the overhead of waking that client is much preferred.
 	 */
 	struct intel_breadcrumbs {
-		spinlock_t irq_lock; /* protects irq_*; irqsafe */
-		struct intel_wait *irq_wait; /* oldest waiter by retirement */
+		spinlock_t irq_lock;
+		struct list_head signalers;
 
-		spinlock_t rb_lock; /* protects the rb and wraps irq_lock */
-		struct rb_root waiters; /* sorted by retirement, priority */
-		struct list_head signals; /* sorted by retirement */
-		struct task_struct *signaler; /* used for fence signalling */
+		struct irq_work irq_work; /* for use from inside irq_lock */
 
 		struct timer_list fake_irq; /* used after a missed interrupt */
 		struct timer_list hangcheck; /* detect missed interrupts */
 
 		unsigned int hangcheck_interrupts;
 		unsigned int irq_enabled;
-		unsigned int irq_count;
 
-		bool irq_armed : 1;
+		bool irq_armed;
+		bool irq_fired;
 	} breadcrumbs;
 
 	struct {
@@ -880,83 +878,29 @@ static inline bool intel_engine_has_started(struct intel_engine_cs *engine,
 void intel_engine_get_instdone(struct intel_engine_cs *engine,
 			       struct intel_instdone *instdone);
 
-/* intel_breadcrumbs.c -- user interrupt bottom-half for waiters */
-int intel_engine_init_breadcrumbs(struct intel_engine_cs *engine);
-
-static inline void intel_wait_init(struct intel_wait *wait)
-{
-	wait->tsk = current;
-	wait->request = NULL;
-}
-
-static inline void intel_wait_init_for_seqno(struct intel_wait *wait, u32 seqno)
-{
-	wait->tsk = current;
-	wait->seqno = seqno;
-}
-
-static inline bool intel_wait_has_seqno(const struct intel_wait *wait)
-{
-	return wait->seqno;
-}
-
-static inline bool
-intel_wait_update_seqno(struct intel_wait *wait, u32 seqno)
-{
-	wait->seqno = seqno;
-	return intel_wait_has_seqno(wait);
-}
-
-static inline bool
-intel_wait_update_request(struct intel_wait *wait,
-			  const struct i915_request *rq)
-{
-	return intel_wait_update_seqno(wait, i915_request_global_seqno(rq));
-}
-
-static inline bool
-intel_wait_check_seqno(const struct intel_wait *wait, u32 seqno)
-{
-	return wait->seqno == seqno;
-}
-
-static inline bool
-intel_wait_check_request(const struct intel_wait *wait,
-			 const struct i915_request *rq)
-{
-	return intel_wait_check_seqno(wait, i915_request_global_seqno(rq));
-}
+void intel_engine_init_breadcrumbs(struct intel_engine_cs *engine);
+void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine);
 
-static inline bool intel_wait_complete(const struct intel_wait *wait)
-{
-	return RB_EMPTY_NODE(&wait->node);
-}
+void intel_engine_pin_breadcrumbs_irq(struct intel_engine_cs *engine);
+void intel_engine_unpin_breadcrumbs_irq(struct intel_engine_cs *engine);
 
-bool intel_engine_add_wait(struct intel_engine_cs *engine,
-			   struct intel_wait *wait);
-void intel_engine_remove_wait(struct intel_engine_cs *engine,
-			      struct intel_wait *wait);
-bool intel_engine_enable_signaling(struct i915_request *request, bool wakeup);
-void intel_engine_cancel_signaling(struct i915_request *request);
+bool intel_engine_signal_breadcrumbs(struct intel_engine_cs *engine);
+void intel_engine_disarm_breadcrumbs(struct intel_engine_cs *engine);
 
-static inline bool intel_engine_has_waiter(const struct intel_engine_cs *engine)
+static inline void
+intel_engine_queue_breadcrumbs(struct intel_engine_cs *engine)
 {
-	return READ_ONCE(engine->breadcrumbs.irq_wait);
+	irq_work_queue(&engine->breadcrumbs.irq_work);
 }
 
-unsigned int intel_engine_wakeup(struct intel_engine_cs *engine);
-#define ENGINE_WAKEUP_WAITER BIT(0)
-#define ENGINE_WAKEUP_ASLEEP BIT(1)
-
-void intel_engine_pin_breadcrumbs_irq(struct intel_engine_cs *engine);
-void intel_engine_unpin_breadcrumbs_irq(struct intel_engine_cs *engine);
-
-void __intel_engine_disarm_breadcrumbs(struct intel_engine_cs *engine);
-void intel_engine_disarm_breadcrumbs(struct intel_engine_cs *engine);
+bool intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine);
 
 void intel_engine_reset_breadcrumbs(struct intel_engine_cs *engine);
 void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine);
 
+void intel_engine_print_breadcrumbs(struct intel_engine_cs *engine,
+				    struct drm_printer *p);
+
 static inline u32 *gen8_emit_pipe_control(u32 *batch, u32 flags, u32 offset)
 {
 	memset(batch, 0, 6 * sizeof(u32));
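
With the signaler kthread gone, the irq_work in struct intel_breadcrumbs is
the usual deferral pattern: the hard interrupt handler only queues the work,
and the fence signalling runs shortly afterwards in irq_work context. A
bare-bones sketch of that pattern; the demo_* names here are invented for
illustration and are not the i915 code:

#include <linux/irq_work.h>
#include <linux/kernel.h>
#include <linux/spinlock.h>

struct demo_engine {
	spinlock_t irq_lock;
	struct irq_work irq_work;
	/* list of contexts with requests awaiting signalling would live here */
};

static void demo_signal_breadcrumbs(struct irq_work *work)
{
	struct demo_engine *e = container_of(work, struct demo_engine, irq_work);

	spin_lock(&e->irq_lock);
	/* walk the signalers list and dma_fence_signal() completed requests */
	spin_unlock(&e->irq_lock);
}

static void demo_engine_init(struct demo_engine *e)
{
	spin_lock_init(&e->irq_lock);
	init_irq_work(&e->irq_work, demo_signal_breadcrumbs);
}

/* called from the user interrupt: do the bare minimum, defer the rest */
static void demo_user_irq(struct demo_engine *e)
{
	irq_work_queue(&e->irq_work);
}
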
diff --git a/drivers/gpu/drm/i915/selftests/i915_mock_selftests.h b/drivers/gpu/drm/i915/selftests/i915_mock_selftests.h
index 4a83a1c6c406..88e5ab586337 100644
--- a/drivers/gpu/drm/i915/selftests/i915_mock_selftests.h
+++ b/drivers/gpu/drm/i915/selftests/i915_mock_selftests.h
@@ -15,7 +15,6 @@ selftest(scatterlist, scatterlist_mock_selftests)
 selftest(syncmap, i915_syncmap_mock_selftests)
 selftest(uncore, intel_uncore_mock_selftests)
 selftest(engine, intel_engine_cs_mock_selftests)
-selftest(breadcrumbs, intel_breadcrumbs_mock_selftests)
 selftest(timelines, i915_timeline_mock_selftests)
 selftest(requests, i915_request_mock_selftests)
 selftest(objects, i915_gem_object_mock_selftests)
diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c
index 4d4b86b5fa11..013bef479475 100644
--- a/drivers/gpu/drm/i915/selftests/i915_request.c
+++ b/drivers/gpu/drm/i915/selftests/i915_request.c
@@ -25,9 +25,12 @@
 #include <linux/prime_numbers.h>
 
 #include "../i915_selftest.h"
+#include "i915_random.h"
 #include "igt_live_test.h"
+#include "lib_sw_fence.h"
 
 #include "mock_context.h"
+#include "mock_drm.h"
 #include "mock_gem_device.h"
 
 static int igt_add_request(void *arg)
@@ -247,6 +250,253 @@ static int igt_request_rewind(void *arg)
 	return err;
 }
 
+struct smoketest {
+	struct intel_engine_cs *engine;
+	struct i915_gem_context **contexts;
+	unsigned int ncontexts, max_batch;
+	atomic_long_t num_waits, num_fences;
+	struct i915_request *(*request_alloc)(struct i915_gem_context *,
+					      struct intel_engine_cs *);
+};
+
+static struct i915_request *
+__mock_request_alloc(struct i915_gem_context *ctx,
+		     struct intel_engine_cs *engine)
+{
+	return mock_request(engine, ctx, 0);
+}
+
+static struct i915_request *
+__live_request_alloc(struct i915_gem_context *ctx,
+		     struct intel_engine_cs *engine)
+{
+	return i915_request_alloc(engine, ctx);
+}
+
+static int __igt_breadcrumbs_smoketest(void *arg)
+{
+	struct smoketest *t = arg;
+	struct mutex *BKL = &t->engine->i915->drm.struct_mutex;
+	struct i915_request **requests;
+	I915_RND_STATE(prng);
+	const unsigned int total = 4 * t->ncontexts + 1;
+	const unsigned int max_batch = min(t->ncontexts, t->max_batch) - 1;
+	unsigned int num_waits = 0, num_fences = 0;
+	unsigned int *order;
+	int err = 0;
+
+	/*
+	 * A very simple test to catch the most egregious of list handling bugs.
+	 *
+	 * At its heart, we simply create oodles of requests running across
+	 * multiple kthreads and enable signaling on them, for the sole purpose
+	 * of stressing our breadcrumb handling. The only inspection we do is
+	 * that the fences were marked as signaled.
+	 */
+
+	requests = kmalloc_array(total, sizeof(*requests), GFP_KERNEL);
+	if (!requests)
+		return -ENOMEM;
+
+	order = i915_random_order(total, &prng);
+	if (!order) {
+		err = -ENOMEM;
+		goto out_requests;
+	}
+
+	while (!kthread_should_stop()) {
+		struct i915_sw_fence *submit, *wait;
+		unsigned int n, count;
+
+		submit = heap_fence_create(GFP_KERNEL);
+		if (!submit) {
+			err = -ENOMEM;
+			break;
+		}
+
+		wait = heap_fence_create(GFP_KERNEL);
+		if (!wait) {
+			i915_sw_fence_commit(submit);
+			heap_fence_put(submit);
+			err = -ENOMEM;
+			break;
+		}
+
+		i915_random_reorder(order, total, &prng);
+		count = 1 + i915_prandom_u32_max_state(max_batch, &prng);
+
+		for (n = 0; n < count; n++) {
+			struct i915_gem_context *ctx =
+				t->contexts[order[n] % t->ncontexts];
+			struct i915_request *rq;
+
+			mutex_lock(BKL);
+
+			rq = t->request_alloc(ctx, t->engine);
+			if (IS_ERR(rq)) {
+				mutex_unlock(BKL);
+				err = PTR_ERR(rq);
+				count = n;
+				break;
+			}
+
+			err = i915_sw_fence_await_sw_fence_gfp(&rq->submit,
+							       submit,
+							       GFP_KERNEL);
+
+			requests[n] = i915_request_get(rq);
+			i915_request_add(rq);
+
+			mutex_unlock(BKL);
+
+			if (err >= 0)
+				err = i915_sw_fence_await_dma_fence(wait,
+								    &rq->fence,
+								    0,
+								    GFP_KERNEL);
+			if (err < 0) {
+				i915_request_put(rq);
+				count = n;
+				break;
+			}
+		}
+
+		i915_sw_fence_commit(submit);
+		i915_sw_fence_commit(wait);
+
+		if (!wait_event_timeout(wait->wait,
+					i915_sw_fence_done(wait),
+					HZ / 2)) {
+			struct i915_request *rq = requests[count - 1];
+
+			pr_err("waiting for %d fences (last %llx:%lld) on %s timed out!\n",
+			       count,
+			       rq->fence.context, rq->fence.seqno,
+			       t->engine->name);
+			i915_gem_set_wedged(t->engine->i915);
+			GEM_BUG_ON(!i915_request_completed(rq));
+			i915_sw_fence_wait(wait);
+			err = -EIO;
+		}
+
+		for (n = 0; n < count; n++) {
+			struct i915_request *rq = requests[n];
+
+			if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
+				      &rq->fence.flags)) {
+				pr_err("%llu:%llu was not signaled!\n",
+				       rq->fence.context, rq->fence.seqno);
+				err = -EINVAL;
+			}
+
+			i915_request_put(rq);
+		}
+
+		heap_fence_put(wait);
+		heap_fence_put(submit);
+
+		if (err < 0)
+			break;
+
+		num_fences += count;
+		num_waits++;
+
+		cond_resched();
+	}
+
+	atomic_long_add(num_fences, &t->num_fences);
+	atomic_long_add(num_waits, &t->num_waits);
+
+	kfree(order);
+out_requests:
+	kfree(requests);
+	return err;
+}
+
+static int mock_breadcrumbs_smoketest(void *arg)
+{
+	struct drm_i915_private *i915 = arg;
+	struct smoketest t = {
+		.engine = i915->engine[RCS],
+		.ncontexts = 1024,
+		.max_batch = 1024,
+		.request_alloc = __mock_request_alloc
+	};
+	unsigned int ncpus = num_online_cpus();
+	struct task_struct **threads;
+	unsigned int n;
+	int ret = 0;
+
+	/*
+	 * Smoketest our breadcrumb/signal handling for requests across multiple
+	 * threads. A very simple test to only catch the most egregious of bugs.
+	 * See __igt_breadcrumbs_smoketest();
+	 */
+
+	threads = kmalloc_array(ncpus, sizeof(*threads), GFP_KERNEL);
+	if (!threads)
+		return -ENOMEM;
+
+	t.contexts =
+		kmalloc_array(t.ncontexts, sizeof(*t.contexts), GFP_KERNEL);
+	if (!t.contexts) {
+		ret = -ENOMEM;
+		goto out_threads;
+	}
+
+	mutex_lock(&t.engine->i915->drm.struct_mutex);
+	for (n = 0; n < t.ncontexts; n++) {
+		t.contexts[n] = mock_context(t.engine->i915, "mock");
+		if (!t.contexts[n]) {
+			ret = -ENOMEM;
+			goto out_contexts;
+		}
+	}
+
+	for (n = 0; n < ncpus; n++) {
+		threads[n] = kthread_run(__igt_breadcrumbs_smoketest,
+					 &t, "igt/%d", n);
+		if (IS_ERR(threads[n])) {
+			ret = PTR_ERR(threads[n]);
+			ncpus = n;
+			break;
+		}
+
+		get_task_struct(threads[n]);
+	}
+	mutex_unlock(&t.engine->i915->drm.struct_mutex);
+
+	msleep(jiffies_to_msecs(i915_selftest.timeout_jiffies));
+
+	for (n = 0; n < ncpus; n++) {
+		int err;
+
+		err = kthread_stop(threads[n]);
+		if (err < 0 && !ret)
+			ret = err;
+
+		put_task_struct(threads[n]);
+	}
+	pr_info("Completed %lu waits for %lu fences across %d cpus\n",
+		atomic_long_read(&t.num_waits),
+		atomic_long_read(&t.num_fences),
+		ncpus);
+
+	mutex_lock(&t.engine->i915->drm.struct_mutex);
+out_contexts:
+	for (n = 0; n < t.ncontexts; n++) {
+		if (!t.contexts[n])
+			break;
+		mock_context_close(t.contexts[n]);
+	}
+	mutex_unlock(&t.engine->i915->drm.struct_mutex);
+	kfree(t.contexts);
+out_threads:
+	kfree(threads);
+
+	return ret;
+}
+
 int i915_request_mock_selftests(void)
 {
 	static const struct i915_subtest tests[] = {
@@ -254,6 +504,7 @@ int i915_request_mock_selftests(void)
 		SUBTEST(igt_wait_request),
 		SUBTEST(igt_fence_wait),
 		SUBTEST(igt_request_rewind),
+		SUBTEST(mock_breadcrumbs_smoketest),
 	};
 	struct drm_i915_private *i915;
 	intel_wakeref_t wakeref;
@@ -812,6 +1063,174 @@ static int live_sequential_engines(void *arg)
 	return err;
 }
 
+static int
+max_batches(struct i915_gem_context *ctx, struct intel_engine_cs *engine)
+{
+	struct i915_request *rq;
+	int ret;
+
+	/*
+	 * Before execlists, all contexts share the same ringbuffer. With
+	 * execlists, each context/engine has a separate ringbuffer and
+	 * for the purposes of this test, inexhaustible.
+	 *
+	 * For the global ringbuffer though, we have to be very careful
+	 * that we do not wrap while preventing the execution of requests
+	 * with an unsignaled fence.
+	 */
+	if (HAS_EXECLISTS(ctx->i915))
+		return INT_MAX;
+
+	rq = i915_request_alloc(engine, ctx);
+	if (IS_ERR(rq)) {
+		ret = PTR_ERR(rq);
+	} else {
+		int sz;
+
+		ret = rq->ring->size - rq->reserved_space;
+		i915_request_add(rq);
+
+		sz = rq->ring->emit - rq->head;
+		if (sz < 0)
+			sz += rq->ring->size;
+		ret /= sz;
+		ret /= 2; /* leave half spare, in case of emergency! */
+
+		/* One ring interleaved between requests from all cpus */
+		ret /= num_online_cpus() + 1;
+	}
+
+	return ret;
+}
+
+static int live_breadcrumbs_smoketest(void *arg)
+{
+	struct drm_i915_private *i915 = arg;
+	struct smoketest t[I915_NUM_ENGINES];
+	unsigned int ncpus = num_online_cpus();
+	unsigned long num_waits, num_fences;
+	struct intel_engine_cs *engine;
+	struct task_struct **threads;
+	struct igt_live_test live;
+	enum intel_engine_id id;
+	intel_wakeref_t wakeref;
+	struct drm_file *file;
+	unsigned int n;
+	int ret = 0;
+
+	/*
+	 * Smoketest our breadcrumb/signal handling for requests across multiple
+	 * threads. A very simple test to only catch the most egregious of bugs.
+	 * See __igt_breadcrumbs_smoketest();
+	 *
+	 * On real hardware this time.
+	 */
+
+	wakeref = intel_runtime_pm_get(i915);
+
+	file = mock_file(i915);
+	if (IS_ERR(file)) {
+		ret = PTR_ERR(file);
+		goto out_rpm;
+	}
+
+	threads = kcalloc(ncpus * I915_NUM_ENGINES,
+			  sizeof(*threads),
+			  GFP_KERNEL);
+	if (!threads) {
+		ret = -ENOMEM;
+		goto out_threads;
+	}
+
+	memset(&t[0], 0, sizeof(t[0]));
+	t[0].request_alloc = __live_request_alloc;
+	t[0].ncontexts = 64;
+	t[0].contexts = kmalloc_array(t[0].ncontexts,
+				      sizeof(*t[0].contexts),
+				      GFP_KERNEL);
+	if (!t[0].contexts) {
+		ret = -ENOMEM;
+		goto out_threads;
+	}
+
+	mutex_lock(&i915->drm.struct_mutex);
+	for (n = 0; n < t[0].ncontexts; n++) {
+		t[0].contexts[n] = live_context(i915, file);
+		if (!t[0].contexts[n]) {
+			ret = -ENOMEM;
+			goto out_contexts;
+		}
+	}
+
+	ret = igt_live_test_begin(&live, i915, __func__, "");
+	if (ret)
+		goto out_contexts;
+
+	for_each_engine(engine, i915, id) {
+		t[id] = t[0];
+		t[id].engine = engine;
+		t[id].max_batch = max_batches(t[0].contexts[0], engine);
+		if (t[id].max_batch < 0) {
+			ret = t[id].max_batch;
+			mutex_unlock(&i915->drm.struct_mutex);
+			goto out_flush;
+		}
+		pr_debug("Limiting batches to %d requests on %s\n",
+			 t[id].max_batch, engine->name);
+
+		for (n = 0; n < ncpus; n++) {
+			struct task_struct *tsk;
+
+			tsk = kthread_run(__igt_breadcrumbs_smoketest,
+					  &t[id], "igt/%d.%d", id, n);
+			if (IS_ERR(tsk)) {
+				ret = PTR_ERR(tsk);
+				mutex_unlock(&i915->drm.struct_mutex);
+				goto out_flush;
+			}
+
+			get_task_struct(tsk);
+			threads[id * ncpus + n] = tsk;
+		}
+	}
+	mutex_unlock(&i915->drm.struct_mutex);
+
+	msleep(jiffies_to_msecs(i915_selftest.timeout_jiffies));
+
+out_flush:
+	num_waits = 0;
+	num_fences = 0;
+	for_each_engine(engine, i915, id) {
+		for (n = 0; n < ncpus; n++) {
+			struct task_struct *tsk = threads[id * ncpus + n];
+			int err;
+
+			if (!tsk)
+				continue;
+
+			err = kthread_stop(tsk);
+			if (err < 0 && !ret)
+				ret = err;
+
+			put_task_struct(tsk);
+		}
+
+		num_waits += atomic_long_read(&t[id].num_waits);
+		num_fences += atomic_long_read(&t[id].num_fences);
+	}
+	pr_info("Completed %lu waits for %lu fences across %d engines and %d cpus\n",
+		num_waits, num_fences, RUNTIME_INFO(i915)->num_rings, ncpus);
+
+	mutex_lock(&i915->drm.struct_mutex);
+	ret = igt_live_test_end(&live) ?: ret;
+out_contexts:
+	mutex_unlock(&i915->drm.struct_mutex);
+	kfree(t[0].contexts);
+out_threads:
+	kfree(threads);
+	mock_file_free(i915, file);
+out_rpm:
+	intel_runtime_pm_put(i915, wakeref);
+
+	return ret;
+}
+
 int i915_request_live_selftests(struct drm_i915_private *i915)
 {
 	static const struct i915_subtest tests[] = {
@@ -819,6 +1238,7 @@ int i915_request_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(live_all_engines),
 		SUBTEST(live_sequential_engines),
 		SUBTEST(live_empty_request),
+		SUBTEST(live_breadcrumbs_smoketest),
 	};
 
 	if (i915_terminally_wedged(&i915->gpu_error))
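
For a feel of the max_batches() limit on legacy (non-execlists) rings, a
back-of-the-envelope sketch with made-up figures; the real values come from
the probe request emitted above:

static int demo_max_batches(void)
{
	/* hypothetical figures, purely illustrative */
	const int ring_size = 16 << 10;	/* 16 KiB legacy ring */
	const int reserved = 160;	/* rq->reserved_space */
	const int rq_bytes = 192;	/* bytes emitted per request */
	const int ncpus = 4;
	int max;

	max = (ring_size - reserved) / rq_bytes;	/* 84 requests fit */
	max /= 2;					/* leave half spare: 42 */
	max /= ncpus + 1;				/* ring shared by all threads: 8 */

	return max;
}
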
diff --git a/drivers/gpu/drm/i915/selftests/igt_spinner.c b/drivers/gpu/drm/i915/selftests/igt_spinner.c
index 0e70df0230b8..9ebd9225684e 100644
--- a/drivers/gpu/drm/i915/selftests/igt_spinner.c
+++ b/drivers/gpu/drm/i915/selftests/igt_spinner.c
@@ -185,11 +185,6 @@ void igt_spinner_fini(struct igt_spinner *spin)
 
 bool igt_wait_for_spinner(struct igt_spinner *spin, struct i915_request *rq)
 {
-	if (!wait_event_timeout(rq->execute,
-				READ_ONCE(rq->global_seqno),
-				msecs_to_jiffies(10)))
-		return false;
-
 	return !(wait_for_us(i915_seqno_passed(hws_seqno(spin, rq),
 					       rq->fence.seqno),
 			     10) &&
diff --git a/drivers/gpu/drm/i915/selftests/intel_breadcrumbs.c b/drivers/gpu/drm/i915/selftests/intel_breadcrumbs.c
deleted file mode 100644
index f03b407fdbe2..000000000000
--- a/drivers/gpu/drm/i915/selftests/intel_breadcrumbs.c
+++ /dev/null
@@ -1,470 +0,0 @@
-/*
- * Copyright © 2016 Intel Corporation
- *
- * Permission is hereby granted, free of charge, to any person obtaining a
- * copy of this software and associated documentation files (the "Software"),
- * to deal in the Software without restriction, including without limitation
- * the rights to use, copy, modify, merge, publish, distribute, sublicense,
- * and/or sell copies of the Software, and to permit persons to whom the
- * Software is furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice (including the next
- * paragraph) shall be included in all copies or substantial portions of the
- * Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
- * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
- * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
- * IN THE SOFTWARE.
- *
- */
-
-#include "../i915_selftest.h"
-#include "i915_random.h"
-
-#include "mock_gem_device.h"
-#include "mock_engine.h"
-
-static int check_rbtree(struct intel_engine_cs *engine,
-			const unsigned long *bitmap,
-			const struct intel_wait *waiters,
-			const int count)
-{
-	struct intel_breadcrumbs *b = &engine->breadcrumbs;
-	struct rb_node *rb;
-	int n;
-
-	if (&b->irq_wait->node != rb_first(&b->waiters)) {
-		pr_err("First waiter does not match first element of wait-tree\n");
-		return -EINVAL;
-	}
-
-	n = find_first_bit(bitmap, count);
-	for (rb = rb_first(&b->waiters); rb; rb = rb_next(rb)) {
-		struct intel_wait *w = container_of(rb, typeof(*w), node);
-		int idx = w - waiters;
-
-		if (!test_bit(idx, bitmap)) {
-			pr_err("waiter[%d, seqno=%d] removed but still in wait-tree\n",
-			       idx, w->seqno);
-			return -EINVAL;
-		}
-
-		if (n != idx) {
-			pr_err("waiter[%d, seqno=%d] does not match expected next element in tree [%d]\n",
-			       idx, w->seqno, n);
-			return -EINVAL;
-		}
-
-		n = find_next_bit(bitmap, count, n + 1);
-	}
-
-	return 0;
-}
-
-static int check_completion(struct intel_engine_cs *engine,
-			    const unsigned long *bitmap,
-			    const struct intel_wait *waiters,
-			    const int count)
-{
-	int n;
-
-	for (n = 0; n < count; n++) {
-		if (intel_wait_complete(&waiters[n]) != !!test_bit(n, bitmap))
-			continue;
-
-		pr_err("waiter[%d, seqno=%d] is %s, but expected %s\n",
-		       n, waiters[n].seqno,
-		       intel_wait_complete(&waiters[n]) ? "complete" : "active",
-		       test_bit(n, bitmap) ? "active" : "complete");
-		return -EINVAL;
-	}
-
-	return 0;
-}
-
-static int check_rbtree_empty(struct intel_engine_cs *engine)
-{
-	struct intel_breadcrumbs *b = &engine->breadcrumbs;
-
-	if (b->irq_wait) {
-		pr_err("Empty breadcrumbs still has a waiter\n");
-		return -EINVAL;
-	}
-
-	if (!RB_EMPTY_ROOT(&b->waiters)) {
-		pr_err("Empty breadcrumbs, but wait-tree not empty\n");
-		return -EINVAL;
-	}
-
-	return 0;
-}
-
-static int igt_random_insert_remove(void *arg)
-{
-	const u32 seqno_bias = 0x1000;
-	I915_RND_STATE(prng);
-	struct intel_engine_cs *engine = arg;
-	struct intel_wait *waiters;
-	const int count = 4096;
-	unsigned int *order;
-	unsigned long *bitmap;
-	int err = -ENOMEM;
-	int n;
-
-	mock_engine_reset(engine);
-
-	waiters = kvmalloc_array(count, sizeof(*waiters), GFP_KERNEL);
-	if (!waiters)
-		goto out_engines;
-
-	bitmap = kcalloc(DIV_ROUND_UP(count, BITS_PER_LONG), sizeof(*bitmap),
-			 GFP_KERNEL);
-	if (!bitmap)
-		goto out_waiters;
-
-	order = i915_random_order(count, &prng);
-	if (!order)
-		goto out_bitmap;
-
-	for (n = 0; n < count; n++)
-		intel_wait_init_for_seqno(&waiters[n], seqno_bias + n);
-
-	err = check_rbtree(engine, bitmap, waiters, count);
-	if (err)
-		goto out_order;
-
-	/* Add and remove waiters into the rbtree in random order. At each
-	 * step, we verify that the rbtree is correctly ordered.
-	 */
-	for (n = 0; n < count; n++) {
-		int i = order[n];
-
-		intel_engine_add_wait(engine, &waiters[i]);
-		__set_bit(i, bitmap);
-
-		err = check_rbtree(engine, bitmap, waiters, count);
-		if (err)
-			goto out_order;
-	}
-
-	i915_random_reorder(order, count, &prng);
-	for (n = 0; n < count; n++) {
-		int i = order[n];
-
-		intel_engine_remove_wait(engine, &waiters[i]);
-		__clear_bit(i, bitmap);
-
-		err = check_rbtree(engine, bitmap, waiters, count);
-		if (err)
-			goto out_order;
-	}
-
-	err = check_rbtree_empty(engine);
-out_order:
-	kfree(order);
-out_bitmap:
-	kfree(bitmap);
-out_waiters:
-	kvfree(waiters);
-out_engines:
-	mock_engine_flush(engine);
-	return err;
-}
-
-static int igt_insert_complete(void *arg)
-{
-	const u32 seqno_bias = 0x1000;
-	struct intel_engine_cs *engine = arg;
-	struct intel_wait *waiters;
-	const int count = 4096;
-	unsigned long *bitmap;
-	int err = -ENOMEM;
-	int n, m;
-
-	mock_engine_reset(engine);
-
-	waiters = kvmalloc_array(count, sizeof(*waiters), GFP_KERNEL);
-	if (!waiters)
-		goto out_engines;
-
-	bitmap = kcalloc(DIV_ROUND_UP(count, BITS_PER_LONG), sizeof(*bitmap),
-			 GFP_KERNEL);
-	if (!bitmap)
-		goto out_waiters;
-
-	for (n = 0; n < count; n++) {
-		intel_wait_init_for_seqno(&waiters[n], n + seqno_bias);
-		intel_engine_add_wait(engine, &waiters[n]);
-		__set_bit(n, bitmap);
-	}
-	err = check_rbtree(engine, bitmap, waiters, count);
-	if (err)
-		goto out_bitmap;
-
-	/* On each step, we advance the seqno so that several waiters are then
-	 * complete (we increase the seqno by increasingly larger values to
-	 * retire more and more waiters at once). All retired waiters should
-	 * be woken and removed from the rbtree, and so that we check.
-	 */
-	for (n = 0; n < count; n = m) {
-		int seqno = 2 * n;
-
-		GEM_BUG_ON(find_first_bit(bitmap, count) != n);
-
-		if (intel_wait_complete(&waiters[n])) {
-			pr_err("waiter[%d, seqno=%d] completed too early\n",
-			       n, waiters[n].seqno);
-			err = -EINVAL;
-			goto out_bitmap;
-		}
-
-		/* complete the following waiters */
-		mock_seqno_advance(engine, seqno + seqno_bias);
-		for (m = n; m <= seqno; m++) {
-			if (m == count)
-				break;
-
-			GEM_BUG_ON(!test_bit(m, bitmap));
-			__clear_bit(m, bitmap);
-		}
-
-		intel_engine_remove_wait(engine, &waiters[n]);
-		RB_CLEAR_NODE(&waiters[n].node);
-
-		err = check_rbtree(engine, bitmap, waiters, count);
-		if (err) {
-			pr_err("rbtree corrupt after seqno advance to %d\n",
-			       seqno + seqno_bias);
-			goto out_bitmap;
-		}
-
-		err = check_completion(engine, bitmap, waiters, count);
-		if (err) {
-			pr_err("completions after seqno advance to %d failed\n",
-			       seqno + seqno_bias);
-			goto out_bitmap;
-		}
-	}
-
-	err = check_rbtree_empty(engine);
-out_bitmap:
-	kfree(bitmap);
-out_waiters:
-	kvfree(waiters);
-out_engines:
-	mock_engine_flush(engine);
-	return err;
-}
-
-struct igt_wakeup {
-	struct task_struct *tsk;
-	atomic_t *ready, *set, *done;
-	struct intel_engine_cs *engine;
-	unsigned long flags;
-#define STOP 0
-#define IDLE 1
-	wait_queue_head_t *wq;
-	u32 seqno;
-};
-
-static bool wait_for_ready(struct igt_wakeup *w)
-{
-	DEFINE_WAIT(ready);
-
-	set_bit(IDLE, &w->flags);
-	if (atomic_dec_and_test(w->done))
-		wake_up_var(w->done);
-
-	if (test_bit(STOP, &w->flags))
-		goto out;
-
-	for (;;) {
-		prepare_to_wait(w->wq, &ready, TASK_INTERRUPTIBLE);
-		if (atomic_read(w->ready) == 0)
-			break;
-
-		schedule();
-	}
-	finish_wait(w->wq, &ready);
-
-out:
-	clear_bit(IDLE, &w->flags);
-	if (atomic_dec_and_test(w->set))
-		wake_up_var(w->set);
-
-	return !test_bit(STOP, &w->flags);
-}
-
-static int igt_wakeup_thread(void *arg)
-{
-	struct igt_wakeup *w = arg;
-	struct intel_wait wait;
-
-	while (wait_for_ready(w)) {
-		GEM_BUG_ON(kthread_should_stop());
-
-		intel_wait_init_for_seqno(&wait, w->seqno);
-		intel_engine_add_wait(w->engine, &wait);
-		for (;;) {
-			set_current_state(TASK_UNINTERRUPTIBLE);
-			if (i915_seqno_passed(intel_engine_get_seqno(w->engine),
-					      w->seqno))
-				break;
-
-			if (test_bit(STOP, &w->flags)) /* emergency escape */
-				break;
-
-			schedule();
-		}
-		intel_engine_remove_wait(w->engine, &wait);
-		__set_current_state(TASK_RUNNING);
-	}
-
-	return 0;
-}
-
-static void igt_wake_all_sync(atomic_t *ready,
-			      atomic_t *set,
-			      atomic_t *done,
-			      wait_queue_head_t *wq,
-			      int count)
-{
-	atomic_set(set, count);
-	atomic_set(ready, 0);
-	wake_up_all(wq);
-
-	wait_var_event(set, !atomic_read(set));
-	atomic_set(ready, count);
-	atomic_set(done, count);
-}
-
-static int igt_wakeup(void *arg)
-{
-	I915_RND_STATE(prng);
-	struct intel_engine_cs *engine = arg;
-	struct igt_wakeup *waiters;
-	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
-	const int count = 4096;
-	const u32 max_seqno = count / 4;
-	atomic_t ready, set, done;
-	int err = -ENOMEM;
-	int n, step;
-
-	mock_engine_reset(engine);
-
-	waiters = kvmalloc_array(count, sizeof(*waiters), GFP_KERNEL);
-	if (!waiters)
-		goto out_engines;
-
-	/* Create a large number of threads, each waiting on a random seqno.
-	 * Multiple waiters will be waiting for the same seqno.
-	 */
-	atomic_set(&ready, count);
-	for (n = 0; n < count; n++) {
-		waiters[n].wq = &wq;
-		waiters[n].ready = &ready;
-		waiters[n].set = &set;
-		waiters[n].done = &done;
-		waiters[n].engine = engine;
-		waiters[n].flags = BIT(IDLE);
-
-		waiters[n].tsk = kthread_run(igt_wakeup_thread, &waiters[n],
-					     "i915/igt:%d", n);
-		if (IS_ERR(waiters[n].tsk))
-			goto out_waiters;
-
-		get_task_struct(waiters[n].tsk);
-	}
-
-	for (step = 1; step <= max_seqno; step <<= 1) {
-		u32 seqno;
-
-		/* The waiter threads start paused as we assign them a random
-		 * seqno and reset the engine. Once the engine is reset,
-		 * we signal that the threads may begin their wait upon their
-		 * seqno.
-		 */
-		for (n = 0; n < count; n++) {
-			GEM_BUG_ON(!test_bit(IDLE, &waiters[n].flags));
-			waiters[n].seqno =
-				1 + prandom_u32_state(&prng) % max_seqno;
-		}
-		mock_seqno_advance(engine, 0);
-		igt_wake_all_sync(&ready, &set, &done, &wq, count);
-
-		/* Simulate the GPU doing chunks of work, with one or more
-		 * seqno appearing to finish at the same time. A random number
-		 * of threads will be waiting upon the update and hopefully be
-		 * woken.
-		 */
-		for (seqno = 1; seqno <= max_seqno + step; seqno += step) {
-			usleep_range(50, 500);
-			mock_seqno_advance(engine, seqno);
-		}
-		GEM_BUG_ON(intel_engine_get_seqno(engine) < 1 + max_seqno);
-
-		/* With the seqno now beyond any of the waiting threads, they
-		 * should all be woken, see that they are complete and signal
-		 * that they are ready for the next test. We wait until all
-		 * threads are complete and waiting for us (i.e. not a seqno).
-		 */
-		if (!wait_var_event_timeout(&done,
-					    !atomic_read(&done), 10 * HZ)) {
-			pr_err("Timed out waiting for %d remaining waiters\n",
-			       atomic_read(&done));
-			err = -ETIMEDOUT;
-			break;
-		}
-
-		err = check_rbtree_empty(engine);
-		if (err)
-			break;
-	}
-
-out_waiters:
-	for (n = 0; n < count; n++) {
-		if (IS_ERR(waiters[n].tsk))
-			break;
-
-		set_bit(STOP, &waiters[n].flags);
-	}
-	mock_seqno_advance(engine, INT_MAX); /* wakeup any broken waiters */
-	igt_wake_all_sync(&ready, &set, &done, &wq, n);
-
-	for (n = 0; n < count; n++) {
-		if (IS_ERR(waiters[n].tsk))
-			break;
-
-		kthread_stop(waiters[n].tsk);
-		put_task_struct(waiters[n].tsk);
-	}
-
-	kvfree(waiters);
-out_engines:
-	mock_engine_flush(engine);
-	return err;
-}
-
-int intel_breadcrumbs_mock_selftests(void)
-{
-	static const struct i915_subtest tests[] = {
-		SUBTEST(igt_random_insert_remove),
-		SUBTEST(igt_insert_complete),
-		SUBTEST(igt_wakeup),
-	};
-	struct drm_i915_private *i915;
-	int err;
-
-	i915 = mock_gem_device();
-	if (!i915)
-		return -ENOMEM;
-
-	err = i915_subtests(tests, i915->engine[RCS]);
-	drm_dev_put(&i915->drm);
-
-	return err;
-}
diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
index 2c38ea5892d9..7b6f3bea9ef8 100644
--- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
+++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
@@ -1127,7 +1127,7 @@ static int __igt_reset_evict_vma(struct drm_i915_private *i915,
 
 	wait_for_completion(&arg.completion);
 
-	if (wait_for(waitqueue_active(&rq->execute), 10)) {
+	if (wait_for(!list_empty(&rq->fence.cb_list), 10)) {
 		struct drm_printer p = drm_info_printer(i915->drm.dev);
 
 		pr_err("igt/evict_vma kthread did not wait\n");
diff --git a/drivers/gpu/drm/i915/selftests/lib_sw_fence.c b/drivers/gpu/drm/i915/selftests/lib_sw_fence.c
index b26f07b55d86..2bfa72c1654b 100644
--- a/drivers/gpu/drm/i915/selftests/lib_sw_fence.c
+++ b/drivers/gpu/drm/i915/selftests/lib_sw_fence.c
@@ -76,3 +76,57 @@ void timed_fence_fini(struct timed_fence *tf)
 	destroy_timer_on_stack(&tf->timer);
 	i915_sw_fence_fini(&tf->fence);
 }
+
+struct heap_fence {
+	struct i915_sw_fence fence;
+	union {
+		struct kref ref;
+		struct rcu_head rcu;
+	};
+};
+
+static int __i915_sw_fence_call
+heap_fence_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
+{
+	struct heap_fence *h = container_of(fence, typeof(*h), fence);
+
+	switch (state) {
+	case FENCE_COMPLETE:
+		break;
+
+	case FENCE_FREE:
+		heap_fence_put(&h->fence);
+	}
+
+	return NOTIFY_DONE;
+}
+
+struct i915_sw_fence *heap_fence_create(gfp_t gfp)
+{
+	struct heap_fence *h;
+
+	h = kmalloc(sizeof(*h), gfp);
+	if (!h)
+		return NULL;
+
+	i915_sw_fence_init(&h->fence, heap_fence_notify);
+	refcount_set(&h->ref.refcount, 2);
+
+	return &h->fence;
+}
+
+static void heap_fence_release(struct kref *ref)
+{
+	struct heap_fence *h = container_of(ref, typeof(*h), ref);
+
+	i915_sw_fence_fini(&h->fence);
+
+	kfree_rcu(h, rcu);
+}
+
+void heap_fence_put(struct i915_sw_fence *fence)
+{
+	struct heap_fence *h = container_of(fence, typeof(*h), fence);
+
+	kref_put(&h->ref, heap_fence_release);
+}
diff --git a/drivers/gpu/drm/i915/selftests/lib_sw_fence.h b/drivers/gpu/drm/i915/selftests/lib_sw_fence.h
index 474aafb92ae1..1f9927e10f3a 100644
--- a/drivers/gpu/drm/i915/selftests/lib_sw_fence.h
+++ b/drivers/gpu/drm/i915/selftests/lib_sw_fence.h
@@ -39,4 +39,7 @@ struct timed_fence {
 void timed_fence_init(struct timed_fence *tf, unsigned long expires);
 void timed_fence_fini(struct timed_fence *tf);
 
+struct i915_sw_fence *heap_fence_create(gfp_t gfp);
+void heap_fence_put(struct i915_sw_fence *fence);
+
 #endif /* _LIB_SW_FENCE_H_ */
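
The heap fence is created with a reference count of two: one reference is
dropped by the FENCE_FREE notification, the other by the caller's
heap_fence_put(). In the selftest context its use mirrors the smoketest
above (error handling trimmed for brevity):

	struct i915_sw_fence *submit, *wait;

	submit = heap_fence_create(GFP_KERNEL);
	wait = heap_fence_create(GFP_KERNEL);

	/* ... queue requests gated on submit, chain their fences into wait ... */

	i915_sw_fence_commit(submit);	/* release the batch */
	i915_sw_fence_commit(wait);

	wait_event_timeout(wait->wait, i915_sw_fence_done(wait), HZ / 2);

	heap_fence_put(wait);
	heap_fence_put(submit);
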
diff --git a/drivers/gpu/drm/i915/selftests/mock_engine.c b/drivers/gpu/drm/i915/selftests/mock_engine.c
index 2515cffb4490..e70b4a6cfc67 100644
--- a/drivers/gpu/drm/i915/selftests/mock_engine.c
+++ b/drivers/gpu/drm/i915/selftests/mock_engine.c
@@ -86,17 +86,21 @@ static struct mock_request *first_request(struct mock_engine *engine)
 static void advance(struct mock_request *request)
 {
 	list_del_init(&request->link);
-	mock_seqno_advance(request->base.engine, request->base.global_seqno);
+	intel_engine_write_global_seqno(request->base.engine,
+					request->base.global_seqno);
 	i915_request_mark_complete(&request->base);
 	GEM_BUG_ON(!i915_request_completed(&request->base));
+
+	intel_engine_queue_breadcrumbs(request->base.engine);
 }
 
 static void hw_delay_complete(struct timer_list *t)
 {
 	struct mock_engine *engine = from_timer(engine, t, hw_delay);
 	struct mock_request *request;
+	unsigned long flags;
 
-	spin_lock(&engine->hw_lock);
+	spin_lock_irqsave(&engine->hw_lock, flags);
 
 	/* Timer fired, first request is complete */
 	request = first_request(engine);
@@ -116,7 +120,7 @@ static void hw_delay_complete(struct timer_list *t)
 		advance(request);
 	}
 
-	spin_unlock(&engine->hw_lock);
+	spin_unlock_irqrestore(&engine->hw_lock, flags);
 }
 
 static void mock_context_unpin(struct intel_context *ce)
@@ -191,11 +195,12 @@ static void mock_submit_request(struct i915_request *request)
 	struct mock_request *mock = container_of(request, typeof(*mock), base);
 	struct mock_engine *engine =
 		container_of(request->engine, typeof(*engine), base);
+	unsigned long flags;
 
 	i915_request_submit(request);
 	GEM_BUG_ON(!request->global_seqno);
 
-	spin_lock_irq(&engine->hw_lock);
+	spin_lock_irqsave(&engine->hw_lock, flags);
 	list_add_tail(&mock->link, &engine->hw_queue);
 	if (mock->link.prev == &engine->hw_queue) {
 		if (mock->delay)
@@ -203,7 +208,7 @@ static void mock_submit_request(struct i915_request *request)
 		else
 			advance(mock);
 	}
-	spin_unlock_irq(&engine->hw_lock);
+	spin_unlock_irqrestore(&engine->hw_lock, flags);
 }
 
 struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
@@ -273,6 +278,7 @@ void mock_engine_flush(struct intel_engine_cs *engine)
 
 void mock_engine_reset(struct intel_engine_cs *engine)
 {
+	intel_engine_write_global_seqno(engine, 0);
 }
 
 void mock_engine_free(struct intel_engine_cs *engine)
diff --git a/drivers/gpu/drm/i915/selftests/mock_engine.h b/drivers/gpu/drm/i915/selftests/mock_engine.h
index 133d0c21790d..b9cc3a245f16 100644
--- a/drivers/gpu/drm/i915/selftests/mock_engine.h
+++ b/drivers/gpu/drm/i915/selftests/mock_engine.h
@@ -46,10 +46,4 @@ void mock_engine_flush(struct intel_engine_cs *engine);
 void mock_engine_reset(struct intel_engine_cs *engine);
 void mock_engine_free(struct intel_engine_cs *engine);
 
-static inline void mock_seqno_advance(struct intel_engine_cs *engine, u32 seqno)
-{
-	intel_write_status_page(engine, I915_GEM_HWS_INDEX, seqno);
-	intel_engine_wakeup(engine);
-}
-
 #endif /* !__MOCK_ENGINE_H__ */
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 89+ messages in thread

* ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/34] drm/i915/execlists: Mark up priority boost on preemption (rev2)
  2019-01-21 22:20 HWSP for HW semaphores Chris Wilson
                   ` (37 preceding siblings ...)
  2019-01-22  1:35 ` ✗ Fi.CI.IGT: failure " Patchwork
@ 2019-01-23 12:00 ` Patchwork
  2019-01-23 12:11 ` ✗ Fi.CI.SPARSE: " Patchwork
  2019-01-23 12:48 ` ✗ Fi.CI.BAT: failure " Patchwork
  40 siblings, 0 replies; 89+ messages in thread
From: Patchwork @ 2019-01-23 12:00 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/34] drm/i915/execlists: Mark up priority boost on preemption (rev2)
URL   : https://patchwork.freedesktop.org/series/55528/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
a3cfcd69db36 drm/i915/execlists: Mark up priority boost on preemption
10d459a62e38 drm/i915/execlists: Suppress preempting self
-:18: WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#18: 
References: a2bf92e8cc16 ("drm/i915/execlists: Avoid kicking priority on the current context")

-:18: ERROR:GIT_COMMIT_ID: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit a2bf92e8cc16 ("drm/i915/execlists: Avoid kicking priority on the current context")'
#18: 
References: a2bf92e8cc16 ("drm/i915/execlists: Avoid kicking priority on the current context")

total: 1 errors, 1 warnings, 0 checks, 92 lines checked
dcf4a4c2fe84 drm/i915: Make all GPU resets atomic
-:24: CHECK:USLEEP_RANGE: usleep_range is preferred over udelay; see Documentation/timers/timers-howto.txt
#24: FILE: drivers/gpu/drm/i915/i915_reset.c:147:
+	udelay(50);

-:30: CHECK:USLEEP_RANGE: usleep_range is preferred over udelay; see Documentation/timers/timers-howto.txt
#30: FILE: drivers/gpu/drm/i915/i915_reset.c:152:
+	udelay(50);

total: 0 errors, 0 warnings, 2 checks, 111 lines checked
4a03a5667d92 drm/i915/guc: Disable global reset
3476465681a2 drm/i915: Remove GPU reset dependence on struct_mutex
-:878: WARNING:MEMORY_BARRIER: memory barrier without comment
#878: FILE: drivers/gpu/drm/i915/i915_reset.c:692:
+	smp_store_mb(i915->gpu_error.restart, NULL);

-:1031: WARNING:IF_0: Consider removing the code enclosed by this #if 0 and its #endif
#1031: FILE: drivers/gpu/drm/i915/i915_reset.c:920:
+#if 0

-:1302: WARNING:BOOL_BITFIELD: Avoid using bool as bitfield.  Prefer bool bitfields as unsigned int or u<8|16|32>
#1302: FILE: drivers/gpu/drm/i915/intel_hangcheck.c:35:
+	bool wedged:1;

-:1303: WARNING:BOOL_BITFIELD: Avoid using bool as bitfield.  Prefer bool bitfields as unsigned int or u<8|16|32>
#1303: FILE: drivers/gpu/drm/i915/intel_hangcheck.c:36:
+	bool stalled:1;

total: 0 errors, 4 warnings, 0 checks, 1729 lines checked
12191ab875dc drm/i915/selftests: Trim struct_mutex duration for set-wedged selftest
00a3c0fb6b45 drm/i915: Issue engine resets onto idle engines
399e8162cc6e drm/i915: Stop tracking MRU activity on VMA
efdf8e2689a2 drm/i915: Pull VM lists under the VM mutex.
a1a65ee34602 drm/i915: Move vma lookup to its own lock
-:161: WARNING:USE_SPINLOCK_T: struct spinlock should be spinlock_t
#161: FILE: drivers/gpu/drm/i915/i915_gem_object.h:94:
+		struct spinlock lock;

total: 0 errors, 1 warnings, 0 checks, 290 lines checked
da04ace11678 drm/i915: Always allocate an object/vma for the HWSP
a1911268d4ef drm/i915: Move list of timelines under its own lock
951f23aae0b5 drm/i915: Introduce concept of per-timeline (context) HWSP
095b07e9d68d drm/i915: Enlarge vma->pin_count
0747da7f8236 drm/i915: Allocate a status page for each timeline
e19bdb2707c7 drm/i915: Share per-timeline HWSP using a slab suballocator
-:80: CHECK:SPACING: No space is necessary after a cast
#80: FILE: drivers/gpu/drm/i915/i915_timeline.c:44:
+	BUILD_BUG_ON(BITS_PER_TYPE(u64) * CACHELINE_BYTES > PAGE_SIZE);

total: 0 errors, 0 warnings, 1 checks, 416 lines checked
e996dc446d99 drm/i915: Track the context's seqno in its own timeline HWSP
-:225: CHECK:LINE_SPACING: Please use a blank line after function/struct/union/enum declarations
#225: FILE: drivers/gpu/drm/i915/intel_lrc.c:2073:
 }
+static const int gen8_emit_breadcrumb_sz = 10 + WA_TAIL_DWORDS;

-:256: CHECK:LINE_SPACING: Please use a blank line after function/struct/union/enum declarations
#256: FILE: drivers/gpu/drm/i915/intel_lrc.c:2099:
 }
+static const int gen8_emit_breadcrumb_rcs_sz = 14 + WA_TAIL_DWORDS;

-:282: CHECK:LINE_SPACING: Please use a blank line after function/struct/union/enum declarations
#282: FILE: drivers/gpu/drm/i915/intel_ringbuffer.c:344:
 }
+static const int gen6_rcs_emit_breadcrumb_sz = 18;

-:305: CHECK:LINE_SPACING: Please use a blank line after function/struct/union/enum declarations
#305: FILE: drivers/gpu/drm/i915/intel_ringbuffer.c:451:
 }
+static const int gen7_rcs_emit_breadcrumb_sz = 10;

-:326: CHECK:LINE_SPACING: Please use a blank line after function/struct/union/enum declarations
#326: FILE: drivers/gpu/drm/i915/intel_ringbuffer.c:469:
 }
+static const int gen6_xcs_emit_breadcrumb_sz = 8;

-:354: CHECK:LINE_SPACING: Please use a blank line after function/struct/union/enum declarations
#354: FILE: drivers/gpu/drm/i915/intel_ringbuffer.c:499:
 }
+static const int gen7_xcs_emit_breadcrumb_sz = 10 + GEN7_XCS_WA * 3;

-:404: CHECK:LINE_SPACING: Please use a blank line after function/struct/union/enum declarations
#404: FILE: drivers/gpu/drm/i915/intel_ringbuffer.c:945:
 }
+static const int i9xx_emit_breadcrumb_sz = 8;

-:432: CHECK:LINE_SPACING: Please use a blank line after function/struct/union/enum declarations
#432: FILE: drivers/gpu/drm/i915/intel_ringbuffer.c:973:
 }
+static const int gen5_emit_breadcrumb_sz = GEN5_WA_STORES * 3 + 6;

total: 0 errors, 0 warnings, 8 checks, 471 lines checked
de57462d995b drm/i915: Track active timelines
c16ba49bd4fb drm/i915: Identify active requests
-:196: CHECK:LINE_SPACING: Please use a blank line after function/struct/union/enum declarations
#196: FILE: drivers/gpu/drm/i915/intel_lrc.c:2101:
 }
+static const int gen8_emit_fini_breadcrumb_sz = 10 + WA_TAIL_DWORDS;

-:208: CHECK:LINE_SPACING: Please use a blank line after function/struct/union/enum declarations
#208: FILE: drivers/gpu/drm/i915/intel_lrc.c:2127:
 }
+static const int gen8_emit_fini_breadcrumb_rcs_sz = 14 + WA_TAIL_DWORDS;

total: 0 errors, 0 warnings, 2 checks, 278 lines checked
4ceaaf382a3a drm/i915: Remove the intel_engine_notify tracepoint
779d269c1c4a drm/i915: Replace global breadcrumbs with per-context interrupt tracking
-:18: ERROR:GIT_COMMIT_ID: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit 688e6c725816 ("drm/i915: Slaughter the thundering i915_wait_request herd")'
#18: 
Before commit 688e6c725816, the solution was simple. Every client waking

-:21: ERROR:GIT_COMMIT_ID: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit 688e6c725816 ("drm/i915: Slaughter the thundering i915_wait_request herd")'
#21: 
688e6c725816 introduced an rbtree so that only the earliest waiter on

-:53: WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#53: 
References: 688e6c725816 ("drm/i915: Slaughter the thundering i915_wait_request herd")

-:53: ERROR:GIT_COMMIT_ID: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit 688e6c725816 ("drm/i915: Slaughter the thundering i915_wait_request herd")'
#53: 
References: 688e6c725816 ("drm/i915: Slaughter the thundering i915_wait_request herd")

-:2150: WARNING:FUNCTION_ARGUMENTS: function definition argument 'struct i915_gem_context *' should also have an identifier name
#2150: FILE: drivers/gpu/drm/i915/selftests/i915_request.c:258:
+	struct i915_request *(*request_alloc)(struct i915_gem_context *,

-:2150: WARNING:FUNCTION_ARGUMENTS: function definition argument 'struct intel_engine_cs *' should also have an identifier name
#2150: FILE: drivers/gpu/drm/i915/selftests/i915_request.c:258:
+	struct i915_request *(*request_alloc)(struct i915_gem_context *,

-:2173: WARNING:LINE_SPACING: Missing a blank line after declarations
#2173: FILE: drivers/gpu/drm/i915/selftests/i915_request.c:281:
+	struct i915_request **requests;
+	I915_RND_STATE(prng);

-:2603: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#2603: 
deleted file mode 100644

total: 3 errors, 5 warnings, 0 checks, 2549 lines checked
0e4b16cc65d1 drm/i915: Drop fake breadcrumb irq
6f083f333e1b drm/i915: Keep timeline HWSP allocated until the system is idle
eca588a4e40b drm/i915/execlists: Refactor out can_merge_rq()
4ae5953452b9 drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+
-:296: CHECK:SPACING: spaces preferred around that '<<' (ctx:VxV)
#296: FILE: drivers/gpu/drm/i915/intel_gpu_commands.h:109:
+#define   MI_SEMAPHORE_SAD_GT_SDD	(0<<12)
                                  	  ^

-:298: CHECK:SPACING: spaces preferred around that '<<' (ctx:VxV)
#298: FILE: drivers/gpu/drm/i915/intel_gpu_commands.h:111:
+#define   MI_SEMAPHORE_SAD_LT_SDD	(2<<12)
                                  	  ^

-:299: CHECK:SPACING: spaces preferred around that '<<' (ctx:VxV)
#299: FILE: drivers/gpu/drm/i915/intel_gpu_commands.h:112:
+#define   MI_SEMAPHORE_SAD_LTE_SDD	(3<<12)
                                   	  ^

-:300: CHECK:SPACING: spaces preferred around that '<<' (ctx:VxV)
#300: FILE: drivers/gpu/drm/i915/intel_gpu_commands.h:113:
+#define   MI_SEMAPHORE_SAD_EQ_SDD	(4<<12)
                                  	  ^

-:301: CHECK:SPACING: spaces preferred around that '<<' (ctx:VxV)
#301: FILE: drivers/gpu/drm/i915/intel_gpu_commands.h:114:
+#define   MI_SEMAPHORE_SAD_NEQ_SDD	(5<<12)
                                   	  ^

total: 0 errors, 0 warnings, 5 checks, 259 lines checked
cb287c2a9e41 drm/i915: Prioritise non-busywait semaphore workloads
39186577611a drm/i915: Replace global_seqno with a hangcheck heartbeat seqno
-:122: CHECK:LINE_SPACING: Please use a blank line after function/struct/union/enum declarations
#122: FILE: drivers/gpu/drm/i915/intel_lrc.c:2131:
 }
+static const int gen8_emit_fini_breadcrumb_sz = 14 + WA_TAIL_DWORDS;

-:143: CHECK:LINE_SPACING: Please use a blank line after function/struct/union/enum declarations
#143: FILE: drivers/gpu/drm/i915/intel_lrc.c:2162:
 }
+static const int gen8_emit_fini_breadcrumb_rcs_sz = 20 + WA_TAIL_DWORDS;

-:170: CHECK:LINE_SPACING: Please use a blank line after function/struct/union/enum declarations
#170: FILE: drivers/gpu/drm/i915/intel_ringbuffer.c:472:
 }
+static const int gen6_xcs_emit_breadcrumb_sz = 10;

-:195: CHECK:LINE_SPACING: Please use a blank line after function/struct/union/enum declarations
#195: FILE: drivers/gpu/drm/i915/intel_ringbuffer.c:507:
 }
+static const int gen7_xcs_emit_breadcrumb_sz = 14 + GEN7_XCS_WA * 3;

-:218: CHECK:LINE_SPACING: Please use a blank line after function/struct/union/enum declarations
#218: FILE: drivers/gpu/drm/i915/intel_ringbuffer.c:958:
 }
+static const int i9xx_emit_breadcrumb_sz = 12;

-:243: CHECK:LINE_SPACING: Please use a blank line after function/struct/union/enum declarations
#243: FILE: drivers/gpu/drm/i915/intel_ringbuffer.c:989:
 }
+static const int gen5_emit_breadcrumb_sz = GEN5_WA_STORES * 3 + 8;

total: 0 errors, 0 warnings, 6 checks, 236 lines checked

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 89+ messages in thread

* ✗ Fi.CI.SPARSE: warning for series starting with [01/34] drm/i915/execlists: Mark up priority boost on preemption (rev2)
  2019-01-21 22:20 HWSP for HW semaphores Chris Wilson
                   ` (38 preceding siblings ...)
  2019-01-23 12:00 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/34] drm/i915/execlists: Mark up priority boost on preemption (rev2) Patchwork
@ 2019-01-23 12:11 ` Patchwork
  2019-01-23 12:48 ` ✗ Fi.CI.BAT: failure " Patchwork
  40 siblings, 0 replies; 89+ messages in thread
From: Patchwork @ 2019-01-23 12:11 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/34] drm/i915/execlists: Mark up priority boost on preemption (rev2)
URL   : https://patchwork.freedesktop.org/series/55528/
State : warning

== Summary ==

$ dim sparse origin/drm-tip
Sparse version: v0.5.2
Commit: drm/i915/execlists: Mark up priority boost on preemption
+drivers/gpu/drm/i915/intel_ringbuffer.h:602:23: warning: expression using sizeof(void)

Commit: drm/i915/execlists: Suppress preempting self
Okay!

Commit: drm/i915: Make all GPU resets atomic
Okay!

Commit: drm/i915/guc: Disable global reset
Okay!

Commit: drm/i915: Remove GPU reset dependence on struct_mutex
-drivers/gpu/drm/i915/selftests/../i915_drv.h:3546:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/../i915_drv.h:3541:16: warning: expression using sizeof(void)

Commit: drm/i915/selftests: Trim struct_mutex duration for set-wedged selftest
Okay!

Commit: drm/i915: Issue engine resets onto idle engines
Okay!

Commit: drm/i915: Stop tracking MRU activity on VMA
Okay!

Commit: drm/i915: Pull VM lists under the VM mutex.
Okay!

Commit: drm/i915: Move vma lookup to its own lock
Okay!

Commit: drm/i915: Always allocate an object/vma for the HWSP
Okay!

Commit: drm/i915: Move list of timelines under its own lock
-drivers/gpu/drm/i915/selftests/../i915_drv.h:3541:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/../i915_drv.h:3544:16: warning: expression using sizeof(void)

Commit: drm/i915: Introduce concept of per-timeline (context) HWSP
Okay!

Commit: drm/i915: Enlarge vma->pin_count
Okay!

Commit: drm/i915: Allocate a status page for each timeline
+./include/linux/mm.h:619:13: error: not a function <noident>
+./include/linux/mm.h:619:13: error: undefined identifier '__builtin_mul_overflow'
+./include/linux/mm.h:619:13: warning: call with no type!

Commit: drm/i915: Share per-timeline HWSP using a slab suballocator
-drivers/gpu/drm/i915/selftests/../i915_drv.h:3544:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/../i915_drv.h:3548:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/i915_timeline.c:89:38: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/i915_timeline.c:89:38: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/i915_timeline.c:92:44: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/i915_timeline.c:92:44: warning: expression using sizeof(void)
+./include/linux/slab.h:664:13: error: undefined identifier '__builtin_mul_overflow'
+./include/linux/slab.h:664:13: warning: call with no type!

Commit: drm/i915: Track the context's seqno in its own timeline HWSP
Okay!

Commit: drm/i915: Track active timelines
Okay!

Commit: drm/i915: Identify active requests
Okay!

Commit: drm/i915: Remove the intel_engine_notify tracepoint
Okay!

Commit: drm/i915: Replace global breadcrumbs with per-context interrupt tracking
+drivers/gpu/drm/i915/selftests/i915_request.c:283:40: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/i915_request.c:283:40: warning: expression using sizeof(void)
-./include/linux/mm.h:619:13: error: not a function <noident>
-./include/linux/mm.h:619:13: error: not a function <noident>
-./include/linux/mm.h:619:13: error: undefined identifier '__builtin_mul_overflow'
-./include/linux/mm.h:619:13: warning: call with no type!
+./include/linux/slab.h:664:13: error: not a function <noident>
+./include/linux/slab.h:664:13: error: not a function <noident>

Commit: drm/i915: Drop fake breadcrumb irq
Okay!

Commit: drm/i915: Keep timeline HWSP allocated until the system is idle
-drivers/gpu/drm/i915/selftests/../i915_drv.h:3548:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/../i915_drv.h:3550:16: warning: expression using sizeof(void)

Commit: drm/i915/execlists: Refactor out can_merge_rq()
Okay!

Commit: drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+
Okay!

Commit: drm/i915: Prioritise non-busywait semaphore workloads
Okay!

Commit: drm/i915: Replace global_seqno with a hangcheck heartbeat seqno
Okay!

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 89+ messages in thread

* ✗ Fi.CI.BAT: failure for series starting with [01/34] drm/i915/execlists: Mark up priority boost on preemption (rev2)
  2019-01-21 22:20 HWSP for HW semaphores Chris Wilson
                   ` (39 preceding siblings ...)
  2019-01-23 12:11 ` ✗ Fi.CI.SPARSE: " Patchwork
@ 2019-01-23 12:48 ` Patchwork
  40 siblings, 0 replies; 89+ messages in thread
From: Patchwork @ 2019-01-23 12:48 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/34] drm/i915/execlists: Mark up priority boost on preemption (rev2)
URL   : https://patchwork.freedesktop.org/series/55528/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_5469 -> Patchwork_12014
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_12014 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_12014, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/55528/revisions/2/mbox/

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_12014:

### IGT changes ###

#### Possible regressions ####

  * igt@pm_rpm@module-reload:
    - fi-skl-6770hq:      PASS -> FAIL

  
Known issues
------------

  Here are the changes found in Patchwork_12014 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@i915_module_load@reload:
    - fi-blb-e6850:       PASS -> INCOMPLETE [fdo#107718]

  * igt@i915_selftest@live_execlists:
    - fi-apl-guc:         PASS -> INCOMPLETE [fdo#103927]

  
#### Possible fixes ####

  * igt@kms_busy@basic-flip-b:
    - fi-gdg-551:         FAIL [fdo#103182] -> PASS

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#103167]: https://bugs.freedesktop.org/show_bug.cgi?id=103167
  [fdo#103182]: https://bugs.freedesktop.org/show_bug.cgi?id=103182
  [fdo#103927]: https://bugs.freedesktop.org/show_bug.cgi?id=103927
  [fdo#107718]: https://bugs.freedesktop.org/show_bug.cgi?id=107718
  [fdo#108622]: https://bugs.freedesktop.org/show_bug.cgi?id=108622
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271


Participating hosts (44 -> 41)
------------------------------

  Additional (1): fi-hsw-peppy 
  Missing    (4): fi-kbl-soraka fi-ilk-m540 fi-byt-squawks fi-bsw-cyan 


Build changes
-------------

    * Linux: CI_DRM_5469 -> Patchwork_12014

  CI_DRM_5469: 388cbc6121c1bd3d9846789bfef0a3e08c346461 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4785: 70749c70926f12043d3408b160606e1e6238ed3a @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_12014: 39186577611a8b3fd4cef8a860a6c82dc2c11736 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

39186577611a drm/i915: Replace global_seqno with a hangcheck heartbeat seqno
cb287c2a9e41 drm/i915: Prioritise non-busywait semaphore workloads
4ae5953452b9 drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+
eca588a4e40b drm/i915/execlists: Refactor out can_merge_rq()
6f083f333e1b drm/i915: Keep timeline HWSP allocated until the system is idle
0e4b16cc65d1 drm/i915: Drop fake breadcrumb irq
779d269c1c4a drm/i915: Replace global breadcrumbs with per-context interrupt tracking
4ceaaf382a3a drm/i915: Remove the intel_engine_notify tracepoint
c16ba49bd4fb drm/i915: Identify active requests
de57462d995b drm/i915: Track active timelines
e996dc446d99 drm/i915: Track the context's seqno in its own timeline HWSP
e19bdb2707c7 drm/i915: Share per-timeline HWSP using a slab suballocator
0747da7f8236 drm/i915: Allocate a status page for each timeline
095b07e9d68d drm/i915: Enlarge vma->pin_count
951f23aae0b5 drm/i915: Introduce concept of per-timeline (context) HWSP
a1911268d4ef drm/i915: Move list of timelines under its own lock
da04ace11678 drm/i915: Always allocate an object/vma for the HWSP
a1a65ee34602 drm/i915: Move vma lookup to its own lock
efdf8e2689a2 drm/i915: Pull VM lists under the VM mutex.
399e8162cc6e drm/i915: Stop tracking MRU activity on VMA
00a3c0fb6b45 drm/i915: Issue engine resets onto idle engines
12191ab875dc drm/i915/selftests: Trim struct_mutex duration for set-wedged selftest
3476465681a2 drm/i915: Remove GPU reset dependence on struct_mutex
4a03a5667d92 drm/i915/guc: Disable global reset
dcf4a4c2fe84 drm/i915: Make all GPU resets atomic
10d459a62e38 drm/i915/execlists: Suppress preempting self
a3cfcd69db36 drm/i915/execlists: Mark up priority boost on preemption

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12014/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 27/34] drm/i915: Remove the intel_engine_notify tracepoint
  2019-01-22 15:50   ` Tvrtko Ursulin
@ 2019-01-23 12:54     ` Chris Wilson
  2019-01-23 13:18       ` Tvrtko Ursulin
  0 siblings, 1 reply; 89+ messages in thread
From: Chris Wilson @ 2019-01-23 12:54 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-01-22 15:50:27)
> 
> On 21/01/2019 22:21, Chris Wilson wrote:
> > The global seqno is defunct and so we have no meaningful indicator of
> > forward progress for an engine. You need to listen to the request
> > signaling tracepoints instead.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >   drivers/gpu/drm/i915/i915_irq.c   |  2 --
> >   drivers/gpu/drm/i915/i915_trace.h | 25 -------------------------
> >   2 files changed, 27 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> > index 5fd5080c4ccb..71d11dc2c235 100644
> > --- a/drivers/gpu/drm/i915/i915_irq.c
> > +++ b/drivers/gpu/drm/i915/i915_irq.c
> > @@ -1209,8 +1209,6 @@ static void notify_ring(struct intel_engine_cs *engine)
> >               wake_up_process(tsk);
> >   
> >       rcu_read_unlock();
> > -
> > -     trace_intel_engine_notify(engine, wait);
> >   }
> >   
> >   static void vlv_c0_read(struct drm_i915_private *dev_priv,
> > diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
> > index 33d90eca9cdd..cb5bc65d575d 100644
> > --- a/drivers/gpu/drm/i915/i915_trace.h
> > +++ b/drivers/gpu/drm/i915/i915_trace.h
> > @@ -750,31 +750,6 @@ trace_i915_request_out(struct i915_request *rq)
> >   #endif
> >   #endif
> >   
> > -TRACE_EVENT(intel_engine_notify,
> > -         TP_PROTO(struct intel_engine_cs *engine, bool waiters),
> > -         TP_ARGS(engine, waiters),
> > -
> > -         TP_STRUCT__entry(
> > -                          __field(u32, dev)
> > -                          __field(u16, class)
> > -                          __field(u16, instance)
> > -                          __field(u32, seqno)
> > -                          __field(bool, waiters)
> > -                          ),
> > -
> > -         TP_fast_assign(
> > -                        __entry->dev = engine->i915->drm.primary->index;
> > -                        __entry->class = engine->uabi_class;
> > -                        __entry->instance = engine->instance;
> > -                        __entry->seqno = intel_engine_get_seqno(engine);
> > -                        __entry->waiters = waiters;
> > -                        ),
> > -
> > -         TP_printk("dev=%u, engine=%u:%u, seqno=%u, waiters=%u",
> > -                   __entry->dev, __entry->class, __entry->instance,
> > -                   __entry->seqno, __entry->waiters)
> > -);
> > -
> >   DEFINE_EVENT(i915_request, i915_request_retire,
> >           TP_PROTO(struct i915_request *rq),
> >           TP_ARGS(rq)
> > 
> 
> I cannot decide if keeping what we can would make it useful. Certainly 
> not for debugging intel_engine_breadcrumbs_irq.. a sequence of 
> intel_engine_notify(dev, class, instance) -> dma_fence_signaled would be 
> a very unreliable trace of what engine actually executed something. What 
> do you think?

All we get is a tracepoint to say a user interrupt occurred, but nothing to
tie it to any request. We are debugging interrupt generation at that
point, and I feel a tracepoint is ill-suited. We want something geared
towards CI instead, so a bunch of selftests... That would be sensible!
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 27/34] drm/i915: Remove the intel_engine_notify tracepoint
  2019-01-23 12:54     ` Chris Wilson
@ 2019-01-23 13:18       ` Tvrtko Ursulin
  2019-01-23 13:24         ` Chris Wilson
  0 siblings, 1 reply; 89+ messages in thread
From: Tvrtko Ursulin @ 2019-01-23 13:18 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 23/01/2019 12:54, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2019-01-22 15:50:27)
>>
>> On 21/01/2019 22:21, Chris Wilson wrote:
>>> The global seqno is defunct and so we have no meaningful indicator of
>>> forward progress for an engine. You need to listen to the request
>>> signaling tracepoints instead.
>>>
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> ---
>>>    drivers/gpu/drm/i915/i915_irq.c   |  2 --
>>>    drivers/gpu/drm/i915/i915_trace.h | 25 -------------------------
>>>    2 files changed, 27 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
>>> index 5fd5080c4ccb..71d11dc2c235 100644
>>> --- a/drivers/gpu/drm/i915/i915_irq.c
>>> +++ b/drivers/gpu/drm/i915/i915_irq.c
>>> @@ -1209,8 +1209,6 @@ static void notify_ring(struct intel_engine_cs *engine)
>>>                wake_up_process(tsk);
>>>    
>>>        rcu_read_unlock();
>>> -
>>> -     trace_intel_engine_notify(engine, wait);
>>>    }
>>>    
>>>    static void vlv_c0_read(struct drm_i915_private *dev_priv,
>>> diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
>>> index 33d90eca9cdd..cb5bc65d575d 100644
>>> --- a/drivers/gpu/drm/i915/i915_trace.h
>>> +++ b/drivers/gpu/drm/i915/i915_trace.h
>>> @@ -750,31 +750,6 @@ trace_i915_request_out(struct i915_request *rq)
>>>    #endif
>>>    #endif
>>>    
>>> -TRACE_EVENT(intel_engine_notify,
>>> -         TP_PROTO(struct intel_engine_cs *engine, bool waiters),
>>> -         TP_ARGS(engine, waiters),
>>> -
>>> -         TP_STRUCT__entry(
>>> -                          __field(u32, dev)
>>> -                          __field(u16, class)
>>> -                          __field(u16, instance)
>>> -                          __field(u32, seqno)
>>> -                          __field(bool, waiters)
>>> -                          ),
>>> -
>>> -         TP_fast_assign(
>>> -                        __entry->dev = engine->i915->drm.primary->index;
>>> -                        __entry->class = engine->uabi_class;
>>> -                        __entry->instance = engine->instance;
>>> -                        __entry->seqno = intel_engine_get_seqno(engine);
>>> -                        __entry->waiters = waiters;
>>> -                        ),
>>> -
>>> -         TP_printk("dev=%u, engine=%u:%u, seqno=%u, waiters=%u",
>>> -                   __entry->dev, __entry->class, __entry->instance,
>>> -                   __entry->seqno, __entry->waiters)
>>> -);
>>> -
>>>    DEFINE_EVENT(i915_request, i915_request_retire,
>>>            TP_PROTO(struct i915_request *rq),
>>>            TP_ARGS(rq)
>>>
>>
>> I cannot decide if keeping what we can would make it useful. Certainly
>> not for debugging intel_engine_breadcrumbs_irq.. a sequence of
>> intel_engine_notify(dev, class, instance) -> dma_fence_signaled would be
>> a very unreliable trace of what engine actually executed something. What
>> do you think?
> 
> All we get is a tracepoint to say a user interrupt occurred, but nothing to
> tie it to any request. We are debugging interrupt generation at that
> point, and I feel a tracepoint is ill-suited. We want something geared
> towards CI instead, so a bunch of selftests... That would be sensible!

We get the engine as well, so we could look at the sequence of dma-fence
signaling that follows and infer something, sometimes: like which
physical engine executed what, since the signaling is done directly
from the interrupt handler and engines are handled serially.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 27/34] drm/i915: Remove the intel_engine_notify tracepoint
  2019-01-23 13:18       ` Tvrtko Ursulin
@ 2019-01-23 13:24         ` Chris Wilson
  0 siblings, 0 replies; 89+ messages in thread
From: Chris Wilson @ 2019-01-23 13:24 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-01-23 13:18:43)
> 
> On 23/01/2019 12:54, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2019-01-22 15:50:27)
> >>
> >> On 21/01/2019 22:21, Chris Wilson wrote:
> >>> The global seqno is defunct and so we have no meaningful indicator of
> >>> forward progress for an engine. You need to listen to the request
> >>> signaling tracepoints instead.
> >>>
> >>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >>> ---
> >>>    drivers/gpu/drm/i915/i915_irq.c   |  2 --
> >>>    drivers/gpu/drm/i915/i915_trace.h | 25 -------------------------
> >>>    2 files changed, 27 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> >>> index 5fd5080c4ccb..71d11dc2c235 100644
> >>> --- a/drivers/gpu/drm/i915/i915_irq.c
> >>> +++ b/drivers/gpu/drm/i915/i915_irq.c
> >>> @@ -1209,8 +1209,6 @@ static void notify_ring(struct intel_engine_cs *engine)
> >>>                wake_up_process(tsk);
> >>>    
> >>>        rcu_read_unlock();
> >>> -
> >>> -     trace_intel_engine_notify(engine, wait);
> >>>    }
> >>>    
> >>>    static void vlv_c0_read(struct drm_i915_private *dev_priv,
> >>> diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
> >>> index 33d90eca9cdd..cb5bc65d575d 100644
> >>> --- a/drivers/gpu/drm/i915/i915_trace.h
> >>> +++ b/drivers/gpu/drm/i915/i915_trace.h
> >>> @@ -750,31 +750,6 @@ trace_i915_request_out(struct i915_request *rq)
> >>>    #endif
> >>>    #endif
> >>>    
> >>> -TRACE_EVENT(intel_engine_notify,
> >>> -         TP_PROTO(struct intel_engine_cs *engine, bool waiters),
> >>> -         TP_ARGS(engine, waiters),
> >>> -
> >>> -         TP_STRUCT__entry(
> >>> -                          __field(u32, dev)
> >>> -                          __field(u16, class)
> >>> -                          __field(u16, instance)
> >>> -                          __field(u32, seqno)
> >>> -                          __field(bool, waiters)
> >>> -                          ),
> >>> -
> >>> -         TP_fast_assign(
> >>> -                        __entry->dev = engine->i915->drm.primary->index;
> >>> -                        __entry->class = engine->uabi_class;
> >>> -                        __entry->instance = engine->instance;
> >>> -                        __entry->seqno = intel_engine_get_seqno(engine);
> >>> -                        __entry->waiters = waiters;
> >>> -                        ),
> >>> -
> >>> -         TP_printk("dev=%u, engine=%u:%u, seqno=%u, waiters=%u",
> >>> -                   __entry->dev, __entry->class, __entry->instance,
> >>> -                   __entry->seqno, __entry->waiters)
> >>> -);
> >>> -
> >>>    DEFINE_EVENT(i915_request, i915_request_retire,
> >>>            TP_PROTO(struct i915_request *rq),
> >>>            TP_ARGS(rq)
> >>>
> >>
> >> I cannot decide if keeping what we can would make it useful. Certainly
> >> not for debugging intel_engine_breadcrumbs_irq.. a sequence of
> >> intel_engine_notify(dev, class, instance) -> dma_fence_signaled would be
> >> a very unreliable trace of what engine actually executed something. What
> >> do you think?
> > 
> > All we get is a tracepoint to say a user interrupt occurred, but nothing to
> > tie it to any request. We are debugging interrupt generation at that
> > point, and I feel a tracepoint is ill-suited. We want something geared
> > towards CI instead, so a bunch of selftests... That would be sensible!
> 
> We get the engine as well, so we could look at the sequence of dma-fence
> signaling that follows and infer something, sometimes: like which
> physical engine executed what, since the signaling is done directly
> from the interrupt handler and engines are handled serially.

Oh, but fence signaling so rarely happens from an irq handler ;)

Or rather we so often signal fences as we do the next execbuf (or
otherwise) that the irq handler has nothing to do.
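
As an illustration only (a rough sketch, not our actual retirement code;
it assumes the engine keeps its in-flight requests on
engine->timeline.requests linked by rq->link, and it elides all the
locking), any path that already walks the requests can signal completed
fences on the spot, leaving nothing for the user interrupt:

static void retire_completed(struct intel_engine_cs *engine)
{
	struct i915_request *rq, *rn;

	list_for_each_entry_safe(rq, rn, &engine->timeline.requests, link) {
		if (!i915_request_completed(rq))
			break;

		/* Signalled here, so the irq handler finds nothing to do. */
		dma_fence_signal(&rq->fence);
		list_del(&rq->link);
	}
}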
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 28/34] drm/i915: Replace global breadcrumbs with per-context interrupt tracking
  2019-01-23 10:01     ` Chris Wilson
@ 2019-01-23 16:28       ` Tvrtko Ursulin
  0 siblings, 0 replies; 89+ messages in thread
From: Tvrtko Ursulin @ 2019-01-23 16:28 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 23/01/2019 10:01, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2019-01-23 09:21:45)
>>
>> On 21/01/2019 22:21, Chris Wilson wrote:
>>> -static void error_record_engine_waiters(struct intel_engine_cs *engine,
>>> -                                     struct drm_i915_error_engine *ee)
>>> -{
>>> -     struct intel_breadcrumbs *b = &engine->breadcrumbs;
>>> -     struct drm_i915_error_waiter *waiter;
>>> -     struct rb_node *rb;
>>> -     int count;
>>> -
>>> -     ee->num_waiters = 0;
>>> -     ee->waiters = NULL;
>>> -
>>> -     if (RB_EMPTY_ROOT(&b->waiters))
>>> -             return;
>>> -
>>> -     if (!spin_trylock_irq(&b->rb_lock)) {
>>> -             ee->waiters = ERR_PTR(-EDEADLK);
>>> -             return;
>>> -     }
>>> -
>>> -     count = 0;
>>> -     for (rb = rb_first(&b->waiters); rb != NULL; rb = rb_next(rb))
>>> -             count++;
>>> -     spin_unlock_irq(&b->rb_lock);
>>> -
>>> -     waiter = NULL;
>>> -     if (count)
>>> -             waiter = kmalloc_array(count,
>>> -                                    sizeof(struct drm_i915_error_waiter),
>>> -                                    GFP_ATOMIC);
>>> -     if (!waiter)
>>> -             return;
>>> -
>>> -     if (!spin_trylock_irq(&b->rb_lock)) {
>>> -             kfree(waiter);
>>> -             ee->waiters = ERR_PTR(-EDEADLK);
>>> -             return;
>>> -     }
>>> -
>>> -     ee->waiters = waiter;
>>> -     for (rb = rb_first(&b->waiters); rb; rb = rb_next(rb)) {
>>> -             struct intel_wait *w = rb_entry(rb, typeof(*w), node);
>>> -
>>> -             strcpy(waiter->comm, w->tsk->comm);
>>> -             waiter->pid = w->tsk->pid;
>>> -             waiter->seqno = w->seqno;
>>> -             waiter++;
>>> -
>>> -             if (++ee->num_waiters == count)
>>> -                     break;
>>> -     }
>>> -     spin_unlock_irq(&b->rb_lock);
>>> -}
>>
>> Capturing context waiters is not interesting for error state?
> 
> Not really, we don't have a direct link to the process. We could dig it
> out by identifying our special wait_cb inside the fence->signal_list,
> but I couldn't be bothered. Who's waiting at the time of the error has
> never been that interesting for error debugging, just provides an
> overview of the system state.
> 
> Who issued the hanging command is much more of interest for the hunting
> posse than their victim.
> 
> However, storing fence->flags (i.e. the DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT
> + DMA_FENCE_FLAG_SIGNALED_BIT) seems like it would come in handy.
> 
>>> -static bool __i915_spin_request(const struct i915_request *rq,
>>> -                             u32 seqno, int state, unsigned long timeout_us)
>>> +static bool __i915_spin_request(const struct i915_request * const rq,
>>> +                             int state, unsigned long timeout_us)
>>>    {
>>> -     struct intel_engine_cs *engine = rq->engine;
>>> -     unsigned int irq, cpu;
>>> -
>>> -     GEM_BUG_ON(!seqno);
>>> +     unsigned int cpu;
>>>    
>>>        /*
>>>         * Only wait for the request if we know it is likely to complete.
>>> @@ -1050,7 +1046,7 @@ static bool __i915_spin_request(const struct i915_request *rq,
>>>         * it is a fair assumption that it will not complete within our
>>>         * relatively short timeout.
>>>         */
>>> -     if (!intel_engine_has_started(engine, seqno))
>>> +     if (!i915_request_started(rq))
>>
>> Might be more wasteful the more preemption is going on. Probably not the
>> most important thing to try a fix straight away, but something to put
>> down on some to do list.
> 
> Actually... That would be cheap to fix here as we do a test_bit(ACTIVE).
> Hmm, I wonder if that makes sense for all callers.
> 
> Maybe i915_request_is_running(rq) as a followup.
>   
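
Such a helper might look roughly like the sketch below; this is
hypothetical, built only from the test_bit(ACTIVE) idea above, and not
the actual driver code:

static inline bool i915_request_is_running(const struct i915_request *rq)
{
	/* Has started on the GPU and has not been unsubmitted since. */
	if (!i915_request_started(rq))
		return false;

	return test_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
}

The intent being that the busy-spin bails out early once the request's
context is no longer submitted (e.g. preempted out).
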
>> Above comment is also outdated now (engine order).
> 
> I left a comment! Silly me.
> 
>>> +enum {
>>> +     I915_FENCE_FLAG_ACTIVE = DMA_FENCE_FLAG_USER_BITS,
>>> +     I915_FENCE_FLAG_SIGNAL,
>>
>> Describe in comments what these mean please.
> 
> Mean, you expect them to have meaning outside of their use? :)

No, just that the use can be gleaned from here instead of derived from
following the code. :p

>>> +bool intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine)
>>> +{
>>> +     struct intel_breadcrumbs *b = &engine->breadcrumbs;
>>
>> How can you afford to have this per engine? I guess I might figure out
>> later in the patch/series.
> 
> Hmm, it's always been per engine... What cost are you considering?

I was getting ahead of myself, well, of the patch series, by thinking how
you can afford to store a list of waiters per engine, when with the
virtual engine we won't know which one. Unless there will be a list on
the virtual engine and some sort of super-list describing which engines'
breadcrumbs irq handler to run for every physical engine interrupt.

>>> +     struct intel_context *ce, *cn;
>>> +     struct i915_request *rq, *rn;
>>> +     LIST_HEAD(signal);
>>> +
>>> +     spin_lock(&b->irq_lock);
>>> +
>>> +     b->irq_fired = true;
>>> +     if (b->irq_armed && list_empty(&b->signalers))
>>> +             __intel_breadcrumbs_disarm_irq(b);
>>> +
>>> +     list_for_each_entry_safe(ce, cn, &b->signalers, signal_link) {
>>> +             GEM_BUG_ON(list_empty(&ce->signals));
>>> +
>>> +             list_for_each_entry_safe(rq, rn, &ce->signals, signal_link) {
>>> +                     if (!__request_completed(rq))
>>> +                             break;
>>> +
>>> +                     GEM_BUG_ON(!test_bit(I915_FENCE_FLAG_SIGNAL,
>>> +                                          &rq->fence.flags));
>>> +                     clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);
>>> +
>>> +                     if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
>>> +                                  &rq->fence.flags))
>>> +                             continue;
>>
>> Request has been signalled already, but is still on this list? Who will
>> then remove it from this list)?
> 
> Race with retire-request, as we operate here only under b->irq_lock not
> rq->lock, and retire-request uses rq->lock then b->irq_lock.

Right, yes.
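
To make the window concrete, a rough sketch of the two racing paths,
using the lock and member names from the patch (illustrative only, not
the real functions; it assumes, as i915 does, that rq->lock doubles as
the fence lock):

/* Path A: retirement signals then unlinks, rq->lock -> b->irq_lock. */
static void retire_signals(struct i915_request *rq)
{
	struct intel_breadcrumbs *b = &rq->engine->breadcrumbs;

	spin_lock_irq(&rq->lock);
	dma_fence_signal_locked(&rq->fence); /* sets DMA_FENCE_FLAG_SIGNALED_BIT */

	spin_lock(&b->irq_lock);
	list_del(&rq->signal_link);          /* drop from ce->signals */
	spin_unlock(&b->irq_lock);

	spin_unlock_irq(&rq->lock);
}

/*
 * Path B: intel_engine_breadcrumbs_irq() walks ce->signals under
 * b->irq_lock alone, so between the signal and the unlink above it can
 * still find rq on the list; the SIGNALED_BIT test is what skips it.
 */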

> 
>>> +                     /*
>>> +                      * Queue for execution after dropping the signaling
>>> +                      * spinlock as the callback chain may end adding
>>> +                      * more signalers to the same context or engine.
>>> +                      */
>>> +                     i915_request_get(rq);
>>> +                     list_add_tail(&rq->signal_link, &signal);
>>
>> Shouldn't this be list_move_tail since rq is already on the ce->signals
>> list?
> 
> (1) We delete in bulk, see (2)
> 
>>> +             }
>>> +
>>> +             if (!list_is_first(&rq->signal_link, &ce->signals)) {
>>
>> Can't rq be NULL here - if only completed requests are on the list and
>> so the iterator reached the end?
> 
> Iterator at end == &ce->signals.

Yes, my bad.

> 
>>> +                     __list_del_many(&ce->signals, &rq->signal_link);
>>
>>
>> This block could use a comment - I at least failed to quickly understand
>> it. How can we be unlinking entries, if they have already been unlinked?
> 
> (2) Because we did list_add not list_move, see (1).
> 
>>> +                     if (&ce->signals == &rq->signal_link)
>>> +                             list_del_init(&ce->signal_link);
>>
>> This is another list_empty hack like from another day? Please put a
>> comment if you don't want it to be self documenting.
> 
>>> -static int intel_breadcrumbs_signaler(void *arg)
>>> +bool intel_engine_enable_signaling(struct i915_request *rq)
>>
>> intel_request_enable_signaling?
> 
> I'm warming to it.
> 
>>> +     spin_lock(&b->irq_lock);
>>> +     if (test_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags) &&
>>
>> __test_bit?
> 
> Heh. test_bit == __test_bit :)
> 
> In general though, we have to be cautious as we don't own the whole flags
> field.
> 
>>> +             list_for_each_prev(pos, &ce->signals) {
>>> +                     struct i915_request *it =
>>> +                             list_entry(pos, typeof(*it), signal_link);
>>>    
>>> -                     if (unlikely(kthread_should_stop()))
>>> +                     if (i915_seqno_passed(rq->fence.seqno,
>>
>> Put a comment against this loop please saying where in the list it is
>> looking to insert...
> 
> Oh you haven't written this variant of insertion sort for vblank
> handling 20 times. I swear I end up repeating my mistakes over and over
> again at every level in the stack.
> 
>>> +             list_add(&rq->signal_link, pos);
>>> +             if (pos == &ce->signals)
>>> +                     list_move_tail(&ce->signal_link, &b->signalers);
>>
>> ... and here how it manages the other list as well, on transition from
>> empty to active.
> 
> Seems like the code was easy enough to follow ;)
> 
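
For the record, the shape of that backwards insertion sort as I read it
(a sketch rather than a verbatim copy of the patch; insert_signal is
only an illustrative name): new requests almost always carry the
highest seqno, so the scan from the tail usually stops on the first
comparison.

static void insert_signal(struct intel_breadcrumbs *b,
			  struct intel_context *ce,
			  struct i915_request *rq)
{
	struct list_head *pos;

	/* ce->signals is kept in ascending fence.seqno order. */
	list_for_each_prev(pos, &ce->signals) {
		struct i915_request *it =
			list_entry(pos, typeof(*it), signal_link);

		/* Stop at the first request older than us. */
		if (i915_seqno_passed(rq->fence.seqno, it->fence.seqno))
			break;
	}
	list_add(&rq->signal_link, pos);	/* insert after 'pos' */

	/*
	 * Inserted at the head (including the empty-list case): make
	 * sure ce is (re)attached to the engine's list of signalers.
	 */
	if (pos == &ce->signals)
		list_move_tail(&ce->signal_link, &b->signalers);
}
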
>>> +     spin_lock(&b->irq_lock);
>>> +     if (test_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags)) {
>>
>> __test_and_clear_bit ?
> 
> Yeah this is where we need to be careful with rq->fence.flags. It is not
> specified that __test_and_clear_bit only operates on the single bit and
> so to be cautious we use the locked instruction to avoid clobbering
> other atomic updates to the rest of the field.

Ok.

> Although you can make a very strong case that all fence->flags are
> serialised for signaling today.
> 
>>> -             spinlock_t rb_lock; /* protects the rb and wraps irq_lock */
>>> -             struct rb_root waiters; /* sorted by retirement, priority */
>>> -             struct list_head signals; /* sorted by retirement */
>>> -             struct task_struct *signaler; /* used for fence signalling */
>>> +             struct irq_work irq_work;
>>
>> Why did you need irq work and not just invoke the handler directly?
>> Maybe put a comment here giving a hint.
> 
> /* lock inversion horrors */
> 
> Due to the way we may directly submit requests on handling the
> dma_fence_signal, we can end up processing a
> i915_request_enable_signaling on the same engine as is currently
> emitting the signal.
> 
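
Right, so a minimal sketch of that deferral (assuming the irq_work
member quoted above and the container_of layout; not the patch itself,
and queue_breadcrumbs is only an illustrative name): the fences are
processed from the irq_work callback, so a caller already inside the
signaling/submission path merely queues the work instead of recursing
into b->irq_lock.

#include <linux/irq_work.h>

static void signal_irq_work(struct irq_work *work)
{
	struct intel_engine_cs *engine =
		container_of(work, typeof(*engine), breadcrumbs.irq_work);

	intel_engine_breadcrumbs_irq(engine);
}

/* Safe from any context, including under the engine/request locks. */
static void queue_breadcrumbs(struct intel_engine_cs *engine)
{
	irq_work_queue(&engine->breadcrumbs.irq_work);
}
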
>>> +static int __igt_breadcrumbs_smoketest(void *arg)
>>> +{
>>> +     struct smoketest *t = arg;
>>> +     struct mutex *BKL = &t->engine->i915->drm.struct_mutex;
>>
>> Breaking new ground, well okay, although caching dev or i915 would be
>> good enough.
>>
>>> +     struct i915_request **requests;
>>> +     I915_RND_STATE(prng);
>>> +     const unsigned int total = 4 * t->ncontexts + 1;
>>> +     const unsigned int max_batch = min(t->ncontexts, t->max_batch) - 1;
>>> +     unsigned int num_waits = 0, num_fences = 0;
>>> +     unsigned int *order;
>>> +     int err = 0;
>>
>> Still in the Christmas spirit? ;) No worries, it's selftests.
> 
> I was feeling generous in replacing the elaborate breadcrumb testing we
> had with something at all!

That was a dig at the Christmas tree declarations you are normally so
fond of. :))

> That testing was the best part of intel_breadcrumbs.
>   
>> I ran out of steam and will look at selftests during some second pass.
>> In expectation, please put some high level comments for each test to
>> roughly say what it plans to test and with what approach. I makes
>> reverse engineering the algorithm much easier.
> 
> There's only one test (just run with mock_request and i915_request), a
> very, very, very simple smoketest.
> 
> I did not come up with ways of testing the new signal_list to the same
> rigour as we did before. :(

I'll go and read the next version..

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 25/34] drm/i915: Track active timelines
  2019-01-22 15:17     ` Chris Wilson
@ 2019-01-23 22:32       ` John Harrison
  2019-01-23 23:08         ` Chris Wilson
  0 siblings, 1 reply; 89+ messages in thread
From: John Harrison @ 2019-01-23 22:32 UTC (permalink / raw)
  To: Chris Wilson, Tvrtko Ursulin, intel-gfx

On 1/22/2019 07:17, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2019-01-22 14:56:32)
>> On 21/01/2019 22:21, Chris Wilson wrote:
>>> Now that we pin timelines around use, we have a clearly defined lifetime
>>> and convenient points at which we can track only the active timelines.
>>> This allows us to reduce the list iteration to only consider those
>>> active timelines and not all.
>>>
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> ---
>>>    drivers/gpu/drm/i915/i915_drv.h      |  2 +-
>>>    drivers/gpu/drm/i915/i915_gem.c      |  4 +--
>>>    drivers/gpu/drm/i915/i915_reset.c    |  2 +-
>>>    drivers/gpu/drm/i915/i915_timeline.c | 39 ++++++++++++++++++----------
>>>    4 files changed, 29 insertions(+), 18 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>>> index c00eaf2889fb..5577e0e1034f 100644
>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>> @@ -1977,7 +1977,7 @@ struct drm_i915_private {
>>>    
>>>                struct i915_gt_timelines {
>>>                        struct mutex mutex; /* protects list, tainted by GPU */
>>> -                     struct list_head list;
>>> +                     struct list_head active_list;
>>>    
>>>                        /* Pack multiple timelines' seqnos into the same page */
>>>                        spinlock_t hwsp_lock;
>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>>> index 4e0de22f0166..9c499edb4c13 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>> @@ -3246,7 +3246,7 @@ wait_for_timelines(struct drm_i915_private *i915,
>>>                return timeout;
>>>    
>>>        mutex_lock(&gt->mutex);
>>> -     list_for_each_entry(tl, &gt->list, link) {
>>> +     list_for_each_entry(tl, &gt->active_list, link) {
>>>                struct i915_request *rq;
>>>    
>>>                rq = i915_gem_active_get_unlocked(&tl->last_request);
>>> @@ -3274,7 +3274,7 @@ wait_for_timelines(struct drm_i915_private *i915,
>>>    
>>>                /* restart after reacquiring the lock */
>>>                mutex_lock(&gt->mutex);
>>> -             tl = list_entry(&gt->list, typeof(*tl), link);
>>> +             tl = list_entry(&gt->active_list, typeof(*tl), link);
>>>        }
>>>        mutex_unlock(&gt->mutex);
>>>    
>>> diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
>>> index 09edf488f711..9b9169508139 100644
>>> --- a/drivers/gpu/drm/i915/i915_reset.c
>>> +++ b/drivers/gpu/drm/i915/i915_reset.c
>>> @@ -852,7 +852,7 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915)
>>>         * No more can be submitted until we reset the wedged bit.
>>>         */
>>>        mutex_lock(&i915->gt.timelines.mutex);
>>> -     list_for_each_entry(tl, &i915->gt.timelines.list, link) {
>>> +     list_for_each_entry(tl, &i915->gt.timelines.active_list, link) {
>>>                struct i915_request *rq;
>>>                long timeout;
>>>    
>>> diff --git a/drivers/gpu/drm/i915/i915_timeline.c b/drivers/gpu/drm/i915/i915_timeline.c
>>> index 69ee33dfa340..007348b1b469 100644
>>> --- a/drivers/gpu/drm/i915/i915_timeline.c
>>> +++ b/drivers/gpu/drm/i915/i915_timeline.c
>>> @@ -117,7 +117,6 @@ int i915_timeline_init(struct drm_i915_private *i915,
>>>                       const char *name,
>>>                       struct i915_vma *hwsp)
>>>    {
>>> -     struct i915_gt_timelines *gt = &i915->gt.timelines;
>>>        void *vaddr;
>>>    
>>>        /*
>>> @@ -161,10 +160,6 @@ int i915_timeline_init(struct drm_i915_private *i915,
>>>    
>>>        i915_syncmap_init(&timeline->sync);
>>>    
>>> -     mutex_lock(&gt->mutex);
>>> -     list_add(&timeline->link, &gt->list);
>>> -     mutex_unlock(&gt->mutex);
>>> -
>>>        return 0;
>>>    }
>>>    
>>> @@ -173,7 +168,7 @@ void i915_timelines_init(struct drm_i915_private *i915)
>>>        struct i915_gt_timelines *gt = &i915->gt.timelines;
>>>    
>>>        mutex_init(&gt->mutex);
>>> -     INIT_LIST_HEAD(&gt->list);
>>> +     INIT_LIST_HEAD(&gt->active_list);
>>>    
>>>        spin_lock_init(&gt->hwsp_lock);
>>>        INIT_LIST_HEAD(&gt->hwsp_free_list);
>>> @@ -182,6 +177,24 @@ void i915_timelines_init(struct drm_i915_private *i915)
>>>        i915_gem_shrinker_taints_mutex(i915, &gt->mutex);
>>>    }
>>>    
>>> +static void timeline_active(struct i915_timeline *tl)
>>> +{
>>> +     struct i915_gt_timelines *gt = &tl->i915->gt.timelines;
>>> +
>>> +     mutex_lock(&gt->mutex);
>>> +     list_add(&tl->link, &gt->active_list);
>>> +     mutex_unlock(&gt->mutex);
>>> +}
>>> +
>>> +static void timeline_inactive(struct i915_timeline *tl)
>>> +{
>>> +     struct i915_gt_timelines *gt = &tl->i915->gt.timelines;
>>> +
>>> +     mutex_lock(&gt->mutex);
>>> +     list_del(&tl->link);
>>> +     mutex_unlock(&gt->mutex);
>>> +}
>> Bike shedding comments only:
>> Would it be better to use a verb suffix? Even though timeline_activate
>> also wouldn't sound perfect. Since it is file local - activate_timeline?
>> Or even just inline to pin/unpin. Unless more gets put into them later..
> Haven't got any plans for more here, yet, and was thinking this is a
> pinned_list myself. I picked active_list since I was using 'active'
> elsewhere for active_ring, active_engines, active_contexts, etc.
>
> I didn't like activate/deactivate enough to switch, and was trying to
> avoid reusing pin/unpin along this path:
> 	i915_timeline_pin -> timeline_pin
> begged confusion
>
> [snip]
>> Never mind the bikeshedding comments:
> There's time enough for someone to open a new pot of paint.
I agree that having a verb in there would make things clearer. Maybe 
timeline_make_(in)active? Or timeline_mark_(in)active?

> -Chris
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 25/34] drm/i915: Track active timelines
  2019-01-23 22:32       ` John Harrison
@ 2019-01-23 23:08         ` Chris Wilson
  0 siblings, 0 replies; 89+ messages in thread
From: Chris Wilson @ 2019-01-23 23:08 UTC (permalink / raw)
  To: John Harrison, Tvrtko Ursulin, intel-gfx

Quoting John Harrison (2019-01-23 22:32:54)
> On 1/22/2019 07:17, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2019-01-22 14:56:32)
> >> On 21/01/2019 22:21, Chris Wilson wrote:
> >>> +static void timeline_active(struct i915_timeline *tl)
> >>> +{
> >>> +     struct i915_gt_timelines *gt = &tl->i915->gt.timelines;
> >>> +
> >>> +     mutex_lock(&gt->mutex);
> >>> +     list_add(&tl->link, &gt->active_list);
> >>> +     mutex_unlock(&gt->mutex);
> >>> +}
> >>> +
> >>> +static void timeline_inactive(struct i915_timeline *tl)
> >>> +{
> >>> +     struct i915_gt_timelines *gt = &tl->i915->gt.timelines;
> >>> +
> >>> +     mutex_lock(&gt->mutex);
> >>> +     list_del(&tl->link);
> >>> +     mutex_unlock(&gt->mutex);
> >>> +}
> >> Bike shedding comments only:
> >> Would it be better to use a verb suffix? Even though timeline_activate
> >> also wouldn't sound perfect. Since it is file local - activate_timeline?
> >> Or even just inline to pin/unpin. Unless more gets put into them later..
> > Haven't got any plans for more here, yet, and was thinking this is a
> > pinned_list myself. I picked active_list since I was using 'active'
> > elsewhere for active_ring, active_engines, active_contexts, etc.
> >
> > I didn't like activate/deactivate enough to switch, and was trying to
> > avoid reusing pin/unpin along this path:
> >       i915_timeline_pin -> timeline_pin
> > begged confusion
> >
> > [snip]
> >> Never mind the bikeshedding comments:
> > There's time enough for someone to open a new pot of paint.
> I agree that having a verb in there would make things clearer. Maybe 
> timeline_make_(in)active? Or timeline_mark_(in)active?

mark more so than make (since the action we are doing is external).

How about

	timeline_track_active()
	timeline_untrack_active() / timeline_cancel_active()

or
	timeline_add_to_active()
	timeline_remove_from_active()

Finally!
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 10/34] drm/i915: Remove GPU reset dependence on struct_mutex
  2019-01-21 22:20 ` [PATCH 10/34] drm/i915: Remove GPU reset dependence on struct_mutex Chris Wilson
@ 2019-01-24 12:06   ` Mika Kuoppala
  2019-01-24 12:50     ` Chris Wilson
  0 siblings, 1 reply; 89+ messages in thread
From: Mika Kuoppala @ 2019-01-24 12:06 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> Now that the submission backends are controlled via their own spinlocks,
> with a wave of a magic wand we can lift the struct_mutex requirement
> around GPU reset. That is we allow the submission frontend (userspace)
> to keep on submitting while we process the GPU reset as we can suspend
> the backend independently.
>
> The major change is around the backoff/handoff strategy for performing
> the reset. With no mutex deadlock, we no longer have to coordinate with
> any waiter, and just perform the reset immediately.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c           |  38 +-
>  drivers/gpu/drm/i915/i915_drv.h               |   5 -
>  drivers/gpu/drm/i915/i915_gem.c               |  18 +-
>  drivers/gpu/drm/i915/i915_gem_fence_reg.h     |   1 -
>  drivers/gpu/drm/i915/i915_gem_gtt.h           |   1 +
>  drivers/gpu/drm/i915/i915_gpu_error.c         | 104 +++--
>  drivers/gpu/drm/i915/i915_gpu_error.h         |  28 +-
>  drivers/gpu/drm/i915/i915_request.c           |  47 ---
>  drivers/gpu/drm/i915/i915_reset.c             | 397 ++++++++----------
>  drivers/gpu/drm/i915/i915_reset.h             |   3 +
>  drivers/gpu/drm/i915/intel_engine_cs.c        |   6 +-
>  drivers/gpu/drm/i915/intel_guc_submission.c   |   5 +-
>  drivers/gpu/drm/i915/intel_hangcheck.c        |  28 +-
>  drivers/gpu/drm/i915/intel_lrc.c              |  92 ++--
>  drivers/gpu/drm/i915/intel_overlay.c          |   2 -
>  drivers/gpu/drm/i915/intel_ringbuffer.c       |  91 ++--
>  drivers/gpu/drm/i915/intel_ringbuffer.h       |  17 +-
>  .../gpu/drm/i915/selftests/intel_hangcheck.c  |  57 +--
>  .../drm/i915/selftests/intel_workarounds.c    |   3 -
>  .../gpu/drm/i915/selftests/mock_gem_device.c  |   4 +-
>  20 files changed, 393 insertions(+), 554 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 24d6d4ce14ef..3ec369980d40 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -1284,8 +1284,6 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
>  		seq_puts(m, "Wedged\n");
>  	if (test_bit(I915_RESET_BACKOFF, &dev_priv->gpu_error.flags))
>  		seq_puts(m, "Reset in progress: struct_mutex backoff\n");
> -	if (test_bit(I915_RESET_HANDOFF, &dev_priv->gpu_error.flags))
> -		seq_puts(m, "Reset in progress: reset handoff to waiter\n");
>  	if (waitqueue_active(&dev_priv->gpu_error.wait_queue))
>  		seq_puts(m, "Waiter holding struct mutex\n");
>  	if (waitqueue_active(&dev_priv->gpu_error.reset_queue))
> @@ -1321,15 +1319,15 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
>  		struct rb_node *rb;
>  
>  		seq_printf(m, "%s:\n", engine->name);
> -		seq_printf(m, "\tseqno = %x [current %x, last %x]\n",
> +		seq_printf(m, "\tseqno = %x [current %x, last %x], %dms ago\n",
>  			   engine->hangcheck.seqno, seqno[id],
> -			   intel_engine_last_submit(engine));
> -		seq_printf(m, "\twaiters? %s, fake irq active? %s, stalled? %s, wedged? %s\n",
> +			   intel_engine_last_submit(engine),
> +			   jiffies_to_msecs(jiffies -
> +					    engine->hangcheck.action_timestamp));
> +		seq_printf(m, "\twaiters? %s, fake irq active? %s\n",
>  			   yesno(intel_engine_has_waiter(engine)),
>  			   yesno(test_bit(engine->id,
> -					  &dev_priv->gpu_error.missed_irq_rings)),
> -			   yesno(engine->hangcheck.stalled),
> -			   yesno(engine->hangcheck.wedged));
> +					  &dev_priv->gpu_error.missed_irq_rings)));
>  
>  		spin_lock_irq(&b->rb_lock);
>  		for (rb = rb_first(&b->waiters); rb; rb = rb_next(rb)) {
> @@ -1343,11 +1341,6 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
>  		seq_printf(m, "\tACTHD = 0x%08llx [current 0x%08llx]\n",
>  			   (long long)engine->hangcheck.acthd,
>  			   (long long)acthd[id]);
> -		seq_printf(m, "\taction = %s(%d) %d ms ago\n",
> -			   hangcheck_action_to_str(engine->hangcheck.action),
> -			   engine->hangcheck.action,
> -			   jiffies_to_msecs(jiffies -
> -					    engine->hangcheck.action_timestamp));

Yeah, it is a timestamp for the sample and most decisions are made on
top of the seqno. A welcome compression.

>  
>  		if (engine->id == RCS) {
>  			seq_puts(m, "\tinstdone read =\n");
> @@ -3886,8 +3879,6 @@ static int
>  i915_wedged_set(void *data, u64 val)

*hones his axe*

>  {
>  	struct drm_i915_private *i915 = data;
> -	struct intel_engine_cs *engine;
> -	unsigned int tmp;
>  
>  	/*
>  	 * There is no safeguard against this debugfs entry colliding
> @@ -3900,18 +3891,8 @@ i915_wedged_set(void *data, u64 val)
>  	if (i915_reset_backoff(&i915->gpu_error))
>  		return -EAGAIN;
>  
> -	for_each_engine_masked(engine, i915, val, tmp) {
> -		engine->hangcheck.seqno = intel_engine_get_seqno(engine);
> -		engine->hangcheck.stalled = true;
> -	}
> -
>  	i915_handle_error(i915, val, I915_ERROR_CAPTURE,
>  			  "Manually set wedged engine mask = %llx", val);
> -
> -	wait_on_bit(&i915->gpu_error.flags,
> -		    I915_RESET_HANDOFF,
> -		    TASK_UNINTERRUPTIBLE);
> -
>  	return 0;
>  }
>  
> @@ -4066,13 +4047,8 @@ i915_drop_caches_set(void *data, u64 val)
>  		mutex_unlock(&i915->drm.struct_mutex);
>  	}
>  
> -	if (val & DROP_RESET_ACTIVE &&
> -	    i915_terminally_wedged(&i915->gpu_error)) {
> +	if (val & DROP_RESET_ACTIVE && i915_terminally_wedged(&i915->gpu_error))
>  		i915_handle_error(i915, ALL_ENGINES, 0, NULL);
> -		wait_on_bit(&i915->gpu_error.flags,
> -			    I915_RESET_HANDOFF,
> -			    TASK_UNINTERRUPTIBLE);
> -	}
>  
>  	fs_reclaim_acquire(GFP_KERNEL);
>  	if (val & DROP_BOUND)
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 03db011caa8e..59a7e90113d7 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -3001,11 +3001,6 @@ static inline bool i915_reset_backoff(struct i915_gpu_error *error)
>  	return unlikely(test_bit(I915_RESET_BACKOFF, &error->flags));
>  }
>  
> -static inline bool i915_reset_handoff(struct i915_gpu_error *error)
> -{
> -	return unlikely(test_bit(I915_RESET_HANDOFF, &error->flags));
> -}
> -
>  static inline bool i915_terminally_wedged(struct i915_gpu_error *error)
>  {
>  	return unlikely(test_bit(I915_WEDGED, &error->flags));
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index b359390ba22c..d20b42386c3c 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -657,11 +657,6 @@ i915_gem_object_wait(struct drm_i915_gem_object *obj,
>  		     struct intel_rps_client *rps_client)
>  {
>  	might_sleep();
> -#if IS_ENABLED(CONFIG_LOCKDEP)
> -	GEM_BUG_ON(debug_locks &&
> -		   !!lockdep_is_held(&obj->base.dev->struct_mutex) !=
> -		   !!(flags & I915_WAIT_LOCKED));
> -#endif
>  	GEM_BUG_ON(timeout < 0);
>  
>  	timeout = i915_gem_object_wait_reservation(obj->resv,
> @@ -4493,8 +4488,6 @@ void i915_gem_sanitize(struct drm_i915_private *i915)
>  
>  	GEM_TRACE("\n");
>  
> -	mutex_lock(&i915->drm.struct_mutex);
> -
>  	wakeref = intel_runtime_pm_get(i915);
>  	intel_uncore_forcewake_get(i915, FORCEWAKE_ALL);
>  
> @@ -4520,6 +4513,7 @@ void i915_gem_sanitize(struct drm_i915_private *i915)
>  	intel_uncore_forcewake_put(i915, FORCEWAKE_ALL);
>  	intel_runtime_pm_put(i915, wakeref);
>

unset_wedged looks ok, I should have faith
as I reviewed the patch. In retrospect,
READ_ONCE on gt.scratch might have been
good at raising suspicion, even though
superfluous.

Looks like for the engines we are saved by the
timeline lock. And we have laid some GEM_BUG_ON mines
there so we will hear the explosions, if any.

> +	mutex_lock(&i915->drm.struct_mutex);
>  	i915_gem_contexts_lost(i915);
>  	mutex_unlock(&i915->drm.struct_mutex);
>  }
> @@ -4534,6 +4528,8 @@ int i915_gem_suspend(struct drm_i915_private *i915)
>  	wakeref = intel_runtime_pm_get(i915);
>  	intel_suspend_gt_powersave(i915);
>  
> +	flush_workqueue(i915->wq);

I don't know what is happening here. Why
don't we need the i915_gem_drain_workqueue in here?

> +
>  	mutex_lock(&i915->drm.struct_mutex);
>  
>  	/*
> @@ -4563,11 +4559,9 @@ int i915_gem_suspend(struct drm_i915_private *i915)
>  	i915_retire_requests(i915); /* ensure we flush after wedging */
>  
>  	mutex_unlock(&i915->drm.struct_mutex);
> +	i915_reset_flush(i915);
>  
> -	intel_uc_suspend(i915);
> -
> -	cancel_delayed_work_sync(&i915->gpu_error.hangcheck_work);
> -	cancel_delayed_work_sync(&i915->gt.retire_work);
> +	drain_delayed_work(&i915->gt.retire_work);

Hangcheck is inside reset flush but why the change
for retire?

>  
>  	/*
>  	 * As the idle_work is rearming if it detects a race, play safe and
> @@ -4575,6 +4569,8 @@ int i915_gem_suspend(struct drm_i915_private *i915)
>  	 */
>  	drain_delayed_work(&i915->gt.idle_work);
>  
> +	intel_uc_suspend(i915);
> +
>  	/*
>  	 * Assert that we successfully flushed all the work and
>  	 * reset the GPU back to its idle, low power state.
> diff --git a/drivers/gpu/drm/i915/i915_gem_fence_reg.h b/drivers/gpu/drm/i915/i915_gem_fence_reg.h
> index 99a31ded4dfd..09dcaf14121b 100644
> --- a/drivers/gpu/drm/i915/i915_gem_fence_reg.h
> +++ b/drivers/gpu/drm/i915/i915_gem_fence_reg.h
> @@ -50,4 +50,3 @@ struct drm_i915_fence_reg {
>  };
>  
>  #endif
> -
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index 9229b03d629b..a0039ea97cdc 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -39,6 +39,7 @@
>  #include <linux/pagevec.h>
>  
>  #include "i915_request.h"
> +#include "i915_reset.h"
>  #include "i915_selftest.h"
>  #include "i915_timeline.h"
>  
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index 1f8e80e31b49..4eef0462489c 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -533,10 +533,7 @@ static void error_print_engine(struct drm_i915_error_state_buf *m,
>  	err_printf(m, "  waiting: %s\n", yesno(ee->waiting));
>  	err_printf(m, "  ring->head: 0x%08x\n", ee->cpu_ring_head);
>  	err_printf(m, "  ring->tail: 0x%08x\n", ee->cpu_ring_tail);
> -	err_printf(m, "  hangcheck stall: %s\n", yesno(ee->hangcheck_stalled));
> -	err_printf(m, "  hangcheck action: %s\n",
> -		   hangcheck_action_to_str(ee->hangcheck_action));
> -	err_printf(m, "  hangcheck action timestamp: %dms (%lu%s)\n",
> +	err_printf(m, "  hangcheck timestamp: %dms (%lu%s)\n",
>  		   jiffies_to_msecs(ee->hangcheck_timestamp - epoch),
>  		   ee->hangcheck_timestamp,
>  		   ee->hangcheck_timestamp == epoch ? "; epoch" : "");
> @@ -684,15 +681,15 @@ static void __err_print_to_sgl(struct drm_i915_error_state_buf *m,
>  		   jiffies_to_msecs(error->capture - error->epoch));
>  
>  	for (i = 0; i < ARRAY_SIZE(error->engine); i++) {
> -		if (error->engine[i].hangcheck_stalled &&
> -		    error->engine[i].context.pid) {
> -			err_printf(m, "Active process (on ring %s): %s [%d], score %d%s\n",
> -				   engine_name(m->i915, i),
> -				   error->engine[i].context.comm,
> -				   error->engine[i].context.pid,
> -				   error->engine[i].context.ban_score,
> -				   bannable(&error->engine[i].context));
> -		}
> +		if (!error->engine[i].context.pid)
> +			continue;
> +
> +		err_printf(m, "Active process (on ring %s): %s [%d], score %d%s\n",
> +			   engine_name(m->i915, i),
> +			   error->engine[i].context.comm,
> +			   error->engine[i].context.pid,
> +			   error->engine[i].context.ban_score,
> +			   bannable(&error->engine[i].context));
>  	}
>  	err_printf(m, "Reset count: %u\n", error->reset_count);
>  	err_printf(m, "Suspend count: %u\n", error->suspend_count);
> @@ -1144,7 +1141,8 @@ static u32 capture_error_bo(struct drm_i915_error_buffer *err,
>  	return i;
>  }
>  
> -/* Generate a semi-unique error code. The code is not meant to have meaning, The
> +/*
> + * Generate a semi-unique error code. The code is not meant to have meaning, The
>   * code's only purpose is to try to prevent false duplicated bug reports by
>   * grossly estimating a GPU error state.
>   *
> @@ -1153,29 +1151,23 @@ static u32 capture_error_bo(struct drm_i915_error_buffer *err,
>   *
>   * It's only a small step better than a random number in its current form.
>   */
> -static u32 i915_error_generate_code(struct drm_i915_private *dev_priv,
> -				    struct i915_gpu_state *error,
> -				    int *engine_id)
> +static u32 i915_error_generate_code(struct i915_gpu_state *error,
> +				    unsigned long engine_mask)
>  {
> -	u32 error_code = 0;
> -	int i;
> -
> -	/* IPEHR would be an ideal way to detect errors, as it's the gross
> +	/*
> +	 * IPEHR would be an ideal way to detect errors, as it's the gross
>  	 * measure of "the command that hung." However, has some very common
>  	 * synchronization commands which almost always appear in the case
>  	 * strictly a client bug. Use instdone to differentiate those some.
>  	 */
> -	for (i = 0; i < I915_NUM_ENGINES; i++) {
> -		if (error->engine[i].hangcheck_stalled) {
> -			if (engine_id)
> -				*engine_id = i;
> +	if (engine_mask) {
> +		struct drm_i915_error_engine *ee =
> +			&error->engine[ffs(engine_mask)];
>  
> -			return error->engine[i].ipehr ^
> -			       error->engine[i].instdone.instdone;
> -		}
> +		return ee->ipehr ^ ee->instdone.instdone;
>  	}
>  
> -	return error_code;
> +	return 0;
>  }
>  
>  static void gem_record_fences(struct i915_gpu_state *error)
> @@ -1338,9 +1330,8 @@ static void error_record_engine_registers(struct i915_gpu_state *error,
>  	}
>  
>  	ee->idle = intel_engine_is_idle(engine);
> -	ee->hangcheck_timestamp = engine->hangcheck.action_timestamp;
> -	ee->hangcheck_action = engine->hangcheck.action;
> -	ee->hangcheck_stalled = engine->hangcheck.stalled;
> +	if (!ee->idle)
> +		ee->hangcheck_timestamp = engine->hangcheck.action_timestamp;
>  	ee->reset_count = i915_reset_engine_count(&dev_priv->gpu_error,
>  						  engine);
>  
> @@ -1783,31 +1774,35 @@ static void capture_reg_state(struct i915_gpu_state *error)
>  	error->pgtbl_er = I915_READ(PGTBL_ER);
>  }
>  
> -static void i915_error_capture_msg(struct drm_i915_private *dev_priv,
> -				   struct i915_gpu_state *error,
> -				   u32 engine_mask,
> -				   const char *error_msg)
> +static const char *
> +error_msg(struct i915_gpu_state *error, unsigned long engines, const char *msg)
>  {
> -	u32 ecode;
> -	int engine_id = -1, len;
> +	int len;
> +	int i;
>  
> -	ecode = i915_error_generate_code(dev_priv, error, &engine_id);
> +	for (i = 0; i < ARRAY_SIZE(error->engine); i++)
> +		if (!error->engine[i].context.pid)
> +			engines &= ~BIT(i);

No more grouping for driver internal hangs...?

>  
>  	len = scnprintf(error->error_msg, sizeof(error->error_msg),
> -			"GPU HANG: ecode %d:%d:0x%08x",
> -			INTEL_GEN(dev_priv), engine_id, ecode);
> -
> -	if (engine_id != -1 && error->engine[engine_id].context.pid)
> +			"GPU HANG: ecode %d:%lx:0x%08x",
> +			INTEL_GEN(error->i915), engines,
> +			i915_error_generate_code(error, engines));
> +	if (engines) {
> +		/* Just show the first executing process, more is confusing */
> +		i = ffs(engines);

Then why not just make the ecode accept a single engine and move it here?
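Something like this, perhaps (untested sketch, the single-engine signature is
my invention):

	static u32 i915_error_generate_code(const struct drm_i915_error_engine *ee)
	{
		/* Same IPEHR ^ instdone hash as before, just for one engine */
		return ee->ipehr ^ ee->instdone.instdone;
	}

with the caller reducing the engine mask down to a single ee before calling it.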

>  		len += scnprintf(error->error_msg + len,
>  				 sizeof(error->error_msg) - len,
>  				 ", in %s [%d]",
> -				 error->engine[engine_id].context.comm,
> -				 error->engine[engine_id].context.pid);
> +				 error->engine[i].context.comm,
> +				 error->engine[i].context.pid);
> +	}
> +	if (msg)
> +		len += scnprintf(error->error_msg + len,
> +				 sizeof(error->error_msg) - len,
> +				 ", %s", msg);
>  
> -	scnprintf(error->error_msg + len, sizeof(error->error_msg) - len,
> -		  ", reason: %s, action: %s",
> -		  error_msg,
> -		  engine_mask ? "reset" : "continue");
> +	return error->error_msg;
>  }
>  
>  static void capture_gen_state(struct i915_gpu_state *error)
> @@ -1847,7 +1842,7 @@ static unsigned long capture_find_epoch(const struct i915_gpu_state *error)
>  	for (i = 0; i < ARRAY_SIZE(error->engine); i++) {
>  		const struct drm_i915_error_engine *ee = &error->engine[i];
>  
> -		if (ee->hangcheck_stalled &&
> +		if (ee->hangcheck_timestamp &&
>  		    time_before(ee->hangcheck_timestamp, epoch))
>  			epoch = ee->hangcheck_timestamp;
>  	}
> @@ -1921,7 +1916,7 @@ i915_capture_gpu_state(struct drm_i915_private *i915)
>   * i915_capture_error_state - capture an error record for later analysis
>   * @i915: i915 device
>   * @engine_mask: the mask of engines triggering the hang
> - * @error_msg: a message to insert into the error capture header
> + * @msg: a message to insert into the error capture header
>   *
>   * Should be called when an error is detected (either a hang or an error
>   * interrupt) to capture error state from the time of the error.  Fills
> @@ -1929,8 +1924,8 @@ i915_capture_gpu_state(struct drm_i915_private *i915)
>   * to pick up.
>   */
>  void i915_capture_error_state(struct drm_i915_private *i915,
> -			      u32 engine_mask,
> -			      const char *error_msg)
> +			      unsigned long engine_mask,
> +			      const char *msg)
>  {
>  	static bool warned;
>  	struct i915_gpu_state *error;
> @@ -1946,8 +1941,7 @@ void i915_capture_error_state(struct drm_i915_private *i915,
>  	if (IS_ERR(error))
>  		return;
>  
> -	i915_error_capture_msg(i915, error, engine_mask, error_msg);
> -	DRM_INFO("%s\n", error->error_msg);
> +	dev_info(i915->drm.dev, "%s\n", error_msg(error, engine_mask, msg));
>  
>  	if (!error->simulated) {
>  		spin_lock_irqsave(&i915->gpu_error.lock, flags);
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.h b/drivers/gpu/drm/i915/i915_gpu_error.h
> index 604291f7762d..231173786eae 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.h
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.h
> @@ -85,8 +85,6 @@ struct i915_gpu_state {
>  		bool waiting;
>  		int num_waiters;
>  		unsigned long hangcheck_timestamp;
> -		bool hangcheck_stalled;
> -		enum intel_engine_hangcheck_action hangcheck_action;
>  		struct i915_address_space *vm;
>  		int num_requests;
>  		u32 reset_count;
> @@ -197,6 +195,8 @@ struct i915_gpu_state {
>  	struct scatterlist *sgl, *fit;
>  };
>  
> +struct i915_gpu_restart;
> +
>  struct i915_gpu_error {
>  	/* For hangcheck timer */
>  #define DRM_I915_HANGCHECK_PERIOD 1500 /* in ms */
> @@ -247,15 +247,6 @@ struct i915_gpu_error {
>  	 * i915_mutex_lock_interruptible()?). I915_RESET_BACKOFF serves a
>  	 * secondary role in preventing two concurrent global reset attempts.
>  	 *
> -	 * #I915_RESET_HANDOFF - To perform the actual GPU reset, we need the
> -	 * struct_mutex. We try to acquire the struct_mutex in the reset worker,
> -	 * but it may be held by some long running waiter (that we cannot
> -	 * interrupt without causing trouble). Once we are ready to do the GPU
> -	 * reset, we set the I915_RESET_HANDOFF bit and wakeup any waiters. If
> -	 * they already hold the struct_mutex and want to participate they can
> -	 * inspect the bit and do the reset directly, otherwise the worker
> -	 * waits for the struct_mutex.
> -	 *
>  	 * #I915_RESET_ENGINE[num_engines] - Since the driver doesn't need to
>  	 * acquire the struct_mutex to reset an engine, we need an explicit
>  	 * flag to prevent two concurrent reset attempts in the same engine.
> @@ -269,20 +260,13 @@ struct i915_gpu_error {
>  	 */
>  	unsigned long flags;
>  #define I915_RESET_BACKOFF	0
> -#define I915_RESET_HANDOFF	1
> -#define I915_RESET_MODESET	2
> -#define I915_RESET_ENGINE	3
> +#define I915_RESET_MODESET	1
> +#define I915_RESET_ENGINE	2
>  #define I915_WEDGED		(BITS_PER_LONG - 1)
>  
>  	/** Number of times an engine has been reset */
>  	u32 reset_engine_count[I915_NUM_ENGINES];
>  
> -	/** Set of stalled engines with guilty requests, in the current reset */
> -	u32 stalled_mask;
> -
> -	/** Reason for the current *global* reset */
> -	const char *reason;
> -
>  	struct mutex wedge_mutex; /* serialises wedging/unwedging */
>  
>  	/**
> @@ -299,6 +283,8 @@ struct i915_gpu_error {
>  
>  	/* For missed irq/seqno simulation. */
>  	unsigned long test_irq_rings;
> +
> +	struct i915_gpu_restart *restart;
>  };
>  
>  struct drm_i915_error_state_buf {
> @@ -320,7 +306,7 @@ void i915_error_printf(struct drm_i915_error_state_buf *e, const char *f, ...);
>  
>  struct i915_gpu_state *i915_capture_gpu_state(struct drm_i915_private *i915);
>  void i915_capture_error_state(struct drm_i915_private *dev_priv,
> -			      u32 engine_mask,
> +			      unsigned long engine_mask,
>  			      const char *error_msg);
>  
>  static inline struct i915_gpu_state *
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 5e178f5ac18b..80232de8e2be 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -1083,18 +1083,6 @@ static bool __i915_spin_request(const struct i915_request *rq,
>  	return false;
>  }
>  
> -static bool __i915_wait_request_check_and_reset(struct i915_request *request)
> -{
> -	struct i915_gpu_error *error = &request->i915->gpu_error;
> -
> -	if (likely(!i915_reset_handoff(error)))
> -		return false;
> -
> -	__set_current_state(TASK_RUNNING);
> -	i915_reset(request->i915, error->stalled_mask, error->reason);
> -	return true;
> -}
> -
>  /**
>   * i915_request_wait - wait until execution of request has finished
>   * @rq: the request to wait upon
> @@ -1120,17 +1108,10 @@ long i915_request_wait(struct i915_request *rq,
>  {
>  	const int state = flags & I915_WAIT_INTERRUPTIBLE ?
>  		TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
> -	wait_queue_head_t *errq = &rq->i915->gpu_error.wait_queue;
> -	DEFINE_WAIT_FUNC(reset, default_wake_function);
>  	DEFINE_WAIT_FUNC(exec, default_wake_function);
>  	struct intel_wait wait;
>  
>  	might_sleep();
> -#if IS_ENABLED(CONFIG_LOCKDEP)
> -	GEM_BUG_ON(debug_locks &&
> -		   !!lockdep_is_held(&rq->i915->drm.struct_mutex) !=
> -		   !!(flags & I915_WAIT_LOCKED));
> -#endif
>  	GEM_BUG_ON(timeout < 0);
>  
>  	if (i915_request_completed(rq))
> @@ -1140,11 +1121,7 @@ long i915_request_wait(struct i915_request *rq,
>  		return -ETIME;
>  
>  	trace_i915_request_wait_begin(rq, flags);
> -
>  	add_wait_queue(&rq->execute, &exec);
> -	if (flags & I915_WAIT_LOCKED)
> -		add_wait_queue(errq, &reset);
> -
>  	intel_wait_init(&wait);
>  	if (flags & I915_WAIT_PRIORITY)
>  		i915_schedule_bump_priority(rq, I915_PRIORITY_WAIT);
> @@ -1155,10 +1132,6 @@ long i915_request_wait(struct i915_request *rq,
>  		if (intel_wait_update_request(&wait, rq))
>  			break;
>  
> -		if (flags & I915_WAIT_LOCKED &&
> -		    __i915_wait_request_check_and_reset(rq))
> -			continue;
> -
>  		if (signal_pending_state(state, current)) {
>  			timeout = -ERESTARTSYS;
>  			goto complete;
> @@ -1188,9 +1161,6 @@ long i915_request_wait(struct i915_request *rq,
>  		 */
>  		goto wakeup;
>  
> -	if (flags & I915_WAIT_LOCKED)
> -		__i915_wait_request_check_and_reset(rq);
> -
>  	for (;;) {
>  		if (signal_pending_state(state, current)) {
>  			timeout = -ERESTARTSYS;
> @@ -1214,21 +1184,6 @@ long i915_request_wait(struct i915_request *rq,
>  		if (i915_request_completed(rq))
>  			break;
>  
> -		/*
> -		 * If the GPU is hung, and we hold the lock, reset the GPU
> -		 * and then check for completion. On a full reset, the engine's
> -		 * HW seqno will be advanced passed us and we are complete.
> -		 * If we do a partial reset, we have to wait for the GPU to
> -		 * resume and update the breadcrumb.
> -		 *
> -		 * If we don't hold the mutex, we can just wait for the worker
> -		 * to come along and update the breadcrumb (either directly
> -		 * itself, or indirectly by recovering the GPU).
> -		 */
> -		if (flags & I915_WAIT_LOCKED &&
> -		    __i915_wait_request_check_and_reset(rq))
> -			continue;
> -
>  		/* Only spin if we know the GPU is processing this request */
>  		if (__i915_spin_request(rq, wait.seqno, state, 2))
>  			break;
> @@ -1242,8 +1197,6 @@ long i915_request_wait(struct i915_request *rq,
>  	intel_engine_remove_wait(rq->engine, &wait);
>  complete:
>  	__set_current_state(TASK_RUNNING);
> -	if (flags & I915_WAIT_LOCKED)
> -		remove_wait_queue(errq, &reset);
>  	remove_wait_queue(&rq->execute, &exec);
>  	trace_i915_request_wait_end(rq);
>  
> diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
> index 2961c21d9420..064fc6da1512 100644
> --- a/drivers/gpu/drm/i915/i915_reset.c
> +++ b/drivers/gpu/drm/i915/i915_reset.c
> @@ -5,6 +5,7 @@
>   */
>  
>  #include <linux/sched/mm.h>
> +#include <linux/stop_machine.h>
>  
>  #include "i915_drv.h"
>  #include "i915_gpu_error.h"
> @@ -17,22 +18,23 @@ static void engine_skip_context(struct i915_request *rq)
>  	struct intel_engine_cs *engine = rq->engine;
>  	struct i915_gem_context *hung_ctx = rq->gem_context;
>  	struct i915_timeline *timeline = rq->timeline;
> -	unsigned long flags;
>  
> +	lockdep_assert_held(&engine->timeline.lock);
>  	GEM_BUG_ON(timeline == &engine->timeline);
>  
> -	spin_lock_irqsave(&engine->timeline.lock, flags);
>  	spin_lock(&timeline->lock);
>  
> -	list_for_each_entry_continue(rq, &engine->timeline.requests, link)
> -		if (rq->gem_context == hung_ctx)
> -			i915_request_skip(rq, -EIO);
> +	if (rq->global_seqno) {
> +		list_for_each_entry_continue(rq,
> +					     &engine->timeline.requests, link)
> +			if (rq->gem_context == hung_ctx)
> +				i915_request_skip(rq, -EIO);
> +	}
>  
>  	list_for_each_entry(rq, &timeline->requests, link)
>  		i915_request_skip(rq, -EIO);
>  
>  	spin_unlock(&timeline->lock);
> -	spin_unlock_irqrestore(&engine->timeline.lock, flags);
>  }
>  
>  static void client_mark_guilty(struct drm_i915_file_private *file_priv,
> @@ -59,7 +61,7 @@ static void client_mark_guilty(struct drm_i915_file_private *file_priv,
>  	}
>  }
>  
> -static void context_mark_guilty(struct i915_gem_context *ctx)
> +static bool context_mark_guilty(struct i915_gem_context *ctx)
>  {
>  	unsigned int score;
>  	bool banned, bannable;
> @@ -72,7 +74,7 @@ static void context_mark_guilty(struct i915_gem_context *ctx)
>  
>  	/* Cool contexts don't accumulate client ban score */
>  	if (!bannable)
> -		return;
> +		return false;
>  
>  	if (banned) {
>  		DRM_DEBUG_DRIVER("context %s: guilty %d, score %u, banned\n",
> @@ -83,6 +85,8 @@ static void context_mark_guilty(struct i915_gem_context *ctx)
>  
>  	if (!IS_ERR_OR_NULL(ctx->file_priv))
>  		client_mark_guilty(ctx->file_priv, ctx);
> +
> +	return banned;
>  }
>  
>  static void context_mark_innocent(struct i915_gem_context *ctx)
> @@ -90,6 +94,21 @@ static void context_mark_innocent(struct i915_gem_context *ctx)
>  	atomic_inc(&ctx->active_count);
>  }
>  
> +void i915_reset_request(struct i915_request *rq, bool guilty)
> +{
> +	lockdep_assert_held(&rq->engine->timeline.lock);
> +	GEM_BUG_ON(i915_request_completed(rq));
> +
> +	if (guilty) {
> +		i915_request_skip(rq, -EIO);
> +		if (context_mark_guilty(rq->gem_context))
> +			engine_skip_context(rq);
> +	} else {
> +		dma_fence_set_error(&rq->fence, -EAGAIN);
> +		context_mark_innocent(rq->gem_context);
> +	}
> +}
> +
>  static void gen3_stop_engine(struct intel_engine_cs *engine)
>  {
>  	struct drm_i915_private *dev_priv = engine->i915;
> @@ -533,22 +552,6 @@ int intel_gpu_reset(struct drm_i915_private *i915, unsigned int engine_mask)
>  	int retry;
>  	int ret;
>  
> -	/*
> -	 * We want to perform per-engine reset from atomic context (e.g.
> -	 * softirq), which imposes the constraint that we cannot sleep.
> -	 * However, experience suggests that spending a bit of time waiting
> -	 * for a reset helps in various cases, so for a full-device reset
> -	 * we apply the opposite rule and wait if we want to. As we should
> -	 * always follow up a failed per-engine reset with a full device reset,
> -	 * being a little faster, stricter and more error prone for the
> -	 * atomic case seems an acceptable compromise.
> -	 *
> -	 * Unfortunately this leads to a bimodal routine, when the goal was
> -	 * to have a single reset function that worked for resetting any
> -	 * number of engines simultaneously.
> -	 */
> -	might_sleep_if(engine_mask == ALL_ENGINES);

Oh, here it is. I was after this for the atomic resets.

> -
>  	/*
>  	 * If the power well sleeps during the reset, the reset
>  	 * request may be dropped and never completes (causing -EIO).
> @@ -580,8 +583,6 @@ int intel_gpu_reset(struct drm_i915_private *i915, unsigned int engine_mask)
>  		}
>  		if (ret != -ETIMEDOUT || engine_mask != ALL_ENGINES)
>  			break;
> -
> -		cond_resched();
>  	}
>  	intel_uncore_forcewake_put(i915, FORCEWAKE_ALL);
>  
> @@ -620,11 +621,8 @@ int intel_reset_guc(struct drm_i915_private *i915)
>   * Ensure irq handler finishes, and not run again.
>   * Also return the active request so that we only search for it once.
>   */
> -static struct i915_request *
> -reset_prepare_engine(struct intel_engine_cs *engine)
> +static void reset_prepare_engine(struct intel_engine_cs *engine)
>  {
> -	struct i915_request *rq;
> -
>  	/*
>  	 * During the reset sequence, we must prevent the engine from
>  	 * entering RC6. As the context state is undefined until we restart
> @@ -633,162 +631,86 @@ reset_prepare_engine(struct intel_engine_cs *engine)
>  	 * GPU state upon resume, i.e. fail to restart after a reset.
>  	 */
>  	intel_uncore_forcewake_get(engine->i915, FORCEWAKE_ALL);
> -
> -	rq = engine->reset.prepare(engine);
> -	if (rq && rq->fence.error == -EIO)
> -		rq = ERR_PTR(-EIO); /* Previous reset failed! */
> -
> -	return rq;
> +	engine->reset.prepare(engine);
>  }
>  
> -static int reset_prepare(struct drm_i915_private *i915)
> +static void reset_prepare(struct drm_i915_private *i915)
>  {
>  	struct intel_engine_cs *engine;
> -	struct i915_request *rq;
>  	enum intel_engine_id id;
> -	int err = 0;
>  
> -	for_each_engine(engine, i915, id) {
> -		rq = reset_prepare_engine(engine);
> -		if (IS_ERR(rq)) {
> -			err = PTR_ERR(rq);
> -			continue;
> -		}
> -
> -		engine->hangcheck.active_request = rq;
> -	}
> +	for_each_engine(engine, i915, id)
> +		reset_prepare_engine(engine);
>  
> -	i915_gem_revoke_fences(i915);
>  	intel_uc_sanitize(i915);
> -
> -	return err;
>  }
>  
> -/* Returns the request if it was guilty of the hang */
> -static struct i915_request *
> -reset_request(struct intel_engine_cs *engine,
> -	      struct i915_request *rq,
> -	      bool stalled)
> +static int gt_reset(struct drm_i915_private *i915, unsigned int stalled_mask)
>  {
> +	struct intel_engine_cs *engine;
> +	enum intel_engine_id id;
> +	int err;
> +
>  	/*
> -	 * The guilty request will get skipped on a hung engine.
> -	 *
> -	 * Users of client default contexts do not rely on logical
> -	 * state preserved between batches so it is safe to execute
> -	 * queued requests following the hang. Non default contexts
> -	 * rely on preserved state, so skipping a batch loses the
> -	 * evolution of the state and it needs to be considered corrupted.
> -	 * Executing more queued batches on top of corrupted state is
> -	 * risky. But we take the risk by trying to advance through
> -	 * the queued requests in order to make the client behaviour
> -	 * more predictable around resets, by not throwing away random
> -	 * amount of batches it has prepared for execution. Sophisticated
> -	 * clients can use gem_reset_stats_ioctl and dma fence status
> -	 * (exported via sync_file info ioctl on explicit fences) to observe
> -	 * when it loses the context state and should rebuild accordingly.
> -	 *
> -	 * The context ban, and ultimately the client ban, mechanism are safety
> -	 * valves if client submission ends up resulting in nothing more than
> -	 * subsequent hangs.
> +	 * Everything depends on having the GTT running, so we need to start
> +	 * there.
>  	 */
> +	err = i915_ggtt_enable_hw(i915);
> +	if (err)
> +		return err;
>  
> -	if (i915_request_completed(rq)) {
> -		GEM_TRACE("%s pardoned global=%d (fence %llx:%lld), current %d\n",
> -			  engine->name, rq->global_seqno,
> -			  rq->fence.context, rq->fence.seqno,
> -			  intel_engine_get_seqno(engine));
> -		stalled = false;
> -	}
> -
> -	if (stalled) {
> -		context_mark_guilty(rq->gem_context);
> -		i915_request_skip(rq, -EIO);
> +	for_each_engine(engine, i915, id)
> +		intel_engine_reset(engine, stalled_mask & ENGINE_MASK(id));
>  
> -		/* If this context is now banned, skip all pending requests. */
> -		if (i915_gem_context_is_banned(rq->gem_context))
> -			engine_skip_context(rq);
> -	} else {
> -		/*
> -		 * Since this is not the hung engine, it may have advanced
> -		 * since the hang declaration. Double check by refinding
> -		 * the active request at the time of the reset.
> -		 */
> -		rq = i915_gem_find_active_request(engine);
> -		if (rq) {
> -			unsigned long flags;
> -
> -			context_mark_innocent(rq->gem_context);
> -			dma_fence_set_error(&rq->fence, -EAGAIN);
> -
> -			/* Rewind the engine to replay the incomplete rq */
> -			spin_lock_irqsave(&engine->timeline.lock, flags);
> -			rq = list_prev_entry(rq, link);
> -			if (&rq->link == &engine->timeline.requests)
> -				rq = NULL;
> -			spin_unlock_irqrestore(&engine->timeline.lock, flags);
> -		}
> -	}
> +	i915_gem_restore_fences(i915);
>  
> -	return rq;
> +	return err;
>  }
>  
> -static void reset_engine(struct intel_engine_cs *engine,
> -			 struct i915_request *rq,
> -			 bool stalled)
> +static void reset_finish_engine(struct intel_engine_cs *engine)
>  {
> -	if (rq)
> -		rq = reset_request(engine, rq, stalled);
> -
> -	/* Setup the CS to resume from the breadcrumb of the hung request */
> -	engine->reset.reset(engine, rq);
> +	engine->reset.finish(engine);
> +	intel_uncore_forcewake_put(engine->i915, FORCEWAKE_ALL);
>  }
>  
> -static void gt_reset(struct drm_i915_private *i915, unsigned int stalled_mask)
> +struct i915_gpu_restart {
> +	struct work_struct work;
> +	struct drm_i915_private *i915;
> +};
> +
> +static void restart_work(struct work_struct *work)
>  {
> +	struct i915_gpu_restart *arg = container_of(work, typeof(*arg), work);
> +	struct drm_i915_private *i915 = arg->i915;
>  	struct intel_engine_cs *engine;
>  	enum intel_engine_id id;
> +	intel_wakeref_t wakeref;
>  
> -	lockdep_assert_held(&i915->drm.struct_mutex);
> +	wakeref = intel_runtime_pm_get(i915);
> +	mutex_lock(&i915->drm.struct_mutex);
>  
> -	i915_retire_requests(i915);

Can't do this anymore, yes. What will be the effect
of delaying this and the other explicit retirements?
Are we more prone to starvation?

> +	smp_store_mb(i915->gpu_error.restart, NULL);

Checkpatch might want a comment for the mb.
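e.g. (just a sketch of the shape, wording yours to choose):

	/* Clear the pending-restart marker before we resubmit anything */
	smp_store_mb(i915->gpu_error.restart, NULL);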

>  
>  	for_each_engine(engine, i915, id) {
> -		struct intel_context *ce;
> -
> -		reset_engine(engine,
> -			     engine->hangcheck.active_request,
> -			     stalled_mask & ENGINE_MASK(id));
> -		ce = fetch_and_zero(&engine->last_retired_context);
> -		if (ce)
> -			intel_context_unpin(ce);
> +		struct i915_request *rq;
>  
>  		/*
>  		 * Ostensibily, we always want a context loaded for powersaving,
>  		 * so if the engine is idle after the reset, send a request
>  		 * to load our scratch kernel_context.
> -		 *
> -		 * More mysteriously, if we leave the engine idle after a reset,
> -		 * the next userspace batch may hang, with what appears to be
> -		 * an incoherent read by the CS (presumably stale TLB). An
> -		 * empty request appears sufficient to paper over the glitch.
>  		 */
> -		if (intel_engine_is_idle(engine)) {
> -			struct i915_request *rq;
> +		if (!intel_engine_is_idle(engine))
> +			continue;

Why did you remove the comment on needing an empty request?

Also, if the request keeping the engine non-idle could be a troublesome one,
from a troublesome context, why not just skip the idle check and always add
one for the kernel ctx?
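i.e. (sketch, untested):

	for_each_engine(engine, i915, id) {
		struct i915_request *rq;

		/* Always reload the kernel context after a reset, idle or not */
		rq = i915_request_alloc(engine, i915->kernel_context);
		if (!IS_ERR(rq))
			i915_request_add(rq);
	}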

>  
> -			rq = i915_request_alloc(engine, i915->kernel_context);
> -			if (!IS_ERR(rq))
> -				i915_request_add(rq);
> -		}
> +		rq = i915_request_alloc(engine, i915->kernel_context);
> +		if (!IS_ERR(rq))
> +			i915_request_add(rq);
>  	}
>  
> -	i915_gem_restore_fences(i915);
> -}
> -
> -static void reset_finish_engine(struct intel_engine_cs *engine)
> -{
> -	engine->reset.finish(engine);
> +	mutex_unlock(&i915->drm.struct_mutex);
> +	intel_runtime_pm_put(i915, wakeref);
>  
> -	intel_uncore_forcewake_put(engine->i915, FORCEWAKE_ALL);
> +	kfree(arg);
>  }
>  
>  static void reset_finish(struct drm_i915_private *i915)
> @@ -796,11 +718,30 @@ static void reset_finish(struct drm_i915_private *i915)
>  	struct intel_engine_cs *engine;
>  	enum intel_engine_id id;
>  
> -	lockdep_assert_held(&i915->drm.struct_mutex);
> -
> -	for_each_engine(engine, i915, id) {
> -		engine->hangcheck.active_request = NULL;
> +	for_each_engine(engine, i915, id)
>  		reset_finish_engine(engine);
> +}
> +
> +static void reset_restart(struct drm_i915_private *i915)
> +{
> +	struct i915_gpu_restart *arg;
> +
> +	/*
> +	 * Following the reset, ensure that we always reload context for
> +	 * powersaving, and to correct engine->last_retired_context. Since
> +	 * this requires us to submit a request, queue a worker to do that
> +	 * task for us to evade any locking here.
> +	 */

Nice, this was/will be helpful!

> +	if (READ_ONCE(i915->gpu_error.restart))
> +		return;
> +
> +	arg = kmalloc(sizeof(*arg), GFP_KERNEL);
> +	if (arg) {
> +		arg->i915 = i915;
> +		INIT_WORK(&arg->work, restart_work);
> +
> +		WRITE_ONCE(i915->gpu_error.restart, arg);
> +		queue_work(i915->wq, &arg->work);
>  	}
>  }
>  
> @@ -889,8 +830,6 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915)
>  	struct i915_timeline *tl;
>  	bool ret = false;
>  
> -	lockdep_assert_held(&i915->drm.struct_mutex);
> -
>  	if (!test_bit(I915_WEDGED, &error->flags))
>  		return true;
>  
> @@ -913,9 +852,9 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915)
>  	 */
>  	list_for_each_entry(tl, &i915->gt.timelines, link) {
>  		struct i915_request *rq;
> +		long timeout;
>  
> -		rq = i915_gem_active_peek(&tl->last_request,
> -					  &i915->drm.struct_mutex);
> +		rq = i915_gem_active_get_unlocked(&tl->last_request);
>  		if (!rq)
>  			continue;
>  
> @@ -930,12 +869,12 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915)
>  		 * and when the seqno passes the fence, the signaler
>  		 * then signals the fence waking us up).
>  		 */
> -		if (dma_fence_default_wait(&rq->fence, true,
> -					   MAX_SCHEDULE_TIMEOUT) < 0)
> +		timeout = dma_fence_default_wait(&rq->fence, true,
> +						 MAX_SCHEDULE_TIMEOUT);
> +		i915_request_put(rq);
> +		if (timeout < 0)
>  			goto unlock;
>  	}
> -	i915_retire_requests(i915);
> -	GEM_BUG_ON(i915->gt.active_requests);
>  
>  	intel_engines_sanitize(i915, false);
>  
> @@ -949,7 +888,6 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915)
>  	 * context and do not require stop_machine().
>  	 */
>  	intel_engines_reset_default_submission(i915);
> -	i915_gem_contexts_lost(i915);
>  
>  	GEM_TRACE("end\n");
>  
> @@ -962,6 +900,43 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915)
>  	return ret;
>  }
>  
> +struct __i915_reset {
> +	struct drm_i915_private *i915;
> +	unsigned int stalled_mask;
> +};
> +
> +static int __i915_reset__BKL(void *data)
> +{
> +	struct __i915_reset *arg = data;
> +	int err;
> +
> +	err = intel_gpu_reset(arg->i915, ALL_ENGINES);
> +	if (err)
> +		return err;
> +
> +	return gt_reset(arg->i915, arg->stalled_mask);
> +}
> +
> +#if 0
> +#define __do_reset(fn, arg) stop_machine(fn, arg, NULL)

Let's remove the machinery to select stop_machine for the reset, and the #include.
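i.e. drop the #if 0/#else block and <linux/stop_machine.h>, leaving just
(sketch):

	err = __i915_reset__BKL(&arg);
	for (i = 0; err && i < 3; i++) {
		msleep(100);
		err = __i915_reset__BKL(&arg);
	}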

> +#else
> +#define __do_reset(fn, arg) fn(arg)
> +#endif
> +
> +static int do_reset(struct drm_i915_private *i915, unsigned int stalled_mask)
> +{
> +	struct __i915_reset arg = { i915, stalled_mask };
> +	int err, i;
> +
> +	err = __do_reset(__i915_reset__BKL, &arg);
> +	for (i = 0; err && i < 3; i++) {
> +		msleep(100);
> +		err = __do_reset(__i915_reset__BKL, &arg);
> +	}
> +
> +	return err;
> +}
> +
>  /**
>   * i915_reset - reset chip after a hang
>   * @i915: #drm_i915_private to reset
> @@ -987,31 +962,22 @@ void i915_reset(struct drm_i915_private *i915,
>  {
>  	struct i915_gpu_error *error = &i915->gpu_error;
>  	int ret;
> -	int i;
>  
>  	GEM_TRACE("flags=%lx\n", error->flags);
>  
>  	might_sleep();

What will sleep? I didn't spot anything in execlists_reset_prepare().

> -	lockdep_assert_held(&i915->drm.struct_mutex);
>  	assert_rpm_wakelock_held(i915);
>  	GEM_BUG_ON(!test_bit(I915_RESET_BACKOFF, &error->flags));
>  
> -	if (!test_bit(I915_RESET_HANDOFF, &error->flags))
> -		return;
> -
>  	/* Clear any previous failed attempts at recovery. Time to try again. */
>  	if (!i915_gem_unset_wedged(i915))
> -		goto wakeup;
> +		return;
>  
>  	if (reason)
>  		dev_notice(i915->drm.dev, "Resetting chip for %s\n", reason);
>  	error->reset_count++;
>  
> -	ret = reset_prepare(i915);
> -	if (ret) {
> -		dev_err(i915->drm.dev, "GPU recovery failed\n");
> -		goto taint;
> -	}
> +	reset_prepare(i915);
>  
>  	if (!intel_has_gpu_reset(i915)) {
>  		if (i915_modparams.reset)
> @@ -1021,32 +987,11 @@ void i915_reset(struct drm_i915_private *i915,
>  		goto error;
>  	}
>  
> -	for (i = 0; i < 3; i++) {
> -		ret = intel_gpu_reset(i915, ALL_ENGINES);
> -		if (ret == 0)
> -			break;
> -
> -		msleep(100);
> -	}
> -	if (ret) {
> +	if (do_reset(i915, stalled_mask)) {
>  		dev_err(i915->drm.dev, "Failed to reset chip\n");
>  		goto taint;
>  	}
>  
> -	/* Ok, now get things going again... */
> -
> -	/*
> -	 * Everything depends on having the GTT running, so we need to start
> -	 * there.
> -	 */
> -	ret = i915_ggtt_enable_hw(i915);
> -	if (ret) {
> -		DRM_ERROR("Failed to re-enable GGTT following reset (%d)\n",
> -			  ret);
> -		goto error;
> -	}
> -
> -	gt_reset(i915, stalled_mask);
>  	intel_overlay_reset(i915);
>  
>  	/*
> @@ -1068,9 +1013,8 @@ void i915_reset(struct drm_i915_private *i915,
>  
>  finish:
>  	reset_finish(i915);
> -wakeup:
> -	clear_bit(I915_RESET_HANDOFF, &error->flags);
> -	wake_up_bit(&error->flags, I915_RESET_HANDOFF);
> +	if (!i915_terminally_wedged(error))
> +		reset_restart(i915);
>  	return;
>  
>  taint:
> @@ -1089,7 +1033,6 @@ void i915_reset(struct drm_i915_private *i915,
>  	add_taint(TAINT_WARN, LOCKDEP_STILL_OK);
>  error:
>  	i915_gem_set_wedged(i915);
> -	i915_retire_requests(i915);
>  	goto finish;
>  }
>  
> @@ -1115,18 +1058,16 @@ static inline int intel_gt_reset_engine(struct drm_i915_private *i915,
>  int i915_reset_engine(struct intel_engine_cs *engine, const char *msg)
>  {
>  	struct i915_gpu_error *error = &engine->i915->gpu_error;
> -	struct i915_request *active_request;
>  	int ret;
>  
>  	GEM_TRACE("%s flags=%lx\n", engine->name, error->flags);
>  	GEM_BUG_ON(!test_bit(I915_RESET_ENGINE + engine->id, &error->flags));
>  
> -	active_request = reset_prepare_engine(engine);
> -	if (IS_ERR_OR_NULL(active_request)) {
> -		/* Either the previous reset failed, or we pardon the reset. */
> -		ret = PTR_ERR(active_request);
> -		goto out;
> -	}
> +	if (i915_seqno_passed(intel_engine_get_seqno(engine),
> +			      intel_engine_last_submit(engine)))
> +		return 0;

You seem to have a patch to remove this shortly after, so
squash?

I need to restock on coffee at this point.
-Mika

> +
> +	reset_prepare_engine(engine);
>  
>  	if (msg)
>  		dev_notice(engine->i915->drm.dev,
> @@ -1150,7 +1091,7 @@ int i915_reset_engine(struct intel_engine_cs *engine, const char *msg)
>  	 * active request and can drop it, adjust head to skip the offending
>  	 * request to resume executing remaining requests in the queue.
>  	 */
> -	reset_engine(engine, active_request, true);
> +	intel_engine_reset(engine, true);
>  
>  	/*
>  	 * The engine and its registers (and workarounds in case of render)
> @@ -1187,30 +1128,7 @@ static void i915_reset_device(struct drm_i915_private *i915,
>  	i915_wedge_on_timeout(&w, i915, 5 * HZ) {
>  		intel_prepare_reset(i915);
>  
> -		error->reason = reason;
> -		error->stalled_mask = engine_mask;
> -
> -		/* Signal that locked waiters should reset the GPU */
> -		smp_mb__before_atomic();
> -		set_bit(I915_RESET_HANDOFF, &error->flags);
> -		wake_up_all(&error->wait_queue);
> -
> -		/*
> -		 * Wait for anyone holding the lock to wakeup, without
> -		 * blocking indefinitely on struct_mutex.
> -		 */
> -		do {
> -			if (mutex_trylock(&i915->drm.struct_mutex)) {
> -				i915_reset(i915, engine_mask, reason);
> -				mutex_unlock(&i915->drm.struct_mutex);
> -			}
> -		} while (wait_on_bit_timeout(&error->flags,
> -					     I915_RESET_HANDOFF,
> -					     TASK_UNINTERRUPTIBLE,
> -					     1));
> -
> -		error->stalled_mask = 0;
> -		error->reason = NULL;
> +		i915_reset(i915, engine_mask, reason);
>  
>  		intel_finish_reset(i915);
>  	}
> @@ -1366,6 +1284,25 @@ void i915_handle_error(struct drm_i915_private *i915,
>  	intel_runtime_pm_put(i915, wakeref);
>  }
>  
> +bool i915_reset_flush(struct drm_i915_private *i915)
> +{
> +	int err;
> +
> +	cancel_delayed_work_sync(&i915->gpu_error.hangcheck_work);
> +
> +	flush_workqueue(i915->wq);
> +	GEM_BUG_ON(READ_ONCE(i915->gpu_error.restart));
> +
> +	mutex_lock(&i915->drm.struct_mutex);
> +	err = i915_gem_wait_for_idle(i915,
> +				     I915_WAIT_LOCKED |
> +				     I915_WAIT_FOR_IDLE_BOOST,
> +				     MAX_SCHEDULE_TIMEOUT);
> +	mutex_unlock(&i915->drm.struct_mutex);
> +
> +	return !err;
> +}
> +
>  static void i915_wedge_me(struct work_struct *work)
>  {
>  	struct i915_wedge_me *w = container_of(work, typeof(*w), work.work);
> diff --git a/drivers/gpu/drm/i915/i915_reset.h b/drivers/gpu/drm/i915/i915_reset.h
> index b6a519bde67d..f2d347f319df 100644
> --- a/drivers/gpu/drm/i915/i915_reset.h
> +++ b/drivers/gpu/drm/i915/i915_reset.h
> @@ -29,6 +29,9 @@ void i915_reset(struct drm_i915_private *i915,
>  int i915_reset_engine(struct intel_engine_cs *engine,
>  		      const char *reason);
>  
> +void i915_reset_request(struct i915_request *rq, bool guilty);
> +bool i915_reset_flush(struct drm_i915_private *i915);
> +
>  bool intel_has_gpu_reset(struct drm_i915_private *i915);
>  bool intel_has_reset_engine(struct drm_i915_private *i915);
>  
> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> index 2f3c71f6d313..fc52737751e7 100644
> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> @@ -1071,10 +1071,8 @@ void intel_engines_sanitize(struct drm_i915_private *i915, bool force)
>  	if (!reset_engines(i915) && !force)
>  		return;
>  
> -	for_each_engine(engine, i915, id) {
> -		if (engine->reset.reset)
> -			engine->reset.reset(engine, NULL);
> -	}
> +	for_each_engine(engine, i915, id)
> +		intel_engine_reset(engine, false);
>  }
>  
>  /**
> diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
> index ab1c49b106f2..7217c7e3ee8d 100644
> --- a/drivers/gpu/drm/i915/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/intel_guc_submission.c
> @@ -834,8 +834,7 @@ static void guc_submission_tasklet(unsigned long data)
>  	spin_unlock_irqrestore(&engine->timeline.lock, flags);
>  }
>  
> -static struct i915_request *
> -guc_reset_prepare(struct intel_engine_cs *engine)
> +static void guc_reset_prepare(struct intel_engine_cs *engine)
>  {
>  	struct intel_engine_execlists * const execlists = &engine->execlists;
>  
> @@ -861,8 +860,6 @@ guc_reset_prepare(struct intel_engine_cs *engine)
>  	 */
>  	if (engine->i915->guc.preempt_wq)
>  		flush_workqueue(engine->i915->guc.preempt_wq);
> -
> -	return i915_gem_find_active_request(engine);
>  }
>  
>  /*
> diff --git a/drivers/gpu/drm/i915/intel_hangcheck.c b/drivers/gpu/drm/i915/intel_hangcheck.c
> index 741441daae32..5662d6fed523 100644
> --- a/drivers/gpu/drm/i915/intel_hangcheck.c
> +++ b/drivers/gpu/drm/i915/intel_hangcheck.c
> @@ -25,6 +25,17 @@
>  #include "i915_drv.h"
>  #include "i915_reset.h"
>  
> +struct hangcheck {
> +	u64 acthd;
> +	u32 seqno;
> +	enum intel_engine_hangcheck_action action;
> +	unsigned long action_timestamp;
> +	int deadlock;
> +	struct intel_instdone instdone;
> +	bool wedged:1;
> +	bool stalled:1;
> +};
> +
>  static bool instdone_unchanged(u32 current_instdone, u32 *old_instdone)
>  {
>  	u32 tmp = current_instdone | *old_instdone;
> @@ -119,25 +130,22 @@ engine_stuck(struct intel_engine_cs *engine, u64 acthd)
>  }
>  
>  static void hangcheck_load_sample(struct intel_engine_cs *engine,
> -				  struct intel_engine_hangcheck *hc)
> +				  struct hangcheck *hc)
>  {
>  	hc->acthd = intel_engine_get_active_head(engine);
>  	hc->seqno = intel_engine_get_seqno(engine);
>  }
>  
>  static void hangcheck_store_sample(struct intel_engine_cs *engine,
> -				   const struct intel_engine_hangcheck *hc)
> +				   const struct hangcheck *hc)
>  {
>  	engine->hangcheck.acthd = hc->acthd;
>  	engine->hangcheck.seqno = hc->seqno;
> -	engine->hangcheck.action = hc->action;
> -	engine->hangcheck.stalled = hc->stalled;
> -	engine->hangcheck.wedged = hc->wedged;
>  }
>  
>  static enum intel_engine_hangcheck_action
>  hangcheck_get_action(struct intel_engine_cs *engine,
> -		     const struct intel_engine_hangcheck *hc)
> +		     const struct hangcheck *hc)
>  {
>  	if (engine->hangcheck.seqno != hc->seqno)
>  		return ENGINE_ACTIVE_SEQNO;
> @@ -149,7 +157,7 @@ hangcheck_get_action(struct intel_engine_cs *engine,
>  }
>  
>  static void hangcheck_accumulate_sample(struct intel_engine_cs *engine,
> -					struct intel_engine_hangcheck *hc)
> +					struct hangcheck *hc)
>  {
>  	unsigned long timeout = I915_ENGINE_DEAD_TIMEOUT;
>  
> @@ -265,19 +273,19 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
>  	intel_uncore_arm_unclaimed_mmio_detection(dev_priv);
>  
>  	for_each_engine(engine, dev_priv, id) {
> -		struct intel_engine_hangcheck hc;
> +		struct hangcheck hc;
>  
>  		hangcheck_load_sample(engine, &hc);
>  		hangcheck_accumulate_sample(engine, &hc);
>  		hangcheck_store_sample(engine, &hc);
>  
> -		if (engine->hangcheck.stalled) {
> +		if (hc.stalled) {
>  			hung |= intel_engine_flag(engine);
>  			if (hc.action != ENGINE_DEAD)
>  				stuck |= intel_engine_flag(engine);
>  		}
>  
> -		if (engine->hangcheck.wedged)
> +		if (hc.wedged)
>  			wedged |= intel_engine_flag(engine);
>  	}
>  
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 28d183439952..c11cbf34258d 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -136,6 +136,7 @@
>  #include <drm/i915_drm.h>
>  #include "i915_drv.h"
>  #include "i915_gem_render_state.h"
> +#include "i915_reset.h"
>  #include "i915_vgpu.h"
>  #include "intel_lrc_reg.h"
>  #include "intel_mocs.h"
> @@ -288,7 +289,8 @@ static void unwind_wa_tail(struct i915_request *rq)
>  	assert_ring_tail_valid(rq->ring, rq->tail);
>  }
>  
> -static void __unwind_incomplete_requests(struct intel_engine_cs *engine)
> +static struct i915_request *
> +__unwind_incomplete_requests(struct intel_engine_cs *engine)
>  {
>  	struct i915_request *rq, *rn, *active = NULL;
>  	struct list_head *uninitialized_var(pl);
> @@ -330,6 +332,8 @@ static void __unwind_incomplete_requests(struct intel_engine_cs *engine)
>  		list_move_tail(&active->sched.link,
>  			       i915_sched_lookup_priolist(engine, prio));
>  	}
> +
> +	return active;
>  }
>  
>  void
> @@ -1743,11 +1747,9 @@ static int gen8_init_common_ring(struct intel_engine_cs *engine)
>  	return 0;
>  }
>  
> -static struct i915_request *
> -execlists_reset_prepare(struct intel_engine_cs *engine)
> +static void execlists_reset_prepare(struct intel_engine_cs *engine)
>  {
>  	struct intel_engine_execlists * const execlists = &engine->execlists;
> -	struct i915_request *request, *active;
>  	unsigned long flags;
>  
>  	GEM_TRACE("%s: depth<-%d\n", engine->name,
> @@ -1763,59 +1765,21 @@ execlists_reset_prepare(struct intel_engine_cs *engine)
>  	 * prevents the race.
>  	 */
>  	__tasklet_disable_sync_once(&execlists->tasklet);
> +	GEM_BUG_ON(!reset_in_progress(execlists));
>  
> +	/* And flush any current direct submission. */
>  	spin_lock_irqsave(&engine->timeline.lock, flags);
> -
> -	/*
> -	 * We want to flush the pending context switches, having disabled
> -	 * the tasklet above, we can assume exclusive access to the execlists.
> -	 * For this allows us to catch up with an inflight preemption event,
> -	 * and avoid blaming an innocent request if the stall was due to the
> -	 * preemption itself.
> -	 */
> -	process_csb(engine);
> -
> -	/*
> -	 * The last active request can then be no later than the last request
> -	 * now in ELSP[0]. So search backwards from there, so that if the GPU
> -	 * has advanced beyond the last CSB update, it will be pardoned.
> -	 */
> -	active = NULL;
> -	request = port_request(execlists->port);
> -	if (request) {
> -		/*
> -		 * Prevent the breadcrumb from advancing before we decide
> -		 * which request is currently active.
> -		 */
> -		intel_engine_stop_cs(engine);
> -
> -		list_for_each_entry_from_reverse(request,
> -						 &engine->timeline.requests,
> -						 link) {
> -			if (__i915_request_completed(request,
> -						     request->global_seqno))
> -				break;
> -
> -			active = request;
> -		}
> -	}
> -
> +	process_csb(engine); /* drain preemption events */
>  	spin_unlock_irqrestore(&engine->timeline.lock, flags);
> -
> -	return active;
>  }
>  
> -static void execlists_reset(struct intel_engine_cs *engine,
> -			    struct i915_request *request)
> +static void execlists_reset(struct intel_engine_cs *engine, bool stalled)
>  {
>  	struct intel_engine_execlists * const execlists = &engine->execlists;
> +	struct i915_request *rq;
>  	unsigned long flags;
>  	u32 *regs;
>  
> -	GEM_TRACE("%s request global=%d, current=%d\n",
> -		  engine->name, request ? request->global_seqno : 0,
> -		  intel_engine_get_seqno(engine));
> -
>  	spin_lock_irqsave(&engine->timeline.lock, flags);
>  
>  	/*
> @@ -1830,12 +1794,18 @@ static void execlists_reset(struct intel_engine_cs *engine,
>  	execlists_cancel_port_requests(execlists);
>  
>  	/* Push back any incomplete requests for replay after the reset. */
> -	__unwind_incomplete_requests(engine);
> +	rq = __unwind_incomplete_requests(engine);
>  
>  	/* Following the reset, we need to reload the CSB read/write pointers */
>  	reset_csb_pointers(&engine->execlists);
>  
> -	spin_unlock_irqrestore(&engine->timeline.lock, flags);
> +	GEM_TRACE("%s seqno=%d, current=%d, stalled? %s\n",
> +		  engine->name,
> +		  rq ? rq->global_seqno : 0,
> +		  intel_engine_get_seqno(engine),
> +		  yesno(stalled));
> +	if (!rq)
> +		goto out_unlock;
>  
>  	/*
>  	 * If the request was innocent, we leave the request in the ELSP
> @@ -1848,8 +1818,9 @@ static void execlists_reset(struct intel_engine_cs *engine,
>  	 * and have to at least restore the RING register in the context
>  	 * image back to the expected values to skip over the guilty request.
>  	 */
> -	if (!request || request->fence.error != -EIO)
> -		return;
> +	i915_reset_request(rq, stalled);
> +	if (!stalled)
> +		goto out_unlock;
>  
>  	/*
>  	 * We want a simple context + ring to execute the breadcrumb update.
> @@ -1859,25 +1830,23 @@ static void execlists_reset(struct intel_engine_cs *engine,
>  	 * future request will be after userspace has had the opportunity
>  	 * to recreate its own state.
>  	 */
> -	regs = request->hw_context->lrc_reg_state;
> +	regs = rq->hw_context->lrc_reg_state;
>  	if (engine->pinned_default_state) {
>  		memcpy(regs, /* skip restoring the vanilla PPHWSP */
>  		       engine->pinned_default_state + LRC_STATE_PN * PAGE_SIZE,
>  		       engine->context_size - PAGE_SIZE);
>  	}
> -	execlists_init_reg_state(regs,
> -				 request->gem_context, engine, request->ring);
> +	execlists_init_reg_state(regs, rq->gem_context, engine, rq->ring);
>  
>  	/* Move the RING_HEAD onto the breadcrumb, past the hanging batch */
> -	regs[CTX_RING_BUFFER_START + 1] = i915_ggtt_offset(request->ring->vma);
> -
> -	request->ring->head = intel_ring_wrap(request->ring, request->postfix);
> -	regs[CTX_RING_HEAD + 1] = request->ring->head;
> +	regs[CTX_RING_BUFFER_START + 1] = i915_ggtt_offset(rq->ring->vma);
>  
> -	intel_ring_update_space(request->ring);
> +	rq->ring->head = intel_ring_wrap(rq->ring, rq->postfix);
> +	regs[CTX_RING_HEAD + 1] = rq->ring->head;
> +	intel_ring_update_space(rq->ring);
>  
> -	/* Reset WaIdleLiteRestore:bdw,skl as well */
> -	unwind_wa_tail(request);
> +out_unlock:
> +	spin_unlock_irqrestore(&engine->timeline.lock, flags);
>  }
>  
>  static void execlists_reset_finish(struct intel_engine_cs *engine)
> @@ -1890,6 +1859,7 @@ static void execlists_reset_finish(struct intel_engine_cs *engine)
>  	 * to sleep before we restart and reload a context.
>  	 *
>  	 */
> +	GEM_BUG_ON(!reset_in_progress(execlists));
>  	if (!RB_EMPTY_ROOT(&execlists->queue.rb_root))
>  		execlists->tasklet.func(execlists->tasklet.data);
>  
> diff --git a/drivers/gpu/drm/i915/intel_overlay.c b/drivers/gpu/drm/i915/intel_overlay.c
> index c81db81e4416..f68c7975006c 100644
> --- a/drivers/gpu/drm/i915/intel_overlay.c
> +++ b/drivers/gpu/drm/i915/intel_overlay.c
> @@ -478,8 +478,6 @@ void intel_overlay_reset(struct drm_i915_private *dev_priv)
>  	if (!overlay)
>  		return;
>  
> -	intel_overlay_release_old_vid(overlay);
> -

How to compensate for this?

>  	overlay->old_xscale = 0;
>  	overlay->old_yscale = 0;
>  	overlay->crtc = NULL;
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 26b7274a2d43..662907e1a286 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -33,6 +33,7 @@
>  
>  #include "i915_drv.h"
>  #include "i915_gem_render_state.h"
> +#include "i915_reset.h"
>  #include "i915_trace.h"
>  #include "intel_drv.h"
>  #include "intel_workarounds.h"
> @@ -707,52 +708,80 @@ static int init_ring_common(struct intel_engine_cs *engine)
>  	return ret;
>  }
>  
> -static struct i915_request *reset_prepare(struct intel_engine_cs *engine)
> +static void reset_prepare(struct intel_engine_cs *engine)
>  {
>  	intel_engine_stop_cs(engine);
> -	return i915_gem_find_active_request(engine);
>  }
>  
> -static void skip_request(struct i915_request *rq)
> +static void reset_ring(struct intel_engine_cs *engine, bool stalled)
>  {
> -	void *vaddr = rq->ring->vaddr;
> +	struct i915_timeline *tl = &engine->timeline;
> +	struct i915_request *pos, *rq;
> +	unsigned long flags;
>  	u32 head;
>  
> -	head = rq->infix;
> -	if (rq->postfix < head) {
> -		memset32(vaddr + head, MI_NOOP,
> -			 (rq->ring->size - head) / sizeof(u32));
> -		head = 0;
> +	rq = NULL;
> +	spin_lock_irqsave(&tl->lock, flags);
> +	list_for_each_entry(pos, &tl->requests, link) {
> +		if (!__i915_request_completed(pos, pos->global_seqno)) {
> +			rq = pos;
> +			break;
> +		}
>  	}
> -	memset32(vaddr + head, MI_NOOP, (rq->postfix - head) / sizeof(u32));
> -}
> -
> -static void reset_ring(struct intel_engine_cs *engine, struct i915_request *rq)
> -{
> -	GEM_TRACE("%s request global=%d, current=%d\n",
> -		  engine->name, rq ? rq->global_seqno : 0,
> -		  intel_engine_get_seqno(engine));
>  
> +	GEM_TRACE("%s seqno=%d, current=%d, stalled? %s\n",
> +		  engine->name,
> +		  rq ? rq->global_seqno : 0,
> +		  intel_engine_get_seqno(engine),
> +		  yesno(stalled));
>  	/*
> -	 * Try to restore the logical GPU state to match the continuation
> -	 * of the request queue. If we skip the context/PD restore, then
> -	 * the next request may try to execute assuming that its context
> -	 * is valid and loaded on the GPU and so may try to access invalid
> -	 * memory, prompting repeated GPU hangs.
> +	 * The guilty request will get skipped on a hung engine.
>  	 *
> -	 * If the request was guilty, we still restore the logical state
> -	 * in case the next request requires it (e.g. the aliasing ppgtt),
> -	 * but skip over the hung batch.
> +	 * Users of client default contexts do not rely on logical
> +	 * state preserved between batches so it is safe to execute
> +	 * queued requests following the hang. Non default contexts
> +	 * rely on preserved state, so skipping a batch loses the
> +	 * evolution of the state and it needs to be considered corrupted.
> +	 * Executing more queued batches on top of corrupted state is
> +	 * risky. But we take the risk by trying to advance through
> +	 * the queued requests in order to make the client behaviour
> +	 * more predictable around resets, by not throwing away random
> +	 * amount of batches it has prepared for execution. Sophisticated
> +	 * clients can use gem_reset_stats_ioctl and dma fence status
> +	 * (exported via sync_file info ioctl on explicit fences) to observe
> +	 * when it loses the context state and should rebuild accordingly.
>  	 *
> -	 * If the request was innocent, we try to replay the request with
> -	 * the restored context.
> +	 * The context ban, and ultimately the client ban, mechanism are safety
> +	 * valves if client submission ends up resulting in nothing more than
> +	 * subsequent hangs.
>  	 */
> +
>  	if (rq) {
> -		/* If the rq hung, jump to its breadcrumb and skip the batch */
> -		rq->ring->head = intel_ring_wrap(rq->ring, rq->head);
> -		if (rq->fence.error == -EIO)
> -			skip_request(rq);
> +		/*
> +		 * Try to restore the logical GPU state to match the
> +		 * continuation of the request queue. If we skip the
> +		 * context/PD restore, then the next request may try to execute
> +		 * assuming that its context is valid and loaded on the GPU and
> +		 * so may try to access invalid memory, prompting repeated GPU
> +		 * hangs.
> +		 *
> +		 * If the request was guilty, we still restore the logical
> +		 * state in case the next request requires it (e.g. the
> +		 * aliasing ppgtt), but skip over the hung batch.
> +		 *
> +		 * If the request was innocent, we try to replay the request
> +		 * with the restored context.
> +		 */
> +		i915_reset_request(rq, stalled);
> +
> +		GEM_BUG_ON(rq->ring != engine->buffer);
> +		head = rq->head;
> +	} else {
> +		head = engine->buffer->tail;
>  	}
> +	engine->buffer->head = intel_ring_wrap(engine->buffer, head);
> +
> +	spin_unlock_irqrestore(&tl->lock, flags);
>  }
>  
>  static void reset_finish(struct intel_engine_cs *engine)
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index c3ef0f9bf321..32ed44196c1a 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -120,13 +120,8 @@ struct intel_instdone {
>  struct intel_engine_hangcheck {
>  	u64 acthd;
>  	u32 seqno;
> -	enum intel_engine_hangcheck_action action;
>  	unsigned long action_timestamp;
> -	int deadlock;
>  	struct intel_instdone instdone;
> -	struct i915_request *active_request;
> -	bool stalled:1;
> -	bool wedged:1;
>  };
>  
>  struct intel_ring {
> @@ -444,9 +439,8 @@ struct intel_engine_cs {
>  	int		(*init_hw)(struct intel_engine_cs *engine);
>  
>  	struct {
> -		struct i915_request *(*prepare)(struct intel_engine_cs *engine);
> -		void (*reset)(struct intel_engine_cs *engine,
> -			      struct i915_request *rq);
> +		void (*prepare)(struct intel_engine_cs *engine);
> +		void (*reset)(struct intel_engine_cs *engine, bool stalled);
>  		void (*finish)(struct intel_engine_cs *engine);
>  	} reset;
>  
> @@ -1018,6 +1012,13 @@ gen8_emit_ggtt_write(u32 *cs, u32 value, u32 gtt_offset)
>  	return cs;
>  }
>  
> +static inline void intel_engine_reset(struct intel_engine_cs *engine,
> +				      bool stalled)
> +{
> +	if (engine->reset.reset)
> +		engine->reset.reset(engine, stalled);
> +}
> +
>  void intel_engines_sanitize(struct drm_i915_private *i915, bool force);
>  
>  bool intel_engine_is_idle(struct intel_engine_cs *engine);
> diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> index 12550b55c42f..67431355cd6e 100644
> --- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> +++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> @@ -363,9 +363,7 @@ static int igt_global_reset(void *arg)
>  	/* Check that we can issue a global GPU reset */
>  
>  	igt_global_reset_lock(i915);
> -	set_bit(I915_RESET_HANDOFF, &i915->gpu_error.flags);
>  
> -	mutex_lock(&i915->drm.struct_mutex);
>  	reset_count = i915_reset_count(&i915->gpu_error);
>  
>  	i915_reset(i915, ALL_ENGINES, NULL);
> @@ -374,9 +372,7 @@ static int igt_global_reset(void *arg)
>  		pr_err("No GPU reset recorded!\n");
>  		err = -EINVAL;
>  	}
> -	mutex_unlock(&i915->drm.struct_mutex);
>  
> -	GEM_BUG_ON(test_bit(I915_RESET_HANDOFF, &i915->gpu_error.flags));
>  	igt_global_reset_unlock(i915);
>  
>  	if (i915_terminally_wedged(&i915->gpu_error))
> @@ -399,9 +395,7 @@ static int igt_wedged_reset(void *arg)
>  	i915_gem_set_wedged(i915);
>  	GEM_BUG_ON(!i915_terminally_wedged(&i915->gpu_error));
>  
> -	set_bit(I915_RESET_HANDOFF, &i915->gpu_error.flags);
>  	i915_reset(i915, ALL_ENGINES, NULL);
> -	GEM_BUG_ON(test_bit(I915_RESET_HANDOFF, &i915->gpu_error.flags));
>  
>  	intel_runtime_pm_put(i915, wakeref);
>  	mutex_unlock(&i915->drm.struct_mutex);
> @@ -511,7 +505,7 @@ static int __igt_reset_engine(struct drm_i915_private *i915, bool active)
>  				break;
>  			}
>  
> -			if (!wait_for_idle(engine)) {
> +			if (!i915_reset_flush(i915)) {
>  				struct drm_printer p =
>  					drm_info_printer(i915->drm.dev);
>  
> @@ -903,20 +897,13 @@ static int igt_reset_engines(void *arg)
>  	return 0;
>  }
>  
> -static u32 fake_hangcheck(struct i915_request *rq, u32 mask)
> +static u32 fake_hangcheck(struct drm_i915_private *i915, u32 mask)
>  {
> -	struct i915_gpu_error *error = &rq->i915->gpu_error;
> -	u32 reset_count = i915_reset_count(error);
> -
> -	error->stalled_mask = mask;
> -
> -	/* set_bit() must be after we have setup the backchannel (mask) */
> -	smp_mb__before_atomic();
> -	set_bit(I915_RESET_HANDOFF, &error->flags);
> +	u32 count = i915_reset_count(&i915->gpu_error);
>  
> -	wake_up_all(&error->wait_queue);
> +	i915_reset(i915, mask, NULL);
>  
> -	return reset_count;
> +	return count;
>  }
>  
>  static int igt_reset_wait(void *arg)
> @@ -962,7 +949,7 @@ static int igt_reset_wait(void *arg)
>  		goto out_rq;
>  	}
>  
> -	reset_count = fake_hangcheck(rq, ALL_ENGINES);
> +	reset_count = fake_hangcheck(i915, ALL_ENGINES);
>  
>  	timeout = i915_request_wait(rq, I915_WAIT_LOCKED, 10);
>  	if (timeout < 0) {
> @@ -972,7 +959,6 @@ static int igt_reset_wait(void *arg)
>  		goto out_rq;
>  	}
>  
> -	GEM_BUG_ON(test_bit(I915_RESET_HANDOFF, &i915->gpu_error.flags));
>  	if (i915_reset_count(&i915->gpu_error) == reset_count) {
>  		pr_err("No GPU reset recorded!\n");
>  		err = -EINVAL;
> @@ -1162,7 +1148,7 @@ static int __igt_reset_evict_vma(struct drm_i915_private *i915,
>  	}
>  
>  out_reset:
> -	fake_hangcheck(rq, intel_engine_flag(rq->engine));
> +	fake_hangcheck(rq->i915, intel_engine_flag(rq->engine));
>  
>  	if (tsk) {
>  		struct igt_wedge_me w;
> @@ -1341,12 +1327,7 @@ static int igt_reset_queue(void *arg)
>  				goto fini;
>  			}
>  
> -			reset_count = fake_hangcheck(prev, ENGINE_MASK(id));
> -
> -			i915_reset(i915, ENGINE_MASK(id), NULL);
> -
> -			GEM_BUG_ON(test_bit(I915_RESET_HANDOFF,
> -					    &i915->gpu_error.flags));
> +			reset_count = fake_hangcheck(i915, ENGINE_MASK(id));
>  
>  			if (prev->fence.error != -EIO) {
>  				pr_err("GPU reset not recorded on hanging request [fence.error=%d]!\n",
> @@ -1565,6 +1546,7 @@ static int igt_atomic_reset_engine(struct intel_engine_cs *engine,
>  		pr_err("%s(%s): Failed to start request %llx, at %x\n",
>  		       __func__, engine->name,
>  		       rq->fence.seqno, hws_seqno(&h, rq));
> +		i915_gem_set_wedged(i915);
>  		err = -EIO;
>  	}
>  
> @@ -1588,7 +1570,6 @@ static int igt_atomic_reset_engine(struct intel_engine_cs *engine,
>  static void force_reset(struct drm_i915_private *i915)
>  {
>  	i915_gem_set_wedged(i915);
> -	set_bit(I915_RESET_HANDOFF, &i915->gpu_error.flags);
>  	i915_reset(i915, 0, NULL);
>  }
>  
> @@ -1618,6 +1599,26 @@ static int igt_atomic_reset(void *arg)
>  	if (i915_terminally_wedged(&i915->gpu_error))
>  		goto unlock;
>  
> +	if (intel_has_gpu_reset(i915)) {
> +		const typeof(*phases) *p;
> +
> +		for (p = phases; p->name; p++) {
> +			GEM_TRACE("intel_gpu_reset under %s\n", p->name);
> +
> +			p->critical_section_begin();
> +			err = intel_gpu_reset(i915, ALL_ENGINES);
> +			p->critical_section_end();
> +
> +			if (err) {
> +				pr_err("intel_gpu_reset failed under %s\n",
> +				       p->name);
> +				goto out;
> +			}
> +		}
> +
> +		force_reset(i915);
> +	}
> +
>  	if (intel_has_reset_engine(i915)) {
>  		struct intel_engine_cs *engine;
>  		enum intel_engine_id id;
> diff --git a/drivers/gpu/drm/i915/selftests/intel_workarounds.c b/drivers/gpu/drm/i915/selftests/intel_workarounds.c
> index a8cac56be835..b15c4f26c593 100644
> --- a/drivers/gpu/drm/i915/selftests/intel_workarounds.c
> +++ b/drivers/gpu/drm/i915/selftests/intel_workarounds.c
> @@ -214,7 +214,6 @@ static int check_whitelist(struct i915_gem_context *ctx,
>  
>  static int do_device_reset(struct intel_engine_cs *engine)
>  {
> -	set_bit(I915_RESET_HANDOFF, &engine->i915->gpu_error.flags);
>  	i915_reset(engine->i915, ENGINE_MASK(engine->id), "live_workarounds");
>  	return 0;
>  }
> @@ -394,7 +393,6 @@ static int
>  live_gpu_reset_gt_engine_workarounds(void *arg)
>  {
>  	struct drm_i915_private *i915 = arg;
> -	struct i915_gpu_error *error = &i915->gpu_error;
>  	intel_wakeref_t wakeref;
>  	struct wa_lists lists;
>  	bool ok;
> @@ -413,7 +411,6 @@ live_gpu_reset_gt_engine_workarounds(void *arg)
>  	if (!ok)
>  		goto out;
>  
> -	set_bit(I915_RESET_HANDOFF, &error->flags);
>  	i915_reset(i915, ALL_ENGINES, "live_workarounds");
>  
>  	ok = verify_gt_engine_wa(i915, &lists, "after reset");
> diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> index 5477ad4a7e7d..8ab5a2688a0c 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> @@ -58,8 +58,8 @@ static void mock_device_release(struct drm_device *dev)
>  	i915_gem_contexts_lost(i915);
>  	mutex_unlock(&i915->drm.struct_mutex);
>  
> -	cancel_delayed_work_sync(&i915->gt.retire_work);
> -	cancel_delayed_work_sync(&i915->gt.idle_work);
> +	drain_delayed_work(&i915->gt.retire_work);
> +	drain_delayed_work(&i915->gt.idle_work);
>  	i915_gem_drain_workqueue(i915);
>  
>  	mutex_lock(&i915->drm.struct_mutex);
> -- 
> 2.20.1
>


* Re: [PATCH 10/34] drm/i915: Remove GPU reset dependence on struct_mutex
  2019-01-24 12:06   ` Mika Kuoppala
@ 2019-01-24 12:50     ` Chris Wilson
  2019-01-24 13:12       ` Chris Wilson
  2019-01-24 14:10       ` Chris Wilson
  0 siblings, 2 replies; 89+ messages in thread
From: Chris Wilson @ 2019-01-24 12:50 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

Quoting Mika Kuoppala (2019-01-24 12:06:01)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> > +     mutex_lock(&i915->drm.struct_mutex);
> >       i915_gem_contexts_lost(i915);
> >       mutex_unlock(&i915->drm.struct_mutex);
> >  }
> > @@ -4534,6 +4528,8 @@ int i915_gem_suspend(struct drm_i915_private *i915)
> >       wakeref = intel_runtime_pm_get(i915);
> >       intel_suspend_gt_powersave(i915);
> >  
> > +     flush_workqueue(i915->wq);
> 
> I don't know what is happening here. Why
> don't we need the i915_gem_drain_workqueue in here?

It's just a poke at the queue before we end up doing the same work
ourselves.

> >       mutex_lock(&i915->drm.struct_mutex);
> >  
> >       /*
> > @@ -4563,11 +4559,9 @@ int i915_gem_suspend(struct drm_i915_private *i915)
> >       i915_retire_requests(i915); /* ensure we flush after wedging */
> >  
> >       mutex_unlock(&i915->drm.struct_mutex);
> > +     i915_reset_flush(i915);
> >  
> > -     intel_uc_suspend(i915);
> > -
> > -     cancel_delayed_work_sync(&i915->gpu_error.hangcheck_work);
> > -     cancel_delayed_work_sync(&i915->gt.retire_work);
> > +     drain_delayed_work(&i915->gt.retire_work);
> 
> Hangcheck is inside reset flush but why the change
> for retire?

So that we don't leave the retire_work item in an ill-defined state. I
was probably thinking consistency over cleverness. It's also highly
probable that I stuck something else in there at one point.
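
For anyone skimming: the difference being relied on is that
cancel_delayed_work_sync() can leave a final self-queued run unexecuted,
whereas a drain-style helper keeps flushing until the work stops re-arming
itself. A rough sketch of such a helper (illustrative only; the example_
name is made up and this need not match the exact i915 definition):

#include <linux/workqueue.h>

/* Keep flushing until the delayed work no longer re-queues itself. */
static inline void example_drain_delayed_work(struct delayed_work *dw)
{
	do {
		while (flush_delayed_work(dw))
			;
	} while (delayed_work_pending(dw));
}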

> > -     ecode = i915_error_generate_code(dev_priv, error, &engine_id);
> > +     for (i = 0; i < ARRAY_SIZE(error->engine); i++)
> > +             if (!error->engine[i].context.pid)
> > +                     engines &= ~BIT(i);
> 
> No more grouping for driver internal hangs...?

There's no pid, right, so the message at the moment is garbled. The ecode
is just noise; you don't use it for analysis, and you don't use it for
error state matching (or at least you shouldn't, as it has no meaning wrt
the error state).

> >       len = scnprintf(error->error_msg, sizeof(error->error_msg),
> > -                     "GPU HANG: ecode %d:%d:0x%08x",
> > -                     INTEL_GEN(dev_priv), engine_id, ecode);
> > -
> > -     if (engine_id != -1 && error->engine[engine_id].context.pid)
> > +                     "GPU HANG: ecode %d:%lx:0x%08x",
> > +                     INTEL_GEN(error->i915), engines,
> > +                     i915_error_generate_code(error, engines));
> > +     if (engines) {
> > +             /* Just show the first executing process, more is confusing */
> > +             i = ffs(engines);
> 
> then why not just make the ecode accept a single engine and move it here?

I don't think the ecode should just accept a single engine. I don't
think the ecode should exist at all, but that's another matter.

> > -     /*
> > -      * We want to perform per-engine reset from atomic context (e.g.
> > -      * softirq), which imposes the constraint that we cannot sleep.
> > -      * However, experience suggests that spending a bit of time waiting
> > -      * for a reset helps in various cases, so for a full-device reset
> > -      * we apply the opposite rule and wait if we want to. As we should
> > -      * always follow up a failed per-engine reset with a full device reset,
> > -      * being a little faster, stricter and more error prone for the
> > -      * atomic case seems an acceptable compromise.
> > -      *
> > -      * Unfortunately this leads to a bimodal routine, when the goal was
> > -      * to have a single reset function that worked for resetting any
> > -      * number of engines simultaneously.
> > -      */
> > -     might_sleep_if(engine_mask == ALL_ENGINES);
> 
> Oh here it is. I was after this on atomic resets.

I was saying it didn't make sense to lift the restriction until we
relied upon it. Just because the code became safe doesn't mean it was then
part of the API :)

> > +static void restart_work(struct work_struct *work)
> >  {
> > +     struct i915_gpu_restart *arg = container_of(work, typeof(*arg), work);
> > +     struct drm_i915_private *i915 = arg->i915;
> >       struct intel_engine_cs *engine;
> >       enum intel_engine_id id;
> > +     intel_wakeref_t wakeref;
> >  
> > -     lockdep_assert_held(&i915->drm.struct_mutex);
> > +     wakeref = intel_runtime_pm_get(i915);
> > +     mutex_lock(&i915->drm.struct_mutex);
> >  
> > -     i915_retire_requests(i915);
> 
> Can't do this anymore yes. What will be the effect
> of delaying this and the other explicit retirements?
> Are we more prone to starvation?

No. We risk gen3 spontaneously dying as we have no idea why it needs a
request in the pipeline soon after a reset, so no idea if a potential
delay will kill it.

> > +     smp_store_mb(i915->gpu_error.restart, NULL);
> 
> Checkpatch might want a comment for the mb.

Checkpatch is silly about this one. It's precisely because the other
side is unserialised that it is smp_store_mb. But it's just a glorified
WRITE_ONCE, not that important.
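
For reference, the generic fallback in asm-generic/barrier.h is, modulo
per-arch overrides, roughly a plain store followed by a full barrier,
which is why it reads like a glorified WRITE_ONCE:

/* Approximately what the generic smp_store_mb() expands to. */
#define smp_store_mb(var, value)	\
do {					\
	WRITE_ONCE(var, value);		\
	smp_mb();			\
} while (0)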

> >       for_each_engine(engine, i915, id) {
> > -             struct intel_context *ce;
> > -
> > -             reset_engine(engine,
> > -                          engine->hangcheck.active_request,
> > -                          stalled_mask & ENGINE_MASK(id));
> > -             ce = fetch_and_zero(&engine->last_retired_context);
> > -             if (ce)
> > -                     intel_context_unpin(ce);
> > +             struct i915_request *rq;
> >  
> >               /*
> >                * Ostensibily, we always want a context loaded for powersaving,
> >                * so if the engine is idle after the reset, send a request
> >                * to load our scratch kernel_context.
> > -              *
> > -              * More mysteriously, if we leave the engine idle after a reset,
> > -              * the next userspace batch may hang, with what appears to be
> > -              * an incoherent read by the CS (presumably stale TLB). An
> > -              * empty request appears sufficient to paper over the glitch.
> >                */
> > -             if (intel_engine_is_idle(engine)) {
> > -                     struct i915_request *rq;
> > +             if (!intel_engine_is_idle(engine))
> > +                     continue;
> 
> Why did you remove the comment on needing an empty request?

I don't believe it's true any more, and it was detracting from the
emphasis on the idea of restoring a context.

> Also, if the request causing the engine to be non-idle could be a
> troublesome one, from a troublesome context, why not just skip the idle
> check and always add one for the kernel ctx?

Hence why I removed the comment; it's just a distraction. This whole
routine needs to be scrapped in favour of avoiding the request
allocation and just leaving the engine in a good state before
restarting. But I haven't thought of a nice way without allocations;
preallocating on the kernel_context may be possible, but it should also
be possible to set this up without hooking into any request machinery.

> > +#if 0
> > +#define __do_reset(fn, arg) stop_machine(fn, arg, NULL)
> 
> Let's remove the machinery for selecting the stop_machine reset and the #include.

NO. It's there as a very specific reminder that this code has a glaring
caveat and that a very very simple fix is the line above.

> > +#else
> > +#define __do_reset(fn, arg) fn(arg)
> > +#endif
> > +
> > +static int do_reset(struct drm_i915_private *i915, unsigned int stalled_mask)
> > +{
> > +     struct __i915_reset arg = { i915, stalled_mask };
> > +     int err, i;
> > +
> > +     err = __do_reset(__i915_reset__BKL, &arg);
> > +     for (i = 0; err && i < 3; i++) {
> > +             msleep(100);
> > +             err = __do_reset(__i915_reset__BKL, &arg);
> > +     }
> > +
> > +     return err;
> > +}
> > +
> >  /**
> >   * i915_reset - reset chip after a hang
> >   * @i915: #drm_i915_private to reset
> > @@ -987,31 +962,22 @@ void i915_reset(struct drm_i915_private *i915,
> >  {
> >       struct i915_gpu_error *error = &i915->gpu_error;
> >       int ret;
> > -     int i;
> >  
> >       GEM_TRACE("flags=%lx\n", error->flags);
> >  
> >       might_sleep();
> 
> What will? I didn't spot anything in execlists_reset_prepare.

stop_machine or whatever barrier we come up with to provide the same
function is going to require sleeping to coordinate with userspace
memory transactions.

> > @@ -1115,18 +1058,16 @@ static inline int intel_gt_reset_engine(struct drm_i915_private *i915,
> >  int i915_reset_engine(struct intel_engine_cs *engine, const char *msg)
> >  {
> >       struct i915_gpu_error *error = &engine->i915->gpu_error;
> > -     struct i915_request *active_request;
> >       int ret;
> >  
> >       GEM_TRACE("%s flags=%lx\n", engine->name, error->flags);
> >       GEM_BUG_ON(!test_bit(I915_RESET_ENGINE + engine->id, &error->flags));
> >  
> > -     active_request = reset_prepare_engine(engine);
> > -     if (IS_ERR_OR_NULL(active_request)) {
> > -             /* Either the previous reset failed, or we pardon the reset. */
> > -             ret = PTR_ERR(active_request);
> > -             goto out;
> > -     }
> > +     if (i915_seqno_passed(intel_engine_get_seqno(engine),
> > +                           intel_engine_last_submit(engine)))
> > +             return 0;
> 
> You seem to have a patch to remove this shortly after so
> squash?

No. That is quite a significant change in behaviour; we haven't been
poking HW like that up to that patch, and you know how picky HW can be.

It also has an impact on the selftests, and as such you want the
arguments to be concise.
-Chris

* Re: [PATCH 10/34] drm/i915: Remove GPU reset dependence on struct_mutex
  2019-01-24 12:50     ` Chris Wilson
@ 2019-01-24 13:12       ` Chris Wilson
  2019-01-24 14:10       ` Chris Wilson
  1 sibling, 0 replies; 89+ messages in thread
From: Chris Wilson @ 2019-01-24 13:12 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

Quoting Chris Wilson (2019-01-24 12:50:30)
> Quoting Mika Kuoppala (2019-01-24 12:06:01)
> > Why did you remove the comment on needing an empty request?
> 
> I don't believe it's true any more, and it was detracting from the
> emphasis on the idea of restoring a context.

Thinking about it, it was probably fixed by the work for reset handling
prior to the gen6/gen7 ppgtt updates.
-Chris

* Re: [PATCH 10/34] drm/i915: Remove GPU reset dependence on struct_mutex
  2019-01-24 12:50     ` Chris Wilson
  2019-01-24 13:12       ` Chris Wilson
@ 2019-01-24 14:10       ` Chris Wilson
  1 sibling, 0 replies; 89+ messages in thread
From: Chris Wilson @ 2019-01-24 14:10 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

Quoting Chris Wilson (2019-01-24 12:50:30)
> Quoting Mika Kuoppala (2019-01-24 12:06:01)
> > Chris Wilson <chris@chris-wilson.co.uk> writes:
> > > -     /*
> > > -      * We want to perform per-engine reset from atomic context (e.g.
> > > -      * softirq), which imposes the constraint that we cannot sleep.
> > > -      * However, experience suggests that spending a bit of time waiting
> > > -      * for a reset helps in various cases, so for a full-device reset
> > > -      * we apply the opposite rule and wait if we want to. As we should
> > > -      * always follow up a failed per-engine reset with a full device reset,
> > > -      * being a little faster, stricter and more error prone for the
> > > -      * atomic case seems an acceptable compromise.
> > > -      *
> > > -      * Unfortunately this leads to a bimodal routine, when the goal was
> > > -      * to have a single reset function that worked for resetting any
> > > -      * number of engines simultaneously.
> > > -      */
> > > -     might_sleep_if(engine_mask == ALL_ENGINES);
> > 
> > Oh here it is. I was after this on atomic resets.
> 
> I was saying it didn't make sense to lift the restriction until we
> relied upon it. Just because the code became safe doesn't mean it was then
> part of the API :)

Hmm, set-wedged is meant to be more or less magical now. That gives more
weight to the argument for making intel_gpu_reset() magical and removing
this comment earlier.
-Chris

* Re: [PATCH 29/34] drm/i915: Drop fake breadcrumb irq
  2019-01-21 22:21 ` [PATCH 29/34] drm/i915: Drop fake breadcrumb irq Chris Wilson
@ 2019-01-24 17:55   ` Tvrtko Ursulin
  2019-01-24 18:18     ` Chris Wilson
  0 siblings, 1 reply; 89+ messages in thread
From: Tvrtko Ursulin @ 2019-01-24 17:55 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 21/01/2019 22:21, Chris Wilson wrote:
> Missed breadcrumb detection is defunct due to the tight coupling with

How is it defunct... oh, because there is no irq_count any more. Could
you briefly describe the transition from irq_count to irq_fired and then
to nothing?

> dma_fence signaling and the myriad ways we may signal fences from
> everywhere but from an interrupt, i.e. we frequently signal a fence
> before we even see its interrupt. This means that even if we miss an
> interrupt for a fence, it still is signaled before our breadcrumb
> hangcheck fires, so simplify the breadcrumb hangchecking by moving it
> into the GPU hangcheck and forgo fake interrupts.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_debugfs.c           |  93 -----------
>   drivers/gpu/drm/i915/i915_gpu_error.c         |   2 -
>   drivers/gpu/drm/i915/i915_gpu_error.h         |   5 -
>   drivers/gpu/drm/i915/intel_breadcrumbs.c      | 147 +-----------------
>   drivers/gpu/drm/i915/intel_hangcheck.c        |   2 +
>   drivers/gpu/drm/i915/intel_ringbuffer.h       |   5 -
>   .../gpu/drm/i915/selftests/igt_live_test.c    |   7 -
>   7 files changed, 5 insertions(+), 256 deletions(-)

With this balance of insertions and deletions I cannot decide if this is 
easy or hard to review.

IGT uses intel_detect_and_clear_missed_interrupts a lot. What is the 
plan there?

> 
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index d7764e62e9b4..c2aaf010c3d1 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -1321,9 +1321,6 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
>   			   intel_engine_last_submit(engine),
>   			   jiffies_to_msecs(jiffies -
>   					    engine->hangcheck.action_timestamp));
> -		seq_printf(m, "\tfake irq active? %s\n",
> -			   yesno(test_bit(engine->id,
> -					  &dev_priv->gpu_error.missed_irq_rings)));
>   
>   		seq_printf(m, "\tACTHD = 0x%08llx [current 0x%08llx]\n",
>   			   (long long)engine->hangcheck.acthd,
> @@ -3874,94 +3871,6 @@ DEFINE_SIMPLE_ATTRIBUTE(i915_wedged_fops,
>   			i915_wedged_get, i915_wedged_set,
>   			"%llu\n");
>   
> -static int
> -fault_irq_set(struct drm_i915_private *i915,
> -	      unsigned long *irq,
> -	      unsigned long val)
> -{
> -	int err;
> -
> -	err = mutex_lock_interruptible(&i915->drm.struct_mutex);
> -	if (err)
> -		return err;
> -
> -	err = i915_gem_wait_for_idle(i915,
> -				     I915_WAIT_LOCKED |
> -				     I915_WAIT_INTERRUPTIBLE,
> -				     MAX_SCHEDULE_TIMEOUT);
> -	if (err)
> -		goto err_unlock;
> -
> -	*irq = val;
> -	mutex_unlock(&i915->drm.struct_mutex);
> -
> -	/* Flush idle worker to disarm irq */
> -	drain_delayed_work(&i915->gt.idle_work);
> -
> -	return 0;
> -
> -err_unlock:
> -	mutex_unlock(&i915->drm.struct_mutex);
> -	return err;
> -}
> -
> -static int
> -i915_ring_missed_irq_get(void *data, u64 *val)
> -{
> -	struct drm_i915_private *dev_priv = data;
> -
> -	*val = dev_priv->gpu_error.missed_irq_rings;
> -	return 0;
> -}
> -
> -static int
> -i915_ring_missed_irq_set(void *data, u64 val)
> -{
> -	struct drm_i915_private *i915 = data;
> -
> -	return fault_irq_set(i915, &i915->gpu_error.missed_irq_rings, val);
> -}
> -
> -DEFINE_SIMPLE_ATTRIBUTE(i915_ring_missed_irq_fops,
> -			i915_ring_missed_irq_get, i915_ring_missed_irq_set,
> -			"0x%08llx\n");
> -
> -static int
> -i915_ring_test_irq_get(void *data, u64 *val)
> -{
> -	struct drm_i915_private *dev_priv = data;
> -
> -	*val = dev_priv->gpu_error.test_irq_rings;
> -
> -	return 0;
> -}
> -
> -static int
> -i915_ring_test_irq_set(void *data, u64 val)
> -{
> -	struct drm_i915_private *i915 = data;
> -
> -	/* GuC keeps the user interrupt permanently enabled for submission */
> -	if (USES_GUC_SUBMISSION(i915))
> -		return -ENODEV;
> -
> -	/*
> -	 * From icl, we can no longer individually mask interrupt generation
> -	 * from each engine.
> -	 */
> -	if (INTEL_GEN(i915) >= 11)
> -		return -ENODEV;
> -
> -	val &= INTEL_INFO(i915)->ring_mask;
> -	DRM_DEBUG_DRIVER("Masking interrupts on rings 0x%08llx\n", val);
> -
> -	return fault_irq_set(i915, &i915->gpu_error.test_irq_rings, val);
> -}
> -
> -DEFINE_SIMPLE_ATTRIBUTE(i915_ring_test_irq_fops,
> -			i915_ring_test_irq_get, i915_ring_test_irq_set,
> -			"0x%08llx\n");
> -
>   #define DROP_UNBOUND	BIT(0)
>   #define DROP_BOUND	BIT(1)
>   #define DROP_RETIRE	BIT(2)
> @@ -4724,8 +4633,6 @@ static const struct i915_debugfs_files {
>   } i915_debugfs_files[] = {
>   	{"i915_wedged", &i915_wedged_fops},
>   	{"i915_cache_sharing", &i915_cache_sharing_fops},
> -	{"i915_ring_missed_irq", &i915_ring_missed_irq_fops},
> -	{"i915_ring_test_irq", &i915_ring_test_irq_fops},
>   	{"i915_gem_drop_caches", &i915_drop_caches_fops},
>   #if IS_ENABLED(CONFIG_DRM_I915_CAPTURE_ERROR)
>   	{"i915_error_state", &i915_error_state_fops},
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index 825572127029..0584c8dfa6ae 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -718,8 +718,6 @@ static void __err_print_to_sgl(struct drm_i915_error_state_buf *m,
>   	err_printf(m, "FORCEWAKE: 0x%08x\n", error->forcewake);
>   	err_printf(m, "DERRMR: 0x%08x\n", error->derrmr);
>   	err_printf(m, "CCID: 0x%08x\n", error->ccid);
> -	err_printf(m, "Missed interrupts: 0x%08lx\n",
> -		   m->i915->gpu_error.missed_irq_rings);
>   
>   	for (i = 0; i < error->nfence; i++)
>   		err_printf(m, "  fence[%d] = %08llx\n", i, error->fence[i]);
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.h b/drivers/gpu/drm/i915/i915_gpu_error.h
> index 0e184712cbcc..99a53c0cd6da 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.h
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.h
> @@ -203,8 +203,6 @@ struct i915_gpu_error {
>   
>   	atomic_t pending_fb_pin;
>   
> -	unsigned long missed_irq_rings;
> -
>   	/**
>   	 * State variable controlling the reset flow and count
>   	 *
> @@ -273,9 +271,6 @@ struct i915_gpu_error {
>   	 */
>   	wait_queue_head_t reset_queue;
>   
> -	/* For missed irq/seqno simulation. */
> -	unsigned long test_irq_rings;
> -
>   	struct i915_gpu_restart *restart;
>   };
>   
> diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> index faeb0083b561..3bdfa63ea4a1 100644
> --- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
> +++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> @@ -91,7 +91,6 @@ bool intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine)
>   
>   	spin_lock(&b->irq_lock);
>   
> -	b->irq_fired = true;
>   	if (b->irq_armed && list_empty(&b->signalers))
>   		__intel_breadcrumbs_disarm_irq(b);
>   
> @@ -155,86 +154,6 @@ static void signal_irq_work(struct irq_work *work)
>   	intel_engine_breadcrumbs_irq(engine);
>   }
>   
> -static unsigned long wait_timeout(void)
> -{
> -	return round_jiffies_up(jiffies + DRM_I915_HANGCHECK_JIFFIES);
> -}
> -
> -static noinline void missed_breadcrumb(struct intel_engine_cs *engine)
> -{
> -	if (GEM_SHOW_DEBUG()) {
> -		struct drm_printer p = drm_debug_printer(__func__);
> -
> -		intel_engine_dump(engine, &p,
> -				  "%s missed breadcrumb at %pS\n",
> -				  engine->name, __builtin_return_address(0));
> -	}
> -
> -	set_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings);
> -}
> -
> -static void intel_breadcrumbs_hangcheck(struct timer_list *t)
> -{
> -	struct intel_engine_cs *engine =
> -		from_timer(engine, t, breadcrumbs.hangcheck);
> -	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> -
> -	if (!b->irq_armed)
> -		return;
> -
> -	if (b->irq_fired)
> -		goto rearm;
> -
> -	/*
> -	 * We keep the hangcheck timer alive until we disarm the irq, even
> -	 * if there are no waiters at present.
> -	 *
> -	 * If the waiter was currently running, assume it hasn't had a chance
> -	 * to process the pending interrupt (e.g, low priority task on a loaded
> -	 * system) and wait until it sleeps before declaring a missed interrupt.
> -	 *
> -	 * If the waiter was asleep (and not even pending a wakeup), then we
> -	 * must have missed an interrupt as the GPU has stopped advancing
> -	 * but we still have a waiter. Assuming all batches complete within
> -	 * DRM_I915_HANGCHECK_JIFFIES [1.5s]!
> -	 */
> -	synchronize_hardirq(engine->i915->drm.irq);
> -	if (intel_engine_signal_breadcrumbs(engine)) {
> -		missed_breadcrumb(engine);
> -		mod_timer(&b->fake_irq, jiffies + 1);
> -	} else {
> -rearm:
> -		b->irq_fired = false;
> -		mod_timer(&b->hangcheck, wait_timeout());
> -	}
> -}
> -
> -static void intel_breadcrumbs_fake_irq(struct timer_list *t)
> -{
> -	struct intel_engine_cs *engine =
> -		from_timer(engine, t, breadcrumbs.fake_irq);
> -	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> -
> -	/*
> -	 * The timer persists in case we cannot enable interrupts,
> -	 * or if we have previously seen seqno/interrupt incoherency
> -	 * ("missed interrupt" syndrome, better known as a "missed breadcrumb").
> -	 * Here the worker will wake up every jiffie in order to kick the
> -	 * oldest waiter to do the coherent seqno check.
> -	 */
> -
> -	if (!intel_engine_signal_breadcrumbs(engine) && !b->irq_armed)
> -		return;
> -
> -	/* If the user has disabled the fake-irq, restore the hangchecking */
> -	if (!test_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings)) {
> -		mod_timer(&b->hangcheck, wait_timeout());
> -		return;
> -	}
> -
> -	mod_timer(&b->fake_irq, jiffies + 1);
> -}
> -
>   void intel_engine_pin_breadcrumbs_irq(struct intel_engine_cs *engine)
>   {
>   	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> @@ -257,43 +176,14 @@ void intel_engine_unpin_breadcrumbs_irq(struct intel_engine_cs *engine)
>   	spin_unlock_irq(&b->irq_lock);
>   }
>   
> -static bool use_fake_irq(const struct intel_breadcrumbs *b)
> -{
> -	const struct intel_engine_cs *engine =
> -		container_of(b, struct intel_engine_cs, breadcrumbs);
> -
> -	if (!test_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings))
> -		return false;
> -
> -	/*
> -	 * Only start with the heavy weight fake irq timer if we have not
> -	 * seen any interrupts since enabling it the first time. If the
> -	 * interrupts are still arriving, it means we made a mistake in our
> -	 * engine->seqno_barrier(), a timing error that should be transient
> -	 * and unlikely to reoccur.
> -	 */
> -	return !b->irq_fired;
> -}
> -
> -static void enable_fake_irq(struct intel_breadcrumbs *b)
> -{
> -	/* Ensure we never sleep indefinitely */
> -	if (!b->irq_enabled || use_fake_irq(b))
> -		mod_timer(&b->fake_irq, jiffies + 1);
> -	else
> -		mod_timer(&b->hangcheck, wait_timeout());
> -}
> -
> -static bool __intel_breadcrumbs_arm_irq(struct intel_breadcrumbs *b)
> +static void __intel_breadcrumbs_arm_irq(struct intel_breadcrumbs *b)
>   {
>   	struct intel_engine_cs *engine =
>   		container_of(b, struct intel_engine_cs, breadcrumbs);
> -	struct drm_i915_private *i915 = engine->i915;
> -	bool enabled;
>   
>   	lockdep_assert_held(&b->irq_lock);
>   	if (b->irq_armed)
> -		return false;
> +		return;
>   
>   	/*
>   	 * The breadcrumb irq will be disarmed on the interrupt after the
> @@ -311,16 +201,8 @@ static bool __intel_breadcrumbs_arm_irq(struct intel_breadcrumbs *b)
>   	 * the driver is idle) we disarm the breadcrumbs.
>   	 */
>   
> -	/* No interrupts? Kick the waiter every jiffie! */
> -	enabled = false;
> -	if (!b->irq_enabled++ &&
> -	    !test_bit(engine->id, &i915->gpu_error.test_irq_rings)) {
> +	if (!b->irq_enabled++)
>   		irq_enable(engine);
> -		enabled = true;
> -	}
> -
> -	enable_fake_irq(b);
> -	return enabled;
>   }
>   
>   void intel_engine_init_breadcrumbs(struct intel_engine_cs *engine)
> @@ -331,18 +213,6 @@ void intel_engine_init_breadcrumbs(struct intel_engine_cs *engine)
>   	INIT_LIST_HEAD(&b->signalers);
>   
>   	init_irq_work(&b->irq_work, signal_irq_work);
> -
> -	timer_setup(&b->fake_irq, intel_breadcrumbs_fake_irq, 0);
> -	timer_setup(&b->hangcheck, intel_breadcrumbs_hangcheck, 0);
> -}
> -
> -static void cancel_fake_irq(struct intel_engine_cs *engine)
> -{
> -	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> -
> -	del_timer_sync(&b->fake_irq); /* may queue b->hangcheck */
> -	del_timer_sync(&b->hangcheck);
> -	clear_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings);
>   }
>   
>   void intel_engine_reset_breadcrumbs(struct intel_engine_cs *engine)
> @@ -352,13 +222,6 @@ void intel_engine_reset_breadcrumbs(struct intel_engine_cs *engine)
>   
>   	spin_lock_irqsave(&b->irq_lock, flags);
>   
> -	/*
> -	 * Leave the fake_irq timer enabled (if it is running), but clear the
> -	 * bit so that it turns itself off on its next wake up and goes back
> -	 * to the long hangcheck interval if still required.
> -	 */
> -	clear_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings);
> -
>   	if (b->irq_enabled)
>   		irq_enable(engine);
>   	else
> @@ -369,7 +232,6 @@ void intel_engine_reset_breadcrumbs(struct intel_engine_cs *engine)
>   
>   void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine)
>   {
> -	cancel_fake_irq(engine);
>   }
>   
>   bool intel_engine_enable_signaling(struct i915_request *rq)
> @@ -451,7 +313,4 @@ void intel_engine_print_breadcrumbs(struct intel_engine_cs *engine,
>   		}
>   	}
>   	spin_unlock_irq(&b->irq_lock);
> -
> -	if (test_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings))
> -		drm_printf(p, "Fake irq active\n");
>   }
> diff --git a/drivers/gpu/drm/i915/intel_hangcheck.c b/drivers/gpu/drm/i915/intel_hangcheck.c
> index 5662d6fed523..a219c796e56d 100644
> --- a/drivers/gpu/drm/i915/intel_hangcheck.c
> +++ b/drivers/gpu/drm/i915/intel_hangcheck.c
> @@ -275,6 +275,8 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
>   	for_each_engine(engine, dev_priv, id) {
>   		struct hangcheck hc;
>   
> +		intel_engine_signal_breadcrumbs(engine);
> +

Sounds fine. So the only downside is that detecting missed interrupts
gets slower. But in practice they don't happen often?

>   		hangcheck_load_sample(engine, &hc);
>   		hangcheck_accumulate_sample(engine, &hc);
>   		hangcheck_store_sample(engine, &hc);
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index b78cb9bd4bc2..7eec96cf2a0b 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -382,14 +382,9 @@ struct intel_engine_cs {
>   
>   		struct irq_work irq_work;
>   
> -		struct timer_list fake_irq; /* used after a missed interrupt */
> -		struct timer_list hangcheck; /* detect missed interrupts */
> -
> -		unsigned int hangcheck_interrupts;
>   		unsigned int irq_enabled;
>   
>   		bool irq_armed;
> -		bool irq_fired;
>   	} breadcrumbs;
>   
>   	struct {
> diff --git a/drivers/gpu/drm/i915/selftests/igt_live_test.c b/drivers/gpu/drm/i915/selftests/igt_live_test.c
> index 5deb485fb942..3e902761cd16 100644
> --- a/drivers/gpu/drm/i915/selftests/igt_live_test.c
> +++ b/drivers/gpu/drm/i915/selftests/igt_live_test.c
> @@ -35,7 +35,6 @@ int igt_live_test_begin(struct igt_live_test *t,
>   		return err;
>   	}
>   
> -	i915->gpu_error.missed_irq_rings = 0;
>   	t->reset_global = i915_reset_count(&i915->gpu_error);
>   
>   	for_each_engine(engine, i915, id)
> @@ -75,11 +74,5 @@ int igt_live_test_end(struct igt_live_test *t)
>   		return -EIO;
>   	}
>   
> -	if (i915->gpu_error.missed_irq_rings) {
> -		pr_err("%s(%s): Missed interrupts on engines %lx\n",
> -		       t->func, t->name, i915->gpu_error.missed_irq_rings);
> -		return -EIO;
> -	}
> -
>   	return 0;
>   }
> 

Regards,

Tvrtko


* Re: [PATCH 29/34] drm/i915: Drop fake breadcrumb irq
  2019-01-24 17:55   ` Tvrtko Ursulin
@ 2019-01-24 18:18     ` Chris Wilson
  0 siblings, 0 replies; 89+ messages in thread
From: Chris Wilson @ 2019-01-24 18:18 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-01-24 17:55:20)
> 
> On 21/01/2019 22:21, Chris Wilson wrote:
> > Missed breadcrumb detection is defunct due to the tight coupling with
> 
> How is it defunct... oh, because there is no irq_count any more. Could
> you briefly describe the transition from irq_count to irq_fired and then
> to nothing?

We don't have an independent intel_wait to distinguish irq completions
from dma_fence_signals. Every time we call dma_fence_signal() we think we
saw an interrupt, and we complete fences very often before we see their
interrupts... And then our test completely fails to set up the machine to
detect the missed breadcrumb, as the requests get completed by anything
but the missed-breadcrumb timer.

> > dma_fence signaling and the myriad ways we may signal fences from
> > everywhere but from an interrupt, i.e. we frequently signal a fence
> > before we even see its interrupt. This means that even if we miss an
> > interrupt for a fence, it still is signaled before our breadcrumb
> > hangcheck fires, so simplify the breadcrumb hangchecking by moving it
> > into the GPU hangcheck and forgo fake interrupts.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >   drivers/gpu/drm/i915/i915_debugfs.c           |  93 -----------
> >   drivers/gpu/drm/i915/i915_gpu_error.c         |   2 -
> >   drivers/gpu/drm/i915/i915_gpu_error.h         |   5 -
> >   drivers/gpu/drm/i915/intel_breadcrumbs.c      | 147 +-----------------
> >   drivers/gpu/drm/i915/intel_hangcheck.c        |   2 +
> >   drivers/gpu/drm/i915/intel_ringbuffer.h       |   5 -
> >   .../gpu/drm/i915/selftests/igt_live_test.c    |   7 -
> >   7 files changed, 5 insertions(+), 256 deletions(-)
> 
> With this balance of insertions and deletions I cannot decide if this is 
> easy or hard to review.
> 
> IGT uses intel_detect_and_clear_missed_interrupts a lot. What is the 
> plan there?

They are defunct. As of the previous patch they no longer detect anything
useful; this just makes it official. igt has been prepped for the loss of
the debugfs.

Without this patch we get false positives from i915_missed_interrupt
instead.

I've tried and failed to replace the detection in an acceptable manner;
without a separate irq completion tracker it seems hopeless.

> > diff --git a/drivers/gpu/drm/i915/intel_hangcheck.c b/drivers/gpu/drm/i915/intel_hangcheck.c
> > index 5662d6fed523..a219c796e56d 100644
> > --- a/drivers/gpu/drm/i915/intel_hangcheck.c
> > +++ b/drivers/gpu/drm/i915/intel_hangcheck.c
> > @@ -275,6 +275,8 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
> >       for_each_engine(engine, dev_priv, id) {
> >               struct hangcheck hc;
> >   
> > +             intel_engine_signal_breadcrumbs(engine);
> > +
> 
> Sounds fine. So only downside is detecting missed interrupts gets 
> slower. But in practice they don't happen often?

In practice, no missed interrupts. *touch wood*
That was why fixing gen5-gen7 beforehand was so important.

Having a signal here and in retire_work means that in the worst case
everything updates once a second. Enough for users to be able to
complain. But more than likely every frame update will flush the earlier
requests anyway, which is why we couldn't detect missed breadcrumbs in igt
in the first place.
-Chris
