* [PATCH 01/28] drm/i915: Wait for a moment before forcibly resetting the device
@ 2019-01-28  1:02 Chris Wilson
  2019-01-28  1:02 ` [PATCH 02/28] drm/i915: Rename execlists->queue_priority to preempt_priority_hint Chris Wilson
                   ` (31 more replies)
  0 siblings, 32 replies; 39+ messages in thread
From: Chris Wilson @ 2019-01-28  1:02 UTC (permalink / raw)
  To: intel-gfx; +Cc: Mika Kuoppala

During igt, we ask to reset the device if any requests are still
outstanding at the end of a test, as this quickly kills off any
erroneous hanging request streams that may escape a test. However, since
it may take the device a few milliseconds to flush itself after the end
of a normal test, *cough* guc *cough*, we may accidentally tell the
device to reset itself just as it is about to idle. If we wait a moment,
using our usual I915_IDLE_ENGINES_TIMEOUT of 200ms (which seems a bit
high, but is still better than umpteen hangchecks!), we can better
differentiate between a stuck engine and a healthy one, and so avoid
prematurely forcing the reset and any extra complications that may entail.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
---
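A quick note on the mechanism, with an illustrative sketch (not part of
the patch): wait_for() is a polling loop that re-tests a condition until
it holds or the timeout expires, so the device is only declared stuck
once the whole grace period has elapsed. The helper below is a minimal
userspace stand-in under that assumption; the name, the bool-returning
convention and the 1ms poll interval are hypothetical, not the i915 macro.

#include <stdbool.h>
#include <time.h>

static bool wait_for_idle(bool (*check_idle)(void), int timeout_ms)
{
	struct timespec poll = { .tv_sec = 0, .tv_nsec = 1000 * 1000 };
	int elapsed;

	for (elapsed = 0; elapsed < timeout_ms; elapsed++) {
		if (check_idle())
			return true;	/* engines flushed themselves in time */
		nanosleep(&poll, NULL);
	}

	return false;	/* still busy after the grace period: treat as stuck */
}

With this shape, the caller wedges the device only when the wait actually
times out, rather than the instant it samples a busy engine.
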
 drivers/gpu/drm/i915/i915_debugfs.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 3b995f9fdc06..e46de507fea2 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -4051,7 +4051,8 @@ i915_drop_caches_set(void *data, u64 val)
 		  val, val & DROP_ALL);
 	wakeref = intel_runtime_pm_get(i915);
 
-	if (val & DROP_RESET_ACTIVE && !intel_engines_are_idle(i915))
+	if (val & DROP_RESET_ACTIVE &&
+	    wait_for(intel_engines_are_idle(i915), I915_IDLE_ENGINES_TIMEOUT))
 		i915_gem_set_wedged(i915);
 
 	/* No need to check and wait for gpu resets, only libdrm auto-restarts
-- 
2.20.1


* [PATCH 02/28] drm/i915: Rename execlists->queue_priority to preempt_priority_hint
  2019-01-28  1:02 [PATCH 01/28] drm/i915: Wait for a moment before forcibly resetting the device Chris Wilson
@ 2019-01-28  1:02 ` Chris Wilson
  2019-01-28 10:56   ` Tvrtko Ursulin
  2019-01-28  1:02 ` [PATCH 03/28] drm/i915/execlists: Suppress preempting self Chris Wilson
                   ` (30 subsequent siblings)
  31 siblings, 1 reply; 39+ messages in thread
From: Chris Wilson @ 2019-01-28  1:02 UTC (permalink / raw)
  To: intel-gfx

After noticing that we trigger preemption events for currently executing
requests, as well as for requests that complete before the preemption,
and then attempting to suppress those preemption events, it is wise not
to consider the queue_priority to be authoritative. As we only track the
maximum priority seen between dequeue passes, if the maximum-priority
request is no longer available for dequeuing (it completed, or is even
executing on another engine), we have no knowledge of the previous
queue_priority, as recovering it would require keeping a full history of
enqueued requests -- but we already have that history in the priolists!

Rename the queue_priority to preempt_priority_hint so that we do not
mistake it for the maximum priority in the queue; it is merely an
indication that we have seen a new maximum priority value, and as such
we should check whether it should preempt the currently running request.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
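To illustrate the semantics being renamed, here is a standalone C model
of how the value behaves between dequeues (the names execlists_model,
maybe_kick_tasklet and on_dequeue are hypothetical; they only mirror the
logic in the diff below):

#include <limits.h>

struct execlists_model {
	int preempt_priority_hint;	/* INT_MIN when nothing is pending */
};

/* On (re)scheduling a request: the hint only ratchets upwards. */
static int maybe_kick_tasklet(struct execlists_model *el, int prio)
{
	if (prio <= el->preempt_priority_hint)
		return 0;	/* already covered by the current hint */

	el->preempt_priority_hint = prio;
	return 1;	/* kick the submission tasklet to reevaluate */
}

/* On dequeue: recompute from what is actually left in the queue. */
static void on_dequeue(struct execlists_model *el, int top_of_queue)
{
	el->preempt_priority_hint = top_of_queue;
}

Between on_dequeue() calls the stored value is only a high-water mark;
the request that set it may have completed already, which is exactly why
it can no longer be read as "the maximum priority in the queue".
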
 drivers/gpu/drm/i915/i915_scheduler.c       | 11 +++++------
 drivers/gpu/drm/i915/intel_engine_cs.c      |  4 ++--
 drivers/gpu/drm/i915/intel_guc_submission.c |  5 +++--
 drivers/gpu/drm/i915/intel_lrc.c            | 20 +++++++++++---------
 drivers/gpu/drm/i915/intel_ringbuffer.h     |  8 ++++++--
 5 files changed, 27 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 340faea6c08a..0da718ceab43 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -127,8 +127,7 @@ static inline struct i915_priolist *to_priolist(struct rb_node *rb)
 	return rb_entry(rb, struct i915_priolist, node);
 }
 
-static void assert_priolists(struct intel_engine_execlists * const execlists,
-			     long queue_priority)
+static void assert_priolists(struct intel_engine_execlists * const execlists)
 {
 	struct rb_node *rb;
 	long last_prio, i;
@@ -139,7 +138,7 @@ static void assert_priolists(struct intel_engine_execlists * const execlists,
 	GEM_BUG_ON(rb_first_cached(&execlists->queue) !=
 		   rb_first(&execlists->queue.rb_root));
 
-	last_prio = (queue_priority >> I915_USER_PRIORITY_SHIFT) + 1;
+	last_prio = (INT_MAX >> I915_USER_PRIORITY_SHIFT) + 1;
 	for (rb = rb_first_cached(&execlists->queue); rb; rb = rb_next(rb)) {
 		const struct i915_priolist *p = to_priolist(rb);
 
@@ -166,7 +165,7 @@ i915_sched_lookup_priolist(struct intel_engine_cs *engine, int prio)
 	int idx, i;
 
 	lockdep_assert_held(&engine->timeline.lock);
-	assert_priolists(execlists, INT_MAX);
+	assert_priolists(execlists);
 
 	/* buckets sorted from highest [in slot 0] to lowest priority */
 	idx = I915_PRIORITY_COUNT - (prio & I915_PRIORITY_MASK) - 1;
@@ -353,7 +352,7 @@ static void __i915_schedule(struct i915_request *rq,
 				continue;
 		}
 
-		if (prio <= engine->execlists.queue_priority)
+		if (prio <= engine->execlists.preempt_priority_hint)
 			continue;
 
 		/*
@@ -366,7 +365,7 @@ static void __i915_schedule(struct i915_request *rq,
 			continue;
 
 		/* Defer (tasklet) submission until after all of our updates. */
-		engine->execlists.queue_priority = prio;
+		engine->execlists.preempt_priority_hint = prio;
 		tasklet_hi_schedule(&engine->execlists.tasklet);
 	}
 
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 1a5c163b98d6..1ffec0d69157 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -480,7 +480,7 @@ static void intel_engine_init_execlist(struct intel_engine_cs *engine)
 	GEM_BUG_ON(!is_power_of_2(execlists_num_ports(execlists)));
 	GEM_BUG_ON(execlists_num_ports(execlists) > EXECLIST_MAX_PORTS);
 
-	execlists->queue_priority = INT_MIN;
+	execlists->preempt_priority_hint = INT_MIN;
 	execlists->queue = RB_ROOT_CACHED;
 }
 
@@ -1156,7 +1156,7 @@ void intel_engines_park(struct drm_i915_private *i915)
 		}
 
 		/* Must be reset upon idling, or we may miss the busy wakeup. */
-		GEM_BUG_ON(engine->execlists.queue_priority != INT_MIN);
+		GEM_BUG_ON(engine->execlists.preempt_priority_hint != INT_MIN);
 
 		if (engine->park)
 			engine->park(engine);
diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
index 45e2db683fe5..1bf6ac76ad99 100644
--- a/drivers/gpu/drm/i915/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/intel_guc_submission.c
@@ -731,7 +731,7 @@ static bool __guc_dequeue(struct intel_engine_cs *engine)
 		if (intel_engine_has_preemption(engine)) {
 			struct guc_preempt_work *preempt_work =
 				&engine->i915->guc.preempt_work[engine->id];
-			int prio = execlists->queue_priority;
+			int prio = execlists->preempt_priority_hint;
 
 			if (__execlists_need_preempt(prio, port_prio(port))) {
 				execlists_set_active(execlists,
@@ -777,7 +777,8 @@ static bool __guc_dequeue(struct intel_engine_cs *engine)
 			kmem_cache_free(engine->i915->priorities, p);
 	}
 done:
-	execlists->queue_priority = rb ? to_priolist(rb)->priority : INT_MIN;
+	execlists->preempt_priority_hint =
+		rb ? to_priolist(rb)->priority : INT_MIN;
 	if (submit)
 		port_assign(port, last);
 	if (last)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 185867106c14..71006b031f54 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -584,7 +584,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 		if (!execlists_is_active(execlists, EXECLISTS_ACTIVE_HWACK))
 			return;
 
-		if (need_preempt(engine, last, execlists->queue_priority)) {
+		if (need_preempt(engine, last, execlists->preempt_priority_hint)) {
 			inject_preempt_context(engine);
 			return;
 		}
@@ -692,20 +692,20 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 	/*
 	 * Here be a bit of magic! Or sleight-of-hand, whichever you prefer.
 	 *
-	 * We choose queue_priority such that if we add a request of greater
+	 * We choose the priority hint such that if we add a request of greater
 	 * priority than this, we kick the submission tasklet to decide on
 	 * the right order of submitting the requests to hardware. We must
 	 * also be prepared to reorder requests as they are in-flight on the
-	 * HW. We derive the queue_priority then as the first "hole" in
+	 * HW. We derive the priority hint then as the first "hole" in
 	 * the HW submission ports and if there are no available slots,
 	 * the priority of the lowest executing request, i.e. last.
 	 *
 	 * When we do receive a higher priority request ready to run from the
-	 * user, see queue_request(), the queue_priority is bumped to that
+	 * user, see queue_request(), the priority hint is bumped to that
 	 * request triggering preemption on the next dequeue (or subsequent
 	 * interrupt for secondary ports).
 	 */
-	execlists->queue_priority =
+	execlists->preempt_priority_hint =
 		port != execlists->port ? rq_prio(last) : INT_MIN;
 
 	if (submit) {
@@ -853,7 +853,7 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine)
 
 	/* Remaining _unready_ requests will be nop'ed when submitted */
 
-	execlists->queue_priority = INT_MIN;
+	execlists->preempt_priority_hint = INT_MIN;
 	execlists->queue = RB_ROOT_CACHED;
 	GEM_BUG_ON(port_isset(execlists->port));
 
@@ -1083,8 +1083,8 @@ static void __submit_queue_imm(struct intel_engine_cs *engine)
 
 static void submit_queue(struct intel_engine_cs *engine, int prio)
 {
-	if (prio > engine->execlists.queue_priority) {
-		engine->execlists.queue_priority = prio;
+	if (prio > engine->execlists.preempt_priority_hint) {
+		engine->execlists.preempt_priority_hint = prio;
 		__submit_queue_imm(engine);
 	}
 }
@@ -2718,7 +2718,9 @@ void intel_execlists_show_requests(struct intel_engine_cs *engine,
 
 	last = NULL;
 	count = 0;
-	drm_printf(m, "\t\tQueue priority: %d\n", execlists->queue_priority);
+	if (execlists->preempt_priority_hint != INT_MIN)
+		drm_printf(m, "\t\tPreempt priority hint: %d\n",
+			   execlists->preempt_priority_hint);
 	for (rb = rb_first_cached(&execlists->queue); rb; rb = rb_next(rb)) {
 		struct i915_priolist *p = rb_entry(rb, typeof(*p), node);
 		int i;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index f2effd001540..71a41fec738f 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -294,14 +294,18 @@ struct intel_engine_execlists {
 	unsigned int port_mask;
 
 	/**
-	 * @queue_priority: Highest pending priority.
+	 * @preempt_priority_hint: Highest pending priority.
 	 *
 	 * When we add requests into the queue, or adjust the priority of
 	 * executing requests, we compute the maximum priority of those
 	 * pending requests. We can then use this value to determine if
 	 * we need to preempt the executing requests to service the queue.
+	 * However, since we may have recorded the priority of an inflight
+	 * request that we wanted to preempt but which has since completed,
+	 * at the time of dequeuing the priority hint may no longer match
+	 * the highest available request priority.
 	 */
-	int queue_priority;
+	int preempt_priority_hint;
 
 	/**
 	 * @queue: queue of requests, in priority lists
-- 
2.20.1


* [PATCH 03/28] drm/i915/execlists: Suppress preempting self
  2019-01-28  1:02 [PATCH 01/28] drm/i915: Wait for a moment before forcibly resetting the device Chris Wilson
  2019-01-28  1:02 ` [PATCH 02/28] drm/i915: Rename execlists->queue_priority to preempt_priority_hint Chris Wilson
@ 2019-01-28  1:02 ` Chris Wilson
  2019-01-28  1:02 ` [PATCH 04/28] drm/i915/execlists: Suppress redundant preemption Chris Wilson
                   ` (29 subsequent siblings)
  31 siblings, 0 replies; 39+ messages in thread
From: Chris Wilson @ 2019-01-28  1:02 UTC (permalink / raw)
  To: intel-gfx

In order to avoid preempting ourselves, we currently refuse to schedule
the tasklet if we reschedule an inflight context. However, this glosses
over a few issues, such as what happens after a CS completion event when
we then preempt the newly executing context with itself, or when
something else causes a tasklet_schedule that triggers the same
evaluation and preempts the active context with itself.

When we avoid preempting ELSP[0], we still retain the preemption value,
as it may match a second preemption request within the same time period
that we need to resolve after the next CS event. But since we only store
the maximum preemption priority seen, it may not match the subsequent
event, and so we should double-check whether we actually need to trigger
a preempt-to-idle by comparing the top priorities from each queue. Later,
this gives us a hook for finer control over deciding whether the
preempt-to-idle is justified.

The sequence of events where we end up preempting to no avail is:

1. Queue requests/contexts A, B
2. Priority boost A; no preemption as it is executing, but keep hint
3. After CS switch, B is less than hint, force preempt-to-idle
4. Resubmit B after idling

v2: We can simplify a bunch of tests based on the knowledge that PI will
ensure that earlier requests along the same context will have the highest
priority.
v3: Demonstrate the stale preemption hint with a selftest

References: a2bf92e8cc16 ("drm/i915/execlists: Avoid kicking priority on the current context")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
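As a sketch of the reworked decision (standalone C with simplified field
names, not the kernel structs): the hint alone no longer forces a
preempt-to-idle, it merely prompts a comparison of the real top
priorities against the executing request.

#include <stdbool.h>

struct engine_model {
	int hint;	/* preempt_priority_hint since the last dequeue */
	int elsp0;	/* priority of the executing request */
	int elsp1;	/* next request in flight (INT_MIN if none) */
	int queue_top;	/* highest queued priority (INT_MIN if empty) */
};

static bool need_preempt_model(const struct engine_model *e)
{
	if (e->hint <= e->elsp0)
		return false;	/* hint does not merit a preemption */

	/*
	 * The hint may be stale (the boosted request completed, or is
	 * the one already executing), so only preempt if something
	 * runnable really outranks ELSP[0].
	 */
	return e->elsp1 > e->elsp0 || e->queue_top > e->elsp0;
}

In the 4-step sequence above, step 3 now fails this check: after the CS
switch, B is the only runnable candidate and does not outrank itself, so
the stale hint no longer forces the preempt-to-idle.
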
 drivers/gpu/drm/i915/i915_scheduler.c       |  20 ++-
 drivers/gpu/drm/i915/intel_guc_submission.c |   2 +
 drivers/gpu/drm/i915/intel_lrc.c            |  96 ++++++++++++--
 drivers/gpu/drm/i915/intel_ringbuffer.h     |   1 +
 drivers/gpu/drm/i915/selftests/intel_lrc.c  | 131 ++++++++++++++++++++
 5 files changed, 238 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 0da718ceab43..7db1255665a8 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -238,6 +238,18 @@ sched_lock_engine(struct i915_sched_node *node, struct intel_engine_cs *locked)
 	return engine;
 }
 
+static bool inflight(const struct i915_request *rq,
+		     const struct intel_engine_cs *engine)
+{
+	const struct i915_request *active;
+
+	if (!rq->global_seqno)
+		return false;
+
+	active = port_request(engine->execlists.port);
+	return active->hw_context == rq->hw_context;
+}
+
 static void __i915_schedule(struct i915_request *rq,
 			    const struct i915_sched_attr *attr)
 {
@@ -327,6 +339,7 @@ static void __i915_schedule(struct i915_request *rq,
 		INIT_LIST_HEAD(&dep->dfs_link);
 
 		engine = sched_lock_engine(node, engine);
+		lockdep_assert_held(&engine->timeline.lock);
 
 		/* Recheck after acquiring the engine->timeline.lock */
 		if (prio <= node->attr.priority || node_signaled(node))
@@ -355,17 +368,16 @@ static void __i915_schedule(struct i915_request *rq,
 		if (prio <= engine->execlists.preempt_priority_hint)
 			continue;
 
+		engine->execlists.preempt_priority_hint = prio;
+
 		/*
 		 * If we are already the currently executing context, don't
 		 * bother evaluating if we should preempt ourselves.
 		 */
-		if (node_to_request(node)->global_seqno &&
-		    i915_seqno_passed(port_request(engine->execlists.port)->global_seqno,
-				      node_to_request(node)->global_seqno))
+		if (inflight(node_to_request(node), engine))
 			continue;
 
 		/* Defer (tasklet) submission until after all of our updates. */
-		engine->execlists.preempt_priority_hint = prio;
 		tasklet_hi_schedule(&engine->execlists.tasklet);
 	}
 
diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
index 1bf6ac76ad99..4cab0826100c 100644
--- a/drivers/gpu/drm/i915/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/intel_guc_submission.c
@@ -623,6 +623,8 @@ static void inject_preempt_context(struct work_struct *work)
 				       EXECLISTS_ACTIVE_PREEMPT);
 		tasklet_schedule(&engine->execlists.tasklet);
 	}
+
+	(void)I915_SELFTEST_ONLY(engine->execlists.preempt_hang.count++);
 }
 
 /*
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 71006b031f54..79adf9827f02 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -182,13 +182,90 @@ static inline int rq_prio(const struct i915_request *rq)
 	return rq->sched.attr.priority;
 }
 
+static int queue_prio(const struct intel_engine_execlists *execlists)
+{
+	struct i915_priolist *p;
+	struct rb_node *rb;
+
+	rb = rb_first_cached(&execlists->queue);
+	if (!rb)
+		return INT_MIN;
+
+	/*
+	 * As the priolist[] are inverted, with the highest priority in [0],
+	 * we have to flip the index value to become priority.
+	 */
+	p = to_priolist(rb);
+	return ((p->priority + 1) << I915_USER_PRIORITY_SHIFT) - ffs(p->used);
+}
+
 static inline bool need_preempt(const struct intel_engine_cs *engine,
-				const struct i915_request *last,
-				int prio)
+				const struct i915_request *rq)
+{
+	const int last_prio = rq_prio(rq);
+
+	if (!intel_engine_has_preemption(engine))
+		return false;
+
+	if (i915_request_completed(rq))
+		return false;
+
+	/*
+	 * Check if the current priority hint merits a preemption attempt.
+	 *
+	 * We record the highest priority value we saw during rescheduling
+	 * prior to this dequeue, therefore we know that if it is strictly
+	 * less than the current tail of ELSP[0], we do not need to force
+	 * a preempt-to-idle cycle.
+	 *
+	 * However, the priority hint is a mere hint that we may need to
+	 * preempt. If that hint is stale or we may be trying to preempt
+	 * ourselves, ignore the request.
+	 */
+	if (!__execlists_need_preempt(engine->execlists.preempt_priority_hint,
+				      last_prio))
+		return false;
+
+	/*
+	 * Check against the first request in ELSP[1], it will, thanks to the
+	 * power of PI, be the highest priority of that context.
+	 */
+	if (!list_is_last(&rq->link, &engine->timeline.requests) &&
+	    rq_prio(list_next_entry(rq, link)) > last_prio)
+		return true;
+
+	/*
+	 * If the inflight context did not trigger the preemption, then maybe
+	 * it was the set of queued requests? Pick the highest priority in
+	 * the queue (the first active priolist) and see if it deserves to be
+	 * running instead of ELSP[0].
+	 *
+	 * The highest priority request in the queue cannot be either
+	 * ELSP[0] or ELSP[1] as, thanks again to PI, if it were the same
+	 * context, its priority would not exceed ELSP[0] aka last_prio.
+	 */
+	return queue_prio(&engine->execlists) > last_prio;
+}
+
+__maybe_unused static inline bool
+assert_priority_queue(const struct intel_engine_execlists *execlists,
+		      const struct i915_request *prev,
+		      const struct i915_request *next)
 {
-	return (intel_engine_has_preemption(engine) &&
-		__execlists_need_preempt(prio, rq_prio(last)) &&
-		!i915_request_completed(last));
+	if (!prev)
+		return true;
+
+	/*
+	 * Without preemption, the prev may refer to the still active element
+	 * which we refuse to let go.
+	 *
+	 * Even with preemption, there are times when we think it is better not
+	 * to preempt and leave an ostensibly lower priority request in flight.
+	 */
+	if (port_request(execlists->port) == prev)
+		return true;
+
+	return rq_prio(prev) >= rq_prio(next);
 }
 
 /*
@@ -516,6 +593,8 @@ static void inject_preempt_context(struct intel_engine_cs *engine)
 
 	execlists_clear_active(execlists, EXECLISTS_ACTIVE_HWACK);
 	execlists_set_active(execlists, EXECLISTS_ACTIVE_PREEMPT);
+
+	(void)I915_SELFTEST_ONLY(execlists->preempt_hang.count++);
 }
 
 static void complete_preempt_context(struct intel_engine_execlists *execlists)
@@ -584,7 +663,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 		if (!execlists_is_active(execlists, EXECLISTS_ACTIVE_HWACK))
 			return;
 
-		if (need_preempt(engine, last, execlists->preempt_priority_hint)) {
+		if (need_preempt(engine, last)) {
 			inject_preempt_context(engine);
 			return;
 		}
@@ -630,8 +709,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 		int i;
 
 		priolist_for_each_request_consume(rq, rn, p, i) {
-			GEM_BUG_ON(last &&
-				   need_preempt(engine, last, rq_prio(rq)));
+			GEM_BUG_ON(!assert_priority_queue(execlists, last, rq));
 
 			/*
 			 * Can we combine this request with the current port?
@@ -876,6 +954,8 @@ static void process_csb(struct intel_engine_cs *engine)
 	const u32 * const buf = execlists->csb_status;
 	u8 head, tail;
 
+	lockdep_assert_held(&engine->timeline.lock);
+
 	/*
 	 * Note that csb_write, csb_status may be either in HWSP or mmio.
 	 * When reading from the csb_write mmio register, we have to be
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 71a41fec738f..246311fcdde3 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -204,6 +204,7 @@ struct i915_priolist {
 
 struct st_preempt_hang {
 	struct completion completion;
+	unsigned int count;
 	bool inject_hang;
 };
 
diff --git a/drivers/gpu/drm/i915/selftests/intel_lrc.c b/drivers/gpu/drm/i915/selftests/intel_lrc.c
index 2b2ecd76c2ac..78a2f958f164 100644
--- a/drivers/gpu/drm/i915/selftests/intel_lrc.c
+++ b/drivers/gpu/drm/i915/selftests/intel_lrc.c
@@ -268,6 +268,136 @@ static int live_late_preempt(void *arg)
 	goto err_ctx_lo;
 }
 
+struct preempt_client {
+	struct igt_spinner spin;
+	struct i915_gem_context *ctx;
+};
+
+static int preempt_client_init(struct drm_i915_private *i915,
+			       struct preempt_client *c)
+{
+	c->ctx = kernel_context(i915);
+	if (!c->ctx)
+		return -ENOMEM;
+
+	if (igt_spinner_init(&c->spin, i915))
+		goto err_ctx;
+
+	return 0;
+
+err_ctx:
+	kernel_context_close(c->ctx);
+	return -ENOMEM;
+}
+
+static void preempt_client_fini(struct preempt_client *c)
+{
+	igt_spinner_fini(&c->spin);
+	kernel_context_close(c->ctx);
+}
+
+static int live_suppress_preempt(void *arg)
+{
+	struct drm_i915_private *i915 = arg;
+	struct intel_engine_cs *engine;
+	struct i915_sched_attr attr = {
+		.priority = I915_USER_PRIORITY(I915_PRIORITY_MAX)
+	};
+	struct preempt_client a, b;
+	enum intel_engine_id id;
+	intel_wakeref_t wakeref;
+	int err = -ENOMEM;
+
+	/*
+	 * Verify that if a preemption request does not cause a change in
+	 * the current execution order, the preempt-to-idle injection is
+	 * skipped and that we do not accidentally apply it after the CS
+	 * completion event.
+	 */
+
+	if (!HAS_LOGICAL_RING_PREEMPTION(i915))
+		return 0;
+
+	if (USES_GUC_SUBMISSION(i915))
+	       return 0; /* presume black box */
+
+	mutex_lock(&i915->drm.struct_mutex);
+	wakeref = intel_runtime_pm_get(i915);
+
+	if (preempt_client_init(i915, &a))
+		goto err_unlock;
+	if (preempt_client_init(i915, &b))
+		goto err_client_a;
+
+	for_each_engine(engine, i915, id) {
+		struct i915_request *rq_a, *rq_b;
+
+		engine->execlists.preempt_hang.count = 0;
+
+		rq_a = igt_spinner_create_request(&a.spin, a.ctx, engine,
+						  MI_NOOP);
+		if (IS_ERR(rq_a)) {
+			err = PTR_ERR(rq_a);
+			goto err_client_b;
+		}
+
+		i915_request_add(rq_a);
+		if (!igt_wait_for_spinner(&a.spin, rq_a)) {
+			pr_err("First client failed to start\n");
+			goto err_wedged;
+		}
+
+		rq_b = igt_spinner_create_request(&b.spin, b.ctx, engine,
+						  MI_NOOP);
+		if (IS_ERR(rq_b)) {
+			igt_spinner_end(&a.spin);
+			err = PTR_ERR(rq_b);
+			goto err_client_b;
+		}
+		i915_request_add(rq_b);
+
+		GEM_BUG_ON(i915_request_completed(rq_a));
+		engine->schedule(rq_a, &attr);
+		igt_spinner_end(&a.spin);
+
+		if (!igt_wait_for_spinner(&b.spin, rq_b)) {
+			pr_err("Second client failed to start\n");
+			goto err_wedged;
+		}
+		igt_spinner_end(&b.spin);
+
+		if (igt_flush_test(i915, I915_WAIT_LOCKED))
+			goto err_wedged;
+		GEM_BUG_ON(!i915_request_completed(rq_b));
+
+		if (engine->execlists.preempt_hang.count) {
+			pr_err("Preemption recorded x%d; should have been suppressed!\n",
+			       engine->execlists.preempt_hang.count);
+			err = -EINVAL;
+			goto err_client_b;
+		}
+	}
+
+	err = 0;
+err_client_b:
+	preempt_client_fini(&b);
+err_client_a:
+	preempt_client_fini(&a);
+err_unlock:
+	if (igt_flush_test(i915, I915_WAIT_LOCKED))
+		err = -EIO;
+	intel_runtime_pm_put(i915, wakeref);
+	mutex_unlock(&i915->drm.struct_mutex);
+	return err;
+
+err_wedged:
+	igt_spinner_end(&b.spin);
+	igt_spinner_end(&a.spin);
+	i915_gem_set_wedged(i915);
+	err = -EIO;
+	goto err_client_b;
+}
+
 static int live_preempt_hang(void *arg)
 {
 	struct drm_i915_private *i915 = arg;
@@ -647,6 +777,7 @@ int intel_execlists_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(live_sanitycheck),
 		SUBTEST(live_preempt),
 		SUBTEST(live_late_preempt),
+		SUBTEST(live_suppress_preempt),
 		SUBTEST(live_preempt_hang),
 		SUBTEST(live_preempt_smoke),
 	};
-- 
2.20.1


* [PATCH 04/28] drm/i915/execlists: Suppress redundant preemption
  2019-01-28  1:02 [PATCH 01/28] drm/i915: Wait for a moment before forcibly resetting the device Chris Wilson
  2019-01-28  1:02 ` [PATCH 02/28] drm/i915: Rename execlists->queue_priority to preempt_priority_hint Chris Wilson
  2019-01-28  1:02 ` [PATCH 03/28] drm/i915/execlists: Suppress preempting self Chris Wilson
@ 2019-01-28  1:02 ` Chris Wilson
  2019-01-28  1:02 ` [PATCH 05/28] drm/i915/selftests: Exercise some AB...BA preemption chains Chris Wilson
                   ` (28 subsequent siblings)
  31 siblings, 0 replies; 39+ messages in thread
From: Chris Wilson @ 2019-01-28  1:02 UTC (permalink / raw)
  To: intel-gfx

On unwinding the active request we give it a small boost (limited to the
internal priority levels) to prevent it from being gazumped a second
time. However, this means that it can be promoted above the request that
triggered the preemption, causing a preempt-to-idle cycle for no change.
We can avoid this if we take the boost into account when checking
whether the preemption request is valid.

v2: After preemption the active request will be after the preemptee if
they end up with equal priority.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
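As a worked illustration of the comparison fix (standalone C;
ACTIVE_PRIORITY_BIT is a stand-in for I915_PRIORITY_NEWCLIENT, not the
kernel's priority encoding): when judging a preemption request we credit
the running request with the boost it would receive on unwind, minus one
because after preemption it re-queues behind requests of equal priority.

#include <stdbool.h>

#define ACTIVE_PRIORITY_BIT 0x2	/* illustrative stand-in */

static int active_prio_model(int prio, bool started)
{
	if ((prio & ACTIVE_PRIORITY_BIT) != ACTIVE_PRIORITY_BIT && started) {
		prio |= ACTIVE_PRIORITY_BIT;	/* boost granted on unwind */
		prio--;		/* ...but it re-queues at the tail of that level */
	}

	return prio;
}

For example, a started request at priority 8 compares as 9 (8 | 0x2 = 10,
minus one for the tail re-queue), so only a hint of 10 or more still
justifies a preempt-to-idle; a hint of 9 would lose to the unwound
request's boosted priority of 10 and is therefore suppressed.
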
 drivers/gpu/drm/i915/intel_lrc.c | 39 ++++++++++++++++++++++++++++----
 1 file changed, 35 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 79adf9827f02..12db24b30e78 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -164,6 +164,8 @@
 #define WA_TAIL_DWORDS 2
 #define WA_TAIL_BYTES (sizeof(u32) * WA_TAIL_DWORDS)
 
+#define ACTIVE_PRIORITY (I915_PRIORITY_NEWCLIENT)
+
 static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
 					    struct intel_engine_cs *engine,
 					    struct intel_context *ce);
@@ -182,6 +184,34 @@ static inline int rq_prio(const struct i915_request *rq)
 	return rq->sched.attr.priority;
 }
 
+static inline int active_prio(const struct i915_request *rq)
+{
+	int prio = rq_prio(rq);
+
+	/*
+	 * On unwinding the active request, we give it a priority bump
+	 * equivalent to a freshly submitted request. This protects it from
+	 * being gazumped again, but it would be preferable if we didn't
+	 * let it be gazumped in the first place!
+	 *
+	 * See __unwind_incomplete_requests()
+	 */
+	if ((prio & ACTIVE_PRIORITY) != ACTIVE_PRIORITY &&
+	    i915_request_started(rq)) {
+		/*
+		 * After preemption, we insert the active request at the
+		 * end of the new priority level. This means that we will be
+		 * _lower_ priority than the preemptee all things equal (and
+		 * so the preemption is valid), so adjust our comparison
+		 * accordingly.
+		 */
+		prio |= ACTIVE_PRIORITY;
+		prio--;
+	}
+
+	return prio;
+}
+
 static int queue_prio(const struct intel_engine_execlists *execlists)
 {
 	struct i915_priolist *p;
@@ -202,7 +232,7 @@ static int queue_prio(const struct intel_engine_execlists *execlists)
 static inline bool need_preempt(const struct intel_engine_cs *engine,
 				const struct i915_request *rq)
 {
-	const int last_prio = rq_prio(rq);
+	int last_prio;
 
 	if (!intel_engine_has_preemption(engine))
 		return false;
@@ -222,6 +252,7 @@ static inline bool need_preempt(const struct intel_engine_cs *engine,
 	 * preempt. If that hint is stale or we may be trying to preempt
 	 * ourselves, ignore the request.
 	 */
+	last_prio = active_prio(rq);
 	if (!__execlists_need_preempt(engine->execlists.preempt_priority_hint,
 				      last_prio))
 		return false;
@@ -347,7 +378,7 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine)
 {
 	struct i915_request *rq, *rn, *active = NULL;
 	struct list_head *uninitialized_var(pl);
-	int prio = I915_PRIORITY_INVALID | I915_PRIORITY_NEWCLIENT;
+	int prio = I915_PRIORITY_INVALID | ACTIVE_PRIORITY;
 
 	lockdep_assert_held(&engine->timeline.lock);
 
@@ -379,8 +410,8 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine)
 	 * stream, so give it the equivalent small priority bump to prevent
 	 * it being gazumped a second time by another peer.
 	 */
-	if (!(prio & I915_PRIORITY_NEWCLIENT)) {
-		prio |= I915_PRIORITY_NEWCLIENT;
+	if ((prio & ACTIVE_PRIORITY) != ACTIVE_PRIORITY) {
+		prio |= ACTIVE_PRIORITY;
 		active->sched.attr.priority = prio;
 		list_move_tail(&active->sched.link,
 			       i915_sched_lookup_priolist(engine, prio));
-- 
2.20.1


* [PATCH 05/28] drm/i915/selftests: Exercise some AB...BA preemption chains
  2019-01-28  1:02 [PATCH 01/28] drm/i915: Wait for a moment before forcibly resetting the device Chris Wilson
                   ` (2 preceding siblings ...)
  2019-01-28  1:02 ` [PATCH 04/28] drm/i915/execlists: Suppress redundant preemption Chris Wilson
@ 2019-01-28  1:02 ` Chris Wilson
  2019-01-28  1:02 ` [PATCH 06/28] drm/i915: Stop tracking MRU activity on VMA Chris Wilson
                   ` (27 subsequent siblings)
  31 siblings, 0 replies; 39+ messages in thread
From: Chris Wilson @ 2019-01-28  1:02 UTC (permalink / raw)
  To: intel-gfx

Build a chain using two contexts (A, B), then request a preemption such
that a later A request runs before the spinner in B.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/selftests/intel_lrc.c | 103 +++++++++++++++++++++
 1 file changed, 103 insertions(+)

diff --git a/drivers/gpu/drm/i915/selftests/intel_lrc.c b/drivers/gpu/drm/i915/selftests/intel_lrc.c
index 78a2f958f164..786747cec5b2 100644
--- a/drivers/gpu/drm/i915/selftests/intel_lrc.c
+++ b/drivers/gpu/drm/i915/selftests/intel_lrc.c
@@ -4,6 +4,8 @@
  * Copyright © 2018 Intel Corporation
  */
 
+#include <linux/prime_numbers.h>
+
 #include "../i915_reset.h"
 
 #include "../i915_selftest.h"
@@ -398,6 +400,106 @@ static int live_suppress_preempt(void *arg)
 	goto err_client_b;
 }
 
+static int live_chain_preempt(void *arg)
+{
+	struct drm_i915_private *i915 = arg;
+	struct intel_engine_cs *engine;
+	struct preempt_client hi, lo;
+	enum intel_engine_id id;
+	intel_wakeref_t wakeref;
+	int err = -ENOMEM;
+
+	/*
+	 * Build a chain AB...BA between two contexts (A, B) and request
+	 * preemption of the last request. It should then complete before
+	 * the previously submitted spinner in B.
+	 */
+
+	if (!HAS_LOGICAL_RING_PREEMPTION(i915))
+		return 0;
+
+	mutex_lock(&i915->drm.struct_mutex);
+	wakeref = intel_runtime_pm_get(i915);
+
+	if (preempt_client_init(i915, &hi))
+		goto err_unlock;
+
+	if (preempt_client_init(i915, &lo))
+		goto err_client_hi;
+
+	for_each_engine(engine, i915, id) {
+		struct i915_sched_attr attr = {
+			.priority = I915_USER_PRIORITY(I915_PRIORITY_MAX),
+		};
+		int count, i;
+
+		for_each_prime_number_from(count, 1, 32) { /* must fit ring! */
+			struct i915_request *rq;
+
+			rq = igt_spinner_create_request(&hi.spin,
+							hi.ctx, engine,
+							MI_ARB_CHECK);
+			if (IS_ERR(rq))
+				goto err_wedged;
+			i915_request_add(rq);
+			if (!igt_wait_for_spinner(&hi.spin, rq))
+				goto err_wedged;
+
+			rq = igt_spinner_create_request(&lo.spin,
+							lo.ctx, engine,
+							MI_ARB_CHECK);
+			if (IS_ERR(rq))
+				goto err_wedged;
+			i915_request_add(rq);
+
+			for (i = 0; i < count; i++) {
+				rq = i915_request_alloc(engine, lo.ctx);
+				if (IS_ERR(rq))
+					goto err_wedged;
+				i915_request_add(rq);
+			}
+
+			rq = i915_request_alloc(engine, hi.ctx);
+			if (IS_ERR(rq))
+				goto err_wedged;
+			i915_request_add(rq);
+			engine->schedule(rq, &attr);
+
+			igt_spinner_end(&hi.spin);
+			if (i915_request_wait(rq, I915_WAIT_LOCKED, HZ / 5) < 0) {
+				struct drm_printer p =
+					drm_info_printer(i915->drm.dev);
+
+				pr_err("Failed to preempt over chain of %d\n",
+				       count);
+				intel_engine_dump(engine, &p,
+						  "%s\n", engine->name);
+				goto err_wedged;
+			}
+			igt_spinner_end(&lo.spin);
+		}
+	}
+
+	err = 0;
+err_client_lo:
+	preempt_client_fini(&lo);
+err_client_hi:
+	preempt_client_fini(&hi);
+err_unlock:
+	if (igt_flush_test(i915, I915_WAIT_LOCKED))
+		err = -EIO;
+	intel_runtime_pm_put(i915, wakeref);
+	mutex_unlock(&i915->drm.struct_mutex);
+	return err;
+
+err_wedged:
+	igt_spinner_end(&hi.spin);
+	igt_spinner_end(&lo.spin);
+	i915_gem_set_wedged(i915);
+	err = -EIO;
+	goto err_client_lo;
+}
+
 static int live_preempt_hang(void *arg)
 {
 	struct drm_i915_private *i915 = arg;
@@ -778,6 +880,7 @@ int intel_execlists_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(live_preempt),
 		SUBTEST(live_late_preempt),
 		SUBTEST(live_suppress_preempt),
+		SUBTEST(live_chain_preempt),
 		SUBTEST(live_preempt_hang),
 		SUBTEST(live_preempt_smoke),
 	};
-- 
2.20.1


* [PATCH 06/28] drm/i915: Stop tracking MRU activity on VMA
  2019-01-28  1:02 [PATCH 01/28] drm/i915: Wait for a moment before forcibly resetting the device Chris Wilson
                   ` (3 preceding siblings ...)
  2019-01-28  1:02 ` [PATCH 05/28] drm/i915/selftests: Exercise some AB...BA preemption chains Chris Wilson
@ 2019-01-28  1:02 ` Chris Wilson
  2019-01-28 10:09   ` Tvrtko Ursulin
  2019-01-28  1:02 ` [PATCH 07/28] drm/i915: Pull VM lists under the VM mutex Chris Wilson
                   ` (26 subsequent siblings)
  31 siblings, 1 reply; 39+ messages in thread
From: Chris Wilson @ 2019-01-28  1:02 UTC (permalink / raw)
  To: intel-gfx

Our goal is to remove struct_mutex and replace it with fine-grained
locking. One of the thorny issues is our eviction logic for reclaiming
space for an execbuffer (or GTT mmapping, among a few other examples).
While eviction itself is easy to move under a per-VM mutex, performing
the activity tracking is less agreeable. One solution is to forgo MRU
tracking altogether and perform a simple, coarse evaluation of
active/inactive during eviction, with a loose temporal ordering of last
insertion/evaluation. That keeps all the locking constrained to when we
are manipulating the VM itself, neatly avoiding the tricky handling of
possible recursive locking during execbuf and elsewhere.

Note that discarding the MRU (currently implemented as a pair of lists,
to avoid scanning the active list for a NONBLOCKING search) is unlikely
to impact our efficiency in reclaiming VM space (where we think an LRU
model is best), as our current strategy is to use random idle replacement
first before doing a search; and over time the use of softpinned 48b
per-ppGTT is growing (thereby eliminating any need to perform eviction
searches, in theory at least), with the remaining users found on much
older devices (gen2-gen6).

v2: Changelog and commentary rewritten to elaborate on the duality of a
single list being both an inactive and active list.
v3: Consolidate bool parameters into a single set of flags; don't
comment on the duality of a single variable being a multiplicity of
bits.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
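Sketch of the scan idiom the new eviction loop uses (standalone C with a
simplified singly-linked list, not the kernel's list helpers): inactive
entries are reaped immediately, active entries are rotated to the tail
to keep the rough LRU order, and remembering the first active entry we
saw tells us when we have completed one full lap and should give up.

#include <stdbool.h>
#include <stddef.h>

struct vma_model {
	struct vma_model *next;
	bool active;
};

struct vm_model {
	struct vma_model *head, *tail;	/* bound_list, oldest-scanned first */
};

/* Only ever called on the current head. */
static void move_to_tail(struct vm_model *vm, struct vma_model *vma)
{
	if (vm->tail == vma)
		return;

	vm->head = vma->next;
	vma->next = NULL;
	vm->tail->next = vma;
	vm->tail = vma;
}

static struct vma_model *scan_for_eviction(struct vm_model *vm)
{
	struct vma_model *first_active = NULL;

	while (vm->head) {
		struct vma_model *vma = vm->head;

		if (vma->active) {
			if (vma == first_active)
				return NULL;	/* full lap: only active left */
			if (!first_active)
				first_active = vma;

			move_to_tail(vm, vma);	/* keep scanning */
			continue;
		}

		return vma;	/* inactive entries are cheap to reap */
	}

	return NULL;
}
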
 drivers/gpu/drm/i915/i915_gem.c               | 10 +--
 drivers/gpu/drm/i915/i915_gem_evict.c         | 87 +++++++++++--------
 drivers/gpu/drm/i915/i915_gem_gtt.c           | 15 ++--
 drivers/gpu/drm/i915/i915_gem_gtt.h           | 26 +-----
 drivers/gpu/drm/i915/i915_gem_shrinker.c      |  8 +-
 drivers/gpu/drm/i915/i915_gem_stolen.c        |  3 +-
 drivers/gpu/drm/i915/i915_gpu_error.c         | 42 ++++-----
 drivers/gpu/drm/i915/i915_vma.c               |  9 +-
 .../gpu/drm/i915/selftests/i915_gem_evict.c   |  4 +-
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c |  2 +-
 10 files changed, 95 insertions(+), 111 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index dcbe644869b3..12a0a80bc989 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -255,10 +255,7 @@ i915_gem_get_aperture_ioctl(struct drm_device *dev, void *data,
 
 	pinned = ggtt->vm.reserved;
 	mutex_lock(&dev->struct_mutex);
-	list_for_each_entry(vma, &ggtt->vm.active_list, vm_link)
-		if (i915_vma_is_pinned(vma))
-			pinned += vma->node.size;
-	list_for_each_entry(vma, &ggtt->vm.inactive_list, vm_link)
+	list_for_each_entry(vma, &ggtt->vm.bound_list, vm_link)
 		if (i915_vma_is_pinned(vma))
 			pinned += vma->node.size;
 	mutex_unlock(&dev->struct_mutex);
@@ -1541,13 +1538,10 @@ static void i915_gem_object_bump_inactive_ggtt(struct drm_i915_gem_object *obj)
 	GEM_BUG_ON(!i915_gem_object_has_pinned_pages(obj));
 
 	for_each_ggtt_vma(vma, obj) {
-		if (i915_vma_is_active(vma))
-			continue;
-
 		if (!drm_mm_node_allocated(&vma->node))
 			continue;
 
-		list_move_tail(&vma->vm_link, &vma->vm->inactive_list);
+		list_move_tail(&vma->vm_link, &vma->vm->bound_list);
 	}
 
 	i915 = to_i915(obj->base.dev);
diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
index f6855401f247..d76839670632 100644
--- a/drivers/gpu/drm/i915/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/i915_gem_evict.c
@@ -126,31 +126,25 @@ i915_gem_evict_something(struct i915_address_space *vm,
 	struct drm_i915_private *dev_priv = vm->i915;
 	struct drm_mm_scan scan;
 	struct list_head eviction_list;
-	struct list_head *phases[] = {
-		&vm->inactive_list,
-		&vm->active_list,
-		NULL,
-	}, **phase;
 	struct i915_vma *vma, *next;
 	struct drm_mm_node *node;
 	enum drm_mm_insert_mode mode;
+	struct i915_vma *active;
 	int ret;
 
 	lockdep_assert_held(&vm->i915->drm.struct_mutex);
 	trace_i915_gem_evict(vm, min_size, alignment, flags);
 
 	/*
-	 * The goal is to evict objects and amalgamate space in LRU order.
-	 * The oldest idle objects reside on the inactive list, which is in
-	 * retirement order. The next objects to retire are those in flight,
-	 * on the active list, again in retirement order.
+	 * The goal is to evict objects and amalgamate space in rough LRU order.
+	 * Since both active and inactive objects reside on the same list,
+	 * in a mix of creation and last scanned order, as we process the list
+	 * we sort it into inactive/active, which keeps the active portion
+	 * in a rough MRU order.
 	 *
 	 * The retirement sequence is thus:
-	 *   1. Inactive objects (already retired)
-	 *   2. Active objects (will stall on unbinding)
-	 *
-	 * On each list, the oldest objects lie at the HEAD with the freshest
-	 * object on the TAIL.
+	 *   1. Inactive objects (already retired, random order)
+	 *   2. Active objects (will stall on unbinding, oldest scanned first)
 	 */
 	mode = DRM_MM_INSERT_BEST;
 	if (flags & PIN_HIGH)
@@ -169,17 +163,46 @@ i915_gem_evict_something(struct i915_address_space *vm,
 	 */
 	if (!(flags & PIN_NONBLOCK))
 		i915_retire_requests(dev_priv);
-	else
-		phases[1] = NULL;
 
 search_again:
+	active = NULL;
 	INIT_LIST_HEAD(&eviction_list);
-	phase = phases;
-	do {
-		list_for_each_entry(vma, *phase, vm_link)
-			if (mark_free(&scan, vma, flags, &eviction_list))
-				goto found;
-	} while (*++phase);
+	list_for_each_entry_safe(vma, next, &vm->bound_list, vm_link) {
+		/*
+		 * We keep this list in a rough least-recently scanned order
+		 * of active elements (inactive elements are cheap to reap).
+		 * New entries are added to the end, and we move anything we
+		 * scan to the end. The assumption is that the working set
+		 * of applications is either steady state (and thanks to the
+		 * userspace bo cache it almost always is) or volatile and
+		 * frequently replaced after a frame, which are self-evicting!
+		 * Given that assumption, the MRU order of the scan list is
+		 * fairly static, and keeping it in least-recently scan order
+		 * is suitable.
+		 *
+		 * To notice when we complete one full cycle, we record the
+		 * first active element seen, before moving it to the tail.
+		 */
+		if (i915_vma_is_active(vma)) {
+			if (vma == active) {
+				if (flags & PIN_NONBLOCK)
+					break;
+
+				active = ERR_PTR(-EAGAIN);
+			}
+
+			if (active != ERR_PTR(-EAGAIN)) {
+				if (!active)
+					active = vma;
+
+				list_move_tail(&vma->vm_link, &vm->bound_list);
+				continue;
+			}
+		}
+
+		if (mark_free(&scan, vma, flags, &eviction_list))
+			goto found;
+	}
 
 	/* Nothing found, clean up and bail out! */
 	list_for_each_entry_safe(vma, next, &eviction_list, evict_link) {
@@ -388,11 +411,6 @@ int i915_gem_evict_for_node(struct i915_address_space *vm,
  */
 int i915_gem_evict_vm(struct i915_address_space *vm)
 {
-	struct list_head *phases[] = {
-		&vm->inactive_list,
-		&vm->active_list,
-		NULL
-	}, **phase;
 	struct list_head eviction_list;
 	struct i915_vma *vma, *next;
 	int ret;
@@ -412,16 +430,13 @@ int i915_gem_evict_vm(struct i915_address_space *vm)
 	}
 
 	INIT_LIST_HEAD(&eviction_list);
-	phase = phases;
-	do {
-		list_for_each_entry(vma, *phase, vm_link) {
-			if (i915_vma_is_pinned(vma))
-				continue;
+	list_for_each_entry(vma, &vm->bound_list, vm_link) {
+		if (i915_vma_is_pinned(vma))
+			continue;
 
-			__i915_vma_pin(vma);
-			list_add(&vma->evict_link, &eviction_list);
-		}
-	} while (*++phase);
+		__i915_vma_pin(vma);
+		list_add(&vma->evict_link, &eviction_list);
+	}
 
 	ret = 0;
 	list_for_each_entry_safe(vma, next, &eviction_list, evict_link) {
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 9081e3bc5a59..2ad9070a54c1 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -491,9 +491,8 @@ static void i915_address_space_init(struct i915_address_space *vm, int subclass)
 
 	stash_init(&vm->free_pages);
 
-	INIT_LIST_HEAD(&vm->active_list);
-	INIT_LIST_HEAD(&vm->inactive_list);
 	INIT_LIST_HEAD(&vm->unbound_list);
+	INIT_LIST_HEAD(&vm->bound_list);
 }
 
 static void i915_address_space_fini(struct i915_address_space *vm)
@@ -2111,8 +2110,7 @@ void i915_ppgtt_close(struct i915_address_space *vm)
 static void ppgtt_destroy_vma(struct i915_address_space *vm)
 {
 	struct list_head *phases[] = {
-		&vm->active_list,
-		&vm->inactive_list,
+		&vm->bound_list,
 		&vm->unbound_list,
 		NULL,
 	}, **phase;
@@ -2135,8 +2133,7 @@ void i915_ppgtt_release(struct kref *kref)
 
 	ppgtt_destroy_vma(&ppgtt->vm);
 
-	GEM_BUG_ON(!list_empty(&ppgtt->vm.active_list));
-	GEM_BUG_ON(!list_empty(&ppgtt->vm.inactive_list));
+	GEM_BUG_ON(!list_empty(&ppgtt->vm.bound_list));
 	GEM_BUG_ON(!list_empty(&ppgtt->vm.unbound_list));
 
 	ppgtt->vm.cleanup(&ppgtt->vm);
@@ -2801,8 +2798,7 @@ void i915_ggtt_cleanup_hw(struct drm_i915_private *dev_priv)
 	mutex_lock(&dev_priv->drm.struct_mutex);
 	i915_gem_fini_aliasing_ppgtt(dev_priv);
 
-	GEM_BUG_ON(!list_empty(&ggtt->vm.active_list));
-	list_for_each_entry_safe(vma, vn, &ggtt->vm.inactive_list, vm_link)
+	list_for_each_entry_safe(vma, vn, &ggtt->vm.bound_list, vm_link)
 		WARN_ON(i915_vma_unbind(vma));
 
 	if (drm_mm_node_allocated(&ggtt->error_capture))
@@ -3514,8 +3510,7 @@ void i915_gem_restore_gtt_mappings(struct drm_i915_private *dev_priv)
 	ggtt->vm.closed = true; /* skip rewriting PTE on VMA unbind */
 
 	/* clflush objects bound into the GGTT and rebind them. */
-	GEM_BUG_ON(!list_empty(&ggtt->vm.active_list));
-	list_for_each_entry_safe(vma, vn, &ggtt->vm.inactive_list, vm_link) {
+	list_for_each_entry_safe(vma, vn, &ggtt->vm.bound_list, vm_link) {
 		struct drm_i915_gem_object *obj = vma->obj;
 
 		if (!(vma->flags & I915_VMA_GLOBAL_BIND))
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index a0039ea97cdc..bd679c8c56dd 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -299,32 +299,12 @@ struct i915_address_space {
 	struct i915_page_directory_pointer *scratch_pdp; /* GEN8+ & 48b PPGTT */
 
 	/**
-	 * List of objects currently involved in rendering.
-	 *
-	 * Includes buffers having the contents of their GPU caches
-	 * flushed, not necessarily primitives. last_read_req
-	 * represents when the rendering involved will be completed.
-	 *
-	 * A reference is held on the buffer while on this list.
+	 * List of vma currently bound.
 	 */
-	struct list_head active_list;
+	struct list_head bound_list;
 
 	/**
-	 * LRU list of objects which are not in the ringbuffer and
-	 * are ready to unbind, but are still in the GTT.
-	 *
-	 * last_read_req is NULL while an object is in this list.
-	 *
-	 * A reference is not held on the buffer while on this list,
-	 * as merely being GTT-bound shouldn't prevent its being
-	 * freed, and we'll pull it off the list in the free path.
-	 */
-	struct list_head inactive_list;
-
-	/**
-	 * List of vma that have been unbound.
-	 *
-	 * A reference is not held on the buffer while on this list.
+	 * List of vma that are unbound.
 	 */
 	struct list_head unbound_list;
 
diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
index 8ceecb026910..a76d6c95c824 100644
--- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
+++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
@@ -462,9 +462,13 @@ i915_gem_shrinker_vmap(struct notifier_block *nb, unsigned long event, void *ptr
 
 	/* We also want to clear any cached iomaps as they wrap vmap */
 	list_for_each_entry_safe(vma, next,
-				 &i915->ggtt.vm.inactive_list, vm_link) {
+				 &i915->ggtt.vm.bound_list, vm_link) {
 		unsigned long count = vma->node.size >> PAGE_SHIFT;
-		if (vma->iomap && i915_vma_unbind(vma) == 0)
+
+		if (!vma->iomap || i915_vma_is_active(vma))
+			continue;
+
+		if (i915_vma_unbind(vma) == 0)
 			freed_pages += count;
 	}
 
diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c b/drivers/gpu/drm/i915/i915_gem_stolen.c
index 9df615eea2d8..a9e365789686 100644
--- a/drivers/gpu/drm/i915/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
@@ -701,7 +701,8 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_i915_private *dev_priv
 	vma->pages = obj->mm.pages;
 	vma->flags |= I915_VMA_GLOBAL_BIND;
 	__i915_vma_set_map_and_fenceable(vma);
-	list_move_tail(&vma->vm_link, &ggtt->vm.inactive_list);
+
+	list_move_tail(&vma->vm_link, &ggtt->vm.bound_list);
 
 	spin_lock(&dev_priv->mm.obj_lock);
 	list_move_tail(&obj->mm.link, &dev_priv->mm.bound_list);
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 4eef0462489c..898e06014295 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1121,7 +1121,9 @@ static void capture_bo(struct drm_i915_error_buffer *err,
 
 static u32 capture_error_bo(struct drm_i915_error_buffer *err,
 			    int count, struct list_head *head,
-			    bool pinned_only)
+			    unsigned int flags)
+#define ACTIVE_ONLY BIT(0)
+#define PINNED_ONLY BIT(1)
 {
 	struct i915_vma *vma;
 	int i = 0;
@@ -1130,7 +1132,10 @@ static u32 capture_error_bo(struct drm_i915_error_buffer *err,
 		if (!vma->obj)
 			continue;
 
-		if (pinned_only && !i915_vma_is_pinned(vma))
+		if (flags & ACTIVE_ONLY && !i915_vma_is_active(vma))
+			continue;
+
+		if (flags & PINNED_ONLY && !i915_vma_is_pinned(vma))
 			continue;
 
 		capture_bo(err++, vma);
@@ -1601,14 +1606,17 @@ static void gem_capture_vm(struct i915_gpu_state *error,
 	int count;
 
 	count = 0;
-	list_for_each_entry(vma, &vm->active_list, vm_link)
-		count++;
+	list_for_each_entry(vma, &vm->bound_list, vm_link)
+		if (i915_vma_is_active(vma))
+			count++;
 
 	active_bo = NULL;
 	if (count)
 		active_bo = kcalloc(count, sizeof(*active_bo), GFP_ATOMIC);
 	if (active_bo)
-		count = capture_error_bo(active_bo, count, &vm->active_list, false);
+		count = capture_error_bo(active_bo,
+					 count, &vm->bound_list,
+					 ACTIVE_ONLY);
 	else
 		count = 0;
 
@@ -1646,28 +1654,20 @@ static void capture_pinned_buffers(struct i915_gpu_state *error)
 	struct i915_address_space *vm = &error->i915->ggtt.vm;
 	struct drm_i915_error_buffer *bo;
 	struct i915_vma *vma;
-	int count_inactive, count_active;
-
-	count_inactive = 0;
-	list_for_each_entry(vma, &vm->inactive_list, vm_link)
-		count_inactive++;
+	int count;
 
-	count_active = 0;
-	list_for_each_entry(vma, &vm->active_list, vm_link)
-		count_active++;
+	count = 0;
+	list_for_each_entry(vma, &vm->bound_list, vm_link)
+		count++;
 
 	bo = NULL;
-	if (count_inactive + count_active)
-		bo = kcalloc(count_inactive + count_active,
-			     sizeof(*bo), GFP_ATOMIC);
+	if (count)
+		bo = kcalloc(count, sizeof(*bo), GFP_ATOMIC);
 	if (!bo)
 		return;
 
-	count_inactive = capture_error_bo(bo, count_inactive,
-					  &vm->active_list, true);
-	count_active = capture_error_bo(bo + count_inactive, count_active,
-					&vm->inactive_list, true);
-	error->pinned_bo_count = count_inactive + count_active;
+	error->pinned_bo_count =
+		capture_error_bo(bo, count, &vm->bound_list, PINNED_ONLY);
 	error->pinned_bo = bo;
 }
 
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 5b4d78cdb4ca..7de28baffb8f 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -79,9 +79,6 @@ __i915_vma_retire(struct i915_vma *vma, struct i915_request *rq)
 	if (--vma->active_count)
 		return;
 
-	GEM_BUG_ON(!drm_mm_node_allocated(&vma->node));
-	list_move_tail(&vma->vm_link, &vma->vm->inactive_list);
-
 	GEM_BUG_ON(!i915_gem_object_is_active(obj));
 	if (--obj->active_count)
 		return;
@@ -659,7 +656,7 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
 	GEM_BUG_ON(!drm_mm_node_allocated(&vma->node));
 	GEM_BUG_ON(!i915_gem_valid_gtt_space(vma, cache_level));
 
-	list_move_tail(&vma->vm_link, &vma->vm->inactive_list);
+	list_move_tail(&vma->vm_link, &vma->vm->bound_list);
 
 	if (vma->obj) {
 		struct drm_i915_gem_object *obj = vma->obj;
@@ -1003,10 +1000,8 @@ int i915_vma_move_to_active(struct i915_vma *vma,
 	 * add the active reference first and queue for it to be dropped
 	 * *last*.
 	 */
-	if (!i915_gem_active_isset(active) && !vma->active_count++) {
-		list_move_tail(&vma->vm_link, &vma->vm->active_list);
+	if (!i915_gem_active_isset(active) && !vma->active_count++)
 		obj->active_count++;
-	}
 	i915_gem_active_set(active, rq);
 	GEM_BUG_ON(!i915_vma_is_active(vma));
 	GEM_BUG_ON(!obj->active_count);
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
index d0553bc69705..af9b85cb8639 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
@@ -84,7 +84,7 @@ static int populate_ggtt(struct drm_i915_private *i915,
 		return -EINVAL;
 	}
 
-	if (list_empty(&i915->ggtt.vm.inactive_list)) {
+	if (list_empty(&i915->ggtt.vm.bound_list)) {
 		pr_err("No objects on the GGTT inactive list!\n");
 		return -EINVAL;
 	}
@@ -96,7 +96,7 @@ static void unpin_ggtt(struct drm_i915_private *i915)
 {
 	struct i915_vma *vma;
 
-	list_for_each_entry(vma, &i915->ggtt.vm.inactive_list, vm_link)
+	list_for_each_entry(vma, &i915->ggtt.vm.bound_list, vm_link)
 		if (vma->obj->mm.quirked)
 			i915_vma_unpin(vma);
 }
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
index 06bde4a273cb..8feb4af308ff 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
@@ -1237,7 +1237,7 @@ static void track_vma_bind(struct i915_vma *vma)
 	__i915_gem_object_pin_pages(obj);
 
 	vma->pages = obj->mm.pages;
-	list_move_tail(&vma->vm_link, &vma->vm->inactive_list);
+	list_move_tail(&vma->vm_link, &vma->vm->bound_list);
 }
 
 static int exercise_mock(struct drm_i915_private *i915,
-- 
2.20.1


* [PATCH 07/28] drm/i915: Pull VM lists under the VM mutex.
  2019-01-28  1:02 [PATCH 01/28] drm/i915: Wait for a moment before forcibly resetting the device Chris Wilson
                   ` (4 preceding siblings ...)
  2019-01-28  1:02 ` [PATCH 06/28] drm/i915: Stop tracking MRU activity on VMA Chris Wilson
@ 2019-01-28  1:02 ` Chris Wilson
  2019-01-28  1:02 ` [PATCH 08/28] drm/i915: Move vma lookup to its own lock Chris Wilson
                   ` (25 subsequent siblings)
  31 siblings, 0 replies; 39+ messages in thread
From: Chris Wilson @ 2019-01-28  1:02 UTC (permalink / raw)
  To: intel-gfx

A starting point to counter the pervasive struct_mutex. For the goal of
avoiding global locks (or at least not blocking under them!) during user
request submission, a simple but important step is being able to manage
each client's GTT separately. For that, we want to stop using the
struct_mutex as the guard for all things GTT/VM and switch instead to a
specific mutex inside i915_address_space.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
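One pattern in the diff below is worth calling out with a sketch
(simplified pthreads stand-in, not the kernel code): in the GTT-restore
path the unbind/rebind step may sleep or itself need the VM mutex, so
the list lock is dropped around it and re-acquired afterwards. A real
implementation must also cope with the list mutating while unlocked,
which is why the patch jumps through the unlock/lock labels.

#include <pthread.h>
#include <stdbool.h>

struct vma_model {
	bool global_bind;
};

static pthread_mutex_t vm_mutex = PTHREAD_MUTEX_INITIALIZER;

static void rebind_all(struct vma_model *vmas, int count)
{
	int i;

	pthread_mutex_lock(&vm_mutex);
	for (i = 0; i < count; i++) {
		if (!vmas[i].global_bind)
			continue;

		pthread_mutex_unlock(&vm_mutex);
		/* unbind + rebind here: may sleep, may retake vm_mutex */
		pthread_mutex_lock(&vm_mutex);
	}
	pthread_mutex_unlock(&vm_mutex);
}
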
 drivers/gpu/drm/i915/i915_gem.c                 | 14 ++++++++------
 drivers/gpu/drm/i915/i915_gem_evict.c           |  2 ++
 drivers/gpu/drm/i915/i915_gem_gtt.c             | 15 +++++++++++++--
 drivers/gpu/drm/i915/i915_gem_shrinker.c        |  4 ++++
 drivers/gpu/drm/i915/i915_gem_stolen.c          |  2 ++
 drivers/gpu/drm/i915/i915_vma.c                 | 11 +++++++++++
 drivers/gpu/drm/i915/selftests/i915_gem_evict.c |  3 +++
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c   |  3 +++
 8 files changed, 46 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 12a0a80bc989..111a047a45b7 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -247,18 +247,19 @@ int
 i915_gem_get_aperture_ioctl(struct drm_device *dev, void *data,
 			    struct drm_file *file)
 {
-	struct drm_i915_private *dev_priv = to_i915(dev);
-	struct i915_ggtt *ggtt = &dev_priv->ggtt;
+	struct i915_ggtt *ggtt = &to_i915(dev)->ggtt;
 	struct drm_i915_gem_get_aperture *args = data;
 	struct i915_vma *vma;
 	u64 pinned;
 
+	mutex_lock(&ggtt->vm.mutex);
+
 	pinned = ggtt->vm.reserved;
-	mutex_lock(&dev->struct_mutex);
 	list_for_each_entry(vma, &ggtt->vm.bound_list, vm_link)
 		if (i915_vma_is_pinned(vma))
 			pinned += vma->node.size;
-	mutex_unlock(&dev->struct_mutex);
+
+	mutex_unlock(&ggtt->vm.mutex);
 
 	args->aper_size = ggtt->vm.total;
 	args->aper_available_size = args->aper_size - pinned;
@@ -1531,20 +1532,21 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
 
 static void i915_gem_object_bump_inactive_ggtt(struct drm_i915_gem_object *obj)
 {
-	struct drm_i915_private *i915;
+	struct drm_i915_private *i915 = to_i915(obj->base.dev);
 	struct list_head *list;
 	struct i915_vma *vma;
 
 	GEM_BUG_ON(!i915_gem_object_has_pinned_pages(obj));
 
+	mutex_lock(&i915->ggtt.vm.mutex);
 	for_each_ggtt_vma(vma, obj) {
 		if (!drm_mm_node_allocated(&vma->node))
 			continue;
 
 		list_move_tail(&vma->vm_link, &vma->vm->bound_list);
 	}
+	mutex_unlock(&i915->ggtt.vm.mutex);
 
-	i915 = to_i915(obj->base.dev);
 	spin_lock(&i915->mm.obj_lock);
 	list = obj->bind_count ? &i915->mm.bound_list : &i915->mm.unbound_list;
 	list_move_tail(&obj->mm.link, list);
diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
index d76839670632..68d74c50ac39 100644
--- a/drivers/gpu/drm/i915/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/i915_gem_evict.c
@@ -430,6 +430,7 @@ int i915_gem_evict_vm(struct i915_address_space *vm)
 	}
 
 	INIT_LIST_HEAD(&eviction_list);
+	mutex_lock(&vm->mutex);
 	list_for_each_entry(vma, &vm->bound_list, vm_link) {
 		if (i915_vma_is_pinned(vma))
 			continue;
@@ -437,6 +438,7 @@ int i915_gem_evict_vm(struct i915_address_space *vm)
 		__i915_vma_pin(vma);
 		list_add(&vma->evict_link, &eviction_list);
 	}
+	mutex_unlock(&vm->mutex);
 
 	ret = 0;
 	list_for_each_entry_safe(vma, next, &eviction_list, evict_link) {
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 2ad9070a54c1..49b00996a15e 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1931,7 +1931,10 @@ static struct i915_vma *pd_vma_create(struct gen6_hw_ppgtt *ppgtt, int size)
 	vma->ggtt_view.type = I915_GGTT_VIEW_ROTATED; /* prevent fencing */
 
 	INIT_LIST_HEAD(&vma->obj_link);
+
+	mutex_lock(&vma->vm->mutex);
 	list_add(&vma->vm_link, &vma->vm->unbound_list);
+	mutex_unlock(&vma->vm->mutex);
 
 	return vma;
 }
@@ -3504,9 +3507,10 @@ void i915_gem_restore_gtt_mappings(struct drm_i915_private *dev_priv)
 
 	i915_check_and_clear_faults(dev_priv);
 
+	mutex_lock(&ggtt->vm.mutex);
+
 	/* First fill our portion of the GTT with scratch pages */
 	ggtt->vm.clear_range(&ggtt->vm, 0, ggtt->vm.total);
-
 	ggtt->vm.closed = true; /* skip rewriting PTE on VMA unbind */
 
 	/* clflush objects bound into the GGTT and rebind them. */
@@ -3516,19 +3520,26 @@ void i915_gem_restore_gtt_mappings(struct drm_i915_private *dev_priv)
 		if (!(vma->flags & I915_VMA_GLOBAL_BIND))
 			continue;
 
+		mutex_unlock(&ggtt->vm.mutex);
+
 		if (!i915_vma_unbind(vma))
-			continue;
+			goto lock;
 
 		WARN_ON(i915_vma_bind(vma,
 				      obj ? obj->cache_level : 0,
 				      PIN_UPDATE));
 		if (obj)
 			WARN_ON(i915_gem_object_set_to_gtt_domain(obj, false));
+
+lock:
+		mutex_lock(&ggtt->vm.mutex);
 	}
 
 	ggtt->vm.closed = false;
 	i915_ggtt_invalidate(dev_priv);
 
+	mutex_unlock(&ggtt->vm.mutex);
+
 	if (INTEL_GEN(dev_priv) >= 8) {
 		struct intel_ppat *ppat = &dev_priv->ppat;
 
diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
index a76d6c95c824..6da795c7e62e 100644
--- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
+++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
@@ -461,6 +461,7 @@ i915_gem_shrinker_vmap(struct notifier_block *nb, unsigned long event, void *ptr
 					       I915_SHRINK_VMAPS);
 
 	/* We also want to clear any cached iomaps as they wrap vmap */
+	mutex_lock(&i915->ggtt.vm.mutex);
 	list_for_each_entry_safe(vma, next,
 				 &i915->ggtt.vm.bound_list, vm_link) {
 		unsigned long count = vma->node.size >> PAGE_SHIFT;
@@ -468,9 +469,12 @@ i915_gem_shrinker_vmap(struct notifier_block *nb, unsigned long event, void *ptr
 		if (!vma->iomap || i915_vma_is_active(vma))
 			continue;
 
+		mutex_unlock(&i915->ggtt.vm.mutex);
 		if (i915_vma_unbind(vma) == 0)
 			freed_pages += count;
+		mutex_lock(&i915->ggtt.vm.mutex);
 	}
+	mutex_unlock(&i915->ggtt.vm.mutex);
 
 out:
 	shrinker_unlock(i915, unlock);
diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c b/drivers/gpu/drm/i915/i915_gem_stolen.c
index a9e365789686..74a9661479ca 100644
--- a/drivers/gpu/drm/i915/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
@@ -702,7 +702,9 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_i915_private *dev_priv
 	vma->flags |= I915_VMA_GLOBAL_BIND;
 	__i915_vma_set_map_and_fenceable(vma);
 
+	mutex_lock(&ggtt->vm.mutex);
 	list_move_tail(&vma->vm_link, &ggtt->vm.bound_list);
+	mutex_unlock(&ggtt->vm.mutex);
 
 	spin_lock(&dev_priv->mm.obj_lock);
 	list_move_tail(&obj->mm.link, &dev_priv->mm.bound_list);
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 7de28baffb8f..dcbd0d345c72 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -213,7 +213,10 @@ vma_create(struct drm_i915_gem_object *obj,
 	}
 	rb_link_node(&vma->obj_node, rb, p);
 	rb_insert_color(&vma->obj_node, &obj->vma_tree);
+
+	mutex_lock(&vm->mutex);
 	list_add(&vma->vm_link, &vm->unbound_list);
+	mutex_unlock(&vm->mutex);
 
 	return vma;
 
@@ -656,7 +659,9 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
 	GEM_BUG_ON(!drm_mm_node_allocated(&vma->node));
 	GEM_BUG_ON(!i915_gem_valid_gtt_space(vma, cache_level));
 
+	mutex_lock(&vma->vm->mutex);
 	list_move_tail(&vma->vm_link, &vma->vm->bound_list);
+	mutex_unlock(&vma->vm->mutex);
 
 	if (vma->obj) {
 		struct drm_i915_gem_object *obj = vma->obj;
@@ -689,8 +694,10 @@ i915_vma_remove(struct i915_vma *vma)
 
 	vma->ops->clear_pages(vma);
 
+	mutex_lock(&vma->vm->mutex);
 	drm_mm_remove_node(&vma->node);
 	list_move_tail(&vma->vm_link, &vma->vm->unbound_list);
+	mutex_unlock(&vma->vm->mutex);
 
 	/*
 	 * Since the unbound list is global, only move to that list if
@@ -802,7 +809,11 @@ static void __i915_vma_destroy(struct i915_vma *vma)
 	GEM_BUG_ON(i915_gem_active_isset(&vma->last_fence));
 
 	list_del(&vma->obj_link);
+
+	mutex_lock(&vma->vm->mutex);
 	list_del(&vma->vm_link);
+	mutex_unlock(&vma->vm->mutex);
+
 	if (vma->obj)
 		rb_erase(&vma->obj_node, &vma->obj->vma_tree);
 
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
index af9b85cb8639..32dce7176f63 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
@@ -94,11 +94,14 @@ static int populate_ggtt(struct drm_i915_private *i915,
 
 static void unpin_ggtt(struct drm_i915_private *i915)
 {
+	struct i915_ggtt *ggtt = &i915->ggtt;
 	struct i915_vma *vma;
 
+	mutex_lock(&ggtt->vm.mutex);
 	list_for_each_entry(vma, &i915->ggtt.vm.bound_list, vm_link)
 		if (vma->obj->mm.quirked)
 			i915_vma_unpin(vma);
+	mutex_unlock(&ggtt->vm.mutex);
 }
 
 static void cleanup_objects(struct drm_i915_private *i915,
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
index 8feb4af308ff..3850ef4a5ec8 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
@@ -1237,7 +1237,10 @@ static void track_vma_bind(struct i915_vma *vma)
 	__i915_gem_object_pin_pages(obj);
 
 	vma->pages = obj->mm.pages;
+
+	mutex_lock(&vma->vm->mutex);
 	list_move_tail(&vma->vm_link, &vma->vm->bound_list);
+	mutex_unlock(&vma->vm->mutex);
 }
 
 static int exercise_mock(struct drm_i915_private *i915,
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 08/28] drm/i915: Move vma lookup to its own lock
  2019-01-28  1:02 [PATCH 01/28] drm/i915: Wait for a moment before forcibly resetting the device Chris Wilson
                   ` (5 preceding siblings ...)
  2019-01-28  1:02 ` [PATCH 07/28] drm/i915: Pull VM lists under the VM mutex Chris Wilson
@ 2019-01-28  1:02 ` Chris Wilson
  2019-01-28  1:02 ` [PATCH 09/28] drm/i915: Always allocate an object/vma for the HWSP Chris Wilson
                   ` (24 subsequent siblings)
  31 siblings, 0 replies; 39+ messages in thread
From: Chris Wilson @ 2019-01-28  1:02 UTC (permalink / raw)
  To: intel-gfx

Remove the struct_mutex requirement for looking up the vma for an
object.

v2: Add a short comment highlighting how the race for duplicate vma
creation is resolved on reacquiring the lock.
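
A minimal sketch of the resulting lookup path (condensed from the
i915_vma_instance() and vma_create() hunks below): the lookup runs
under the new obj->vma.lock, and the duplicate-creation race is
resolved by re-walking the rbtree under that lock on insertion:

	spin_lock(&obj->vma.lock);
	vma = vma_lookup(obj, vm, view);
	spin_unlock(&obj->vma.lock);

	/* vma_create() re-checks the tree under obj->vma.lock and
	 * returns the older instance if another thread raced ahead,
	 * freeing the duplicate it had allocated */
	if (unlikely(!vma))
		vma = vma_create(obj, vm, view);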

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c       |  6 +--
 drivers/gpu/drm/i915/i915_gem.c           | 33 +++++++-----
 drivers/gpu/drm/i915/i915_gem_object.h    | 45 +++++++++-------
 drivers/gpu/drm/i915/i915_vma.c           | 66 ++++++++++++++++-------
 drivers/gpu/drm/i915/i915_vma.h           |  2 +-
 drivers/gpu/drm/i915/selftests/i915_vma.c |  4 +-
 6 files changed, 98 insertions(+), 58 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index e46de507fea2..f5ac03f06e26 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -160,14 +160,14 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 		   obj->mm.madv == I915_MADV_DONTNEED ? " purgeable" : "");
 	if (obj->base.name)
 		seq_printf(m, " (name: %d)", obj->base.name);
-	list_for_each_entry(vma, &obj->vma_list, obj_link) {
+	list_for_each_entry(vma, &obj->vma.list, obj_link) {
 		if (i915_vma_is_pinned(vma))
 			pin_count++;
 	}
 	seq_printf(m, " (pinned x %d)", pin_count);
 	if (obj->pin_global)
 		seq_printf(m, " (global)");
-	list_for_each_entry(vma, &obj->vma_list, obj_link) {
+	list_for_each_entry(vma, &obj->vma.list, obj_link) {
 		if (!drm_mm_node_allocated(&vma->node))
 			continue;
 
@@ -323,7 +323,7 @@ static int per_file_stats(int id, void *ptr, void *data)
 	if (obj->base.name || obj->base.dma_buf)
 		stats->shared += obj->base.size;
 
-	list_for_each_entry(vma, &obj->vma_list, obj_link) {
+	list_for_each_entry(vma, &obj->vma.list, obj_link) {
 		if (!drm_mm_node_allocated(&vma->node))
 			continue;
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 111a047a45b7..653c7ba4c69f 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -439,15 +439,19 @@ int i915_gem_object_unbind(struct drm_i915_gem_object *obj)
 	if (ret)
 		return ret;
 
-	while ((vma = list_first_entry_or_null(&obj->vma_list,
-					       struct i915_vma,
-					       obj_link))) {
+	spin_lock(&obj->vma.lock);
+	while (!ret && (vma = list_first_entry_or_null(&obj->vma.list,
+						       struct i915_vma,
+						       obj_link))) {
 		list_move_tail(&vma->obj_link, &still_in_list);
+		spin_unlock(&obj->vma.lock);
+
 		ret = i915_vma_unbind(vma);
-		if (ret)
-			break;
+
+		spin_lock(&obj->vma.lock);
 	}
-	list_splice(&still_in_list, &obj->vma_list);
+	list_splice(&still_in_list, &obj->vma.list);
+	spin_unlock(&obj->vma.lock);
 
 	return ret;
 }
@@ -3491,7 +3495,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 	 * reading an invalid PTE on older architectures.
 	 */
 restart:
-	list_for_each_entry(vma, &obj->vma_list, obj_link) {
+	list_for_each_entry(vma, &obj->vma.list, obj_link) {
 		if (!drm_mm_node_allocated(&vma->node))
 			continue;
 
@@ -3569,7 +3573,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 			 */
 		}
 
-		list_for_each_entry(vma, &obj->vma_list, obj_link) {
+		list_for_each_entry(vma, &obj->vma.list, obj_link) {
 			if (!drm_mm_node_allocated(&vma->node))
 				continue;
 
@@ -3579,7 +3583,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 		}
 	}
 
-	list_for_each_entry(vma, &obj->vma_list, obj_link)
+	list_for_each_entry(vma, &obj->vma.list, obj_link)
 		vma->node.color = cache_level;
 	i915_gem_object_set_cache_coherency(obj, cache_level);
 	obj->cache_dirty = true; /* Always invalidate stale cachelines */
@@ -4155,7 +4159,9 @@ void i915_gem_object_init(struct drm_i915_gem_object *obj,
 {
 	mutex_init(&obj->mm.lock);
 
-	INIT_LIST_HEAD(&obj->vma_list);
+	spin_lock_init(&obj->vma.lock);
+	INIT_LIST_HEAD(&obj->vma.list);
+
 	INIT_LIST_HEAD(&obj->lut_list);
 	INIT_LIST_HEAD(&obj->batch_pool_link);
 
@@ -4321,14 +4327,13 @@ static void __i915_gem_free_objects(struct drm_i915_private *i915,
 		mutex_lock(&i915->drm.struct_mutex);
 
 		GEM_BUG_ON(i915_gem_object_is_active(obj));
-		list_for_each_entry_safe(vma, vn,
-					 &obj->vma_list, obj_link) {
+		list_for_each_entry_safe(vma, vn, &obj->vma.list, obj_link) {
 			GEM_BUG_ON(i915_vma_is_active(vma));
 			vma->flags &= ~I915_VMA_PIN_MASK;
 			i915_vma_destroy(vma);
 		}
-		GEM_BUG_ON(!list_empty(&obj->vma_list));
-		GEM_BUG_ON(!RB_EMPTY_ROOT(&obj->vma_tree));
+		GEM_BUG_ON(!list_empty(&obj->vma.list));
+		GEM_BUG_ON(!RB_EMPTY_ROOT(&obj->vma.tree));
 
 		/* This serializes freeing with the shrinker. Since the free
 		 * is delayed, first by RCU then by the workqueue, we want the
diff --git a/drivers/gpu/drm/i915/i915_gem_object.h b/drivers/gpu/drm/i915/i915_gem_object.h
index cb1b0144d274..5a33b6d9f942 100644
--- a/drivers/gpu/drm/i915/i915_gem_object.h
+++ b/drivers/gpu/drm/i915/i915_gem_object.h
@@ -87,24 +87,33 @@ struct drm_i915_gem_object {
 
 	const struct drm_i915_gem_object_ops *ops;
 
-	/**
-	 * @vma_list: List of VMAs backed by this object
-	 *
-	 * The VMA on this list are ordered by type, all GGTT vma are placed
-	 * at the head and all ppGTT vma are placed at the tail. The different
-	 * types of GGTT vma are unordered between themselves, use the
-	 * @vma_tree (which has a defined order between all VMA) to find an
-	 * exact match.
-	 */
-	struct list_head vma_list;
-	/**
-	 * @vma_tree: Ordered tree of VMAs backed by this object
-	 *
-	 * All VMA created for this object are placed in the @vma_tree for
-	 * fast retrieval via a binary search in i915_vma_instance().
-	 * They are also added to @vma_list for easy iteration.
-	 */
-	struct rb_root vma_tree;
+	struct {
+		/**
+		 * @vma.lock: protect the list/tree of vmas
+		 */
+		struct spinlock lock;
+
+		/**
+		 * @vma.list: List of VMAs backed by this object
+		 *
+		 * The VMA on this list are ordered by type, all GGTT vma are
+		 * placed at the head and all ppGTT vma are placed at the tail.
+		 * The different types of GGTT vma are unordered between
+		 * themselves, use the @vma.tree (which has a defined order
+		 * between all VMA) to quickly find an exact match.
+		 */
+		struct list_head list;
+
+		/**
+		 * @vma.tree: Ordered tree of VMAs backed by this object
+		 *
+		 * All VMA created for this object are placed in the @vma.tree
+		 * for fast retrieval via a binary search in
+		 * i915_vma_instance(). They are also added to @vma.list for
+		 * easy iteration.
+		 */
+		struct rb_root tree;
+	} vma;
 
 	/**
 	 * @lut_list: List of vma lookup entries in use for this object.
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index dcbd0d345c72..d83b8ad5f859 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -187,32 +187,52 @@ vma_create(struct drm_i915_gem_object *obj,
 								i915_gem_object_get_stride(obj));
 		GEM_BUG_ON(!is_power_of_2(vma->fence_alignment));
 
-		/*
-		 * We put the GGTT vma at the start of the vma-list, followed
-		 * by the ppGGTT vma. This allows us to break early when
-		 * iterating over only the GGTT vma for an object, see
-		 * for_each_ggtt_vma()
-		 */
 		vma->flags |= I915_VMA_GGTT;
-		list_add(&vma->obj_link, &obj->vma_list);
-	} else {
-		list_add_tail(&vma->obj_link, &obj->vma_list);
 	}
 
+	spin_lock(&obj->vma.lock);
+
 	rb = NULL;
-	p = &obj->vma_tree.rb_node;
+	p = &obj->vma.tree.rb_node;
 	while (*p) {
 		struct i915_vma *pos;
+		long cmp;
 
 		rb = *p;
 		pos = rb_entry(rb, struct i915_vma, obj_node);
-		if (i915_vma_compare(pos, vm, view) < 0)
+
+		/*
+		 * If the view already exists in the tree, another thread
+		 * already created a matching vma, so return the older instance
+		 * and dispose of ours.
+		 */
+		cmp = i915_vma_compare(pos, vm, view);
+		if (cmp == 0) {
+			spin_unlock(&obj->vma.lock);
+			kmem_cache_free(vm->i915->vmas, vma);
+			return pos;
+		}
+
+		if (cmp < 0)
 			p = &rb->rb_right;
 		else
 			p = &rb->rb_left;
 	}
 	rb_link_node(&vma->obj_node, rb, p);
-	rb_insert_color(&vma->obj_node, &obj->vma_tree);
+	rb_insert_color(&vma->obj_node, &obj->vma.tree);
+
+	if (i915_vma_is_ggtt(vma))
+		/*
+		 * We put the GGTT vma at the start of the vma-list, followed
+		 * by the ppGGTT vma. This allows us to break early when
+		 * iterating over only the GGTT vma for an object, see
+		 * for_each_ggtt_vma()
+		 */
+		list_add(&vma->obj_link, &obj->vma.list);
+	else
+		list_add_tail(&vma->obj_link, &obj->vma.list);
+
+	spin_unlock(&obj->vma.lock);
 
 	mutex_lock(&vm->mutex);
 	list_add(&vma->vm_link, &vm->unbound_list);
@@ -232,7 +252,7 @@ vma_lookup(struct drm_i915_gem_object *obj,
 {
 	struct rb_node *rb;
 
-	rb = obj->vma_tree.rb_node;
+	rb = obj->vma.tree.rb_node;
 	while (rb) {
 		struct i915_vma *vma = rb_entry(rb, struct i915_vma, obj_node);
 		long cmp;
@@ -272,16 +292,18 @@ i915_vma_instance(struct drm_i915_gem_object *obj,
 {
 	struct i915_vma *vma;
 
-	lockdep_assert_held(&obj->base.dev->struct_mutex);
 	GEM_BUG_ON(view && !i915_is_ggtt(vm));
 	GEM_BUG_ON(vm->closed);
 
+	spin_lock(&obj->vma.lock);
 	vma = vma_lookup(obj, vm, view);
-	if (!vma)
+	spin_unlock(&obj->vma.lock);
+
+	/* vma_create() will resolve the race if another creates the vma */
+	if (unlikely(!vma))
 		vma = vma_create(obj, vm, view);
 
 	GEM_BUG_ON(!IS_ERR(vma) && i915_vma_compare(vma, vm, view));
-	GEM_BUG_ON(!IS_ERR(vma) && vma_lookup(obj, vm, view) != vma);
 	return vma;
 }
 
@@ -808,14 +830,18 @@ static void __i915_vma_destroy(struct i915_vma *vma)
 
 	GEM_BUG_ON(i915_gem_active_isset(&vma->last_fence));
 
-	list_del(&vma->obj_link);
-
 	mutex_lock(&vma->vm->mutex);
 	list_del(&vma->vm_link);
 	mutex_unlock(&vma->vm->mutex);
 
-	if (vma->obj)
-		rb_erase(&vma->obj_node, &vma->obj->vma_tree);
+	if (vma->obj) {
+		struct drm_i915_gem_object *obj = vma->obj;
+
+		spin_lock(&obj->vma.lock);
+		list_del(&vma->obj_link);
+		rb_erase(&vma->obj_node, &vma->obj->vma.tree);
+		spin_unlock(&obj->vma.lock);
+	}
 
 	rbtree_postorder_for_each_entry_safe(iter, n, &vma->active, node) {
 		GEM_BUG_ON(i915_gem_active_isset(&iter->base));
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index 4f7c1c7599f4..7252abc73d3e 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -425,7 +425,7 @@ void i915_vma_parked(struct drm_i915_private *i915);
  * or the list is empty ofc.
  */
 #define for_each_ggtt_vma(V, OBJ) \
-	list_for_each_entry(V, &(OBJ)->vma_list, obj_link)		\
+	list_for_each_entry(V, &(OBJ)->vma.list, obj_link)		\
 		for_each_until(!i915_vma_is_ggtt(V))
 
 #endif
diff --git a/drivers/gpu/drm/i915/selftests/i915_vma.c b/drivers/gpu/drm/i915/selftests/i915_vma.c
index f0a32edfb9b1..cf1de82741fa 100644
--- a/drivers/gpu/drm/i915/selftests/i915_vma.c
+++ b/drivers/gpu/drm/i915/selftests/i915_vma.c
@@ -672,7 +672,7 @@ static int igt_vma_partial(void *arg)
 		}
 
 		count = 0;
-		list_for_each_entry(vma, &obj->vma_list, obj_link)
+		list_for_each_entry(vma, &obj->vma.list, obj_link)
 			count++;
 		if (count != nvma) {
 			pr_err("(%s) All partial vma were not recorded on the obj->vma_list: found %u, expected %u\n",
@@ -701,7 +701,7 @@ static int igt_vma_partial(void *arg)
 		i915_vma_unpin(vma);
 
 		count = 0;
-		list_for_each_entry(vma, &obj->vma_list, obj_link)
+		list_for_each_entry(vma, &obj->vma.list, obj_link)
 			count++;
 		if (count != nvma) {
 			pr_err("(%s) allocated an extra full vma!\n", p->name);
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 09/28] drm/i915: Always allocate an object/vma for the HWSP
  2019-01-28  1:02 [PATCH 01/28] drm/i915: Wait for a moment before forcibly resetting the device Chris Wilson
                   ` (6 preceding siblings ...)
  2019-01-28  1:02 ` [PATCH 08/28] drm/i915: Move vma lookup to its own lock Chris Wilson
@ 2019-01-28  1:02 ` Chris Wilson
  2019-01-28  1:02 ` [PATCH 10/28] drm/i915: Add timeline barrier support Chris Wilson
                   ` (23 subsequent siblings)
  31 siblings, 0 replies; 39+ messages in thread
From: Chris Wilson @ 2019-01-28  1:02 UTC (permalink / raw)
  To: intel-gfx; +Cc: Matthew Auld

Currently we only allocate an object and vma when using a GGTT virtual
HWSP; a physical HWSP gets just a plain struct page. For convenience
later on with global timelines, it will be useful to always have the
status page tracked by a struct i915_vma. Make it so.
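
Condensing init_status_page() from the patch below (error handling
elided), both flavours now share the object/vma allocation, with only
the virtual HWSP taking the extra GGTT pin:

	obj = i915_gem_object_create_internal(engine->i915, PAGE_SIZE);
	vma = i915_vma_instance(obj, &engine->i915->ggtt.vm, NULL);
	vaddr = i915_gem_object_pin_map(obj, I915_MAP_WB);

	engine->status_page.addr = memset(vaddr, 0, PAGE_SIZE);
	engine->status_page.vma = vma;

	/* a physical HWSP reuses the object's backing struct page */
	if (!HWS_NEEDS_PHYSICAL(engine->i915))
		ret = pin_ggtt_status_page(engine, vma);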

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
---
 drivers/gpu/drm/i915/intel_engine_cs.c       | 109 ++++++++++---------
 drivers/gpu/drm/i915/intel_guc_submission.c  |   6 +
 drivers/gpu/drm/i915/intel_lrc.c             |  12 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c      |  21 +++-
 drivers/gpu/drm/i915/intel_ringbuffer.h      |  23 +---
 drivers/gpu/drm/i915/selftests/mock_engine.c |   2 +-
 6 files changed, 93 insertions(+), 80 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 1ffec0d69157..7c57949ad586 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -506,27 +506,61 @@ void intel_engine_setup_common(struct intel_engine_cs *engine)
 
 static void cleanup_status_page(struct intel_engine_cs *engine)
 {
+	struct i915_vma *vma;
+
 	/* Prevent writes into HWSP after returning the page to the system */
 	intel_engine_set_hwsp_writemask(engine, ~0u);
 
-	if (HWS_NEEDS_PHYSICAL(engine->i915)) {
-		void *addr = fetch_and_zero(&engine->status_page.page_addr);
+	vma = fetch_and_zero(&engine->status_page.vma);
+	if (!vma)
+		return;
 
-		__free_page(virt_to_page(addr));
-	}
+	if (!HWS_NEEDS_PHYSICAL(engine->i915))
+		i915_vma_unpin(vma);
+
+	i915_gem_object_unpin_map(vma->obj);
+	__i915_gem_object_release_unless_active(vma->obj);
+}
+
+static int pin_ggtt_status_page(struct intel_engine_cs *engine,
+				struct i915_vma *vma)
+{
+	unsigned int flags;
+
+	flags = PIN_GLOBAL;
+	if (!HAS_LLC(engine->i915))
+		/*
+		 * On g33, we cannot place HWS above 256MiB, so
+		 * restrict its pinning to the low mappable arena.
+		 * Though this restriction is not documented for
+		 * gen4, gen5, or byt, they also behave similarly
+		 * and hang if the HWS is placed at the top of the
+		 * GTT. To generalise, it appears that all !llc
+		 * platforms have issues with us placing the HWS
+		 * above the mappable region (even though we never
+		 * actually map it).
+		 */
+		flags |= PIN_MAPPABLE;
+	else
+		flags |= PIN_HIGH;
 
-	i915_vma_unpin_and_release(&engine->status_page.vma,
-				   I915_VMA_RELEASE_MAP);
+	return i915_vma_pin(vma, 0, 0, flags);
 }
 
 static int init_status_page(struct intel_engine_cs *engine)
 {
 	struct drm_i915_gem_object *obj;
 	struct i915_vma *vma;
-	unsigned int flags;
 	void *vaddr;
 	int ret;
 
+	/*
+	 * Though the HWS register does support 36bit addresses, historically
+	 * we have had hangs and corruption reported due to wild writes if
+	 * the HWS is placed above 4G. We only allow objects to be allocated
+	 * in GFP_DMA32 for i965, and no earlier physical address users had
+	 * access to more than 4G.
+	 */
 	obj = i915_gem_object_create_internal(engine->i915, PAGE_SIZE);
 	if (IS_ERR(obj)) {
 		DRM_ERROR("Failed to allocate status page\n");
@@ -543,61 +577,30 @@ static int init_status_page(struct intel_engine_cs *engine)
 		goto err;
 	}
 
-	flags = PIN_GLOBAL;
-	if (!HAS_LLC(engine->i915))
-		/* On g33, we cannot place HWS above 256MiB, so
-		 * restrict its pinning to the low mappable arena.
-		 * Though this restriction is not documented for
-		 * gen4, gen5, or byt, they also behave similarly
-		 * and hang if the HWS is placed at the top of the
-		 * GTT. To generalise, it appears that all !llc
-		 * platforms have issues with us placing the HWS
-		 * above the mappable region (even though we never
-		 * actually map it).
-		 */
-		flags |= PIN_MAPPABLE;
-	else
-		flags |= PIN_HIGH;
-	ret = i915_vma_pin(vma, 0, 0, flags);
-	if (ret)
-		goto err;
-
 	vaddr = i915_gem_object_pin_map(obj, I915_MAP_WB);
 	if (IS_ERR(vaddr)) {
 		ret = PTR_ERR(vaddr);
-		goto err_unpin;
+		goto err;
 	}
 
+	engine->status_page.addr = memset(vaddr, 0, PAGE_SIZE);
 	engine->status_page.vma = vma;
-	engine->status_page.ggtt_offset = i915_ggtt_offset(vma);
-	engine->status_page.page_addr = memset(vaddr, 0, PAGE_SIZE);
+
+	if (!HWS_NEEDS_PHYSICAL(engine->i915)) {
+		ret = pin_ggtt_status_page(engine, vma);
+		if (ret)
+			goto err_unpin;
+	}
+
 	return 0;
 
 err_unpin:
-	i915_vma_unpin(vma);
+	i915_gem_object_unpin_map(obj);
 err:
 	i915_gem_object_put(obj);
 	return ret;
 }
 
-static int init_phys_status_page(struct intel_engine_cs *engine)
-{
-	struct page *page;
-
-	/*
-	 * Though the HWS register does support 36bit addresses, historically
-	 * we have had hangs and corruption reported due to wild writes if
-	 * the HWS is placed above 4G.
-	 */
-	page = alloc_page(GFP_KERNEL | __GFP_DMA32 | __GFP_ZERO);
-	if (!page)
-		return -ENOMEM;
-
-	engine->status_page.page_addr = page_address(page);
-
-	return 0;
-}
-
 static void __intel_context_unpin(struct i915_gem_context *ctx,
 				  struct intel_engine_cs *engine)
 {
@@ -690,10 +693,7 @@ int intel_engine_init_common(struct intel_engine_cs *engine)
 	if (ret)
 		goto err_unpin_preempt;
 
-	if (HWS_NEEDS_PHYSICAL(i915))
-		ret = init_phys_status_page(engine);
-	else
-		ret = init_status_page(engine);
+	ret = init_status_page(engine);
 	if (ret)
 		goto err_breadcrumbs;
 
@@ -1366,7 +1366,8 @@ static void intel_engine_print_registers(const struct intel_engine_cs *engine,
 	}
 
 	if (HAS_EXECLISTS(dev_priv)) {
-		const u32 *hws = &engine->status_page.page_addr[I915_HWS_CSB_BUF0_INDEX];
+		const u32 *hws =
+			&engine->status_page.addr[I915_HWS_CSB_BUF0_INDEX];
 		unsigned int idx;
 		u8 read, write;
 
@@ -1549,7 +1550,7 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 	spin_unlock_irqrestore(&b->rb_lock, flags);
 
 	drm_printf(m, "HWSP:\n");
-	hexdump(m, engine->status_page.page_addr, PAGE_SIZE);
+	hexdump(m, engine->status_page.addr, PAGE_SIZE);
 
 	drm_printf(m, "Idle? %s\n", yesno(intel_engine_is_idle(engine)));
 }
diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
index 4cab0826100c..55ec122ee9e9 100644
--- a/drivers/gpu/drm/i915/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/intel_guc_submission.c
@@ -81,6 +81,12 @@
  *
  */
 
+static inline u32 intel_hws_preempt_done_address(struct intel_engine_cs *engine)
+{
+	return (i915_ggtt_offset(engine->status_page.vma) +
+		I915_GEM_HWS_PREEMPT_ADDR);
+}
+
 static inline struct i915_priolist *to_priolist(struct rb_node *rb)
 {
 	return rb_entry(rb, struct i915_priolist, node);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 12db24b30e78..df331d71a89a 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -174,6 +174,12 @@ static void execlists_init_reg_state(u32 *reg_state,
 				     struct intel_engine_cs *engine,
 				     struct intel_ring *ring);
 
+static inline u32 intel_hws_seqno_address(struct intel_engine_cs *engine)
+{
+	return (i915_ggtt_offset(engine->status_page.vma) +
+		I915_GEM_HWS_INDEX_ADDR);
+}
+
 static inline struct i915_priolist *to_priolist(struct rb_node *rb)
 {
 	return rb_entry(rb, struct i915_priolist, node);
@@ -1810,7 +1816,7 @@ static void enable_execlists(struct intel_engine_cs *engine)
 		   _MASKED_BIT_DISABLE(STOP_RING));
 
 	I915_WRITE(RING_HWS_PGA(engine->mmio_base),
-		   engine->status_page.ggtt_offset);
+		   i915_ggtt_offset(engine->status_page.vma));
 	POSTING_READ(RING_HWS_PGA(engine->mmio_base));
 }
 
@@ -2355,10 +2361,10 @@ static int logical_ring_init(struct intel_engine_cs *engine)
 	}
 
 	execlists->csb_status =
-		&engine->status_page.page_addr[I915_HWS_CSB_BUF0_INDEX];
+		&engine->status_page.addr[I915_HWS_CSB_BUF0_INDEX];
 
 	execlists->csb_write =
-		&engine->status_page.page_addr[intel_hws_csb_write_index(i915)];
+		&engine->status_page.addr[intel_hws_csb_write_index(i915)];
 
 	reset_csb_pointers(execlists);
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index a9efc8c71254..cb6d2aa2a829 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -43,6 +43,12 @@
  */
 #define LEGACY_REQUEST_SIZE 200
 
+static inline u32 intel_hws_seqno_address(struct intel_engine_cs *engine)
+{
+	return (i915_ggtt_offset(engine->status_page.vma) +
+		I915_GEM_HWS_INDEX_ADDR);
+}
+
 static unsigned int __intel_ring_space(unsigned int head,
 				       unsigned int tail,
 				       unsigned int size)
@@ -503,12 +509,17 @@ static void set_hws_pga(struct intel_engine_cs *engine, phys_addr_t phys)
 	I915_WRITE(HWS_PGA, addr);
 }
 
-static void ring_setup_phys_status_page(struct intel_engine_cs *engine)
+static struct page *status_page(struct intel_engine_cs *engine)
 {
-	struct page *page = virt_to_page(engine->status_page.page_addr);
-	phys_addr_t phys = PFN_PHYS(page_to_pfn(page));
+	struct drm_i915_gem_object *obj = engine->status_page.vma->obj;
 
-	set_hws_pga(engine, phys);
+	GEM_BUG_ON(!i915_gem_object_has_pinned_pages(obj));
+	return sg_page(obj->mm.pages->sgl);
+}
+
+static void ring_setup_phys_status_page(struct intel_engine_cs *engine)
+{
+	set_hws_pga(engine, PFN_PHYS(page_to_pfn(status_page(engine))));
 	set_hwstam(engine, ~0u);
 }
 
@@ -575,7 +586,7 @@ static void flush_cs_tlb(struct intel_engine_cs *engine)
 
 static void ring_setup_status_page(struct intel_engine_cs *engine)
 {
-	set_hwsp(engine, engine->status_page.ggtt_offset);
+	set_hwsp(engine, i915_ggtt_offset(engine->status_page.vma));
 	set_hwstam(engine, ~0u);
 
 	flush_cs_tlb(engine);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 246311fcdde3..21dc695bd2f3 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -32,8 +32,7 @@ struct i915_sched_attr;
 
 struct intel_hw_status_page {
 	struct i915_vma *vma;
-	u32 *page_addr;
-	u32 ggtt_offset;
+	u32 *addr;
 };
 
 #define I915_READ_TAIL(engine) I915_READ(RING_TAIL((engine)->mmio_base))
@@ -676,7 +675,7 @@ static inline u32
 intel_read_status_page(const struct intel_engine_cs *engine, int reg)
 {
 	/* Ensure that the compiler doesn't optimize away the load. */
-	return READ_ONCE(engine->status_page.page_addr[reg]);
+	return READ_ONCE(engine->status_page.addr[reg]);
 }
 
 static inline void
@@ -689,12 +688,12 @@ intel_write_status_page(struct intel_engine_cs *engine, int reg, u32 value)
 	 */
 	if (static_cpu_has(X86_FEATURE_CLFLUSH)) {
 		mb();
-		clflush(&engine->status_page.page_addr[reg]);
-		engine->status_page.page_addr[reg] = value;
-		clflush(&engine->status_page.page_addr[reg]);
+		clflush(&engine->status_page.addr[reg]);
+		engine->status_page.addr[reg] = value;
+		clflush(&engine->status_page.addr[reg]);
 		mb();
 	} else {
-		WRITE_ONCE(engine->status_page.page_addr[reg], value);
+		WRITE_ONCE(engine->status_page.addr[reg], value);
 	}
 }
 
@@ -882,16 +881,6 @@ static inline bool intel_engine_has_started(struct intel_engine_cs *engine,
 void intel_engine_get_instdone(struct intel_engine_cs *engine,
 			       struct intel_instdone *instdone);
 
-static inline u32 intel_hws_seqno_address(struct intel_engine_cs *engine)
-{
-	return engine->status_page.ggtt_offset + I915_GEM_HWS_INDEX_ADDR;
-}
-
-static inline u32 intel_hws_preempt_done_address(struct intel_engine_cs *engine)
-{
-	return engine->status_page.ggtt_offset + I915_GEM_HWS_PREEMPT_ADDR;
-}
-
 /* intel_breadcrumbs.c -- user interrupt bottom-half for waiters */
 int intel_engine_init_breadcrumbs(struct intel_engine_cs *engine);
 
diff --git a/drivers/gpu/drm/i915/selftests/mock_engine.c b/drivers/gpu/drm/i915/selftests/mock_engine.c
index 905318b7ae18..4e5b4dc6df0f 100644
--- a/drivers/gpu/drm/i915/selftests/mock_engine.c
+++ b/drivers/gpu/drm/i915/selftests/mock_engine.c
@@ -200,7 +200,7 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
 	engine->base.i915 = i915;
 	snprintf(engine->base.name, sizeof(engine->base.name), "%s", name);
 	engine->base.id = id;
-	engine->base.status_page.page_addr = (void *)(engine + 1);
+	engine->base.status_page.addr = (void *)(engine + 1);
 
 	engine->base.context_pin = mock_context_pin;
 	engine->base.request_alloc = mock_request_alloc;
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 10/28] drm/i915: Add timeline barrier support
  2019-01-28  1:02 [PATCH 01/28] drm/i915: Wait for a moment before forcibly resetting the device Chris Wilson
                   ` (7 preceding siblings ...)
  2019-01-28  1:02 ` [PATCH 09/28] drm/i915: Always allocate an object/vma for the HWSP Chris Wilson
@ 2019-01-28  1:02 ` Chris Wilson
  2019-01-28  1:02 ` [PATCH 11/28] drm/i915: Move list of timelines under its own lock Chris Wilson
                   ` (22 subsequent siblings)
  31 siblings, 0 replies; 39+ messages in thread
From: Chris Wilson @ 2019-01-28  1:02 UTC (permalink / raw)
  To: intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

A timeline barrier allows serialization between different timelines.

After calling i915_timeline_set_barrier with a request, all subsequent
submissions on this timeline are set up to depend on this request (the
barrier). Once the barrier has been completed it automatically gets
cleared and things continue as normal.

This facility will be used by the upcoming context SSEU code.
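
For illustration only (the SSEU caller is not part of this patch), a
user of the barrier would look along these lines:

	/* all future submissions on tl must wait upon rq completing */
	err = i915_timeline_set_barrier(tl, rq);
	if (err)
		return err;

Once the barrier request is retired, the active tracker clears itself
and later requests on the timeline proceed without the extra
dependency.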

v2:
 * Assert barrier has been retired on timeline_fini. (Chris Wilson)
 * Fix mock_timeline.

v3:
 * Improved comment language. (Chris Wilson)

v4:
 * Maintain ordering with previous barriers set on the timeline.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_request.c           | 17 ++++++++++++++
 drivers/gpu/drm/i915/i915_timeline.c          | 21 ++++++++++++++++++
 drivers/gpu/drm/i915/i915_timeline.h          | 22 +++++++++++++++++++
 .../gpu/drm/i915/selftests/mock_timeline.c    |  1 +
 4 files changed, 61 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index f4241a17e2ad..894b32258340 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -517,6 +517,19 @@ i915_request_alloc_slow(struct intel_context *ce)
 	return kmem_cache_alloc(ce->gem_context->i915->requests, GFP_KERNEL);
 }
 
+static int add_barrier(struct i915_request *rq, struct i915_gem_active *active)
+{
+	struct i915_request *barrier =
+		i915_gem_active_raw(active, &rq->i915->drm.struct_mutex);
+
+	return barrier ? i915_request_await_dma_fence(rq, &barrier->fence) : 0;
+}
+
+static int add_timeline_barrier(struct i915_request *rq)
+{
+	return add_barrier(rq, &rq->timeline->barrier);
+}
+
 /**
  * i915_request_alloc - allocate a request structure
  *
@@ -660,6 +673,10 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	 */
 	rq->head = rq->ring->emit;
 
+	ret = add_timeline_barrier(rq);
+	if (ret)
+		goto err_unwind;
+
 	ret = engine->request_alloc(rq);
 	if (ret)
 		goto err_unwind;
diff --git a/drivers/gpu/drm/i915/i915_timeline.c b/drivers/gpu/drm/i915/i915_timeline.c
index 4667cc08c416..6d5774cb8504 100644
--- a/drivers/gpu/drm/i915/i915_timeline.c
+++ b/drivers/gpu/drm/i915/i915_timeline.c
@@ -33,6 +33,7 @@ void i915_timeline_init(struct drm_i915_private *i915,
 
 	spin_lock_init(&timeline->lock);
 
+	init_request_active(&timeline->barrier, NULL);
 	init_request_active(&timeline->last_request, NULL);
 	INIT_LIST_HEAD(&timeline->requests);
 
@@ -69,6 +70,7 @@ void i915_timelines_park(struct drm_i915_private *i915)
 void i915_timeline_fini(struct i915_timeline *timeline)
 {
 	GEM_BUG_ON(!list_empty(&timeline->requests));
+	GEM_BUG_ON(i915_gem_active_isset(&timeline->barrier));
 
 	i915_syncmap_free(&timeline->sync);
 
@@ -90,6 +92,25 @@ i915_timeline_create(struct drm_i915_private *i915, const char *name)
 	return timeline;
 }
 
+int i915_timeline_set_barrier(struct i915_timeline *tl, struct i915_request *rq)
+{
+	struct i915_request *old;
+	int err;
+
+	lockdep_assert_held(&rq->i915->drm.struct_mutex);
+
+	/* Must maintain ordering wrt existing barriers */
+	old = i915_gem_active_raw(&tl->barrier, &rq->i915->drm.struct_mutex);
+	if (old) {
+		err = i915_request_await_dma_fence(rq, &old->fence);
+		if (err)
+			return err;
+	}
+
+	i915_gem_active_set(&tl->barrier, rq);
+	return 0;
+}
+
 void __i915_timeline_free(struct kref *kref)
 {
 	struct i915_timeline *timeline =
diff --git a/drivers/gpu/drm/i915/i915_timeline.h b/drivers/gpu/drm/i915/i915_timeline.h
index 38c1e15e927a..c8d7117bb205 100644
--- a/drivers/gpu/drm/i915/i915_timeline.h
+++ b/drivers/gpu/drm/i915/i915_timeline.h
@@ -64,6 +64,16 @@ struct i915_timeline {
 	 */
 	struct i915_syncmap *sync;
 
+	/**
+	 * Barrier provides the ability to serialize ordering between different
+	 * timelines.
+	 *
+	 * Users can call i915_timeline_set_barrier which will make all
+	 * subsequent submissions to this timeline be executed only after the
+	 * barrier has been completed.
+	 */
+	struct i915_gem_active barrier;
+
 	struct list_head link;
 	const char *name;
 
@@ -136,4 +146,16 @@ static inline bool i915_timeline_sync_is_later(struct i915_timeline *tl,
 
 void i915_timelines_park(struct drm_i915_private *i915);
 
+/**
+ * i915_timeline_set_barrier - orders submission between different timelines
+ * @timeline: timeline to set the barrier on
+ * @rq: request after which new submissions can proceed
+ *
+ * Sets the passed in request as the serialization point for all subsequent
+ * submissions on @timeline. Subsequent requests will not be submitted to GPU
+ * until the barrier has been completed.
+ */
+int i915_timeline_set_barrier(struct i915_timeline *timeline,
+			      struct i915_request *rq);
+
 #endif
diff --git a/drivers/gpu/drm/i915/selftests/mock_timeline.c b/drivers/gpu/drm/i915/selftests/mock_timeline.c
index dcf3b16f5a07..408113c1cc63 100644
--- a/drivers/gpu/drm/i915/selftests/mock_timeline.c
+++ b/drivers/gpu/drm/i915/selftests/mock_timeline.c
@@ -14,6 +14,7 @@ void mock_timeline_init(struct i915_timeline *timeline, u64 context)
 
 	spin_lock_init(&timeline->lock);
 
+	init_request_active(&timeline->barrier, NULL);
 	init_request_active(&timeline->last_request, NULL);
 	INIT_LIST_HEAD(&timeline->requests);
 
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 11/28] drm/i915: Move list of timelines under its own lock
  2019-01-28  1:02 [PATCH 01/28] drm/i915: Wait for a moment before forcibly resetting the device Chris Wilson
                   ` (8 preceding siblings ...)
  2019-01-28  1:02 ` [PATCH 10/28] drm/i915: Add timeline barrier support Chris Wilson
@ 2019-01-28  1:02 ` Chris Wilson
  2019-01-28  1:02 ` [PATCH 12/28] drm/i915: Introduce concept of per-timeline (context) HWSP Chris Wilson
                   ` (21 subsequent siblings)
  31 siblings, 0 replies; 39+ messages in thread
From: Chris Wilson @ 2019-01-28  1:02 UTC (permalink / raw)
  To: intel-gfx

Currently, the list of timelines is serialised by the struct_mutex, but
to alleviate difficulties with using that mutex in future, move the
list management under its own dedicated mutex.
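
The access pattern then becomes (abbreviated from the
wait_for_timelines() hunk below):

	struct i915_gt_timelines *gt = &i915->gt.timelines;
	struct i915_timeline *tl;

	mutex_lock(&gt->mutex);
	list_for_each_entry(tl, &gt->list, link) {
		/* the walk no longer requires struct_mutex; if we need
		 * to sleep, drop gt->mutex and restart the iteration */
	}
	mutex_unlock(&gt->mutex);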

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h               |   5 +-
 drivers/gpu/drm/i915/i915_gem.c               | 103 ++++++++++--------
 drivers/gpu/drm/i915/i915_reset.c             |   8 +-
 drivers/gpu/drm/i915/i915_timeline.c          |  38 ++++++-
 drivers/gpu/drm/i915/i915_timeline.h          |   3 +
 .../gpu/drm/i915/selftests/mock_gem_device.c  |   7 +-
 .../gpu/drm/i915/selftests/mock_timeline.c    |   3 +-
 7 files changed, 109 insertions(+), 58 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 0133d1da3d3c..8a181b455197 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1975,7 +1975,10 @@ struct drm_i915_private {
 		void (*resume)(struct drm_i915_private *);
 		void (*cleanup_engine)(struct intel_engine_cs *engine);
 
-		struct list_head timelines;
+		struct i915_gt_timelines {
+			struct mutex mutex; /* protects list, tainted by GPU */
+			struct list_head list;
+		} timelines;
 
 		struct list_head active_rings;
 		struct list_head closed_vma;
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 653c7ba4c69f..d68f3fdd8a8e 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3224,33 +3224,6 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 	return ret;
 }
 
-static long wait_for_timeline(struct i915_timeline *tl,
-			      unsigned int flags, long timeout)
-{
-	struct i915_request *rq;
-
-	rq = i915_gem_active_get_unlocked(&tl->last_request);
-	if (!rq)
-		return timeout;
-
-	/*
-	 * "Race-to-idle".
-	 *
-	 * Switching to the kernel context is often used a synchronous
-	 * step prior to idling, e.g. in suspend for flushing all
-	 * current operations to memory before sleeping. These we
-	 * want to complete as quickly as possible to avoid prolonged
-	 * stalls, so allow the gpu to boost to maximum clocks.
-	 */
-	if (flags & I915_WAIT_FOR_IDLE_BOOST)
-		gen6_rps_boost(rq, NULL);
-
-	timeout = i915_request_wait(rq, flags, timeout);
-	i915_request_put(rq);
-
-	return timeout;
-}
-
 static int wait_for_engines(struct drm_i915_private *i915)
 {
 	if (wait_for(intel_engines_are_idle(i915), I915_IDLE_ENGINES_TIMEOUT)) {
@@ -3264,6 +3237,52 @@ static int wait_for_engines(struct drm_i915_private *i915)
 	return 0;
 }
 
+static long
+wait_for_timelines(struct drm_i915_private *i915,
+		   unsigned int flags, long timeout)
+{
+	struct i915_gt_timelines *gt = &i915->gt.timelines;
+	struct i915_timeline *tl;
+
+	if (!READ_ONCE(i915->gt.active_requests))
+		return timeout;
+
+	mutex_lock(&gt->mutex);
+	list_for_each_entry(tl, &gt->list, link) {
+		struct i915_request *rq;
+
+		rq = i915_gem_active_get_unlocked(&tl->last_request);
+		if (!rq)
+			continue;
+
+		mutex_unlock(&gt->mutex);
+
+		/*
+		 * "Race-to-idle".
+		 *
+		 * Switching to the kernel context is often used as a synchronous
+		 * step prior to idling, e.g. in suspend for flushing all
+		 * current operations to memory before sleeping. These we
+		 * want to complete as quickly as possible to avoid prolonged
+		 * stalls, so allow the gpu to boost to maximum clocks.
+		 */
+		if (flags & I915_WAIT_FOR_IDLE_BOOST)
+			gen6_rps_boost(rq, NULL);
+
+		timeout = i915_request_wait(rq, flags, timeout);
+		i915_request_put(rq);
+		if (timeout < 0)
+			return timeout;
+
+		/* restart after reacquiring the lock */
+		mutex_lock(&gt->mutex);
+		tl = list_entry(&gt->list, typeof(*tl), link);
+	}
+	mutex_unlock(&gt->mutex);
+
+	return timeout;
+}
+
 int i915_gem_wait_for_idle(struct drm_i915_private *i915,
 			   unsigned int flags, long timeout)
 {
@@ -3275,17 +3294,15 @@ int i915_gem_wait_for_idle(struct drm_i915_private *i915,
 	if (!READ_ONCE(i915->gt.awake))
 		return 0;
 
+	timeout = wait_for_timelines(i915, flags, timeout);
+	if (timeout < 0)
+		return timeout;
+
 	if (flags & I915_WAIT_LOCKED) {
-		struct i915_timeline *tl;
 		int err;
 
 		lockdep_assert_held(&i915->drm.struct_mutex);
 
-		list_for_each_entry(tl, &i915->gt.timelines, link) {
-			timeout = wait_for_timeline(tl, flags, timeout);
-			if (timeout < 0)
-				return timeout;
-		}
 		if (GEM_SHOW_DEBUG() && !timeout) {
 			/* Presume that timeout was non-zero to begin with! */
 			dev_warn(&i915->drm.pdev->dev,
@@ -3299,17 +3316,6 @@ int i915_gem_wait_for_idle(struct drm_i915_private *i915,
 
 		i915_retire_requests(i915);
 		GEM_BUG_ON(i915->gt.active_requests);
-	} else {
-		struct intel_engine_cs *engine;
-		enum intel_engine_id id;
-
-		for_each_engine(engine, i915, id) {
-			struct i915_timeline *tl = &engine->timeline;
-
-			timeout = wait_for_timeline(tl, flags, timeout);
-			if (timeout < 0)
-				return timeout;
-		}
 	}
 
 	return 0;
@@ -5010,6 +5016,8 @@ int i915_gem_init(struct drm_i915_private *dev_priv)
 		dev_priv->gt.cleanup_engine = intel_engine_cleanup;
 	}
 
+	i915_timelines_init(dev_priv);
+
 	ret = i915_gem_init_userptr(dev_priv);
 	if (ret)
 		return ret;
@@ -5132,8 +5140,10 @@ int i915_gem_init(struct drm_i915_private *dev_priv)
 err_uc_misc:
 	intel_uc_fini_misc(dev_priv);
 
-	if (ret != -EIO)
+	if (ret != -EIO) {
 		i915_gem_cleanup_userptr(dev_priv);
+		i915_timelines_fini(dev_priv);
+	}
 
 	if (ret == -EIO) {
 		mutex_lock(&dev_priv->drm.struct_mutex);
@@ -5184,6 +5194,7 @@ void i915_gem_fini(struct drm_i915_private *dev_priv)
 
 	intel_uc_fini_misc(dev_priv);
 	i915_gem_cleanup_userptr(dev_priv);
+	i915_timelines_fini(dev_priv);
 
 	i915_gem_drain_freed_objects(dev_priv);
 
@@ -5286,7 +5297,6 @@ int i915_gem_init_early(struct drm_i915_private *dev_priv)
 	if (!dev_priv->priorities)
 		goto err_dependencies;
 
-	INIT_LIST_HEAD(&dev_priv->gt.timelines);
 	INIT_LIST_HEAD(&dev_priv->gt.active_rings);
 	INIT_LIST_HEAD(&dev_priv->gt.closed_vma);
 
@@ -5330,7 +5340,6 @@ void i915_gem_cleanup_early(struct drm_i915_private *dev_priv)
 	GEM_BUG_ON(!llist_empty(&dev_priv->mm.free_list));
 	GEM_BUG_ON(atomic_read(&dev_priv->mm.free_count));
 	WARN_ON(dev_priv->mm.object_count);
-	WARN_ON(!list_empty(&dev_priv->gt.timelines));
 
 	kmem_cache_destroy(dev_priv->priorities);
 	kmem_cache_destroy(dev_priv->dependencies);
diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
index 99bd3bc336b3..d2dca85a543d 100644
--- a/drivers/gpu/drm/i915/i915_reset.c
+++ b/drivers/gpu/drm/i915/i915_reset.c
@@ -854,7 +854,8 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915)
 	 *
 	 * No more can be submitted until we reset the wedged bit.
 	 */
-	list_for_each_entry(tl, &i915->gt.timelines, link) {
+	mutex_lock(&i915->gt.timelines.mutex);
+	list_for_each_entry(tl, &i915->gt.timelines.list, link) {
 		struct i915_request *rq;
 		long timeout;
 
@@ -876,9 +877,12 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915)
 		timeout = dma_fence_default_wait(&rq->fence, true,
 						 MAX_SCHEDULE_TIMEOUT);
 		i915_request_put(rq);
-		if (timeout < 0)
+		if (timeout < 0) {
+			mutex_unlock(&i915->gt.timelines.mutex);
 			goto unlock;
+		}
 	}
+	mutex_unlock(&i915->gt.timelines.mutex);
 
 	intel_engines_sanitize(i915, false);
 
diff --git a/drivers/gpu/drm/i915/i915_timeline.c b/drivers/gpu/drm/i915/i915_timeline.c
index 6d5774cb8504..79ab03a0fdfe 100644
--- a/drivers/gpu/drm/i915/i915_timeline.c
+++ b/drivers/gpu/drm/i915/i915_timeline.c
@@ -13,7 +13,7 @@ void i915_timeline_init(struct drm_i915_private *i915,
 			struct i915_timeline *timeline,
 			const char *name)
 {
-	lockdep_assert_held(&i915->drm.struct_mutex);
+	struct i915_gt_timelines *gt = &i915->gt.timelines;
 
 	/*
 	 * Ideally we want a set of engines on a single leaf as we expect
@@ -23,9 +23,12 @@ void i915_timeline_init(struct drm_i915_private *i915,
 	 */
 	BUILD_BUG_ON(KSYNCMAP < I915_NUM_ENGINES);
 
+	timeline->i915 = i915;
 	timeline->name = name;
 
-	list_add(&timeline->link, &i915->gt.timelines);
+	mutex_lock(&gt->mutex);
+	list_add(&timeline->link, &gt->list);
+	mutex_unlock(&gt->mutex);
 
 	/* Called during early_init before we know how many engines there are */
 
@@ -40,6 +43,17 @@ void i915_timeline_init(struct drm_i915_private *i915,
 	i915_syncmap_init(&timeline->sync);
 }
 
+void i915_timelines_init(struct drm_i915_private *i915)
+{
+	struct i915_gt_timelines *gt = &i915->gt.timelines;
+
+	mutex_init(&gt->mutex);
+	INIT_LIST_HEAD(&gt->list);
+
+	/* via i915_gem_wait_for_idle() */
+	i915_gem_shrinker_taints_mutex(i915, &gt->mutex);
+}
+
 /**
  * i915_timelines_park - called when the driver idles
  * @i915: the drm_i915_private device
@@ -52,11 +66,11 @@ void i915_timeline_init(struct drm_i915_private *i915,
  */
 void i915_timelines_park(struct drm_i915_private *i915)
 {
+	struct i915_gt_timelines *gt = &i915->gt.timelines;
 	struct i915_timeline *timeline;
 
-	lockdep_assert_held(&i915->drm.struct_mutex);
-
-	list_for_each_entry(timeline, &i915->gt.timelines, link) {
+	mutex_lock(&gt->mutex);
+	list_for_each_entry(timeline, &gt->list, link) {
 		/*
 		 * All known fences are completed so we can scrap
 		 * the current sync point tracking and start afresh,
@@ -65,16 +79,21 @@ void i915_timelines_park(struct drm_i915_private *i915)
 		 */
 		i915_syncmap_free(&timeline->sync);
 	}
+	mutex_unlock(&gt->mutex);
 }
 
 void i915_timeline_fini(struct i915_timeline *timeline)
 {
+	struct i915_gt_timelines *gt = &timeline->i915->gt.timelines;
+
 	GEM_BUG_ON(!list_empty(&timeline->requests));
 	GEM_BUG_ON(i915_gem_active_isset(&timeline->barrier));
 
 	i915_syncmap_free(&timeline->sync);
 
+	mutex_lock(&gt->mutex);
 	list_del(&timeline->link);
+	mutex_unlock(&gt->mutex);
 }
 
 struct i915_timeline *
@@ -120,6 +139,15 @@ void __i915_timeline_free(struct kref *kref)
 	kfree(timeline);
 }
 
+void i915_timelines_fini(struct drm_i915_private *i915)
+{
+	struct i915_gt_timelines *gt = &i915->gt.timelines;
+
+	GEM_BUG_ON(!list_empty(&gt->list));
+
+	mutex_destroy(&gt->mutex);
+}
+
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 #include "selftests/mock_timeline.c"
 #include "selftests/i915_timeline.c"
diff --git a/drivers/gpu/drm/i915/i915_timeline.h b/drivers/gpu/drm/i915/i915_timeline.h
index c8d7117bb205..b0df513b6ca3 100644
--- a/drivers/gpu/drm/i915/i915_timeline.h
+++ b/drivers/gpu/drm/i915/i915_timeline.h
@@ -76,6 +76,7 @@ struct i915_timeline {
 
 	struct list_head link;
 	const char *name;
+	struct drm_i915_private *i915;
 
 	struct kref kref;
 };
@@ -144,7 +145,9 @@ static inline bool i915_timeline_sync_is_later(struct i915_timeline *tl,
 	return __i915_timeline_sync_is_later(tl, fence->context, fence->seqno);
 }
 
+void i915_timelines_init(struct drm_i915_private *i915);
 void i915_timelines_park(struct drm_i915_private *i915);
+void i915_timelines_fini(struct drm_i915_private *i915);
 
 /**
  * i915_timeline_set_barrier - orders submission between different timelines
diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
index 8ab5a2688a0c..14ae46fda49f 100644
--- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
+++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
@@ -68,13 +68,14 @@ static void mock_device_release(struct drm_device *dev)
 	i915_gem_contexts_fini(i915);
 	mutex_unlock(&i915->drm.struct_mutex);
 
+	i915_timelines_fini(i915);
+
 	drain_workqueue(i915->wq);
 	i915_gem_drain_freed_objects(i915);
 
 	mutex_lock(&i915->drm.struct_mutex);
 	mock_fini_ggtt(&i915->ggtt);
 	mutex_unlock(&i915->drm.struct_mutex);
-	WARN_ON(!list_empty(&i915->gt.timelines));
 
 	destroy_workqueue(i915->wq);
 
@@ -226,7 +227,8 @@ struct drm_i915_private *mock_gem_device(void)
 	if (!i915->priorities)
 		goto err_dependencies;
 
-	INIT_LIST_HEAD(&i915->gt.timelines);
+	i915_timelines_init(i915);
+
 	INIT_LIST_HEAD(&i915->gt.active_rings);
 	INIT_LIST_HEAD(&i915->gt.closed_vma);
 
@@ -253,6 +255,7 @@ struct drm_i915_private *mock_gem_device(void)
 	i915_gem_contexts_fini(i915);
 err_unlock:
 	mutex_unlock(&i915->drm.struct_mutex);
+	i915_timelines_fini(i915);
 	kmem_cache_destroy(i915->priorities);
 err_dependencies:
 	kmem_cache_destroy(i915->dependencies);
diff --git a/drivers/gpu/drm/i915/selftests/mock_timeline.c b/drivers/gpu/drm/i915/selftests/mock_timeline.c
index 408113c1cc63..e5659aaa856d 100644
--- a/drivers/gpu/drm/i915/selftests/mock_timeline.c
+++ b/drivers/gpu/drm/i915/selftests/mock_timeline.c
@@ -10,6 +10,7 @@
 
 void mock_timeline_init(struct i915_timeline *timeline, u64 context)
 {
+	timeline->i915 = NULL;
 	timeline->fence_context = context;
 
 	spin_lock_init(&timeline->lock);
@@ -25,5 +26,5 @@ void mock_timeline_init(struct i915_timeline *timeline, u64 context)
 
 void mock_timeline_fini(struct i915_timeline *timeline)
 {
-	i915_timeline_fini(timeline);
+	i915_syncmap_free(&timeline->sync);
 }
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 12/28] drm/i915: Introduce concept of per-timeline (context) HWSP
  2019-01-28  1:02 [PATCH 01/28] drm/i915: Wait for a moment before forcibly resetting the device Chris Wilson
                   ` (9 preceding siblings ...)
  2019-01-28  1:02 ` [PATCH 11/28] drm/i915: Move list of timelines under its own lock Chris Wilson
@ 2019-01-28  1:02 ` Chris Wilson
  2019-01-28  1:02 ` [PATCH 13/28] drm/i915: Enlarge vma->pin_count Chris Wilson
                   ` (20 subsequent siblings)
  31 siblings, 0 replies; 39+ messages in thread
From: Chris Wilson @ 2019-01-28  1:02 UTC (permalink / raw)
  To: intel-gfx

Supplement the per-engine HWSP with a per-timeline HWSP. That is a
per-request pointer through which we can check a local seqno,
abstracting away the presumption of a global seqno. In this first step,
we point each request back into the engine's HWSP so everything
continues to work with the global timeline.
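
As a simplified illustration of where this leads (rq_is_complete() is
a hypothetical helper, matching the hwsp_seqno() kerneldoc below; the
real __i915_request_completed() still compares against the global
seqno at this point in the series):

	static bool rq_is_complete(const struct i915_request *rq)
	{
		/* has the GPU written rq's breadcrumb, or later? */
		return i915_seqno_passed(hwsp_seqno(rq), rq->fence.seqno);
	}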

v2: s/i915_request_hwsp/hwsp_seqno/ to emphasise that this is the current
HW value and that we are accessing it via i915_request merely as a
convenience.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_request.c | 16 ++++++----
 drivers/gpu/drm/i915/i915_request.h | 45 ++++++++++++++++++++++++-----
 drivers/gpu/drm/i915/intel_lrc.c    |  9 ++++--
 3 files changed, 55 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 894b32258340..79eb1957cf99 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -182,10 +182,11 @@ static void free_capture_list(struct i915_request *request)
 static void __retire_engine_request(struct intel_engine_cs *engine,
 				    struct i915_request *rq)
 {
-	GEM_TRACE("%s(%s) fence %llx:%lld, global=%d, current %d\n",
+	GEM_TRACE("%s(%s) fence %llx:%lld, global=%d, current %d:%d\n",
 		  __func__, engine->name,
 		  rq->fence.context, rq->fence.seqno,
 		  rq->global_seqno,
+		  hwsp_seqno(rq),
 		  intel_engine_get_seqno(engine));
 
 	GEM_BUG_ON(!i915_request_completed(rq));
@@ -244,10 +245,11 @@ static void i915_request_retire(struct i915_request *request)
 {
 	struct i915_gem_active *active, *next;
 
-	GEM_TRACE("%s fence %llx:%lld, global=%d, current %d\n",
+	GEM_TRACE("%s fence %llx:%lld, global=%d, current %d:%d\n",
 		  request->engine->name,
 		  request->fence.context, request->fence.seqno,
 		  request->global_seqno,
+		  hwsp_seqno(request),
 		  intel_engine_get_seqno(request->engine));
 
 	lockdep_assert_held(&request->i915->drm.struct_mutex);
@@ -307,10 +309,11 @@ void i915_request_retire_upto(struct i915_request *rq)
 	struct intel_ring *ring = rq->ring;
 	struct i915_request *tmp;
 
-	GEM_TRACE("%s fence %llx:%lld, global=%d, current %d\n",
+	GEM_TRACE("%s fence %llx:%lld, global=%d, current %d:%d\n",
 		  rq->engine->name,
 		  rq->fence.context, rq->fence.seqno,
 		  rq->global_seqno,
+		  hwsp_seqno(rq),
 		  intel_engine_get_seqno(rq->engine));
 
 	lockdep_assert_held(&rq->i915->drm.struct_mutex);
@@ -355,10 +358,11 @@ void __i915_request_submit(struct i915_request *request)
 	struct intel_engine_cs *engine = request->engine;
 	u32 seqno;
 
-	GEM_TRACE("%s fence %llx:%lld -> global=%d, current %d\n",
+	GEM_TRACE("%s fence %llx:%lld -> global=%d, current %d:%d\n",
 		  engine->name,
 		  request->fence.context, request->fence.seqno,
 		  engine->timeline.seqno + 1,
+		  hwsp_seqno(request),
 		  intel_engine_get_seqno(engine));
 
 	GEM_BUG_ON(!irqs_disabled());
@@ -405,10 +409,11 @@ void __i915_request_unsubmit(struct i915_request *request)
 {
 	struct intel_engine_cs *engine = request->engine;
 
-	GEM_TRACE("%s fence %llx:%lld <- global=%d, current %d\n",
+	GEM_TRACE("%s fence %llx:%lld <- global=%d, current %d:%d\n",
 		  engine->name,
 		  request->fence.context, request->fence.seqno,
 		  request->global_seqno,
+		  hwsp_seqno(request),
 		  intel_engine_get_seqno(engine));
 
 	GEM_BUG_ON(!irqs_disabled());
@@ -629,6 +634,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	rq->ring = ce->ring;
 	rq->timeline = ce->ring->timeline;
 	GEM_BUG_ON(rq->timeline == &engine->timeline);
+	rq->hwsp_seqno = &engine->status_page.addr[I915_GEM_HWS_INDEX];
 
 	spin_lock_init(&rq->lock);
 	dma_fence_init(&rq->fence,
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index c0f084ca4f29..ade010fe6e26 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -130,6 +130,13 @@ struct i915_request {
 	struct i915_sched_node sched;
 	struct i915_dependency dep;
 
+	/*
+	 * A convenience pointer to the current breadcrumb value stored in
+	 * the HW status page (or our timeline's local equivalent). The full
+	 * path would be rq->hw_context->ring->timeline->hwsp_seqno.
+	 */
+	const u32 *hwsp_seqno;
+
 	/**
 	 * GEM sequence number associated with this request on the
 	 * global execution timeline. It is zero when the request is not
@@ -285,11 +292,6 @@ static inline bool i915_request_signaled(const struct i915_request *rq)
 	return test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &rq->fence.flags);
 }
 
-static inline bool intel_engine_has_started(struct intel_engine_cs *engine,
-					    u32 seqno);
-static inline bool intel_engine_has_completed(struct intel_engine_cs *engine,
-					      u32 seqno);
-
 /**
  * Returns true if seq1 is later than seq2.
  */
@@ -298,6 +300,35 @@ static inline bool i915_seqno_passed(u32 seq1, u32 seq2)
 	return (s32)(seq1 - seq2) >= 0;
 }
 
+static inline u32 __hwsp_seqno(const struct i915_request *rq)
+{
+	return READ_ONCE(*rq->hwsp_seqno);
+}
+
+/**
+ * hwsp_seqno - the current breadcrumb value in the HW status page
+ * @rq: the request, to chase the relevant HW status page
+ *
+ * The emphasis in naming here is that hwsp_seqno() is not a property of the
+ * request, but an indication of the current HW state (associated with this
+ * request). Its value will change as the GPU executes more requests.
+ *
+ * Returns the current breadcrumb value in the associated HW status page (or
+ * the local timeline's equivalent) for this request. The request itself
+ * has the associated breadcrumb value of rq->fence.seqno; when the HW
+ * status page has that breadcrumb or later, this request is complete.
+ */
+static inline u32 hwsp_seqno(const struct i915_request *rq)
+{
+	u32 seqno;
+
+	rcu_read_lock(); /* the HWSP may be freed at runtime */
+	seqno = __hwsp_seqno(rq);
+	rcu_read_unlock();
+
+	return seqno;
+}
+
 /**
  * i915_request_started - check if the request has begun being executed
  * @rq: the request
@@ -315,14 +346,14 @@ static inline bool i915_request_started(const struct i915_request *rq)
 	if (!seqno) /* not yet submitted to HW */
 		return false;
 
-	return intel_engine_has_started(rq->engine, seqno);
+	return i915_seqno_passed(hwsp_seqno(rq), seqno - 1);
 }
 
 static inline bool
 __i915_request_completed(const struct i915_request *rq, u32 seqno)
 {
 	GEM_BUG_ON(!seqno);
-	return intel_engine_has_completed(rq->engine, seqno) &&
+	return i915_seqno_passed(hwsp_seqno(rq), seqno) &&
 		seqno == i915_request_global_seqno(rq);
 }
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index df331d71a89a..55b9d4581b02 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -554,11 +554,12 @@ static void execlists_submit_ports(struct intel_engine_cs *engine)
 			desc = execlists_update_context(rq);
 			GEM_DEBUG_EXEC(port[n].context_id = upper_32_bits(desc));
 
-			GEM_TRACE("%s in[%d]:  ctx=%d.%d, global=%d (fence %llx:%lld) (current %d), prio=%d\n",
+			GEM_TRACE("%s in[%d]:  ctx=%d.%d, global=%d (fence %llx:%lld) (current %d:%d), prio=%d\n",
 				  engine->name, n,
 				  port[n].context_id, count,
 				  rq->global_seqno,
 				  rq->fence.context, rq->fence.seqno,
+				  hwsp_seqno(rq),
 				  intel_engine_get_seqno(engine),
 				  rq_prio(rq));
 		} else {
@@ -851,11 +852,12 @@ execlists_cancel_port_requests(struct intel_engine_execlists * const execlists)
 	while (num_ports-- && port_isset(port)) {
 		struct i915_request *rq = port_request(port);
 
-		GEM_TRACE("%s:port%u global=%d (fence %llx:%lld), (current %d)\n",
+		GEM_TRACE("%s:port%u global=%d (fence %llx:%lld), (current %d:%d)\n",
 			  rq->engine->name,
 			  (unsigned int)(port - execlists->port),
 			  rq->global_seqno,
 			  rq->fence.context, rq->fence.seqno,
+			  hwsp_seqno(rq),
 			  intel_engine_get_seqno(rq->engine));
 
 		GEM_BUG_ON(!execlists->active);
@@ -1081,12 +1083,13 @@ static void process_csb(struct intel_engine_cs *engine)
 						EXECLISTS_ACTIVE_USER));
 
 		rq = port_unpack(port, &count);
-		GEM_TRACE("%s out[0]: ctx=%d.%d, global=%d (fence %llx:%lld) (current %d), prio=%d\n",
+		GEM_TRACE("%s out[0]: ctx=%d.%d, global=%d (fence %llx:%lld) (current %d:%d), prio=%d\n",
 			  engine->name,
 			  port->context_id, count,
 			  rq ? rq->global_seqno : 0,
 			  rq ? rq->fence.context : 0,
 			  rq ? rq->fence.seqno : 0,
+			  rq ? hwsp_seqno(rq) : 0,
 			  intel_engine_get_seqno(engine),
 			  rq ? rq_prio(rq) : 0);
 
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 13/28] drm/i915: Enlarge vma->pin_count
  2019-01-28  1:02 [PATCH 01/28] drm/i915: Wait for a moment before forcibly resetting the device Chris Wilson
                   ` (10 preceding siblings ...)
  2019-01-28  1:02 ` [PATCH 12/28] drm/i915: Introduce concept of per-timeline (context) HWSP Chris Wilson
@ 2019-01-28  1:02 ` Chris Wilson
  2019-01-28  1:02 ` [PATCH 14/28] drm/i915: Allocate a status page for each timeline Chris Wilson
                   ` (19 subsequent siblings)
  31 siblings, 0 replies; 39+ messages in thread
From: Chris Wilson @ 2019-01-28  1:02 UTC (permalink / raw)
  To: intel-gfx

Previously we only accommodated having a vma pinned by a small number of
users, with the maximum being pinned for use by the display engine. As
such, we used a small bitfield only large enough to allow the vma to
be pinned twice (for back/front buffers) in each scanout plane. Keeping
the maximum permissible pin_count small allows us to quickly catch a
potential leak. However, as we want to split a 4096B page into 64
different cachelines and pin each cacheline for use by a different
timeline, we will exceed the current maximum permissible vma->pin_count
and so time has come to enlarge it.
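
(Illustrative sketch only, with invented names: the point of keeping the
pin count in the low bits of the flags word is that a single increment
both takes a pin and, on overflow, carries into a dedicated bit that a
plain mask test can later catch:

  #define EXAMPLE_PIN_MASK      0xff    /* bits 0-7 hold the pin count */
  #define EXAMPLE_PIN_OVERFLOW  BIT(8)  /* the carry out of the count */

  static bool example_vma_pin(unsigned long *flags)
  {
          (*flags)++;     /* the 256th simultaneous pin carries into bit 8 */
          return !(*flags & EXAMPLE_PIN_OVERFLOW);
  }

The real accounting lives in vma->flags, rearranged below.)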

Whilst we are here, try to pull together the similar bits:

Address/layout specification:
 - bias, mappable, zone_4g: address limit specifiers
 - fixed: address override, limits still apply though
 - high: not strictly an address limit, but an address direction to search

Search controls:
 - nonblock, nonfault, noevict

v2: Rewrite the guideline comment on bit consumption.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: John Harrison <john.C.Harrison@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.h | 26 ++++++++---------
 drivers/gpu/drm/i915/i915_vma.h     | 45 +++++++++++++++++++----------
 2 files changed, 42 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index bd679c8c56dd..03ade71b8d9a 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -642,19 +642,19 @@ int i915_gem_gtt_insert(struct i915_address_space *vm,
 
 /* Flags used by pin/bind&friends. */
 #define PIN_NONBLOCK		BIT_ULL(0)
-#define PIN_MAPPABLE		BIT_ULL(1)
-#define PIN_ZONE_4G		BIT_ULL(2)
-#define PIN_NONFAULT		BIT_ULL(3)
-#define PIN_NOEVICT		BIT_ULL(4)
-
-#define PIN_MBZ			BIT_ULL(5) /* I915_VMA_PIN_OVERFLOW */
-#define PIN_GLOBAL		BIT_ULL(6) /* I915_VMA_GLOBAL_BIND */
-#define PIN_USER		BIT_ULL(7) /* I915_VMA_LOCAL_BIND */
-#define PIN_UPDATE		BIT_ULL(8)
-
-#define PIN_HIGH		BIT_ULL(9)
-#define PIN_OFFSET_BIAS		BIT_ULL(10)
-#define PIN_OFFSET_FIXED	BIT_ULL(11)
+#define PIN_NONFAULT		BIT_ULL(1)
+#define PIN_NOEVICT		BIT_ULL(2)
+#define PIN_MAPPABLE		BIT_ULL(3)
+#define PIN_ZONE_4G		BIT_ULL(4)
+#define PIN_HIGH		BIT_ULL(5)
+#define PIN_OFFSET_BIAS		BIT_ULL(6)
+#define PIN_OFFSET_FIXED	BIT_ULL(7)
+
+#define PIN_MBZ			BIT_ULL(8) /* I915_VMA_PIN_OVERFLOW */
+#define PIN_GLOBAL		BIT_ULL(9) /* I915_VMA_GLOBAL_BIND */
+#define PIN_USER		BIT_ULL(10) /* I915_VMA_LOCAL_BIND */
+#define PIN_UPDATE		BIT_ULL(11)
+
 #define PIN_OFFSET_MASK		(-I915_GTT_PAGE_SIZE)
 
 #endif
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index 7252abc73d3e..5793abe509a2 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -71,29 +71,42 @@ struct i915_vma {
 	unsigned int open_count;
 	unsigned long flags;
 	/**
-	 * How many users have pinned this object in GTT space. The following
-	 * users can each hold at most one reference: pwrite/pread, execbuffer
-	 * (objects are not allowed multiple times for the same batchbuffer),
-	 * and the framebuffer code. When switching/pageflipping, the
-	 * framebuffer code has at most two buffers pinned per crtc.
+	 * How many users have pinned this object in GTT space.
 	 *
-	 * In the worst case this is 1 + 1 + 1 + 2*2 = 7. That would fit into 3
-	 * bits with absolutely no headroom. So use 4 bits.
+	 * This is a tightly bound, fairly small number of users, so we
+	 * stuff inside the flags field so that we can both check for overflow
+	 * stuff it inside the flags field so that we can both check for overflow
+	 * pinning the vma.
+	 *
+	 * The worst case display setup would have the same vma pinned for
+	 * use on each plane on each crtc, while also building the next atomic
+	 * state and holding a pin for the length of the cleanup queue. In the
+	 * future, the flip queue may be increased from 1.
+	 * Estimated worst case: 3 [qlen] * 4 [max crtcs] * 7 [max planes] = 84
+	 *
+	 * For GEM, the number of concurrent users for pwrite/pread is
+	 * unbounded. For execbuffer, it is currently one but will in future
+	 * be extended to allow multiple clients to pin vma concurrently.
+	 *
+	 * We also use suballocated pages, with each suballocation claiming
+	 * its own pin on the shared vma. At present, this is limited to
+	 * exclusive cachelines of a single page, so a maximum of 64 possible
+	 * users.
 	 */
-#define I915_VMA_PIN_MASK 0xf
-#define I915_VMA_PIN_OVERFLOW	BIT(5)
+#define I915_VMA_PIN_MASK 0xff
+#define I915_VMA_PIN_OVERFLOW	BIT(8)
 
 	/** Flags and address space this VMA is bound to */
-#define I915_VMA_GLOBAL_BIND	BIT(6)
-#define I915_VMA_LOCAL_BIND	BIT(7)
+#define I915_VMA_GLOBAL_BIND	BIT(9)
+#define I915_VMA_LOCAL_BIND	BIT(10)
 #define I915_VMA_BIND_MASK (I915_VMA_GLOBAL_BIND | I915_VMA_LOCAL_BIND | I915_VMA_PIN_OVERFLOW)
 
-#define I915_VMA_GGTT		BIT(8)
-#define I915_VMA_CAN_FENCE	BIT(9)
-#define I915_VMA_CLOSED		BIT(10)
-#define I915_VMA_USERFAULT_BIT	11
+#define I915_VMA_GGTT		BIT(11)
+#define I915_VMA_CAN_FENCE	BIT(12)
+#define I915_VMA_CLOSED		BIT(13)
+#define I915_VMA_USERFAULT_BIT	14
 #define I915_VMA_USERFAULT	BIT(I915_VMA_USERFAULT_BIT)
-#define I915_VMA_GGTT_WRITE	BIT(12)
+#define I915_VMA_GGTT_WRITE	BIT(15)
 
 	unsigned int active_count;
 	struct rb_root active;
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 14/28] drm/i915: Allocate a status page for each timeline
  2019-01-28  1:02 [PATCH 01/28] drm/i915: Wait for a moment before forcibly resetting the device Chris Wilson
                   ` (11 preceding siblings ...)
  2019-01-28  1:02 ` [PATCH 13/28] drm/i915: Enlarge vma->pin_count Chris Wilson
@ 2019-01-28  1:02 ` Chris Wilson
  2019-01-28  1:02 ` [PATCH 15/28] drm/i915: Share per-timeline HWSP using a slab suballocator Chris Wilson
                   ` (18 subsequent siblings)
  31 siblings, 0 replies; 39+ messages in thread
From: Chris Wilson @ 2019-01-28  1:02 UTC (permalink / raw)
  To: intel-gfx

Allocate a page for use as a status page by a group of timelines; as we
only need a dword of storage for each (rounded up to a cacheline for
safety), we can pack multiple timelines into the same page. Each timeline
will then be able to track its own HW seqno.
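
(A worked example of the layout, derived from the #defines in the
intel_ringbuffer.h hunk below: the shared seqno slot sits at dword index
0x40, i.e. byte offset 0x40 * sizeof(u32) = 0x100, and the scratch slot
moves from dword 0x40 to 0x80 (byte 0x200) so that the seqno cacheline
[0x100, 0x140) remains exclusive.)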

v2: Reuse the common per-engine HWSP for the solitary ringbuffer
timeline, so that we do not have to emit (using per-gen specialised
vfuncs) the breadcrumb into the distinct timeline HWSP and instead can
keep on using the common MI_STORE_DWORD_INDEX. However, to maintain the
sleight-of-hand for the global/per-context seqno switchover, we will
store both temporarily (and so use a custom offset for the shared timeline
HWSP until the switchover).

v3: Keep things simple and allocate a page for each timeline, page
sharing comes next.

v4: I was caught repeating the same MI_STORE_DWORD_IMM over and over
again in selftests.

v5: And caught red-handed copying create timeline + check.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_timeline.c          | 121 ++++++-
 drivers/gpu/drm/i915/i915_timeline.h          |  21 +-
 drivers/gpu/drm/i915/intel_engine_cs.c        |  76 ++--
 drivers/gpu/drm/i915/intel_lrc.c              |  22 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c       |  10 +-
 drivers/gpu/drm/i915/intel_ringbuffer.h       |   6 +-
 .../drm/i915/selftests/i915_live_selftests.h  |   1 +
 .../drm/i915/selftests/i915_mock_selftests.h  |   2 +-
 .../gpu/drm/i915/selftests/i915_timeline.c    | 326 +++++++++++++++++-
 drivers/gpu/drm/i915/selftests/mock_engine.c  |  14 +-
 10 files changed, 543 insertions(+), 56 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_timeline.c b/drivers/gpu/drm/i915/i915_timeline.c
index 79ab03a0fdfe..5b5f9dacfce9 100644
--- a/drivers/gpu/drm/i915/i915_timeline.c
+++ b/drivers/gpu/drm/i915/i915_timeline.c
@@ -9,28 +9,78 @@
 #include "i915_timeline.h"
 #include "i915_syncmap.h"
 
-void i915_timeline_init(struct drm_i915_private *i915,
-			struct i915_timeline *timeline,
-			const char *name)
+static struct i915_vma *__hwsp_alloc(struct drm_i915_private *i915)
+{
+	struct drm_i915_gem_object *obj;
+	struct i915_vma *vma;
+
+	obj = i915_gem_object_create_internal(i915, PAGE_SIZE);
+	if (IS_ERR(obj))
+		return ERR_CAST(obj);
+
+	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
+
+	vma = i915_vma_instance(obj, &i915->ggtt.vm, NULL);
+	if (IS_ERR(vma))
+		i915_gem_object_put(obj);
+
+	return vma;
+}
+
+static int hwsp_alloc(struct i915_timeline *timeline)
+{
+	struct i915_vma *vma;
+
+	vma = __hwsp_alloc(timeline->i915);
+	if (IS_ERR(vma))
+		return PTR_ERR(vma);
+
+	timeline->hwsp_ggtt = vma;
+	timeline->hwsp_offset = 0;
+
+	return 0;
+}
+
+int i915_timeline_init(struct drm_i915_private *i915,
+		       struct i915_timeline *timeline,
+		       const char *name,
+		       struct i915_vma *global_hwsp)
 {
 	struct i915_gt_timelines *gt = &i915->gt.timelines;
+	void *vaddr;
+	int err;
 
 	/*
 	 * Ideally we want a set of engines on a single leaf as we expect
 	 * to mostly be tracking synchronisation between engines. It is not
 	 * a huge issue if this is not the case, but we may want to mitigate
 	 * any page crossing penalties if they become an issue.
+	 *
+	 * Called during early_init before we know how many engines there are.
 	 */
 	BUILD_BUG_ON(KSYNCMAP < I915_NUM_ENGINES);
 
 	timeline->i915 = i915;
 	timeline->name = name;
+	timeline->pin_count = 0;
 
-	mutex_lock(&gt->mutex);
-	list_add(&timeline->link, &gt->list);
-	mutex_unlock(&gt->mutex);
+	if (global_hwsp) {
+		timeline->hwsp_ggtt = i915_vma_get(global_hwsp);
+		timeline->hwsp_offset = I915_GEM_HWS_SEQNO_ADDR;
+	} else {
+		err = hwsp_alloc(timeline);
+		if (err)
+			return err;
+	}
+
+	vaddr = i915_gem_object_pin_map(timeline->hwsp_ggtt->obj, I915_MAP_WB);
+	if (IS_ERR(vaddr)) {
+		i915_vma_put(timeline->hwsp_ggtt);
+		return PTR_ERR(vaddr);
+	}
 
-	/* Called during early_init before we know how many engines there are */
+	timeline->hwsp_seqno =
+		memset(vaddr + timeline->hwsp_offset, 0, CACHELINE_BYTES);
 
 	timeline->fence_context = dma_fence_context_alloc(1);
 
@@ -41,6 +91,12 @@ void i915_timeline_init(struct drm_i915_private *i915,
 	INIT_LIST_HEAD(&timeline->requests);
 
 	i915_syncmap_init(&timeline->sync);
+
+	mutex_lock(&gt->mutex);
+	list_add(&timeline->link, &gt->list);
+	mutex_unlock(&gt->mutex);
+
+	return 0;
 }
 
 void i915_timelines_init(struct drm_i915_private *i915)
@@ -86,6 +142,7 @@ void i915_timeline_fini(struct i915_timeline *timeline)
 {
 	struct i915_gt_timelines *gt = &timeline->i915->gt.timelines;
 
+	GEM_BUG_ON(timeline->pin_count);
 	GEM_BUG_ON(!list_empty(&timeline->requests));
 	GEM_BUG_ON(i915_gem_active_isset(&timeline->barrier));
 
@@ -94,18 +151,29 @@ void i915_timeline_fini(struct i915_timeline *timeline)
 	mutex_lock(&gt->mutex);
 	list_del(&timeline->link);
 	mutex_unlock(&gt->mutex);
+
+	i915_gem_object_unpin_map(timeline->hwsp_ggtt->obj);
+	i915_vma_put(timeline->hwsp_ggtt);
 }
 
 struct i915_timeline *
-i915_timeline_create(struct drm_i915_private *i915, const char *name)
+i915_timeline_create(struct drm_i915_private *i915,
+		     const char *name,
+		     struct i915_vma *global_hwsp)
 {
 	struct i915_timeline *timeline;
+	int err;
 
 	timeline = kzalloc(sizeof(*timeline), GFP_KERNEL);
 	if (!timeline)
 		return ERR_PTR(-ENOMEM);
 
-	i915_timeline_init(i915, timeline, name);
+	err = i915_timeline_init(i915, timeline, name, global_hwsp);
+	if (err) {
+		kfree(timeline);
+		return ERR_PTR(err);
+	}
+
 	kref_init(&timeline->kref);
 
 	return timeline;
@@ -130,6 +198,41 @@ int i915_timeline_set_barrier(struct i915_timeline *tl, struct i915_request *rq)
 	return 0;
 }
 
+int i915_timeline_pin(struct i915_timeline *tl)
+{
+	int err;
+
+	if (tl->pin_count++)
+		return 0;
+	GEM_BUG_ON(!tl->pin_count);
+
+	err = i915_vma_pin(tl->hwsp_ggtt, 0, 0, PIN_GLOBAL | PIN_HIGH);
+	if (err)
+		goto unpin;
+
+	return 0;
+
+unpin:
+	tl->pin_count = 0;
+	return err;
+}
+
+void i915_timeline_unpin(struct i915_timeline *tl)
+{
+	GEM_BUG_ON(!tl->pin_count);
+	if (--tl->pin_count)
+		return;
+
+	/*
+	 * Since this timeline is idle, all barriers upon which we were waiting
+	 * must also be complete and so we can discard the last used barriers
+	 * without loss of information.
+	 */
+	i915_syncmap_free(&tl->sync);
+
+	__i915_vma_unpin(tl->hwsp_ggtt);
+}
+
 void __i915_timeline_free(struct kref *kref)
 {
 	struct i915_timeline *timeline =
diff --git a/drivers/gpu/drm/i915/i915_timeline.h b/drivers/gpu/drm/i915/i915_timeline.h
index b0df513b6ca3..8e29182a2584 100644
--- a/drivers/gpu/drm/i915/i915_timeline.h
+++ b/drivers/gpu/drm/i915/i915_timeline.h
@@ -32,6 +32,8 @@
 #include "i915_syncmap.h"
 #include "i915_utils.h"
 
+struct i915_vma;
+
 struct i915_timeline {
 	u64 fence_context;
 	u32 seqno;
@@ -40,6 +42,11 @@ struct i915_timeline {
 #define TIMELINE_CLIENT 0 /* default subclass */
 #define TIMELINE_ENGINE 1
 
+	unsigned int pin_count;
+	const u32 *hwsp_seqno;
+	struct i915_vma *hwsp_ggtt;
+	u32 hwsp_offset;
+
 	/**
 	 * List of breadcrumbs associated with GPU requests currently
 	 * outstanding.
@@ -81,9 +88,10 @@ struct i915_timeline {
 	struct kref kref;
 };
 
-void i915_timeline_init(struct drm_i915_private *i915,
-			struct i915_timeline *tl,
-			const char *name);
+int i915_timeline_init(struct drm_i915_private *i915,
+		       struct i915_timeline *tl,
+		       const char *name,
+		       struct i915_vma *hwsp);
 void i915_timeline_fini(struct i915_timeline *tl);
 
 static inline void
@@ -106,7 +114,9 @@ i915_timeline_set_subclass(struct i915_timeline *timeline,
 }
 
 struct i915_timeline *
-i915_timeline_create(struct drm_i915_private *i915, const char *name);
+i915_timeline_create(struct drm_i915_private *i915,
+		     const char *name,
+		     struct i915_vma *global_hwsp);
 
 static inline struct i915_timeline *
 i915_timeline_get(struct i915_timeline *timeline)
@@ -145,6 +155,9 @@ static inline bool i915_timeline_sync_is_later(struct i915_timeline *tl,
 	return __i915_timeline_sync_is_later(tl, fence->context, fence->seqno);
 }
 
+int i915_timeline_pin(struct i915_timeline *tl);
+void i915_timeline_unpin(struct i915_timeline *tl);
+
 void i915_timelines_init(struct drm_i915_private *i915);
 void i915_timelines_park(struct drm_i915_private *i915);
 void i915_timelines_fini(struct drm_i915_private *i915);
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 7c57949ad586..e961bc5017a9 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -484,26 +484,6 @@ static void intel_engine_init_execlist(struct intel_engine_cs *engine)
 	execlists->queue = RB_ROOT_CACHED;
 }
 
-/**
- * intel_engines_setup_common - setup engine state not requiring hw access
- * @engine: Engine to setup.
- *
- * Initializes @engine@ structure members shared between legacy and execlists
- * submission modes which do not require hardware access.
- *
- * Typically done early in the submission mode specific engine setup stage.
- */
-void intel_engine_setup_common(struct intel_engine_cs *engine)
-{
-	i915_timeline_init(engine->i915, &engine->timeline, engine->name);
-	i915_timeline_set_subclass(&engine->timeline, TIMELINE_ENGINE);
-
-	intel_engine_init_execlist(engine);
-	intel_engine_init_hangcheck(engine);
-	intel_engine_init_batch_pool(engine);
-	intel_engine_init_cmd_parser(engine);
-}
-
 static void cleanup_status_page(struct intel_engine_cs *engine)
 {
 	struct i915_vma *vma;
@@ -601,6 +581,44 @@ static int init_status_page(struct intel_engine_cs *engine)
 	return ret;
 }
 
+/**
+ * intel_engines_setup_common - setup engine state not requiring hw access
+ * @engine: Engine to setup.
+ *
+ * Initializes @engine@ structure members shared between legacy and execlists
+ * submission modes which do not require hardware access.
+ *
+ * Typically done early in the submission mode specific engine setup stage.
+ */
+int intel_engine_setup_common(struct intel_engine_cs *engine)
+{
+	int err;
+
+	err = init_status_page(engine);
+	if (err)
+		return err;
+
+	err = i915_timeline_init(engine->i915,
+				 &engine->timeline,
+				 engine->name,
+				 engine->status_page.vma);
+	if (err)
+		goto err_hwsp;
+
+	i915_timeline_set_subclass(&engine->timeline, TIMELINE_ENGINE);
+
+	intel_engine_init_execlist(engine);
+	intel_engine_init_hangcheck(engine);
+	intel_engine_init_batch_pool(engine);
+	intel_engine_init_cmd_parser(engine);
+
+	return 0;
+
+err_hwsp:
+	cleanup_status_page(engine);
+	return err;
+}
+
 static void __intel_context_unpin(struct i915_gem_context *ctx,
 				  struct intel_engine_cs *engine)
 {
@@ -617,7 +635,7 @@ struct measure_breadcrumb {
 static int measure_breadcrumb_dw(struct intel_engine_cs *engine)
 {
 	struct measure_breadcrumb *frame;
-	unsigned int dw;
+	int dw = -ENOMEM;
 
 	GEM_BUG_ON(!engine->i915->gt.scratch);
 
@@ -625,7 +643,10 @@ static int measure_breadcrumb_dw(struct intel_engine_cs *engine)
 	if (!frame)
 		return -ENOMEM;
 
-	i915_timeline_init(engine->i915, &frame->timeline, "measure");
+	if (i915_timeline_init(engine->i915,
+			       &frame->timeline, "measure",
+			       engine->status_page.vma))
+		goto out_frame;
 
 	INIT_LIST_HEAD(&frame->ring.request_list);
 	frame->ring.timeline = &frame->timeline;
@@ -642,8 +663,9 @@ static int measure_breadcrumb_dw(struct intel_engine_cs *engine)
 	dw = engine->emit_breadcrumb(&frame->rq, frame->cs) - frame->cs;
 
 	i915_timeline_fini(&frame->timeline);
-	kfree(frame);
 
+out_frame:
+	kfree(frame);
 	return dw;
 }
 
@@ -693,20 +715,14 @@ int intel_engine_init_common(struct intel_engine_cs *engine)
 	if (ret)
 		goto err_unpin_preempt;
 
-	ret = init_status_page(engine);
-	if (ret)
-		goto err_breadcrumbs;
-
 	ret = measure_breadcrumb_dw(engine);
 	if (ret < 0)
-		goto err_status_page;
+		goto err_breadcrumbs;
 
 	engine->emit_breadcrumb_dw = ret;
 
 	return 0;
 
-err_status_page:
-	cleanup_status_page(engine);
 err_breadcrumbs:
 	intel_engine_fini_breadcrumbs(engine);
 err_unpin_preempt:
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 55b9d4581b02..45e28614babd 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2317,10 +2317,14 @@ logical_ring_default_irqs(struct intel_engine_cs *engine)
 	engine->irq_keep_mask = GT_CONTEXT_SWITCH_INTERRUPT << shift;
 }
 
-static void
+static int
 logical_ring_setup(struct intel_engine_cs *engine)
 {
-	intel_engine_setup_common(engine);
+	int err;
+
+	err = intel_engine_setup_common(engine);
+	if (err)
+		return err;
 
 	/* Intentionally left blank. */
 	engine->buffer = NULL;
@@ -2330,6 +2334,8 @@ logical_ring_setup(struct intel_engine_cs *engine)
 
 	logical_ring_default_vfuncs(engine);
 	logical_ring_default_irqs(engine);
+
+	return 0;
 }
 
 static int logical_ring_init(struct intel_engine_cs *engine)
@@ -2378,7 +2384,9 @@ int logical_render_ring_init(struct intel_engine_cs *engine)
 {
 	int ret;
 
-	logical_ring_setup(engine);
+	ret = logical_ring_setup(engine);
+	if (ret)
+		return ret;
 
 	/* Override some for render ring. */
 	engine->init_context = gen8_init_rcs_context;
@@ -2407,7 +2415,11 @@ int logical_render_ring_init(struct intel_engine_cs *engine)
 
 int logical_xcs_ring_init(struct intel_engine_cs *engine)
 {
-	logical_ring_setup(engine);
+	int err;
+
+	err = logical_ring_setup(engine);
+	if (err)
+		return err;
 
 	return logical_ring_init(engine);
 }
@@ -2740,7 +2752,7 @@ static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
 		goto error_deref_obj;
 	}
 
-	timeline = i915_timeline_create(ctx->i915, ctx->name);
+	timeline = i915_timeline_create(ctx->i915, ctx->name, NULL);
 	if (IS_ERR(timeline)) {
 		ret = PTR_ERR(timeline);
 		goto error_deref_obj;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index cb6d2aa2a829..174795622eb1 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1545,9 +1545,13 @@ static int intel_init_ring_buffer(struct intel_engine_cs *engine)
 	struct intel_ring *ring;
 	int err;
 
-	intel_engine_setup_common(engine);
+	err = intel_engine_setup_common(engine);
+	if (err)
+		return err;
 
-	timeline = i915_timeline_create(engine->i915, engine->name);
+	timeline = i915_timeline_create(engine->i915,
+					engine->name,
+					engine->status_page.vma);
 	if (IS_ERR(timeline)) {
 		err = PTR_ERR(timeline);
 		goto err;
@@ -1571,6 +1575,8 @@ static int intel_init_ring_buffer(struct intel_engine_cs *engine)
 	if (err)
 		goto err_unpin;
 
+	GEM_BUG_ON(ring->timeline->hwsp_ggtt != engine->status_page.vma);
+
 	return 0;
 
 err_unpin:
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 21dc695bd2f3..5ce54f23a945 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -717,7 +717,9 @@ intel_write_status_page(struct intel_engine_cs *engine, int reg, u32 value)
 #define I915_GEM_HWS_INDEX_ADDR		(I915_GEM_HWS_INDEX * sizeof(u32))
 #define I915_GEM_HWS_PREEMPT		0x32
 #define I915_GEM_HWS_PREEMPT_ADDR	(I915_GEM_HWS_PREEMPT * sizeof(u32))
-#define I915_GEM_HWS_SCRATCH		0x40
+#define I915_GEM_HWS_SEQNO		0x40
+#define I915_GEM_HWS_SEQNO_ADDR		(I915_GEM_HWS_SEQNO * sizeof(u32))
+#define I915_GEM_HWS_SCRATCH		0x80
 #define I915_GEM_HWS_SCRATCH_ADDR	(I915_GEM_HWS_SCRATCH * sizeof(u32))
 
 #define I915_HWS_CSB_BUF0_INDEX		0x10
@@ -823,7 +825,7 @@ intel_ring_set_tail(struct intel_ring *ring, unsigned int tail)
 
 void intel_engine_write_global_seqno(struct intel_engine_cs *engine, u32 seqno);
 
-void intel_engine_setup_common(struct intel_engine_cs *engine);
+int intel_engine_setup_common(struct intel_engine_cs *engine);
 int intel_engine_init_common(struct intel_engine_cs *engine);
 void intel_engine_cleanup_common(struct intel_engine_cs *engine);
 
diff --git a/drivers/gpu/drm/i915/selftests/i915_live_selftests.h b/drivers/gpu/drm/i915/selftests/i915_live_selftests.h
index a15713cae3b3..76b4f87fc853 100644
--- a/drivers/gpu/drm/i915/selftests/i915_live_selftests.h
+++ b/drivers/gpu/drm/i915/selftests/i915_live_selftests.h
@@ -13,6 +13,7 @@ selftest(sanitycheck, i915_live_sanitycheck) /* keep first (igt selfcheck) */
 selftest(uncore, intel_uncore_live_selftests)
 selftest(workarounds, intel_workarounds_live_selftests)
 selftest(requests, i915_request_live_selftests)
+selftest(timelines, i915_timeline_live_selftests)
 selftest(objects, i915_gem_object_live_selftests)
 selftest(dmabuf, i915_gem_dmabuf_live_selftests)
 selftest(coherency, i915_gem_coherency_live_selftests)
diff --git a/drivers/gpu/drm/i915/selftests/i915_mock_selftests.h b/drivers/gpu/drm/i915/selftests/i915_mock_selftests.h
index 1b70208eeea7..4a83a1c6c406 100644
--- a/drivers/gpu/drm/i915/selftests/i915_mock_selftests.h
+++ b/drivers/gpu/drm/i915/selftests/i915_mock_selftests.h
@@ -16,7 +16,7 @@ selftest(syncmap, i915_syncmap_mock_selftests)
 selftest(uncore, intel_uncore_mock_selftests)
 selftest(engine, intel_engine_cs_mock_selftests)
 selftest(breadcrumbs, intel_breadcrumbs_mock_selftests)
-selftest(timelines, i915_gem_timeline_mock_selftests)
+selftest(timelines, i915_timeline_mock_selftests)
 selftest(requests, i915_request_mock_selftests)
 selftest(objects, i915_gem_object_mock_selftests)
 selftest(dmabuf, i915_gem_dmabuf_mock_selftests)
diff --git a/drivers/gpu/drm/i915/selftests/i915_timeline.c b/drivers/gpu/drm/i915/selftests/i915_timeline.c
index 19f1c6a5c8fb..1585b614510d 100644
--- a/drivers/gpu/drm/i915/selftests/i915_timeline.c
+++ b/drivers/gpu/drm/i915/selftests/i915_timeline.c
@@ -7,6 +7,7 @@
 #include "../i915_selftest.h"
 #include "i915_random.h"
 
+#include "igt_flush_test.h"
 #include "mock_gem_device.h"
 #include "mock_timeline.h"
 
@@ -256,7 +257,7 @@ static int bench_sync(void *arg)
 	return 0;
 }
 
-int i915_gem_timeline_mock_selftests(void)
+int i915_timeline_mock_selftests(void)
 {
 	static const struct i915_subtest tests[] = {
 		SUBTEST(igt_sync),
@@ -265,3 +266,326 @@ int i915_gem_timeline_mock_selftests(void)
 
 	return i915_subtests(tests, NULL);
 }
+
+static int emit_ggtt_store_dw(struct i915_request *rq, u32 addr, u32 value)
+{
+	u32 *cs;
+
+	cs = intel_ring_begin(rq, 4);
+	if (IS_ERR(cs))
+		return PTR_ERR(cs);
+
+	if (INTEL_GEN(rq->i915) >= 8) {
+		*cs++ = MI_STORE_DWORD_IMM_GEN4 | MI_USE_GGTT;
+		*cs++ = addr;
+		*cs++ = 0;
+		*cs++ = value;
+	} else if (INTEL_GEN(rq->i915) >= 4) {
+		*cs++ = MI_STORE_DWORD_IMM_GEN4 | MI_USE_GGTT;
+		*cs++ = 0;
+		*cs++ = addr;
+		*cs++ = value;
+	} else {
+		*cs++ = MI_STORE_DWORD_IMM | MI_MEM_VIRTUAL;
+		*cs++ = addr;
+		*cs++ = value;
+		*cs++ = MI_NOOP;
+	}
+
+	intel_ring_advance(rq, cs);
+
+	return 0;
+}
+
+static u32 hwsp_address(const struct i915_timeline *tl)
+{
+	return i915_ggtt_offset(tl->hwsp_ggtt) + tl->hwsp_offset;
+}
+
+static struct i915_request *
+tl_write(struct i915_timeline *tl, struct intel_engine_cs *engine, u32 value)
+{
+	struct i915_request *rq;
+	int err;
+
+	lockdep_assert_held(&tl->i915->drm.struct_mutex); /* lazy rq refs */
+
+	err = i915_timeline_pin(tl);
+	if (err) {
+		rq = ERR_PTR(err);
+		goto out;
+	}
+
+	rq = i915_request_alloc(engine, engine->i915->kernel_context);
+	if (IS_ERR(rq))
+		goto out_unpin;
+
+	err = emit_ggtt_store_dw(rq, hwsp_address(tl), value);
+	i915_request_add(rq);
+	if (err)
+		rq = ERR_PTR(err);
+
+out_unpin:
+	i915_timeline_unpin(tl);
+out:
+	if (IS_ERR(rq))
+		pr_err("Failed to write to timeline!\n");
+	return rq;
+}
+
+static struct i915_timeline *
+checked_i915_timeline_create(struct drm_i915_private *i915)
+{
+	struct i915_timeline *tl;
+
+	tl = i915_timeline_create(i915, "live", NULL);
+	if (IS_ERR(tl))
+		return tl;
+
+	if (*tl->hwsp_seqno != tl->seqno) {
+		pr_err("Timeline created with incorrect breadcrumb, found %x, expected %x\n",
+		       *tl->hwsp_seqno, tl->seqno);
+		i915_timeline_put(tl);
+		return ERR_PTR(-EINVAL);
+	}
+
+	return tl;
+}
+
+static int live_hwsp_engine(void *arg)
+{
+#define NUM_TIMELINES 4096
+	struct drm_i915_private *i915 = arg;
+	struct i915_timeline **timelines;
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+	intel_wakeref_t wakeref;
+	unsigned long count, n;
+	int err = 0;
+
+	/*
+	 * Create a bunch of timelines and check we can write
+	 * independently to each of their breadcrumb slots.
+	 */
+
+	timelines = kvmalloc_array(NUM_TIMELINES * I915_NUM_ENGINES,
+				   sizeof(*timelines),
+				   GFP_KERNEL);
+	if (!timelines)
+		return -ENOMEM;
+
+	mutex_lock(&i915->drm.struct_mutex);
+	wakeref = intel_runtime_pm_get(i915);
+
+	count = 0;
+	for_each_engine(engine, i915, id) {
+		if (!intel_engine_can_store_dword(engine))
+			continue;
+
+		for (n = 0; n < NUM_TIMELINES; n++) {
+			struct i915_timeline *tl;
+			struct i915_request *rq;
+
+			tl = checked_i915_timeline_create(i915);
+			if (IS_ERR(tl)) {
+				err = PTR_ERR(tl);
+				goto out;
+			}
+
+			rq = tl_write(tl, engine, count);
+			if (IS_ERR(rq)) {
+				i915_timeline_put(tl);
+				err = PTR_ERR(rq);
+				goto out;
+			}
+
+			timelines[count++] = tl;
+		}
+	}
+
+out:
+	if (igt_flush_test(i915, I915_WAIT_LOCKED))
+		err = -EIO;
+
+	for (n = 0; n < count; n++) {
+		struct i915_timeline *tl = timelines[n];
+
+		if (!err && *tl->hwsp_seqno != n) {
+			pr_err("Invalid seqno stored in timeline %lu, found 0x%x\n",
+			       n, *tl->hwsp_seqno);
+			err = -EINVAL;
+		}
+		i915_timeline_put(tl);
+	}
+
+	intel_runtime_pm_put(i915, wakeref);
+	mutex_unlock(&i915->drm.struct_mutex);
+
+	kvfree(timelines);
+
+	return err;
+#undef NUM_TIMELINES
+}
+
+static int live_hwsp_alternate(void *arg)
+{
+#define NUM_TIMELINES 4096
+	struct drm_i915_private *i915 = arg;
+	struct i915_timeline **timelines;
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+	intel_wakeref_t wakeref;
+	unsigned long count, n;
+	int err = 0;
+
+	/*
+	 * Create a bunch of timelines and check we can write
+	 * independently to each of their breadcrumb slots with adjacent
+	 * engines.
+	 */
+
+	timelines = kvmalloc_array(NUM_TIMELINES * I915_NUM_ENGINES,
+				   sizeof(*timelines),
+				   GFP_KERNEL);
+	if (!timelines)
+		return -ENOMEM;
+
+	mutex_lock(&i915->drm.struct_mutex);
+	wakeref = intel_runtime_pm_get(i915);
+
+	count = 0;
+	for (n = 0; n < NUM_TIMELINES; n++) {
+		for_each_engine(engine, i915, id) {
+			struct i915_timeline *tl;
+			struct i915_request *rq;
+
+			if (!intel_engine_can_store_dword(engine))
+				continue;
+
+			tl = checked_i915_timeline_create(i915);
+			if (IS_ERR(tl)) {
+				err = PTR_ERR(tl);
+				goto out;
+			}
+
+			rq = tl_write(tl, engine, count);
+			if (IS_ERR(rq)) {
+				i915_timeline_put(tl);
+				err = PTR_ERR(rq);
+				goto out;
+			}
+
+			timelines[count++] = tl;
+		}
+	}
+
+out:
+	if (igt_flush_test(i915, I915_WAIT_LOCKED))
+		err = -EIO;
+
+	for (n = 0; n < count; n++) {
+		struct i915_timeline *tl = timelines[n];
+
+		if (!err && *tl->hwsp_seqno != n) {
+			pr_err("Invalid seqno stored in timeline %lu, found 0x%x\n",
+			       n, *tl->hwsp_seqno);
+			err = -EINVAL;
+		}
+		i915_timeline_put(tl);
+	}
+
+	intel_runtime_pm_put(i915, wakeref);
+	mutex_unlock(&i915->drm.struct_mutex);
+
+	kvfree(timelines);
+
+	return err;
+#undef NUM_TIMELINES
+}
+
+static int live_hwsp_recycle(void *arg)
+{
+	struct drm_i915_private *i915 = arg;
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+	intel_wakeref_t wakeref;
+	unsigned long count;
+	int err = 0;
+
+	/*
+	 * Check seqno writes into one timeline at a time. We expect to
+	 * recycle the breadcrumb slot between iterations and neither
+	 * want to confuse ourselves or the GPU.
+	 */
+
+	mutex_lock(&i915->drm.struct_mutex);
+	wakeref = intel_runtime_pm_get(i915);
+
+	count = 0;
+	for_each_engine(engine, i915, id) {
+		IGT_TIMEOUT(end_time);
+
+		if (!intel_engine_can_store_dword(engine))
+			continue;
+
+		do {
+			struct i915_timeline *tl;
+			struct i915_request *rq;
+
+			tl = checked_i915_timeline_create(i915);
+			if (IS_ERR(tl)) {
+				err = PTR_ERR(tl);
+				goto out;
+			}
+
+			rq = tl_write(tl, engine, count);
+			if (IS_ERR(rq)) {
+				i915_timeline_put(tl);
+				err = PTR_ERR(rq);
+				goto out;
+			}
+
+			if (i915_request_wait(rq,
+					      I915_WAIT_LOCKED,
+					      HZ / 5) < 0) {
+				pr_err("Wait for timeline writes timed out!\n");
+				i915_timeline_put(tl);
+				err = -EIO;
+				goto out;
+			}
+
+			if (*tl->hwsp_seqno != count) {
+				pr_err("Invalid seqno stored in timeline %lu, found 0x%x\n",
+				       count, *tl->hwsp_seqno);
+				err = -EINVAL;
+			}
+
+			i915_timeline_put(tl);
+			count++;
+
+			if (err)
+				goto out;
+
+			i915_timelines_park(i915); /* Encourage recycling! */
+		} while (!__igt_timeout(end_time, NULL));
+	}
+
+out:
+	if (igt_flush_test(i915, I915_WAIT_LOCKED))
+		err = -EIO;
+	intel_runtime_pm_put(i915, wakeref);
+	mutex_unlock(&i915->drm.struct_mutex);
+
+	return err;
+}
+
+int i915_timeline_live_selftests(struct drm_i915_private *i915)
+{
+	static const struct i915_subtest tests[] = {
+		SUBTEST(live_hwsp_recycle),
+		SUBTEST(live_hwsp_engine),
+		SUBTEST(live_hwsp_alternate),
+	};
+
+	return i915_subtests(tests, i915);
+}
diff --git a/drivers/gpu/drm/i915/selftests/mock_engine.c b/drivers/gpu/drm/i915/selftests/mock_engine.c
index 4e5b4dc6df0f..919c89fd6ee5 100644
--- a/drivers/gpu/drm/i915/selftests/mock_engine.c
+++ b/drivers/gpu/drm/i915/selftests/mock_engine.c
@@ -39,7 +39,12 @@ static struct intel_ring *mock_ring(struct intel_engine_cs *engine)
 	if (!ring)
 		return NULL;
 
-	i915_timeline_init(engine->i915, &ring->timeline, engine->name);
+	if (i915_timeline_init(engine->i915,
+			       &ring->timeline, engine->name,
+			       NULL)) {
+		kfree(ring);
+		return NULL;
+	}
 
 	ring->base.size = sz;
 	ring->base.effective_size = sz;
@@ -208,7 +213,11 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
 	engine->base.emit_breadcrumb = mock_emit_breadcrumb;
 	engine->base.submit_request = mock_submit_request;
 
-	i915_timeline_init(i915, &engine->base.timeline, engine->base.name);
+	if (i915_timeline_init(i915,
+			       &engine->base.timeline,
+			       engine->base.name,
+			       NULL))
+		goto err_free;
 	i915_timeline_set_subclass(&engine->base.timeline, TIMELINE_ENGINE);
 
 	intel_engine_init_breadcrumbs(&engine->base);
@@ -226,6 +235,7 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
 err_breadcrumbs:
 	intel_engine_fini_breadcrumbs(&engine->base);
 	i915_timeline_fini(&engine->base.timeline);
+err_free:
 	kfree(engine);
 	return NULL;
 }
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 15/28] drm/i915: Share per-timeline HWSP using a slab suballocator
  2019-01-28  1:02 [PATCH 01/28] drm/i915: Wait for a moment before forcibly resetting the device Chris Wilson
                   ` (12 preceding siblings ...)
  2019-01-28  1:02 ` [PATCH 14/28] drm/i915: Allocate a status page for each timeline Chris Wilson
@ 2019-01-28  1:02 ` Chris Wilson
  2019-01-28  1:02 ` [PATCH 16/28] drm/i915: Track the context's seqno in its own timeline HWSP Chris Wilson
                   ` (17 subsequent siblings)
  31 siblings, 0 replies; 39+ messages in thread
From: Chris Wilson @ 2019-01-28  1:02 UTC (permalink / raw)
  To: intel-gfx

If we restrict ourselves to only using a cacheline for each timeline's
HWSP (we could go smaller, but want to avoid needlessly polluting
cachelines on different engines between different contexts), then we can
suballocate a single 4k page into 64 different timeline HWSP. By
treating each fresh allocation as a slab of 64 entries, we can keep it
around for the next 64 allocation attempts until we need to refresh the
slab cache.
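
(A minimal sketch of the bitmap bookkeeping, with assumed shapes and
invented names; the real version is hwsp_alloc()/hwsp_free() below. Each
page carries a 64-bit mask, one bit per free cacheline:

  static int example_claim_cacheline(u64 *free_bitmap)
  {
          unsigned int cacheline;

          if (!*free_bitmap)
                  return -ENOSPC;       /* page full: allocate a fresh one */

          cacheline = __ffs64(*free_bitmap);      /* lowest free slot */
          *free_bitmap &= ~BIT_ULL(cacheline);    /* claim it */
          return cacheline;
  }

  static void example_release_cacheline(u64 *free_bitmap, unsigned int cl)
  {
          *free_bitmap |= BIT_ULL(cl);
          /* when the mask returns to ~0ull, the page can be freed */
  }

The freelist of partially used pages then only needs to publish a page
when its first cacheline is returned, drop it once the last free
cacheline is claimed, and give the page back to the system when all 64
are returned.)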

John Harrison noted the issue of fragmentation leading to the same
worst-case performance of one page per timeline as before, which can be
mitigated by adopting a freelist.

v2: Keep all partially allocated HWSP on a freelist

This is still without migration, so it is possible for the system to end
up with each timeline in its own page, but we ensure that no new
allocation would needlessly allocate a fresh page!

v3: Throw a selftest at the allocator to try and catch invalid cacheline
reuse.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h               |   4 +
 drivers/gpu/drm/i915/i915_timeline.c          | 124 ++++++++++++---
 drivers/gpu/drm/i915/i915_timeline.h          |   1 +
 drivers/gpu/drm/i915/selftests/i915_random.c  |  33 +++-
 drivers/gpu/drm/i915/selftests/i915_random.h  |   3 +
 .../gpu/drm/i915/selftests/i915_timeline.c    | 143 ++++++++++++++++++
 6 files changed, 280 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 8a181b455197..6a051381f535 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1978,6 +1978,10 @@ struct drm_i915_private {
 		struct i915_gt_timelines {
 			struct mutex mutex; /* protects list, tainted by GPU */
 			struct list_head list;
+
+			/* Pack multiple timelines' seqnos into the same page */
+			spinlock_t hwsp_lock;
+			struct list_head hwsp_free_list;
 		} timelines;
 
 		struct list_head active_rings;
diff --git a/drivers/gpu/drm/i915/i915_timeline.c b/drivers/gpu/drm/i915/i915_timeline.c
index 5b5f9dacfce9..17c1ec58829e 100644
--- a/drivers/gpu/drm/i915/i915_timeline.c
+++ b/drivers/gpu/drm/i915/i915_timeline.c
@@ -9,6 +9,18 @@
 #include "i915_timeline.h"
 #include "i915_syncmap.h"
 
+struct i915_timeline_hwsp {
+	struct i915_vma *vma;
+	struct list_head free_link;
+	u64 free_bitmap;
+};
+
+static inline struct i915_timeline_hwsp *
+i915_timeline_hwsp(const struct i915_timeline *tl)
+{
+	return tl->hwsp_ggtt->private;
+}
+
 static struct i915_vma *__hwsp_alloc(struct drm_i915_private *i915)
 {
 	struct drm_i915_gem_object *obj;
@@ -27,28 +39,89 @@ static struct i915_vma *__hwsp_alloc(struct drm_i915_private *i915)
 	return vma;
 }
 
-static int hwsp_alloc(struct i915_timeline *timeline)
+static struct i915_vma *
+hwsp_alloc(struct i915_timeline *timeline, unsigned int *cacheline)
 {
-	struct i915_vma *vma;
+	struct drm_i915_private *i915 = timeline->i915;
+	struct i915_gt_timelines *gt = &i915->gt.timelines;
+	struct i915_timeline_hwsp *hwsp;
 
-	vma = __hwsp_alloc(timeline->i915);
-	if (IS_ERR(vma))
-		return PTR_ERR(vma);
+	BUILD_BUG_ON(BITS_PER_TYPE(u64) * CACHELINE_BYTES > PAGE_SIZE);
 
-	timeline->hwsp_ggtt = vma;
-	timeline->hwsp_offset = 0;
+	spin_lock(&gt->hwsp_lock);
 
-	return 0;
+	/* hwsp_free_list only contains HWSP that have available cachelines */
+	hwsp = list_first_entry_or_null(&gt->hwsp_free_list,
+					typeof(*hwsp), free_link);
+	if (!hwsp) {
+		struct i915_vma *vma;
+
+		spin_unlock(&gt->hwsp_lock);
+
+		hwsp = kmalloc(sizeof(*hwsp), GFP_KERNEL);
+		if (!hwsp)
+			return ERR_PTR(-ENOMEM);
+
+		vma = __hwsp_alloc(i915);
+		if (IS_ERR(vma)) {
+			kfree(hwsp);
+			return vma;
+		}
+
+		vma->private = hwsp;
+		hwsp->vma = vma;
+		hwsp->free_bitmap = ~0ull;
+
+		spin_lock(&gt->hwsp_lock);
+		list_add(&hwsp->free_link, &gt->hwsp_free_list);
+	}
+
+	GEM_BUG_ON(!hwsp->free_bitmap);
+	*cacheline = __ffs64(hwsp->free_bitmap);
+	hwsp->free_bitmap &= ~BIT_ULL(*cacheline);
+	if (!hwsp->free_bitmap)
+		list_del(&hwsp->free_link);
+
+	spin_unlock(&gt->hwsp_lock);
+
+	GEM_BUG_ON(hwsp->vma->private != hwsp);
+	return hwsp->vma;
+}
+
+static void hwsp_free(struct i915_timeline *timeline)
+{
+	struct i915_gt_timelines *gt = &timeline->i915->gt.timelines;
+	struct i915_timeline_hwsp *hwsp;
+
+	hwsp = i915_timeline_hwsp(timeline);
+	if (!hwsp) /* leave global HWSP alone! */
+		return;
+
+	spin_lock(&gt->hwsp_lock);
+
+	/* As a cacheline becomes available, publish the HWSP on the freelist */
+	if (!hwsp->free_bitmap)
+		list_add_tail(&hwsp->free_link, &gt->hwsp_free_list);
+
+	hwsp->free_bitmap |= BIT_ULL(timeline->hwsp_offset / CACHELINE_BYTES);
+
+	/* And if no one is left using it, give the page back to the system */
+	if (hwsp->free_bitmap == ~0ull) {
+		i915_vma_put(hwsp->vma);
+		list_del(&hwsp->free_link);
+		kfree(hwsp);
+	}
+
+	spin_unlock(&gt->hwsp_lock);
 }
 
 int i915_timeline_init(struct drm_i915_private *i915,
 		       struct i915_timeline *timeline,
 		       const char *name,
-		       struct i915_vma *global_hwsp)
+		       struct i915_vma *hwsp)
 {
 	struct i915_gt_timelines *gt = &i915->gt.timelines;
 	void *vaddr;
-	int err;
 
 	/*
 	 * Ideally we want a set of engines on a single leaf as we expect
@@ -64,18 +137,22 @@ int i915_timeline_init(struct drm_i915_private *i915,
 	timeline->name = name;
 	timeline->pin_count = 0;
 
-	if (global_hwsp) {
-		timeline->hwsp_ggtt = i915_vma_get(global_hwsp);
-		timeline->hwsp_offset = I915_GEM_HWS_SEQNO_ADDR;
-	} else {
-		err = hwsp_alloc(timeline);
-		if (err)
-			return err;
+	timeline->hwsp_offset = I915_GEM_HWS_SEQNO_ADDR;
+	if (!hwsp) {
+		unsigned int cacheline;
+
+		hwsp = hwsp_alloc(timeline, &cacheline);
+		if (IS_ERR(hwsp))
+			return PTR_ERR(hwsp);
+
+		timeline->hwsp_offset = cacheline * CACHELINE_BYTES;
 	}
+	timeline->hwsp_ggtt = i915_vma_get(hwsp);
 
-	vaddr = i915_gem_object_pin_map(timeline->hwsp_ggtt->obj, I915_MAP_WB);
+	vaddr = i915_gem_object_pin_map(hwsp->obj, I915_MAP_WB);
 	if (IS_ERR(vaddr)) {
-		i915_vma_put(timeline->hwsp_ggtt);
+		hwsp_free(timeline);
+		i915_vma_put(hwsp);
 		return PTR_ERR(vaddr);
 	}
 
@@ -106,6 +183,9 @@ void i915_timelines_init(struct drm_i915_private *i915)
 	mutex_init(&gt->mutex);
 	INIT_LIST_HEAD(&gt->list);
 
+	spin_lock_init(&gt->hwsp_lock);
+	INIT_LIST_HEAD(&gt->hwsp_free_list);
+
 	/* via i915_gem_wait_for_idle() */
 	i915_gem_shrinker_taints_mutex(i915, &gt->mutex);
 }
@@ -146,12 +226,13 @@ void i915_timeline_fini(struct i915_timeline *timeline)
 	GEM_BUG_ON(!list_empty(&timeline->requests));
 	GEM_BUG_ON(i915_gem_active_isset(&timeline->barrier));
 
-	i915_syncmap_free(&timeline->sync);
-
 	mutex_lock(&gt->mutex);
 	list_del(&timeline->link);
 	mutex_unlock(&gt->mutex);
 
+	i915_syncmap_free(&timeline->sync);
+	hwsp_free(timeline);
+
 	i915_gem_object_unpin_map(timeline->hwsp_ggtt->obj);
 	i915_vma_put(timeline->hwsp_ggtt);
 }
@@ -247,6 +328,7 @@ void i915_timelines_fini(struct drm_i915_private *i915)
 	struct i915_gt_timelines *gt = &i915->gt.timelines;
 
 	GEM_BUG_ON(!list_empty(&gt->list));
+	GEM_BUG_ON(!list_empty(&gt->hwsp_free_list));
 
 	mutex_destroy(&gt->mutex);
 }
diff --git a/drivers/gpu/drm/i915/i915_timeline.h b/drivers/gpu/drm/i915/i915_timeline.h
index 8e29182a2584..42ef4830dd95 100644
--- a/drivers/gpu/drm/i915/i915_timeline.h
+++ b/drivers/gpu/drm/i915/i915_timeline.h
@@ -33,6 +33,7 @@
 #include "i915_utils.h"
 
 struct i915_vma;
+struct i915_timeline_hwsp;
 
 struct i915_timeline {
 	u64 fence_context;
diff --git a/drivers/gpu/drm/i915/selftests/i915_random.c b/drivers/gpu/drm/i915/selftests/i915_random.c
index 1f415ce47018..716a3f19f030 100644
--- a/drivers/gpu/drm/i915/selftests/i915_random.c
+++ b/drivers/gpu/drm/i915/selftests/i915_random.c
@@ -41,18 +41,37 @@ u64 i915_prandom_u64_state(struct rnd_state *rnd)
 	return x;
 }
 
-void i915_random_reorder(unsigned int *order, unsigned int count,
-			 struct rnd_state *state)
+void i915_prandom_shuffle(void *arr, size_t elsz, size_t count,
+			  struct rnd_state *state)
 {
-	unsigned int i, j;
+	char stack[128];
+
+	if (WARN_ON(elsz > sizeof(stack) || count > U32_MAX))
+		return;
+
+	if (!elsz || !count)
+		return;
+
+	/* Fisher-Yates shuffle courtesy of Knuth */
+	while (--count) {
+		size_t swp;
+
+		swp = i915_prandom_u32_max_state(count + 1, state);
+		if (swp == count)
+			continue;
 
-	for (i = 0; i < count; i++) {
-		BUILD_BUG_ON(sizeof(unsigned int) > sizeof(u32));
-		j = i915_prandom_u32_max_state(count, state);
-		swap(order[i], order[j]);
+		memcpy(stack, arr + count * elsz, elsz);
+		memcpy(arr + count * elsz, arr + swp * elsz, elsz);
+		memcpy(arr + swp * elsz, stack, elsz);
 	}
 }
 
+void i915_random_reorder(unsigned int *order, unsigned int count,
+			 struct rnd_state *state)
+{
+	i915_prandom_shuffle(order, sizeof(*order), count, state);
+}
+
 unsigned int *i915_random_order(unsigned int count, struct rnd_state *state)
 {
 	unsigned int *order, i;
diff --git a/drivers/gpu/drm/i915/selftests/i915_random.h b/drivers/gpu/drm/i915/selftests/i915_random.h
index 7dffedc501ca..8e1ff9c105b6 100644
--- a/drivers/gpu/drm/i915/selftests/i915_random.h
+++ b/drivers/gpu/drm/i915/selftests/i915_random.h
@@ -54,4 +54,7 @@ void i915_random_reorder(unsigned int *order,
 			 unsigned int count,
 			 struct rnd_state *state);
 
+void i915_prandom_shuffle(void *arr, size_t elsz, size_t count,
+			  struct rnd_state *state);
+
 #endif /* !__I915_SELFTESTS_RANDOM_H__ */
diff --git a/drivers/gpu/drm/i915/selftests/i915_timeline.c b/drivers/gpu/drm/i915/selftests/i915_timeline.c
index 1585b614510d..c34340f074cf 100644
--- a/drivers/gpu/drm/i915/selftests/i915_timeline.c
+++ b/drivers/gpu/drm/i915/selftests/i915_timeline.c
@@ -4,6 +4,8 @@
  * Copyright © 2017-2018 Intel Corporation
  */
 
+#include <linux/prime_numbers.h>
+
 #include "../i915_selftest.h"
 #include "i915_random.h"
 
@@ -11,6 +13,146 @@
 #include "mock_gem_device.h"
 #include "mock_timeline.h"
 
+static struct page *hwsp_page(struct i915_timeline *tl)
+{
+	struct drm_i915_gem_object *obj = tl->hwsp_ggtt->obj;
+
+	GEM_BUG_ON(!i915_gem_object_has_pinned_pages(obj));
+	return sg_page(obj->mm.pages->sgl);
+}
+
+static unsigned long hwsp_cacheline(struct i915_timeline *tl)
+{
+	unsigned long address = (unsigned long)page_address(hwsp_page(tl));
+
+	return (address + tl->hwsp_offset) / CACHELINE_BYTES;
+}
+
+#define CACHELINES_PER_PAGE (PAGE_SIZE / CACHELINE_BYTES)
+
+struct mock_hwsp_freelist {
+	struct drm_i915_private *i915;
+	struct radix_tree_root cachelines;
+	struct i915_timeline **history;
+	unsigned long count, max;
+	struct rnd_state prng;
+};
+
+enum {
+	SHUFFLE = BIT(0),
+};
+
+static void __mock_hwsp_record(struct mock_hwsp_freelist *state,
+			       unsigned int idx,
+			       struct i915_timeline *tl)
+{
+	tl = xchg(&state->history[idx], tl);
+	if (tl) {
+		radix_tree_delete(&state->cachelines, hwsp_cacheline(tl));
+		i915_timeline_put(tl);
+	}
+}
+
+static int __mock_hwsp_timeline(struct mock_hwsp_freelist *state,
+				unsigned int count,
+				unsigned int flags)
+{
+	struct i915_timeline *tl;
+	unsigned int idx;
+
+	while (count--) {
+		unsigned long cacheline;
+		int err;
+
+		tl = i915_timeline_create(state->i915, "mock", NULL);
+		if (IS_ERR(tl))
+			return PTR_ERR(tl);
+
+		cacheline = hwsp_cacheline(tl);
+		err = radix_tree_insert(&state->cachelines, cacheline, tl);
+		if (err) {
+			if (err == -EEXIST) {
+				pr_err("HWSP cacheline %lu already used; duplicate allocation!\n",
+				       cacheline);
+			}
+			i915_timeline_put(tl);
+			return err;
+		}
+
+		idx = state->count++ % state->max;
+		__mock_hwsp_record(state, idx, tl);
+	}
+
+	if (flags & SHUFFLE)
+		i915_prandom_shuffle(state->history,
+				     sizeof(*state->history),
+				     min(state->count, state->max),
+				     &state->prng);
+
+	count = i915_prandom_u32_max_state(min(state->count, state->max),
+					   &state->prng);
+	while (count--) {
+		idx = --state->count % state->max;
+		__mock_hwsp_record(state, idx, NULL);
+	}
+
+	return 0;
+}
+
+static int mock_hwsp_freelist(void *arg)
+{
+	struct mock_hwsp_freelist state;
+	const struct {
+		const char *name;
+		unsigned int flags;
+	} phases[] = {
+		{ "linear", 0 },
+		{ "shuffled", SHUFFLE },
+		{ },
+	}, *p;
+	unsigned int na;
+	int err = 0;
+
+	INIT_RADIX_TREE(&state.cachelines, GFP_KERNEL);
+	state.prng = I915_RND_STATE_INITIALIZER(i915_selftest.random_seed);
+
+	state.i915 = mock_gem_device();
+	if (!state.i915)
+		return -ENOMEM;
+
+	/*
+	 * Create a bunch of timelines and check that their HWSP do not overlap.
+	 * Free some, and try again.
+	 */
+
+	state.max = PAGE_SIZE / sizeof(*state.history);
+	state.count = 0;
+	state.history = kcalloc(state.max, sizeof(*state.history), GFP_KERNEL);
+	if (!state.history) {
+		err = -ENOMEM;
+		goto err_put;
+	}
+
+	mutex_lock(&state.i915->drm.struct_mutex);
+	for (p = phases; p->name; p++) {
+		pr_debug("%s(%s)\n", __func__, p->name);
+		for_each_prime_number_from(na, 1, 2 * CACHELINES_PER_PAGE) {
+			err = __mock_hwsp_timeline(&state, na, p->flags);
+			if (err)
+				goto out;
+		}
+	}
+
+out:
+	for (na = 0; na < state.max; na++)
+		__mock_hwsp_record(&state, na, NULL);
+	mutex_unlock(&state.i915->drm.struct_mutex);
+	kfree(state.history);
+err_put:
+	drm_dev_put(&state.i915->drm);
+	return err;
+}
+
 struct __igt_sync {
 	const char *name;
 	u32 seqno;
@@ -260,6 +402,7 @@ static int bench_sync(void *arg)
 int i915_timeline_mock_selftests(void)
 {
 	static const struct i915_subtest tests[] = {
+		SUBTEST(mock_hwsp_freelist),
 		SUBTEST(igt_sync),
 		SUBTEST(bench_sync),
 	};
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 16/28] drm/i915: Track the context's seqno in its own timeline HWSP
  2019-01-28  1:02 [PATCH 01/28] drm/i915: Wait for a moment before forcibly resetting the device Chris Wilson
                   ` (13 preceding siblings ...)
  2019-01-28  1:02 ` [PATCH 15/28] drm/i915: Share per-timeline HWSP using a slab suballocator Chris Wilson
@ 2019-01-28  1:02 ` Chris Wilson
  2019-01-28  1:02 ` [PATCH 17/28] drm/i915: Track active timelines Chris Wilson
                   ` (16 subsequent siblings)
  31 siblings, 0 replies; 39+ messages in thread
From: Chris Wilson @ 2019-01-28  1:02 UTC (permalink / raw)
  To: intel-gfx

Now that we have allocated ourselves a cacheline to store a breadcrumb,
we can emit a write from the GPU into the timeline's HWSP of the
per-context seqno as we complete each request. This drops the mirroring
of the per-engine HWSP and allows each context to operate independently.
We do not need to unwind the per-context timeline, and so requests are
always consistent with the timeline breadcrumb, greatly simplifying the
completion checks as we no longer need to be concerned about the
global_seqno changing mid-check.

One complication though is that we have to be wary that the request may
outlive the HWSP and so avoid touching the potentially dangling pointer
after we have retired the fence. We also have to guard our access to the
HWSP with RCU; the release of the obj->mm.pages should already be RCU-safe.
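
(To spell out the trick in i915_request_mark_complete() below as a
worked check: once rq->hwsp_seqno is redirected at the request's own
fence.seqno,

  i915_request_completed(rq)
    -> i915_seqno_passed(hwsp_seqno(rq), rq->fence.seqno)
    -> i915_seqno_passed(rq->fence.seqno, rq->fence.seqno)
    -> (s32)(seqno - seqno) >= 0
    -> true

so a retired request always reads as complete without dereferencing the
possibly-freed HWSP.)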

At this point, we are emitting both per-context and global seqno and
still using the single per-engine execution timeline for resolving
interrupts.

v2: s/fake_complete/mark_complete/

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c               |  2 +-
 drivers/gpu/drm/i915/i915_request.c           |  3 +-
 drivers/gpu/drm/i915/i915_request.h           | 30 +++----
 drivers/gpu/drm/i915/i915_reset.c             |  1 +
 drivers/gpu/drm/i915/i915_timeline.c          |  4 +
 drivers/gpu/drm/i915/intel_engine_cs.c        | 15 +++-
 drivers/gpu/drm/i915/intel_lrc.c              | 31 ++++---
 drivers/gpu/drm/i915/intel_ringbuffer.c       | 87 +++++++++++++++----
 .../gpu/drm/i915/selftests/i915_timeline.c    |  7 +-
 drivers/gpu/drm/i915/selftests/mock_engine.c  | 19 +++-
 10 files changed, 139 insertions(+), 60 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index d68f3fdd8a8e..1bd724d663d9 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2892,7 +2892,7 @@ i915_gem_find_active_request(struct intel_engine_cs *engine)
 	 */
 	spin_lock_irqsave(&engine->timeline.lock, flags);
 	list_for_each_entry(request, &engine->timeline.requests, link) {
-		if (__i915_request_completed(request, request->global_seqno))
+		if (i915_request_completed(request))
 			continue;
 
 		active = request;
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 79eb1957cf99..cdbdbcff28ec 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -199,6 +199,7 @@ static void __retire_engine_request(struct intel_engine_cs *engine,
 	spin_unlock(&engine->timeline.lock);
 
 	spin_lock(&rq->lock);
+	i915_request_mark_complete(rq);
 	if (!i915_request_signaled(rq))
 		dma_fence_signal_locked(&rq->fence);
 	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &rq->fence.flags))
@@ -634,7 +635,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	rq->ring = ce->ring;
 	rq->timeline = ce->ring->timeline;
 	GEM_BUG_ON(rq->timeline == &engine->timeline);
-	rq->hwsp_seqno = &engine->status_page.addr[I915_GEM_HWS_INDEX];
+	rq->hwsp_seqno = rq->timeline->hwsp_seqno;
 
 	spin_lock_init(&rq->lock);
 	dma_fence_init(&rq->fence,
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index ade010fe6e26..96c586d6ff4d 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -289,6 +289,7 @@ long i915_request_wait(struct i915_request *rq,
 
 static inline bool i915_request_signaled(const struct i915_request *rq)
 {
+	/* The request may live longer than its HWSP, so check flags first! */
 	return test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &rq->fence.flags);
 }
 
@@ -340,32 +341,23 @@ static inline u32 hwsp_seqno(const struct i915_request *rq)
  */
 static inline bool i915_request_started(const struct i915_request *rq)
 {
-	u32 seqno;
-
-	seqno = i915_request_global_seqno(rq);
-	if (!seqno) /* not yet submitted to HW */
-		return false;
+	if (i915_request_signaled(rq))
+		return true;
 
-	return i915_seqno_passed(hwsp_seqno(rq), seqno - 1);
-}
-
-static inline bool
-__i915_request_completed(const struct i915_request *rq, u32 seqno)
-{
-	GEM_BUG_ON(!seqno);
-	return i915_seqno_passed(hwsp_seqno(rq), seqno) &&
-		seqno == i915_request_global_seqno(rq);
+	return i915_seqno_passed(hwsp_seqno(rq), rq->fence.seqno - 1);
 }
 
 static inline bool i915_request_completed(const struct i915_request *rq)
 {
-	u32 seqno;
+	if (i915_request_signaled(rq))
+		return true;
 
-	seqno = i915_request_global_seqno(rq);
-	if (!seqno)
-		return false;
+	return i915_seqno_passed(hwsp_seqno(rq), rq->fence.seqno);
+}
 
-	return __i915_request_completed(rq, seqno);
+static inline void i915_request_mark_complete(struct i915_request *rq)
+{
+	rq->hwsp_seqno = (u32 *)&rq->fence.seqno; /* decouple from HWSP */
 }
 
 void i915_retire_requests(struct drm_i915_private *i915);
diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
index d2dca85a543d..bd82f9b1043f 100644
--- a/drivers/gpu/drm/i915/i915_reset.c
+++ b/drivers/gpu/drm/i915/i915_reset.c
@@ -760,6 +760,7 @@ static void nop_submit_request(struct i915_request *request)
 
 	spin_lock_irqsave(&request->engine->timeline.lock, flags);
 	__i915_request_submit(request);
+	i915_request_mark_complete(request);
 	intel_engine_write_global_seqno(request->engine, request->global_seqno);
 	spin_unlock_irqrestore(&request->engine->timeline.lock, flags);
 }
diff --git a/drivers/gpu/drm/i915/i915_timeline.c b/drivers/gpu/drm/i915/i915_timeline.c
index 17c1ec58829e..34ffa6dca1b7 100644
--- a/drivers/gpu/drm/i915/i915_timeline.c
+++ b/drivers/gpu/drm/i915/i915_timeline.c
@@ -291,6 +291,10 @@ int i915_timeline_pin(struct i915_timeline *tl)
 	if (err)
 		goto unpin;
 
+	tl->hwsp_offset =
+		i915_ggtt_offset(tl->hwsp_ggtt) +
+		offset_in_page(tl->hwsp_offset);
+
 	return 0;
 
 unpin:
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index e961bc5017a9..5a80db990351 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -660,10 +660,16 @@ static int measure_breadcrumb_dw(struct intel_engine_cs *engine)
 	frame->rq.ring = &frame->ring;
 	frame->rq.timeline = &frame->timeline;
 
+	dw = i915_timeline_pin(&frame->timeline);
+	if (dw < 0)
+		goto out_timeline;
+
 	dw = engine->emit_breadcrumb(&frame->rq, frame->cs) - frame->cs;
 
-	i915_timeline_fini(&frame->timeline);
+	i915_timeline_unpin(&frame->timeline);
 
+out_timeline:
+	i915_timeline_fini(&frame->timeline);
 out_frame:
 	kfree(frame);
 	return dw;
@@ -1426,9 +1432,10 @@ static void intel_engine_print_registers(const struct intel_engine_cs *engine,
 				char hdr[80];
 
 				snprintf(hdr, sizeof(hdr),
-					 "\t\tELSP[%d] count=%d, ring->start=%08x, rq: ",
+					 "\t\tELSP[%d] count=%d, ring:{start:%08x, hwsp:%08x}, rq: ",
 					 idx, count,
-					 i915_ggtt_offset(rq->ring->vma));
+					 i915_ggtt_offset(rq->ring->vma),
+					 rq->timeline->hwsp_offset);
 				print_request(m, rq, hdr);
 			} else {
 				drm_printf(m, "\t\tELSP[%d] idle\n", idx);
@@ -1538,6 +1545,8 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 			   rq->ring->emit);
 		drm_printf(m, "\t\tring->space:  0x%08x\n",
 			   rq->ring->space);
+		drm_printf(m, "\t\tring->hwsp:   0x%08x\n",
+			   rq->timeline->hwsp_offset);
 
 		print_request_ring(m, rq);
 	}
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 45e28614babd..f4b0be81c6db 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -941,10 +941,10 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine)
 	list_for_each_entry(rq, &engine->timeline.requests, link) {
 		GEM_BUG_ON(!rq->global_seqno);
 
-		if (i915_request_signaled(rq))
-			continue;
+		if (!i915_request_signaled(rq))
+			dma_fence_set_error(&rq->fence, -EIO);
 
-		dma_fence_set_error(&rq->fence, -EIO);
+		i915_request_mark_complete(rq);
 	}
 
 	/* Flush the queued requests to the timeline list (for retiring). */
@@ -954,9 +954,9 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine)
 
 		priolist_for_each_request_consume(rq, rn, p, i) {
 			list_del_init(&rq->sched.link);
-
-			dma_fence_set_error(&rq->fence, -EIO);
 			__i915_request_submit(rq);
+			dma_fence_set_error(&rq->fence, -EIO);
+			i915_request_mark_complete(rq);
 		}
 
 		rb_erase_cached(&p->node, &execlists->queue);
@@ -2155,10 +2155,17 @@ static u32 *gen8_emit_breadcrumb(struct i915_request *request, u32 *cs)
 	/* w/a: bit 5 needs to be zero for MI_FLUSH_DW address. */
 	BUILD_BUG_ON(I915_GEM_HWS_INDEX_ADDR & (1 << 5));
 
-	cs = gen8_emit_ggtt_write(cs, request->global_seqno,
+	cs = gen8_emit_ggtt_write(cs,
+				  request->fence.seqno,
+				  request->timeline->hwsp_offset);
+
+	cs = gen8_emit_ggtt_write(cs,
+				  request->global_seqno,
 				  intel_hws_seqno_address(request->engine));
+
 	*cs++ = MI_USER_INTERRUPT;
 	*cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
+
 	request->tail = intel_ring_offset(request, cs);
 	assert_ring_tail_valid(request->ring, request->tail);
 
@@ -2167,18 +2174,20 @@ static u32 *gen8_emit_breadcrumb(struct i915_request *request, u32 *cs)
 
 static u32 *gen8_emit_breadcrumb_rcs(struct i915_request *request, u32 *cs)
 {
-	/* We're using qword write, seqno should be aligned to 8 bytes. */
-	BUILD_BUG_ON(I915_GEM_HWS_INDEX & 1);
-
 	cs = gen8_emit_ggtt_write_rcs(cs,
-				      request->global_seqno,
-				      intel_hws_seqno_address(request->engine),
+				      request->fence.seqno,
+				      request->timeline->hwsp_offset,
 				      PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH |
 				      PIPE_CONTROL_DEPTH_CACHE_FLUSH |
 				      PIPE_CONTROL_DC_FLUSH_ENABLE |
 				      PIPE_CONTROL_FLUSH_ENABLE |
 				      PIPE_CONTROL_CS_STALL);
 
+	cs = gen8_emit_ggtt_write_rcs(cs,
+				      request->global_seqno,
+				      intel_hws_seqno_address(request->engine),
+				      PIPE_CONTROL_CS_STALL);
+
 	*cs++ = MI_USER_INTERRUPT;
 	*cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 174795622eb1..ee3719324e2d 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -326,6 +326,11 @@ static u32 *gen6_rcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
 		 PIPE_CONTROL_DC_FLUSH_ENABLE |
 		 PIPE_CONTROL_QW_WRITE |
 		 PIPE_CONTROL_CS_STALL);
+	*cs++ = rq->timeline->hwsp_offset | PIPE_CONTROL_GLOBAL_GTT;
+	*cs++ = rq->fence.seqno;
+
+	*cs++ = GFX_OP_PIPE_CONTROL(4);
+	*cs++ = PIPE_CONTROL_QW_WRITE | PIPE_CONTROL_CS_STALL;
 	*cs++ = intel_hws_seqno_address(rq->engine) | PIPE_CONTROL_GLOBAL_GTT;
 	*cs++ = rq->global_seqno;
 
@@ -427,6 +432,13 @@ static u32 *gen7_rcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
 		 PIPE_CONTROL_QW_WRITE |
 		 PIPE_CONTROL_GLOBAL_GTT_IVB |
 		 PIPE_CONTROL_CS_STALL);
+	*cs++ = rq->timeline->hwsp_offset;
+	*cs++ = rq->fence.seqno;
+
+	*cs++ = GFX_OP_PIPE_CONTROL(4);
+	*cs++ = (PIPE_CONTROL_QW_WRITE |
+		 PIPE_CONTROL_GLOBAL_GTT_IVB |
+		 PIPE_CONTROL_CS_STALL);
 	*cs++ = intel_hws_seqno_address(rq->engine);
 	*cs++ = rq->global_seqno;
 
@@ -441,10 +453,19 @@ static u32 *gen7_rcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
 
 static u32 *gen6_xcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
 {
-	*cs++ = MI_FLUSH_DW | MI_FLUSH_DW_OP_STOREDW;
-	*cs++ = intel_hws_seqno_address(rq->engine) | MI_FLUSH_DW_USE_GTT;
+	GEM_BUG_ON(rq->timeline->hwsp_ggtt != rq->engine->status_page.vma);
+	GEM_BUG_ON(offset_in_page(rq->timeline->hwsp_offset) != I915_GEM_HWS_SEQNO_ADDR);
+
+	*cs++ = MI_FLUSH_DW | MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_DW_STORE_INDEX;
+	*cs++ = I915_GEM_HWS_SEQNO_ADDR | MI_FLUSH_DW_USE_GTT;
+	*cs++ = rq->fence.seqno;
+
+	*cs++ = MI_FLUSH_DW | MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_DW_STORE_INDEX;
+	*cs++ = I915_GEM_HWS_INDEX_ADDR | MI_FLUSH_DW_USE_GTT;
 	*cs++ = rq->global_seqno;
+
 	*cs++ = MI_USER_INTERRUPT;
+	*cs++ = MI_NOOP;
 
 	rq->tail = intel_ring_offset(rq, cs);
 	assert_ring_tail_valid(rq->ring, rq->tail);
@@ -457,14 +478,21 @@ static u32 *gen7_xcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
 {
 	int i;
 
-	*cs++ = MI_FLUSH_DW | MI_FLUSH_DW_OP_STOREDW;
-	*cs++ = intel_hws_seqno_address(rq->engine) | MI_FLUSH_DW_USE_GTT;
+	GEM_BUG_ON(rq->timeline->hwsp_ggtt != rq->engine->status_page.vma);
+	GEM_BUG_ON(offset_in_page(rq->timeline->hwsp_offset) != I915_GEM_HWS_SEQNO_ADDR);
+
+	*cs++ = MI_FLUSH_DW | MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_DW_STORE_INDEX;
+	*cs++ = I915_GEM_HWS_SEQNO_ADDR | MI_FLUSH_DW_USE_GTT;
+	*cs++ = rq->fence.seqno;
+
+	*cs++ = MI_FLUSH_DW | MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_DW_STORE_INDEX;
+	*cs++ = I915_GEM_HWS_INDEX_ADDR | MI_FLUSH_DW_USE_GTT;
 	*cs++ = rq->global_seqno;
 
 	for (i = 0; i < GEN7_XCS_WA; i++) {
 		*cs++ = MI_STORE_DWORD_INDEX;
-		*cs++ = I915_GEM_HWS_INDEX_ADDR;
-		*cs++ = rq->global_seqno;
+		*cs++ = I915_GEM_HWS_SEQNO_ADDR;
+		*cs++ = rq->fence.seqno;
 	}
 
 	*cs++ = MI_FLUSH_DW;
@@ -472,7 +500,6 @@ static u32 *gen7_xcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
 	*cs++ = 0;
 
 	*cs++ = MI_USER_INTERRUPT;
-	*cs++ = MI_NOOP;
 
 	rq->tail = intel_ring_offset(rq, cs);
 	assert_ring_tail_valid(rq->ring, rq->tail);
@@ -738,7 +765,7 @@ static void reset_ring(struct intel_engine_cs *engine, bool stalled)
 	rq = NULL;
 	spin_lock_irqsave(&tl->lock, flags);
 	list_for_each_entry(pos, &tl->requests, link) {
-		if (!__i915_request_completed(pos, pos->global_seqno)) {
+		if (!i915_request_completed(pos)) {
 			rq = pos;
 			break;
 		}
@@ -880,10 +907,10 @@ static void cancel_requests(struct intel_engine_cs *engine)
 	list_for_each_entry(request, &engine->timeline.requests, link) {
 		GEM_BUG_ON(!request->global_seqno);
 
-		if (i915_request_signaled(request))
-			continue;
+		if (!i915_request_signaled(request))
+			dma_fence_set_error(&request->fence, -EIO);
 
-		dma_fence_set_error(&request->fence, -EIO);
+		i915_request_mark_complete(request);
 	}
 
 	intel_write_status_page(engine,
@@ -907,14 +934,20 @@ static void i9xx_submit_request(struct i915_request *request)
 
 static u32 *i9xx_emit_breadcrumb(struct i915_request *rq, u32 *cs)
 {
+	GEM_BUG_ON(rq->timeline->hwsp_ggtt != rq->engine->status_page.vma);
+	GEM_BUG_ON(offset_in_page(rq->timeline->hwsp_offset) != I915_GEM_HWS_SEQNO_ADDR);
+
 	*cs++ = MI_FLUSH;
 
+	*cs++ = MI_STORE_DWORD_INDEX;
+	*cs++ = I915_GEM_HWS_SEQNO_ADDR;
+	*cs++ = rq->fence.seqno;
+
 	*cs++ = MI_STORE_DWORD_INDEX;
 	*cs++ = I915_GEM_HWS_INDEX_ADDR;
 	*cs++ = rq->global_seqno;
 
 	*cs++ = MI_USER_INTERRUPT;
-	*cs++ = MI_NOOP;
 
 	rq->tail = intel_ring_offset(rq, cs);
 	assert_ring_tail_valid(rq->ring, rq->tail);
@@ -927,8 +960,15 @@ static u32 *gen5_emit_breadcrumb(struct i915_request *rq, u32 *cs)
 {
 	int i;
 
+	GEM_BUG_ON(rq->timeline->hwsp_ggtt != rq->engine->status_page.vma);
+	GEM_BUG_ON(offset_in_page(rq->timeline->hwsp_offset) != I915_GEM_HWS_SEQNO_ADDR);
+
 	*cs++ = MI_FLUSH;
 
+	*cs++ = MI_STORE_DWORD_INDEX;
+	*cs++ = I915_GEM_HWS_SEQNO_ADDR;
+	*cs++ = rq->fence.seqno;
+
 	BUILD_BUG_ON(GEN5_WA_STORES < 1);
 	for (i = 0; i < GEN5_WA_STORES; i++) {
 		*cs++ = MI_STORE_DWORD_INDEX;
@@ -937,6 +977,7 @@ static u32 *gen5_emit_breadcrumb(struct i915_request *rq, u32 *cs)
 	}
 
 	*cs++ = MI_USER_INTERRUPT;
+	*cs++ = MI_NOOP;
 
 	rq->tail = intel_ring_offset(rq, cs);
 	assert_ring_tail_valid(rq->ring, rq->tail);
@@ -1169,6 +1210,10 @@ int intel_ring_pin(struct intel_ring *ring)
 
 	GEM_BUG_ON(ring->vaddr);
 
+	ret = i915_timeline_pin(ring->timeline);
+	if (ret)
+		return ret;
+
 	flags = PIN_GLOBAL;
 
 	/* Ring wraparound at offset 0 sometimes hangs. No idea why. */
@@ -1185,28 +1230,32 @@ int intel_ring_pin(struct intel_ring *ring)
 		else
 			ret = i915_gem_object_set_to_cpu_domain(vma->obj, true);
 		if (unlikely(ret))
-			return ret;
+			goto unpin_timeline;
 	}
 
 	ret = i915_vma_pin(vma, 0, 0, flags);
 	if (unlikely(ret))
-		return ret;
+		goto unpin_timeline;
 
 	if (i915_vma_is_map_and_fenceable(vma))
 		addr = (void __force *)i915_vma_pin_iomap(vma);
 	else
 		addr = i915_gem_object_pin_map(vma->obj, map);
-	if (IS_ERR(addr))
-		goto err;
+	if (IS_ERR(addr)) {
+		ret = PTR_ERR(addr);
+		goto unpin_ring;
+	}
 
 	vma->obj->pin_global++;
 
 	ring->vaddr = addr;
 	return 0;
 
-err:
+unpin_ring:
 	i915_vma_unpin(vma);
-	return PTR_ERR(addr);
+unpin_timeline:
+	i915_timeline_unpin(ring->timeline);
+	return ret;
 }
 
 void intel_ring_reset(struct intel_ring *ring, u32 tail)
@@ -1235,6 +1284,8 @@ void intel_ring_unpin(struct intel_ring *ring)
 
 	ring->vma->obj->pin_global--;
 	i915_vma_unpin(ring->vma);
+
+	i915_timeline_unpin(ring->timeline);
 }
 
 static struct i915_vma *
diff --git a/drivers/gpu/drm/i915/selftests/i915_timeline.c b/drivers/gpu/drm/i915/selftests/i915_timeline.c
index c34340f074cf..12ea69b1a1e5 100644
--- a/drivers/gpu/drm/i915/selftests/i915_timeline.c
+++ b/drivers/gpu/drm/i915/selftests/i915_timeline.c
@@ -440,11 +440,6 @@ static int emit_ggtt_store_dw(struct i915_request *rq, u32 addr, u32 value)
 	return 0;
 }
 
-static u32 hwsp_address(const struct i915_timeline *tl)
-{
-	return i915_ggtt_offset(tl->hwsp_ggtt) + tl->hwsp_offset;
-}
-
 static struct i915_request *
 tl_write(struct i915_timeline *tl, struct intel_engine_cs *engine, u32 value)
 {
@@ -463,7 +458,7 @@ tl_write(struct i915_timeline *tl, struct intel_engine_cs *engine, u32 value)
 	if (IS_ERR(rq))
 		goto out_unpin;
 
-	err = emit_ggtt_store_dw(rq, hwsp_address(tl), value);
+	err = emit_ggtt_store_dw(rq, tl->hwsp_offset, value);
 	i915_request_add(rq);
 	if (err)
 		rq = ERR_PTR(err);
diff --git a/drivers/gpu/drm/i915/selftests/mock_engine.c b/drivers/gpu/drm/i915/selftests/mock_engine.c
index 919c89fd6ee5..95e890d7f58b 100644
--- a/drivers/gpu/drm/i915/selftests/mock_engine.c
+++ b/drivers/gpu/drm/i915/selftests/mock_engine.c
@@ -30,6 +30,17 @@ struct mock_ring {
 	struct i915_timeline timeline;
 };
 
+static void mock_timeline_pin(struct i915_timeline *tl)
+{
+	tl->pin_count++;
+}
+
+static void mock_timeline_unpin(struct i915_timeline *tl)
+{
+	GEM_BUG_ON(!tl->pin_count);
+	tl->pin_count--;
+}
+
 static struct intel_ring *mock_ring(struct intel_engine_cs *engine)
 {
 	const unsigned long sz = PAGE_SIZE / 2;
@@ -76,6 +87,8 @@ static void advance(struct mock_request *request)
 {
 	list_del_init(&request->link);
 	mock_seqno_advance(request->base.engine, request->base.global_seqno);
+	i915_request_mark_complete(&request->base);
+	GEM_BUG_ON(!i915_request_completed(&request->base));
 }
 
 static void hw_delay_complete(struct timer_list *t)
@@ -108,6 +121,7 @@ static void hw_delay_complete(struct timer_list *t)
 
 static void mock_context_unpin(struct intel_context *ce)
 {
+	mock_timeline_unpin(ce->ring->timeline);
 	i915_gem_context_put(ce->gem_context);
 }
 
@@ -129,6 +143,7 @@ mock_context_pin(struct intel_engine_cs *engine,
 		 struct i915_gem_context *ctx)
 {
 	struct intel_context *ce = to_intel_context(ctx, engine);
+	int err = -ENOMEM;
 
 	if (ce->pin_count++)
 		return ce;
@@ -139,13 +154,15 @@ mock_context_pin(struct intel_engine_cs *engine,
 			goto err;
 	}
 
+	mock_timeline_pin(ce->ring->timeline);
+
 	ce->ops = &mock_context_ops;
 	i915_gem_context_get(ctx);
 	return ce;
 
 err:
 	ce->pin_count = 0;
-	return ERR_PTR(-ENOMEM);
+	return ERR_PTR(err);
 }
 
 static int mock_request_alloc(struct i915_request *request)
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 17/28] drm/i915: Track active timelines
  2019-01-28  1:02 [PATCH 01/28] drm/i915: Wait for a moment before forcibly resetting the device Chris Wilson
                   ` (14 preceding siblings ...)
  2019-01-28  1:02 ` [PATCH 16/28] drm/i915: Track the context's seqno in its own timeline HWSP Chris Wilson
@ 2019-01-28  1:02 ` Chris Wilson
  2019-01-28  1:02 ` [PATCH 18/28] drm/i915: Identify active requests Chris Wilson
                   ` (15 subsequent siblings)
  31 siblings, 0 replies; 39+ messages in thread
From: Chris Wilson @ 2019-01-28  1:02 UTC (permalink / raw)
  To: intel-gfx

Now that we pin timelines around use, we have a clearly defined lifetime
and convenient points at which we can track only the active timelines.
This allows us to restrict the list iterations to just the active
timelines, rather than walking every timeline ever created.

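A minimal sketch of the resulting lifetime rules, using the helpers
added in the diff below (the caller itself is hypothetical):

	int use_timeline(struct i915_timeline *tl) /* illustrative caller */
	{
		int err;

		err = i915_timeline_pin(tl); /* first pin: timeline_add_to_active() */
		if (err)
			return err;

		/* ... emit requests; only pinned timelines are on active_list ... */

		i915_timeline_unpin(tl); /* last unpin: timeline_remove_from_active() */
		return 0;
	}
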
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h      |  2 +-
 drivers/gpu/drm/i915/i915_gem.c      |  4 +--
 drivers/gpu/drm/i915/i915_reset.c    |  2 +-
 drivers/gpu/drm/i915/i915_timeline.c | 39 ++++++++++++++++++----------
 4 files changed, 29 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 6a051381f535..d072f3369ee1 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1977,7 +1977,7 @@ struct drm_i915_private {
 
 		struct i915_gt_timelines {
 			struct mutex mutex; /* protects list, tainted by GPU */
-			struct list_head list;
+			struct list_head active_list;
 
 			/* Pack multiple timelines' seqnos into the same page */
 			spinlock_t hwsp_lock;
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 1bd724d663d9..05627000b77d 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3248,7 +3248,7 @@ wait_for_timelines(struct drm_i915_private *i915,
 		return timeout;
 
 	mutex_lock(&gt->mutex);
-	list_for_each_entry(tl, &gt->list, link) {
+	list_for_each_entry(tl, &gt->active_list, link) {
 		struct i915_request *rq;
 
 		rq = i915_gem_active_get_unlocked(&tl->last_request);
@@ -3276,7 +3276,7 @@ wait_for_timelines(struct drm_i915_private *i915,
 
 		/* restart after reacquiring the lock */
 		mutex_lock(&gt->mutex);
-		tl = list_entry(&gt->list, typeof(*tl), link);
+		tl = list_entry(&gt->active_list, typeof(*tl), link);
 	}
 	mutex_unlock(&gt->mutex);
 
diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
index bd82f9b1043f..acf3c777e49d 100644
--- a/drivers/gpu/drm/i915/i915_reset.c
+++ b/drivers/gpu/drm/i915/i915_reset.c
@@ -856,7 +856,7 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915)
 	 * No more can be submitted until we reset the wedged bit.
 	 */
 	mutex_lock(&i915->gt.timelines.mutex);
-	list_for_each_entry(tl, &i915->gt.timelines.list, link) {
+	list_for_each_entry(tl, &i915->gt.timelines.active_list, link) {
 		struct i915_request *rq;
 		long timeout;
 
diff --git a/drivers/gpu/drm/i915/i915_timeline.c b/drivers/gpu/drm/i915/i915_timeline.c
index 34ffa6dca1b7..1c9794fe717e 100644
--- a/drivers/gpu/drm/i915/i915_timeline.c
+++ b/drivers/gpu/drm/i915/i915_timeline.c
@@ -120,7 +120,6 @@ int i915_timeline_init(struct drm_i915_private *i915,
 		       const char *name,
 		       struct i915_vma *hwsp)
 {
-	struct i915_gt_timelines *gt = &i915->gt.timelines;
 	void *vaddr;
 
 	/*
@@ -169,10 +168,6 @@ int i915_timeline_init(struct drm_i915_private *i915,
 
 	i915_syncmap_init(&timeline->sync);
 
-	mutex_lock(&gt->mutex);
-	list_add(&timeline->link, &gt->list);
-	mutex_unlock(&gt->mutex);
-
 	return 0;
 }
 
@@ -181,7 +176,7 @@ void i915_timelines_init(struct drm_i915_private *i915)
 	struct i915_gt_timelines *gt = &i915->gt.timelines;
 
 	mutex_init(&gt->mutex);
-	INIT_LIST_HEAD(&gt->list);
+	INIT_LIST_HEAD(&gt->active_list);
 
 	spin_lock_init(&gt->hwsp_lock);
 	INIT_LIST_HEAD(&gt->hwsp_free_list);
@@ -190,6 +185,24 @@ void i915_timelines_init(struct drm_i915_private *i915)
 	i915_gem_shrinker_taints_mutex(i915, &gt->mutex);
 }
 
+static void timeline_add_to_active(struct i915_timeline *tl)
+{
+	struct i915_gt_timelines *gt = &tl->i915->gt.timelines;
+
+	mutex_lock(&gt->mutex);
+	list_add(&tl->link, &gt->active_list);
+	mutex_unlock(&gt->mutex);
+}
+
+static void timeline_remove_from_active(struct i915_timeline *tl)
+{
+	struct i915_gt_timelines *gt = &tl->i915->gt.timelines;
+
+	mutex_lock(&gt->mutex);
+	list_del(&tl->link);
+	mutex_unlock(&gt->mutex);
+}
+
 /**
  * i915_timelines_park - called when the driver idles
  * @i915: the drm_i915_private device
@@ -206,7 +219,7 @@ void i915_timelines_park(struct drm_i915_private *i915)
 	struct i915_timeline *timeline;
 
 	mutex_lock(&gt->mutex);
-	list_for_each_entry(timeline, &gt->list, link) {
+	list_for_each_entry(timeline, &gt->active_list, link) {
 		/*
 		 * All known fences are completed so we can scrap
 		 * the current sync point tracking and start afresh,
@@ -220,16 +233,10 @@ void i915_timelines_park(struct drm_i915_private *i915)
 
 void i915_timeline_fini(struct i915_timeline *timeline)
 {
-	struct i915_gt_timelines *gt = &timeline->i915->gt.timelines;
-
 	GEM_BUG_ON(timeline->pin_count);
 	GEM_BUG_ON(!list_empty(&timeline->requests));
 	GEM_BUG_ON(i915_gem_active_isset(&timeline->barrier));
 
-	mutex_lock(&gt->mutex);
-	list_del(&timeline->link);
-	mutex_unlock(&gt->mutex);
-
 	i915_syncmap_free(&timeline->sync);
 	hwsp_free(timeline);
 
@@ -295,6 +302,8 @@ int i915_timeline_pin(struct i915_timeline *tl)
 		i915_ggtt_offset(tl->hwsp_ggtt) +
 		offset_in_page(tl->hwsp_offset);
 
+	timeline_add_to_active(tl);
+
 	return 0;
 
 unpin:
@@ -308,6 +317,8 @@ void i915_timeline_unpin(struct i915_timeline *tl)
 	if (--tl->pin_count)
 		return;
 
+	timeline_remove_from_active(tl);
+
 	/*
 	 * Since this timeline is idle, all barriers upon which we were waiting
 	 * must also be complete and so we can discard the last used barriers
@@ -331,7 +342,7 @@ void i915_timelines_fini(struct drm_i915_private *i915)
 {
 	struct i915_gt_timelines *gt = &i915->gt.timelines;
 
-	GEM_BUG_ON(!list_empty(&gt->list));
+	GEM_BUG_ON(!list_empty(&gt->active_list));
 	GEM_BUG_ON(!list_empty(&gt->hwsp_free_list));
 
 	mutex_destroy(&gt->mutex);
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 18/28] drm/i915: Identify active requests
  2019-01-28  1:02 [PATCH 01/28] drm/i915: Wait for a moment before forcibly resetting the device Chris Wilson
                   ` (15 preceding siblings ...)
  2019-01-28  1:02 ` [PATCH 17/28] drm/i915: Track active timelines Chris Wilson
@ 2019-01-28  1:02 ` Chris Wilson
  2019-01-28  1:02 ` [PATCH 19/28] drm/i915: Remove the intel_engine_notify tracepoint Chris Wilson
                   ` (14 subsequent siblings)
  31 siblings, 0 replies; 39+ messages in thread
From: Chris Wilson @ 2019-01-28  1:02 UTC (permalink / raw)
  To: intel-gfx

To allow requests to forgo a common execution timeline, one question we
need to be able to answer is "is this request running?". To track
whether a request has started on HW, we can emit a breadcrumb at the
beginning of the request and check its timeline's HWSP to see if the
breadcrumb has advanced past the start of this request. (This is in
contrast to the global timeline where we need only ask if we are on the
global timeline and if the timeline has advanced past the end of the
previous request.)

There is still confusion from a preempted request, which has already
started but relinquished the HW to a high priority request. For the
common case, this discrepancy should be negligible. However, for
identification of hung requests, knowing which one was running at the
time of the hang will be much more important.

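To make the bookkeeping concrete: with an initial breadcrumb, each
request reserves two seqno values on its timeline (timeline_get_seqno()
in the diff advances by 1 + has_initial_breadcrumb). A worked example
with made-up numbers:

	/*
	 * Suppose rq->fence.seqno == 24 on a timeline with an initial
	 * breadcrumb, so 23 was reserved for the start marker.
	 *
	 *   emit_init_breadcrumb(): HWSP <- 23 (before the user payload)
	 *   emit_fini_breadcrumb(): HWSP <- 24 (after the user payload)
	 *
	 * started():   i915_seqno_passed(HWSP, 24 - 1) -> true once HWSP >= 23
	 * completed(): i915_seqno_passed(HWSP, 24)     -> true once HWSP >= 24
	 *
	 * A preempted request still reads as started (the HWSP holds 23),
	 * which is why i915_gem_find_active_request() also compares
	 * RING_START against the request's ring before blaming it for a hang.
	 */
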
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c              | 15 ++++++++
 drivers/gpu/drm/i915/i915_gem_execbuffer.c   | 12 ++++++
 drivers/gpu/drm/i915/i915_request.c          | 10 ++---
 drivers/gpu/drm/i915/i915_request.h          |  1 +
 drivers/gpu/drm/i915/i915_timeline.c         |  1 +
 drivers/gpu/drm/i915/i915_timeline.h         |  2 +
 drivers/gpu/drm/i915/intel_engine_cs.c       |  8 ++--
 drivers/gpu/drm/i915/intel_lrc.c             | 39 +++++++++++++++++---
 drivers/gpu/drm/i915/intel_ringbuffer.c      | 25 ++++++++-----
 drivers/gpu/drm/i915/intel_ringbuffer.h      |  6 ++-
 drivers/gpu/drm/i915/selftests/mock_engine.c |  2 +-
 11 files changed, 96 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 05627000b77d..e802af64d628 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2873,6 +2873,14 @@ i915_gem_object_pwrite_gtt(struct drm_i915_gem_object *obj,
 	return 0;
 }
 
+static bool match_ring(struct i915_request *rq)
+{
+	struct drm_i915_private *dev_priv = rq->i915;
+	u32 ring = I915_READ(RING_START(rq->engine->mmio_base));
+
+	return ring == i915_ggtt_offset(rq->ring->vma);
+}
+
 struct i915_request *
 i915_gem_find_active_request(struct intel_engine_cs *engine)
 {
@@ -2895,6 +2903,13 @@ i915_gem_find_active_request(struct intel_engine_cs *engine)
 		if (i915_request_completed(request))
 			continue;
 
+		if (!i915_request_started(request))
+			break;
+
+		/* More than one preemptible request may match! */
+		if (!match_ring(request))
+			break;
+
 		active = request;
 		break;
 	}
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index f250109e1f66..8eedf7cac493 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1976,6 +1976,18 @@ static int eb_submit(struct i915_execbuffer *eb)
 			return err;
 	}
 
+	/*
+	 * After we completed waiting for other engines (using HW semaphores)
+	 * then we can signal that this request/batch is ready to run. This
+	 * allows us to determine if the batch is still waiting on the GPU
+	 * or actually running by checking the breadcrumb.
+	 */
+	if (eb->engine->emit_init_breadcrumb) {
+		err = eb->engine->emit_init_breadcrumb(eb->request);
+		if (err)
+			return err;
+	}
+
 	err = eb->engine->emit_bb_start(eb->request,
 					eb->batch->node.start +
 					eb->batch_start_offset,
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index cdbdbcff28ec..2171df2d3019 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -333,7 +333,7 @@ void i915_request_retire_upto(struct i915_request *rq)
 
 static u32 timeline_get_seqno(struct i915_timeline *tl)
 {
-	return ++tl->seqno;
+	return tl->seqno += 1 + tl->has_initial_breadcrumb;
 }
 
 static void move_to_timeline(struct i915_request *request,
@@ -382,8 +382,8 @@ void __i915_request_submit(struct i915_request *request)
 		intel_engine_enable_signaling(request, false);
 	spin_unlock(&request->lock);
 
-	engine->emit_breadcrumb(request,
-				request->ring->vaddr + request->postfix);
+	engine->emit_fini_breadcrumb(request,
+				     request->ring->vaddr + request->postfix);
 
 	/* Transfer from per-context onto the global per-engine timeline */
 	move_to_timeline(request, &engine->timeline);
@@ -670,7 +670,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	 * around inside i915_request_add() there is sufficient space at
 	 * the beginning of the ring as well.
 	 */
-	rq->reserved_space = 2 * engine->emit_breadcrumb_dw * sizeof(u32);
+	rq->reserved_space = 2 * engine->emit_fini_breadcrumb_dw * sizeof(u32);
 
 	/*
 	 * Record the position of the start of the request so that
@@ -925,7 +925,7 @@ void i915_request_add(struct i915_request *request)
 	 * GPU processing the request, we never over-estimate the
 	 * position of the ring's HEAD.
 	 */
-	cs = intel_ring_begin(request, engine->emit_breadcrumb_dw);
+	cs = intel_ring_begin(request, engine->emit_fini_breadcrumb_dw);
 	GEM_BUG_ON(IS_ERR(cs));
 	request->postfix = intel_ring_offset(request, cs);
 
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index 96c586d6ff4d..340d6216791c 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -344,6 +344,7 @@ static inline bool i915_request_started(const struct i915_request *rq)
 	if (i915_request_signaled(rq))
 		return true;
 
+	/* Remember: started but may have since been preempted! */
 	return i915_seqno_passed(hwsp_seqno(rq), rq->fence.seqno - 1);
 }
 
diff --git a/drivers/gpu/drm/i915/i915_timeline.c b/drivers/gpu/drm/i915/i915_timeline.c
index 1c9794fe717e..b354843a5040 100644
--- a/drivers/gpu/drm/i915/i915_timeline.c
+++ b/drivers/gpu/drm/i915/i915_timeline.c
@@ -135,6 +135,7 @@ int i915_timeline_init(struct drm_i915_private *i915,
 	timeline->i915 = i915;
 	timeline->name = name;
 	timeline->pin_count = 0;
+	timeline->has_initial_breadcrumb = !hwsp;
 
 	timeline->hwsp_offset = I915_GEM_HWS_SEQNO_ADDR;
 	if (!hwsp) {
diff --git a/drivers/gpu/drm/i915/i915_timeline.h b/drivers/gpu/drm/i915/i915_timeline.h
index 42ef4830dd95..d167e04073c5 100644
--- a/drivers/gpu/drm/i915/i915_timeline.h
+++ b/drivers/gpu/drm/i915/i915_timeline.h
@@ -48,6 +48,8 @@ struct i915_timeline {
 	struct i915_vma *hwsp_ggtt;
 	u32 hwsp_offset;
 
+	bool has_initial_breadcrumb;
+
 	/**
 	 * List of breadcrumbs associated with GPU requests currently
 	 * outstanding.
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 5a80db990351..d25e510b7b08 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -664,7 +664,7 @@ static int measure_breadcrumb_dw(struct intel_engine_cs *engine)
 	if (dw < 0)
 		goto out_timeline;
 
-	dw = engine->emit_breadcrumb(&frame->rq, frame->cs) - frame->cs;
+	dw = engine->emit_fini_breadcrumb(&frame->rq, frame->cs) - frame->cs;
 
 	i915_timeline_unpin(&frame->timeline);
 
@@ -725,7 +725,7 @@ int intel_engine_init_common(struct intel_engine_cs *engine)
 	if (ret < 0)
 		goto err_breadcrumbs;
 
-	engine->emit_breadcrumb_dw = ret;
+	engine->emit_fini_breadcrumb_dw = ret;
 
 	return 0;
 
@@ -1297,7 +1297,9 @@ static void print_request(struct drm_printer *m,
 	drm_printf(m, "%s%x%s [%llx:%llx]%s @ %dms: %s\n",
 		   prefix,
 		   rq->global_seqno,
-		   i915_request_completed(rq) ? "!" : "",
+		   i915_request_completed(rq) ? "!" :
+		   i915_request_started(rq) ? "*" :
+		   "",
 		   rq->fence.context, rq->fence.seqno,
 		   buf,
 		   jiffies_to_msecs(jiffies - rq->emitted_jiffies),
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index f4b0be81c6db..c767877533ef 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -734,7 +734,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 		 * WaIdleLiteRestore:bdw,skl
 		 * Apply the wa NOOPs to prevent
 		 * ring:HEAD == rq:TAIL as we resubmit the
-		 * request. See gen8_emit_breadcrumb() for
+		 * request. See gen8_emit_fini_breadcrumb() for
 		 * where we prepare the padding after the
 		 * end of the request.
 		 */
@@ -1394,6 +1394,34 @@ execlists_context_pin(struct intel_engine_cs *engine,
 	return __execlists_context_pin(engine, ctx, ce);
 }
 
+static int gen8_emit_init_breadcrumb(struct i915_request *rq)
+{
+	u32 *cs;
+
+	GEM_BUG_ON(!rq->timeline->has_initial_breadcrumb);
+
+	cs = intel_ring_begin(rq, 6);
+	if (IS_ERR(cs))
+		return PTR_ERR(cs);
+
+	/*
+	 * Check if we have been preempted before we even get started.
+	 *
+	 * After this point i915_request_started() reports true, even if
+	 * we get preempted and so are no longer running.
+	 */
+	*cs++ = MI_ARB_CHECK;
+	*cs++ = MI_NOOP;
+
+	*cs++ = MI_STORE_DWORD_IMM_GEN4 | MI_USE_GGTT;
+	*cs++ = rq->timeline->hwsp_offset;
+	*cs++ = 0;
+	*cs++ = rq->fence.seqno - 1;
+
+	intel_ring_advance(rq, cs);
+	return 0;
+}
+
 static int emit_pdps(struct i915_request *rq)
 {
 	const struct intel_engine_cs * const engine = rq->engine;
@@ -2150,7 +2178,7 @@ static u32 *gen8_emit_wa_tail(struct i915_request *request, u32 *cs)
 	return cs;
 }
 
-static u32 *gen8_emit_breadcrumb(struct i915_request *request, u32 *cs)
+static u32 *gen8_emit_fini_breadcrumb(struct i915_request *request, u32 *cs)
 {
 	/* w/a: bit 5 needs to be zero for MI_FLUSH_DW address. */
 	BUILD_BUG_ON(I915_GEM_HWS_INDEX_ADDR & (1 << 5));
@@ -2172,7 +2200,7 @@ static u32 *gen8_emit_breadcrumb(struct i915_request *request, u32 *cs)
 	return gen8_emit_wa_tail(request, cs);
 }
 
-static u32 *gen8_emit_breadcrumb_rcs(struct i915_request *request, u32 *cs)
+static u32 *gen8_emit_fini_breadcrumb_rcs(struct i915_request *request, u32 *cs)
 {
 	cs = gen8_emit_ggtt_write_rcs(cs,
 				      request->fence.seqno,
@@ -2287,7 +2315,8 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
 	engine->request_alloc = execlists_request_alloc;
 
 	engine->emit_flush = gen8_emit_flush;
-	engine->emit_breadcrumb = gen8_emit_breadcrumb;
+	engine->emit_init_breadcrumb = gen8_emit_init_breadcrumb;
+	engine->emit_fini_breadcrumb = gen8_emit_fini_breadcrumb;
 
 	engine->set_default_submission = intel_execlists_set_default_submission;
 
@@ -2400,7 +2429,7 @@ int logical_render_ring_init(struct intel_engine_cs *engine)
 	/* Override some for render ring. */
 	engine->init_context = gen8_init_rcs_context;
 	engine->emit_flush = gen8_emit_flush_render;
-	engine->emit_breadcrumb = gen8_emit_breadcrumb_rcs;
+	engine->emit_fini_breadcrumb = gen8_emit_fini_breadcrumb_rcs;
 
 	ret = logical_ring_init(engine);
 	if (ret)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index ee3719324e2d..668ed67336a2 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1607,6 +1607,7 @@ static int intel_init_ring_buffer(struct intel_engine_cs *engine)
 		err = PTR_ERR(timeline);
 		goto err;
 	}
+	GEM_BUG_ON(timeline->has_initial_breadcrumb);
 
 	ring = intel_engine_create_ring(engine, timeline, 32 * PAGE_SIZE);
 	i915_timeline_put(timeline);
@@ -1960,6 +1961,7 @@ static int ring_request_alloc(struct i915_request *request)
 	int ret;
 
 	GEM_BUG_ON(!request->hw_context->pin_count);
+	GEM_BUG_ON(request->timeline->has_initial_breadcrumb);
 
 	/*
 	 * Flush enough space to reduce the likelihood of waiting after
@@ -2296,9 +2298,14 @@ static void intel_ring_default_vfuncs(struct drm_i915_private *dev_priv,
 	engine->context_pin = intel_ring_context_pin;
 	engine->request_alloc = ring_request_alloc;
 
-	engine->emit_breadcrumb = i9xx_emit_breadcrumb;
+	/*
+	 * Using a global execution timeline; the previous final breadcrumb is
+	 * equivalent to our next initial breadcrumb so we can elide
+	 * engine->emit_init_breadcrumb().
+	 */
+	engine->emit_fini_breadcrumb = i9xx_emit_breadcrumb;
 	if (IS_GEN(dev_priv, 5))
-		engine->emit_breadcrumb = gen5_emit_breadcrumb;
+		engine->emit_fini_breadcrumb = gen5_emit_breadcrumb;
 
 	engine->set_default_submission = i9xx_set_default_submission;
 
@@ -2327,11 +2334,11 @@ int intel_init_render_ring_buffer(struct intel_engine_cs *engine)
 	if (INTEL_GEN(dev_priv) >= 7) {
 		engine->init_context = intel_rcs_ctx_init;
 		engine->emit_flush = gen7_render_ring_flush;
-		engine->emit_breadcrumb = gen7_rcs_emit_breadcrumb;
+		engine->emit_fini_breadcrumb = gen7_rcs_emit_breadcrumb;
 	} else if (IS_GEN(dev_priv, 6)) {
 		engine->init_context = intel_rcs_ctx_init;
 		engine->emit_flush = gen6_render_ring_flush;
-		engine->emit_breadcrumb = gen6_rcs_emit_breadcrumb;
+		engine->emit_fini_breadcrumb = gen6_rcs_emit_breadcrumb;
 	} else if (IS_GEN(dev_priv, 5)) {
 		engine->emit_flush = gen4_render_ring_flush;
 	} else {
@@ -2368,9 +2375,9 @@ int intel_init_bsd_ring_buffer(struct intel_engine_cs *engine)
 		engine->irq_enable_mask = GT_BSD_USER_INTERRUPT;
 
 		if (IS_GEN(dev_priv, 6))
-			engine->emit_breadcrumb = gen6_xcs_emit_breadcrumb;
+			engine->emit_fini_breadcrumb = gen6_xcs_emit_breadcrumb;
 		else
-			engine->emit_breadcrumb = gen7_xcs_emit_breadcrumb;
+			engine->emit_fini_breadcrumb = gen7_xcs_emit_breadcrumb;
 	} else {
 		engine->emit_flush = bsd_ring_flush;
 		if (IS_GEN(dev_priv, 5))
@@ -2394,9 +2401,9 @@ int intel_init_blt_ring_buffer(struct intel_engine_cs *engine)
 	engine->irq_enable_mask = GT_BLT_USER_INTERRUPT;
 
 	if (IS_GEN(dev_priv, 6))
-		engine->emit_breadcrumb = gen6_xcs_emit_breadcrumb;
+		engine->emit_fini_breadcrumb = gen6_xcs_emit_breadcrumb;
 	else
-		engine->emit_breadcrumb = gen7_xcs_emit_breadcrumb;
+		engine->emit_fini_breadcrumb = gen7_xcs_emit_breadcrumb;
 
 	return intel_init_ring_buffer(engine);
 }
@@ -2414,7 +2421,7 @@ int intel_init_vebox_ring_buffer(struct intel_engine_cs *engine)
 	engine->irq_enable = hsw_vebox_irq_enable;
 	engine->irq_disable = hsw_vebox_irq_disable;
 
-	engine->emit_breadcrumb = gen7_xcs_emit_breadcrumb;
+	engine->emit_fini_breadcrumb = gen7_xcs_emit_breadcrumb;
 
 	return intel_init_ring_buffer(engine);
 }
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 5ce54f23a945..ae753c45cbe7 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -468,8 +468,10 @@ struct intel_engine_cs {
 					 unsigned int dispatch_flags);
 #define I915_DISPATCH_SECURE BIT(0)
 #define I915_DISPATCH_PINNED BIT(1)
-	u32		*(*emit_breadcrumb)(struct i915_request *rq, u32 *cs);
-	int		emit_breadcrumb_dw;
+	int		 (*emit_init_breadcrumb)(struct i915_request *rq);
+	u32		*(*emit_fini_breadcrumb)(struct i915_request *rq,
+						 u32 *cs);
+	unsigned int	emit_fini_breadcrumb_dw;
 
 	/* Pass the request to the hardware queue (e.g. directly into
 	 * the legacy ringbuffer or to the end of an execlist).
diff --git a/drivers/gpu/drm/i915/selftests/mock_engine.c b/drivers/gpu/drm/i915/selftests/mock_engine.c
index 95e890d7f58b..3b226ebc6bc4 100644
--- a/drivers/gpu/drm/i915/selftests/mock_engine.c
+++ b/drivers/gpu/drm/i915/selftests/mock_engine.c
@@ -227,7 +227,7 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
 	engine->base.context_pin = mock_context_pin;
 	engine->base.request_alloc = mock_request_alloc;
 	engine->base.emit_flush = mock_emit_flush;
-	engine->base.emit_breadcrumb = mock_emit_breadcrumb;
+	engine->base.emit_fini_breadcrumb = mock_emit_breadcrumb;
 	engine->base.submit_request = mock_submit_request;
 
 	if (i915_timeline_init(i915,
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 19/28] drm/i915: Remove the intel_engine_notify tracepoint
  2019-01-28  1:02 [PATCH 01/28] drm/i915: Wait for a moment before forcibly resetting the device Chris Wilson
                   ` (16 preceding siblings ...)
  2019-01-28  1:02 ` [PATCH 18/28] drm/i915: Identify active requests Chris Wilson
@ 2019-01-28  1:02 ` Chris Wilson
  2019-01-28  1:02 ` [PATCH 20/28] drm/i915: Replace global breadcrumbs with per-context interrupt tracking Chris Wilson
                   ` (13 subsequent siblings)
  31 siblings, 0 replies; 39+ messages in thread
From: Chris Wilson @ 2019-01-28  1:02 UTC (permalink / raw)
  To: intel-gfx

The global seqno is defunct and so we have no meaningful indicator of
forward progress for an engine. You need to listen to the request
signaling tracepoints instead.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_irq.c   |  2 --
 drivers/gpu/drm/i915/i915_trace.h | 25 -------------------------
 2 files changed, 27 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index fe097725c27a..6a1b167ec801 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1226,8 +1226,6 @@ static void notify_ring(struct intel_engine_cs *engine)
 		wake_up_process(tsk);
 
 	rcu_read_unlock();
-
-	trace_intel_engine_notify(engine, wait);
 }
 
 static void vlv_c0_read(struct drm_i915_private *dev_priv,
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 43da14f08dc0..eab313c3163c 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -752,31 +752,6 @@ trace_i915_request_out(struct i915_request *rq)
 #endif
 #endif
 
-TRACE_EVENT(intel_engine_notify,
-	    TP_PROTO(struct intel_engine_cs *engine, bool waiters),
-	    TP_ARGS(engine, waiters),
-
-	    TP_STRUCT__entry(
-			     __field(u32, dev)
-			     __field(u16, class)
-			     __field(u16, instance)
-			     __field(u32, seqno)
-			     __field(bool, waiters)
-			     ),
-
-	    TP_fast_assign(
-			   __entry->dev = engine->i915->drm.primary->index;
-			   __entry->class = engine->uabi_class;
-			   __entry->instance = engine->instance;
-			   __entry->seqno = intel_engine_get_seqno(engine);
-			   __entry->waiters = waiters;
-			   ),
-
-	    TP_printk("dev=%u, engine=%u:%u, seqno=%u, waiters=%u",
-		      __entry->dev, __entry->class, __entry->instance,
-		      __entry->seqno, __entry->waiters)
-);
-
 DEFINE_EVENT(i915_request, i915_request_retire,
 	    TP_PROTO(struct i915_request *rq),
 	    TP_ARGS(rq)
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 20/28] drm/i915: Replace global breadcrumbs with per-context interrupt tracking
  2019-01-28  1:02 [PATCH 01/28] drm/i915: Wait for a moment before forcibly resetting the device Chris Wilson
                   ` (17 preceding siblings ...)
  2019-01-28  1:02 ` [PATCH 19/28] drm/i915: Remove the intel_engine_notify tracepoint Chris Wilson
@ 2019-01-28  1:02 ` Chris Wilson
  2019-01-28 16:24   ` Tvrtko Ursulin
  2019-01-28  1:02 ` [PATCH 21/28] drm/i915: Drop fake breadcrumb irq Chris Wilson
                   ` (12 subsequent siblings)
  31 siblings, 1 reply; 39+ messages in thread
From: Chris Wilson @ 2019-01-28  1:02 UTC (permalink / raw)
  To: intel-gfx

A few years ago (see commit 688e6c725816, "drm/i915: Slaughter the
thundering i915_wait_request herd"), the issue of handling multiple
clients waiting in parallel was brought to our attention. The
requirement was that every client should be woken immediately upon its
request being signaled, without incurring any cpu overhead.

Handling certain fragility of our hw meant that we could not do a
simple check inside the irq handler (some generations required almost
unbounded delays before we could be sure of seqno coherency) and so
request completion checking required delegation.

Before commit 688e6c725816, the solution was simple. Every client
waiting on a request would be woken on every interrupt and each would do
a heavyweight check to see if their request was complete. Commit
688e6c725816 introduced an rbtree so that only the earliest waiter on
the global timeline would be woken, and it would in turn wake the next,
and so on.
(Along with various complications to handle requests being reordered
along the global timeline, and also a requirement for kthread to provide
a delegate for fence signaling that had no process context.)

The global rbtree depends on knowing the execution timeline (and global
seqno). Without knowing that order, we must instead check all contexts
queued to the HW to see which may have advanced. We trim that list by
only checking queued contexts that are being waited on, but still we
keep a list of all active contexts and their active signalers that we
inspect from inside the irq handler. By moving the waiters onto the fence
signal list, we can combine the client wakeup with the dma_fence
signaling (a dramatic reduction in complexity, but it does require the HW
to be coherent: the seqno must be visible from the cpu before the
interrupt is raised, so we keep a timer as a backup just in case).

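The shape of the replacement irq-driven signaler is roughly as follows.
This is a simplified sketch: intel_engine_breadcrumbs_irq() and the
per-context ce->signals list appear in the hunks below, while the
b->signalers bookkeeping, request reference counting and irq (dis)arming
are elided or paraphrased from the full intel_breadcrumbs.c rewrite:

	void intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine)
	{
		struct intel_breadcrumbs *b = &engine->breadcrumbs;
		struct intel_context *ce;
		struct i915_request *rq, *rn;
		LIST_HEAD(signal);

		spin_lock(&b->irq_lock);

		/* only contexts with listening waiters are tracked here */
		list_for_each_entry(ce, &b->signalers, signal_link) {
			list_for_each_entry_safe(rq, rn, &ce->signals, signal_link) {
				if (!i915_request_completed(rq))
					break; /* ce->signals is in submission order */

				list_move_tail(&rq->signal_link, &signal);
			}
		}

		spin_unlock(&b->irq_lock);

		/* signal the fences outside the lock; this wakes any waiters */
		list_for_each_entry_safe(rq, rn, &signal, signal_link)
			dma_fence_signal(&rq->fence);
	}
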
Having previously fixed all the issues with irq-seqno serialisation (by
inserting delays onto the GPU after each request instead of random delays
on the CPU after each interrupt), we can rely on the seqno state to
perform direct wakeups from the interrupt handler. This allows us to
preserve our single context switch behaviour of the current routine,
with the only downside that we lose the RT priority sorting of wakeups.
In general, direct wakeup latency of multiple clients is about the same
(about 10% better in most cases) with a reduction in total CPU time spent
in the waiter (about 20-50% depending on gen). Average herd behaviour is
improved, but at the cost of not delegating wakeups on task_prio.

v2: Capture fence signaling state for error state and add comments to
warm even the coldest of hearts.
v3: Check if the request is still active before busywaiting
v4: Reduce the amount of pointer misdirection with list_for_each_safe
and using a local i915_request variable inside the loops
v5: Add a missing pluralisation to a purely informative selftest message.

References: 688e6c725816 ("drm/i915: Slaughter the thundering i915_wait_request herd")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c           |  28 +-
 drivers/gpu/drm/i915/i915_gem_context.c       |   3 +
 drivers/gpu/drm/i915/i915_gem_context.h       |   2 +
 drivers/gpu/drm/i915/i915_gpu_error.c         |  83 +-
 drivers/gpu/drm/i915/i915_gpu_error.h         |   9 +-
 drivers/gpu/drm/i915/i915_irq.c               |  88 +-
 drivers/gpu/drm/i915/i915_request.c           | 142 ++-
 drivers/gpu/drm/i915/i915_request.h           |  72 +-
 drivers/gpu/drm/i915/i915_reset.c             |  16 +-
 drivers/gpu/drm/i915/i915_scheduler.c         |   2 +-
 drivers/gpu/drm/i915/intel_breadcrumbs.c      | 818 +++++-------------
 drivers/gpu/drm/i915/intel_engine_cs.c        |  35 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c       |   2 +-
 drivers/gpu/drm/i915/intel_ringbuffer.h       |  94 +-
 .../drm/i915/selftests/i915_mock_selftests.h  |   1 -
 drivers/gpu/drm/i915/selftests/i915_request.c | 425 +++++++++
 drivers/gpu/drm/i915/selftests/igt_spinner.c  |   5 -
 .../drm/i915/selftests/intel_breadcrumbs.c    | 470 ----------
 .../gpu/drm/i915/selftests/intel_hangcheck.c  |   2 +-
 drivers/gpu/drm/i915/selftests/lib_sw_fence.c |  54 ++
 drivers/gpu/drm/i915/selftests/lib_sw_fence.h |   3 +
 drivers/gpu/drm/i915/selftests/mock_engine.c  |  17 +-
 drivers/gpu/drm/i915/selftests/mock_engine.h  |   6 -
 23 files changed, 894 insertions(+), 1483 deletions(-)
 delete mode 100644 drivers/gpu/drm/i915/selftests/intel_breadcrumbs.c

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index f5ac03f06e26..b1ac0f78cb42 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1316,29 +1316,16 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
 	seq_printf(m, "GT active? %s\n", yesno(dev_priv->gt.awake));
 
 	for_each_engine(engine, dev_priv, id) {
-		struct intel_breadcrumbs *b = &engine->breadcrumbs;
-		struct rb_node *rb;
-
 		seq_printf(m, "%s:\n", engine->name);
 		seq_printf(m, "\tseqno = %x [current %x, last %x], %dms ago\n",
 			   engine->hangcheck.seqno, seqno[id],
 			   intel_engine_last_submit(engine),
 			   jiffies_to_msecs(jiffies -
 					    engine->hangcheck.action_timestamp));
-		seq_printf(m, "\twaiters? %s, fake irq active? %s\n",
-			   yesno(intel_engine_has_waiter(engine)),
+		seq_printf(m, "\tfake irq active? %s\n",
 			   yesno(test_bit(engine->id,
 					  &dev_priv->gpu_error.missed_irq_rings)));
 
-		spin_lock_irq(&b->rb_lock);
-		for (rb = rb_first(&b->waiters); rb; rb = rb_next(rb)) {
-			struct intel_wait *w = rb_entry(rb, typeof(*w), node);
-
-			seq_printf(m, "\t%s [%d] waiting for %x\n",
-				   w->tsk->comm, w->tsk->pid, w->seqno);
-		}
-		spin_unlock_irq(&b->rb_lock);
-
 		seq_printf(m, "\tACTHD = 0x%08llx [current 0x%08llx]\n",
 			   (long long)engine->hangcheck.acthd,
 			   (long long)acthd[id]);
@@ -2022,18 +2009,6 @@ static int i915_swizzle_info(struct seq_file *m, void *data)
 	return 0;
 }
 
-static int count_irq_waiters(struct drm_i915_private *i915)
-{
-	struct intel_engine_cs *engine;
-	enum intel_engine_id id;
-	int count = 0;
-
-	for_each_engine(engine, i915, id)
-		count += intel_engine_has_waiter(engine);
-
-	return count;
-}
-
 static const char *rps_power_to_str(unsigned int power)
 {
 	static const char * const strings[] = {
@@ -2073,7 +2048,6 @@ static int i915_rps_boost_info(struct seq_file *m, void *data)
 	seq_printf(m, "RPS enabled? %d\n", rps->enabled);
 	seq_printf(m, "GPU busy? %s [%d requests]\n",
 		   yesno(dev_priv->gt.awake), dev_priv->gt.active_requests);
-	seq_printf(m, "CPU waiting? %d\n", count_irq_waiters(dev_priv));
 	seq_printf(m, "Boosts outstanding? %d\n",
 		   atomic_read(&rps->num_waiters));
 	seq_printf(m, "Interactive? %d\n", READ_ONCE(rps->power.interactive));
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 93e84751370f..6faf1f6faab5 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -327,6 +327,9 @@ intel_context_init(struct intel_context *ce,
 		   struct intel_engine_cs *engine)
 {
 	ce->gem_context = ctx;
+
+	INIT_LIST_HEAD(&ce->signal_link);
+	INIT_LIST_HEAD(&ce->signals);
 }
 
 static struct i915_gem_context *
diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
index 3769438228f6..6ba40ff6b91f 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/i915_gem_context.h
@@ -164,6 +164,8 @@ struct i915_gem_context {
 	struct intel_context {
 		struct i915_gem_context *gem_context;
 		struct intel_engine_cs *active;
+		struct list_head signal_link;
+		struct list_head signals;
 		struct i915_vma *state;
 		struct intel_ring *ring;
 		u32 *lrc_reg_state;
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 898e06014295..304a7ef7f7fb 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -447,9 +447,14 @@ static void error_print_request(struct drm_i915_error_state_buf *m,
 	if (!erq->seqno)
 		return;
 
-	err_printf(m, "%s pid %d, ban score %d, seqno %8x:%08x, prio %d, emitted %dms, start %08x, head %08x, tail %08x\n",
+	err_printf(m, "%s pid %d, ban score %d, seqno %8x:%08x%s%s, prio %d, emitted %dms, start %08x, head %08x, tail %08x\n",
 		   prefix, erq->pid, erq->ban_score,
-		   erq->context, erq->seqno, erq->sched_attr.priority,
+		   erq->context, erq->seqno,
+		   test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
+			    &erq->flags) ? "!" : "",
+		   test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
+			    &erq->flags) ? "+" : "",
+		   erq->sched_attr.priority,
 		   jiffies_to_msecs(erq->jiffies - epoch),
 		   erq->start, erq->head, erq->tail);
 }
@@ -530,7 +535,6 @@ static void error_print_engine(struct drm_i915_error_state_buf *m,
 	}
 	err_printf(m, "  seqno: 0x%08x\n", ee->seqno);
 	err_printf(m, "  last_seqno: 0x%08x\n", ee->last_seqno);
-	err_printf(m, "  waiting: %s\n", yesno(ee->waiting));
 	err_printf(m, "  ring->head: 0x%08x\n", ee->cpu_ring_head);
 	err_printf(m, "  ring->tail: 0x%08x\n", ee->cpu_ring_tail);
 	err_printf(m, "  hangcheck timestamp: %dms (%lu%s)\n",
@@ -804,21 +808,6 @@ static void __err_print_to_sgl(struct drm_i915_error_state_buf *m,
 						    error->epoch);
 		}
 
-		if (IS_ERR(ee->waiters)) {
-			err_printf(m, "%s --- ? waiters [unable to acquire spinlock]\n",
-				   m->i915->engine[i]->name);
-		} else if (ee->num_waiters) {
-			err_printf(m, "%s --- %d waiters\n",
-				   m->i915->engine[i]->name,
-				   ee->num_waiters);
-			for (j = 0; j < ee->num_waiters; j++) {
-				err_printf(m, " seqno 0x%08x for %s [%d]\n",
-					   ee->waiters[j].seqno,
-					   ee->waiters[j].comm,
-					   ee->waiters[j].pid);
-			}
-		}
-
 		print_error_obj(m, m->i915->engine[i],
 				"ringbuffer", ee->ringbuffer);
 
@@ -1000,8 +989,6 @@ void __i915_gpu_state_free(struct kref *error_ref)
 		i915_error_object_free(ee->wa_ctx);
 
 		kfree(ee->requests);
-		if (!IS_ERR_OR_NULL(ee->waiters))
-			kfree(ee->waiters);
 	}
 
 	for (i = 0; i < ARRAY_SIZE(error->active_bo); i++)
@@ -1205,59 +1192,6 @@ static void gen6_record_semaphore_state(struct intel_engine_cs *engine,
 			I915_READ(RING_SYNC_2(engine->mmio_base));
 }
 
-static void error_record_engine_waiters(struct intel_engine_cs *engine,
-					struct drm_i915_error_engine *ee)
-{
-	struct intel_breadcrumbs *b = &engine->breadcrumbs;
-	struct drm_i915_error_waiter *waiter;
-	struct rb_node *rb;
-	int count;
-
-	ee->num_waiters = 0;
-	ee->waiters = NULL;
-
-	if (RB_EMPTY_ROOT(&b->waiters))
-		return;
-
-	if (!spin_trylock_irq(&b->rb_lock)) {
-		ee->waiters = ERR_PTR(-EDEADLK);
-		return;
-	}
-
-	count = 0;
-	for (rb = rb_first(&b->waiters); rb != NULL; rb = rb_next(rb))
-		count++;
-	spin_unlock_irq(&b->rb_lock);
-
-	waiter = NULL;
-	if (count)
-		waiter = kmalloc_array(count,
-				       sizeof(struct drm_i915_error_waiter),
-				       GFP_ATOMIC);
-	if (!waiter)
-		return;
-
-	if (!spin_trylock_irq(&b->rb_lock)) {
-		kfree(waiter);
-		ee->waiters = ERR_PTR(-EDEADLK);
-		return;
-	}
-
-	ee->waiters = waiter;
-	for (rb = rb_first(&b->waiters); rb; rb = rb_next(rb)) {
-		struct intel_wait *w = rb_entry(rb, typeof(*w), node);
-
-		strcpy(waiter->comm, w->tsk->comm);
-		waiter->pid = w->tsk->pid;
-		waiter->seqno = w->seqno;
-		waiter++;
-
-		if (++ee->num_waiters == count)
-			break;
-	}
-	spin_unlock_irq(&b->rb_lock);
-}
-
 static void error_record_engine_registers(struct i915_gpu_state *error,
 					  struct intel_engine_cs *engine,
 					  struct drm_i915_error_engine *ee)
@@ -1293,7 +1227,6 @@ static void error_record_engine_registers(struct i915_gpu_state *error,
 
 	intel_engine_get_instdone(engine, &ee->instdone);
 
-	ee->waiting = intel_engine_has_waiter(engine);
 	ee->instpm = I915_READ(RING_INSTPM(engine->mmio_base));
 	ee->acthd = intel_engine_get_active_head(engine);
 	ee->seqno = intel_engine_get_seqno(engine);
@@ -1367,6 +1300,7 @@ static void record_request(struct i915_request *request,
 {
 	struct i915_gem_context *ctx = request->gem_context;
 
+	erq->flags = request->fence.flags;
 	erq->context = ctx->hw_id;
 	erq->sched_attr = request->sched.attr;
 	erq->ban_score = atomic_read(&ctx->ban_score);
@@ -1542,7 +1476,6 @@ static void gem_record_rings(struct i915_gpu_state *error)
 		ee->engine_id = i;
 
 		error_record_engine_registers(error, engine, ee);
-		error_record_engine_waiters(engine, ee);
 		error_record_engine_execlists(engine, ee);
 
 		request = i915_gem_find_active_request(engine);
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.h b/drivers/gpu/drm/i915/i915_gpu_error.h
index 231173786eae..74757c424aab 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.h
+++ b/drivers/gpu/drm/i915/i915_gpu_error.h
@@ -82,8 +82,6 @@ struct i915_gpu_state {
 		int engine_id;
 		/* Software tracked state */
 		bool idle;
-		bool waiting;
-		int num_waiters;
 		unsigned long hangcheck_timestamp;
 		struct i915_address_space *vm;
 		int num_requests;
@@ -147,6 +145,7 @@ struct i915_gpu_state {
 		struct drm_i915_error_object *default_state;
 
 		struct drm_i915_error_request {
+			unsigned long flags;
 			long jiffies;
 			pid_t pid;
 			u32 context;
@@ -159,12 +158,6 @@ struct i915_gpu_state {
 		} *requests, execlist[EXECLIST_MAX_PORTS];
 		unsigned int num_ports;
 
-		struct drm_i915_error_waiter {
-			char comm[TASK_COMM_LEN];
-			pid_t pid;
-			u32 seqno;
-		} *waiters;
-
 		struct {
 			u32 gfx_mode;
 			union {
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 6a1b167ec801..a9911fea3ddb 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -28,12 +28,14 @@
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
-#include <linux/sysrq.h>
-#include <linux/slab.h>
 #include <linux/circ_buf.h>
+#include <linux/slab.h>
+#include <linux/sysrq.h>
+
 #include <drm/drm_irq.h>
 #include <drm/drm_drv.h>
 #include <drm/i915_drm.h>
+
 #include "i915_drv.h"
 #include "i915_trace.h"
 #include "intel_drv.h"
@@ -1168,66 +1170,6 @@ static void ironlake_rps_change_irq_handler(struct drm_i915_private *dev_priv)
 	return;
 }
 
-static void notify_ring(struct intel_engine_cs *engine)
-{
-	const u32 seqno = intel_engine_get_seqno(engine);
-	struct i915_request *rq = NULL;
-	struct task_struct *tsk = NULL;
-	struct intel_wait *wait;
-
-	if (unlikely(!engine->breadcrumbs.irq_armed))
-		return;
-
-	rcu_read_lock();
-
-	spin_lock(&engine->breadcrumbs.irq_lock);
-	wait = engine->breadcrumbs.irq_wait;
-	if (wait) {
-		/*
-		 * We use a callback from the dma-fence to submit
-		 * requests after waiting on our own requests. To
-		 * ensure minimum delay in queuing the next request to
-		 * hardware, signal the fence now rather than wait for
-		 * the signaler to be woken up. We still wake up the
-		 * waiter in order to handle the irq-seqno coherency
-		 * issues (we may receive the interrupt before the
-		 * seqno is written, see __i915_request_irq_complete())
-		 * and to handle coalescing of multiple seqno updates
-		 * and many waiters.
-		 */
-		if (i915_seqno_passed(seqno, wait->seqno)) {
-			struct i915_request *waiter = wait->request;
-
-			if (waiter &&
-			    !i915_request_signaled(waiter) &&
-			    intel_wait_check_request(wait, waiter))
-				rq = i915_request_get(waiter);
-
-			tsk = wait->tsk;
-		}
-
-		engine->breadcrumbs.irq_count++;
-	} else {
-		if (engine->breadcrumbs.irq_armed)
-			__intel_engine_disarm_breadcrumbs(engine);
-	}
-	spin_unlock(&engine->breadcrumbs.irq_lock);
-
-	if (rq) {
-		spin_lock(&rq->lock);
-		dma_fence_signal_locked(&rq->fence);
-		GEM_BUG_ON(!i915_request_completed(rq));
-		spin_unlock(&rq->lock);
-
-		i915_request_put(rq);
-	}
-
-	if (tsk && tsk->state & TASK_NORMAL)
-		wake_up_process(tsk);
-
-	rcu_read_unlock();
-}
-
 static void vlv_c0_read(struct drm_i915_private *dev_priv,
 			struct intel_rps_ei *ei)
 {
@@ -1472,20 +1414,20 @@ static void ilk_gt_irq_handler(struct drm_i915_private *dev_priv,
 			       u32 gt_iir)
 {
 	if (gt_iir & GT_RENDER_USER_INTERRUPT)
-		notify_ring(dev_priv->engine[RCS]);
+		intel_engine_breadcrumbs_irq(dev_priv->engine[RCS]);
 	if (gt_iir & ILK_BSD_USER_INTERRUPT)
-		notify_ring(dev_priv->engine[VCS]);
+		intel_engine_breadcrumbs_irq(dev_priv->engine[VCS]);
 }
 
 static void snb_gt_irq_handler(struct drm_i915_private *dev_priv,
 			       u32 gt_iir)
 {
 	if (gt_iir & GT_RENDER_USER_INTERRUPT)
-		notify_ring(dev_priv->engine[RCS]);
+		intel_engine_breadcrumbs_irq(dev_priv->engine[RCS]);
 	if (gt_iir & GT_BSD_USER_INTERRUPT)
-		notify_ring(dev_priv->engine[VCS]);
+		intel_engine_breadcrumbs_irq(dev_priv->engine[VCS]);
 	if (gt_iir & GT_BLT_USER_INTERRUPT)
-		notify_ring(dev_priv->engine[BCS]);
+		intel_engine_breadcrumbs_irq(dev_priv->engine[BCS]);
 
 	if (gt_iir & (GT_BLT_CS_ERROR_INTERRUPT |
 		      GT_BSD_CS_ERROR_INTERRUPT |
@@ -1505,7 +1447,7 @@ gen8_cs_irq_handler(struct intel_engine_cs *engine, u32 iir)
 		tasklet = true;
 
 	if (iir & GT_RENDER_USER_INTERRUPT) {
-		notify_ring(engine);
+		intel_engine_breadcrumbs_irq(engine);
 		tasklet |= USES_GUC_SUBMISSION(engine->i915);
 	}
 
@@ -1851,7 +1793,7 @@ static void gen6_rps_irq_handler(struct drm_i915_private *dev_priv, u32 pm_iir)
 
 	if (HAS_VEBOX(dev_priv)) {
 		if (pm_iir & PM_VEBOX_USER_INTERRUPT)
-			notify_ring(dev_priv->engine[VECS]);
+			intel_engine_breadcrumbs_irq(dev_priv->engine[VECS]);
 
 		if (pm_iir & PM_VEBOX_CS_ERROR_INTERRUPT)
 			DRM_DEBUG("Command parser error, pm_iir 0x%08x\n", pm_iir);
@@ -4275,7 +4217,7 @@ static irqreturn_t i8xx_irq_handler(int irq, void *arg)
 		I915_WRITE16(IIR, iir);
 
 		if (iir & I915_USER_INTERRUPT)
-			notify_ring(dev_priv->engine[RCS]);
+			intel_engine_breadcrumbs_irq(dev_priv->engine[RCS]);
 
 		if (iir & I915_MASTER_ERROR_INTERRUPT)
 			i8xx_error_irq_handler(dev_priv, eir, eir_stuck);
@@ -4383,7 +4325,7 @@ static irqreturn_t i915_irq_handler(int irq, void *arg)
 		I915_WRITE(IIR, iir);
 
 		if (iir & I915_USER_INTERRUPT)
-			notify_ring(dev_priv->engine[RCS]);
+			intel_engine_breadcrumbs_irq(dev_priv->engine[RCS]);
 
 		if (iir & I915_MASTER_ERROR_INTERRUPT)
 			i9xx_error_irq_handler(dev_priv, eir, eir_stuck);
@@ -4528,10 +4470,10 @@ static irqreturn_t i965_irq_handler(int irq, void *arg)
 		I915_WRITE(IIR, iir);
 
 		if (iir & I915_USER_INTERRUPT)
-			notify_ring(dev_priv->engine[RCS]);
+			intel_engine_breadcrumbs_irq(dev_priv->engine[RCS]);
 
 		if (iir & I915_BSD_USER_INTERRUPT)
-			notify_ring(dev_priv->engine[VCS]);
+			intel_engine_breadcrumbs_irq(dev_priv->engine[VCS]);
 
 		if (iir & I915_MASTER_ERROR_INTERRUPT)
 			i9xx_error_irq_handler(dev_priv, eir, eir_stuck);
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 2171df2d3019..4b1869295362 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -60,7 +60,7 @@ static bool i915_fence_signaled(struct dma_fence *fence)
 
 static bool i915_fence_enable_signaling(struct dma_fence *fence)
 {
-	return intel_engine_enable_signaling(to_request(fence), true);
+	return i915_request_enable_breadcrumb(to_request(fence));
 }
 
 static signed long i915_fence_wait(struct dma_fence *fence,
@@ -203,7 +203,7 @@ static void __retire_engine_request(struct intel_engine_cs *engine,
 	if (!i915_request_signaled(rq))
 		dma_fence_signal_locked(&rq->fence);
 	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &rq->fence.flags))
-		intel_engine_cancel_signaling(rq);
+		i915_request_cancel_breadcrumb(rq);
 	if (rq->waitboost) {
 		GEM_BUG_ON(!atomic_read(&rq->i915->gt_pm.rps.num_waiters));
 		atomic_dec(&rq->i915->gt_pm.rps.num_waiters);
@@ -377,9 +377,12 @@ void __i915_request_submit(struct i915_request *request)
 
 	/* We may be recursing from the signal callback of another i915 fence */
 	spin_lock_nested(&request->lock, SINGLE_DEPTH_NESTING);
+	GEM_BUG_ON(test_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags));
+	set_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags);
 	request->global_seqno = seqno;
-	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &request->fence.flags))
-		intel_engine_enable_signaling(request, false);
+	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &request->fence.flags) &&
+	    !i915_request_enable_breadcrumb(request))
+		intel_engine_queue_breadcrumbs(engine);
 	spin_unlock(&request->lock);
 
 	engine->emit_fini_breadcrumb(request,
@@ -389,8 +392,6 @@ void __i915_request_submit(struct i915_request *request)
 	move_to_timeline(request, &engine->timeline);
 
 	trace_i915_request_execute(request);
-
-	wake_up_all(&request->execute);
 }
 
 void i915_request_submit(struct i915_request *request)
@@ -433,7 +434,9 @@ void __i915_request_unsubmit(struct i915_request *request)
 	spin_lock_nested(&request->lock, SINGLE_DEPTH_NESTING);
 	request->global_seqno = 0;
 	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &request->fence.flags))
-		intel_engine_cancel_signaling(request);
+		i915_request_cancel_breadcrumb(request);
+	GEM_BUG_ON(!test_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags));
+	clear_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags);
 	spin_unlock(&request->lock);
 
 	/* Transfer back from the global per-engine timeline to per-context */
@@ -646,13 +649,11 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 
 	/* We bump the ref for the fence chain */
 	i915_sw_fence_init(&i915_request_get(rq)->submit, submit_notify);
-	init_waitqueue_head(&rq->execute);
 
 	i915_sched_node_init(&rq->sched);
 
 	/* No zalloc, must clear what we need by hand */
 	rq->global_seqno = 0;
-	rq->signaling.wait.seqno = 0;
 	rq->file_priv = NULL;
 	rq->batch = NULL;
 	rq->capture_list = NULL;
@@ -1047,13 +1048,10 @@ static bool busywait_stop(unsigned long timeout, unsigned int cpu)
 	return this_cpu != cpu;
 }
 
-static bool __i915_spin_request(const struct i915_request *rq,
-				u32 seqno, int state, unsigned long timeout_us)
+static bool __i915_spin_request(const struct i915_request * const rq,
+				int state, unsigned long timeout_us)
 {
-	struct intel_engine_cs *engine = rq->engine;
-	unsigned int irq, cpu;
-
-	GEM_BUG_ON(!seqno);
+	unsigned int cpu;
 
 	/*
 	 * Only wait for the request if we know it is likely to complete.
@@ -1061,12 +1059,12 @@ static bool __i915_spin_request(const struct i915_request *rq,
 	 * We don't track the timestamps around requests, nor the average
 	 * request length, so we do not have a good indicator that this
 	 * request will complete within the timeout. What we do know is the
-	 * order in which requests are executed by the engine and so we can
-	 * tell if the request has started. If the request hasn't started yet,
-	 * it is a fair assumption that it will not complete within our
-	 * relatively short timeout.
+	 * order in which requests are executed by the context and so we can
+	 * tell if the request has been started. If the request is not even
+	 * running yet, it is a fair assumption that it will not complete
+	 * within our relatively short timeout.
 	 */
-	if (!intel_engine_has_started(engine, seqno))
+	if (!i915_request_is_running(rq))
 		return false;
 
 	/*
@@ -1080,20 +1078,10 @@ static bool __i915_spin_request(const struct i915_request *rq,
 	 * takes to sleep on a request, on the order of a microsecond.
 	 */
 
-	irq = READ_ONCE(engine->breadcrumbs.irq_count);
 	timeout_us += local_clock_us(&cpu);
 	do {
-		if (intel_engine_has_completed(engine, seqno))
-			return seqno == i915_request_global_seqno(rq);
-
-		/*
-		 * Seqno are meant to be ordered *before* the interrupt. If
-		 * we see an interrupt without a corresponding seqno advance,
-		 * assume we won't see one in the near future but require
-		 * the engine->seqno_barrier() to fixup coherency.
-		 */
-		if (READ_ONCE(engine->breadcrumbs.irq_count) != irq)
-			break;
+		if (i915_request_completed(rq))
+			return true;
 
 		if (signal_pending_state(state, current))
 			break;
@@ -1107,6 +1095,18 @@ static bool __i915_spin_request(const struct i915_request *rq,
 	return false;
 }
 
+struct request_wait {
+	struct dma_fence_cb cb;
+	struct task_struct *tsk;
+};
+
+static void request_wait_wake(struct dma_fence *fence, struct dma_fence_cb *cb)
+{
+	struct request_wait *wait = container_of(cb, typeof(*wait), cb);
+
+	wake_up_process(wait->tsk);
+}
+
 /**
  * i915_request_wait - wait until execution of request has finished
  * @rq: the request to wait upon
@@ -1132,8 +1132,7 @@ long i915_request_wait(struct i915_request *rq,
 {
 	const int state = flags & I915_WAIT_INTERRUPTIBLE ?
 		TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
-	DEFINE_WAIT_FUNC(exec, default_wake_function);
-	struct intel_wait wait;
+	struct request_wait wait;
 
 	might_sleep();
 	GEM_BUG_ON(timeout < 0);
@@ -1145,47 +1144,24 @@ long i915_request_wait(struct i915_request *rq,
 		return -ETIME;
 
 	trace_i915_request_wait_begin(rq, flags);
-	add_wait_queue(&rq->execute, &exec);
-	intel_wait_init(&wait);
-	if (flags & I915_WAIT_PRIORITY)
-		i915_schedule_bump_priority(rq, I915_PRIORITY_WAIT);
-
-restart:
-	do {
-		set_current_state(state);
-		if (intel_wait_update_request(&wait, rq))
-			break;
-
-		if (signal_pending_state(state, current)) {
-			timeout = -ERESTARTSYS;
-			goto complete;
-		}
 
-		if (!timeout) {
-			timeout = -ETIME;
-			goto complete;
-		}
+	/* Optimistic short spin before touching IRQs */
+	if (__i915_spin_request(rq, state, 5))
+		goto out;
 
-		timeout = io_schedule_timeout(timeout);
-	} while (1);
+	if (flags & I915_WAIT_PRIORITY)
+		i915_schedule_bump_priority(rq, I915_PRIORITY_WAIT);
 
-	GEM_BUG_ON(!intel_wait_has_seqno(&wait));
-	GEM_BUG_ON(!i915_sw_fence_signaled(&rq->submit));
+	wait.tsk = current;
+	if (dma_fence_add_callback(&rq->fence, &wait.cb, request_wait_wake))
+		goto out;
 
-	/* Optimistic short spin before touching IRQs */
-	if (__i915_spin_request(rq, wait.seqno, state, 5))
-		goto complete;
+	for (;;) {
+		set_current_state(state);
 
-	set_current_state(state);
-	if (intel_engine_add_wait(rq->engine, &wait))
-		/*
-		 * In order to check that we haven't missed the interrupt
-		 * as we enabled it, we need to kick ourselves to do a
-		 * coherent check on the seqno before we sleep.
-		 */
-		goto wakeup;
+		if (i915_request_completed(rq))
+			break;
 
-	for (;;) {
 		if (signal_pending_state(state, current)) {
 			timeout = -ERESTARTSYS;
 			break;
@@ -1197,33 +1173,13 @@ long i915_request_wait(struct i915_request *rq,
 		}
 
 		timeout = io_schedule_timeout(timeout);
-
-		if (intel_wait_complete(&wait) &&
-		    intel_wait_check_request(&wait, rq))
-			break;
-
-		set_current_state(state);
-
-wakeup:
-		if (i915_request_completed(rq))
-			break;
-
-		/* Only spin if we know the GPU is processing this request */
-		if (__i915_spin_request(rq, wait.seqno, state, 2))
-			break;
-
-		if (!intel_wait_check_request(&wait, rq)) {
-			intel_engine_remove_wait(rq->engine, &wait);
-			goto restart;
-		}
 	}
-
-	intel_engine_remove_wait(rq->engine, &wait);
-complete:
 	__set_current_state(TASK_RUNNING);
-	remove_wait_queue(&rq->execute, &exec);
-	trace_i915_request_wait_end(rq);
 
+	dma_fence_remove_callback(&rq->fence, &wait.cb);
+
+out:
+	trace_i915_request_wait_end(rq);
 	return timeout;
 }
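
For readers skimming the hunk, the rewritten i915_request_wait() above boils
down to a standard dma-fence idiom: an optimistic busy-spin, a callback that
wakes the sleeping task, then a set_current_state()/io_schedule_timeout()
loop. A condensed sketch of that shape, reusing the request_wait and
request_wait_wake helpers defined in this hunk (illustrative only, not a
drop-in replacement):

	static long wait_sketch(struct i915_request *rq, int state, long timeout)
	{
		struct request_wait wait = { .tsk = current };

		/* Fails (and skips the sleep) if the fence already signaled. */
		if (dma_fence_add_callback(&rq->fence, &wait.cb,
					   request_wait_wake))
			return timeout;

		for (;;) {
			set_current_state(state); /* publish before checking */
			if (i915_request_completed(rq))
				break;
			if (signal_pending_state(state, current)) {
				timeout = -ERESTARTSYS;
				break;
			}
			if (!timeout) {
				timeout = -ETIME;
				break;
			}
			timeout = io_schedule_timeout(timeout);
		}
		__set_current_state(TASK_RUNNING);

		dma_fence_remove_callback(&rq->fence, &wait.cb);
		return timeout;
	}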
 
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index 340d6216791c..3cffb96203b9 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -38,23 +38,34 @@ struct drm_i915_gem_object;
 struct i915_request;
 struct i915_timeline;
 
-struct intel_wait {
-	struct rb_node node;
-	struct task_struct *tsk;
-	struct i915_request *request;
-	u32 seqno;
-};
-
-struct intel_signal_node {
-	struct intel_wait wait;
-	struct list_head link;
-};
-
 struct i915_capture_list {
 	struct i915_capture_list *next;
 	struct i915_vma *vma;
 };
 
+enum {
+	/*
+	 * I915_FENCE_FLAG_ACTIVE - this request is currently submitted to HW.
+	 *
+	 * Set by __i915_request_submit() on handing over to HW, and cleared
+	 * by __i915_request_unsubmit() if we preempt this request.
+	 *
+	 * Finally cleared for consistency on retiring the request, when
+	 * we know the HW is no longer running this request.
+	 *
+	 * See i915_request_is_active()
+	 */
+	I915_FENCE_FLAG_ACTIVE = DMA_FENCE_FLAG_USER_BITS,
+
+	/*
+	 * I915_FENCE_FLAG_SIGNAL - this request is currently on a signal list
+	 *
+	 * Internal bookkeeping used by the breadcrumb code to track when
+	 * a request is on one of the per-context signal lists.
+	 */
+	I915_FENCE_FLAG_SIGNAL,
+};
+
 /**
  * Request queue structure.
  *
@@ -97,7 +108,7 @@ struct i915_request {
 	struct intel_context *hw_context;
 	struct intel_ring *ring;
 	struct i915_timeline *timeline;
-	struct intel_signal_node signaling;
+	struct list_head signal_link;
 
 	/*
 	 * The rcu epoch of when this request was allocated. Used to judiciously
@@ -116,7 +127,6 @@ struct i915_request {
 	 */
 	struct i915_sw_fence submit;
 	wait_queue_entry_t submitq;
-	wait_queue_head_t execute;
 
 	/*
 	 * A list of everyone we wait upon, and everyone who waits upon us.
@@ -255,7 +265,7 @@ i915_request_put(struct i915_request *rq)
  * that it has passed the global seqno and the global seqno is unchanged
  * after the read, it is indeed complete).
  */
-static u32
+static inline u32
 i915_request_global_seqno(const struct i915_request *request)
 {
 	return READ_ONCE(request->global_seqno);
@@ -277,6 +287,10 @@ void i915_request_skip(struct i915_request *request, int error);
 void __i915_request_unsubmit(struct i915_request *request);
 void i915_request_unsubmit(struct i915_request *request);
 
+/* Note: part of the intel_breadcrumbs family */
+bool i915_request_enable_breadcrumb(struct i915_request *request);
+void i915_request_cancel_breadcrumb(struct i915_request *request);
+
 long i915_request_wait(struct i915_request *rq,
 		       unsigned int flags,
 		       long timeout)
@@ -293,6 +307,11 @@ static inline bool i915_request_signaled(const struct i915_request *rq)
 	return test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &rq->fence.flags);
 }
 
+static inline bool i915_request_is_active(const struct i915_request *rq)
+{
+	return test_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
+}
+
 /**
  * Returns true if seq1 is later than seq2.
  */
@@ -330,6 +349,11 @@ static inline u32 hwsp_seqno(const struct i915_request *rq)
 	return seqno;
 }
 
+static inline bool __i915_request_has_started(const struct i915_request *rq)
+{
+	return i915_seqno_passed(hwsp_seqno(rq), rq->fence.seqno - 1);
+}
+
 /**
  * i915_request_started - check if the request has begun being executed
  * @rq: the request
@@ -345,7 +369,23 @@ static inline bool i915_request_started(const struct i915_request *rq)
 		return true;
 
 	/* Remember: started but may have since been preempted! */
-	return i915_seqno_passed(hwsp_seqno(rq), rq->fence.seqno - 1);
+	return __i915_request_has_started(rq);
+}
+
+/**
+ * i915_request_is_running - check if the request may actually be executing
+ * @rq: the request
+ *
+ * Returns true if the request is currently submitted to hardware and has
+ * passed its start point (i.e. the context is set up and not busywaiting).
+ * Note that it may no longer be running by the time the function returns!
+ */
+static inline bool i915_request_is_running(const struct i915_request *rq)
+{
+	if (!i915_request_is_active(rq))
+		return false;
+
+	return __i915_request_has_started(rq);
 }
 
 static inline bool i915_request_completed(const struct i915_request *rq)
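
Both __request_completed() and __i915_request_has_started() lean on
i915_seqno_passed(), which this header implements as the wraparound-safe
comparison (s32)(seq1 - seq2) >= 0; "started" is detected by testing the
breadcrumb one before the request's own seqno (rq->fence.seqno - 1). A tiny
standalone demonstration of that arithmetic (plain userspace C, for
illustration only):

	#include <stdbool.h>
	#include <stdint.h>
	#include <stdio.h>

	static bool seqno_passed(uint32_t seq1, uint32_t seq2)
	{
		return (int32_t)(seq1 - seq2) >= 0; /* wraparound-safe */
	}

	int main(void)
	{
		uint32_t hwsp = 0x80000000; /* last breadcrumb written by HW */
		uint32_t rq = 0x80000001;   /* this request's own seqno */

		printf("started:   %d\n", seqno_passed(hwsp, rq - 1)); /* 1 */
		printf("completed: %d\n", seqno_passed(hwsp, rq));     /* 0 */
		printf("wrapped:   %d\n", seqno_passed(1, 0xfffffff0)); /* 1 */
		return 0;
	}
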
diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
index acf3c777e49d..4462007a681c 100644
--- a/drivers/gpu/drm/i915/i915_reset.c
+++ b/drivers/gpu/drm/i915/i915_reset.c
@@ -29,7 +29,7 @@ static void engine_skip_context(struct i915_request *rq)
 
 	spin_lock(&timeline->lock);
 
-	if (rq->global_seqno) {
+	if (i915_request_is_active(rq)) {
 		list_for_each_entry_continue(rq,
 					     &engine->timeline.requests, link)
 			if (rq->gem_context == hung_ctx)
@@ -751,18 +751,20 @@ static void reset_restart(struct drm_i915_private *i915)
 
 static void nop_submit_request(struct i915_request *request)
 {
+	struct intel_engine_cs *engine = request->engine;
 	unsigned long flags;
 
 	GEM_TRACE("%s fence %llx:%lld -> -EIO\n",
-		  request->engine->name,
-		  request->fence.context, request->fence.seqno);
+		  engine->name, request->fence.context, request->fence.seqno);
 	dma_fence_set_error(&request->fence, -EIO);
 
-	spin_lock_irqsave(&request->engine->timeline.lock, flags);
+	spin_lock_irqsave(&engine->timeline.lock, flags);
 	__i915_request_submit(request);
 	i915_request_mark_complete(request);
-	intel_engine_write_global_seqno(request->engine, request->global_seqno);
-	spin_unlock_irqrestore(&request->engine->timeline.lock, flags);
+	intel_engine_write_global_seqno(engine, request->global_seqno);
+	spin_unlock_irqrestore(&engine->timeline.lock, flags);
+
+	intel_engine_queue_breadcrumbs(engine);
 }
 
 void i915_gem_set_wedged(struct drm_i915_private *i915)
@@ -817,7 +819,7 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
 
 	for_each_engine(engine, i915, id) {
 		reset_finish_engine(engine);
-		intel_engine_wakeup(engine);
+		intel_engine_signal_breadcrumbs(engine);
 	}
 
 	smp_mb__before_atomic();
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 7db1255665a8..8ae68d2fc134 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -243,7 +243,7 @@ static bool inflight(const struct i915_request *rq,
 {
 	const struct i915_request *active;
 
-	if (!rq->global_seqno)
+	if (!i915_request_is_active(rq))
 		return false;
 
 	active = port_request(engine->execlists.port);
diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
index b58915b8708b..b0795b0ad227 100644
--- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
@@ -29,48 +29,149 @@
 
 #define task_asleep(tsk) ((tsk)->state & TASK_NORMAL && !(tsk)->on_rq)
 
-static unsigned int __intel_breadcrumbs_wakeup(struct intel_breadcrumbs *b)
+static void irq_enable(struct intel_engine_cs *engine)
+{
+	if (!engine->irq_enable)
+		return;
+
+	/* Caller disables interrupts */
+	spin_lock(&engine->i915->irq_lock);
+	engine->irq_enable(engine);
+	spin_unlock(&engine->i915->irq_lock);
+}
+
+static void irq_disable(struct intel_engine_cs *engine)
 {
-	struct intel_wait *wait;
-	unsigned int result = 0;
+	if (!engine->irq_disable)
+		return;
+
+	/* Caller disables interrupts */
+	spin_lock(&engine->i915->irq_lock);
+	engine->irq_disable(engine);
+	spin_unlock(&engine->i915->irq_lock);
+}
 
+static void __intel_breadcrumbs_disarm_irq(struct intel_breadcrumbs *b)
+{
 	lockdep_assert_held(&b->irq_lock);
 
-	wait = b->irq_wait;
-	if (wait) {
+	GEM_BUG_ON(!b->irq_enabled);
+	if (!--b->irq_enabled)
+		irq_disable(container_of(b,
+					 struct intel_engine_cs,
+					 breadcrumbs));
+
+	b->irq_armed = false;
+}
+
+void intel_engine_disarm_breadcrumbs(struct intel_engine_cs *engine)
+{
+	struct intel_breadcrumbs *b = &engine->breadcrumbs;
+
+	if (!b->irq_armed)
+		return;
+
+	spin_lock_irq(&b->irq_lock);
+	if (b->irq_armed)
+		__intel_breadcrumbs_disarm_irq(b);
+	spin_unlock_irq(&b->irq_lock);
+}
+
+static inline bool __request_completed(const struct i915_request *rq)
+{
+	return i915_seqno_passed(__hwsp_seqno(rq), rq->fence.seqno);
+}
+
+bool intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine)
+{
+	struct intel_breadcrumbs *b = &engine->breadcrumbs;
+	struct intel_context *ce, *cn;
+	struct list_head *pos, *next;
+	LIST_HEAD(signal);
+
+	spin_lock(&b->irq_lock);
+
+	b->irq_fired = true;
+	if (b->irq_armed && list_empty(&b->signalers))
+		__intel_breadcrumbs_disarm_irq(b);
+
+	list_for_each_entry_safe(ce, cn, &b->signalers, signal_link) {
+		GEM_BUG_ON(list_empty(&ce->signals));
+
+		list_for_each_safe(pos, next, &ce->signals) {
+			struct i915_request *rq =
+				list_entry(pos, typeof(*rq), signal_link);
+
+			if (!__request_completed(rq))
+				break;
+
+			GEM_BUG_ON(!test_bit(I915_FENCE_FLAG_SIGNAL,
+					     &rq->fence.flags));
+			clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);
+
+			/*
+			 * We may race with direct invocation of
+			 * dma_fence_signal(), e.g. i915_request_retire(),
+			 * in which case we can skip processing it ourselves.
+			 */
+			if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
+				     &rq->fence.flags))
+				continue;
+
+			/*
+			 * Queue for execution after dropping the signaling
+			 * spinlock as the callback chain may end up adding
+			 * more signalers to the same context or engine.
+			 */
+			i915_request_get(rq);
+			list_add_tail(&rq->signal_link, &signal);
+		}
+
 		/*
-		 * N.B. Since task_asleep() and ttwu are not atomic, the
-		 * waiter may actually go to sleep after the check, causing
-		 * us to suppress a valid wakeup. We prefer to reduce the
-		 * number of false positive missed_breadcrumb() warnings
-		 * at the expense of a few false negatives, as it it easy
-		 * to trigger a false positive under heavy load. Enough
-		 * signal should remain from genuine missed_breadcrumb()
-		 * for us to detect in CI.
+		 * We process the list deletion in bulk: above we used list_add
+		 * (not list_move), and instead track whether rq->signal_link
+		 * is live via the I915_FENCE_FLAG_SIGNAL bit.
 		 */
-		bool was_asleep = task_asleep(wait->tsk);
+		if (!list_is_first(pos, &ce->signals)) {
+			/* Advance the list to the first incomplete request */
+			__list_del_many(&ce->signals, pos);
+			if (&ce->signals == pos) /* now empty */
+				list_del_init(&ce->signal_link);
+		}
+	}
+
+	spin_unlock(&b->irq_lock);
+
+	list_for_each_safe(pos, next, &signal) {
+		struct i915_request *rq =
+			list_entry(pos, typeof(*rq), signal_link);
 
-		result = ENGINE_WAKEUP_WAITER;
-		if (wake_up_process(wait->tsk) && was_asleep)
-			result |= ENGINE_WAKEUP_ASLEEP;
+		dma_fence_signal(&rq->fence);
+		i915_request_put(rq);
 	}
 
-	return result;
+	return !list_empty(&signal);
 }
 
-unsigned int intel_engine_wakeup(struct intel_engine_cs *engine)
+bool intel_engine_signal_breadcrumbs(struct intel_engine_cs *engine)
 {
-	struct intel_breadcrumbs *b = &engine->breadcrumbs;
-	unsigned long flags;
-	unsigned int result;
+	bool result;
 
-	spin_lock_irqsave(&b->irq_lock, flags);
-	result = __intel_breadcrumbs_wakeup(b);
-	spin_unlock_irqrestore(&b->irq_lock, flags);
+	local_irq_disable();
+	result = intel_engine_breadcrumbs_irq(engine);
+	local_irq_enable();
 
 	return result;
 }
 
+static void signal_irq_work(struct irq_work *work)
+{
+	struct intel_engine_cs *engine =
+		container_of(work, typeof(*engine), breadcrumbs.irq_work);
+
+	intel_engine_breadcrumbs_irq(engine);
+}
+
 static unsigned long wait_timeout(void)
 {
 	return round_jiffies_up(jiffies + DRM_I915_HANGCHECK_JIFFIES);
@@ -94,19 +195,15 @@ static void intel_breadcrumbs_hangcheck(struct timer_list *t)
 	struct intel_engine_cs *engine =
 		from_timer(engine, t, breadcrumbs.hangcheck);
 	struct intel_breadcrumbs *b = &engine->breadcrumbs;
-	unsigned int irq_count;
 
 	if (!b->irq_armed)
 		return;
 
-	irq_count = READ_ONCE(b->irq_count);
-	if (b->hangcheck_interrupts != irq_count) {
-		b->hangcheck_interrupts = irq_count;
-		mod_timer(&b->hangcheck, wait_timeout());
-		return;
-	}
+	if (b->irq_fired)
+		goto rearm;
 
-	/* We keep the hangcheck timer alive until we disarm the irq, even
+	/*
+	 * We keep the hangcheck timer alive until we disarm the irq, even
 	 * if there are no waiters at present.
 	 *
 	 * If the waiter was currently running, assume it hasn't had a chance
@@ -118,10 +215,13 @@ static void intel_breadcrumbs_hangcheck(struct timer_list *t)
 	 * but we still have a waiter. Assuming all batches complete within
 	 * DRM_I915_HANGCHECK_JIFFIES [1.5s]!
 	 */
-	if (intel_engine_wakeup(engine) & ENGINE_WAKEUP_ASLEEP) {
+	synchronize_hardirq(engine->i915->drm.irq);
+	if (intel_engine_signal_breadcrumbs(engine)) {
 		missed_breadcrumb(engine);
 		mod_timer(&b->fake_irq, jiffies + 1);
 	} else {
+rearm:
+		b->irq_fired = false;
 		mod_timer(&b->hangcheck, wait_timeout());
 	}
 }
@@ -140,11 +240,7 @@ static void intel_breadcrumbs_fake_irq(struct timer_list *t)
 	 * oldest waiter to do the coherent seqno check.
 	 */
 
-	spin_lock_irq(&b->irq_lock);
-	if (b->irq_armed && !__intel_breadcrumbs_wakeup(b))
-		__intel_engine_disarm_breadcrumbs(engine);
-	spin_unlock_irq(&b->irq_lock);
-	if (!b->irq_armed)
+	if (!intel_engine_signal_breadcrumbs(engine) && !b->irq_armed)
 		return;
 
 	/* If the user has disabled the fake-irq, restore the hangchecking */
@@ -156,43 +252,6 @@ static void intel_breadcrumbs_fake_irq(struct timer_list *t)
 	mod_timer(&b->fake_irq, jiffies + 1);
 }
 
-static void irq_enable(struct intel_engine_cs *engine)
-{
-	if (!engine->irq_enable)
-		return;
-
-	/* Caller disables interrupts */
-	spin_lock(&engine->i915->irq_lock);
-	engine->irq_enable(engine);
-	spin_unlock(&engine->i915->irq_lock);
-}
-
-static void irq_disable(struct intel_engine_cs *engine)
-{
-	if (!engine->irq_disable)
-		return;
-
-	/* Caller disables interrupts */
-	spin_lock(&engine->i915->irq_lock);
-	engine->irq_disable(engine);
-	spin_unlock(&engine->i915->irq_lock);
-}
-
-void __intel_engine_disarm_breadcrumbs(struct intel_engine_cs *engine)
-{
-	struct intel_breadcrumbs *b = &engine->breadcrumbs;
-
-	lockdep_assert_held(&b->irq_lock);
-	GEM_BUG_ON(b->irq_wait);
-	GEM_BUG_ON(!b->irq_armed);
-
-	GEM_BUG_ON(!b->irq_enabled);
-	if (!--b->irq_enabled)
-		irq_disable(engine);
-
-	b->irq_armed = false;
-}
-
 void intel_engine_pin_breadcrumbs_irq(struct intel_engine_cs *engine)
 {
 	struct intel_breadcrumbs *b = &engine->breadcrumbs;
@@ -215,40 +274,6 @@ void intel_engine_unpin_breadcrumbs_irq(struct intel_engine_cs *engine)
 	spin_unlock_irq(&b->irq_lock);
 }
 
-void intel_engine_disarm_breadcrumbs(struct intel_engine_cs *engine)
-{
-	struct intel_breadcrumbs *b = &engine->breadcrumbs;
-	struct intel_wait *wait, *n;
-
-	if (!b->irq_armed)
-		return;
-
-	/*
-	 * We only disarm the irq when we are idle (all requests completed),
-	 * so if the bottom-half remains asleep, it missed the request
-	 * completion.
-	 */
-	if (intel_engine_wakeup(engine) & ENGINE_WAKEUP_ASLEEP)
-		missed_breadcrumb(engine);
-
-	spin_lock_irq(&b->rb_lock);
-
-	spin_lock(&b->irq_lock);
-	b->irq_wait = NULL;
-	if (b->irq_armed)
-		__intel_engine_disarm_breadcrumbs(engine);
-	spin_unlock(&b->irq_lock);
-
-	rbtree_postorder_for_each_entry_safe(wait, n, &b->waiters, node) {
-		GEM_BUG_ON(!intel_engine_signaled(engine, wait->seqno));
-		RB_CLEAR_NODE(&wait->node);
-		wake_up_process(wait->tsk);
-	}
-	b->waiters = RB_ROOT;
-
-	spin_unlock_irq(&b->rb_lock);
-}
-
 static bool use_fake_irq(const struct intel_breadcrumbs *b)
 {
 	const struct intel_engine_cs *engine =
@@ -264,7 +289,7 @@ static bool use_fake_irq(const struct intel_breadcrumbs *b)
 	 * engine->seqno_barrier(), a timing error that should be transient
 	 * and unlikely to reoccur.
 	 */
-	return READ_ONCE(b->irq_count) == b->hangcheck_interrupts;
+	return !b->irq_fired;
 }
 
 static void enable_fake_irq(struct intel_breadcrumbs *b)
@@ -276,7 +301,7 @@ static void enable_fake_irq(struct intel_breadcrumbs *b)
 		mod_timer(&b->hangcheck, wait_timeout());
 }
 
-static bool __intel_breadcrumbs_enable_irq(struct intel_breadcrumbs *b)
+static bool __intel_breadcrumbs_arm_irq(struct intel_breadcrumbs *b)
 {
 	struct intel_engine_cs *engine =
 		container_of(b, struct intel_engine_cs, breadcrumbs);
@@ -315,536 +340,149 @@ static bool __intel_breadcrumbs_enable_irq(struct intel_breadcrumbs *b)
 	return enabled;
 }
 
-static inline struct intel_wait *to_wait(struct rb_node *node)
+void intel_engine_init_breadcrumbs(struct intel_engine_cs *engine)
 {
-	return rb_entry(node, struct intel_wait, node);
-}
+	struct intel_breadcrumbs *b = &engine->breadcrumbs;
 
-static inline void __intel_breadcrumbs_finish(struct intel_breadcrumbs *b,
-					      struct intel_wait *wait)
-{
-	lockdep_assert_held(&b->rb_lock);
-	GEM_BUG_ON(b->irq_wait == wait);
+	spin_lock_init(&b->irq_lock);
+	INIT_LIST_HEAD(&b->signalers);
 
-	/*
-	 * This request is completed, so remove it from the tree, mark it as
-	 * complete, and *then* wake up the associated task. N.B. when the
-	 * task wakes up, it will find the empty rb_node, discern that it
-	 * has already been removed from the tree and skip the serialisation
-	 * of the b->rb_lock and b->irq_lock. This means that the destruction
-	 * of the intel_wait is not serialised with the interrupt handler
-	 * by the waiter - it must instead be serialised by the caller.
-	 */
-	rb_erase(&wait->node, &b->waiters);
-	RB_CLEAR_NODE(&wait->node);
+	init_irq_work(&b->irq_work, signal_irq_work);
 
-	if (wait->tsk->state != TASK_RUNNING)
-		wake_up_process(wait->tsk); /* implicit smp_wmb() */
+	timer_setup(&b->fake_irq, intel_breadcrumbs_fake_irq, 0);
+	timer_setup(&b->hangcheck, intel_breadcrumbs_hangcheck, 0);
 }
 
-static inline void __intel_breadcrumbs_next(struct intel_engine_cs *engine,
-					    struct rb_node *next)
+static void cancel_fake_irq(struct intel_engine_cs *engine)
 {
 	struct intel_breadcrumbs *b = &engine->breadcrumbs;
 
-	spin_lock(&b->irq_lock);
-	GEM_BUG_ON(!b->irq_armed);
-	GEM_BUG_ON(!b->irq_wait);
-	b->irq_wait = to_wait(next);
-	spin_unlock(&b->irq_lock);
-
-	/* We always wake up the next waiter that takes over as the bottom-half
-	 * as we may delegate not only the irq-seqno barrier to the next waiter
-	 * but also the task of waking up concurrent waiters.
-	 */
-	if (next)
-		wake_up_process(to_wait(next)->tsk);
+	del_timer_sync(&b->fake_irq); /* may queue b->hangcheck */
+	del_timer_sync(&b->hangcheck);
+	clear_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings);
 }
 
-static bool __intel_engine_add_wait(struct intel_engine_cs *engine,
-				    struct intel_wait *wait)
+void intel_engine_reset_breadcrumbs(struct intel_engine_cs *engine)
 {
 	struct intel_breadcrumbs *b = &engine->breadcrumbs;
-	struct rb_node **p, *parent, *completed;
-	bool first, armed;
-	u32 seqno;
+	unsigned long flags;
 
-	GEM_BUG_ON(!wait->seqno);
+	spin_lock_irqsave(&b->irq_lock, flags);
 
-	/* Insert the request into the retirement ordered list
-	 * of waiters by walking the rbtree. If we are the oldest
-	 * seqno in the tree (the first to be retired), then
-	 * set ourselves as the bottom-half.
-	 *
-	 * As we descend the tree, prune completed branches since we hold the
-	 * spinlock we know that the first_waiter must be delayed and can
-	 * reduce some of the sequential wake up latency if we take action
-	 * ourselves and wake up the completed tasks in parallel. Also, by
-	 * removing stale elements in the tree, we may be able to reduce the
-	 * ping-pong between the old bottom-half and ourselves as first-waiter.
+	/*
+	 * Leave the fake_irq timer enabled (if it is running), but clear the
+	 * bit so that it turns itself off on its next wake up and goes back
+	 * to the long hangcheck interval if still required.
 	 */
-	armed = false;
-	first = true;
-	parent = NULL;
-	completed = NULL;
-	seqno = intel_engine_get_seqno(engine);
-
-	 /* If the request completed before we managed to grab the spinlock,
-	  * return now before adding ourselves to the rbtree. We let the
-	  * current bottom-half handle any pending wakeups and instead
-	  * try and get out of the way quickly.
-	  */
-	if (i915_seqno_passed(seqno, wait->seqno)) {
-		RB_CLEAR_NODE(&wait->node);
-		return first;
-	}
-
-	p = &b->waiters.rb_node;
-	while (*p) {
-		parent = *p;
-		if (wait->seqno == to_wait(parent)->seqno) {
-			/* We have multiple waiters on the same seqno, select
-			 * the highest priority task (that with the smallest
-			 * task->prio) to serve as the bottom-half for this
-			 * group.
-			 */
-			if (wait->tsk->prio > to_wait(parent)->tsk->prio) {
-				p = &parent->rb_right;
-				first = false;
-			} else {
-				p = &parent->rb_left;
-			}
-		} else if (i915_seqno_passed(wait->seqno,
-					     to_wait(parent)->seqno)) {
-			p = &parent->rb_right;
-			if (i915_seqno_passed(seqno, to_wait(parent)->seqno))
-				completed = parent;
-			else
-				first = false;
-		} else {
-			p = &parent->rb_left;
-		}
-	}
-	rb_link_node(&wait->node, parent, p);
-	rb_insert_color(&wait->node, &b->waiters);
-
-	if (first) {
-		spin_lock(&b->irq_lock);
-		b->irq_wait = wait;
-		/* After assigning ourselves as the new bottom-half, we must
-		 * perform a cursory check to prevent a missed interrupt.
-		 * Either we miss the interrupt whilst programming the hardware,
-		 * or if there was a previous waiter (for a later seqno) they
-		 * may be woken instead of us (due to the inherent race
-		 * in the unlocked read of b->irq_seqno_bh in the irq handler)
-		 * and so we miss the wake up.
-		 */
-		armed = __intel_breadcrumbs_enable_irq(b);
-		spin_unlock(&b->irq_lock);
-	}
-
-	if (completed) {
-		/* Advance the bottom-half (b->irq_wait) before we wake up
-		 * the waiters who may scribble over their intel_wait
-		 * just as the interrupt handler is dereferencing it via
-		 * b->irq_wait.
-		 */
-		if (!first) {
-			struct rb_node *next = rb_next(completed);
-			GEM_BUG_ON(next == &wait->node);
-			__intel_breadcrumbs_next(engine, next);
-		}
-
-		do {
-			struct intel_wait *crumb = to_wait(completed);
-			completed = rb_prev(completed);
-			__intel_breadcrumbs_finish(b, crumb);
-		} while (completed);
-	}
-
-	GEM_BUG_ON(!b->irq_wait);
-	GEM_BUG_ON(!b->irq_armed);
-	GEM_BUG_ON(rb_first(&b->waiters) != &b->irq_wait->node);
-
-	return armed;
-}
-
-bool intel_engine_add_wait(struct intel_engine_cs *engine,
-			   struct intel_wait *wait)
-{
-	struct intel_breadcrumbs *b = &engine->breadcrumbs;
-	bool armed;
-
-	spin_lock_irq(&b->rb_lock);
-	armed = __intel_engine_add_wait(engine, wait);
-	spin_unlock_irq(&b->rb_lock);
-	if (armed)
-		return armed;
-
-	/* Make the caller recheck if its request has already started. */
-	return intel_engine_has_started(engine, wait->seqno);
-}
-
-static inline bool chain_wakeup(struct rb_node *rb, int priority)
-{
-	return rb && to_wait(rb)->tsk->prio <= priority;
-}
+	clear_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings);
 
-static inline int wakeup_priority(struct intel_breadcrumbs *b,
-				  struct task_struct *tsk)
-{
-	if (tsk == b->signaler)
-		return INT_MIN;
+	if (b->irq_enabled)
+		irq_enable(engine);
 	else
-		return tsk->prio;
-}
-
-static void __intel_engine_remove_wait(struct intel_engine_cs *engine,
-				       struct intel_wait *wait)
-{
-	struct intel_breadcrumbs *b = &engine->breadcrumbs;
-
-	lockdep_assert_held(&b->rb_lock);
-
-	if (RB_EMPTY_NODE(&wait->node))
-		goto out;
-
-	if (b->irq_wait == wait) {
-		const int priority = wakeup_priority(b, wait->tsk);
-		struct rb_node *next;
-
-		/* We are the current bottom-half. Find the next candidate,
-		 * the first waiter in the queue on the remaining oldest
-		 * request. As multiple seqnos may complete in the time it
-		 * takes us to wake up and find the next waiter, we have to
-		 * wake up that waiter for it to perform its own coherent
-		 * completion check.
-		 */
-		next = rb_next(&wait->node);
-		if (chain_wakeup(next, priority)) {
-			/* If the next waiter is already complete,
-			 * wake it up and continue onto the next waiter. So
-			 * if have a small herd, they will wake up in parallel
-			 * rather than sequentially, which should reduce
-			 * the overall latency in waking all the completed
-			 * clients.
-			 *
-			 * However, waking up a chain adds extra latency to
-			 * the first_waiter. This is undesirable if that
-			 * waiter is a high priority task.
-			 */
-			u32 seqno = intel_engine_get_seqno(engine);
-
-			while (i915_seqno_passed(seqno, to_wait(next)->seqno)) {
-				struct rb_node *n = rb_next(next);
-
-				__intel_breadcrumbs_finish(b, to_wait(next));
-				next = n;
-				if (!chain_wakeup(next, priority))
-					break;
-			}
-		}
-
-		__intel_breadcrumbs_next(engine, next);
-	} else {
-		GEM_BUG_ON(rb_first(&b->waiters) == &wait->node);
-	}
-
-	GEM_BUG_ON(RB_EMPTY_NODE(&wait->node));
-	rb_erase(&wait->node, &b->waiters);
-	RB_CLEAR_NODE(&wait->node);
+		irq_disable(engine);
 
-out:
-	GEM_BUG_ON(b->irq_wait == wait);
-	GEM_BUG_ON(rb_first(&b->waiters) !=
-		   (b->irq_wait ? &b->irq_wait->node : NULL));
+	spin_unlock_irqrestore(&b->irq_lock, flags);
 }
 
-void intel_engine_remove_wait(struct intel_engine_cs *engine,
-			      struct intel_wait *wait)
+void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine)
 {
-	struct intel_breadcrumbs *b = &engine->breadcrumbs;
-
-	/* Quick check to see if this waiter was already decoupled from
-	 * the tree by the bottom-half to avoid contention on the spinlock
-	 * by the herd.
-	 */
-	if (RB_EMPTY_NODE(&wait->node)) {
-		GEM_BUG_ON(READ_ONCE(b->irq_wait) == wait);
-		return;
-	}
-
-	spin_lock_irq(&b->rb_lock);
-	__intel_engine_remove_wait(engine, wait);
-	spin_unlock_irq(&b->rb_lock);
+	cancel_fake_irq(engine);
 }
 
-static void signaler_set_rtpriority(void)
+bool i915_request_enable_breadcrumb(struct i915_request *rq)
 {
-	 struct sched_param param = { .sched_priority = 1 };
-
-	 sched_setscheduler_nocheck(current, SCHED_FIFO, &param);
-}
+	struct intel_breadcrumbs *b = &rq->engine->breadcrumbs;
 
-static int intel_breadcrumbs_signaler(void *arg)
-{
-	struct intel_engine_cs *engine = arg;
-	struct intel_breadcrumbs *b = &engine->breadcrumbs;
-	struct i915_request *rq, *n;
+	GEM_BUG_ON(test_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags));
 
-	/* Install ourselves with high priority to reduce signalling latency */
-	signaler_set_rtpriority();
+	if (!test_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags))
+		return true;
 
-	do {
-		bool do_schedule = true;
-		LIST_HEAD(list);
-		u32 seqno;
+	spin_lock(&b->irq_lock);
+	if (test_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags) &&
+	    !__request_completed(rq)) {
+		struct intel_context *ce = rq->hw_context;
+		struct list_head *pos;
 
-		set_current_state(TASK_INTERRUPTIBLE);
-		if (list_empty(&b->signals))
-			goto sleep;
+		__intel_breadcrumbs_arm_irq(b);
 
 		/*
-		 * We are either woken up by the interrupt bottom-half,
-		 * or by a client adding a new signaller. In both cases,
-		 * the GPU seqno may have advanced beyond our oldest signal.
-		 * If it has, propagate the signal, remove the waiter and
-		 * check again with the next oldest signal. Otherwise we
-		 * need to wait for a new interrupt from the GPU or for
-		 * a new client.
+		 * We keep the seqno in retirement order, so we can break
+		 * inside intel_engine_breadcrumbs_irq as soon as we've passed
+		 * the last completed request (or seen a request that hasn't
+		 * even started). We could iterate the timeline->requests list,
+		 * but keeping a separate signalers_list has the advantage of
+		 * hopefully being much smaller than the full list and so
+		 * provides faster iteration and detection when there are no
+		 * more interrupts required for this context.
+		 *
+		 * We typically expect to add new signalers in order, so we
+		 * start looking for our insertion point from the tail of
+		 * the list.
 		 */
-		seqno = intel_engine_get_seqno(engine);
-
-		spin_lock_irq(&b->rb_lock);
-		list_for_each_entry_safe(rq, n, &b->signals, signaling.link) {
-			u32 this = rq->signaling.wait.seqno;
+		list_for_each_prev(pos, &ce->signals) {
+			struct i915_request *it =
+				list_entry(pos, typeof(*it), signal_link);
 
-			GEM_BUG_ON(!rq->signaling.wait.seqno);
-
-			if (!i915_seqno_passed(seqno, this))
+			if (i915_seqno_passed(rq->fence.seqno, it->fence.seqno))
 				break;
-
-			if (likely(this == i915_request_global_seqno(rq))) {
-				__intel_engine_remove_wait(engine,
-							   &rq->signaling.wait);
-
-				rq->signaling.wait.seqno = 0;
-				__list_del_entry(&rq->signaling.link);
-
-				if (!i915_request_signaled(rq)) {
-					list_add_tail(&rq->signaling.link,
-						      &list);
-					i915_request_get(rq);
-				}
-			}
 		}
-		spin_unlock_irq(&b->rb_lock);
-
-		if (!list_empty(&list)) {
-			local_bh_disable();
-			list_for_each_entry_safe(rq, n, &list, signaling.link) {
-				dma_fence_signal(&rq->fence);
-				GEM_BUG_ON(!i915_request_completed(rq));
-				i915_request_put(rq);
-			}
-			local_bh_enable(); /* kick start the tasklets */
-
-			/*
-			 * If the engine is saturated we may be continually
-			 * processing completed requests. This angers the
-			 * NMI watchdog if we never let anything else
-			 * have access to the CPU. Let's pretend to be nice
-			 * and relinquish the CPU if we burn through the
-			 * entire RT timeslice!
-			 */
-			do_schedule = need_resched();
-		}
-
-		if (unlikely(do_schedule)) {
-sleep:
-			if (kthread_should_park())
-				kthread_parkme();
-
-			if (unlikely(kthread_should_stop()))
-				break;
-
-			schedule();
-		}
-	} while (1);
-	__set_current_state(TASK_RUNNING);
-
-	return 0;
-}
+		list_add(&rq->signal_link, pos);
+		if (pos == &ce->signals) /* catch transitions from empty list */
+			list_move_tail(&ce->signal_link, &b->signalers);
 
-static void insert_signal(struct intel_breadcrumbs *b,
-			  struct i915_request *request,
-			  const u32 seqno)
-{
-	struct i915_request *iter;
-
-	lockdep_assert_held(&b->rb_lock);
-
-	/*
-	 * A reasonable assumption is that we are called to add signals
-	 * in sequence, as the requests are submitted for execution and
-	 * assigned a global_seqno. This will be the case for the majority
-	 * of internally generated signals (inter-engine signaling).
-	 *
-	 * Out of order waiters triggering random signaling enabling will
-	 * be more problematic, but hopefully rare enough and the list
-	 * small enough that the O(N) insertion sort is not an issue.
-	 */
-
-	list_for_each_entry_reverse(iter, &b->signals, signaling.link)
-		if (i915_seqno_passed(seqno, iter->signaling.wait.seqno))
-			break;
-
-	list_add(&request->signaling.link, &iter->signaling.link);
-}
-
-bool intel_engine_enable_signaling(struct i915_request *request, bool wakeup)
-{
-	struct intel_engine_cs *engine = request->engine;
-	struct intel_breadcrumbs *b = &engine->breadcrumbs;
-	struct intel_wait *wait = &request->signaling.wait;
-	u32 seqno;
-
-	/*
-	 * Note that we may be called from an interrupt handler on another
-	 * device (e.g. nouveau signaling a fence completion causing us
-	 * to submit a request, and so enable signaling). As such,
-	 * we need to make sure that all other users of b->rb_lock protect
-	 * against interrupts, i.e. use spin_lock_irqsave.
-	 */
-
-	/* locked by dma_fence_enable_sw_signaling() (irqsafe fence->lock) */
-	GEM_BUG_ON(!irqs_disabled());
-	lockdep_assert_held(&request->lock);
-
-	seqno = i915_request_global_seqno(request);
-	if (!seqno) /* will be enabled later upon execution */
-		return true;
-
-	GEM_BUG_ON(wait->seqno);
-	wait->tsk = b->signaler;
-	wait->request = request;
-	wait->seqno = seqno;
-
-	/*
-	 * Add ourselves into the list of waiters, but registering our
-	 * bottom-half as the signaller thread. As per usual, only the oldest
-	 * waiter (not just signaller) is tasked as the bottom-half waking
-	 * up all completed waiters after the user interrupt.
-	 *
-	 * If we are the oldest waiter, enable the irq (after which we
-	 * must double check that the seqno did not complete).
-	 */
-	spin_lock(&b->rb_lock);
-	insert_signal(b, request, seqno);
-	wakeup &= __intel_engine_add_wait(engine, wait);
-	spin_unlock(&b->rb_lock);
-
-	if (wakeup) {
-		wake_up_process(b->signaler);
-		return !intel_wait_complete(wait);
+		set_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);
 	}
+	spin_unlock(&b->irq_lock);
 
-	return true;
+	return !__request_completed(rq);
 }
 
-void intel_engine_cancel_signaling(struct i915_request *request)
+void i915_request_cancel_breadcrumb(struct i915_request *rq)
 {
-	struct intel_engine_cs *engine = request->engine;
-	struct intel_breadcrumbs *b = &engine->breadcrumbs;
+	struct intel_breadcrumbs *b = &rq->engine->breadcrumbs;
 
-	GEM_BUG_ON(!irqs_disabled());
-	lockdep_assert_held(&request->lock);
-
-	if (!READ_ONCE(request->signaling.wait.seqno))
+	if (!test_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags))
 		return;
 
-	spin_lock(&b->rb_lock);
-	__intel_engine_remove_wait(engine, &request->signaling.wait);
-	if (fetch_and_zero(&request->signaling.wait.seqno))
-		__list_del_entry(&request->signaling.link);
-	spin_unlock(&b->rb_lock);
-}
-
-int intel_engine_init_breadcrumbs(struct intel_engine_cs *engine)
-{
-	struct intel_breadcrumbs *b = &engine->breadcrumbs;
-	struct task_struct *tsk;
-
-	spin_lock_init(&b->rb_lock);
-	spin_lock_init(&b->irq_lock);
-
-	timer_setup(&b->fake_irq, intel_breadcrumbs_fake_irq, 0);
-	timer_setup(&b->hangcheck, intel_breadcrumbs_hangcheck, 0);
-
-	INIT_LIST_HEAD(&b->signals);
-
-	/* Spawn a thread to provide a common bottom-half for all signals.
-	 * As this is an asynchronous interface we cannot steal the current
-	 * task for handling the bottom-half to the user interrupt, therefore
-	 * we create a thread to do the coherent seqno dance after the
-	 * interrupt and then signal the waitqueue (via the dma-buf/fence).
-	 */
-	tsk = kthread_run(intel_breadcrumbs_signaler, engine,
-			  "i915/signal:%d", engine->id);
-	if (IS_ERR(tsk))
-		return PTR_ERR(tsk);
-
-	b->signaler = tsk;
-
-	return 0;
-}
+	spin_lock(&b->irq_lock);
+	if (test_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags)) {
+		struct intel_context *ce = rq->hw_context;
 
-static void cancel_fake_irq(struct intel_engine_cs *engine)
-{
-	struct intel_breadcrumbs *b = &engine->breadcrumbs;
+		list_del(&rq->signal_link);
+		if (list_empty(&ce->signals))
+			list_del_init(&ce->signal_link);
 
-	del_timer_sync(&b->fake_irq); /* may queue b->hangcheck */
-	del_timer_sync(&b->hangcheck);
-	clear_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings);
+		clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);
+	}
+	spin_unlock(&b->irq_lock);
 }
 
-void intel_engine_reset_breadcrumbs(struct intel_engine_cs *engine)
+void intel_engine_print_breadcrumbs(struct intel_engine_cs *engine,
+				    struct drm_printer *p)
 {
 	struct intel_breadcrumbs *b = &engine->breadcrumbs;
-	unsigned long flags;
+	struct intel_context *ce;
+	struct i915_request *rq;
 
-	spin_lock_irqsave(&b->irq_lock, flags);
-
-	/*
-	 * Leave the fake_irq timer enabled (if it is running), but clear the
-	 * bit so that it turns itself off on its next wake up and goes back
-	 * to the long hangcheck interval if still required.
-	 */
-	clear_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings);
-
-	if (b->irq_enabled)
-		irq_enable(engine);
-	else
-		irq_disable(engine);
-
-	spin_unlock_irqrestore(&b->irq_lock, flags);
-}
-
-void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine)
-{
-	struct intel_breadcrumbs *b = &engine->breadcrumbs;
+	if (list_empty(&b->signalers))
+		return;
 
-	/* The engines should be idle and all requests accounted for! */
-	WARN_ON(READ_ONCE(b->irq_wait));
-	WARN_ON(!RB_EMPTY_ROOT(&b->waiters));
-	WARN_ON(!list_empty(&b->signals));
+	drm_printf(p, "Signals:\n");
 
-	if (!IS_ERR_OR_NULL(b->signaler))
-		kthread_stop(b->signaler);
+	spin_lock_irq(&b->irq_lock);
+	list_for_each_entry(ce, &b->signalers, signal_link) {
+		list_for_each_entry(rq, &ce->signals, signal_link) {
+			drm_printf(p, "\t[%llx:%llx%s] @ %dms\n",
+				   rq->fence.context, rq->fence.seqno,
+				   i915_request_completed(rq) ? "!" :
+				   i915_request_started(rq) ? "*" :
+				   "",
+				   jiffies_to_msecs(jiffies - rq->emitted_jiffies));
+		}
+	}
+	spin_unlock_irq(&b->irq_lock);
 
-	cancel_fake_irq(engine);
+	if (test_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings))
+		drm_printf(p, "Fake irq active\n");
 }
-
-#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
-#include "selftests/intel_breadcrumbs.c"
-#endif
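
One shape in the new intel_engine_breadcrumbs_irq() above deserves emphasis:
completed requests are referenced and staged onto a local list while holding
b->irq_lock, and dma_fence_signal() only runs after the lock is dropped,
since a fence callback may add new signalers and re-take that same lock. A
minimal sketch of the pattern, reusing the patch's names with the list walk
elided (illustrative, not a drop-in):

	static void flush_completed_sketch(struct intel_breadcrumbs *b)
	{
		struct i915_request *rq, *rn;
		LIST_HEAD(signal); /* private staging list, no lock needed */

		spin_lock(&b->irq_lock);
		/*
		 * Walk b->signalers; for each completed request:
		 *   clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);
		 *   i915_request_get(rq);
		 *   list_add_tail(&rq->signal_link, &signal);
		 */
		spin_unlock(&b->irq_lock);

		list_for_each_entry_safe(rq, rn, &signal, signal_link) {
			dma_fence_signal(&rq->fence); /* may recurse into i915 */
			i915_request_put(rq); /* drop the staging reference */
		}
	}
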
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index d25e510b7b08..ee43c98227bb 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -458,12 +458,6 @@ int intel_engines_init(struct drm_i915_private *dev_priv)
 void intel_engine_write_global_seqno(struct intel_engine_cs *engine, u32 seqno)
 {
 	intel_write_status_page(engine, I915_GEM_HWS_INDEX, seqno);
-
-	/* After manually advancing the seqno, fake the interrupt in case
-	 * there are any waiters for that seqno.
-	 */
-	intel_engine_wakeup(engine);
-
 	GEM_BUG_ON(intel_engine_get_seqno(engine) != seqno);
 }
 
@@ -607,6 +601,7 @@ int intel_engine_setup_common(struct intel_engine_cs *engine)
 
 	i915_timeline_set_subclass(&engine->timeline, TIMELINE_ENGINE);
 
+	intel_engine_init_breadcrumbs(engine);
 	intel_engine_init_execlist(engine);
 	intel_engine_init_hangcheck(engine);
 	intel_engine_init_batch_pool(engine);
@@ -717,20 +712,14 @@ int intel_engine_init_common(struct intel_engine_cs *engine)
 		}
 	}
 
-	ret = intel_engine_init_breadcrumbs(engine);
-	if (ret)
-		goto err_unpin_preempt;
-
 	ret = measure_breadcrumb_dw(engine);
 	if (ret < 0)
-		goto err_breadcrumbs;
+		goto err_unpin_preempt;
 
 	engine->emit_fini_breadcrumb_dw = ret;
 
 	return 0;
 
-err_breadcrumbs:
-	intel_engine_fini_breadcrumbs(engine);
 err_unpin_preempt:
 	if (i915->preempt_context)
 		__intel_context_unpin(i915->preempt_context, engine);
@@ -1294,12 +1283,14 @@ static void print_request(struct drm_printer *m,
 
 	x = print_sched_attr(rq->i915, &rq->sched.attr, buf, x, sizeof(buf));
 
-	drm_printf(m, "%s%x%s [%llx:%llx]%s @ %dms: %s\n",
+	drm_printf(m, "%s%x%s%s [%llx:%llx]%s @ %dms: %s\n",
 		   prefix,
 		   rq->global_seqno,
 		   i915_request_completed(rq) ? "!" :
 		   i915_request_started(rq) ? "*" :
 		   "",
+		   test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
+			    &rq->fence.flags) ?  "+" : "",
 		   rq->fence.context, rq->fence.seqno,
 		   buf,
 		   jiffies_to_msecs(jiffies - rq->emitted_jiffies),
@@ -1492,12 +1483,9 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 		       struct drm_printer *m,
 		       const char *header, ...)
 {
-	struct intel_breadcrumbs * const b = &engine->breadcrumbs;
 	struct i915_gpu_error * const error = &engine->i915->gpu_error;
 	struct i915_request *rq;
 	intel_wakeref_t wakeref;
-	unsigned long flags;
-	struct rb_node *rb;
 
 	if (header) {
 		va_list ap;
@@ -1565,21 +1553,12 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 
 	intel_execlists_show_requests(engine, m, print_request, 8);
 
-	spin_lock_irqsave(&b->rb_lock, flags);
-	for (rb = rb_first(&b->waiters); rb; rb = rb_next(rb)) {
-		struct intel_wait *w = rb_entry(rb, typeof(*w), node);
-
-		drm_printf(m, "\t%s [%d:%c] waiting for %x\n",
-			   w->tsk->comm, w->tsk->pid,
-			   task_state_to_char(w->tsk),
-			   w->seqno);
-	}
-	spin_unlock_irqrestore(&b->rb_lock, flags);
-
 	drm_printf(m, "HWSP:\n");
 	hexdump(m, engine->status_page.addr, PAGE_SIZE);
 
 	drm_printf(m, "Idle? %s\n", yesno(intel_engine_is_idle(engine)));
+
+	intel_engine_print_breadcrumbs(engine, m);
 }
 
 static u8 user_class_map[] = {
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 668ed67336a2..b889b27f8aeb 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -743,7 +743,7 @@ static int init_ring_common(struct intel_engine_cs *engine)
 	}
 
 	/* Papering over lost _interrupts_ immediately following the restart */
-	intel_engine_wakeup(engine);
+	intel_engine_queue_breadcrumbs(engine);
 out:
 	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index ae753c45cbe7..efef2aa1abce 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -5,6 +5,7 @@
 #include <drm/drm_util.h>
 
 #include <linux/hashtable.h>
+#include <linux/irq_work.h>
 #include <linux/seqlock.h>
 
 #include "i915_gem_batch_pool.h"
@@ -381,22 +382,19 @@ struct intel_engine_cs {
 	 * the overhead of waking that client is much preferred.
 	 */
 	struct intel_breadcrumbs {
-		spinlock_t irq_lock; /* protects irq_*; irqsafe */
-		struct intel_wait *irq_wait; /* oldest waiter by retirement */
+		spinlock_t irq_lock;
+		struct list_head signalers;
 
-		spinlock_t rb_lock; /* protects the rb and wraps irq_lock */
-		struct rb_root waiters; /* sorted by retirement, priority */
-		struct list_head signals; /* sorted by retirement */
-		struct task_struct *signaler; /* used for fence signalling */
+		struct irq_work irq_work; /* for use from inside irq_lock */
 
 		struct timer_list fake_irq; /* used after a missed interrupt */
 		struct timer_list hangcheck; /* detect missed interrupts */
 
 		unsigned int hangcheck_interrupts;
 		unsigned int irq_enabled;
-		unsigned int irq_count;
 
-		bool irq_armed : 1;
+		bool irq_armed;
+		bool irq_fired;
 	} breadcrumbs;
 
 	struct {
@@ -885,83 +883,29 @@ static inline bool intel_engine_has_started(struct intel_engine_cs *engine,
 void intel_engine_get_instdone(struct intel_engine_cs *engine,
 			       struct intel_instdone *instdone);
 
-/* intel_breadcrumbs.c -- user interrupt bottom-half for waiters */
-int intel_engine_init_breadcrumbs(struct intel_engine_cs *engine);
-
-static inline void intel_wait_init(struct intel_wait *wait)
-{
-	wait->tsk = current;
-	wait->request = NULL;
-}
-
-static inline void intel_wait_init_for_seqno(struct intel_wait *wait, u32 seqno)
-{
-	wait->tsk = current;
-	wait->seqno = seqno;
-}
-
-static inline bool intel_wait_has_seqno(const struct intel_wait *wait)
-{
-	return wait->seqno;
-}
-
-static inline bool
-intel_wait_update_seqno(struct intel_wait *wait, u32 seqno)
-{
-	wait->seqno = seqno;
-	return intel_wait_has_seqno(wait);
-}
-
-static inline bool
-intel_wait_update_request(struct intel_wait *wait,
-			  const struct i915_request *rq)
-{
-	return intel_wait_update_seqno(wait, i915_request_global_seqno(rq));
-}
-
-static inline bool
-intel_wait_check_seqno(const struct intel_wait *wait, u32 seqno)
-{
-	return wait->seqno == seqno;
-}
-
-static inline bool
-intel_wait_check_request(const struct intel_wait *wait,
-			 const struct i915_request *rq)
-{
-	return intel_wait_check_seqno(wait, i915_request_global_seqno(rq));
-}
+void intel_engine_init_breadcrumbs(struct intel_engine_cs *engine);
+void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine);
 
-static inline bool intel_wait_complete(const struct intel_wait *wait)
-{
-	return RB_EMPTY_NODE(&wait->node);
-}
+void intel_engine_pin_breadcrumbs_irq(struct intel_engine_cs *engine);
+void intel_engine_unpin_breadcrumbs_irq(struct intel_engine_cs *engine);
 
-bool intel_engine_add_wait(struct intel_engine_cs *engine,
-			   struct intel_wait *wait);
-void intel_engine_remove_wait(struct intel_engine_cs *engine,
-			      struct intel_wait *wait);
-bool intel_engine_enable_signaling(struct i915_request *request, bool wakeup);
-void intel_engine_cancel_signaling(struct i915_request *request);
+bool intel_engine_signal_breadcrumbs(struct intel_engine_cs *engine);
+void intel_engine_disarm_breadcrumbs(struct intel_engine_cs *engine);
 
-static inline bool intel_engine_has_waiter(const struct intel_engine_cs *engine)
+static inline void
+intel_engine_queue_breadcrumbs(struct intel_engine_cs *engine)
 {
-	return READ_ONCE(engine->breadcrumbs.irq_wait);
+	irq_work_queue(&engine->breadcrumbs.irq_work);
 }
 
-unsigned int intel_engine_wakeup(struct intel_engine_cs *engine);
-#define ENGINE_WAKEUP_WAITER BIT(0)
-#define ENGINE_WAKEUP_ASLEEP BIT(1)
-
-void intel_engine_pin_breadcrumbs_irq(struct intel_engine_cs *engine);
-void intel_engine_unpin_breadcrumbs_irq(struct intel_engine_cs *engine);
-
-void __intel_engine_disarm_breadcrumbs(struct intel_engine_cs *engine);
-void intel_engine_disarm_breadcrumbs(struct intel_engine_cs *engine);
+bool intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine);
 
 void intel_engine_reset_breadcrumbs(struct intel_engine_cs *engine);
 void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine);
 
+void intel_engine_print_breadcrumbs(struct intel_engine_cs *engine,
+				    struct drm_printer *p);
+
 static inline u32 *gen8_emit_pipe_control(u32 *batch, u32 flags, u32 offset)
 {
 	memset(batch, 0, 6 * sizeof(u32));
diff --git a/drivers/gpu/drm/i915/selftests/i915_mock_selftests.h b/drivers/gpu/drm/i915/selftests/i915_mock_selftests.h
index 4a83a1c6c406..88e5ab586337 100644
--- a/drivers/gpu/drm/i915/selftests/i915_mock_selftests.h
+++ b/drivers/gpu/drm/i915/selftests/i915_mock_selftests.h
@@ -15,7 +15,6 @@ selftest(scatterlist, scatterlist_mock_selftests)
 selftest(syncmap, i915_syncmap_mock_selftests)
 selftest(uncore, intel_uncore_mock_selftests)
 selftest(engine, intel_engine_cs_mock_selftests)
-selftest(breadcrumbs, intel_breadcrumbs_mock_selftests)
 selftest(timelines, i915_timeline_mock_selftests)
 selftest(requests, i915_request_mock_selftests)
 selftest(objects, i915_gem_object_mock_selftests)
diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c
index 4d4b86b5fa11..6733dc5b6b4c 100644
--- a/drivers/gpu/drm/i915/selftests/i915_request.c
+++ b/drivers/gpu/drm/i915/selftests/i915_request.c
@@ -25,9 +25,12 @@
 #include <linux/prime_numbers.h>
 
 #include "../i915_selftest.h"
+#include "i915_random.h"
 #include "igt_live_test.h"
+#include "lib_sw_fence.h"
 
 #include "mock_context.h"
+#include "mock_drm.h"
 #include "mock_gem_device.h"
 
 static int igt_add_request(void *arg)
@@ -247,6 +250,254 @@ static int igt_request_rewind(void *arg)
 	return err;
 }
 
+struct smoketest {
+	struct intel_engine_cs *engine;
+	struct i915_gem_context **contexts;
+	atomic_long_t num_waits, num_fences;
+	int ncontexts, max_batch;
+	struct i915_request *(*request_alloc)(struct i915_gem_context *,
+					      struct intel_engine_cs *);
+};
+
+static struct i915_request *
+__mock_request_alloc(struct i915_gem_context *ctx,
+		     struct intel_engine_cs *engine)
+{
+	return mock_request(engine, ctx, 0);
+}
+
+static struct i915_request *
+__live_request_alloc(struct i915_gem_context *ctx,
+		     struct intel_engine_cs *engine)
+{
+	return i915_request_alloc(engine, ctx);
+}
+
+static int __igt_breadcrumbs_smoketest(void *arg)
+{
+	struct smoketest *t = arg;
+	struct mutex * const BKL = &t->engine->i915->drm.struct_mutex;
+	const unsigned int max_batch = min(t->ncontexts, t->max_batch) - 1;
+	const unsigned int total = 4 * t->ncontexts + 1;
+	unsigned int num_waits = 0, num_fences = 0;
+	struct i915_request **requests;
+	I915_RND_STATE(prng);
+	unsigned int *order;
+	int err = 0;
+
+	/*
+	 * A very simple test to catch the most egregious of list handling bugs.
+	 *
+	 * At its heart, we simply create oodles of requests running across
+	 * multiple kthreads and enable signaling on them, for the sole purpose
+	 * of stressing our breadcrumb handling. The only inspection we do is
+	 * that the fences were marked as signaled.
+	 */
+
+	requests = kmalloc_array(total, sizeof(*requests), GFP_KERNEL);
+	if (!requests)
+		return -ENOMEM;
+
+	order = i915_random_order(total, &prng);
+	if (!order) {
+		err = -ENOMEM;
+		goto out_requests;
+	}
+
+	while (!kthread_should_stop()) {
+		struct i915_sw_fence *submit, *wait;
+		unsigned int n, count;
+
+		submit = heap_fence_create(GFP_KERNEL);
+		if (!submit) {
+			err = -ENOMEM;
+			break;
+		}
+
+		wait = heap_fence_create(GFP_KERNEL);
+		if (!wait) {
+			i915_sw_fence_commit(submit);
+			heap_fence_put(submit);
+			err = -ENOMEM;
+			break;
+		}
+
+		i915_random_reorder(order, total, &prng);
+		count = 1 + i915_prandom_u32_max_state(max_batch, &prng);
+
+		for (n = 0; n < count; n++) {
+			struct i915_gem_context *ctx =
+				t->contexts[order[n] % t->ncontexts];
+			struct i915_request *rq;
+
+			mutex_lock(BKL);
+
+			rq = t->request_alloc(ctx, t->engine);
+			if (IS_ERR(rq)) {
+				mutex_unlock(BKL);
+				err = PTR_ERR(rq);
+				count = n;
+				break;
+			}
+
+			err = i915_sw_fence_await_sw_fence_gfp(&rq->submit,
+							       submit,
+							       GFP_KERNEL);
+
+			requests[n] = i915_request_get(rq);
+			i915_request_add(rq);
+
+			mutex_unlock(BKL);
+
+			if (err >= 0)
+				err = i915_sw_fence_await_dma_fence(wait,
+								    &rq->fence,
+								    0,
+								    GFP_KERNEL);
+
+			if (err < 0) {
+				i915_request_put(rq);
+				count = n;
+				break;
+			}
+		}
+
+		i915_sw_fence_commit(submit);
+		i915_sw_fence_commit(wait);
+
+		if (!wait_event_timeout(wait->wait,
+					i915_sw_fence_done(wait),
+					HZ / 2)) {
+			struct i915_request *rq = requests[count - 1];
+
+			pr_err("waiting for %d fences (last %llx:%lld) on %s timed out!\n",
+			       count,
+			       rq->fence.context, rq->fence.seqno,
+			       t->engine->name);
+			i915_gem_set_wedged(t->engine->i915);
+			GEM_BUG_ON(!i915_request_completed(rq));
+			i915_sw_fence_wait(wait);
+			err = -EIO;
+		}
+
+		for (n = 0; n < count; n++) {
+			struct i915_request *rq = requests[n];
+
+			if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
+				      &rq->fence.flags)) {
+				pr_err("%llu:%llu was not signaled!\n",
+				       rq->fence.context, rq->fence.seqno);
+				err = -EINVAL;
+			}
+
+			i915_request_put(rq);
+		}
+
+		heap_fence_put(wait);
+		heap_fence_put(submit);
+
+		if (err < 0)
+			break;
+
+		num_fences += count;
+		num_waits++;
+
+		cond_resched();
+	}
+
+	atomic_long_add(num_fences, &t->num_fences);
+	atomic_long_add(num_waits, &t->num_waits);
+
+	kfree(order);
+out_requests:
+	kfree(requests);
+	return err;
+}
+
+static int mock_breadcrumbs_smoketest(void *arg)
+{
+	struct drm_i915_private *i915 = arg;
+	struct smoketest t = {
+		.engine = i915->engine[RCS],
+		.ncontexts = 1024,
+		.max_batch = 1024,
+		.request_alloc = __mock_request_alloc
+	};
+	unsigned int ncpus = num_online_cpus();
+	struct task_struct **threads;
+	unsigned int n;
+	int ret = 0;
+
+	/*
+	 * Smoketest our breadcrumb/signal handling for requests across multiple
+	 * threads. A very simple test to only catch the most egregious of bugs.
+	 * See __igt_breadcrumbs_smoketest().
+	 */
+
+	threads = kmalloc_array(ncpus, sizeof(*threads), GFP_KERNEL);
+	if (!threads)
+		return -ENOMEM;
+
+	t.contexts =
+		kmalloc_array(t.ncontexts, sizeof(*t.contexts), GFP_KERNEL);
+	if (!t.contexts) {
+		ret = -ENOMEM;
+		goto out_threads;
+	}
+
+	mutex_lock(&t.engine->i915->drm.struct_mutex);
+	for (n = 0; n < t.ncontexts; n++) {
+		t.contexts[n] = mock_context(t.engine->i915, "mock");
+		if (!t.contexts[n]) {
+			ret = -ENOMEM;
+			goto out_contexts;
+		}
+	}
+	mutex_unlock(&t.engine->i915->drm.struct_mutex);
+
+	for (n = 0; n < ncpus; n++) {
+		threads[n] = kthread_run(__igt_breadcrumbs_smoketest,
+					 &t, "igt/%d", n);
+		if (IS_ERR(threads[n])) {
+			ret = PTR_ERR(threads[n]);
+			ncpus = n;
+			break;
+		}
+
+		get_task_struct(threads[n]);
+	}
+
+	msleep(jiffies_to_msecs(i915_selftest.timeout_jiffies));
+
+	for (n = 0; n < ncpus; n++) {
+		int err;
+
+		err = kthread_stop(threads[n]);
+		if (err < 0 && !ret)
+			ret = err;
+
+		put_task_struct(threads[n]);
+	}
+	pr_info("Completed %lu waits for %lu fences across %d cpus\n",
+		atomic_long_read(&t.num_waits),
+		atomic_long_read(&t.num_fences),
+		ncpus);
+
+	mutex_lock(&t.engine->i915->drm.struct_mutex);
+out_contexts:
+	for (n = 0; n < t.ncontexts; n++) {
+		if (!t.contexts[n])
+			break;
+		mock_context_close(t.contexts[n]);
+	}
+	mutex_unlock(&t.engine->i915->drm.struct_mutex);
+	kfree(t.contexts);
+out_threads:
+	kfree(threads);
+
+	return ret;
+}
+
 int i915_request_mock_selftests(void)
 {
 	static const struct i915_subtest tests[] = {
@@ -254,6 +505,7 @@ int i915_request_mock_selftests(void)
 		SUBTEST(igt_wait_request),
 		SUBTEST(igt_fence_wait),
 		SUBTEST(igt_request_rewind),
+		SUBTEST(mock_breadcrumbs_smoketest),
 	};
 	struct drm_i915_private *i915;
 	intel_wakeref_t wakeref;
@@ -812,6 +1064,178 @@ static int live_sequential_engines(void *arg)
 	return err;
 }
 
+static int
+max_batches(struct i915_gem_context *ctx, struct intel_engine_cs *engine)
+{
+	struct i915_request *rq;
+	int ret;
+
+	/*
+	 * Before execlists, all contexts share the same ringbuffer. With
+	 * execlists, each context/engine has a separate ringbuffer and
+	 * for the purposes of this test, inexhaustible.
+	 *
+	 * For the global ringbuffer though, we have to be very careful
+	 * that we do not wrap while preventing the execution of requests
+	 * with an unsignaled fence.
+	 */
+	if (HAS_EXECLISTS(ctx->i915))
+		return INT_MAX;
+
+	rq = i915_request_alloc(engine, ctx);
+	if (IS_ERR(rq)) {
+		ret = PTR_ERR(rq);
+	} else {
+		int sz;
+
+		ret = rq->ring->size - rq->reserved_space;
+		i915_request_add(rq);
+
+		sz = rq->ring->emit - rq->head;
+		if (sz < 0)
+			sz += rq->ring->size;
+		ret /= sz;
+		ret /= 2; /* leave half spare, in case of emergency! */
+	}
+
+	return ret;
+}
+
+static int live_breadcrumbs_smoketest(void *arg)
+{
+	struct drm_i915_private *i915 = arg;
+	struct smoketest t[I915_NUM_ENGINES];
+	unsigned int ncpus = num_online_cpus();
+	unsigned long num_waits, num_fences;
+	struct intel_engine_cs *engine;
+	struct task_struct **threads;
+	struct igt_live_test live;
+	enum intel_engine_id id;
+	intel_wakeref_t wakeref;
+	struct drm_file *file;
+	unsigned int n;
+	int ret = 0;
+
+	/*
+	 * Smoketest our breadcrumb/signal handling for requests across multiple
+	 * threads. A very simple test to only catch the most egregious of bugs.
+	 * See __igt_breadcrumbs_smoketest().
+	 *
+	 * On real hardware this time.
+	 */
+
+	wakeref = intel_runtime_pm_get(i915);
+
+	file = mock_file(i915);
+	if (IS_ERR(file)) {
+		ret = PTR_ERR(file);
+		goto out_rpm;
+	}
+
+	threads = kcalloc(ncpus * I915_NUM_ENGINES,
+			  sizeof(*threads),
+			  GFP_KERNEL);
+	if (!threads) {
+		ret = -ENOMEM;
+		goto out_file;
+	}
+
+	memset(&t[0], 0, sizeof(t[0]));
+	t[0].request_alloc = __live_request_alloc;
+	t[0].ncontexts = 64;
+	t[0].contexts = kmalloc_array(t[0].ncontexts,
+				      sizeof(*t[0].contexts),
+				      GFP_KERNEL);
+	if (!t[0].contexts) {
+		ret = -ENOMEM;
+		goto out_threads;
+	}
+
+	mutex_lock(&i915->drm.struct_mutex);
+	for (n = 0; n < t[0].ncontexts; n++) {
+		t[0].contexts[n] = live_context(i915, file);
+		if (!t[0].contexts[n]) {
+			ret = -ENOMEM;
+			goto out_contexts;
+		}
+	}
+
+	ret = igt_live_test_begin(&live, i915, __func__, "");
+	if (ret)
+		goto out_contexts;
+
+	for_each_engine(engine, i915, id) {
+		t[id] = t[0];
+		t[id].engine = engine;
+		t[id].max_batch = max_batches(t[0].contexts[0], engine);
+		if (t[id].max_batch < 0) {
+			ret = t[id].max_batch;
+			mutex_unlock(&i915->drm.struct_mutex);
+			goto out_flush;
+		}
+		/* One ring interleaved between requests from all cpus */
+		t[id].max_batch /= num_online_cpus() + 1;
+		pr_debug("Limiting batches to %d requests on %s\n",
+			 t[id].max_batch, engine->name);
+
+		for (n = 0; n < ncpus; n++) {
+			struct task_struct *tsk;
+
+			tsk = kthread_run(__igt_breadcrumbs_smoketest,
+					  &t[id], "igt/%d.%d", id, n);
+			if (IS_ERR(tsk)) {
+				ret = PTR_ERR(tsk);
+				mutex_unlock(&i915->drm.struct_mutex);
+				goto out_flush;
+			}
+
+			get_task_struct(tsk);
+			threads[id * ncpus + n] = tsk;
+		}
+	}
+	mutex_unlock(&i915->drm.struct_mutex);
+
+	msleep(jiffies_to_msecs(i915_selftest.timeout_jiffies));
+
+out_flush:
+	num_waits = 0;
+	num_fences = 0;
+	for_each_engine(engine, i915, id) {
+		for (n = 0; n < ncpus; n++) {
+			struct task_struct *tsk = threads[id * ncpus + n];
+			int err;
+
+			if (!tsk)
+				continue;
+
+			err = kthread_stop(tsk);
+			if (err < 0 && !ret)
+				ret = err;
+
+			put_task_struct(tsk);
+		}
+
+		num_waits += atomic_long_read(&t[id].num_waits);
+		num_fences += atomic_long_read(&t[id].num_fences);
+	}
+	pr_info("Completed %lu waits for %lu fences across %d engines and %d cpus\n",
+		num_waits, num_fences, RUNTIME_INFO(i915)->num_rings, ncpus);
+
+	mutex_lock(&i915->drm.struct_mutex);
+	ret = igt_live_test_end(&live) ?: ret;
+out_contexts:
+	mutex_unlock(&i915->drm.struct_mutex);
+	kfree(t[0].contexts);
+out_threads:
+	kfree(threads);
+out_file:
+	mock_file_free(i915, file);
+out_rpm:
+	intel_runtime_pm_put(i915, wakeref);
+
+	return ret;
+}
+
 int i915_request_live_selftests(struct drm_i915_private *i915)
 {
 	static const struct i915_subtest tests[] = {
@@ -819,6 +1243,7 @@ int i915_request_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(live_all_engines),
 		SUBTEST(live_sequential_engines),
 		SUBTEST(live_empty_request),
+		SUBTEST(live_breadcrumbs_smoketest),
 	};
 
 	if (i915_terminally_wedged(&i915->gpu_error))
diff --git a/drivers/gpu/drm/i915/selftests/igt_spinner.c b/drivers/gpu/drm/i915/selftests/igt_spinner.c
index 0e70df0230b8..9ebd9225684e 100644
--- a/drivers/gpu/drm/i915/selftests/igt_spinner.c
+++ b/drivers/gpu/drm/i915/selftests/igt_spinner.c
@@ -185,11 +185,6 @@ void igt_spinner_fini(struct igt_spinner *spin)
 
 bool igt_wait_for_spinner(struct igt_spinner *spin, struct i915_request *rq)
 {
-	if (!wait_event_timeout(rq->execute,
-				READ_ONCE(rq->global_seqno),
-				msecs_to_jiffies(10)))
-		return false;
-
 	return !(wait_for_us(i915_seqno_passed(hws_seqno(spin, rq),
 					       rq->fence.seqno),
 			     10) &&
diff --git a/drivers/gpu/drm/i915/selftests/intel_breadcrumbs.c b/drivers/gpu/drm/i915/selftests/intel_breadcrumbs.c
deleted file mode 100644
index f03b407fdbe2..000000000000
--- a/drivers/gpu/drm/i915/selftests/intel_breadcrumbs.c
+++ /dev/null
@@ -1,470 +0,0 @@
-/*
- * Copyright © 2016 Intel Corporation
- *
- * Permission is hereby granted, free of charge, to any person obtaining a
- * copy of this software and associated documentation files (the "Software"),
- * to deal in the Software without restriction, including without limitation
- * the rights to use, copy, modify, merge, publish, distribute, sublicense,
- * and/or sell copies of the Software, and to permit persons to whom the
- * Software is furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice (including the next
- * paragraph) shall be included in all copies or substantial portions of the
- * Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
- * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
- * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
- * IN THE SOFTWARE.
- *
- */
-
-#include "../i915_selftest.h"
-#include "i915_random.h"
-
-#include "mock_gem_device.h"
-#include "mock_engine.h"
-
-static int check_rbtree(struct intel_engine_cs *engine,
-			const unsigned long *bitmap,
-			const struct intel_wait *waiters,
-			const int count)
-{
-	struct intel_breadcrumbs *b = &engine->breadcrumbs;
-	struct rb_node *rb;
-	int n;
-
-	if (&b->irq_wait->node != rb_first(&b->waiters)) {
-		pr_err("First waiter does not match first element of wait-tree\n");
-		return -EINVAL;
-	}
-
-	n = find_first_bit(bitmap, count);
-	for (rb = rb_first(&b->waiters); rb; rb = rb_next(rb)) {
-		struct intel_wait *w = container_of(rb, typeof(*w), node);
-		int idx = w - waiters;
-
-		if (!test_bit(idx, bitmap)) {
-			pr_err("waiter[%d, seqno=%d] removed but still in wait-tree\n",
-			       idx, w->seqno);
-			return -EINVAL;
-		}
-
-		if (n != idx) {
-			pr_err("waiter[%d, seqno=%d] does not match expected next element in tree [%d]\n",
-			       idx, w->seqno, n);
-			return -EINVAL;
-		}
-
-		n = find_next_bit(bitmap, count, n + 1);
-	}
-
-	return 0;
-}
-
-static int check_completion(struct intel_engine_cs *engine,
-			    const unsigned long *bitmap,
-			    const struct intel_wait *waiters,
-			    const int count)
-{
-	int n;
-
-	for (n = 0; n < count; n++) {
-		if (intel_wait_complete(&waiters[n]) != !!test_bit(n, bitmap))
-			continue;
-
-		pr_err("waiter[%d, seqno=%d] is %s, but expected %s\n",
-		       n, waiters[n].seqno,
-		       intel_wait_complete(&waiters[n]) ? "complete" : "active",
-		       test_bit(n, bitmap) ? "active" : "complete");
-		return -EINVAL;
-	}
-
-	return 0;
-}
-
-static int check_rbtree_empty(struct intel_engine_cs *engine)
-{
-	struct intel_breadcrumbs *b = &engine->breadcrumbs;
-
-	if (b->irq_wait) {
-		pr_err("Empty breadcrumbs still has a waiter\n");
-		return -EINVAL;
-	}
-
-	if (!RB_EMPTY_ROOT(&b->waiters)) {
-		pr_err("Empty breadcrumbs, but wait-tree not empty\n");
-		return -EINVAL;
-	}
-
-	return 0;
-}
-
-static int igt_random_insert_remove(void *arg)
-{
-	const u32 seqno_bias = 0x1000;
-	I915_RND_STATE(prng);
-	struct intel_engine_cs *engine = arg;
-	struct intel_wait *waiters;
-	const int count = 4096;
-	unsigned int *order;
-	unsigned long *bitmap;
-	int err = -ENOMEM;
-	int n;
-
-	mock_engine_reset(engine);
-
-	waiters = kvmalloc_array(count, sizeof(*waiters), GFP_KERNEL);
-	if (!waiters)
-		goto out_engines;
-
-	bitmap = kcalloc(DIV_ROUND_UP(count, BITS_PER_LONG), sizeof(*bitmap),
-			 GFP_KERNEL);
-	if (!bitmap)
-		goto out_waiters;
-
-	order = i915_random_order(count, &prng);
-	if (!order)
-		goto out_bitmap;
-
-	for (n = 0; n < count; n++)
-		intel_wait_init_for_seqno(&waiters[n], seqno_bias + n);
-
-	err = check_rbtree(engine, bitmap, waiters, count);
-	if (err)
-		goto out_order;
-
-	/* Add and remove waiters into the rbtree in random order. At each
-	 * step, we verify that the rbtree is correctly ordered.
-	 */
-	for (n = 0; n < count; n++) {
-		int i = order[n];
-
-		intel_engine_add_wait(engine, &waiters[i]);
-		__set_bit(i, bitmap);
-
-		err = check_rbtree(engine, bitmap, waiters, count);
-		if (err)
-			goto out_order;
-	}
-
-	i915_random_reorder(order, count, &prng);
-	for (n = 0; n < count; n++) {
-		int i = order[n];
-
-		intel_engine_remove_wait(engine, &waiters[i]);
-		__clear_bit(i, bitmap);
-
-		err = check_rbtree(engine, bitmap, waiters, count);
-		if (err)
-			goto out_order;
-	}
-
-	err = check_rbtree_empty(engine);
-out_order:
-	kfree(order);
-out_bitmap:
-	kfree(bitmap);
-out_waiters:
-	kvfree(waiters);
-out_engines:
-	mock_engine_flush(engine);
-	return err;
-}
-
-static int igt_insert_complete(void *arg)
-{
-	const u32 seqno_bias = 0x1000;
-	struct intel_engine_cs *engine = arg;
-	struct intel_wait *waiters;
-	const int count = 4096;
-	unsigned long *bitmap;
-	int err = -ENOMEM;
-	int n, m;
-
-	mock_engine_reset(engine);
-
-	waiters = kvmalloc_array(count, sizeof(*waiters), GFP_KERNEL);
-	if (!waiters)
-		goto out_engines;
-
-	bitmap = kcalloc(DIV_ROUND_UP(count, BITS_PER_LONG), sizeof(*bitmap),
-			 GFP_KERNEL);
-	if (!bitmap)
-		goto out_waiters;
-
-	for (n = 0; n < count; n++) {
-		intel_wait_init_for_seqno(&waiters[n], n + seqno_bias);
-		intel_engine_add_wait(engine, &waiters[n]);
-		__set_bit(n, bitmap);
-	}
-	err = check_rbtree(engine, bitmap, waiters, count);
-	if (err)
-		goto out_bitmap;
-
-	/* On each step, we advance the seqno so that several waiters are then
-	 * complete (we increase the seqno by increasingly larger values to
-	 * retire more and more waiters at once). All retired waiters should
-	 * be woken and removed from the rbtree, and so that we check.
-	 */
-	for (n = 0; n < count; n = m) {
-		int seqno = 2 * n;
-
-		GEM_BUG_ON(find_first_bit(bitmap, count) != n);
-
-		if (intel_wait_complete(&waiters[n])) {
-			pr_err("waiter[%d, seqno=%d] completed too early\n",
-			       n, waiters[n].seqno);
-			err = -EINVAL;
-			goto out_bitmap;
-		}
-
-		/* complete the following waiters */
-		mock_seqno_advance(engine, seqno + seqno_bias);
-		for (m = n; m <= seqno; m++) {
-			if (m == count)
-				break;
-
-			GEM_BUG_ON(!test_bit(m, bitmap));
-			__clear_bit(m, bitmap);
-		}
-
-		intel_engine_remove_wait(engine, &waiters[n]);
-		RB_CLEAR_NODE(&waiters[n].node);
-
-		err = check_rbtree(engine, bitmap, waiters, count);
-		if (err) {
-			pr_err("rbtree corrupt after seqno advance to %d\n",
-			       seqno + seqno_bias);
-			goto out_bitmap;
-		}
-
-		err = check_completion(engine, bitmap, waiters, count);
-		if (err) {
-			pr_err("completions after seqno advance to %d failed\n",
-			       seqno + seqno_bias);
-			goto out_bitmap;
-		}
-	}
-
-	err = check_rbtree_empty(engine);
-out_bitmap:
-	kfree(bitmap);
-out_waiters:
-	kvfree(waiters);
-out_engines:
-	mock_engine_flush(engine);
-	return err;
-}
-
-struct igt_wakeup {
-	struct task_struct *tsk;
-	atomic_t *ready, *set, *done;
-	struct intel_engine_cs *engine;
-	unsigned long flags;
-#define STOP 0
-#define IDLE 1
-	wait_queue_head_t *wq;
-	u32 seqno;
-};
-
-static bool wait_for_ready(struct igt_wakeup *w)
-{
-	DEFINE_WAIT(ready);
-
-	set_bit(IDLE, &w->flags);
-	if (atomic_dec_and_test(w->done))
-		wake_up_var(w->done);
-
-	if (test_bit(STOP, &w->flags))
-		goto out;
-
-	for (;;) {
-		prepare_to_wait(w->wq, &ready, TASK_INTERRUPTIBLE);
-		if (atomic_read(w->ready) == 0)
-			break;
-
-		schedule();
-	}
-	finish_wait(w->wq, &ready);
-
-out:
-	clear_bit(IDLE, &w->flags);
-	if (atomic_dec_and_test(w->set))
-		wake_up_var(w->set);
-
-	return !test_bit(STOP, &w->flags);
-}
-
-static int igt_wakeup_thread(void *arg)
-{
-	struct igt_wakeup *w = arg;
-	struct intel_wait wait;
-
-	while (wait_for_ready(w)) {
-		GEM_BUG_ON(kthread_should_stop());
-
-		intel_wait_init_for_seqno(&wait, w->seqno);
-		intel_engine_add_wait(w->engine, &wait);
-		for (;;) {
-			set_current_state(TASK_UNINTERRUPTIBLE);
-			if (i915_seqno_passed(intel_engine_get_seqno(w->engine),
-					      w->seqno))
-				break;
-
-			if (test_bit(STOP, &w->flags)) /* emergency escape */
-				break;
-
-			schedule();
-		}
-		intel_engine_remove_wait(w->engine, &wait);
-		__set_current_state(TASK_RUNNING);
-	}
-
-	return 0;
-}
-
-static void igt_wake_all_sync(atomic_t *ready,
-			      atomic_t *set,
-			      atomic_t *done,
-			      wait_queue_head_t *wq,
-			      int count)
-{
-	atomic_set(set, count);
-	atomic_set(ready, 0);
-	wake_up_all(wq);
-
-	wait_var_event(set, !atomic_read(set));
-	atomic_set(ready, count);
-	atomic_set(done, count);
-}
-
-static int igt_wakeup(void *arg)
-{
-	I915_RND_STATE(prng);
-	struct intel_engine_cs *engine = arg;
-	struct igt_wakeup *waiters;
-	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
-	const int count = 4096;
-	const u32 max_seqno = count / 4;
-	atomic_t ready, set, done;
-	int err = -ENOMEM;
-	int n, step;
-
-	mock_engine_reset(engine);
-
-	waiters = kvmalloc_array(count, sizeof(*waiters), GFP_KERNEL);
-	if (!waiters)
-		goto out_engines;
-
-	/* Create a large number of threads, each waiting on a random seqno.
-	 * Multiple waiters will be waiting for the same seqno.
-	 */
-	atomic_set(&ready, count);
-	for (n = 0; n < count; n++) {
-		waiters[n].wq = &wq;
-		waiters[n].ready = &ready;
-		waiters[n].set = &set;
-		waiters[n].done = &done;
-		waiters[n].engine = engine;
-		waiters[n].flags = BIT(IDLE);
-
-		waiters[n].tsk = kthread_run(igt_wakeup_thread, &waiters[n],
-					     "i915/igt:%d", n);
-		if (IS_ERR(waiters[n].tsk))
-			goto out_waiters;
-
-		get_task_struct(waiters[n].tsk);
-	}
-
-	for (step = 1; step <= max_seqno; step <<= 1) {
-		u32 seqno;
-
-		/* The waiter threads start paused as we assign them a random
-		 * seqno and reset the engine. Once the engine is reset,
-		 * we signal that the threads may begin their wait upon their
-		 * seqno.
-		 */
-		for (n = 0; n < count; n++) {
-			GEM_BUG_ON(!test_bit(IDLE, &waiters[n].flags));
-			waiters[n].seqno =
-				1 + prandom_u32_state(&prng) % max_seqno;
-		}
-		mock_seqno_advance(engine, 0);
-		igt_wake_all_sync(&ready, &set, &done, &wq, count);
-
-		/* Simulate the GPU doing chunks of work, with one or more
-		 * seqno appearing to finish at the same time. A random number
-		 * of threads will be waiting upon the update and hopefully be
-		 * woken.
-		 */
-		for (seqno = 1; seqno <= max_seqno + step; seqno += step) {
-			usleep_range(50, 500);
-			mock_seqno_advance(engine, seqno);
-		}
-		GEM_BUG_ON(intel_engine_get_seqno(engine) < 1 + max_seqno);
-
-		/* With the seqno now beyond any of the waiting threads, they
-		 * should all be woken, see that they are complete and signal
-		 * that they are ready for the next test. We wait until all
-		 * threads are complete and waiting for us (i.e. not a seqno).
-		 */
-		if (!wait_var_event_timeout(&done,
-					    !atomic_read(&done), 10 * HZ)) {
-			pr_err("Timed out waiting for %d remaining waiters\n",
-			       atomic_read(&done));
-			err = -ETIMEDOUT;
-			break;
-		}
-
-		err = check_rbtree_empty(engine);
-		if (err)
-			break;
-	}
-
-out_waiters:
-	for (n = 0; n < count; n++) {
-		if (IS_ERR(waiters[n].tsk))
-			break;
-
-		set_bit(STOP, &waiters[n].flags);
-	}
-	mock_seqno_advance(engine, INT_MAX); /* wakeup any broken waiters */
-	igt_wake_all_sync(&ready, &set, &done, &wq, n);
-
-	for (n = 0; n < count; n++) {
-		if (IS_ERR(waiters[n].tsk))
-			break;
-
-		kthread_stop(waiters[n].tsk);
-		put_task_struct(waiters[n].tsk);
-	}
-
-	kvfree(waiters);
-out_engines:
-	mock_engine_flush(engine);
-	return err;
-}
-
-int intel_breadcrumbs_mock_selftests(void)
-{
-	static const struct i915_subtest tests[] = {
-		SUBTEST(igt_random_insert_remove),
-		SUBTEST(igt_insert_complete),
-		SUBTEST(igt_wakeup),
-	};
-	struct drm_i915_private *i915;
-	int err;
-
-	i915 = mock_gem_device();
-	if (!i915)
-		return -ENOMEM;
-
-	err = i915_subtests(tests, i915->engine[RCS]);
-	drm_dev_put(&i915->drm);
-
-	return err;
-}
diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
index 2c38ea5892d9..7b6f3bea9ef8 100644
--- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
+++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
@@ -1127,7 +1127,7 @@ static int __igt_reset_evict_vma(struct drm_i915_private *i915,
 
 	wait_for_completion(&arg.completion);
 
-	if (wait_for(waitqueue_active(&rq->execute), 10)) {
+	if (wait_for(!list_empty(&rq->fence.cb_list), 10)) {
 		struct drm_printer p = drm_info_printer(i915->drm.dev);
 
 		pr_err("igt/evict_vma kthread did not wait\n");
diff --git a/drivers/gpu/drm/i915/selftests/lib_sw_fence.c b/drivers/gpu/drm/i915/selftests/lib_sw_fence.c
index b26f07b55d86..2bfa72c1654b 100644
--- a/drivers/gpu/drm/i915/selftests/lib_sw_fence.c
+++ b/drivers/gpu/drm/i915/selftests/lib_sw_fence.c
@@ -76,3 +76,57 @@ void timed_fence_fini(struct timed_fence *tf)
 	destroy_timer_on_stack(&tf->timer);
 	i915_sw_fence_fini(&tf->fence);
 }
+
+struct heap_fence {
+	struct i915_sw_fence fence;
+	union {
+		struct kref ref;
+		struct rcu_head rcu;
+	};
+};
+
+static int __i915_sw_fence_call
+heap_fence_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
+{
+	struct heap_fence *h = container_of(fence, typeof(*h), fence);
+
+	switch (state) {
+	case FENCE_COMPLETE:
+		break;
+
+	case FENCE_FREE:
+		heap_fence_put(&h->fence);
+	}
+
+	return NOTIFY_DONE;
+}
+
+struct i915_sw_fence *heap_fence_create(gfp_t gfp)
+{
+	struct heap_fence *h;
+
+	h = kmalloc(sizeof(*h), gfp);
+	if (!h)
+		return NULL;
+
+	i915_sw_fence_init(&h->fence, heap_fence_notify);
+	refcount_set(&h->ref.refcount, 2);
+
+	return &h->fence;
+}
+
+static void heap_fence_release(struct kref *ref)
+{
+	struct heap_fence *h = container_of(ref, typeof(*h), ref);
+
+	i915_sw_fence_fini(&h->fence);
+
+	kfree_rcu(h, rcu);
+}
+
+void heap_fence_put(struct i915_sw_fence *fence)
+{
+	struct heap_fence *h = container_of(fence, typeof(*h), fence);
+
+	kref_put(&h->ref, heap_fence_release);
+}
diff --git a/drivers/gpu/drm/i915/selftests/lib_sw_fence.h b/drivers/gpu/drm/i915/selftests/lib_sw_fence.h
index 474aafb92ae1..1f9927e10f3a 100644
--- a/drivers/gpu/drm/i915/selftests/lib_sw_fence.h
+++ b/drivers/gpu/drm/i915/selftests/lib_sw_fence.h
@@ -39,4 +39,7 @@ struct timed_fence {
 void timed_fence_init(struct timed_fence *tf, unsigned long expires);
 void timed_fence_fini(struct timed_fence *tf);
 
+struct i915_sw_fence *heap_fence_create(gfp_t gfp);
+void heap_fence_put(struct i915_sw_fence *fence);
+
 #endif /* _LIB_SW_FENCE_H_ */
diff --git a/drivers/gpu/drm/i915/selftests/mock_engine.c b/drivers/gpu/drm/i915/selftests/mock_engine.c
index 3b226ebc6bc4..08f0cab02e0f 100644
--- a/drivers/gpu/drm/i915/selftests/mock_engine.c
+++ b/drivers/gpu/drm/i915/selftests/mock_engine.c
@@ -86,17 +86,21 @@ static struct mock_request *first_request(struct mock_engine *engine)
 static void advance(struct mock_request *request)
 {
 	list_del_init(&request->link);
-	mock_seqno_advance(request->base.engine, request->base.global_seqno);
+	intel_engine_write_global_seqno(request->base.engine,
+					request->base.global_seqno);
 	i915_request_mark_complete(&request->base);
 	GEM_BUG_ON(!i915_request_completed(&request->base));
+
+	intel_engine_queue_breadcrumbs(request->base.engine);
 }
 
 static void hw_delay_complete(struct timer_list *t)
 {
 	struct mock_engine *engine = from_timer(engine, t, hw_delay);
 	struct mock_request *request;
+	unsigned long flags;
 
-	spin_lock(&engine->hw_lock);
+	spin_lock_irqsave(&engine->hw_lock, flags);
 
 	/* Timer fired, first request is complete */
 	request = first_request(engine);
@@ -116,7 +120,7 @@ static void hw_delay_complete(struct timer_list *t)
 		advance(request);
 	}
 
-	spin_unlock(&engine->hw_lock);
+	spin_unlock_irqrestore(&engine->hw_lock, flags);
 }
 
 static void mock_context_unpin(struct intel_context *ce)
@@ -191,11 +195,12 @@ static void mock_submit_request(struct i915_request *request)
 	struct mock_request *mock = container_of(request, typeof(*mock), base);
 	struct mock_engine *engine =
 		container_of(request->engine, typeof(*engine), base);
+	unsigned long flags;
 
 	i915_request_submit(request);
 	GEM_BUG_ON(!request->global_seqno);
 
-	spin_lock_irq(&engine->hw_lock);
+	spin_lock_irqsave(&engine->hw_lock, flags);
 	list_add_tail(&mock->link, &engine->hw_queue);
 	if (mock->link.prev == &engine->hw_queue) {
 		if (mock->delay)
@@ -203,7 +208,7 @@ static void mock_submit_request(struct i915_request *request)
 		else
 			advance(mock);
 	}
-	spin_unlock_irq(&engine->hw_lock);
+	spin_unlock_irqrestore(&engine->hw_lock, flags);
 }
 
 struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
@@ -273,7 +278,7 @@ void mock_engine_flush(struct intel_engine_cs *engine)
 
 void mock_engine_reset(struct intel_engine_cs *engine)
 {
-	intel_write_status_page(engine, I915_GEM_HWS_INDEX, 0);
+	intel_engine_write_global_seqno(engine, 0);
 }
 
 void mock_engine_free(struct intel_engine_cs *engine)
diff --git a/drivers/gpu/drm/i915/selftests/mock_engine.h b/drivers/gpu/drm/i915/selftests/mock_engine.h
index 133d0c21790d..b9cc3a245f16 100644
--- a/drivers/gpu/drm/i915/selftests/mock_engine.h
+++ b/drivers/gpu/drm/i915/selftests/mock_engine.h
@@ -46,10 +46,4 @@ void mock_engine_flush(struct intel_engine_cs *engine);
 void mock_engine_reset(struct intel_engine_cs *engine);
 void mock_engine_free(struct intel_engine_cs *engine);
 
-static inline void mock_seqno_advance(struct intel_engine_cs *engine, u32 seqno)
-{
-	intel_write_status_page(engine, I915_GEM_HWS_INDEX, seqno);
-	intel_engine_wakeup(engine);
-}
-
 #endif /* !__MOCK_ENGINE_H__ */
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 39+ messages in thread
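
For reference, the heart of the per-context breadcrumbs rework above,
condensed from its diff (a sketch only, details abridged -- the engine
back-pointer derivation in signal_irq_work() is inferred from the
surrounding hunks): instead of waking a dedicated signaler kthread,
anyone who spots a potential completion (the user interrupt, hangcheck,
even the mock engine) queues an irq_work that walks the per-engine
signalers list and signals any completed fences directly.

	/* helper as added by the patch */
	static inline void
	intel_engine_queue_breadcrumbs(struct intel_engine_cs *engine)
	{
		irq_work_queue(&engine->breadcrumbs.irq_work);
	}

	/* registered via init_irq_work(); runs shortly after the irq */
	static void signal_irq_work(struct irq_work *work)
	{
		struct intel_engine_cs *engine =
			container_of(work, typeof(*engine), breadcrumbs.irq_work);

		intel_engine_breadcrumbs_irq(engine);
	}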

* [PATCH 21/28] drm/i915: Drop fake breadcrumb irq
  2019-01-28  1:02 [PATCH 01/28] drm/i915: Wait for a moment before forcibly resetting the device Chris Wilson
                   ` (18 preceding siblings ...)
  2019-01-28  1:02 ` [PATCH 20/28] drm/i915: Replace global breadcrumbs with per-context interrupt tracking Chris Wilson
@ 2019-01-28  1:02 ` Chris Wilson
  2019-01-28  1:02 ` [PATCH 22/28] drm/i915: Generalise GPU activity tracking Chris Wilson
                   ` (11 subsequent siblings)
  31 siblings, 0 replies; 39+ messages in thread
From: Chris Wilson @ 2019-01-28  1:02 UTC (permalink / raw)
  To: intel-gfx

Missed breadcrumb detection is defunct due to the tight coupling with
dma_fence signaling and the myriad ways we may signal fences from
everywhere but an interrupt, i.e. we frequently signal a fence before
we even see its interrupt. This means that even if we miss an interrupt
for a fence, it is still signaled before our breadcrumb hangcheck
fires, so simplify the breadcrumb hangchecking by moving it into the
GPU hangcheck and forgo the fake interrupts.
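
(A sketch of the replacement, condensed from the intel_hangcheck.c hunk
below: the periodic hangcheck worker now flushes breadcrumbs itself, so
a fence whose interrupt really was lost is signaled on the next
hangcheck tick instead of by a dedicated fake-irq timer.)

	/* i915_hangcheck_elapsed(), abridged */
	for_each_engine(engine, dev_priv, id) {
		struct hangcheck hc;

		intel_engine_signal_breadcrumbs(engine); /* catch missed irqs */

		hangcheck_load_sample(engine, &hc);
		hangcheck_accumulate_sample(engine, &hc);
		hangcheck_store_sample(engine, &hc);
	}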

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c           |  93 -----------
 drivers/gpu/drm/i915/i915_gpu_error.c         |   2 -
 drivers/gpu/drm/i915/i915_gpu_error.h         |   5 -
 drivers/gpu/drm/i915/intel_breadcrumbs.c      | 147 +-----------------
 drivers/gpu/drm/i915/intel_hangcheck.c        |   2 +
 drivers/gpu/drm/i915/intel_ringbuffer.h       |   5 -
 .../gpu/drm/i915/selftests/igt_live_test.c    |   7 -
 7 files changed, 5 insertions(+), 256 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index b1ac0f78cb42..fa2c226fc779 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1322,9 +1322,6 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
 			   intel_engine_last_submit(engine),
 			   jiffies_to_msecs(jiffies -
 					    engine->hangcheck.action_timestamp));
-		seq_printf(m, "\tfake irq active? %s\n",
-			   yesno(test_bit(engine->id,
-					  &dev_priv->gpu_error.missed_irq_rings)));
 
 		seq_printf(m, "\tACTHD = 0x%08llx [current 0x%08llx]\n",
 			   (long long)engine->hangcheck.acthd,
@@ -3900,94 +3897,6 @@ DEFINE_SIMPLE_ATTRIBUTE(i915_wedged_fops,
 			i915_wedged_get, i915_wedged_set,
 			"%llu\n");
 
-static int
-fault_irq_set(struct drm_i915_private *i915,
-	      unsigned long *irq,
-	      unsigned long val)
-{
-	int err;
-
-	err = mutex_lock_interruptible(&i915->drm.struct_mutex);
-	if (err)
-		return err;
-
-	err = i915_gem_wait_for_idle(i915,
-				     I915_WAIT_LOCKED |
-				     I915_WAIT_INTERRUPTIBLE,
-				     MAX_SCHEDULE_TIMEOUT);
-	if (err)
-		goto err_unlock;
-
-	*irq = val;
-	mutex_unlock(&i915->drm.struct_mutex);
-
-	/* Flush idle worker to disarm irq */
-	drain_delayed_work(&i915->gt.idle_work);
-
-	return 0;
-
-err_unlock:
-	mutex_unlock(&i915->drm.struct_mutex);
-	return err;
-}
-
-static int
-i915_ring_missed_irq_get(void *data, u64 *val)
-{
-	struct drm_i915_private *dev_priv = data;
-
-	*val = dev_priv->gpu_error.missed_irq_rings;
-	return 0;
-}
-
-static int
-i915_ring_missed_irq_set(void *data, u64 val)
-{
-	struct drm_i915_private *i915 = data;
-
-	return fault_irq_set(i915, &i915->gpu_error.missed_irq_rings, val);
-}
-
-DEFINE_SIMPLE_ATTRIBUTE(i915_ring_missed_irq_fops,
-			i915_ring_missed_irq_get, i915_ring_missed_irq_set,
-			"0x%08llx\n");
-
-static int
-i915_ring_test_irq_get(void *data, u64 *val)
-{
-	struct drm_i915_private *dev_priv = data;
-
-	*val = dev_priv->gpu_error.test_irq_rings;
-
-	return 0;
-}
-
-static int
-i915_ring_test_irq_set(void *data, u64 val)
-{
-	struct drm_i915_private *i915 = data;
-
-	/* GuC keeps the user interrupt permanently enabled for submission */
-	if (USES_GUC_SUBMISSION(i915))
-		return -ENODEV;
-
-	/*
-	 * From icl, we can no longer individually mask interrupt generation
-	 * from each engine.
-	 */
-	if (INTEL_GEN(i915) >= 11)
-		return -ENODEV;
-
-	val &= INTEL_INFO(i915)->ring_mask;
-	DRM_DEBUG_DRIVER("Masking interrupts on rings 0x%08llx\n", val);
-
-	return fault_irq_set(i915, &i915->gpu_error.test_irq_rings, val);
-}
-
-DEFINE_SIMPLE_ATTRIBUTE(i915_ring_test_irq_fops,
-			i915_ring_test_irq_get, i915_ring_test_irq_set,
-			"0x%08llx\n");
-
 #define DROP_UNBOUND	BIT(0)
 #define DROP_BOUND	BIT(1)
 #define DROP_RETIRE	BIT(2)
@@ -4751,8 +4660,6 @@ static const struct i915_debugfs_files {
 } i915_debugfs_files[] = {
 	{"i915_wedged", &i915_wedged_fops},
 	{"i915_cache_sharing", &i915_cache_sharing_fops},
-	{"i915_ring_missed_irq", &i915_ring_missed_irq_fops},
-	{"i915_ring_test_irq", &i915_ring_test_irq_fops},
 	{"i915_gem_drop_caches", &i915_drop_caches_fops},
 #if IS_ENABLED(CONFIG_DRM_I915_CAPTURE_ERROR)
 	{"i915_error_state", &i915_error_state_fops},
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 304a7ef7f7fb..6e2e5ed2bd0a 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -723,8 +723,6 @@ static void __err_print_to_sgl(struct drm_i915_error_state_buf *m,
 	err_printf(m, "FORCEWAKE: 0x%08x\n", error->forcewake);
 	err_printf(m, "DERRMR: 0x%08x\n", error->derrmr);
 	err_printf(m, "CCID: 0x%08x\n", error->ccid);
-	err_printf(m, "Missed interrupts: 0x%08lx\n",
-		   m->i915->gpu_error.missed_irq_rings);
 
 	for (i = 0; i < error->nfence; i++)
 		err_printf(m, "  fence[%d] = %08llx\n", i, error->fence[i]);
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.h b/drivers/gpu/drm/i915/i915_gpu_error.h
index 74757c424aab..53b1f22dd365 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.h
+++ b/drivers/gpu/drm/i915/i915_gpu_error.h
@@ -204,8 +204,6 @@ struct i915_gpu_error {
 
 	atomic_t pending_fb_pin;
 
-	unsigned long missed_irq_rings;
-
 	/**
 	 * State variable controlling the reset flow and count
 	 *
@@ -274,9 +272,6 @@ struct i915_gpu_error {
 	 */
 	wait_queue_head_t reset_queue;
 
-	/* For missed irq/seqno simulation. */
-	unsigned long test_irq_rings;
-
 	struct i915_gpu_restart *restart;
 };
 
diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
index b0795b0ad227..cacaa1d04d17 100644
--- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
@@ -91,7 +91,6 @@ bool intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine)
 
 	spin_lock(&b->irq_lock);
 
-	b->irq_fired = true;
 	if (b->irq_armed && list_empty(&b->signalers))
 		__intel_breadcrumbs_disarm_irq(b);
 
@@ -172,86 +171,6 @@ static void signal_irq_work(struct irq_work *work)
 	intel_engine_breadcrumbs_irq(engine);
 }
 
-static unsigned long wait_timeout(void)
-{
-	return round_jiffies_up(jiffies + DRM_I915_HANGCHECK_JIFFIES);
-}
-
-static noinline void missed_breadcrumb(struct intel_engine_cs *engine)
-{
-	if (GEM_SHOW_DEBUG()) {
-		struct drm_printer p = drm_debug_printer(__func__);
-
-		intel_engine_dump(engine, &p,
-				  "%s missed breadcrumb at %pS\n",
-				  engine->name, __builtin_return_address(0));
-	}
-
-	set_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings);
-}
-
-static void intel_breadcrumbs_hangcheck(struct timer_list *t)
-{
-	struct intel_engine_cs *engine =
-		from_timer(engine, t, breadcrumbs.hangcheck);
-	struct intel_breadcrumbs *b = &engine->breadcrumbs;
-
-	if (!b->irq_armed)
-		return;
-
-	if (b->irq_fired)
-		goto rearm;
-
-	/*
-	 * We keep the hangcheck timer alive until we disarm the irq, even
-	 * if there are no waiters at present.
-	 *
-	 * If the waiter was currently running, assume it hasn't had a chance
-	 * to process the pending interrupt (e.g, low priority task on a loaded
-	 * system) and wait until it sleeps before declaring a missed interrupt.
-	 *
-	 * If the waiter was asleep (and not even pending a wakeup), then we
-	 * must have missed an interrupt as the GPU has stopped advancing
-	 * but we still have a waiter. Assuming all batches complete within
-	 * DRM_I915_HANGCHECK_JIFFIES [1.5s]!
-	 */
-	synchronize_hardirq(engine->i915->drm.irq);
-	if (intel_engine_signal_breadcrumbs(engine)) {
-		missed_breadcrumb(engine);
-		mod_timer(&b->fake_irq, jiffies + 1);
-	} else {
-rearm:
-		b->irq_fired = false;
-		mod_timer(&b->hangcheck, wait_timeout());
-	}
-}
-
-static void intel_breadcrumbs_fake_irq(struct timer_list *t)
-{
-	struct intel_engine_cs *engine =
-		from_timer(engine, t, breadcrumbs.fake_irq);
-	struct intel_breadcrumbs *b = &engine->breadcrumbs;
-
-	/*
-	 * The timer persists in case we cannot enable interrupts,
-	 * or if we have previously seen seqno/interrupt incoherency
-	 * ("missed interrupt" syndrome, better known as a "missed breadcrumb").
-	 * Here the worker will wake up every jiffie in order to kick the
-	 * oldest waiter to do the coherent seqno check.
-	 */
-
-	if (!intel_engine_signal_breadcrumbs(engine) && !b->irq_armed)
-		return;
-
-	/* If the user has disabled the fake-irq, restore the hangchecking */
-	if (!test_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings)) {
-		mod_timer(&b->hangcheck, wait_timeout());
-		return;
-	}
-
-	mod_timer(&b->fake_irq, jiffies + 1);
-}
-
 void intel_engine_pin_breadcrumbs_irq(struct intel_engine_cs *engine)
 {
 	struct intel_breadcrumbs *b = &engine->breadcrumbs;
@@ -274,43 +193,14 @@ void intel_engine_unpin_breadcrumbs_irq(struct intel_engine_cs *engine)
 	spin_unlock_irq(&b->irq_lock);
 }
 
-static bool use_fake_irq(const struct intel_breadcrumbs *b)
-{
-	const struct intel_engine_cs *engine =
-		container_of(b, struct intel_engine_cs, breadcrumbs);
-
-	if (!test_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings))
-		return false;
-
-	/*
-	 * Only start with the heavy weight fake irq timer if we have not
-	 * seen any interrupts since enabling it the first time. If the
-	 * interrupts are still arriving, it means we made a mistake in our
-	 * engine->seqno_barrier(), a timing error that should be transient
-	 * and unlikely to reoccur.
-	 */
-	return !b->irq_fired;
-}
-
-static void enable_fake_irq(struct intel_breadcrumbs *b)
-{
-	/* Ensure we never sleep indefinitely */
-	if (!b->irq_enabled || use_fake_irq(b))
-		mod_timer(&b->fake_irq, jiffies + 1);
-	else
-		mod_timer(&b->hangcheck, wait_timeout());
-}
-
-static bool __intel_breadcrumbs_arm_irq(struct intel_breadcrumbs *b)
+static void __intel_breadcrumbs_arm_irq(struct intel_breadcrumbs *b)
 {
 	struct intel_engine_cs *engine =
 		container_of(b, struct intel_engine_cs, breadcrumbs);
-	struct drm_i915_private *i915 = engine->i915;
-	bool enabled;
 
 	lockdep_assert_held(&b->irq_lock);
 	if (b->irq_armed)
-		return false;
+		return;
 
 	/*
 	 * The breadcrumb irq will be disarmed on the interrupt after the
@@ -328,16 +218,8 @@ static bool __intel_breadcrumbs_arm_irq(struct intel_breadcrumbs *b)
 	 * the driver is idle) we disarm the breadcrumbs.
 	 */
 
-	/* No interrupts? Kick the waiter every jiffie! */
-	enabled = false;
-	if (!b->irq_enabled++ &&
-	    !test_bit(engine->id, &i915->gpu_error.test_irq_rings)) {
+	if (!b->irq_enabled++)
 		irq_enable(engine);
-		enabled = true;
-	}
-
-	enable_fake_irq(b);
-	return enabled;
 }
 
 void intel_engine_init_breadcrumbs(struct intel_engine_cs *engine)
@@ -348,18 +230,6 @@ void intel_engine_init_breadcrumbs(struct intel_engine_cs *engine)
 	INIT_LIST_HEAD(&b->signalers);
 
 	init_irq_work(&b->irq_work, signal_irq_work);
-
-	timer_setup(&b->fake_irq, intel_breadcrumbs_fake_irq, 0);
-	timer_setup(&b->hangcheck, intel_breadcrumbs_hangcheck, 0);
-}
-
-static void cancel_fake_irq(struct intel_engine_cs *engine)
-{
-	struct intel_breadcrumbs *b = &engine->breadcrumbs;
-
-	del_timer_sync(&b->fake_irq); /* may queue b->hangcheck */
-	del_timer_sync(&b->hangcheck);
-	clear_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings);
 }
 
 void intel_engine_reset_breadcrumbs(struct intel_engine_cs *engine)
@@ -369,13 +239,6 @@ void intel_engine_reset_breadcrumbs(struct intel_engine_cs *engine)
 
 	spin_lock_irqsave(&b->irq_lock, flags);
 
-	/*
-	 * Leave the fake_irq timer enabled (if it is running), but clear the
-	 * bit so that it turns itself off on its next wake up and goes back
-	 * to the long hangcheck interval if still required.
-	 */
-	clear_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings);
-
 	if (b->irq_enabled)
 		irq_enable(engine);
 	else
@@ -386,7 +249,6 @@ void intel_engine_reset_breadcrumbs(struct intel_engine_cs *engine)
 
 void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine)
 {
-	cancel_fake_irq(engine);
 }
 
 bool i915_request_enable_breadcrumb(struct i915_request *rq)
@@ -482,7 +344,4 @@ void intel_engine_print_breadcrumbs(struct intel_engine_cs *engine,
 		}
 	}
 	spin_unlock_irq(&b->irq_lock);
-
-	if (test_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings))
-		drm_printf(p, "Fake irq active\n");
 }
diff --git a/drivers/gpu/drm/i915/intel_hangcheck.c b/drivers/gpu/drm/i915/intel_hangcheck.c
index 5662d6fed523..a219c796e56d 100644
--- a/drivers/gpu/drm/i915/intel_hangcheck.c
+++ b/drivers/gpu/drm/i915/intel_hangcheck.c
@@ -275,6 +275,8 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 	for_each_engine(engine, dev_priv, id) {
 		struct hangcheck hc;
 
+		intel_engine_signal_breadcrumbs(engine);
+
 		hangcheck_load_sample(engine, &hc);
 		hangcheck_accumulate_sample(engine, &hc);
 		hangcheck_store_sample(engine, &hc);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index efef2aa1abce..8bbdf9fba196 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -387,14 +387,9 @@ struct intel_engine_cs {
 
 		struct irq_work irq_work; /* for use from inside irq_lock */
 
-		struct timer_list fake_irq; /* used after a missed interrupt */
-		struct timer_list hangcheck; /* detect missed interrupts */
-
-		unsigned int hangcheck_interrupts;
 		unsigned int irq_enabled;
 
 		bool irq_armed;
-		bool irq_fired;
 	} breadcrumbs;
 
 	struct {
diff --git a/drivers/gpu/drm/i915/selftests/igt_live_test.c b/drivers/gpu/drm/i915/selftests/igt_live_test.c
index 5deb485fb942..3e902761cd16 100644
--- a/drivers/gpu/drm/i915/selftests/igt_live_test.c
+++ b/drivers/gpu/drm/i915/selftests/igt_live_test.c
@@ -35,7 +35,6 @@ int igt_live_test_begin(struct igt_live_test *t,
 		return err;
 	}
 
-	i915->gpu_error.missed_irq_rings = 0;
 	t->reset_global = i915_reset_count(&i915->gpu_error);
 
 	for_each_engine(engine, i915, id)
@@ -75,11 +74,5 @@ int igt_live_test_end(struct igt_live_test *t)
 		return -EIO;
 	}
 
-	if (i915->gpu_error.missed_irq_rings) {
-		pr_err("%s(%s): Missed interrupts on engines %lx\n",
-		       t->func, t->name, i915->gpu_error.missed_irq_rings);
-		return -EIO;
-	}
-
 	return 0;
 }
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 22/28] drm/i915: Generalise GPU activity tracking
  2019-01-28  1:02 [PATCH 01/28] drm/i915: Wait for a moment before forcibly resetting the device Chris Wilson
                   ` (19 preceding siblings ...)
  2019-01-28  1:02 ` [PATCH 21/28] drm/i915: Drop fake breadcrumb irq Chris Wilson
@ 2019-01-28  1:02 ` Chris Wilson
  2019-01-28  8:09   ` [PATCH] drm/i915: Allocate active tracking nodes from a slabcache Chris Wilson
  2019-01-28  1:02 ` [PATCH 23/28] " Chris Wilson
                   ` (10 subsequent siblings)
  31 siblings, 1 reply; 39+ messages in thread
From: Chris Wilson @ 2019-01-28  1:02 UTC (permalink / raw)
  To: intel-gfx

We currently track GPU memory usage inside VMA, such that we never
release memory used by the GPU until after it has finished accessing it.
However, we may want to track other resources aside from VMA, or we may
want to split a VMA into multiple independent regions and track each
separately. For this purpose, generalise our request tracking (akin to
struct reservation_object) so that we can embed it into other objects.
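
A rough usage sketch, pieced together from the interface added below;
the embedding object and retire callback here are illustrative only,
while the i915_active_*() calls are from this patch:

	/* illustrative: any resource that wants activity tracking */
	static void my_retire(struct i915_active *ref)
	{
		/* invoked under struct_mutex once all tracked requests retire */
	}

	i915_active_init(i915, &obj->active, my_retire);

	/* on submission: keep only the latest request per timeline */
	err = i915_active_ref(&obj->active, rq->fence.context, rq);

	/* CPU-side barrier: wait for all tracked timelines to retire */
	err = i915_active_wait(&obj->active);

	/* GPU-side barrier: make a new request wait on tracked activity */
	err = i915_request_await_active(rq, &obj->active);

	i915_active_fini(&obj->active); /* the tracker must be idle by now */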

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/Makefile                 |   4 +-
 drivers/gpu/drm/i915/i915_active.c            | 226 ++++++++++++++++++
 drivers/gpu/drm/i915/i915_active.h            |  76 ++++++
 drivers/gpu/drm/i915/i915_gem_gtt.c           |   3 +-
 drivers/gpu/drm/i915/i915_vma.c               | 173 +++-----------
 drivers/gpu/drm/i915/i915_vma.h               |   9 +-
 drivers/gpu/drm/i915/selftests/i915_active.c  | 158 ++++++++++++
 .../drm/i915/selftests/i915_live_selftests.h  |   3 +-
 8 files changed, 498 insertions(+), 154 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_active.c
 create mode 100644 drivers/gpu/drm/i915/i915_active.h
 create mode 100644 drivers/gpu/drm/i915/selftests/i915_active.c

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 210d0e8777b6..1787e1299b1b 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -57,7 +57,9 @@ i915-$(CONFIG_DEBUG_FS) += i915_debugfs.o intel_pipe_crc.o
 i915-$(CONFIG_PERF_EVENTS) += i915_pmu.o
 
 # GEM code
-i915-y += i915_cmd_parser.o \
+i915-y += \
+	  i915_active.o \
+	  i915_cmd_parser.o \
 	  i915_gem_batch_pool.o \
 	  i915_gem_clflush.o \
 	  i915_gem_context.o \
diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
new file mode 100644
index 000000000000..e0182e19cb8b
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_active.c
@@ -0,0 +1,226 @@
+/*
+ * SPDX-License-Identifier: MIT
+ *
+ * Copyright © 2019 Intel Corporation
+ */
+
+#include "i915_drv.h"
+#include "i915_active.h"
+
+#define BKL(ref) (&(ref)->i915->drm.struct_mutex)
+
+struct active_node {
+	struct i915_gem_active base;
+	struct i915_active *ref;
+	struct rb_node node;
+	u64 timeline;
+};
+
+static void
+__active_retire(struct i915_active *ref)
+{
+	GEM_BUG_ON(!ref->count);
+	if (!--ref->count)
+		ref->retire(ref);
+}
+
+static void
+node_retire(struct i915_gem_active *base, struct i915_request *rq)
+{
+	__active_retire(container_of(base, struct active_node, base)->ref);
+}
+
+static void
+last_retire(struct i915_gem_active *base, struct i915_request *rq)
+{
+	__active_retire(container_of(base, struct i915_active, last));
+}
+
+static struct i915_gem_active *
+active_instance(struct i915_active *ref, u64 idx)
+{
+	struct active_node *node;
+	struct rb_node **p, *parent;
+	struct i915_request *old;
+
+	/*
+	 * We track the most recently used timeline to skip a rbtree search
+	 * for the common case, under typical loads we never need the rbtree
+	 * at all. We can reuse the last slot if it is empty, that is
+	 * after the previous activity has been retired, or if it matches the
+	 * current timeline.
+	 *
+	 * Note that we allow the timeline to be active simultaneously in
+	 * the rbtree and the last cache. We do this to avoid having
+	 * to search and replace the rbtree element for a new timeline, with
+	 * the cost being that we must be aware that the ref may be retired
+	 * twice for the same timeline (as the older rbtree element will be
+	 * retired before the new request added to last).
+	 */
+	old = i915_gem_active_raw(&ref->last, BKL(ref));
+	if (!old || old->fence.context == idx)
+		goto out;
+
+	/* Move the currently active fence into the rbtree */
+	idx = old->fence.context;
+
+	parent = NULL;
+	p = &ref->tree.rb_node;
+	while (*p) {
+		parent = *p;
+
+		node = rb_entry(parent, struct active_node, node);
+		if (node->timeline == idx)
+			goto replace;
+
+		if (node->timeline < idx)
+			p = &parent->rb_right;
+		else
+			p = &parent->rb_left;
+	}
+
+	node = kmalloc(sizeof(*node), GFP_KERNEL);
+
+	/* kmalloc may retire the ref->last (thanks shrinker)! */
+	if (unlikely(!i915_gem_active_raw(&ref->last, BKL(ref)))) {
+		kfree(node);
+		goto out;
+	}
+
+	if (unlikely(!node))
+		return ERR_PTR(-ENOMEM);
+
+	init_request_active(&node->base, node_retire);
+	node->ref = ref;
+	node->timeline = idx;
+
+	rb_link_node(&node->node, parent, p);
+	rb_insert_color(&node->node, &ref->tree);
+
+replace:
+	/*
+	 * Overwrite the previous active slot in the rbtree with last,
+	 * leaving last zeroed. If the previous slot is still active,
+	 * we must be careful as we now only expect to receive one retire
+	 * callback, not two, and so must undo the active counting for the
+	 * overwritten slot.
+	 */
+	if (i915_gem_active_isset(&node->base)) {
+		/* Retire ourselves from the old rq->active_list */
+		__list_del_entry(&node->base.link);
+		ref->count--;
+		GEM_BUG_ON(!ref->count);
+	}
+	GEM_BUG_ON(list_empty(&ref->last.link));
+	list_replace_init(&ref->last.link, &node->base.link);
+	node->base.request = fetch_and_zero(&ref->last.request);
+
+out:
+	return &ref->last;
+}
+
+void i915_active_init(struct drm_i915_private *i915,
+		      struct i915_active *ref,
+		      void (*retire)(struct i915_active *ref))
+{
+	ref->i915 = i915;
+	ref->retire = retire;
+	ref->tree = RB_ROOT;
+	init_request_active(&ref->last, last_retire);
+	ref->count = 0;
+}
+
+int i915_active_ref(struct i915_active *ref,
+		    u64 timeline,
+		    struct i915_request *rq)
+{
+	struct i915_gem_active *active;
+
+	active = active_instance(ref, timeline);
+	if (IS_ERR(active))
+		return PTR_ERR(active);
+
+	if (!i915_gem_active_isset(active))
+		ref->count++;
+	i915_gem_active_set(active, rq);
+
+	return 0;
+}
+
+bool i915_active_acquire(struct i915_active *ref)
+{
+	lockdep_assert_held(BKL(ref));
+	return !ref->count++;
+}
+
+void i915_active_release(struct i915_active *ref)
+{
+	lockdep_assert_held(BKL(ref));
+	__active_retire(ref);
+}
+
+int i915_active_wait(struct i915_active *ref)
+{
+	struct active_node *it, *n;
+	int ret;
+
+	ret = i915_gem_active_retire(&ref->last, BKL(ref));
+	if (ret)
+		return ret;
+
+	rbtree_postorder_for_each_entry_safe(it, n, &ref->tree, node) {
+		ret = i915_gem_active_retire(&it->base, BKL(ref));
+		if (ret)
+			return ret;
+
+		GEM_BUG_ON(i915_gem_active_isset(&it->base));
+		kfree(it);
+	}
+	ref->tree = RB_ROOT;
+
+	return 0;
+}
+
+static int __i915_request_await_active(struct i915_request *rq,
+				       struct i915_gem_active *active)
+{
+	struct i915_request *barrier =
+		i915_gem_active_raw(active, &rq->i915->drm.struct_mutex);
+
+	return barrier ? i915_request_await_dma_fence(rq, &barrier->fence) : 0;
+}
+
+int i915_request_await_active(struct i915_request *rq, struct i915_active *ref)
+{
+	struct active_node *it, *n;
+	int ret;
+
+	ret = __i915_request_await_active(rq, &ref->last);
+	if (ret)
+		return ret;
+
+	rbtree_postorder_for_each_entry_safe(it, n, &ref->tree, node) {
+		ret = __i915_request_await_active(rq, &it->base);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+void i915_active_fini(struct i915_active *ref)
+{
+	struct active_node *it, *n;
+
+	GEM_BUG_ON(i915_gem_active_isset(&ref->last));
+
+	rbtree_postorder_for_each_entry_safe(it, n, &ref->tree, node) {
+		GEM_BUG_ON(i915_gem_active_isset(&it->base));
+		kfree(it);
+	}
+	ref->tree = RB_ROOT;
+}
+
+#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
+#include "selftests/i915_active.c"
+#endif
diff --git a/drivers/gpu/drm/i915/i915_active.h b/drivers/gpu/drm/i915/i915_active.h
new file mode 100644
index 000000000000..0057b84565f7
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_active.h
@@ -0,0 +1,76 @@
+/*
+ * SPDX-License-Identifier: MIT
+ *
+ * Copyright © 2019 Intel Corporation
+ */
+
+#ifndef _I915_ACTIVE_H_
+#define _I915_ACTIVE_H_
+
+#include <linux/rbtree.h>
+
+#include "i915_request.h"
+
+struct drm_i915_private;
+
+/*
+ * GPU activity tracking
+ *
+ * Each set of commands submitted to the GPU comprises a single request that
+ * signals a fence upon completion. struct i915_request combines the
+ * command submission, scheduling and fence signaling roles. If we want to see
+ * if a particular task is complete, we need to grab the fence (struct
+ * i915_request) for that task and check or wait for it to be signaled. More
+ * often though we want to track the status of a bunch of tasks, for example
+ * to wait for the GPU to finish accessing some memory across a variety of
+ * different command pipelines from different clients. We could choose to
+ * track every single request associated with the task, but knowing that
+ * each request belongs to an ordered timeline (later requests within a
+ * timeline must wait for earlier requests), we need only track the
+ * latest request in each timeline to determine the overall status of the
+ * task.
+ *
+ * struct i915_active provides this tracking across timelines. It builds a
+ * composite shared-fence, and is updated as new work is submitted to the task,
+ * forming a snapshot of the current status. It should be embedded into the
+ * different resources that need to track their associated GPU activity to
+ * provide a callback when that GPU activity has ceased, or otherwise to
+ * provide a serialisation point either for request submission or for CPU
+ * synchronisation.
+ */
+
+struct i915_active {
+	struct drm_i915_private *i915;
+
+	struct rb_root tree;
+	struct i915_gem_active last;
+	unsigned int count;
+
+	void (*retire)(struct i915_active *ref);
+};
+
+void i915_active_init(struct drm_i915_private *i915,
+		      struct i915_active *ref,
+		      void (*retire)(struct i915_active *ref));
+
+int i915_active_ref(struct i915_active *ref,
+		    u64 timeline,
+		    struct i915_request *rq);
+
+int i915_active_wait(struct i915_active *ref);
+
+int i915_request_await_active(struct i915_request *rq,
+			      struct i915_active *ref);
+
+bool i915_active_acquire(struct i915_active *ref);
+void i915_active_release(struct i915_active *ref);
+
+static inline bool
+i915_active_is_idle(const struct i915_active *ref)
+{
+	return !ref->count;
+}
+
+void i915_active_fini(struct i915_active *ref);
+
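+/*
+ * A rough usage sketch; struct foo and foo_retire() are only stand-ins
+ * for a resource that embeds an i915_active:
+ *
+ *	struct foo {
+ *		struct i915_active active;
+ *	};
+ *
+ *	static void foo_retire(struct i915_active *ref)
+ *	{
+ *		struct foo *foo = container_of(ref, typeof(*foo), active);
+ *
+ *		... runs once all tracked requests have been retired ...
+ *	}
+ *
+ *	i915_active_init(i915, &foo->active, foo_retire);
+ *
+ *	on submission, under struct_mutex:
+ *		err = i915_active_ref(&foo->active, rq->fence.context, rq);
+ *
+ *	to idle before teardown:
+ *		err = i915_active_wait(&foo->active);
+ *		i915_active_fini(&foo->active);
+ */
+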
+#endif /* _I915_ACTIVE_H_ */
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 49b00996a15e..e625659c03a2 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1917,14 +1917,13 @@ static struct i915_vma *pd_vma_create(struct gen6_hw_ppgtt *ppgtt, int size)
 	if (!vma)
 		return ERR_PTR(-ENOMEM);
 
+	i915_active_init(i915, &vma->active, NULL);
 	init_request_active(&vma->last_fence, NULL);
 
 	vma->vm = &ggtt->vm;
 	vma->ops = &pd_vma_ops;
 	vma->private = ppgtt;
 
-	vma->active = RB_ROOT;
-
 	vma->size = size;
 	vma->fence_size = size;
 	vma->flags = I915_VMA_GGTT;
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index d83b8ad5f859..d4772061e642 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -63,22 +63,23 @@ static void vma_print_allocator(struct i915_vma *vma, const char *reason)
 
 #endif
 
-struct i915_vma_active {
-	struct i915_gem_active base;
-	struct i915_vma *vma;
-	struct rb_node node;
-	u64 timeline;
-};
+static void obj_bump_mru(struct drm_i915_gem_object *obj)
+{
+	struct drm_i915_private *i915 = to_i915(obj->base.dev);
 
-static void
-__i915_vma_retire(struct i915_vma *vma, struct i915_request *rq)
+	spin_lock(&i915->mm.obj_lock);
+	if (obj->bind_count)
+		list_move_tail(&obj->mm.link, &i915->mm.bound_list);
+	spin_unlock(&i915->mm.obj_lock);
+
+	obj->mm.dirty = true; /* be paranoid  */
+}
+
+static void __i915_vma_retire(struct i915_active *ref)
 {
+	struct i915_vma *vma = container_of(ref, typeof(*vma), active);
 	struct drm_i915_gem_object *obj = vma->obj;
 
-	GEM_BUG_ON(!i915_vma_is_active(vma));
-	if (--vma->active_count)
-		return;
-
 	GEM_BUG_ON(!i915_gem_object_is_active(obj));
 	if (--obj->active_count)
 		return;
@@ -90,16 +91,12 @@ __i915_vma_retire(struct i915_vma *vma, struct i915_request *rq)
 		reservation_object_unlock(obj->resv);
 	}
 
-	/* Bump our place on the bound list to keep it roughly in LRU order
+	/*
+	 * Bump our place on the bound list to keep it roughly in LRU order
 	 * so that we don't steal from recently used but inactive objects
 	 * (unless we are forced to ofc!)
 	 */
-	spin_lock(&rq->i915->mm.obj_lock);
-	if (obj->bind_count)
-		list_move_tail(&obj->mm.link, &rq->i915->mm.bound_list);
-	spin_unlock(&rq->i915->mm.obj_lock);
-
-	obj->mm.dirty = true; /* be paranoid  */
+	obj_bump_mru(obj);
 
 	if (i915_gem_object_has_active_reference(obj)) {
 		i915_gem_object_clear_active_reference(obj);
@@ -107,21 +104,6 @@ __i915_vma_retire(struct i915_vma *vma, struct i915_request *rq)
 	}
 }
 
-static void
-i915_vma_retire(struct i915_gem_active *base, struct i915_request *rq)
-{
-	struct i915_vma_active *active =
-		container_of(base, typeof(*active), base);
-
-	__i915_vma_retire(active->vma, rq);
-}
-
-static void
-i915_vma_last_retire(struct i915_gem_active *base, struct i915_request *rq)
-{
-	__i915_vma_retire(container_of(base, struct i915_vma, last_active), rq);
-}
-
 static struct i915_vma *
 vma_create(struct drm_i915_gem_object *obj,
 	   struct i915_address_space *vm,
@@ -137,10 +119,9 @@ vma_create(struct drm_i915_gem_object *obj,
 	if (vma == NULL)
 		return ERR_PTR(-ENOMEM);
 
-	vma->active = RB_ROOT;
-
-	init_request_active(&vma->last_active, i915_vma_last_retire);
+	i915_active_init(vm->i915, &vma->active, __i915_vma_retire);
 	init_request_active(&vma->last_fence, NULL);
+
 	vma->vm = vm;
 	vma->ops = &vm->vma_ops;
 	vma->obj = obj;
@@ -823,7 +804,6 @@ void i915_vma_reopen(struct i915_vma *vma)
 static void __i915_vma_destroy(struct i915_vma *vma)
 {
 	struct drm_i915_private *i915 = vma->vm->i915;
-	struct i915_vma_active *iter, *n;
 
 	GEM_BUG_ON(vma->node.allocated);
 	GEM_BUG_ON(vma->fence);
@@ -843,10 +823,7 @@ static void __i915_vma_destroy(struct i915_vma *vma)
 		spin_unlock(&obj->vma.lock);
 	}
 
-	rbtree_postorder_for_each_entry_safe(iter, n, &vma->active, node) {
-		GEM_BUG_ON(i915_gem_active_isset(&iter->base));
-		kfree(iter);
-	}
+	i915_active_fini(&vma->active);
 
 	kmem_cache_free(i915->vmas, vma);
 }
@@ -931,104 +908,15 @@ static void export_fence(struct i915_vma *vma,
 	reservation_object_unlock(resv);
 }
 
-static struct i915_gem_active *active_instance(struct i915_vma *vma, u64 idx)
-{
-	struct i915_vma_active *active;
-	struct rb_node **p, *parent;
-	struct i915_request *old;
-
-	/*
-	 * We track the most recently used timeline to skip a rbtree search
-	 * for the common case, under typical loads we never need the rbtree
-	 * at all. We can reuse the last_active slot if it is empty, that is
-	 * after the previous activity has been retired, or if the active
-	 * matches the current timeline.
-	 *
-	 * Note that we allow the timeline to be active simultaneously in
-	 * the rbtree and the last_active cache. We do this to avoid having
-	 * to search and replace the rbtree element for a new timeline, with
-	 * the cost being that we must be aware that the vma may be retired
-	 * twice for the same timeline (as the older rbtree element will be
-	 * retired before the new request added to last_active).
-	 */
-	old = i915_gem_active_raw(&vma->last_active,
-				  &vma->vm->i915->drm.struct_mutex);
-	if (!old || old->fence.context == idx)
-		goto out;
-
-	/* Move the currently active fence into the rbtree */
-	idx = old->fence.context;
-
-	parent = NULL;
-	p = &vma->active.rb_node;
-	while (*p) {
-		parent = *p;
-
-		active = rb_entry(parent, struct i915_vma_active, node);
-		if (active->timeline == idx)
-			goto replace;
-
-		if (active->timeline < idx)
-			p = &parent->rb_right;
-		else
-			p = &parent->rb_left;
-	}
-
-	active = kmalloc(sizeof(*active), GFP_KERNEL);
-
-	/* kmalloc may retire the vma->last_active request (thanks shrinker)! */
-	if (unlikely(!i915_gem_active_raw(&vma->last_active,
-					  &vma->vm->i915->drm.struct_mutex))) {
-		kfree(active);
-		goto out;
-	}
-
-	if (unlikely(!active))
-		return ERR_PTR(-ENOMEM);
-
-	init_request_active(&active->base, i915_vma_retire);
-	active->vma = vma;
-	active->timeline = idx;
-
-	rb_link_node(&active->node, parent, p);
-	rb_insert_color(&active->node, &vma->active);
-
-replace:
-	/*
-	 * Overwrite the previous active slot in the rbtree with last_active,
-	 * leaving last_active zeroed. If the previous slot is still active,
-	 * we must be careful as we now only expect to receive one retire
-	 * callback not two, and so much undo the active counting for the
-	 * overwritten slot.
-	 */
-	if (i915_gem_active_isset(&active->base)) {
-		/* Retire ourselves from the old rq->active_list */
-		__list_del_entry(&active->base.link);
-		vma->active_count--;
-		GEM_BUG_ON(!vma->active_count);
-	}
-	GEM_BUG_ON(list_empty(&vma->last_active.link));
-	list_replace_init(&vma->last_active.link, &active->base.link);
-	active->base.request = fetch_and_zero(&vma->last_active.request);
-
-out:
-	return &vma->last_active;
-}
-
 int i915_vma_move_to_active(struct i915_vma *vma,
 			    struct i915_request *rq,
 			    unsigned int flags)
 {
 	struct drm_i915_gem_object *obj = vma->obj;
-	struct i915_gem_active *active;
 
 	lockdep_assert_held(&rq->i915->drm.struct_mutex);
 	GEM_BUG_ON(!drm_mm_node_allocated(&vma->node));
 
-	active = active_instance(vma, rq->fence.context);
-	if (IS_ERR(active))
-		return PTR_ERR(active);
-
 	/*
 	 * Add a reference if we're newly entering the active list.
 	 * The order in which we add operations to the retirement queue is
@@ -1037,9 +925,15 @@ int i915_vma_move_to_active(struct i915_vma *vma,
 	 * add the active reference first and queue for it to be dropped
 	 * *last*.
 	 */
-	if (!i915_gem_active_isset(active) && !vma->active_count++)
+	if (!vma->active.count)
 		obj->active_count++;
-	i915_gem_active_set(active, rq);
+
+	if (unlikely(i915_active_ref(&vma->active, rq->fence.context, rq))) {
+		if (!vma->active.count)
+			obj->active_count--;
+		return -ENOMEM;
+	}
+
 	GEM_BUG_ON(!i915_vma_is_active(vma));
 	GEM_BUG_ON(!obj->active_count);
 
@@ -1073,8 +967,6 @@ int i915_vma_unbind(struct i915_vma *vma)
 	 */
 	might_sleep();
 	if (i915_vma_is_active(vma)) {
-		struct i915_vma_active *active, *n;
-
 		/*
 		 * When a closed VMA is retired, it is unbound - eek.
 		 * In order to prevent it from being recursively closed,
@@ -1090,19 +982,10 @@ int i915_vma_unbind(struct i915_vma *vma)
 		 */
 		__i915_vma_pin(vma);
 
-		ret = i915_gem_active_retire(&vma->last_active,
-					     &vma->vm->i915->drm.struct_mutex);
+		ret = i915_active_wait(&vma->active);
 		if (ret)
 			goto unpin;
 
-		rbtree_postorder_for_each_entry_safe(active, n,
-						     &vma->active, node) {
-			ret = i915_gem_active_retire(&active->base,
-						     &vma->vm->i915->drm.struct_mutex);
-			if (ret)
-				goto unpin;
-		}
-
 		ret = i915_gem_active_retire(&vma->last_fence,
 					     &vma->vm->i915->drm.struct_mutex);
 unpin:
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index 5793abe509a2..3c03d4569481 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -34,6 +34,7 @@
 #include "i915_gem_fence_reg.h"
 #include "i915_gem_object.h"
 
+#include "i915_active.h"
 #include "i915_request.h"
 
 enum i915_cache_level;
@@ -108,9 +109,7 @@ struct i915_vma {
 #define I915_VMA_USERFAULT	BIT(I915_VMA_USERFAULT_BIT)
 #define I915_VMA_GGTT_WRITE	BIT(15)
 
-	unsigned int active_count;
-	struct rb_root active;
-	struct i915_gem_active last_active;
+	struct i915_active active;
 	struct i915_gem_active last_fence;
 
 	/**
@@ -154,9 +153,9 @@ i915_vma_instance(struct drm_i915_gem_object *obj,
 void i915_vma_unpin_and_release(struct i915_vma **p_vma, unsigned int flags);
 #define I915_VMA_RELEASE_MAP BIT(0)
 
-static inline bool i915_vma_is_active(struct i915_vma *vma)
+static inline bool i915_vma_is_active(const struct i915_vma *vma)
 {
-	return vma->active_count;
+	return !i915_active_is_idle(&vma->active);
 }
 
 int __must_check i915_vma_move_to_active(struct i915_vma *vma,
diff --git a/drivers/gpu/drm/i915/selftests/i915_active.c b/drivers/gpu/drm/i915/selftests/i915_active.c
new file mode 100644
index 000000000000..7c5c3068565b
--- /dev/null
+++ b/drivers/gpu/drm/i915/selftests/i915_active.c
@@ -0,0 +1,158 @@
+/*
+ * SPDX-License-Identifier: MIT
+ *
+ * Copyright © 2018 Intel Corporation
+ */
+
+#include "../i915_selftest.h"
+
+#include "igt_flush_test.h"
+#include "lib_sw_fence.h"
+
+struct live_active {
+	struct i915_active base;
+	bool retired;
+};
+
+static void __live_active_retire(struct i915_active *base)
+{
+	struct live_active *active = container_of(base, typeof(*active), base);
+
+	active->retired = true;
+}
+
+static int __live_active_setup(struct drm_i915_private *i915,
+			       struct live_active *active)
+{
+	struct intel_engine_cs *engine;
+	struct i915_sw_fence *submit;
+	enum intel_engine_id id;
+	unsigned int count = 0;
+	int err = 0;
+
+	i915_active_init(i915, &active->base, __live_active_retire);
+	active->retired = false;
+
+	if (!i915_active_acquire(&active->base)) {
+		pr_err("First i915_active_acquire should report being idle\n");
+		return -EINVAL;
+	}
+
+	submit = heap_fence_create(GFP_KERNEL);
+
+	for_each_engine(engine, i915, id) {
+		struct i915_request *rq;
+
+		rq = i915_request_alloc(engine, i915->kernel_context);
+		if (IS_ERR(rq)) {
+			err = PTR_ERR(rq);
+			break;
+		}
+
+		err = i915_sw_fence_await_sw_fence_gfp(&rq->submit,
+						       submit,
+						       GFP_KERNEL);
+		if (err < 0) {
+			pr_err("Failed to allocate submission fence!\n");
+			i915_request_add(rq);
+			break;
+		}
+
+		err = i915_active_ref(&active->base, rq->fence.context, rq);
+		if (err) {
+			pr_err("Failed to track active ref!\n");
+			i915_request_add(rq);
+			break;
+		}
+
+		i915_request_add(rq);
+		count++;
+	}
+
+	i915_active_release(&active->base);
+	if (active->retired) {
+		pr_err("i915_active retired before submission!\n");
+		err = -EINVAL;
+	}
+	if (active->base.count != count) {
+		pr_err("i915_active not tracking all requests, found %d, expected %d\n",
+		       active->base.count, count);
+		err = -EINVAL;
+	}
+
+	i915_sw_fence_commit(submit);
+	heap_fence_put(submit);
+
+	return err;
+}
+
+static int live_active_wait(void *arg)
+{
+	struct drm_i915_private *i915 = arg;
+	struct live_active active;
+	intel_wakeref_t wakeref;
+	int err;
+
+	/* Check that we get a callback when requests retire upon waiting */
+
+	mutex_lock(&i915->drm.struct_mutex);
+	wakeref = intel_runtime_pm_get(i915);
+
+	err = __live_active_setup(i915, &active);
+
+	i915_active_wait(&active.base);
+	if (!active.retired) {
+		pr_err("i915_active not retired after waiting!\n");
+		err = -EINVAL;
+	}
+
+	i915_active_fini(&active.base);
+	if (igt_flush_test(i915, I915_WAIT_LOCKED))
+		err = -EIO;
+
+	intel_runtime_pm_put(i915, wakeref);
+	mutex_unlock(&i915->drm.struct_mutex);
+	return err;
+}
+
+static int live_active_retire(void *arg)
+{
+	struct drm_i915_private *i915 = arg;
+	struct live_active active;
+	intel_wakeref_t wakeref;
+	int err;
+
+	/* Check that we get a callback when requests are indirectly retired */
+
+	mutex_lock(&i915->drm.struct_mutex);
+	wakeref = intel_runtime_pm_get(i915);
+
+	err = __live_active_setup(i915, &active);
+
+	/* waits for & retires all requests */
+	if (igt_flush_test(i915, I915_WAIT_LOCKED))
+		err = -EIO;
+
+	if (!active.retired) {
+		pr_err("i915_active not retired after flushing!\n");
+		err = -EINVAL;
+	}
+
+	i915_active_fini(&active.base);
+	intel_runtime_pm_put(i915, wakeref);
+	mutex_unlock(&i915->drm.struct_mutex);
+	return err;
+}
+
+int i915_active_live_selftests(struct drm_i915_private *i915)
+{
+	static const struct i915_subtest tests[] = {
+		SUBTEST(live_active_wait),
+		SUBTEST(live_active_retire),
+	};
+
+	if (i915_terminally_wedged(&i915->gpu_error))
+		return 0;
+
+	return i915_subtests(tests, i915);
+}
diff --git a/drivers/gpu/drm/i915/selftests/i915_live_selftests.h b/drivers/gpu/drm/i915/selftests/i915_live_selftests.h
index 76b4f87fc853..6d766925ad04 100644
--- a/drivers/gpu/drm/i915/selftests/i915_live_selftests.h
+++ b/drivers/gpu/drm/i915/selftests/i915_live_selftests.h
@@ -12,8 +12,9 @@
 selftest(sanitycheck, i915_live_sanitycheck) /* keep first (igt selfcheck) */
 selftest(uncore, intel_uncore_live_selftests)
 selftest(workarounds, intel_workarounds_live_selftests)
-selftest(requests, i915_request_live_selftests)
 selftest(timelines, i915_timeline_live_selftests)
+selftest(requests, i915_request_live_selftests)
+selftest(active, i915_active_live_selftests)
 selftest(objects, i915_gem_object_live_selftests)
 selftest(dmabuf, i915_gem_dmabuf_live_selftests)
 selftest(coherency, i915_gem_coherency_live_selftests)
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 23/28] drm/i915: Allocate active tracking nodes from a slabcache
  2019-01-28  1:02 [PATCH 01/28] drm/i915: Wait for a moment before forcibly resetting the device Chris Wilson
                   ` (20 preceding siblings ...)
  2019-01-28  1:02 ` [PATCH 22/28] drm/i915: Generalise GPU activity tracking Chris Wilson
@ 2019-01-28  1:02 ` Chris Wilson
  2019-01-28  1:02 ` [PATCH 24/28] drm/i915: Pull i915_gem_active into the i915_active family Chris Wilson
                   ` (9 subsequent siblings)
  31 siblings, 0 replies; 39+ messages in thread
From: Chris Wilson @ 2019-01-28  1:02 UTC (permalink / raw)
  To: intel-gfx

Wrap the active tracking for GPU references in a slabcache for faster
allocations, and keep track of inflight nodes so we can reap the
stale entries upon parking (thereby trimming our memory usage).
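
A rough sketch of the lifecycle, matching the call sites added below:

	i915_gt_active_init(i915_gt_active(i915));	/* driver init */
	...
	i915_gt_active_park(i915_gt_active(i915));	/* GT parked: reap idle trees */
	...
	i915_gt_active_fini(i915_gt_active(i915));	/* driver teardown */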

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_active.c            | 55 ++++++++++++++++---
 drivers/gpu/drm/i915/i915_active.h            | 29 +++++++++-
 drivers/gpu/drm/i915/i915_drv.h               |  2 +
 drivers/gpu/drm/i915/i915_gem.c               | 16 +++++-
 drivers/gpu/drm/i915/i915_gem_gtt.c           |  2 +-
 drivers/gpu/drm/i915/i915_vma.c               |  3 +-
 drivers/gpu/drm/i915/selftests/i915_active.c  |  3 +-
 .../gpu/drm/i915/selftests/mock_gem_device.c  |  9 ++-
 8 files changed, 101 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
index e0182e19cb8b..3c7abbde42ac 100644
--- a/drivers/gpu/drm/i915/i915_active.c
+++ b/drivers/gpu/drm/i915/i915_active.c
@@ -7,7 +7,9 @@
 #include "i915_drv.h"
 #include "i915_active.h"
 
-#define BKL(ref) (&(ref)->i915->drm.struct_mutex)
+#define i915_from_gt(x) \
+	container_of(x, struct drm_i915_private, gt.active_refs)
+#define BKL(ref) (&i915_from_gt((ref)->gt)->drm.struct_mutex)
 
 struct active_node {
 	struct i915_gem_active base;
@@ -79,11 +81,11 @@ active_instance(struct i915_active *ref, u64 idx)
 			p = &parent->rb_left;
 	}
 
-	node = kmalloc(sizeof(*node), GFP_KERNEL);
+	node = kmem_cache_alloc(ref->gt->slab_cache, GFP_KERNEL);
 
 	/* kmalloc may retire the ref->last (thanks shrinker)! */
 	if (unlikely(!i915_gem_active_raw(&ref->last, BKL(ref)))) {
-		kfree(node);
+		kmem_cache_free(ref->gt->slab_cache, node);
 		goto out;
 	}
 
@@ -94,6 +96,9 @@ active_instance(struct i915_active *ref, u64 idx)
 	node->ref = ref;
 	node->timeline = idx;
 
+	if (RB_EMPTY_ROOT(&ref->tree))
+		list_add(&ref->active_link, &ref->gt->active_refs);
+
 	rb_link_node(&node->node, parent, p);
 	rb_insert_color(&node->node, &ref->tree);
 
@@ -119,11 +124,11 @@ active_instance(struct i915_active *ref, u64 idx)
 	return &ref->last;
 }
 
-void i915_active_init(struct drm_i915_private *i915,
+void i915_active_init(struct i915_gt_active *gt,
 		      struct i915_active *ref,
 		      void (*retire)(struct i915_active *ref))
 {
-	ref->i915 = i915;
+	ref->gt = gt;
 	ref->retire = retire;
 	ref->tree = RB_ROOT;
 	init_request_active(&ref->last, last_retire);
@@ -161,6 +166,7 @@ void i915_active_release(struct i915_active *ref)
 
 int i915_active_wait(struct i915_active *ref)
 {
+	struct kmem_cache *slab = ref->gt->slab_cache;
 	struct active_node *it, *n;
 	int ret;
 
@@ -168,15 +174,19 @@ int i915_active_wait(struct i915_active *ref)
 	if (ret)
 		return ret;
 
+	if (RB_EMPTY_ROOT(&ref->tree))
+		return 0;
+
 	rbtree_postorder_for_each_entry_safe(it, n, &ref->tree, node) {
 		ret = i915_gem_active_retire(&it->base, BKL(ref));
 		if (ret)
 			return ret;
 
 		GEM_BUG_ON(i915_gem_active_isset(&it->base));
-		kfree(it);
+		kmem_cache_free(slab, it);
 	}
 	ref->tree = RB_ROOT;
+	list_del(&ref->active_link);
 
 	return 0;
 }
@@ -210,15 +220,46 @@ int i915_request_await_active(struct i915_request *rq, struct i915_active *ref)
 
 void i915_active_fini(struct i915_active *ref)
 {
+	struct kmem_cache *slab = ref->gt->slab_cache;
 	struct active_node *it, *n;
 
+	lockdep_assert_held(BKL(ref));
 	GEM_BUG_ON(i915_gem_active_isset(&ref->last));
 
+	if (RB_EMPTY_ROOT(&ref->tree))
+		return;
+
 	rbtree_postorder_for_each_entry_safe(it, n, &ref->tree, node) {
 		GEM_BUG_ON(i915_gem_active_isset(&it->base));
-		kfree(it);
+		kmem_cache_free(slab, it);
 	}
 	ref->tree = RB_ROOT;
+	list_del(&ref->active_link);
+}
+
+int i915_gt_active_init(struct i915_gt_active *gt)
+{
+	gt->slab_cache = KMEM_CACHE(active_node, SLAB_HWCACHE_ALIGN);
+	if (!gt->slab_cache)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&gt->active_refs);
+
+	return 0;
+}
+
+void i915_gt_active_park(struct i915_gt_active *gt)
+{
+	struct i915_active *it, *n;
+
+	list_for_each_entry_safe(it, n, &gt->active_refs, active_link)
+		i915_active_fini(it);
+}
+
+void i915_gt_active_fini(struct i915_gt_active *gt)
+{
+	GEM_BUG_ON(!list_empty(&gt->active_refs));
+	kmem_cache_destroy(gt->slab_cache);
 }
 
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
diff --git a/drivers/gpu/drm/i915/i915_active.h b/drivers/gpu/drm/i915/i915_active.h
index 0057b84565f7..eed34e8903fc 100644
--- a/drivers/gpu/drm/i915/i915_active.h
+++ b/drivers/gpu/drm/i915/i915_active.h
@@ -7,11 +7,13 @@
 #ifndef _I915_ACTIVE_H_
 #define _I915_ACTIVE_H_
 
+#include <linux/list.h>
 #include <linux/rbtree.h>
 
 #include "i915_request.h"
 
-struct drm_i915_private;
+struct i915_gt_active;
+struct kmem_cache;
 
 /*
  * GPU activity tracking
@@ -40,7 +42,8 @@ struct drm_i915_private;
  */
 
 struct i915_active {
-	struct drm_i915_private *i915;
+	struct i915_gt_active *gt;
+	struct list_head active_link;
 
 	struct rb_root tree;
 	struct i915_gem_active last;
@@ -49,7 +52,7 @@ struct i915_active {
 	void (*retire)(struct i915_active *ref);
 };
 
-void i915_active_init(struct drm_i915_private *i915,
+void i915_active_init(struct i915_gt_active *gt,
 		      struct i915_active *ref,
 		      void (*retire)(struct i915_active *ref));
 
@@ -73,4 +76,24 @@ i915_active_is_idle(const struct i915_active *ref)
 
 void i915_active_fini(struct i915_active *ref);
 
+/*
+ * Active refs memory management
+ *
+ * To be more economical with memory, we reap all the i915_active trees on
+ * parking the GPU (when we know the GPU is inactive) and allocate the nodes
+ * from a local slab cache to hopefully reduce the fragmentation as we will
+ * then be able to free all pages en masse upon idling.
+ */
+
+struct i915_gt_active {
+	struct list_head active_refs;
+	struct kmem_cache *slab_cache;
+};
+
+int i915_gt_active_init(struct i915_gt_active *gt);
+void i915_gt_active_park(struct i915_gt_active *gt);
+void i915_gt_active_fini(struct i915_gt_active *gt);
+
+#define i915_gt_active(i915) (&(i915)->gt.active_refs)
+
 #endif /* _I915_ACTIVE_H_ */
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index d072f3369ee1..4a2387f89eb0 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1984,6 +1984,8 @@ struct drm_i915_private {
 			struct list_head hwsp_free_list;
 		} timelines;
 
+		struct i915_gt_active active_refs;
+
 		struct list_head active_rings;
 		struct list_head closed_vma;
 		u32 active_requests;
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index e802af64d628..8edc00212118 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -171,6 +171,7 @@ static u32 __i915_gem_park(struct drm_i915_private *i915)
 
 	intel_engines_park(i915);
 	i915_timelines_park(i915);
+	i915_gt_active_park(i915_gt_active(i915));
 
 	i915_pmu_gt_parked(i915);
 	i915_vma_parked(i915);
@@ -5031,15 +5032,19 @@ int i915_gem_init(struct drm_i915_private *dev_priv)
 		dev_priv->gt.cleanup_engine = intel_engine_cleanup;
 	}
 
+	ret = i915_gt_active_init(i915_gt_active(dev_priv));
+	if (ret)
+		return ret;
+
 	i915_timelines_init(dev_priv);
 
 	ret = i915_gem_init_userptr(dev_priv);
 	if (ret)
-		return ret;
+		goto err_timelines;
 
 	ret = intel_uc_init_misc(dev_priv);
 	if (ret)
-		return ret;
+		goto err_userptr;
 
 	ret = intel_wopcm_init(&dev_priv->wopcm);
 	if (ret)
@@ -5155,9 +5160,13 @@ int i915_gem_init(struct drm_i915_private *dev_priv)
 err_uc_misc:
 	intel_uc_fini_misc(dev_priv);
 
-	if (ret != -EIO) {
+err_userptr:
+	if (ret != -EIO)
 		i915_gem_cleanup_userptr(dev_priv);
+err_timelines:
+	if (ret != -EIO) {
 		i915_timelines_fini(dev_priv);
+		i915_gt_active_fini(i915_gt_active(dev_priv));
 	}
 
 	if (ret == -EIO) {
@@ -5210,6 +5219,7 @@ void i915_gem_fini(struct drm_i915_private *dev_priv)
 	intel_uc_fini_misc(dev_priv);
 	i915_gem_cleanup_userptr(dev_priv);
 	i915_timelines_fini(dev_priv);
+	i915_gt_active_fini(i915_gt_active(dev_priv));
 
 	i915_gem_drain_freed_objects(dev_priv);
 
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index e625659c03a2..d8819de0d6ee 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1917,7 +1917,7 @@ static struct i915_vma *pd_vma_create(struct gen6_hw_ppgtt *ppgtt, int size)
 	if (!vma)
 		return ERR_PTR(-ENOMEM);
 
-	i915_active_init(i915, &vma->active, NULL);
+	i915_active_init(i915_gt_active(i915), &vma->active, NULL);
 	init_request_active(&vma->last_fence, NULL);
 
 	vma->vm = &ggtt->vm;
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index d4772061e642..2456bfb4877b 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -119,7 +119,8 @@ vma_create(struct drm_i915_gem_object *obj,
 	if (vma == NULL)
 		return ERR_PTR(-ENOMEM);
 
-	i915_active_init(vm->i915, &vma->active, __i915_vma_retire);
+	i915_active_init(i915_gt_active(vm->i915),
+			 &vma->active, __i915_vma_retire);
 	init_request_active(&vma->last_fence, NULL);
 
 	vma->vm = vm;
diff --git a/drivers/gpu/drm/i915/selftests/i915_active.c b/drivers/gpu/drm/i915/selftests/i915_active.c
index 7c5c3068565b..0e923476920e 100644
--- a/drivers/gpu/drm/i915/selftests/i915_active.c
+++ b/drivers/gpu/drm/i915/selftests/i915_active.c
@@ -30,7 +30,8 @@ static int __live_active_setup(struct drm_i915_private *i915,
 	unsigned int count = 0;
 	int err = 0;
 
-	i915_active_init(i915, &active->base, __live_active_retire);
+	i915_active_init(i915_gt_active(i915),
+			 &active->base, __live_active_retire);
 	active->retired = false;
 
 	if (!i915_active_acquire(&active->base)) {
diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
index 14ae46fda49f..23aeca1f5985 100644
--- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
+++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
@@ -66,9 +66,9 @@ static void mock_device_release(struct drm_device *dev)
 	for_each_engine(engine, i915, id)
 		mock_engine_free(engine);
 	i915_gem_contexts_fini(i915);
-	mutex_unlock(&i915->drm.struct_mutex);
-
 	i915_timelines_fini(i915);
+	i915_gt_active_fini(i915_gt_active(i915));
+	mutex_unlock(&i915->drm.struct_mutex);
 
 	drain_workqueue(i915->wq);
 	i915_gem_drain_freed_objects(i915);
@@ -227,6 +227,9 @@ struct drm_i915_private *mock_gem_device(void)
 	if (!i915->priorities)
 		goto err_dependencies;
 
+	if (i915_gt_active_init(i915_gt_active(i915)))
+		goto err_priorities;
+
 	i915_timelines_init(i915);
 
 	INIT_LIST_HEAD(&i915->gt.active_rings);
@@ -256,6 +259,8 @@ struct drm_i915_private *mock_gem_device(void)
 err_unlock:
 	mutex_unlock(&i915->drm.struct_mutex);
 	i915_timelines_fini(i915);
+	i915_gt_active_fini(i915_gt_active(i915));
+err_priorities:
 	kmem_cache_destroy(i915->priorities);
 err_dependencies:
 	kmem_cache_destroy(i915->dependencies);
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 24/28] drm/i915: Pull i915_gem_active into the i915_active family
  2019-01-28  1:02 [PATCH 01/28] drm/i915: Wait for a moment before forcibly resetting the device Chris Wilson
                   ` (21 preceding siblings ...)
  2019-01-28  1:02 ` [PATCH 23/28] " Chris Wilson
@ 2019-01-28  1:02 ` Chris Wilson
  2019-01-28  1:02 ` [PATCH 25/28] drm/i915: Keep timeline HWSP allocated until the system is idle Chris Wilson
                   ` (8 subsequent siblings)
  31 siblings, 0 replies; 39+ messages in thread
From: Chris Wilson @ 2019-01-28  1:02 UTC (permalink / raw)
  To: intel-gfx

Looking forward, we need to break the struct_mutex dependency on
i915_gem_active. In the meantime, external use of i915_gem_active is
quite beguiling, little do new users suspect that it implies a barrier
as each request it tracks must be ordered wrt the previous one. As one
of many, it can be used to track activity across multiple timelines, a
shared fence, which fits our unordered request submission much better. We
need to steer external users away from the singular, exclusive fence
imposed by i915_gem_active to i915_active instead. As part of that
process, we move i915_gem_active out of i915_request.c into
i915_active.c to start separating the two concepts, and rename it to
i915_active_request (both to tie it to the concept of tracking just one
request, and to give it a longer, less appealing name).
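
A rough sketch of the distinction, both called under struct_mutex:

	/* exclusive slot: one request, ordered wrt its predecessor */
	err = i915_active_request_set(&vma->last_fence, rq);

	/* shared tracker: a slot per timeline, no cross-timeline barrier */
	err = i915_active_ref(&vma->active, rq->fence.context, rq);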

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_active.c            |  64 ++-
 drivers/gpu/drm/i915/i915_active.h            | 362 ++++++++++++++++-
 drivers/gpu/drm/i915/i915_debugfs.c           |   2 +-
 drivers/gpu/drm/i915/i915_gem.c               |  10 +-
 drivers/gpu/drm/i915/i915_gem_context.c       |   4 +-
 drivers/gpu/drm/i915/i915_gem_fence_reg.c     |   4 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c           |   2 +-
 drivers/gpu/drm/i915/i915_gem_object.h        |   2 +-
 drivers/gpu/drm/i915/i915_gpu_error.c         |  10 +-
 drivers/gpu/drm/i915/i915_request.c           |  35 +-
 drivers/gpu/drm/i915/i915_request.h           | 383 ------------------
 drivers/gpu/drm/i915/i915_reset.c             |   2 +-
 drivers/gpu/drm/i915/i915_timeline.c          |  25 +-
 drivers/gpu/drm/i915/i915_timeline.h          |  14 +-
 drivers/gpu/drm/i915/i915_vma.c               |  12 +-
 drivers/gpu/drm/i915/i915_vma.h               |   2 +-
 drivers/gpu/drm/i915/intel_engine_cs.c        |   2 +-
 drivers/gpu/drm/i915/intel_overlay.c          |  29 +-
 .../gpu/drm/i915/selftests/mock_timeline.c    |   4 +-
 19 files changed, 468 insertions(+), 500 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
index 3c7abbde42ac..007098e44959 100644
--- a/drivers/gpu/drm/i915/i915_active.c
+++ b/drivers/gpu/drm/i915/i915_active.c
@@ -12,7 +12,7 @@
 #define BKL(ref) (&i915_from_gt((ref)->gt)->drm.struct_mutex)
 
 struct active_node {
-	struct i915_gem_active base;
+	struct i915_active_request base;
 	struct i915_active *ref;
 	struct rb_node node;
 	u64 timeline;
@@ -27,18 +27,18 @@ __active_retire(struct i915_active *ref)
 }
 
 static void
-node_retire(struct i915_gem_active *base, struct i915_request *rq)
+node_retire(struct i915_active_request *base, struct i915_request *rq)
 {
 	__active_retire(container_of(base, struct active_node, base)->ref);
 }
 
 static void
-last_retire(struct i915_gem_active *base, struct i915_request *rq)
+last_retire(struct i915_active_request *base, struct i915_request *rq)
 {
 	__active_retire(container_of(base, struct i915_active, last));
 }
 
-static struct i915_gem_active *
+static struct i915_active_request *
 active_instance(struct i915_active *ref, u64 idx)
 {
 	struct active_node *node;
@@ -59,7 +59,7 @@ active_instance(struct i915_active *ref, u64 idx)
 	 * twice for the same timeline (as the older rbtree element will be
 	 * retired before the new request added to last).
 	 */
-	old = i915_gem_active_raw(&ref->last, BKL(ref));
+	old = i915_active_request_raw(&ref->last, BKL(ref));
 	if (!old || old->fence.context == idx)
 		goto out;
 
@@ -84,7 +84,7 @@ active_instance(struct i915_active *ref, u64 idx)
 	node = kmem_cache_alloc(ref->gt->slab_cache, GFP_KERNEL);
 
 	/* kmalloc may retire the ref->last (thanks shrinker)! */
-	if (unlikely(!i915_gem_active_raw(&ref->last, BKL(ref)))) {
+	if (unlikely(!i915_active_request_raw(&ref->last, BKL(ref)))) {
 		kmem_cache_free(ref->gt->slab_cache, node);
 		goto out;
 	}
@@ -92,7 +92,7 @@ active_instance(struct i915_active *ref, u64 idx)
 	if (unlikely(!node))
 		return ERR_PTR(-ENOMEM);
 
-	init_request_active(&node->base, node_retire);
+	i915_active_request_init(&node->base, NULL, node_retire);
 	node->ref = ref;
 	node->timeline = idx;
 
@@ -110,7 +110,7 @@ active_instance(struct i915_active *ref, u64 idx)
 	 * callback not two, and so must undo the active counting for the
 	 * overwritten slot.
 	 */
-	if (i915_gem_active_isset(&node->base)) {
+	if (i915_active_request_isset(&node->base)) {
 		/* Retire ourselves from the old rq->active_list */
 		__list_del_entry(&node->base.link);
 		ref->count--;
@@ -131,7 +131,7 @@ void i915_active_init(struct i915_gt_active *gt,
 	ref->gt = gt;
 	ref->retire = retire;
 	ref->tree = RB_ROOT;
-	init_request_active(&ref->last, last_retire);
+	i915_active_request_init(&ref->last, NULL, last_retire);
 	ref->count = 0;
 }
 
@@ -139,15 +139,15 @@ int i915_active_ref(struct i915_active *ref,
 		    u64 timeline,
 		    struct i915_request *rq)
 {
-	struct i915_gem_active *active;
+	struct i915_active_request *active;
 
 	active = active_instance(ref, timeline);
 	if (IS_ERR(active))
 		return PTR_ERR(active);
 
-	if (!i915_gem_active_isset(active))
+	if (!i915_active_request_isset(active))
 		ref->count++;
-	i915_gem_active_set(active, rq);
+	__i915_active_request_set(active, rq);
 
 	return 0;
 }
@@ -170,7 +170,7 @@ int i915_active_wait(struct i915_active *ref)
 	struct active_node *it, *n;
 	int ret;
 
-	ret = i915_gem_active_retire(&ref->last, BKL(ref));
+	ret = i915_active_request_retire(&ref->last, BKL(ref));
 	if (ret)
 		return ret;
 
@@ -178,11 +178,11 @@ int i915_active_wait(struct i915_active *ref)
 		return 0;
 
 	rbtree_postorder_for_each_entry_safe(it, n, &ref->tree, node) {
-		ret = i915_gem_active_retire(&it->base, BKL(ref));
+		ret = i915_active_request_retire(&it->base, BKL(ref));
 		if (ret)
 			return ret;
 
-		GEM_BUG_ON(i915_gem_active_isset(&it->base));
+		GEM_BUG_ON(i915_active_request_isset(&it->base));
 		kmem_cache_free(slab, it);
 	}
 	ref->tree = RB_ROOT;
@@ -191,11 +191,11 @@ int i915_active_wait(struct i915_active *ref)
 	return 0;
 }
 
-static int __i915_request_await_active(struct i915_request *rq,
-				       struct i915_gem_active *active)
+int i915_request_await_active_request(struct i915_request *rq,
+				      struct i915_active_request *active)
 {
 	struct i915_request *barrier =
-		i915_gem_active_raw(active, &rq->i915->drm.struct_mutex);
+		i915_active_request_raw(active, &rq->i915->drm.struct_mutex);
 
 	return barrier ? i915_request_await_dma_fence(rq, &barrier->fence) : 0;
 }
@@ -205,12 +205,12 @@ int i915_request_await_active(struct i915_request *rq, struct i915_active *ref)
 	struct active_node *it, *n;
 	int ret;
 
-	ret = __i915_request_await_active(rq, &ref->last);
+	ret = i915_request_await_active_request(rq, &ref->last);
 	if (ret)
 		return ret;
 
 	rbtree_postorder_for_each_entry_safe(it, n, &ref->tree, node) {
-		ret = __i915_request_await_active(rq, &it->base);
+		ret = i915_request_await_active_request(rq, &it->base);
 		if (ret)
 			return ret;
 	}
@@ -224,13 +224,13 @@ void i915_active_fini(struct i915_active *ref)
 	struct active_node *it, *n;
 
 	lockdep_assert_held(BKL(ref));
-	GEM_BUG_ON(i915_gem_active_isset(&ref->last));
+	GEM_BUG_ON(i915_active_request_isset(&ref->last));
 
 	if (RB_EMPTY_ROOT(&ref->tree))
 		return;
 
 	rbtree_postorder_for_each_entry_safe(it, n, &ref->tree, node) {
-		GEM_BUG_ON(i915_gem_active_isset(&it->base));
+		GEM_BUG_ON(i915_active_request_isset(&it->base));
 		kmem_cache_free(slab, it);
 	}
 	ref->tree = RB_ROOT;
@@ -262,6 +262,26 @@ void i915_gt_active_fini(struct i915_gt_active *gt)
 	kmem_cache_destroy(gt->slab_cache);
 }
 
+int i915_active_request_set(struct i915_active_request *active,
+			    struct i915_request *rq)
+{
+	int err;
+
+	/* Must maintain ordering wrt previous active requests */
+	err = i915_request_await_active_request(rq, active);
+	if (err)
+		return err;
+
+	__i915_active_request_set(active, rq);
+	return 0;
+}
+
+void i915_active_retire_noop(struct i915_active_request *active,
+			     struct i915_request *request)
+{
+	/* Space left intentionally blank */
+}
+
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 #include "selftests/i915_active.c"
 #endif
diff --git a/drivers/gpu/drm/i915/i915_active.h b/drivers/gpu/drm/i915/i915_active.h
index eed34e8903fc..d770c9bbaafe 100644
--- a/drivers/gpu/drm/i915/i915_active.h
+++ b/drivers/gpu/drm/i915/i915_active.h
@@ -7,14 +7,368 @@
 #ifndef _I915_ACTIVE_H_
 #define _I915_ACTIVE_H_
 
+#include <linux/lockdep.h>
 #include <linux/list.h>
 #include <linux/rbtree.h>
-
-#include "i915_request.h"
+#include <linux/rcupdate.h>
 
 struct i915_gt_active;
 struct kmem_cache;
 
+/*
+ * We treat requests as fences. This is not to be confused with our
+ * "fence registers", but rather pipeline synchronisation objects a la GL_ARB_sync.
+ * We use the fences to synchronize access from the CPU with activity on the
+ * GPU, for example, we should not rewrite an object's PTE whilst the GPU
+ * is reading them. We also track fences at a higher level to provide
+ * implicit synchronisation around GEM objects, e.g. set-domain will wait
+ * for outstanding GPU rendering before marking the object ready for CPU
+ * access, or a pageflip will wait until the GPU is complete before showing
+ * the frame on the scanout.
+ *
+ * In order to use a fence, the object must track the fence it needs to
+ * serialise with. For example, GEM objects want to track both read and
+ * write access so that we can perform concurrent read operations between
+ * the CPU and GPU engines, as well as waiting for all rendering to
+ * complete, or waiting for the last GPU user of a "fence register". The
+ * object then embeds a #i915_active_request to track the most recent (in
+ * retirement order) request relevant for the desired mode of access.
+ * The #i915_active_request is updated with i915_active_request_set() to
+ * track the most recent fence request, typically this is done as part of
+ * i915_vma_move_to_active().
+ *
+ * When the #i915_active_request completes (is retired), it will
+ * signal its completion to the owner through a callback as well as mark
+ * itself as idle (i915_active_request.request == NULL). The owner
+ * can then perform any action, such as delayed freeing of an active
+ * resource including itself.
+ */
+struct i915_active_request;
+
+typedef void (*i915_active_retire_fn)(struct i915_active_request *,
+				      struct i915_request *);
+
+struct i915_active_request {
+	struct i915_request __rcu *request;
+	struct list_head link;
+	i915_active_retire_fn retire;
+};
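+
+/*
+ * A rough sketch of the flow described above; obj and its write tracker
+ * are stand-ins for any owner embedding an i915_active_request:
+ *
+ *	i915_active_request_init(&obj->write, NULL, obj_retire);
+ *	__i915_active_request_set(&obj->write, rq);
+ *
+ * Once rq is retired, obj_retire(&obj->write, rq) is invoked and
+ * obj->write.request becomes NULL (idle) again.
+ */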
+
+void i915_active_retire_noop(struct i915_active_request *active,
+			     struct i915_request *request);
+
+/**
+ * i915_active_request_init - prepares the activity tracker for use
+ * @active - the active tracker
+ * @rq - initial request to track, can be NULL
+ * @retire - a callback invoked when the tracker is retired (becomes idle),
+ *         can be NULL
+ *
+ * i915_active_request_init() prepares the embedded @active struct for use as
+ * an activity tracker, that is for tracking the last known active request
+ * associated with it. When the last request becomes idle, when it is retired
+ * after completion, the optional callback @func is invoked.
+ */
+static inline void
+i915_active_request_init(struct i915_active_request *active,
+			 struct i915_request *rq,
+			 i915_active_retire_fn retire)
+{
+	RCU_INIT_POINTER(active->request, rq);
+	INIT_LIST_HEAD(&active->link);
+	active->retire = retire ?: i915_active_retire_noop;
+}
+
+#define INIT_ACTIVE_REQUEST(name) i915_active_request_init((name), NULL, NULL)
+
+/**
+ * __i915_active_request_set - updates the tracker to watch the current request
+ * @active - the active tracker
+ * @request - the request to watch
+ *
+ * __i915_active_request_set() watches the given @request for completion. Whilst
+ * that @request is busy, the @active reports busy. When that @request is
+ * retired, the @active tracker is updated to report idle.
+ */
+static inline void
+__i915_active_request_set(struct i915_active_request *active,
+			  struct i915_request *request)
+{
+	list_move(&active->link, &request->active_list);
+	rcu_assign_pointer(active->request, request);
+}
+
+int __must_check
+i915_active_request_set(struct i915_active_request *active,
+			struct i915_request *rq);
+
+/**
+ * i915_active_request_set_retire_fn - updates the retirement callback
+ * @active - the active tracker
+ * @fn - the routine called when the request is retired
+ * @mutex - struct_mutex used to guard retirements
+ *
+ * i915_active_request_set_retire_fn() updates the function pointer that
+ * is called when the final request associated with the @active tracker
+ * is retired.
+ */
+static inline void
+i915_active_request_set_retire_fn(struct i915_active_request *active,
+				  i915_active_retire_fn fn,
+				  struct mutex *mutex)
+{
+	lockdep_assert_held(mutex);
+	active->retire = fn ?: i915_active_retire_noop;
+}
+
+static inline struct i915_request *
+__i915_active_request_peek(const struct i915_active_request *active)
+{
+	/*
+	 * Inside the error capture (running with the driver in an unknown
+	 * state), we want to bend the rules slightly (a lot).
+	 *
+	 * Work is in progress to make it safer, in the meantime this keeps
+	 * the known issue from spamming the logs.
+	 */
+	return rcu_dereference_protected(active->request, 1);
+}
+
+/**
+ * i915_active_request_raw - return the active request
+ * @active - the active tracker
+ *
+ * i915_active_request_raw() returns the current request being tracked, or NULL.
+ * It does not obtain a reference on the request for the caller, so the caller
+ * must hold struct_mutex.
+ */
+static inline struct i915_request *
+i915_active_request_raw(const struct i915_active_request *active,
+			struct mutex *mutex)
+{
+	return rcu_dereference_protected(active->request,
+					 lockdep_is_held(mutex));
+}
+
+/**
+ * i915_active_request_peek - report the active request being monitored
+ * @active - the active tracker
+ *
+ * i915_active_request_peek() returns the current request being tracked if
+ * still active, or NULL. It does not obtain a reference on the request
+ * for the caller, so the caller must hold struct_mutex.
+ */
+static inline struct i915_request *
+i915_active_request_peek(const struct i915_active_request *active,
+			 struct mutex *mutex)
+{
+	struct i915_request *request;
+
+	request = i915_active_request_raw(active, mutex);
+	if (!request || i915_request_completed(request))
+		return NULL;
+
+	return request;
+}
+
+/**
+ * i915_active_request_get - return a reference to the active request
+ * @active - the active tracker
+ *
+ * i915_active_request_get() returns a reference to the active request, or NULL
+ * if the active tracker is idle. The caller must hold struct_mutex.
+ */
+static inline struct i915_request *
+i915_active_request_get(const struct i915_active_request *active,
+			struct mutex *mutex)
+{
+	return i915_request_get(i915_active_request_peek(active, mutex));
+}
+
+/**
+ * __i915_active_request_get_rcu - return a reference to the active request
+ * @active - the active tracker
+ *
+ * __i915_active_request_get_rcu() returns a reference to the active request,
+ * or NULL if the active tracker is idle. The caller must hold the RCU read
+ * lock, but the returned pointer is safe to use outside of RCU.
+ */
+static inline struct i915_request *
+__i915_active_request_get_rcu(const struct i915_active_request *active)
+{
+	/*
+	 * Performing a lockless retrieval of the active request is super
+	 * tricky. SLAB_TYPESAFE_BY_RCU merely guarantees that the backing
+	 * slab of request objects will not be freed whilst we hold the
+	 * RCU read lock. It does not guarantee that the request itself
+	 * will not be freed and then *reused*. Viz,
+	 *
+	 * Thread A			Thread B
+	 *
+	 * rq = active.request
+	 *				retire(rq) -> free(rq);
+	 *				(rq is now first on the slab freelist)
+	 *				active.request = NULL
+	 *
+	 *				rq = new submission on a new object
+	 * ref(rq)
+	 *
+	 * To prevent the request from being reused whilst the caller
+	 * uses it, we take a reference like normal. Whilst acquiring
+	 * the reference we check that it is not in a destroyed state
+	 * (refcnt == 0). That prevents the request being reallocated
+	 * whilst the caller holds on to it. To check that the request
+	 * was not reallocated as we acquired the reference we have to
+	 * check that our request remains the active request across
+	 * the lookup, in the same manner as a seqlock. The visibility
+	 * of the pointer versus the reference counting is controlled
+	 * by using RCU barriers (rcu_dereference and rcu_assign_pointer).
+	 *
+	 * In the middle of all that, we inspect whether the request is
+	 * complete. Retiring is lazy so the request may be completed long
+	 * before the active tracker is updated. Querying whether the
+	 * request is complete is far cheaper (as it involves no locked
+	 * instructions setting cachelines to exclusive) than acquiring
+	 * the reference, so we do it first. The RCU read lock ensures the
+	 * pointer dereference is valid, but does not ensure that the
+	 * seqno nor HWS is the right one! However, if the request was
+	 * reallocated, that means the active tracker's request was complete.
+	 * If the new request is also complete, then both are and we can
+	 * just report the active tracker is idle. If the new request is
+	 * incomplete, then we acquire a reference on it and check that
+	 * it remained the active request.
+	 *
+	 * It is then imperative that we do not zero the request on
+	 * reallocation, so that we can chase the dangling pointers!
+	 * See i915_request_alloc().
+	 */
+	do {
+		struct i915_request *request;
+
+		request = rcu_dereference(active->request);
+		if (!request || i915_request_completed(request))
+			return NULL;
+
+		/*
+		 * An especially silly compiler could decide to recompute the
+		 * result of i915_request_completed, more specifically
+		 * re-emit the load for request->fence.seqno. A race would catch
+		 * a later seqno value, which could flip the result from true to
+		 * false. Which means part of the instructions below might not
+		 * be executed, while later on instructions are executed. Due to
+		 * barriers within the refcounting the inconsistency can't reach
+		 * past the call to i915_request_get_rcu, but not executing
+		 * that while still executing i915_request_put() creates
+		 * havoc enough.  Prevent this with a compiler barrier.
+		 */
+		barrier();
+
+		request = i915_request_get_rcu(request);
+
+		/*
+		 * What stops the following rcu_access_pointer() from occurring
+		 * before the above i915_request_get_rcu()? If we were
+		 * to read the value before pausing to get the reference to
+		 * the request, we may not notice a change in the active
+		 * tracker.
+		 *
+		 * The rcu_access_pointer() is a mere compiler barrier, which
+		 * means both the CPU and compiler are free to perform the
+		 * memory read without constraint. The compiler only has to
+		 * ensure that any operations after the rcu_access_pointer()
+		 * occur afterwards in program order. This means the read may
+		 * be performed earlier by an out-of-order CPU, or adventurous
+		 * compiler.
+		 *
+		 * The atomic operation at the heart of
+		 * i915_request_get_rcu(), see dma_fence_get_rcu(), is
+		 * atomic_inc_not_zero() which is only a full memory barrier
+		 * when successful. That is, if i915_request_get_rcu()
+	 * returns the request (and so with the reference count
+		 * incremented) then the following read for rcu_access_pointer()
+		 * must occur after the atomic operation and so confirm
+		 * that this request is the one currently being tracked.
+		 *
+		 * The corresponding write barrier is part of
+		 * rcu_assign_pointer().
+		 */
+		if (!request || request == rcu_access_pointer(active->request))
+			return rcu_pointer_handoff(request);
+
+		i915_request_put(request);
+	} while (1);
+}
+
+/**
+ * i915_active_request_get_unlocked - return a reference to the active request
+ * @active - the active tracker
+ *
+ * i915_active_request_get_unlocked() returns a reference to the active request,
+ * or NULL if the active tracker is idle. The reference is obtained under RCU,
+ * so no locking is required by the caller.
+ *
+ * The reference should be freed with i915_request_put().
+ */
+static inline struct i915_request *
+i915_active_request_get_unlocked(const struct i915_active_request *active)
+{
+	struct i915_request *request;
+
+	rcu_read_lock();
+	request = __i915_active_request_get_rcu(active);
+	rcu_read_unlock();
+
+	return request;
+}
+
+/**
+ * i915_active_request_isset - report whether the active tracker is assigned
+ * @active - the active tracker
+ *
+ * i915_active_request_isset() returns true if the active tracker is currently
+ * assigned to a request. Due to the lazy retiring, that request may be idle
+ * and this may report stale information.
+ */
+static inline bool
+i915_active_request_isset(const struct i915_active_request *active)
+{
+	return rcu_access_pointer(active->request);
+}
+
+/**
+ * i915_active_request_retire - waits until the request is retired
+ * @active - the active request on which to wait
+ *
+ * i915_active_request_retire() waits until the request is completed,
+ * and then ensures that at least the retirement handler for this
+ * @active tracker is called before returning. If the @active
+ * tracker is idle, the function returns immediately.
+ */
+static inline int __must_check
+i915_active_request_retire(struct i915_active_request *active,
+			   struct mutex *mutex)
+{
+	struct i915_request *request;
+	long ret;
+
+	request = i915_active_request_raw(active, mutex);
+	if (!request)
+		return 0;
+
+	ret = i915_request_wait(request,
+				I915_WAIT_INTERRUPTIBLE | I915_WAIT_LOCKED,
+				MAX_SCHEDULE_TIMEOUT);
+	if (ret < 0)
+		return ret;
+
+	list_del_init(&active->link);
+	RCU_INIT_POINTER(active->request, NULL);
+
+	active->retire(active, request);
+
+	return 0;
+}
+
 /*
  * GPU activity tracking
  *
@@ -46,7 +400,7 @@ struct i915_active {
 	struct list_head active_link;
 
 	struct rb_root tree;
-	struct i915_gem_active last;
+	struct i915_active_request last;
 	unsigned int count;
 
 	void (*retire)(struct i915_active *ref);
@@ -64,6 +418,8 @@ int i915_active_wait(struct i915_active *ref);
 
 int i915_request_await_active(struct i915_request *rq,
 			      struct i915_active *ref);
+int i915_request_await_active_request(struct i915_request *rq,
+				      struct i915_active_request *active);
 
 bool i915_active_acquire(struct i915_active *ref);
 void i915_active_release(struct i915_active *ref);
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index fa2c226fc779..5f07925754fa 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -207,7 +207,7 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 		if (vma->fence)
 			seq_printf(m, " , fence: %d%s",
 				   vma->fence->id,
-				   i915_gem_active_isset(&vma->last_fence) ? "*" : "");
+				   i915_active_request_isset(&vma->last_fence) ? "*" : "");
 		seq_puts(m, ")");
 	}
 	if (obj->stolen)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 8edc00212118..a132004ed72f 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3021,7 +3021,7 @@ static void assert_kernel_context_is_current(struct drm_i915_private *i915)
 
 	GEM_BUG_ON(i915->gt.active_requests);
 	for_each_engine(engine, i915, id) {
-		GEM_BUG_ON(__i915_gem_active_peek(&engine->timeline.last_request));
+		GEM_BUG_ON(__i915_active_request_peek(&engine->timeline.last_request));
 		GEM_BUG_ON(engine->last_retired_context !=
 			   to_intel_context(i915->kernel_context, engine));
 	}
@@ -3267,7 +3267,7 @@ wait_for_timelines(struct drm_i915_private *i915,
 	list_for_each_entry(tl, &gt->active_list, link) {
 		struct i915_request *rq;
 
-		rq = i915_gem_active_get_unlocked(&tl->last_request);
+		rq = i915_active_request_get_unlocked(&tl->last_request);
 		if (!rq)
 			continue;
 
@@ -4168,7 +4168,8 @@ i915_gem_madvise_ioctl(struct drm_device *dev, void *data,
 }
 
 static void
-frontbuffer_retire(struct i915_gem_active *active, struct i915_request *request)
+frontbuffer_retire(struct i915_active_request *active,
+		   struct i915_request *request)
 {
 	struct drm_i915_gem_object *obj =
 		container_of(active, typeof(*obj), frontbuffer_write);
@@ -4195,7 +4196,8 @@ void i915_gem_object_init(struct drm_i915_gem_object *obj,
 	obj->resv = &obj->__builtin_resv;
 
 	obj->frontbuffer_ggtt_origin = ORIGIN_GTT;
-	init_request_active(&obj->frontbuffer_write, frontbuffer_retire);
+	i915_active_request_init(&obj->frontbuffer_write,
+				 NULL, frontbuffer_retire);
 
 	obj->mm.madv = I915_MADV_WILLNEED;
 	INIT_RADIX_TREE(&obj->mm.get_page.radix, GFP_KERNEL | __GFP_NOWARN);
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 6faf1f6faab5..ea8e818d22bf 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -653,8 +653,8 @@ last_request_on_engine(struct i915_timeline *timeline,
 
 	GEM_BUG_ON(timeline == &engine->timeline);
 
-	rq = i915_gem_active_raw(&timeline->last_request,
-				 &engine->i915->drm.struct_mutex);
+	rq = i915_active_request_raw(&timeline->last_request,
+				     &engine->i915->drm.struct_mutex);
 	if (rq && rq->engine == engine) {
 		GEM_TRACE("last request for %s on engine %s: %llx:%llu\n",
 			  timeline->name, engine->name,
diff --git a/drivers/gpu/drm/i915/i915_gem_fence_reg.c b/drivers/gpu/drm/i915/i915_gem_fence_reg.c
index 46e259661294..e037e94792f3 100644
--- a/drivers/gpu/drm/i915/i915_gem_fence_reg.c
+++ b/drivers/gpu/drm/i915/i915_gem_fence_reg.c
@@ -223,7 +223,7 @@ static int fence_update(struct drm_i915_fence_reg *fence,
 			 i915_gem_object_get_tiling(vma->obj)))
 			return -EINVAL;
 
-		ret = i915_gem_active_retire(&vma->last_fence,
+		ret = i915_active_request_retire(&vma->last_fence,
 					     &vma->obj->base.dev->struct_mutex);
 		if (ret)
 			return ret;
@@ -232,7 +232,7 @@ static int fence_update(struct drm_i915_fence_reg *fence,
 	if (fence->vma) {
 		struct i915_vma *old = fence->vma;
 
-		ret = i915_gem_active_retire(&old->last_fence,
+		ret = i915_active_request_retire(&old->last_fence,
 					     &old->obj->base.dev->struct_mutex);
 		if (ret)
 			return ret;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index d8819de0d6ee..be79c377fc59 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1918,7 +1918,7 @@ static struct i915_vma *pd_vma_create(struct gen6_hw_ppgtt *ppgtt, int size)
 		return ERR_PTR(-ENOMEM);
 
 	i915_active_init(i915_gt_active(i915), &vma->active, NULL);
-	init_request_active(&vma->last_fence, NULL);
+	INIT_ACTIVE_REQUEST(&vma->last_fence);
 
 	vma->vm = &ggtt->vm;
 	vma->ops = &pd_vma_ops;
diff --git a/drivers/gpu/drm/i915/i915_gem_object.h b/drivers/gpu/drm/i915/i915_gem_object.h
index 5a33b6d9f942..9e8b0e7324fd 100644
--- a/drivers/gpu/drm/i915/i915_gem_object.h
+++ b/drivers/gpu/drm/i915/i915_gem_object.h
@@ -175,7 +175,7 @@ struct drm_i915_gem_object {
 
 	atomic_t frontbuffer_bits;
 	unsigned int frontbuffer_ggtt_origin; /* write once */
-	struct i915_gem_active frontbuffer_write;
+	struct i915_active_request frontbuffer_write;
 
 	/** Current tiling stride for the object, if it's tiled. */
 	unsigned int tiling_and_stride;
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 6e2e5ed2bd0a..9a65341fec09 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1062,23 +1062,23 @@ i915_error_object_create(struct drm_i915_private *i915,
 }
 
 /* The error capture is special as tries to run underneath the normal
- * locking rules - so we use the raw version of the i915_gem_active lookup.
+ * locking rules - so we use the raw version of the i915_active_request lookup.
  */
 static inline u32
-__active_get_seqno(struct i915_gem_active *active)
+__active_get_seqno(struct i915_active_request *active)
 {
 	struct i915_request *request;
 
-	request = __i915_gem_active_peek(active);
+	request = __i915_active_request_peek(active);
 	return request ? request->global_seqno : 0;
 }
 
 static inline int
-__active_get_engine_id(struct i915_gem_active *active)
+__active_get_engine_id(struct i915_active_request *active)
 {
 	struct i915_request *request;
 
-	request = __i915_gem_active_peek(active);
+	request = __i915_active_request_peek(active);
 	return request ? request->engine->id : -1;
 }
 
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 4b1869295362..a09f47ccc703 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -29,6 +29,7 @@
 #include <linux/sched/signal.h>
 
 #include "i915_drv.h"
+#include "i915_active.h"
 #include "i915_reset.h"
 
 static const char *i915_fence_get_driver_name(struct dma_fence *fence)
@@ -125,12 +126,6 @@ static void unreserve_gt(struct drm_i915_private *i915)
 		i915_gem_park(i915);
 }
 
-void i915_gem_retire_noop(struct i915_gem_active *active,
-			  struct i915_request *request)
-{
-	/* Space left intentionally blank */
-}
-
 static void advance_ring(struct i915_request *request)
 {
 	struct intel_ring *ring = request->ring;
@@ -244,7 +239,7 @@ static void __retire_engine_upto(struct intel_engine_cs *engine,
 
 static void i915_request_retire(struct i915_request *request)
 {
-	struct i915_gem_active *active, *next;
+	struct i915_active_request *active, *next;
 
 	GEM_TRACE("%s fence %llx:%lld, global=%d, current %d:%d\n",
 		  request->engine->name,
@@ -278,10 +273,10 @@ static void i915_request_retire(struct i915_request *request)
 		 * we may spend an inordinate amount of time simply handling
 		 * the retirement of requests and processing their callbacks.
 		 * Of which, this loop itself is particularly hot due to the
-		 * cache misses when jumping around the list of i915_gem_active.
-		 * So we try to keep this loop as streamlined as possible and
-		 * also prefetch the next i915_gem_active to try and hide
-		 * the likely cache miss.
+		 * cache misses when jumping around the list of
+		 * i915_active_request.  So we try to keep this loop as
+		 * streamlined as possible and also prefetch the next
+		 * i915_active_request to try and hide the likely cache miss.
 		 */
 		prefetchw(next);
 
@@ -526,17 +521,9 @@ i915_request_alloc_slow(struct intel_context *ce)
 	return kmem_cache_alloc(ce->gem_context->i915->requests, GFP_KERNEL);
 }
 
-static int add_barrier(struct i915_request *rq, struct i915_gem_active *active)
-{
-	struct i915_request *barrier =
-		i915_gem_active_raw(active, &rq->i915->drm.struct_mutex);
-
-	return barrier ? i915_request_await_dma_fence(rq, &barrier->fence) : 0;
-}
-
 static int add_timeline_barrier(struct i915_request *rq)
 {
-	return add_barrier(rq, &rq->timeline->barrier);
+	return i915_request_await_active_request(rq, &rq->timeline->barrier);
 }
 
 /**
@@ -595,7 +582,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	 * We use RCU to look up requests in flight. The lookups may
 	 * race with the request being allocated from the slab freelist.
 	 * That is the request we are writing to here, may be in the process
-	 * of being read by __i915_gem_active_get_rcu(). As such,
+	 * of being read by __i915_active_request_get_rcu(). As such,
 	 * we have to be very careful when overwriting the contents. During
 	 * the RCU lookup, we change chase the request->engine pointer,
 	 * read the request->global_seqno and increment the reference count.
@@ -937,8 +924,8 @@ void i915_request_add(struct i915_request *request)
 	 * see a more recent value in the hws than we are tracking.
 	 */
 
-	prev = i915_gem_active_raw(&timeline->last_request,
-				   &request->i915->drm.struct_mutex);
+	prev = i915_active_request_raw(&timeline->last_request,
+				       &request->i915->drm.struct_mutex);
 	if (prev && !i915_request_completed(prev)) {
 		i915_sw_fence_await_sw_fence(&request->submit, &prev->submit,
 					     &request->submitq);
@@ -954,7 +941,7 @@ void i915_request_add(struct i915_request *request)
 	spin_unlock_irq(&timeline->lock);
 
 	GEM_BUG_ON(timeline->seqno != request->fence.seqno);
-	i915_gem_active_set(&timeline->last_request, request);
+	__i915_active_request_set(&timeline->last_request, request);
 
 	list_add_tail(&request->ring_link, &ring->request_list);
 	if (list_is_first(&request->ring_link, &ring->request_list)) {
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index 3cffb96203b9..40f3e8dcbdd5 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -403,387 +403,4 @@ static inline void i915_request_mark_complete(struct i915_request *rq)
 
 void i915_retire_requests(struct drm_i915_private *i915);
 
-/*
- * We treat requests as fences. This is not be to confused with our
- * "fence registers" but pipeline synchronisation objects ala GL_ARB_sync.
- * We use the fences to synchronize access from the CPU with activity on the
- * GPU, for example, we should not rewrite an object's PTE whilst the GPU
- * is reading them. We also track fences at a higher level to provide
- * implicit synchronisation around GEM objects, e.g. set-domain will wait
- * for outstanding GPU rendering before marking the object ready for CPU
- * access, or a pageflip will wait until the GPU is complete before showing
- * the frame on the scanout.
- *
- * In order to use a fence, the object must track the fence it needs to
- * serialise with. For example, GEM objects want to track both read and
- * write access so that we can perform concurrent read operations between
- * the CPU and GPU engines, as well as waiting for all rendering to
- * complete, or waiting for the last GPU user of a "fence register". The
- * object then embeds a #i915_gem_active to track the most recent (in
- * retirement order) request relevant for the desired mode of access.
- * The #i915_gem_active is updated with i915_gem_active_set() to track the
- * most recent fence request, typically this is done as part of
- * i915_vma_move_to_active().
- *
- * When the #i915_gem_active completes (is retired), it will
- * signal its completion to the owner through a callback as well as mark
- * itself as idle (i915_gem_active.request == NULL). The owner
- * can then perform any action, such as delayed freeing of an active
- * resource including itself.
- */
-struct i915_gem_active;
-
-typedef void (*i915_gem_retire_fn)(struct i915_gem_active *,
-				   struct i915_request *);
-
-struct i915_gem_active {
-	struct i915_request __rcu *request;
-	struct list_head link;
-	i915_gem_retire_fn retire;
-};
-
-void i915_gem_retire_noop(struct i915_gem_active *,
-			  struct i915_request *request);
-
-/**
- * init_request_active - prepares the activity tracker for use
- * @active - the active tracker
- * @func - a callback when then the tracker is retired (becomes idle),
- *         can be NULL
- *
- * init_request_active() prepares the embedded @active struct for use as
- * an activity tracker, that is for tracking the last known active request
- * associated with it. When the last request becomes idle, when it is retired
- * after completion, the optional callback @func is invoked.
- */
-static inline void
-init_request_active(struct i915_gem_active *active,
-		    i915_gem_retire_fn retire)
-{
-	RCU_INIT_POINTER(active->request, NULL);
-	INIT_LIST_HEAD(&active->link);
-	active->retire = retire ?: i915_gem_retire_noop;
-}
-
-/**
- * i915_gem_active_set - updates the tracker to watch the current request
- * @active - the active tracker
- * @request - the request to watch
- *
- * i915_gem_active_set() watches the given @request for completion. Whilst
- * that @request is busy, the @active reports busy. When that @request is
- * retired, the @active tracker is updated to report idle.
- */
-static inline void
-i915_gem_active_set(struct i915_gem_active *active,
-		    struct i915_request *request)
-{
-	list_move(&active->link, &request->active_list);
-	rcu_assign_pointer(active->request, request);
-}
-
-/**
- * i915_gem_active_set_retire_fn - updates the retirement callback
- * @active - the active tracker
- * @fn - the routine called when the request is retired
- * @mutex - struct_mutex used to guard retirements
- *
- * i915_gem_active_set_retire_fn() updates the function pointer that
- * is called when the final request associated with the @active tracker
- * is retired.
- */
-static inline void
-i915_gem_active_set_retire_fn(struct i915_gem_active *active,
-			      i915_gem_retire_fn fn,
-			      struct mutex *mutex)
-{
-	lockdep_assert_held(mutex);
-	active->retire = fn ?: i915_gem_retire_noop;
-}
-
-static inline struct i915_request *
-__i915_gem_active_peek(const struct i915_gem_active *active)
-{
-	/*
-	 * Inside the error capture (running with the driver in an unknown
-	 * state), we want to bend the rules slightly (a lot).
-	 *
-	 * Work is in progress to make it safer, in the meantime this keeps
-	 * the known issue from spamming the logs.
-	 */
-	return rcu_dereference_protected(active->request, 1);
-}
-
-/**
- * i915_gem_active_raw - return the active request
- * @active - the active tracker
- *
- * i915_gem_active_raw() returns the current request being tracked, or NULL.
- * It does not obtain a reference on the request for the caller, so the caller
- * must hold struct_mutex.
- */
-static inline struct i915_request *
-i915_gem_active_raw(const struct i915_gem_active *active, struct mutex *mutex)
-{
-	return rcu_dereference_protected(active->request,
-					 lockdep_is_held(mutex));
-}
-
-/**
- * i915_gem_active_peek - report the active request being monitored
- * @active - the active tracker
- *
- * i915_gem_active_peek() returns the current request being tracked if
- * still active, or NULL. It does not obtain a reference on the request
- * for the caller, so the caller must hold struct_mutex.
- */
-static inline struct i915_request *
-i915_gem_active_peek(const struct i915_gem_active *active, struct mutex *mutex)
-{
-	struct i915_request *request;
-
-	request = i915_gem_active_raw(active, mutex);
-	if (!request || i915_request_completed(request))
-		return NULL;
-
-	return request;
-}
-
-/**
- * i915_gem_active_get - return a reference to the active request
- * @active - the active tracker
- *
- * i915_gem_active_get() returns a reference to the active request, or NULL
- * if the active tracker is idle. The caller must hold struct_mutex.
- */
-static inline struct i915_request *
-i915_gem_active_get(const struct i915_gem_active *active, struct mutex *mutex)
-{
-	return i915_request_get(i915_gem_active_peek(active, mutex));
-}
-
-/**
- * __i915_gem_active_get_rcu - return a reference to the active request
- * @active - the active tracker
- *
- * __i915_gem_active_get() returns a reference to the active request, or NULL
- * if the active tracker is idle. The caller must hold the RCU read lock, but
- * the returned pointer is safe to use outside of RCU.
- */
-static inline struct i915_request *
-__i915_gem_active_get_rcu(const struct i915_gem_active *active)
-{
-	/*
-	 * Performing a lockless retrieval of the active request is super
-	 * tricky. SLAB_TYPESAFE_BY_RCU merely guarantees that the backing
-	 * slab of request objects will not be freed whilst we hold the
-	 * RCU read lock. It does not guarantee that the request itself
-	 * will not be freed and then *reused*. Viz,
-	 *
-	 * Thread A			Thread B
-	 *
-	 * rq = active.request
-	 *				retire(rq) -> free(rq);
-	 *				(rq is now first on the slab freelist)
-	 *				active.request = NULL
-	 *
-	 *				rq = new submission on a new object
-	 * ref(rq)
-	 *
-	 * To prevent the request from being reused whilst the caller
-	 * uses it, we take a reference like normal. Whilst acquiring
-	 * the reference we check that it is not in a destroyed state
-	 * (refcnt == 0). That prevents the request being reallocated
-	 * whilst the caller holds on to it. To check that the request
-	 * was not reallocated as we acquired the reference we have to
-	 * check that our request remains the active request across
-	 * the lookup, in the same manner as a seqlock. The visibility
-	 * of the pointer versus the reference counting is controlled
-	 * by using RCU barriers (rcu_dereference and rcu_assign_pointer).
-	 *
-	 * In the middle of all that, we inspect whether the request is
-	 * complete. Retiring is lazy so the request may be completed long
-	 * before the active tracker is updated. Querying whether the
-	 * request is complete is far cheaper (as it involves no locked
-	 * instructions setting cachelines to exclusive) than acquiring
-	 * the reference, so we do it first. The RCU read lock ensures the
-	 * pointer dereference is valid, but does not ensure that the
-	 * seqno nor HWS is the right one! However, if the request was
-	 * reallocated, that means the active tracker's request was complete.
-	 * If the new request is also complete, then both are and we can
-	 * just report the active tracker is idle. If the new request is
-	 * incomplete, then we acquire a reference on it and check that
-	 * it remained the active request.
-	 *
-	 * It is then imperative that we do not zero the request on
-	 * reallocation, so that we can chase the dangling pointers!
-	 * See i915_request_alloc().
-	 */
-	do {
-		struct i915_request *request;
-
-		request = rcu_dereference(active->request);
-		if (!request || i915_request_completed(request))
-			return NULL;
-
-		/*
-		 * An especially silly compiler could decide to recompute the
-		 * result of i915_request_completed, more specifically
-		 * re-emit the load for request->fence.seqno. A race would catch
-		 * a later seqno value, which could flip the result from true to
-		 * false. Which means part of the instructions below might not
-		 * be executed, while later on instructions are executed. Due to
-		 * barriers within the refcounting the inconsistency can't reach
-		 * past the call to i915_request_get_rcu, but not executing
-		 * that while still executing i915_request_put() creates
-		 * havoc enough.  Prevent this with a compiler barrier.
-		 */
-		barrier();
-
-		request = i915_request_get_rcu(request);
-
-		/*
-		 * What stops the following rcu_access_pointer() from occurring
-		 * before the above i915_request_get_rcu()? If we were
-		 * to read the value before pausing to get the reference to
-		 * the request, we may not notice a change in the active
-		 * tracker.
-		 *
-		 * The rcu_access_pointer() is a mere compiler barrier, which
-		 * means both the CPU and compiler are free to perform the
-		 * memory read without constraint. The compiler only has to
-		 * ensure that any operations after the rcu_access_pointer()
-		 * occur afterwards in program order. This means the read may
-		 * be performed earlier by an out-of-order CPU, or adventurous
-		 * compiler.
-		 *
-		 * The atomic operation at the heart of
-		 * i915_request_get_rcu(), see dma_fence_get_rcu(), is
-		 * atomic_inc_not_zero() which is only a full memory barrier
-		 * when successful. That is, if i915_request_get_rcu()
-		 * returns the request (and so with the reference counted
-		 * incremented) then the following read for rcu_access_pointer()
-		 * must occur after the atomic operation and so confirm
-		 * that this request is the one currently being tracked.
-		 *
-		 * The corresponding write barrier is part of
-		 * rcu_assign_pointer().
-		 */
-		if (!request || request == rcu_access_pointer(active->request))
-			return rcu_pointer_handoff(request);
-
-		i915_request_put(request);
-	} while (1);
-}
-
-/**
- * i915_gem_active_get_unlocked - return a reference to the active request
- * @active - the active tracker
- *
- * i915_gem_active_get_unlocked() returns a reference to the active request,
- * or NULL if the active tracker is idle. The reference is obtained under RCU,
- * so no locking is required by the caller.
- *
- * The reference should be freed with i915_request_put().
- */
-static inline struct i915_request *
-i915_gem_active_get_unlocked(const struct i915_gem_active *active)
-{
-	struct i915_request *request;
-
-	rcu_read_lock();
-	request = __i915_gem_active_get_rcu(active);
-	rcu_read_unlock();
-
-	return request;
-}
-
-/**
- * i915_gem_active_isset - report whether the active tracker is assigned
- * @active - the active tracker
- *
- * i915_gem_active_isset() returns true if the active tracker is currently
- * assigned to a request. Due to the lazy retiring, that request may be idle
- * and this may report stale information.
- */
-static inline bool
-i915_gem_active_isset(const struct i915_gem_active *active)
-{
-	return rcu_access_pointer(active->request);
-}
-
-/**
- * i915_gem_active_wait - waits until the request is completed
- * @active - the active request on which to wait
- * @flags - how to wait
- * @timeout - how long to wait at most
- * @rps - userspace client to charge for a waitboost
- *
- * i915_gem_active_wait() waits until the request is completed before
- * returning, without requiring any locks to be held. Note that it does not
- * retire any requests before returning.
- *
- * This function relies on RCU in order to acquire the reference to the active
- * request without holding any locks. See __i915_gem_active_get_rcu() for the
- * glory details on how that is managed. Once the reference is acquired, we
- * can then wait upon the request, and afterwards release our reference,
- * free of any locking.
- *
- * This function wraps i915_request_wait(), see it for the full details on
- * the arguments.
- *
- * Returns 0 if successful, or a negative error code.
- */
-static inline int
-i915_gem_active_wait(const struct i915_gem_active *active, unsigned int flags)
-{
-	struct i915_request *request;
-	long ret = 0;
-
-	request = i915_gem_active_get_unlocked(active);
-	if (request) {
-		ret = i915_request_wait(request, flags, MAX_SCHEDULE_TIMEOUT);
-		i915_request_put(request);
-	}
-
-	return ret < 0 ? ret : 0;
-}
-
-/**
- * i915_gem_active_retire - waits until the request is retired
- * @active - the active request on which to wait
- *
- * i915_gem_active_retire() waits until the request is completed,
- * and then ensures that at least the retirement handler for this
- * @active tracker is called before returning. If the @active
- * tracker is idle, the function returns immediately.
- */
-static inline int __must_check
-i915_gem_active_retire(struct i915_gem_active *active,
-		       struct mutex *mutex)
-{
-	struct i915_request *request;
-	long ret;
-
-	request = i915_gem_active_raw(active, mutex);
-	if (!request)
-		return 0;
-
-	ret = i915_request_wait(request,
-				I915_WAIT_INTERRUPTIBLE | I915_WAIT_LOCKED,
-				MAX_SCHEDULE_TIMEOUT);
-	if (ret < 0)
-		return ret;
-
-	list_del_init(&active->link);
-	RCU_INIT_POINTER(active->request, NULL);
-
-	active->retire(active, request);
-
-	return 0;
-}
-
-#define for_each_active(mask, idx) \
-	for (; mask ? idx = ffs(mask) - 1, 1 : 0; mask &= ~BIT(idx))
-
 #endif /* I915_REQUEST_H */
diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
index 4462007a681c..0e0ddf2e6815 100644
--- a/drivers/gpu/drm/i915/i915_reset.c
+++ b/drivers/gpu/drm/i915/i915_reset.c
@@ -862,7 +862,7 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915)
 		struct i915_request *rq;
 		long timeout;
 
-		rq = i915_gem_active_get_unlocked(&tl->last_request);
+		rq = i915_active_request_get_unlocked(&tl->last_request);
 		if (!rq)
 			continue;
 
diff --git a/drivers/gpu/drm/i915/i915_timeline.c b/drivers/gpu/drm/i915/i915_timeline.c
index b354843a5040..b2202d2e58a2 100644
--- a/drivers/gpu/drm/i915/i915_timeline.c
+++ b/drivers/gpu/drm/i915/i915_timeline.c
@@ -163,8 +163,8 @@ int i915_timeline_init(struct drm_i915_private *i915,
 
 	spin_lock_init(&timeline->lock);
 
-	init_request_active(&timeline->barrier, NULL);
-	init_request_active(&timeline->last_request, NULL);
+	INIT_ACTIVE_REQUEST(&timeline->barrier);
+	INIT_ACTIVE_REQUEST(&timeline->last_request);
 	INIT_LIST_HEAD(&timeline->requests);
 
 	i915_syncmap_init(&timeline->sync);
@@ -236,7 +236,7 @@ void i915_timeline_fini(struct i915_timeline *timeline)
 {
 	GEM_BUG_ON(timeline->pin_count);
 	GEM_BUG_ON(!list_empty(&timeline->requests));
-	GEM_BUG_ON(i915_gem_active_isset(&timeline->barrier));
+	GEM_BUG_ON(i915_active_request_isset(&timeline->barrier));
 
 	i915_syncmap_free(&timeline->sync);
 	hwsp_free(timeline);
@@ -268,25 +268,6 @@ i915_timeline_create(struct drm_i915_private *i915,
 	return timeline;
 }
 
-int i915_timeline_set_barrier(struct i915_timeline *tl, struct i915_request *rq)
-{
-	struct i915_request *old;
-	int err;
-
-	lockdep_assert_held(&rq->i915->drm.struct_mutex);
-
-	/* Must maintain ordering wrt existing barriers */
-	old = i915_gem_active_raw(&tl->barrier, &rq->i915->drm.struct_mutex);
-	if (old) {
-		err = i915_request_await_dma_fence(rq, &old->fence);
-		if (err)
-			return err;
-	}
-
-	i915_gem_active_set(&tl->barrier, rq);
-	return 0;
-}
-
 int i915_timeline_pin(struct i915_timeline *tl)
 {
 	int err;
diff --git a/drivers/gpu/drm/i915/i915_timeline.h b/drivers/gpu/drm/i915/i915_timeline.h
index d167e04073c5..7bec7d2e45bf 100644
--- a/drivers/gpu/drm/i915/i915_timeline.h
+++ b/drivers/gpu/drm/i915/i915_timeline.h
@@ -28,6 +28,7 @@
 #include <linux/list.h>
 #include <linux/kref.h>
 
+#include "i915_active.h"
 #include "i915_request.h"
 #include "i915_syncmap.h"
 #include "i915_utils.h"
@@ -58,10 +59,10 @@ struct i915_timeline {
 
 	/* Contains an RCU guarded pointer to the last request. No reference is
 	 * held to the request, users must carefully acquire a reference to
-	 * the request using i915_gem_active_get_request_rcu(), or hold the
+	 * the request using i915_active_request_get_request_rcu(), or hold the
 	 * struct_mutex.
 	 */
-	struct i915_gem_active last_request;
+	struct i915_active_request last_request;
 
 	/**
 	 * We track the most recent seqno that we wait on in every context so
@@ -82,7 +83,7 @@ struct i915_timeline {
 	 * subsequent submissions to this timeline be executed only after the
 	 * barrier has been completed.
 	 */
-	struct i915_gem_active barrier;
+	struct i915_active_request barrier;
 
 	struct list_head link;
 	const char *name;
@@ -174,7 +175,10 @@ void i915_timelines_fini(struct drm_i915_private *i915);
  * submissions on @timeline. Subsequent requests will not be submitted to GPU
  * until the barrier has been completed.
  */
-int i915_timeline_set_barrier(struct i915_timeline *timeline,
-			      struct i915_request *rq);
+static inline int
+i915_timeline_set_barrier(struct i915_timeline *tl, struct i915_request *rq)
+{
+	return i915_active_request_set(&tl->barrier, rq);
+}
 
 #endif
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 2456bfb4877b..376821c37d72 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -121,7 +121,7 @@ vma_create(struct drm_i915_gem_object *obj,
 
 	i915_active_init(i915_gt_active(vm->i915),
 			 &vma->active, __i915_vma_retire);
-	init_request_active(&vma->last_fence, NULL);
+	INIT_ACTIVE_REQUEST(&vma->last_fence);
 
 	vma->vm = vm;
 	vma->ops = &vm->vma_ops;
@@ -809,7 +809,7 @@ static void __i915_vma_destroy(struct i915_vma *vma)
 	GEM_BUG_ON(vma->node.allocated);
 	GEM_BUG_ON(vma->fence);
 
-	GEM_BUG_ON(i915_gem_active_isset(&vma->last_fence));
+	GEM_BUG_ON(i915_active_request_isset(&vma->last_fence));
 
 	mutex_lock(&vma->vm->mutex);
 	list_del(&vma->vm_link);
@@ -943,14 +943,14 @@ int i915_vma_move_to_active(struct i915_vma *vma,
 		obj->write_domain = I915_GEM_DOMAIN_RENDER;
 
 		if (intel_fb_obj_invalidate(obj, ORIGIN_CS))
-			i915_gem_active_set(&obj->frontbuffer_write, rq);
+			__i915_active_request_set(&obj->frontbuffer_write, rq);
 
 		obj->read_domains = 0;
 	}
 	obj->read_domains |= I915_GEM_GPU_DOMAINS;
 
 	if (flags & EXEC_OBJECT_NEEDS_FENCE)
-		i915_gem_active_set(&vma->last_fence, rq);
+		__i915_active_request_set(&vma->last_fence, rq);
 
 	export_fence(vma, rq, flags);
 	return 0;
@@ -987,8 +987,8 @@ int i915_vma_unbind(struct i915_vma *vma)
 		if (ret)
 			goto unpin;
 
-		ret = i915_gem_active_retire(&vma->last_fence,
-					     &vma->vm->i915->drm.struct_mutex);
+		ret = i915_active_request_retire(&vma->last_fence,
+					      &vma->vm->i915->drm.struct_mutex);
 unpin:
 		__i915_vma_unpin(vma);
 		if (ret)
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index 3c03d4569481..7c742027f866 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -110,7 +110,7 @@ struct i915_vma {
 #define I915_VMA_GGTT_WRITE	BIT(15)
 
 	struct i915_active active;
-	struct i915_gem_active last_fence;
+	struct i915_active_request last_fence;
 
 	/**
 	 * Support different GGTT views into the same object.
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index ee43c98227bb..4b4004d56e53 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1086,7 +1086,7 @@ bool intel_engine_has_kernel_context(const struct intel_engine_cs *engine)
 	 * the last request that remains in the timeline. When idle, it is
 	 * the last executed context as tracked by retirement.
 	 */
-	rq = __i915_gem_active_peek(&engine->timeline.last_request);
+	rq = __i915_active_request_peek(&engine->timeline.last_request);
 	if (rq)
 		return rq->hw_context == kernel_context;
 	else
diff --git a/drivers/gpu/drm/i915/intel_overlay.c b/drivers/gpu/drm/i915/intel_overlay.c
index a9238fd07e30..f7abfd817a11 100644
--- a/drivers/gpu/drm/i915/intel_overlay.c
+++ b/drivers/gpu/drm/i915/intel_overlay.c
@@ -186,7 +186,7 @@ struct intel_overlay {
 	struct overlay_registers __iomem *regs;
 	u32 flip_addr;
 	/* flip handling */
-	struct i915_gem_active last_flip;
+	struct i915_active_request last_flip;
 };
 
 static void i830_overlay_clock_gating(struct drm_i915_private *dev_priv,
@@ -214,22 +214,22 @@ static void i830_overlay_clock_gating(struct drm_i915_private *dev_priv,
 
 static void intel_overlay_submit_request(struct intel_overlay *overlay,
 					 struct i915_request *rq,
-					 i915_gem_retire_fn retire)
+					 i915_active_retire_fn retire)
 {
-	GEM_BUG_ON(i915_gem_active_peek(&overlay->last_flip,
-					&overlay->i915->drm.struct_mutex));
-	i915_gem_active_set_retire_fn(&overlay->last_flip, retire,
-				      &overlay->i915->drm.struct_mutex);
-	i915_gem_active_set(&overlay->last_flip, rq);
+	GEM_BUG_ON(i915_active_request_peek(&overlay->last_flip,
+					    &overlay->i915->drm.struct_mutex));
+	i915_active_request_set_retire_fn(&overlay->last_flip, retire,
+					  &overlay->i915->drm.struct_mutex);
+	__i915_active_request_set(&overlay->last_flip, rq);
 	i915_request_add(rq);
 }
 
 static int intel_overlay_do_wait_request(struct intel_overlay *overlay,
 					 struct i915_request *rq,
-					 i915_gem_retire_fn retire)
+					 i915_active_retire_fn retire)
 {
 	intel_overlay_submit_request(overlay, rq, retire);
-	return i915_gem_active_retire(&overlay->last_flip,
+	return i915_active_request_retire(&overlay->last_flip,
 				      &overlay->i915->drm.struct_mutex);
 }
 
@@ -351,8 +351,9 @@ static void intel_overlay_release_old_vma(struct intel_overlay *overlay)
 	i915_vma_put(vma);
 }
 
-static void intel_overlay_release_old_vid_tail(struct i915_gem_active *active,
-					       struct i915_request *rq)
+static void
+intel_overlay_release_old_vid_tail(struct i915_active_request *active,
+				   struct i915_request *rq)
 {
 	struct intel_overlay *overlay =
 		container_of(active, typeof(*overlay), last_flip);
@@ -360,7 +361,7 @@ static void intel_overlay_release_old_vid_tail(struct i915_gem_active *active,
 	intel_overlay_release_old_vma(overlay);
 }
 
-static void intel_overlay_off_tail(struct i915_gem_active *active,
+static void intel_overlay_off_tail(struct i915_active_request *active,
 				   struct i915_request *rq)
 {
 	struct intel_overlay *overlay =
@@ -423,7 +424,7 @@ static int intel_overlay_off(struct intel_overlay *overlay)
  * We have to be careful not to repeat work forever an make forward progess. */
 static int intel_overlay_recover_from_interrupt(struct intel_overlay *overlay)
 {
-	return i915_gem_active_retire(&overlay->last_flip,
+	return i915_active_request_retire(&overlay->last_flip,
 				      &overlay->i915->drm.struct_mutex);
 }
 
@@ -1357,7 +1358,7 @@ void intel_overlay_setup(struct drm_i915_private *dev_priv)
 	overlay->contrast = 75;
 	overlay->saturation = 146;
 
-	init_request_active(&overlay->last_flip, NULL);
+	INIT_ACTIVE_REQUEST(&overlay->last_flip);
 
 	mutex_lock(&dev_priv->drm.struct_mutex);
 
diff --git a/drivers/gpu/drm/i915/selftests/mock_timeline.c b/drivers/gpu/drm/i915/selftests/mock_timeline.c
index e5659aaa856d..d2de9ece2118 100644
--- a/drivers/gpu/drm/i915/selftests/mock_timeline.c
+++ b/drivers/gpu/drm/i915/selftests/mock_timeline.c
@@ -15,8 +15,8 @@ void mock_timeline_init(struct i915_timeline *timeline, u64 context)
 
 	spin_lock_init(&timeline->lock);
 
-	init_request_active(&timeline->barrier, NULL);
-	init_request_active(&timeline->last_request, NULL);
+	INIT_ACTIVE_REQUEST(&timeline->barrier);
+	INIT_ACTIVE_REQUEST(&timeline->last_request);
 	INIT_LIST_HEAD(&timeline->requests);
 
 	i915_syncmap_init(&timeline->sync);
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 25/28] drm/i915: Keep timeline HWSP allocated until the system is idle
  2019-01-28  1:02 [PATCH 01/28] drm/i915: Wait for a moment before forcibly resetting the device Chris Wilson
                   ` (22 preceding siblings ...)
  2019-01-28  1:02 ` [PATCH 24/28] drm/i915: Pull i915_gem_active into the i915_active family Chris Wilson
@ 2019-01-28  1:02 ` Chris Wilson
  2019-01-28  1:02 ` [PATCH 26/28] drm/i915/execlists: Refactor out can_merge_rq() Chris Wilson
                   ` (7 subsequent siblings)
  31 siblings, 0 replies; 39+ messages in thread
From: Chris Wilson @ 2019-01-28  1:02 UTC (permalink / raw)
  To: intel-gfx

In preparation for enabling HW semaphores, we need to keep the HWSP of
an in-flight timeline alive until the entire system is idle, as any
other timeline active on the GPU may still refer back to the already
retired timeline. We have to delay both the recycling of available
cachelines and the unpinning of old HWSP until the next idle point
(i.e. on parking).

That we have to keep the HWSP alive for external references on HW
raises an interesting conundrum. On a busy system, we may never see a
global idle point, essentially meaning the resource will leak until we
are forced to sleep. What we need is a set of RCU primitives for the
GPU! This should also help mitigate the resource starvation issues
stemming from keeping all logical state pinned until idle (instead of,
as currently handled, until the next context switch).
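
The lifetime rule being introduced amounts to a reference count with a
deferred free: the cacheline is destroyed only once both its owner and
the last HW reference are gone, whichever comes later. A standalone C
sketch of that idea (hypothetical names, not the driver's i915_active
machinery):

	#include <stdbool.h>
	#include <stdlib.h>

	/* Hypothetical stand-in for a HWSP cacheline that the HW may
	 * still read after its owning timeline has let go of it.
	 */
	struct cacheline {
		unsigned int active;	/* outstanding HW references */
		bool free;		/* owner dropped its reference */
	};

	static void cacheline_acquire(struct cacheline *cl)
	{
		cl->active++;	/* a request now refers to this cacheline */
	}

	/* Called as each HW reference retires. */
	static void cacheline_release(struct cacheline *cl)
	{
		if (--cl->active == 0 && cl->free)
			free(cl);	/* owner gone, we were the last user */
	}

	/* Called when the timeline itself discards the cacheline. */
	static void cacheline_put(struct cacheline *cl)
	{
		cl->free = true;
		if (cl->active == 0)
			free(cl);	/* idle, no one left to defer to */
	}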

v2: Use idle barriers to free stale HWSP as soon as all current requests
are idle, rather than rely on the system reaching a global idle point.
(Tvrtko)
v3: Replace the idle barrier with read locks.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_request.c  |  30 ++--
 drivers/gpu/drm/i915/i915_timeline.c | 209 +++++++++++++++++++++++++--
 drivers/gpu/drm/i915/i915_timeline.h |   9 +-
 3 files changed, 217 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index a09f47ccc703..07e4c3c68ecd 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -326,11 +326,6 @@ void i915_request_retire_upto(struct i915_request *rq)
 	} while (tmp != rq);
 }
 
-static u32 timeline_get_seqno(struct i915_timeline *tl)
-{
-	return tl->seqno += 1 + tl->has_initial_breadcrumb;
-}
-
 static void move_to_timeline(struct i915_request *request,
 			     struct i915_timeline *timeline)
 {
@@ -539,8 +534,10 @@ struct i915_request *
 i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 {
 	struct drm_i915_private *i915 = engine->i915;
-	struct i915_request *rq;
 	struct intel_context *ce;
+	struct i915_timeline *tl;
+	struct i915_request *rq;
+	u32 seqno;
 	int ret;
 
 	lockdep_assert_held(&i915->drm.struct_mutex);
@@ -615,24 +612,26 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 		}
 	}
 
-	rq->rcustate = get_state_synchronize_rcu();
-
 	INIT_LIST_HEAD(&rq->active_list);
+
+	tl = ce->ring->timeline;
+	ret = i915_timeline_get_seqno(tl, rq, &seqno);
+	if (ret)
+		goto err_free;
+
 	rq->i915 = i915;
 	rq->engine = engine;
 	rq->gem_context = ctx;
 	rq->hw_context = ce;
 	rq->ring = ce->ring;
-	rq->timeline = ce->ring->timeline;
+	rq->timeline = tl;
 	GEM_BUG_ON(rq->timeline == &engine->timeline);
-	rq->hwsp_seqno = rq->timeline->hwsp_seqno;
+	rq->hwsp_seqno = tl->hwsp_seqno;
+	rq->rcustate = get_state_synchronize_rcu(); /* acts as smp_mb() */
 
 	spin_lock_init(&rq->lock);
-	dma_fence_init(&rq->fence,
-		       &i915_fence_ops,
-		       &rq->lock,
-		       rq->timeline->fence_context,
-		       timeline_get_seqno(rq->timeline));
+	dma_fence_init(&rq->fence, &i915_fence_ops, &rq->lock,
+		       tl->fence_context, seqno);
 
 	/* We bump the ref for the fence chain */
 	i915_sw_fence_init(&i915_request_get(rq)->submit, submit_notify);
@@ -693,6 +692,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	GEM_BUG_ON(!list_empty(&rq->sched.signalers_list));
 	GEM_BUG_ON(!list_empty(&rq->sched.waiters_list));
 
+err_free:
 	kmem_cache_free(i915->requests, rq);
 err_unreserve:
 	unreserve_gt(i915);
diff --git a/drivers/gpu/drm/i915/i915_timeline.c b/drivers/gpu/drm/i915/i915_timeline.c
index b2202d2e58a2..3eb134a8123f 100644
--- a/drivers/gpu/drm/i915/i915_timeline.c
+++ b/drivers/gpu/drm/i915/i915_timeline.c
@@ -6,19 +6,28 @@
 
 #include "i915_drv.h"
 
-#include "i915_timeline.h"
+#include "i915_active.h"
 #include "i915_syncmap.h"
+#include "i915_timeline.h"
 
 struct i915_timeline_hwsp {
-	struct i915_vma *vma;
+	struct i915_gt_timelines *gt;
 	struct list_head free_link;
+	struct i915_vma *vma;
 	u64 free_bitmap;
 };
 
-static inline struct i915_timeline_hwsp *
-i915_timeline_hwsp(const struct i915_timeline *tl)
+struct i915_timeline_cacheline {
+	struct i915_active active;
+	struct i915_timeline_hwsp *hwsp;
+	unsigned int cacheline : 6;
+	unsigned int free : 1;
+};
+
+static inline struct drm_i915_private *
+hwsp_to_i915(struct i915_timeline_hwsp *hwsp)
 {
-	return tl->hwsp_ggtt->private;
+	return container_of(hwsp->gt, struct drm_i915_private, gt.timelines);
 }
 
 static struct i915_vma *__hwsp_alloc(struct drm_i915_private *i915)
@@ -71,6 +80,7 @@ hwsp_alloc(struct i915_timeline *timeline, unsigned int *cacheline)
 		vma->private = hwsp;
 		hwsp->vma = vma;
 		hwsp->free_bitmap = ~0ull;
+		hwsp->gt = gt;
 
 		spin_lock(&gt->hwsp_lock);
 		list_add(&hwsp->free_link, &gt->hwsp_free_list);
@@ -88,14 +98,9 @@ hwsp_alloc(struct i915_timeline *timeline, unsigned int *cacheline)
 	return hwsp->vma;
 }
 
-static void hwsp_free(struct i915_timeline *timeline)
+static void __idle_hwsp_free(struct i915_timeline_hwsp *hwsp, int cacheline)
 {
-	struct i915_gt_timelines *gt = &timeline->i915->gt.timelines;
-	struct i915_timeline_hwsp *hwsp;
-
-	hwsp = i915_timeline_hwsp(timeline);
-	if (!hwsp) /* leave global HWSP alone! */
-		return;
+	struct i915_gt_timelines *gt = hwsp->gt;
 
 	spin_lock(&gt->hwsp_lock);
 
@@ -103,7 +108,8 @@ static void hwsp_free(struct i915_timeline *timeline)
 	if (!hwsp->free_bitmap)
 		list_add_tail(&hwsp->free_link, &gt->hwsp_free_list);
 
-	hwsp->free_bitmap |= BIT_ULL(timeline->hwsp_offset / CACHELINE_BYTES);
+	GEM_BUG_ON(cacheline >= BITS_PER_TYPE(hwsp->free_bitmap));
+	hwsp->free_bitmap |= BIT_ULL(cacheline);
 
 	/* And if no one is left using it, give the page back to the system */
 	if (hwsp->free_bitmap == ~0ull) {
@@ -115,6 +121,80 @@ static void hwsp_free(struct i915_timeline *timeline)
 	spin_unlock(&gt->hwsp_lock);
 }
 
+static void __idle_cacheline_free(struct i915_timeline_cacheline *cl)
+{
+	GEM_BUG_ON(!i915_active_is_idle(&cl->active));
+
+	i915_vma_put(cl->hwsp->vma);
+	__idle_hwsp_free(cl->hwsp, cl->cacheline);
+
+	i915_active_fini(&cl->active);
+	kfree(cl);
+}
+
+static void __idle_cacheline_park(struct i915_timeline_cacheline *cl)
+{
+	i915_active_fini(&cl->active);
+}
+
+static void __cacheline_retire(struct i915_active *active)
+{
+	struct i915_timeline_cacheline *cl =
+		container_of(active, typeof(*cl), active);
+
+	i915_vma_unpin(cl->hwsp->vma);
+	if (!cl->free)
+		__idle_cacheline_park(cl);
+	else
+		__idle_cacheline_free(cl);
+}
+
+static struct i915_timeline_cacheline *
+cacheline_alloc(struct i915_timeline_hwsp *hwsp, unsigned int cacheline)
+{
+	struct i915_timeline_cacheline *cl;
+
+	GEM_BUG_ON(cacheline >= 64);
+
+	cl = kmalloc(sizeof(*cl), GFP_KERNEL);
+	if (!cl)
+		return ERR_PTR(-ENOMEM);
+
+	i915_vma_get(hwsp->vma);
+	cl->hwsp = hwsp;
+	cl->cacheline = cacheline;
+	cl->free = false;
+
+	i915_active_init(i915_gt_active(hwsp_to_i915(hwsp)),
+			 &cl->active, __cacheline_retire);
+
+	return cl;
+}
+
+static void cacheline_acquire(struct i915_timeline_cacheline *cl)
+{
+	if (cl && i915_active_acquire(&cl->active))
+		__i915_vma_pin(cl->hwsp->vma);
+}
+
+static void cacheline_release(struct i915_timeline_cacheline *cl)
+{
+	if (cl)
+		i915_active_release(&cl->active);
+}
+
+static void cacheline_free(struct i915_timeline_cacheline *cl)
+{
+	if (!cl)
+		return;
+
+	GEM_BUG_ON(cl->free);
+	cl->free = true;
+
+	if (i915_active_is_idle(&cl->active))
+		__idle_cacheline_free(cl);
+}
+
 int i915_timeline_init(struct drm_i915_private *i915,
 		       struct i915_timeline *timeline,
 		       const char *name,
@@ -136,22 +216,32 @@ int i915_timeline_init(struct drm_i915_private *i915,
 	timeline->name = name;
 	timeline->pin_count = 0;
 	timeline->has_initial_breadcrumb = !hwsp;
+	timeline->hwsp_cacheline = NULL;
 
 	timeline->hwsp_offset = I915_GEM_HWS_SEQNO_ADDR;
 	if (!hwsp) {
+		struct i915_timeline_cacheline *cl;
 		unsigned int cacheline;
 
 		hwsp = hwsp_alloc(timeline, &cacheline);
 		if (IS_ERR(hwsp))
 			return PTR_ERR(hwsp);
 
+		cl = cacheline_alloc(hwsp->private, cacheline);
+		if (IS_ERR(cl)) {
+			__idle_hwsp_free(hwsp->private, cacheline);
+			return PTR_ERR(cl);
+		}
+
 		timeline->hwsp_offset = cacheline * CACHELINE_BYTES;
+		timeline->hwsp_cacheline = cl;
 	}
 	timeline->hwsp_ggtt = i915_vma_get(hwsp);
+	GEM_BUG_ON(timeline->hwsp_offset >= hwsp->size);
 
 	vaddr = i915_gem_object_pin_map(hwsp->obj, I915_MAP_WB);
 	if (IS_ERR(vaddr)) {
-		hwsp_free(timeline);
+		cacheline_free(timeline->hwsp_cacheline);
 		i915_vma_put(hwsp);
 		return PTR_ERR(vaddr);
 	}
@@ -239,7 +329,7 @@ void i915_timeline_fini(struct i915_timeline *timeline)
 	GEM_BUG_ON(i915_active_request_isset(&timeline->barrier));
 
 	i915_syncmap_free(&timeline->sync);
-	hwsp_free(timeline);
+	cacheline_free(timeline->hwsp_cacheline);
 
 	i915_gem_object_unpin_map(timeline->hwsp_ggtt->obj);
 	i915_vma_put(timeline->hwsp_ggtt);
@@ -284,6 +374,7 @@ int i915_timeline_pin(struct i915_timeline *tl)
 		i915_ggtt_offset(tl->hwsp_ggtt) +
 		offset_in_page(tl->hwsp_offset);
 
+	cacheline_acquire(tl->hwsp_cacheline);
 	timeline_add_to_active(tl);
 
 	return 0;
@@ -293,6 +384,93 @@ int i915_timeline_pin(struct i915_timeline *tl)
 	return err;
 }
 
+static u32 timeline_advance(struct i915_timeline *tl)
+{
+	GEM_BUG_ON(!tl->pin_count);
+	GEM_BUG_ON(tl->seqno & tl->has_initial_breadcrumb);
+
+	return tl->seqno += 1 + tl->has_initial_breadcrumb;
+}
+
+static void timeline_rollback(struct i915_timeline *tl)
+{
+	tl->seqno -= 1 + tl->has_initial_breadcrumb;
+}
+
+static noinline int
+__i915_timeline_get_seqno(struct i915_timeline *tl,
+			  struct i915_request *rq,
+			  u32 *seqno)
+{
+	struct i915_timeline_cacheline *cl;
+	struct i915_vma *vma;
+	unsigned int cacheline;
+	int err;
+
+	vma = hwsp_alloc(tl, &cacheline);
+	if (IS_ERR(vma)) {
+		err = PTR_ERR(vma);
+		goto err_rollback;
+	}
+
+	cl = cacheline_alloc(vma->private, cacheline);
+	if (IS_ERR(cl)) {
+		err = PTR_ERR(cl);
+		goto err_hwsp;
+	}
+
+	/*
+	 * Attach the old cacheline to the current request, so that we only
+	 * free it after the current request is retired, which ensures that
+	 * all writes into the cacheline from previous requests are complete.
+	 */
+	err = i915_active_ref(&tl->hwsp_cacheline->active,
+			      tl->fence_context, rq);
+	if (err)
+		goto err_cacheline;
+
+	tl->hwsp_ggtt = i915_vma_get(vma);
+	tl->hwsp_offset = cacheline * CACHELINE_BYTES;
+	__i915_vma_pin(tl->hwsp_ggtt);
+
+	cacheline_release(tl->hwsp_cacheline); /* ownership now xfered to rq */
+	cacheline_free(tl->hwsp_cacheline);
+
+	cacheline_acquire(cl);
+	tl->hwsp_cacheline = cl;
+
+	*seqno = timeline_advance(tl);
+	return 0;
+
+err_cacheline:
+	kfree(cl);
+err_hwsp:
+	__idle_hwsp_free(vma->private, cacheline);
+err_rollback:
+	timeline_rollback(tl);
+	return err;
+}
+
+int i915_timeline_get_seqno(struct i915_timeline *tl,
+			    struct i915_request *rq,
+			    u32 *seqno)
+{
+	*seqno = timeline_advance(tl);
+
+	/* Replace the HWSP on wraparound for HW semaphores */
+	if (unlikely(!*seqno && tl->hwsp_cacheline))
+		return __i915_timeline_get_seqno(tl, rq, seqno);
+
+	return 0;
+}
+
+int i915_timeline_read_lock(struct i915_timeline *tl, struct i915_request *rq)
+{
+	GEM_BUG_ON(!tl->pin_count);
+	return i915_active_ref(&tl->hwsp_cacheline->active,
+			       rq->fence.context, rq);
+}
+
 void i915_timeline_unpin(struct i915_timeline *tl)
 {
 	GEM_BUG_ON(!tl->pin_count);
@@ -300,6 +478,7 @@ void i915_timeline_unpin(struct i915_timeline *tl)
 		return;
 
 	timeline_remove_from_active(tl);
+	cacheline_release(tl->hwsp_cacheline);
 
 	/*
 	 * Since this timeline is idle, all bariers upon which we were waiting
diff --git a/drivers/gpu/drm/i915/i915_timeline.h b/drivers/gpu/drm/i915/i915_timeline.h
index 7bec7d2e45bf..d78ec6fbc000 100644
--- a/drivers/gpu/drm/i915/i915_timeline.h
+++ b/drivers/gpu/drm/i915/i915_timeline.h
@@ -34,7 +34,7 @@
 #include "i915_utils.h"
 
 struct i915_vma;
-struct i915_timeline_hwsp;
+struct i915_timeline_cacheline;
 
 struct i915_timeline {
 	u64 fence_context;
@@ -49,6 +49,8 @@ struct i915_timeline {
 	struct i915_vma *hwsp_ggtt;
 	u32 hwsp_offset;
 
+	struct i915_timeline_cacheline *hwsp_cacheline;
+
 	bool has_initial_breadcrumb;
 
 	/**
@@ -160,6 +162,11 @@ static inline bool i915_timeline_sync_is_later(struct i915_timeline *tl,
 }
 
 int i915_timeline_pin(struct i915_timeline *tl);
+int i915_timeline_get_seqno(struct i915_timeline *tl,
+			    struct i915_request *rq,
+			    u32 *seqno);
+int i915_timeline_read_lock(struct i915_timeline *tl,
+			    struct i915_request *rq);
 void i915_timeline_unpin(struct i915_timeline *tl);
 
 void i915_timelines_init(struct drm_i915_private *i915);
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 26/28] drm/i915/execlists: Refactor out can_merge_rq()
  2019-01-28  1:02 [PATCH 01/28] drm/i915: Wait for a moment before forcibly resetting the device Chris Wilson
                   ` (23 preceding siblings ...)
  2019-01-28  1:02 ` [PATCH 25/28] drm/i915: Keep timeline HWSP allocated until the system is idle Chris Wilson
@ 2019-01-28  1:02 ` Chris Wilson
  2019-01-28  1:02 ` [PATCH 27/28] drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+ Chris Wilson
                   ` (6 subsequent siblings)
  31 siblings, 0 replies; 39+ messages in thread
From: Chris Wilson @ 2019-01-28  1:02 UTC (permalink / raw)
  To: intel-gfx

In the next patch, we add another user that wants to check whether
requests can be merged into a single HW execution, and in the future we
want to add more conditions under which requests from the same context
cannot be merged. In preparation, extract out can_merge_rq().
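
As a sketch of the shape this takes (standalone C with hypothetical
stand-in types, not the driver structures), the point is one predicate
that every caller shares and that later patches can extend:

	#include <stdbool.h>

	struct context { int id; };
	struct request { const struct context *ctx; };

	/* One home for every condition under which two requests may
	 * share a single HW execution; new per-request conditions slot
	 * in here rather than at each call site.
	 */
	static bool can_merge_rq(const struct request *prev,
				 const struct request *next)
	{
		if (prev->ctx != next->ctx)
			return false;

		/* Future patches add checks here that can split even
		 * requests from the same context.
		 */
		return true;
	}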

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_lrc.c | 27 ++++++++++++++++++---------
 1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index c767877533ef..9b330cb17c76 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -285,12 +285,11 @@ static inline bool need_preempt(const struct intel_engine_cs *engine,
 }
 
 __maybe_unused static inline bool
-assert_priority_queue(const struct intel_engine_execlists *execlists,
-		      const struct i915_request *prev,
+assert_priority_queue(const struct i915_request *prev,
 		      const struct i915_request *next)
 {
-	if (!prev)
-		return true;
+	const struct intel_engine_execlists *execlists =
+		&prev->engine->execlists;
 
 	/*
 	 * Without preemption, the prev may refer to the still active element
@@ -595,6 +594,17 @@ static bool can_merge_ctx(const struct intel_context *prev,
 	return true;
 }
 
+static bool can_merge_rq(const struct i915_request *prev,
+			 const struct i915_request *next)
+{
+	GEM_BUG_ON(!assert_priority_queue(prev, next));
+
+	if (!can_merge_ctx(prev->hw_context, next->hw_context))
+		return false;
+
+	return true;
+}
+
 static void port_assign(struct execlist_port *port, struct i915_request *rq)
 {
 	GEM_BUG_ON(rq == port_request(port));
@@ -747,8 +757,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 		int i;
 
 		priolist_for_each_request_consume(rq, rn, p, i) {
-			GEM_BUG_ON(!assert_priority_queue(execlists, last, rq));
-
 			/*
 			 * Can we combine this request with the current port?
 			 * It has to be the same context/ringbuffer and not
@@ -760,8 +768,10 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 			 * second request, and so we never need to tell the
 			 * hardware about the first.
 			 */
-			if (last &&
-			    !can_merge_ctx(rq->hw_context, last->hw_context)) {
+			if (last && !can_merge_rq(last, rq)) {
+				if (last->hw_context == rq->hw_context)
+					goto done;
+
 				/*
 				 * If we are on the second port and cannot
 				 * combine this request with the last, then we
@@ -781,7 +791,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 				    ctx_single_port_submission(rq->hw_context))
 					goto done;
 
-				GEM_BUG_ON(last->hw_context == rq->hw_context);
 
 				if (submit)
 					port_assign(port, last);
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 27/28] drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+
  2019-01-28  1:02 [PATCH 01/28] drm/i915: Wait for a moment before forcibly resetting the device Chris Wilson
                   ` (24 preceding siblings ...)
  2019-01-28  1:02 ` [PATCH 26/28] drm/i915/execlists: Refactor out can_merge_rq() Chris Wilson
@ 2019-01-28  1:02 ` Chris Wilson
  2019-01-28  1:02 ` [PATCH 28/28] drm/i915: Prioritise non-busywait semaphore workloads Chris Wilson
                   ` (5 subsequent siblings)
  31 siblings, 0 replies; 39+ messages in thread
From: Chris Wilson @ 2019-01-28  1:02 UTC (permalink / raw)
  To: intel-gfx

Having introduced per-context seqno, we now have a means to identify
progress across the system without fear of the rollback that befell
the global_seqno. That is, we can program a MI_SEMAPHORE_WAIT operation
in advance of submission, safe in the knowledge that our target seqno
and address are stable.
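
The wraparound hazard that dictates the compare modes used in
emit_semaphore_wait() below can be demonstrated with plain u32
arithmetic; a standalone sketch (not driver code):

	#include <stdint.h>
	#include <stdio.h>

	int main(void)
	{
		uint32_t target = UINT32_MAX; /* seqno a GTE waiter polls for */
		uint32_t hwsp = UINT32_MAX;   /* breadcrumb written on completion */

		printf("GTE released: %d\n", hwsp >= target); /* 1: fine so far */

		/* The timeline wraps and a later request reuses the
		 * cacheline, writing a small post-wrap seqno. A waiter
		 * sampling only now busy-spins forever...
		 */
		hwsp = 2;
		printf("GTE released: %d\n", hwsp >= target); /* 0: stuck */

		/* ...hence the patch swaps in a fresh HWSP cacheline on
		 * wraparound so stale GTE waiters keep seeing the
		 * pre-wrap value, while the already-started path uses a
		 * simple not-equal-to-start compare that needs no wrap
		 * handling: any change from the start value means
		 * progress.
		 */
		return 0;
	}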

However, since we are telling the GPU to busy-spin on the target address
until it matches the signaling seqno, we only want to do so when we are
sure that busy-spin will be completed quickly. To achieve this we only
submit the request to HW once the signaler is itself executing (modulo
preemption causing us to wait longer), and we only do so for default and
above priority requests (so that idle priority tasks never themselves
hog the GPU waiting for others).

But what about AB-BA deadlocks? If you remove B, there can be no
deadlock... The issue is that with a deep ELSP queue, we can queue up
a pair of AB-BA waits on different engines, forming a classic mutual
exclusion deadlock. We side-step that issue by restricting the queue
depth to avoid having multiple semaphores in flight, and so we only
ever take one set of locks at a time.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_request.c       | 153 +++++++++++++++++++++-
 drivers/gpu/drm/i915/i915_request.h       |   1 +
 drivers/gpu/drm/i915/i915_scheduler.c     |   1 +
 drivers/gpu/drm/i915/i915_scheduler.h     |   1 +
 drivers/gpu/drm/i915/i915_sw_fence.c      |   4 +-
 drivers/gpu/drm/i915/i915_sw_fence.h      |   3 +
 drivers/gpu/drm/i915/intel_gpu_commands.h |   5 +
 drivers/gpu/drm/i915/intel_lrc.c          |  14 +-
 8 files changed, 178 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 07e4c3c68ecd..6d825cd28ae6 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -22,8 +22,9 @@
  *
  */
 
-#include <linux/prefetch.h>
 #include <linux/dma-fence-array.h>
+#include <linux/irq_work.h>
+#include <linux/prefetch.h>
 #include <linux/sched.h>
 #include <linux/sched/clock.h>
 #include <linux/sched/signal.h>
@@ -326,6 +327,76 @@ void i915_request_retire_upto(struct i915_request *rq)
 	} while (tmp != rq);
 }
 
+struct execute_cb {
+	struct list_head link;
+	struct irq_work work;
+	struct i915_sw_fence *fence;
+};
+
+static void irq_execute_cb(struct irq_work *wrk)
+{
+	struct execute_cb *cb = container_of(wrk, typeof(*cb), work);
+
+	i915_sw_fence_complete(cb->fence);
+	kfree(cb);
+}
+
+static void __notify_execute_cb(struct i915_request *rq)
+{
+	struct execute_cb *cb;
+
+	lockdep_assert_held(&rq->lock);
+
+	if (list_empty(&rq->execute_cb))
+		return;
+
+	list_for_each_entry(cb, &rq->execute_cb, link)
+		irq_work_queue(&cb->work);
+
+	/*
+	 * XXX Rollback on __i915_request_unsubmit()
+	 *
+	 * In the future, perhaps when we have an active time-slicing scheduler,
+	 * it will be interesting to unsubmit parallel execution and remove
+	 * busywaits from the GPU until their master is restarted. This is
+	 * quite hairy, we have to carefully rollback the fence and do a
+	 * preempt-to-idle cycle on the target engine, all the while the
+	 * master execute_cb may refire.
+	 */
+	INIT_LIST_HEAD(&rq->execute_cb);
+}
+
+static int
+i915_request_await_execution(struct i915_request *rq,
+			     struct i915_request *signal,
+			     gfp_t gfp)
+{
+	struct execute_cb *cb;
+	unsigned long flags;
+
+	if (test_bit(I915_FENCE_FLAG_ACTIVE, &signal->fence.flags))
+		return 0;
+
+	cb = kmalloc(sizeof(*cb), gfp);
+	if (!cb)
+		return -ENOMEM;
+
+	cb->fence = &rq->submit;
+	i915_sw_fence_await(cb->fence);
+	init_irq_work(&cb->work, irq_execute_cb);
+
+	spin_lock_irqsave(&signal->lock, flags);
+	if (test_bit(I915_FENCE_FLAG_ACTIVE, &signal->fence.flags)) {
+		i915_sw_fence_complete(cb->fence);
+		kfree(cb);
+	} else {
+		list_add_tail(&cb->link, &signal->execute_cb);
+	}
+	spin_unlock_irqrestore(&signal->lock, flags);
+
+	return 0;
+}
+
 static void move_to_timeline(struct i915_request *request,
 			     struct i915_timeline *timeline)
 {
@@ -373,6 +444,7 @@ void __i915_request_submit(struct i915_request *request)
 	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &request->fence.flags) &&
 	    !i915_request_enable_breadcrumb(request))
 		intel_engine_queue_breadcrumbs(engine);
+	__notify_execute_cb(request);
 	spin_unlock(&request->lock);
 
 	engine->emit_fini_breadcrumb(request,
@@ -613,6 +685,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	}
 
 	INIT_LIST_HEAD(&rq->active_list);
+	INIT_LIST_HEAD(&rq->execute_cb);
 
 	tl = ce->ring->timeline;
 	ret = i915_timeline_get_seqno(tl, rq, &seqno);
@@ -700,6 +773,81 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	return ERR_PTR(ret);
 }
 
+static int
+emit_semaphore_wait(struct i915_request *to,
+		    struct i915_request *from,
+		    gfp_t gfp)
+{
+	u32 *cs;
+	int err;
+
+	GEM_BUG_ON(!from->timeline->has_initial_breadcrumb);
+
+	err = i915_timeline_read_lock(from->timeline, to);
+	if (err)
+		return err;
+
+	/*
+	 * If we know our signaling request has started, we know that it
+	 * must, at least, have passed its initial breadcrumb and that its
+	 * seqno can only increase, therefore any change in its breadcrumb
+	 * must indicate completion. By using a "not equal to start" compare
+	 * we avoid the murky issue of how to handle seqno wraparound in an
+	 * async environment (short answer, we must stop the world whenever
+	 * any context wraps!) as the likelihood of missing one request then
+	 * seeing the same start value for a new request is 1 in 2^31, and
+	 * even then we know that the new request has started and is in
+	 * progress, so we are sure it will complete soon enough (not to
+	 * worry about).
+	 */
+	if (i915_request_started(from)) {
+		cs = intel_ring_begin(to, 4);
+		if (IS_ERR(cs))
+			return PTR_ERR(cs);
+
+		*cs++ = MI_SEMAPHORE_WAIT |
+			MI_SEMAPHORE_GLOBAL_GTT |
+			MI_SEMAPHORE_POLL |
+			MI_SEMAPHORE_SAD_NEQ_SDD;
+		*cs++ = from->fence.seqno - 1;
+		*cs++ = from->timeline->hwsp_offset;
+		*cs++ = 0;
+
+		intel_ring_advance(to, cs);
+	} else {
+		int err;
+
+		err = i915_request_await_execution(to, from, gfp);
+		if (err)
+			return err;
+
+		cs = intel_ring_begin(to, 4);
+		if (IS_ERR(cs))
+			return PTR_ERR(cs);
+
+		/*
+		 * Using greater-than-or-equal here means we have to worry
+		 * about seqno wraparound. To side step that issue, we swap
+		 * the timeline HWSP upon wrapping, so that everyone listening
+		 * for the old (pre-wrap) values do not see the much smaller
+		 * (post-wrap) values than they were expecting (and so wait
+		 * forever).
+		 */
+		*cs++ = MI_SEMAPHORE_WAIT |
+			MI_SEMAPHORE_GLOBAL_GTT |
+			MI_SEMAPHORE_POLL |
+			MI_SEMAPHORE_SAD_GTE_SDD;
+		*cs++ = from->fence.seqno;
+		*cs++ = from->timeline->hwsp_offset;
+		*cs++ = 0;
+
+		intel_ring_advance(to, cs);
+	}
+
+	to->sched.semaphore = true;
+	return 0;
+}
+
 static int
 i915_request_await_request(struct i915_request *to, struct i915_request *from)
 {
@@ -723,6 +871,9 @@ i915_request_await_request(struct i915_request *to, struct i915_request *from)
 		ret = i915_sw_fence_await_sw_fence_gfp(&to->submit,
 						       &from->submit,
 						       I915_FENCE_GFP);
+	} else if (HAS_EXECLISTS(to->i915) &&
+		   to->gem_context->sched.priority >= I915_PRIORITY_NORMAL) {
+		ret = emit_semaphore_wait(to, from, I915_FENCE_GFP);
 	} else {
 		ret = i915_sw_fence_await_dma_fence(&to->submit,
 						    &from->fence, 0,
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index 40f3e8dcbdd5..66a374ee177a 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -127,6 +127,7 @@ struct i915_request {
 	 */
 	struct i915_sw_fence submit;
 	wait_queue_entry_t submitq;
+	struct list_head execute_cb;
 
 	/*
 	 * A list of everyone we wait upon, and everyone who waits upon us.
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 8ae68d2fc134..7f9c4018530b 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -29,6 +29,7 @@ void i915_sched_node_init(struct i915_sched_node *node)
 	INIT_LIST_HEAD(&node->waiters_list);
 	INIT_LIST_HEAD(&node->link);
 	node->attr.priority = I915_PRIORITY_INVALID;
+	node->semaphore = false;
 }
 
 static struct i915_dependency *
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index dbe9cb7ecd82..d764cf10536f 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -72,6 +72,7 @@ struct i915_sched_node {
 	struct list_head waiters_list; /* those after us, they depend upon us */
 	struct list_head link;
 	struct i915_sched_attr attr;
+	bool semaphore;
 };
 
 struct i915_dependency {
diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c b/drivers/gpu/drm/i915/i915_sw_fence.c
index 7c58b049ecb5..8d1400d378d7 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence.c
+++ b/drivers/gpu/drm/i915/i915_sw_fence.c
@@ -192,7 +192,7 @@ static void __i915_sw_fence_complete(struct i915_sw_fence *fence,
 	__i915_sw_fence_notify(fence, FENCE_FREE);
 }
 
-static void i915_sw_fence_complete(struct i915_sw_fence *fence)
+void i915_sw_fence_complete(struct i915_sw_fence *fence)
 {
 	debug_fence_assert(fence);
 
@@ -202,7 +202,7 @@ static void i915_sw_fence_complete(struct i915_sw_fence *fence)
 	__i915_sw_fence_complete(fence, NULL);
 }
 
-static void i915_sw_fence_await(struct i915_sw_fence *fence)
+void i915_sw_fence_await(struct i915_sw_fence *fence)
 {
 	debug_fence_assert(fence);
 	WARN_ON(atomic_inc_return(&fence->pending) <= 1);
diff --git a/drivers/gpu/drm/i915/i915_sw_fence.h b/drivers/gpu/drm/i915/i915_sw_fence.h
index 0e055ea0179f..6dec9e1d1102 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence.h
+++ b/drivers/gpu/drm/i915/i915_sw_fence.h
@@ -79,6 +79,9 @@ int i915_sw_fence_await_reservation(struct i915_sw_fence *fence,
 				    unsigned long timeout,
 				    gfp_t gfp);
 
+void i915_sw_fence_await(struct i915_sw_fence *fence);
+void i915_sw_fence_complete(struct i915_sw_fence *fence);
+
 static inline bool i915_sw_fence_signaled(const struct i915_sw_fence *fence)
 {
 	return atomic_read(&fence->pending) <= 0;
diff --git a/drivers/gpu/drm/i915/intel_gpu_commands.h b/drivers/gpu/drm/i915/intel_gpu_commands.h
index b96a31bc1080..0efaadd3bc32 100644
--- a/drivers/gpu/drm/i915/intel_gpu_commands.h
+++ b/drivers/gpu/drm/i915/intel_gpu_commands.h
@@ -106,7 +106,12 @@
 #define   MI_SEMAPHORE_TARGET(engine)	((engine)<<15)
 #define MI_SEMAPHORE_WAIT	MI_INSTR(0x1c, 2) /* GEN8+ */
 #define   MI_SEMAPHORE_POLL		(1<<15)
+#define   MI_SEMAPHORE_SAD_GT_SDD	(0<<12)
 #define   MI_SEMAPHORE_SAD_GTE_SDD	(1<<12)
+#define   MI_SEMAPHORE_SAD_LT_SDD	(2<<12)
+#define   MI_SEMAPHORE_SAD_LTE_SDD	(3<<12)
+#define   MI_SEMAPHORE_SAD_EQ_SDD	(4<<12)
+#define   MI_SEMAPHORE_SAD_NEQ_SDD	(5<<12)
 #define MI_STORE_DWORD_IMM	MI_INSTR(0x20, 1)
 #define MI_STORE_DWORD_IMM_GEN4	MI_INSTR(0x20, 2)
 #define   MI_MEM_VIRTUAL	(1 << 22) /* 945,g33,965 */
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 9b330cb17c76..f7f16b8d3422 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -415,7 +415,8 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine)
 	 * stream, so give it the equivalent small priority bump to prevent
 	 * it being gazumped a second time by another peer.
 	 */
-	if ((prio & ACTIVE_PRIORITY) != ACTIVE_PRIORITY) {
+	if ((prio & ACTIVE_PRIORITY) != ACTIVE_PRIORITY &&
+	    i915_request_started(active)) {
 		prio |= ACTIVE_PRIORITY;
 		active->sched.attr.priority = prio;
 		list_move_tail(&active->sched.link,
@@ -599,6 +600,17 @@ static bool can_merge_rq(const struct i915_request *prev,
 {
 	GEM_BUG_ON(!assert_priority_queue(prev, next));
 
+	/*
+	 * To avoid AB-BA deadlocks, we simply restrict ourselves to only
+	 * submitting one semaphore (think HW spinlock) to HW at a time. This
+	 * prevents the execution callback on a later semaphore from being
+	 * queued on another engine, so no cycle can be formed. Preemption
+	 * rules should mean that if this semaphore is preempted, its
+	 * dependency chain is preserved and suitably promoted via PI.
+	 */
+	if (prev->sched.semaphore && !i915_request_started(prev))
+		return false;
+
 	if (!can_merge_ctx(prev->hw_context, next->hw_context))
 		return false;
 
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 28/28] drm/i915: Prioritise non-busywait semaphore workloads
  2019-01-28  1:02 [PATCH 01/28] drm/i915: Wait for a moment before forcibly resetting the device Chris Wilson
                   ` (25 preceding siblings ...)
  2019-01-28  1:02 ` [PATCH 27/28] drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+ Chris Wilson
@ 2019-01-28  1:02 ` Chris Wilson
  2019-01-28  2:33 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/28] drm/i915: Wait for a moment before forcibly resetting the device Patchwork
                   ` (4 subsequent siblings)
  31 siblings, 0 replies; 39+ messages in thread
From: Chris Wilson @ 2019-01-28  1:02 UTC (permalink / raw)
  To: intel-gfx

We don't want to busywait on the GPU if we have other work to do. If we
give non-busywaiting workloads higher (initial) priority than workloads
that require a busywait, we will prioritise work that is ready to run
immediately.
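
In rough pseudo-C (an illustrative sketch of the intent, not the diff
itself; the bit names are the ones this patch introduces):

	prio = I915_USER_PRIORITY(user_prio);	/* user bits now start at 3 */
	if (!request->sched.semaphore)		/* no busywait required */
		prio |= I915_PRIORITY_NOSEMAPHORE;	/* internal bit 2 */

so of two requests at the same user priority level, the one that does
not need to spin on a semaphore is dequeued first.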

Testcase: igt/gem_exec_schedule/semaphore
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_request.c   |  8 +++++++-
 drivers/gpu/drm/i915/i915_scheduler.c |  2 +-
 drivers/gpu/drm/i915/i915_scheduler.h | 11 +++++++----
 drivers/gpu/drm/i915/intel_lrc.c      |  5 +++--
 4 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 6d825cd28ae6..30cd8724d39f 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -844,7 +844,7 @@ emit_semaphore_wait(struct i915_request *to,
 		intel_ring_advance(to, cs);
 	}
 
-	to->sched.semaphore = true;
+	to->sched.semaphore |= I915_SCHED_HAS_SEMAPHORE;
 	return 0;
 }
 
@@ -867,6 +867,9 @@ i915_request_await_request(struct i915_request *to, struct i915_request *from)
 			return ret;
 	}
 
+	if (from->sched.semaphore && !i915_request_started(from))
+		to->sched.semaphore |= I915_SCHED_CHAIN_SEMAPHORE;
+
 	if (to->engine == from->engine) {
 		ret = i915_sw_fence_await_sw_fence_gfp(&to->submit,
 						       &from->submit,
@@ -1117,6 +1120,9 @@ void i915_request_add(struct i915_request *request)
 	if (engine->schedule) {
 		struct i915_sched_attr attr = request->gem_context->sched;
 
+		if (!request->sched.semaphore)
+			attr.priority |= I915_PRIORITY_NOSEMAPHORE;
+
 		/*
 		 * Boost priorities to new clients (new request flows).
 		 *
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 7f9c4018530b..702204d4e729 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -29,7 +29,7 @@ void i915_sched_node_init(struct i915_sched_node *node)
 	INIT_LIST_HEAD(&node->waiters_list);
 	INIT_LIST_HEAD(&node->link);
 	node->attr.priority = I915_PRIORITY_INVALID;
-	node->semaphore = false;
+	node->semaphore = 0;
 }
 
 static struct i915_dependency *
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index d764cf10536f..d84f09e8c248 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -24,14 +24,15 @@ enum {
 	I915_PRIORITY_INVALID = INT_MIN
 };
 
-#define I915_USER_PRIORITY_SHIFT 2
+#define I915_USER_PRIORITY_SHIFT 3
 #define I915_USER_PRIORITY(x) ((x) << I915_USER_PRIORITY_SHIFT)
 
 #define I915_PRIORITY_COUNT BIT(I915_USER_PRIORITY_SHIFT)
 #define I915_PRIORITY_MASK (I915_PRIORITY_COUNT - 1)
 
-#define I915_PRIORITY_WAIT	((u8)BIT(0))
-#define I915_PRIORITY_NEWCLIENT	((u8)BIT(1))
+#define I915_PRIORITY_WAIT		((u8)BIT(0))
+#define I915_PRIORITY_NEWCLIENT		((u8)BIT(1))
+#define I915_PRIORITY_NOSEMAPHORE	((u8)BIT(2))
 
 struct i915_sched_attr {
 	/**
@@ -72,7 +73,9 @@ struct i915_sched_node {
 	struct list_head waiters_list; /* those after us, they depend upon us */
 	struct list_head link;
 	struct i915_sched_attr attr;
-	bool semaphore;
+	unsigned long semaphore;
+#define I915_SCHED_HAS_SEMAPHORE	BIT(0)
+#define I915_SCHED_CHAIN_SEMAPHORE	BIT(1)
 };
 
 struct i915_dependency {
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index f7f16b8d3422..4fcee493dddb 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -164,7 +164,7 @@
 #define WA_TAIL_DWORDS 2
 #define WA_TAIL_BYTES (sizeof(u32) * WA_TAIL_DWORDS)
 
-#define ACTIVE_PRIORITY (I915_PRIORITY_NEWCLIENT)
+#define ACTIVE_PRIORITY (I915_PRIORITY_NEWCLIENT | I915_PRIORITY_NOSEMAPHORE)
 
 static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
 					    struct intel_engine_cs *engine,
@@ -608,7 +608,8 @@ static bool can_merge_rq(const struct i915_request *prev,
 	 * rules should mean that if this semaphore is preempted, its
 	 * dependency chain is preserved and suitably promoted via PI.
 	 */
-	if (prev->sched.semaphore && !i915_request_started(prev))
+	if (prev->sched.semaphore & I915_SCHED_HAS_SEMAPHORE &&
+	    !i915_request_started(prev))
 		return false;
 
 	if (!can_merge_ctx(prev->hw_context, next->hw_context))
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/28] drm/i915: Wait for a moment before forcibly resetting the device
  2019-01-28  1:02 [PATCH 01/28] drm/i915: Wait for a moment before forcibly resetting the device Chris Wilson
                   ` (26 preceding siblings ...)
  2019-01-28  1:02 ` [PATCH 28/28] drm/i915: Prioritise non-busywait semaphore workloads Chris Wilson
@ 2019-01-28  2:33 ` Patchwork
  2019-01-28  2:46 ` ✗ Fi.CI.SPARSE: " Patchwork
                   ` (3 subsequent siblings)
  31 siblings, 0 replies; 39+ messages in thread
From: Patchwork @ 2019-01-28  2:33 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/28] drm/i915: Wait for a moment before forcibly resetting the device
URL   : https://patchwork.freedesktop.org/series/55819/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
d1a18083cdf2 drm/i915: Wait for a moment before forcibly resetting the device
5ad153d0bd5c drm/i915: Rename execlists->queue_priority to preempt_priority_hint
40c5003bdadf drm/i915/execlists: Suppress preempting self
-:14: WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#14: 
value as it may match a second preemption request within the same time period

-:34: ERROR:GIT_COMMIT_ID: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit a2bf92e8cc16 ("drm/i915/execlists: Avoid kicking priority on the current context")'
#34: 
References: a2bf92e8cc16 ("drm/i915/execlists: Avoid kicking priority on the current context")

-:310: WARNING:SUSPECT_CODE_INDENT: suspect code indent for conditional statements (8, 15)
#310: FILE: drivers/gpu/drm/i915/selftests/intel_lrc.c:321:
+	if (USES_GUC_SUBMISSION(i915))
+	       return 0; /* presume black blox */

-:311: WARNING:TABSTOP: Statements should start on a tabstop
#311: FILE: drivers/gpu/drm/i915/selftests/intel_lrc.c:322:
+	       return 0; /* presume black blox */

total: 1 errors, 3 warnings, 0 checks, 331 lines checked
df9f13ca4ec4 drm/i915/execlists: Suppress redundant preemption
7130ed077421 drm/i915/selftests: Exercise some AB...BA preemption chains
86ccc8e3bbae drm/i915: Stop tracking MRU activity on VMA
c13adc525292 drm/i915: Pull VM lists under the VM mutex.
14422ca9a152 drm/i915: Move vma lookup to its own lock
-:161: WARNING:USE_SPINLOCK_T: struct spinlock should be spinlock_t
#161: FILE: drivers/gpu/drm/i915/i915_gem_object.h:94:
+		struct spinlock lock;

total: 0 errors, 1 warnings, 0 checks, 290 lines checked
4ee1baed5b4f drm/i915: Always allocate an object/vma for the HWSP
db09c28b2b22 drm/i915: Add timeline barrier support
25fd0b43e01d drm/i915: Move list of timelines under its own lock
7668d538487a drm/i915: Introduce concept of per-timeline (context) HWSP
ea85f8149c30 drm/i915: Enlarge vma->pin_count
0bd928489b01 drm/i915: Allocate a status page for each timeline
acd492aca17a drm/i915: Share per-timeline HWSP using a slab suballocator
-:85: CHECK:SPACING: No space is necessary after a cast
#85: FILE: drivers/gpu/drm/i915/i915_timeline.c:49:
+	BUILD_BUG_ON(BITS_PER_TYPE(u64) * CACHELINE_BYTES > PAGE_SIZE);

total: 0 errors, 0 warnings, 1 checks, 408 lines checked
986a556f530d drm/i915: Track the context's seqno in its own timeline HWSP
054cafc5f0a9 drm/i915: Track active timelines
fc685c80d89a drm/i915: Identify active requests
e1f1d9e2cffc drm/i915: Remove the intel_engine_notify tracepoint
f3a2f67ac514 drm/i915: Replace global breadcrumbs with per-context interrupt tracking
-:18: ERROR:GIT_COMMIT_ID: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit 688e6c725816 ("drm/i915: Slaughter the thundering i915_wait_request herd")'
#18: 
Before commit 688e6c725816, the solution was simple. Every client

-:21: ERROR:GIT_COMMIT_ID: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit 688e6c725816 ("drm/i915: Slaughter the thundering i915_wait_request herd")'
#21: 
688e6c725816 introduced an rbtree so that only the earliest waiter on

-:56: WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#56: 
References: 688e6c725816 ("drm/i915: Slaughter the thundering i915_wait_request herd")

-:56: ERROR:GIT_COMMIT_ID: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit 688e6c725816 ("drm/i915: Slaughter the thundering i915_wait_request herd")'
#56: 
References: 688e6c725816 ("drm/i915: Slaughter the thundering i915_wait_request herd")

-:2194: WARNING:FUNCTION_ARGUMENTS: function definition argument 'struct i915_gem_context *' should also have an identifier name
#2194: FILE: drivers/gpu/drm/i915/selftests/i915_request.c:258:
+	struct i915_request *(*request_alloc)(struct i915_gem_context *,

-:2194: WARNING:FUNCTION_ARGUMENTS: function definition argument 'struct intel_engine_cs *' should also have an identifier name
#2194: FILE: drivers/gpu/drm/i915/selftests/i915_request.c:258:
+	struct i915_request *(*request_alloc)(struct i915_gem_context *,

-:2220: WARNING:LINE_SPACING: Missing a blank line after declarations
#2220: FILE: drivers/gpu/drm/i915/selftests/i915_request.c:284:
+	struct i915_request **requests;
+	I915_RND_STATE(prng);

-:2652: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#2652: 
deleted file mode 100644

total: 3 errors, 5 warnings, 0 checks, 2586 lines checked
65852a23b056 drm/i915: Drop fake breadcrumb irq
c1f917a45d01 drm/i915: Generalise GPU activity tracking
-:31: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#31: 
new file mode 100644

-:36: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#36: FILE: drivers/gpu/drm/i915/i915_active.c:1:
+/*

-:268: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#268: FILE: drivers/gpu/drm/i915/i915_active.h:1:
+/*

-:673: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#673: FILE: drivers/gpu/drm/i915/selftests/i915_active.c:1:
+/*

total: 0 errors, 4 warnings, 0 checks, 777 lines checked
f2e9ea6da317 drm/i915: Allocate active tracking nodes from a slabcache
45f78b1208c6 drm/i915: Pull i915_gem_active into the i915_active family
-:689: CHECK:PARENTHESIS_ALIGNMENT: Alignment should match open parenthesis
#689: FILE: drivers/gpu/drm/i915/i915_gem_fence_reg.c:227:
+		ret = i915_active_request_retire(&vma->last_fence,
 					     &vma->obj->base.dev->struct_mutex);

-:698: CHECK:PARENTHESIS_ALIGNMENT: Alignment should match open parenthesis
#698: FILE: drivers/gpu/drm/i915/i915_gem_fence_reg.c:236:
+		ret = i915_active_request_retire(&old->last_fence,
 					     &old->obj->base.dev->struct_mutex);

-:1405: CHECK:PARENTHESIS_ALIGNMENT: Alignment should match open parenthesis
#1405: FILE: drivers/gpu/drm/i915/i915_vma.c:991:
+		ret = i915_active_request_retire(&vma->last_fence,
+					      &vma->vm->i915->drm.struct_mutex);

total: 0 errors, 0 warnings, 3 checks, 1378 lines checked
27f6d4f4cd1c drm/i915: Keep timeline HWSP allocated until the system is idle
14aaf2f6c78e drm/i915/execlists: Refactor out can_merge_rq()
e5a1d3d08737 drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+
-:310: CHECK:SPACING: spaces preferred around that '<<' (ctx:VxV)
#310: FILE: drivers/gpu/drm/i915/intel_gpu_commands.h:109:
+#define   MI_SEMAPHORE_SAD_GT_SDD	(0<<12)
                                  	  ^

-:312: CHECK:SPACING: spaces preferred around that '<<' (ctx:VxV)
#312: FILE: drivers/gpu/drm/i915/intel_gpu_commands.h:111:
+#define   MI_SEMAPHORE_SAD_LT_SDD	(2<<12)
                                  	  ^

-:313: CHECK:SPACING: spaces preferred around that '<<' (ctx:VxV)
#313: FILE: drivers/gpu/drm/i915/intel_gpu_commands.h:112:
+#define   MI_SEMAPHORE_SAD_LTE_SDD	(3<<12)
                                   	  ^

-:314: CHECK:SPACING: spaces preferred around that '<<' (ctx:VxV)
#314: FILE: drivers/gpu/drm/i915/intel_gpu_commands.h:113:
+#define   MI_SEMAPHORE_SAD_EQ_SDD	(4<<12)
                                  	  ^

-:315: CHECK:SPACING: spaces preferred around that '<<' (ctx:VxV)
#315: FILE: drivers/gpu/drm/i915/intel_gpu_commands.h:114:
+#define   MI_SEMAPHORE_SAD_NEQ_SDD	(5<<12)
                                   	  ^

total: 0 errors, 0 warnings, 5 checks, 274 lines checked
c24e37a562af drm/i915: Prioritise non-busywait semaphore workloads

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* ✗ Fi.CI.SPARSE: warning for series starting with [01/28] drm/i915: Wait for a moment before forcibly resetting the device
  2019-01-28  1:02 [PATCH 01/28] drm/i915: Wait for a moment before forcibly resetting the device Chris Wilson
                   ` (27 preceding siblings ...)
  2019-01-28  2:33 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/28] drm/i915: Wait for a moment before forcibly resetting the device Patchwork
@ 2019-01-28  2:46 ` Patchwork
  2019-01-28  2:57 ` ✓ Fi.CI.BAT: success " Patchwork
                   ` (2 subsequent siblings)
  31 siblings, 0 replies; 39+ messages in thread
From: Patchwork @ 2019-01-28  2:46 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/28] drm/i915: Wait for a moment before forcibly resetting the device
URL   : https://patchwork.freedesktop.org/series/55819/
State : warning

== Summary ==

$ dim sparse origin/drm-tip
Sparse version: v0.5.2
Commit: drm/i915: Wait for a moment before forcibly resetting the device
Okay!

Commit: drm/i915: Rename execlists->queue_priority to preempt_priority_hint
Okay!

Commit: drm/i915/execlists: Suppress preempting self
-drivers/gpu/drm/i915/intel_ringbuffer.h:601:23: warning: expression using sizeof(void)

Commit: drm/i915/execlists: Suppress redundant preemption
Okay!

Commit: drm/i915/selftests: Exercise some AB...BA preemption chains
Okay!

Commit: drm/i915: Stop tracking MRU activity on VMA
Okay!

Commit: drm/i915: Pull VM lists under the VM mutex.
Okay!

Commit: drm/i915: Move vma lookup to its own lock
Okay!

Commit: drm/i915: Always allocate an object/vma for the HWSP
Okay!

Commit: drm/i915: Add timeline barrier support
Okay!

Commit: drm/i915: Move list of timelines under its own lock
-drivers/gpu/drm/i915/selftests/../i915_drv.h:3541:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/../i915_drv.h:3544:16: warning: expression using sizeof(void)

Commit: drm/i915: Introduce concept of per-timeline (context) HWSP
Okay!

Commit: drm/i915: Enlarge vma->pin_count
Okay!

Commit: drm/i915: Allocate a status page for each timeline
+./include/linux/mm.h:619:13: error: not a function <noident>
+./include/linux/mm.h:619:13: error: undefined identifier '__builtin_mul_overflow'
+./include/linux/mm.h:619:13: warning: call with no type!

Commit: drm/i915: Share per-timeline HWSP using a slab suballocator
-drivers/gpu/drm/i915/selftests/../i915_drv.h:3544:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/../i915_drv.h:3548:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/i915_timeline.c:89:38: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/i915_timeline.c:89:38: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/i915_timeline.c:92:44: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/i915_timeline.c:92:44: warning: expression using sizeof(void)
+./include/linux/slab.h:664:13: error: undefined identifier '__builtin_mul_overflow'
+./include/linux/slab.h:664:13: warning: call with no type!

Commit: drm/i915: Track the context's seqno in its own timeline HWSP
Okay!

Commit: drm/i915: Track active timelines
Okay!

Commit: drm/i915: Identify active requests
Okay!

Commit: drm/i915: Remove the intel_engine_notify tracepoint
Okay!

Commit: drm/i915: Replace global breadcrumbs with per-context interrupt tracking
+drivers/gpu/drm/i915/selftests/i915_request.c:280:40: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/i915_request.c:280:40: warning: expression using sizeof(void)
-./include/linux/mm.h:619:13: error: not a function <noident>
-./include/linux/mm.h:619:13: error: not a function <noident>
-./include/linux/mm.h:619:13: error: undefined identifier '__builtin_mul_overflow'
-./include/linux/mm.h:619:13: warning: call with no type!
+./include/linux/slab.h:664:13: error: not a function <noident>
+./include/linux/slab.h:664:13: error: not a function <noident>

Commit: drm/i915: Drop fake breadcrumb irq
Okay!

Commit: drm/i915: Generalise GPU activity tracking
+./include/uapi/linux/perf_event.h:147:56: warning: cast truncates bits from constant value (8000000000000000 becomes 0)

Commit: drm/i915: Allocate active tracking nodes from a slabcache
-drivers/gpu/drm/i915/selftests/../i915_drv.h:3548:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/../i915_drv.h:3550:16: warning: expression using sizeof(void)

Commit: drm/i915: Pull i915_gem_active into the i915_active family
Okay!

Commit: drm/i915: Keep timeline HWSP allocated until the system is idle
Okay!

Commit: drm/i915/execlists: Refactor out can_merge_rq()
Okay!

Commit: drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+
Okay!

Commit: drm/i915: Prioritise non-busywait semaphore workloads
Okay!

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* ✓ Fi.CI.BAT: success for series starting with [01/28] drm/i915: Wait for a moment before forcibly resetting the device
  2019-01-28  1:02 [PATCH 01/28] drm/i915: Wait for a moment before forcibly resetting the device Chris Wilson
                   ` (28 preceding siblings ...)
  2019-01-28  2:46 ` ✗ Fi.CI.SPARSE: " Patchwork
@ 2019-01-28  2:57 ` Patchwork
  2019-01-28  4:23 ` ✗ Fi.CI.IGT: failure " Patchwork
  2019-01-28  9:24 ` [PATCH 01/28] " Mika Kuoppala
  31 siblings, 0 replies; 39+ messages in thread
From: Patchwork @ 2019-01-28  2:57 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/28] drm/i915: Wait for a moment before forcibly resetting the device
URL   : https://patchwork.freedesktop.org/series/55819/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_5490 -> Patchwork_12055
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/55819/revisions/1/mbox/

Known issues
------------

  Here are the changes found in Patchwork_12055 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@kms_frontbuffer_tracking@basic:
    - fi-byt-clapper:     PASS -> FAIL [fdo#103167]

  * igt@kms_pipe_crc_basic@hang-read-crc-pipe-b:
    - fi-byt-clapper:     PASS -> FAIL [fdo#103191] / [fdo#107362]

  
#### Possible fixes ####

  * igt@gem_exec_reloc@basic-gtt-noreloc:
    - fi-skl-guc:         {SKIP} [fdo#109271] -> PASS +81

  * igt@gem_exec_suspend@basic-s4-devices:
    - fi-blb-e6850:       INCOMPLETE [fdo#107718] -> PASS

  * igt@kms_pipe_crc_basic@nonblocking-crc-pipe-a-frame-sequence:
    - fi-byt-clapper:     FAIL [fdo#103191] / [fdo#107362] -> PASS

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#103167]: https://bugs.freedesktop.org/show_bug.cgi?id=103167
  [fdo#103191]: https://bugs.freedesktop.org/show_bug.cgi?id=103191
  [fdo#107362]: https://bugs.freedesktop.org/show_bug.cgi?id=107362
  [fdo#107718]: https://bugs.freedesktop.org/show_bug.cgi?id=107718
  [fdo#108654]: https://bugs.freedesktop.org/show_bug.cgi?id=108654
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109278]: https://bugs.freedesktop.org/show_bug.cgi?id=109278


Participating hosts (43 -> 38)
------------------------------

  Missing    (5): fi-kbl-soraka fi-ilk-m540 fi-byt-squawks fi-bsw-cyan fi-apl-guc 


Build changes
-------------

    * Linux: CI_DRM_5490 -> Patchwork_12055

  CI_DRM_5490: 310d38b4b51e06ef7096716430e2ef262c3e45fe @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4790: dcdf4b04e16312f8f52ad389388d834f9d74b8f0 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_12055: c24e37a562af5f885681aad621d1f6b25aca65a9 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

c24e37a562af drm/i915: Prioritise non-busywait semaphore workloads
e5a1d3d08737 drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+
14aaf2f6c78e drm/i915/execlists: Refactor out can_merge_rq()
27f6d4f4cd1c drm/i915: Keep timeline HWSP allocated until the system is idle
45f78b1208c6 drm/i915: Pull i915_gem_active into the i915_active family
f2e9ea6da317 drm/i915: Allocate active tracking nodes from a slabcache
c1f917a45d01 drm/i915: Generalise GPU activity tracking
65852a23b056 drm/i915: Drop fake breadcrumb irq
f3a2f67ac514 drm/i915: Replace global breadcrumbs with per-context interrupt tracking
e1f1d9e2cffc drm/i915: Remove the intel_engine_notify tracepoint
fc685c80d89a drm/i915: Identify active requests
054cafc5f0a9 drm/i915: Track active timelines
986a556f530d drm/i915: Track the context's seqno in its own timeline HWSP
acd492aca17a drm/i915: Share per-timeline HWSP using a slab suballocator
0bd928489b01 drm/i915: Allocate a status page for each timeline
ea85f8149c30 drm/i915: Enlarge vma->pin_count
7668d538487a drm/i915: Introduce concept of per-timeline (context) HWSP
25fd0b43e01d drm/i915: Move list of timelines under its own lock
db09c28b2b22 drm/i915: Add timeline barrier support
4ee1baed5b4f drm/i915: Always allocate an object/vma for the HWSP
14422ca9a152 drm/i915: Move vma lookup to its own lock
c13adc525292 drm/i915: Pull VM lists under the VM mutex.
86ccc8e3bbae drm/i915: Stop tracking MRU activity on VMA
7130ed077421 drm/i915/selftests: Exercise some AB...BA preemption chains
df9f13ca4ec4 drm/i915/execlists: Suppress redundant preemption
40c5003bdadf drm/i915/execlists: Suppress preempting self
5ad153d0bd5c drm/i915: Rename execlists->queue_priority to preempt_priority_hint
d1a18083cdf2 drm/i915: Wait for a moment before forcibly resetting the device

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12055/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* ✗ Fi.CI.IGT: failure for series starting with [01/28] drm/i915: Wait for a moment before forcibly resetting the device
  2019-01-28  1:02 [PATCH 01/28] drm/i915: Wait for a moment before forcibly resetting the device Chris Wilson
                   ` (29 preceding siblings ...)
  2019-01-28  2:57 ` ✓ Fi.CI.BAT: success " Patchwork
@ 2019-01-28  4:23 ` Patchwork
  2019-01-28  9:24 ` [PATCH 01/28] " Mika Kuoppala
  31 siblings, 0 replies; 39+ messages in thread
From: Patchwork @ 2019-01-28  4:23 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/28] drm/i915: Wait for a moment before forcibly resetting the device
URL   : https://patchwork.freedesktop.org/series/55819/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_5490_full -> Patchwork_12055_full
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_12055_full absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_12055_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_12055_full:

### IGT changes ###

#### Possible regressions ####

  * igt@i915_selftest@mock_timelines:
    - shard-glk:          PASS -> DMESG-WARN
    - shard-hsw:          PASS -> DMESG-WARN
    - shard-apl:          PASS -> DMESG-WARN

  
Known issues
------------

  Here are the changes found in Patchwork_12055_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_eio@reset-stress:
    - shard-snb:          PASS -> INCOMPLETE [fdo#105411]

  * igt@gem_exec_schedule@pi-ringfull-blt:
    - shard-glk:          NOTRUN -> FAIL [fdo#103158]
    - shard-apl:          NOTRUN -> FAIL [fdo#103158]

  * igt@gem_exec_schedule@pi-ringfull-bsd2:
    - shard-kbl:          NOTRUN -> FAIL [fdo#103158] +1

  * igt@kms_color@pipe-a-degamma:
    - shard-apl:          PASS -> FAIL [fdo#104782] / [fdo#108145]

  * igt@kms_cursor_crc@cursor-64x64-dpms:
    - shard-apl:          PASS -> FAIL [fdo#103232]

  * igt@kms_cursor_crc@cursor-64x64-suspend:
    - shard-glk:          PASS -> FAIL [fdo#103232] +2

  * igt@kms_cursor_legacy@2x-long-nonblocking-modeset-vs-cursor-atomic:
    - shard-glk:          NOTRUN -> FAIL [fdo#105454]

  * igt@kms_flip@2x-flip-vs-expired-vblank:
    - shard-glk:          PASS -> FAIL [fdo#105363]

  * igt@kms_plane@pixel-format-pipe-c-planes:
    - shard-apl:          PASS -> FAIL [fdo#103166]

  * igt@kms_plane_alpha_blend@pipe-b-constant-alpha-max:
    - shard-glk:          NOTRUN -> FAIL [fdo#108145]
    - shard-apl:          NOTRUN -> FAIL [fdo#108145]

  * igt@kms_plane_multiple@atomic-pipe-b-tiling-none:
    - shard-glk:          PASS -> FAIL [fdo#103166] +3

  * igt@kms_setmode@basic:
    - shard-kbl:          PASS -> FAIL [fdo#99912]

  * igt@kms_vblank@pipe-c-ts-continuation-suspend:
    - shard-glk:          PASS -> INCOMPLETE [fdo#103359] / [k.org#198133]

  * igt@syncobj_wait@single-wait-all-for-submit-unsubmitted:
    - shard-apl:          PASS -> INCOMPLETE [fdo#103927]

  
#### Possible fixes ####

  * igt@gem_eio@in-flight-contexts-1us:
    - shard-kbl:          DMESG-WARN -> PASS +2
    - shard-apl:          DMESG-WARN -> PASS +3

  * igt@gem_eio@in-flight-external:
    - shard-glk:          DMESG-WARN -> PASS +3

  * igt@i915_selftest@live_requests:
    - shard-apl:          DMESG-FAIL -> PASS

  * igt@kms_cursor_crc@cursor-64x64-sliding:
    - shard-glk:          FAIL [fdo#103232] -> PASS +1

  * igt@kms_flip@flip-vs-expired-vblank:
    - shard-glk:          FAIL [fdo#102887] / [fdo#105363] -> PASS

  * igt@kms_plane_multiple@atomic-pipe-a-tiling-x:
    - shard-apl:          FAIL [fdo#103166] -> PASS +2

  * igt@kms_plane_multiple@atomic-pipe-c-tiling-y:
    - shard-glk:          FAIL [fdo#103166] -> PASS

  * igt@kms_rotation_crc@multiplane-rotation-cropping-top:
    - shard-apl:          DMESG-FAIL [fdo#108950] -> PASS

  * igt@perf_pmu@rc6-runtime-pm-long:
    - shard-kbl:          FAIL [fdo#105010] -> PASS

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#102887]: https://bugs.freedesktop.org/show_bug.cgi?id=102887
  [fdo#103158]: https://bugs.freedesktop.org/show_bug.cgi?id=103158
  [fdo#103166]: https://bugs.freedesktop.org/show_bug.cgi?id=103166
  [fdo#103232]: https://bugs.freedesktop.org/show_bug.cgi?id=103232
  [fdo#103359]: https://bugs.freedesktop.org/show_bug.cgi?id=103359
  [fdo#103927]: https://bugs.freedesktop.org/show_bug.cgi?id=103927
  [fdo#104782]: https://bugs.freedesktop.org/show_bug.cgi?id=104782
  [fdo#105010]: https://bugs.freedesktop.org/show_bug.cgi?id=105010
  [fdo#105363]: https://bugs.freedesktop.org/show_bug.cgi?id=105363
  [fdo#105411]: https://bugs.freedesktop.org/show_bug.cgi?id=105411
  [fdo#105454]: https://bugs.freedesktop.org/show_bug.cgi?id=105454
  [fdo#108145]: https://bugs.freedesktop.org/show_bug.cgi?id=108145
  [fdo#108950]: https://bugs.freedesktop.org/show_bug.cgi?id=108950
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109278]: https://bugs.freedesktop.org/show_bug.cgi?id=109278
  [fdo#109373]: https://bugs.freedesktop.org/show_bug.cgi?id=109373
  [fdo#99912]: https://bugs.freedesktop.org/show_bug.cgi?id=99912
  [k.org#198133]: https://bugzilla.kernel.org/show_bug.cgi?id=198133
  [k.org#202321]: https://bugzilla.kernel.org/show_bug.cgi?id=202321


Participating hosts (7 -> 5)
------------------------------

  Missing    (2): shard-skl shard-iclb 


Build changes
-------------

    * Linux: CI_DRM_5490 -> Patchwork_12055

  CI_DRM_5490: 310d38b4b51e06ef7096716430e2ef262c3e45fe @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4790: dcdf4b04e16312f8f52ad389388d834f9d74b8f0 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_12055: c24e37a562af5f885681aad621d1f6b25aca65a9 @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12055/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH] drm/i915: Allocate active tracking nodes from a slabcache
  2019-01-28  1:02 ` [PATCH 22/28] drm/i915: Generalise GPU activity tracking Chris Wilson
@ 2019-01-28  8:09   ` Chris Wilson
  0 siblings, 0 replies; 39+ messages in thread
From: Chris Wilson @ 2019-01-28  8:09 UTC (permalink / raw)
  To: intel-gfx

Wrap the active tracking for GPU references in a slabcache for faster
allocations, and keep track of inflight nodes so we can reap the
stale entries upon parking (thereby trimming our memory usage).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
Move i915_gt_active_fini() out of struct_mutex for mock device teardown;
beware the rcu_barrier() inside kmem_cache_destroy.
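
The hazard, in sketch form (mirroring the mock device teardown in the
diff below; illustrative only):

	mutex_unlock(&i915->drm.struct_mutex);
	i915_timelines_fini(i915);
	i915_gt_active_fini(i915_gt_active(i915)); /* may rcu_barrier() */

kmem_cache_destroy() can block in rcu_barrier(), so it must not be
called with struct_mutex held.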
---
 drivers/gpu/drm/i915/i915_active.c            | 55 ++++++++++++++++---
 drivers/gpu/drm/i915/i915_active.h            | 29 +++++++++-
 drivers/gpu/drm/i915/i915_drv.h               |  2 +
 drivers/gpu/drm/i915/i915_gem.c               | 16 +++++-
 drivers/gpu/drm/i915/i915_gem_gtt.c           |  2 +-
 drivers/gpu/drm/i915/i915_vma.c               |  3 +-
 drivers/gpu/drm/i915/selftests/i915_active.c  |  3 +-
 .../gpu/drm/i915/selftests/mock_gem_device.c  |  6 ++
 8 files changed, 100 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
index e0182e19cb8b..3c7abbde42ac 100644
--- a/drivers/gpu/drm/i915/i915_active.c
+++ b/drivers/gpu/drm/i915/i915_active.c
@@ -7,7 +7,9 @@
 #include "i915_drv.h"
 #include "i915_active.h"
 
-#define BKL(ref) (&(ref)->i915->drm.struct_mutex)
+#define i915_from_gt(x) \
+	container_of(x, struct drm_i915_private, gt.active_refs)
+#define BKL(ref) (&i915_from_gt((ref)->gt)->drm.struct_mutex)
 
 struct active_node {
 	struct i915_gem_active base;
@@ -79,11 +81,11 @@ active_instance(struct i915_active *ref, u64 idx)
 			p = &parent->rb_left;
 	}
 
-	node = kmalloc(sizeof(*node), GFP_KERNEL);
+	node = kmem_cache_alloc(ref->gt->slab_cache, GFP_KERNEL);
 
 	/* kmalloc may retire the ref->last (thanks shrinker)! */
 	if (unlikely(!i915_gem_active_raw(&ref->last, BKL(ref)))) {
-		kfree(node);
+		kmem_cache_free(ref->gt->slab_cache, node);
 		goto out;
 	}
 
@@ -94,6 +96,9 @@ active_instance(struct i915_active *ref, u64 idx)
 	node->ref = ref;
 	node->timeline = idx;
 
+	if (RB_EMPTY_ROOT(&ref->tree))
+		list_add(&ref->active_link, &ref->gt->active_refs);
+
 	rb_link_node(&node->node, parent, p);
 	rb_insert_color(&node->node, &ref->tree);
 
@@ -119,11 +124,11 @@ active_instance(struct i915_active *ref, u64 idx)
 	return &ref->last;
 }
 
-void i915_active_init(struct drm_i915_private *i915,
+void i915_active_init(struct i915_gt_active *gt,
 		      struct i915_active *ref,
 		      void (*retire)(struct i915_active *ref))
 {
-	ref->i915 = i915;
+	ref->gt = gt;
 	ref->retire = retire;
 	ref->tree = RB_ROOT;
 	init_request_active(&ref->last, last_retire);
@@ -161,6 +166,7 @@ void i915_active_release(struct i915_active *ref)
 
 int i915_active_wait(struct i915_active *ref)
 {
+	struct kmem_cache *slab = ref->gt->slab_cache;
 	struct active_node *it, *n;
 	int ret;
 
@@ -168,15 +174,19 @@ int i915_active_wait(struct i915_active *ref)
 	if (ret)
 		return ret;
 
+	if (RB_EMPTY_ROOT(&ref->tree))
+		return 0;
+
 	rbtree_postorder_for_each_entry_safe(it, n, &ref->tree, node) {
 		ret = i915_gem_active_retire(&it->base, BKL(ref));
 		if (ret)
 			return ret;
 
 		GEM_BUG_ON(i915_gem_active_isset(&it->base));
-		kfree(it);
+		kmem_cache_free(slab, it);
 	}
 	ref->tree = RB_ROOT;
+	list_del(&ref->active_link);
 
 	return 0;
 }
@@ -210,15 +220,46 @@ int i915_request_await_active(struct i915_request *rq, struct i915_active *ref)
 
 void i915_active_fini(struct i915_active *ref)
 {
+	struct kmem_cache *slab = ref->gt->slab_cache;
 	struct active_node *it, *n;
 
+	lockdep_assert_held(BKL(ref));
 	GEM_BUG_ON(i915_gem_active_isset(&ref->last));
 
+	if (RB_EMPTY_ROOT(&ref->tree))
+		return;
+
 	rbtree_postorder_for_each_entry_safe(it, n, &ref->tree, node) {
 		GEM_BUG_ON(i915_gem_active_isset(&it->base));
-		kfree(it);
+		kmem_cache_free(slab, it);
 	}
 	ref->tree = RB_ROOT;
+	list_del(&ref->active_link);
+}
+
+int i915_gt_active_init(struct i915_gt_active *gt)
+{
+	gt->slab_cache = KMEM_CACHE(active_node, SLAB_HWCACHE_ALIGN);
+	if (!gt->slab_cache)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&gt->active_refs);
+
+	return 0;
+}
+
+void i915_gt_active_park(struct i915_gt_active *gt)
+{
+	struct i915_active *it, *n;
+
+	list_for_each_entry_safe(it, n, &gt->active_refs, active_link)
+		i915_active_fini(it);
+}
+
+void i915_gt_active_fini(struct i915_gt_active *gt)
+{
+	GEM_BUG_ON(!list_empty(&gt->active_refs));
+	kmem_cache_destroy(gt->slab_cache);
 }
 
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
diff --git a/drivers/gpu/drm/i915/i915_active.h b/drivers/gpu/drm/i915/i915_active.h
index 0057b84565f7..eed34e8903fc 100644
--- a/drivers/gpu/drm/i915/i915_active.h
+++ b/drivers/gpu/drm/i915/i915_active.h
@@ -7,11 +7,13 @@
 #ifndef _I915_ACTIVE_H_
 #define _I915_ACTIVE_H_
 
+#include <linux/list.h>
 #include <linux/rbtree.h>
 
 #include "i915_request.h"
 
-struct drm_i915_private;
+struct i915_gt_active;
+struct kmem_cache;
 
 /*
  * GPU activity tracking
@@ -40,7 +42,8 @@ struct drm_i915_private;
  */
 
 struct i915_active {
-	struct drm_i915_private *i915;
+	struct i915_gt_active *gt;
+	struct list_head active_link;
 
 	struct rb_root tree;
 	struct i915_gem_active last;
@@ -49,7 +52,7 @@ struct i915_active {
 	void (*retire)(struct i915_active *ref);
 };
 
-void i915_active_init(struct drm_i915_private *i915,
+void i915_active_init(struct i915_gt_active *gt,
 		      struct i915_active *ref,
 		      void (*retire)(struct i915_active *ref));
 
@@ -73,4 +76,24 @@ i915_active_is_idle(const struct i915_active *ref)
 
 void i915_active_fini(struct i915_active *ref);
 
+/*
+ * Active refs memory management
+ *
+ * To be more economical with memory, we reap all the i915_active trees on
+ * parking the GPU (when we know the GPU is inactive) and allocate the nodes
+ * from a local slab cache to hopefully reduce the fragmentation as we will
+ * then be able to free all pages en masse upon idling.
+ */
+
+struct i915_gt_active {
+	struct list_head active_refs;
+	struct kmem_cache *slab_cache;
+};
+
+int i915_gt_active_init(struct i915_gt_active *gt);
+void i915_gt_active_park(struct i915_gt_active *gt);
+void i915_gt_active_fini(struct i915_gt_active *gt);
+
+#define i915_gt_active(i915) (&(i915)->gt.active_refs)
+
 #endif /* _I915_ACTIVE_H_ */
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index d072f3369ee1..4a2387f89eb0 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1984,6 +1984,8 @@ struct drm_i915_private {
 			struct list_head hwsp_free_list;
 		} timelines;
 
+		struct i915_gt_active active_refs;
+
 		struct list_head active_rings;
 		struct list_head closed_vma;
 		u32 active_requests;
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index e802af64d628..8edc00212118 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -171,6 +171,7 @@ static u32 __i915_gem_park(struct drm_i915_private *i915)
 
 	intel_engines_park(i915);
 	i915_timelines_park(i915);
+	i915_gt_active_park(i915_gt_active(i915));
 
 	i915_pmu_gt_parked(i915);
 	i915_vma_parked(i915);
@@ -5031,15 +5032,19 @@ int i915_gem_init(struct drm_i915_private *dev_priv)
 		dev_priv->gt.cleanup_engine = intel_engine_cleanup;
 	}
 
+	ret = i915_gt_active_init(i915_gt_active(dev_priv));
+	if (ret)
+		return ret;
+
 	i915_timelines_init(dev_priv);
 
 	ret = i915_gem_init_userptr(dev_priv);
 	if (ret)
-		return ret;
+		goto err_timelines;
 
 	ret = intel_uc_init_misc(dev_priv);
 	if (ret)
-		return ret;
+		goto err_userptr;
 
 	ret = intel_wopcm_init(&dev_priv->wopcm);
 	if (ret)
@@ -5155,9 +5160,13 @@ int i915_gem_init(struct drm_i915_private *dev_priv)
 err_uc_misc:
 	intel_uc_fini_misc(dev_priv);
 
-	if (ret != -EIO) {
+err_userptr:
+	if (ret != -EIO)
 		i915_gem_cleanup_userptr(dev_priv);
+err_timelines:
+	if (ret != -EIO) {
 		i915_timelines_fini(dev_priv);
+		i915_gt_active_fini(i915_gt_active(dev_priv));
 	}
 
 	if (ret == -EIO) {
@@ -5210,6 +5219,7 @@ void i915_gem_fini(struct drm_i915_private *dev_priv)
 	intel_uc_fini_misc(dev_priv);
 	i915_gem_cleanup_userptr(dev_priv);
 	i915_timelines_fini(dev_priv);
+	i915_gt_active_fini(i915_gt_active(dev_priv));
 
 	i915_gem_drain_freed_objects(dev_priv);
 
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index e625659c03a2..d8819de0d6ee 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1917,7 +1917,7 @@ static struct i915_vma *pd_vma_create(struct gen6_hw_ppgtt *ppgtt, int size)
 	if (!vma)
 		return ERR_PTR(-ENOMEM);
 
-	i915_active_init(i915, &vma->active, NULL);
+	i915_active_init(i915_gt_active(i915), &vma->active, NULL);
 	init_request_active(&vma->last_fence, NULL);
 
 	vma->vm = &ggtt->vm;
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index d4772061e642..2456bfb4877b 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -119,7 +119,8 @@ vma_create(struct drm_i915_gem_object *obj,
 	if (vma == NULL)
 		return ERR_PTR(-ENOMEM);
 
-	i915_active_init(vm->i915, &vma->active, __i915_vma_retire);
+	i915_active_init(i915_gt_active(vm->i915),
+			 &vma->active, __i915_vma_retire);
 	init_request_active(&vma->last_fence, NULL);
 
 	vma->vm = vm;
diff --git a/drivers/gpu/drm/i915/selftests/i915_active.c b/drivers/gpu/drm/i915/selftests/i915_active.c
index 7c5c3068565b..0e923476920e 100644
--- a/drivers/gpu/drm/i915/selftests/i915_active.c
+++ b/drivers/gpu/drm/i915/selftests/i915_active.c
@@ -30,7 +30,8 @@ static int __live_active_setup(struct drm_i915_private *i915,
 	unsigned int count = 0;
 	int err = 0;
 
-	i915_active_init(i915, &active->base, __live_active_retire);
+	i915_active_init(i915_gt_active(i915),
+			 &active->base, __live_active_retire);
 	active->retired = false;
 
 	if (!i915_active_acquire(&active->base)) {
diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
index 14ae46fda49f..40d7813a024e 100644
--- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
+++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
@@ -69,6 +69,7 @@ static void mock_device_release(struct drm_device *dev)
 	mutex_unlock(&i915->drm.struct_mutex);
 
 	i915_timelines_fini(i915);
+	i915_gt_active_fini(i915_gt_active(i915));
 
 	drain_workqueue(i915->wq);
 	i915_gem_drain_freed_objects(i915);
@@ -227,6 +228,9 @@ struct drm_i915_private *mock_gem_device(void)
 	if (!i915->priorities)
 		goto err_dependencies;
 
+	if (i915_gt_active_init(i915_gt_active(i915)))
+		goto err_priorities;
+
 	i915_timelines_init(i915);
 
 	INIT_LIST_HEAD(&i915->gt.active_rings);
@@ -256,6 +260,8 @@ struct drm_i915_private *mock_gem_device(void)
 err_unlock:
 	mutex_unlock(&i915->drm.struct_mutex);
 	i915_timelines_fini(i915);
+	i915_gt_active_fini(i915_gt_active(i915));
+err_priorities:
 	kmem_cache_destroy(i915->priorities);
 err_dependencies:
 	kmem_cache_destroy(i915->dependencies);
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH 01/28] drm/i915: Wait for a moment before forcibly resetting the device
  2019-01-28  1:02 [PATCH 01/28] drm/i915: Wait for a moment before forcibly resetting the device Chris Wilson
                   ` (30 preceding siblings ...)
  2019-01-28  4:23 ` ✗ Fi.CI.IGT: failure " Patchwork
@ 2019-01-28  9:24 ` Mika Kuoppala
  2019-01-28  9:38   ` Chris Wilson
  31 siblings, 1 reply; 39+ messages in thread
From: Mika Kuoppala @ 2019-01-28  9:24 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> During igt, we ask to reset the device if any requests are still
> outstanding at the end of a test, as this quickly kills off any
> erroneous hanging request streams that may escape a test. However, since
> it may take the device a few milliseconds to flush itself after the end
> of a normal test, *cough* guc *cough*, we may accidentally tell the
> device to reset itself after it idles. If we wait a moment, our usual
> I915_IDLE_ENGINES_TIMEOUT of 200ms (seems a bit high, but still better
> than umpteen hangchecks!), we can differentiate better between a stuck
> engine and a healthy one, and so avoid prematurely forcing the reset and
> any extra complications that may entail.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 3b995f9fdc06..e46de507fea2 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -4051,7 +4051,8 @@ i915_drop_caches_set(void *data, u64 val)
>  		  val, val & DROP_ALL);
>  	wakeref = intel_runtime_pm_get(i915);
>  
> -	if (val & DROP_RESET_ACTIVE && !intel_engines_are_idle(i915))
> +	if (val & DROP_RESET_ACTIVE &&
> +	    wait_for(intel_engines_are_idle(i915), I915_IDLE_ENGINES_TIMEOUT))
>  		i915_gem_set_wedged(i915);

Some of the complications have been welcomed. But it is still
better to try to capture them in tests explicitly rather
than relying on indirect test-harness stress.

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>


>  
>  	/* No need to check and wait for gpu resets, only libdrm auto-restarts
> -- 
> 2.20.1
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 01/28] drm/i915: Wait for a moment before forcibly resetting the device
  2019-01-28  9:24 ` [PATCH 01/28] " Mika Kuoppala
@ 2019-01-28  9:38   ` Chris Wilson
  0 siblings, 0 replies; 39+ messages in thread
From: Chris Wilson @ 2019-01-28  9:38 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

Quoting Mika Kuoppala (2019-01-28 09:24:12)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > During igt, we ask to reset the device if any requests are still
> > outstanding at the end of a test, as this quickly kills off any
> > erroneous hanging request streams that may escape a test. However, since
> > it may take the device a few milliseconds to flush itself after the end
> > of a normal test, *cough* guc *cough*, we may accidentally tell the
> > device to reset itself after it idles. If we wait a moment, our usual
> > I915_IDLE_ENGINES_TIMEOUT of 200ms (seems a bit high, but still better
> > than umpteen hangchecks!), we can differentiate better between a stuck
> > engine and a healthy one, and so avoid prematurely forcing the reset and
> > any extra complications that may entail.
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_debugfs.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> > index 3b995f9fdc06..e46de507fea2 100644
> > --- a/drivers/gpu/drm/i915/i915_debugfs.c
> > +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> > @@ -4051,7 +4051,8 @@ i915_drop_caches_set(void *data, u64 val)
> >                 val, val & DROP_ALL);
> >       wakeref = intel_runtime_pm_get(i915);
> >  
> > -     if (val & DROP_RESET_ACTIVE && !intel_engines_are_idle(i915))
> > +     if (val & DROP_RESET_ACTIVE &&
> > +         wait_for(intel_engines_are_idle(i915), I915_IDLE_ENGINES_TIMEOUT))
> >               i915_gem_set_wedged(i915);
> 
> Some of the complications have been welcomed. But it is still
> better to try to capture them in tests explicitly rather
> than relying on indirect test-harness stress.
> 
> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

Pushed to mask potential problems in *-guc BAT.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 06/28] drm/i915: Stop tracking MRU activity on VMA
  2019-01-28  1:02 ` [PATCH 06/28] drm/i915: Stop tracking MRU activity on VMA Chris Wilson
@ 2019-01-28 10:09   ` Tvrtko Ursulin
  0 siblings, 0 replies; 39+ messages in thread
From: Tvrtko Ursulin @ 2019-01-28 10:09 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 28/01/2019 01:02, Chris Wilson wrote:
> Our goal is to remove struct_mutex and replace it with fine grained
> locking. One of the thorny issues is our eviction logic for reclaiming
> space for an execbuffer (or GTT mmaping, among a few other examples).
> While eviction itself is easy to move under a per-VM mutex, performing
> the activity tracking is less agreeable. One solution is not to do any
> MRU tracking and do a simple coarse evaluation during eviction of
> active/inactive, with a loose temporal ordering of last
> insertion/evaluation. That keeps all the locking constrained to when we
> are manipulating the VM itself, neatly avoiding the tricky handling of
> possible recursive locking during execbuf and elsewhere.
> 
> Note that discarding the MRU (currently implemented as a pair of lists,
> to avoid scanning the active list for a NONBLOCKING search) is unlikely
> to impact upon our efficiency to reclaim VM space (where we think a LRU
> model is best) as our current strategy is to use random idle replacement
> first before doing a search, and over time the use of softpinned 48b
> per-ppGTT is growing (thereby eliminating any need to perform any eviction
> searches, in theory at least) with the remaining users being found on
> much older devices (gen2-gen6).
> 
> v2: Changelog and commentary rewritten to elaborate on the duality of a
> single list being both an inactive and active list.
> v3: Consolidate bool parameters into a single set of flags; don't
> comment on the duality of a single variable being a multiplicity of
> bits.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

> ---
>   drivers/gpu/drm/i915/i915_gem.c               | 10 +--
>   drivers/gpu/drm/i915/i915_gem_evict.c         | 87 +++++++++++--------
>   drivers/gpu/drm/i915/i915_gem_gtt.c           | 15 ++--
>   drivers/gpu/drm/i915/i915_gem_gtt.h           | 26 +-----
>   drivers/gpu/drm/i915/i915_gem_shrinker.c      |  8 +-
>   drivers/gpu/drm/i915/i915_gem_stolen.c        |  3 +-
>   drivers/gpu/drm/i915/i915_gpu_error.c         | 42 ++++-----
>   drivers/gpu/drm/i915/i915_vma.c               |  9 +-
>   .../gpu/drm/i915/selftests/i915_gem_evict.c   |  4 +-
>   drivers/gpu/drm/i915/selftests/i915_gem_gtt.c |  2 +-
>   10 files changed, 95 insertions(+), 111 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index dcbe644869b3..12a0a80bc989 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -255,10 +255,7 @@ i915_gem_get_aperture_ioctl(struct drm_device *dev, void *data,
>   
>   	pinned = ggtt->vm.reserved;
>   	mutex_lock(&dev->struct_mutex);
> -	list_for_each_entry(vma, &ggtt->vm.active_list, vm_link)
> -		if (i915_vma_is_pinned(vma))
> -			pinned += vma->node.size;
> -	list_for_each_entry(vma, &ggtt->vm.inactive_list, vm_link)
> +	list_for_each_entry(vma, &ggtt->vm.bound_list, vm_link)
>   		if (i915_vma_is_pinned(vma))
>   			pinned += vma->node.size;
>   	mutex_unlock(&dev->struct_mutex);
> @@ -1541,13 +1538,10 @@ static void i915_gem_object_bump_inactive_ggtt(struct drm_i915_gem_object *obj)
>   	GEM_BUG_ON(!i915_gem_object_has_pinned_pages(obj));
>   
>   	for_each_ggtt_vma(vma, obj) {
> -		if (i915_vma_is_active(vma))
> -			continue;
> -
>   		if (!drm_mm_node_allocated(&vma->node))
>   			continue;
>   
> -		list_move_tail(&vma->vm_link, &vma->vm->inactive_list);
> +		list_move_tail(&vma->vm_link, &vma->vm->bound_list);
>   	}
>   
>   	i915 = to_i915(obj->base.dev);
> diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
> index f6855401f247..d76839670632 100644
> --- a/drivers/gpu/drm/i915/i915_gem_evict.c
> +++ b/drivers/gpu/drm/i915/i915_gem_evict.c
> @@ -126,31 +126,25 @@ i915_gem_evict_something(struct i915_address_space *vm,
>   	struct drm_i915_private *dev_priv = vm->i915;
>   	struct drm_mm_scan scan;
>   	struct list_head eviction_list;
> -	struct list_head *phases[] = {
> -		&vm->inactive_list,
> -		&vm->active_list,
> -		NULL,
> -	}, **phase;
>   	struct i915_vma *vma, *next;
>   	struct drm_mm_node *node;
>   	enum drm_mm_insert_mode mode;
> +	struct i915_vma *active;
>   	int ret;
>   
>   	lockdep_assert_held(&vm->i915->drm.struct_mutex);
>   	trace_i915_gem_evict(vm, min_size, alignment, flags);
>   
>   	/*
> -	 * The goal is to evict objects and amalgamate space in LRU order.
> -	 * The oldest idle objects reside on the inactive list, which is in
> -	 * retirement order. The next objects to retire are those in flight,
> -	 * on the active list, again in retirement order.
> +	 * The goal is to evict objects and amalgamate space in rough LRU order.
> +	 * Since both active and inactive objects reside on the same list,
> +	 * in a mix of creation and last scanned order, as we process the list
> +	 * we sort it into inactive/active, which keeps the active portion
> +	 * in a rough MRU order.
>   	 *
>   	 * The retirement sequence is thus:
> -	 *   1. Inactive objects (already retired)
> -	 *   2. Active objects (will stall on unbinding)
> -	 *
> -	 * On each list, the oldest objects lie at the HEAD with the freshest
> -	 * object on the TAIL.
> +	 *   1. Inactive objects (already retired, random order)
> +	 *   2. Active objects (will stall on unbinding, oldest scanned first)
>   	 */
>   	mode = DRM_MM_INSERT_BEST;
>   	if (flags & PIN_HIGH)
> @@ -169,17 +163,46 @@ i915_gem_evict_something(struct i915_address_space *vm,
>   	 */
>   	if (!(flags & PIN_NONBLOCK))
>   		i915_retire_requests(dev_priv);
> -	else
> -		phases[1] = NULL;
>   
>   search_again:
> +	active = NULL;
>   	INIT_LIST_HEAD(&eviction_list);
> -	phase = phases;
> -	do {
> -		list_for_each_entry(vma, *phase, vm_link)
> -			if (mark_free(&scan, vma, flags, &eviction_list))
> -				goto found;
> -	} while (*++phase);
> +	list_for_each_entry_safe(vma, next, &vm->bound_list, vm_link) {
> +		/*
> +		 * We keep this list in a rough least-recently scanned order
> +		 * of active elements (inactive elements are cheap to reap).
> +		 * New entries are added to the end, and we move anything we
> +		 * scan to the end. The assumption is that the working set
> +		 * of applications is either steady state (and thanks to the
> +		 * userspace bo cache it almost always is) or volatile and
> +		 * frequently replaced after a frame, which are self-evicting!
> +		 * Given that assumption, the MRU order of the scan list is
> +		 * fairly static, and keeping it in least-recently scanned order
> +		 * is suitable.
> +		 *
> +		 * To notice when we complete one full cycle, we record the
> +		 * first active element seen, before moving it to the tail.
> +		 */
> +		if (i915_vma_is_active(vma)) {
> +			if (vma == active) {
> +				if (flags & PIN_NONBLOCK)
> +					break;
> +
> +				active = ERR_PTR(-EAGAIN);
> +			}
> +
> +			if (active != ERR_PTR(-EAGAIN)) {
> +				if (!active)
> +					active = vma;
> +
> +				list_move_tail(&vma->vm_link, &vm->bound_list);
> +				continue;
> +			}
> +		}
> +
> +		if (mark_free(&scan, vma, flags, &eviction_list))
> +			goto found;
> +	}
>   
>   	/* Nothing found, clean up and bail out! */
>   	list_for_each_entry_safe(vma, next, &eviction_list, evict_link) {
> @@ -388,11 +411,6 @@ int i915_gem_evict_for_node(struct i915_address_space *vm,
>    */
>   int i915_gem_evict_vm(struct i915_address_space *vm)
>   {
> -	struct list_head *phases[] = {
> -		&vm->inactive_list,
> -		&vm->active_list,
> -		NULL
> -	}, **phase;
>   	struct list_head eviction_list;
>   	struct i915_vma *vma, *next;
>   	int ret;
> @@ -412,16 +430,13 @@ int i915_gem_evict_vm(struct i915_address_space *vm)
>   	}
>   
>   	INIT_LIST_HEAD(&eviction_list);
> -	phase = phases;
> -	do {
> -		list_for_each_entry(vma, *phase, vm_link) {
> -			if (i915_vma_is_pinned(vma))
> -				continue;
> +	list_for_each_entry(vma, &vm->bound_list, vm_link) {
> +		if (i915_vma_is_pinned(vma))
> +			continue;
>   
> -			__i915_vma_pin(vma);
> -			list_add(&vma->evict_link, &eviction_list);
> -		}
> -	} while (*++phase);
> +		__i915_vma_pin(vma);
> +		list_add(&vma->evict_link, &eviction_list);
> +	}
>   
>   	ret = 0;
>   	list_for_each_entry_safe(vma, next, &eviction_list, evict_link) {
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 9081e3bc5a59..2ad9070a54c1 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -491,9 +491,8 @@ static void i915_address_space_init(struct i915_address_space *vm, int subclass)
>   
>   	stash_init(&vm->free_pages);
>   
> -	INIT_LIST_HEAD(&vm->active_list);
> -	INIT_LIST_HEAD(&vm->inactive_list);
>   	INIT_LIST_HEAD(&vm->unbound_list);
> +	INIT_LIST_HEAD(&vm->bound_list);
>   }
>   
>   static void i915_address_space_fini(struct i915_address_space *vm)
> @@ -2111,8 +2110,7 @@ void i915_ppgtt_close(struct i915_address_space *vm)
>   static void ppgtt_destroy_vma(struct i915_address_space *vm)
>   {
>   	struct list_head *phases[] = {
> -		&vm->active_list,
> -		&vm->inactive_list,
> +		&vm->bound_list,
>   		&vm->unbound_list,
>   		NULL,
>   	}, **phase;
> @@ -2135,8 +2133,7 @@ void i915_ppgtt_release(struct kref *kref)
>   
>   	ppgtt_destroy_vma(&ppgtt->vm);
>   
> -	GEM_BUG_ON(!list_empty(&ppgtt->vm.active_list));
> -	GEM_BUG_ON(!list_empty(&ppgtt->vm.inactive_list));
> +	GEM_BUG_ON(!list_empty(&ppgtt->vm.bound_list));
>   	GEM_BUG_ON(!list_empty(&ppgtt->vm.unbound_list));
>   
>   	ppgtt->vm.cleanup(&ppgtt->vm);
> @@ -2801,8 +2798,7 @@ void i915_ggtt_cleanup_hw(struct drm_i915_private *dev_priv)
>   	mutex_lock(&dev_priv->drm.struct_mutex);
>   	i915_gem_fini_aliasing_ppgtt(dev_priv);
>   
> -	GEM_BUG_ON(!list_empty(&ggtt->vm.active_list));
> -	list_for_each_entry_safe(vma, vn, &ggtt->vm.inactive_list, vm_link)
> +	list_for_each_entry_safe(vma, vn, &ggtt->vm.bound_list, vm_link)
>   		WARN_ON(i915_vma_unbind(vma));
>   
>   	if (drm_mm_node_allocated(&ggtt->error_capture))
> @@ -3514,8 +3510,7 @@ void i915_gem_restore_gtt_mappings(struct drm_i915_private *dev_priv)
>   	ggtt->vm.closed = true; /* skip rewriting PTE on VMA unbind */
>   
>   	/* clflush objects bound into the GGTT and rebind them. */
> -	GEM_BUG_ON(!list_empty(&ggtt->vm.active_list));
> -	list_for_each_entry_safe(vma, vn, &ggtt->vm.inactive_list, vm_link) {
> +	list_for_each_entry_safe(vma, vn, &ggtt->vm.bound_list, vm_link) {
>   		struct drm_i915_gem_object *obj = vma->obj;
>   
>   		if (!(vma->flags & I915_VMA_GLOBAL_BIND))
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index a0039ea97cdc..bd679c8c56dd 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -299,32 +299,12 @@ struct i915_address_space {
>   	struct i915_page_directory_pointer *scratch_pdp; /* GEN8+ & 48b PPGTT */
>   
>   	/**
> -	 * List of objects currently involved in rendering.
> -	 *
> -	 * Includes buffers having the contents of their GPU caches
> -	 * flushed, not necessarily primitives. last_read_req
> -	 * represents when the rendering involved will be completed.
> -	 *
> -	 * A reference is held on the buffer while on this list.
> +	 * List of vma currently bound.
>   	 */
> -	struct list_head active_list;
> +	struct list_head bound_list;
>   
>   	/**
> -	 * LRU list of objects which are not in the ringbuffer and
> -	 * are ready to unbind, but are still in the GTT.
> -	 *
> -	 * last_read_req is NULL while an object is in this list.
> -	 *
> -	 * A reference is not held on the buffer while on this list,
> -	 * as merely being GTT-bound shouldn't prevent its being
> -	 * freed, and we'll pull it off the list in the free path.
> -	 */
> -	struct list_head inactive_list;
> -
> -	/**
> -	 * List of vma that have been unbound.
> -	 *
> -	 * A reference is not held on the buffer while on this list.
> +	 * List of vma that are not bound.
>   	 */
>   	struct list_head unbound_list;
>   
> diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
> index 8ceecb026910..a76d6c95c824 100644
> --- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
> +++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
> @@ -462,9 +462,13 @@ i915_gem_shrinker_vmap(struct notifier_block *nb, unsigned long event, void *ptr
>   
>   	/* We also want to clear any cached iomaps as they wrap vmap */
>   	list_for_each_entry_safe(vma, next,
> -				 &i915->ggtt.vm.inactive_list, vm_link) {
> +				 &i915->ggtt.vm.bound_list, vm_link) {
>   		unsigned long count = vma->node.size >> PAGE_SHIFT;
> -		if (vma->iomap && i915_vma_unbind(vma) == 0)
> +
> +		if (!vma->iomap || i915_vma_is_active(vma))
> +			continue;
> +
> +		if (i915_vma_unbind(vma) == 0)
>   			freed_pages += count;
>   	}
>   
> diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c b/drivers/gpu/drm/i915/i915_gem_stolen.c
> index 9df615eea2d8..a9e365789686 100644
> --- a/drivers/gpu/drm/i915/i915_gem_stolen.c
> +++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
> @@ -701,7 +701,8 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_i915_private *dev_priv
>   	vma->pages = obj->mm.pages;
>   	vma->flags |= I915_VMA_GLOBAL_BIND;
>   	__i915_vma_set_map_and_fenceable(vma);
> -	list_move_tail(&vma->vm_link, &ggtt->vm.inactive_list);
> +
> +	list_move_tail(&vma->vm_link, &ggtt->vm.bound_list);
>   
>   	spin_lock(&dev_priv->mm.obj_lock);
>   	list_move_tail(&obj->mm.link, &dev_priv->mm.bound_list);
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index 4eef0462489c..898e06014295 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -1121,7 +1121,9 @@ static void capture_bo(struct drm_i915_error_buffer *err,
>   
>   static u32 capture_error_bo(struct drm_i915_error_buffer *err,
>   			    int count, struct list_head *head,
> -			    bool pinned_only)
> +			    unsigned int flags)
> +#define ACTIVE_ONLY BIT(0)
> +#define PINNED_ONLY BIT(1)
>   {
>   	struct i915_vma *vma;
>   	int i = 0;
> @@ -1130,7 +1132,10 @@ static u32 capture_error_bo(struct drm_i915_error_buffer *err,
>   		if (!vma->obj)
>   			continue;
>   
> -		if (pinned_only && !i915_vma_is_pinned(vma))
> +		if (flags & ACTIVE_ONLY && !i915_vma_is_active(vma))
> +			continue;
> +
> +		if (flags & PINNED_ONLY && !i915_vma_is_pinned(vma))
>   			continue;
>   
>   		capture_bo(err++, vma);
> @@ -1601,14 +1606,17 @@ static void gem_capture_vm(struct i915_gpu_state *error,
>   	int count;
>   
>   	count = 0;
> -	list_for_each_entry(vma, &vm->active_list, vm_link)
> -		count++;
> +	list_for_each_entry(vma, &vm->bound_list, vm_link)
> +		if (i915_vma_is_active(vma))
> +			count++;
>   
>   	active_bo = NULL;
>   	if (count)
>   		active_bo = kcalloc(count, sizeof(*active_bo), GFP_ATOMIC);
>   	if (active_bo)
> -		count = capture_error_bo(active_bo, count, &vm->active_list, false);
> +		count = capture_error_bo(active_bo,
> +					 count, &vm->bound_list,
> +					 ACTIVE_ONLY);
>   	else
>   		count = 0;
>   
> @@ -1646,28 +1654,20 @@ static void capture_pinned_buffers(struct i915_gpu_state *error)
>   	struct i915_address_space *vm = &error->i915->ggtt.vm;
>   	struct drm_i915_error_buffer *bo;
>   	struct i915_vma *vma;
> -	int count_inactive, count_active;
> -
> -	count_inactive = 0;
> -	list_for_each_entry(vma, &vm->inactive_list, vm_link)
> -		count_inactive++;
> +	int count;
>   
> -	count_active = 0;
> -	list_for_each_entry(vma, &vm->active_list, vm_link)
> -		count_active++;
> +	count = 0;
> +	list_for_each_entry(vma, &vm->bound_list, vm_link)
> +		count++;
>   
>   	bo = NULL;
> -	if (count_inactive + count_active)
> -		bo = kcalloc(count_inactive + count_active,
> -			     sizeof(*bo), GFP_ATOMIC);
> +	if (count)
> +		bo = kcalloc(count, sizeof(*bo), GFP_ATOMIC);
>   	if (!bo)
>   		return;
>   
> -	count_inactive = capture_error_bo(bo, count_inactive,
> -					  &vm->active_list, true);
> -	count_active = capture_error_bo(bo + count_inactive, count_active,
> -					&vm->inactive_list, true);
> -	error->pinned_bo_count = count_inactive + count_active;
> +	error->pinned_bo_count =
> +		capture_error_bo(bo, count, &vm->bound_list, PINNED_ONLY);
>   	error->pinned_bo = bo;
>   }
>   
> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
> index 5b4d78cdb4ca..7de28baffb8f 100644
> --- a/drivers/gpu/drm/i915/i915_vma.c
> +++ b/drivers/gpu/drm/i915/i915_vma.c
> @@ -79,9 +79,6 @@ __i915_vma_retire(struct i915_vma *vma, struct i915_request *rq)
>   	if (--vma->active_count)
>   		return;
>   
> -	GEM_BUG_ON(!drm_mm_node_allocated(&vma->node));
> -	list_move_tail(&vma->vm_link, &vma->vm->inactive_list);
> -
>   	GEM_BUG_ON(!i915_gem_object_is_active(obj));
>   	if (--obj->active_count)
>   		return;
> @@ -659,7 +656,7 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
>   	GEM_BUG_ON(!drm_mm_node_allocated(&vma->node));
>   	GEM_BUG_ON(!i915_gem_valid_gtt_space(vma, cache_level));
>   
> -	list_move_tail(&vma->vm_link, &vma->vm->inactive_list);
> +	list_move_tail(&vma->vm_link, &vma->vm->bound_list);
>   
>   	if (vma->obj) {
>   		struct drm_i915_gem_object *obj = vma->obj;
> @@ -1003,10 +1000,8 @@ int i915_vma_move_to_active(struct i915_vma *vma,
>   	 * add the active reference first and queue for it to be dropped
>   	 * *last*.
>   	 */
> -	if (!i915_gem_active_isset(active) && !vma->active_count++) {
> -		list_move_tail(&vma->vm_link, &vma->vm->active_list);
> +	if (!i915_gem_active_isset(active) && !vma->active_count++)
>   		obj->active_count++;
> -	}
>   	i915_gem_active_set(active, rq);
>   	GEM_BUG_ON(!i915_vma_is_active(vma));
>   	GEM_BUG_ON(!obj->active_count);
> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
> index d0553bc69705..af9b85cb8639 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
> @@ -84,7 +84,7 @@ static int populate_ggtt(struct drm_i915_private *i915,
>   		return -EINVAL;
>   	}
>   
> -	if (list_empty(&i915->ggtt.vm.inactive_list)) {
> +	if (list_empty(&i915->ggtt.vm.bound_list)) {
>   		pr_err("No objects on the GGTT inactive list!\n");
>   		return -EINVAL;
>   	}
> @@ -96,7 +96,7 @@ static void unpin_ggtt(struct drm_i915_private *i915)
>   {
>   	struct i915_vma *vma;
>   
> -	list_for_each_entry(vma, &i915->ggtt.vm.inactive_list, vm_link)
> +	list_for_each_entry(vma, &i915->ggtt.vm.bound_list, vm_link)
>   		if (vma->obj->mm.quirked)
>   			i915_vma_unpin(vma);
>   }
> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> index 06bde4a273cb..8feb4af308ff 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> @@ -1237,7 +1237,7 @@ static void track_vma_bind(struct i915_vma *vma)
>   	__i915_gem_object_pin_pages(obj);
>   
>   	vma->pages = obj->mm.pages;
> -	list_move_tail(&vma->vm_link, &vma->vm->inactive_list);
> +	list_move_tail(&vma->vm_link, &vma->vm->bound_list);
>   }
>   
>   static int exercise_mock(struct drm_i915_private *i915,
> 
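
As an aside, the lap-detection idiom in the new i915_gem_evict_something()
loop above is worth isolating. A toy sketch (hypothetical node type, only
<linux/list.h> assumed, modelling the PIN_NONBLOCK case where we bail out
once a full lap has been walked):

#include <linux/list.h>
#include <linux/types.h>

struct node {
	struct list_head link;
	bool busy;
};

/* Return the first idle node, rotating busy nodes to the tail. */
static struct node *scan_once(struct list_head *all)
{
	struct node *n, *next, *first_busy = NULL;

	list_for_each_entry_safe(n, next, all, link) {
		if (n->busy) {
			/* Meeting the sentinel again means one full lap. */
			if (n == first_busy)
				break;

			/* Remember the first busy node as the sentinel. */
			if (!first_busy)
				first_busy = n;

			list_move_tail(&n->link, all);
			continue;
		}

		return n; /* cheap (idle) victim */
	}

	return NULL; /* every node was busy */
}

The real loop goes one step further: on a blocking search, hitting the
sentinel flips it to ERR_PTR(-EAGAIN) and the walk continues, now marking
the (oldest-scanned-first) busy nodes for eviction as well.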
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 02/28] drm/i915: Rename execlists->queue_priority to preempt_priority_hint
  2019-01-28  1:02 ` [PATCH 02/28] drm/i915: Rename execlists->queue_priority to preempt_priority_hint Chris Wilson
@ 2019-01-28 10:56   ` Tvrtko Ursulin
  2019-01-28 11:04     ` Chris Wilson
  0 siblings, 1 reply; 39+ messages in thread
From: Tvrtko Ursulin @ 2019-01-28 10:56 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 28/01/2019 01:02, Chris Wilson wrote:
> After noticing that we trigger preemption events for currently executing
> requests, as well as requests that complete before the preemption and
> attempting to suppress those preemption events, it is wise to not
> consider the queue_priority to be authoritative. As we only track the
> maximum priority seen between dequeue passes, if the maximum priority
> request is no longer available for dequeuing (it completed or is even
> executing on another engine), we have no knowledge of the previous
> queue_priority as it would require us to keep a full history of enqueued
> requests -- but we already have that history in the priolists!
> 
> Rename the queue_priority to preempt_priority_hint so that we do not
> confuse it as being the maximum priority in the queue, but merely an
> indication that we have seen a new maximum priority value and as such we
> should check whether it should preempt the currently running request.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_scheduler.c       | 11 +++++------
>   drivers/gpu/drm/i915/intel_engine_cs.c      |  4 ++--
>   drivers/gpu/drm/i915/intel_guc_submission.c |  5 +++--
>   drivers/gpu/drm/i915/intel_lrc.c            | 20 +++++++++++---------
>   drivers/gpu/drm/i915/intel_ringbuffer.h     |  8 ++++++--
>   5 files changed, 27 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
> index 340faea6c08a..0da718ceab43 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler.c
> +++ b/drivers/gpu/drm/i915/i915_scheduler.c
> @@ -127,8 +127,7 @@ static inline struct i915_priolist *to_priolist(struct rb_node *rb)
>   	return rb_entry(rb, struct i915_priolist, node);
>   }
>   
> -static void assert_priolists(struct intel_engine_execlists * const execlists,
> -			     long queue_priority)
> +static void assert_priolists(struct intel_engine_execlists * const execlists)
>   {
>   	struct rb_node *rb;
>   	long last_prio, i;
> @@ -139,7 +138,7 @@ static void assert_priolists(struct intel_engine_execlists * const execlists,
>   	GEM_BUG_ON(rb_first_cached(&execlists->queue) !=
>   		   rb_first(&execlists->queue.rb_root));
>   
> -	last_prio = (queue_priority >> I915_USER_PRIORITY_SHIFT) + 1;
> +	last_prio = (INT_MAX >> I915_USER_PRIORITY_SHIFT) + 1;
>   	for (rb = rb_first_cached(&execlists->queue); rb; rb = rb_next(rb)) {
>   		const struct i915_priolist *p = to_priolist(rb);
>   
> @@ -166,7 +165,7 @@ i915_sched_lookup_priolist(struct intel_engine_cs *engine, int prio)
>   	int idx, i;
>   
>   	lockdep_assert_held(&engine->timeline.lock);
> -	assert_priolists(execlists, INT_MAX);
> +	assert_priolists(execlists);
>   
>   	/* buckets sorted from highest [in slot 0] to lowest priority */
>   	idx = I915_PRIORITY_COUNT - (prio & I915_PRIORITY_MASK) - 1;
> @@ -353,7 +352,7 @@ static void __i915_schedule(struct i915_request *rq,
>   				continue;
>   		}
>   
> -		if (prio <= engine->execlists.queue_priority)
> +		if (prio <= engine->execlists.preempt_priority_hint)
>   			continue;
>   
>   		/*
> @@ -366,7 +365,7 @@ static void __i915_schedule(struct i915_request *rq,
>   			continue;
>   
>   		/* Defer (tasklet) submission until after all of our updates. */
> -		engine->execlists.queue_priority = prio;
> +		engine->execlists.preempt_priority_hint = prio;

I am wondering whether stopping tracking the queue priority here, and 
making it mean one thing only, would simplify things.

We make queue_priority strictly track the priority of whatever is in 
port0 only, updated on dequeue and after context switch. Ie. 
execlists.queue_priority gets the meaning of "top of the hw backend 
queue priority".

For the purpose of kicking the tasklet, that should work I think. It 
wouldn't interrupt the port0 rq, and then on CS, dequeue would inspect 
the real queue and see if there is a need to preempt.

At the end of __i915_schedule we peek at top of the queue and decide 
whether to kick the tasklet.

So we end up with two heads of queue priority. The HW backend one, and 
the scheduling tree. Which seems like a clear separation of duties.

need_preempt() on dequeue then compares the two priorities only. It just 
needs additional protection against preempting the same context.

I hope I did not miss something, what do you think?
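
A minimal sketch of what I have in mind for the dequeue-time check (names
as in this series; the same-context guard is omitted here): port0 carries
the HW head priority, the priolist rbtree carries the SW head, and
preemption is only considered when the SW head outranks port0.

static bool need_preempt(struct intel_engine_execlists *execlists)
{
	struct rb_node *rb = rb_first_cached(&execlists->queue);

	if (!rb)
		return false; /* nothing queued, nothing to preempt for */

	/* queue_priority here meaning strictly the priority of port0 */
	return to_priolist(rb)->priority > execlists->queue_priority;
}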

>   		tasklet_hi_schedule(&engine->execlists.tasklet);
>   	}
>   
> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> index 1a5c163b98d6..1ffec0d69157 100644
> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> @@ -480,7 +480,7 @@ static void intel_engine_init_execlist(struct intel_engine_cs *engine)
>   	GEM_BUG_ON(!is_power_of_2(execlists_num_ports(execlists)));
>   	GEM_BUG_ON(execlists_num_ports(execlists) > EXECLIST_MAX_PORTS);
>   
> -	execlists->queue_priority = INT_MIN;
> +	execlists->preempt_priority_hint = INT_MIN;
>   	execlists->queue = RB_ROOT_CACHED;
>   }
>   
> @@ -1156,7 +1156,7 @@ void intel_engines_park(struct drm_i915_private *i915)
>   		}
>   
>   		/* Must be reset upon idling, or we may miss the busy wakeup. */
> -		GEM_BUG_ON(engine->execlists.queue_priority != INT_MIN);
> +		GEM_BUG_ON(engine->execlists.preempt_priority_hint != INT_MIN);
>   
>   		if (engine->park)
>   			engine->park(engine);
> diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
> index 45e2db683fe5..1bf6ac76ad99 100644
> --- a/drivers/gpu/drm/i915/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/intel_guc_submission.c
> @@ -731,7 +731,7 @@ static bool __guc_dequeue(struct intel_engine_cs *engine)
>   		if (intel_engine_has_preemption(engine)) {
>   			struct guc_preempt_work *preempt_work =
>   				&engine->i915->guc.preempt_work[engine->id];
> -			int prio = execlists->queue_priority;
> +			int prio = execlists->preempt_priority_hint;
>   
>   			if (__execlists_need_preempt(prio, port_prio(port))) {
>   				execlists_set_active(execlists,
> @@ -777,7 +777,8 @@ static bool __guc_dequeue(struct intel_engine_cs *engine)
>   			kmem_cache_free(engine->i915->priorities, p);
>   	}
>   done:
> -	execlists->queue_priority = rb ? to_priolist(rb)->priority : INT_MIN;
> +	execlists->preempt_priority_hint =
> +		rb ? to_priolist(rb)->priority : INT_MIN;
>   	if (submit)
>   		port_assign(port, last);
>   	if (last)
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 185867106c14..71006b031f54 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -584,7 +584,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   		if (!execlists_is_active(execlists, EXECLISTS_ACTIVE_HWACK))
>   			return;
>   
> -		if (need_preempt(engine, last, execlists->queue_priority)) {
> +		if (need_preempt(engine, last, execlists->preempt_priority_hint)) {
>   			inject_preempt_context(engine);
>   			return;
>   		}
> @@ -692,20 +692,20 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   	/*
>   	 * Here be a bit of magic! Or sleight-of-hand, whichever you prefer.
>   	 *
> -	 * We choose queue_priority such that if we add a request of greater
> +	 * We choose the priority hint such that if we add a request of greater
>   	 * priority than this, we kick the submission tasklet to decide on
>   	 * the right order of submitting the requests to hardware. We must
>   	 * also be prepared to reorder requests as they are in-flight on the
> -	 * HW. We derive the queue_priority then as the first "hole" in
> +	 * HW. We derive the priority hint then as the first "hole" in
>   	 * the HW submission ports and if there are no available slots,
>   	 * the priority of the lowest executing request, i.e. last.
>   	 *
>   	 * When we do receive a higher priority request ready to run from the
> -	 * user, see queue_request(), the queue_priority is bumped to that
> +	 * user, see queue_request(), the priority hint is bumped to that
>   	 * request triggering preemption on the next dequeue (or subsequent
>   	 * interrupt for secondary ports).
>   	 */
> -	execlists->queue_priority =
> +	execlists->preempt_priority_hint =
>   		port != execlists->port ? rq_prio(last) : INT_MIN;
>   
>   	if (submit) {
> @@ -853,7 +853,7 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine)
>   
>   	/* Remaining _unready_ requests will be nop'ed when submitted */
>   
> -	execlists->queue_priority = INT_MIN;
> +	execlists->preempt_priority_hint = INT_MIN;
>   	execlists->queue = RB_ROOT_CACHED;
>   	GEM_BUG_ON(port_isset(execlists->port));
>   
> @@ -1083,8 +1083,8 @@ static void __submit_queue_imm(struct intel_engine_cs *engine)
>   
>   static void submit_queue(struct intel_engine_cs *engine, int prio)
>   {
> -	if (prio > engine->execlists.queue_priority) {
> -		engine->execlists.queue_priority = prio;
> +	if (prio > engine->execlists.preempt_priority_hint) {
> +		engine->execlists.preempt_priority_hint = prio;
>   		__submit_queue_imm(engine);
>   	}
>   }
> @@ -2718,7 +2718,9 @@ void intel_execlists_show_requests(struct intel_engine_cs *engine,
>   
>   	last = NULL;
>   	count = 0;
> -	drm_printf(m, "\t\tQueue priority: %d\n", execlists->queue_priority);
> +	if (execlists->preempt_priority_hint != INT_MIN)
> +		drm_printf(m, "\t\tPreempt priority hint: %d\n",
> +			   execlists->preempt_priority_hint);
>   	for (rb = rb_first_cached(&execlists->queue); rb; rb = rb_next(rb)) {
>   		struct i915_priolist *p = rb_entry(rb, typeof(*p), node);
>   		int i;
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index f2effd001540..71a41fec738f 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -294,14 +294,18 @@ struct intel_engine_execlists {
>   	unsigned int port_mask;
>   
>   	/**
> -	 * @queue_priority: Highest pending priority.
> +	 * @preempt_priority_hint: Highest pending priority.
>   	 *
>   	 * When we add requests into the queue, or adjust the priority of
>   	 * executing requests, we compute the maximum priority of those
>   	 * pending requests. We can then use this value to determine if
>   	 * we need to preempt the executing requests to service the queue.
> +	 * However, since the we may have recorded the priority of an inflight

s/the/then/ or when? Not sure I still parse the sentence easily.

> +	 * request we wanted to preempt but since completed, at the time of

but has since completed? Which has?

> +	 * dequeuing the priority hint may no longer match the highest
> +	 * available request priority.
>   	 */
> -	int queue_priority;
> +	int preempt_priority_hint;
>   
>   	/**
>   	 * @queue: queue of requests, in priority lists
> 

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 02/28] drm/i915: Rename execlists->queue_priority to preempt_priority_hint
  2019-01-28 10:56   ` Tvrtko Ursulin
@ 2019-01-28 11:04     ` Chris Wilson
  0 siblings, 0 replies; 39+ messages in thread
From: Chris Wilson @ 2019-01-28 11:04 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-01-28 10:56:24)
> 
> On 28/01/2019 01:02, Chris Wilson wrote:
> > After noticing that we trigger preemption events for currently executing
> > requests, as well as requests that complete before the preemption and
> > attempting to suppress those preemption events, it is wise to not
> > consider the queue_priority to be authoritative. As we only track the
> > maximum priority seen between dequeue passes, if the maximum priority
> > request is no longer available for dequeuing (it completed or is even
> > executing on another engine), we have no knowledge of the previous
> > queue_priority as it would require us to keep a full history of enqueued
> > requests -- but we already have that history in the priolists!
> > 
> > Rename the queue_priority to preempt_priority_hint so that we do not
> > confuse it as being the maximum priority in the queue, but merely an
> > indication that we have seen a new maximum priority value and as such we
> > should check whether it should preempt the currently running request.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_scheduler.c       | 11 +++++------
> >   drivers/gpu/drm/i915/intel_engine_cs.c      |  4 ++--
> >   drivers/gpu/drm/i915/intel_guc_submission.c |  5 +++--
> >   drivers/gpu/drm/i915/intel_lrc.c            | 20 +++++++++++---------
> >   drivers/gpu/drm/i915/intel_ringbuffer.h     |  8 ++++++--
> >   5 files changed, 27 insertions(+), 21 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
> > index 340faea6c08a..0da718ceab43 100644
> > --- a/drivers/gpu/drm/i915/i915_scheduler.c
> > +++ b/drivers/gpu/drm/i915/i915_scheduler.c
> > @@ -127,8 +127,7 @@ static inline struct i915_priolist *to_priolist(struct rb_node *rb)
> >       return rb_entry(rb, struct i915_priolist, node);
> >   }
> >   
> > -static void assert_priolists(struct intel_engine_execlists * const execlists,
> > -                          long queue_priority)
> > +static void assert_priolists(struct intel_engine_execlists * const execlists)
> >   {
> >       struct rb_node *rb;
> >       long last_prio, i;
> > @@ -139,7 +138,7 @@ static void assert_priolists(struct intel_engine_execlists * const execlists,
> >       GEM_BUG_ON(rb_first_cached(&execlists->queue) !=
> >                  rb_first(&execlists->queue.rb_root));
> >   
> > -     last_prio = (queue_priority >> I915_USER_PRIORITY_SHIFT) + 1;
> > +     last_prio = (INT_MAX >> I915_USER_PRIORITY_SHIFT) + 1;
> >       for (rb = rb_first_cached(&execlists->queue); rb; rb = rb_next(rb)) {
> >               const struct i915_priolist *p = to_priolist(rb);
> >   
> > @@ -166,7 +165,7 @@ i915_sched_lookup_priolist(struct intel_engine_cs *engine, int prio)
> >       int idx, i;
> >   
> >       lockdep_assert_held(&engine->timeline.lock);
> > -     assert_priolists(execlists, INT_MAX);
> > +     assert_priolists(execlists);
> >   
> >       /* buckets sorted from highest [in slot 0] to lowest priority */
> >       idx = I915_PRIORITY_COUNT - (prio & I915_PRIORITY_MASK) - 1;
> > @@ -353,7 +352,7 @@ static void __i915_schedule(struct i915_request *rq,
> >                               continue;
> >               }
> >   
> > -             if (prio <= engine->execlists.queue_priority)
> > +             if (prio <= engine->execlists.preempt_priority_hint)
> >                       continue;
> >   
> >               /*
> > @@ -366,7 +365,7 @@ static void __i915_schedule(struct i915_request *rq,
> >                       continue;
> >   
> >               /* Defer (tasklet) submission until after all of our updates. */
> > -             engine->execlists.queue_priority = prio;
> > +             engine->execlists.preempt_priority_hint = prio;
> 
> I am wondering whether stopping tracking the queue priority here, and 
> making it mean one thing only, would simplify things.

No, it doesn't simplify things. We already track queue_priority (it's the
queue!) What we need is a very quick test as to whether we even need to
consider preemption.
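
That quick test is the whole point of the hint: a single integer compare
on the submission path, restating the submit_queue() hunk from the patch:

static void submit_queue(struct intel_engine_cs *engine, int prio)
{
	/* Only bother the tasklet if this might actually preempt. */
	if (prio > engine->execlists.preempt_priority_hint) {
		engine->execlists.preempt_priority_hint = prio;
		__submit_queue_imm(engine);
	}
}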
 
> We make queue_priority strictly track the priority of whatever is in 
> port0 only, updated on dequeue and after context switch. Ie. 
> execlists.queue_priority gets the meaning of "top of the hw backend 
> queue priority".

That is already tracked in port0.
 
> For the purpose of kicking the tasklet, that should work I think. It 
> wouldn't interrupt the port0 rq, and then on CS, dequeue would inspect 
> the real queue and see if there is a need to preempt.

... hence the patches ...
 
> At the end of __i915_schedule we peek at top of the queue and decide 
> whether to kick the tasklet.
> 
> So we end up with two heads of queue priority. The HW backend one, and 
> the scheduling tree. Which seems like a clear separation of duties.
> 
> need_preempt() on dequeue then compares the two priorities only. It just 
> needs additional protection against preempting the same context.
> 
> I hope I did not miss something, what do you think?

That we really, really need this patch as it appears to be quite easy to
confuse the purpose of what is being tracked here with what we track
for the HW status and the various queues.

And that the actual priority value of port0 does not reflect its
post-preemption value! Nor do we accurately know the preemption hint
once any suppression or veng has been applied.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 20/28] drm/i915: Replace global breadcrumbs with per-context interrupt tracking
  2019-01-28  1:02 ` [PATCH 20/28] drm/i915: Replace global breadcrumbs with per-context interrupt tracking Chris Wilson
@ 2019-01-28 16:24   ` Tvrtko Ursulin
  0 siblings, 0 replies; 39+ messages in thread
From: Tvrtko Ursulin @ 2019-01-28 16:24 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 28/01/2019 01:02, Chris Wilson wrote:
> A few years ago, see commit 688e6c725816 ("drm/i915: Slaughter the
> thundering i915_wait_request herd"), the issue of handling multiple
> clients waiting in parallel was brought to our attention. The
> requirement was that every client should be woken immediately upon its
> request being signaled, without incurring any cpu overhead.
> 
> To handle certain fragility of our hw meant that we could not do a
> simple check inside the irq handler (some generations required almost
> unbounded delays before we could be sure of seqno coherency) and so
> request completion checking required delegation.
> 
> Before commit 688e6c725816, the solution was simple. Every client
> waiting on a request would be woken on every interrupt and each would do
> a heavyweight check to see if their request was complete. Commit
> 688e6c725816 introduced an rbtree so that only the earliest waiter on
> the global timeline would be woken, and would wake the next and so on.
> (Along with various complications to handle requests being reordered
> along the global timeline, and also a requirement for a kthread to provide
> a delegate for fence signaling that had no process context.)
> 
> The global rbtree depends on knowing the execution timeline (and global
> seqno). Without knowing that order, we must instead check all contexts
> queued to the HW to see which may have advanced. We trim that list by
> only checking queued contexts that are being waited on, but still we
> keep a list of all active contexts and their active signalers that we
> inspect from inside the irq handler. By moving the waiters onto the fence
> signal list, we can combine the client wakeup with the dma_fence
> signaling (a dramatic reduction in complexity, but does require the HW
> being coherent, the seqno must be visible from the cpu before the
> interrupt is raised - we keep a timer backup just in case).
> 
> Having previously fixed all the issues with irq-seqno serialisation (by
> inserting delays onto the GPU after each request instead of random delays
> on the CPU after each interrupt), we can rely on the seqno state to
> perform direct wakeups from the interrupt handler. This allows us to
> preserve our single context switch behaviour of the current routine,
> with the only downside that we lose the RT priority sorting of wakeups.
> In general, direct wakeup latency of multiple clients is about the same
> (about 10% better in most cases) with a reduction in total CPU time spent
> in the waiter (about 20-50% depending on gen). Average herd behaviour is
> improved, but at the cost of not delegating wakeups on task_prio.
> 
> v2: Capture fence signaling state for error state and add comments to
> warm even the most cold of hearts.
> v3: Check if the request is still active before busywaiting
> v4: Reduce the amount of pointer misdirection with list_for_each_safe
> and using a local i915_request variable inside the loops
> v5: Add a missing pluralisation to a purely informative selftest message.

There were some more changes which you implemented after the last round 
of review.

I have no further significant complaints.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Ideally John re-appears to lend another pair of eyes to the problem. 
Given the size of the change it would be good to double check it.

Regards,

Tvrtko
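
For reference, the client-side wait under the new scheme reduces to the
standard dma_fence callback pattern. A simplified sketch of the new
i915_request_wait() path (function name here is illustrative; the
request_wait/request_wait_wake pair is as defined in the patch below,
with the optimistic busy-spin and tracing omitted):

static long wait_for_request(struct i915_request *rq, int state, long timeout)
{
	struct request_wait wait;

	wait.tsk = current;
	if (dma_fence_add_callback(&rq->fence, &wait.cb, request_wait_wake))
		return timeout; /* fence already signaled */

	for (;;) {
		set_current_state(state);

		if (i915_request_completed(rq))
			break;

		if (signal_pending_state(state, current)) {
			timeout = -ERESTARTSYS;
			break;
		}

		if (!timeout) {
			timeout = -ETIME;
			break;
		}

		/* Sleep until request_wait_wake() fires or we time out. */
		timeout = io_schedule_timeout(timeout);
	}
	__set_current_state(TASK_RUNNING);

	dma_fence_remove_callback(&rq->fence, &wait.cb);
	return timeout;
}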

> 
> References: 688e6c725816 ("drm/i915: Slaughter the thundering i915_wait_request herd")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_debugfs.c           |  28 +-
>   drivers/gpu/drm/i915/i915_gem_context.c       |   3 +
>   drivers/gpu/drm/i915/i915_gem_context.h       |   2 +
>   drivers/gpu/drm/i915/i915_gpu_error.c         |  83 +-
>   drivers/gpu/drm/i915/i915_gpu_error.h         |   9 +-
>   drivers/gpu/drm/i915/i915_irq.c               |  88 +-
>   drivers/gpu/drm/i915/i915_request.c           | 142 ++-
>   drivers/gpu/drm/i915/i915_request.h           |  72 +-
>   drivers/gpu/drm/i915/i915_reset.c             |  16 +-
>   drivers/gpu/drm/i915/i915_scheduler.c         |   2 +-
>   drivers/gpu/drm/i915/intel_breadcrumbs.c      | 818 +++++-------------
>   drivers/gpu/drm/i915/intel_engine_cs.c        |  35 +-
>   drivers/gpu/drm/i915/intel_ringbuffer.c       |   2 +-
>   drivers/gpu/drm/i915/intel_ringbuffer.h       |  94 +-
>   .../drm/i915/selftests/i915_mock_selftests.h  |   1 -
>   drivers/gpu/drm/i915/selftests/i915_request.c | 425 +++++++++
>   drivers/gpu/drm/i915/selftests/igt_spinner.c  |   5 -
>   .../drm/i915/selftests/intel_breadcrumbs.c    | 470 ----------
>   .../gpu/drm/i915/selftests/intel_hangcheck.c  |   2 +-
>   drivers/gpu/drm/i915/selftests/lib_sw_fence.c |  54 ++
>   drivers/gpu/drm/i915/selftests/lib_sw_fence.h |   3 +
>   drivers/gpu/drm/i915/selftests/mock_engine.c  |  17 +-
>   drivers/gpu/drm/i915/selftests/mock_engine.h  |   6 -
>   23 files changed, 894 insertions(+), 1483 deletions(-)
>   delete mode 100644 drivers/gpu/drm/i915/selftests/intel_breadcrumbs.c
> 
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index f5ac03f06e26..b1ac0f78cb42 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -1316,29 +1316,16 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
>   	seq_printf(m, "GT active? %s\n", yesno(dev_priv->gt.awake));
>   
>   	for_each_engine(engine, dev_priv, id) {
> -		struct intel_breadcrumbs *b = &engine->breadcrumbs;
> -		struct rb_node *rb;
> -
>   		seq_printf(m, "%s:\n", engine->name);
>   		seq_printf(m, "\tseqno = %x [current %x, last %x], %dms ago\n",
>   			   engine->hangcheck.seqno, seqno[id],
>   			   intel_engine_last_submit(engine),
>   			   jiffies_to_msecs(jiffies -
>   					    engine->hangcheck.action_timestamp));
> -		seq_printf(m, "\twaiters? %s, fake irq active? %s\n",
> -			   yesno(intel_engine_has_waiter(engine)),
> +		seq_printf(m, "\tfake irq active? %s\n",
>   			   yesno(test_bit(engine->id,
>   					  &dev_priv->gpu_error.missed_irq_rings)));
>   
> -		spin_lock_irq(&b->rb_lock);
> -		for (rb = rb_first(&b->waiters); rb; rb = rb_next(rb)) {
> -			struct intel_wait *w = rb_entry(rb, typeof(*w), node);
> -
> -			seq_printf(m, "\t%s [%d] waiting for %x\n",
> -				   w->tsk->comm, w->tsk->pid, w->seqno);
> -		}
> -		spin_unlock_irq(&b->rb_lock);
> -
>   		seq_printf(m, "\tACTHD = 0x%08llx [current 0x%08llx]\n",
>   			   (long long)engine->hangcheck.acthd,
>   			   (long long)acthd[id]);
> @@ -2022,18 +2009,6 @@ static int i915_swizzle_info(struct seq_file *m, void *data)
>   	return 0;
>   }
>   
> -static int count_irq_waiters(struct drm_i915_private *i915)
> -{
> -	struct intel_engine_cs *engine;
> -	enum intel_engine_id id;
> -	int count = 0;
> -
> -	for_each_engine(engine, i915, id)
> -		count += intel_engine_has_waiter(engine);
> -
> -	return count;
> -}
> -
>   static const char *rps_power_to_str(unsigned int power)
>   {
>   	static const char * const strings[] = {
> @@ -2073,7 +2048,6 @@ static int i915_rps_boost_info(struct seq_file *m, void *data)
>   	seq_printf(m, "RPS enabled? %d\n", rps->enabled);
>   	seq_printf(m, "GPU busy? %s [%d requests]\n",
>   		   yesno(dev_priv->gt.awake), dev_priv->gt.active_requests);
> -	seq_printf(m, "CPU waiting? %d\n", count_irq_waiters(dev_priv));
>   	seq_printf(m, "Boosts outstanding? %d\n",
>   		   atomic_read(&rps->num_waiters));
>   	seq_printf(m, "Interactive? %d\n", READ_ONCE(rps->power.interactive));
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 93e84751370f..6faf1f6faab5 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -327,6 +327,9 @@ intel_context_init(struct intel_context *ce,
>   		   struct intel_engine_cs *engine)
>   {
>   	ce->gem_context = ctx;
> +
> +	INIT_LIST_HEAD(&ce->signal_link);
> +	INIT_LIST_HEAD(&ce->signals);
>   }
>   
>   static struct i915_gem_context *
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
> index 3769438228f6..6ba40ff6b91f 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.h
> +++ b/drivers/gpu/drm/i915/i915_gem_context.h
> @@ -164,6 +164,8 @@ struct i915_gem_context {
>   	struct intel_context {
>   		struct i915_gem_context *gem_context;
>   		struct intel_engine_cs *active;
> +		struct list_head signal_link;
> +		struct list_head signals;
>   		struct i915_vma *state;
>   		struct intel_ring *ring;
>   		u32 *lrc_reg_state;
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index 898e06014295..304a7ef7f7fb 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -447,9 +447,14 @@ static void error_print_request(struct drm_i915_error_state_buf *m,
>   	if (!erq->seqno)
>   		return;
>   
> -	err_printf(m, "%s pid %d, ban score %d, seqno %8x:%08x, prio %d, emitted %dms, start %08x, head %08x, tail %08x\n",
> +	err_printf(m, "%s pid %d, ban score %d, seqno %8x:%08x%s%s, prio %d, emitted %dms, start %08x, head %08x, tail %08x\n",
>   		   prefix, erq->pid, erq->ban_score,
> -		   erq->context, erq->seqno, erq->sched_attr.priority,
> +		   erq->context, erq->seqno,
> +		   test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
> +			    &erq->flags) ? "!" : "",
> +		   test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
> +			    &erq->flags) ? "+" : "",
> +		   erq->sched_attr.priority,
>   		   jiffies_to_msecs(erq->jiffies - epoch),
>   		   erq->start, erq->head, erq->tail);
>   }
> @@ -530,7 +535,6 @@ static void error_print_engine(struct drm_i915_error_state_buf *m,
>   	}
>   	err_printf(m, "  seqno: 0x%08x\n", ee->seqno);
>   	err_printf(m, "  last_seqno: 0x%08x\n", ee->last_seqno);
> -	err_printf(m, "  waiting: %s\n", yesno(ee->waiting));
>   	err_printf(m, "  ring->head: 0x%08x\n", ee->cpu_ring_head);
>   	err_printf(m, "  ring->tail: 0x%08x\n", ee->cpu_ring_tail);
>   	err_printf(m, "  hangcheck timestamp: %dms (%lu%s)\n",
> @@ -804,21 +808,6 @@ static void __err_print_to_sgl(struct drm_i915_error_state_buf *m,
>   						    error->epoch);
>   		}
>   
> -		if (IS_ERR(ee->waiters)) {
> -			err_printf(m, "%s --- ? waiters [unable to acquire spinlock]\n",
> -				   m->i915->engine[i]->name);
> -		} else if (ee->num_waiters) {
> -			err_printf(m, "%s --- %d waiters\n",
> -				   m->i915->engine[i]->name,
> -				   ee->num_waiters);
> -			for (j = 0; j < ee->num_waiters; j++) {
> -				err_printf(m, " seqno 0x%08x for %s [%d]\n",
> -					   ee->waiters[j].seqno,
> -					   ee->waiters[j].comm,
> -					   ee->waiters[j].pid);
> -			}
> -		}
> -
>   		print_error_obj(m, m->i915->engine[i],
>   				"ringbuffer", ee->ringbuffer);
>   
> @@ -1000,8 +989,6 @@ void __i915_gpu_state_free(struct kref *error_ref)
>   		i915_error_object_free(ee->wa_ctx);
>   
>   		kfree(ee->requests);
> -		if (!IS_ERR_OR_NULL(ee->waiters))
> -			kfree(ee->waiters);
>   	}
>   
>   	for (i = 0; i < ARRAY_SIZE(error->active_bo); i++)
> @@ -1205,59 +1192,6 @@ static void gen6_record_semaphore_state(struct intel_engine_cs *engine,
>   			I915_READ(RING_SYNC_2(engine->mmio_base));
>   }
>   
> -static void error_record_engine_waiters(struct intel_engine_cs *engine,
> -					struct drm_i915_error_engine *ee)
> -{
> -	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> -	struct drm_i915_error_waiter *waiter;
> -	struct rb_node *rb;
> -	int count;
> -
> -	ee->num_waiters = 0;
> -	ee->waiters = NULL;
> -
> -	if (RB_EMPTY_ROOT(&b->waiters))
> -		return;
> -
> -	if (!spin_trylock_irq(&b->rb_lock)) {
> -		ee->waiters = ERR_PTR(-EDEADLK);
> -		return;
> -	}
> -
> -	count = 0;
> -	for (rb = rb_first(&b->waiters); rb != NULL; rb = rb_next(rb))
> -		count++;
> -	spin_unlock_irq(&b->rb_lock);
> -
> -	waiter = NULL;
> -	if (count)
> -		waiter = kmalloc_array(count,
> -				       sizeof(struct drm_i915_error_waiter),
> -				       GFP_ATOMIC);
> -	if (!waiter)
> -		return;
> -
> -	if (!spin_trylock_irq(&b->rb_lock)) {
> -		kfree(waiter);
> -		ee->waiters = ERR_PTR(-EDEADLK);
> -		return;
> -	}
> -
> -	ee->waiters = waiter;
> -	for (rb = rb_first(&b->waiters); rb; rb = rb_next(rb)) {
> -		struct intel_wait *w = rb_entry(rb, typeof(*w), node);
> -
> -		strcpy(waiter->comm, w->tsk->comm);
> -		waiter->pid = w->tsk->pid;
> -		waiter->seqno = w->seqno;
> -		waiter++;
> -
> -		if (++ee->num_waiters == count)
> -			break;
> -	}
> -	spin_unlock_irq(&b->rb_lock);
> -}
> -
>   static void error_record_engine_registers(struct i915_gpu_state *error,
>   					  struct intel_engine_cs *engine,
>   					  struct drm_i915_error_engine *ee)
> @@ -1293,7 +1227,6 @@ static void error_record_engine_registers(struct i915_gpu_state *error,
>   
>   	intel_engine_get_instdone(engine, &ee->instdone);
>   
> -	ee->waiting = intel_engine_has_waiter(engine);
>   	ee->instpm = I915_READ(RING_INSTPM(engine->mmio_base));
>   	ee->acthd = intel_engine_get_active_head(engine);
>   	ee->seqno = intel_engine_get_seqno(engine);
> @@ -1367,6 +1300,7 @@ static void record_request(struct i915_request *request,
>   {
>   	struct i915_gem_context *ctx = request->gem_context;
>   
> +	erq->flags = request->fence.flags;
>   	erq->context = ctx->hw_id;
>   	erq->sched_attr = request->sched.attr;
>   	erq->ban_score = atomic_read(&ctx->ban_score);
> @@ -1542,7 +1476,6 @@ static void gem_record_rings(struct i915_gpu_state *error)
>   		ee->engine_id = i;
>   
>   		error_record_engine_registers(error, engine, ee);
> -		error_record_engine_waiters(engine, ee);
>   		error_record_engine_execlists(engine, ee);
>   
>   		request = i915_gem_find_active_request(engine);
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.h b/drivers/gpu/drm/i915/i915_gpu_error.h
> index 231173786eae..74757c424aab 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.h
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.h
> @@ -82,8 +82,6 @@ struct i915_gpu_state {
>   		int engine_id;
>   		/* Software tracked state */
>   		bool idle;
> -		bool waiting;
> -		int num_waiters;
>   		unsigned long hangcheck_timestamp;
>   		struct i915_address_space *vm;
>   		int num_requests;
> @@ -147,6 +145,7 @@ struct i915_gpu_state {
>   		struct drm_i915_error_object *default_state;
>   
>   		struct drm_i915_error_request {
> +			unsigned long flags;
>   			long jiffies;
>   			pid_t pid;
>   			u32 context;
> @@ -159,12 +158,6 @@ struct i915_gpu_state {
>   		} *requests, execlist[EXECLIST_MAX_PORTS];
>   		unsigned int num_ports;
>   
> -		struct drm_i915_error_waiter {
> -			char comm[TASK_COMM_LEN];
> -			pid_t pid;
> -			u32 seqno;
> -		} *waiters;
> -
>   		struct {
>   			u32 gfx_mode;
>   			union {
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 6a1b167ec801..a9911fea3ddb 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -28,12 +28,14 @@
>   
>   #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>   
> -#include <linux/sysrq.h>
> -#include <linux/slab.h>
>   #include <linux/circ_buf.h>
> +#include <linux/slab.h>
> +#include <linux/sysrq.h>
> +
>   #include <drm/drm_irq.h>
>   #include <drm/drm_drv.h>
>   #include <drm/i915_drm.h>
> +
>   #include "i915_drv.h"
>   #include "i915_trace.h"
>   #include "intel_drv.h"
> @@ -1168,66 +1170,6 @@ static void ironlake_rps_change_irq_handler(struct drm_i915_private *dev_priv)
>   	return;
>   }
>   
> -static void notify_ring(struct intel_engine_cs *engine)
> -{
> -	const u32 seqno = intel_engine_get_seqno(engine);
> -	struct i915_request *rq = NULL;
> -	struct task_struct *tsk = NULL;
> -	struct intel_wait *wait;
> -
> -	if (unlikely(!engine->breadcrumbs.irq_armed))
> -		return;
> -
> -	rcu_read_lock();
> -
> -	spin_lock(&engine->breadcrumbs.irq_lock);
> -	wait = engine->breadcrumbs.irq_wait;
> -	if (wait) {
> -		/*
> -		 * We use a callback from the dma-fence to submit
> -		 * requests after waiting on our own requests. To
> -		 * ensure minimum delay in queuing the next request to
> -		 * hardware, signal the fence now rather than wait for
> -		 * the signaler to be woken up. We still wake up the
> -		 * waiter in order to handle the irq-seqno coherency
> -		 * issues (we may receive the interrupt before the
> -		 * seqno is written, see __i915_request_irq_complete())
> -		 * and to handle coalescing of multiple seqno updates
> -		 * and many waiters.
> -		 */
> -		if (i915_seqno_passed(seqno, wait->seqno)) {
> -			struct i915_request *waiter = wait->request;
> -
> -			if (waiter &&
> -			    !i915_request_signaled(waiter) &&
> -			    intel_wait_check_request(wait, waiter))
> -				rq = i915_request_get(waiter);
> -
> -			tsk = wait->tsk;
> -		}
> -
> -		engine->breadcrumbs.irq_count++;
> -	} else {
> -		if (engine->breadcrumbs.irq_armed)
> -			__intel_engine_disarm_breadcrumbs(engine);
> -	}
> -	spin_unlock(&engine->breadcrumbs.irq_lock);
> -
> -	if (rq) {
> -		spin_lock(&rq->lock);
> -		dma_fence_signal_locked(&rq->fence);
> -		GEM_BUG_ON(!i915_request_completed(rq));
> -		spin_unlock(&rq->lock);
> -
> -		i915_request_put(rq);
> -	}
> -
> -	if (tsk && tsk->state & TASK_NORMAL)
> -		wake_up_process(tsk);
> -
> -	rcu_read_unlock();
> -}
> -
>   static void vlv_c0_read(struct drm_i915_private *dev_priv,
>   			struct intel_rps_ei *ei)
>   {
> @@ -1472,20 +1414,20 @@ static void ilk_gt_irq_handler(struct drm_i915_private *dev_priv,
>   			       u32 gt_iir)
>   {
>   	if (gt_iir & GT_RENDER_USER_INTERRUPT)
> -		notify_ring(dev_priv->engine[RCS]);
> +		intel_engine_breadcrumbs_irq(dev_priv->engine[RCS]);
>   	if (gt_iir & ILK_BSD_USER_INTERRUPT)
> -		notify_ring(dev_priv->engine[VCS]);
> +		intel_engine_breadcrumbs_irq(dev_priv->engine[VCS]);
>   }
>   
>   static void snb_gt_irq_handler(struct drm_i915_private *dev_priv,
>   			       u32 gt_iir)
>   {
>   	if (gt_iir & GT_RENDER_USER_INTERRUPT)
> -		notify_ring(dev_priv->engine[RCS]);
> +		intel_engine_breadcrumbs_irq(dev_priv->engine[RCS]);
>   	if (gt_iir & GT_BSD_USER_INTERRUPT)
> -		notify_ring(dev_priv->engine[VCS]);
> +		intel_engine_breadcrumbs_irq(dev_priv->engine[VCS]);
>   	if (gt_iir & GT_BLT_USER_INTERRUPT)
> -		notify_ring(dev_priv->engine[BCS]);
> +		intel_engine_breadcrumbs_irq(dev_priv->engine[BCS]);
>   
>   	if (gt_iir & (GT_BLT_CS_ERROR_INTERRUPT |
>   		      GT_BSD_CS_ERROR_INTERRUPT |
> @@ -1505,7 +1447,7 @@ gen8_cs_irq_handler(struct intel_engine_cs *engine, u32 iir)
>   		tasklet = true;
>   
>   	if (iir & GT_RENDER_USER_INTERRUPT) {
> -		notify_ring(engine);
> +		intel_engine_breadcrumbs_irq(engine);
>   		tasklet |= USES_GUC_SUBMISSION(engine->i915);
>   	}
>   
> @@ -1851,7 +1793,7 @@ static void gen6_rps_irq_handler(struct drm_i915_private *dev_priv, u32 pm_iir)
>   
>   	if (HAS_VEBOX(dev_priv)) {
>   		if (pm_iir & PM_VEBOX_USER_INTERRUPT)
> -			notify_ring(dev_priv->engine[VECS]);
> +			intel_engine_breadcrumbs_irq(dev_priv->engine[VECS]);
>   
>   		if (pm_iir & PM_VEBOX_CS_ERROR_INTERRUPT)
>   			DRM_DEBUG("Command parser error, pm_iir 0x%08x\n", pm_iir);
> @@ -4275,7 +4217,7 @@ static irqreturn_t i8xx_irq_handler(int irq, void *arg)
>   		I915_WRITE16(IIR, iir);
>   
>   		if (iir & I915_USER_INTERRUPT)
> -			notify_ring(dev_priv->engine[RCS]);
> +			intel_engine_breadcrumbs_irq(dev_priv->engine[RCS]);
>   
>   		if (iir & I915_MASTER_ERROR_INTERRUPT)
>   			i8xx_error_irq_handler(dev_priv, eir, eir_stuck);
> @@ -4383,7 +4325,7 @@ static irqreturn_t i915_irq_handler(int irq, void *arg)
>   		I915_WRITE(IIR, iir);
>   
>   		if (iir & I915_USER_INTERRUPT)
> -			notify_ring(dev_priv->engine[RCS]);
> +			intel_engine_breadcrumbs_irq(dev_priv->engine[RCS]);
>   
>   		if (iir & I915_MASTER_ERROR_INTERRUPT)
>   			i9xx_error_irq_handler(dev_priv, eir, eir_stuck);
> @@ -4528,10 +4470,10 @@ static irqreturn_t i965_irq_handler(int irq, void *arg)
>   		I915_WRITE(IIR, iir);
>   
>   		if (iir & I915_USER_INTERRUPT)
> -			notify_ring(dev_priv->engine[RCS]);
> +			intel_engine_breadcrumbs_irq(dev_priv->engine[RCS]);
>   
>   		if (iir & I915_BSD_USER_INTERRUPT)
> -			notify_ring(dev_priv->engine[VCS]);
> +			intel_engine_breadcrumbs_irq(dev_priv->engine[VCS]);
>   
>   		if (iir & I915_MASTER_ERROR_INTERRUPT)
>   			i9xx_error_irq_handler(dev_priv, eir, eir_stuck);
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 2171df2d3019..4b1869295362 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -60,7 +60,7 @@ static bool i915_fence_signaled(struct dma_fence *fence)
>   
>   static bool i915_fence_enable_signaling(struct dma_fence *fence)
>   {
> -	return intel_engine_enable_signaling(to_request(fence), true);
> +	return i915_request_enable_breadcrumb(to_request(fence));
>   }
>   
>   static signed long i915_fence_wait(struct dma_fence *fence,
> @@ -203,7 +203,7 @@ static void __retire_engine_request(struct intel_engine_cs *engine,
>   	if (!i915_request_signaled(rq))
>   		dma_fence_signal_locked(&rq->fence);
>   	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &rq->fence.flags))
> -		intel_engine_cancel_signaling(rq);
> +		i915_request_cancel_breadcrumb(rq);
>   	if (rq->waitboost) {
>   		GEM_BUG_ON(!atomic_read(&rq->i915->gt_pm.rps.num_waiters));
>   		atomic_dec(&rq->i915->gt_pm.rps.num_waiters);
> @@ -377,9 +377,12 @@ void __i915_request_submit(struct i915_request *request)
>   
>   	/* We may be recursing from the signal callback of another i915 fence */
>   	spin_lock_nested(&request->lock, SINGLE_DEPTH_NESTING);
> +	GEM_BUG_ON(test_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags));
> +	set_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags);
>   	request->global_seqno = seqno;
> -	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &request->fence.flags))
> -		intel_engine_enable_signaling(request, false);
> +	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &request->fence.flags) &&
> +	    !i915_request_enable_breadcrumb(request))
> +		intel_engine_queue_breadcrumbs(engine);
>   	spin_unlock(&request->lock);
>   
>   	engine->emit_fini_breadcrumb(request,
> @@ -389,8 +392,6 @@ void __i915_request_submit(struct i915_request *request)
>   	move_to_timeline(request, &engine->timeline);
>   
>   	trace_i915_request_execute(request);
> -
> -	wake_up_all(&request->execute);
>   }
>   
>   void i915_request_submit(struct i915_request *request)
> @@ -433,7 +434,9 @@ void __i915_request_unsubmit(struct i915_request *request)
>   	spin_lock_nested(&request->lock, SINGLE_DEPTH_NESTING);
>   	request->global_seqno = 0;
>   	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &request->fence.flags))
> -		intel_engine_cancel_signaling(request);
> +		i915_request_cancel_breadcrumb(request);
> +	GEM_BUG_ON(!test_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags));
> +	clear_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags);
>   	spin_unlock(&request->lock);
>   
>   	/* Transfer back from the global per-engine timeline to per-context */
> @@ -646,13 +649,11 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
>   
>   	/* We bump the ref for the fence chain */
>   	i915_sw_fence_init(&i915_request_get(rq)->submit, submit_notify);
> -	init_waitqueue_head(&rq->execute);
>   
>   	i915_sched_node_init(&rq->sched);
>   
>   	/* No zalloc, must clear what we need by hand */
>   	rq->global_seqno = 0;
> -	rq->signaling.wait.seqno = 0;
>   	rq->file_priv = NULL;
>   	rq->batch = NULL;
>   	rq->capture_list = NULL;
> @@ -1047,13 +1048,10 @@ static bool busywait_stop(unsigned long timeout, unsigned int cpu)
>   	return this_cpu != cpu;
>   }
>   
> -static bool __i915_spin_request(const struct i915_request *rq,
> -				u32 seqno, int state, unsigned long timeout_us)
> +static bool __i915_spin_request(const struct i915_request * const rq,
> +				int state, unsigned long timeout_us)
>   {
> -	struct intel_engine_cs *engine = rq->engine;
> -	unsigned int irq, cpu;
> -
> -	GEM_BUG_ON(!seqno);
> +	unsigned int cpu;
>   
>   	/*
>   	 * Only wait for the request if we know it is likely to complete.
> @@ -1061,12 +1059,12 @@ static bool __i915_spin_request(const struct i915_request *rq,
>   	 * We don't track the timestamps around requests, nor the average
>   	 * request length, so we do not have a good indicator that this
>   	 * request will complete within the timeout. What we do know is the
> -	 * order in which requests are executed by the engine and so we can
> -	 * tell if the request has started. If the request hasn't started yet,
> -	 * it is a fair assumption that it will not complete within our
> -	 * relatively short timeout.
> +	 * order in which requests are executed by the context and so we can
> +	 * tell if the request has been started. If the request is not even
> +	 * running yet, it is a fair assumption that it will not complete
> +	 * within our relatively short timeout.
>   	 */
> -	if (!intel_engine_has_started(engine, seqno))
> +	if (!i915_request_is_running(rq))
>   		return false;
>   
>   	/*
> @@ -1080,20 +1078,10 @@ static bool __i915_spin_request(const struct i915_request *rq,
>   	 * takes to sleep on a request, on the order of a microsecond.
>   	 */
>   
> -	irq = READ_ONCE(engine->breadcrumbs.irq_count);
>   	timeout_us += local_clock_us(&cpu);
>   	do {
> -		if (intel_engine_has_completed(engine, seqno))
> -			return seqno == i915_request_global_seqno(rq);
> -
> -		/*
> -		 * Seqno are meant to be ordered *before* the interrupt. If
> -		 * we see an interrupt without a corresponding seqno advance,
> -		 * assume we won't see one in the near future but require
> -		 * the engine->seqno_barrier() to fixup coherency.
> -		 */
> -		if (READ_ONCE(engine->breadcrumbs.irq_count) != irq)
> -			break;
> +		if (i915_request_completed(rq))
> +			return true;
>   
>   		if (signal_pending_state(state, current))
>   			break;
> @@ -1107,6 +1095,18 @@ static bool __i915_spin_request(const struct i915_request *rq,
>   	return false;
>   }
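
Side note: the optimistic spin here is the usual bounded busy-wait. A
minimal, self-contained sketch of the shape (spin_until and its
arguments are illustrative, not from this patch):

#include <linux/ktime.h>
#include <asm/processor.h>

static bool spin_until(bool (*done)(const void *), const void *arg,
		       s64 timeout_us)
{
	const ktime_t end = ktime_add_us(ktime_get(), timeout_us);

	do {
		if (done(arg))
			return true;
		cpu_relax();
	} while (ktime_before(ktime_get(), end));

	return false;
}

The real thing additionally bails out on pending signals and when the
task migrates to another CPU, since the local clock is then no longer
comparable.
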
>   
> +struct request_wait {
> +	struct dma_fence_cb cb;
> +	struct task_struct *tsk;
> +};
> +
> +static void request_wait_wake(struct dma_fence *fence, struct dma_fence_cb *cb)
> +{
> +	struct request_wait *wait = container_of(cb, typeof(*wait), cb);
> +
> +	wake_up_process(wait->tsk);
> +}
> +
>   /**
>    * i915_request_wait - wait until execution of request has finished
>    * @rq: the request to wait upon
> @@ -1132,8 +1132,7 @@ long i915_request_wait(struct i915_request *rq,
>   {
>   	const int state = flags & I915_WAIT_INTERRUPTIBLE ?
>   		TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
> -	DEFINE_WAIT_FUNC(exec, default_wake_function);
> -	struct intel_wait wait;
> +	struct request_wait wait;
>   
>   	might_sleep();
>   	GEM_BUG_ON(timeout < 0);
> @@ -1145,47 +1144,24 @@ long i915_request_wait(struct i915_request *rq,
>   		return -ETIME;
>   
>   	trace_i915_request_wait_begin(rq, flags);
> -	add_wait_queue(&rq->execute, &exec);
> -	intel_wait_init(&wait);
> -	if (flags & I915_WAIT_PRIORITY)
> -		i915_schedule_bump_priority(rq, I915_PRIORITY_WAIT);
> -
> -restart:
> -	do {
> -		set_current_state(state);
> -		if (intel_wait_update_request(&wait, rq))
> -			break;
> -
> -		if (signal_pending_state(state, current)) {
> -			timeout = -ERESTARTSYS;
> -			goto complete;
> -		}
>   
> -		if (!timeout) {
> -			timeout = -ETIME;
> -			goto complete;
> -		}
> +	/* Optimistic short spin before touching IRQs */
> +	if (__i915_spin_request(rq, state, 5))
> +		goto out;
>   
> -		timeout = io_schedule_timeout(timeout);
> -	} while (1);
> +	if (flags & I915_WAIT_PRIORITY)
> +		i915_schedule_bump_priority(rq, I915_PRIORITY_WAIT);
>   
> -	GEM_BUG_ON(!intel_wait_has_seqno(&wait));
> -	GEM_BUG_ON(!i915_sw_fence_signaled(&rq->submit));
> +	wait.tsk = current;
> +	if (dma_fence_add_callback(&rq->fence, &wait.cb, request_wait_wake))
> +		goto out;
>   
> -	/* Optimistic short spin before touching IRQs */
> -	if (__i915_spin_request(rq, wait.seqno, state, 5))
> -		goto complete;
> +	for (;;) {
> +		set_current_state(state);
>   
> -	set_current_state(state);
> -	if (intel_engine_add_wait(rq->engine, &wait))
> -		/*
> -		 * In order to check that we haven't missed the interrupt
> -		 * as we enabled it, we need to kick ourselves to do a
> -		 * coherent check on the seqno before we sleep.
> -		 */
> -		goto wakeup;
> +		if (i915_request_completed(rq))
> +			break;
>   
> -	for (;;) {
>   		if (signal_pending_state(state, current)) {
>   			timeout = -ERESTARTSYS;
>   			break;
> @@ -1197,33 +1173,13 @@ long i915_request_wait(struct i915_request *rq,
>   		}
>   
>   		timeout = io_schedule_timeout(timeout);
> -
> -		if (intel_wait_complete(&wait) &&
> -		    intel_wait_check_request(&wait, rq))
> -			break;
> -
> -		set_current_state(state);
> -
> -wakeup:
> -		if (i915_request_completed(rq))
> -			break;
> -
> -		/* Only spin if we know the GPU is processing this request */
> -		if (__i915_spin_request(rq, wait.seqno, state, 2))
> -			break;
> -
> -		if (!intel_wait_check_request(&wait, rq)) {
> -			intel_engine_remove_wait(rq->engine, &wait);
> -			goto restart;
> -		}
>   	}
> -
> -	intel_engine_remove_wait(rq->engine, &wait);
> -complete:
>   	__set_current_state(TASK_RUNNING);
> -	remove_wait_queue(&rq->execute, &exec);
> -	trace_i915_request_wait_end(rq);
>   
> +	dma_fence_remove_callback(&rq->fence, &wait.cb);
> +
> +out:
> +	trace_i915_request_wait_end(rq);
>   	return timeout;
>   }
>   
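
The rewritten wait is now just the stock dma-fence callback idiom:
optimistic spin, arm a callback that wakes the sleeping task, then
sleep-and-recheck in a loop. For anyone following along, a minimal
sketch of that idiom (my_waiter/my_fence_wait are illustrative names,
not from the patch):

#include <linux/dma-fence.h>
#include <linux/errno.h>
#include <linux/kernel.h>
#include <linux/sched.h>

struct my_waiter {
	struct dma_fence_cb cb;
	struct task_struct *tsk;
};

static void my_wait_wake(struct dma_fence *fence, struct dma_fence_cb *cb)
{
	wake_up_process(container_of(cb, struct my_waiter, cb)->tsk);
}

static long my_fence_wait(struct dma_fence *fence, long timeout)
{
	struct my_waiter wait = { .tsk = current };

	/* Returns -ENOENT if the fence is already signaled */
	if (dma_fence_add_callback(fence, &wait.cb, my_wait_wake))
		return timeout;

	for (;;) {
		set_current_state(TASK_UNINTERRUPTIBLE);
		if (dma_fence_is_signaled(fence))
			break;
		if (!timeout) {
			timeout = -ETIME;
			break;
		}
		timeout = io_schedule_timeout(timeout);
	}
	__set_current_state(TASK_RUNNING);

	dma_fence_remove_callback(fence, &wait.cb);
	return timeout;
}

Losing the restart/double-check dance around the global seqno is a very
welcome simplification.
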
> diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
> index 340d6216791c..3cffb96203b9 100644
> --- a/drivers/gpu/drm/i915/i915_request.h
> +++ b/drivers/gpu/drm/i915/i915_request.h
> @@ -38,23 +38,34 @@ struct drm_i915_gem_object;
>   struct i915_request;
>   struct i915_timeline;
>   
> -struct intel_wait {
> -	struct rb_node node;
> -	struct task_struct *tsk;
> -	struct i915_request *request;
> -	u32 seqno;
> -};
> -
> -struct intel_signal_node {
> -	struct intel_wait wait;
> -	struct list_head link;
> -};
> -
>   struct i915_capture_list {
>   	struct i915_capture_list *next;
>   	struct i915_vma *vma;
>   };
>   
> +enum {
> +	/*
> +	 * I915_FENCE_FLAG_ACTIVE - this request is currently submitted to HW.
> +	 *
> +	 * Set by __i915_request_submit() on handing over to HW, and cleared
> +	 * by __i915_request_unsubmit() if we preempt this request.
> +	 *
> +	 * Finally cleared for consistency on retiring the request, when
> +	 * we know the HW is no longer running this request.
> +	 *
> +	 * See i915_request_is_active()
> +	 */
> +	I915_FENCE_FLAG_ACTIVE = DMA_FENCE_FLAG_USER_BITS,
> +
> +	/*
> +	 * I915_FENCE_FLAG_SIGNAL - this request is currently on signal_list
> +	 *
> +	 * Internal bookkeeping used by the breadcrumb code to track when
> +	 * a request is on the various signal_list.
> +	 */
> +	I915_FENCE_FLAG_SIGNAL,
> +};
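
Good use of the driver-private bits in fence->flags. For reference, the
general pattern is just atomic bitops on the bits from
DMA_FENCE_FLAG_USER_BITS upwards (MY_FENCE_FLAG_* below are made-up
names for illustration):

#include <linux/bitops.h>
#include <linux/dma-fence.h>

enum {
	MY_FENCE_FLAG_FOO = DMA_FENCE_FLAG_USER_BITS,
	MY_FENCE_FLAG_BAR,
};

static void mark_foo(struct dma_fence *f)
{
	set_bit(MY_FENCE_FLAG_FOO, &f->flags);
}

static bool is_foo(const struct dma_fence *f)
{
	return test_bit(MY_FENCE_FLAG_FOO, &f->flags);
}
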
> +
>   /**
>    * Request queue structure.
>    *
> @@ -97,7 +108,7 @@ struct i915_request {
>   	struct intel_context *hw_context;
>   	struct intel_ring *ring;
>   	struct i915_timeline *timeline;
> -	struct intel_signal_node signaling;
> +	struct list_head signal_link;
>   
>   	/*
>   	 * The rcu epoch of when this request was allocated. Used to judiciously
> @@ -116,7 +127,6 @@ struct i915_request {
>   	 */
>   	struct i915_sw_fence submit;
>   	wait_queue_entry_t submitq;
> -	wait_queue_head_t execute;
>   
>   	/*
>   	 * A list of everyone we wait upon, and everyone who waits upon us.
> @@ -255,7 +265,7 @@ i915_request_put(struct i915_request *rq)
>    * that it has passed the global seqno and the global seqno is unchanged
>    * after the read, it is indeed complete).
>    */
> -static u32
> +static inline u32
>   i915_request_global_seqno(const struct i915_request *request)
>   {
>   	return READ_ONCE(request->global_seqno);
> @@ -277,6 +287,10 @@ void i915_request_skip(struct i915_request *request, int error);
>   void __i915_request_unsubmit(struct i915_request *request);
>   void i915_request_unsubmit(struct i915_request *request);
>   
> +/* Note: part of the intel_breadcrumbs family */
> +bool i915_request_enable_breadcrumb(struct i915_request *request);
> +void i915_request_cancel_breadcrumb(struct i915_request *request);
> +
>   long i915_request_wait(struct i915_request *rq,
>   		       unsigned int flags,
>   		       long timeout)
> @@ -293,6 +307,11 @@ static inline bool i915_request_signaled(const struct i915_request *rq)
>   	return test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &rq->fence.flags);
>   }
>   
> +static inline bool i915_request_is_active(const struct i915_request *rq)
> +{
> +	return test_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
> +}
> +
>   /**
>    * Returns true if seq1 is later than seq2.
>    */
> @@ -330,6 +349,11 @@ static inline u32 hwsp_seqno(const struct i915_request *rq)
>   	return seqno;
>   }
>   
> +static inline bool __i915_request_has_started(const struct i915_request *rq)
> +{
> +	return i915_seqno_passed(hwsp_seqno(rq), rq->fence.seqno - 1);
> +}
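
The seqno - 1 could use a comment. Worked example with made-up numbers:
if a timeline's requests carry fence.seqno 8, 9, 10 and the HWSP
records the seqno of the last completed request on that timeline, then
reading 9 means request 9 has finished, so request 10 (if submitted)
must have been started; reading 8 means request 10 cannot have begun.
Equivalently, assuming the driver's wrap-safe comparison:

static inline bool has_started(u32 hwsp, u32 seqno)
{
	/* i.e. hwsp >= seqno - 1 in wraparound-safe u32 arithmetic */
	return i915_seqno_passed(hwsp, seqno - 1);
}
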
> +
>   /**
>    * i915_request_started - check if the request has begun being executed
>    * @rq: the request
> @@ -345,7 +369,23 @@ static inline bool i915_request_started(const struct i915_request *rq)
>   		return true;
>   
>   	/* Remember: started but may have since been preempted! */
> -	return i915_seqno_passed(hwsp_seqno(rq), rq->fence.seqno - 1);
> +	return __i915_request_has_started(rq);
> +}
> +
> +/**
> + * i915_request_is_running - check if the request may actually be executing
> + * @rq: the request
> + *
> + * Returns true if the request is currently submitted to hardware and has
> + * passed its start point (i.e. the context is set up and not busywaiting).
> + * Note that it may no longer be running by the time the function returns!
> + */
> +static inline bool i915_request_is_running(const struct i915_request *rq)
> +{
> +	if (!i915_request_is_active(rq))
> +		return false;
> +
> +	return __i915_request_has_started(rq);
>   }
>   
>   static inline bool i915_request_completed(const struct i915_request *rq)
> diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
> index acf3c777e49d..4462007a681c 100644
> --- a/drivers/gpu/drm/i915/i915_reset.c
> +++ b/drivers/gpu/drm/i915/i915_reset.c
> @@ -29,7 +29,7 @@ static void engine_skip_context(struct i915_request *rq)
>   
>   	spin_lock(&timeline->lock);
>   
> -	if (rq->global_seqno) {
> +	if (i915_request_is_active(rq)) {
>   		list_for_each_entry_continue(rq,
>   					     &engine->timeline.requests, link)
>   			if (rq->gem_context == hung_ctx)
> @@ -751,18 +751,20 @@ static void reset_restart(struct drm_i915_private *i915)
>   
>   static void nop_submit_request(struct i915_request *request)
>   {
> +	struct intel_engine_cs *engine = request->engine;
>   	unsigned long flags;
>   
>   	GEM_TRACE("%s fence %llx:%lld -> -EIO\n",
> -		  request->engine->name,
> -		  request->fence.context, request->fence.seqno);
> +		  engine->name, request->fence.context, request->fence.seqno);
>   	dma_fence_set_error(&request->fence, -EIO);
>   
> -	spin_lock_irqsave(&request->engine->timeline.lock, flags);
> +	spin_lock_irqsave(&engine->timeline.lock, flags);
>   	__i915_request_submit(request);
>   	i915_request_mark_complete(request);
> -	intel_engine_write_global_seqno(request->engine, request->global_seqno);
> -	spin_unlock_irqrestore(&request->engine->timeline.lock, flags);
> +	intel_engine_write_global_seqno(engine, request->global_seqno);
> +	spin_unlock_irqrestore(&engine->timeline.lock, flags);
> +
> +	intel_engine_queue_breadcrumbs(engine);
>   }
>   
>   void i915_gem_set_wedged(struct drm_i915_private *i915)
> @@ -817,7 +819,7 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
>   
>   	for_each_engine(engine, i915, id) {
>   		reset_finish_engine(engine);
> -		intel_engine_wakeup(engine);
> +		intel_engine_signal_breadcrumbs(engine);
>   	}
>   
>   	smp_mb__before_atomic();
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
> index 7db1255665a8..8ae68d2fc134 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler.c
> +++ b/drivers/gpu/drm/i915/i915_scheduler.c
> @@ -243,7 +243,7 @@ static bool inflight(const struct i915_request *rq,
>   {
>   	const struct i915_request *active;
>   
> -	if (!rq->global_seqno)
> +	if (!i915_request_is_active(rq))
>   		return false;
>   
>   	active = port_request(engine->execlists.port);
> diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> index b58915b8708b..b0795b0ad227 100644
> --- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
> +++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
> @@ -29,48 +29,149 @@
>   
>   #define task_asleep(tsk) ((tsk)->state & TASK_NORMAL && !(tsk)->on_rq)
>   
> -static unsigned int __intel_breadcrumbs_wakeup(struct intel_breadcrumbs *b)
> +static void irq_enable(struct intel_engine_cs *engine)
> +{
> +	if (!engine->irq_enable)
> +		return;
> +
> +	/* Caller disables interrupts */
> +	spin_lock(&engine->i915->irq_lock);
> +	engine->irq_enable(engine);
> +	spin_unlock(&engine->i915->irq_lock);
> +}
> +
> +static void irq_disable(struct intel_engine_cs *engine)
>   {
> -	struct intel_wait *wait;
> -	unsigned int result = 0;
> +	if (!engine->irq_disable)
> +		return;
> +
> +	/* Caller disables interrupts */
> +	spin_lock(&engine->i915->irq_lock);
> +	engine->irq_disable(engine);
> +	spin_unlock(&engine->i915->irq_lock);
> +}
>   
> +static void __intel_breadcrumbs_disarm_irq(struct intel_breadcrumbs *b)
> +{
>   	lockdep_assert_held(&b->irq_lock);
>   
> -	wait = b->irq_wait;
> -	if (wait) {
> +	GEM_BUG_ON(!b->irq_enabled);
> +	if (!--b->irq_enabled)
> +		irq_disable(container_of(b,
> +					 struct intel_engine_cs,
> +					 breadcrumbs));
> +
> +	b->irq_armed = false;
> +}
> +
> +void intel_engine_disarm_breadcrumbs(struct intel_engine_cs *engine)
> +{
> +	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> +
> +	if (!b->irq_armed)
> +		return;
> +
> +	spin_lock_irq(&b->irq_lock);
> +	if (b->irq_armed)
> +		__intel_breadcrumbs_disarm_irq(b);
> +	spin_unlock_irq(&b->irq_lock);
> +}
> +
> +static inline bool __request_completed(const struct i915_request *rq)
> +{
> +	return i915_seqno_passed(__hwsp_seqno(rq), rq->fence.seqno);
> +}
> +
> +bool intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine)
> +{
> +	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> +	struct intel_context *ce, *cn;
> +	struct list_head *pos, *next;
> +	LIST_HEAD(signal);
> +
> +	spin_lock(&b->irq_lock);
> +
> +	b->irq_fired = true;
> +	if (b->irq_armed && list_empty(&b->signalers))
> +		__intel_breadcrumbs_disarm_irq(b);
> +
> +	list_for_each_entry_safe(ce, cn, &b->signalers, signal_link) {
> +		GEM_BUG_ON(list_empty(&ce->signals));
> +
> +		list_for_each_safe(pos, next, &ce->signals) {
> +			struct i915_request *rq =
> +				list_entry(pos, typeof(*rq), signal_link);
> +
> +			if (!__request_completed(rq))
> +				break;
> +
> +			GEM_BUG_ON(!test_bit(I915_FENCE_FLAG_SIGNAL,
> +					     &rq->fence.flags));
> +			clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);
> +
> +			/*
> +			 * We may race with direct invocation of
> +			 * dma_fence_signal(), e.g. i915_request_retire(),
> +			 * in which case we can skip processing it ourselves.
> +			 */
> +			if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
> +				     &rq->fence.flags))
> +				continue;
> +
> +			/*
> +			 * Queue for execution after dropping the signaling
> +			 * spinlock as the callback chain may end up adding
> +			 * more signalers to the same context or engine.
> +			 */
> +			i915_request_get(rq);
> +			list_add_tail(&rq->signal_link, &signal);
> +		}
> +
>   		/*
> -		 * N.B. Since task_asleep() and ttwu are not atomic, the
> -		 * waiter may actually go to sleep after the check, causing
> -		 * us to suppress a valid wakeup. We prefer to reduce the
> -		 * number of false positive missed_breadcrumb() warnings
> -		 * at the expense of a few false negatives, as it it easy
> -		 * to trigger a false positive under heavy load. Enough
> -		 * signal should remain from genuine missed_breadcrumb()
> -		 * for us to detect in CI.
> +		 * We process the list deletion in bulk, so above we only use
> +		 * list_add (not list_move) and instead track whether
> +		 * rq->signal_link is live via the I915_FENCE_FLAG_SIGNAL bit.
>   		 */
> -		bool was_asleep = task_asleep(wait->tsk);
> +		if (!list_is_first(pos, &ce->signals)) {
> +			/* Advance the list to the first incomplete request */
> +			__list_del_many(&ce->signals, pos);
> +			if (&ce->signals == pos) /* now empty */
> +				list_del_init(&ce->signal_link);
> +		}
> +	}
> +
> +	spin_unlock(&b->irq_lock);
> +
> +	list_for_each_safe(pos, next, &signal) {
> +		struct i915_request *rq =
> +			list_entry(pos, typeof(*rq), signal_link);
>   
> -		result = ENGINE_WAKEUP_WAITER;
> -		if (wake_up_process(wait->tsk) && was_asleep)
> -			result |= ENGINE_WAKEUP_ASLEEP;
> +		dma_fence_signal(&rq->fence);
> +		i915_request_put(rq);
>   	}
>   
> -	return result;
> +	return !list_empty(&signal);
>   }
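
The split of collecting completed requests onto a local list under
b->irq_lock and only calling dma_fence_signal() after dropping it is
the right call, since the fence callbacks may recurse into this very
machinery. A stripped-down sketch of the same two-phase pattern
(struct item and drain() are illustrative stand-ins):

#include <linux/list.h>
#include <linux/spinlock.h>

struct item {
	struct list_head link;
	bool done;
};

static void drain(spinlock_t *lock, struct list_head *pending,
		  void (*complete)(struct item *))
{
	struct item *it, *next;
	LIST_HEAD(batch);

	spin_lock(lock);
	list_for_each_entry_safe(it, next, pending, link)
		if (it->done)
			list_move_tail(&it->link, &batch);
	spin_unlock(lock);

	/* 'lock' is no longer held; complete() may safely retake it */
	list_for_each_entry_safe(it, next, &batch, link)
		complete(it);
}

(The patch avoids the per-element list_move in favour of one bulk
deletion per context, but the locking shape is the same.)
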
>   
> -unsigned int intel_engine_wakeup(struct intel_engine_cs *engine)
> +bool intel_engine_signal_breadcrumbs(struct intel_engine_cs *engine)
>   {
> -	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> -	unsigned long flags;
> -	unsigned int result;
> +	bool result;
>   
> -	spin_lock_irqsave(&b->irq_lock, flags);
> -	result = __intel_breadcrumbs_wakeup(b);
> -	spin_unlock_irqrestore(&b->irq_lock, flags);
> +	local_irq_disable();
> +	result = intel_engine_breadcrumbs_irq(engine);
> +	local_irq_enable();
>   
>   	return result;
>   }
>   
> +static void signal_irq_work(struct irq_work *work)
> +{
> +	struct intel_engine_cs *engine =
> +		container_of(work, typeof(*engine), breadcrumbs.irq_work);
> +
> +	intel_engine_breadcrumbs_irq(engine);
> +}
> +
>   static unsigned long wait_timeout(void)
>   {
>   	return round_jiffies_up(jiffies + DRM_I915_HANGCHECK_JIFFIES);
> @@ -94,19 +195,15 @@ static void intel_breadcrumbs_hangcheck(struct timer_list *t)
>   	struct intel_engine_cs *engine =
>   		from_timer(engine, t, breadcrumbs.hangcheck);
>   	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> -	unsigned int irq_count;
>   
>   	if (!b->irq_armed)
>   		return;
>   
> -	irq_count = READ_ONCE(b->irq_count);
> -	if (b->hangcheck_interrupts != irq_count) {
> -		b->hangcheck_interrupts = irq_count;
> -		mod_timer(&b->hangcheck, wait_timeout());
> -		return;
> -	}
> +	if (b->irq_fired)
> +		goto rearm;
>   
> -	/* We keep the hangcheck timer alive until we disarm the irq, even
> +	/*
> +	 * We keep the hangcheck timer alive until we disarm the irq, even
>   	 * if there are no waiters at present.
>   	 *
>   	 * If the waiter was currently running, assume it hasn't had a chance
> @@ -118,10 +215,13 @@ static void intel_breadcrumbs_hangcheck(struct timer_list *t)
>   	 * but we still have a waiter. Assuming all batches complete within
>   	 * DRM_I915_HANGCHECK_JIFFIES [1.5s]!
>   	 */
> -	if (intel_engine_wakeup(engine) & ENGINE_WAKEUP_ASLEEP) {
> +	synchronize_hardirq(engine->i915->drm.irq);
> +	if (intel_engine_signal_breadcrumbs(engine)) {
>   		missed_breadcrumb(engine);
>   		mod_timer(&b->fake_irq, jiffies + 1);
>   	} else {
> +rearm:
> +		b->irq_fired = false;
>   		mod_timer(&b->hangcheck, wait_timeout());
>   	}
>   }
> @@ -140,11 +240,7 @@ static void intel_breadcrumbs_fake_irq(struct timer_list *t)
>   	 * oldest waiter to do the coherent seqno check.
>   	 */
>   
> -	spin_lock_irq(&b->irq_lock);
> -	if (b->irq_armed && !__intel_breadcrumbs_wakeup(b))
> -		__intel_engine_disarm_breadcrumbs(engine);
> -	spin_unlock_irq(&b->irq_lock);
> -	if (!b->irq_armed)
> +	if (!intel_engine_signal_breadcrumbs(engine) && !b->irq_armed)
>   		return;
>   
>   	/* If the user has disabled the fake-irq, restore the hangchecking */
> @@ -156,43 +252,6 @@ static void intel_breadcrumbs_fake_irq(struct timer_list *t)
>   	mod_timer(&b->fake_irq, jiffies + 1);
>   }
>   
> -static void irq_enable(struct intel_engine_cs *engine)
> -{
> -	if (!engine->irq_enable)
> -		return;
> -
> -	/* Caller disables interrupts */
> -	spin_lock(&engine->i915->irq_lock);
> -	engine->irq_enable(engine);
> -	spin_unlock(&engine->i915->irq_lock);
> -}
> -
> -static void irq_disable(struct intel_engine_cs *engine)
> -{
> -	if (!engine->irq_disable)
> -		return;
> -
> -	/* Caller disables interrupts */
> -	spin_lock(&engine->i915->irq_lock);
> -	engine->irq_disable(engine);
> -	spin_unlock(&engine->i915->irq_lock);
> -}
> -
> -void __intel_engine_disarm_breadcrumbs(struct intel_engine_cs *engine)
> -{
> -	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> -
> -	lockdep_assert_held(&b->irq_lock);
> -	GEM_BUG_ON(b->irq_wait);
> -	GEM_BUG_ON(!b->irq_armed);
> -
> -	GEM_BUG_ON(!b->irq_enabled);
> -	if (!--b->irq_enabled)
> -		irq_disable(engine);
> -
> -	b->irq_armed = false;
> -}
> -
>   void intel_engine_pin_breadcrumbs_irq(struct intel_engine_cs *engine)
>   {
>   	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> @@ -215,40 +274,6 @@ void intel_engine_unpin_breadcrumbs_irq(struct intel_engine_cs *engine)
>   	spin_unlock_irq(&b->irq_lock);
>   }
>   
> -void intel_engine_disarm_breadcrumbs(struct intel_engine_cs *engine)
> -{
> -	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> -	struct intel_wait *wait, *n;
> -
> -	if (!b->irq_armed)
> -		return;
> -
> -	/*
> -	 * We only disarm the irq when we are idle (all requests completed),
> -	 * so if the bottom-half remains asleep, it missed the request
> -	 * completion.
> -	 */
> -	if (intel_engine_wakeup(engine) & ENGINE_WAKEUP_ASLEEP)
> -		missed_breadcrumb(engine);
> -
> -	spin_lock_irq(&b->rb_lock);
> -
> -	spin_lock(&b->irq_lock);
> -	b->irq_wait = NULL;
> -	if (b->irq_armed)
> -		__intel_engine_disarm_breadcrumbs(engine);
> -	spin_unlock(&b->irq_lock);
> -
> -	rbtree_postorder_for_each_entry_safe(wait, n, &b->waiters, node) {
> -		GEM_BUG_ON(!intel_engine_signaled(engine, wait->seqno));
> -		RB_CLEAR_NODE(&wait->node);
> -		wake_up_process(wait->tsk);
> -	}
> -	b->waiters = RB_ROOT;
> -
> -	spin_unlock_irq(&b->rb_lock);
> -}
> -
>   static bool use_fake_irq(const struct intel_breadcrumbs *b)
>   {
>   	const struct intel_engine_cs *engine =
> @@ -264,7 +289,7 @@ static bool use_fake_irq(const struct intel_breadcrumbs *b)
>   	 * engine->seqno_barrier(), a timing error that should be transient
>   	 * and unlikely to reoccur.
>   	 */
> -	return READ_ONCE(b->irq_count) == b->hangcheck_interrupts;
> +	return !b->irq_fired;
>   }
>   
>   static void enable_fake_irq(struct intel_breadcrumbs *b)
> @@ -276,7 +301,7 @@ static void enable_fake_irq(struct intel_breadcrumbs *b)
>   		mod_timer(&b->hangcheck, wait_timeout());
>   }
>   
> -static bool __intel_breadcrumbs_enable_irq(struct intel_breadcrumbs *b)
> +static bool __intel_breadcrumbs_arm_irq(struct intel_breadcrumbs *b)
>   {
>   	struct intel_engine_cs *engine =
>   		container_of(b, struct intel_engine_cs, breadcrumbs);
> @@ -315,536 +340,149 @@ static bool __intel_breadcrumbs_enable_irq(struct intel_breadcrumbs *b)
>   	return enabled;
>   }
>   
> -static inline struct intel_wait *to_wait(struct rb_node *node)
> +void intel_engine_init_breadcrumbs(struct intel_engine_cs *engine)
>   {
> -	return rb_entry(node, struct intel_wait, node);
> -}
> +	struct intel_breadcrumbs *b = &engine->breadcrumbs;
>   
> -static inline void __intel_breadcrumbs_finish(struct intel_breadcrumbs *b,
> -					      struct intel_wait *wait)
> -{
> -	lockdep_assert_held(&b->rb_lock);
> -	GEM_BUG_ON(b->irq_wait == wait);
> +	spin_lock_init(&b->irq_lock);
> +	INIT_LIST_HEAD(&b->signalers);
>   
> -	/*
> -	 * This request is completed, so remove it from the tree, mark it as
> -	 * complete, and *then* wake up the associated task. N.B. when the
> -	 * task wakes up, it will find the empty rb_node, discern that it
> -	 * has already been removed from the tree and skip the serialisation
> -	 * of the b->rb_lock and b->irq_lock. This means that the destruction
> -	 * of the intel_wait is not serialised with the interrupt handler
> -	 * by the waiter - it must instead be serialised by the caller.
> -	 */
> -	rb_erase(&wait->node, &b->waiters);
> -	RB_CLEAR_NODE(&wait->node);
> +	init_irq_work(&b->irq_work, signal_irq_work);
>   
> -	if (wait->tsk->state != TASK_RUNNING)
> -		wake_up_process(wait->tsk); /* implicit smp_wmb() */
> +	timer_setup(&b->fake_irq, intel_breadcrumbs_fake_irq, 0);
> +	timer_setup(&b->hangcheck, intel_breadcrumbs_hangcheck, 0);
>   }
>   
> -static inline void __intel_breadcrumbs_next(struct intel_engine_cs *engine,
> -					    struct rb_node *next)
> +static void cancel_fake_irq(struct intel_engine_cs *engine)
>   {
>   	struct intel_breadcrumbs *b = &engine->breadcrumbs;
>   
> -	spin_lock(&b->irq_lock);
> -	GEM_BUG_ON(!b->irq_armed);
> -	GEM_BUG_ON(!b->irq_wait);
> -	b->irq_wait = to_wait(next);
> -	spin_unlock(&b->irq_lock);
> -
> -	/* We always wake up the next waiter that takes over as the bottom-half
> -	 * as we may delegate not only the irq-seqno barrier to the next waiter
> -	 * but also the task of waking up concurrent waiters.
> -	 */
> -	if (next)
> -		wake_up_process(to_wait(next)->tsk);
> +	del_timer_sync(&b->fake_irq); /* may queue b->hangcheck */
> +	del_timer_sync(&b->hangcheck);
> +	clear_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings);
>   }
>   
> -static bool __intel_engine_add_wait(struct intel_engine_cs *engine,
> -				    struct intel_wait *wait)
> +void intel_engine_reset_breadcrumbs(struct intel_engine_cs *engine)
>   {
>   	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> -	struct rb_node **p, *parent, *completed;
> -	bool first, armed;
> -	u32 seqno;
> +	unsigned long flags;
>   
> -	GEM_BUG_ON(!wait->seqno);
> +	spin_lock_irqsave(&b->irq_lock, flags);
>   
> -	/* Insert the request into the retirement ordered list
> -	 * of waiters by walking the rbtree. If we are the oldest
> -	 * seqno in the tree (the first to be retired), then
> -	 * set ourselves as the bottom-half.
> -	 *
> -	 * As we descend the tree, prune completed branches since we hold the
> -	 * spinlock we know that the first_waiter must be delayed and can
> -	 * reduce some of the sequential wake up latency if we take action
> -	 * ourselves and wake up the completed tasks in parallel. Also, by
> -	 * removing stale elements in the tree, we may be able to reduce the
> -	 * ping-pong between the old bottom-half and ourselves as first-waiter.
> +	/*
> +	 * Leave the fake_irq timer enabled (if it is running), but clear the
> +	 * bit so that it turns itself off on its next wake up and goes back
> +	 * to the long hangcheck interval if still required.
>   	 */
> -	armed = false;
> -	first = true;
> -	parent = NULL;
> -	completed = NULL;
> -	seqno = intel_engine_get_seqno(engine);
> -
> -	 /* If the request completed before we managed to grab the spinlock,
> -	  * return now before adding ourselves to the rbtree. We let the
> -	  * current bottom-half handle any pending wakeups and instead
> -	  * try and get out of the way quickly.
> -	  */
> -	if (i915_seqno_passed(seqno, wait->seqno)) {
> -		RB_CLEAR_NODE(&wait->node);
> -		return first;
> -	}
> -
> -	p = &b->waiters.rb_node;
> -	while (*p) {
> -		parent = *p;
> -		if (wait->seqno == to_wait(parent)->seqno) {
> -			/* We have multiple waiters on the same seqno, select
> -			 * the highest priority task (that with the smallest
> -			 * task->prio) to serve as the bottom-half for this
> -			 * group.
> -			 */
> -			if (wait->tsk->prio > to_wait(parent)->tsk->prio) {
> -				p = &parent->rb_right;
> -				first = false;
> -			} else {
> -				p = &parent->rb_left;
> -			}
> -		} else if (i915_seqno_passed(wait->seqno,
> -					     to_wait(parent)->seqno)) {
> -			p = &parent->rb_right;
> -			if (i915_seqno_passed(seqno, to_wait(parent)->seqno))
> -				completed = parent;
> -			else
> -				first = false;
> -		} else {
> -			p = &parent->rb_left;
> -		}
> -	}
> -	rb_link_node(&wait->node, parent, p);
> -	rb_insert_color(&wait->node, &b->waiters);
> -
> -	if (first) {
> -		spin_lock(&b->irq_lock);
> -		b->irq_wait = wait;
> -		/* After assigning ourselves as the new bottom-half, we must
> -		 * perform a cursory check to prevent a missed interrupt.
> -		 * Either we miss the interrupt whilst programming the hardware,
> -		 * or if there was a previous waiter (for a later seqno) they
> -		 * may be woken instead of us (due to the inherent race
> -		 * in the unlocked read of b->irq_seqno_bh in the irq handler)
> -		 * and so we miss the wake up.
> -		 */
> -		armed = __intel_breadcrumbs_enable_irq(b);
> -		spin_unlock(&b->irq_lock);
> -	}
> -
> -	if (completed) {
> -		/* Advance the bottom-half (b->irq_wait) before we wake up
> -		 * the waiters who may scribble over their intel_wait
> -		 * just as the interrupt handler is dereferencing it via
> -		 * b->irq_wait.
> -		 */
> -		if (!first) {
> -			struct rb_node *next = rb_next(completed);
> -			GEM_BUG_ON(next == &wait->node);
> -			__intel_breadcrumbs_next(engine, next);
> -		}
> -
> -		do {
> -			struct intel_wait *crumb = to_wait(completed);
> -			completed = rb_prev(completed);
> -			__intel_breadcrumbs_finish(b, crumb);
> -		} while (completed);
> -	}
> -
> -	GEM_BUG_ON(!b->irq_wait);
> -	GEM_BUG_ON(!b->irq_armed);
> -	GEM_BUG_ON(rb_first(&b->waiters) != &b->irq_wait->node);
> -
> -	return armed;
> -}
> -
> -bool intel_engine_add_wait(struct intel_engine_cs *engine,
> -			   struct intel_wait *wait)
> -{
> -	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> -	bool armed;
> -
> -	spin_lock_irq(&b->rb_lock);
> -	armed = __intel_engine_add_wait(engine, wait);
> -	spin_unlock_irq(&b->rb_lock);
> -	if (armed)
> -		return armed;
> -
> -	/* Make the caller recheck if its request has already started. */
> -	return intel_engine_has_started(engine, wait->seqno);
> -}
> -
> -static inline bool chain_wakeup(struct rb_node *rb, int priority)
> -{
> -	return rb && to_wait(rb)->tsk->prio <= priority;
> -}
> +	clear_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings);
>   
> -static inline int wakeup_priority(struct intel_breadcrumbs *b,
> -				  struct task_struct *tsk)
> -{
> -	if (tsk == b->signaler)
> -		return INT_MIN;
> +	if (b->irq_enabled)
> +		irq_enable(engine);
>   	else
> -		return tsk->prio;
> -}
> -
> -static void __intel_engine_remove_wait(struct intel_engine_cs *engine,
> -				       struct intel_wait *wait)
> -{
> -	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> -
> -	lockdep_assert_held(&b->rb_lock);
> -
> -	if (RB_EMPTY_NODE(&wait->node))
> -		goto out;
> -
> -	if (b->irq_wait == wait) {
> -		const int priority = wakeup_priority(b, wait->tsk);
> -		struct rb_node *next;
> -
> -		/* We are the current bottom-half. Find the next candidate,
> -		 * the first waiter in the queue on the remaining oldest
> -		 * request. As multiple seqnos may complete in the time it
> -		 * takes us to wake up and find the next waiter, we have to
> -		 * wake up that waiter for it to perform its own coherent
> -		 * completion check.
> -		 */
> -		next = rb_next(&wait->node);
> -		if (chain_wakeup(next, priority)) {
> -			/* If the next waiter is already complete,
> -			 * wake it up and continue onto the next waiter. So
> -			 * if have a small herd, they will wake up in parallel
> -			 * rather than sequentially, which should reduce
> -			 * the overall latency in waking all the completed
> -			 * clients.
> -			 *
> -			 * However, waking up a chain adds extra latency to
> -			 * the first_waiter. This is undesirable if that
> -			 * waiter is a high priority task.
> -			 */
> -			u32 seqno = intel_engine_get_seqno(engine);
> -
> -			while (i915_seqno_passed(seqno, to_wait(next)->seqno)) {
> -				struct rb_node *n = rb_next(next);
> -
> -				__intel_breadcrumbs_finish(b, to_wait(next));
> -				next = n;
> -				if (!chain_wakeup(next, priority))
> -					break;
> -			}
> -		}
> -
> -		__intel_breadcrumbs_next(engine, next);
> -	} else {
> -		GEM_BUG_ON(rb_first(&b->waiters) == &wait->node);
> -	}
> -
> -	GEM_BUG_ON(RB_EMPTY_NODE(&wait->node));
> -	rb_erase(&wait->node, &b->waiters);
> -	RB_CLEAR_NODE(&wait->node);
> +		irq_disable(engine);
>   
> -out:
> -	GEM_BUG_ON(b->irq_wait == wait);
> -	GEM_BUG_ON(rb_first(&b->waiters) !=
> -		   (b->irq_wait ? &b->irq_wait->node : NULL));
> +	spin_unlock_irqrestore(&b->irq_lock, flags);
>   }
>   
> -void intel_engine_remove_wait(struct intel_engine_cs *engine,
> -			      struct intel_wait *wait)
> +void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine)
>   {
> -	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> -
> -	/* Quick check to see if this waiter was already decoupled from
> -	 * the tree by the bottom-half to avoid contention on the spinlock
> -	 * by the herd.
> -	 */
> -	if (RB_EMPTY_NODE(&wait->node)) {
> -		GEM_BUG_ON(READ_ONCE(b->irq_wait) == wait);
> -		return;
> -	}
> -
> -	spin_lock_irq(&b->rb_lock);
> -	__intel_engine_remove_wait(engine, wait);
> -	spin_unlock_irq(&b->rb_lock);
> +	cancel_fake_irq(engine);
>   }
>   
> -static void signaler_set_rtpriority(void)
> +bool i915_request_enable_breadcrumb(struct i915_request *rq)
>   {
> -	 struct sched_param param = { .sched_priority = 1 };
> -
> -	 sched_setscheduler_nocheck(current, SCHED_FIFO, &param);
> -}
> +	struct intel_breadcrumbs *b = &rq->engine->breadcrumbs;
>   
> -static int intel_breadcrumbs_signaler(void *arg)
> -{
> -	struct intel_engine_cs *engine = arg;
> -	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> -	struct i915_request *rq, *n;
> +	GEM_BUG_ON(test_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags));
>   
> -	/* Install ourselves with high priority to reduce signalling latency */
> -	signaler_set_rtpriority();
> +	if (!test_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags))
> +		return true;
>   
> -	do {
> -		bool do_schedule = true;
> -		LIST_HEAD(list);
> -		u32 seqno;
> +	spin_lock(&b->irq_lock);
> +	if (test_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags) &&
> +	    !__request_completed(rq)) {
> +		struct intel_context *ce = rq->hw_context;
> +		struct list_head *pos;
>   
> -		set_current_state(TASK_INTERRUPTIBLE);
> -		if (list_empty(&b->signals))
> -			goto sleep;
> +		__intel_breadcrumbs_arm_irq(b);
>   
>   		/*
> -		 * We are either woken up by the interrupt bottom-half,
> -		 * or by a client adding a new signaller. In both cases,
> -		 * the GPU seqno may have advanced beyond our oldest signal.
> -		 * If it has, propagate the signal, remove the waiter and
> -		 * check again with the next oldest signal. Otherwise we
> -		 * need to wait for a new interrupt from the GPU or for
> -		 * a new client.
> +		 * We keep the seqno in retirement order, so we can break
> +		 * inside intel_engine_breadcrumbs_irq as soon as we've passed
> +		 * the last completed request (or seen a request that hasn't
> +		 * even started). We could iterate the timeline->requests list,
> +		 * but keeping a separate signalers_list has the advantage of
> +		 * hopefully being much smaller than the full list and so
> +		 * provides faster iteration and detection when there are no
> +		 * more interrupts required for this context.
> +		 *
> +		 * We typically expect to add new signalers in order, so we
> +		 * start looking for our insertion point from the tail of
> +		 * the list.
>   		 */
> -		seqno = intel_engine_get_seqno(engine);
> -
> -		spin_lock_irq(&b->rb_lock);
> -		list_for_each_entry_safe(rq, n, &b->signals, signaling.link) {
> -			u32 this = rq->signaling.wait.seqno;
> +		list_for_each_prev(pos, &ce->signals) {
> +			struct i915_request *it =
> +				list_entry(pos, typeof(*it), signal_link);
>   
> -			GEM_BUG_ON(!rq->signaling.wait.seqno);
> -
> -			if (!i915_seqno_passed(seqno, this))
> +			if (i915_seqno_passed(rq->fence.seqno, it->fence.seqno))
>   				break;
> -
> -			if (likely(this == i915_request_global_seqno(rq))) {
> -				__intel_engine_remove_wait(engine,
> -							   &rq->signaling.wait);
> -
> -				rq->signaling.wait.seqno = 0;
> -				__list_del_entry(&rq->signaling.link);
> -
> -				if (!i915_request_signaled(rq)) {
> -					list_add_tail(&rq->signaling.link,
> -						      &list);
> -					i915_request_get(rq);
> -				}
> -			}
>   		}
> -		spin_unlock_irq(&b->rb_lock);
> -
> -		if (!list_empty(&list)) {
> -			local_bh_disable();
> -			list_for_each_entry_safe(rq, n, &list, signaling.link) {
> -				dma_fence_signal(&rq->fence);
> -				GEM_BUG_ON(!i915_request_completed(rq));
> -				i915_request_put(rq);
> -			}
> -			local_bh_enable(); /* kick start the tasklets */
> -
> -			/*
> -			 * If the engine is saturated we may be continually
> -			 * processing completed requests. This angers the
> -			 * NMI watchdog if we never let anything else
> -			 * have access to the CPU. Let's pretend to be nice
> -			 * and relinquish the CPU if we burn through the
> -			 * entire RT timeslice!
> -			 */
> -			do_schedule = need_resched();
> -		}
> -
> -		if (unlikely(do_schedule)) {
> -sleep:
> -			if (kthread_should_park())
> -				kthread_parkme();
> -
> -			if (unlikely(kthread_should_stop()))
> -				break;
> -
> -			schedule();
> -		}
> -	} while (1);
> -	__set_current_state(TASK_RUNNING);
> -
> -	return 0;
> -}
> +		list_add(&rq->signal_link, pos);
> +		if (pos == &ce->signals) /* catch transitions from empty list */
> +			list_move_tail(&ce->signal_link, &b->signalers);
>   
> -static void insert_signal(struct intel_breadcrumbs *b,
> -			  struct i915_request *request,
> -			  const u32 seqno)
> -{
> -	struct i915_request *iter;
> -
> -	lockdep_assert_held(&b->rb_lock);
> -
> -	/*
> -	 * A reasonable assumption is that we are called to add signals
> -	 * in sequence, as the requests are submitted for execution and
> -	 * assigned a global_seqno. This will be the case for the majority
> -	 * of internally generated signals (inter-engine signaling).
> -	 *
> -	 * Out of order waiters triggering random signaling enabling will
> -	 * be more problematic, but hopefully rare enough and the list
> -	 * small enough that the O(N) insertion sort is not an issue.
> -	 */
> -
> -	list_for_each_entry_reverse(iter, &b->signals, signaling.link)
> -		if (i915_seqno_passed(seqno, iter->signaling.wait.seqno))
> -			break;
> -
> -	list_add(&request->signaling.link, &iter->signaling.link);
> -}
> -
> -bool intel_engine_enable_signaling(struct i915_request *request, bool wakeup)
> -{
> -	struct intel_engine_cs *engine = request->engine;
> -	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> -	struct intel_wait *wait = &request->signaling.wait;
> -	u32 seqno;
> -
> -	/*
> -	 * Note that we may be called from an interrupt handler on another
> -	 * device (e.g. nouveau signaling a fence completion causing us
> -	 * to submit a request, and so enable signaling). As such,
> -	 * we need to make sure that all other users of b->rb_lock protect
> -	 * against interrupts, i.e. use spin_lock_irqsave.
> -	 */
> -
> -	/* locked by dma_fence_enable_sw_signaling() (irqsafe fence->lock) */
> -	GEM_BUG_ON(!irqs_disabled());
> -	lockdep_assert_held(&request->lock);
> -
> -	seqno = i915_request_global_seqno(request);
> -	if (!seqno) /* will be enabled later upon execution */
> -		return true;
> -
> -	GEM_BUG_ON(wait->seqno);
> -	wait->tsk = b->signaler;
> -	wait->request = request;
> -	wait->seqno = seqno;
> -
> -	/*
> -	 * Add ourselves into the list of waiters, but registering our
> -	 * bottom-half as the signaller thread. As per usual, only the oldest
> -	 * waiter (not just signaller) is tasked as the bottom-half waking
> -	 * up all completed waiters after the user interrupt.
> -	 *
> -	 * If we are the oldest waiter, enable the irq (after which we
> -	 * must double check that the seqno did not complete).
> -	 */
> -	spin_lock(&b->rb_lock);
> -	insert_signal(b, request, seqno);
> -	wakeup &= __intel_engine_add_wait(engine, wait);
> -	spin_unlock(&b->rb_lock);
> -
> -	if (wakeup) {
> -		wake_up_process(b->signaler);
> -		return !intel_wait_complete(wait);
> +		set_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);
>   	}
> +	spin_unlock(&b->irq_lock);
>   
> -	return true;
> +	return !__request_completed(rq);
>   }
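
Scanning backwards from the tail keeps the insertion O(1) for the
common in-order arrival. Minimal sketch of that tail-first insertion
sort (the helper name is mine; the fields mirror the patch):

static void insert_sorted(struct list_head *signals, struct i915_request *rq)
{
	struct list_head *pos;

	/* Common case: rq is the newest, so the scan stops immediately */
	list_for_each_prev(pos, signals) {
		struct i915_request *it =
			list_entry(pos, typeof(*it), signal_link);

		if (i915_seqno_passed(rq->fence.seqno, it->fence.seqno))
			break;
	}
	list_add(&rq->signal_link, pos);
}

If the list is empty, or rq is older than everything on it, pos ends up
back at the list head and list_add() correctly inserts at the front.
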
>   
> -void intel_engine_cancel_signaling(struct i915_request *request)
> +void i915_request_cancel_breadcrumb(struct i915_request *rq)
>   {
> -	struct intel_engine_cs *engine = request->engine;
> -	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> +	struct intel_breadcrumbs *b = &rq->engine->breadcrumbs;
>   
> -	GEM_BUG_ON(!irqs_disabled());
> -	lockdep_assert_held(&request->lock);
> -
> -	if (!READ_ONCE(request->signaling.wait.seqno))
> +	if (!test_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags))
>   		return;
>   
> -	spin_lock(&b->rb_lock);
> -	__intel_engine_remove_wait(engine, &request->signaling.wait);
> -	if (fetch_and_zero(&request->signaling.wait.seqno))
> -		__list_del_entry(&request->signaling.link);
> -	spin_unlock(&b->rb_lock);
> -}
> -
> -int intel_engine_init_breadcrumbs(struct intel_engine_cs *engine)
> -{
> -	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> -	struct task_struct *tsk;
> -
> -	spin_lock_init(&b->rb_lock);
> -	spin_lock_init(&b->irq_lock);
> -
> -	timer_setup(&b->fake_irq, intel_breadcrumbs_fake_irq, 0);
> -	timer_setup(&b->hangcheck, intel_breadcrumbs_hangcheck, 0);
> -
> -	INIT_LIST_HEAD(&b->signals);
> -
> -	/* Spawn a thread to provide a common bottom-half for all signals.
> -	 * As this is an asynchronous interface we cannot steal the current
> -	 * task for handling the bottom-half to the user interrupt, therefore
> -	 * we create a thread to do the coherent seqno dance after the
> -	 * interrupt and then signal the waitqueue (via the dma-buf/fence).
> -	 */
> -	tsk = kthread_run(intel_breadcrumbs_signaler, engine,
> -			  "i915/signal:%d", engine->id);
> -	if (IS_ERR(tsk))
> -		return PTR_ERR(tsk);
> -
> -	b->signaler = tsk;
> -
> -	return 0;
> -}
> +	spin_lock(&b->irq_lock);
> +	if (test_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags)) {
> +		struct intel_context *ce = rq->hw_context;
>   
> -static void cancel_fake_irq(struct intel_engine_cs *engine)
> -{
> -	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> +		list_del(&rq->signal_link);
> +		if (list_empty(&ce->signals))
> +			list_del_init(&ce->signal_link);
>   
> -	del_timer_sync(&b->fake_irq); /* may queue b->hangcheck */
> -	del_timer_sync(&b->hangcheck);
> -	clear_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings);
> +		clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);
> +	}
> +	spin_unlock(&b->irq_lock);
>   }
>   
> -void intel_engine_reset_breadcrumbs(struct intel_engine_cs *engine)
> +void intel_engine_print_breadcrumbs(struct intel_engine_cs *engine,
> +				    struct drm_printer *p)
>   {
>   	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> -	unsigned long flags;
> +	struct intel_context *ce;
> +	struct i915_request *rq;
>   
> -	spin_lock_irqsave(&b->irq_lock, flags);
> -
> -	/*
> -	 * Leave the fake_irq timer enabled (if it is running), but clear the
> -	 * bit so that it turns itself off on its next wake up and goes back
> -	 * to the long hangcheck interval if still required.
> -	 */
> -	clear_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings);
> -
> -	if (b->irq_enabled)
> -		irq_enable(engine);
> -	else
> -		irq_disable(engine);
> -
> -	spin_unlock_irqrestore(&b->irq_lock, flags);
> -}
> -
> -void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine)
> -{
> -	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> +	if (list_empty(&b->signalers))
> +		return;
>   
> -	/* The engines should be idle and all requests accounted for! */
> -	WARN_ON(READ_ONCE(b->irq_wait));
> -	WARN_ON(!RB_EMPTY_ROOT(&b->waiters));
> -	WARN_ON(!list_empty(&b->signals));
> +	drm_printf(p, "Signals:\n");
>   
> -	if (!IS_ERR_OR_NULL(b->signaler))
> -		kthread_stop(b->signaler);
> +	spin_lock_irq(&b->irq_lock);
> +	list_for_each_entry(ce, &b->signalers, signal_link) {
> +		list_for_each_entry(rq, &ce->signals, signal_link) {
> +			drm_printf(p, "\t[%llx:%llx%s] @ %dms\n",
> +				   rq->fence.context, rq->fence.seqno,
> +				   i915_request_completed(rq) ? "!" :
> +				   i915_request_started(rq) ? "*" :
> +				   "",
> +				   jiffies_to_msecs(jiffies - rq->emitted_jiffies));
> +		}
> +	}
> +	spin_unlock_irq(&b->irq_lock);
>   
> -	cancel_fake_irq(engine);
> +	if (test_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings))
> +		drm_printf(p, "Fake irq active\n");
>   }
> -
> -#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
> -#include "selftests/intel_breadcrumbs.c"
> -#endif
> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> index d25e510b7b08..ee43c98227bb 100644
> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> @@ -458,12 +458,6 @@ int intel_engines_init(struct drm_i915_private *dev_priv)
>   void intel_engine_write_global_seqno(struct intel_engine_cs *engine, u32 seqno)
>   {
>   	intel_write_status_page(engine, I915_GEM_HWS_INDEX, seqno);
> -
> -	/* After manually advancing the seqno, fake the interrupt in case
> -	 * there are any waiters for that seqno.
> -	 */
> -	intel_engine_wakeup(engine);
> -
>   	GEM_BUG_ON(intel_engine_get_seqno(engine) != seqno);
>   }
>   
> @@ -607,6 +601,7 @@ int intel_engine_setup_common(struct intel_engine_cs *engine)
>   
>   	i915_timeline_set_subclass(&engine->timeline, TIMELINE_ENGINE);
>   
> +	intel_engine_init_breadcrumbs(engine);
>   	intel_engine_init_execlist(engine);
>   	intel_engine_init_hangcheck(engine);
>   	intel_engine_init_batch_pool(engine);
> @@ -717,20 +712,14 @@ int intel_engine_init_common(struct intel_engine_cs *engine)
>   		}
>   	}
>   
> -	ret = intel_engine_init_breadcrumbs(engine);
> -	if (ret)
> -		goto err_unpin_preempt;
> -
>   	ret = measure_breadcrumb_dw(engine);
>   	if (ret < 0)
> -		goto err_breadcrumbs;
> +		goto err_unpin_preempt;
>   
>   	engine->emit_fini_breadcrumb_dw = ret;
>   
>   	return 0;
>   
> -err_breadcrumbs:
> -	intel_engine_fini_breadcrumbs(engine);
>   err_unpin_preempt:
>   	if (i915->preempt_context)
>   		__intel_context_unpin(i915->preempt_context, engine);
> @@ -1294,12 +1283,14 @@ static void print_request(struct drm_printer *m,
>   
>   	x = print_sched_attr(rq->i915, &rq->sched.attr, buf, x, sizeof(buf));
>   
> -	drm_printf(m, "%s%x%s [%llx:%llx]%s @ %dms: %s\n",
> +	drm_printf(m, "%s%x%s%s [%llx:%llx]%s @ %dms: %s\n",
>   		   prefix,
>   		   rq->global_seqno,
>   		   i915_request_completed(rq) ? "!" :
>   		   i915_request_started(rq) ? "*" :
>   		   "",
> +		   test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
> +			    &rq->fence.flags) ?  "+" : "",
>   		   rq->fence.context, rq->fence.seqno,
>   		   buf,
>   		   jiffies_to_msecs(jiffies - rq->emitted_jiffies),
> @@ -1492,12 +1483,9 @@ void intel_engine_dump(struct intel_engine_cs *engine,
>   		       struct drm_printer *m,
>   		       const char *header, ...)
>   {
> -	struct intel_breadcrumbs * const b = &engine->breadcrumbs;
>   	struct i915_gpu_error * const error = &engine->i915->gpu_error;
>   	struct i915_request *rq;
>   	intel_wakeref_t wakeref;
> -	unsigned long flags;
> -	struct rb_node *rb;
>   
>   	if (header) {
>   		va_list ap;
> @@ -1565,21 +1553,12 @@ void intel_engine_dump(struct intel_engine_cs *engine,
>   
>   	intel_execlists_show_requests(engine, m, print_request, 8);
>   
> -	spin_lock_irqsave(&b->rb_lock, flags);
> -	for (rb = rb_first(&b->waiters); rb; rb = rb_next(rb)) {
> -		struct intel_wait *w = rb_entry(rb, typeof(*w), node);
> -
> -		drm_printf(m, "\t%s [%d:%c] waiting for %x\n",
> -			   w->tsk->comm, w->tsk->pid,
> -			   task_state_to_char(w->tsk),
> -			   w->seqno);
> -	}
> -	spin_unlock_irqrestore(&b->rb_lock, flags);
> -
>   	drm_printf(m, "HWSP:\n");
>   	hexdump(m, engine->status_page.addr, PAGE_SIZE);
>   
>   	drm_printf(m, "Idle? %s\n", yesno(intel_engine_is_idle(engine)));
> +
> +	intel_engine_print_breadcrumbs(engine, m);
>   }
>   
>   static u8 user_class_map[] = {
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 668ed67336a2..b889b27f8aeb 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -743,7 +743,7 @@ static int init_ring_common(struct intel_engine_cs *engine)
>   	}
>   
>   	/* Papering over lost _interrupts_ immediately following the restart */
> -	intel_engine_wakeup(engine);
> +	intel_engine_queue_breadcrumbs(engine);
>   out:
>   	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
>   
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index ae753c45cbe7..efef2aa1abce 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -5,6 +5,7 @@
>   #include <drm/drm_util.h>
>   
>   #include <linux/hashtable.h>
> +#include <linux/irq_work.h>
>   #include <linux/seqlock.h>
>   
>   #include "i915_gem_batch_pool.h"
> @@ -381,22 +382,19 @@ struct intel_engine_cs {
>   	 * the overhead of waking that client is much preferred.
>   	 */
>   	struct intel_breadcrumbs {
> -		spinlock_t irq_lock; /* protects irq_*; irqsafe */
> -		struct intel_wait *irq_wait; /* oldest waiter by retirement */
> +		spinlock_t irq_lock;
> +		struct list_head signalers;
>   
> -		spinlock_t rb_lock; /* protects the rb and wraps irq_lock */
> -		struct rb_root waiters; /* sorted by retirement, priority */
> -		struct list_head signals; /* sorted by retirement */
> -		struct task_struct *signaler; /* used for fence signalling */
> +		struct irq_work irq_work; /* for use from inside irq_lock */
>   
>   		struct timer_list fake_irq; /* used after a missed interrupt */
>   		struct timer_list hangcheck; /* detect missed interrupts */
>   
>   		unsigned int hangcheck_interrupts;
>   		unsigned int irq_enabled;
> -		unsigned int irq_count;
>   
> -		bool irq_armed : 1;
> +		bool irq_armed;
> +		bool irq_fired;
>   	} breadcrumbs;
>   
>   	struct {
> @@ -885,83 +883,29 @@ static inline bool intel_engine_has_started(struct intel_engine_cs *engine,
>   void intel_engine_get_instdone(struct intel_engine_cs *engine,
>   			       struct intel_instdone *instdone);
>   
> -/* intel_breadcrumbs.c -- user interrupt bottom-half for waiters */
> -int intel_engine_init_breadcrumbs(struct intel_engine_cs *engine);
> -
> -static inline void intel_wait_init(struct intel_wait *wait)
> -{
> -	wait->tsk = current;
> -	wait->request = NULL;
> -}
> -
> -static inline void intel_wait_init_for_seqno(struct intel_wait *wait, u32 seqno)
> -{
> -	wait->tsk = current;
> -	wait->seqno = seqno;
> -}
> -
> -static inline bool intel_wait_has_seqno(const struct intel_wait *wait)
> -{
> -	return wait->seqno;
> -}
> -
> -static inline bool
> -intel_wait_update_seqno(struct intel_wait *wait, u32 seqno)
> -{
> -	wait->seqno = seqno;
> -	return intel_wait_has_seqno(wait);
> -}
> -
> -static inline bool
> -intel_wait_update_request(struct intel_wait *wait,
> -			  const struct i915_request *rq)
> -{
> -	return intel_wait_update_seqno(wait, i915_request_global_seqno(rq));
> -}
> -
> -static inline bool
> -intel_wait_check_seqno(const struct intel_wait *wait, u32 seqno)
> -{
> -	return wait->seqno == seqno;
> -}
> -
> -static inline bool
> -intel_wait_check_request(const struct intel_wait *wait,
> -			 const struct i915_request *rq)
> -{
> -	return intel_wait_check_seqno(wait, i915_request_global_seqno(rq));
> -}
> +void intel_engine_init_breadcrumbs(struct intel_engine_cs *engine);
> +void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine);
>   
> -static inline bool intel_wait_complete(const struct intel_wait *wait)
> -{
> -	return RB_EMPTY_NODE(&wait->node);
> -}
> +void intel_engine_pin_breadcrumbs_irq(struct intel_engine_cs *engine);
> +void intel_engine_unpin_breadcrumbs_irq(struct intel_engine_cs *engine);
>   
> -bool intel_engine_add_wait(struct intel_engine_cs *engine,
> -			   struct intel_wait *wait);
> -void intel_engine_remove_wait(struct intel_engine_cs *engine,
> -			      struct intel_wait *wait);
> -bool intel_engine_enable_signaling(struct i915_request *request, bool wakeup);
> -void intel_engine_cancel_signaling(struct i915_request *request);
> +bool intel_engine_signal_breadcrumbs(struct intel_engine_cs *engine);
> +void intel_engine_disarm_breadcrumbs(struct intel_engine_cs *engine);
>   
> -static inline bool intel_engine_has_waiter(const struct intel_engine_cs *engine)
> +static inline void
> +intel_engine_queue_breadcrumbs(struct intel_engine_cs *engine)
>   {
> -	return READ_ONCE(engine->breadcrumbs.irq_wait);
> +	irq_work_queue(&engine->breadcrumbs.irq_work);
>   }
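
Riding on irq_work is neat: intel_engine_queue_breadcrumbs() is safe to
call from the CS interrupt handler and gets the signaler run shortly
afterwards without a dedicated kthread. The bare pattern, with
illustrative names:

#include <linux/irq_work.h>
#include <linux/kernel.h>

struct kick {
	struct irq_work work;
};

static void kick_fn(struct irq_work *work)
{
	struct kick *k = container_of(work, struct kick, work);

	/* Runs in interrupt context shortly after being queued */
	(void)k;
}

static void kick_init_and_queue(struct kick *k)
{
	init_irq_work(&k->work, kick_fn);
	irq_work_queue(&k->work); /* safe from hard interrupt context */
}
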
>   
> -unsigned int intel_engine_wakeup(struct intel_engine_cs *engine);
> -#define ENGINE_WAKEUP_WAITER BIT(0)
> -#define ENGINE_WAKEUP_ASLEEP BIT(1)
> -
> -void intel_engine_pin_breadcrumbs_irq(struct intel_engine_cs *engine);
> -void intel_engine_unpin_breadcrumbs_irq(struct intel_engine_cs *engine);
> -
> -void __intel_engine_disarm_breadcrumbs(struct intel_engine_cs *engine);
> -void intel_engine_disarm_breadcrumbs(struct intel_engine_cs *engine);
> +bool intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine);
>   
>   void intel_engine_reset_breadcrumbs(struct intel_engine_cs *engine);
>   void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine);
>   
> +void intel_engine_print_breadcrumbs(struct intel_engine_cs *engine,
> +				    struct drm_printer *p);
> +
>   static inline u32 *gen8_emit_pipe_control(u32 *batch, u32 flags, u32 offset)
>   {
>   	memset(batch, 0, 6 * sizeof(u32));
> diff --git a/drivers/gpu/drm/i915/selftests/i915_mock_selftests.h b/drivers/gpu/drm/i915/selftests/i915_mock_selftests.h
> index 4a83a1c6c406..88e5ab586337 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_mock_selftests.h
> +++ b/drivers/gpu/drm/i915/selftests/i915_mock_selftests.h
> @@ -15,7 +15,6 @@ selftest(scatterlist, scatterlist_mock_selftests)
>   selftest(syncmap, i915_syncmap_mock_selftests)
>   selftest(uncore, intel_uncore_mock_selftests)
>   selftest(engine, intel_engine_cs_mock_selftests)
> -selftest(breadcrumbs, intel_breadcrumbs_mock_selftests)
>   selftest(timelines, i915_timeline_mock_selftests)
>   selftest(requests, i915_request_mock_selftests)
>   selftest(objects, i915_gem_object_mock_selftests)
> diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c
> index 4d4b86b5fa11..6733dc5b6b4c 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_request.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_request.c
> @@ -25,9 +25,12 @@
>   #include <linux/prime_numbers.h>
>   
>   #include "../i915_selftest.h"
> +#include "i915_random.h"
>   #include "igt_live_test.h"
> +#include "lib_sw_fence.h"
>   
>   #include "mock_context.h"
> +#include "mock_drm.h"
>   #include "mock_gem_device.h"
>   
>   static int igt_add_request(void *arg)
> @@ -247,6 +250,254 @@ static int igt_request_rewind(void *arg)
>   	return err;
>   }
>   
> +struct smoketest {
> +	struct intel_engine_cs *engine;
> +	struct i915_gem_context **contexts;
> +	atomic_long_t num_waits, num_fences;
> +	int ncontexts, max_batch;
> +	struct i915_request *(*request_alloc)(struct i915_gem_context *,
> +					      struct intel_engine_cs *);
> +};
> +
> +static struct i915_request *
> +__mock_request_alloc(struct i915_gem_context *ctx,
> +		     struct intel_engine_cs *engine)
> +{
> +	return mock_request(engine, ctx, 0);
> +}
> +
> +static struct i915_request *
> +__live_request_alloc(struct i915_gem_context *ctx,
> +		     struct intel_engine_cs *engine)
> +{
> +	return i915_request_alloc(engine, ctx);
> +}
> +
> +static int __igt_breadcrumbs_smoketest(void *arg)
> +{
> +	struct smoketest *t = arg;
> +	struct mutex * const BKL = &t->engine->i915->drm.struct_mutex;
> +	const unsigned int max_batch = min(t->ncontexts, t->max_batch) - 1;
> +	const unsigned int total = 4 * t->ncontexts + 1;
> +	unsigned int num_waits = 0, num_fences = 0;
> +	struct i915_request **requests;
> +	I915_RND_STATE(prng);
> +	unsigned int *order;
> +	int err = 0;
> +
> +	/*
> +	 * A very simple test to catch the most egregious of list handling bugs.
> +	 *
> +	 * At its heart, we simply create oodles of requests running across
> +	 * multiple kthreads and enable signaling on them, for the sole purpose
> +	 * of stressing our breadcrumb handling. The only inspection we do is
> +	 * that the fences were marked as signaled.
> +	 */
> +
> +	requests = kmalloc_array(total, sizeof(*requests), GFP_KERNEL);
> +	if (!requests)
> +		return -ENOMEM;
> +
> +	order = i915_random_order(total, &prng);
> +	if (!order) {
> +		err = -ENOMEM;
> +		goto out_requests;
> +	}
> +
> +	while (!kthread_should_stop()) {
> +		struct i915_sw_fence *submit, *wait;
> +		unsigned int n, count;
> +
> +		submit = heap_fence_create(GFP_KERNEL);
> +		if (!submit) {
> +			err = -ENOMEM;
> +			break;
> +		}
> +
> +		wait = heap_fence_create(GFP_KERNEL);
> +		if (!wait) {
> +			i915_sw_fence_commit(submit);
> +			heap_fence_put(submit);
> +			err = -ENOMEM;
> +			break;
> +		}
> +
> +		i915_random_reorder(order, total, &prng);
> +		count = 1 + i915_prandom_u32_max_state(max_batch, &prng);
> +
> +		for (n = 0; n < count; n++) {
> +			struct i915_gem_context *ctx =
> +				t->contexts[order[n] % t->ncontexts];
> +			struct i915_request *rq;
> +
> +			mutex_lock(BKL);
> +
> +			rq = t->request_alloc(ctx, t->engine);
> +			if (IS_ERR(rq)) {
> +				mutex_unlock(BKL);
> +				err = PTR_ERR(rq);
> +				count = n;
> +				break;
> +			}
> +
> +			err = i915_sw_fence_await_sw_fence_gfp(&rq->submit,
> +							       submit,
> +							       GFP_KERNEL);
> +
> +			requests[n] = i915_request_get(rq);
> +			i915_request_add(rq);
> +
> +			mutex_unlock(BKL);
> +
> +			if (err >= 0)
> +				err = i915_sw_fence_await_dma_fence(wait,
> +								    &rq->fence,
> +								    0,
> +								    GFP_KERNEL);
> +
> +			if (err < 0) {
> +				i915_request_put(rq);
> +				count = n;
> +				break;
> +			}
> +		}
> +
> +		i915_sw_fence_commit(submit);
> +		i915_sw_fence_commit(wait);
> +
> +		if (!wait_event_timeout(wait->wait,
> +					i915_sw_fence_done(wait),
> +					HZ / 2)) {
> +			struct i915_request *rq = requests[count - 1];
> +
> +			pr_err("waiting for %d fences (last %llx:%lld) on %s timed out!\n",
> +			       count,
> +			       rq->fence.context, rq->fence.seqno,
> +			       t->engine->name);
> +			i915_gem_set_wedged(t->engine->i915);
> +			GEM_BUG_ON(!i915_request_completed(rq));
> +			i915_sw_fence_wait(wait);
> +			err = -EIO;
> +		}
> +
> +		for (n = 0; n < count; n++) {
> +			struct i915_request *rq = requests[n];
> +
> +			if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
> +				      &rq->fence.flags)) {
> +				pr_err("%llu:%llu was not signaled!\n",
> +				       rq->fence.context, rq->fence.seqno);
> +				err = -EINVAL;
> +			}
> +
> +			i915_request_put(rq);
> +		}
> +
> +		heap_fence_put(wait);
> +		heap_fence_put(submit);
> +
> +		if (err < 0)
> +			break;
> +
> +		num_fences += count;
> +		num_waits++;
> +
> +		cond_resched();
> +	}
> +
> +	atomic_long_add(num_fences, &t->num_fences);
> +	atomic_long_add(num_waits, &t->num_waits);
> +
> +	kfree(order);
> +out_requests:
> +	kfree(requests);
> +	return err;
> +}
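
The submit/wait pair above is the usual i915_sw_fence bracketing idiom,
worth spelling out for anyone reading the selftest cold: the submit fence
holds back every request in the batch until all of them have been
constructed, and the wait fence completes only once every request has
signaled. Distilled to a single request (a sketch built from the
heap_fence helpers added later in this patch, not a drop-in replacement):

	struct i915_sw_fence *submit = heap_fence_create(GFP_KERNEL);
	struct i915_sw_fence *wait = heap_fence_create(GFP_KERNEL);

	/* rq is not submitted to hw until submit is committed */
	i915_sw_fence_await_sw_fence_gfp(&rq->submit, submit, GFP_KERNEL);
	/* wait signals once rq's dma-fence signals */
	i915_sw_fence_await_dma_fence(wait, &rq->fence, 0, GFP_KERNEL);

	i915_sw_fence_commit(submit);	/* release the whole batch at once */
	i915_sw_fence_commit(wait);	/* arm the collective completion */

Committing drops the initial bias reference each sw_fence is created
with; until then neither fence can complete, which is what lets the test
assemble the batch at leisure before the flood hits the breadcrumbs.
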
> +
> +static int mock_breadcrumbs_smoketest(void *arg)
> +{
> +	struct drm_i915_private *i915 = arg;
> +	struct smoketest t = {
> +		.engine = i915->engine[RCS],
> +		.ncontexts = 1024,
> +		.max_batch = 1024,
> +		.request_alloc = __mock_request_alloc
> +	};
> +	unsigned int ncpus = num_online_cpus();
> +	struct task_struct **threads;
> +	unsigned int n;
> +	int ret = 0;
> +
> +	/*
> +	 * Smoketest our breadcrumb/signal handling for requests across multiple
> +	 * threads. A very simple test to only catch the most egregious of bugs.
> +	 * See __igt_breadcrumbs_smoketest();
> +	 */
> +
> +	threads = kmalloc_array(ncpus, sizeof(*threads), GFP_KERNEL);
> +	if (!threads)
> +		return -ENOMEM;
> +
> +	t.contexts =
> +		kmalloc_array(t.ncontexts, sizeof(*t.contexts), GFP_KERNEL);
> +	if (!t.contexts) {
> +		ret = -ENOMEM;
> +		goto out_threads;
> +	}
> +
> +	mutex_lock(&t.engine->i915->drm.struct_mutex);
> +	for (n = 0; n < t.ncontexts; n++) {
> +		t.contexts[n] = mock_context(t.engine->i915, "mock");
> +		if (!t.contexts[n]) {
> +			ret = -ENOMEM;
> +			goto out_contexts;
> +		}
> +	}
> +	mutex_unlock(&t.engine->i915->drm.struct_mutex);
> +
> +	for (n = 0; n < ncpus; n++) {
> +		threads[n] = kthread_run(__igt_breadcrumbs_smoketest,
> +					 &t, "igt/%d", n);
> +		if (IS_ERR(threads[n])) {
> +			ret = PTR_ERR(threads[n]);
> +			ncpus = n;
> +			break;
> +		}
> +
> +		get_task_struct(threads[n]);
> +	}
> +
> +	msleep(jiffies_to_msecs(i915_selftest.timeout_jiffies));
> +
> +	for (n = 0; n < ncpus; n++) {
> +		int err;
> +
> +		err = kthread_stop(threads[n]);
> +		if (err < 0 && !ret)
> +			ret = err;
> +
> +		put_task_struct(threads[n]);
> +	}
> +	pr_info("Completed %lu waits for %lu fences across %d cpus\n",
> +		atomic_long_read(&t.num_waits),
> +		atomic_long_read(&t.num_fences),
> +		ncpus);
> +
> +	mutex_lock(&t.engine->i915->drm.struct_mutex);
> +out_contexts:
> +	for (n = 0; n < t.ncontexts; n++) {
> +		if (!t.contexts[n])
> +			break;
> +		mock_context_close(t.contexts[n]);
> +	}
> +	mutex_unlock(&t.engine->i915->drm.struct_mutex);
> +	kfree(t.contexts);
> +out_threads:
> +	kfree(threads);
> +
> +	return ret;
> +}
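
One note on the thread management for anyone borrowing this pattern:
kthread_stop() sets the should-stop flag and then blocks until the
thread exits, so the worker must poll kthread_should_stop() (as
__igt_breadcrumbs_smoketest() does at the top of its loop) or the stop
will hang. The extra task reference is what keeps the task_struct valid
around the stop, roughly (sketch only):

	tsk = kthread_run(__igt_breadcrumbs_smoketest, &t, "igt/%d", n);
	if (!IS_ERR(tsk))
		get_task_struct(tsk);	/* pin the task across kthread_stop() */
	...
	err = kthread_stop(tsk);	/* returns the worker's exit code */
	put_task_struct(tsk);
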
> +
>   int i915_request_mock_selftests(void)
>   {
>   	static const struct i915_subtest tests[] = {
> @@ -254,6 +505,7 @@ int i915_request_mock_selftests(void)
>   		SUBTEST(igt_wait_request),
>   		SUBTEST(igt_fence_wait),
>   		SUBTEST(igt_request_rewind),
> +		SUBTEST(mock_breadcrumbs_smoketest),
>   	};
>   	struct drm_i915_private *i915;
>   	intel_wakeref_t wakeref;
> @@ -812,6 +1064,178 @@ static int live_sequential_engines(void *arg)
>   	return err;
>   }
>   
> +static int
> +max_batches(struct i915_gem_context *ctx, struct intel_engine_cs *engine)
> +{
> +	struct i915_request *rq;
> +	int ret;
> +
> +	/*
> +	 * Before execlists, all contexts share the same ringbuffer. With
> +	 * execlists, each context/engine has a separate ringbuffer and
> +	 * for the purposes of this test, inexhaustible.
> +	 *
> +	 * For the global ringbuffer though, we have to be very careful
> +	 * that we do not wrap while preventing the execution of requests
> +	 * with an unsignaled fence.
> +	 */
> +	if (HAS_EXECLISTS(ctx->i915))
> +		return INT_MAX;
> +
> +	rq = i915_request_alloc(engine, ctx);
> +	if (IS_ERR(rq)) {
> +		ret = PTR_ERR(rq);
> +	} else {
> +		int sz;
> +
> +		ret = rq->ring->size - rq->reserved_space;
> +		i915_request_add(rq);
> +
> +		sz = rq->ring->emit - rq->head;
> +		if (sz < 0)
> +			sz += rq->ring->size;
> +		ret /= sz;
> +		ret /= 2; /* leave half spare, in case of emergency! */
> +	}
> +
> +	return ret;
> +}
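
To put illustrative numbers on that estimate (the real sizes vary by
generation, so treat these purely as assumptions): with a 4KiB legacy
ring, ~160 bytes of reserved space and ~192 bytes emitted per empty
request, max_batches() would come out as

	(4096 - 160) / 192 = 20	/* requests that fit in the ring */
	20 / 2 = 10		/* returned limit, half kept spare */

live_breadcrumbs_smoketest() then divides that again by ncpus + 1,
since every thread on every cpu interleaves requests into the same
global ring and none of them may wrap it while the unsignaled submit
fence is holding execution back.
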
> +
> +static int live_breadcrumbs_smoketest(void *arg)
> +{
> +	struct drm_i915_private *i915 = arg;
> +	struct smoketest t[I915_NUM_ENGINES];
> +	unsigned int ncpus = num_online_cpus();
> +	unsigned long num_waits, num_fences;
> +	struct intel_engine_cs *engine;
> +	struct task_struct **threads;
> +	struct igt_live_test live;
> +	enum intel_engine_id id;
> +	intel_wakeref_t wakeref;
> +	struct drm_file *file;
> +	unsigned int n;
> +	int ret = 0;
> +
> +	/*
> +	 * Smoketest our breadcrumb/signal handling for requests across multiple
> +	 * threads. A very simple test to only catch the most egregious of bugs.
> +	 * See __igt_breadcrumbs_smoketest();
> +	 *
> +	 * On real hardware this time.
> +	 */
> +
> +	wakeref = intel_runtime_pm_get(i915);
> +
> +	file = mock_file(i915);
> +	if (IS_ERR(file)) {
> +		ret = PTR_ERR(file);
> +		goto out_rpm;
> +	}
> +
> +	threads = kcalloc(ncpus * I915_NUM_ENGINES,
> +			  sizeof(*threads),
> +			  GFP_KERNEL);
> +	if (!threads) {
> +		ret = -ENOMEM;
> +		goto out_file;
> +	}
> +
> +	memset(&t[0], 0, sizeof(t[0]));
> +	t[0].request_alloc = __live_request_alloc;
> +	t[0].ncontexts = 64;
> +	t[0].contexts = kmalloc_array(t[0].ncontexts,
> +				      sizeof(*t[0].contexts),
> +				      GFP_KERNEL);
> +	if (!t[0].contexts) {
> +		ret = -ENOMEM;
> +		goto out_threads;
> +	}
> +
> +	mutex_lock(&i915->drm.struct_mutex);
> +	for (n = 0; n < t[0].ncontexts; n++) {
> +		t[0].contexts[n] = live_context(i915, file);
> +		if (!t[0].contexts[n]) {
> +			ret = -ENOMEM;
> +			goto out_contexts;
> +		}
> +	}
> +
> +	ret = igt_live_test_begin(&live, i915, __func__, "");
> +	if (ret)
> +		goto out_contexts;
> +
> +	for_each_engine(engine, i915, id) {
> +		t[id] = t[0];
> +		t[id].engine = engine;
> +		t[id].max_batch = max_batches(t[0].contexts[0], engine);
> +		if (t[id].max_batch < 0) {
> +			ret = t[id].max_batch;
> +			mutex_unlock(&i915->drm.struct_mutex);
> +			goto out_flush;
> +		}
> +		/* One ring interleaved between requests from all cpus */
> +		t[id].max_batch /= num_online_cpus() + 1;
> +		pr_debug("Limiting batches to %d requests on %s\n",
> +			 t[id].max_batch, engine->name);
> +
> +		for (n = 0; n < ncpus; n++) {
> +			struct task_struct *tsk;
> +
> +			tsk = kthread_run(__igt_breadcrumbs_smoketest,
> +					  &t[id], "igt/%d.%d", id, n);
> +			if (IS_ERR(tsk)) {
> +				ret = PTR_ERR(tsk);
> +				mutex_unlock(&i915->drm.struct_mutex);
> +				goto out_flush;
> +			}
> +
> +			get_task_struct(tsk);
> +			threads[id * ncpus + n] = tsk;
> +		}
> +	}
> +	mutex_unlock(&i915->drm.struct_mutex);
> +
> +	msleep(jiffies_to_msecs(i915_selftest.timeout_jiffies));
> +
> +out_flush:
> +	num_waits = 0;
> +	num_fences = 0;
> +	for_each_engine(engine, i915, id) {
> +		for (n = 0; n < ncpus; n++) {
> +			struct task_struct *tsk = threads[id * ncpus + n];
> +			int err;
> +
> +			if (!tsk)
> +				continue;
> +
> +			err = kthread_stop(tsk);
> +			if (err < 0 && !ret)
> +				ret = err;
> +
> +			put_task_struct(tsk);
> +		}
> +
> +		num_waits += atomic_long_read(&t[id].num_waits);
> +		num_fences += atomic_long_read(&t[id].num_fences);
> +	}
> +	pr_info("Completed %lu waits for %lu fences across %d engines and %d cpus\n",
> +		num_waits, num_fences, RUNTIME_INFO(i915)->num_rings, ncpus);
> +
> +	mutex_lock(&i915->drm.struct_mutex);
> +	ret = igt_live_test_end(&live) ?: ret;
> +out_contexts:
> +	mutex_unlock(&i915->drm.struct_mutex);
> +	kfree(t[0].contexts);
> +out_threads:
> +	kfree(threads);
> +out_file:
> +	mock_file_free(i915, file);
> +out_rpm:
> +	intel_runtime_pm_put(i915, wakeref);
> +
> +	return ret;
> +}
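
Two small subtleties above: t[0] doubles as a template, owning the
contexts array and being copied wholesale into t[id] for every engine,
which is why only t[0].contexts is freed on the way out; and because
threads came from kcalloc(), any slot never populated is NULL, which is
what lets the out_flush path skip engine/cpu pairs whose thread was
never started. The flat array is simply an engine-major grid, e.g.
(hypothetical helper, named here only for clarity):

	static struct task_struct *
	smoketest_task(struct task_struct **threads,
		       unsigned int ncpus, unsigned int id, unsigned int n)
	{
		return threads[id * ncpus + n];
	}
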
> +
>   int i915_request_live_selftests(struct drm_i915_private *i915)
>   {
>   	static const struct i915_subtest tests[] = {
> @@ -819,6 +1243,7 @@ int i915_request_live_selftests(struct drm_i915_private *i915)
>   		SUBTEST(live_all_engines),
>   		SUBTEST(live_sequential_engines),
>   		SUBTEST(live_empty_request),
> +		SUBTEST(live_breadcrumbs_smoketest),
>   	};
>   
>   	if (i915_terminally_wedged(&i915->gpu_error))
> diff --git a/drivers/gpu/drm/i915/selftests/igt_spinner.c b/drivers/gpu/drm/i915/selftests/igt_spinner.c
> index 0e70df0230b8..9ebd9225684e 100644
> --- a/drivers/gpu/drm/i915/selftests/igt_spinner.c
> +++ b/drivers/gpu/drm/i915/selftests/igt_spinner.c
> @@ -185,11 +185,6 @@ void igt_spinner_fini(struct igt_spinner *spin)
>   
>   bool igt_wait_for_spinner(struct igt_spinner *spin, struct i915_request *rq)
>   {
> -	if (!wait_event_timeout(rq->execute,
> -				READ_ONCE(rq->global_seqno),
> -				msecs_to_jiffies(10)))
> -		return false;
> -
>   	return !(wait_for_us(i915_seqno_passed(hws_seqno(spin, rq),
>   					       rq->fence.seqno),
>   			     10) &&
> diff --git a/drivers/gpu/drm/i915/selftests/intel_breadcrumbs.c b/drivers/gpu/drm/i915/selftests/intel_breadcrumbs.c
> deleted file mode 100644
> index f03b407fdbe2..000000000000
> --- a/drivers/gpu/drm/i915/selftests/intel_breadcrumbs.c
> +++ /dev/null
> @@ -1,470 +0,0 @@
> -/*
> - * Copyright © 2016 Intel Corporation
> - *
> - * Permission is hereby granted, free of charge, to any person obtaining a
> - * copy of this software and associated documentation files (the "Software"),
> - * to deal in the Software without restriction, including without limitation
> - * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> - * and/or sell copies of the Software, and to permit persons to whom the
> - * Software is furnished to do so, subject to the following conditions:
> - *
> - * The above copyright notice and this permission notice (including the next
> - * paragraph) shall be included in all copies or substantial portions of the
> - * Software.
> - *
> - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> - * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> - * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> - * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> - * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> - * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
> - * IN THE SOFTWARE.
> - *
> - */
> -
> -#include "../i915_selftest.h"
> -#include "i915_random.h"
> -
> -#include "mock_gem_device.h"
> -#include "mock_engine.h"
> -
> -static int check_rbtree(struct intel_engine_cs *engine,
> -			const unsigned long *bitmap,
> -			const struct intel_wait *waiters,
> -			const int count)
> -{
> -	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> -	struct rb_node *rb;
> -	int n;
> -
> -	if (&b->irq_wait->node != rb_first(&b->waiters)) {
> -		pr_err("First waiter does not match first element of wait-tree\n");
> -		return -EINVAL;
> -	}
> -
> -	n = find_first_bit(bitmap, count);
> -	for (rb = rb_first(&b->waiters); rb; rb = rb_next(rb)) {
> -		struct intel_wait *w = container_of(rb, typeof(*w), node);
> -		int idx = w - waiters;
> -
> -		if (!test_bit(idx, bitmap)) {
> -			pr_err("waiter[%d, seqno=%d] removed but still in wait-tree\n",
> -			       idx, w->seqno);
> -			return -EINVAL;
> -		}
> -
> -		if (n != idx) {
> -			pr_err("waiter[%d, seqno=%d] does not match expected next element in tree [%d]\n",
> -			       idx, w->seqno, n);
> -			return -EINVAL;
> -		}
> -
> -		n = find_next_bit(bitmap, count, n + 1);
> -	}
> -
> -	return 0;
> -}
> -
> -static int check_completion(struct intel_engine_cs *engine,
> -			    const unsigned long *bitmap,
> -			    const struct intel_wait *waiters,
> -			    const int count)
> -{
> -	int n;
> -
> -	for (n = 0; n < count; n++) {
> -		if (intel_wait_complete(&waiters[n]) != !!test_bit(n, bitmap))
> -			continue;
> -
> -		pr_err("waiter[%d, seqno=%d] is %s, but expected %s\n",
> -		       n, waiters[n].seqno,
> -		       intel_wait_complete(&waiters[n]) ? "complete" : "active",
> -		       test_bit(n, bitmap) ? "active" : "complete");
> -		return -EINVAL;
> -	}
> -
> -	return 0;
> -}
> -
> -static int check_rbtree_empty(struct intel_engine_cs *engine)
> -{
> -	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> -
> -	if (b->irq_wait) {
> -		pr_err("Empty breadcrumbs still has a waiter\n");
> -		return -EINVAL;
> -	}
> -
> -	if (!RB_EMPTY_ROOT(&b->waiters)) {
> -		pr_err("Empty breadcrumbs, but wait-tree not empty\n");
> -		return -EINVAL;
> -	}
> -
> -	return 0;
> -}
> -
> -static int igt_random_insert_remove(void *arg)
> -{
> -	const u32 seqno_bias = 0x1000;
> -	I915_RND_STATE(prng);
> -	struct intel_engine_cs *engine = arg;
> -	struct intel_wait *waiters;
> -	const int count = 4096;
> -	unsigned int *order;
> -	unsigned long *bitmap;
> -	int err = -ENOMEM;
> -	int n;
> -
> -	mock_engine_reset(engine);
> -
> -	waiters = kvmalloc_array(count, sizeof(*waiters), GFP_KERNEL);
> -	if (!waiters)
> -		goto out_engines;
> -
> -	bitmap = kcalloc(DIV_ROUND_UP(count, BITS_PER_LONG), sizeof(*bitmap),
> -			 GFP_KERNEL);
> -	if (!bitmap)
> -		goto out_waiters;
> -
> -	order = i915_random_order(count, &prng);
> -	if (!order)
> -		goto out_bitmap;
> -
> -	for (n = 0; n < count; n++)
> -		intel_wait_init_for_seqno(&waiters[n], seqno_bias + n);
> -
> -	err = check_rbtree(engine, bitmap, waiters, count);
> -	if (err)
> -		goto out_order;
> -
> -	/* Add and remove waiters into the rbtree in random order. At each
> -	 * step, we verify that the rbtree is correctly ordered.
> -	 */
> -	for (n = 0; n < count; n++) {
> -		int i = order[n];
> -
> -		intel_engine_add_wait(engine, &waiters[i]);
> -		__set_bit(i, bitmap);
> -
> -		err = check_rbtree(engine, bitmap, waiters, count);
> -		if (err)
> -			goto out_order;
> -	}
> -
> -	i915_random_reorder(order, count, &prng);
> -	for (n = 0; n < count; n++) {
> -		int i = order[n];
> -
> -		intel_engine_remove_wait(engine, &waiters[i]);
> -		__clear_bit(i, bitmap);
> -
> -		err = check_rbtree(engine, bitmap, waiters, count);
> -		if (err)
> -			goto out_order;
> -	}
> -
> -	err = check_rbtree_empty(engine);
> -out_order:
> -	kfree(order);
> -out_bitmap:
> -	kfree(bitmap);
> -out_waiters:
> -	kvfree(waiters);
> -out_engines:
> -	mock_engine_flush(engine);
> -	return err;
> -}
> -
> -static int igt_insert_complete(void *arg)
> -{
> -	const u32 seqno_bias = 0x1000;
> -	struct intel_engine_cs *engine = arg;
> -	struct intel_wait *waiters;
> -	const int count = 4096;
> -	unsigned long *bitmap;
> -	int err = -ENOMEM;
> -	int n, m;
> -
> -	mock_engine_reset(engine);
> -
> -	waiters = kvmalloc_array(count, sizeof(*waiters), GFP_KERNEL);
> -	if (!waiters)
> -		goto out_engines;
> -
> -	bitmap = kcalloc(DIV_ROUND_UP(count, BITS_PER_LONG), sizeof(*bitmap),
> -			 GFP_KERNEL);
> -	if (!bitmap)
> -		goto out_waiters;
> -
> -	for (n = 0; n < count; n++) {
> -		intel_wait_init_for_seqno(&waiters[n], n + seqno_bias);
> -		intel_engine_add_wait(engine, &waiters[n]);
> -		__set_bit(n, bitmap);
> -	}
> -	err = check_rbtree(engine, bitmap, waiters, count);
> -	if (err)
> -		goto out_bitmap;
> -
> -	/* On each step, we advance the seqno so that several waiters are then
> -	 * complete (we increase the seqno by increasingly larger values to
> -	 * retire more and more waiters at once). All retired waiters should
> -	 * be woken and removed from the rbtree, and so that we check.
> -	 */
> -	for (n = 0; n < count; n = m) {
> -		int seqno = 2 * n;
> -
> -		GEM_BUG_ON(find_first_bit(bitmap, count) != n);
> -
> -		if (intel_wait_complete(&waiters[n])) {
> -			pr_err("waiter[%d, seqno=%d] completed too early\n",
> -			       n, waiters[n].seqno);
> -			err = -EINVAL;
> -			goto out_bitmap;
> -		}
> -
> -		/* complete the following waiters */
> -		mock_seqno_advance(engine, seqno + seqno_bias);
> -		for (m = n; m <= seqno; m++) {
> -			if (m == count)
> -				break;
> -
> -			GEM_BUG_ON(!test_bit(m, bitmap));
> -			__clear_bit(m, bitmap);
> -		}
> -
> -		intel_engine_remove_wait(engine, &waiters[n]);
> -		RB_CLEAR_NODE(&waiters[n].node);
> -
> -		err = check_rbtree(engine, bitmap, waiters, count);
> -		if (err) {
> -			pr_err("rbtree corrupt after seqno advance to %d\n",
> -			       seqno + seqno_bias);
> -			goto out_bitmap;
> -		}
> -
> -		err = check_completion(engine, bitmap, waiters, count);
> -		if (err) {
> -			pr_err("completions after seqno advance to %d failed\n",
> -			       seqno + seqno_bias);
> -			goto out_bitmap;
> -		}
> -	}
> -
> -	err = check_rbtree_empty(engine);
> -out_bitmap:
> -	kfree(bitmap);
> -out_waiters:
> -	kvfree(waiters);
> -out_engines:
> -	mock_engine_flush(engine);
> -	return err;
> -}
> -
> -struct igt_wakeup {
> -	struct task_struct *tsk;
> -	atomic_t *ready, *set, *done;
> -	struct intel_engine_cs *engine;
> -	unsigned long flags;
> -#define STOP 0
> -#define IDLE 1
> -	wait_queue_head_t *wq;
> -	u32 seqno;
> -};
> -
> -static bool wait_for_ready(struct igt_wakeup *w)
> -{
> -	DEFINE_WAIT(ready);
> -
> -	set_bit(IDLE, &w->flags);
> -	if (atomic_dec_and_test(w->done))
> -		wake_up_var(w->done);
> -
> -	if (test_bit(STOP, &w->flags))
> -		goto out;
> -
> -	for (;;) {
> -		prepare_to_wait(w->wq, &ready, TASK_INTERRUPTIBLE);
> -		if (atomic_read(w->ready) == 0)
> -			break;
> -
> -		schedule();
> -	}
> -	finish_wait(w->wq, &ready);
> -
> -out:
> -	clear_bit(IDLE, &w->flags);
> -	if (atomic_dec_and_test(w->set))
> -		wake_up_var(w->set);
> -
> -	return !test_bit(STOP, &w->flags);
> -}
> -
> -static int igt_wakeup_thread(void *arg)
> -{
> -	struct igt_wakeup *w = arg;
> -	struct intel_wait wait;
> -
> -	while (wait_for_ready(w)) {
> -		GEM_BUG_ON(kthread_should_stop());
> -
> -		intel_wait_init_for_seqno(&wait, w->seqno);
> -		intel_engine_add_wait(w->engine, &wait);
> -		for (;;) {
> -			set_current_state(TASK_UNINTERRUPTIBLE);
> -			if (i915_seqno_passed(intel_engine_get_seqno(w->engine),
> -					      w->seqno))
> -				break;
> -
> -			if (test_bit(STOP, &w->flags)) /* emergency escape */
> -				break;
> -
> -			schedule();
> -		}
> -		intel_engine_remove_wait(w->engine, &wait);
> -		__set_current_state(TASK_RUNNING);
> -	}
> -
> -	return 0;
> -}
> -
> -static void igt_wake_all_sync(atomic_t *ready,
> -			      atomic_t *set,
> -			      atomic_t *done,
> -			      wait_queue_head_t *wq,
> -			      int count)
> -{
> -	atomic_set(set, count);
> -	atomic_set(ready, 0);
> -	wake_up_all(wq);
> -
> -	wait_var_event(set, !atomic_read(set));
> -	atomic_set(ready, count);
> -	atomic_set(done, count);
> -}
> -
> -static int igt_wakeup(void *arg)
> -{
> -	I915_RND_STATE(prng);
> -	struct intel_engine_cs *engine = arg;
> -	struct igt_wakeup *waiters;
> -	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
> -	const int count = 4096;
> -	const u32 max_seqno = count / 4;
> -	atomic_t ready, set, done;
> -	int err = -ENOMEM;
> -	int n, step;
> -
> -	mock_engine_reset(engine);
> -
> -	waiters = kvmalloc_array(count, sizeof(*waiters), GFP_KERNEL);
> -	if (!waiters)
> -		goto out_engines;
> -
> -	/* Create a large number of threads, each waiting on a random seqno.
> -	 * Multiple waiters will be waiting for the same seqno.
> -	 */
> -	atomic_set(&ready, count);
> -	for (n = 0; n < count; n++) {
> -		waiters[n].wq = &wq;
> -		waiters[n].ready = &ready;
> -		waiters[n].set = &set;
> -		waiters[n].done = &done;
> -		waiters[n].engine = engine;
> -		waiters[n].flags = BIT(IDLE);
> -
> -		waiters[n].tsk = kthread_run(igt_wakeup_thread, &waiters[n],
> -					     "i915/igt:%d", n);
> -		if (IS_ERR(waiters[n].tsk))
> -			goto out_waiters;
> -
> -		get_task_struct(waiters[n].tsk);
> -	}
> -
> -	for (step = 1; step <= max_seqno; step <<= 1) {
> -		u32 seqno;
> -
> -		/* The waiter threads start paused as we assign them a random
> -		 * seqno and reset the engine. Once the engine is reset,
> -		 * we signal that the threads may begin their wait upon their
> -		 * seqno.
> -		 */
> -		for (n = 0; n < count; n++) {
> -			GEM_BUG_ON(!test_bit(IDLE, &waiters[n].flags));
> -			waiters[n].seqno =
> -				1 + prandom_u32_state(&prng) % max_seqno;
> -		}
> -		mock_seqno_advance(engine, 0);
> -		igt_wake_all_sync(&ready, &set, &done, &wq, count);
> -
> -		/* Simulate the GPU doing chunks of work, with one or more
> -		 * seqno appearing to finish at the same time. A random number
> -		 * of threads will be waiting upon the update and hopefully be
> -		 * woken.
> -		 */
> -		for (seqno = 1; seqno <= max_seqno + step; seqno += step) {
> -			usleep_range(50, 500);
> -			mock_seqno_advance(engine, seqno);
> -		}
> -		GEM_BUG_ON(intel_engine_get_seqno(engine) < 1 + max_seqno);
> -
> -		/* With the seqno now beyond any of the waiting threads, they
> -		 * should all be woken, see that they are complete and signal
> -		 * that they are ready for the next test. We wait until all
> -		 * threads are complete and waiting for us (i.e. not a seqno).
> -		 */
> -		if (!wait_var_event_timeout(&done,
> -					    !atomic_read(&done), 10 * HZ)) {
> -			pr_err("Timed out waiting for %d remaining waiters\n",
> -			       atomic_read(&done));
> -			err = -ETIMEDOUT;
> -			break;
> -		}
> -
> -		err = check_rbtree_empty(engine);
> -		if (err)
> -			break;
> -	}
> -
> -out_waiters:
> -	for (n = 0; n < count; n++) {
> -		if (IS_ERR(waiters[n].tsk))
> -			break;
> -
> -		set_bit(STOP, &waiters[n].flags);
> -	}
> -	mock_seqno_advance(engine, INT_MAX); /* wakeup any broken waiters */
> -	igt_wake_all_sync(&ready, &set, &done, &wq, n);
> -
> -	for (n = 0; n < count; n++) {
> -		if (IS_ERR(waiters[n].tsk))
> -			break;
> -
> -		kthread_stop(waiters[n].tsk);
> -		put_task_struct(waiters[n].tsk);
> -	}
> -
> -	kvfree(waiters);
> -out_engines:
> -	mock_engine_flush(engine);
> -	return err;
> -}
> -
> -int intel_breadcrumbs_mock_selftests(void)
> -{
> -	static const struct i915_subtest tests[] = {
> -		SUBTEST(igt_random_insert_remove),
> -		SUBTEST(igt_insert_complete),
> -		SUBTEST(igt_wakeup),
> -	};
> -	struct drm_i915_private *i915;
> -	int err;
> -
> -	i915 = mock_gem_device();
> -	if (!i915)
> -		return -ENOMEM;
> -
> -	err = i915_subtests(tests, i915->engine[RCS]);
> -	drm_dev_put(&i915->drm);
> -
> -	return err;
> -}
> diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> index 2c38ea5892d9..7b6f3bea9ef8 100644
> --- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> +++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> @@ -1127,7 +1127,7 @@ static int __igt_reset_evict_vma(struct drm_i915_private *i915,
>   
>   	wait_for_completion(&arg.completion);
>   
> -	if (wait_for(waitqueue_active(&rq->execute), 10)) {
> +	if (wait_for(!list_empty(&rq->fence.cb_list), 10)) {
>   		struct drm_printer p = drm_info_printer(i915->drm.dev);
>   
>   		pr_err("igt/evict_vma kthread did not wait\n");
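
The replacement condition leans on dma-fence internals rather than the
now-defunct execute waitqueue: a waiter that enables signaling links a
callback onto fence->cb_list, so a non-empty list implies at least one
waiter. In effect the test is polling for something like this (a
sketch, not an existing helper; the unlocked peek is racy but adequate
for a selftest):

	static bool rq_has_waiter(const struct i915_request *rq)
	{
		return !list_empty(&rq->fence.cb_list);
	}
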
> diff --git a/drivers/gpu/drm/i915/selftests/lib_sw_fence.c b/drivers/gpu/drm/i915/selftests/lib_sw_fence.c
> index b26f07b55d86..2bfa72c1654b 100644
> --- a/drivers/gpu/drm/i915/selftests/lib_sw_fence.c
> +++ b/drivers/gpu/drm/i915/selftests/lib_sw_fence.c
> @@ -76,3 +76,57 @@ void timed_fence_fini(struct timed_fence *tf)
>   	destroy_timer_on_stack(&tf->timer);
>   	i915_sw_fence_fini(&tf->fence);
>   }
> +
> +struct heap_fence {
> +	struct i915_sw_fence fence;
> +	union {
> +		struct kref ref;
> +		struct rcu_head rcu;
> +	};
> +};
> +
> +static int __i915_sw_fence_call
> +heap_fence_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
> +{
> +	struct heap_fence *h = container_of(fence, typeof(*h), fence);
> +
> +	switch (state) {
> +	case FENCE_COMPLETE:
> +		break;
> +
> +	case FENCE_FREE:
> +		heap_fence_put(&h->fence);
> +	}
> +
> +	return NOTIFY_DONE;
> +}
> +
> +struct i915_sw_fence *heap_fence_create(gfp_t gfp)
> +{
> +	struct heap_fence *h;
> +
> +	h = kmalloc(sizeof(*h), gfp);
> +	if (!h)
> +		return NULL;
> +
> +	i915_sw_fence_init(&h->fence, heap_fence_notify);
> +	refcount_set(&h->ref.refcount, 2);
> +
> +	return &h->fence;
> +}
> +
> +static void heap_fence_release(struct kref *ref)
> +{
> +	struct heap_fence *h = container_of(ref, typeof(*h), ref);
> +
> +	i915_sw_fence_fini(&h->fence);
> +
> +	kfree_rcu(h, rcu);
> +}
> +
> +void heap_fence_put(struct i915_sw_fence *fence)
> +{
> +	struct heap_fence *h = container_of(fence, typeof(*h), fence);
> +
> +	kref_put(&h->ref, heap_fence_release);
> +}
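
The initial refcount of 2 is deliberate: one reference belongs to the
caller, dropped via heap_fence_put(), and one to the fence itself,
dropped from heap_fence_notify() when the FENCE_FREE notification
arrives. Only when both are gone does heap_fence_release() run, and the
kfree_rcu() there defers the actual free past any RCU readers still
peeking at the fence; the kref/rcu union is safe because the two are
used strictly in sequence, never together. Typical usage (sketch):

	struct i915_sw_fence *fence = heap_fence_create(GFP_KERNEL);

	/* ... chain waiters onto fence ... */
	i915_sw_fence_commit(fence);
	...
	heap_fence_put(fence);	/* caller's ref; the fence's own ref is
				 * dropped once FENCE_FREE is delivered */
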
> diff --git a/drivers/gpu/drm/i915/selftests/lib_sw_fence.h b/drivers/gpu/drm/i915/selftests/lib_sw_fence.h
> index 474aafb92ae1..1f9927e10f3a 100644
> --- a/drivers/gpu/drm/i915/selftests/lib_sw_fence.h
> +++ b/drivers/gpu/drm/i915/selftests/lib_sw_fence.h
> @@ -39,4 +39,7 @@ struct timed_fence {
>   void timed_fence_init(struct timed_fence *tf, unsigned long expires);
>   void timed_fence_fini(struct timed_fence *tf);
>   
> +struct i915_sw_fence *heap_fence_create(gfp_t gfp);
> +void heap_fence_put(struct i915_sw_fence *fence);
> +
>   #endif /* _LIB_SW_FENCE_H_ */
> diff --git a/drivers/gpu/drm/i915/selftests/mock_engine.c b/drivers/gpu/drm/i915/selftests/mock_engine.c
> index 3b226ebc6bc4..08f0cab02e0f 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_engine.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_engine.c
> @@ -86,17 +86,21 @@ static struct mock_request *first_request(struct mock_engine *engine)
>   static void advance(struct mock_request *request)
>   {
>   	list_del_init(&request->link);
> -	mock_seqno_advance(request->base.engine, request->base.global_seqno);
> +	intel_engine_write_global_seqno(request->base.engine,
> +					request->base.global_seqno);
>   	i915_request_mark_complete(&request->base);
>   	GEM_BUG_ON(!i915_request_completed(&request->base));
> +
> +	intel_engine_queue_breadcrumbs(request->base.engine);
>   }
>   
>   static void hw_delay_complete(struct timer_list *t)
>   {
>   	struct mock_engine *engine = from_timer(engine, t, hw_delay);
>   	struct mock_request *request;
> +	unsigned long flags;
>   
> -	spin_lock(&engine->hw_lock);
> +	spin_lock_irqsave(&engine->hw_lock, flags);
>   
>   	/* Timer fired, first request is complete */
>   	request = first_request(engine);
> @@ -116,7 +120,7 @@ static void hw_delay_complete(struct timer_list *t)
>   		advance(request);
>   	}
>   
> -	spin_unlock(&engine->hw_lock);
> +	spin_unlock_irqrestore(&engine->hw_lock, flags);
>   }
>   
>   static void mock_context_unpin(struct intel_context *ce)
> @@ -191,11 +195,12 @@ static void mock_submit_request(struct i915_request *request)
>   	struct mock_request *mock = container_of(request, typeof(*mock), base);
>   	struct mock_engine *engine =
>   		container_of(request->engine, typeof(*engine), base);
> +	unsigned long flags;
>   
>   	i915_request_submit(request);
>   	GEM_BUG_ON(!request->global_seqno);
>   
> -	spin_lock_irq(&engine->hw_lock);
> +	spin_lock_irqsave(&engine->hw_lock, flags);
>   	list_add_tail(&mock->link, &engine->hw_queue);
>   	if (mock->link.prev == &engine->hw_queue) {
>   		if (mock->delay)
> @@ -203,7 +208,7 @@ static void mock_submit_request(struct i915_request *request)
>   		else
>   			advance(mock);
>   	}
> -	spin_unlock_irq(&engine->hw_lock);
> +	spin_unlock_irqrestore(&engine->hw_lock, flags);
>   }
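
The irqsave conversion is not cosmetic: spin_lock_irq() unconditionally
re-enables interrupts on unlock, which corrupts the irq state if the
caller already had them disabled, whereas the irqsave variant restores
whatever state it found. With breadcrumb signaling now driven from
advance() under hw_lock, both the timer callback and the submit path
may be entered with interrupts in either state, hence the usual
pattern:

	unsigned long flags;

	spin_lock_irqsave(&engine->hw_lock, flags);
	/* ... manipulate engine->hw_queue ... */
	spin_unlock_irqrestore(&engine->hw_lock, flags);
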
>   
>   struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
> @@ -273,7 +278,7 @@ void mock_engine_flush(struct intel_engine_cs *engine)
>   
>   void mock_engine_reset(struct intel_engine_cs *engine)
>   {
> -	intel_write_status_page(engine, I915_GEM_HWS_INDEX, 0);
> +	intel_engine_write_global_seqno(engine, 0);
>   }
>   
>   void mock_engine_free(struct intel_engine_cs *engine)
> diff --git a/drivers/gpu/drm/i915/selftests/mock_engine.h b/drivers/gpu/drm/i915/selftests/mock_engine.h
> index 133d0c21790d..b9cc3a245f16 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_engine.h
> +++ b/drivers/gpu/drm/i915/selftests/mock_engine.h
> @@ -46,10 +46,4 @@ void mock_engine_flush(struct intel_engine_cs *engine);
>   void mock_engine_reset(struct intel_engine_cs *engine);
>   void mock_engine_free(struct intel_engine_cs *engine);
>   
> -static inline void mock_seqno_advance(struct intel_engine_cs *engine, u32 seqno)
> -{
> -	intel_write_status_page(engine, I915_GEM_HWS_INDEX, seqno);
> -	intel_engine_wakeup(engine);
> -}
> -
>   #endif /* !__MOCK_ENGINE_H__ */
> 