* [PATCH 01/11] drm/i915: Skip scanning for signalers if we are already inflight
@ 2019-02-26 10:23 Chris Wilson
  2019-02-26 10:23 ` [PATCH 02/11] drm/i915/execlists: Suppress mere WAIT preemption Chris Wilson
                   ` (19 more replies)
  0 siblings, 20 replies; 43+ messages in thread
From: Chris Wilson @ 2019-02-26 10:23 UTC (permalink / raw)
  To: intel-gfx

When a request has its priority changed, we traverse the graph of all of
its signalers to raise their priorities to match (priority inheritance).
If the request has already started executing its payload, we know that
all of its signalers must have signaled and we do not need to process
our list of signalers.
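
As a rough standalone illustration of the short-circuit (all types and
names below are made-up stand-ins, not the driver's):

  #include <stdbool.h>
  #include <stddef.h>

  struct node {
          int prio;
          bool started;                   /* payload already executing? */
          struct node *signalers[4];      /* requests we depend upon */
          size_t nr_signalers;
  };

  /* Propagate a priority bump down the dependency graph. */
  static void bump_priority(struct node *n, int prio)
  {
          size_t i;

          if (n->prio >= prio)
                  return;
          n->prio = prio;

          /* Already in flight: every signaler has signaled, stop here. */
          if (n->started)
                  return;

          for (i = 0; i < n->nr_signalers; i++)
                  bump_priority(n->signalers[i], prio);
  }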

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_scheduler.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 8bc042551692..38efefd22dce 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -18,6 +18,11 @@ node_to_request(const struct i915_sched_node *node)
 	return container_of(node, const struct i915_request, sched);
 }
 
+static inline bool node_started(const struct i915_sched_node *node)
+{
+	return i915_request_started(node_to_request(node));
+}
+
 static inline bool node_signaled(const struct i915_sched_node *node)
 {
 	return i915_request_completed(node_to_request(node));
@@ -301,6 +306,10 @@ static void __i915_schedule(struct i915_request *rq,
 	list_for_each_entry(dep, &dfs, dfs_link) {
 		struct i915_sched_node *node = dep->signaler;
 
+		/* If we are already flying, we know we have no signalers */
+		if (node_started(node))
+			continue;
+
 		/*
 		 * Within an engine, there can be no cycle, but we may
 		 * refer to the same dependency chain multiple times
-- 
2.20.1


* [PATCH 02/11] drm/i915/execlists: Suppress mere WAIT preemption
  2019-02-26 10:23 [PATCH 01/11] drm/i915: Skip scanning for signalers if we are already inflight Chris Wilson
@ 2019-02-26 10:23 ` Chris Wilson
  2019-02-28 12:33   ` Tvrtko Ursulin
  2019-02-26 10:23 ` [PATCH 03/11] drm/i915/execlists: Suppress redundant preemption Chris Wilson
                   ` (18 subsequent siblings)
  19 siblings, 1 reply; 43+ messages in thread
From: Chris Wilson @ 2019-02-26 10:23 UTC (permalink / raw)
  To: intel-gfx; +Cc: Matthew Auld

WAIT is occasionally suppressed by virtue of preempted requests being
promoted to NEWCLIENT if they have not already received that boost.
Make this consistent for all WAIT boosts: they are not allowed to
preempt executing contexts and are merely granted the right to be at the
front of the queue for the next execution slot. This is in keeping with
the desire that the WAIT boost be a minor tweak that neither gives
excessive promotion to its user nor opens us up to trivial abuse.

The problem with the inconsistent WAIT preemption becomes more apparent
as the preemption is propagated across the engines, where one engine may
preempt and the other not, while we are relying on the exact execution
order being consistent across engines (e.g. when using HW semaphores to
coordinate parallel execution).
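
The gist of the suppression, as a standalone sketch (macro names and
values below are illustrative stand-ins for the i915 priority encoding,
not the driver's definitions):

  #include <stdbool.h>
  #include <stdio.h>

  #define PRIO_WAIT  (1 << 0)     /* stand-in for I915_PRIORITY_WAIT */
  #define NO_PREEMPT PRIO_WAIT    /* stand-in for __NO_PREEMPTION */

  static bool need_preempt(int queue_prio, int running_prio)
  {
          /* A boost of just the WAIT bit can never exceed this. */
          return queue_prio > (running_prio | NO_PREEMPT);
  }

  int main(void)
  {
          int base = 4 << 1;      /* some user priority above the WAIT bit */

          /* WAIT boost alone does not preempt the running context... */
          printf("%d\n", need_preempt(base | PRIO_WAIT, base));  /* 0 */
          /* ...but a genuinely higher priority still does. */
          printf("%d\n", need_preempt(base + 2, base));          /* 1 */
          return 0;
  }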

v2: Also protect GuC submission from false preemption loops.
v3: Build bug safeguards and better debug messages for st.
v4: Do the priority bumping in unsubmit (i.e. on preemption/reset
unwind); applying it earlier during submit causes out-of-order execution
when combined with execute fences.
v5: Call sw_fence_fini for our dummy request (Matthew)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> #v3
Cc: Matthew Auld <matthew.auld@intel.com>
---
 drivers/gpu/drm/i915/i915_request.c         |  15 ++
 drivers/gpu/drm/i915/i915_scheduler.c       |   1 -
 drivers/gpu/drm/i915/i915_scheduler.h       |   2 +
 drivers/gpu/drm/i915/intel_guc_submission.c |   2 +-
 drivers/gpu/drm/i915/intel_lrc.c            |   9 +-
 drivers/gpu/drm/i915/selftests/intel_lrc.c  | 163 ++++++++++++++++++++
 6 files changed, 189 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 935db5548f80..00a1ea7cd907 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -353,11 +353,14 @@ void __i915_request_submit(struct i915_request *request)
 
 	/* We may be recursing from the signal callback of another i915 fence */
 	spin_lock_nested(&request->lock, SINGLE_DEPTH_NESTING);
+
 	GEM_BUG_ON(test_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags));
 	set_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags);
+
 	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &request->fence.flags) &&
 	    !i915_request_enable_breadcrumb(request))
 		intel_engine_queue_breadcrumbs(engine);
+
 	spin_unlock(&request->lock);
 
 	engine->emit_fini_breadcrumb(request,
@@ -401,10 +404,22 @@ void __i915_request_unsubmit(struct i915_request *request)
 
 	/* We may be recursing from the signal callback of another i915 fence */
 	spin_lock_nested(&request->lock, SINGLE_DEPTH_NESTING);
+
+	/*
+	 * As we do not allow WAIT to preempt inflight requests,
+	 * once we have executed a request, along with triggering
+	 * any execution callbacks, we must preserve its ordering
+	 * within the non-preemptible FIFO.
+	 */
+	BUILD_BUG_ON(__NO_PREEMPTION & ~I915_PRIORITY_MASK); /* only internal */
+	request->sched.attr.priority |= __NO_PREEMPTION;
+
 	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &request->fence.flags))
 		i915_request_cancel_breadcrumb(request);
+
 	GEM_BUG_ON(!test_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags));
 	clear_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags);
+
 	spin_unlock(&request->lock);
 
 	/* Transfer back from the global per-engine timeline to per-context */
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 38efefd22dce..9fb96ff57a29 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -322,7 +322,6 @@ static void __i915_schedule(struct i915_request *rq,
 			if (node_signaled(p->signaler))
 				continue;
 
-			GEM_BUG_ON(p->signaler->attr.priority < node->attr.priority);
 			if (prio > READ_ONCE(p->signaler->attr.priority))
 				list_move_tail(&p->dfs_link, &dfs);
 		}
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index dbe9cb7ecd82..54bd6c89817e 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -33,6 +33,8 @@ enum {
 #define I915_PRIORITY_WAIT	((u8)BIT(0))
 #define I915_PRIORITY_NEWCLIENT	((u8)BIT(1))
 
+#define __NO_PREEMPTION (I915_PRIORITY_WAIT)
+
 struct i915_sched_attr {
 	/**
 	 * @priority: execution and service priority
diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
index 20cbceeabeae..a2846ea1e62c 100644
--- a/drivers/gpu/drm/i915/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/intel_guc_submission.c
@@ -720,7 +720,7 @@ static inline int rq_prio(const struct i915_request *rq)
 
 static inline int port_prio(const struct execlist_port *port)
 {
-	return rq_prio(port_request(port));
+	return rq_prio(port_request(port)) | __NO_PREEMPTION;
 }
 
 static bool __guc_dequeue(struct intel_engine_cs *engine)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index c4f4966b0f4f..0e20f3bc8210 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -188,6 +188,12 @@ static inline int rq_prio(const struct i915_request *rq)
 	return rq->sched.attr.priority;
 }
 
+static int effective_prio(const struct i915_request *rq)
+{
+	/* Restrict mere WAIT boosts from triggering preemption */
+	return rq_prio(rq) | __NO_PREEMPTION;
+}
+
 static int queue_prio(const struct intel_engine_execlists *execlists)
 {
 	struct i915_priolist *p;
@@ -208,7 +214,7 @@ static int queue_prio(const struct intel_engine_execlists *execlists)
 static inline bool need_preempt(const struct intel_engine_cs *engine,
 				const struct i915_request *rq)
 {
-	const int last_prio = rq_prio(rq);
+	int last_prio;
 
 	if (!intel_engine_has_preemption(engine))
 		return false;
@@ -228,6 +234,7 @@ static inline bool need_preempt(const struct intel_engine_cs *engine,
 	 * preempt. If that hint is stale or we may be trying to preempt
 	 * ourselves, ignore the request.
 	 */
+	last_prio = effective_prio(rq);
 	if (!__execlists_need_preempt(engine->execlists.queue_priority_hint,
 				      last_prio))
 		return false;
diff --git a/drivers/gpu/drm/i915/selftests/intel_lrc.c b/drivers/gpu/drm/i915/selftests/intel_lrc.c
index 0f7a5bf69646..7172f6c7f25a 100644
--- a/drivers/gpu/drm/i915/selftests/intel_lrc.c
+++ b/drivers/gpu/drm/i915/selftests/intel_lrc.c
@@ -407,6 +407,168 @@ static int live_suppress_self_preempt(void *arg)
 	goto err_client_b;
 }
 
+static int __i915_sw_fence_call
+dummy_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
+{
+	return NOTIFY_DONE;
+}
+
+static struct i915_request *dummy_request(struct intel_engine_cs *engine)
+{
+	struct i915_request *rq;
+
+	rq = kmalloc(sizeof(*rq), GFP_KERNEL | __GFP_ZERO);
+	if (!rq)
+		return NULL;
+
+	INIT_LIST_HEAD(&rq->active_list);
+	rq->engine = engine;
+
+	i915_sched_node_init(&rq->sched);
+
+	/* mark this request as permanently incomplete */
+	rq->fence.seqno = 1;
+	BUILD_BUG_ON(sizeof(rq->fence.seqno) != 8); /* upper 32b == 0 */
+	rq->hwsp_seqno = (u32 *)&rq->fence.seqno + 1;
+	GEM_BUG_ON(i915_request_completed(rq));
+
+	i915_sw_fence_init(&rq->submit, dummy_notify);
+	i915_sw_fence_commit(&rq->submit);
+
+	return rq;
+}
+
+static void dummy_request_free(struct i915_request *dummy)
+{
+	i915_request_mark_complete(dummy);
+	i915_sched_node_fini(dummy->engine->i915, &dummy->sched);
+	i915_sw_fence_fini(&dummy->submit);
+
+	dma_fence_free(&dummy->fence);
+}
+
+static int live_suppress_wait_preempt(void *arg)
+{
+	struct drm_i915_private *i915 = arg;
+	struct preempt_client client[4];
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+	intel_wakeref_t wakeref;
+	int err = -ENOMEM;
+	int i;
+
+	/*
+	 * Waiters are given a little priority nudge, but not enough
+	 * to actually cause any preemption. Double check that we do
+	 * not needlessly generate preempt-to-idle cycles.
+	 */
+
+	if (!HAS_LOGICAL_RING_PREEMPTION(i915))
+		return 0;
+
+	mutex_lock(&i915->drm.struct_mutex);
+	wakeref = intel_runtime_pm_get(i915);
+
+	if (preempt_client_init(i915, &client[0])) /* ELSP[0] */
+		goto err_unlock;
+	if (preempt_client_init(i915, &client[1])) /* ELSP[1] */
+		goto err_client_0;
+	if (preempt_client_init(i915, &client[2])) /* head of queue */
+		goto err_client_1;
+	if (preempt_client_init(i915, &client[3])) /* bystander */
+		goto err_client_2;
+
+	for_each_engine(engine, i915, id) {
+		int depth;
+
+		if (!engine->emit_init_breadcrumb)
+			continue;
+
+		for (depth = 0; depth < ARRAY_SIZE(client); depth++) {
+			struct i915_request *rq[ARRAY_SIZE(client)];
+			struct i915_request *dummy;
+
+			engine->execlists.preempt_hang.count = 0;
+
+			dummy = dummy_request(engine);
+			if (!dummy)
+				goto err_client_3;
+
+			for (i = 0; i < ARRAY_SIZE(client); i++) {
+				rq[i] = igt_spinner_create_request(&client[i].spin,
+								   client[i].ctx, engine,
+								   MI_NOOP);
+				if (IS_ERR(rq[i])) {
+					err = PTR_ERR(rq[i]);
+					goto err_wedged;
+				}
+
+				/* Disable NEWCLIENT promotion */
+				__i915_active_request_set(&rq[i]->timeline->last_request,
+							  dummy);
+				i915_request_add(rq[i]);
+			}
+
+			dummy_request_free(dummy);
+
+			GEM_BUG_ON(i915_request_completed(rq[0]));
+			if (!igt_wait_for_spinner(&client[0].spin, rq[0])) {
+				pr_err("%s: First client failed to start\n",
+				       engine->name);
+				goto err_wedged;
+			}
+			GEM_BUG_ON(!i915_request_started(rq[0]));
+
+			if (i915_request_wait(rq[depth],
+					      I915_WAIT_LOCKED |
+					      I915_WAIT_PRIORITY,
+					      1) != -ETIME) {
+				pr_err("%s: Waiter depth:%d completed!\n",
+				       engine->name, depth);
+				goto err_wedged;
+			}
+
+			for (i = 0; i < ARRAY_SIZE(client); i++)
+				igt_spinner_end(&client[i].spin);
+
+			if (igt_flush_test(i915, I915_WAIT_LOCKED))
+				goto err_wedged;
+
+			if (engine->execlists.preempt_hang.count) {
+				pr_err("%s: Preemption recorded x%d, depth %d; should have been suppressed!\n",
+				       engine->name,
+				       engine->execlists.preempt_hang.count,
+				       depth);
+				err = -EINVAL;
+				goto err_client_3;
+			}
+		}
+	}
+
+	err = 0;
+err_client_3:
+	preempt_client_fini(&client[3]);
+err_client_2:
+	preempt_client_fini(&client[2]);
+err_client_1:
+	preempt_client_fini(&client[1]);
+err_client_0:
+	preempt_client_fini(&client[0]);
+err_unlock:
+	if (igt_flush_test(i915, I915_WAIT_LOCKED))
+		err = -EIO;
+	intel_runtime_pm_put(i915, wakeref);
+	mutex_unlock(&i915->drm.struct_mutex);
+	return err;
+
+err_wedged:
+	for (i = 0; i < ARRAY_SIZE(client); i++)
+		igt_spinner_end(&client[i].spin);
+	i915_gem_set_wedged(i915);
+	err = -EIO;
+	goto err_client_3;
+}
+
 static int live_chain_preempt(void *arg)
 {
 	struct drm_i915_private *i915 = arg;
@@ -887,6 +1049,7 @@ int intel_execlists_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(live_preempt),
 		SUBTEST(live_late_preempt),
 		SUBTEST(live_suppress_self_preempt),
+		SUBTEST(live_suppress_wait_preempt),
 		SUBTEST(live_chain_preempt),
 		SUBTEST(live_preempt_hang),
 		SUBTEST(live_preempt_smoke),
-- 
2.20.1


* [PATCH 03/11] drm/i915/execlists: Suppress redundant preemption
  2019-02-26 10:23 [PATCH 01/11] drm/i915: Skip scanning for signalers if we are already inflight Chris Wilson
  2019-02-26 10:23 ` [PATCH 02/11] drm/i915/execlists: Suppress mere WAIT preemption Chris Wilson
@ 2019-02-26 10:23 ` Chris Wilson
  2019-02-28 13:11   ` Tvrtko Ursulin
  2019-02-26 10:23 ` [PATCH 04/11] drm/i915: Make request allocation caches global Chris Wilson
                   ` (17 subsequent siblings)
  19 siblings, 1 reply; 43+ messages in thread
From: Chris Wilson @ 2019-02-26 10:23 UTC (permalink / raw)
  To: intel-gfx

On unwinding the active request we give it a small boost (limited to the
internal priority levels) to prevent it from being gazumped a second time.
However, this means that it can be promoted above the request that
triggered the preemption, causing a preempt-to-idle cycle for no benefit.
We can avoid this if we take the boost into account when checking whether
the preemption request is valid.

v2: After preemption the active request will be after the preemptee if
they end up with equal priority.

v3: Tvrtko pointed out that this, the existing logic, makes
I915_PRIORITY_WAIT non-preemptible. Document this interesting quirk!

v4: Prove Tvrtko was right about WAIT being non-preemptible and test it.
v5: Except not all priorities were made equal: WAIT only avoids preempting
if we start off as !NEWCLIENT.
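
For illustration, a standalone sketch contrasting the old and new checks
with made-up priority values (the bit layout below only mimics the real
WAIT/NEWCLIENT encoding; it is not the driver's):

  #include <stdio.h>

  #define WAIT         (1 << 0)
  #define NEWCLIENT    (1 << 1)
  #define ACTIVE_BOOST NEWCLIENT  /* boost applied on unwind */

  /* Priority the running request would hold once preempted and unwound,
   * ranked just behind a peer already at that level. */
  static int effective_prio(int prio)
  {
          if (!(prio & ACTIVE_BOOST)) {
                  prio |= ACTIVE_BOOST;
                  prio--;
          }
          return prio | WAIT;
  }

  int main(void)
  {
          int running = (4 << 2) | WAIT;      /* 17: waiter, already executing */
          int queued  = (4 << 2) | NEWCLIENT; /* 18: fresh client, same level */

          /* Old check preempts (18 > 17), yet after unwinding the running
           * request would sit at 19 and be picked straight away again. */
          printf("old: %d\n", queued > (running | WAIT));
          /* New check anticipates the boost (18 > 19 is false): no cycle. */
          printf("new: %d\n", queued > effective_prio(running));
          return 0;
  }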

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 38 ++++++++++++++++++++++++++++----
 1 file changed, 34 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 0e20f3bc8210..dba19baf6808 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -164,6 +164,8 @@
 #define WA_TAIL_DWORDS 2
 #define WA_TAIL_BYTES (sizeof(u32) * WA_TAIL_DWORDS)
 
+#define ACTIVE_PRIORITY (I915_PRIORITY_NEWCLIENT)
+
 static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
 					    struct intel_engine_cs *engine,
 					    struct intel_context *ce);
@@ -190,8 +192,30 @@ static inline int rq_prio(const struct i915_request *rq)
 
 static int effective_prio(const struct i915_request *rq)
 {
+	int prio = rq_prio(rq);
+
+	/*
+	 * On unwinding the active request, we give it a priority bump
+	 * equivalent to a freshly submitted request. This protects it from
+	 * being gazumped again, but it would be preferable if we didn't
+	 * let it be gazumped in the first place!
+	 *
+	 * See __unwind_incomplete_requests()
+	 */
+	if (~prio & ACTIVE_PRIORITY && __i915_request_has_started(rq)) {
+		/*
+		 * After preemption, we insert the active request at the
+		 * end of the new priority level. This means that we will be
+		 * _lower_ priority than the preemptee all things equal (and
+		 * so the preemption is valid), so adjust our comparison
+		 * accordingly.
+		 */
+		prio |= ACTIVE_PRIORITY;
+		prio--;
+	}
+
 	/* Restrict mere WAIT boosts from triggering preemption */
-	return rq_prio(rq) | __NO_PREEMPTION;
+	return prio | __NO_PREEMPTION;
 }
 
 static int queue_prio(const struct intel_engine_execlists *execlists)
@@ -359,7 +383,7 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine)
 {
 	struct i915_request *rq, *rn, *active = NULL;
 	struct list_head *uninitialized_var(pl);
-	int prio = I915_PRIORITY_INVALID | I915_PRIORITY_NEWCLIENT;
+	int prio = I915_PRIORITY_INVALID | ACTIVE_PRIORITY;
 
 	lockdep_assert_held(&engine->timeline.lock);
 
@@ -390,9 +414,15 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine)
 	 * The active request is now effectively the start of a new client
 	 * stream, so give it the equivalent small priority bump to prevent
 	 * it being gazumped a second time by another peer.
+	 *
+	 * One consequence of this preemption boost is that we may jump
+	 * over lesser priorities (such as I915_PRIORITY_WAIT), effectively
+	 * making those priorities non-preemptible. They will be moved forward
+	 * in the priority queue, but they will not gain immediate access to
+	 * the GPU.
 	 */
-	if (!(prio & I915_PRIORITY_NEWCLIENT)) {
-		prio |= I915_PRIORITY_NEWCLIENT;
+	if (~prio & ACTIVE_PRIORITY && __i915_request_has_started(active)) {
+		prio |= ACTIVE_PRIORITY;
 		active->sched.attr.priority = prio;
 		list_move_tail(&active->sched.link,
 			       i915_sched_lookup_priolist(engine, prio));
-- 
2.20.1


* [PATCH 04/11] drm/i915: Make request allocation caches global
  2019-02-26 10:23 [PATCH 01/11] drm/i915: Skip scanning for signalers if we are already inflight Chris Wilson
  2019-02-26 10:23 ` [PATCH 02/11] drm/i915/execlists: Suppress mere WAIT preemption Chris Wilson
  2019-02-26 10:23 ` [PATCH 03/11] drm/i915/execlists: Suppress redundant preemption Chris Wilson
@ 2019-02-26 10:23 ` Chris Wilson
  2019-02-27 10:29   ` Tvrtko Ursulin
  2019-02-26 10:23 ` [PATCH 05/11] drm/i915: Introduce i915_timeline.mutex Chris Wilson
                   ` (16 subsequent siblings)
  19 siblings, 1 reply; 43+ messages in thread
From: Chris Wilson @ 2019-02-26 10:23 UTC (permalink / raw)
  To: intel-gfx

As kmem_caches share the same properties (size, allocation/free behaviour)
for all potential devices, we can use global caches. While this
potentially has worse fragmentation behaviour (one can argue that
different devices would have different activity lifetimes, but you can
also argue that activity is temporal across the system) it is the
default behaviour of the system at large to amalgamate matching caches.

The benefit for us is much reduced pointer dancing along the frequent
allocation paths.

v2: Defer shrinking until after a global grace period for futureproofing
multiple consumers of the slab caches, similar to the current strategy
for avoiding shrinking too early.
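
A standalone sketch of the park/unpark epoch idea behind v2 (userspace
analogue, with a plain callback standing in for the RCU-delayed work
item; all names below are illustrative, not the driver's):

  #include <stdatomic.h>
  #include <stdbool.h>

  static atomic_int active;
  static atomic_int epoch;

  static void shrink_caches(void)
  {
          /* kmem_cache_shrink() on each global slab would go here */
  }

  /* Runs some time after park() returned true, e.g. after a grace period. */
  static void deferred_park(int scheduled_epoch)
  {
          /* Only shrink if nothing woke up since the work was queued. */
          if (scheduled_epoch == atomic_load(&epoch))
                  shrink_caches();
  }

  static void unpark(void)
  {
          atomic_fetch_add(&epoch, 1);
          atomic_fetch_add(&active, 1);
  }

  /* Returns true if the caller should schedule deferred_park(). */
  static bool park(int *scheduled_epoch)
  {
          if (atomic_fetch_sub(&active, 1) != 1)
                  return false;   /* other users still active */

          *scheduled_epoch = atomic_fetch_add(&epoch, 1) + 1;
          return true;
  }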

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/Makefile                 |   1 +
 drivers/gpu/drm/i915/i915_active.c            |   7 +-
 drivers/gpu/drm/i915/i915_active.h            |   1 +
 drivers/gpu/drm/i915/i915_drv.h               |   3 -
 drivers/gpu/drm/i915/i915_gem.c               |  34 +-----
 drivers/gpu/drm/i915/i915_globals.c           | 113 ++++++++++++++++++
 drivers/gpu/drm/i915/i915_globals.h           |  15 +++
 drivers/gpu/drm/i915/i915_pci.c               |   8 +-
 drivers/gpu/drm/i915/i915_request.c           |  53 ++++++--
 drivers/gpu/drm/i915/i915_request.h           |  10 ++
 drivers/gpu/drm/i915/i915_scheduler.c         |  66 +++++++---
 drivers/gpu/drm/i915/i915_scheduler.h         |  34 +++++-
 drivers/gpu/drm/i915/intel_guc_submission.c   |   3 +-
 drivers/gpu/drm/i915/intel_lrc.c              |   6 +-
 drivers/gpu/drm/i915/intel_ringbuffer.h       |  17 ---
 drivers/gpu/drm/i915/selftests/intel_lrc.c    |   2 +-
 drivers/gpu/drm/i915/selftests/mock_engine.c  |  45 ++++---
 .../gpu/drm/i915/selftests/mock_gem_device.c  |  26 ----
 drivers/gpu/drm/i915/selftests/mock_request.c |  12 +-
 drivers/gpu/drm/i915/selftests/mock_request.h |   7 --
 20 files changed, 313 insertions(+), 150 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_globals.c
 create mode 100644 drivers/gpu/drm/i915/i915_globals.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 1787e1299b1b..a1d834068765 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -77,6 +77,7 @@ i915-y += \
 	  i915_gem_tiling.o \
 	  i915_gem_userptr.o \
 	  i915_gemfs.o \
+	  i915_globals.o \
 	  i915_query.o \
 	  i915_request.o \
 	  i915_scheduler.o \
diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
index db7bb5bd5add..d9f6471ac16c 100644
--- a/drivers/gpu/drm/i915/i915_active.c
+++ b/drivers/gpu/drm/i915/i915_active.c
@@ -294,7 +294,12 @@ int __init i915_global_active_init(void)
 	return 0;
 }
 
-void __exit i915_global_active_exit(void)
+void i915_global_active_shrink(void)
+{
+	kmem_cache_shrink(global.slab_cache);
+}
+
+void i915_global_active_exit(void)
 {
 	kmem_cache_destroy(global.slab_cache);
 }
diff --git a/drivers/gpu/drm/i915/i915_active.h b/drivers/gpu/drm/i915/i915_active.h
index 12b5c1d287d1..5fbd9102384b 100644
--- a/drivers/gpu/drm/i915/i915_active.h
+++ b/drivers/gpu/drm/i915/i915_active.h
@@ -420,6 +420,7 @@ static inline void i915_active_fini(struct i915_active *ref) { }
 #endif
 
 int i915_global_active_init(void);
+void i915_global_active_shrink(void);
 void i915_global_active_exit(void);
 
 #endif /* _I915_ACTIVE_H_ */
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index cc09caf3870e..f16016b330b3 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1473,9 +1473,6 @@ struct drm_i915_private {
 	struct kmem_cache *objects;
 	struct kmem_cache *vmas;
 	struct kmem_cache *luts;
-	struct kmem_cache *requests;
-	struct kmem_cache *dependencies;
-	struct kmem_cache *priorities;
 
 	const struct intel_device_info __info; /* Use INTEL_INFO() to access. */
 	struct intel_runtime_info __runtime; /* Use RUNTIME_INFO() to access. */
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 2b261524cfa4..713ed6fbdcc8 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -42,6 +42,7 @@
 #include "i915_drv.h"
 #include "i915_gem_clflush.h"
 #include "i915_gemfs.h"
+#include "i915_globals.h"
 #include "i915_reset.h"
 #include "i915_trace.h"
 #include "i915_vgpu.h"
@@ -187,6 +188,8 @@ void i915_gem_unpark(struct drm_i915_private *i915)
 	if (unlikely(++i915->gt.epoch == 0)) /* keep 0 as invalid */
 		i915->gt.epoch = 1;
 
+	i915_globals_unpark();
+
 	intel_enable_gt_powersave(i915);
 	i915_update_gfx_val(i915);
 	if (INTEL_GEN(i915) >= 6)
@@ -2892,12 +2895,11 @@ static void shrink_caches(struct drm_i915_private *i915)
 	 * filled slabs to prioritise allocating from the mostly full slabs,
 	 * with the aim of reducing fragmentation.
 	 */
-	kmem_cache_shrink(i915->priorities);
-	kmem_cache_shrink(i915->dependencies);
-	kmem_cache_shrink(i915->requests);
 	kmem_cache_shrink(i915->luts);
 	kmem_cache_shrink(i915->vmas);
 	kmem_cache_shrink(i915->objects);
+
+	i915_globals_park();
 }
 
 struct sleep_rcu_work {
@@ -5235,23 +5237,6 @@ int i915_gem_init_early(struct drm_i915_private *dev_priv)
 	if (!dev_priv->luts)
 		goto err_vmas;
 
-	dev_priv->requests = KMEM_CACHE(i915_request,
-					SLAB_HWCACHE_ALIGN |
-					SLAB_RECLAIM_ACCOUNT |
-					SLAB_TYPESAFE_BY_RCU);
-	if (!dev_priv->requests)
-		goto err_luts;
-
-	dev_priv->dependencies = KMEM_CACHE(i915_dependency,
-					    SLAB_HWCACHE_ALIGN |
-					    SLAB_RECLAIM_ACCOUNT);
-	if (!dev_priv->dependencies)
-		goto err_requests;
-
-	dev_priv->priorities = KMEM_CACHE(i915_priolist, SLAB_HWCACHE_ALIGN);
-	if (!dev_priv->priorities)
-		goto err_dependencies;
-
 	INIT_LIST_HEAD(&dev_priv->gt.active_rings);
 	INIT_LIST_HEAD(&dev_priv->gt.closed_vma);
 
@@ -5276,12 +5261,6 @@ int i915_gem_init_early(struct drm_i915_private *dev_priv)
 
 	return 0;
 
-err_dependencies:
-	kmem_cache_destroy(dev_priv->dependencies);
-err_requests:
-	kmem_cache_destroy(dev_priv->requests);
-err_luts:
-	kmem_cache_destroy(dev_priv->luts);
 err_vmas:
 	kmem_cache_destroy(dev_priv->vmas);
 err_objects:
@@ -5299,9 +5278,6 @@ void i915_gem_cleanup_early(struct drm_i915_private *dev_priv)
 
 	cleanup_srcu_struct(&dev_priv->gpu_error.reset_backoff_srcu);
 
-	kmem_cache_destroy(dev_priv->priorities);
-	kmem_cache_destroy(dev_priv->dependencies);
-	kmem_cache_destroy(dev_priv->requests);
 	kmem_cache_destroy(dev_priv->luts);
 	kmem_cache_destroy(dev_priv->vmas);
 	kmem_cache_destroy(dev_priv->objects);
diff --git a/drivers/gpu/drm/i915/i915_globals.c b/drivers/gpu/drm/i915/i915_globals.c
new file mode 100644
index 000000000000..7fd1b3945a04
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_globals.c
@@ -0,0 +1,113 @@
+/*
+ * SPDX-License-Identifier: MIT
+ *
+ * Copyright © 2019 Intel Corporation
+ */
+
+#include <linux/slab.h>
+#include <linux/workqueue.h>
+
+#include "i915_active.h"
+#include "i915_globals.h"
+#include "i915_request.h"
+#include "i915_scheduler.h"
+
+int __init i915_globals_init(void)
+{
+	int err;
+
+	err = i915_global_active_init();
+	if (err)
+		return err;
+
+	err = i915_global_request_init();
+	if (err)
+		goto err_active;
+
+	err = i915_global_scheduler_init();
+	if (err)
+		goto err_request;
+
+	return 0;
+
+err_request:
+	i915_global_request_exit();
+err_active:
+	i915_global_active_exit();
+	return err;
+}
+
+static void i915_globals_shrink(void)
+{
+	/*
+	 * kmem_cache_shrink() discards empty slabs and reorders partially
+	 * filled slabs to prioritise allocating from the mostly full slabs,
+	 * with the aim of reducing fragmentation.
+	 */
+	i915_global_active_shrink();
+	i915_global_request_shrink();
+	i915_global_scheduler_shrink();
+}
+
+static atomic_t active;
+static atomic_t epoch;
+struct park_work {
+	struct rcu_work work;
+	int epoch;
+};
+
+static void __i915_globals_park(struct work_struct *work)
+{
+	struct park_work *wrk = container_of(work, typeof(*wrk), work.work);
+
+	/* Confirm nothing woke up in the last grace period */
+	if (wrk->epoch == atomic_read(&epoch))
+		i915_globals_shrink();
+
+	kfree(wrk);
+}
+
+void i915_globals_park(void)
+{
+	struct park_work *wrk;
+
+	/*
+	 * Defer shrinking the global slab caches (and other work) until
+	 * after a RCU grace period has completed with no activity. This
+	 * is to try and reduce the latency impact on the consumers caused
+	 * by us shrinking the caches the same time as they are trying to
+	 * allocate, with the assumption being that if we idle long enough
+	 * for an RCU grace period to elapse since the last use, it is likely
+	 * to be longer until we need the caches again.
+	 */
+	if (!atomic_dec_and_test(&active))
+		return;
+
+	wrk = kmalloc(sizeof(*wrk), GFP_KERNEL);
+	if (!wrk)
+		return;
+
+	wrk->epoch = atomic_inc_return(&epoch);
+	INIT_RCU_WORK(&wrk->work, __i915_globals_park);
+	queue_rcu_work(system_wq, &wrk->work);
+}
+
+void i915_globals_unpark(void)
+{
+	atomic_inc(&epoch);
+	atomic_inc(&active);
+}
+
+void __exit i915_globals_exit(void)
+{
+	/* Flush any residual park_work */
+	rcu_barrier();
+	flush_scheduled_work();
+
+	i915_global_scheduler_exit();
+	i915_global_request_exit();
+	i915_global_active_exit();
+
+	/* And ensure that our DESTROY_BY_RCU slabs are truly destroyed */
+	rcu_barrier();
+}
diff --git a/drivers/gpu/drm/i915/i915_globals.h b/drivers/gpu/drm/i915/i915_globals.h
new file mode 100644
index 000000000000..e468f0413a73
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_globals.h
@@ -0,0 +1,15 @@
+/*
+ * SPDX-License-Identifier: MIT
+ *
+ * Copyright © 2019 Intel Corporation
+ */
+
+#ifndef _I915_GLOBALS_H_
+#define _I915_GLOBALS_H_
+
+int i915_globals_init(void);
+void i915_globals_park(void);
+void i915_globals_unpark(void);
+void i915_globals_exit(void);
+
+#endif /* _I915_GLOBALS_H_ */
diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
index c4d6b8da9b03..a9211c370cd1 100644
--- a/drivers/gpu/drm/i915/i915_pci.c
+++ b/drivers/gpu/drm/i915/i915_pci.c
@@ -28,8 +28,8 @@
 
 #include <drm/drm_drv.h>
 
-#include "i915_active.h"
 #include "i915_drv.h"
+#include "i915_globals.h"
 #include "i915_selftest.h"
 
 #define PLATFORM(x) .platform = (x), .platform_mask = BIT(x)
@@ -802,7 +802,9 @@ static int __init i915_init(void)
 	bool use_kms = true;
 	int err;
 
-	i915_global_active_init();
+	err = i915_globals_init();
+	if (err)
+		return err;
 
 	err = i915_mock_selftests();
 	if (err)
@@ -835,7 +837,7 @@ static void __exit i915_exit(void)
 		return;
 
 	pci_unregister_driver(&i915_pci_driver);
-	i915_global_active_exit();
+	i915_globals_exit();
 }
 
 module_init(i915_init);
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 00a1ea7cd907..c65f6c990fdd 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -32,6 +32,11 @@
 #include "i915_active.h"
 #include "i915_reset.h"
 
+static struct i915_global_request {
+	struct kmem_cache *slab_requests;
+	struct kmem_cache *slab_dependencies;
+} global;
+
 static const char *i915_fence_get_driver_name(struct dma_fence *fence)
 {
 	return "i915";
@@ -86,7 +91,7 @@ static void i915_fence_release(struct dma_fence *fence)
 	 */
 	i915_sw_fence_fini(&rq->submit);
 
-	kmem_cache_free(rq->i915->requests, rq);
+	kmem_cache_free(global.slab_requests, rq);
 }
 
 const struct dma_fence_ops i915_fence_ops = {
@@ -292,7 +297,7 @@ static void i915_request_retire(struct i915_request *request)
 
 	unreserve_gt(request->i915);
 
-	i915_sched_node_fini(request->i915, &request->sched);
+	i915_sched_node_fini(&request->sched);
 	i915_request_put(request);
 }
 
@@ -506,7 +511,7 @@ i915_request_alloc_slow(struct intel_context *ce)
 	ring_retire_requests(ring);
 
 out:
-	return kmem_cache_alloc(ce->gem_context->i915->requests, GFP_KERNEL);
+	return kmem_cache_alloc(global.slab_requests, GFP_KERNEL);
 }
 
 static int add_timeline_barrier(struct i915_request *rq)
@@ -594,7 +599,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	 *
 	 * Do not use kmem_cache_zalloc() here!
 	 */
-	rq = kmem_cache_alloc(i915->requests,
+	rq = kmem_cache_alloc(global.slab_requests,
 			      GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
 	if (unlikely(!rq)) {
 		rq = i915_request_alloc_slow(ce);
@@ -681,7 +686,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	GEM_BUG_ON(!list_empty(&rq->sched.signalers_list));
 	GEM_BUG_ON(!list_empty(&rq->sched.waiters_list));
 
-	kmem_cache_free(i915->requests, rq);
+	kmem_cache_free(global.slab_requests, rq);
 err_unreserve:
 	unreserve_gt(i915);
 	intel_context_unpin(ce);
@@ -700,9 +705,7 @@ i915_request_await_request(struct i915_request *to, struct i915_request *from)
 		return 0;
 
 	if (to->engine->schedule) {
-		ret = i915_sched_node_add_dependency(to->i915,
-						     &to->sched,
-						     &from->sched);
+		ret = i915_sched_node_add_dependency(&to->sched, &from->sched);
 		if (ret < 0)
 			return ret;
 	}
@@ -1190,3 +1193,37 @@ void i915_retire_requests(struct drm_i915_private *i915)
 #include "selftests/mock_request.c"
 #include "selftests/i915_request.c"
 #endif
+
+int __init i915_global_request_init(void)
+{
+	global.slab_requests = KMEM_CACHE(i915_request,
+					  SLAB_HWCACHE_ALIGN |
+					  SLAB_RECLAIM_ACCOUNT |
+					  SLAB_TYPESAFE_BY_RCU);
+	if (!global.slab_requests)
+		return -ENOMEM;
+
+	global.slab_dependencies = KMEM_CACHE(i915_dependency,
+					      SLAB_HWCACHE_ALIGN |
+					      SLAB_RECLAIM_ACCOUNT);
+	if (!global.slab_dependencies)
+		goto err_requests;
+
+	return 0;
+
+err_requests:
+	kmem_cache_destroy(global.slab_requests);
+	return -ENOMEM;
+}
+
+void i915_global_request_shrink(void)
+{
+	kmem_cache_shrink(global.slab_dependencies);
+	kmem_cache_shrink(global.slab_requests);
+}
+
+void i915_global_request_exit(void)
+{
+	kmem_cache_destroy(global.slab_dependencies);
+	kmem_cache_destroy(global.slab_requests);
+}
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index 1e127c1c53fa..be3ded6bcf56 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -29,6 +29,7 @@
 
 #include "i915_gem.h"
 #include "i915_scheduler.h"
+#include "i915_selftest.h"
 #include "i915_sw_fence.h"
 
 #include <uapi/drm/i915_drm.h>
@@ -196,6 +197,11 @@ struct i915_request {
 	struct drm_i915_file_private *file_priv;
 	/** file_priv list entry for this request */
 	struct list_head client_link;
+
+	I915_SELFTEST_DECLARE(struct {
+		struct list_head link;
+		unsigned long delay;
+	} mock;)
 };
 
 #define I915_FENCE_GFP (GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN)
@@ -371,4 +377,8 @@ static inline void i915_request_mark_complete(struct i915_request *rq)
 
 void i915_retire_requests(struct drm_i915_private *i915);
 
+int i915_global_request_init(void);
+void i915_global_request_shrink(void);
+void i915_global_request_exit(void);
+
 #endif /* I915_REQUEST_H */
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 9fb96ff57a29..50018ad30233 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -10,6 +10,11 @@
 #include "i915_request.h"
 #include "i915_scheduler.h"
 
+static struct i915_global_scheduler {
+	struct kmem_cache *slab_dependencies;
+	struct kmem_cache *slab_priorities;
+} global;
+
 static DEFINE_SPINLOCK(schedule_lock);
 
 static const struct i915_request *
@@ -37,16 +42,15 @@ void i915_sched_node_init(struct i915_sched_node *node)
 }
 
 static struct i915_dependency *
-i915_dependency_alloc(struct drm_i915_private *i915)
+i915_dependency_alloc(void)
 {
-	return kmem_cache_alloc(i915->dependencies, GFP_KERNEL);
+	return kmem_cache_alloc(global.slab_dependencies, GFP_KERNEL);
 }
 
 static void
-i915_dependency_free(struct drm_i915_private *i915,
-		     struct i915_dependency *dep)
+i915_dependency_free(struct i915_dependency *dep)
 {
-	kmem_cache_free(i915->dependencies, dep);
+	kmem_cache_free(global.slab_dependencies, dep);
 }
 
 bool __i915_sched_node_add_dependency(struct i915_sched_node *node,
@@ -73,25 +77,23 @@ bool __i915_sched_node_add_dependency(struct i915_sched_node *node,
 	return ret;
 }
 
-int i915_sched_node_add_dependency(struct drm_i915_private *i915,
-				   struct i915_sched_node *node,
+int i915_sched_node_add_dependency(struct i915_sched_node *node,
 				   struct i915_sched_node *signal)
 {
 	struct i915_dependency *dep;
 
-	dep = i915_dependency_alloc(i915);
+	dep = i915_dependency_alloc();
 	if (!dep)
 		return -ENOMEM;
 
 	if (!__i915_sched_node_add_dependency(node, signal, dep,
 					      I915_DEPENDENCY_ALLOC))
-		i915_dependency_free(i915, dep);
+		i915_dependency_free(dep);
 
 	return 0;
 }
 
-void i915_sched_node_fini(struct drm_i915_private *i915,
-			  struct i915_sched_node *node)
+void i915_sched_node_fini(struct i915_sched_node *node)
 {
 	struct i915_dependency *dep, *tmp;
 
@@ -111,7 +113,7 @@ void i915_sched_node_fini(struct drm_i915_private *i915,
 
 		list_del(&dep->wait_link);
 		if (dep->flags & I915_DEPENDENCY_ALLOC)
-			i915_dependency_free(i915, dep);
+			i915_dependency_free(dep);
 	}
 
 	/* Remove ourselves from everyone who depends upon us */
@@ -121,7 +123,7 @@ void i915_sched_node_fini(struct drm_i915_private *i915,
 
 		list_del(&dep->signal_link);
 		if (dep->flags & I915_DEPENDENCY_ALLOC)
-			i915_dependency_free(i915, dep);
+			i915_dependency_free(dep);
 	}
 
 	spin_unlock(&schedule_lock);
@@ -198,7 +200,7 @@ i915_sched_lookup_priolist(struct intel_engine_cs *engine, int prio)
 	if (prio == I915_PRIORITY_NORMAL) {
 		p = &execlists->default_priolist;
 	} else {
-		p = kmem_cache_alloc(engine->i915->priorities, GFP_ATOMIC);
+		p = kmem_cache_alloc(global.slab_priorities, GFP_ATOMIC);
 		/* Convert an allocation failure to a priority bump */
 		if (unlikely(!p)) {
 			prio = I915_PRIORITY_NORMAL; /* recurses just once */
@@ -423,3 +425,39 @@ void i915_schedule_bump_priority(struct i915_request *rq, unsigned int bump)
 
 	spin_unlock_bh(&schedule_lock);
 }
+
+void __i915_priolist_free(struct i915_priolist *p)
+{
+	kmem_cache_free(global.slab_priorities, p);
+}
+
+int __init i915_global_scheduler_init(void)
+{
+	global.slab_dependencies = KMEM_CACHE(i915_dependency,
+					      SLAB_HWCACHE_ALIGN);
+	if (!global.slab_dependencies)
+		return -ENOMEM;
+
+	global.slab_priorities = KMEM_CACHE(i915_priolist,
+					    SLAB_HWCACHE_ALIGN);
+	if (!global.slab_priorities)
+		goto err_priorities;
+
+	return 0;
+
+err_priorities:
+	kmem_cache_destroy(global.slab_priorities);
+	return -ENOMEM;
+}
+
+void i915_global_scheduler_shrink(void)
+{
+	kmem_cache_shrink(global.slab_dependencies);
+	kmem_cache_shrink(global.slab_priorities);
+}
+
+void i915_global_scheduler_exit(void)
+{
+	kmem_cache_destroy(global.slab_dependencies);
+	kmem_cache_destroy(global.slab_priorities);
+}
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 54bd6c89817e..5196ce07b6c2 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -85,6 +85,23 @@ struct i915_dependency {
 #define I915_DEPENDENCY_ALLOC BIT(0)
 };
 
+struct i915_priolist {
+	struct list_head requests[I915_PRIORITY_COUNT];
+	struct rb_node node;
+	unsigned long used;
+	int priority;
+};
+
+#define priolist_for_each_request(it, plist, idx) \
+	for (idx = 0; idx < ARRAY_SIZE((plist)->requests); idx++) \
+		list_for_each_entry(it, &(plist)->requests[idx], sched.link)
+
+#define priolist_for_each_request_consume(it, n, plist, idx) \
+	for (; (idx = ffs((plist)->used)); (plist)->used &= ~BIT(idx - 1)) \
+		list_for_each_entry_safe(it, n, \
+					 &(plist)->requests[idx - 1], \
+					 sched.link)
+
 void i915_sched_node_init(struct i915_sched_node *node);
 
 bool __i915_sched_node_add_dependency(struct i915_sched_node *node,
@@ -92,12 +109,10 @@ bool __i915_sched_node_add_dependency(struct i915_sched_node *node,
 				      struct i915_dependency *dep,
 				      unsigned long flags);
 
-int i915_sched_node_add_dependency(struct drm_i915_private *i915,
-				   struct i915_sched_node *node,
+int i915_sched_node_add_dependency(struct i915_sched_node *node,
 				   struct i915_sched_node *signal);
 
-void i915_sched_node_fini(struct drm_i915_private *i915,
-			  struct i915_sched_node *node);
+void i915_sched_node_fini(struct i915_sched_node *node);
 
 void i915_schedule(struct i915_request *request,
 		   const struct i915_sched_attr *attr);
@@ -107,4 +122,15 @@ void i915_schedule_bump_priority(struct i915_request *rq, unsigned int bump);
 struct list_head *
 i915_sched_lookup_priolist(struct intel_engine_cs *engine, int prio);
 
+void __i915_priolist_free(struct i915_priolist *p);
+static inline void i915_priolist_free(struct i915_priolist *p)
+{
+	if (p->priority != I915_PRIORITY_NORMAL)
+		__i915_priolist_free(p);
+}
+
+int i915_global_scheduler_init(void);
+void i915_global_scheduler_shrink(void);
+void i915_global_scheduler_exit(void);
+
 #endif /* _I915_SCHEDULER_H_ */
diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
index a2846ea1e62c..56ba2fcbabe6 100644
--- a/drivers/gpu/drm/i915/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/intel_guc_submission.c
@@ -781,8 +781,7 @@ static bool __guc_dequeue(struct intel_engine_cs *engine)
 		}
 
 		rb_erase_cached(&p->node, &execlists->queue);
-		if (p->priority != I915_PRIORITY_NORMAL)
-			kmem_cache_free(engine->i915->priorities, p);
+		i915_priolist_free(p);
 	}
 done:
 	execlists->queue_priority_hint =
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index dba19baf6808..29b2a2f34edb 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -818,8 +818,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 		}
 
 		rb_erase_cached(&p->node, &execlists->queue);
-		if (p->priority != I915_PRIORITY_NORMAL)
-			kmem_cache_free(engine->i915->priorities, p);
+		i915_priolist_free(p);
 	}
 
 done:
@@ -972,8 +971,7 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine)
 		}
 
 		rb_erase_cached(&p->node, &execlists->queue);
-		if (p->priority != I915_PRIORITY_NORMAL)
-			kmem_cache_free(engine->i915->priorities, p);
+		i915_priolist_free(p);
 	}
 
 	/* Remaining _unready_ requests will be nop'ed when submitted */
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index de8dba7565b0..5284f243931a 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -187,23 +187,6 @@ enum intel_engine_id {
 #define _VECS(n) (VECS + (n))
 };
 
-struct i915_priolist {
-	struct list_head requests[I915_PRIORITY_COUNT];
-	struct rb_node node;
-	unsigned long used;
-	int priority;
-};
-
-#define priolist_for_each_request(it, plist, idx) \
-	for (idx = 0; idx < ARRAY_SIZE((plist)->requests); idx++) \
-		list_for_each_entry(it, &(plist)->requests[idx], sched.link)
-
-#define priolist_for_each_request_consume(it, n, plist, idx) \
-	for (; (idx = ffs((plist)->used)); (plist)->used &= ~BIT(idx - 1)) \
-		list_for_each_entry_safe(it, n, \
-					 &(plist)->requests[idx - 1], \
-					 sched.link)
-
 struct st_preempt_hang {
 	struct completion completion;
 	unsigned int count;
diff --git a/drivers/gpu/drm/i915/selftests/intel_lrc.c b/drivers/gpu/drm/i915/selftests/intel_lrc.c
index 7172f6c7f25a..2d582a21eba9 100644
--- a/drivers/gpu/drm/i915/selftests/intel_lrc.c
+++ b/drivers/gpu/drm/i915/selftests/intel_lrc.c
@@ -441,7 +441,7 @@ static struct i915_request *dummy_request(struct intel_engine_cs *engine)
 static void dummy_request_free(struct i915_request *dummy)
 {
 	i915_request_mark_complete(dummy);
-	i915_sched_node_fini(dummy->engine->i915, &dummy->sched);
+	i915_sched_node_fini(&dummy->sched);
 	i915_sw_fence_fini(&dummy->submit);
 
 	dma_fence_free(&dummy->fence);
diff --git a/drivers/gpu/drm/i915/selftests/mock_engine.c b/drivers/gpu/drm/i915/selftests/mock_engine.c
index 6f3fb803c747..ec1ae948954c 100644
--- a/drivers/gpu/drm/i915/selftests/mock_engine.c
+++ b/drivers/gpu/drm/i915/selftests/mock_engine.c
@@ -76,26 +76,26 @@ static void mock_ring_free(struct intel_ring *base)
 	kfree(ring);
 }
 
-static struct mock_request *first_request(struct mock_engine *engine)
+static struct i915_request *first_request(struct mock_engine *engine)
 {
 	return list_first_entry_or_null(&engine->hw_queue,
-					struct mock_request,
-					link);
+					struct i915_request,
+					mock.link);
 }
 
-static void advance(struct mock_request *request)
+static void advance(struct i915_request *request)
 {
-	list_del_init(&request->link);
-	i915_request_mark_complete(&request->base);
-	GEM_BUG_ON(!i915_request_completed(&request->base));
+	list_del_init(&request->mock.link);
+	i915_request_mark_complete(request);
+	GEM_BUG_ON(!i915_request_completed(request));
 
-	intel_engine_queue_breadcrumbs(request->base.engine);
+	intel_engine_queue_breadcrumbs(request->engine);
 }
 
 static void hw_delay_complete(struct timer_list *t)
 {
 	struct mock_engine *engine = from_timer(engine, t, hw_delay);
-	struct mock_request *request;
+	struct i915_request *request;
 	unsigned long flags;
 
 	spin_lock_irqsave(&engine->hw_lock, flags);
@@ -110,8 +110,9 @@ static void hw_delay_complete(struct timer_list *t)
 	 * requeue the timer for the next delayed request.
 	 */
 	while ((request = first_request(engine))) {
-		if (request->delay) {
-			mod_timer(&engine->hw_delay, jiffies + request->delay);
+		if (request->mock.delay) {
+			mod_timer(&engine->hw_delay,
+				  jiffies + request->mock.delay);
 			break;
 		}
 
@@ -169,10 +170,8 @@ mock_context_pin(struct intel_engine_cs *engine,
 
 static int mock_request_alloc(struct i915_request *request)
 {
-	struct mock_request *mock = container_of(request, typeof(*mock), base);
-
-	INIT_LIST_HEAD(&mock->link);
-	mock->delay = 0;
+	INIT_LIST_HEAD(&request->mock.link);
+	request->mock.delay = 0;
 
 	return 0;
 }
@@ -190,7 +189,6 @@ static u32 *mock_emit_breadcrumb(struct i915_request *request, u32 *cs)
 
 static void mock_submit_request(struct i915_request *request)
 {
-	struct mock_request *mock = container_of(request, typeof(*mock), base);
 	struct mock_engine *engine =
 		container_of(request->engine, typeof(*engine), base);
 	unsigned long flags;
@@ -198,12 +196,13 @@ static void mock_submit_request(struct i915_request *request)
 	i915_request_submit(request);
 
 	spin_lock_irqsave(&engine->hw_lock, flags);
-	list_add_tail(&mock->link, &engine->hw_queue);
-	if (mock->link.prev == &engine->hw_queue) {
-		if (mock->delay)
-			mod_timer(&engine->hw_delay, jiffies + mock->delay);
+	list_add_tail(&request->mock.link, &engine->hw_queue);
+	if (list_is_first(&request->mock.link, &engine->hw_queue)) {
+		if (request->mock.delay)
+			mod_timer(&engine->hw_delay,
+				  jiffies + request->mock.delay);
 		else
-			advance(mock);
+			advance(request);
 	}
 	spin_unlock_irqrestore(&engine->hw_lock, flags);
 }
@@ -263,12 +262,12 @@ void mock_engine_flush(struct intel_engine_cs *engine)
 {
 	struct mock_engine *mock =
 		container_of(engine, typeof(*mock), base);
-	struct mock_request *request, *rn;
+	struct i915_request *request, *rn;
 
 	del_timer_sync(&mock->hw_delay);
 
 	spin_lock_irq(&mock->hw_lock);
-	list_for_each_entry_safe(request, rn, &mock->hw_queue, link)
+	list_for_each_entry_safe(request, rn, &mock->hw_queue, mock.link)
 		advance(request);
 	spin_unlock_irq(&mock->hw_lock);
 }
diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
index fc516a2970f4..5a98caba6d69 100644
--- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
+++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
@@ -79,9 +79,6 @@ static void mock_device_release(struct drm_device *dev)
 
 	destroy_workqueue(i915->wq);
 
-	kmem_cache_destroy(i915->priorities);
-	kmem_cache_destroy(i915->dependencies);
-	kmem_cache_destroy(i915->requests);
 	kmem_cache_destroy(i915->vmas);
 	kmem_cache_destroy(i915->objects);
 
@@ -211,23 +208,6 @@ struct drm_i915_private *mock_gem_device(void)
 	if (!i915->vmas)
 		goto err_objects;
 
-	i915->requests = KMEM_CACHE(mock_request,
-				    SLAB_HWCACHE_ALIGN |
-				    SLAB_RECLAIM_ACCOUNT |
-				    SLAB_TYPESAFE_BY_RCU);
-	if (!i915->requests)
-		goto err_vmas;
-
-	i915->dependencies = KMEM_CACHE(i915_dependency,
-					SLAB_HWCACHE_ALIGN |
-					SLAB_RECLAIM_ACCOUNT);
-	if (!i915->dependencies)
-		goto err_requests;
-
-	i915->priorities = KMEM_CACHE(i915_priolist, SLAB_HWCACHE_ALIGN);
-	if (!i915->priorities)
-		goto err_dependencies;
-
 	i915_timelines_init(i915);
 
 	INIT_LIST_HEAD(&i915->gt.active_rings);
@@ -257,12 +237,6 @@ struct drm_i915_private *mock_gem_device(void)
 err_unlock:
 	mutex_unlock(&i915->drm.struct_mutex);
 	i915_timelines_fini(i915);
-	kmem_cache_destroy(i915->priorities);
-err_dependencies:
-	kmem_cache_destroy(i915->dependencies);
-err_requests:
-	kmem_cache_destroy(i915->requests);
-err_vmas:
 	kmem_cache_destroy(i915->vmas);
 err_objects:
 	kmem_cache_destroy(i915->objects);
diff --git a/drivers/gpu/drm/i915/selftests/mock_request.c b/drivers/gpu/drm/i915/selftests/mock_request.c
index 0dc29e242597..d1a7c9608712 100644
--- a/drivers/gpu/drm/i915/selftests/mock_request.c
+++ b/drivers/gpu/drm/i915/selftests/mock_request.c
@@ -31,29 +31,25 @@ mock_request(struct intel_engine_cs *engine,
 	     unsigned long delay)
 {
 	struct i915_request *request;
-	struct mock_request *mock;
 
 	/* NB the i915->requests slab cache is enlarged to fit mock_request */
 	request = i915_request_alloc(engine, context);
 	if (IS_ERR(request))
 		return NULL;
 
-	mock = container_of(request, typeof(*mock), base);
-	mock->delay = delay;
-
-	return &mock->base;
+	request->mock.delay = delay;
+	return request;
 }
 
 bool mock_cancel_request(struct i915_request *request)
 {
-	struct mock_request *mock = container_of(request, typeof(*mock), base);
 	struct mock_engine *engine =
 		container_of(request->engine, typeof(*engine), base);
 	bool was_queued;
 
 	spin_lock_irq(&engine->hw_lock);
-	was_queued = !list_empty(&mock->link);
-	list_del_init(&mock->link);
+	was_queued = !list_empty(&request->mock.link);
+	list_del_init(&request->mock.link);
 	spin_unlock_irq(&engine->hw_lock);
 
 	if (was_queued)
diff --git a/drivers/gpu/drm/i915/selftests/mock_request.h b/drivers/gpu/drm/i915/selftests/mock_request.h
index 995fb728380c..4acf0211df20 100644
--- a/drivers/gpu/drm/i915/selftests/mock_request.h
+++ b/drivers/gpu/drm/i915/selftests/mock_request.h
@@ -29,13 +29,6 @@
 
 #include "../i915_request.h"
 
-struct mock_request {
-	struct i915_request base;
-
-	struct list_head link;
-	unsigned long delay;
-};
-
 struct i915_request *
 mock_request(struct intel_engine_cs *engine,
 	     struct i915_gem_context *context,
-- 
2.20.1


* [PATCH 05/11] drm/i915: Introduce i915_timeline.mutex
  2019-02-26 10:23 [PATCH 01/11] drm/i915: Skip scanning for signalers if we are already inflight Chris Wilson
                   ` (2 preceding siblings ...)
  2019-02-26 10:23 ` [PATCH 04/11] drm/i915: Make request allocation caches global Chris Wilson
@ 2019-02-26 10:23 ` Chris Wilson
  2019-02-28  7:43   ` Tvrtko Ursulin
  2019-02-26 10:23 ` [PATCH 06/11] drm/i915: Keep timeline HWSP allocated until idle across the system Chris Wilson
                   ` (15 subsequent siblings)
  19 siblings, 1 reply; 43+ messages in thread
From: Chris Wilson @ 2019-02-26 10:23 UTC (permalink / raw)
  To: intel-gfx

A simple mutex used for guarding the flow of requests in and out of the
timeline. In the short term, it will be used only to guard the addition
of requests into the timeline, taken on alloc and released on commit so
that only one caller can construct a request into the timeline
(important as the seqno and ring pointers must be serialised). This will
be used by observers to ensure that the seqno/hwsp is stable. Later,
when we have reduced retiring to only operate on a single timeline at a
time, we can then use the mutex as the sole guard required for retiring.
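
A standalone sketch of the lock-on-alloc, unlock-on-add pattern (a
pthread mutex stands in for the timeline mutex; names are illustrative
only):

  #include <pthread.h>

  struct timeline {
          pthread_mutex_t mutex;  /* protects the flow of requests */
          unsigned int seqno;
  };

  struct request {
          struct timeline *tl;
          unsigned int seqno;
  };

  /* Only one caller at a time may construct a request on the timeline. */
  static void request_alloc(struct timeline *tl, struct request *rq)
  {
          pthread_mutex_lock(&tl->mutex);
          rq->tl = tl;
          rq->seqno = ++tl->seqno;        /* seqno assignment is serialised */
  }

  static void request_add(struct request *rq)
  {
          /* ... emit the request into the ring under the mutex ... */
          pthread_mutex_unlock(&rq->tl->mutex);
  }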

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_request.c            | 6 +++++-
 drivers/gpu/drm/i915/i915_timeline.c           | 1 +
 drivers/gpu/drm/i915/i915_timeline.h           | 2 ++
 drivers/gpu/drm/i915/selftests/i915_request.c  | 4 +---
 drivers/gpu/drm/i915/selftests/mock_timeline.c | 1 +
 5 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index c65f6c990fdd..719d1a5ab082 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -563,6 +563,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 		return ERR_CAST(ce);
 
 	reserve_gt(i915);
+	mutex_lock(&ce->ring->timeline->mutex);
 
 	/* Move our oldest request to the slab-cache (if not in use!) */
 	rq = list_first_entry(&ce->ring->request_list, typeof(*rq), ring_link);
@@ -688,6 +689,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 
 	kmem_cache_free(global.slab_requests, rq);
 err_unreserve:
+	mutex_unlock(&ce->ring->timeline->mutex);
 	unreserve_gt(i915);
 	intel_context_unpin(ce);
 	return ERR_PTR(ret);
@@ -880,7 +882,7 @@ void i915_request_add(struct i915_request *request)
 	GEM_TRACE("%s fence %llx:%lld\n",
 		  engine->name, request->fence.context, request->fence.seqno);
 
-	lockdep_assert_held(&request->i915->drm.struct_mutex);
+	lockdep_assert_held(&request->timeline->mutex);
 	trace_i915_request_add(request);
 
 	/*
@@ -991,6 +993,8 @@ void i915_request_add(struct i915_request *request)
 	 */
 	if (prev && i915_request_completed(prev))
 		i915_request_retire_upto(prev);
+
+	mutex_unlock(&request->timeline->mutex);
 }
 
 static unsigned long local_clock_us(unsigned int *cpu)
diff --git a/drivers/gpu/drm/i915/i915_timeline.c b/drivers/gpu/drm/i915/i915_timeline.c
index b2202d2e58a2..87a80558da28 100644
--- a/drivers/gpu/drm/i915/i915_timeline.c
+++ b/drivers/gpu/drm/i915/i915_timeline.c
@@ -162,6 +162,7 @@ int i915_timeline_init(struct drm_i915_private *i915,
 	timeline->fence_context = dma_fence_context_alloc(1);
 
 	spin_lock_init(&timeline->lock);
+	mutex_init(&timeline->mutex);
 
 	INIT_ACTIVE_REQUEST(&timeline->barrier);
 	INIT_ACTIVE_REQUEST(&timeline->last_request);
diff --git a/drivers/gpu/drm/i915/i915_timeline.h b/drivers/gpu/drm/i915/i915_timeline.h
index 7bec7d2e45bf..36c3849f7108 100644
--- a/drivers/gpu/drm/i915/i915_timeline.h
+++ b/drivers/gpu/drm/i915/i915_timeline.h
@@ -44,6 +44,8 @@ struct i915_timeline {
 #define TIMELINE_CLIENT 0 /* default subclass */
 #define TIMELINE_ENGINE 1
 
+	struct mutex mutex; /* protects the flow of requests */
+
 	unsigned int pin_count;
 	const u32 *hwsp_seqno;
 	struct i915_vma *hwsp_ggtt;
diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c
index 7da52e3d67af..7e1b65b8eb19 100644
--- a/drivers/gpu/drm/i915/selftests/i915_request.c
+++ b/drivers/gpu/drm/i915/selftests/i915_request.c
@@ -141,14 +141,12 @@ static int igt_fence_wait(void *arg)
 		err = -ENOMEM;
 		goto out_locked;
 	}
-	mutex_unlock(&i915->drm.struct_mutex); /* safe as we are single user */
 
 	if (dma_fence_wait_timeout(&request->fence, false, T) != -ETIME) {
 		pr_err("fence wait success before submit (expected timeout)!\n");
-		goto out_device;
+		goto out_locked;
 	}
 
-	mutex_lock(&i915->drm.struct_mutex);
 	i915_request_add(request);
 	mutex_unlock(&i915->drm.struct_mutex);
 
diff --git a/drivers/gpu/drm/i915/selftests/mock_timeline.c b/drivers/gpu/drm/i915/selftests/mock_timeline.c
index d2de9ece2118..416d85233263 100644
--- a/drivers/gpu/drm/i915/selftests/mock_timeline.c
+++ b/drivers/gpu/drm/i915/selftests/mock_timeline.c
@@ -14,6 +14,7 @@ void mock_timeline_init(struct i915_timeline *timeline, u64 context)
 	timeline->fence_context = context;
 
 	spin_lock_init(&timeline->lock);
+	mutex_init(&timeline->mutex);
 
 	INIT_ACTIVE_REQUEST(&timeline->barrier);
 	INIT_ACTIVE_REQUEST(&timeline->last_request);
-- 
2.20.1


* [PATCH 06/11] drm/i915: Keep timeline HWSP allocated until idle across the system
  2019-02-26 10:23 [PATCH 01/11] drm/i915: Skip scanning for signalers if we are already inflight Chris Wilson
                   ` (3 preceding siblings ...)
  2019-02-26 10:23 ` [PATCH 05/11] drm/i915: Introduce i915_timeline.mutex Chris Wilson
@ 2019-02-26 10:23 ` Chris Wilson
  2019-02-27 10:44   ` Tvrtko Ursulin
  2019-02-27 11:15   ` [PATCH] " Chris Wilson
  2019-02-26 10:24 ` [PATCH 07/11] drm/i915: Compute the global scheduler caps Chris Wilson
                   ` (14 subsequent siblings)
  19 siblings, 2 replies; 43+ messages in thread
From: Chris Wilson @ 2019-02-26 10:23 UTC (permalink / raw)
  To: intel-gfx

In preparation for enabling HW semaphores, we need to keep the in-flight
timeline HWSP alive until its use across the entire system has completed,
as any other timeline active on the GPU may still refer back to the
already retired timeline. We have to delay both recycling available
cachelines and unpinning the old HWSP until the next idle point.

An easy option would be to simply keep all used HWSP until the system as
a whole was idle, i.e. we could release them all at once on parking.
However, on a busy system, we may never see a global idle point,
essentially meaning the resource will be leaked until we are forced to
do a GC pass. We already employ a fine-grained idle detection mechanism
for vma, which we can reuse here so that each cacheline can be freed
immediately after the last request using it is retired.
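
A rough sketch of the lifetime rule this gives us (illustrative only; the
helpers are the ones added below, the flow is simplified):

	/* Each HWSP cacheline carries its own i915_active tracker. */
	cl = cacheline_alloc(hwsp, cacheline);

	/* Any request that samples this seqno keeps the cacheline busy... */
	i915_active_ref(&cl->active, tl->fence_context, rq);

	/*
	 * ...and only once every such request has retired does
	 * __cacheline_retire() run, unpinning the vma and, if the timeline
	 * has already moved on (CACHELINE_FREE), handing the cacheline back
	 * to the HWSP free bitmap via __idle_hwsp_free().
	 */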

v3: Keep track of the activity of each cacheline.
v4: cacheline_free() on canceling the seqno tracking
v5: Finally with a testcase to exercise wraparound
v6: Pack cacheline into empty bits of page-aligned vaddr

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> #v5
---
 drivers/gpu/drm/i915/i915_request.c           |  31 +-
 drivers/gpu/drm/i915/i915_request.h           |  11 +
 drivers/gpu/drm/i915/i915_timeline.c          | 290 ++++++++++++++++--
 drivers/gpu/drm/i915/i915_timeline.h          |  11 +-
 .../gpu/drm/i915/selftests/i915_timeline.c    | 113 +++++++
 5 files changed, 417 insertions(+), 39 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 719d1a5ab082..d354967d6ae8 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -325,11 +325,6 @@ void i915_request_retire_upto(struct i915_request *rq)
 	} while (tmp != rq);
 }
 
-static u32 timeline_get_seqno(struct i915_timeline *tl)
-{
-	return tl->seqno += 1 + tl->has_initial_breadcrumb;
-}
-
 static void move_to_timeline(struct i915_request *request,
 			     struct i915_timeline *timeline)
 {
@@ -532,8 +527,10 @@ struct i915_request *
 i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 {
 	struct drm_i915_private *i915 = engine->i915;
-	struct i915_request *rq;
 	struct intel_context *ce;
+	struct i915_timeline *tl;
+	struct i915_request *rq;
+	u32 seqno;
 	int ret;
 
 	lockdep_assert_held(&i915->drm.struct_mutex);
@@ -610,24 +607,27 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 		}
 	}
 
-	rq->rcustate = get_state_synchronize_rcu();
-
 	INIT_LIST_HEAD(&rq->active_list);
+
+	tl = ce->ring->timeline;
+	ret = i915_timeline_get_seqno(tl, rq, &seqno);
+	if (ret)
+		goto err_free;
+
 	rq->i915 = i915;
 	rq->engine = engine;
 	rq->gem_context = ctx;
 	rq->hw_context = ce;
 	rq->ring = ce->ring;
-	rq->timeline = ce->ring->timeline;
+	rq->timeline = tl;
 	GEM_BUG_ON(rq->timeline == &engine->timeline);
-	rq->hwsp_seqno = rq->timeline->hwsp_seqno;
+	rq->hwsp_seqno = tl->hwsp_seqno;
+	rq->hwsp_cacheline = tl->hwsp_cacheline;
+	rq->rcustate = get_state_synchronize_rcu(); /* acts as smp_mb() */
 
 	spin_lock_init(&rq->lock);
-	dma_fence_init(&rq->fence,
-		       &i915_fence_ops,
-		       &rq->lock,
-		       rq->timeline->fence_context,
-		       timeline_get_seqno(rq->timeline));
+	dma_fence_init(&rq->fence, &i915_fence_ops, &rq->lock,
+		       tl->fence_context, seqno);
 
 	/* We bump the ref for the fence chain */
 	i915_sw_fence_init(&i915_request_get(rq)->submit, submit_notify);
@@ -687,6 +687,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	GEM_BUG_ON(!list_empty(&rq->sched.signalers_list));
 	GEM_BUG_ON(!list_empty(&rq->sched.waiters_list));
 
+err_free:
 	kmem_cache_free(global.slab_requests, rq);
 err_unreserve:
 	mutex_unlock(&ce->ring->timeline->mutex);
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index be3ded6bcf56..ea1e6f0ade53 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -38,6 +38,7 @@ struct drm_file;
 struct drm_i915_gem_object;
 struct i915_request;
 struct i915_timeline;
+struct i915_timeline_cacheline;
 
 struct i915_capture_list {
 	struct i915_capture_list *next;
@@ -148,6 +149,16 @@ struct i915_request {
 	 */
 	const u32 *hwsp_seqno;
 
+	/*
+	 * If we need to access the timeline's seqno for this request in
+	 * another request, we need to keep a read reference to this associated
+	 * cacheline, so that we do not free and recycle it before the foreign
+	 * observers have completed. Hence, we keep a pointer to the cacheline
+	 * inside the timeline's HWSP vma, but it is only valid while this
+	 * request has not completed and guarded by the timeline mutex.
+	 */
+	struct i915_timeline_cacheline *hwsp_cacheline;
+
 	/** Position in the ring of the start of the request */
 	u32 head;
 
diff --git a/drivers/gpu/drm/i915/i915_timeline.c b/drivers/gpu/drm/i915/i915_timeline.c
index 87a80558da28..f976b5a13a47 100644
--- a/drivers/gpu/drm/i915/i915_timeline.c
+++ b/drivers/gpu/drm/i915/i915_timeline.c
@@ -6,19 +6,29 @@
 
 #include "i915_drv.h"
 
-#include "i915_timeline.h"
+#include "i915_active.h"
 #include "i915_syncmap.h"
+#include "i915_timeline.h"
 
 struct i915_timeline_hwsp {
-	struct i915_vma *vma;
+	struct i915_gt_timelines *gt;
 	struct list_head free_link;
+	struct i915_vma *vma;
 	u64 free_bitmap;
 };
 
-static inline struct i915_timeline_hwsp *
-i915_timeline_hwsp(const struct i915_timeline *tl)
+struct i915_timeline_cacheline {
+	struct i915_active active;
+	struct i915_timeline_hwsp *hwsp;
+	unsigned long vaddr;
+#define CACHELINE_MASK GENMASK(5, 0)
+#define CACHELINE_FREE BIT(6)
+};
+
+static inline struct drm_i915_private *
+hwsp_to_i915(struct i915_timeline_hwsp *hwsp)
 {
-	return tl->hwsp_ggtt->private;
+	return container_of(hwsp->gt, struct drm_i915_private, gt.timelines);
 }
 
 static struct i915_vma *__hwsp_alloc(struct drm_i915_private *i915)
@@ -71,6 +81,7 @@ hwsp_alloc(struct i915_timeline *timeline, unsigned int *cacheline)
 		vma->private = hwsp;
 		hwsp->vma = vma;
 		hwsp->free_bitmap = ~0ull;
+		hwsp->gt = gt;
 
 		spin_lock(&gt->hwsp_lock);
 		list_add(&hwsp->free_link, &gt->hwsp_free_list);
@@ -88,14 +99,9 @@ hwsp_alloc(struct i915_timeline *timeline, unsigned int *cacheline)
 	return hwsp->vma;
 }
 
-static void hwsp_free(struct i915_timeline *timeline)
+static void __idle_hwsp_free(struct i915_timeline_hwsp *hwsp, int cacheline)
 {
-	struct i915_gt_timelines *gt = &timeline->i915->gt.timelines;
-	struct i915_timeline_hwsp *hwsp;
-
-	hwsp = i915_timeline_hwsp(timeline);
-	if (!hwsp) /* leave global HWSP alone! */
-		return;
+	struct i915_gt_timelines *gt = hwsp->gt;
 
 	spin_lock(&gt->hwsp_lock);
 
@@ -103,7 +109,8 @@ static void hwsp_free(struct i915_timeline *timeline)
 	if (!hwsp->free_bitmap)
 		list_add_tail(&hwsp->free_link, &gt->hwsp_free_list);
 
-	hwsp->free_bitmap |= BIT_ULL(timeline->hwsp_offset / CACHELINE_BYTES);
+	GEM_BUG_ON(cacheline >= BITS_PER_TYPE(hwsp->free_bitmap));
+	hwsp->free_bitmap |= BIT_ULL(cacheline);
 
 	/* And if no one is left using it, give the page back to the system */
 	if (hwsp->free_bitmap == ~0ull) {
@@ -115,6 +122,77 @@ static void hwsp_free(struct i915_timeline *timeline)
 	spin_unlock(&gt->hwsp_lock);
 }
 
+static void __idle_cacheline_free(struct i915_timeline_cacheline *cl)
+{
+	GEM_BUG_ON(!i915_active_is_idle(&cl->active));
+
+	i915_gem_object_unpin_map(cl->hwsp->vma->obj);
+	i915_vma_put(cl->hwsp->vma);
+	__idle_hwsp_free(cl->hwsp, cl->vaddr & CACHELINE_MASK);
+
+	i915_active_fini(&cl->active);
+	kfree(cl);
+}
+
+static void __cacheline_retire(struct i915_active *active)
+{
+	struct i915_timeline_cacheline *cl =
+		container_of(active, typeof(*cl), active);
+
+	i915_vma_unpin(cl->hwsp->vma);
+	if (cl->vaddr & CACHELINE_FREE)
+		__idle_cacheline_free(cl);
+}
+
+static struct i915_timeline_cacheline *
+cacheline_alloc(struct i915_timeline_hwsp *hwsp, unsigned int cacheline)
+{
+	struct i915_timeline_cacheline *cl;
+	void *vaddr;
+
+	GEM_BUG_ON(cacheline > CACHELINE_MASK);
+
+	cl = kmalloc(sizeof(*cl), GFP_KERNEL);
+	if (!cl)
+		return ERR_PTR(-ENOMEM);
+
+	vaddr = i915_gem_object_pin_map(hwsp->vma->obj, I915_MAP_WB);
+	if (IS_ERR(vaddr)) {
+		kfree(cl);
+		return ERR_CAST(vaddr);
+	}
+
+	i915_vma_get(hwsp->vma);
+	cl->hwsp = hwsp;
+	cl->vaddr = (unsigned long)vaddr;
+	cl->vaddr |= cacheline;
+
+	i915_active_init(hwsp_to_i915(hwsp), &cl->active, __cacheline_retire);
+
+	return cl;
+}
+
+static void cacheline_acquire(struct i915_timeline_cacheline *cl)
+{
+	if (cl && i915_active_acquire(&cl->active))
+		__i915_vma_pin(cl->hwsp->vma);
+}
+
+static void cacheline_release(struct i915_timeline_cacheline *cl)
+{
+	if (cl)
+		i915_active_release(&cl->active);
+}
+
+static void cacheline_free(struct i915_timeline_cacheline *cl)
+{
+	GEM_BUG_ON(cl->vaddr & CACHELINE_FREE);
+	cl->vaddr |= CACHELINE_FREE;
+
+	if (i915_active_is_idle(&cl->active))
+		__idle_cacheline_free(cl);
+}
+
 int i915_timeline_init(struct drm_i915_private *i915,
 		       struct i915_timeline *timeline,
 		       const char *name,
@@ -136,29 +214,40 @@ int i915_timeline_init(struct drm_i915_private *i915,
 	timeline->name = name;
 	timeline->pin_count = 0;
 	timeline->has_initial_breadcrumb = !hwsp;
+	timeline->hwsp_cacheline = NULL;
 
-	timeline->hwsp_offset = I915_GEM_HWS_SEQNO_ADDR;
 	if (!hwsp) {
+		struct i915_timeline_cacheline *cl;
 		unsigned int cacheline;
 
 		hwsp = hwsp_alloc(timeline, &cacheline);
 		if (IS_ERR(hwsp))
 			return PTR_ERR(hwsp);
 
+		cl = cacheline_alloc(hwsp->private, cacheline);
+		if (IS_ERR(cl)) {
+			__idle_hwsp_free(hwsp->private, cacheline);
+			return PTR_ERR(cl);
+		}
+
+		timeline->hwsp_cacheline = cl;
 		timeline->hwsp_offset = cacheline * CACHELINE_BYTES;
-	}
-	timeline->hwsp_ggtt = i915_vma_get(hwsp);
 
-	vaddr = i915_gem_object_pin_map(hwsp->obj, I915_MAP_WB);
-	if (IS_ERR(vaddr)) {
-		hwsp_free(timeline);
-		i915_vma_put(hwsp);
-		return PTR_ERR(vaddr);
+		vaddr = (void *)(cl->vaddr & PAGE_MASK);
+	} else {
+		timeline->hwsp_offset = I915_GEM_HWS_SEQNO_ADDR;
+
+		vaddr = i915_gem_object_pin_map(hwsp->obj, I915_MAP_WB);
+		if (IS_ERR(vaddr))
+			return PTR_ERR(vaddr);
 	}
 
 	timeline->hwsp_seqno =
 		memset(vaddr + timeline->hwsp_offset, 0, CACHELINE_BYTES);
 
+	timeline->hwsp_ggtt = i915_vma_get(hwsp);
+	GEM_BUG_ON(timeline->hwsp_offset >= hwsp->size);
+
 	timeline->fence_context = dma_fence_context_alloc(1);
 
 	spin_lock_init(&timeline->lock);
@@ -240,9 +329,12 @@ void i915_timeline_fini(struct i915_timeline *timeline)
 	GEM_BUG_ON(i915_active_request_isset(&timeline->barrier));
 
 	i915_syncmap_free(&timeline->sync);
-	hwsp_free(timeline);
 
-	i915_gem_object_unpin_map(timeline->hwsp_ggtt->obj);
+	if (timeline->hwsp_cacheline)
+		cacheline_free(timeline->hwsp_cacheline);
+	else
+		i915_gem_object_unpin_map(timeline->hwsp_ggtt->obj);
+
 	i915_vma_put(timeline->hwsp_ggtt);
 }
 
@@ -285,6 +377,7 @@ int i915_timeline_pin(struct i915_timeline *tl)
 		i915_ggtt_offset(tl->hwsp_ggtt) +
 		offset_in_page(tl->hwsp_offset);
 
+	cacheline_acquire(tl->hwsp_cacheline);
 	timeline_add_to_active(tl);
 
 	return 0;
@@ -294,6 +387,156 @@ int i915_timeline_pin(struct i915_timeline *tl)
 	return err;
 }
 
+static u32 timeline_advance(struct i915_timeline *tl)
+{
+	GEM_BUG_ON(!tl->pin_count);
+	GEM_BUG_ON(tl->seqno & tl->has_initial_breadcrumb);
+
+	return tl->seqno += 1 + tl->has_initial_breadcrumb;
+}
+
+static void timeline_rollback(struct i915_timeline *tl)
+{
+	tl->seqno -= 1 + tl->has_initial_breadcrumb;
+}
+
+static noinline int
+__i915_timeline_get_seqno(struct i915_timeline *tl,
+			  struct i915_request *rq,
+			  u32 *seqno)
+{
+	struct i915_timeline_cacheline *cl;
+	unsigned int cacheline;
+	struct i915_vma *vma;
+	void *vaddr;
+	int err;
+
+	/*
+	 * If there is an outstanding GPU reference to this cacheline,
+	 * such as it being sampled by a HW semaphore on another timeline,
+	 * we cannot wraparound our seqno value (the HW semaphore does
+	 * a strict greater-than-or-equals compare, not i915_seqno_passed).
+	 * So if the cacheline is still busy, we must detach ourselves
+	 * from it and leave it inflight alongside its users.
+	 *
+	 * However, if nobody is watching and we can guarantee that nobody
+	 * will, we could simply reuse the same cacheline.
+	 *
+	 * if (i915_active_request_is_signaled(&tl->last_request) &&
+	 *     i915_active_is_signaled(&tl->hwsp_cacheline->active))
+	 *	return 0;
+	 *
+	 * That seems unlikely for a busy timeline that needed to wrap in
+	 * the first place, so just replace the cacheline.
+	 */
+
+	vma = hwsp_alloc(tl, &cacheline);
+	if (IS_ERR(vma)) {
+		err = PTR_ERR(vma);
+		goto err_rollback;
+	}
+
+	err = i915_vma_pin(vma, 0, 0, PIN_GLOBAL | PIN_HIGH);
+	if (err) {
+		__idle_hwsp_free(vma->private, cacheline);
+		goto err_rollback;
+	}
+
+	cl = cacheline_alloc(vma->private, cacheline);
+	if (IS_ERR(cl)) {
+		err = PTR_ERR(cl);
+		__idle_hwsp_free(vma->private, cacheline);
+		goto err_unpin;
+	}
+	GEM_BUG_ON(cl->hwsp->vma != vma);
+
+	/*
+	 * Attach the old cacheline to the current request, so that we only
+	 * free it after the current request is retired, which ensures that
+	 * all writes into the cacheline from previous requests are complete.
+	 */
+	err = i915_active_ref(&tl->hwsp_cacheline->active,
+			      tl->fence_context, rq);
+	if (err)
+		goto err_cacheline;
+
+	cacheline_release(tl->hwsp_cacheline); /* ownership now xfered to rq */
+	cacheline_free(tl->hwsp_cacheline);
+
+	i915_vma_unpin(tl->hwsp_ggtt); /* binding kept alive by old cacheline */
+	i915_vma_put(tl->hwsp_ggtt);
+
+	tl->hwsp_ggtt = i915_vma_get(vma);
+
+	vaddr = (void *)(cl->vaddr & PAGE_MASK);
+	tl->hwsp_offset = cacheline * CACHELINE_BYTES;
+	tl->hwsp_seqno =
+		memset(vaddr + tl->hwsp_offset, 0, CACHELINE_BYTES);
+
+	tl->hwsp_offset += i915_ggtt_offset(vma);
+
+	cacheline_acquire(cl);
+	tl->hwsp_cacheline = cl;
+
+	*seqno = timeline_advance(tl);
+	GEM_BUG_ON(i915_seqno_passed(*tl->hwsp_seqno, *seqno));
+	return 0;
+
+err_cacheline:
+	cacheline_free(cl);
+err_unpin:
+	i915_vma_unpin(vma);
+err_rollback:
+	timeline_rollback(tl);
+	return err;
+}
+
+int i915_timeline_get_seqno(struct i915_timeline *tl,
+			    struct i915_request *rq,
+			    u32 *seqno)
+{
+	*seqno = timeline_advance(tl);
+
+	/* Replace the HWSP on wraparound for HW semaphores */
+	if (unlikely(!*seqno && tl->hwsp_cacheline))
+		return __i915_timeline_get_seqno(tl, rq, seqno);
+
+	return 0;
+}
+
+static int cacheline_ref(struct i915_timeline_cacheline *cl,
+			 struct i915_request *rq)
+{
+	return i915_active_ref(&cl->active, rq->fence.context, rq);
+}
+
+int i915_timeline_read_hwsp(struct i915_request *from,
+			    struct i915_request *to,
+			    u32 *hwsp)
+{
+	struct i915_timeline_cacheline *cl = from->hwsp_cacheline;
+	struct i915_timeline *tl = from->timeline;
+	int err;
+
+	GEM_BUG_ON(to->timeline == tl);
+
+	mutex_lock_nested(&tl->mutex, SINGLE_DEPTH_NESTING);
+	err = i915_request_completed(from);
+	if (!err)
+		err = cacheline_ref(cl, to);
+	if (!err) {
+		if (likely(cl == tl->hwsp_cacheline)) {
+			*hwsp = tl->hwsp_offset;
+		} else { /* across a seqno wrap, recover the original offset */
+			*hwsp = (cl->vaddr & CACHELINE_MASK) * CACHELINE_BYTES;
+			*hwsp += i915_ggtt_offset(cl->hwsp->vma);
+		}
+	}
+	mutex_unlock(&tl->mutex);
+
+	return err;
+}
+
 void i915_timeline_unpin(struct i915_timeline *tl)
 {
 	GEM_BUG_ON(!tl->pin_count);
@@ -301,6 +544,7 @@ void i915_timeline_unpin(struct i915_timeline *tl)
 		return;
 
 	timeline_remove_from_active(tl);
+	cacheline_release(tl->hwsp_cacheline);
 
 	/*
 	 * Since this timeline is idle, all barriers upon which we were waiting
diff --git a/drivers/gpu/drm/i915/i915_timeline.h b/drivers/gpu/drm/i915/i915_timeline.h
index 36c3849f7108..60b1dfad93ed 100644
--- a/drivers/gpu/drm/i915/i915_timeline.h
+++ b/drivers/gpu/drm/i915/i915_timeline.h
@@ -34,7 +34,7 @@
 #include "i915_utils.h"
 
 struct i915_vma;
-struct i915_timeline_hwsp;
+struct i915_timeline_cacheline;
 
 struct i915_timeline {
 	u64 fence_context;
@@ -51,6 +51,8 @@ struct i915_timeline {
 	struct i915_vma *hwsp_ggtt;
 	u32 hwsp_offset;
 
+	struct i915_timeline_cacheline *hwsp_cacheline;
+
 	bool has_initial_breadcrumb;
 
 	/**
@@ -162,8 +164,15 @@ static inline bool i915_timeline_sync_is_later(struct i915_timeline *tl,
 }
 
 int i915_timeline_pin(struct i915_timeline *tl);
+int i915_timeline_get_seqno(struct i915_timeline *tl,
+			    struct i915_request *rq,
+			    u32 *seqno);
 void i915_timeline_unpin(struct i915_timeline *tl);
 
+int i915_timeline_read_hwsp(struct i915_request *from,
+			    struct i915_request *until,
+			    u32 *hwsp_offset);
+
 void i915_timelines_init(struct drm_i915_private *i915);
 void i915_timelines_park(struct drm_i915_private *i915);
 void i915_timelines_fini(struct drm_i915_private *i915);
diff --git a/drivers/gpu/drm/i915/selftests/i915_timeline.c b/drivers/gpu/drm/i915/selftests/i915_timeline.c
index 12ea69b1a1e5..844701759ffc 100644
--- a/drivers/gpu/drm/i915/selftests/i915_timeline.c
+++ b/drivers/gpu/drm/i915/selftests/i915_timeline.c
@@ -641,6 +641,118 @@ static int live_hwsp_alternate(void *arg)
 #undef NUM_TIMELINES
 }
 
+static int live_hwsp_wrap(void *arg)
+{
+	struct drm_i915_private *i915 = arg;
+	struct intel_engine_cs *engine;
+	struct i915_timeline *tl;
+	enum intel_engine_id id;
+	intel_wakeref_t wakeref;
+	int err = 0;
+
+	/*
+	 * Across a seqno wrap, we need to keep the old cacheline alive for
+	 * foreign GPU references.
+	 */
+
+	mutex_lock(&i915->drm.struct_mutex);
+	wakeref = intel_runtime_pm_get(i915);
+
+	tl = i915_timeline_create(i915, __func__, NULL);
+	if (IS_ERR(tl)) {
+		err = PTR_ERR(tl);
+		goto out_rpm;
+	}
+	if (!tl->has_initial_breadcrumb || !tl->hwsp_cacheline)
+		goto out_free;
+
+	err = i915_timeline_pin(tl);
+	if (err)
+		goto out_free;
+
+	for_each_engine(engine, i915, id) {
+		const u32 *hwsp_seqno[2];
+		struct i915_request *rq;
+		u32 seqno[2];
+
+		if (!intel_engine_can_store_dword(engine))
+			continue;
+
+		rq = i915_request_alloc(engine, i915->kernel_context);
+		if (IS_ERR(rq)) {
+			err = PTR_ERR(rq);
+			goto out;
+		}
+
+		tl->seqno = -4u;
+
+		err = i915_timeline_get_seqno(tl, rq, &seqno[0]);
+		if (err) {
+			i915_request_add(rq);
+			goto out;
+		}
+		pr_debug("seqno[0]:%08x, hwsp_offset:%08x\n",
+			 seqno[0], tl->hwsp_offset);
+
+		err = emit_ggtt_store_dw(rq, tl->hwsp_offset, seqno[0]);
+		if (err) {
+			i915_request_add(rq);
+			goto out;
+		}
+		hwsp_seqno[0] = tl->hwsp_seqno;
+
+		err = i915_timeline_get_seqno(tl, rq, &seqno[1]);
+		if (err) {
+			i915_request_add(rq);
+			goto out;
+		}
+		pr_debug("seqno[1]:%08x, hwsp_offset:%08x\n",
+			 seqno[1], tl->hwsp_offset);
+
+		err = emit_ggtt_store_dw(rq, tl->hwsp_offset, seqno[1]);
+		if (err) {
+			i915_request_add(rq);
+			goto out;
+		}
+		hwsp_seqno[1] = tl->hwsp_seqno;
+
+		/* With wrap should come a new hwsp */
+		GEM_BUG_ON(seqno[1] >= seqno[0]);
+		GEM_BUG_ON(hwsp_seqno[0] == hwsp_seqno[1]);
+
+		i915_request_add(rq);
+
+		if (i915_request_wait(rq, I915_WAIT_LOCKED, HZ / 5) < 0) {
+			pr_err("Wait for timeline writes timed out!\n");
+			err = -EIO;
+			goto out;
+		}
+
+		if (*hwsp_seqno[0] != seqno[0] || *hwsp_seqno[1] != seqno[1]) {
+			pr_err("Bad timeline values: found (%x, %x), expected (%x, %x)\n",
+			       *hwsp_seqno[0], *hwsp_seqno[1],
+			       seqno[0], seqno[1]);
+			err = -EINVAL;
+			goto out;
+		}
+
+		i915_retire_requests(i915); /* recycle HWSP */
+	}
+
+out:
+	if (igt_flush_test(i915, I915_WAIT_LOCKED))
+		err = -EIO;
+
+	i915_timeline_unpin(tl);
+out_free:
+	i915_timeline_put(tl);
+out_rpm:
+	intel_runtime_pm_put(i915, wakeref);
+	mutex_unlock(&i915->drm.struct_mutex);
+
+	return err;
+}
+
 static int live_hwsp_recycle(void *arg)
 {
 	struct drm_i915_private *i915 = arg;
@@ -723,6 +835,7 @@ int i915_timeline_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(live_hwsp_recycle),
 		SUBTEST(live_hwsp_engine),
 		SUBTEST(live_hwsp_alternate),
+		SUBTEST(live_hwsp_wrap),
 	};
 
 	return i915_subtests(tests, i915);
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 07/11] drm/i915: Compute the global scheduler caps
  2019-02-26 10:23 [PATCH 01/11] drm/i915: Skip scanning for signalers if we are already inflight Chris Wilson
                   ` (4 preceding siblings ...)
  2019-02-26 10:23 ` [PATCH 06/11] drm/i915: Keep timeline HWSP allocated until idle across the system Chris Wilson
@ 2019-02-26 10:24 ` Chris Wilson
  2019-02-28  7:45   ` Tvrtko Ursulin
  2019-02-26 10:24 ` [PATCH 08/11] drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+ Chris Wilson
                   ` (13 subsequent siblings)
  19 siblings, 1 reply; 43+ messages in thread
From: Chris Wilson @ 2019-02-26 10:24 UTC (permalink / raw)
  To: intel-gfx

Do a pass over all the engines upon starting to determine the global
scheduler capability flags (those that are agreed upon by all).
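
The rule is simply the intersection (a sketch of what
intel_engines_set_scheduler_caps() below computes, not extra code):

	/*
	 * enabled  |= caps this engine provides
	 * disabled |= caps this engine lacks
	 *
	 * caps.scheduler = enabled & ~disabled;
	 *
	 * e.g. if even one engine lacks I915_ENGINE_HAS_PREEMPTION, userspace
	 * no longer sees I915_SCHEDULER_CAP_PREEMPTION at all.
	 */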

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c         |  2 ++
 drivers/gpu/drm/i915/intel_engine_cs.c  | 39 +++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.c        |  6 ----
 drivers/gpu/drm/i915/intel_ringbuffer.h |  2 ++
 4 files changed, 43 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 713ed6fbdcc8..f6fe10fce0ec 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4700,6 +4700,8 @@ static int __i915_gem_restart_engines(void *data)
 		}
 	}
 
+	intel_engines_set_scheduler_caps(i915);
+
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index b7b626195eda..ef49b1b0537b 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -608,6 +608,45 @@ int intel_engine_setup_common(struct intel_engine_cs *engine)
 	return err;
 }
 
+void intel_engines_set_scheduler_caps(struct drm_i915_private *i915)
+{
+	static const struct {
+		u8 engine;
+		u8 sched;
+	} map[] = {
+#define MAP(x, y) { ilog2(I915_ENGINE_HAS_##x), ilog2(I915_SCHEDULER_CAP_##y) }
+		MAP(PREEMPTION, PREEMPTION),
+#undef MAP
+	};
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+	u32 enabled, disabled;
+
+	enabled = 0;
+	disabled = 0;
+	for_each_engine(engine, i915, id) { /* all engines must agree! */
+		int i;
+
+		if (engine->schedule)
+			enabled |= (I915_SCHEDULER_CAP_ENABLED |
+				    I915_SCHEDULER_CAP_PRIORITY);
+		else
+			disabled |= (I915_SCHEDULER_CAP_ENABLED |
+				     I915_SCHEDULER_CAP_PRIORITY);
+
+		for (i = 0; i < ARRAY_SIZE(map); i++) {
+			if (engine->flags & BIT(map[i].engine))
+				enabled |= BIT(map[i].sched);
+			else
+				disabled |= BIT(map[i].sched);
+		}
+	}
+
+	i915->caps.scheduler = enabled & ~disabled;
+	if (!(i915->caps.scheduler & I915_SCHEDULER_CAP_ENABLED))
+		i915->caps.scheduler = 0;
+}
+
 static void __intel_context_unpin(struct i915_gem_context *ctx,
 				  struct intel_engine_cs *engine)
 {
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 29b2a2f34edb..f57cfe2fc078 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2327,12 +2327,6 @@ void intel_execlists_set_default_submission(struct intel_engine_cs *engine)
 	engine->flags |= I915_ENGINE_SUPPORTS_STATS;
 	if (engine->i915->preempt_context)
 		engine->flags |= I915_ENGINE_HAS_PREEMPTION;
-
-	engine->i915->caps.scheduler =
-		I915_SCHEDULER_CAP_ENABLED |
-		I915_SCHEDULER_CAP_PRIORITY;
-	if (intel_engine_has_preemption(engine))
-		engine->i915->caps.scheduler |= I915_SCHEDULER_CAP_PREEMPTION;
 }
 
 static void
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 5284f243931a..b8ec7e40a59b 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -576,6 +576,8 @@ intel_engine_has_preemption(const struct intel_engine_cs *engine)
 	return engine->flags & I915_ENGINE_HAS_PREEMPTION;
 }
 
+void intel_engines_set_scheduler_caps(struct drm_i915_private *i915);
+
 static inline bool __execlists_need_preempt(int prio, int last)
 {
 	/*
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 08/11] drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+
  2019-02-26 10:23 [PATCH 01/11] drm/i915: Skip scanning for signalers if we are already inflight Chris Wilson
                   ` (5 preceding siblings ...)
  2019-02-26 10:24 ` [PATCH 07/11] drm/i915: Compute the global scheduler caps Chris Wilson
@ 2019-02-26 10:24 ` Chris Wilson
  2019-02-28 10:49   ` Tvrtko Ursulin
  2019-02-26 10:24 ` [PATCH 09/11] drm/i915: Prioritise non-busywait semaphore workloads Chris Wilson
                   ` (12 subsequent siblings)
  19 siblings, 1 reply; 43+ messages in thread
From: Chris Wilson @ 2019-02-26 10:24 UTC (permalink / raw)
  To: intel-gfx

Having introduced per-context seqnos, we now have a means to identify
progress across the system without fear of the rollback that befell the
global_seqno. That is, we can program a MI_SEMAPHORE_WAIT operation in
advance of submission, safe in the knowledge that our target seqno and
address are stable.

However, since we are telling the GPU to busy-spin on the target address
until it matches the signaling seqno, we only want to do so when we are
sure that the busy-spin will complete quickly. To achieve this, we only
submit the request to HW once the signaler is itself executing (modulo
preemption causing us to wait longer), and we only do so for default and
higher priority requests (so that idle priority tasks never themselves
hog the GPU waiting for others).
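
The wait itself is only four dwords in the waiter's ring (a sketch of what
emit_semaphore_wait() below programs):

	*cs++ = MI_SEMAPHORE_WAIT |
		MI_SEMAPHORE_GLOBAL_GTT |	/* hwsp_offset is a GGTT address */
		MI_SEMAPHORE_POLL |		/* keep polling rather than fail */
		MI_SEMAPHORE_SAD_GTE_SDD;	/* proceed when *hwsp >= seqno (unsigned) */
	*cs++ = from->fence.seqno;		/* semaphore data */
	*cs++ = hwsp_offset;			/* signaler's HWSP cacheline */
	*cs++ = 0;				/* upper 32 bits of the address */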

As might be reasonably expected, HW semaphores excel in inter-engine
synchronisation microbenchmarks (where the reduced latency / increased
throughput more than offsets the power cost of spinning on a second ring)
and show a significant improvement for single clients that utilize multiple
engines (typically media players), without regressing multiple clients
that can saturate the system.

v3: Drop the older NEQ branch, now we pin the signaler's HWSP anyway.
v4: Tell the world and include it as part of scheduler caps.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.c           |   2 +-
 drivers/gpu/drm/i915/i915_request.c       | 138 +++++++++++++++++++++-
 drivers/gpu/drm/i915/i915_request.h       |   1 +
 drivers/gpu/drm/i915/i915_sw_fence.c      |   4 +-
 drivers/gpu/drm/i915/i915_sw_fence.h      |   3 +
 drivers/gpu/drm/i915/intel_engine_cs.c    |   1 +
 drivers/gpu/drm/i915/intel_gpu_commands.h |   5 +
 drivers/gpu/drm/i915/intel_lrc.c          |   1 +
 drivers/gpu/drm/i915/intel_ringbuffer.h   |   7 ++
 include/uapi/drm/i915_drm.h               |   1 +
 10 files changed, 158 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index c6354f6cdbdb..c08abdef5eb6 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -351,7 +351,7 @@ static int i915_getparam_ioctl(struct drm_device *dev, void *data,
 		value = min_t(int, INTEL_PPGTT(dev_priv), I915_GEM_PPGTT_FULL);
 		break;
 	case I915_PARAM_HAS_SEMAPHORES:
-		value = 0;
+		value = !!(dev_priv->caps.scheduler & I915_SCHEDULER_CAP_SEMAPHORES);
 		break;
 	case I915_PARAM_HAS_SECURE_BATCHES:
 		value = capable(CAP_SYS_ADMIN);
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index d354967d6ae8..59e30b8c4ee9 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -22,8 +22,9 @@
  *
  */
 
-#include <linux/prefetch.h>
 #include <linux/dma-fence-array.h>
+#include <linux/irq_work.h>
+#include <linux/prefetch.h>
 #include <linux/sched.h>
 #include <linux/sched/clock.h>
 #include <linux/sched/signal.h>
@@ -32,9 +33,16 @@
 #include "i915_active.h"
 #include "i915_reset.h"
 
+struct execute_cb {
+	struct list_head link;
+	struct irq_work work;
+	struct i915_sw_fence *fence;
+};
+
 static struct i915_global_request {
 	struct kmem_cache *slab_requests;
 	struct kmem_cache *slab_dependencies;
+	struct kmem_cache *slab_execute_cbs;
 } global;
 
 static const char *i915_fence_get_driver_name(struct dma_fence *fence)
@@ -325,6 +333,69 @@ void i915_request_retire_upto(struct i915_request *rq)
 	} while (tmp != rq);
 }
 
+static void irq_execute_cb(struct irq_work *wrk)
+{
+	struct execute_cb *cb = container_of(wrk, typeof(*cb), work);
+
+	i915_sw_fence_complete(cb->fence);
+	kmem_cache_free(global.slab_execute_cbs, cb);
+}
+
+static void __notify_execute_cb(struct i915_request *rq)
+{
+	struct execute_cb *cb;
+
+	lockdep_assert_held(&rq->lock);
+
+	if (list_empty(&rq->execute_cb))
+		return;
+
+	list_for_each_entry(cb, &rq->execute_cb, link)
+		irq_work_queue(&cb->work);
+
+	/*
+	 * XXX Rollback on __i915_request_unsubmit()
+	 *
+	 * In the future, perhaps when we have an active time-slicing scheduler,
+	 * it will be interesting to unsubmit parallel execution and remove
+	 * busywaits from the GPU until their master is restarted. This is
+	 * quite hairy, we have to carefully rollback the fence and do a
+	 * preempt-to-idle cycle on the target engine, all the while the
+	 * master execute_cb may refire.
+	 */
+	INIT_LIST_HEAD(&rq->execute_cb);
+}
+
+static int
+i915_request_await_execution(struct i915_request *rq,
+			     struct i915_request *signal,
+			     gfp_t gfp)
+{
+	struct execute_cb *cb;
+
+	if (i915_request_is_active(signal))
+		return 0;
+
+	cb = kmem_cache_alloc(global.slab_execute_cbs, gfp);
+	if (!cb)
+		return -ENOMEM;
+
+	cb->fence = &rq->submit;
+	i915_sw_fence_await(cb->fence);
+	init_irq_work(&cb->work, irq_execute_cb);
+
+	spin_lock_irq(&signal->lock);
+	if (i915_request_is_active(signal)) {
+		i915_sw_fence_complete(cb->fence);
+		kmem_cache_free(global.slab_execute_cbs, cb);
+	} else {
+		list_add_tail(&cb->link, &signal->execute_cb);
+	}
+	spin_unlock_irq(&signal->lock);
+
+	return 0;
+}
+
 static void move_to_timeline(struct i915_request *request,
 			     struct i915_timeline *timeline)
 {
@@ -361,6 +432,8 @@ void __i915_request_submit(struct i915_request *request)
 	    !i915_request_enable_breadcrumb(request))
 		intel_engine_queue_breadcrumbs(engine);
 
+	__notify_execute_cb(request);
+
 	spin_unlock(&request->lock);
 
 	engine->emit_fini_breadcrumb(request,
@@ -608,6 +681,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	}
 
 	INIT_LIST_HEAD(&rq->active_list);
+	INIT_LIST_HEAD(&rq->execute_cb);
 
 	tl = ce->ring->timeline;
 	ret = i915_timeline_get_seqno(tl, rq, &seqno);
@@ -696,6 +770,52 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	return ERR_PTR(ret);
 }
 
+static int
+emit_semaphore_wait(struct i915_request *to,
+		    struct i915_request *from,
+		    gfp_t gfp)
+{
+	u32 hwsp_offset;
+	u32 *cs;
+	int err;
+
+	GEM_BUG_ON(!from->timeline->has_initial_breadcrumb);
+	GEM_BUG_ON(INTEL_GEN(to->i915) < 8);
+
+	/* We need to pin the signaler's HWSP until we are finished reading. */
+	err = i915_timeline_read_hwsp(from, to, &hwsp_offset);
+	if (err)
+		return err;
+
+	/* Only submit our spinner after the signaler is running! */
+	err = i915_request_await_execution(to, from, gfp);
+	if (err)
+		return err;
+
+	cs = intel_ring_begin(to, 4);
+	if (IS_ERR(cs))
+		return PTR_ERR(cs);
+
+	/*
+	 * Using greater-than-or-equal here means we have to worry
+	 * about seqno wraparound. To side step that issue, we swap
+	 * the timeline HWSP upon wrapping, so that everyone listening
+	 * for the old (pre-wrap) values do not see the much smaller
+	 * (post-wrap) values than they were expecting (and so wait
+	 * forever).
+	 */
+	*cs++ = MI_SEMAPHORE_WAIT |
+		MI_SEMAPHORE_GLOBAL_GTT |
+		MI_SEMAPHORE_POLL |
+		MI_SEMAPHORE_SAD_GTE_SDD;
+	*cs++ = from->fence.seqno;
+	*cs++ = hwsp_offset;
+	*cs++ = 0;
+
+	intel_ring_advance(to, cs);
+	return 0;
+}
+
 static int
 i915_request_await_request(struct i915_request *to, struct i915_request *from)
 {
@@ -717,6 +837,9 @@ i915_request_await_request(struct i915_request *to, struct i915_request *from)
 		ret = i915_sw_fence_await_sw_fence_gfp(&to->submit,
 						       &from->submit,
 						       I915_FENCE_GFP);
+	} else if (intel_engine_has_semaphores(to->engine) &&
+		   to->gem_context->sched.priority >= I915_PRIORITY_NORMAL) {
+		ret = emit_semaphore_wait(to, from, I915_FENCE_GFP);
 	} else {
 		ret = i915_sw_fence_await_dma_fence(&to->submit,
 						    &from->fence, 0,
@@ -1208,14 +1331,23 @@ int __init i915_global_request_init(void)
 	if (!global.slab_requests)
 		return -ENOMEM;
 
+	global.slab_execute_cbs = KMEM_CACHE(execute_cb,
+					     SLAB_HWCACHE_ALIGN |
+					     SLAB_RECLAIM_ACCOUNT |
+					     SLAB_TYPESAFE_BY_RCU);
+	if (!global.slab_execute_cbs)
+		goto err_requests;
+
 	global.slab_dependencies = KMEM_CACHE(i915_dependency,
 					      SLAB_HWCACHE_ALIGN |
 					      SLAB_RECLAIM_ACCOUNT);
 	if (!global.slab_dependencies)
-		goto err_requests;
+		goto err_execute_cbs;
 
 	return 0;
 
+err_execute_cbs:
+	kmem_cache_destroy(global.slab_execute_cbs);
 err_requests:
 	kmem_cache_destroy(global.slab_requests);
 	return -ENOMEM;
@@ -1224,11 +1356,13 @@ int __init i915_global_request_init(void)
 void i915_global_request_shrink(void)
 {
 	kmem_cache_shrink(global.slab_dependencies);
+	kmem_cache_shrink(global.slab_execute_cbs);
 	kmem_cache_shrink(global.slab_requests);
 }
 
 void i915_global_request_exit(void)
 {
 	kmem_cache_destroy(global.slab_dependencies);
+	kmem_cache_destroy(global.slab_execute_cbs);
 	kmem_cache_destroy(global.slab_requests);
 }
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index ea1e6f0ade53..3e0d62d35226 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -129,6 +129,7 @@ struct i915_request {
 	 */
 	struct i915_sw_fence submit;
 	wait_queue_entry_t submitq;
+	struct list_head execute_cb;
 
 	/*
 	 * A list of everyone we wait upon, and everyone who waits upon us.
diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c b/drivers/gpu/drm/i915/i915_sw_fence.c
index 7c58b049ecb5..8d1400d378d7 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence.c
+++ b/drivers/gpu/drm/i915/i915_sw_fence.c
@@ -192,7 +192,7 @@ static void __i915_sw_fence_complete(struct i915_sw_fence *fence,
 	__i915_sw_fence_notify(fence, FENCE_FREE);
 }
 
-static void i915_sw_fence_complete(struct i915_sw_fence *fence)
+void i915_sw_fence_complete(struct i915_sw_fence *fence)
 {
 	debug_fence_assert(fence);
 
@@ -202,7 +202,7 @@ static void i915_sw_fence_complete(struct i915_sw_fence *fence)
 	__i915_sw_fence_complete(fence, NULL);
 }
 
-static void i915_sw_fence_await(struct i915_sw_fence *fence)
+void i915_sw_fence_await(struct i915_sw_fence *fence)
 {
 	debug_fence_assert(fence);
 	WARN_ON(atomic_inc_return(&fence->pending) <= 1);
diff --git a/drivers/gpu/drm/i915/i915_sw_fence.h b/drivers/gpu/drm/i915/i915_sw_fence.h
index 0e055ea0179f..6dec9e1d1102 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence.h
+++ b/drivers/gpu/drm/i915/i915_sw_fence.h
@@ -79,6 +79,9 @@ int i915_sw_fence_await_reservation(struct i915_sw_fence *fence,
 				    unsigned long timeout,
 				    gfp_t gfp);
 
+void i915_sw_fence_await(struct i915_sw_fence *fence);
+void i915_sw_fence_complete(struct i915_sw_fence *fence);
+
 static inline bool i915_sw_fence_signaled(const struct i915_sw_fence *fence)
 {
 	return atomic_read(&fence->pending) <= 0;
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index ef49b1b0537b..7e0fe433ab17 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -616,6 +616,7 @@ void intel_engines_set_scheduler_caps(struct drm_i915_private *i915)
 	} map[] = {
 #define MAP(x, y) { ilog2(I915_ENGINE_HAS_##x), ilog2(I915_SCHEDULER_CAP_##y) }
 		MAP(PREEMPTION, PREEMPTION),
+		MAP(SEMAPHORES, SEMAPHORES),
 #undef MAP
 	};
 	struct intel_engine_cs *engine;
diff --git a/drivers/gpu/drm/i915/intel_gpu_commands.h b/drivers/gpu/drm/i915/intel_gpu_commands.h
index b96a31bc1080..0efaadd3bc32 100644
--- a/drivers/gpu/drm/i915/intel_gpu_commands.h
+++ b/drivers/gpu/drm/i915/intel_gpu_commands.h
@@ -106,7 +106,12 @@
 #define   MI_SEMAPHORE_TARGET(engine)	((engine)<<15)
 #define MI_SEMAPHORE_WAIT	MI_INSTR(0x1c, 2) /* GEN8+ */
 #define   MI_SEMAPHORE_POLL		(1<<15)
+#define   MI_SEMAPHORE_SAD_GT_SDD	(0<<12)
 #define   MI_SEMAPHORE_SAD_GTE_SDD	(1<<12)
+#define   MI_SEMAPHORE_SAD_LT_SDD	(2<<12)
+#define   MI_SEMAPHORE_SAD_LTE_SDD	(3<<12)
+#define   MI_SEMAPHORE_SAD_EQ_SDD	(4<<12)
+#define   MI_SEMAPHORE_SAD_NEQ_SDD	(5<<12)
 #define MI_STORE_DWORD_IMM	MI_INSTR(0x20, 1)
 #define MI_STORE_DWORD_IMM_GEN4	MI_INSTR(0x20, 2)
 #define   MI_MEM_VIRTUAL	(1 << 22) /* 945,g33,965 */
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index f57cfe2fc078..53d6f7fdb50e 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2324,6 +2324,7 @@ void intel_execlists_set_default_submission(struct intel_engine_cs *engine)
 	engine->park = NULL;
 	engine->unpark = NULL;
 
+	engine->flags |= I915_ENGINE_HAS_SEMAPHORES;
 	engine->flags |= I915_ENGINE_SUPPORTS_STATS;
 	if (engine->i915->preempt_context)
 		engine->flags |= I915_ENGINE_HAS_PREEMPTION;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index b8ec7e40a59b..2e7264119ec4 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -499,6 +499,7 @@ struct intel_engine_cs {
 #define I915_ENGINE_NEEDS_CMD_PARSER BIT(0)
 #define I915_ENGINE_SUPPORTS_STATS   BIT(1)
 #define I915_ENGINE_HAS_PREEMPTION   BIT(2)
+#define I915_ENGINE_HAS_SEMAPHORES   BIT(3)
 	unsigned int flags;
 
 	/*
@@ -576,6 +577,12 @@ intel_engine_has_preemption(const struct intel_engine_cs *engine)
 	return engine->flags & I915_ENGINE_HAS_PREEMPTION;
 }
 
+static inline bool
+intel_engine_has_semaphores(const struct intel_engine_cs *engine)
+{
+	return engine->flags & I915_ENGINE_HAS_SEMAPHORES;
+}
+
 void intel_engines_set_scheduler_caps(struct drm_i915_private *i915);
 
 static inline bool __execlists_need_preempt(int prio, int last)
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 8304a7f1ec3f..b10eea3f6d24 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -479,6 +479,7 @@ typedef struct drm_i915_irq_wait {
 #define   I915_SCHEDULER_CAP_ENABLED	(1ul << 0)
 #define   I915_SCHEDULER_CAP_PRIORITY	(1ul << 1)
 #define   I915_SCHEDULER_CAP_PREEMPTION	(1ul << 2)
+#define   I915_SCHEDULER_CAP_SEMAPHORES	(1ul << 3)
 
 #define I915_PARAM_HUC_STATUS		 42
 
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 09/11] drm/i915: Prioritise non-busywait semaphore workloads
  2019-02-26 10:23 [PATCH 01/11] drm/i915: Skip scanning for signalers if we are already inflight Chris Wilson
                   ` (6 preceding siblings ...)
  2019-02-26 10:24 ` [PATCH 08/11] drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+ Chris Wilson
@ 2019-02-26 10:24 ` Chris Wilson
  2019-02-26 10:24 ` [PATCH 10/11] drm/i915/execlists: Skip direct submission if only lite-restore Chris Wilson
                   ` (11 subsequent siblings)
  19 siblings, 0 replies; 43+ messages in thread
From: Chris Wilson @ 2019-02-26 10:24 UTC (permalink / raw)
  To: intel-gfx

We don't want to busywait on the GPU if we have other work to do. If we
give non-busywaiting workloads higher (initial) priority than workloads
that require a busywait, we will prioritise work that is ready to run
immediately. We then also have to be careful that we do not give earlier
semaphores an accidental boost just because later work does not wait on
other rings; hence we keep a history of semaphore usage along the
dependency chain.
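
The history is a single bit that shifts with each hop along the chain (a
sketch of the rules added below):

	/* this request emitted its own semaphore busywait */
	to->sched.semaphore |= I915_SCHED_HAS_SEMAPHORE;	/* BIT(0) */

	/* dependencies propagate the history, shifted up one bit per hop */
	if (signal->semaphore && !node_started(signal))
		node->semaphore |= signal->semaphore << 1;

	/* only a request with a clean history gets the priority boost */
	if (!request->sched.semaphore)
		attr.priority |= I915_PRIORITY_NOSEMAPHORE;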

Testcase: igt/gem_exec_schedule/semaphore
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_request.c   | 16 ++++++++++++++++
 drivers/gpu/drm/i915/i915_scheduler.c |  5 +++++
 drivers/gpu/drm/i915/i915_scheduler.h |  9 ++++++---
 drivers/gpu/drm/i915/intel_lrc.c      |  2 +-
 4 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 59e30b8c4ee9..1524de65b37f 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -813,6 +813,7 @@ emit_semaphore_wait(struct i915_request *to,
 	*cs++ = 0;
 
 	intel_ring_advance(to, cs);
+	to->sched.semaphore |= I915_SCHED_HAS_SEMAPHORE;
 	return 0;
 }
 
@@ -1083,6 +1084,21 @@ void i915_request_add(struct i915_request *request)
 	if (engine->schedule) {
 		struct i915_sched_attr attr = request->gem_context->sched;
 
+		/*
+		 * Boost actual workloads past semaphores!
+		 *
+		 * With semaphores we spin on one engine waiting for another,
+		 * simply to reduce the latency of starting our work when
+		 * the signaler completes. However, if there is any other
+		 * work that we could be doing on this engine instead, that
+		 * is better utilisation and will reduce the overall duration
+		 * of the current work. To avoid PI boosting a semaphore
+		 * far in the distance past over useful work, we keep a history
+		 * of any semaphore use along our dependency chain.
+		 */
+		if (!request->sched.semaphore)
+			attr.priority |= I915_PRIORITY_NOSEMAPHORE;
+
 		/*
 		 * Boost priorities to new clients (new request flows).
 		 *
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 50018ad30233..fd684b9ed108 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -39,6 +39,7 @@ void i915_sched_node_init(struct i915_sched_node *node)
 	INIT_LIST_HEAD(&node->waiters_list);
 	INIT_LIST_HEAD(&node->link);
 	node->attr.priority = I915_PRIORITY_INVALID;
+	node->semaphore = 0;
 }
 
 static struct i915_dependency *
@@ -69,6 +70,10 @@ bool __i915_sched_node_add_dependency(struct i915_sched_node *node,
 		dep->signaler = signal;
 		dep->flags = flags;
 
+		/* Keep track of whether anyone on this chain has a semaphore */
+		if (signal->semaphore && !node_started(signal))
+			node->semaphore |= signal->semaphore << 1;
+
 		ret = true;
 	}
 
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 5196ce07b6c2..24c2c027fd2c 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -24,14 +24,15 @@ enum {
 	I915_PRIORITY_INVALID = INT_MIN
 };
 
-#define I915_USER_PRIORITY_SHIFT 2
+#define I915_USER_PRIORITY_SHIFT 3
 #define I915_USER_PRIORITY(x) ((x) << I915_USER_PRIORITY_SHIFT)
 
 #define I915_PRIORITY_COUNT BIT(I915_USER_PRIORITY_SHIFT)
 #define I915_PRIORITY_MASK (I915_PRIORITY_COUNT - 1)
 
-#define I915_PRIORITY_WAIT	((u8)BIT(0))
-#define I915_PRIORITY_NEWCLIENT	((u8)BIT(1))
+#define I915_PRIORITY_WAIT		((u8)BIT(0))
+#define I915_PRIORITY_NEWCLIENT		((u8)BIT(1))
+#define I915_PRIORITY_NOSEMAPHORE	((u8)BIT(2))
 
 #define __NO_PREEMPTION (I915_PRIORITY_WAIT)
 
@@ -74,6 +75,8 @@ struct i915_sched_node {
 	struct list_head waiters_list; /* those after us, they depend upon us */
 	struct list_head link;
 	struct i915_sched_attr attr;
+	unsigned long semaphore;
+#define I915_SCHED_HAS_SEMAPHORE	BIT(0)
 };
 
 struct i915_dependency {
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 53d6f7fdb50e..2268860cca44 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -164,7 +164,7 @@
 #define WA_TAIL_DWORDS 2
 #define WA_TAIL_BYTES (sizeof(u32) * WA_TAIL_DWORDS)
 
-#define ACTIVE_PRIORITY (I915_PRIORITY_NEWCLIENT)
+#define ACTIVE_PRIORITY (I915_PRIORITY_NEWCLIENT | I915_PRIORITY_NOSEMAPHORE)
 
 static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
 					    struct intel_engine_cs *engine,
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 10/11] drm/i915/execlists: Skip direct submission if only lite-restore
  2019-02-26 10:23 [PATCH 01/11] drm/i915: Skip scanning for signalers if we are already inflight Chris Wilson
                   ` (7 preceding siblings ...)
  2019-02-26 10:24 ` [PATCH 09/11] drm/i915: Prioritise non-busywait semaphore workloads Chris Wilson
@ 2019-02-26 10:24 ` Chris Wilson
  2019-02-28 13:20   ` Tvrtko Ursulin
  2019-02-26 10:24 ` [PATCH 11/11] drm/i915: Use __ffs() in for_each_priolist for more compact code Chris Wilson
                   ` (10 subsequent siblings)
  19 siblings, 1 reply; 43+ messages in thread
From: Chris Wilson @ 2019-02-26 10:24 UTC (permalink / raw)
  To: intel-gfx

If we are resubmitting the active context, simply skip the submission as
performing the submission from the interrupt handler has higher
throughput than continually provoking lite-restores. If, however, we find
ourselves with a new client, we check whether we can dequeue into the
second port or whether we need to resolve preemption.
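
A sketch of the resulting check (simplified from submit_queue() below):
resubmitting the context already on the port would only rewrite the ring
tail (a lite-restore), so we leave it to the tasklet rather than forcing a
direct submission:

	if (rq_prio(rq) <= execlists->queue_priority_hint)
		return;				/* nothing new to consider */

	execlists->queue_priority_hint = rq_prio(rq);
	if (!inflight(execlists, rq))		/* same context as on ELSP[0]? */
		__submit_queue_imm(engine);	/* only dequeue for a new client */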

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_lrc.c | 24 +++++++++++++++++++-----
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 2268860cca44..3e9f7103f31f 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1205,12 +1205,26 @@ static void __submit_queue_imm(struct intel_engine_cs *engine)
 		tasklet_hi_schedule(&execlists->tasklet);
 }
 
-static void submit_queue(struct intel_engine_cs *engine, int prio)
+static bool inflight(const struct intel_engine_execlists *execlists,
+		     const struct i915_request *rq)
 {
-	if (prio > engine->execlists.queue_priority_hint) {
-		engine->execlists.queue_priority_hint = prio;
+	const struct i915_request *active = port_request(execlists->port);
+
+	return active && active->hw_context == rq->hw_context;
+}
+
+static void submit_queue(struct intel_engine_cs *engine,
+			 const struct i915_request *rq)
+{
+	struct intel_engine_execlists *execlists = &engine->execlists;
+
+	if (rq_prio(rq) <= execlists->queue_priority_hint)
+		return;
+
+	execlists->queue_priority_hint = rq_prio(rq);
+
+	if (!inflight(execlists, rq))
 		__submit_queue_imm(engine);
-	}
 }
 
 static void execlists_submit_request(struct i915_request *request)
@@ -1226,7 +1240,7 @@ static void execlists_submit_request(struct i915_request *request)
 	GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root));
 	GEM_BUG_ON(list_empty(&request->sched.link));
 
-	submit_queue(engine, rq_prio(request));
+	submit_queue(engine, request);
 
 	spin_unlock_irqrestore(&engine->timeline.lock, flags);
 }
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 11/11] drm/i915: Use __ffs() in for_each_priolist for more compact code
  2019-02-26 10:23 [PATCH 01/11] drm/i915: Skip scanning for signalers if we are already inflight Chris Wilson
                   ` (8 preceding siblings ...)
  2019-02-26 10:24 ` [PATCH 10/11] drm/i915/execlists: Skip direct submission if only lite-restore Chris Wilson
@ 2019-02-26 10:24 ` Chris Wilson
  2019-02-28  7:42   ` Tvrtko Ursulin
  2019-02-26 10:56 ` ✗ Fi.CI.BAT: failure for series starting with [01/11] drm/i915: Skip scanning for signalers if we are already inflight Patchwork
                   ` (9 subsequent siblings)
  19 siblings, 1 reply; 43+ messages in thread
From: Chris Wilson @ 2019-02-26 10:24 UTC (permalink / raw)
  To: intel-gfx

Gcc slightly prefers it if we use __ffs() and subtract one from the
index once rather than at each use:

__execlists_submission_tasklet              2867    2847     -20
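
The difference is just where the "- 1" happens (worked example, not part of
the patch):

	/*
	 * With plist->used = BIT(3):
	 *
	 *   ffs(used)   == 4  ->  requests[idx - 1], used &= ~BIT(idx - 1)
	 *   __ffs(used) == 3  ->  requests[idx],     used &= ~BIT(idx)
	 *
	 * i.e. the subtraction is folded into __ffs() once instead of being
	 * recomputed at every use of idx.
	 */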

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_scheduler.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 24c2c027fd2c..068a6750540f 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -100,9 +100,11 @@ struct i915_priolist {
 		list_for_each_entry(it, &(plist)->requests[idx], sched.link)
 
 #define priolist_for_each_request_consume(it, n, plist, idx) \
-	for (; (idx = ffs((plist)->used)); (plist)->used &= ~BIT(idx - 1)) \
+	for (; \
+	     (plist)->used ? (idx = __ffs((plist)->used)), 1 : 0; \
+	     (plist)->used &= ~BIT(idx)) \
 		list_for_each_entry_safe(it, n, \
-					 &(plist)->requests[idx - 1], \
+					 &(plist)->requests[idx], \
 					 sched.link)
 
 void i915_sched_node_init(struct i915_sched_node *node);
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* ✗ Fi.CI.BAT: failure for series starting with [01/11] drm/i915: Skip scanning for signalers if we are already inflight
  2019-02-26 10:23 [PATCH 01/11] drm/i915: Skip scanning for signalers if we are already inflight Chris Wilson
                   ` (9 preceding siblings ...)
  2019-02-26 10:24 ` [PATCH 11/11] drm/i915: Use __ffs() in for_each_priolist for more compact code Chris Wilson
@ 2019-02-26 10:56 ` Patchwork
  2019-02-26 11:26 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/11] drm/i915: Skip scanning for signalers if we are already inflight (rev2) Patchwork
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 43+ messages in thread
From: Patchwork @ 2019-02-26 10:56 UTC (permalink / raw)
  To: intel-gfx

== Series Details ==

Series: series starting with [01/11] drm/i915: Skip scanning for signalers if we are already inflight
URL   : https://patchwork.freedesktop.org/series/57244/
State : failure

== Summary ==

CALL    scripts/checksyscalls.sh
  DESCEND  objtool
  CHK     include/generated/compile.h
Kernel: arch/x86/boot/bzImage is ready  (#1)

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 43+ messages in thread

* ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/11] drm/i915: Skip scanning for signalers if we are already inflight (rev2)
  2019-02-26 10:23 [PATCH 01/11] drm/i915: Skip scanning for signalers if we are already inflight Chris Wilson
                   ` (10 preceding siblings ...)
  2019-02-26 10:56 ` ✗ Fi.CI.BAT: failure for series starting with [01/11] drm/i915: Skip scanning for signalers if we are already inflight Patchwork
@ 2019-02-26 11:26 ` Patchwork
  2019-02-26 11:31 ` ✗ Fi.CI.SPARSE: " Patchwork
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 43+ messages in thread
From: Patchwork @ 2019-02-26 11:26 UTC (permalink / raw)
  To: intel-gfx

== Series Details ==

Series: series starting with [01/11] drm/i915: Skip scanning for signalers if we are already inflight (rev2)
URL   : https://patchwork.freedesktop.org/series/57244/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
2fb1e47f69e3 drm/i915: Skip scanning for signalers if we are already inflight
d2afc58479f7 drm/i915/execlists: Suppress mere WAIT preemption
5e85beeeea1f drm/i915/execlists: Suppress redundant preemption
ef2ec8aa3d7a drm/i915: Make request allocation caches global
-:162: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#162: 
new file mode 100644

-:167: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#167: FILE: drivers/gpu/drm/i915/i915_globals.c:1:
+/*

-:286: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#286: FILE: drivers/gpu/drm/i915/i915_globals.h:1:
+/*

-:627: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'plist' - possible side-effects?
#627: FILE: drivers/gpu/drm/i915/i915_scheduler.h:95:
+#define priolist_for_each_request(it, plist, idx) \
+	for (idx = 0; idx < ARRAY_SIZE((plist)->requests); idx++) \
+		list_for_each_entry(it, &(plist)->requests[idx], sched.link)

-:627: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'idx' - possible side-effects?
#627: FILE: drivers/gpu/drm/i915/i915_scheduler.h:95:
+#define priolist_for_each_request(it, plist, idx) \
+	for (idx = 0; idx < ARRAY_SIZE((plist)->requests); idx++) \
+		list_for_each_entry(it, &(plist)->requests[idx], sched.link)

-:631: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'plist' - possible side-effects?
#631: FILE: drivers/gpu/drm/i915/i915_scheduler.h:99:
+#define priolist_for_each_request_consume(it, n, plist, idx) \
+	for (; (idx = ffs((plist)->used)); (plist)->used &= ~BIT(idx - 1)) \
+		list_for_each_entry_safe(it, n, \
+					 &(plist)->requests[idx - 1], \
+					 sched.link)

-:631: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'idx' - possible side-effects?
#631: FILE: drivers/gpu/drm/i915/i915_scheduler.h:99:
+#define priolist_for_each_request_consume(it, n, plist, idx) \
+	for (; (idx = ffs((plist)->used)); (plist)->used &= ~BIT(idx - 1)) \
+		list_for_each_entry_safe(it, n, \
+					 &(plist)->requests[idx - 1], \
+					 sched.link)

total: 0 errors, 3 warnings, 4 checks, 808 lines checked
1531ba977791 drm/i915: Introduce i915_timeline.mutex
135a214d2a38 drm/i915: Keep timeline HWSP allocated until idle across the system
dcfb15979ec2 drm/i915: Compute the global scheduler caps
1296737150c9 drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+
-:335: CHECK:SPACING: spaces preferred around that '<<' (ctx:VxV)
#335: FILE: drivers/gpu/drm/i915/intel_gpu_commands.h:109:
+#define   MI_SEMAPHORE_SAD_GT_SDD	(0<<12)
                                  	  ^

-:337: CHECK:SPACING: spaces preferred around that '<<' (ctx:VxV)
#337: FILE: drivers/gpu/drm/i915/intel_gpu_commands.h:111:
+#define   MI_SEMAPHORE_SAD_LT_SDD	(2<<12)
                                  	  ^

-:338: CHECK:SPACING: spaces preferred around that '<<' (ctx:VxV)
#338: FILE: drivers/gpu/drm/i915/intel_gpu_commands.h:112:
+#define   MI_SEMAPHORE_SAD_LTE_SDD	(3<<12)
                                   	  ^

-:339: CHECK:SPACING: spaces preferred around that '<<' (ctx:VxV)
#339: FILE: drivers/gpu/drm/i915/intel_gpu_commands.h:113:
+#define   MI_SEMAPHORE_SAD_EQ_SDD	(4<<12)
                                  	  ^

-:340: CHECK:SPACING: spaces preferred around that '<<' (ctx:VxV)
#340: FILE: drivers/gpu/drm/i915/intel_gpu_commands.h:114:
+#define   MI_SEMAPHORE_SAD_NEQ_SDD	(5<<12)
                                   	  ^

total: 0 errors, 0 warnings, 5 checks, 300 lines checked
b6cd8c175fd0 drm/i915: Prioritise non-busywait semaphore workloads
b09202593de8 drm/i915/execlists: Skip direct submission if only lite-restore
37624d9dc7aa drm/i915: Use __ffs() in for_each_priolist for more compact code

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 43+ messages in thread

* ✗ Fi.CI.SPARSE: warning for series starting with [01/11] drm/i915: Skip scanning for signalers if we are already inflight (rev2)
  2019-02-26 10:23 [PATCH 01/11] drm/i915: Skip scanning for signalers if we are already inflight Chris Wilson
                   ` (11 preceding siblings ...)
  2019-02-26 11:26 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/11] drm/i915: Skip scanning for signalers if we are already inflight (rev2) Patchwork
@ 2019-02-26 11:31 ` Patchwork
  2019-02-26 11:51 ` ✓ Fi.CI.BAT: success " Patchwork
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 43+ messages in thread
From: Patchwork @ 2019-02-26 11:31 UTC (permalink / raw)
  To: intel-gfx

== Series Details ==

Series: series starting with [01/11] drm/i915: Skip scanning for signalers if we are already inflight (rev2)
URL   : https://patchwork.freedesktop.org/series/57244/
State : warning

== Summary ==

$ dim sparse origin/drm-tip
Sparse version: v0.5.2
Commit: drm/i915: Skip scanning for signalers if we are already inflight
Okay!

Commit: drm/i915/execlists: Suppress mere WAIT preemption
Okay!

Commit: drm/i915/execlists: Suppress redundant preemption
Okay!

Commit: drm/i915: Make request allocation caches global
-drivers/gpu/drm/i915/selftests/../i915_drv.h:3581:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/../i915_drv.h:3578:16: warning: expression using sizeof(void)

Commit: drm/i915: Introduce i915_timeline.mutex
Okay!

Commit: drm/i915: Keep timeline HWSP allocated until idle across the system
Okay!

Commit: drm/i915: Compute the global scheduler caps
Okay!

Commit: drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+
-O:drivers/gpu/drm/i915/i915_drv.c:351:25: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/i915_drv.c:351:25: warning: expression using sizeof(void)

Commit: drm/i915: Prioritise non-busywait semaphore workloads
Okay!

Commit: drm/i915/execlists: Skip direct submission if only lite-restore
Okay!

Commit: drm/i915: Use __ffs() in for_each_priolist for more compact code
Okay!


* ✓ Fi.CI.BAT: success for series starting with [01/11] drm/i915: Skip scanning for signalers if we are already inflight (rev2)
  2019-02-26 10:23 [PATCH 01/11] drm/i915: Skip scanning for signalers if we are already inflight Chris Wilson
                   ` (12 preceding siblings ...)
  2019-02-26 11:31 ` ✗ Fi.CI.SPARSE: " Patchwork
@ 2019-02-26 11:51 ` Patchwork
  2019-02-26 15:17 ` ✓ Fi.CI.IGT: " Patchwork
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 43+ messages in thread
From: Patchwork @ 2019-02-26 11:51 UTC (permalink / raw)
  To: intel-gfx

== Series Details ==

Series: series starting with [01/11] drm/i915: Skip scanning for signalers if we are already inflight (rev2)
URL   : https://patchwork.freedesktop.org/series/57244/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_5660 -> Patchwork_12309
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/57244/revisions/2/

Known issues
------------

  Here are the changes found in Patchwork_12309 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_ctx_create@basic-files:
    - fi-gdg-551:         NOTRUN -> SKIP [fdo#109271] +106

  * igt@kms_busy@basic-flip-c:
    - fi-byt-j1900:       NOTRUN -> SKIP [fdo#109271] / [fdo#109278]
    - fi-gdg-551:         NOTRUN -> SKIP [fdo#109271] / [fdo#109278]

  * igt@kms_chamelium@hdmi-crc-fast:
    - fi-byt-j1900:       NOTRUN -> SKIP [fdo#109271] +52

  * igt@kms_pipe_crc_basic@suspend-read-crc-pipe-a:
    - fi-blb-e6850:       PASS -> INCOMPLETE [fdo#107718]

  * igt@kms_pipe_crc_basic@suspend-read-crc-pipe-c:
    - fi-kbl-7567u:       PASS -> DMESG-WARN [fdo#108566]

  
#### Possible fixes ####

  * igt@i915_pm_rpm@basic-pci-d3-state:
    - fi-bsw-kefka:       SKIP [fdo#109271] -> PASS

  * igt@i915_pm_rpm@basic-rte:
    - fi-bsw-kefka:       FAIL [fdo#108800] -> PASS

  * igt@i915_pm_rpm@module-reload:
    - {fi-icl-y}:         INCOMPLETE [fdo#108840] -> PASS

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#107718]: https://bugs.freedesktop.org/show_bug.cgi?id=107718
  [fdo#108566]: https://bugs.freedesktop.org/show_bug.cgi?id=108566
  [fdo#108569]: https://bugs.freedesktop.org/show_bug.cgi?id=108569
  [fdo#108800]: https://bugs.freedesktop.org/show_bug.cgi?id=108800
  [fdo#108840]: https://bugs.freedesktop.org/show_bug.cgi?id=108840
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109278]: https://bugs.freedesktop.org/show_bug.cgi?id=109278
  [fdo#109315]: https://bugs.freedesktop.org/show_bug.cgi?id=109315


Participating hosts (42 -> 39)
------------------------------

  Additional (2): fi-byt-j1900 fi-gdg-551 
  Missing    (5): fi-ilk-m540 fi-hsw-4200u fi-byt-squawks fi-ctg-p8600 fi-bdw-samus 


Build changes
-------------

    * Linux: CI_DRM_5660 -> Patchwork_12309

  CI_DRM_5660: 2cca9f74156f1061298a7249f7af80dcbb9178d9 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4857: 25911cdde500aa6ddede601faec91741c6963c27 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_12309: 37624d9dc7aa1a7e6b49ffaef07c05c01146b572 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

37624d9dc7aa drm/i915: Use __ffs() in for_each_priolist for more compact code
b09202593de8 drm/i915/execlists: Skip direct submission if only lite-restore
b6cd8c175fd0 drm/i915: Prioritise non-busywait semaphore workloads
1296737150c9 drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+
dcfb15979ec2 drm/i915: Compute the global scheduler caps
135a214d2a38 drm/i915: Keep timeline HWSP allocated until idle across the system
1531ba977791 drm/i915: Introduce i915_timeline.mutex
ef2ec8aa3d7a drm/i915: Make request allocation caches global
5e85beeeea1f drm/i915/execlists: Suppress redundant preemption
d2afc58479f7 drm/i915/execlists: Suppress mere WAIT preemption
2fb1e47f69e3 drm/i915: Skip scanning for signalers if we are already inflight

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12309/

* ✓ Fi.CI.IGT: success for series starting with [01/11] drm/i915: Skip scanning for signalers if we are already inflight (rev2)
  2019-02-26 10:23 [PATCH 01/11] drm/i915: Skip scanning for signalers if we are already inflight Chris Wilson
                   ` (13 preceding siblings ...)
  2019-02-26 11:51 ` ✓ Fi.CI.BAT: success " Patchwork
@ 2019-02-26 15:17 ` Patchwork
  2019-02-27 10:19 ` [PATCH 01/11] drm/i915: Skip scanning for signalers if we are already inflight Tvrtko Ursulin
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 43+ messages in thread
From: Patchwork @ 2019-02-26 15:17 UTC (permalink / raw)
  To: intel-gfx

== Series Details ==

Series: series starting with [01/11] drm/i915: Skip scanning for signalers if we are already inflight (rev2)
URL   : https://patchwork.freedesktop.org/series/57244/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_5660_full -> Patchwork_12309_full
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  

Known issues
------------

  Here are the changes found in Patchwork_12309_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_exec_parallel@bsd1:
    - shard-skl:          NOTRUN -> SKIP [fdo#109271] +127

  * igt@gem_exec_params@no-blt:
    - shard-snb:          NOTRUN -> SKIP [fdo#109271] +84

  * igt@i915_pm_rpm@gem-execbuf-stress-extra-wait:
    - shard-apl:          PASS -> INCOMPLETE [fdo#103927]

  * igt@kms_busy@extended-modeset-hang-newfb-render-c:
    - shard-skl:          NOTRUN -> DMESG-WARN [fdo#107956] +1

  * igt@kms_busy@extended-modeset-hang-newfb-render-e:
    - shard-snb:          NOTRUN -> SKIP [fdo#109271] / [fdo#109278] +5

  * igt@kms_busy@extended-modeset-hang-newfb-with-reset-render-f:
    - shard-kbl:          NOTRUN -> SKIP [fdo#109271] / [fdo#109278]

  * igt@kms_busy@extended-modeset-hang-oldfb-render-f:
    - shard-skl:          NOTRUN -> SKIP [fdo#109271] / [fdo#109278] +12

  * igt@kms_chamelium@vga-edid-read:
    - shard-apl:          NOTRUN -> SKIP [fdo#109271] +16

  * igt@kms_color@pipe-c-ctm-max:
    - shard-apl:          PASS -> FAIL [fdo#108147]

  * igt@kms_cursor_crc@cursor-128x42-offscreen:
    - shard-skl:          PASS -> FAIL [fdo#103232]

  * igt@kms_cursor_crc@cursor-256x256-dpms:
    - shard-kbl:          NOTRUN -> FAIL [fdo#103232]

  * igt@kms_cursor_crc@cursor-64x21-onscreen:
    - shard-skl:          NOTRUN -> FAIL [fdo#103232]

  * igt@kms_cursor_crc@cursor-alpha-opaque:
    - shard-glk:          PASS -> FAIL [fdo#109350]

  * igt@kms_cursor_crc@cursor-size-change:
    - shard-apl:          PASS -> FAIL [fdo#103232]

  * igt@kms_fbcon_fbt@psr-suspend:
    - shard-skl:          NOTRUN -> FAIL [fdo#103833]

  * igt@kms_flip@2x-flip-vs-modeset-vs-hang:
    - shard-kbl:          NOTRUN -> SKIP [fdo#109271] +18

  * igt@kms_flip@flip-vs-expired-vblank-interruptible:
    - shard-skl:          PASS -> FAIL [fdo#105363]

  * igt@kms_frontbuffer_tracking@fbc-1p-primscrn-spr-indfb-onoff:
    - shard-glk:          PASS -> FAIL [fdo#103167]

  * igt@kms_frontbuffer_tracking@fbcpsr-1p-offscren-pri-indfb-draw-mmap-wc:
    - shard-skl:          PASS -> FAIL [fdo#105682]

  * igt@kms_frontbuffer_tracking@psr-1p-primscrn-cur-indfb-draw-pwrite:
    - shard-skl:          PASS -> FAIL [fdo#103167] +2

  * igt@kms_frontbuffer_tracking@psr-1p-primscrn-spr-indfb-draw-pwrite:
    - shard-skl:          NOTRUN -> FAIL [fdo#103167]

  * igt@kms_plane@pixel-format-pipe-a-planes-source-clamping:
    - shard-skl:          NOTRUN -> DMESG-WARN [fdo#106885]

  * igt@kms_plane@plane-panning-bottom-right-suspend-pipe-a-planes:
    - shard-skl:          PASS -> FAIL [fdo#103166]

  * igt@kms_plane_alpha_blend@pipe-a-alpha-7efc:
    - shard-skl:          NOTRUN -> FAIL [fdo#107815] / [fdo#108145] +1

  * igt@kms_plane_alpha_blend@pipe-a-coverage-7efc:
    - shard-skl:          PASS -> FAIL [fdo#107815] / [fdo#108145]

  * igt@kms_plane_alpha_blend@pipe-b-coverage-7efc:
    - shard-skl:          PASS -> FAIL [fdo#107815]

  * igt@kms_plane_alpha_blend@pipe-c-alpha-basic:
    - shard-kbl:          NOTRUN -> FAIL [fdo#108145] / [fdo#108590]

  * igt@kms_plane_alpha_blend@pipe-c-alpha-transparant-fb:
    - shard-skl:          NOTRUN -> FAIL [fdo#108145] +1

  * igt@kms_plane_multiple@atomic-pipe-b-tiling-y:
    - shard-apl:          PASS -> FAIL [fdo#103166] +2

  * igt@kms_rotation_crc@multiplane-rotation-cropping-top:
    - shard-kbl:          PASS -> FAIL [fdo#109016]

  * igt@kms_rotation_crc@primary-rotation-270:
    - shard-skl:          PASS -> FAIL [fdo#103925] / [fdo#107815]

  * igt@kms_setmode@basic:
    - shard-skl:          NOTRUN -> FAIL [fdo#99912]

  
#### Possible fixes ####

  * igt@gem_busy@extended-semaphore-blt:
    - shard-kbl:          SKIP [fdo#109271] -> PASS +5
    - shard-glk:          SKIP [fdo#109271] -> PASS +3

  * igt@gem_busy@extended-semaphore-vebox:
    - shard-apl:          SKIP [fdo#109271] -> PASS +3
    - shard-skl:          SKIP [fdo#109271] -> PASS +3

  * igt@gem_ctx_isolation@vcs0-s3:
    - shard-skl:          INCOMPLETE [fdo#104108] / [fdo#107773] -> PASS

  * igt@i915_pm_rpm@gem-idle:
    - shard-skl:          INCOMPLETE [fdo#107807] -> PASS

  * igt@kms_busy@extended-modeset-hang-newfb-render-c:
    - shard-hsw:          DMESG-WARN [fdo#107956] -> PASS

  * igt@kms_cursor_crc@cursor-128x42-onscreen:
    - shard-apl:          FAIL [fdo#103232] -> PASS

  * igt@kms_cursor_crc@cursor-256x256-suspend:
    - shard-apl:          FAIL [fdo#103191] / [fdo#103232] -> PASS

  * igt@kms_flip_tiling@flip-changes-tiling-y:
    - shard-skl:          FAIL [fdo#107931] / [fdo#108303] -> PASS

  * igt@kms_frontbuffer_tracking@fbc-1p-primscrn-spr-indfb-draw-render:
    - shard-apl:          FAIL [fdo#103167] -> PASS

  * igt@kms_plane@plane-panning-bottom-right-suspend-pipe-b-planes:
    - shard-skl:          FAIL [fdo#103166] -> PASS

  * igt@kms_plane@plane-position-covered-pipe-c-planes:
    - shard-apl:          FAIL [fdo#103166] -> PASS +1

  * igt@kms_setmode@basic:
    - shard-apl:          FAIL [fdo#99912] -> PASS

  * igt@perf_pmu@busy-hang-bcs0:
    - shard-apl:          INCOMPLETE [fdo#103927] -> PASS

  
#### Warnings ####

  * igt@kms_concurrent@pipe-e:
    - shard-snb:          SKIP [fdo#109271] / [fdo#109278] -> INCOMPLETE [fdo#105411]

  
  [fdo#103166]: https://bugs.freedesktop.org/show_bug.cgi?id=103166
  [fdo#103167]: https://bugs.freedesktop.org/show_bug.cgi?id=103167
  [fdo#103191]: https://bugs.freedesktop.org/show_bug.cgi?id=103191
  [fdo#103232]: https://bugs.freedesktop.org/show_bug.cgi?id=103232
  [fdo#103833]: https://bugs.freedesktop.org/show_bug.cgi?id=103833
  [fdo#103925]: https://bugs.freedesktop.org/show_bug.cgi?id=103925
  [fdo#103927]: https://bugs.freedesktop.org/show_bug.cgi?id=103927
  [fdo#104108]: https://bugs.freedesktop.org/show_bug.cgi?id=104108
  [fdo#105363]: https://bugs.freedesktop.org/show_bug.cgi?id=105363
  [fdo#105411]: https://bugs.freedesktop.org/show_bug.cgi?id=105411
  [fdo#105682]: https://bugs.freedesktop.org/show_bug.cgi?id=105682
  [fdo#106885]: https://bugs.freedesktop.org/show_bug.cgi?id=106885
  [fdo#107773]: https://bugs.freedesktop.org/show_bug.cgi?id=107773
  [fdo#107807]: https://bugs.freedesktop.org/show_bug.cgi?id=107807
  [fdo#107815]: https://bugs.freedesktop.org/show_bug.cgi?id=107815
  [fdo#107931]: https://bugs.freedesktop.org/show_bug.cgi?id=107931
  [fdo#107956]: https://bugs.freedesktop.org/show_bug.cgi?id=107956
  [fdo#108145]: https://bugs.freedesktop.org/show_bug.cgi?id=108145
  [fdo#108147]: https://bugs.freedesktop.org/show_bug.cgi?id=108147
  [fdo#108303]: https://bugs.freedesktop.org/show_bug.cgi?id=108303
  [fdo#108590]: https://bugs.freedesktop.org/show_bug.cgi?id=108590
  [fdo#109016]: https://bugs.freedesktop.org/show_bug.cgi?id=109016
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109278]: https://bugs.freedesktop.org/show_bug.cgi?id=109278
  [fdo#109350]: https://bugs.freedesktop.org/show_bug.cgi?id=109350
  [fdo#99912]: https://bugs.freedesktop.org/show_bug.cgi?id=99912


Participating hosts (6 -> 6)
------------------------------

  No changes in participating hosts


Build changes
-------------

    * Linux: CI_DRM_5660 -> Patchwork_12309

  CI_DRM_5660: 2cca9f74156f1061298a7249f7af80dcbb9178d9 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4857: 25911cdde500aa6ddede601faec91741c6963c27 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_12309: 37624d9dc7aa1a7e6b49ffaef07c05c01146b572 @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12309/

* Re: [PATCH 01/11] drm/i915: Skip scanning for signalers if we are already inflight
  2019-02-26 10:23 [PATCH 01/11] drm/i915: Skip scanning for signalers if we are already inflight Chris Wilson
                   ` (14 preceding siblings ...)
  2019-02-26 15:17 ` ✓ Fi.CI.IGT: " Patchwork
@ 2019-02-27 10:19 ` Tvrtko Ursulin
  2019-02-27 11:38 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/11] drm/i915: Skip scanning for signalers if we are already inflight (rev3) Patchwork
                   ` (3 subsequent siblings)
  19 siblings, 0 replies; 43+ messages in thread
From: Tvrtko Ursulin @ 2019-02-27 10:19 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 26/02/2019 10:23, Chris Wilson wrote:
> When a request has its priority changed, we traverse the graph of all of
> its signalers to raise their priorities to match (priority inheritance).
> If the request has already started executing its payload, we know that
> all of its signalers must have signaled and we do not need to process
> our list of signalers.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_scheduler.c | 9 +++++++++
>   1 file changed, 9 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
> index 8bc042551692..38efefd22dce 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler.c
> +++ b/drivers/gpu/drm/i915/i915_scheduler.c
> @@ -18,6 +18,11 @@ node_to_request(const struct i915_sched_node *node)
>   	return container_of(node, const struct i915_request, sched);
>   }
>   
> +static inline bool node_started(const struct i915_sched_node *node)
> +{
> +	return i915_request_started(node_to_request(node));
> +}
> +
>   static inline bool node_signaled(const struct i915_sched_node *node)
>   {
>   	return i915_request_completed(node_to_request(node));
> @@ -301,6 +306,10 @@ static void __i915_schedule(struct i915_request *rq,
>   	list_for_each_entry(dep, &dfs, dfs_link) {
>   		struct i915_sched_node *node = dep->signaler;
>   
> +		/* If we are already flying, we know we have no signalers */
> +		if (node_started(node))
> +			continue;
> +
>   		/*
>   		 * Within an engine, there can be no cycle, but we may
>   		 * refer to the same dependency chain multiple times
> 

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

* Re: [PATCH 04/11] drm/i915: Make request allocation caches global
  2019-02-26 10:23 ` [PATCH 04/11] drm/i915: Make request allocation caches global Chris Wilson
@ 2019-02-27 10:29   ` Tvrtko Ursulin
  2019-02-27 10:44     ` Chris Wilson
  0 siblings, 1 reply; 43+ messages in thread
From: Tvrtko Ursulin @ 2019-02-27 10:29 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 26/02/2019 10:23, Chris Wilson wrote:
> As kmem_caches share the same properties (size, allocation/free behaviour)
> for all potential devices, we can use global caches. While this
> potentially has worse fragmentation behaviour (one can argue that
> different devices would have different activity lifetimes, but you can
> also argue that activity is temporal across the system) it is the
> default behaviour of the system at large to amalgamate matching caches.
> 
> The benefit for us is much reduced pointer dancing along the frequent
> allocation paths.
> 
> v2: Defer shrinking until after a global grace period for futureproofing
> multiple consumers of the slab caches, similar to the current strategy
> for avoiding shrinking too early.

I suggested calling i915_globals_park directly from __i915_gem_park, for 
symmetry with how i915_gem_unpark calls i915_globals_unpark. 
i915_globals has its own delayed setup, so I don't think it benefits 
from the double indirection of being called via shrink_caches.
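
Roughly what I have in mind, as a sketch only (the existing body of
__i915_gem_park is elided and its exact signature may differ):

static void __i915_gem_park(struct drm_i915_private *i915)
{
	/* ... existing parking steps ... */

	/* mirror of i915_gem_unpark() calling i915_globals_unpark() */
	i915_globals_park();
}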

Otherwise I had no other complaints, but this asymmetry for no clear 
reason is bugging me.

Regards,

Tvrtko

> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/Makefile                 |   1 +
>   drivers/gpu/drm/i915/i915_active.c            |   7 +-
>   drivers/gpu/drm/i915/i915_active.h            |   1 +
>   drivers/gpu/drm/i915/i915_drv.h               |   3 -
>   drivers/gpu/drm/i915/i915_gem.c               |  34 +-----
>   drivers/gpu/drm/i915/i915_globals.c           | 113 ++++++++++++++++++
>   drivers/gpu/drm/i915/i915_globals.h           |  15 +++
>   drivers/gpu/drm/i915/i915_pci.c               |   8 +-
>   drivers/gpu/drm/i915/i915_request.c           |  53 ++++++--
>   drivers/gpu/drm/i915/i915_request.h           |  10 ++
>   drivers/gpu/drm/i915/i915_scheduler.c         |  66 +++++++---
>   drivers/gpu/drm/i915/i915_scheduler.h         |  34 +++++-
>   drivers/gpu/drm/i915/intel_guc_submission.c   |   3 +-
>   drivers/gpu/drm/i915/intel_lrc.c              |   6 +-
>   drivers/gpu/drm/i915/intel_ringbuffer.h       |  17 ---
>   drivers/gpu/drm/i915/selftests/intel_lrc.c    |   2 +-
>   drivers/gpu/drm/i915/selftests/mock_engine.c  |  45 ++++---
>   .../gpu/drm/i915/selftests/mock_gem_device.c  |  26 ----
>   drivers/gpu/drm/i915/selftests/mock_request.c |  12 +-
>   drivers/gpu/drm/i915/selftests/mock_request.h |   7 --
>   20 files changed, 313 insertions(+), 150 deletions(-)
>   create mode 100644 drivers/gpu/drm/i915/i915_globals.c
>   create mode 100644 drivers/gpu/drm/i915/i915_globals.h
> 
> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> index 1787e1299b1b..a1d834068765 100644
> --- a/drivers/gpu/drm/i915/Makefile
> +++ b/drivers/gpu/drm/i915/Makefile
> @@ -77,6 +77,7 @@ i915-y += \
>   	  i915_gem_tiling.o \
>   	  i915_gem_userptr.o \
>   	  i915_gemfs.o \
> +	  i915_globals.o \
>   	  i915_query.o \
>   	  i915_request.o \
>   	  i915_scheduler.o \
> diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
> index db7bb5bd5add..d9f6471ac16c 100644
> --- a/drivers/gpu/drm/i915/i915_active.c
> +++ b/drivers/gpu/drm/i915/i915_active.c
> @@ -294,7 +294,12 @@ int __init i915_global_active_init(void)
>   	return 0;
>   }
>   
> -void __exit i915_global_active_exit(void)
> +void i915_global_active_shrink(void)
> +{
> +	kmem_cache_shrink(global.slab_cache);
> +}
> +
> +void i915_global_active_exit(void)
>   {
>   	kmem_cache_destroy(global.slab_cache);
>   }
> diff --git a/drivers/gpu/drm/i915/i915_active.h b/drivers/gpu/drm/i915/i915_active.h
> index 12b5c1d287d1..5fbd9102384b 100644
> --- a/drivers/gpu/drm/i915/i915_active.h
> +++ b/drivers/gpu/drm/i915/i915_active.h
> @@ -420,6 +420,7 @@ static inline void i915_active_fini(struct i915_active *ref) { }
>   #endif
>   
>   int i915_global_active_init(void);
> +void i915_global_active_shrink(void);
>   void i915_global_active_exit(void);
>   
>   #endif /* _I915_ACTIVE_H_ */
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index cc09caf3870e..f16016b330b3 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1473,9 +1473,6 @@ struct drm_i915_private {
>   	struct kmem_cache *objects;
>   	struct kmem_cache *vmas;
>   	struct kmem_cache *luts;
> -	struct kmem_cache *requests;
> -	struct kmem_cache *dependencies;
> -	struct kmem_cache *priorities;
>   
>   	const struct intel_device_info __info; /* Use INTEL_INFO() to access. */
>   	struct intel_runtime_info __runtime; /* Use RUNTIME_INFO() to access. */
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 2b261524cfa4..713ed6fbdcc8 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -42,6 +42,7 @@
>   #include "i915_drv.h"
>   #include "i915_gem_clflush.h"
>   #include "i915_gemfs.h"
> +#include "i915_globals.h"
>   #include "i915_reset.h"
>   #include "i915_trace.h"
>   #include "i915_vgpu.h"
> @@ -187,6 +188,8 @@ void i915_gem_unpark(struct drm_i915_private *i915)
>   	if (unlikely(++i915->gt.epoch == 0)) /* keep 0 as invalid */
>   		i915->gt.epoch = 1;
>   
> +	i915_globals_unpark();
> +
>   	intel_enable_gt_powersave(i915);
>   	i915_update_gfx_val(i915);
>   	if (INTEL_GEN(i915) >= 6)
> @@ -2892,12 +2895,11 @@ static void shrink_caches(struct drm_i915_private *i915)
>   	 * filled slabs to prioritise allocating from the mostly full slabs,
>   	 * with the aim of reducing fragmentation.
>   	 */
> -	kmem_cache_shrink(i915->priorities);
> -	kmem_cache_shrink(i915->dependencies);
> -	kmem_cache_shrink(i915->requests);
>   	kmem_cache_shrink(i915->luts);
>   	kmem_cache_shrink(i915->vmas);
>   	kmem_cache_shrink(i915->objects);
> +
> +	i915_globals_park();
>   }
>   
>   struct sleep_rcu_work {
> @@ -5235,23 +5237,6 @@ int i915_gem_init_early(struct drm_i915_private *dev_priv)
>   	if (!dev_priv->luts)
>   		goto err_vmas;
>   
> -	dev_priv->requests = KMEM_CACHE(i915_request,
> -					SLAB_HWCACHE_ALIGN |
> -					SLAB_RECLAIM_ACCOUNT |
> -					SLAB_TYPESAFE_BY_RCU);
> -	if (!dev_priv->requests)
> -		goto err_luts;
> -
> -	dev_priv->dependencies = KMEM_CACHE(i915_dependency,
> -					    SLAB_HWCACHE_ALIGN |
> -					    SLAB_RECLAIM_ACCOUNT);
> -	if (!dev_priv->dependencies)
> -		goto err_requests;
> -
> -	dev_priv->priorities = KMEM_CACHE(i915_priolist, SLAB_HWCACHE_ALIGN);
> -	if (!dev_priv->priorities)
> -		goto err_dependencies;
> -
>   	INIT_LIST_HEAD(&dev_priv->gt.active_rings);
>   	INIT_LIST_HEAD(&dev_priv->gt.closed_vma);
>   
> @@ -5276,12 +5261,6 @@ int i915_gem_init_early(struct drm_i915_private *dev_priv)
>   
>   	return 0;
>   
> -err_dependencies:
> -	kmem_cache_destroy(dev_priv->dependencies);
> -err_requests:
> -	kmem_cache_destroy(dev_priv->requests);
> -err_luts:
> -	kmem_cache_destroy(dev_priv->luts);
>   err_vmas:
>   	kmem_cache_destroy(dev_priv->vmas);
>   err_objects:
> @@ -5299,9 +5278,6 @@ void i915_gem_cleanup_early(struct drm_i915_private *dev_priv)
>   
>   	cleanup_srcu_struct(&dev_priv->gpu_error.reset_backoff_srcu);
>   
> -	kmem_cache_destroy(dev_priv->priorities);
> -	kmem_cache_destroy(dev_priv->dependencies);
> -	kmem_cache_destroy(dev_priv->requests);
>   	kmem_cache_destroy(dev_priv->luts);
>   	kmem_cache_destroy(dev_priv->vmas);
>   	kmem_cache_destroy(dev_priv->objects);
> diff --git a/drivers/gpu/drm/i915/i915_globals.c b/drivers/gpu/drm/i915/i915_globals.c
> new file mode 100644
> index 000000000000..7fd1b3945a04
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/i915_globals.c
> @@ -0,0 +1,113 @@
> +/*
> + * SPDX-License-Identifier: MIT
> + *
> + * Copyright © 2019 Intel Corporation
> + */
> +
> +#include <linux/slab.h>
> +#include <linux/workqueue.h>
> +
> +#include "i915_active.h"
> +#include "i915_globals.h"
> +#include "i915_request.h"
> +#include "i915_scheduler.h"
> +
> +int __init i915_globals_init(void)
> +{
> +	int err;
> +
> +	err = i915_global_active_init();
> +	if (err)
> +		return err;
> +
> +	err = i915_global_request_init();
> +	if (err)
> +		goto err_active;
> +
> +	err = i915_global_scheduler_init();
> +	if (err)
> +		goto err_request;
> +
> +	return 0;
> +
> +err_request:
> +	i915_global_request_exit();
> +err_active:
> +	i915_global_active_exit();
> +	return err;
> +}
> +
> +static void i915_globals_shrink(void)
> +{
> +	/*
> +	 * kmem_cache_shrink() discards empty slabs and reorders partially
> +	 * filled slabs to prioritise allocating from the mostly full slabs,
> +	 * with the aim of reducing fragmentation.
> +	 */
> +	i915_global_active_shrink();
> +	i915_global_request_shrink();
> +	i915_global_scheduler_shrink();
> +}
> +
> +static atomic_t active;
> +static atomic_t epoch;
> +struct park_work {
> +	struct rcu_work work;
> +	int epoch;
> +};
> +
> +static void __i915_globals_park(struct work_struct *work)
> +{
> +	struct park_work *wrk = container_of(work, typeof(*wrk), work.work);
> +
> +	/* Confirm nothing woke up in the last grace period */
> +	if (wrk->epoch == atomic_read(&epoch))
> +		i915_globals_shrink();
> +
> +	kfree(wrk);
> +}
> +
> +void i915_globals_park(void)
> +{
> +	struct park_work *wrk;
> +
> +	/*
> +	 * Defer shrinking the global slab caches (and other work) until
> +	 * after a RCU grace period has completed with no activity. This
> +	 * is to try and reduce the latency impact on the consumers caused
> +	 * by us shrinking the caches the same time as they are trying to
> +	 * allocate, with the assumption being that if we idle long enough
> +	 * for an RCU grace period to elapse since the last use, it is likely
> +	 * to be longer until we need the caches again.
> +	 */
> +	if (!atomic_dec_and_test(&active))
> +		return;
> +
> +	wrk = kmalloc(sizeof(*wrk), GFP_KERNEL);
> +	if (!wrk)
> +		return;
> +
> +	wrk->epoch = atomic_inc_return(&epoch);
> +	INIT_RCU_WORK(&wrk->work, __i915_globals_park);
> +	queue_rcu_work(system_wq, &wrk->work);
> +}
> +
> +void i915_globals_unpark(void)
> +{
> +	atomic_inc(&epoch);
> +	atomic_inc(&active);
> +}
> +
> +void __exit i915_globals_exit(void)
> +{
> +	/* Flush any residual park_work */
> +	rcu_barrier();
> +	flush_scheduled_work();
> +
> +	i915_global_scheduler_exit();
> +	i915_global_request_exit();
> +	i915_global_active_exit();
> +
> +	/* And ensure that our DESTROY_BY_RCU slabs are truly destroyed */
> +	rcu_barrier();
> +}
> diff --git a/drivers/gpu/drm/i915/i915_globals.h b/drivers/gpu/drm/i915/i915_globals.h
> new file mode 100644
> index 000000000000..e468f0413a73
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/i915_globals.h
> @@ -0,0 +1,15 @@
> +/*
> + * SPDX-License-Identifier: MIT
> + *
> + * Copyright © 2019 Intel Corporation
> + */
> +
> +#ifndef _I915_GLOBALS_H_
> +#define _I915_GLOBALS_H_
> +
> +int i915_globals_init(void);
> +void i915_globals_park(void);
> +void i915_globals_unpark(void);
> +void i915_globals_exit(void);
> +
> +#endif /* _I915_GLOBALS_H_ */
> diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
> index c4d6b8da9b03..a9211c370cd1 100644
> --- a/drivers/gpu/drm/i915/i915_pci.c
> +++ b/drivers/gpu/drm/i915/i915_pci.c
> @@ -28,8 +28,8 @@
>   
>   #include <drm/drm_drv.h>
>   
> -#include "i915_active.h"
>   #include "i915_drv.h"
> +#include "i915_globals.h"
>   #include "i915_selftest.h"
>   
>   #define PLATFORM(x) .platform = (x), .platform_mask = BIT(x)
> @@ -802,7 +802,9 @@ static int __init i915_init(void)
>   	bool use_kms = true;
>   	int err;
>   
> -	i915_global_active_init();
> +	err = i915_globals_init();
> +	if (err)
> +		return err;
>   
>   	err = i915_mock_selftests();
>   	if (err)
> @@ -835,7 +837,7 @@ static void __exit i915_exit(void)
>   		return;
>   
>   	pci_unregister_driver(&i915_pci_driver);
> -	i915_global_active_exit();
> +	i915_globals_exit();
>   }
>   
>   module_init(i915_init);
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 00a1ea7cd907..c65f6c990fdd 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -32,6 +32,11 @@
>   #include "i915_active.h"
>   #include "i915_reset.h"
>   
> +static struct i915_global_request {
> +	struct kmem_cache *slab_requests;
> +	struct kmem_cache *slab_dependencies;
> +} global;
> +
>   static const char *i915_fence_get_driver_name(struct dma_fence *fence)
>   {
>   	return "i915";
> @@ -86,7 +91,7 @@ static void i915_fence_release(struct dma_fence *fence)
>   	 */
>   	i915_sw_fence_fini(&rq->submit);
>   
> -	kmem_cache_free(rq->i915->requests, rq);
> +	kmem_cache_free(global.slab_requests, rq);
>   }
>   
>   const struct dma_fence_ops i915_fence_ops = {
> @@ -292,7 +297,7 @@ static void i915_request_retire(struct i915_request *request)
>   
>   	unreserve_gt(request->i915);
>   
> -	i915_sched_node_fini(request->i915, &request->sched);
> +	i915_sched_node_fini(&request->sched);
>   	i915_request_put(request);
>   }
>   
> @@ -506,7 +511,7 @@ i915_request_alloc_slow(struct intel_context *ce)
>   	ring_retire_requests(ring);
>   
>   out:
> -	return kmem_cache_alloc(ce->gem_context->i915->requests, GFP_KERNEL);
> +	return kmem_cache_alloc(global.slab_requests, GFP_KERNEL);
>   }
>   
>   static int add_timeline_barrier(struct i915_request *rq)
> @@ -594,7 +599,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
>   	 *
>   	 * Do not use kmem_cache_zalloc() here!
>   	 */
> -	rq = kmem_cache_alloc(i915->requests,
> +	rq = kmem_cache_alloc(global.slab_requests,
>   			      GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
>   	if (unlikely(!rq)) {
>   		rq = i915_request_alloc_slow(ce);
> @@ -681,7 +686,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
>   	GEM_BUG_ON(!list_empty(&rq->sched.signalers_list));
>   	GEM_BUG_ON(!list_empty(&rq->sched.waiters_list));
>   
> -	kmem_cache_free(i915->requests, rq);
> +	kmem_cache_free(global.slab_requests, rq);
>   err_unreserve:
>   	unreserve_gt(i915);
>   	intel_context_unpin(ce);
> @@ -700,9 +705,7 @@ i915_request_await_request(struct i915_request *to, struct i915_request *from)
>   		return 0;
>   
>   	if (to->engine->schedule) {
> -		ret = i915_sched_node_add_dependency(to->i915,
> -						     &to->sched,
> -						     &from->sched);
> +		ret = i915_sched_node_add_dependency(&to->sched, &from->sched);
>   		if (ret < 0)
>   			return ret;
>   	}
> @@ -1190,3 +1193,37 @@ void i915_retire_requests(struct drm_i915_private *i915)
>   #include "selftests/mock_request.c"
>   #include "selftests/i915_request.c"
>   #endif
> +
> +int __init i915_global_request_init(void)
> +{
> +	global.slab_requests = KMEM_CACHE(i915_request,
> +					  SLAB_HWCACHE_ALIGN |
> +					  SLAB_RECLAIM_ACCOUNT |
> +					  SLAB_TYPESAFE_BY_RCU);
> +	if (!global.slab_requests)
> +		return -ENOMEM;
> +
> +	global.slab_dependencies = KMEM_CACHE(i915_dependency,
> +					      SLAB_HWCACHE_ALIGN |
> +					      SLAB_RECLAIM_ACCOUNT);
> +	if (!global.slab_dependencies)
> +		goto err_requests;
> +
> +	return 0;
> +
> +err_requests:
> +	kmem_cache_destroy(global.slab_requests);
> +	return -ENOMEM;
> +}
> +
> +void i915_global_request_shrink(void)
> +{
> +	kmem_cache_shrink(global.slab_dependencies);
> +	kmem_cache_shrink(global.slab_requests);
> +}
> +
> +void i915_global_request_exit(void)
> +{
> +	kmem_cache_destroy(global.slab_dependencies);
> +	kmem_cache_destroy(global.slab_requests);
> +}
> diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
> index 1e127c1c53fa..be3ded6bcf56 100644
> --- a/drivers/gpu/drm/i915/i915_request.h
> +++ b/drivers/gpu/drm/i915/i915_request.h
> @@ -29,6 +29,7 @@
>   
>   #include "i915_gem.h"
>   #include "i915_scheduler.h"
> +#include "i915_selftest.h"
>   #include "i915_sw_fence.h"
>   
>   #include <uapi/drm/i915_drm.h>
> @@ -196,6 +197,11 @@ struct i915_request {
>   	struct drm_i915_file_private *file_priv;
>   	/** file_priv list entry for this request */
>   	struct list_head client_link;
> +
> +	I915_SELFTEST_DECLARE(struct {
> +		struct list_head link;
> +		unsigned long delay;
> +	} mock;)
>   };
>   
>   #define I915_FENCE_GFP (GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN)
> @@ -371,4 +377,8 @@ static inline void i915_request_mark_complete(struct i915_request *rq)
>   
>   void i915_retire_requests(struct drm_i915_private *i915);
>   
> +int i915_global_request_init(void);
> +void i915_global_request_shrink(void);
> +void i915_global_request_exit(void);
> +
>   #endif /* I915_REQUEST_H */
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
> index 9fb96ff57a29..50018ad30233 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler.c
> +++ b/drivers/gpu/drm/i915/i915_scheduler.c
> @@ -10,6 +10,11 @@
>   #include "i915_request.h"
>   #include "i915_scheduler.h"
>   
> +static struct i915_global_scheduler {
> +	struct kmem_cache *slab_dependencies;
> +	struct kmem_cache *slab_priorities;
> +} global;
> +
>   static DEFINE_SPINLOCK(schedule_lock);
>   
>   static const struct i915_request *
> @@ -37,16 +42,15 @@ void i915_sched_node_init(struct i915_sched_node *node)
>   }
>   
>   static struct i915_dependency *
> -i915_dependency_alloc(struct drm_i915_private *i915)
> +i915_dependency_alloc(void)
>   {
> -	return kmem_cache_alloc(i915->dependencies, GFP_KERNEL);
> +	return kmem_cache_alloc(global.slab_dependencies, GFP_KERNEL);
>   }
>   
>   static void
> -i915_dependency_free(struct drm_i915_private *i915,
> -		     struct i915_dependency *dep)
> +i915_dependency_free(struct i915_dependency *dep)
>   {
> -	kmem_cache_free(i915->dependencies, dep);
> +	kmem_cache_free(global.slab_dependencies, dep);
>   }
>   
>   bool __i915_sched_node_add_dependency(struct i915_sched_node *node,
> @@ -73,25 +77,23 @@ bool __i915_sched_node_add_dependency(struct i915_sched_node *node,
>   	return ret;
>   }
>   
> -int i915_sched_node_add_dependency(struct drm_i915_private *i915,
> -				   struct i915_sched_node *node,
> +int i915_sched_node_add_dependency(struct i915_sched_node *node,
>   				   struct i915_sched_node *signal)
>   {
>   	struct i915_dependency *dep;
>   
> -	dep = i915_dependency_alloc(i915);
> +	dep = i915_dependency_alloc();
>   	if (!dep)
>   		return -ENOMEM;
>   
>   	if (!__i915_sched_node_add_dependency(node, signal, dep,
>   					      I915_DEPENDENCY_ALLOC))
> -		i915_dependency_free(i915, dep);
> +		i915_dependency_free(dep);
>   
>   	return 0;
>   }
>   
> -void i915_sched_node_fini(struct drm_i915_private *i915,
> -			  struct i915_sched_node *node)
> +void i915_sched_node_fini(struct i915_sched_node *node)
>   {
>   	struct i915_dependency *dep, *tmp;
>   
> @@ -111,7 +113,7 @@ void i915_sched_node_fini(struct drm_i915_private *i915,
>   
>   		list_del(&dep->wait_link);
>   		if (dep->flags & I915_DEPENDENCY_ALLOC)
> -			i915_dependency_free(i915, dep);
> +			i915_dependency_free(dep);
>   	}
>   
>   	/* Remove ourselves from everyone who depends upon us */
> @@ -121,7 +123,7 @@ void i915_sched_node_fini(struct drm_i915_private *i915,
>   
>   		list_del(&dep->signal_link);
>   		if (dep->flags & I915_DEPENDENCY_ALLOC)
> -			i915_dependency_free(i915, dep);
> +			i915_dependency_free(dep);
>   	}
>   
>   	spin_unlock(&schedule_lock);
> @@ -198,7 +200,7 @@ i915_sched_lookup_priolist(struct intel_engine_cs *engine, int prio)
>   	if (prio == I915_PRIORITY_NORMAL) {
>   		p = &execlists->default_priolist;
>   	} else {
> -		p = kmem_cache_alloc(engine->i915->priorities, GFP_ATOMIC);
> +		p = kmem_cache_alloc(global.slab_priorities, GFP_ATOMIC);
>   		/* Convert an allocation failure to a priority bump */
>   		if (unlikely(!p)) {
>   			prio = I915_PRIORITY_NORMAL; /* recurses just once */
> @@ -423,3 +425,39 @@ void i915_schedule_bump_priority(struct i915_request *rq, unsigned int bump)
>   
>   	spin_unlock_bh(&schedule_lock);
>   }
> +
> +void __i915_priolist_free(struct i915_priolist *p)
> +{
> +	kmem_cache_free(global.slab_priorities, p);
> +}
> +
> +int __init i915_global_scheduler_init(void)
> +{
> +	global.slab_dependencies = KMEM_CACHE(i915_dependency,
> +					      SLAB_HWCACHE_ALIGN);
> +	if (!global.slab_dependencies)
> +		return -ENOMEM;
> +
> +	global.slab_priorities = KMEM_CACHE(i915_priolist,
> +					    SLAB_HWCACHE_ALIGN);
> +	if (!global.slab_priorities)
> +		goto err_priorities;
> +
> +	return 0;
> +
> +err_priorities:
> +	kmem_cache_destroy(global.slab_priorities);
> +	return -ENOMEM;
> +}
> +
> +void i915_global_scheduler_shrink(void)
> +{
> +	kmem_cache_shrink(global.slab_dependencies);
> +	kmem_cache_shrink(global.slab_priorities);
> +}
> +
> +void i915_global_scheduler_exit(void)
> +{
> +	kmem_cache_destroy(global.slab_dependencies);
> +	kmem_cache_destroy(global.slab_priorities);
> +}
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
> index 54bd6c89817e..5196ce07b6c2 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler.h
> +++ b/drivers/gpu/drm/i915/i915_scheduler.h
> @@ -85,6 +85,23 @@ struct i915_dependency {
>   #define I915_DEPENDENCY_ALLOC BIT(0)
>   };
>   
> +struct i915_priolist {
> +	struct list_head requests[I915_PRIORITY_COUNT];
> +	struct rb_node node;
> +	unsigned long used;
> +	int priority;
> +};
> +
> +#define priolist_for_each_request(it, plist, idx) \
> +	for (idx = 0; idx < ARRAY_SIZE((plist)->requests); idx++) \
> +		list_for_each_entry(it, &(plist)->requests[idx], sched.link)
> +
> +#define priolist_for_each_request_consume(it, n, plist, idx) \
> +	for (; (idx = ffs((plist)->used)); (plist)->used &= ~BIT(idx - 1)) \
> +		list_for_each_entry_safe(it, n, \
> +					 &(plist)->requests[idx - 1], \
> +					 sched.link)
> +
>   void i915_sched_node_init(struct i915_sched_node *node);
>   
>   bool __i915_sched_node_add_dependency(struct i915_sched_node *node,
> @@ -92,12 +109,10 @@ bool __i915_sched_node_add_dependency(struct i915_sched_node *node,
>   				      struct i915_dependency *dep,
>   				      unsigned long flags);
>   
> -int i915_sched_node_add_dependency(struct drm_i915_private *i915,
> -				   struct i915_sched_node *node,
> +int i915_sched_node_add_dependency(struct i915_sched_node *node,
>   				   struct i915_sched_node *signal);
>   
> -void i915_sched_node_fini(struct drm_i915_private *i915,
> -			  struct i915_sched_node *node);
> +void i915_sched_node_fini(struct i915_sched_node *node);
>   
>   void i915_schedule(struct i915_request *request,
>   		   const struct i915_sched_attr *attr);
> @@ -107,4 +122,15 @@ void i915_schedule_bump_priority(struct i915_request *rq, unsigned int bump);
>   struct list_head *
>   i915_sched_lookup_priolist(struct intel_engine_cs *engine, int prio);
>   
> +void __i915_priolist_free(struct i915_priolist *p);
> +static inline void i915_priolist_free(struct i915_priolist *p)
> +{
> +	if (p->priority != I915_PRIORITY_NORMAL)
> +		__i915_priolist_free(p);
> +}
> +
> +int i915_global_scheduler_init(void);
> +void i915_global_scheduler_shrink(void);
> +void i915_global_scheduler_exit(void);
> +
>   #endif /* _I915_SCHEDULER_H_ */
> diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
> index a2846ea1e62c..56ba2fcbabe6 100644
> --- a/drivers/gpu/drm/i915/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/intel_guc_submission.c
> @@ -781,8 +781,7 @@ static bool __guc_dequeue(struct intel_engine_cs *engine)
>   		}
>   
>   		rb_erase_cached(&p->node, &execlists->queue);
> -		if (p->priority != I915_PRIORITY_NORMAL)
> -			kmem_cache_free(engine->i915->priorities, p);
> +		i915_priolist_free(p);
>   	}
>   done:
>   	execlists->queue_priority_hint =
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index dba19baf6808..29b2a2f34edb 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -818,8 +818,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   		}
>   
>   		rb_erase_cached(&p->node, &execlists->queue);
> -		if (p->priority != I915_PRIORITY_NORMAL)
> -			kmem_cache_free(engine->i915->priorities, p);
> +		i915_priolist_free(p);
>   	}
>   
>   done:
> @@ -972,8 +971,7 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine)
>   		}
>   
>   		rb_erase_cached(&p->node, &execlists->queue);
> -		if (p->priority != I915_PRIORITY_NORMAL)
> -			kmem_cache_free(engine->i915->priorities, p);
> +		i915_priolist_free(p);
>   	}
>   
>   	/* Remaining _unready_ requests will be nop'ed when submitted */
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index de8dba7565b0..5284f243931a 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -187,23 +187,6 @@ enum intel_engine_id {
>   #define _VECS(n) (VECS + (n))
>   };
>   
> -struct i915_priolist {
> -	struct list_head requests[I915_PRIORITY_COUNT];
> -	struct rb_node node;
> -	unsigned long used;
> -	int priority;
> -};
> -
> -#define priolist_for_each_request(it, plist, idx) \
> -	for (idx = 0; idx < ARRAY_SIZE((plist)->requests); idx++) \
> -		list_for_each_entry(it, &(plist)->requests[idx], sched.link)
> -
> -#define priolist_for_each_request_consume(it, n, plist, idx) \
> -	for (; (idx = ffs((plist)->used)); (plist)->used &= ~BIT(idx - 1)) \
> -		list_for_each_entry_safe(it, n, \
> -					 &(plist)->requests[idx - 1], \
> -					 sched.link)
> -
>   struct st_preempt_hang {
>   	struct completion completion;
>   	unsigned int count;
> diff --git a/drivers/gpu/drm/i915/selftests/intel_lrc.c b/drivers/gpu/drm/i915/selftests/intel_lrc.c
> index 7172f6c7f25a..2d582a21eba9 100644
> --- a/drivers/gpu/drm/i915/selftests/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/selftests/intel_lrc.c
> @@ -441,7 +441,7 @@ static struct i915_request *dummy_request(struct intel_engine_cs *engine)
>   static void dummy_request_free(struct i915_request *dummy)
>   {
>   	i915_request_mark_complete(dummy);
> -	i915_sched_node_fini(dummy->engine->i915, &dummy->sched);
> +	i915_sched_node_fini(&dummy->sched);
>   	i915_sw_fence_fini(&dummy->submit);
>   
>   	dma_fence_free(&dummy->fence);
> diff --git a/drivers/gpu/drm/i915/selftests/mock_engine.c b/drivers/gpu/drm/i915/selftests/mock_engine.c
> index 6f3fb803c747..ec1ae948954c 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_engine.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_engine.c
> @@ -76,26 +76,26 @@ static void mock_ring_free(struct intel_ring *base)
>   	kfree(ring);
>   }
>   
> -static struct mock_request *first_request(struct mock_engine *engine)
> +static struct i915_request *first_request(struct mock_engine *engine)
>   {
>   	return list_first_entry_or_null(&engine->hw_queue,
> -					struct mock_request,
> -					link);
> +					struct i915_request,
> +					mock.link);
>   }
>   
> -static void advance(struct mock_request *request)
> +static void advance(struct i915_request *request)
>   {
> -	list_del_init(&request->link);
> -	i915_request_mark_complete(&request->base);
> -	GEM_BUG_ON(!i915_request_completed(&request->base));
> +	list_del_init(&request->mock.link);
> +	i915_request_mark_complete(request);
> +	GEM_BUG_ON(!i915_request_completed(request));
>   
> -	intel_engine_queue_breadcrumbs(request->base.engine);
> +	intel_engine_queue_breadcrumbs(request->engine);
>   }
>   
>   static void hw_delay_complete(struct timer_list *t)
>   {
>   	struct mock_engine *engine = from_timer(engine, t, hw_delay);
> -	struct mock_request *request;
> +	struct i915_request *request;
>   	unsigned long flags;
>   
>   	spin_lock_irqsave(&engine->hw_lock, flags);
> @@ -110,8 +110,9 @@ static void hw_delay_complete(struct timer_list *t)
>   	 * requeue the timer for the next delayed request.
>   	 */
>   	while ((request = first_request(engine))) {
> -		if (request->delay) {
> -			mod_timer(&engine->hw_delay, jiffies + request->delay);
> +		if (request->mock.delay) {
> +			mod_timer(&engine->hw_delay,
> +				  jiffies + request->mock.delay);
>   			break;
>   		}
>   
> @@ -169,10 +170,8 @@ mock_context_pin(struct intel_engine_cs *engine,
>   
>   static int mock_request_alloc(struct i915_request *request)
>   {
> -	struct mock_request *mock = container_of(request, typeof(*mock), base);
> -
> -	INIT_LIST_HEAD(&mock->link);
> -	mock->delay = 0;
> +	INIT_LIST_HEAD(&request->mock.link);
> +	request->mock.delay = 0;
>   
>   	return 0;
>   }
> @@ -190,7 +189,6 @@ static u32 *mock_emit_breadcrumb(struct i915_request *request, u32 *cs)
>   
>   static void mock_submit_request(struct i915_request *request)
>   {
> -	struct mock_request *mock = container_of(request, typeof(*mock), base);
>   	struct mock_engine *engine =
>   		container_of(request->engine, typeof(*engine), base);
>   	unsigned long flags;
> @@ -198,12 +196,13 @@ static void mock_submit_request(struct i915_request *request)
>   	i915_request_submit(request);
>   
>   	spin_lock_irqsave(&engine->hw_lock, flags);
> -	list_add_tail(&mock->link, &engine->hw_queue);
> -	if (mock->link.prev == &engine->hw_queue) {
> -		if (mock->delay)
> -			mod_timer(&engine->hw_delay, jiffies + mock->delay);
> +	list_add_tail(&request->mock.link, &engine->hw_queue);
> +	if (list_is_first(&request->mock.link, &engine->hw_queue)) {
> +		if (request->mock.delay)
> +			mod_timer(&engine->hw_delay,
> +				  jiffies + request->mock.delay);
>   		else
> -			advance(mock);
> +			advance(request);
>   	}
>   	spin_unlock_irqrestore(&engine->hw_lock, flags);
>   }
> @@ -263,12 +262,12 @@ void mock_engine_flush(struct intel_engine_cs *engine)
>   {
>   	struct mock_engine *mock =
>   		container_of(engine, typeof(*mock), base);
> -	struct mock_request *request, *rn;
> +	struct i915_request *request, *rn;
>   
>   	del_timer_sync(&mock->hw_delay);
>   
>   	spin_lock_irq(&mock->hw_lock);
> -	list_for_each_entry_safe(request, rn, &mock->hw_queue, link)
> +	list_for_each_entry_safe(request, rn, &mock->hw_queue, mock.link)
>   		advance(request);
>   	spin_unlock_irq(&mock->hw_lock);
>   }
> diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> index fc516a2970f4..5a98caba6d69 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> @@ -79,9 +79,6 @@ static void mock_device_release(struct drm_device *dev)
>   
>   	destroy_workqueue(i915->wq);
>   
> -	kmem_cache_destroy(i915->priorities);
> -	kmem_cache_destroy(i915->dependencies);
> -	kmem_cache_destroy(i915->requests);
>   	kmem_cache_destroy(i915->vmas);
>   	kmem_cache_destroy(i915->objects);
>   
> @@ -211,23 +208,6 @@ struct drm_i915_private *mock_gem_device(void)
>   	if (!i915->vmas)
>   		goto err_objects;
>   
> -	i915->requests = KMEM_CACHE(mock_request,
> -				    SLAB_HWCACHE_ALIGN |
> -				    SLAB_RECLAIM_ACCOUNT |
> -				    SLAB_TYPESAFE_BY_RCU);
> -	if (!i915->requests)
> -		goto err_vmas;
> -
> -	i915->dependencies = KMEM_CACHE(i915_dependency,
> -					SLAB_HWCACHE_ALIGN |
> -					SLAB_RECLAIM_ACCOUNT);
> -	if (!i915->dependencies)
> -		goto err_requests;
> -
> -	i915->priorities = KMEM_CACHE(i915_priolist, SLAB_HWCACHE_ALIGN);
> -	if (!i915->priorities)
> -		goto err_dependencies;
> -
>   	i915_timelines_init(i915);
>   
>   	INIT_LIST_HEAD(&i915->gt.active_rings);
> @@ -257,12 +237,6 @@ struct drm_i915_private *mock_gem_device(void)
>   err_unlock:
>   	mutex_unlock(&i915->drm.struct_mutex);
>   	i915_timelines_fini(i915);
> -	kmem_cache_destroy(i915->priorities);
> -err_dependencies:
> -	kmem_cache_destroy(i915->dependencies);
> -err_requests:
> -	kmem_cache_destroy(i915->requests);
> -err_vmas:
>   	kmem_cache_destroy(i915->vmas);
>   err_objects:
>   	kmem_cache_destroy(i915->objects);
> diff --git a/drivers/gpu/drm/i915/selftests/mock_request.c b/drivers/gpu/drm/i915/selftests/mock_request.c
> index 0dc29e242597..d1a7c9608712 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_request.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_request.c
> @@ -31,29 +31,25 @@ mock_request(struct intel_engine_cs *engine,
>   	     unsigned long delay)
>   {
>   	struct i915_request *request;
> -	struct mock_request *mock;
>   
>   	/* NB the i915->requests slab cache is enlarged to fit mock_request */
>   	request = i915_request_alloc(engine, context);
>   	if (IS_ERR(request))
>   		return NULL;
>   
> -	mock = container_of(request, typeof(*mock), base);
> -	mock->delay = delay;
> -
> -	return &mock->base;
> +	request->mock.delay = delay;
> +	return request;
>   }
>   
>   bool mock_cancel_request(struct i915_request *request)
>   {
> -	struct mock_request *mock = container_of(request, typeof(*mock), base);
>   	struct mock_engine *engine =
>   		container_of(request->engine, typeof(*engine), base);
>   	bool was_queued;
>   
>   	spin_lock_irq(&engine->hw_lock);
> -	was_queued = !list_empty(&mock->link);
> -	list_del_init(&mock->link);
> +	was_queued = !list_empty(&request->mock.link);
> +	list_del_init(&request->mock.link);
>   	spin_unlock_irq(&engine->hw_lock);
>   
>   	if (was_queued)
> diff --git a/drivers/gpu/drm/i915/selftests/mock_request.h b/drivers/gpu/drm/i915/selftests/mock_request.h
> index 995fb728380c..4acf0211df20 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_request.h
> +++ b/drivers/gpu/drm/i915/selftests/mock_request.h
> @@ -29,13 +29,6 @@
>   
>   #include "../i915_request.h"
>   
> -struct mock_request {
> -	struct i915_request base;
> -
> -	struct list_head link;
> -	unsigned long delay;
> -};
> -
>   struct i915_request *
>   mock_request(struct intel_engine_cs *engine,
>   	     struct i915_gem_context *context,
> 

* Re: [PATCH 04/11] drm/i915: Make request allocation caches global
  2019-02-27 10:29   ` Tvrtko Ursulin
@ 2019-02-27 10:44     ` Chris Wilson
  2019-02-27 14:17       ` Tvrtko Ursulin
  0 siblings, 1 reply; 43+ messages in thread
From: Chris Wilson @ 2019-02-27 10:44 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-02-27 10:29:43)
> 
> On 26/02/2019 10:23, Chris Wilson wrote:
> > As kmem_caches share the same properties (size, allocation/free behaviour)
> > for all potential devices, we can use global caches. While this
> > potentially has worse fragmentation behaviour (one can argue that
> > different devices would have different activity lifetimes, but you can
> > also argue that activity is temporal across the system) it is the
> > default behaviour of the system at large to amalgamate matching caches.
> > 
> > The benefit for us is much reduced pointer dancing along the frequent
> > allocation paths.
> > 
> > v2: Defer shrinking until after a global grace period for futureproofing
> > multiple consumers of the slab caches, similar to the current strategy
> > for avoiding shrinking too early.
> 
> I suggested calling i915_globals_park directly from __i915_gem_park, for 
> symmetry with how i915_gem_unpark calls i915_globals_unpark. 
> i915_globals has its own delayed setup, so I don't think it benefits 
> from the double indirection of being called via shrink_caches.

I replied that I had left that change for a later patch, after the final
conversions, mostly so that we have a standalone patch to revert if the
rcu_work turns out badly. In this patch, the intent was a simple
translation over to global_shrink, except you asked for it to be truly
global, and so we needed another layer of counters.
-Chris

* Re: [PATCH 06/11] drm/i915: Keep timeline HWSP allocated until idle across the system
  2019-02-26 10:23 ` [PATCH 06/11] drm/i915: Keep timeline HWSP allocated until idle across the system Chris Wilson
@ 2019-02-27 10:44   ` Tvrtko Ursulin
  2019-02-27 10:51     ` Chris Wilson
  2019-02-27 11:15   ` [PATCH] " Chris Wilson
  1 sibling, 1 reply; 43+ messages in thread
From: Tvrtko Ursulin @ 2019-02-27 10:44 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 26/02/2019 10:23, Chris Wilson wrote:
> In preparation for enabling HW semaphores, we need to keep an in-flight
> timeline's HWSP alive until its use across the entire system has completed,
> as any other timeline active on the GPU may still refer back to the
> already retired timeline. We have to delay both recycling available
> cachelines and unpinning the old HWSP until the next idle point.
> 
> An easy option would be to simply keep all used HWSP until the system as
> a whole was idle, i.e. we could release them all at once on parking.
> However, on a busy system, we may never see a global idle point,
> essentially meaning the resource will be leaked until we are forced to
> do a GC pass. We already employ a fine-grained idle detection mechanism
> for vma, which we can reuse here so that each cacheline can be freed
> immediately after the last request using it is retired.
> 
> v3: Keep track of the activity of each cacheline.
> v4: cacheline_free() on canceling the seqno tracking
> v5: Finally with a testcase to exercise wraparound
> v6: Pack cacheline into empty bits of page-aligned vaddr

Have you considered using page_(mask/unmask/pack/unpack)_bits helpers 
instead of open coding it?

Is it the sharing of those spare bits between two variables that makes 
the helpers a less elegant fit here?
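
For reference, a rough sketch of the packing I mean, with hypothetical
names (hwsp_pack/hwsp_unpack are not the existing helpers) and assuming
4K pages with 64B cachelines:

#define HWSP_CACHELINE_BITS 6 /* 64 cachelines of 64 bytes per 4K page */

static inline void *hwsp_pack(void *page_aligned_vaddr, unsigned long cacheline)
{
	/* page alignment leaves the low PAGE_SHIFT bits free for the index */
	WARN_ON((unsigned long)page_aligned_vaddr & ~PAGE_MASK);
	WARN_ON(cacheline >= BIT(HWSP_CACHELINE_BITS));
	return (void *)((unsigned long)page_aligned_vaddr | cacheline);
}

static inline void *hwsp_unpack(void *packed, unsigned long *cacheline)
{
	*cacheline = (unsigned long)packed & ~PAGE_MASK;
	return (void *)((unsigned long)packed & PAGE_MASK);
}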

Regards,

Tvrtko

> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> #v5
> ---
>   drivers/gpu/drm/i915/i915_request.c           |  31 +-
>   drivers/gpu/drm/i915/i915_request.h           |  11 +
>   drivers/gpu/drm/i915/i915_timeline.c          | 290 ++++++++++++++++--
>   drivers/gpu/drm/i915/i915_timeline.h          |  11 +-
>   .../gpu/drm/i915/selftests/i915_timeline.c    | 113 +++++++
>   5 files changed, 417 insertions(+), 39 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 719d1a5ab082..d354967d6ae8 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -325,11 +325,6 @@ void i915_request_retire_upto(struct i915_request *rq)
>   	} while (tmp != rq);
>   }
>   
> -static u32 timeline_get_seqno(struct i915_timeline *tl)
> -{
> -	return tl->seqno += 1 + tl->has_initial_breadcrumb;
> -}
> -
>   static void move_to_timeline(struct i915_request *request,
>   			     struct i915_timeline *timeline)
>   {
> @@ -532,8 +527,10 @@ struct i915_request *
>   i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
>   {
>   	struct drm_i915_private *i915 = engine->i915;
> -	struct i915_request *rq;
>   	struct intel_context *ce;
> +	struct i915_timeline *tl;
> +	struct i915_request *rq;
> +	u32 seqno;
>   	int ret;
>   
>   	lockdep_assert_held(&i915->drm.struct_mutex);
> @@ -610,24 +607,27 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
>   		}
>   	}
>   
> -	rq->rcustate = get_state_synchronize_rcu();
> -
>   	INIT_LIST_HEAD(&rq->active_list);
> +
> +	tl = ce->ring->timeline;
> +	ret = i915_timeline_get_seqno(tl, rq, &seqno);
> +	if (ret)
> +		goto err_free;
> +
>   	rq->i915 = i915;
>   	rq->engine = engine;
>   	rq->gem_context = ctx;
>   	rq->hw_context = ce;
>   	rq->ring = ce->ring;
> -	rq->timeline = ce->ring->timeline;
> +	rq->timeline = tl;
>   	GEM_BUG_ON(rq->timeline == &engine->timeline);
> -	rq->hwsp_seqno = rq->timeline->hwsp_seqno;
> +	rq->hwsp_seqno = tl->hwsp_seqno;
> +	rq->hwsp_cacheline = tl->hwsp_cacheline;
> +	rq->rcustate = get_state_synchronize_rcu(); /* acts as smp_mb() */
>   
>   	spin_lock_init(&rq->lock);
> -	dma_fence_init(&rq->fence,
> -		       &i915_fence_ops,
> -		       &rq->lock,
> -		       rq->timeline->fence_context,
> -		       timeline_get_seqno(rq->timeline));
> +	dma_fence_init(&rq->fence, &i915_fence_ops, &rq->lock,
> +		       tl->fence_context, seqno);
>   
>   	/* We bump the ref for the fence chain */
>   	i915_sw_fence_init(&i915_request_get(rq)->submit, submit_notify);
> @@ -687,6 +687,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
>   	GEM_BUG_ON(!list_empty(&rq->sched.signalers_list));
>   	GEM_BUG_ON(!list_empty(&rq->sched.waiters_list));
>   
> +err_free:
>   	kmem_cache_free(global.slab_requests, rq);
>   err_unreserve:
>   	mutex_unlock(&ce->ring->timeline->mutex);
> diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
> index be3ded6bcf56..ea1e6f0ade53 100644
> --- a/drivers/gpu/drm/i915/i915_request.h
> +++ b/drivers/gpu/drm/i915/i915_request.h
> @@ -38,6 +38,7 @@ struct drm_file;
>   struct drm_i915_gem_object;
>   struct i915_request;
>   struct i915_timeline;
> +struct i915_timeline_cacheline;
>   
>   struct i915_capture_list {
>   	struct i915_capture_list *next;
> @@ -148,6 +149,16 @@ struct i915_request {
>   	 */
>   	const u32 *hwsp_seqno;
>   
> +	/*
> +	 * If we need to access the timeline's seqno for this request in
> +	 * another request, we need to keep a read reference to this associated
> +	 * cacheline, so that we do not free and recycle it before the foriegn
> +	 * observers have completed. Hence, we keep a pointer to the cacheline
> +	 * inside the timeline's HWSP vma, but it is only valid while this
> +	 * request has not completed and guarded by the timeline mutex.
> +	 */
> +	struct i915_timeline_cacheline *hwsp_cacheline;
> +
>   	/** Position in the ring of the start of the request */
>   	u32 head;
>   
> diff --git a/drivers/gpu/drm/i915/i915_timeline.c b/drivers/gpu/drm/i915/i915_timeline.c
> index 87a80558da28..f976b5a13a47 100644
> --- a/drivers/gpu/drm/i915/i915_timeline.c
> +++ b/drivers/gpu/drm/i915/i915_timeline.c
> @@ -6,19 +6,29 @@
>   
>   #include "i915_drv.h"
>   
> -#include "i915_timeline.h"
> +#include "i915_active.h"
>   #include "i915_syncmap.h"
> +#include "i915_timeline.h"
>   
>   struct i915_timeline_hwsp {
> -	struct i915_vma *vma;
> +	struct i915_gt_timelines *gt;
>   	struct list_head free_link;
> +	struct i915_vma *vma;
>   	u64 free_bitmap;
>   };
>   
> -static inline struct i915_timeline_hwsp *
> -i915_timeline_hwsp(const struct i915_timeline *tl)
> +struct i915_timeline_cacheline {
> +	struct i915_active active;
> +	struct i915_timeline_hwsp *hwsp;
> +	unsigned long vaddr;
> +#define CACHELINE_MASK GENMASK(5, 0)
> +#define CACHELINE_FREE BIT(6)
> +};
> +
> +static inline struct drm_i915_private *
> +hwsp_to_i915(struct i915_timeline_hwsp *hwsp)
>   {
> -	return tl->hwsp_ggtt->private;
> +	return container_of(hwsp->gt, struct drm_i915_private, gt.timelines);
>   }
>   
>   static struct i915_vma *__hwsp_alloc(struct drm_i915_private *i915)
> @@ -71,6 +81,7 @@ hwsp_alloc(struct i915_timeline *timeline, unsigned int *cacheline)
>   		vma->private = hwsp;
>   		hwsp->vma = vma;
>   		hwsp->free_bitmap = ~0ull;
> +		hwsp->gt = gt;
>   
>   		spin_lock(&gt->hwsp_lock);
>   		list_add(&hwsp->free_link, &gt->hwsp_free_list);
> @@ -88,14 +99,9 @@ hwsp_alloc(struct i915_timeline *timeline, unsigned int *cacheline)
>   	return hwsp->vma;
>   }
>   
> -static void hwsp_free(struct i915_timeline *timeline)
> +static void __idle_hwsp_free(struct i915_timeline_hwsp *hwsp, int cacheline)
>   {
> -	struct i915_gt_timelines *gt = &timeline->i915->gt.timelines;
> -	struct i915_timeline_hwsp *hwsp;
> -
> -	hwsp = i915_timeline_hwsp(timeline);
> -	if (!hwsp) /* leave global HWSP alone! */
> -		return;
> +	struct i915_gt_timelines *gt = hwsp->gt;
>   
>   	spin_lock(&gt->hwsp_lock);
>   
> @@ -103,7 +109,8 @@ static void hwsp_free(struct i915_timeline *timeline)
>   	if (!hwsp->free_bitmap)
>   		list_add_tail(&hwsp->free_link, &gt->hwsp_free_list);
>   
> -	hwsp->free_bitmap |= BIT_ULL(timeline->hwsp_offset / CACHELINE_BYTES);
> +	GEM_BUG_ON(cacheline >= BITS_PER_TYPE(hwsp->free_bitmap));
> +	hwsp->free_bitmap |= BIT_ULL(cacheline);
>   
>   	/* And if no one is left using it, give the page back to the system */
>   	if (hwsp->free_bitmap == ~0ull) {
> @@ -115,6 +122,77 @@ static void hwsp_free(struct i915_timeline *timeline)
>   	spin_unlock(&gt->hwsp_lock);
>   }
>   
> +static void __idle_cacheline_free(struct i915_timeline_cacheline *cl)
> +{
> +	GEM_BUG_ON(!i915_active_is_idle(&cl->active));
> +
> +	i915_gem_object_unpin_map(cl->hwsp->vma->obj);
> +	i915_vma_put(cl->hwsp->vma);
> +	__idle_hwsp_free(cl->hwsp, cl->vaddr & CACHELINE_MASK);
> +
> +	i915_active_fini(&cl->active);
> +	kfree(cl);
> +}
> +
> +static void __cacheline_retire(struct i915_active *active)
> +{
> +	struct i915_timeline_cacheline *cl =
> +		container_of(active, typeof(*cl), active);
> +
> +	i915_vma_unpin(cl->hwsp->vma);
> +	if (cl->vaddr & CACHELINE_FREE)
> +		__idle_cacheline_free(cl);
> +}
> +
> +static struct i915_timeline_cacheline *
> +cacheline_alloc(struct i915_timeline_hwsp *hwsp, unsigned int cacheline)
> +{
> +	struct i915_timeline_cacheline *cl;
> +	void *vaddr;
> +
> +	GEM_BUG_ON(cacheline > CACHELINE_MASK);
> +
> +	cl = kmalloc(sizeof(*cl), GFP_KERNEL);
> +	if (!cl)
> +		return ERR_PTR(-ENOMEM);
> +
> +	vaddr = i915_gem_object_pin_map(hwsp->vma->obj, I915_MAP_WB);
> +	if (IS_ERR(vaddr)) {
> +		kfree(cl);
> +		return ERR_CAST(vaddr);
> +	}
> +
> +	i915_vma_get(hwsp->vma);
> +	cl->hwsp = hwsp;
> +	cl->vaddr = (unsigned long)vaddr;
> +	cl->vaddr |= cacheline;
> +
> +	i915_active_init(hwsp_to_i915(hwsp), &cl->active, __cacheline_retire);
> +
> +	return cl;
> +}
> +
> +static void cacheline_acquire(struct i915_timeline_cacheline *cl)
> +{
> +	if (cl && i915_active_acquire(&cl->active))
> +		__i915_vma_pin(cl->hwsp->vma);
> +}
> +
> +static void cacheline_release(struct i915_timeline_cacheline *cl)
> +{
> +	if (cl)
> +		i915_active_release(&cl->active);
> +}
> +
> +static void cacheline_free(struct i915_timeline_cacheline *cl)
> +{
> +	GEM_BUG_ON(cl->vaddr & CACHELINE_FREE);
> +	cl->vaddr |= CACHELINE_FREE;
> +
> +	if (i915_active_is_idle(&cl->active))
> +		__idle_cacheline_free(cl);
> +}
> +
>   int i915_timeline_init(struct drm_i915_private *i915,
>   		       struct i915_timeline *timeline,
>   		       const char *name,
> @@ -136,29 +214,40 @@ int i915_timeline_init(struct drm_i915_private *i915,
>   	timeline->name = name;
>   	timeline->pin_count = 0;
>   	timeline->has_initial_breadcrumb = !hwsp;
> +	timeline->hwsp_cacheline = NULL;
>   
> -	timeline->hwsp_offset = I915_GEM_HWS_SEQNO_ADDR;
>   	if (!hwsp) {
> +		struct i915_timeline_cacheline *cl;
>   		unsigned int cacheline;
>   
>   		hwsp = hwsp_alloc(timeline, &cacheline);
>   		if (IS_ERR(hwsp))
>   			return PTR_ERR(hwsp);
>   
> +		cl = cacheline_alloc(hwsp->private, cacheline);
> +		if (IS_ERR(cl)) {
> +			__idle_hwsp_free(hwsp->private, cacheline);
> +			return PTR_ERR(cl);
> +		}
> +
> +		timeline->hwsp_cacheline = cl;
>   		timeline->hwsp_offset = cacheline * CACHELINE_BYTES;
> -	}
> -	timeline->hwsp_ggtt = i915_vma_get(hwsp);
>   
> -	vaddr = i915_gem_object_pin_map(hwsp->obj, I915_MAP_WB);
> -	if (IS_ERR(vaddr)) {
> -		hwsp_free(timeline);
> -		i915_vma_put(hwsp);
> -		return PTR_ERR(vaddr);
> +		vaddr = (void *)(cl->vaddr & PAGE_MASK);
> +	} else {
> +		timeline->hwsp_offset = I915_GEM_HWS_SEQNO_ADDR;
> +
> +		vaddr = i915_gem_object_pin_map(hwsp->obj, I915_MAP_WB);
> +		if (IS_ERR(vaddr))
> +			return PTR_ERR(vaddr);
>   	}
>   
>   	timeline->hwsp_seqno =
>   		memset(vaddr + timeline->hwsp_offset, 0, CACHELINE_BYTES);
>   
> +	timeline->hwsp_ggtt = i915_vma_get(hwsp);
> +	GEM_BUG_ON(timeline->hwsp_offset >= hwsp->size);
> +
>   	timeline->fence_context = dma_fence_context_alloc(1);
>   
>   	spin_lock_init(&timeline->lock);
> @@ -240,9 +329,12 @@ void i915_timeline_fini(struct i915_timeline *timeline)
>   	GEM_BUG_ON(i915_active_request_isset(&timeline->barrier));
>   
>   	i915_syncmap_free(&timeline->sync);
> -	hwsp_free(timeline);
>   
> -	i915_gem_object_unpin_map(timeline->hwsp_ggtt->obj);
> +	if (timeline->hwsp_cacheline)
> +		cacheline_free(timeline->hwsp_cacheline);
> +	else
> +		i915_gem_object_unpin_map(timeline->hwsp_ggtt->obj);
> +
>   	i915_vma_put(timeline->hwsp_ggtt);
>   }
>   
> @@ -285,6 +377,7 @@ int i915_timeline_pin(struct i915_timeline *tl)
>   		i915_ggtt_offset(tl->hwsp_ggtt) +
>   		offset_in_page(tl->hwsp_offset);
>   
> +	cacheline_acquire(tl->hwsp_cacheline);
>   	timeline_add_to_active(tl);
>   
>   	return 0;
> @@ -294,6 +387,156 @@ int i915_timeline_pin(struct i915_timeline *tl)
>   	return err;
>   }
>   
> +static u32 timeline_advance(struct i915_timeline *tl)
> +{
> +	GEM_BUG_ON(!tl->pin_count);
> +	GEM_BUG_ON(tl->seqno & tl->has_initial_breadcrumb);
> +
> +	return tl->seqno += 1 + tl->has_initial_breadcrumb;
> +}
> +
> +static void timeline_rollback(struct i915_timeline *tl)
> +{
> +	tl->seqno -= 1 + tl->has_initial_breadcrumb;
> +}
> +
> +static noinline int
> +__i915_timeline_get_seqno(struct i915_timeline *tl,
> +			  struct i915_request *rq,
> +			  u32 *seqno)
> +{
> +	struct i915_timeline_cacheline *cl;
> +	unsigned int cacheline;
> +	struct i915_vma *vma;
> +	void *vaddr;
> +	int err;
> +
> +	/*
> +	 * If there is an outstanding GPU reference to this cacheline,
> +	 * such as it being sampled by a HW semaphore on another timeline,
> +	 * we cannot wraparound our seqno value (the HW semaphore does
> +	 * a strict greater-than-or-equals compare, not i915_seqno_passed).
> +	 * So if the cacheline is still busy, we must detach ourselves
> +	 * from it and leave it inflight alongside its users.
> +	 *
> +	 * However, if nobody is watching and we can guarantee that nobody
> +	 * will, we could simply reuse the same cacheline.
> +	 *
> +	 * if (i915_active_request_is_signaled(&tl->last_request) &&
> +	 *     i915_active_is_signaled(&tl->hwsp_cacheline->active))
> +	 *	return 0;
> +	 *
> +	 * That seems unlikely for a busy timeline that needed to wrap in
> +	 * the first place, so just replace the cacheline.
> +	 */
> +
> +	vma = hwsp_alloc(tl, &cacheline);
> +	if (IS_ERR(vma)) {
> +		err = PTR_ERR(vma);
> +		goto err_rollback;
> +	}
> +
> +	err = i915_vma_pin(vma, 0, 0, PIN_GLOBAL | PIN_HIGH);
> +	if (err) {
> +		__idle_hwsp_free(vma->private, cacheline);
> +		goto err_rollback;
> +	}
> +
> +	cl = cacheline_alloc(vma->private, cacheline);
> +	if (IS_ERR(cl)) {
> +		err = PTR_ERR(cl);
> +		__idle_hwsp_free(vma->private, cacheline);
> +		goto err_unpin;
> +	}
> +	GEM_BUG_ON(cl->hwsp->vma != vma);
> +
> +	/*
> +	 * Attach the old cacheline to the current request, so that we only
> +	 * free it after the current request is retired, which ensures that
> +	 * all writes into the cacheline from previous requests are complete.
> +	 */
> +	err = i915_active_ref(&tl->hwsp_cacheline->active,
> +			      tl->fence_context, rq);
> +	if (err)
> +		goto err_cacheline;
> +
> +	cacheline_release(tl->hwsp_cacheline); /* ownership now xfered to rq */
> +	cacheline_free(tl->hwsp_cacheline);
> +
> +	i915_vma_unpin(tl->hwsp_ggtt); /* binding kept alive by old cacheline */
> +	i915_vma_put(tl->hwsp_ggtt);
> +
> +	tl->hwsp_ggtt = i915_vma_get(vma);
> +
> +	vaddr = (void *)(cl->vaddr & PAGE_MASK);
> +	tl->hwsp_offset = cacheline * CACHELINE_BYTES;
> +	tl->hwsp_seqno =
> +		memset(vaddr + tl->hwsp_offset, 0, CACHELINE_BYTES);
> +
> +	tl->hwsp_offset += i915_ggtt_offset(vma);
> +
> +	cacheline_acquire(cl);
> +	tl->hwsp_cacheline = cl;
> +
> +	*seqno = timeline_advance(tl);
> +	GEM_BUG_ON(i915_seqno_passed(*tl->hwsp_seqno, *seqno));
> +	return 0;
> +
> +err_cacheline:
> +	cacheline_free(cl);
> +err_unpin:
> +	i915_vma_unpin(vma);
> +err_rollback:
> +	timeline_rollback(tl);
> +	return err;
> +}
> +
> +int i915_timeline_get_seqno(struct i915_timeline *tl,
> +			    struct i915_request *rq,
> +			    u32 *seqno)
> +{
> +	*seqno = timeline_advance(tl);
> +
> +	/* Replace the HWSP on wraparound for HW semaphores */
> +	if (unlikely(!*seqno && tl->hwsp_cacheline))
> +		return __i915_timeline_get_seqno(tl, rq, seqno);
> +
> +	return 0;
> +}
> +
> +static int cacheline_ref(struct i915_timeline_cacheline *cl,
> +			 struct i915_request *rq)
> +{
> +	return i915_active_ref(&cl->active, rq->fence.context, rq);
> +}
> +
> +int i915_timeline_read_hwsp(struct i915_request *from,
> +			    struct i915_request *to,
> +			    u32 *hwsp)
> +{
> +	struct i915_timeline_cacheline *cl = from->hwsp_cacheline;
> +	struct i915_timeline *tl = from->timeline;
> +	int err;
> +
> +	GEM_BUG_ON(to->timeline == tl);
> +
> +	mutex_lock_nested(&tl->mutex, SINGLE_DEPTH_NESTING);
> +	err = i915_request_completed(from);
> +	if (!err)
> +		err = cacheline_ref(cl, to);
> +	if (!err) {
> +		if (likely(cl == tl->hwsp_cacheline)) {
> +			*hwsp = tl->hwsp_offset;
> +		} else { /* across a seqno wrap, recover the original offset */
> +			*hwsp = (cl->vaddr & CACHELINE_MASK) * CACHELINE_BYTES;
> +			*hwsp += i915_ggtt_offset(cl->hwsp->vma);
> +		}
> +	}
> +	mutex_unlock(&tl->mutex);
> +
> +	return err;
> +}
> +
>   void i915_timeline_unpin(struct i915_timeline *tl)
>   {
>   	GEM_BUG_ON(!tl->pin_count);
> @@ -301,6 +544,7 @@ void i915_timeline_unpin(struct i915_timeline *tl)
>   		return;
>   
>   	timeline_remove_from_active(tl);
> +	cacheline_release(tl->hwsp_cacheline);
>   
>   	/*
>   	 * Since this timeline is idle, all bariers upon which we were waiting
> diff --git a/drivers/gpu/drm/i915/i915_timeline.h b/drivers/gpu/drm/i915/i915_timeline.h
> index 36c3849f7108..60b1dfad93ed 100644
> --- a/drivers/gpu/drm/i915/i915_timeline.h
> +++ b/drivers/gpu/drm/i915/i915_timeline.h
> @@ -34,7 +34,7 @@
>   #include "i915_utils.h"
>   
>   struct i915_vma;
> -struct i915_timeline_hwsp;
> +struct i915_timeline_cacheline;
>   
>   struct i915_timeline {
>   	u64 fence_context;
> @@ -51,6 +51,8 @@ struct i915_timeline {
>   	struct i915_vma *hwsp_ggtt;
>   	u32 hwsp_offset;
>   
> +	struct i915_timeline_cacheline *hwsp_cacheline;
> +
>   	bool has_initial_breadcrumb;
>   
>   	/**
> @@ -162,8 +164,15 @@ static inline bool i915_timeline_sync_is_later(struct i915_timeline *tl,
>   }
>   
>   int i915_timeline_pin(struct i915_timeline *tl);
> +int i915_timeline_get_seqno(struct i915_timeline *tl,
> +			    struct i915_request *rq,
> +			    u32 *seqno);
>   void i915_timeline_unpin(struct i915_timeline *tl);
>   
> +int i915_timeline_read_hwsp(struct i915_request *from,
> +			    struct i915_request *until,
> +			    u32 *hwsp_offset);
> +
>   void i915_timelines_init(struct drm_i915_private *i915);
>   void i915_timelines_park(struct drm_i915_private *i915);
>   void i915_timelines_fini(struct drm_i915_private *i915);
> diff --git a/drivers/gpu/drm/i915/selftests/i915_timeline.c b/drivers/gpu/drm/i915/selftests/i915_timeline.c
> index 12ea69b1a1e5..844701759ffc 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_timeline.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_timeline.c
> @@ -641,6 +641,118 @@ static int live_hwsp_alternate(void *arg)
>   #undef NUM_TIMELINES
>   }
>   
> +static int live_hwsp_wrap(void *arg)
> +{
> +	struct drm_i915_private *i915 = arg;
> +	struct intel_engine_cs *engine;
> +	struct i915_timeline *tl;
> +	enum intel_engine_id id;
> +	intel_wakeref_t wakeref;
> +	int err = 0;
> +
> +	/*
> +	 * Across a seqno wrap, we need to keep the old cacheline alive for
> +	 * foreign GPU references.
> +	 */
> +
> +	mutex_lock(&i915->drm.struct_mutex);
> +	wakeref = intel_runtime_pm_get(i915);
> +
> +	tl = i915_timeline_create(i915, __func__, NULL);
> +	if (IS_ERR(tl)) {
> +		err = PTR_ERR(tl);
> +		goto out_rpm;
> +	}
> +	if (!tl->has_initial_breadcrumb || !tl->hwsp_cacheline)
> +		goto out_free;
> +
> +	err = i915_timeline_pin(tl);
> +	if (err)
> +		goto out_free;
> +
> +	for_each_engine(engine, i915, id) {
> +		const u32 *hwsp_seqno[2];
> +		struct i915_request *rq;
> +		u32 seqno[2];
> +
> +		if (!intel_engine_can_store_dword(engine))
> +			continue;
> +
> +		rq = i915_request_alloc(engine, i915->kernel_context);
> +		if (IS_ERR(rq)) {
> +			err = PTR_ERR(rq);
> +			goto out;
> +		}
> +
> +		tl->seqno = -4u;
> +
> +		err = i915_timeline_get_seqno(tl, rq, &seqno[0]);
> +		if (err) {
> +			i915_request_add(rq);
> +			goto out;
> +		}
> +		pr_debug("seqno[0]:%08x, hwsp_offset:%08x\n",
> +			 seqno[0], tl->hwsp_offset);
> +
> +		err = emit_ggtt_store_dw(rq, tl->hwsp_offset, seqno[0]);
> +		if (err) {
> +			i915_request_add(rq);
> +			goto out;
> +		}
> +		hwsp_seqno[0] = tl->hwsp_seqno;
> +
> +		err = i915_timeline_get_seqno(tl, rq, &seqno[1]);
> +		if (err) {
> +			i915_request_add(rq);
> +			goto out;
> +		}
> +		pr_debug("seqno[1]:%08x, hwsp_offset:%08x\n",
> +			 seqno[1], tl->hwsp_offset);
> +
> +		err = emit_ggtt_store_dw(rq, tl->hwsp_offset, seqno[1]);
> +		if (err) {
> +			i915_request_add(rq);
> +			goto out;
> +		}
> +		hwsp_seqno[1] = tl->hwsp_seqno;
> +
> +		/* With wrap should come a new hwsp */
> +		GEM_BUG_ON(seqno[1] >= seqno[0]);
> +		GEM_BUG_ON(hwsp_seqno[0] == hwsp_seqno[1]);
> +
> +		i915_request_add(rq);
> +
> +		if (i915_request_wait(rq, I915_WAIT_LOCKED, HZ / 5) < 0) {
> +			pr_err("Wait for timeline writes timed out!\n");
> +			err = -EIO;
> +			goto out;
> +		}
> +
> +		if (*hwsp_seqno[0] != seqno[0] || *hwsp_seqno[1] != seqno[1]) {
> +			pr_err("Bad timeline values: found (%x, %x), expected (%x, %x)\n",
> +			       *hwsp_seqno[0], *hwsp_seqno[1],
> +			       seqno[0], seqno[1]);
> +			err = -EINVAL;
> +			goto out;
> +		}
> +
> +		i915_retire_requests(i915); /* recycle HWSP */
> +	}
> +
> +out:
> +	if (igt_flush_test(i915, I915_WAIT_LOCKED))
> +		err = -EIO;
> +
> +	i915_timeline_unpin(tl);
> +out_free:
> +	i915_timeline_put(tl);
> +out_rpm:
> +	intel_runtime_pm_put(i915, wakeref);
> +	mutex_unlock(&i915->drm.struct_mutex);
> +
> +	return err;
> +}
> +
>   static int live_hwsp_recycle(void *arg)
>   {
>   	struct drm_i915_private *i915 = arg;
> @@ -723,6 +835,7 @@ int i915_timeline_live_selftests(struct drm_i915_private *i915)
>   		SUBTEST(live_hwsp_recycle),
>   		SUBTEST(live_hwsp_engine),
>   		SUBTEST(live_hwsp_alternate),
> +		SUBTEST(live_hwsp_wrap),
>   	};
>   
>   	return i915_subtests(tests, i915);
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 06/11] drm/i915: Keep timeline HWSP allocated until idle across the system
  2019-02-27 10:44   ` Tvrtko Ursulin
@ 2019-02-27 10:51     ` Chris Wilson
  0 siblings, 0 replies; 43+ messages in thread
From: Chris Wilson @ 2019-02-27 10:51 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-02-27 10:44:42)
> 
> On 26/02/2019 10:23, Chris Wilson wrote:
> > In preparation for enabling HW semaphores, we need to keep in flight
> > timeline HWSP alive until its use across entire system has completed,
> > as any other timeline active on the GPU may still refer back to the
> > already retired timeline. We both have to delay recycling available
> > cachelines and unpinning old HWSP until the next idle point.
> > 
> > An easy option would be to simply keep all used HWSP until the system as
> > a whole was idle, i.e. we could release them all at once on parking.
> > However, on a busy system, we may never see a global idle point,
> > essentially meaning the resource will be leaked until we are forced to
> > do a GC pass. We already employ a fine-grained idle detection mechanism
> > for vma, which we can reuse here so that each cacheline can be freed
> > immediately after the last request using it is retired.
> > 
> > v3: Keep track of the activity of each cacheline.
> > v4: cacheline_free() on canceling the seqno tracking
> > v5: Finally with a testcase to exercise wraparound
> > v6: Pack cacheline into empty bits of page-aligned vaddr
> 
> Have you considered using the page_(mask/unmask/pack/unpack)_bits helpers
> instead of open coding it?

I forgot what they were called and was too lazy to read i915_utils.h
 
> Is it the additional sharing of bits between the two variables that makes
> it less elegant?

Or the opposite: Do those wasted bits not stand out as being
inelegant? :)
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 43+ messages in thread
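
The exchange above concerns replacing open-coded low-bit packing with the
pointer-packing helpers from i915_utils.h. As a minimal standalone sketch of
the idiom (illustrative only, not the kernel definitions; it assumes the
mapping is at least 64-byte aligned, i.e. CACHELINE_BITS == 6 low bits of
the pointer are free, and the helper names here are invented):

  #include <assert.h>
  #include <stdint.h>

  #define CL_BITS 6                        /* 64-byte cachelines */
  #define CL_MASK ((1ul << CL_BITS) - 1)

  /* Stash a small cacheline index in the unused low bits of vaddr. */
  static inline void *pack_cacheline(void *vaddr, unsigned int cacheline)
  {
          assert(((uintptr_t)vaddr & CL_MASK) == 0); /* alignment frees the bits */
          assert(cacheline <= CL_MASK);
          return (void *)((uintptr_t)vaddr | cacheline);
  }

  /* Recover the original mapping address... */
  static inline void *unpack_vaddr(void *packed)
  {
          return (void *)((uintptr_t)packed & ~CL_MASK);
  }

  /* ...and the index that was packed alongside it. */
  static inline unsigned int unpack_cacheline(void *packed)
  {
          return (uintptr_t)packed & CL_MASK;
  }

The v7 repost below does exactly this with page_pack_bits(),
page_mask_bits() and ptr_unmask_bits(), and adds ptr_set_bit()/ptr_test_bit()
so the CACHELINE_FREE flag can live in another spare low bit.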

* [PATCH] drm/i915: Keep timeline HWSP allocated until idle across the system
  2019-02-26 10:23 ` [PATCH 06/11] drm/i915: Keep timeline HWSP allocated until idle across the system Chris Wilson
  2019-02-27 10:44   ` Tvrtko Ursulin
@ 2019-02-27 11:15   ` Chris Wilson
  2019-02-27 14:20     ` Tvrtko Ursulin
  1 sibling, 1 reply; 43+ messages in thread
From: Chris Wilson @ 2019-02-27 11:15 UTC (permalink / raw)
  To: intel-gfx

In preparation for enabling HW semaphores, we need to keep the in-flight
timeline HWSP alive until its use across the entire system has completed,
as any other timeline active on the GPU may still refer back to the
already retired timeline. We have to delay both the recycling of
available cachelines and the unpinning of the old HWSP until the next
idle point.

An easy option would be to simply keep all used HWSP until the system as
a whole was idle, i.e. we could release them all at once on parking.
However, on a busy system, we may never see a global idle point,
essentially meaning the resource will be leaked until we are forced to
do a GC pass. We already employ a fine-grained idle detection mechanism
for vma, which we can reuse here so that each cacheline can be freed
immediately after the last request using it is retired.

v3: Keep track of the activity of each cacheline.
v4: cacheline_free() on canceling the seqno tracking
v5: Finally with a testcase to exercise wraparound
v6: Pack cacheline into empty bits of page-aligned vaddr
v7: Use i915_utils to hide the pointer casting around bit manipulation

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> #v5
---
 drivers/gpu/drm/i915/i915_request.c           |  31 +-
 drivers/gpu/drm/i915/i915_request.h           |  11 +
 drivers/gpu/drm/i915/i915_timeline.c          | 290 ++++++++++++++++--
 drivers/gpu/drm/i915/i915_timeline.h          |  11 +-
 drivers/gpu/drm/i915/i915_utils.h             |   3 +
 .../gpu/drm/i915/selftests/i915_timeline.c    | 113 +++++++
 6 files changed, 420 insertions(+), 39 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 719d1a5ab082..d354967d6ae8 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -325,11 +325,6 @@ void i915_request_retire_upto(struct i915_request *rq)
 	} while (tmp != rq);
 }
 
-static u32 timeline_get_seqno(struct i915_timeline *tl)
-{
-	return tl->seqno += 1 + tl->has_initial_breadcrumb;
-}
-
 static void move_to_timeline(struct i915_request *request,
 			     struct i915_timeline *timeline)
 {
@@ -532,8 +527,10 @@ struct i915_request *
 i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 {
 	struct drm_i915_private *i915 = engine->i915;
-	struct i915_request *rq;
 	struct intel_context *ce;
+	struct i915_timeline *tl;
+	struct i915_request *rq;
+	u32 seqno;
 	int ret;
 
 	lockdep_assert_held(&i915->drm.struct_mutex);
@@ -610,24 +607,27 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 		}
 	}
 
-	rq->rcustate = get_state_synchronize_rcu();
-
 	INIT_LIST_HEAD(&rq->active_list);
+
+	tl = ce->ring->timeline;
+	ret = i915_timeline_get_seqno(tl, rq, &seqno);
+	if (ret)
+		goto err_free;
+
 	rq->i915 = i915;
 	rq->engine = engine;
 	rq->gem_context = ctx;
 	rq->hw_context = ce;
 	rq->ring = ce->ring;
-	rq->timeline = ce->ring->timeline;
+	rq->timeline = tl;
 	GEM_BUG_ON(rq->timeline == &engine->timeline);
-	rq->hwsp_seqno = rq->timeline->hwsp_seqno;
+	rq->hwsp_seqno = tl->hwsp_seqno;
+	rq->hwsp_cacheline = tl->hwsp_cacheline;
+	rq->rcustate = get_state_synchronize_rcu(); /* acts as smp_mb() */
 
 	spin_lock_init(&rq->lock);
-	dma_fence_init(&rq->fence,
-		       &i915_fence_ops,
-		       &rq->lock,
-		       rq->timeline->fence_context,
-		       timeline_get_seqno(rq->timeline));
+	dma_fence_init(&rq->fence, &i915_fence_ops, &rq->lock,
+		       tl->fence_context, seqno);
 
 	/* We bump the ref for the fence chain */
 	i915_sw_fence_init(&i915_request_get(rq)->submit, submit_notify);
@@ -687,6 +687,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	GEM_BUG_ON(!list_empty(&rq->sched.signalers_list));
 	GEM_BUG_ON(!list_empty(&rq->sched.waiters_list));
 
+err_free:
 	kmem_cache_free(global.slab_requests, rq);
 err_unreserve:
 	mutex_unlock(&ce->ring->timeline->mutex);
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index be3ded6bcf56..ea1e6f0ade53 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -38,6 +38,7 @@ struct drm_file;
 struct drm_i915_gem_object;
 struct i915_request;
 struct i915_timeline;
+struct i915_timeline_cacheline;
 
 struct i915_capture_list {
 	struct i915_capture_list *next;
@@ -148,6 +149,16 @@ struct i915_request {
 	 */
 	const u32 *hwsp_seqno;
 
+	/*
+	 * If we need to access the timeline's seqno for this request in
+	 * another request, we need to keep a read reference to this associated
+	 * cacheline, so that we do not free and recycle it before the foreign
+	 * observers have completed. Hence, we keep a pointer to the cacheline
+	 * inside the timeline's HWSP vma, but it is only valid while this
+	 * request has not completed and guarded by the timeline mutex.
+	 */
+	struct i915_timeline_cacheline *hwsp_cacheline;
+
 	/** Position in the ring of the start of the request */
 	u32 head;
 
diff --git a/drivers/gpu/drm/i915/i915_timeline.c b/drivers/gpu/drm/i915/i915_timeline.c
index 87a80558da28..e45e81b4cab6 100644
--- a/drivers/gpu/drm/i915/i915_timeline.c
+++ b/drivers/gpu/drm/i915/i915_timeline.c
@@ -6,19 +6,29 @@
 
 #include "i915_drv.h"
 
-#include "i915_timeline.h"
+#include "i915_active.h"
 #include "i915_syncmap.h"
+#include "i915_timeline.h"
 
 struct i915_timeline_hwsp {
-	struct i915_vma *vma;
+	struct i915_gt_timelines *gt;
 	struct list_head free_link;
+	struct i915_vma *vma;
 	u64 free_bitmap;
 };
 
-static inline struct i915_timeline_hwsp *
-i915_timeline_hwsp(const struct i915_timeline *tl)
+struct i915_timeline_cacheline {
+	struct i915_active active;
+	struct i915_timeline_hwsp *hwsp;
+	void *vaddr;
+#define CACHELINE_BITS 6
+#define CACHELINE_FREE CACHELINE_BITS
+};
+
+static inline struct drm_i915_private *
+hwsp_to_i915(struct i915_timeline_hwsp *hwsp)
 {
-	return tl->hwsp_ggtt->private;
+	return container_of(hwsp->gt, struct drm_i915_private, gt.timelines);
 }
 
 static struct i915_vma *__hwsp_alloc(struct drm_i915_private *i915)
@@ -71,6 +81,7 @@ hwsp_alloc(struct i915_timeline *timeline, unsigned int *cacheline)
 		vma->private = hwsp;
 		hwsp->vma = vma;
 		hwsp->free_bitmap = ~0ull;
+		hwsp->gt = gt;
 
 		spin_lock(&gt->hwsp_lock);
 		list_add(&hwsp->free_link, &gt->hwsp_free_list);
@@ -88,14 +99,9 @@ hwsp_alloc(struct i915_timeline *timeline, unsigned int *cacheline)
 	return hwsp->vma;
 }
 
-static void hwsp_free(struct i915_timeline *timeline)
+static void __idle_hwsp_free(struct i915_timeline_hwsp *hwsp, int cacheline)
 {
-	struct i915_gt_timelines *gt = &timeline->i915->gt.timelines;
-	struct i915_timeline_hwsp *hwsp;
-
-	hwsp = i915_timeline_hwsp(timeline);
-	if (!hwsp) /* leave global HWSP alone! */
-		return;
+	struct i915_gt_timelines *gt = hwsp->gt;
 
 	spin_lock(&gt->hwsp_lock);
 
@@ -103,7 +109,8 @@ static void hwsp_free(struct i915_timeline *timeline)
 	if (!hwsp->free_bitmap)
 		list_add_tail(&hwsp->free_link, &gt->hwsp_free_list);
 
-	hwsp->free_bitmap |= BIT_ULL(timeline->hwsp_offset / CACHELINE_BYTES);
+	GEM_BUG_ON(cacheline >= BITS_PER_TYPE(hwsp->free_bitmap));
+	hwsp->free_bitmap |= BIT_ULL(cacheline);
 
 	/* And if no one is left using it, give the page back to the system */
 	if (hwsp->free_bitmap == ~0ull) {
@@ -115,6 +122,76 @@ static void hwsp_free(struct i915_timeline *timeline)
 	spin_unlock(&gt->hwsp_lock);
 }
 
+static void __idle_cacheline_free(struct i915_timeline_cacheline *cl)
+{
+	GEM_BUG_ON(!i915_active_is_idle(&cl->active));
+
+	i915_gem_object_unpin_map(cl->hwsp->vma->obj);
+	i915_vma_put(cl->hwsp->vma);
+	__idle_hwsp_free(cl->hwsp, ptr_unmask_bits(cl->vaddr, CACHELINE_BITS));
+
+	i915_active_fini(&cl->active);
+	kfree(cl);
+}
+
+static void __cacheline_retire(struct i915_active *active)
+{
+	struct i915_timeline_cacheline *cl =
+		container_of(active, typeof(*cl), active);
+
+	i915_vma_unpin(cl->hwsp->vma);
+	if (ptr_test_bit(cl->vaddr, CACHELINE_FREE))
+		__idle_cacheline_free(cl);
+}
+
+static struct i915_timeline_cacheline *
+cacheline_alloc(struct i915_timeline_hwsp *hwsp, unsigned int cacheline)
+{
+	struct i915_timeline_cacheline *cl;
+	void *vaddr;
+
+	GEM_BUG_ON(cacheline >= BIT(CACHELINE_BITS));
+
+	cl = kmalloc(sizeof(*cl), GFP_KERNEL);
+	if (!cl)
+		return ERR_PTR(-ENOMEM);
+
+	vaddr = i915_gem_object_pin_map(hwsp->vma->obj, I915_MAP_WB);
+	if (IS_ERR(vaddr)) {
+		kfree(cl);
+		return ERR_CAST(vaddr);
+	}
+
+	i915_vma_get(hwsp->vma);
+	cl->hwsp = hwsp;
+	cl->vaddr = page_pack_bits(vaddr, cacheline);
+
+	i915_active_init(hwsp_to_i915(hwsp), &cl->active, __cacheline_retire);
+
+	return cl;
+}
+
+static void cacheline_acquire(struct i915_timeline_cacheline *cl)
+{
+	if (cl && i915_active_acquire(&cl->active))
+		__i915_vma_pin(cl->hwsp->vma);
+}
+
+static void cacheline_release(struct i915_timeline_cacheline *cl)
+{
+	if (cl)
+		i915_active_release(&cl->active);
+}
+
+static void cacheline_free(struct i915_timeline_cacheline *cl)
+{
+	GEM_BUG_ON(ptr_test_bit(cl->vaddr, CACHELINE_FREE));
+	cl->vaddr = ptr_set_bit(cl->vaddr, CACHELINE_FREE);
+
+	if (i915_active_is_idle(&cl->active))
+		__idle_cacheline_free(cl);
+}
+
 int i915_timeline_init(struct drm_i915_private *i915,
 		       struct i915_timeline *timeline,
 		       const char *name,
@@ -136,29 +213,40 @@ int i915_timeline_init(struct drm_i915_private *i915,
 	timeline->name = name;
 	timeline->pin_count = 0;
 	timeline->has_initial_breadcrumb = !hwsp;
+	timeline->hwsp_cacheline = NULL;
 
-	timeline->hwsp_offset = I915_GEM_HWS_SEQNO_ADDR;
 	if (!hwsp) {
+		struct i915_timeline_cacheline *cl;
 		unsigned int cacheline;
 
 		hwsp = hwsp_alloc(timeline, &cacheline);
 		if (IS_ERR(hwsp))
 			return PTR_ERR(hwsp);
 
+		cl = cacheline_alloc(hwsp->private, cacheline);
+		if (IS_ERR(cl)) {
+			__idle_hwsp_free(hwsp->private, cacheline);
+			return PTR_ERR(cl);
+		}
+
+		timeline->hwsp_cacheline = cl;
 		timeline->hwsp_offset = cacheline * CACHELINE_BYTES;
-	}
-	timeline->hwsp_ggtt = i915_vma_get(hwsp);
 
-	vaddr = i915_gem_object_pin_map(hwsp->obj, I915_MAP_WB);
-	if (IS_ERR(vaddr)) {
-		hwsp_free(timeline);
-		i915_vma_put(hwsp);
-		return PTR_ERR(vaddr);
+		vaddr = page_mask_bits(cl->vaddr);
+	} else {
+		timeline->hwsp_offset = I915_GEM_HWS_SEQNO_ADDR;
+
+		vaddr = i915_gem_object_pin_map(hwsp->obj, I915_MAP_WB);
+		if (IS_ERR(vaddr))
+			return PTR_ERR(vaddr);
 	}
 
 	timeline->hwsp_seqno =
 		memset(vaddr + timeline->hwsp_offset, 0, CACHELINE_BYTES);
 
+	timeline->hwsp_ggtt = i915_vma_get(hwsp);
+	GEM_BUG_ON(timeline->hwsp_offset >= hwsp->size);
+
 	timeline->fence_context = dma_fence_context_alloc(1);
 
 	spin_lock_init(&timeline->lock);
@@ -240,9 +328,12 @@ void i915_timeline_fini(struct i915_timeline *timeline)
 	GEM_BUG_ON(i915_active_request_isset(&timeline->barrier));
 
 	i915_syncmap_free(&timeline->sync);
-	hwsp_free(timeline);
 
-	i915_gem_object_unpin_map(timeline->hwsp_ggtt->obj);
+	if (timeline->hwsp_cacheline)
+		cacheline_free(timeline->hwsp_cacheline);
+	else
+		i915_gem_object_unpin_map(timeline->hwsp_ggtt->obj);
+
 	i915_vma_put(timeline->hwsp_ggtt);
 }
 
@@ -285,6 +376,7 @@ int i915_timeline_pin(struct i915_timeline *tl)
 		i915_ggtt_offset(tl->hwsp_ggtt) +
 		offset_in_page(tl->hwsp_offset);
 
+	cacheline_acquire(tl->hwsp_cacheline);
 	timeline_add_to_active(tl);
 
 	return 0;
@@ -294,6 +386,157 @@ int i915_timeline_pin(struct i915_timeline *tl)
 	return err;
 }
 
+static u32 timeline_advance(struct i915_timeline *tl)
+{
+	GEM_BUG_ON(!tl->pin_count);
+	GEM_BUG_ON(tl->seqno & tl->has_initial_breadcrumb);
+
+	return tl->seqno += 1 + tl->has_initial_breadcrumb;
+}
+
+static void timeline_rollback(struct i915_timeline *tl)
+{
+	tl->seqno -= 1 + tl->has_initial_breadcrumb;
+}
+
+static noinline int
+__i915_timeline_get_seqno(struct i915_timeline *tl,
+			  struct i915_request *rq,
+			  u32 *seqno)
+{
+	struct i915_timeline_cacheline *cl;
+	unsigned int cacheline;
+	struct i915_vma *vma;
+	void *vaddr;
+	int err;
+
+	/*
+	 * If there is an outstanding GPU reference to this cacheline,
+	 * such as it being sampled by a HW semaphore on another timeline,
+	 * we cannot wraparound our seqno value (the HW semaphore does
+	 * a strict greater-than-or-equals compare, not i915_seqno_passed).
+	 * So if the cacheline is still busy, we must detach ourselves
+	 * from it and leave it inflight alongside its users.
+	 *
+	 * However, if nobody is watching and we can guarantee that nobody
+	 * will, we could simply reuse the same cacheline.
+	 *
+	 * if (i915_active_request_is_signaled(&tl->last_request) &&
+	 *     i915_active_is_signaled(&tl->hwsp_cacheline->active))
+	 *	return 0;
+	 *
+	 * That seems unlikely for a busy timeline that needed to wrap in
+	 * the first place, so just replace the cacheline.
+	 */
+
+	vma = hwsp_alloc(tl, &cacheline);
+	if (IS_ERR(vma)) {
+		err = PTR_ERR(vma);
+		goto err_rollback;
+	}
+
+	err = i915_vma_pin(vma, 0, 0, PIN_GLOBAL | PIN_HIGH);
+	if (err) {
+		__idle_hwsp_free(vma->private, cacheline);
+		goto err_rollback;
+	}
+
+	cl = cacheline_alloc(vma->private, cacheline);
+	if (IS_ERR(cl)) {
+		err = PTR_ERR(cl);
+		__idle_hwsp_free(vma->private, cacheline);
+		goto err_unpin;
+	}
+	GEM_BUG_ON(cl->hwsp->vma != vma);
+
+	/*
+	 * Attach the old cacheline to the current request, so that we only
+	 * free it after the current request is retired, which ensures that
+	 * all writes into the cacheline from previous requests are complete.
+	 */
+	err = i915_active_ref(&tl->hwsp_cacheline->active,
+			      tl->fence_context, rq);
+	if (err)
+		goto err_cacheline;
+
+	cacheline_release(tl->hwsp_cacheline); /* ownership now xfered to rq */
+	cacheline_free(tl->hwsp_cacheline);
+
+	i915_vma_unpin(tl->hwsp_ggtt); /* binding kept alive by old cacheline */
+	i915_vma_put(tl->hwsp_ggtt);
+
+	tl->hwsp_ggtt = i915_vma_get(vma);
+
+	vaddr = page_mask_bits(cl->vaddr);
+	tl->hwsp_offset = cacheline * CACHELINE_BYTES;
+	tl->hwsp_seqno =
+		memset(vaddr + tl->hwsp_offset, 0, CACHELINE_BYTES);
+
+	tl->hwsp_offset += i915_ggtt_offset(vma);
+
+	cacheline_acquire(cl);
+	tl->hwsp_cacheline = cl;
+
+	*seqno = timeline_advance(tl);
+	GEM_BUG_ON(i915_seqno_passed(*tl->hwsp_seqno, *seqno));
+	return 0;
+
+err_cacheline:
+	cacheline_free(cl);
+err_unpin:
+	i915_vma_unpin(vma);
+err_rollback:
+	timeline_rollback(tl);
+	return err;
+}
+
+int i915_timeline_get_seqno(struct i915_timeline *tl,
+			    struct i915_request *rq,
+			    u32 *seqno)
+{
+	*seqno = timeline_advance(tl);
+
+	/* Replace the HWSP on wraparound for HW semaphores */
+	if (unlikely(!*seqno && tl->hwsp_cacheline))
+		return __i915_timeline_get_seqno(tl, rq, seqno);
+
+	return 0;
+}
+
+static int cacheline_ref(struct i915_timeline_cacheline *cl,
+			 struct i915_request *rq)
+{
+	return i915_active_ref(&cl->active, rq->fence.context, rq);
+}
+
+int i915_timeline_read_hwsp(struct i915_request *from,
+			    struct i915_request *to,
+			    u32 *hwsp)
+{
+	struct i915_timeline_cacheline *cl = from->hwsp_cacheline;
+	struct i915_timeline *tl = from->timeline;
+	int err;
+
+	GEM_BUG_ON(to->timeline == tl);
+
+	mutex_lock_nested(&tl->mutex, SINGLE_DEPTH_NESTING);
+	err = i915_request_completed(from);
+	if (!err)
+		err = cacheline_ref(cl, to);
+	if (!err) {
+		if (likely(cl == tl->hwsp_cacheline)) {
+			*hwsp = tl->hwsp_offset;
+		} else { /* across a seqno wrap, recover the original offset */
+			*hwsp = i915_ggtt_offset(cl->hwsp->vma) +
+				ptr_unmask_bits(cl->vaddr, CACHELINE_BITS) *
+				CACHELINE_BYTES;
+		}
+	}
+	mutex_unlock(&tl->mutex);
+
+	return err;
+}
+
 void i915_timeline_unpin(struct i915_timeline *tl)
 {
 	GEM_BUG_ON(!tl->pin_count);
@@ -301,6 +544,7 @@ void i915_timeline_unpin(struct i915_timeline *tl)
 		return;
 
 	timeline_remove_from_active(tl);
+	cacheline_release(tl->hwsp_cacheline);
 
 	/*
 	 * Since this timeline is idle, all barriers upon which we were waiting
diff --git a/drivers/gpu/drm/i915/i915_timeline.h b/drivers/gpu/drm/i915/i915_timeline.h
index 36c3849f7108..60b1dfad93ed 100644
--- a/drivers/gpu/drm/i915/i915_timeline.h
+++ b/drivers/gpu/drm/i915/i915_timeline.h
@@ -34,7 +34,7 @@
 #include "i915_utils.h"
 
 struct i915_vma;
-struct i915_timeline_hwsp;
+struct i915_timeline_cacheline;
 
 struct i915_timeline {
 	u64 fence_context;
@@ -51,6 +51,8 @@ struct i915_timeline {
 	struct i915_vma *hwsp_ggtt;
 	u32 hwsp_offset;
 
+	struct i915_timeline_cacheline *hwsp_cacheline;
+
 	bool has_initial_breadcrumb;
 
 	/**
@@ -162,8 +164,15 @@ static inline bool i915_timeline_sync_is_later(struct i915_timeline *tl,
 }
 
 int i915_timeline_pin(struct i915_timeline *tl);
+int i915_timeline_get_seqno(struct i915_timeline *tl,
+			    struct i915_request *rq,
+			    u32 *seqno);
 void i915_timeline_unpin(struct i915_timeline *tl);
 
+int i915_timeline_read_hwsp(struct i915_request *from,
+			    struct i915_request *until,
+			    u32 *hwsp_offset);
+
 void i915_timelines_init(struct drm_i915_private *i915);
 void i915_timelines_park(struct drm_i915_private *i915);
 void i915_timelines_fini(struct drm_i915_private *i915);
diff --git a/drivers/gpu/drm/i915/i915_utils.h b/drivers/gpu/drm/i915/i915_utils.h
index 9726df37c4c4..1fb9c8b2f5b3 100644
--- a/drivers/gpu/drm/i915/i915_utils.h
+++ b/drivers/gpu/drm/i915/i915_utils.h
@@ -92,6 +92,9 @@
 	((typeof(ptr))((unsigned long)(ptr) | __bits));			\
 })
 
+#define ptr_set_bit(ptr, bit) ((typeof(ptr))((unsigned long)(ptr) | BIT(bit)))
+#define ptr_test_bit(ptr, bit) ((unsigned long)(ptr) & BIT(bit))
+
 #define page_mask_bits(ptr) ptr_mask_bits(ptr, PAGE_SHIFT)
 #define page_unmask_bits(ptr) ptr_unmask_bits(ptr, PAGE_SHIFT)
 #define page_pack_bits(ptr, bits) ptr_pack_bits(ptr, bits, PAGE_SHIFT)
diff --git a/drivers/gpu/drm/i915/selftests/i915_timeline.c b/drivers/gpu/drm/i915/selftests/i915_timeline.c
index 12ea69b1a1e5..844701759ffc 100644
--- a/drivers/gpu/drm/i915/selftests/i915_timeline.c
+++ b/drivers/gpu/drm/i915/selftests/i915_timeline.c
@@ -641,6 +641,118 @@ static int live_hwsp_alternate(void *arg)
 #undef NUM_TIMELINES
 }
 
+static int live_hwsp_wrap(void *arg)
+{
+	struct drm_i915_private *i915 = arg;
+	struct intel_engine_cs *engine;
+	struct i915_timeline *tl;
+	enum intel_engine_id id;
+	intel_wakeref_t wakeref;
+	int err = 0;
+
+	/*
+	 * Across a seqno wrap, we need to keep the old cacheline alive for
+	 * foreign GPU references.
+	 */
+
+	mutex_lock(&i915->drm.struct_mutex);
+	wakeref = intel_runtime_pm_get(i915);
+
+	tl = i915_timeline_create(i915, __func__, NULL);
+	if (IS_ERR(tl)) {
+		err = PTR_ERR(tl);
+		goto out_rpm;
+	}
+	if (!tl->has_initial_breadcrumb || !tl->hwsp_cacheline)
+		goto out_free;
+
+	err = i915_timeline_pin(tl);
+	if (err)
+		goto out_free;
+
+	for_each_engine(engine, i915, id) {
+		const u32 *hwsp_seqno[2];
+		struct i915_request *rq;
+		u32 seqno[2];
+
+		if (!intel_engine_can_store_dword(engine))
+			continue;
+
+		rq = i915_request_alloc(engine, i915->kernel_context);
+		if (IS_ERR(rq)) {
+			err = PTR_ERR(rq);
+			goto out;
+		}
+
+		tl->seqno = -4u;
+
+		err = i915_timeline_get_seqno(tl, rq, &seqno[0]);
+		if (err) {
+			i915_request_add(rq);
+			goto out;
+		}
+		pr_debug("seqno[0]:%08x, hwsp_offset:%08x\n",
+			 seqno[0], tl->hwsp_offset);
+
+		err = emit_ggtt_store_dw(rq, tl->hwsp_offset, seqno[0]);
+		if (err) {
+			i915_request_add(rq);
+			goto out;
+		}
+		hwsp_seqno[0] = tl->hwsp_seqno;
+
+		err = i915_timeline_get_seqno(tl, rq, &seqno[1]);
+		if (err) {
+			i915_request_add(rq);
+			goto out;
+		}
+		pr_debug("seqno[1]:%08x, hwsp_offset:%08x\n",
+			 seqno[1], tl->hwsp_offset);
+
+		err = emit_ggtt_store_dw(rq, tl->hwsp_offset, seqno[1]);
+		if (err) {
+			i915_request_add(rq);
+			goto out;
+		}
+		hwsp_seqno[1] = tl->hwsp_seqno;
+
+		/* With wrap should come a new hwsp */
+		GEM_BUG_ON(seqno[1] >= seqno[0]);
+		GEM_BUG_ON(hwsp_seqno[0] == hwsp_seqno[1]);
+
+		i915_request_add(rq);
+
+		if (i915_request_wait(rq, I915_WAIT_LOCKED, HZ / 5) < 0) {
+			pr_err("Wait for timeline writes timed out!\n");
+			err = -EIO;
+			goto out;
+		}
+
+		if (*hwsp_seqno[0] != seqno[0] || *hwsp_seqno[1] != seqno[1]) {
+			pr_err("Bad timeline values: found (%x, %x), expected (%x, %x)\n",
+			       *hwsp_seqno[0], *hwsp_seqno[1],
+			       seqno[0], seqno[1]);
+			err = -EINVAL;
+			goto out;
+		}
+
+		i915_retire_requests(i915); /* recycle HWSP */
+	}
+
+out:
+	if (igt_flush_test(i915, I915_WAIT_LOCKED))
+		err = -EIO;
+
+	i915_timeline_unpin(tl);
+out_free:
+	i915_timeline_put(tl);
+out_rpm:
+	intel_runtime_pm_put(i915, wakeref);
+	mutex_unlock(&i915->drm.struct_mutex);
+
+	return err;
+}
+
 static int live_hwsp_recycle(void *arg)
 {
 	struct drm_i915_private *i915 = arg;
@@ -723,6 +835,7 @@ int i915_timeline_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(live_hwsp_recycle),
 		SUBTEST(live_hwsp_engine),
 		SUBTEST(live_hwsp_alternate),
+		SUBTEST(live_hwsp_wrap),
 	};
 
 	return i915_subtests(tests, i915);
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 43+ messages in thread
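
The hazard described in the comment at the top of __i915_timeline_get_seqno()
above can be made concrete with a small userspace sketch (not driver code).
The wrap-safe comparison is written the way i915_seqno_passed() is
conventionally defined in the driver, while the HW semaphore poll added later
in the series performs a plain unsigned greater-or-equal compare; the seqno
values below are invented for illustration.

  #include <stdbool.h>
  #include <stdint.h>
  #include <stdio.h>

  /* Wrap-safe ordering, in the style of i915_seqno_passed(). */
  static bool seqno_passed(uint32_t seq1, uint32_t seq2)
  {
          return (int32_t)(seq1 - seq2) >= 0;
  }

  /* What a greater-or-equal semaphore poll effectively evaluates. */
  static bool hw_gte(uint32_t hwsp_value, uint32_t target)
  {
          return hwsp_value >= target;
  }

  int main(void)
  {
          uint32_t old_target = 0xfffffffeu; /* waiter queued before the wrap */
          uint32_t new_seqno  = 0x00000002u; /* value written after the wrap */

          /* CPU view: the new seqno has passed the old target (prints 1). */
          printf("seqno_passed: %d\n", seqno_passed(new_seqno, old_target));
          /* HW view: a waiter polling the same cacheline never wakes (prints 0). */
          printf("hw_gte:       %d\n", hw_gte(new_seqno, old_target));
          return 0;
  }

This is why i915_timeline_get_seqno() swaps in a fresh cacheline on
wraparound and keeps the old one alive via i915_active until every foreign
request that sampled it has retired.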

* ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/11] drm/i915: Skip scanning for signalers if we are already inflight (rev3)
  2019-02-26 10:23 [PATCH 01/11] drm/i915: Skip scanning for signalers if we are already inflight Chris Wilson
                   ` (15 preceding siblings ...)
  2019-02-27 10:19 ` [PATCH 01/11] drm/i915: Skip scanning for signalers if we are already inflight Tvrtko Ursulin
@ 2019-02-27 11:38 ` Patchwork
  2019-02-27 11:42 ` ✗ Fi.CI.SPARSE: " Patchwork
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 43+ messages in thread
From: Patchwork @ 2019-02-27 11:38 UTC (permalink / raw)
  To: intel-gfx

== Series Details ==

Series: series starting with [01/11] drm/i915: Skip scanning for signalers if we are already inflight (rev3)
URL   : https://patchwork.freedesktop.org/series/57244/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
12e479dbed40 drm/i915/execlists: Suppress mere WAIT preemption
8294f8314187 drm/i915/execlists: Suppress redundant preemption
1b4cf06284a2 drm/i915: Make request allocation caches global
-:162: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#162: 
new file mode 100644

-:167: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#167: FILE: drivers/gpu/drm/i915/i915_globals.c:1:
+/*

-:286: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#286: FILE: drivers/gpu/drm/i915/i915_globals.h:1:
+/*

-:627: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'plist' - possible side-effects?
#627: FILE: drivers/gpu/drm/i915/i915_scheduler.h:95:
+#define priolist_for_each_request(it, plist, idx) \
+	for (idx = 0; idx < ARRAY_SIZE((plist)->requests); idx++) \
+		list_for_each_entry(it, &(plist)->requests[idx], sched.link)

-:627: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'idx' - possible side-effects?
#627: FILE: drivers/gpu/drm/i915/i915_scheduler.h:95:
+#define priolist_for_each_request(it, plist, idx) \
+	for (idx = 0; idx < ARRAY_SIZE((plist)->requests); idx++) \
+		list_for_each_entry(it, &(plist)->requests[idx], sched.link)

-:631: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'plist' - possible side-effects?
#631: FILE: drivers/gpu/drm/i915/i915_scheduler.h:99:
+#define priolist_for_each_request_consume(it, n, plist, idx) \
+	for (; (idx = ffs((plist)->used)); (plist)->used &= ~BIT(idx - 1)) \
+		list_for_each_entry_safe(it, n, \
+					 &(plist)->requests[idx - 1], \
+					 sched.link)

-:631: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'idx' - possible side-effects?
#631: FILE: drivers/gpu/drm/i915/i915_scheduler.h:99:
+#define priolist_for_each_request_consume(it, n, plist, idx) \
+	for (; (idx = ffs((plist)->used)); (plist)->used &= ~BIT(idx - 1)) \
+		list_for_each_entry_safe(it, n, \
+					 &(plist)->requests[idx - 1], \
+					 sched.link)

total: 0 errors, 3 warnings, 4 checks, 808 lines checked
6c4e3e58cfad drm/i915: Introduce i915_timeline.mutex
d6145821b7b8 drm/i915: Keep timeline HWSP allocated until idle across the system
91d567c1020d drm/i915: Compute the global scheduler caps
06906ce0e44f drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+
-:335: CHECK:SPACING: spaces preferred around that '<<' (ctx:VxV)
#335: FILE: drivers/gpu/drm/i915/intel_gpu_commands.h:109:
+#define   MI_SEMAPHORE_SAD_GT_SDD	(0<<12)
                                  	  ^

-:337: CHECK:SPACING: spaces preferred around that '<<' (ctx:VxV)
#337: FILE: drivers/gpu/drm/i915/intel_gpu_commands.h:111:
+#define   MI_SEMAPHORE_SAD_LT_SDD	(2<<12)
                                  	  ^

-:338: CHECK:SPACING: spaces preferred around that '<<' (ctx:VxV)
#338: FILE: drivers/gpu/drm/i915/intel_gpu_commands.h:112:
+#define   MI_SEMAPHORE_SAD_LTE_SDD	(3<<12)
                                   	  ^

-:339: CHECK:SPACING: spaces preferred around that '<<' (ctx:VxV)
#339: FILE: drivers/gpu/drm/i915/intel_gpu_commands.h:113:
+#define   MI_SEMAPHORE_SAD_EQ_SDD	(4<<12)
                                  	  ^

-:340: CHECK:SPACING: spaces preferred around that '<<' (ctx:VxV)
#340: FILE: drivers/gpu/drm/i915/intel_gpu_commands.h:114:
+#define   MI_SEMAPHORE_SAD_NEQ_SDD	(5<<12)
                                   	  ^

total: 0 errors, 0 warnings, 5 checks, 300 lines checked
4466ab4a1898 drm/i915: Prioritise non-busywait semaphore workloads
85cae32da616 drm/i915/execlists: Skip direct submission if only lite-restore
cf099e9f7bbf drm/i915: Use __ffs() in for_each_priolist for more compact code

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 43+ messages in thread
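
The MACRO_ARG_REUSE checks in the report above flag macros that expand an
argument more than once, since an argument with side effects would then be
evaluated repeatedly. A contrived standalone example of the failure mode (not
i915 code; the flagged iteration macros are only dangerous if passed
side-effecting arguments for 'plist' or 'idx'):

  #include <stdio.h>

  /* Expands 'x' twice, just as the flagged macros reuse their arguments. */
  #define NAIVE_MAX(x, y) ((x) > (y) ? (x) : (y))

  int main(void)
  {
          int i = 10;
          int m = NAIVE_MAX(i++, 5);      /* i++ is evaluated twice here */
          printf("m=%d i=%d\n", m, i);    /* prints m=11 i=12, not m=10 i=11 */
          return 0;
  }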

* ✗ Fi.CI.SPARSE: warning for series starting with [01/11] drm/i915: Skip scanning for signalers if we are already inflight (rev3)
  2019-02-26 10:23 [PATCH 01/11] drm/i915: Skip scanning for signalers if we are already inflight Chris Wilson
                   ` (16 preceding siblings ...)
  2019-02-27 11:38 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/11] drm/i915: Skip scanning for signalers if we are already inflight (rev3) Patchwork
@ 2019-02-27 11:42 ` Patchwork
  2019-02-27 12:03 ` ✓ Fi.CI.BAT: success " Patchwork
  2019-02-27 13:58 ` ✓ Fi.CI.IGT: " Patchwork
  19 siblings, 0 replies; 43+ messages in thread
From: Patchwork @ 2019-02-27 11:42 UTC (permalink / raw)
  To: intel-gfx

== Series Details ==

Series: series starting with [01/11] drm/i915: Skip scanning for signalers if we are already inflight (rev3)
URL   : https://patchwork.freedesktop.org/series/57244/
State : warning

== Summary ==

$ dim sparse origin/drm-tip
Sparse version: v0.5.2
Commit: drm/i915/execlists: Suppress mere WAIT preemption
Okay!

Commit: drm/i915/execlists: Suppress redundant preemption
Okay!

Commit: drm/i915: Make request allocation caches global
-drivers/gpu/drm/i915/selftests/../i915_drv.h:3581:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/../i915_drv.h:3578:16: warning: expression using sizeof(void)

Commit: drm/i915: Introduce i915_timeline.mutex
Okay!

Commit: drm/i915: Keep timeline HWSP allocated until idle across the system
Okay!

Commit: drm/i915: Compute the global scheduler caps
Okay!

Commit: drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+
-O:drivers/gpu/drm/i915/i915_drv.c:351:25: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/i915_drv.c:351:25: warning: expression using sizeof(void)

Commit: drm/i915: Prioritise non-busywait semaphore workloads
Okay!

Commit: drm/i915/execlists: Skip direct submission if only lite-restore
Okay!

Commit: drm/i915: Use __ffs() in for_each_priolist for more compact code
Okay!

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 43+ messages in thread

* ✓ Fi.CI.BAT: success for series starting with [01/11] drm/i915: Skip scanning for signalers if we are already inflight (rev3)
  2019-02-26 10:23 [PATCH 01/11] drm/i915: Skip scanning for signalers if we are already inflight Chris Wilson
                   ` (17 preceding siblings ...)
  2019-02-27 11:42 ` ✗ Fi.CI.SPARSE: " Patchwork
@ 2019-02-27 12:03 ` Patchwork
  2019-02-27 13:58 ` ✓ Fi.CI.IGT: " Patchwork
  19 siblings, 0 replies; 43+ messages in thread
From: Patchwork @ 2019-02-27 12:03 UTC (permalink / raw)
  To: intel-gfx

== Series Details ==

Series: series starting with [01/11] drm/i915: Skip scanning for signalers if we are already inflight (rev3)
URL   : https://patchwork.freedesktop.org/series/57244/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_5665 -> Patchwork_12313
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/57244/revisions/3/

Known issues
------------

  Here are the changes found in Patchwork_12313 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@kms_pipe_crc_basic@suspend-read-crc-pipe-b:
    - fi-blb-e6850:       PASS -> INCOMPLETE [fdo#107718]

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#105998]: https://bugs.freedesktop.org/show_bug.cgi?id=105998
  [fdo#107718]: https://bugs.freedesktop.org/show_bug.cgi?id=107718
  [fdo#108569]: https://bugs.freedesktop.org/show_bug.cgi?id=108569
  [fdo#109276]: https://bugs.freedesktop.org/show_bug.cgi?id=109276
  [fdo#109278]: https://bugs.freedesktop.org/show_bug.cgi?id=109278
  [fdo#109284]: https://bugs.freedesktop.org/show_bug.cgi?id=109284
  [fdo#109285]: https://bugs.freedesktop.org/show_bug.cgi?id=109285
  [fdo#109289]: https://bugs.freedesktop.org/show_bug.cgi?id=109289
  [fdo#109294]: https://bugs.freedesktop.org/show_bug.cgi?id=109294
  [fdo#109315]: https://bugs.freedesktop.org/show_bug.cgi?id=109315
  [fdo#109527]: https://bugs.freedesktop.org/show_bug.cgi?id=109527
  [fdo#109528]: https://bugs.freedesktop.org/show_bug.cgi?id=109528
  [fdo#109530]: https://bugs.freedesktop.org/show_bug.cgi?id=109530
  [fdo#109567]: https://bugs.freedesktop.org/show_bug.cgi?id=109567


Participating hosts (41 -> 37)
------------------------------

  Additional (1): fi-icl-y 
  Missing    (5): fi-ilk-m540 fi-hsw-4200u fi-byt-squawks fi-bsw-cyan fi-ctg-p8600 


Build changes
-------------

    * Linux: CI_DRM_5665 -> Patchwork_12313

  CI_DRM_5665: 63fb470a53f50de6cda4195ad7a2d7da249daec9 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4859: 1d8f3320cbc06fa73ad1487453a63993f17b9d57 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_12313: cf099e9f7bbf4bfe77b4311996cf8edfb3ef14ff @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

cf099e9f7bbf drm/i915: Use __ffs() in for_each_priolist for more compact code
85cae32da616 drm/i915/execlists: Skip direct submission if only lite-restore
4466ab4a1898 drm/i915: Prioritise non-busywait semaphore workloads
06906ce0e44f drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+
91d567c1020d drm/i915: Compute the global scheduler caps
d6145821b7b8 drm/i915: Keep timeline HWSP allocated until idle across the system
6c4e3e58cfad drm/i915: Introduce i915_timeline.mutex
1b4cf06284a2 drm/i915: Make request allocation caches global
8294f8314187 drm/i915/execlists: Suppress redundant preemption
12e479dbed40 drm/i915/execlists: Suppress mere WAIT preemption

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12313/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 43+ messages in thread

* ✓ Fi.CI.IGT: success for series starting with [01/11] drm/i915: Skip scanning for signalers if we are already inflight (rev3)
  2019-02-26 10:23 [PATCH 01/11] drm/i915: Skip scanning for signalers if we are already inflight Chris Wilson
                   ` (18 preceding siblings ...)
  2019-02-27 12:03 ` ✓ Fi.CI.BAT: success " Patchwork
@ 2019-02-27 13:58 ` Patchwork
  19 siblings, 0 replies; 43+ messages in thread
From: Patchwork @ 2019-02-27 13:58 UTC (permalink / raw)
  To: intel-gfx

== Series Details ==

Series: series starting with [01/11] drm/i915: Skip scanning for signalers if we are already inflight (rev3)
URL   : https://patchwork.freedesktop.org/series/57244/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_5665_full -> Patchwork_12313_full
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  

Known issues
------------

  Here are the changes found in Patchwork_12313_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_eio@reset-stress:
    - shard-snb:          PASS -> FAIL [fdo#109661]

  * igt@gem_exec_schedule@out-order-vebox:
    - shard-hsw:          NOTRUN -> SKIP [fdo#109271] +26

  * igt@i915_pm_rpm@basic-rte:
    - shard-iclb:         PASS -> DMESG-WARN [fdo#107724] +6

  * igt@i915_pm_rpm@system-suspend-modeset:
    - shard-iclb:         PASS -> FAIL [fdo#103375] +3

  * igt@kms_atomic_transition@4x-modeset-transitions-nonblocking-fencing:
    - shard-skl:          NOTRUN -> SKIP [fdo#109271] / [fdo#109278] +7

  * igt@kms_busy@basic-modeset-d:
    - shard-kbl:          NOTRUN -> SKIP [fdo#109271] / [fdo#109278] +4

  * igt@kms_busy@extended-modeset-hang-newfb-with-reset-render-a:
    - shard-skl:          NOTRUN -> DMESG-WARN [fdo#107956]

  * igt@kms_busy@extended-pageflip-hang-newfb-render-f:
    - shard-iclb:         NOTRUN -> SKIP [fdo#109278] +1

  * igt@kms_busy@extended-pageflip-modeset-hang-oldfb-render-c:
    - shard-kbl:          NOTRUN -> DMESG-WARN [fdo#107956]

  * igt@kms_chamelium@hdmi-crc-bgr565:
    - shard-iclb:         NOTRUN -> SKIP [fdo#109284]

  * igt@kms_color@pipe-b-gamma:
    - shard-iclb:         NOTRUN -> FAIL [fdo#104782]

  * igt@kms_cursor_crc@cursor-256x256-random:
    - shard-apl:          PASS -> FAIL [fdo#103232] +2

  * igt@kms_fbcon_fbt@fbc-suspend:
    - shard-iclb:         PASS -> INCOMPLETE [fdo#107713]

  * igt@kms_flip@2x-plain-flip-ts-check:
    - shard-iclb:         NOTRUN -> SKIP [fdo#109274]

  * igt@kms_flip@flip-vs-expired-vblank:
    - shard-skl:          PASS -> FAIL [fdo#105363] +1

  * igt@kms_flip@modeset-vs-vblank-race-interruptible:
    - shard-glk:          PASS -> FAIL [fdo#103060]

  * igt@kms_frontbuffer_tracking@fbc-1p-primscrn-spr-indfb-draw-render:
    - shard-apl:          PASS -> FAIL [fdo#103167]

  * igt@kms_frontbuffer_tracking@fbc-2p-primscrn-pri-indfb-draw-mmap-wc:
    - shard-iclb:         NOTRUN -> SKIP [fdo#109280] +3

  * igt@kms_frontbuffer_tracking@fbcpsr-1p-primscrn-spr-indfb-draw-pwrite:
    - shard-iclb:         PASS -> FAIL [fdo#103167] +5

  * igt@kms_frontbuffer_tracking@psr-1p-primscrn-spr-indfb-onoff:
    - shard-skl:          NOTRUN -> FAIL [fdo#103167]

  * igt@kms_frontbuffer_tracking@psr-2p-scndscrn-shrfb-msflip-blt:
    - shard-glk:          NOTRUN -> SKIP [fdo#109271] +9

  * igt@kms_frontbuffer_tracking@psr-2p-scndscrn-spr-indfb-move:
    - shard-skl:          NOTRUN -> SKIP [fdo#109271] +91

  * igt@kms_frontbuffer_tracking@psr-rgb101010-draw-mmap-wc:
    - shard-kbl:          NOTRUN -> SKIP [fdo#109271] +38

  * igt@kms_plane@pixel-format-pipe-a-planes-source-clamping:
    - shard-skl:          NOTRUN -> DMESG-WARN [fdo#106885] +1

  * igt@kms_plane_alpha_blend@pipe-b-coverage-7efc:
    - shard-skl:          NOTRUN -> FAIL [fdo#107815]

  * igt@kms_plane_alpha_blend@pipe-c-alpha-basic:
    - shard-glk:          NOTRUN -> FAIL [fdo#108145]

  * igt@kms_plane_multiple@atomic-pipe-a-tiling-x:
    - shard-skl:          NOTRUN -> FAIL [fdo#103166] / [fdo#107815]

  * igt@kms_plane_multiple@atomic-pipe-b-tiling-yf:
    - shard-iclb:         PASS -> FAIL [fdo#103166] +2

  * igt@kms_rotation_crc@multiplane-rotation-cropping-top:
    - shard-kbl:          PASS -> FAIL [fdo#109016] +1

  * igt@kms_setmode@basic:
    - shard-hsw:          PASS -> FAIL [fdo#99912]

  * igt@kms_universal_plane@cursor-fb-leak-pipe-f:
    - shard-hsw:          NOTRUN -> SKIP [fdo#109271] / [fdo#109278] +3

  * igt@kms_universal_plane@universal-plane-pipe-b-functional:
    - shard-glk:          PASS -> FAIL [fdo#103166]

  
#### Possible fixes ####

  * igt@gem_busy@extended-semaphore-blt:
    - shard-iclb:         SKIP [fdo#109275] -> PASS +2
    - shard-kbl:          SKIP [fdo#109271] -> PASS +4
    - shard-glk:          SKIP [fdo#109271] -> PASS +3

  * igt@gem_busy@extended-semaphore-vebox:
    - shard-apl:          SKIP [fdo#109271] -> PASS +3
    - shard-skl:          SKIP [fdo#109271] -> PASS +3

  * igt@gem_exec_parallel@bsd:
    - shard-glk:          INCOMPLETE [fdo#103359] / [k.org#198133] -> PASS

  * igt@i915_pm_rpm@gem-evict-pwrite:
    - shard-iclb:         DMESG-WARN [fdo#107724] -> PASS +2

  * igt@i915_pm_rpm@reg-read-ioctl:
    - shard-iclb:         INCOMPLETE [fdo#107713] / [fdo#108840] -> PASS

  * igt@kms_busy@extended-modeset-hang-newfb-render-a:
    - shard-kbl:          DMESG-WARN [fdo#107956] -> PASS

  * igt@kms_frontbuffer_tracking@fbc-1p-indfb-fliptrack:
    - shard-hsw:          INCOMPLETE [fdo#103540] -> PASS

  * igt@kms_frontbuffer_tracking@fbc-1p-primscrn-cur-indfb-draw-pwrite:
    - shard-apl:          FAIL [fdo#103167] -> PASS +2

  * igt@kms_frontbuffer_tracking@fbc-1p-primscrn-pri-shrfb-draw-mmap-gtt:
    - shard-glk:          FAIL [fdo#103167] -> PASS +2

  * igt@kms_frontbuffer_tracking@fbc-suspend:
    - shard-skl:          INCOMPLETE [fdo#104108] / [fdo#107773] -> PASS

  * igt@kms_frontbuffer_tracking@psr-1p-primscrn-spr-indfb-draw-mmap-wc:
    - shard-iclb:         FAIL [fdo#103167] -> PASS

  * igt@kms_plane@plane-position-covered-pipe-a-planes:
    - shard-apl:          FAIL [fdo#103166] -> PASS

  * igt@kms_plane_multiple@atomic-pipe-a-tiling-y:
    - shard-iclb:         FAIL [fdo#103166] -> PASS +1

  * igt@kms_rotation_crc@multiplane-rotation-cropping-bottom:
    - shard-kbl:          DMESG-FAIL [fdo#105763] -> PASS

  * igt@kms_setmode@basic:
    - shard-kbl:          FAIL [fdo#99912] -> PASS

  * igt@kms_vblank@pipe-c-ts-continuation-modeset-rpm:
    - shard-apl:          FAIL [fdo#104894] -> PASS

  
  [fdo#103060]: https://bugs.freedesktop.org/show_bug.cgi?id=103060
  [fdo#103166]: https://bugs.freedesktop.org/show_bug.cgi?id=103166
  [fdo#103167]: https://bugs.freedesktop.org/show_bug.cgi?id=103167
  [fdo#103232]: https://bugs.freedesktop.org/show_bug.cgi?id=103232
  [fdo#103359]: https://bugs.freedesktop.org/show_bug.cgi?id=103359
  [fdo#103375]: https://bugs.freedesktop.org/show_bug.cgi?id=103375
  [fdo#103540]: https://bugs.freedesktop.org/show_bug.cgi?id=103540
  [fdo#104108]: https://bugs.freedesktop.org/show_bug.cgi?id=104108
  [fdo#104782]: https://bugs.freedesktop.org/show_bug.cgi?id=104782
  [fdo#104894]: https://bugs.freedesktop.org/show_bug.cgi?id=104894
  [fdo#105363]: https://bugs.freedesktop.org/show_bug.cgi?id=105363
  [fdo#105763]: https://bugs.freedesktop.org/show_bug.cgi?id=105763
  [fdo#106885]: https://bugs.freedesktop.org/show_bug.cgi?id=106885
  [fdo#107713]: https://bugs.freedesktop.org/show_bug.cgi?id=107713
  [fdo#107724]: https://bugs.freedesktop.org/show_bug.cgi?id=107724
  [fdo#107773]: https://bugs.freedesktop.org/show_bug.cgi?id=107773
  [fdo#107815]: https://bugs.freedesktop.org/show_bug.cgi?id=107815
  [fdo#107956]: https://bugs.freedesktop.org/show_bug.cgi?id=107956
  [fdo#108145]: https://bugs.freedesktop.org/show_bug.cgi?id=108145
  [fdo#108840]: https://bugs.freedesktop.org/show_bug.cgi?id=108840
  [fdo#109016]: https://bugs.freedesktop.org/show_bug.cgi?id=109016
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109274]: https://bugs.freedesktop.org/show_bug.cgi?id=109274
  [fdo#109275]: https://bugs.freedesktop.org/show_bug.cgi?id=109275
  [fdo#109278]: https://bugs.freedesktop.org/show_bug.cgi?id=109278
  [fdo#109280]: https://bugs.freedesktop.org/show_bug.cgi?id=109280
  [fdo#109284]: https://bugs.freedesktop.org/show_bug.cgi?id=109284
  [fdo#109661]: https://bugs.freedesktop.org/show_bug.cgi?id=109661
  [fdo#99912]: https://bugs.freedesktop.org/show_bug.cgi?id=99912
  [k.org#198133]: https://bugzilla.kernel.org/show_bug.cgi?id=198133


Participating hosts (7 -> 7)
------------------------------

  No changes in participating hosts


Build changes
-------------

    * Linux: CI_DRM_5665 -> Patchwork_12313

  CI_DRM_5665: 63fb470a53f50de6cda4195ad7a2d7da249daec9 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4859: 1d8f3320cbc06fa73ad1487453a63993f17b9d57 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_12313: cf099e9f7bbf4bfe77b4311996cf8edfb3ef14ff @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12313/

* Re: [PATCH 04/11] drm/i915: Make request allocation caches global
  2019-02-27 10:44     ` Chris Wilson
@ 2019-02-27 14:17       ` Tvrtko Ursulin
  2019-02-27 14:43         ` Chris Wilson
  0 siblings, 1 reply; 43+ messages in thread
From: Tvrtko Ursulin @ 2019-02-27 14:17 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 27/02/2019 10:44, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2019-02-27 10:29:43)
>>
>> On 26/02/2019 10:23, Chris Wilson wrote:
>>> As kmem_caches share the same properties (size, allocation/free behaviour)
>>> for all potential devices, we can use global caches. While this
> >>> potentially has worse fragmentation behaviour (one can argue that
>>> different devices would have different activity lifetimes, but you can
>>> also argue that activity is temporal across the system) it is the
>>> default behaviour of the system at large to amalgamate matching caches.
>>>
>>> The benefit for us is much reduced pointer dancing along the frequent
>>> allocation paths.
>>>
>>> v2: Defer shrinking until after a global grace period for futureproofing
>>> multiple consumers of the slab caches, similar to the current strategy
>>> for avoiding shrinking too early.
>>
> >> I suggested calling i915_globals_park directly from __i915_gem_park for
> >> symmetry with how i915_gem_unpark calls i915_globals_unpark.
> >> i915_globals has its own delayed setup so I don't think it benefits
> >> from the double indirection courtesy of being called from shrink_caches.
> 
> I replied I left that change until a later patch after the final
> conversions. Mostly so that we had a standalone patch to revert if the
> rcu_work turns out badly. In this patch, it was to be the simple
> translation over to global_shrink, except you asked for it to be truly
> global and so we needed another layer of counters.

It's a hard sell I think. Because why even have rcu work now in this 
case? You could make i915_globals_park just shrink if active counter 
dropped to zero. I don't see a benefit in a temporary asymmetric solution.
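
A minimal sketch of that alternative, with illustrative names rather than
the code actually posted, would be to drive the shrink straight from the
park/unpark counter:

	static atomic_t active;

	void i915_globals_unpark(void)
	{
		atomic_inc(&active);
	}

	void i915_globals_park(void)
	{
		/* Last user gone: trim the slab caches now, no deferred rcu_work. */
		if (atomic_dec_and_test(&active))
			i915_global_request_shrink();
	}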

Regards,

Tvrtko

* Re: [PATCH] drm/i915: Keep timeline HWSP allocated until idle across the system
  2019-02-27 11:15   ` [PATCH] " Chris Wilson
@ 2019-02-27 14:20     ` Tvrtko Ursulin
  0 siblings, 0 replies; 43+ messages in thread
From: Tvrtko Ursulin @ 2019-02-27 14:20 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 27/02/2019 11:15, Chris Wilson wrote:
> In preparation for enabling HW semaphores, we need to keep the in-flight
> timeline HWSP alive until its use across the entire system has completed,
> as any other timeline active on the GPU may still refer back to the
> already retired timeline. We have to delay both recycling available
> cachelines and unpinning the old HWSP until the next idle point.
> 
> An easy option would be to simply keep all used HWSP until the system as
> a whole was idle, i.e. we could release them all at once on parking.
> However, on a busy system, we may never see a global idle point,
> essentially meaning the resource will be leaked until we are forced to
> do a GC pass. We already employ a fine-grained idle detection mechanism
> for vma, which we can reuse here so that each cacheline can be freed
> immediately after the last request using it is retired.
> 
> v3: Keep track of the activity of each cacheline.
> v4: cacheline_free() on canceling the seqno tracking
> v5: Finally with a testcase to exercise wraparound
> v6: Pack cacheline into empty bits of page-aligned vaddr
> v7: Use i915_utils to hide the pointer casting around bit manipulation
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> #v5
> ---
>   drivers/gpu/drm/i915/i915_request.c           |  31 +-
>   drivers/gpu/drm/i915/i915_request.h           |  11 +
>   drivers/gpu/drm/i915/i915_timeline.c          | 290 ++++++++++++++++--
>   drivers/gpu/drm/i915/i915_timeline.h          |  11 +-
>   drivers/gpu/drm/i915/i915_utils.h             |   3 +
>   .../gpu/drm/i915/selftests/i915_timeline.c    | 113 +++++++
>   6 files changed, 420 insertions(+), 39 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 719d1a5ab082..d354967d6ae8 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -325,11 +325,6 @@ void i915_request_retire_upto(struct i915_request *rq)
>   	} while (tmp != rq);
>   }
>   
> -static u32 timeline_get_seqno(struct i915_timeline *tl)
> -{
> -	return tl->seqno += 1 + tl->has_initial_breadcrumb;
> -}
> -
>   static void move_to_timeline(struct i915_request *request,
>   			     struct i915_timeline *timeline)
>   {
> @@ -532,8 +527,10 @@ struct i915_request *
>   i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
>   {
>   	struct drm_i915_private *i915 = engine->i915;
> -	struct i915_request *rq;
>   	struct intel_context *ce;
> +	struct i915_timeline *tl;
> +	struct i915_request *rq;
> +	u32 seqno;
>   	int ret;
>   
>   	lockdep_assert_held(&i915->drm.struct_mutex);
> @@ -610,24 +607,27 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
>   		}
>   	}
>   
> -	rq->rcustate = get_state_synchronize_rcu();
> -
>   	INIT_LIST_HEAD(&rq->active_list);
> +
> +	tl = ce->ring->timeline;
> +	ret = i915_timeline_get_seqno(tl, rq, &seqno);
> +	if (ret)
> +		goto err_free;
> +
>   	rq->i915 = i915;
>   	rq->engine = engine;
>   	rq->gem_context = ctx;
>   	rq->hw_context = ce;
>   	rq->ring = ce->ring;
> -	rq->timeline = ce->ring->timeline;
> +	rq->timeline = tl;
>   	GEM_BUG_ON(rq->timeline == &engine->timeline);
> -	rq->hwsp_seqno = rq->timeline->hwsp_seqno;
> +	rq->hwsp_seqno = tl->hwsp_seqno;
> +	rq->hwsp_cacheline = tl->hwsp_cacheline;
> +	rq->rcustate = get_state_synchronize_rcu(); /* acts as smp_mb() */
>   
>   	spin_lock_init(&rq->lock);
> -	dma_fence_init(&rq->fence,
> -		       &i915_fence_ops,
> -		       &rq->lock,
> -		       rq->timeline->fence_context,
> -		       timeline_get_seqno(rq->timeline));
> +	dma_fence_init(&rq->fence, &i915_fence_ops, &rq->lock,
> +		       tl->fence_context, seqno);
>   
>   	/* We bump the ref for the fence chain */
>   	i915_sw_fence_init(&i915_request_get(rq)->submit, submit_notify);
> @@ -687,6 +687,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
>   	GEM_BUG_ON(!list_empty(&rq->sched.signalers_list));
>   	GEM_BUG_ON(!list_empty(&rq->sched.waiters_list));
>   
> +err_free:
>   	kmem_cache_free(global.slab_requests, rq);
>   err_unreserve:
>   	mutex_unlock(&ce->ring->timeline->mutex);
> diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
> index be3ded6bcf56..ea1e6f0ade53 100644
> --- a/drivers/gpu/drm/i915/i915_request.h
> +++ b/drivers/gpu/drm/i915/i915_request.h
> @@ -38,6 +38,7 @@ struct drm_file;
>   struct drm_i915_gem_object;
>   struct i915_request;
>   struct i915_timeline;
> +struct i915_timeline_cacheline;
>   
>   struct i915_capture_list {
>   	struct i915_capture_list *next;
> @@ -148,6 +149,16 @@ struct i915_request {
>   	 */
>   	const u32 *hwsp_seqno;
>   
> +	/*
> +	 * If we need to access the timeline's seqno for this request in
> +	 * another request, we need to keep a read reference to this associated
> +	 * cacheline, so that we do not free and recycle it before the foreign
> +	 * observers have completed. Hence, we keep a pointer to the cacheline
> +	 * inside the timeline's HWSP vma, but it is only valid while this
> +	 * request has not completed and guarded by the timeline mutex.
> +	 */
> +	struct i915_timeline_cacheline *hwsp_cacheline;
> +
>   	/** Position in the ring of the start of the request */
>   	u32 head;
>   
> diff --git a/drivers/gpu/drm/i915/i915_timeline.c b/drivers/gpu/drm/i915/i915_timeline.c
> index 87a80558da28..e45e81b4cab6 100644
> --- a/drivers/gpu/drm/i915/i915_timeline.c
> +++ b/drivers/gpu/drm/i915/i915_timeline.c
> @@ -6,19 +6,29 @@
>   
>   #include "i915_drv.h"
>   
> -#include "i915_timeline.h"
> +#include "i915_active.h"
>   #include "i915_syncmap.h"
> +#include "i915_timeline.h"
>   
>   struct i915_timeline_hwsp {
> -	struct i915_vma *vma;
> +	struct i915_gt_timelines *gt;
>   	struct list_head free_link;
> +	struct i915_vma *vma;
>   	u64 free_bitmap;
>   };
>   
> -static inline struct i915_timeline_hwsp *
> -i915_timeline_hwsp(const struct i915_timeline *tl)
> +struct i915_timeline_cacheline {
> +	struct i915_active active;
> +	struct i915_timeline_hwsp *hwsp;
> +	void *vaddr;
> +#define CACHELINE_BITS 6
> +#define CACHELINE_FREE CACHELINE_BITS
> +};
> +
> +static inline struct drm_i915_private *
> +hwsp_to_i915(struct i915_timeline_hwsp *hwsp)
>   {
> -	return tl->hwsp_ggtt->private;
> +	return container_of(hwsp->gt, struct drm_i915_private, gt.timelines);
>   }
>   
>   static struct i915_vma *__hwsp_alloc(struct drm_i915_private *i915)
> @@ -71,6 +81,7 @@ hwsp_alloc(struct i915_timeline *timeline, unsigned int *cacheline)
>   		vma->private = hwsp;
>   		hwsp->vma = vma;
>   		hwsp->free_bitmap = ~0ull;
> +		hwsp->gt = gt;
>   
>   		spin_lock(&gt->hwsp_lock);
>   		list_add(&hwsp->free_link, &gt->hwsp_free_list);
> @@ -88,14 +99,9 @@ hwsp_alloc(struct i915_timeline *timeline, unsigned int *cacheline)
>   	return hwsp->vma;
>   }
>   
> -static void hwsp_free(struct i915_timeline *timeline)
> +static void __idle_hwsp_free(struct i915_timeline_hwsp *hwsp, int cacheline)
>   {
> -	struct i915_gt_timelines *gt = &timeline->i915->gt.timelines;
> -	struct i915_timeline_hwsp *hwsp;
> -
> -	hwsp = i915_timeline_hwsp(timeline);
> -	if (!hwsp) /* leave global HWSP alone! */
> -		return;
> +	struct i915_gt_timelines *gt = hwsp->gt;
>   
>   	spin_lock(&gt->hwsp_lock);
>   
> @@ -103,7 +109,8 @@ static void hwsp_free(struct i915_timeline *timeline)
>   	if (!hwsp->free_bitmap)
>   		list_add_tail(&hwsp->free_link, &gt->hwsp_free_list);
>   
> -	hwsp->free_bitmap |= BIT_ULL(timeline->hwsp_offset / CACHELINE_BYTES);
> +	GEM_BUG_ON(cacheline >= BITS_PER_TYPE(hwsp->free_bitmap));
> +	hwsp->free_bitmap |= BIT_ULL(cacheline);
>   
>   	/* And if no one is left using it, give the page back to the system */
>   	if (hwsp->free_bitmap == ~0ull) {
> @@ -115,6 +122,76 @@ static void hwsp_free(struct i915_timeline *timeline)
>   	spin_unlock(&gt->hwsp_lock);
>   }
>   
> +static void __idle_cacheline_free(struct i915_timeline_cacheline *cl)
> +{
> +	GEM_BUG_ON(!i915_active_is_idle(&cl->active));
> +
> +	i915_gem_object_unpin_map(cl->hwsp->vma->obj);
> +	i915_vma_put(cl->hwsp->vma);
> +	__idle_hwsp_free(cl->hwsp, ptr_unmask_bits(cl->vaddr, CACHELINE_BITS));
> +
> +	i915_active_fini(&cl->active);
> +	kfree(cl);
> +}
> +
> +static void __cacheline_retire(struct i915_active *active)
> +{
> +	struct i915_timeline_cacheline *cl =
> +		container_of(active, typeof(*cl), active);
> +
> +	i915_vma_unpin(cl->hwsp->vma);
> +	if (ptr_test_bit(cl->vaddr, CACHELINE_FREE))
> +		__idle_cacheline_free(cl);
> +}
> +
> +static struct i915_timeline_cacheline *
> +cacheline_alloc(struct i915_timeline_hwsp *hwsp, unsigned int cacheline)
> +{
> +	struct i915_timeline_cacheline *cl;
> +	void *vaddr;
> +
> +	GEM_BUG_ON(cacheline >= BIT(CACHELINE_BITS));
> +
> +	cl = kmalloc(sizeof(*cl), GFP_KERNEL);
> +	if (!cl)
> +		return ERR_PTR(-ENOMEM);
> +
> +	vaddr = i915_gem_object_pin_map(hwsp->vma->obj, I915_MAP_WB);
> +	if (IS_ERR(vaddr)) {
> +		kfree(cl);
> +		return ERR_CAST(vaddr);
> +	}
> +
> +	i915_vma_get(hwsp->vma);
> +	cl->hwsp = hwsp;
> +	cl->vaddr = page_pack_bits(vaddr, cacheline);
> +
> +	i915_active_init(hwsp_to_i915(hwsp), &cl->active, __cacheline_retire);
> +
> +	return cl;
> +}
> +
> +static void cacheline_acquire(struct i915_timeline_cacheline *cl)
> +{
> +	if (cl && i915_active_acquire(&cl->active))
> +		__i915_vma_pin(cl->hwsp->vma);
> +}
> +
> +static void cacheline_release(struct i915_timeline_cacheline *cl)
> +{
> +	if (cl)
> +		i915_active_release(&cl->active);
> +}
> +
> +static void cacheline_free(struct i915_timeline_cacheline *cl)
> +{
> +	GEM_BUG_ON(ptr_test_bit(cl->vaddr, CACHELINE_FREE));
> +	cl->vaddr = ptr_set_bit(cl->vaddr, CACHELINE_FREE);
> +
> +	if (i915_active_is_idle(&cl->active))
> +		__idle_cacheline_free(cl);
> +}
> +
>   int i915_timeline_init(struct drm_i915_private *i915,
>   		       struct i915_timeline *timeline,
>   		       const char *name,
> @@ -136,29 +213,40 @@ int i915_timeline_init(struct drm_i915_private *i915,
>   	timeline->name = name;
>   	timeline->pin_count = 0;
>   	timeline->has_initial_breadcrumb = !hwsp;
> +	timeline->hwsp_cacheline = NULL;
>   
> -	timeline->hwsp_offset = I915_GEM_HWS_SEQNO_ADDR;
>   	if (!hwsp) {
> +		struct i915_timeline_cacheline *cl;
>   		unsigned int cacheline;
>   
>   		hwsp = hwsp_alloc(timeline, &cacheline);
>   		if (IS_ERR(hwsp))
>   			return PTR_ERR(hwsp);
>   
> +		cl = cacheline_alloc(hwsp->private, cacheline);
> +		if (IS_ERR(cl)) {
> +			__idle_hwsp_free(hwsp->private, cacheline);
> +			return PTR_ERR(cl);
> +		}
> +
> +		timeline->hwsp_cacheline = cl;
>   		timeline->hwsp_offset = cacheline * CACHELINE_BYTES;
> -	}
> -	timeline->hwsp_ggtt = i915_vma_get(hwsp);
>   
> -	vaddr = i915_gem_object_pin_map(hwsp->obj, I915_MAP_WB);
> -	if (IS_ERR(vaddr)) {
> -		hwsp_free(timeline);
> -		i915_vma_put(hwsp);
> -		return PTR_ERR(vaddr);
> +		vaddr = page_mask_bits(cl->vaddr);
> +	} else {
> +		timeline->hwsp_offset = I915_GEM_HWS_SEQNO_ADDR;
> +
> +		vaddr = i915_gem_object_pin_map(hwsp->obj, I915_MAP_WB);
> +		if (IS_ERR(vaddr))
> +			return PTR_ERR(vaddr);
>   	}
>   
>   	timeline->hwsp_seqno =
>   		memset(vaddr + timeline->hwsp_offset, 0, CACHELINE_BYTES);
>   
> +	timeline->hwsp_ggtt = i915_vma_get(hwsp);
> +	GEM_BUG_ON(timeline->hwsp_offset >= hwsp->size);
> +
>   	timeline->fence_context = dma_fence_context_alloc(1);
>   
>   	spin_lock_init(&timeline->lock);
> @@ -240,9 +328,12 @@ void i915_timeline_fini(struct i915_timeline *timeline)
>   	GEM_BUG_ON(i915_active_request_isset(&timeline->barrier));
>   
>   	i915_syncmap_free(&timeline->sync);
> -	hwsp_free(timeline);
>   
> -	i915_gem_object_unpin_map(timeline->hwsp_ggtt->obj);
> +	if (timeline->hwsp_cacheline)
> +		cacheline_free(timeline->hwsp_cacheline);
> +	else
> +		i915_gem_object_unpin_map(timeline->hwsp_ggtt->obj);
> +
>   	i915_vma_put(timeline->hwsp_ggtt);
>   }
>   
> @@ -285,6 +376,7 @@ int i915_timeline_pin(struct i915_timeline *tl)
>   		i915_ggtt_offset(tl->hwsp_ggtt) +
>   		offset_in_page(tl->hwsp_offset);
>   
> +	cacheline_acquire(tl->hwsp_cacheline);
>   	timeline_add_to_active(tl);
>   
>   	return 0;
> @@ -294,6 +386,157 @@ int i915_timeline_pin(struct i915_timeline *tl)
>   	return err;
>   }
>   
> +static u32 timeline_advance(struct i915_timeline *tl)
> +{
> +	GEM_BUG_ON(!tl->pin_count);
> +	GEM_BUG_ON(tl->seqno & tl->has_initial_breadcrumb);
> +
> +	return tl->seqno += 1 + tl->has_initial_breadcrumb;
> +}
> +
> +static void timeline_rollback(struct i915_timeline *tl)
> +{
> +	tl->seqno -= 1 + tl->has_initial_breadcrumb;
> +}
> +
> +static noinline int
> +__i915_timeline_get_seqno(struct i915_timeline *tl,
> +			  struct i915_request *rq,
> +			  u32 *seqno)
> +{
> +	struct i915_timeline_cacheline *cl;
> +	unsigned int cacheline;
> +	struct i915_vma *vma;
> +	void *vaddr;
> +	int err;
> +
> +	/*
> +	 * If there is an outstanding GPU reference to this cacheline,
> +	 * such as it being sampled by a HW semaphore on another timeline,
> +	 * we cannot wraparound our seqno value (the HW semaphore does
> +	 * a strict greater-than-or-equals compare, not i915_seqno_passed).
> +	 * So if the cacheline is still busy, we must detach ourselves
> +	 * from it and leave it inflight alongside its users.
> +	 *
> +	 * However, if nobody is watching and we can guarantee that nobody
> +	 * will, we could simply reuse the same cacheline.
> +	 *
> +	 * if (i915_active_request_is_signaled(&tl->last_request) &&
> +	 *     i915_active_is_signaled(&tl->hwsp_cacheline->active))
> +	 *	return 0;
> +	 *
> +	 * That seems unlikely for a busy timeline that needed to wrap in
> +	 * the first place, so just replace the cacheline.
> +	 */
> +
> +	vma = hwsp_alloc(tl, &cacheline);
> +	if (IS_ERR(vma)) {
> +		err = PTR_ERR(vma);
> +		goto err_rollback;
> +	}
> +
> +	err = i915_vma_pin(vma, 0, 0, PIN_GLOBAL | PIN_HIGH);
> +	if (err) {
> +		__idle_hwsp_free(vma->private, cacheline);
> +		goto err_rollback;
> +	}
> +
> +	cl = cacheline_alloc(vma->private, cacheline);
> +	if (IS_ERR(cl)) {
> +		err = PTR_ERR(cl);
> +		__idle_hwsp_free(vma->private, cacheline);
> +		goto err_unpin;
> +	}
> +	GEM_BUG_ON(cl->hwsp->vma != vma);
> +
> +	/*
> +	 * Attach the old cacheline to the current request, so that we only
> +	 * free it after the current request is retired, which ensures that
> +	 * all writes into the cacheline from previous requests are complete.
> +	 */
> +	err = i915_active_ref(&tl->hwsp_cacheline->active,
> +			      tl->fence_context, rq);
> +	if (err)
> +		goto err_cacheline;
> +
> +	cacheline_release(tl->hwsp_cacheline); /* ownership now xfered to rq */
> +	cacheline_free(tl->hwsp_cacheline);
> +
> +	i915_vma_unpin(tl->hwsp_ggtt); /* binding kept alive by old cacheline */
> +	i915_vma_put(tl->hwsp_ggtt);
> +
> +	tl->hwsp_ggtt = i915_vma_get(vma);
> +
> +	vaddr = page_mask_bits(cl->vaddr);
> +	tl->hwsp_offset = cacheline * CACHELINE_BYTES;
> +	tl->hwsp_seqno =
> +		memset(vaddr + tl->hwsp_offset, 0, CACHELINE_BYTES);
> +
> +	tl->hwsp_offset += i915_ggtt_offset(vma);
> +
> +	cacheline_acquire(cl);
> +	tl->hwsp_cacheline = cl;
> +
> +	*seqno = timeline_advance(tl);
> +	GEM_BUG_ON(i915_seqno_passed(*tl->hwsp_seqno, *seqno));
> +	return 0;
> +
> +err_cacheline:
> +	cacheline_free(cl);
> +err_unpin:
> +	i915_vma_unpin(vma);
> +err_rollback:
> +	timeline_rollback(tl);
> +	return err;
> +}
> +
> +int i915_timeline_get_seqno(struct i915_timeline *tl,
> +			    struct i915_request *rq,
> +			    u32 *seqno)
> +{
> +	*seqno = timeline_advance(tl);
> +
> +	/* Replace the HWSP on wraparound for HW semaphores */
> +	if (unlikely(!*seqno && tl->hwsp_cacheline))
> +		return __i915_timeline_get_seqno(tl, rq, seqno);
> +
> +	return 0;
> +}
> +
> +static int cacheline_ref(struct i915_timeline_cacheline *cl,
> +			 struct i915_request *rq)
> +{
> +	return i915_active_ref(&cl->active, rq->fence.context, rq);
> +}
> +
> +int i915_timeline_read_hwsp(struct i915_request *from,
> +			    struct i915_request *to,
> +			    u32 *hwsp)
> +{
> +	struct i915_timeline_cacheline *cl = from->hwsp_cacheline;
> +	struct i915_timeline *tl = from->timeline;
> +	int err;
> +
> +	GEM_BUG_ON(to->timeline == tl);
> +
> +	mutex_lock_nested(&tl->mutex, SINGLE_DEPTH_NESTING);
> +	err = i915_request_completed(from);
> +	if (!err)
> +		err = cacheline_ref(cl, to);
> +	if (!err) {
> +		if (likely(cl == tl->hwsp_cacheline)) {
> +			*hwsp = tl->hwsp_offset;
> +		} else { /* across a seqno wrap, recover the original offset */
> +			*hwsp = i915_ggtt_offset(cl->hwsp->vma) +
> +				ptr_unmask_bits(cl->vaddr, CACHELINE_BITS) *
> +				CACHELINE_BYTES;
> +		}
> +	}
> +	mutex_unlock(&tl->mutex);
> +
> +	return err;
> +}
> +
>   void i915_timeline_unpin(struct i915_timeline *tl)
>   {
>   	GEM_BUG_ON(!tl->pin_count);
> @@ -301,6 +544,7 @@ void i915_timeline_unpin(struct i915_timeline *tl)
>   		return;
>   
>   	timeline_remove_from_active(tl);
> +	cacheline_release(tl->hwsp_cacheline);
>   
>   	/*
>   	 * Since this timeline is idle, all barriers upon which we were waiting
> diff --git a/drivers/gpu/drm/i915/i915_timeline.h b/drivers/gpu/drm/i915/i915_timeline.h
> index 36c3849f7108..60b1dfad93ed 100644
> --- a/drivers/gpu/drm/i915/i915_timeline.h
> +++ b/drivers/gpu/drm/i915/i915_timeline.h
> @@ -34,7 +34,7 @@
>   #include "i915_utils.h"
>   
>   struct i915_vma;
> -struct i915_timeline_hwsp;
> +struct i915_timeline_cacheline;
>   
>   struct i915_timeline {
>   	u64 fence_context;
> @@ -51,6 +51,8 @@ struct i915_timeline {
>   	struct i915_vma *hwsp_ggtt;
>   	u32 hwsp_offset;
>   
> +	struct i915_timeline_cacheline *hwsp_cacheline;
> +
>   	bool has_initial_breadcrumb;
>   
>   	/**
> @@ -162,8 +164,15 @@ static inline bool i915_timeline_sync_is_later(struct i915_timeline *tl,
>   }
>   
>   int i915_timeline_pin(struct i915_timeline *tl);
> +int i915_timeline_get_seqno(struct i915_timeline *tl,
> +			    struct i915_request *rq,
> +			    u32 *seqno);
>   void i915_timeline_unpin(struct i915_timeline *tl);
>   
> +int i915_timeline_read_hwsp(struct i915_request *from,
> +			    struct i915_request *until,
> +			    u32 *hwsp_offset);
> +
>   void i915_timelines_init(struct drm_i915_private *i915);
>   void i915_timelines_park(struct drm_i915_private *i915);
>   void i915_timelines_fini(struct drm_i915_private *i915);
> diff --git a/drivers/gpu/drm/i915/i915_utils.h b/drivers/gpu/drm/i915/i915_utils.h
> index 9726df37c4c4..1fb9c8b2f5b3 100644
> --- a/drivers/gpu/drm/i915/i915_utils.h
> +++ b/drivers/gpu/drm/i915/i915_utils.h
> @@ -92,6 +92,9 @@
>   	((typeof(ptr))((unsigned long)(ptr) | __bits));			\
>   })
>   
> +#define ptr_set_bit(ptr, bit) ((typeof(ptr))((unsigned long)(ptr) | BIT(bit)))
> +#define ptr_test_bit(ptr, bit) ((unsigned long)(ptr) & BIT(bit))
> +
>   #define page_mask_bits(ptr) ptr_mask_bits(ptr, PAGE_SHIFT)
>   #define page_unmask_bits(ptr) ptr_unmask_bits(ptr, PAGE_SHIFT)
>   #define page_pack_bits(ptr, bits) ptr_pack_bits(ptr, bits, PAGE_SHIFT)
> diff --git a/drivers/gpu/drm/i915/selftests/i915_timeline.c b/drivers/gpu/drm/i915/selftests/i915_timeline.c
> index 12ea69b1a1e5..844701759ffc 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_timeline.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_timeline.c
> @@ -641,6 +641,118 @@ static int live_hwsp_alternate(void *arg)
>   #undef NUM_TIMELINES
>   }
>   
> +static int live_hwsp_wrap(void *arg)
> +{
> +	struct drm_i915_private *i915 = arg;
> +	struct intel_engine_cs *engine;
> +	struct i915_timeline *tl;
> +	enum intel_engine_id id;
> +	intel_wakeref_t wakeref;
> +	int err = 0;
> +
> +	/*
> +	 * Across a seqno wrap, we need to keep the old cacheline alive for
> +	 * foreign GPU references.
> +	 */
> +
> +	mutex_lock(&i915->drm.struct_mutex);
> +	wakeref = intel_runtime_pm_get(i915);
> +
> +	tl = i915_timeline_create(i915, __func__, NULL);
> +	if (IS_ERR(tl)) {
> +		err = PTR_ERR(tl);
> +		goto out_rpm;
> +	}
> +	if (!tl->has_initial_breadcrumb || !tl->hwsp_cacheline)
> +		goto out_free;
> +
> +	err = i915_timeline_pin(tl);
> +	if (err)
> +		goto out_free;
> +
> +	for_each_engine(engine, i915, id) {
> +		const u32 *hwsp_seqno[2];
> +		struct i915_request *rq;
> +		u32 seqno[2];
> +
> +		if (!intel_engine_can_store_dword(engine))
> +			continue;
> +
> +		rq = i915_request_alloc(engine, i915->kernel_context);
> +		if (IS_ERR(rq)) {
> +			err = PTR_ERR(rq);
> +			goto out;
> +		}
> +
> +		tl->seqno = -4u;
> +
> +		err = i915_timeline_get_seqno(tl, rq, &seqno[0]);
> +		if (err) {
> +			i915_request_add(rq);
> +			goto out;
> +		}
> +		pr_debug("seqno[0]:%08x, hwsp_offset:%08x\n",
> +			 seqno[0], tl->hwsp_offset);
> +
> +		err = emit_ggtt_store_dw(rq, tl->hwsp_offset, seqno[0]);
> +		if (err) {
> +			i915_request_add(rq);
> +			goto out;
> +		}
> +		hwsp_seqno[0] = tl->hwsp_seqno;
> +
> +		err = i915_timeline_get_seqno(tl, rq, &seqno[1]);
> +		if (err) {
> +			i915_request_add(rq);
> +			goto out;
> +		}
> +		pr_debug("seqno[1]:%08x, hwsp_offset:%08x\n",
> +			 seqno[1], tl->hwsp_offset);
> +
> +		err = emit_ggtt_store_dw(rq, tl->hwsp_offset, seqno[1]);
> +		if (err) {
> +			i915_request_add(rq);
> +			goto out;
> +		}
> +		hwsp_seqno[1] = tl->hwsp_seqno;
> +
> +		/* With wrap should come a new hwsp */
> +		GEM_BUG_ON(seqno[1] >= seqno[0]);
> +		GEM_BUG_ON(hwsp_seqno[0] == hwsp_seqno[1]);
> +
> +		i915_request_add(rq);
> +
> +		if (i915_request_wait(rq, I915_WAIT_LOCKED, HZ / 5) < 0) {
> +			pr_err("Wait for timeline writes timed out!\n");
> +			err = -EIO;
> +			goto out;
> +		}
> +
> +		if (*hwsp_seqno[0] != seqno[0] || *hwsp_seqno[1] != seqno[1]) {
> +			pr_err("Bad timeline values: found (%x, %x), expected (%x, %x)\n",
> +			       *hwsp_seqno[0], *hwsp_seqno[1],
> +			       seqno[0], seqno[1]);
> +			err = -EINVAL;
> +			goto out;
> +		}
> +
> +		i915_retire_requests(i915); /* recycle HWSP */
> +	}
> +
> +out:
> +	if (igt_flush_test(i915, I915_WAIT_LOCKED))
> +		err = -EIO;
> +
> +	i915_timeline_unpin(tl);
> +out_free:
> +	i915_timeline_put(tl);
> +out_rpm:
> +	intel_runtime_pm_put(i915, wakeref);
> +	mutex_unlock(&i915->drm.struct_mutex);
> +
> +	return err;
> +}
> +
>   static int live_hwsp_recycle(void *arg)
>   {
>   	struct drm_i915_private *i915 = arg;
> @@ -723,6 +835,7 @@ int i915_timeline_live_selftests(struct drm_i915_private *i915)
>   		SUBTEST(live_hwsp_recycle),
>   		SUBTEST(live_hwsp_engine),
>   		SUBTEST(live_hwsp_alternate),
> +		SUBTEST(live_hwsp_wrap),
>   	};
>   
>   	return i915_subtests(tests, i915);
> 
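
To illustrate the wraparound hazard handled by __i915_timeline_get_seqno()
above (numbers invented for the example; this mirrors the comment in the
patch and is not additional code):

	static bool wait_satisfied_after_wrap(void)
	{
		u32 target = 0xfffffffe;	/* sampled by a waiter pre-wrap */
		u32 hwsp = 0x00000002;		/* written by the timeline post-wrap */

		/*
		 * The HW semaphore performs a plain unsigned greater-or-equal
		 * compare, not the wrap-safe i915_seqno_passed(), so this
		 * stays false forever and the waiter would spin indefinitely
		 * if the cacheline were simply reused.  Hence the old
		 * cacheline is kept alive for existing readers and a fresh
		 * one is allocated for post-wrap seqnos.
		 */
		return hwsp >= target;		/* false */
	}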

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

* Re: [PATCH 04/11] drm/i915: Make request allocation caches global
  2019-02-27 14:17       ` Tvrtko Ursulin
@ 2019-02-27 14:43         ` Chris Wilson
  0 siblings, 0 replies; 43+ messages in thread
From: Chris Wilson @ 2019-02-27 14:43 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-02-27 14:17:25)
> 
> On 27/02/2019 10:44, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2019-02-27 10:29:43)
> >>
> >> On 26/02/2019 10:23, Chris Wilson wrote:
> >>> As kmem_caches share the same properties (size, allocation/free behaviour)
> >>> for all potential devices, we can use global caches. While this
> >>> potentially has worse fragmentation behaviour (one can argue that
> >>> different devices would have different activity lifetimes, but you can
> >>> also argue that activity is temporal across the system) it is the
> >>> default behaviour of the system at large to amalgamate matching caches.
> >>>
> >>> The benefit for us is much reduced pointer dancing along the frequent
> >>> allocation paths.
> >>>
> >>> v2: Defer shrinking until after a global grace period for futureproofing
> >>> multiple consumers of the slab caches, similar to the current strategy
> >>> for avoiding shrinking too early.
> >>
> >> I suggested calling i915_globals_park directly from __i915_gem_park for
> >> symmetry with how i915_gem_unpark calls i915_globals_unpark.
> >> i915_globals has its own delayed setup so I don't think it benefits
> >> from the double indirection courtesy of being called from shrink_caches.
> > 
> > I replied I left that change until a later patch after the final
> > conversions. Mostly so that we had a standalone patch to revert if the
> > rcu_work turns out badly. In this patch, it was to be the simple
> > translation over to global_shrink, except you asked for it to be truly
> > global and so we needed another layer of counters.
> 
> It's a hard sell I think. Because why even have rcu work now in this 
> case? You could make i915_globals_park just shrink if active counter 
> dropped to zero. I don't see a benefit in a temporary asymmetric solution.

I did do just that in v1!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
-Chris

* Re: [PATCH 11/11] drm/i915: Use __ffs() in for_each_priolist for more compact code
  2019-02-26 10:24 ` [PATCH 11/11] drm/i915: Use __ffs() in for_each_priolist for more compact code Chris Wilson
@ 2019-02-28  7:42   ` Tvrtko Ursulin
  0 siblings, 0 replies; 43+ messages in thread
From: Tvrtko Ursulin @ 2019-02-28  7:42 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 26/02/2019 10:24, Chris Wilson wrote:
> Gcc has a slight preference if we use __ffs() to subtract one from the
> index once rather than at each use:
> 
> __execlists_submission_tasklet              2867    2847     -20
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_scheduler.h | 6 ++++--
>   1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
> index 24c2c027fd2c..068a6750540f 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler.h
> +++ b/drivers/gpu/drm/i915/i915_scheduler.h
> @@ -100,9 +100,11 @@ struct i915_priolist {
>   		list_for_each_entry(it, &(plist)->requests[idx], sched.link)
>   
>   #define priolist_for_each_request_consume(it, n, plist, idx) \
> -	for (; (idx = ffs((plist)->used)); (plist)->used &= ~BIT(idx - 1)) \
> +	for (; \
> +	     (plist)->used ? (idx = __ffs((plist)->used)), 1 : 0; \
> +	     (plist)->used &= ~BIT(idx)) \
>   		list_for_each_entry_safe(it, n, \
> -					 &(plist)->requests[idx - 1], \
> +					 &(plist)->requests[idx], \
>   					 sched.link)
>   
>   void i915_sched_node_init(struct i915_sched_node *node);
> 
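
For reference, ffs() returns a 1-based bit index (and 0 when no bit is
set), while __ffs() returns a 0-based index and is undefined for 0, hence
the explicit (plist)->used ? ... : 0 guard in the new macro. A small
illustration, not part of the patch:

	int one_based = ffs(0x14);		/* 3: bit 2 is the lowest set bit */
	unsigned long zero_based = __ffs(0x14);	/* 2: same bit, no "- 1" needed */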

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

* Re: [PATCH 05/11] drm/i915: Introduce i915_timeline.mutex
  2019-02-26 10:23 ` [PATCH 05/11] drm/i915: Introduce i915_timeline.mutex Chris Wilson
@ 2019-02-28  7:43   ` Tvrtko Ursulin
  2019-02-28  8:09     ` Chris Wilson
  0 siblings, 1 reply; 43+ messages in thread
From: Tvrtko Ursulin @ 2019-02-28  7:43 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 26/02/2019 10:23, Chris Wilson wrote:
> A simple mutex used for guarding the flow of requests in and out of the
> timeline. In the short-term, it will be used only to guard the addition
> of requests into the timeline, taken on alloc and released on commit so
> that only one caller can construct a request into the timeline
> (important as the seqno and ring pointers must be serialised). This will
> be used by observers to ensure that the seqno/hwsp is stable. Later,
> when we have reduced retiring to only operate on a single timeline at a
> time, we can then use the mutex as the sole guard required for retiring.

At which point does this get used? In media scalability patches or later?

Regards,

Tvrtko


> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_request.c            | 6 +++++-
>   drivers/gpu/drm/i915/i915_timeline.c           | 1 +
>   drivers/gpu/drm/i915/i915_timeline.h           | 2 ++
>   drivers/gpu/drm/i915/selftests/i915_request.c  | 4 +---
>   drivers/gpu/drm/i915/selftests/mock_timeline.c | 1 +
>   5 files changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index c65f6c990fdd..719d1a5ab082 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -563,6 +563,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
>   		return ERR_CAST(ce);
>   
>   	reserve_gt(i915);
> +	mutex_lock(&ce->ring->timeline->mutex);
>   
>   	/* Move our oldest request to the slab-cache (if not in use!) */
>   	rq = list_first_entry(&ce->ring->request_list, typeof(*rq), ring_link);
> @@ -688,6 +689,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
>   
>   	kmem_cache_free(global.slab_requests, rq);
>   err_unreserve:
> +	mutex_unlock(&ce->ring->timeline->mutex);
>   	unreserve_gt(i915);
>   	intel_context_unpin(ce);
>   	return ERR_PTR(ret);
> @@ -880,7 +882,7 @@ void i915_request_add(struct i915_request *request)
>   	GEM_TRACE("%s fence %llx:%lld\n",
>   		  engine->name, request->fence.context, request->fence.seqno);
>   
> -	lockdep_assert_held(&request->i915->drm.struct_mutex);
> +	lockdep_assert_held(&request->timeline->mutex);
>   	trace_i915_request_add(request);
>   
>   	/*
> @@ -991,6 +993,8 @@ void i915_request_add(struct i915_request *request)
>   	 */
>   	if (prev && i915_request_completed(prev))
>   		i915_request_retire_upto(prev);
> +
> +	mutex_unlock(&request->timeline->mutex);
>   }
>   
>   static unsigned long local_clock_us(unsigned int *cpu)
> diff --git a/drivers/gpu/drm/i915/i915_timeline.c b/drivers/gpu/drm/i915/i915_timeline.c
> index b2202d2e58a2..87a80558da28 100644
> --- a/drivers/gpu/drm/i915/i915_timeline.c
> +++ b/drivers/gpu/drm/i915/i915_timeline.c
> @@ -162,6 +162,7 @@ int i915_timeline_init(struct drm_i915_private *i915,
>   	timeline->fence_context = dma_fence_context_alloc(1);
>   
>   	spin_lock_init(&timeline->lock);
> +	mutex_init(&timeline->mutex);
>   
>   	INIT_ACTIVE_REQUEST(&timeline->barrier);
>   	INIT_ACTIVE_REQUEST(&timeline->last_request);
> diff --git a/drivers/gpu/drm/i915/i915_timeline.h b/drivers/gpu/drm/i915/i915_timeline.h
> index 7bec7d2e45bf..36c3849f7108 100644
> --- a/drivers/gpu/drm/i915/i915_timeline.h
> +++ b/drivers/gpu/drm/i915/i915_timeline.h
> @@ -44,6 +44,8 @@ struct i915_timeline {
>   #define TIMELINE_CLIENT 0 /* default subclass */
>   #define TIMELINE_ENGINE 1
>   
> +	struct mutex mutex; /* protects the flow of requests */
> +
>   	unsigned int pin_count;
>   	const u32 *hwsp_seqno;
>   	struct i915_vma *hwsp_ggtt;
> diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c
> index 7da52e3d67af..7e1b65b8eb19 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_request.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_request.c
> @@ -141,14 +141,12 @@ static int igt_fence_wait(void *arg)
>   		err = -ENOMEM;
>   		goto out_locked;
>   	}
> -	mutex_unlock(&i915->drm.struct_mutex); /* safe as we are single user */
>   
>   	if (dma_fence_wait_timeout(&request->fence, false, T) != -ETIME) {
>   		pr_err("fence wait success before submit (expected timeout)!\n");
> -		goto out_device;
> +		goto out_locked;
>   	}
>   
> -	mutex_lock(&i915->drm.struct_mutex);
>   	i915_request_add(request);
>   	mutex_unlock(&i915->drm.struct_mutex);
>   
> diff --git a/drivers/gpu/drm/i915/selftests/mock_timeline.c b/drivers/gpu/drm/i915/selftests/mock_timeline.c
> index d2de9ece2118..416d85233263 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_timeline.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_timeline.c
> @@ -14,6 +14,7 @@ void mock_timeline_init(struct i915_timeline *timeline, u64 context)
>   	timeline->fence_context = context;
>   
>   	spin_lock_init(&timeline->lock);
> +	mutex_init(&timeline->mutex);
>   
>   	INIT_ACTIVE_REQUEST(&timeline->barrier);
>   	INIT_ACTIVE_REQUEST(&timeline->last_request);
> 
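
Caller-side, the flow the commit message describes looks roughly like the
following (illustrative and condensed from the hunks above, not a new API):

	rq = i915_request_alloc(engine, ctx);	/* takes ce->ring->timeline->mutex */
	if (IS_ERR(rq))
		return PTR_ERR(rq);

	/* ... sole writer of this timeline: emit commands, consume seqnos ... */

	i915_request_add(rq);			/* drops the timeline mutex */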

* Re: [PATCH 07/11] drm/i915: Compute the global scheduler caps
  2019-02-26 10:24 ` [PATCH 07/11] drm/i915: Compute the global scheduler caps Chris Wilson
@ 2019-02-28  7:45   ` Tvrtko Ursulin
  0 siblings, 0 replies; 43+ messages in thread
From: Tvrtko Ursulin @ 2019-02-28  7:45 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 26/02/2019 10:24, Chris Wilson wrote:
> Do a pass over all the engines upon starting to determine the global
> scheduler capability flags (those that are agreed upon by all).
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_gem.c         |  2 ++
>   drivers/gpu/drm/i915/intel_engine_cs.c  | 39 +++++++++++++++++++++++++
>   drivers/gpu/drm/i915/intel_lrc.c        |  6 ----
>   drivers/gpu/drm/i915/intel_ringbuffer.h |  2 ++
>   4 files changed, 43 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 713ed6fbdcc8..f6fe10fce0ec 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -4700,6 +4700,8 @@ static int __i915_gem_restart_engines(void *data)
>   		}
>   	}
>   
> +	intel_engines_set_scheduler_caps(i915);
> +
>   	return 0;
>   }
>   
> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> index b7b626195eda..ef49b1b0537b 100644
> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> @@ -608,6 +608,45 @@ int intel_engine_setup_common(struct intel_engine_cs *engine)
>   	return err;
>   }
>   
> +void intel_engines_set_scheduler_caps(struct drm_i915_private *i915)
> +{
> +	static const struct {
> +		u8 engine;
> +		u8 sched;
> +	} map[] = {
> +#define MAP(x, y) { ilog2(I915_ENGINE_HAS_##x), ilog2(I915_SCHEDULER_CAP_##y) }
> +		MAP(PREEMPTION, PREEMPTION),
> +#undef MAP
> +	};
> +	struct intel_engine_cs *engine;
> +	enum intel_engine_id id;
> +	u32 enabled, disabled;
> +
> +	enabled = 0;
> +	disabled = 0;
> +	for_each_engine(engine, i915, id) { /* all engines must agree! */
> +		int i;
> +
> +		if (engine->schedule)
> +			enabled |= (I915_SCHEDULER_CAP_ENABLED |
> +				    I915_SCHEDULER_CAP_PRIORITY);
> +		else
> +			disabled |= (I915_SCHEDULER_CAP_ENABLED |
> +				     I915_SCHEDULER_CAP_PRIORITY);
> +
> +		for (i = 0; i < ARRAY_SIZE(map); i++) {
> +			if (engine->flags & BIT(map[i].engine))
> +				enabled |= BIT(map[i].sched);
> +			else
> +				disabled |= BIT(map[i].sched);
> +		}
> +	}
> +
> +	i915->caps.scheduler = enabled & ~disabled;
> +	if (!(i915->caps.scheduler & I915_SCHEDULER_CAP_ENABLED))
> +		i915->caps.scheduler = 0;
> +}
> +
>   static void __intel_context_unpin(struct i915_gem_context *ctx,
>   				  struct intel_engine_cs *engine)
>   {
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 29b2a2f34edb..f57cfe2fc078 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -2327,12 +2327,6 @@ void intel_execlists_set_default_submission(struct intel_engine_cs *engine)
>   	engine->flags |= I915_ENGINE_SUPPORTS_STATS;
>   	if (engine->i915->preempt_context)
>   		engine->flags |= I915_ENGINE_HAS_PREEMPTION;
> -
> -	engine->i915->caps.scheduler =
> -		I915_SCHEDULER_CAP_ENABLED |
> -		I915_SCHEDULER_CAP_PRIORITY;
> -	if (intel_engine_has_preemption(engine))
> -		engine->i915->caps.scheduler |= I915_SCHEDULER_CAP_PREEMPTION;
>   }
>   
>   static void
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 5284f243931a..b8ec7e40a59b 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -576,6 +576,8 @@ intel_engine_has_preemption(const struct intel_engine_cs *engine)
>   	return engine->flags & I915_ENGINE_HAS_PREEMPTION;
>   }
>   
> +void intel_engines_set_scheduler_caps(struct drm_i915_private *i915);
> +
>   static inline bool __execlists_need_preempt(int prio, int last)
>   {
>   	/*
> 
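
As a worked example of the agreement logic above (hypothetical engine mix,
not from the patch): if rcs0 advertises preemption but vcs0 does not, the
preemption bit lands in both masks and is dropped from the global caps.

	u32 enabled  = I915_SCHEDULER_CAP_ENABLED |
		       I915_SCHEDULER_CAP_PRIORITY |
		       I915_SCHEDULER_CAP_PREEMPTION;	/* from rcs0 */
	u32 disabled = I915_SCHEDULER_CAP_PREEMPTION;	/* vcs0 lacks it */

	u32 caps = enabled & ~disabled;	/* ENABLED | PRIORITY, no PREEMPTION */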

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

* Re: [PATCH 05/11] drm/i915: Introduce i915_timeline.mutex
  2019-02-28  7:43   ` Tvrtko Ursulin
@ 2019-02-28  8:09     ` Chris Wilson
  0 siblings, 0 replies; 43+ messages in thread
From: Chris Wilson @ 2019-02-28  8:09 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-02-28 07:43:41)
> 
> On 26/02/2019 10:23, Chris Wilson wrote:
> > A simple mutex used for guarding the flow of requests in and out of the
> > timeline. In the short-term, it will be used only to guard the addition
> > of requests into the timeline, taken on alloc and released on commit so
> > that only one caller can construct a request into the timeline
> > (important as the seqno and ring pointers must be serialised). This will
> > be used by observers to ensure that the seqno/hwsp is stable. Later,
> > when we have reduced retiring to only operate on a single timeline at a
> > time, we can then use the mutex as the sole guard required for retiring.
> 
> At which point does this get used? In media scalability patches or later?

It is in this series to explicitly lock i915_timeline_read_hwsp(). Its
real use comes when removing the global ordering, as then request
alloc/retirement is under each timeline lock, and nothing but the
timeline locks.
-Chris

* Re: [PATCH 08/11] drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+
  2019-02-26 10:24 ` [PATCH 08/11] drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+ Chris Wilson
@ 2019-02-28 10:49   ` Tvrtko Ursulin
  0 siblings, 0 replies; 43+ messages in thread
From: Tvrtko Ursulin @ 2019-02-28 10:49 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 26/02/2019 10:24, Chris Wilson wrote:
> Having introduced per-context seqno, we now have a means to identify
> progress across the system without fear of rollback as befell the
> global_seqno. That is, we can program an MI_SEMAPHORE_WAIT operation in
> advance of submission, safe in the knowledge that our target seqno and
> address are stable.
> 
> However, since we are telling the GPU to busy-spin on the target address
> until it matches the signaling seqno, we only want to do so when we are
> sure that the busy-spin will complete quickly. To achieve this, we only
> submit the request to HW once the signaler is itself executing (modulo
> preemption causing us to wait longer), and we only do so for default and
> above priority requests (so that idle priority tasks never themselves
> hog the GPU waiting for others).
> 
> As might be reasonably expected, HW semaphores excel in inter-engine
> synchronisation microbenchmarks (where the reduced latency / increased
> throughput more than offset the power cost of spinning on a second ring)
> and show a significant improvement for single clients that utilize multiple
> engines (typically media players), without regressing multiple clients
> that can saturate the system.
> 
> v3: Drop the older NEQ branch, now we pin the signaler's HWSP anyway.
> v4: Tell the world and include it as part of scheduler caps.

I think I asked for a table of results showing power, and power/perf to 
be put in the commit message.

Regards,

Tvrtko
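
For reference, the gating described in the commit message boils down to a
check along these lines (paraphrasing the i915_request_await_request() hunk
quoted below, not new logic):

	static bool use_semaphore_wait(const struct i915_request *to,
				       const struct i915_request *from)
	{
		/* Same engine: ordering comes from the submit fence instead. */
		if (to->engine == from->engine)
			return false;

		/* Idle-priority waiters must not hog an engine busy-spinning. */
		if (to->gem_context->sched.priority < I915_PRIORITY_NORMAL)
			return false;

		return intel_engine_has_semaphores(to->engine);
	}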

> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_drv.c           |   2 +-
>   drivers/gpu/drm/i915/i915_request.c       | 138 +++++++++++++++++++++-
>   drivers/gpu/drm/i915/i915_request.h       |   1 +
>   drivers/gpu/drm/i915/i915_sw_fence.c      |   4 +-
>   drivers/gpu/drm/i915/i915_sw_fence.h      |   3 +
>   drivers/gpu/drm/i915/intel_engine_cs.c    |   1 +
>   drivers/gpu/drm/i915/intel_gpu_commands.h |   5 +
>   drivers/gpu/drm/i915/intel_lrc.c          |   1 +
>   drivers/gpu/drm/i915/intel_ringbuffer.h   |   7 ++
>   include/uapi/drm/i915_drm.h               |   1 +
>   10 files changed, 158 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index c6354f6cdbdb..c08abdef5eb6 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -351,7 +351,7 @@ static int i915_getparam_ioctl(struct drm_device *dev, void *data,
>   		value = min_t(int, INTEL_PPGTT(dev_priv), I915_GEM_PPGTT_FULL);
>   		break;
>   	case I915_PARAM_HAS_SEMAPHORES:
> -		value = 0;
> +		value = !!(dev_priv->caps.scheduler & I915_SCHEDULER_CAP_SEMAPHORES);
>   		break;
>   	case I915_PARAM_HAS_SECURE_BATCHES:
>   		value = capable(CAP_SYS_ADMIN);
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index d354967d6ae8..59e30b8c4ee9 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -22,8 +22,9 @@
>    *
>    */
>   
> -#include <linux/prefetch.h>
>   #include <linux/dma-fence-array.h>
> +#include <linux/irq_work.h>
> +#include <linux/prefetch.h>
>   #include <linux/sched.h>
>   #include <linux/sched/clock.h>
>   #include <linux/sched/signal.h>
> @@ -32,9 +33,16 @@
>   #include "i915_active.h"
>   #include "i915_reset.h"
>   
> +struct execute_cb {
> +	struct list_head link;
> +	struct irq_work work;
> +	struct i915_sw_fence *fence;
> +};
> +
>   static struct i915_global_request {
>   	struct kmem_cache *slab_requests;
>   	struct kmem_cache *slab_dependencies;
> +	struct kmem_cache *slab_execute_cbs;
>   } global;
>   
>   static const char *i915_fence_get_driver_name(struct dma_fence *fence)
> @@ -325,6 +333,69 @@ void i915_request_retire_upto(struct i915_request *rq)
>   	} while (tmp != rq);
>   }
>   
> +static void irq_execute_cb(struct irq_work *wrk)
> +{
> +	struct execute_cb *cb = container_of(wrk, typeof(*cb), work);
> +
> +	i915_sw_fence_complete(cb->fence);
> +	kmem_cache_free(global.slab_execute_cbs, cb);
> +}
> +
> +static void __notify_execute_cb(struct i915_request *rq)
> +{
> +	struct execute_cb *cb;
> +
> +	lockdep_assert_held(&rq->lock);
> +
> +	if (list_empty(&rq->execute_cb))
> +		return;
> +
> +	list_for_each_entry(cb, &rq->execute_cb, link)
> +		irq_work_queue(&cb->work);
> +
> +	/*
> +	 * XXX Rollback on __i915_request_unsubmit()
> +	 *
> +	 * In the future, perhaps when we have an active time-slicing scheduler,
> +	 * it will be interesting to unsubmit parallel execution and remove
> +	 * busywaits from the GPU until their master is restarted. This is
> +	 * quite hairy, we have to carefully rollback the fence and do a
> +	 * preempt-to-idle cycle on the target engine, all the while the
> +	 * master execute_cb may refire.
> +	 */
> +	INIT_LIST_HEAD(&rq->execute_cb);
> +}
> +
> +static int
> +i915_request_await_execution(struct i915_request *rq,
> +			     struct i915_request *signal,
> +			     gfp_t gfp)
> +{
> +	struct execute_cb *cb;
> +
> +	if (i915_request_is_active(signal))
> +		return 0;
> +
> +	cb = kmem_cache_alloc(global.slab_execute_cbs, gfp);
> +	if (!cb)
> +		return -ENOMEM;
> +
> +	cb->fence = &rq->submit;
> +	i915_sw_fence_await(cb->fence);
> +	init_irq_work(&cb->work, irq_execute_cb);
> +
> +	spin_lock_irq(&signal->lock);
> +	if (i915_request_is_active(signal)) {
> +		i915_sw_fence_complete(cb->fence);
> +		kmem_cache_free(global.slab_execute_cbs, cb);
> +	} else {
> +		list_add_tail(&cb->link, &signal->execute_cb);
> +	}
> +	spin_unlock_irq(&signal->lock);
> +
> +	return 0;
> +}
> +
>   static void move_to_timeline(struct i915_request *request,
>   			     struct i915_timeline *timeline)
>   {
> @@ -361,6 +432,8 @@ void __i915_request_submit(struct i915_request *request)
>   	    !i915_request_enable_breadcrumb(request))
>   		intel_engine_queue_breadcrumbs(engine);
>   
> +	__notify_execute_cb(request);
> +
>   	spin_unlock(&request->lock);
>   
>   	engine->emit_fini_breadcrumb(request,
> @@ -608,6 +681,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
>   	}
>   
>   	INIT_LIST_HEAD(&rq->active_list);
> +	INIT_LIST_HEAD(&rq->execute_cb);
>   
>   	tl = ce->ring->timeline;
>   	ret = i915_timeline_get_seqno(tl, rq, &seqno);
> @@ -696,6 +770,52 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
>   	return ERR_PTR(ret);
>   }
>   
> +static int
> +emit_semaphore_wait(struct i915_request *to,
> +		    struct i915_request *from,
> +		    gfp_t gfp)
> +{
> +	u32 hwsp_offset;
> +	u32 *cs;
> +	int err;
> +
> +	GEM_BUG_ON(!from->timeline->has_initial_breadcrumb);
> +	GEM_BUG_ON(INTEL_GEN(to->i915) < 8);
> +
> +	/* We need to pin the signaler's HWSP until we are finished reading. */
> +	err = i915_timeline_read_hwsp(from, to, &hwsp_offset);
> +	if (err)
> +		return err;
> +
> +	/* Only submit our spinner after the signaler is running! */
> +	err = i915_request_await_execution(to, from, gfp);
> +	if (err)
> +		return err;
> +
> +	cs = intel_ring_begin(to, 4);
> +	if (IS_ERR(cs))
> +		return PTR_ERR(cs);
> +
> +	/*
> +	 * Using greater-than-or-equal here means we have to worry
> +	 * about seqno wraparound. To side step that issue, we swap
> +	 * the timeline HWSP upon wrapping, so that everyone listening
> +	 * for the old (pre-wrap) values do not see the much smaller
> +	 * (post-wrap) values than they were expecting (and so wait
> +	 * forever).
> +	 */
> +	*cs++ = MI_SEMAPHORE_WAIT |
> +		MI_SEMAPHORE_GLOBAL_GTT |
> +		MI_SEMAPHORE_POLL |
> +		MI_SEMAPHORE_SAD_GTE_SDD;
> +	*cs++ = from->fence.seqno;
> +	*cs++ = hwsp_offset;
> +	*cs++ = 0;
> +
> +	intel_ring_advance(to, cs);
> +	return 0;
> +}
> +
>   static int
>   i915_request_await_request(struct i915_request *to, struct i915_request *from)
>   {
> @@ -717,6 +837,9 @@ i915_request_await_request(struct i915_request *to, struct i915_request *from)
>   		ret = i915_sw_fence_await_sw_fence_gfp(&to->submit,
>   						       &from->submit,
>   						       I915_FENCE_GFP);
> +	} else if (intel_engine_has_semaphores(to->engine) &&
> +		   to->gem_context->sched.priority >= I915_PRIORITY_NORMAL) {
> +		ret = emit_semaphore_wait(to, from, I915_FENCE_GFP);
>   	} else {
>   		ret = i915_sw_fence_await_dma_fence(&to->submit,
>   						    &from->fence, 0,
> @@ -1208,14 +1331,23 @@ int __init i915_global_request_init(void)
>   	if (!global.slab_requests)
>   		return -ENOMEM;
>   
> +	global.slab_execute_cbs = KMEM_CACHE(execute_cb,
> +					     SLAB_HWCACHE_ALIGN |
> +					     SLAB_RECLAIM_ACCOUNT |
> +					     SLAB_TYPESAFE_BY_RCU);
> +	if (!global.slab_execute_cbs)
> +		goto err_requests;
> +
>   	global.slab_dependencies = KMEM_CACHE(i915_dependency,
>   					      SLAB_HWCACHE_ALIGN |
>   					      SLAB_RECLAIM_ACCOUNT);
>   	if (!global.slab_dependencies)
> -		goto err_requests;
> +		goto err_execute_cbs;
>   
>   	return 0;
>   
> +err_execute_cbs:
> +	kmem_cache_destroy(global.slab_execute_cbs);
>   err_requests:
>   	kmem_cache_destroy(global.slab_requests);
>   	return -ENOMEM;
> @@ -1224,11 +1356,13 @@ int __init i915_global_request_init(void)
>   void i915_global_request_shrink(void)
>   {
>   	kmem_cache_shrink(global.slab_dependencies);
> +	kmem_cache_shrink(global.slab_execute_cbs);
>   	kmem_cache_shrink(global.slab_requests);
>   }
>   
>   void i915_global_request_exit(void)
>   {
>   	kmem_cache_destroy(global.slab_dependencies);
> +	kmem_cache_destroy(global.slab_execute_cbs);
>   	kmem_cache_destroy(global.slab_requests);
>   }
> diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
> index ea1e6f0ade53..3e0d62d35226 100644
> --- a/drivers/gpu/drm/i915/i915_request.h
> +++ b/drivers/gpu/drm/i915/i915_request.h
> @@ -129,6 +129,7 @@ struct i915_request {
>   	 */
>   	struct i915_sw_fence submit;
>   	wait_queue_entry_t submitq;
> +	struct list_head execute_cb;
>   
>   	/*
>   	 * A list of everyone we wait upon, and everyone who waits upon us.
> diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c b/drivers/gpu/drm/i915/i915_sw_fence.c
> index 7c58b049ecb5..8d1400d378d7 100644
> --- a/drivers/gpu/drm/i915/i915_sw_fence.c
> +++ b/drivers/gpu/drm/i915/i915_sw_fence.c
> @@ -192,7 +192,7 @@ static void __i915_sw_fence_complete(struct i915_sw_fence *fence,
>   	__i915_sw_fence_notify(fence, FENCE_FREE);
>   }
>   
> -static void i915_sw_fence_complete(struct i915_sw_fence *fence)
> +void i915_sw_fence_complete(struct i915_sw_fence *fence)
>   {
>   	debug_fence_assert(fence);
>   
> @@ -202,7 +202,7 @@ static void i915_sw_fence_complete(struct i915_sw_fence *fence)
>   	__i915_sw_fence_complete(fence, NULL);
>   }
>   
> -static void i915_sw_fence_await(struct i915_sw_fence *fence)
> +void i915_sw_fence_await(struct i915_sw_fence *fence)
>   {
>   	debug_fence_assert(fence);
>   	WARN_ON(atomic_inc_return(&fence->pending) <= 1);
> diff --git a/drivers/gpu/drm/i915/i915_sw_fence.h b/drivers/gpu/drm/i915/i915_sw_fence.h
> index 0e055ea0179f..6dec9e1d1102 100644
> --- a/drivers/gpu/drm/i915/i915_sw_fence.h
> +++ b/drivers/gpu/drm/i915/i915_sw_fence.h
> @@ -79,6 +79,9 @@ int i915_sw_fence_await_reservation(struct i915_sw_fence *fence,
>   				    unsigned long timeout,
>   				    gfp_t gfp);
>   
> +void i915_sw_fence_await(struct i915_sw_fence *fence);
> +void i915_sw_fence_complete(struct i915_sw_fence *fence);
> +
>   static inline bool i915_sw_fence_signaled(const struct i915_sw_fence *fence)
>   {
>   	return atomic_read(&fence->pending) <= 0;
> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> index ef49b1b0537b..7e0fe433ab17 100644
> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> @@ -616,6 +616,7 @@ void intel_engines_set_scheduler_caps(struct drm_i915_private *i915)
>   	} map[] = {
>   #define MAP(x, y) { ilog2(I915_ENGINE_HAS_##x), ilog2(I915_SCHEDULER_CAP_##y) }
>   		MAP(PREEMPTION, PREEMPTION),
> +		MAP(SEMAPHORES, SEMAPHORES),
>   #undef MAP
>   	};
>   	struct intel_engine_cs *engine;
> diff --git a/drivers/gpu/drm/i915/intel_gpu_commands.h b/drivers/gpu/drm/i915/intel_gpu_commands.h
> index b96a31bc1080..0efaadd3bc32 100644
> --- a/drivers/gpu/drm/i915/intel_gpu_commands.h
> +++ b/drivers/gpu/drm/i915/intel_gpu_commands.h
> @@ -106,7 +106,12 @@
>   #define   MI_SEMAPHORE_TARGET(engine)	((engine)<<15)
>   #define MI_SEMAPHORE_WAIT	MI_INSTR(0x1c, 2) /* GEN8+ */
>   #define   MI_SEMAPHORE_POLL		(1<<15)
> +#define   MI_SEMAPHORE_SAD_GT_SDD	(0<<12)
>   #define   MI_SEMAPHORE_SAD_GTE_SDD	(1<<12)
> +#define   MI_SEMAPHORE_SAD_LT_SDD	(2<<12)
> +#define   MI_SEMAPHORE_SAD_LTE_SDD	(3<<12)
> +#define   MI_SEMAPHORE_SAD_EQ_SDD	(4<<12)
> +#define   MI_SEMAPHORE_SAD_NEQ_SDD	(5<<12)
>   #define MI_STORE_DWORD_IMM	MI_INSTR(0x20, 1)
>   #define MI_STORE_DWORD_IMM_GEN4	MI_INSTR(0x20, 2)
>   #define   MI_MEM_VIRTUAL	(1 << 22) /* 945,g33,965 */
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index f57cfe2fc078..53d6f7fdb50e 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -2324,6 +2324,7 @@ void intel_execlists_set_default_submission(struct intel_engine_cs *engine)
>   	engine->park = NULL;
>   	engine->unpark = NULL;
>   
> +	engine->flags |= I915_ENGINE_HAS_SEMAPHORES;
>   	engine->flags |= I915_ENGINE_SUPPORTS_STATS;
>   	if (engine->i915->preempt_context)
>   		engine->flags |= I915_ENGINE_HAS_PREEMPTION;
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index b8ec7e40a59b..2e7264119ec4 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -499,6 +499,7 @@ struct intel_engine_cs {
>   #define I915_ENGINE_NEEDS_CMD_PARSER BIT(0)
>   #define I915_ENGINE_SUPPORTS_STATS   BIT(1)
>   #define I915_ENGINE_HAS_PREEMPTION   BIT(2)
> +#define I915_ENGINE_HAS_SEMAPHORES   BIT(3)
>   	unsigned int flags;
>   
>   	/*
> @@ -576,6 +577,12 @@ intel_engine_has_preemption(const struct intel_engine_cs *engine)
>   	return engine->flags & I915_ENGINE_HAS_PREEMPTION;
>   }
>   
> +static inline bool
> +intel_engine_has_semaphores(const struct intel_engine_cs *engine)
> +{
> +	return engine->flags & I915_ENGINE_HAS_SEMAPHORES;
> +}
> +
>   void intel_engines_set_scheduler_caps(struct drm_i915_private *i915);
>   
>   static inline bool __execlists_need_preempt(int prio, int last)
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 8304a7f1ec3f..b10eea3f6d24 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -479,6 +479,7 @@ typedef struct drm_i915_irq_wait {
>   #define   I915_SCHEDULER_CAP_ENABLED	(1ul << 0)
>   #define   I915_SCHEDULER_CAP_PRIORITY	(1ul << 1)
>   #define   I915_SCHEDULER_CAP_PREEMPTION	(1ul << 2)
> +#define   I915_SCHEDULER_CAP_SEMAPHORES	(1ul << 3)
>   
>   #define I915_PARAM_HUC_STATUS		 42
>   
> 
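
As an aside, the new cap is exposed through the existing scheduler getparam,
so userspace detection could look roughly like the sketch below (illustration
only: the helper name is made up here, and an i915_drm.h carrying the new
I915_SCHEDULER_CAP_SEMAPHORES bit is assumed):

#include <stdbool.h>
#include <xf86drm.h>
#include <i915_drm.h>

/* Sketch: query the scheduler caps and test the new SEMAPHORES bit. */
static bool scheduler_has_semaphores(int i915_fd)
{
	int caps = 0;
	drm_i915_getparam_t gp = {
		.param = I915_PARAM_HAS_SCHEDULER,
		.value = &caps,
	};

	if (drmIoctl(i915_fd, DRM_IOCTL_I915_GETPARAM, &gp))
		return false; /* no scheduler caps reported at all */

	return caps & I915_SCHEDULER_CAP_SEMAPHORES;
}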

* Re: [PATCH 02/11] drm/i915/execlists: Suppress mere WAIT preemption
  2019-02-26 10:23 ` [PATCH 02/11] drm/i915/execlists: Suppress mere WAIT preemption Chris Wilson
@ 2019-02-28 12:33   ` Tvrtko Ursulin
  0 siblings, 0 replies; 43+ messages in thread
From: Tvrtko Ursulin @ 2019-02-28 12:33 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Matthew Auld


On 26/02/2019 10:23, Chris Wilson wrote:
> WAIT is occasionally suppressed by virtue of preempted requests being
> promoted to NEWCLIENT if they have not all ready received that boost.
> Make this consistent for all WAIT boosts that they are not allowed to
> preempt executing contexts and are merely granted the right to be at the
> front of the queue for the next execution slot. This is in keeping with
> the desire that the WAIT boost be a minor tweak that does not give
> excessive promotion to its user and open ourselves to trivial abuse.
> 
> The problem with the inconsistent WAIT preemption becomes more apparent
> as the preemption is propagated across the engines, where one engine may
> preempt and the other not, and we be relying on the exact execution
> order being consistent across engines (e.g. using HW semaphores to
> coordinate parallel execution).
> 
> v2: Also protect GuC submission from false preemption loops.
> v3: Build bug safeguards and better debug messages for st.
> v4: Do the priority bumping in unsubmit (i.e. on preemption/reset
> unwind), applying it earlier during submit causes out-of-order execution
> combined with execute fences.
> v5: Call sw_fence_fini for our dummy request (Matthew)
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> #v3

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

> Cc: Matthew Auld <matthew.auld@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_request.c         |  15 ++
>   drivers/gpu/drm/i915/i915_scheduler.c       |   1 -
>   drivers/gpu/drm/i915/i915_scheduler.h       |   2 +
>   drivers/gpu/drm/i915/intel_guc_submission.c |   2 +-
>   drivers/gpu/drm/i915/intel_lrc.c            |   9 +-
>   drivers/gpu/drm/i915/selftests/intel_lrc.c  | 163 ++++++++++++++++++++
>   6 files changed, 189 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 935db5548f80..00a1ea7cd907 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -353,11 +353,14 @@ void __i915_request_submit(struct i915_request *request)
>   
>   	/* We may be recursing from the signal callback of another i915 fence */
>   	spin_lock_nested(&request->lock, SINGLE_DEPTH_NESTING);
> +
>   	GEM_BUG_ON(test_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags));
>   	set_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags);
> +
>   	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &request->fence.flags) &&
>   	    !i915_request_enable_breadcrumb(request))
>   		intel_engine_queue_breadcrumbs(engine);
> +
>   	spin_unlock(&request->lock);
>   
>   	engine->emit_fini_breadcrumb(request,
> @@ -401,10 +404,22 @@ void __i915_request_unsubmit(struct i915_request *request)
>   
>   	/* We may be recursing from the signal callback of another i915 fence */
>   	spin_lock_nested(&request->lock, SINGLE_DEPTH_NESTING);
> +
> +	/*
> +	 * As we do not allow WAIT to preempt inflight requests,
> +	 * once we have executed a request, along with triggering
> +	 * any execution callbacks, we must preserve its ordering
> +	 * within the non-preemptible FIFO.
> +	 */
> +	BUILD_BUG_ON(__NO_PREEMPTION & ~I915_PRIORITY_MASK); /* only internal */
> +	request->sched.attr.priority |= __NO_PREEMPTION;
> +
>   	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &request->fence.flags))
>   		i915_request_cancel_breadcrumb(request);
> +
>   	GEM_BUG_ON(!test_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags));
>   	clear_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags);
> +
>   	spin_unlock(&request->lock);
>   
>   	/* Transfer back from the global per-engine timeline to per-context */
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
> index 38efefd22dce..9fb96ff57a29 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler.c
> +++ b/drivers/gpu/drm/i915/i915_scheduler.c
> @@ -322,7 +322,6 @@ static void __i915_schedule(struct i915_request *rq,
>   			if (node_signaled(p->signaler))
>   				continue;
>   
> -			GEM_BUG_ON(p->signaler->attr.priority < node->attr.priority);
>   			if (prio > READ_ONCE(p->signaler->attr.priority))
>   				list_move_tail(&p->dfs_link, &dfs);
>   		}
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
> index dbe9cb7ecd82..54bd6c89817e 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler.h
> +++ b/drivers/gpu/drm/i915/i915_scheduler.h
> @@ -33,6 +33,8 @@ enum {
>   #define I915_PRIORITY_WAIT	((u8)BIT(0))
>   #define I915_PRIORITY_NEWCLIENT	((u8)BIT(1))
>   
> +#define __NO_PREEMPTION (I915_PRIORITY_WAIT)
> +
>   struct i915_sched_attr {
>   	/**
>   	 * @priority: execution and service priority
> diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
> index 20cbceeabeae..a2846ea1e62c 100644
> --- a/drivers/gpu/drm/i915/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/intel_guc_submission.c
> @@ -720,7 +720,7 @@ static inline int rq_prio(const struct i915_request *rq)
>   
>   static inline int port_prio(const struct execlist_port *port)
>   {
> -	return rq_prio(port_request(port));
> +	return rq_prio(port_request(port)) | __NO_PREEMPTION;
>   }
>   
>   static bool __guc_dequeue(struct intel_engine_cs *engine)
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index c4f4966b0f4f..0e20f3bc8210 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -188,6 +188,12 @@ static inline int rq_prio(const struct i915_request *rq)
>   	return rq->sched.attr.priority;
>   }
>   
> +static int effective_prio(const struct i915_request *rq)
> +{
> +	/* Restrict mere WAIT boosts from triggering preemption */
> +	return rq_prio(rq) | __NO_PREEMPTION;
> +}
> +
>   static int queue_prio(const struct intel_engine_execlists *execlists)
>   {
>   	struct i915_priolist *p;
> @@ -208,7 +214,7 @@ static int queue_prio(const struct intel_engine_execlists *execlists)
>   static inline bool need_preempt(const struct intel_engine_cs *engine,
>   				const struct i915_request *rq)
>   {
> -	const int last_prio = rq_prio(rq);
> +	int last_prio;
>   
>   	if (!intel_engine_has_preemption(engine))
>   		return false;
> @@ -228,6 +234,7 @@ static inline bool need_preempt(const struct intel_engine_cs *engine,
>   	 * preempt. If that hint is stale or we may be trying to preempt
>   	 * ourselves, ignore the request.
>   	 */
> +	last_prio = effective_prio(rq);
>   	if (!__execlists_need_preempt(engine->execlists.queue_priority_hint,
>   				      last_prio))
>   		return false;
> diff --git a/drivers/gpu/drm/i915/selftests/intel_lrc.c b/drivers/gpu/drm/i915/selftests/intel_lrc.c
> index 0f7a5bf69646..7172f6c7f25a 100644
> --- a/drivers/gpu/drm/i915/selftests/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/selftests/intel_lrc.c
> @@ -407,6 +407,168 @@ static int live_suppress_self_preempt(void *arg)
>   	goto err_client_b;
>   }
>   
> +static int __i915_sw_fence_call
> +dummy_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
> +{
> +	return NOTIFY_DONE;
> +}
> +
> +static struct i915_request *dummy_request(struct intel_engine_cs *engine)
> +{
> +	struct i915_request *rq;
> +
> +	rq = kmalloc(sizeof(*rq), GFP_KERNEL | __GFP_ZERO);
> +	if (!rq)
> +		return NULL;
> +
> +	INIT_LIST_HEAD(&rq->active_list);
> +	rq->engine = engine;
> +
> +	i915_sched_node_init(&rq->sched);
> +
> +	/* mark this request as permanently incomplete */
> +	rq->fence.seqno = 1;
> +	BUILD_BUG_ON(sizeof(rq->fence.seqno) != 8); /* upper 32b == 0 */
> +	rq->hwsp_seqno = (u32 *)&rq->fence.seqno + 1;
> +	GEM_BUG_ON(i915_request_completed(rq));
> +
> +	i915_sw_fence_init(&rq->submit, dummy_notify);
> +	i915_sw_fence_commit(&rq->submit);
> +
> +	return rq;
> +}
> +
> +static void dummy_request_free(struct i915_request *dummy)
> +{
> +	i915_request_mark_complete(dummy);
> +	i915_sched_node_fini(dummy->engine->i915, &dummy->sched);
> +	i915_sw_fence_fini(&dummy->submit);
> +
> +	dma_fence_free(&dummy->fence);
> +}
> +
> +static int live_suppress_wait_preempt(void *arg)
> +{
> +	struct drm_i915_private *i915 = arg;
> +	struct preempt_client client[4];
> +	struct intel_engine_cs *engine;
> +	enum intel_engine_id id;
> +	intel_wakeref_t wakeref;
> +	int err = -ENOMEM;
> +	int i;
> +
> +	/*
> +	 * Waiters are given a little priority nudge, but not enough
> +	 * to actually cause any preemption. Double check that we do
> +	 * not needlessly generate preempt-to-idle cycles.
> +	 */
> +
> +	if (!HAS_LOGICAL_RING_PREEMPTION(i915))
> +		return 0;
> +
> +	mutex_lock(&i915->drm.struct_mutex);
> +	wakeref = intel_runtime_pm_get(i915);
> +
> +	if (preempt_client_init(i915, &client[0])) /* ELSP[0] */
> +		goto err_unlock;
> +	if (preempt_client_init(i915, &client[1])) /* ELSP[1] */
> +		goto err_client_0;
> +	if (preempt_client_init(i915, &client[2])) /* head of queue */
> +		goto err_client_1;
> +	if (preempt_client_init(i915, &client[3])) /* bystander */
> +		goto err_client_2;
> +
> +	for_each_engine(engine, i915, id) {
> +		int depth;
> +
> +		if (!engine->emit_init_breadcrumb)
> +			continue;
> +
> +		for (depth = 0; depth < ARRAY_SIZE(client); depth++) {
> +			struct i915_request *rq[ARRAY_SIZE(client)];
> +			struct i915_request *dummy;
> +
> +			engine->execlists.preempt_hang.count = 0;
> +
> +			dummy = dummy_request(engine);
> +			if (!dummy)
> +				goto err_client_3;
> +
> +			for (i = 0; i < ARRAY_SIZE(client); i++) {
> +				rq[i] = igt_spinner_create_request(&client[i].spin,
> +								   client[i].ctx, engine,
> +								   MI_NOOP);
> +				if (IS_ERR(rq[i])) {
> +					err = PTR_ERR(rq[i]);
> +					goto err_wedged;
> +				}
> +
> +				/* Disable NEWCLIENT promotion */
> +				__i915_active_request_set(&rq[i]->timeline->last_request,
> +							  dummy);
> +				i915_request_add(rq[i]);
> +			}
> +
> +			dummy_request_free(dummy);
> +
> +			GEM_BUG_ON(i915_request_completed(rq[0]));
> +			if (!igt_wait_for_spinner(&client[0].spin, rq[0])) {
> +				pr_err("%s: First client failed to start\n",
> +				       engine->name);
> +				goto err_wedged;
> +			}
> +			GEM_BUG_ON(!i915_request_started(rq[0]));
> +
> +			if (i915_request_wait(rq[depth],
> +					      I915_WAIT_LOCKED |
> +					      I915_WAIT_PRIORITY,
> +					      1) != -ETIME) {
> +				pr_err("%s: Waiter depth:%d completed!\n",
> +				       engine->name, depth);
> +				goto err_wedged;
> +			}
> +
> +			for (i = 0; i < ARRAY_SIZE(client); i++)
> +				igt_spinner_end(&client[i].spin);
> +
> +			if (igt_flush_test(i915, I915_WAIT_LOCKED))
> +				goto err_wedged;
> +
> +			if (engine->execlists.preempt_hang.count) {
> +				pr_err("%s: Preemption recorded x%d, depth %d; should have been suppressed!\n",
> +				       engine->name,
> +				       engine->execlists.preempt_hang.count,
> +				       depth);
> +				err = -EINVAL;
> +				goto err_client_3;
> +			}
> +		}
> +	}
> +
> +	err = 0;
> +err_client_3:
> +	preempt_client_fini(&client[3]);
> +err_client_2:
> +	preempt_client_fini(&client[2]);
> +err_client_1:
> +	preempt_client_fini(&client[1]);
> +err_client_0:
> +	preempt_client_fini(&client[0]);
> +err_unlock:
> +	if (igt_flush_test(i915, I915_WAIT_LOCKED))
> +		err = -EIO;
> +	intel_runtime_pm_put(i915, wakeref);
> +	mutex_unlock(&i915->drm.struct_mutex);
> +	return err;
> +
> +err_wedged:
> +	for (i = 0; i < ARRAY_SIZE(client); i++)
> +		igt_spinner_end(&client[i].spin);
> +	i915_gem_set_wedged(i915);
> +	err = -EIO;
> +	goto err_client_3;
> +}
> +
>   static int live_chain_preempt(void *arg)
>   {
>   	struct drm_i915_private *i915 = arg;
> @@ -887,6 +1049,7 @@ int intel_execlists_live_selftests(struct drm_i915_private *i915)
>   		SUBTEST(live_preempt),
>   		SUBTEST(live_late_preempt),
>   		SUBTEST(live_suppress_self_preempt),
> +		SUBTEST(live_suppress_wait_preempt),
>   		SUBTEST(live_chain_preempt),
>   		SUBTEST(live_preempt_hang),
>   		SUBTEST(live_preempt_smoke),
> 
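
As a side note on the effective_prio()/__NO_PREEMPTION hunk above, the
suppression is easy to sanity-check with a tiny userspace model (simplified:
user priority 0 only, the rest of need_preempt() ignored; WAIT and
NO_PREEMPTION below are local stand-ins mirroring i915_scheduler.h):

#include <assert.h>

#define WAIT		(1 << 0)	/* I915_PRIORITY_WAIT */
#define NO_PREEMPTION	WAIT		/* __NO_PREEMPTION */

/* Mirrors effective_prio() from the hunk above. */
static int effective_prio(int rq_prio)
{
	return rq_prio | NO_PREEMPTION;
}

int main(void)
{
	int running = 0;	/* executing request, no boosts */
	int waiter = WAIT;	/* queued request whose only boost is WAIT */

	/* need_preempt() requires the queue hint to be strictly greater. */
	assert(!(waiter > effective_prio(running)));
	return 0;
}

That is, a request that differs only by the WAIT boost can never look strictly
higher than the executing one, so it moves to the head of the queue but never
forces a preempt-to-idle cycle.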

* Re: [PATCH 03/11] drm/i915/execlists: Suppress redundant preemption
  2019-02-26 10:23 ` [PATCH 03/11] drm/i915/execlists: Suppress redundant preemption Chris Wilson
@ 2019-02-28 13:11   ` Tvrtko Ursulin
  2019-03-01 11:31     ` Tvrtko Ursulin
  0 siblings, 1 reply; 43+ messages in thread
From: Tvrtko Ursulin @ 2019-02-28 13:11 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 26/02/2019 10:23, Chris Wilson wrote:
> On unwinding the active request we give it a small (limited to internal
> priority levels) boost to prevent it from being gazumped a second time.
> However, this means that it can be promoted to above the request that
> triggered the preemption request, causing a preempt-to-idle cycle for no
> change. We can avoid this if we take the boost into account when
> checking if the preemption request is valid.
> 
> v2: After preemption the active request will be after the preemptee if
> they end up with equal priority.
> 
> v3: Tvrtko pointed out that this, the existing logic, makes
> I915_PRIORITY_WAIT non-preemptible. Document this interesting quirk!
> 
> v4: Prove Tvrtko was right about WAIT being non-preemptible and test it.
> v5: Except not all priorities were made equal, and the WAIT not preempting
> is only if we start off as !NEWCLIENT.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/intel_lrc.c | 38 ++++++++++++++++++++++++++++----
>   1 file changed, 34 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 0e20f3bc8210..dba19baf6808 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -164,6 +164,8 @@
>   #define WA_TAIL_DWORDS 2
>   #define WA_TAIL_BYTES (sizeof(u32) * WA_TAIL_DWORDS)
>   
> +#define ACTIVE_PRIORITY (I915_PRIORITY_NEWCLIENT)
> +
>   static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
>   					    struct intel_engine_cs *engine,
>   					    struct intel_context *ce);
> @@ -190,8 +192,30 @@ static inline int rq_prio(const struct i915_request *rq)
>   
>   static int effective_prio(const struct i915_request *rq)
>   {
> +	int prio = rq_prio(rq);
> +
> +	/*
> +	 * On unwinding the active request, we give it a priority bump
> +	 * equivalent to a freshly submitted request. This protects it from
> +	 * being gazumped again, but it would be preferable if we didn't
> +	 * let it be gazumped in the first place!
> +	 *
> +	 * See __unwind_incomplete_requests()
> +	 */
> +	if (~prio & ACTIVE_PRIORITY && __i915_request_has_started(rq)) {
> +		/*
> +		 * After preemption, we insert the active request at the
> +		 * end of the new priority level. This means that we will be
> +		 * _lower_ priority than the preemptee all things equal (and
> +		 * so the preemption is valid), so adjust our comparison
> +		 * accordingly.
> +		 */
> +		prio |= ACTIVE_PRIORITY;
> +		prio--;
> +	}
> +
>   	/* Restrict mere WAIT boosts from triggering preemption */
> -	return rq_prio(rq) | __NO_PREEMPTION;
> +	return prio | __NO_PREEMPTION;
>   }
>   
>   static int queue_prio(const struct intel_engine_execlists *execlists)
> @@ -359,7 +383,7 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine)
>   {
>   	struct i915_request *rq, *rn, *active = NULL;
>   	struct list_head *uninitialized_var(pl);
> -	int prio = I915_PRIORITY_INVALID | I915_PRIORITY_NEWCLIENT;
> +	int prio = I915_PRIORITY_INVALID | ACTIVE_PRIORITY;
>   
>   	lockdep_assert_held(&engine->timeline.lock);
>   
> @@ -390,9 +414,15 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine)
>   	 * The active request is now effectively the start of a new client
>   	 * stream, so give it the equivalent small priority bump to prevent
>   	 * it being gazumped a second time by another peer.
> +	 *
> +	 * One consequence of this preemption boost is that we may jump
> +	 * over lesser priorities (such as I915_PRIORITY_WAIT), effectively
> +	 * making those priorities non-preemptible. They will be moved forward

After the previous patch, WAIT priority is non-preemptible by definition, 
so the suggestion that it is the preemption boost which makes it so is 
not accurate.

> +	 * in the priority queue, but they will not gain immediate access to
> +	 * the GPU.
>   	 */
> -	if (!(prio & I915_PRIORITY_NEWCLIENT)) {
> -		prio |= I915_PRIORITY_NEWCLIENT;
> +	if (~prio & ACTIVE_PRIORITY && __i915_request_has_started(active)) {

What is the importance of the has_started check? Hasn't the active 
request been running by definition?

> +		prio |= ACTIVE_PRIORITY;
>   		active->sched.attr.priority = prio;
>   		list_move_tail(&active->sched.link,
>   			       i915_sched_lookup_priolist(engine, prio));
> 
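
To make the arithmetic above concrete, here is a simplified model (user
priority 0, request already started; WAIT/NEWCLIENT below are local stand-ins
for the i915_scheduler.h values, not code from the patch):

#include <assert.h>

#define WAIT		(1 << 0)	/* I915_PRIORITY_WAIT */
#define NEWCLIENT	(1 << 1)	/* I915_PRIORITY_NEWCLIENT */
#define ACTIVE_PRIORITY	NEWCLIENT
#define NO_PREEMPTION	WAIT

/* Mirrors effective_prio() above for a request that has started. */
static int effective_prio(int prio)
{
	if (~prio & ACTIVE_PRIORITY) {
		prio |= ACTIVE_PRIORITY;
		prio--;
	}
	return prio | NO_PREEMPTION;
}

int main(void)
{
	int running = WAIT;	/* executing request carrying only the WAIT boost */
	int hint = NEWCLIENT;	/* queued fresh client at the same user priority */

	/* With patch 02 alone this comparison would still preempt... */
	assert(hint > (running | NO_PREEMPTION));

	/*
	 * ...yet on unwind the running request is boosted to
	 * WAIT | NEWCLIENT == 3 > hint, so it would simply be selected
	 * again: a preempt-to-idle cycle for no change. Folding the
	 * would-be boost into the comparison suppresses it.
	 */
	assert(!(hint > effective_prio(running)));
	return 0;
}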

Regards,

Tvrtko

* Re: [PATCH 10/11] drm/i915/execlists: Skip direct submission if only lite-restore
  2019-02-26 10:24 ` [PATCH 10/11] drm/i915/execlists: Skip direct submission if only lite-restore Chris Wilson
@ 2019-02-28 13:20   ` Tvrtko Ursulin
  2019-03-01 10:22     ` Chris Wilson
  0 siblings, 1 reply; 43+ messages in thread
From: Tvrtko Ursulin @ 2019-02-28 13:20 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 26/02/2019 10:24, Chris Wilson wrote:
> If we resubmitting the active context, simply skip the submission as

we are

> performing the submission from the interrupt handler has higher

 From the tasklet? You mean wait for ctx complete and then 
execlists_dequeue, instead of lite-restore?

Regards,

Tvrtko

> throughput than continually provoking lite-restores. If however, we find
> ourselves with a new client, we check whether or not we can dequeue into
> the second port or to resolve preemption.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/intel_lrc.c | 24 +++++++++++++++++++-----
>   1 file changed, 19 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 2268860cca44..3e9f7103f31f 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1205,12 +1205,26 @@ static void __submit_queue_imm(struct intel_engine_cs *engine)
>   		tasklet_hi_schedule(&execlists->tasklet);
>   }
>   
> -static void submit_queue(struct intel_engine_cs *engine, int prio)
> +static bool inflight(const struct intel_engine_execlists *execlists,
> +		     const struct i915_request *rq)
>   {
> -	if (prio > engine->execlists.queue_priority_hint) {
> -		engine->execlists.queue_priority_hint = prio;
> +	const struct i915_request *active = port_request(execlists->port);
> +
> +	return active && active->hw_context == rq->hw_context;
> +}
> +
> +static void submit_queue(struct intel_engine_cs *engine,
> +			 const struct i915_request *rq)
> +{
> +	struct intel_engine_execlists *execlists = &engine->execlists;
> +
> +	if (rq_prio(rq) <= execlists->queue_priority_hint)
> +		return;
> +
> +	execlists->queue_priority_hint = rq_prio(rq);
> +
> +	if (!inflight(execlists, rq))
>   		__submit_queue_imm(engine);
> -	}
>   }
>   
>   static void execlists_submit_request(struct i915_request *request)
> @@ -1226,7 +1240,7 @@ static void execlists_submit_request(struct i915_request *request)
>   	GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root));
>   	GEM_BUG_ON(list_empty(&request->sched.link));
>   
> -	submit_queue(engine, rq_prio(request));
> +	submit_queue(engine, request);
>   
>   	spin_unlock_irqrestore(&engine->timeline.lock, flags);
>   }
> 

* Re: [PATCH 10/11] drm/i915/execlists: Skip direct submission if only lite-restore
  2019-02-28 13:20   ` Tvrtko Ursulin
@ 2019-03-01 10:22     ` Chris Wilson
  2019-03-01 10:27       ` Chris Wilson
  0 siblings, 1 reply; 43+ messages in thread
From: Chris Wilson @ 2019-03-01 10:22 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-02-28 13:20:50)
> 
> On 26/02/2019 10:24, Chris Wilson wrote:
> > If we resubmitting the active context, simply skip the submission as
> 
> we are

It took multiple reads to even notice that the third word was missing,
mainly because I kept skipping the "we".

> > performing the submission from the interrupt handler has higher
> 
>  From the tasklet?

Yes, the interrupt handler bottom half.

> You mean wait for ctx complete and then 
> execlists_dequeue, instead of lite-restore?

Yes. We can measure the relatively large impact of lite-restoring on
throughput (the delay in the GPU reloading the context is quite
noticeable), but it only affects a certain microbenchmark. In theory,
making sure we avoid the stall while handling the interrupt and
resubmitting should offset that cost and be much preferred for
multi-context situations, but there the interrupt overhead on an *idle*
system (~5-7us) is not as significant and so the gain is less pronounced
(in fact, due to quirky hw, it is sometimes preferable not to
resubmit during an inflight CS event).
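
Spelling that out as a timeline (my reading of the intended behaviour, not
code from the patch):

/*
 * Context A is executing in ELSP[0] when another request for A is
 * submitted:
 *
 *   before: queue_priority_hint is bumped and the tasklet kicked;
 *           dequeue sees A still active and rewrites ELSP with the
 *           new tail (a lite-restore).
 *
 *   after:  inflight() notices port_request(ELSP[0]) shares A's
 *           hw_context, so only the hint is updated; the new work is
 *           picked up at the next CS event instead.
 */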

Anyway I just keep getting annoyed by the extra latency induced by
lite-restore without a good way to balance it against avoiding the CS
completion stall. And I live in fear of our tasklet being thrown to
ksoftirqd again.
-Chris

* Re: [PATCH 10/11] drm/i915/execlists: Skip direct submission if only lite-restore
  2019-03-01 10:22     ` Chris Wilson
@ 2019-03-01 10:27       ` Chris Wilson
  0 siblings, 0 replies; 43+ messages in thread
From: Chris Wilson @ 2019-03-01 10:27 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Chris Wilson (2019-03-01 10:22:30)
> Quoting Tvrtko Ursulin (2019-02-28 13:20:50)
> > 
> > On 26/02/2019 10:24, Chris Wilson wrote:
> > > If we resubmitting the active context, simply skip the submission as
> > 
> > we are
> 
> It took multiple reads to even notice that the third word was missing,
> mainly because I kept skipping the "we".
> 
> > > performing the submission from the interrupt handler has higher
> > 
> >  From the tasklet?
> 
> Yes, the interrupt handler bottom half.
> 
> > You mean wait for ctx complete and then 
> > execlists_dequeue, instead of lite-restore?
> 
> Yes. We can measure the relatively large impact of lite-restoring on
> throughput (the delay in the GPU reloading the context is quite
> noticeable), but it only affects a certain microbenchmark. In theory,
> making sure we avoid the stall while handling the interrupt and
> resubmitting should offset that cost and be much preferred for
> multi-context situations, but there the interrupt overhead on an *idle*
> system (~5-7us) is not as significant and so the impact seems not as
> significant (in fact due to quirky hw, sometimes it is preferable not to
> resubmit during an inflight CS event).

On idle systems, that lite-restore cost is visible even on top of a
normal context switch, i.e. submitting
	A, AB, [CS interrupt], B 
is marginally slower than
	A, [CS interrupt], B
!!!

/o\
Sometimes I don't get our HW at all,
-Chris

* Re: [PATCH 03/11] drm/i915/execlists: Suppress redundant preemption
  2019-02-28 13:11   ` Tvrtko Ursulin
@ 2019-03-01 11:31     ` Tvrtko Ursulin
  2019-03-01 11:36       ` Chris Wilson
  0 siblings, 1 reply; 43+ messages in thread
From: Tvrtko Ursulin @ 2019-03-01 11:31 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


ping on below

On 28/02/2019 13:11, Tvrtko Ursulin wrote:
> 
> On 26/02/2019 10:23, Chris Wilson wrote:
>> On unwinding the active request we give it a small (limited to internal
>> priority levels) boost to prevent it from being gazumped a second time.
>> However, this means that it can be promoted to above the request that
>> triggered the preemption request, causing a preempt-to-idle cycle for no
>> change. We can avoid this if we take the boost into account when
>> checking if the preemption request is valid.
>>
>> v2: After preemption the active request will be after the preemptee if
>> they end up with equal priority.
>>
>> v3: Tvrtko pointed out that this, the existing logic, makes
>> I915_PRIORITY_WAIT non-preemptible. Document this interesting quirk!
>>
>> v4: Prove Tvrtko was right about WAIT being non-preemptible and test it.
>> v5: Except not all priorities were made equal, and the WAIT not 
>> preempting
>> is only if we start off as !NEWCLIENT.
>>
>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> ---
>>   drivers/gpu/drm/i915/intel_lrc.c | 38 ++++++++++++++++++++++++++++----
>>   1 file changed, 34 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c 
>> b/drivers/gpu/drm/i915/intel_lrc.c
>> index 0e20f3bc8210..dba19baf6808 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> @@ -164,6 +164,8 @@
>>   #define WA_TAIL_DWORDS 2
>>   #define WA_TAIL_BYTES (sizeof(u32) * WA_TAIL_DWORDS)
>> +#define ACTIVE_PRIORITY (I915_PRIORITY_NEWCLIENT)
>> +
>>   static int execlists_context_deferred_alloc(struct i915_gem_context 
>> *ctx,
>>                           struct intel_engine_cs *engine,
>>                           struct intel_context *ce);
>> @@ -190,8 +192,30 @@ static inline int rq_prio(const struct 
>> i915_request *rq)
>>   static int effective_prio(const struct i915_request *rq)
>>   {
>> +    int prio = rq_prio(rq);
>> +
>> +    /*
>> +     * On unwinding the active request, we give it a priority bump
>> +     * equivalent to a freshly submitted request. This protects it from
>> +     * being gazumped again, but it would be preferable if we didn't
>> +     * let it be gazumped in the first place!
>> +     *
>> +     * See __unwind_incomplete_requests()
>> +     */
>> +    if (~prio & ACTIVE_PRIORITY && __i915_request_has_started(rq)) {
>> +        /*
>> +         * After preemption, we insert the active request at the
>> +         * end of the new priority level. This means that we will be
>> +         * _lower_ priority than the preemptee all things equal (and
>> +         * so the preemption is valid), so adjust our comparison
>> +         * accordingly.
>> +         */
>> +        prio |= ACTIVE_PRIORITY;
>> +        prio--;
>> +    }
>> +
>>       /* Restrict mere WAIT boosts from triggering preemption */
>> -    return rq_prio(rq) | __NO_PREEMPTION;
>> +    return prio | __NO_PREEMPTION;
>>   }
>>   static int queue_prio(const struct intel_engine_execlists *execlists)
>> @@ -359,7 +383,7 @@ __unwind_incomplete_requests(struct 
>> intel_engine_cs *engine)
>>   {
>>       struct i915_request *rq, *rn, *active = NULL;
>>       struct list_head *uninitialized_var(pl);
>> -    int prio = I915_PRIORITY_INVALID | I915_PRIORITY_NEWCLIENT;
>> +    int prio = I915_PRIORITY_INVALID | ACTIVE_PRIORITY;
>>       lockdep_assert_held(&engine->timeline.lock);
>> @@ -390,9 +414,15 @@ __unwind_incomplete_requests(struct 
>> intel_engine_cs *engine)
>>        * The active request is now effectively the start of a new client
>>        * stream, so give it the equivalent small priority bump to prevent
>>        * it being gazumped a second time by another peer.
>> +     *
>> +     * One consequence of this preemption boost is that we may jump
>> +     * over lesser priorities (such as I915_PRIORITY_WAIT), effectively
>> +     * making those priorities non-preemptible. They will be moved 
>> forward
> 
> After the previous patch wait priority is non-preemptible by definition 
> making this suggestion preemption boost is making it so not accurate.
> 
>> +     * in the priority queue, but they will not gain immediate access to
>> +     * the GPU.
>>        */
>> -    if (!(prio & I915_PRIORITY_NEWCLIENT)) {
>> -        prio |= I915_PRIORITY_NEWCLIENT;
>> +    if (~prio & ACTIVE_PRIORITY && __i915_request_has_started(active)) {
> 
> What is the importance of the has_started check? Hasn't the active 
> request been running by definition?
> 
>> +        prio |= ACTIVE_PRIORITY;
>>           active->sched.attr.priority = prio;
>>           list_move_tail(&active->sched.link,
>>                      i915_sched_lookup_priolist(engine, prio));
>>
> 
> Regards,
> 
> Tvrtko

* Re: [PATCH 03/11] drm/i915/execlists: Suppress redundant preemption
  2019-03-01 11:31     ` Tvrtko Ursulin
@ 2019-03-01 11:36       ` Chris Wilson
  2019-03-01 15:07         ` Tvrtko Ursulin
  0 siblings, 1 reply; 43+ messages in thread
From: Chris Wilson @ 2019-03-01 11:36 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-03-01 11:31:26)
> 
> ping on below
> 
> On 28/02/2019 13:11, Tvrtko Ursulin wrote:
> > 
> > On 26/02/2019 10:23, Chris Wilson wrote:
> >> On unwinding the active request we give it a small (limited to internal
> >> priority levels) boost to prevent it from being gazumped a second time.
> >> However, this means that it can be promoted to above the request that
> >> triggered the preemption request, causing a preempt-to-idle cycle for no
> >> change. We can avoid this if we take the boost into account when
> >> checking if the preemption request is valid.
> >>
> >> v2: After preemption the active request will be after the preemptee if
> >> they end up with equal priority.
> >>
> >> v3: Tvrtko pointed out that this, the existing logic, makes
> >> I915_PRIORITY_WAIT non-preemptible. Document this interesting quirk!
> >>
> >> v4: Prove Tvrtko was right about WAIT being non-preemptible and test it.
> >> v5: Except not all priorities were made equal, and the WAIT not 
> >> preempting
> >> is only if we start off as !NEWCLIENT.
> >>
> >> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >> ---
> >>   drivers/gpu/drm/i915/intel_lrc.c | 38 ++++++++++++++++++++++++++++----
> >>   1 file changed, 34 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/i915/intel_lrc.c 
> >> b/drivers/gpu/drm/i915/intel_lrc.c
> >> index 0e20f3bc8210..dba19baf6808 100644
> >> --- a/drivers/gpu/drm/i915/intel_lrc.c
> >> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> >> @@ -164,6 +164,8 @@
> >>   #define WA_TAIL_DWORDS 2
> >>   #define WA_TAIL_BYTES (sizeof(u32) * WA_TAIL_DWORDS)
> >> +#define ACTIVE_PRIORITY (I915_PRIORITY_NEWCLIENT)
> >> +
> >>   static int execlists_context_deferred_alloc(struct i915_gem_context 
> >> *ctx,
> >>                           struct intel_engine_cs *engine,
> >>                           struct intel_context *ce);
> >> @@ -190,8 +192,30 @@ static inline int rq_prio(const struct 
> >> i915_request *rq)
> >>   static int effective_prio(const struct i915_request *rq)
> >>   {
> >> +    int prio = rq_prio(rq);
> >> +
> >> +    /*
> >> +     * On unwinding the active request, we give it a priority bump
> >> +     * equivalent to a freshly submitted request. This protects it from
> >> +     * being gazumped again, but it would be preferable if we didn't
> >> +     * let it be gazumped in the first place!
> >> +     *
> >> +     * See __unwind_incomplete_requests()
> >> +     */
> >> +    if (~prio & ACTIVE_PRIORITY && __i915_request_has_started(rq)) {
> >> +        /*
> >> +         * After preemption, we insert the active request at the
> >> +         * end of the new priority level. This means that we will be
> >> +         * _lower_ priority than the preemptee all things equal (and
> >> +         * so the preemption is valid), so adjust our comparison
> >> +         * accordingly.
> >> +         */
> >> +        prio |= ACTIVE_PRIORITY;
> >> +        prio--;
> >> +    }
> >> +
> >>       /* Restrict mere WAIT boosts from triggering preemption */
> >> -    return rq_prio(rq) | __NO_PREEMPTION;
> >> +    return prio | __NO_PREEMPTION;
> >>   }
> >>   static int queue_prio(const struct intel_engine_execlists *execlists)
> >> @@ -359,7 +383,7 @@ __unwind_incomplete_requests(struct 
> >> intel_engine_cs *engine)
> >>   {
> >>       struct i915_request *rq, *rn, *active = NULL;
> >>       struct list_head *uninitialized_var(pl);
> >> -    int prio = I915_PRIORITY_INVALID | I915_PRIORITY_NEWCLIENT;
> >> +    int prio = I915_PRIORITY_INVALID | ACTIVE_PRIORITY;
> >>       lockdep_assert_held(&engine->timeline.lock);
> >> @@ -390,9 +414,15 @@ __unwind_incomplete_requests(struct 
> >> intel_engine_cs *engine)
> >>        * The active request is now effectively the start of a new client
> >>        * stream, so give it the equivalent small priority bump to prevent
> >>        * it being gazumped a second time by another peer.
> >> +     *
> >> +     * One consequence of this preemption boost is that we may jump
> >> +     * over lesser priorities (such as I915_PRIORITY_WAIT), effectively
> >> +     * making those priorities non-preemptible. They will be moved 
> >> forward
> > 
> > After the previous patch wait priority is non-preemptible by definition 
> > making this suggestion preemption boost is making it so not accurate.
> > 
> >> +     * in the priority queue, but they will not gain immediate access to
> >> +     * the GPU.
> >>        */
> >> -    if (!(prio & I915_PRIORITY_NEWCLIENT)) {
> >> -        prio |= I915_PRIORITY_NEWCLIENT;
> >> +    if (~prio & ACTIVE_PRIORITY && __i915_request_has_started(active)) {
> > 
> > What is the importance of the has_started check? Hasn't the active 
> > request been running by definition?

No. Semaphores. This is all about defending against incorrect promotion
while a request is still spinning on its dependencies (or else we get
promoted above them and PI is broken).
-Chris

* Re: [PATCH 03/11] drm/i915/execlists: Suppress redundant preemption
  2019-03-01 11:36       ` Chris Wilson
@ 2019-03-01 15:07         ` Tvrtko Ursulin
  2019-03-01 15:14           ` Chris Wilson
  0 siblings, 1 reply; 43+ messages in thread
From: Tvrtko Ursulin @ 2019-03-01 15:07 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 01/03/2019 11:36, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2019-03-01 11:31:26)
>>
>> ping on below
>>
>> On 28/02/2019 13:11, Tvrtko Ursulin wrote:
>>>
>>> On 26/02/2019 10:23, Chris Wilson wrote:
>>>> On unwinding the active request we give it a small (limited to internal
>>>> priority levels) boost to prevent it from being gazumped a second time.
>>>> However, this means that it can be promoted to above the request that
>>>> triggered the preemption request, causing a preempt-to-idle cycle for no
>>>> change. We can avoid this if we take the boost into account when
>>>> checking if the preemption request is valid.
>>>>
>>>> v2: After preemption the active request will be after the preemptee if
>>>> they end up with equal priority.
>>>>
>>>> v3: Tvrtko pointed out that this, the existing logic, makes
>>>> I915_PRIORITY_WAIT non-preemptible. Document this interesting quirk!
>>>>
>>>> v4: Prove Tvrtko was right about WAIT being non-preemptible and test it.
>>>> v5: Except not all priorities were made equal, and the WAIT not
>>>> preempting
>>>> is only if we start off as !NEWCLIENT.
>>>>
>>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>> ---
>>>>    drivers/gpu/drm/i915/intel_lrc.c | 38 ++++++++++++++++++++++++++++----
>>>>    1 file changed, 34 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c
>>>> b/drivers/gpu/drm/i915/intel_lrc.c
>>>> index 0e20f3bc8210..dba19baf6808 100644
>>>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>>>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>>>> @@ -164,6 +164,8 @@
>>>>    #define WA_TAIL_DWORDS 2
>>>>    #define WA_TAIL_BYTES (sizeof(u32) * WA_TAIL_DWORDS)
>>>> +#define ACTIVE_PRIORITY (I915_PRIORITY_NEWCLIENT)
>>>> +
>>>>    static int execlists_context_deferred_alloc(struct i915_gem_context
>>>> *ctx,
>>>>                            struct intel_engine_cs *engine,
>>>>                            struct intel_context *ce);
>>>> @@ -190,8 +192,30 @@ static inline int rq_prio(const struct
>>>> i915_request *rq)
>>>>    static int effective_prio(const struct i915_request *rq)
>>>>    {
>>>> +    int prio = rq_prio(rq);
>>>> +
>>>> +    /*
>>>> +     * On unwinding the active request, we give it a priority bump
>>>> +     * equivalent to a freshly submitted request. This protects it from
>>>> +     * being gazumped again, but it would be preferable if we didn't
>>>> +     * let it be gazumped in the first place!
>>>> +     *
>>>> +     * See __unwind_incomplete_requests()
>>>> +     */
>>>> +    if (~prio & ACTIVE_PRIORITY && __i915_request_has_started(rq)) {
>>>> +        /*
>>>> +         * After preemption, we insert the active request at the
>>>> +         * end of the new priority level. This means that we will be
>>>> +         * _lower_ priority than the preemptee all things equal (and
>>>> +         * so the preemption is valid), so adjust our comparison
>>>> +         * accordingly.
>>>> +         */
>>>> +        prio |= ACTIVE_PRIORITY;
>>>> +        prio--;
>>>> +    }
>>>> +
>>>>        /* Restrict mere WAIT boosts from triggering preemption */
>>>> -    return rq_prio(rq) | __NO_PREEMPTION;
>>>> +    return prio | __NO_PREEMPTION;
>>>>    }
>>>>    static int queue_prio(const struct intel_engine_execlists *execlists)
>>>> @@ -359,7 +383,7 @@ __unwind_incomplete_requests(struct
>>>> intel_engine_cs *engine)
>>>>    {
>>>>        struct i915_request *rq, *rn, *active = NULL;
>>>>        struct list_head *uninitialized_var(pl);
>>>> -    int prio = I915_PRIORITY_INVALID | I915_PRIORITY_NEWCLIENT;
>>>> +    int prio = I915_PRIORITY_INVALID | ACTIVE_PRIORITY;
>>>>        lockdep_assert_held(&engine->timeline.lock);
>>>> @@ -390,9 +414,15 @@ __unwind_incomplete_requests(struct
>>>> intel_engine_cs *engine)
>>>>         * The active request is now effectively the start of a new client
>>>>         * stream, so give it the equivalent small priority bump to prevent
>>>>         * it being gazumped a second time by another peer.
>>>> +     *
>>>> +     * One consequence of this preemption boost is that we may jump
>>>> +     * over lesser priorities (such as I915_PRIORITY_WAIT), effectively
>>>> +     * making those priorities non-preemptible. They will be moved
>>>> forward
>>>
>>> After the previous patch wait priority is non-preemptible by definition
>>> making this suggestion preemption boost is making it so not accurate.
>>>
>>>> +     * in the priority queue, but they will not gain immediate access to
>>>> +     * the GPU.
>>>>         */
>>>> -    if (!(prio & I915_PRIORITY_NEWCLIENT)) {
>>>> -        prio |= I915_PRIORITY_NEWCLIENT;
>>>> +    if (~prio & ACTIVE_PRIORITY && __i915_request_has_started(active)) {
>>>
>>> What is the importance of the has_started check? Hasn't the active
>>> request been running by definition?
> 
> No. Semaphores. This is all about defending against incorrect promotion
> while a request is still spinning on its dependencies (or else we get
> promoted above them and PI is broken).

So init_breadcrumb is emitted after the semaphore, i.e. 
__i915_request_has_started will be false while spinning on the semaphore. 
That possibly makes sense... But you know what I'll say next: it is 
extremely subtle and sprinkled over the code, so here we definitely need 
a comment explaining it.

Regards,

Tvrtko

* Re: [PATCH 03/11] drm/i915/execlists: Suppress redundant preemption
  2019-03-01 15:07         ` Tvrtko Ursulin
@ 2019-03-01 15:14           ` Chris Wilson
  0 siblings, 0 replies; 43+ messages in thread
From: Chris Wilson @ 2019-03-01 15:14 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-03-01 15:07:58)
> 
> On 01/03/2019 11:36, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2019-03-01 11:31:26)
> >>
> >> ping on below
> >>
> >> On 28/02/2019 13:11, Tvrtko Ursulin wrote:
> >>>
> >>> On 26/02/2019 10:23, Chris Wilson wrote:
> >>>> On unwinding the active request we give it a small (limited to internal
> >>>> priority levels) boost to prevent it from being gazumped a second time.
> >>>> However, this means that it can be promoted to above the request that
> >>>> triggered the preemption request, causing a preempt-to-idle cycle for no
> >>>> change. We can avoid this if we take the boost into account when
> >>>> checking if the preemption request is valid.
> >>>>
> >>>> v2: After preemption the active request will be after the preemptee if
> >>>> they end up with equal priority.
> >>>>
> >>>> v3: Tvrtko pointed out that this, the existing logic, makes
> >>>> I915_PRIORITY_WAIT non-preemptible. Document this interesting quirk!
> >>>>
> >>>> v4: Prove Tvrtko was right about WAIT being non-preemptible and test it.
> >>>> v5: Except not all priorities were made equal, and the WAIT not
> >>>> preempting
> >>>> is only if we start off as !NEWCLIENT.
> >>>>
> >>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>>> ---
> >>>>    drivers/gpu/drm/i915/intel_lrc.c | 38 ++++++++++++++++++++++++++++----
> >>>>    1 file changed, 34 insertions(+), 4 deletions(-)
> >>>>
> >>>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c
> >>>> b/drivers/gpu/drm/i915/intel_lrc.c
> >>>> index 0e20f3bc8210..dba19baf6808 100644
> >>>> --- a/drivers/gpu/drm/i915/intel_lrc.c
> >>>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> >>>> @@ -164,6 +164,8 @@
> >>>>    #define WA_TAIL_DWORDS 2
> >>>>    #define WA_TAIL_BYTES (sizeof(u32) * WA_TAIL_DWORDS)
> >>>> +#define ACTIVE_PRIORITY (I915_PRIORITY_NEWCLIENT)
> >>>> +
> >>>>    static int execlists_context_deferred_alloc(struct i915_gem_context
> >>>> *ctx,
> >>>>                            struct intel_engine_cs *engine,
> >>>>                            struct intel_context *ce);
> >>>> @@ -190,8 +192,30 @@ static inline int rq_prio(const struct
> >>>> i915_request *rq)
> >>>>    static int effective_prio(const struct i915_request *rq)
> >>>>    {
> >>>> +    int prio = rq_prio(rq);
> >>>> +
> >>>> +    /*
> >>>> +     * On unwinding the active request, we give it a priority bump
> >>>> +     * equivalent to a freshly submitted request. This protects it from
> >>>> +     * being gazumped again, but it would be preferable if we didn't
> >>>> +     * let it be gazumped in the first place!
> >>>> +     *
> >>>> +     * See __unwind_incomplete_requests()
> >>>> +     */
> >>>> +    if (~prio & ACTIVE_PRIORITY && __i915_request_has_started(rq)) {
> >>>> +        /*
> >>>> +         * After preemption, we insert the active request at the
> >>>> +         * end of the new priority level. This means that we will be
> >>>> +         * _lower_ priority than the preemptee all things equal (and
> >>>> +         * so the preemption is valid), so adjust our comparison
> >>>> +         * accordingly.
> >>>> +         */
> >>>> +        prio |= ACTIVE_PRIORITY;
> >>>> +        prio--;
> >>>> +    }
> >>>> +
> >>>>        /* Restrict mere WAIT boosts from triggering preemption */
> >>>> -    return rq_prio(rq) | __NO_PREEMPTION;
> >>>> +    return prio | __NO_PREEMPTION;
> >>>>    }
> >>>>    static int queue_prio(const struct intel_engine_execlists *execlists)
> >>>> @@ -359,7 +383,7 @@ __unwind_incomplete_requests(struct
> >>>> intel_engine_cs *engine)
> >>>>    {
> >>>>        struct i915_request *rq, *rn, *active = NULL;
> >>>>        struct list_head *uninitialized_var(pl);
> >>>> -    int prio = I915_PRIORITY_INVALID | I915_PRIORITY_NEWCLIENT;
> >>>> +    int prio = I915_PRIORITY_INVALID | ACTIVE_PRIORITY;
> >>>>        lockdep_assert_held(&engine->timeline.lock);
> >>>> @@ -390,9 +414,15 @@ __unwind_incomplete_requests(struct
> >>>> intel_engine_cs *engine)
> >>>>         * The active request is now effectively the start of a new client
> >>>>         * stream, so give it the equivalent small priority bump to prevent
> >>>>         * it being gazumped a second time by another peer.
> >>>> +     *
> >>>> +     * One consequence of this preemption boost is that we may jump
> >>>> +     * over lesser priorities (such as I915_PRIORITY_WAIT), effectively
> >>>> +     * making those priorities non-preemptible. They will be moved
> >>>> forward
> >>>
> >>> After the previous patch wait priority is non-preemptible by definition
> >>> making this suggestion preemption boost is making it so not accurate.
> >>>
> >>>> +     * in the priority queue, but they will not gain immediate access to
> >>>> +     * the GPU.
> >>>>         */
> >>>> -    if (!(prio & I915_PRIORITY_NEWCLIENT)) {
> >>>> -        prio |= I915_PRIORITY_NEWCLIENT;
> >>>> +    if (~prio & ACTIVE_PRIORITY && __i915_request_has_started(active)) {
> >>>
> >>> What is the importance of the has_started check? Hasn't the active
> >>> request been running by definition?
> > 
> > No. Semaphores. This is all about defending against incorrect promotion
> > while a request is still spinning on its dependencies (or else we get
> > promoted above them and PI is broken).
> 
> Is init_breadcrumb after the semaphore, ie. __i915_request_has_started 
> will be false while spinning on the semaphore. 

Yes, that's the raison d'etre for making init_breadcrumbs and
i915_request_has_started.

> That possibly makes 
> sense.. But you know what I'll say next. It is extremely subtle and 
> sprinkled over the code so here we definitely need a comment explaining it.

I've been trying, but it's all baked into the meaning of has-started and
active request. :| But I didn't extend the comment before
i915_request_has_started() to explain the bit about semaphores, so mea
culpa.
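
Roughly, the ordering being relied upon is the following (a sketch of the
assumed per-request ring layout, from my reading rather than a quote of the
code):

/*
 * Per-request ring contents, in execution order:
 *
 *   [ MI_SEMAPHORE_WAIT on each signaler ]  <- emit_semaphore_wait()
 *   [ initial breadcrumb (seqno - 1)     ]  <- emit_init_breadcrumb()
 *   [ user payload                       ]
 *   [ final breadcrumb (seqno)           ]
 *
 * i915_request_started() only reports true once the initial breadcrumb
 * has executed, i.e. after every semaphore wait has been satisfied, so
 * a request still spinning on a semaphore is !started and does not
 * receive the ACTIVE_PRIORITY bump, keeping priority inheritance
 * intact for its signalers.
 */
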
-Chris

end of thread, other threads:[~2019-03-01 15:14 UTC | newest]

Thread overview: 43+ messages
2019-02-26 10:23 [PATCH 01/11] drm/i915: Skip scanning for signalers if we are already inflight Chris Wilson
2019-02-26 10:23 ` [PATCH 02/11] drm/i915/execlists: Suppress mere WAIT preemption Chris Wilson
2019-02-28 12:33   ` Tvrtko Ursulin
2019-02-26 10:23 ` [PATCH 03/11] drm/i915/execlists: Suppress redundant preemption Chris Wilson
2019-02-28 13:11   ` Tvrtko Ursulin
2019-03-01 11:31     ` Tvrtko Ursulin
2019-03-01 11:36       ` Chris Wilson
2019-03-01 15:07         ` Tvrtko Ursulin
2019-03-01 15:14           ` Chris Wilson
2019-02-26 10:23 ` [PATCH 04/11] drm/i915: Make request allocation caches global Chris Wilson
2019-02-27 10:29   ` Tvrtko Ursulin
2019-02-27 10:44     ` Chris Wilson
2019-02-27 14:17       ` Tvrtko Ursulin
2019-02-27 14:43         ` Chris Wilson
2019-02-26 10:23 ` [PATCH 05/11] drm/i915: Introduce i915_timeline.mutex Chris Wilson
2019-02-28  7:43   ` Tvrtko Ursulin
2019-02-28  8:09     ` Chris Wilson
2019-02-26 10:23 ` [PATCH 06/11] drm/i915: Keep timeline HWSP allocated until idle across the system Chris Wilson
2019-02-27 10:44   ` Tvrtko Ursulin
2019-02-27 10:51     ` Chris Wilson
2019-02-27 11:15   ` [PATCH] " Chris Wilson
2019-02-27 14:20     ` Tvrtko Ursulin
2019-02-26 10:24 ` [PATCH 07/11] drm/i915: Compute the global scheduler caps Chris Wilson
2019-02-28  7:45   ` Tvrtko Ursulin
2019-02-26 10:24 ` [PATCH 08/11] drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+ Chris Wilson
2019-02-28 10:49   ` Tvrtko Ursulin
2019-02-26 10:24 ` [PATCH 09/11] drm/i915: Prioritise non-busywait semaphore workloads Chris Wilson
2019-02-26 10:24 ` [PATCH 10/11] drm/i915/execlists: Skip direct submission if only lite-restore Chris Wilson
2019-02-28 13:20   ` Tvrtko Ursulin
2019-03-01 10:22     ` Chris Wilson
2019-03-01 10:27       ` Chris Wilson
2019-02-26 10:24 ` [PATCH 11/11] drm/i915: Use __ffs() in for_each_priolist for more compact code Chris Wilson
2019-02-28  7:42   ` Tvrtko Ursulin
2019-02-26 10:56 ` ✗ Fi.CI.BAT: failure for series starting with [01/11] drm/i915: Skip scanning for signalers if we are already inflight Patchwork
2019-02-26 11:26 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/11] drm/i915: Skip scanning for signalers if we are already inflight (rev2) Patchwork
2019-02-26 11:31 ` ✗ Fi.CI.SPARSE: " Patchwork
2019-02-26 11:51 ` ✓ Fi.CI.BAT: success " Patchwork
2019-02-26 15:17 ` ✓ Fi.CI.IGT: " Patchwork
2019-02-27 10:19 ` [PATCH 01/11] drm/i915: Skip scanning for signalers if we are already inflight Tvrtko Ursulin
2019-02-27 11:38 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/11] drm/i915: Skip scanning for signalers if we are already inflight (rev3) Patchwork
2019-02-27 11:42 ` ✗ Fi.CI.SPARSE: " Patchwork
2019-02-27 12:03 ` ✓ Fi.CI.BAT: success " Patchwork
2019-02-27 13:58 ` ✓ Fi.CI.IGT: " Patchwork
